* [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps
@ 2024-03-20 22:04 Taylor Blau
  2024-03-20 22:05 ` [PATCH 01/24] Documentation/technical: describe pseudo-merge bitmaps format Taylor Blau
                   ` (27 more replies)
  0 siblings, 28 replies; 157+ messages in thread
From: Taylor Blau @ 2024-03-20 22:04 UTC
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

This series implements a new idea in the pack-bitmap machinery called
"pseudo-merge reachability bitmaps".

BACKGROUND
==========

Conceptually, pseudo-merge bitmaps are bitmaps stored for octopus merges
covering the reference tips in a repository that were not selected for
bitmaps. The size (number of parents) of consecutive merges decreases
according to an inverse power-law.

(Note that a complete description of the on-disk format can be found in
Documentation/technical/bitmap-format.txt, and the configuration
details in the "bitmapPseudoMerge" section of 'git-config(1)').

Commits are assigned into pseudo-merge group(s) according to a number of
parameters unique to each defined group, like so:

    [bitmapPseudoMerge "name"]
        pattern = "refs/*"
        decay = 100
        sampleRate = 100
        threshold = 1.week.ago
        maxMerges = 64
        stableThreshold = 1.month.ago
        stableSize = 1024

Users can create zero or more pseudo-merge groups. Pseudo-merge groups
look at reference tips matching the given pattern, and use the commits
at the tips of those (matching) references as candidates for forming the
pseudo-merge groups.

The full details are spelled out in git-config(1), but the gist is as
follows:

  - "decay" controls the decay rate between consecutive groups
    (effectively setting 'k' in the decay function 'f(n)=C*n^-k')

  - "sampleRate" controls how often (between 0 and 100, inclusive) to
    select a candidate for inclusion within a pseudo-merge group

  - "threshold" sets the minimum age for an un-bitmapped reference tip
    to join a pseudo-merge group

  - "maxMerges" controls how many unstable pseudo-merge groups we'll
    create
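
To get a feel for the decay function mentioned above, here is a rough
sketch that evaluates f(n) = C*n^-k for a given group number. The helper
name group_size(), the restriction to an integer exponent, and the
sample values of C and k are all assumptions made for illustration; how
the "decay" setting maps onto 'k' is spelled out in git-config(1), not
here.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of this series: evaluate the decay
 * function f(n) = C * n^-k for the n-th group (n >= 1), assuming an
 * integer exponent k and rounding down to a whole number of commits.
 */
unsigned long group_size(unsigned long C, unsigned int k, unsigned long n)
{
	unsigned long size = C;

	assert(n >= 1);

	/* Dividing by n once per unit of k yields floor(C / n^k). */
	while (k--)
		size /= n;
	return size;
}
```

With C = 1000 and k = 1, the first three groups would hold 1000, 500,
and 333 commits; raising k to 2 shrinks the third group to 111.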

In addition, there is a pair of parameters controlling "stable"
pseudo-merge groups, meant to cover commits that are so old as to be
highly unlikely to change. "stableThreshold" controls the minimum age,
and "stableSize" controls the fixed size of each of these groups. Stable
pseudo-merges do not follow the inverse power-law decay function as
above.

Pseudo-merges can be grouped even further if the given pattern has one
or more capture groups, similar to delta islands. In a fork-network
repository (where each fork is a remote, and all objects live in
aggregate in a repository called "network.git"), you can group
branches/tags separately by fork ID like so:

    [bitmapPseudoMerge "branches"]
        pattern = "refs/remotes/([0-9])+/heads/"
        [...]

    [bitmapPseudoMerge "tags"]
        pattern = "refs/remotes/([0-9])+/tags/"
        [...]

Pseudo-merges are an additive optimization. A pseudo-merge may only be
used when all of its parents are part of either the "haves" or "wants"
side of a reachability query.

This is why groups with "older" commits tend to be larger: they are less
likely to change, and thus more likely to remain usable over time.
Conversely, "newer" pseudo-merge groups tend to be smaller.
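
The "all parents must be present" rule can be sketched as follows. This
is only an illustration: it uses plain, uncompressed arrays of 64-bit
words rather than EWAH bitmaps, and the function name
apply_pseudo_merge() and its parameters are made up for the example
rather than taken from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not code from this series. Bitmaps are plain
 * (uncompressed) arrays of `nwords` 64-bit words.
 *
 * If every bit in `parents` is already set in `result`, OR the
 * pseudo-merge's `objects` into `result` and return 1. Otherwise leave
 * `result` untouched and return 0.
 */
int apply_pseudo_merge(uint64_t *result, const uint64_t *parents,
		       const uint64_t *objects, size_t nwords)
{
	size_t i;

	assert(result && parents && objects);

	for (i = 0; i < nwords; i++)
		if (parents[i] & ~result[i])
			return 0; /* some parent is missing: not usable */

	for (i = 0; i < nwords; i++)
		result[i] |= objects[i];
	return 1;
}
```

Here `result` stands for one side of the traversal (either the "haves"
or the "wants"). A single missing parent is enough to rule the
pseudo-merge out, which is why groups that change often are rarely
usable.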

USAGE
=====

Here's a best-case scenario for using pseudo-merge bitmaps. This is a
local copy of a relatively large (private) repository that has a large
number of references (~44k) with poor bitmap coverage, spike-y branches,
and deep-ish trees.

First, we generate a new bitmap with 64 pseudo-merge commits covering
the un-bitmapped parts of this repository, like so:

    $ GIT_TRACE2_PERF=1 GIT_TRACE2_PERF_BRIEF=1 \
      git.compile \
        -c bitmapPseudoMerge.all.pattern='refs/' \
        -c bitmapPseudoMerge.all.threshold=now \
        -c bitmapPseudoMerge.all.stableThreshold=never \
        -c bitmapPseudoMerge.all.maxMerges=64 \
        -c pack.writeBitmapLookupTable=true \
        repack -adb
    [...]
    d1 | main | data         | r1  | 476.227928 |  0.058245 | progress     | ....total_objects:1
    d1 | main | region_leave | r1  | 476.227938 |  0.058255 | progress     | ..label:Selecting pseudo-merge commits
    d1 | main | region_enter | r1  | 476.227946 |           | progress     | ..label:Building bitmaps
    d1 | main | region_enter | r1  | 476.227949 |           | pack-bitmap- | ....label:building_bitmaps_total
    d1 | main | data         | r1  | 476.228348 |  0.000399 | bitmap       | ......opened bitmap file:.git/objects/pack/pack-47e6bced8a8f64d82802a77f2f1cf5eeac24f295.pack
    d1 | main | data         | r1  | 485.947246 |  9.719297 | pack-bitmap- | ......num_selected_commits:733
    d1 | main | data         | r1  | 485.947269 |  9.719320 | pack-bitmap- | ......num_maximal_commits:115
    d1 | main | region_leave | r1  | 1033.008932 | 556.780983 | pack-bitmap- | ....label:building_bitmaps_total
    d1 | main | data         | r1  | 1033.008953 | 556.781007 | pack-bitmap- | ....building_bitmaps_reused:700
    d1 | main | data         | r1  | 1033.008957 | 556.781011 | pack-bitmap- | ....building_bitmaps_pseudo_merge_reused:0
    d1 | main | data         | r1  | 1033.008968 | 556.781022 | progress     | ....total_objects:733
    d1 | main | region_leave | r1  | 1033.008971 | 556.781025 | progress     | ..label:Building bitmaps

(Note: this repository had an up-to-date .bitmap prior to generating
pseudo-merges, so we spend about 9 minutes (on an unoptimized build)
generating pseudo-merges. This feels expensive, but maybe my intuition
is wrong. I've spent quite a bit of time trying to drive this down, but
I still think there is some medium-hanging fruit left here to get this
to be even cheaper.)

PERFORMANCE
===========

From there, we can compare the time it takes to generate a count of
reachable objects with and without using pseudo-merge bitmaps:

    $ hyperfine -L v ,.compile 'git{v} rev-list --all --objects --count --use-bitmap-index'
    Benchmark 1: git rev-list --all --objects --count --use-bitmap-index
      Time (mean ± σ):     16.129 s ±  0.079 s    [User: 15.681 s, System: 0.446 s]
      Range (min … max):   16.029 s … 16.243 s    10 runs

    Benchmark 2: git.compile rev-list --all --objects --count --use-bitmap-index
      Time (mean ± σ):     874.9 ms ±  20.4 ms    [User: 611.4 ms, System: 263.3 ms]
      Range (min … max):   847.1 ms … 904.3 ms    10 runs

    Summary
      git.compile rev-list --all --objects --count --use-bitmap-index ran
       18.43 ± 0.44 times faster than git rev-list --all --objects --count --use-bitmap-index

Similarly for clone performance (the impact here is less dramatic than
above, primarily because we end up accumulating most objects via
pack-reuse, but enumeration is faster by roughly the same quantity as
above):

    $ hyperfine --runs=3 -L v ,.compile 'git{v} pack-objects \
      --all --use-bitmap-index --delta-base-offset \
      --stdout --all-progress </dev/null >/dev/null'
    Benchmark 1: git pack-objects \
      --all --use-bitmap-index --delta-base-offset \
      --stdout --all-progress </dev/null >/dev/null
      Time (mean ± σ):     93.811 s ±  2.202 s    [User: 80.905 s, System: 7.674 s]
      Range (min … max):   92.136 s … 96.305 s    3 runs

    Benchmark 2: git.compile pack-objects \
      --all --use-bitmap-index --delta-base-offset \
      --stdout --all-progress </dev/null >/dev/null
      Time (mean ± σ):     79.608 s ±  0.764 s    [User: 68.304 s, System: 7.215 s]
      Range (min … max):   78.912 s … 80.425 s    3 runs

    Summary
      git.compile pack-objects --all --use-bitmap-index --delta-base-offset --stdout --all-progress </dev/null >/dev/null ran
        1.18 ± 0.03 times faster than git pack-objects --all --use-bitmap-index --delta-base-offset --stdout --all-progress </dev/null >/dev/null

In a smaller repository, like git.git, the performance numbers are much
less dramatic (here we're generating the numbers from the new p5333
performance test), but show that we do not pay a performance penalty for
using pseudo-merges:

    Test                                                                this tree
    -----------------------------------------------------------------------------------
    5333.2: git rev-list --count --all --objects (no bitmaps)           3.46(3.37+0.09)
    5333.3: git rev-list --count --all --objects (no pseudo-merges)     0.13(0.11+0.01)
    5333.4: git rev-list --count --all --objects (with pseudo-merges)   0.12(0.11+0.01)

CONCLUSION
==========

This series turned out to be quite the adventure to work on ;-). Many
thanks to Peff, who developed this idea together with me during the end
of October, 2021.

Review on this large topic is greatly appreciated. Likewise, if folks
have good ideas on either (a) how to speed-up generating these bitmaps,
or (b) a sense of whether or not the current performance is acceptable,
I am all ears.

Thanks in advance for anyone interested in reviewing this topic, and I
look forward to your feedback!

Taylor Blau (24):
  Documentation/technical: describe pseudo-merge bitmaps format
  config: repo_config_get_expiry()
  ewah: implement `ewah_bitmap_is_subset()`
  pack-bitmap: drop unused `max_bitmaps` parameter
  pack-bitmap: move some initialization to `bitmap_writer_init()`
  pseudo-merge.ch: initial commit
  pack-bitmap-write: support storing pseudo-merge commits
  pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()`
  pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public
  pseudo-merge: implement support for selecting pseudo-merge commits
  pack-bitmap-write.c: select pseudo-merge commits
  pack-bitmap-write.c: write pseudo-merge table
  pack-bitmap: extract `read_bitmap()` function
  pseudo-merge: scaffolding for reads
  pack-bitmap.c: read pseudo-merge extension
  pseudo-merge: implement support for reading pseudo-merge commits
  ewah: implement `ewah_bitmap_popcount()`
  pack-bitmap: implement test helpers for pseudo-merge
  t/test-lib-functions.sh: support `--date` in `test_commit_bulk()`
  pack-bitmap.c: use pseudo-merges during traversal
  pack-bitmap: extra trace2 information
  ewah: `bitmap_equals_ewah()`
  pseudo-merge: implement support for finding existing merges
  t/perf: implement performace tests for pseudo-merge bitmaps

 Documentation/config.txt                     |   2 +
 Documentation/config/bitmap-pseudo-merge.txt |  75 ++
 Documentation/technical/bitmap-format.txt    | 205 +++++
 Makefile                                     |   1 +
 builtin/pack-objects.c                       |   3 +-
 config.c                                     |  18 +
 config.h                                     |   2 +
 ewah/bitmap.c                                |  76 ++
 ewah/ewok.h                                  |   3 +
 midx.c                                       |   3 +-
 pack-bitmap-write.c                          | 275 ++++++-
 pack-bitmap.c                                | 359 ++++++++-
 pack-bitmap.h                                |  16 +-
 pseudo-merge.c                               | 739 +++++++++++++++++++
 pseudo-merge.h                               | 218 ++++++
 t/helper/test-bitmap.c                       |  34 +-
 t/perf/p5333-pseudo-merge-bitmaps.sh         |  32 +
 t/t5333-pseudo-merge-bitmaps.sh              | 389 ++++++++++
 t/test-lib-functions.sh                      |  12 +-
 19 files changed, 2401 insertions(+), 61 deletions(-)
 create mode 100644 Documentation/config/bitmap-pseudo-merge.txt
 create mode 100644 pseudo-merge.c
 create mode 100644 pseudo-merge.h
 create mode 100755 t/perf/p5333-pseudo-merge-bitmaps.sh
 create mode 100755 t/t5333-pseudo-merge-bitmaps.sh


base-commit: 3bd955d26919e149552f34aacf8a4e6368c26cec
-- 
2.44.0.303.g1dc5e5b124c


* [PATCH 01/24] Documentation/technical: describe pseudo-merge bitmaps format
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
@ 2024-03-20 22:05 ` Taylor Blau
  2024-03-21 21:24   ` Junio C Hamano
  2024-03-20 22:05 ` [PATCH 02/24] config: repo_config_get_expiry() Taylor Blau
                   ` (26 subsequent siblings)
  27 siblings, 1 reply; 157+ messages in thread
From: Taylor Blau @ 2024-03-20 22:05 UTC
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Prepare to implement pseudo-merge bitmaps over the next several commits
by first describing the serialization format which will store the new
pseudo-merge bitmaps themselves.

This format is implemented as an optional extension within the bitmap v1
format, making it compatible with previous versions of Git, as well as
the original .bitmap implementation within JGit.

The format (as well as a general description of pseudo-merge bitmaps,
and motivating use-case(s)) is described in detail in the patch contents
below, but the high-level description is as follows:

  - An array of pseudo-merge bitmaps, each containing a pair of EWAH
    bitmaps: one describing the set of pseudo-merge "parents", and
    another describing the set of object(s) reachable from those
    parents.

  - A lookup table to determine which pseudo-merge(s) a given commit
    appears in. An optional extended lookup table follows when there is
    at least one commit which appears in multiple pseudo-merge groups.

  - Trailing metadata, including the number of pseudo-merge(s), number
    of unique parents, the offset within the .bitmap file for the
    pseudo-merge commit lookup table, and the size of the optional
    extension itself.
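
As a rough illustration of how a reader might consume one entry of the
lookup table described in the format document (a 4-byte `commit_pos`
followed by an 8-byte `offset`, both in network byte order), consider
the sketch below. The struct and function names are hypothetical, and
clearing the most-significant bit before treating the value as a file
offset is an assumption of this example rather than something the text
above states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory form of one 12-byte lookup table entry. */
struct lookup_entry {
	uint32_t commit_pos; /* bit position of the commit */
	uint64_t offset;     /* file offset, with the flag bit cleared */
	int extended;        /* non-zero: offset points at the extended table */
};

/* Read an `nbytes`-wide big-endian (network byte order) integer. */
static uint64_t read_be(const unsigned char *p, int nbytes)
{
	uint64_t v = 0;

	while (nbytes--)
		v = (v << 8) | *p++;
	return v;
}

/* Decode the 12 bytes at `p` into `out`. */
void decode_lookup_entry(const unsigned char *p, struct lookup_entry *out)
{
	const uint64_t msb = (uint64_t)1 << 63;
	uint64_t raw;

	assert(p && out);

	out->commit_pos = (uint32_t)read_be(p, 4);
	raw = read_be(p + 4, 8);

	/* MSB set: the commit appears in more than one pseudo-merge. */
	out->extended = (raw & msb) != 0;
	/* Assumption: the flag bit is not part of the offset itself. */
	out->offset = raw & ~msb;
}
```

When `extended` is set, the caller would go on to read the count `N`
and the `N` 8-byte offsets from the extended lookup table at `offset`.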

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/technical/bitmap-format.txt | 179 ++++++++++++++++++++++
 1 file changed, 179 insertions(+)

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index f5d200939b0..63a7177ac08 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -255,3 +255,182 @@ triplet is -
 	xor_row (4 byte integer, network byte order): ::
 	The position of the triplet whose bitmap is used to compress
 	this one, or `0xffffffff` if no such bitmap exists.
+
+Pseudo-merge bitmaps
+--------------------
+
+If the `BITMAP_OPT_PSEUDO_MERGES` flag is set, a variable number of
+bytes (preceding the name-hash cache, commit lookup table, and trailing
+checksum) of the `.bitmap` file is used to store pseudo-merge bitmaps.
+
+A "pseudo-merge bitmap" is used to refer to a pair of bitmaps, as
+follows:
+
+Commit bitmap::
+
+  A bitmap whose set bits describe the set of commits included in the
+  pseudo-merge's "merge" bitmap (as below).
+
+Merge bitmap::
+
+  A bitmap whose set bits describe the reachability closure over the set
+  of commits in the pseudo-merge's "commits" bitmap (as above). An
+  identical bitmap would be generated for an octopus merge with the same
+  set of parents as described in the commits bitmap.
+
+Pseudo-merge bitmaps can accelerate bitmap traversals when all commits
+for a given pseudo-merge are listed on either side of the traversal,
+either directly (by explicitly asking for them as part of the `HAVES`
+or `WANTS`) or indirectly (by encountering them during a fill-in
+traversal).
+
+=== Use-cases
+
+For example, suppose there exists a pseudo-merge bitmap with a large
+number of commits, all of which are listed in the `WANTS` section of
+some bitmap traversal query. When pseudo-merge bitmaps are enabled, the
+bitmap machinery can quickly determine there is a pseudo-merge which
+satisfies some subset of the wanted objects on either side of the query.
+Then, we can inflate the EWAH-compressed bitmap, and `OR` it in to the
+resulting bitmap. By contrast, without pseudo-merge bitmaps, we would
+have to repeat the decompression and `OR`-ing step over a potentially
+large number of individual bitmaps, which can take proportionally more
+time.
+
+Another benefit of pseudo-merges arises when there is some combination
+of (a) a large number of references, with (b) poor bitmap coverage, and
+(c) deep, nested trees, making fill-in traversal relatively expensive.
+For example, suppose that there are a large enough number of tags where
+bitmapping each of the tags individually is infeasible. Without
+pseudo-merge bitmaps, computing the result of, say, `git rev-list
+--use-bitmap-index --count --objects --tags` would likely require a
+large amount of fill-in traversal. But when a large quantity of those
+tags are stored together in a pseudo-merge bitmap, the bitmap machinery
+can take advantage of the fact that we only care about the union of
+objects reachable from all of those tags, and answer the query much
+faster.
+
+=== File format
+
+If enabled, pseudo-merge bitmaps are stored in an optional section at
+the end of a `.bitmap` file. The format is as follows:
+
+....
++-------------------------------------------+
+|               .bitmap File                |
++-------------------------------------------+
+|                                           |
+|  Pseudo-merge bitmaps (Variable Length)   |
+|  +---------------------------+            |
+|  | commits_bitmap (EWAH)     |            |
+|  +---------------------------+            |
+|  | merge_bitmap (EWAH)       |            |
+|  +---------------------------+            |
+|                                           |
++-------------------------------------------+
+|                                           |
+|  Lookup Table                             |
+|  +------------+--------------+            |
+|  | commit_pos |    offset    |            |
+|  +------------+--------------+            |
+|  |  4 bytes   |   8 bytes    |            |
+|  +------------+--------------+            |
+|                                           |
+|  Offset Cases:                            |
+|  -------------                            |
+|                                           |
+|  1. MSB Unset: single pseudo-merge bitmap |
+|     + offset to pseudo-merge bitmap       |
+|                                           |
+|  2. MSB Set: multiple pseudo-merges       |
+|     + offset to extended lookup table     |
+|                                           |
++-------------------------------------------+
+|                                           |
+|  Extended Lookup Table (Optional)         |
+|                                           |
+|  +----+----------+----------+----------+  |
+|  | N  | Offset 1 |   ....   | Offset N |  |
+|  +----+----------+----------+----------+  |
+|  |    |  8 bytes |   ....   |  8 bytes |  |
+|  +----+----------+----------+----------+  |
+|                                           |
++-------------------------------------------+
+|                                           |
+|  Pseudo-merge Metadata                    |
+|  +------------------+----------------+    |
+|  | # pseudo-merges  | # Commits      |    |
+|  +------------------+----------------+    |
+|  | 4 bytes          | 4 bytes        |    |
+|  +------------------+----------------+    |
+|                                           |
+|  +------------------+----------------+    |
+|  | Lookup offset    | Extension size |    |
+|  +------------------+----------------+    |
+|  | 8 bytes          | 8 bytes        |    |
+|  +------------------+----------------+    |
+|                                           |
++-------------------------------------------+
+....
+
+* One or more pseudo-merge bitmaps, each containing:
+
+  ** `commits_bitmap`, an EWAH-compressed bitmap describing the set of
+     commits included in this pseudo-merge.
+
+  ** `merge_bitmap`, an EWAH-compressed bitmap describing the union of
+     the set of objects reachable from all commits listed in the
+     `commits_bitmap`.
+
+* A lookup table, mapping pseudo-merged commits to the pseudo-merges
+  they belong to. Entries appear in increasing order of each commit's
+  bit position. Each entry is 12 bytes wide, and is comprised of the
+  following:
+
+  ** `commit_pos`, a 4-byte unsigned value (in network byte-order)
+     containing the bit position for this commit.
+
+  ** `offset`, an 8-byte unsigned value (also in network byte-order)
+  containing either one of two possible offsets, depending on whether or
+  not the most-significant bit is set.
+
+    *** If unset (i.e. `(offset & ((uint64_t)1<<63)) == 0`), the offset
+	(relative to the beginning of the `.bitmap` file) at which the
+	pseudo-merge bitmap for this commit can be read. This indicates
+	only a single pseudo-merge bitmap contains this commit.
+
+    *** If set (i.e. `(offset & ((uint64_t)1<<63)) != 0`), the offset
+	(again relative to the beginning of the `.bitmap` file) at which
+	the extended offset table can be located describing the set of
+	pseudo-merge bitmaps which contain this commit. This indicates
+	that multiple pseudo-merge bitmaps contain this commit.
+
+* An (optional) extended lookup table (written if and only if there is
+  at least one commit which appears in more than one pseudo-merge).
+  There are as many entries as commits which appear in multiple
+  pseudo-merges. Each entry contains the following:
+
+  ** `N`, a 4-byte unsigned value equal to the number of pseudo-merges
+     which contain a given commit.
+
+  ** An array of `N` 8-byte unsigned values, each of which is
+     interpreted as an offset (relative to the beginning of the
+     `.bitmap` file) at which a pseudo-merge bitmap for this commit can
+     be read. These values occur in no particular order.
+
+* Positions for all pseudo-merges, each stored as an 8-byte unsigned
+  value (in network byte-order) containing the offset (relative to the
+  beginning of the `.bitmap` file) of each consecutive pseudo-merge.
+
+* A 4-byte unsigned value (in network byte-order) equal to the number of
+  pseudo-merges.
+
+* A 4-byte unsigned value (in network byte-order) equal to the number of
+  unique commits which appear in any pseudo-merge.
+
+* An 8-byte unsigned value (in network byte-order) equal to the number
+  of bytes between the start of the pseudo-merge section and the
+  beginning of the lookup table.
+
+* An 8-byte unsigned value (in network byte-order) equal to the number
+  of bytes in the pseudo-merge section (including this field).
-- 
2.44.0.303.g1dc5e5b124c



* [PATCH 02/24] config: repo_config_get_expiry()
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
  2024-03-20 22:05 ` [PATCH 01/24] Documentation/technical: describe pseudo-merge bitmaps format Taylor Blau
@ 2024-03-20 22:05 ` Taylor Blau
  2024-04-10 17:54   ` Jeff King
  2024-03-20 22:05 ` [PATCH 03/24] ewah: implement `ewah_bitmap_is_subset()` Taylor Blau
                   ` (25 subsequent siblings)
  27 siblings, 1 reply; 157+ messages in thread
From: Taylor Blau @ 2024-03-20 22:05 UTC
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Callers interested in parsing an approxidate from configuration
currently make use of the `git_config_get_expiry()` function via the
standard `git_config()` callback.

Introduce a `repo_config_get_expiry()` variant in the style of functions
introduced by 3b256228a6 (config: read config from a repository object,
2017-06-22) to read a single value without requiring the git_config()
callback-style approach.

This new function mirrors the existing implementation in
`git_config_get_expiry()`: it checks that the configured value is either
"now" or an approxidate in the past, and hands the raw string back
through `dest`. The only difference is that it reads from the given
`struct repository` instead of `the_repository`.

This function will gain its first caller in a subsequent commit to parse
a "threshold" parameter for excluding too-recent commits from
pseudo-merge groups.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 config.c | 18 ++++++++++++++++++
 config.h |  2 ++
 2 files changed, 20 insertions(+)

diff --git a/config.c b/config.c
index 3cfeb3d8bd9..8512da92273 100644
--- a/config.c
+++ b/config.c
@@ -2627,6 +2627,24 @@ int repo_config_get_pathname(struct repository *repo,
 	return ret;
 }
 
+int repo_config_get_expiry(struct repository *repo,
+			   const char *key, const char **dest)
+{
+	int ret;
+
+	git_config_check_init(repo);
+
+	ret = repo_config_get_string(repo, key, (char **)dest);
+	if (ret)
+		return ret;
+	if (strcmp(*dest, "now")) {
+		timestamp_t now = approxidate("now");
+		if (approxidate(*dest) >= now)
+			git_die_config(key, _("Invalid %s: '%s'"), key, *dest);
+	}
+	return ret;
+}
+
 /* Read values into protected_config. */
 static void read_protected_config(void)
 {
diff --git a/config.h b/config.h
index 5dba984f770..619db01bc27 100644
--- a/config.h
+++ b/config.h
@@ -576,6 +576,8 @@ int repo_config_get_maybe_bool(struct repository *repo,
 			       const char *key, int *dest);
 int repo_config_get_pathname(struct repository *repo,
 			     const char *key, const char **dest);
+int repo_config_get_expiry(struct repository *repo,
+			   const char *key, const char **dest);
 
 /*
  * Functions for reading protected config. By definition, protected
-- 
2.44.0.303.g1dc5e5b124c



* [PATCH 03/24] ewah: implement `ewah_bitmap_is_subset()`
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
  2024-03-20 22:05 ` [PATCH 01/24] Documentation/technical: describe pseudo-merge bitmaps format Taylor Blau
  2024-03-20 22:05 ` [PATCH 02/24] config: repo_config_get_expiry() Taylor Blau
@ 2024-03-20 22:05 ` Taylor Blau
  2024-04-10 18:05   ` Jeff King
  2024-03-20 22:05 ` [PATCH 04/24] pack-bitmap: drop unused `max_bitmaps` parameter Taylor Blau
                   ` (24 subsequent siblings)
  27 siblings, 1 reply; 157+ messages in thread
From: Taylor Blau @ 2024-03-20 22:05 UTC
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

In order to know whether a given pseudo-merge (comprised of "parents"
and "objects" bitmaps) is "satisfied" and can be OR'd into the bitmap
result, we need to be able to quickly determine whether the "parents"
bitmap is a subset of the current set of objects reachable on either
side of a traversal.

Implement a helper function to prepare for that, which determines
whether an EWAH bitmap (the parents bitmap from the pseudo-merge) is a
subset of a non-EWAH bitmap (in this case, the results bitmap from
either side of the traversal).

This function makes use of the EWAH iterator to avoid inflating any part
of the EWAH bitmap after we determine it is not a subset of the non-EWAH
bitmap. This "fail-fast" allows us to avoid a potentially large amount
of wasted effort.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ewah/bitmap.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 ewah/ewok.h   |  1 +
 2 files changed, 44 insertions(+)

diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index ac7e0af622a..5bdae3fb07b 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -138,6 +138,49 @@ void bitmap_or(struct bitmap *self, const struct bitmap *other)
 		self->words[i] |= other->words[i];
 }
 
+int ewah_bitmap_is_subset(struct ewah_bitmap *self, struct bitmap *other)
+{
+	struct ewah_iterator it;
+	eword_t word;
+	size_t i;
+
+	ewah_iterator_init(&it, self);
+
+	for (i = 0; i < other->word_alloc; i++) {
+		if (!ewah_iterator_next(&word, &it)) {
+			/*
+			 * If we reached the end of `self`, and haven't
+			 * rejected `self` as a possible subset of
+			 * `other` yet, then we are done and `self` is
+			 * indeed a subset of `other`.
+			 */
+			return 1;
+		}
+		if (word & ~other->words[i]) {
+			/*
+			 * Otherwise, compare the current pair of
+			 * words. If the word from `self` has bit(s) not
+			 * in the word from `other`, `self` is not a
+			 * subset of `other`.
+			 */
+			return 0;
+		}
+	}
+
+	/*
+	 * If we got to this point, there may be zero or more words
+	 * remaining in `self`, with no remaining words left in `other`.
+	 * If there are any bits set in the remaining word(s) in `self`,
+	 * then `self` is not a subset of `other`.
+	 */
+	while (ewah_iterator_next(&word, &it))
+		if (word)
+			return 0;
+
+	/* `self` is definitely a subset of `other` */
+	return 1;
+}
+
 void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other)
 {
 	size_t original_size = self->word_alloc;
diff --git a/ewah/ewok.h b/ewah/ewok.h
index c11d76c6f33..c334833b201 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -180,6 +180,7 @@ int bitmap_get(struct bitmap *self, size_t pos);
 void bitmap_free(struct bitmap *self);
 int bitmap_equals(struct bitmap *self, struct bitmap *other);
 int bitmap_is_subset(struct bitmap *self, struct bitmap *other);
+int ewah_bitmap_is_subset(struct ewah_bitmap *self, struct bitmap *other);
 
 struct ewah_bitmap * bitmap_to_ewah(struct bitmap *bitmap);
 struct bitmap *ewah_to_bitmap(struct ewah_bitmap *ewah);
-- 
2.44.0.303.g1dc5e5b124c



* [PATCH 04/24] pack-bitmap: drop unused `max_bitmaps` parameter
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
                   ` (2 preceding siblings ...)
  2024-03-20 22:05 ` [PATCH 03/24] ewah: implement `ewah_bitmap_is_subset()` Taylor Blau
@ 2024-03-20 22:05 ` Taylor Blau
  2024-04-10 18:06   ` Jeff King
  2024-03-20 22:05 ` [PATCH 05/24] pack-bitmap: move some initialization to `bitmap_writer_init()` Taylor Blau
                   ` (23 subsequent siblings)
  27 siblings, 1 reply; 157+ messages in thread
From: Taylor Blau @ 2024-03-20 22:05 UTC
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The `max_bitmaps` parameter in `bitmap_writer_select_commits()` was
introduced back in 7cc8f97108 (pack-objects: implement bitmap writing,
2013-12-21), making it original to the bitmap implementation in Git
itself.

When that patch was merged via 0f9e62e084 (Merge branch
'jk/pack-bitmap', 2014-02-27), its sole caller in builtin/pack-objects.c
passed a value of "-1" for `max_bitmaps`, indicating no limit.

Since then, the only other caller (in midx.c, added via c528e17966
(pack-bitmap: write multi-pack bitmaps, 2021-08-31)) also uses a value
of "-1" for `max_bitmaps`.

Since no callers have needed a finite limit for the `max_bitmaps`
parameter in the nearly ten years that have passed since 0f9e62e084,
let's remove the parameter and any dead pieces of code connected to it.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 2 +-
 midx.c                 | 2 +-
 pack-bitmap-write.c    | 8 +-------
 pack-bitmap.h          | 2 +-
 4 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 329aeac8043..41281cae91f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1359,7 +1359,7 @@ static void write_pack_file(void)
 				stop_progress(&progress_state);
 
 				bitmap_writer_show_progress(progress);
-				bitmap_writer_select_commits(indexed_commits, indexed_commits_nr, -1);
+				bitmap_writer_select_commits(indexed_commits, indexed_commits_nr);
 				if (bitmap_writer_build(&to_pack) < 0)
 					die(_("failed to write bitmap index"));
 				bitmap_writer_finish(written_list, nr_written,
diff --git a/midx.c b/midx.c
index 85e1c2cd128..366bfbe18c8 100644
--- a/midx.c
+++ b/midx.c
@@ -1330,7 +1330,7 @@ static int write_midx_bitmap(const char *midx_name,
 	for (i = 0; i < pdata->nr_objects; i++)
 		index[pack_order[i]] = &pdata->objects[i].idx;
 
-	bitmap_writer_select_commits(commits, commits_nr, -1);
+	bitmap_writer_select_commits(commits, commits_nr);
 	ret = bitmap_writer_build(pdata);
 	if (ret < 0)
 		goto cleanup;
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 990a9498d73..3dc2408eca7 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -591,8 +591,7 @@ static int date_compare(const void *_a, const void *_b)
 }
 
 void bitmap_writer_select_commits(struct commit **indexed_commits,
-				  unsigned int indexed_commits_nr,
-				  int max_bitmaps)
+				  unsigned int indexed_commits_nr)
 {
 	unsigned int i = 0, j, next;
 
@@ -615,11 +614,6 @@ void bitmap_writer_select_commits(struct commit **indexed_commits,
 		if (i + next >= indexed_commits_nr)
 			break;
 
-		if (max_bitmaps > 0 && writer.selected_nr >= max_bitmaps) {
-			writer.selected_nr = max_bitmaps;
-			break;
-		}
-
 		if (next == 0) {
 			chosen = indexed_commits[i];
 		} else {
diff --git a/pack-bitmap.h b/pack-bitmap.h
index c7dea13217a..3f96608d5c1 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -110,7 +110,7 @@ int rebuild_bitmap(const uint32_t *reposition,
 struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
 				      struct commit *commit);
 void bitmap_writer_select_commits(struct commit **indexed_commits,
-		unsigned int indexed_commits_nr, int max_bitmaps);
+				  unsigned int indexed_commits_nr);
 int bitmap_writer_build(struct packing_data *to_pack);
 void bitmap_writer_finish(struct pack_idx_entry **index,
 			  uint32_t index_nr,
-- 
2.44.0.303.g1dc5e5b124c



* [PATCH 05/24] pack-bitmap: move some initialization to `bitmap_writer_init()`
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
                   ` (3 preceding siblings ...)
  2024-03-20 22:05 ` [PATCH 04/24] pack-bitmap: drop unused `max_bitmaps` parameter Taylor Blau
@ 2024-03-20 22:05 ` Taylor Blau
  2024-04-10 18:10   ` Jeff King
  2024-03-20 22:05 ` [PATCH 06/24] pseudo-merge.ch: initial commit Taylor Blau
                   ` (22 subsequent siblings)
  27 siblings, 1 reply; 157+ messages in thread
From: Taylor Blau @ 2024-03-20 22:05 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The pack-bitmap-writer machinery uses an oidmap (backed by khash.h) to
map from commits selected for bitmaps (by OID) to a bitmapped_commit
structure (containing the bitmap itself, among other things like its XOR
offset, etc.).

This map was initialized at the start of `bitmap_writer_build()`. New
entries are added in `pack-bitmap-write.c::store_selected()`, which is
called by the bitmap_builder machinery (which is responsible for
traversing history and generating the actual bitmaps).

Reorganize when this field is initialized and when entries are added to
it so that we can quickly determine whether a commit is a candidate for
pseudo-merge selection, or not (since it was already selected to receive
a bitmap, and thus is ineligible for pseudo-merge inclusion).

The changes are as follows:

  - Introduce a new `bitmap_writer_init()` function which initializes
    the `writer.bitmaps` field (instead of waiting until
    `bitmap_writer_build()` is called).

  - Add map entries in `push_bitmapped_commit()` (which is called via
    `bitmap_writer_select_commits()`) with OID keys and NULL values to
    track whether or not we *expect* to write a bitmap for some given
    commit.

  - Validate that a NULL entry is found matching the given key when we
    store a selected bitmap.
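
The reserve-then-fill flow described above can be sketched in isolation
as follows. This is only a minimal illustration under stated
assumptions: a small fixed-size array keyed by strings stands in for the
khash-backed oidmap, and the names `slot_map`, `reserve_key()`,
`store_value()`, and `is_selected()` are hypothetical and do not appear
in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SLOTS 16

/*
 * Hypothetical stand-in for the oidmap: maps a key to a value, where a
 * NULL value means "selected, but no bitmap stored yet".
 */
struct slot_map {
	const char *keys[MAX_SLOTS];
	const void *values[MAX_SLOTS];
	size_t nr;
};

/* Linear lookup; returns the slot index, or -1 if the key is absent. */
int find_key(const struct slot_map *map, const char *key)
{
	size_t i;

	for (i = 0; i < map->nr; i++)
		if (!strcmp(map->keys[i], key))
			return (int)i;
	return -1;
}

/* Selection phase: reserve a key with a NULL value; -1 on duplicate/full. */
int reserve_key(struct slot_map *map, const char *key)
{
	if (map->nr >= MAX_SLOTS || find_key(map, key) >= 0)
		return -1;
	map->keys[map->nr] = key;
	map->values[map->nr] = NULL;
	map->nr++;
	return 0;
}

/* Build phase: only keys that were reserved earlier may receive a value. */
int store_value(struct slot_map *map, const char *key, const void *value)
{
	int pos = find_key(map, key);

	if (pos < 0)
		return -1; /* attempted to store a non-selected key */
	map->values[pos] = value;
	return 0;
}

/* Membership is known as soon as selection is done, before any value exists. */
int is_selected(const struct slot_map *map, const char *key)
{
	return find_key(map, key) >= 0;
}
```

The point of the sketch is that `is_selected()` can answer correctly
right after the selection phase, which is what lets a later step rule
out commits that are already due to receive a bitmap.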

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c |  1 +
 midx.c                 |  1 +
 pack-bitmap-write.c    | 23 ++++++++++++++++++-----
 pack-bitmap.h          |  1 +
 4 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 41281cae91f..34a431e3856 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1339,6 +1339,7 @@ static void write_pack_file(void)
 				    hash_to_hex(hash));
 
 			if (write_bitmap_index) {
+				bitmap_writer_init(the_repository);
 				bitmap_writer_set_checksum(hash);
 				bitmap_writer_build_type_index(
 					&to_pack, written_list, nr_written);
diff --git a/midx.c b/midx.c
index 366bfbe18c8..24d98120852 100644
--- a/midx.c
+++ b/midx.c
@@ -1311,6 +1311,7 @@ static int write_midx_bitmap(const char *midx_name,
 	for (i = 0; i < pdata->nr_objects; i++)
 		index[i] = &pdata->objects[i].idx;
 
+	bitmap_writer_init(the_repository);
 	bitmap_writer_show_progress(flags & MIDX_PROGRESS);
 	bitmap_writer_build_type_index(pdata, index, pdata->nr_objects);
 
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 3dc2408eca7..ad768959633 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -46,6 +46,11 @@ struct bitmap_writer {
 
 static struct bitmap_writer writer;
 
+void bitmap_writer_init(struct repository *r)
+{
+	writer.bitmaps = kh_init_oid_map();
+}
+
 void bitmap_writer_show_progress(int show)
 {
 	writer.show_progress = show;
@@ -117,11 +122,20 @@ void bitmap_writer_build_type_index(struct packing_data *to_pack,
 
 static inline void push_bitmapped_commit(struct commit *commit)
 {
+	int hash_ret;
+	khiter_t hash_pos;
+
 	if (writer.selected_nr >= writer.selected_alloc) {
 		writer.selected_alloc = (writer.selected_alloc + 32) * 2;
 		REALLOC_ARRAY(writer.selected, writer.selected_alloc);
 	}
 
+	hash_pos = kh_put_oid_map(writer.bitmaps, commit->object.oid, &hash_ret);
+	if (!hash_ret)
+		die(_("duplicate entry when writing bitmap index: %s"),
+		    oid_to_hex(&commit->object.oid));
+	kh_value(writer.bitmaps, hash_pos) = NULL;
+
 	writer.selected[writer.selected_nr].commit = commit;
 	writer.selected[writer.selected_nr].bitmap = NULL;
 	writer.selected[writer.selected_nr].flags = 0;
@@ -466,14 +480,14 @@ static void store_selected(struct bb_commit *ent, struct commit *commit)
 {
 	struct bitmapped_commit *stored = &writer.selected[ent->idx];
 	khiter_t hash_pos;
-	int hash_ret;
 
 	stored->bitmap = bitmap_to_ewah(ent->bitmap);
 
-	hash_pos = kh_put_oid_map(writer.bitmaps, commit->object.oid, &hash_ret);
-	if (hash_ret == 0)
-		die("Duplicate entry when writing index: %s",
+	hash_pos = kh_get_oid_map(writer.bitmaps, commit->object.oid);
+	if (hash_pos == kh_end(writer.bitmaps))
+		die(_("attempted to store non-selected commit: '%s'"),
 		    oid_to_hex(&commit->object.oid));
+
 	kh_value(writer.bitmaps, hash_pos) = stored;
 }
 
@@ -488,7 +502,6 @@ int bitmap_writer_build(struct packing_data *to_pack)
 	uint32_t *mapping;
 	int closed = 1; /* until proven otherwise */
 
-	writer.bitmaps = kh_init_oid_map();
 	writer.to_pack = to_pack;
 
 	if (writer.show_progress)
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 3f96608d5c1..dae2d68a338 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -97,6 +97,7 @@ int bitmap_has_oid_in_uninteresting(struct bitmap_index *, const struct object_i
 
 off_t get_disk_usage_from_bitmap(struct bitmap_index *, struct rev_info *);
 
+void bitmap_writer_init(struct repository *r);
 void bitmap_writer_show_progress(int show);
 void bitmap_writer_set_checksum(const unsigned char *sha1);
 void bitmap_writer_build_type_index(struct packing_data *to_pack,
-- 
2.44.0.303.g1dc5e5b124c



* [PATCH 06/24] pseudo-merge.ch: initial commit
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
                   ` (4 preceding siblings ...)
  2024-03-20 22:05 ` [PATCH 05/24] pack-bitmap: move some initialization to `bitmap_writer_init()` Taylor Blau
@ 2024-03-20 22:05 ` Taylor Blau
  2024-03-20 22:05 ` [PATCH 07/24] pack-bitmap-write: support storing pseudo-merge commits Taylor Blau
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-03-20 22:05 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Add a new (empty) header and source file to contain the implementation
for selecting, reading, and applying pseudo-merge bitmaps.

For now this header and its corresponding implementation are left
empty, but they will evolve over the course of subsequent commit(s).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Makefile       | 1 +
 pseudo-merge.c | 2 ++
 pseudo-merge.h | 6 ++++++
 3 files changed, 9 insertions(+)
 create mode 100644 pseudo-merge.c
 create mode 100644 pseudo-merge.h

diff --git a/Makefile b/Makefile
index 4e255c81f22..fd050bd9d68 100644
--- a/Makefile
+++ b/Makefile
@@ -1114,6 +1114,7 @@ LIB_OBJS += prompt.o
 LIB_OBJS += protocol.o
 LIB_OBJS += protocol-caps.o
 LIB_OBJS += prune-packed.o
+LIB_OBJS += pseudo-merge.o
 LIB_OBJS += quote.o
 LIB_OBJS += range-diff.o
 LIB_OBJS += reachable.o
diff --git a/pseudo-merge.c b/pseudo-merge.c
new file mode 100644
index 00000000000..37e037ba272
--- /dev/null
+++ b/pseudo-merge.c
@@ -0,0 +1,2 @@
+#include "git-compat-util.h"
+#include "pseudo-merge.h"
diff --git a/pseudo-merge.h b/pseudo-merge.h
new file mode 100644
index 00000000000..cab8ff6960a
--- /dev/null
+++ b/pseudo-merge.h
@@ -0,0 +1,6 @@
+#ifndef PSEUDO_MERGE_H
+#define PSEUDO_MERGE_H
+
+#include "git-compat-util.h"
+
+#endif
-- 
2.44.0.303.g1dc5e5b124c



* [PATCH 07/24] pack-bitmap-write: support storing pseudo-merge commits
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
                   ` (5 preceding siblings ...)
  2024-03-20 22:05 ` [PATCH 06/24] pseudo-merge.ch: initial commit Taylor Blau
@ 2024-03-20 22:05 ` Taylor Blau
  2024-03-20 22:05 ` [PATCH 08/24] pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()` Taylor Blau
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-03-20 22:05 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Prepare to write pseudo-merge bitmaps by annotating individual bitmapped
commits (which are represented by the `bitmapped_commit` structure) with
an extra bit indicating whether or not they are a pseudo-merge.

In subsequent commits, pseudo-merge bitmaps will be generated by
allocating a fake commit node with parents covering the full set of
commits represented by the pseudo-merge bitmap. These commits will be
added to the set of "selected" commits as usual, but will be written
specially instead of being included with the rest of the selected
commits.

Mechanically speaking, there are two parts of this change:

  - The bitmapped_commit struct gets a new bit indicating whether it is
    a pseudo-merge, or an ordinary commit selected for bitmaps.

  - A handful of changes to only write out the non-pseudo-merge commits
    when enumerating through the selected array (see the new
    `bitmap_writer_selected_nr()` function). Pseudo-merge commits appear
    after all non-pseudo-merge commits, so it is safe to enumerate
    through the selected array like so:

        for (i = 0; i < bitmap_writer_selected_nr(); i++)
          if (writer.selected[i].pseudo_merge)
            BUG("unexpected pseudo-merge");

    without encountering the BUG().
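
The ordering invariant above can be sketched on its own as follows. This
is a simplified illustration, not the writer itself: the
`writer_sketch` structure, its fixed capacity, and the choice to bump
the pseudo-merge counter inside `push_entry()` are assumptions made for
this example.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CAPACITY 8

struct entry {
	const char *name;
	unsigned pseudo_merge : 1;
};

/* Hypothetical, trimmed-down analogue of the bitmap writer state. */
struct writer_sketch {
	struct entry selected[SKETCH_CAPACITY];
	size_t selected_nr;
	size_t pseudo_merges_nr;
};

void push_entry(struct writer_sketch *w, const char *name,
		unsigned pseudo_merge)
{
	assert(w->selected_nr < SKETCH_CAPACITY);
	/* Ordinary entries must all be pushed before any pseudo-merge. */
	assert(pseudo_merge || !w->pseudo_merges_nr);

	w->selected[w->selected_nr].name = name;
	w->selected[w->selected_nr].pseudo_merge = !!pseudo_merge;
	w->selected_nr++;
	if (pseudo_merge)
		w->pseudo_merges_nr++;
}

/* Number of ordinary (non-pseudo-merge) entries at the front of the array. */
size_t ordinary_nr(const struct writer_sketch *w)
{
	return w->selected_nr - w->pseudo_merges_nr;
}

/* Returns 1 if the first ordinary_nr() entries contain no pseudo-merge. */
int prefix_is_ordinary(const struct writer_sketch *w)
{
	size_t i;

	for (i = 0; i < ordinary_nr(w); i++)
		if (w->selected[i].pseudo_merge)
			return 0;
	return 1;
}
```

Because pseudo-merges only ever land at the tail, subtracting their
count from the total is enough to bound a loop over ordinary entries,
with no per-entry filtering needed.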

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c | 100 +++++++++++++++++++++++++++++---------------
 pack-bitmap.h       |   1 +
 2 files changed, 67 insertions(+), 34 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index ad768959633..b1e8a0ad66d 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -24,7 +24,7 @@ struct bitmapped_commit {
 	struct ewah_bitmap *write_as;
 	int flags;
 	int xor_offset;
-	uint32_t commit_pos;
+	unsigned pseudo_merge : 1;
 };
 
 struct bitmap_writer {
@@ -39,6 +39,8 @@ struct bitmap_writer {
 	struct bitmapped_commit *selected;
 	unsigned int selected_nr, selected_alloc;
 
+	uint32_t pseudo_merges_nr;
+
 	struct progress *progress;
 	int show_progress;
 	unsigned char pack_checksum[GIT_MAX_RAWSZ];
@@ -46,6 +48,11 @@ struct bitmap_writer {
 
 static struct bitmap_writer writer;
 
+static inline int bitmap_writer_selected_nr(void)
+{
+	return writer.selected_nr - writer.pseudo_merges_nr;
+}
+
 void bitmap_writer_init(struct repository *r)
 {
 	writer.bitmaps = kh_init_oid_map();
@@ -120,25 +127,30 @@ void bitmap_writer_build_type_index(struct packing_data *to_pack,
  * Compute the actual bitmaps
  */
 
-static inline void push_bitmapped_commit(struct commit *commit)
+static void bitmap_writer_push_bitmapped_commit(struct commit *commit,
+						unsigned pseudo_merge)
 {
-	int hash_ret;
-	khiter_t hash_pos;
-
 	if (writer.selected_nr >= writer.selected_alloc) {
 		writer.selected_alloc = (writer.selected_alloc + 32) * 2;
 		REALLOC_ARRAY(writer.selected, writer.selected_alloc);
 	}
 
-	hash_pos = kh_put_oid_map(writer.bitmaps, commit->object.oid, &hash_ret);
-	if (!hash_ret)
-		die(_("duplicate entry when writing bitmap index: %s"),
-		    oid_to_hex(&commit->object.oid));
-	kh_value(writer.bitmaps, hash_pos) = NULL;
+	if (!pseudo_merge) {
+		int hash_ret;
+		khiter_t hash_pos = kh_put_oid_map(writer.bitmaps,
+						   commit->object.oid,
+						   &hash_ret);
+
+		if (!hash_ret)
+			die(_("duplicate entry when writing bitmap index: %s"),
+			    oid_to_hex(&commit->object.oid));
+		kh_value(writer.bitmaps, hash_pos) = NULL;
+	}
 
 	writer.selected[writer.selected_nr].commit = commit;
 	writer.selected[writer.selected_nr].bitmap = NULL;
 	writer.selected[writer.selected_nr].flags = 0;
+	writer.selected[writer.selected_nr].pseudo_merge = pseudo_merge;
 
 	writer.selected_nr++;
 }
@@ -168,16 +180,20 @@ static void compute_xor_offsets(void)
 
 	while (next < writer.selected_nr) {
 		struct bitmapped_commit *stored = &writer.selected[next];
-
 		int best_offset = 0;
 		struct ewah_bitmap *best_bitmap = stored->bitmap;
 		struct ewah_bitmap *test_xor;
 
+		if (stored->pseudo_merge)
+			goto next;
+
 		for (i = 1; i <= MAX_XOR_OFFSET_SEARCH; ++i) {
 			int curr = next - i;
 
 			if (curr < 0)
 				break;
+			if (writer.selected[curr].pseudo_merge)
+				continue;
 
 			test_xor = ewah_pool_new();
 			ewah_xor(writer.selected[curr].bitmap, stored->bitmap, test_xor);
@@ -193,6 +209,7 @@ static void compute_xor_offsets(void)
 			}
 		}
 
+next:
 		stored->xor_offset = best_offset;
 		stored->write_as = best_bitmap;
 
@@ -205,7 +222,8 @@ struct bb_commit {
 	struct bitmap *commit_mask;
 	struct bitmap *bitmap;
 	unsigned selected:1,
-		 maximal:1;
+		 maximal:1,
+		 pseudo_merge:1;
 	unsigned idx; /* within selected array */
 };
 
@@ -243,17 +261,18 @@ static void bitmap_builder_init(struct bitmap_builder *bb,
 	revs.first_parent_only = 1;
 
 	for (i = 0; i < writer->selected_nr; i++) {
-		struct commit *c = writer->selected[i].commit;
-		struct bb_commit *ent = bb_data_at(&bb->data, c);
+		struct bitmapped_commit *bc = &writer->selected[i];
+		struct bb_commit *ent = bb_data_at(&bb->data, bc->commit);
 
 		ent->selected = 1;
 		ent->maximal = 1;
+		ent->pseudo_merge = bc->pseudo_merge;
 		ent->idx = i;
 
 		ent->commit_mask = bitmap_new();
 		bitmap_set(ent->commit_mask, i);
 
-		add_pending_object(&revs, &c->object, "");
+		add_pending_object(&revs, &bc->commit->object, "");
 	}
 
 	if (prepare_revision_walk(&revs))
@@ -430,8 +449,13 @@ static int fill_bitmap_commit(struct bb_commit *ent,
 		struct commit *c = prio_queue_get(queue);
 
 		if (old_bitmap && mapping) {
-			struct ewah_bitmap *old = bitmap_for_commit(old_bitmap, c);
+			struct ewah_bitmap *old;
 			struct bitmap *remapped = bitmap_new();
+
+			if (commit->object.flags & BITMAP_PSEUDO_MERGE)
+				old = NULL;
+			else
+				old = bitmap_for_commit(old_bitmap, c);
 			/*
 			 * If this commit has an old bitmap, then translate that
 			 * bitmap and add its bits to this one. No need to walk
@@ -450,12 +474,14 @@ static int fill_bitmap_commit(struct bb_commit *ent,
 		 * Mark ourselves and queue our tree. The commit
 		 * walk ensures we cover all parents.
 		 */
-		pos = find_object_pos(&c->object.oid, &found);
-		if (!found)
-			return -1;
-		bitmap_set(ent->bitmap, pos);
-		prio_queue_put(tree_queue,
-			       repo_get_commit_tree(the_repository, c));
+		if (!(c->object.flags & BITMAP_PSEUDO_MERGE)) {
+			pos = find_object_pos(&c->object.oid, &found);
+			if (!found)
+				return -1;
+			bitmap_set(ent->bitmap, pos);
+			prio_queue_put(tree_queue,
+				       repo_get_commit_tree(the_repository, c));
+		}
 
 		for (p = c->parents; p; p = p->next) {
 			pos = find_object_pos(&p->item->object.oid, &found);
@@ -483,6 +509,9 @@ static void store_selected(struct bb_commit *ent, struct commit *commit)
 
 	stored->bitmap = bitmap_to_ewah(ent->bitmap);
 
+	if (ent->pseudo_merge)
+		return;
+
 	hash_pos = kh_get_oid_map(writer.bitmaps, commit->object.oid);
 	if (hash_pos == kh_end(writer.bitmaps))
 		die(_("attempted to store non-selected commit: '%s'"),
@@ -612,7 +641,7 @@ void bitmap_writer_select_commits(struct commit **indexed_commits,
 
 	if (indexed_commits_nr < 100) {
 		for (i = 0; i < indexed_commits_nr; ++i)
-			push_bitmapped_commit(indexed_commits[i]);
+			bitmap_writer_push_bitmapped_commit(indexed_commits[i], 0);
 		return;
 	}
 
@@ -645,7 +674,7 @@ void bitmap_writer_select_commits(struct commit **indexed_commits,
 			}
 		}
 
-		push_bitmapped_commit(chosen);
+		bitmap_writer_push_bitmapped_commit(chosen, 0);
 
 		i += next + 1;
 		display_progress(writer.progress, i);
@@ -683,8 +712,11 @@ static void write_selected_commits_v1(struct hashfile *f,
 {
 	int i;
 
-	for (i = 0; i < writer.selected_nr; ++i) {
+	for (i = 0; i < bitmap_writer_selected_nr(); ++i) {
 		struct bitmapped_commit *stored = &writer.selected[i];
+		if (stored->pseudo_merge)
+			BUG("unexpected pseudo-merge among selected: %s",
+			    oid_to_hex(&stored->commit->object.oid));
 
 		if (offsets)
 			offsets[i] = hashfile_total(f);
@@ -718,10 +750,10 @@ static void write_lookup_table(struct hashfile *f,
 	uint32_t i;
 	uint32_t *table, *table_inv;
 
-	ALLOC_ARRAY(table, writer.selected_nr);
-	ALLOC_ARRAY(table_inv, writer.selected_nr);
+	ALLOC_ARRAY(table, bitmap_writer_selected_nr());
+	ALLOC_ARRAY(table_inv, bitmap_writer_selected_nr());
 
-	for (i = 0; i < writer.selected_nr; i++)
+	for (i = 0; i < bitmap_writer_selected_nr(); i++)
 		table[i] = i;
 
 	/*
@@ -729,16 +761,16 @@ static void write_lookup_table(struct hashfile *f,
 	 * bitmap corresponds to j'th bitmapped commit (among the selected
 	 * commits) in lex order of OIDs.
 	 */
-	QSORT_S(table, writer.selected_nr, table_cmp, commit_positions);
+	QSORT_S(table, bitmap_writer_selected_nr(), table_cmp, commit_positions);
 
 	/* table_inv helps us discover that relationship (i'th bitmap
 	 * to j'th commit by j = table_inv[i])
 	 */
-	for (i = 0; i < writer.selected_nr; i++)
+	for (i = 0; i < bitmap_writer_selected_nr(); i++)
 		table_inv[table[i]] = i;
 
 	trace2_region_enter("pack-bitmap-write", "writing_lookup_table", the_repository);
-	for (i = 0; i < writer.selected_nr; i++) {
+	for (i = 0; i < bitmap_writer_selected_nr(); i++) {
 		struct bitmapped_commit *selected = &writer.selected[table[i]];
 		uint32_t xor_offset = selected->xor_offset;
 		uint32_t xor_row;
@@ -809,7 +841,7 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 	memcpy(header.magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE));
 	header.version = htons(default_version);
 	header.options = htons(flags | options);
-	header.entry_count = htonl(writer.selected_nr);
+	header.entry_count = htonl(bitmap_writer_selected_nr());
 	hashcpy(header.checksum, writer.pack_checksum);
 
 	hashwrite(f, &header, sizeof(header) - GIT_MAX_RAWSZ + the_hash_algo->rawsz);
@@ -821,9 +853,9 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 	if (options & BITMAP_OPT_LOOKUP_TABLE)
 		CALLOC_ARRAY(offsets, index_nr);
 
-	ALLOC_ARRAY(commit_positions, writer.selected_nr);
+	ALLOC_ARRAY(commit_positions, bitmap_writer_selected_nr());
 
-	for (i = 0; i < writer.selected_nr; i++) {
+	for (i = 0; i < bitmap_writer_selected_nr(); i++) {
 		struct bitmapped_commit *stored = &writer.selected[i];
 		int commit_pos = oid_pos(&stored->commit->object.oid, index, index_nr, oid_access);
 
diff --git a/pack-bitmap.h b/pack-bitmap.h
index dae2d68a338..ca9acd2f735 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -21,6 +21,7 @@ struct bitmap_disk_header {
 	unsigned char checksum[GIT_MAX_RAWSZ];
 };
 
+#define BITMAP_PSEUDO_MERGE (1u<<21)
 #define NEEDS_BITMAP (1u<<22)
 
 /*
-- 
2.44.0.303.g1dc5e5b124c



* [PATCH 08/24] pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()`
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
                   ` (6 preceding siblings ...)
  2024-03-20 22:05 ` [PATCH 07/24] pack-bitmap-write: support storing pseudo-merge commits Taylor Blau
@ 2024-03-20 22:05 ` Taylor Blau
  2024-03-20 22:05 ` [PATCH 09/24] pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public Taylor Blau
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-03-20 22:05 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Prepare to implement pseudo-merge bitmap selection by implementing a
necessary new function, `bitmap_writer_has_bitmapped_object_id()`.

This function returns whether or not the bitmap_writer selected the
given object ID for bitmapping. This will allow the pseudo-merge
machinery to reject candidates for pseudo-merges if they have already
been selected as an ordinary bitmap tip.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c | 5 +++++
 pack-bitmap.h       | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index b1e8a0ad66d..cd528f89a76 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -123,6 +123,11 @@ void bitmap_writer_build_type_index(struct packing_data *to_pack,
 	}
 }
 
+int bitmap_writer_has_bitmapped_object_id(const struct object_id *oid)
+{
+	return kh_get_oid_map(writer.bitmaps, *oid) != kh_end(writer.bitmaps);
+}
+
 /**
  * Compute the actual bitmaps
  */
diff --git a/pack-bitmap.h b/pack-bitmap.h
index ca9acd2f735..995d664cc89 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -98,6 +98,8 @@ int bitmap_has_oid_in_uninteresting(struct bitmap_index *, const struct object_i
 
 off_t get_disk_usage_from_bitmap(struct bitmap_index *, struct rev_info *);
 
+int bitmap_writer_has_bitmapped_object_id(const struct object_id *oid);
+
 void bitmap_writer_init(struct repository *r);
 void bitmap_writer_show_progress(int show);
 void bitmap_writer_set_checksum(const unsigned char *sha1);
-- 
2.44.0.303.g1dc5e5b124c



* [PATCH 09/24] pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
                   ` (7 preceding siblings ...)
  2024-03-20 22:05 ` [PATCH 08/24] pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()` Taylor Blau
@ 2024-03-20 22:05 ` Taylor Blau
  2024-03-20 22:05 ` [PATCH 10/24] pseudo-merge: implement support for selecting pseudo-merge commits Taylor Blau
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-03-20 22:05 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The pseudo-merge selection code will be added in a subsequent commit,
and will need a way to push the allocated commit structures into the
bitmap writer from a separate compilation unit.

Make the `bitmap_writer_push_bitmapped_commit()` function part of the
pack-bitmap.h header in order to make this possible.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c | 4 ++--
 pack-bitmap.h       | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index cd528f89a76..e46978d494c 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -132,8 +132,8 @@ int bitmap_writer_has_bitmapped_object_id(const struct object_id *oid)
  * Compute the actual bitmaps
  */
 
-static void bitmap_writer_push_bitmapped_commit(struct commit *commit,
-						unsigned pseudo_merge)
+void bitmap_writer_push_bitmapped_commit(struct commit *commit,
+					 unsigned pseudo_merge)
 {
 	if (writer.selected_nr >= writer.selected_alloc) {
 		writer.selected_alloc = (writer.selected_alloc + 32) * 2;
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 995d664cc89..0f539d79cfd 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -99,6 +99,8 @@ int bitmap_has_oid_in_uninteresting(struct bitmap_index *, const struct object_i
 off_t get_disk_usage_from_bitmap(struct bitmap_index *, struct rev_info *);
 
 int bitmap_writer_has_bitmapped_object_id(const struct object_id *oid);
+void bitmap_writer_push_bitmapped_commit(struct commit *commit,
+					 unsigned pseudo_merge);
 
 void bitmap_writer_init(struct repository *r);
 void bitmap_writer_show_progress(int show);
-- 
2.44.0.303.g1dc5e5b124c



* [PATCH 10/24] pseudo-merge: implement support for selecting pseudo-merge commits
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
                   ` (8 preceding siblings ...)
  2024-03-20 22:05 ` [PATCH 09/24] pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public Taylor Blau
@ 2024-03-20 22:05 ` Taylor Blau
  2024-03-20 22:05 ` [PATCH 11/24] pack-bitmap-write.c: select " Taylor Blau
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-03-20 22:05 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Teach the new pseudo-merge machinery how to select non-bitmapped commits
for inclusion in different pseudo-merge group(s) based on a handful of
criteria.

Pseudo-merges are derived first from named pseudo-merge groups (see the
`bitmapPseudoMerge.<name>.*` configuration options). They are
(optionally) further segmented within an individual pseudo-merge group
based on any capture group(s) within the pseudo-merge group's pattern.

For example, a configuration like so:

    [bitmapPseudoMerge "all"]
        pattern = "refs/"
        threshold = now
        stableThreshold = never
        sampleRate = 100
        maxMerges = 64

would group all non-bitmapped commits into up to 64 individual
pseudo-merge commits.

If you wanted to separate tags from branches when generating
pseudo-merge commits, and further segment them by which fork they
originate from (using the same "refs/virtual/" scheme as in the delta
islands documentation), you would instead write something like:

    [bitmapPseudoMerge "all"]
        pattern = "refs/virtual/([0-9]+)/(heads|tags)/"
        threshold = now
        stableThreshold = never
        sampleRate = 100
        maxMerges = 64

This would generate pseudo-merge group identifiers like "1234-heads"
and "5678-tags" (for branches in fork "1234" and tags in fork "5678",
respectively).
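
To make the capture-group segmentation concrete, here is a rough sketch
using POSIX extended regular expressions. It is an illustration under
assumptions rather than the code in this patch: the helper name
`group_key_for_ref()`, the fixed limit of three capture groups, and the
caller-supplied output buffer are all hypothetical. It expects the
pattern to be anchored with '^' already.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NMATCH 4 /* whole match plus up to three capture groups */

/*
 * Hypothetical helper: match `refname` against `pattern` and join the
 * text of each capture group with '-' into `out`. Returns 0 on a
 * match, -1 if there is no match, the pattern does not compile, or
 * the result does not fit in `out`.
 */
int group_key_for_ref(const char *pattern, const char *refname,
		      char *out, size_t out_len)
{
	regex_t re;
	regmatch_t m[SKETCH_NMATCH];
	size_t used = 0, i;
	int ret = -1;

	if (!out_len)
		return -1;
	out[0] = '\0';

	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;

	if (!regexec(&re, refname, SKETCH_NMATCH, m, 0)) {
		ret = 0;
		for (i = 1; i < SKETCH_NMATCH; i++) {
			size_t len;

			if (m[i].rm_so < 0)
				continue; /* group unused or did not participate */
			len = (size_t)(m[i].rm_eo - m[i].rm_so);
			/* Room for an optional '-', the text, and the NUL. */
			if (used + len + 2 > out_len) {
				ret = -1;
				break;
			}
			if (used)
				out[used++] = '-';
			memcpy(out + used, refname + m[i].rm_so, len);
			used += len;
			out[used] = '\0';
		}
	}

	regfree(&re);
	return ret;
}
```

With the pattern "^refs/virtual/([0-9]+)/(heads|tags)/", the reference
"refs/virtual/1234/heads/main" yields the key "1234-heads", while a
reference outside "refs/virtual/" does not match at all.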

Within pseudo-merge groups, there are a handful of other options used to
control the distribution of matching commits among individual
pseudo-merge commits:

  - bitmapPseudoMerge.<name>.decay
  - bitmapPseudoMerge.<name>.sampleRate
  - bitmapPseudoMerge.<name>.threshold
  - bitmapPseudoMerge.<name>.maxMerges
  - bitmapPseudoMerge.<name>.stableThreshold
  - bitmapPseudoMerge.<name>.stableSize

The decay parameter roughly corresponds to "k" in `f(n) = C*n^(-k/100)`,
where `f(n)` describes the size of the `n`-th pseudo-merge group. The
sample rate controls what percentage of eligible commits are considered
as candidates. The threshold parameter indicates the minimum age (so as
to avoid including too-recent commits in a pseudo-merge group, which
would make it less likely to remain valid). The "maxMerges" parameter
sets an upper bound on the number of pseudo-merge commits an individual
group may contain.
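
As a worked illustration of the decay function, the sketch below
computes the rounded sizes for a small case. It is not the
implementation from this patch: it assumes the plain form
`f(n) = C * n^-k` with a non-negative integer `k`, and the names
`int_pow()` and `sketch_group_size()` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* base^exp for a non-negative integer exponent, by repeated multiplication. */
double int_pow(double base, unsigned exp)
{
	double result = 1.0;

	while (exp--)
		result *= base;
	return result;
}

/*
 * Size of the i-th (zero-based) pseudo-merge out of `max_merges`, given
 * `total` candidate commits and integer decay rate `decay`:
 *
 *   C    = total / sum_{n=1}^{max_merges} n^-decay
 *   size = round(C * (i + 1)^-decay)
 */
uint32_t sketch_group_size(uint32_t total, uint32_t max_merges,
			   unsigned decay, uint32_t i)
{
	double sum = 0.0;
	uint32_t n;

	assert(max_merges > 0 && i < max_merges);

	for (n = 1; n <= max_merges; n++)
		sum += 1.0 / int_pow(n, decay);

	/* Adding 0.5 before truncating rounds to the nearest integer. */
	return (uint32_t)((total / sum) / int_pow(i + 1, decay) + 0.5);
}
```

For 100 commits, 4 pseudo-merges, and a decay of 1, the normalizing sum
is 1 + 1/2 + 1/3 + 1/4 = 25/12, so C is 48 and the sizes come out as
48, 24, 16, and 12, which add up to the original 100.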

The latter two "stable"-related parameters control "stable" pseudo-merge
groups, comprised of a fixed number of commits which are older than the
configured "stable threshold" value and may be grouped together in
chunks of "stableSize" in order of age.

This patch implements the aforementioned selection routine, as well as
parsing the relevant configuration options.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pseudo-merge.c | 441 +++++++++++++++++++++++++++++++++++++++++++++++++
 pseudo-merge.h |  96 +++++++++++
 2 files changed, 537 insertions(+)

diff --git a/pseudo-merge.c b/pseudo-merge.c
index 37e037ba272..caccef942a1 100644
--- a/pseudo-merge.c
+++ b/pseudo-merge.c
@@ -1,2 +1,443 @@
 #include "git-compat-util.h"
 #include "pseudo-merge.h"
+#include "date.h"
+#include "oid-array.h"
+#include "strbuf.h"
+#include "config.h"
+#include "string-list.h"
+#include "refs.h"
+#include "pack-bitmap.h"
+#include "commit.h"
+#include "alloc.h"
+#include "progress.h"
+
+#define DEFAULT_PSEUDO_MERGE_DECAY 1.0f
+#define DEFAULT_PSEUDO_MERGE_MAX_MERGES 64
+#define DEFAULT_PSEUDO_MERGE_SAMPLE_RATE 100
+#define DEFAULT_PSEUDO_MERGE_THRESHOLD approxidate("1.week.ago")
+#define DEFAULT_PSEUDO_MERGE_STABLE_THRESHOLD approxidate("1.month.ago")
+#define DEFAULT_PSEUDO_MERGE_STABLE_SIZE 512
+
+static float gitexp(float base, int exp)
+{
+	float result = 1;
+	while (1) {
+		if (exp % 2)
+			result *= base;
+		exp >>= 1;
+		if (!exp)
+			break;
+		base *= base;
+	}
+	return result;
+}
+
+static uint32_t pseudo_merge_group_size(const struct pseudo_merge_group *group,
+					const struct pseudo_merge_matches *matches,
+					uint32_t i)
+{
+	float C = 0.0f;
+	uint32_t n;
+
+	/*
+	 * The size of pseudo-merge groups decays according to a power series,
+	 * which looks like:
+	 *
+	 *   f(n) = C * n^-k
+	 *
+	 * , where 'n' is the n-th pseudo-merge group, 'f(n)' is its size, 'k'
+	 * is the decay rate, and 'C' is a scaling value.
+	 *
+	 * The value of C depends on the number of groups, decay rate, and total
+	 * number of commits. It is computed such that if there are M and N
+	 * total groups and commits, respectively, that:
+	 *
+	 *   N = f(1) + f(2) + ... + f(M)
+	 *
+	 * Rearranging to isolate C, we get:
+	 *
+	 *   N = \sum_{n=1}^M C / n^k
+	 *
+	 *   N / C = \sum_{n=1}^M n^-k
+	 *
+	 *   C = N / \sum_{n=1}^M n^-k
+	 *
+	 * For example, if we have a decay rate of 'k' being equal to 1.5, 'N'
+	 * total commits equal to 10,000, and 'M' being equal to 6 groups, then
+	 * the (rounded) group sizes are:
+	 *
+	 *   { 5469, 1934, 1053, 684, 489, 372 }
+	 *
+	 * increasing the number of total groups, say to 10, scales the group
+	 * sizes appropriately:
+	 *
+	 *   { 5012, 1772, 964, 626, 448, 341, 271, 221, 186, 158 }
+	 */
+	for (n = 0; n < group->max_merges; n++)
+		C += 1.0f / gitexp(n + 1, group->decay);
+	C = matches->unstable_nr / C;
+
+	return (int)((C / gitexp(i + 1, group->decay)) + 0.5);
+}
+
+static void init_pseudo_merge_group(struct pseudo_merge_group *group)
+{
+	memset(group, 0, sizeof(struct pseudo_merge_group));
+
+	strmap_init_with_options(&group->matches, NULL, 0);
+
+	group->decay = DEFAULT_PSEUDO_MERGE_DECAY;
+	group->max_merges = DEFAULT_PSEUDO_MERGE_MAX_MERGES;
+	group->sample_rate = DEFAULT_PSEUDO_MERGE_SAMPLE_RATE;
+	group->threshold = DEFAULT_PSEUDO_MERGE_THRESHOLD;
+	group->stable_threshold = DEFAULT_PSEUDO_MERGE_STABLE_THRESHOLD;
+	group->stable_size = DEFAULT_PSEUDO_MERGE_STABLE_SIZE;
+}
+
+static int pseudo_merge_config(const char *var, const char *value,
+			       const struct config_context *ctx,
+			       void *cb_data)
+{
+	struct string_list *list = cb_data;
+	struct string_list_item *item;
+	struct pseudo_merge_group *group;
+	struct strbuf buf = STRBUF_INIT;
+	const char *sub, *key;
+	size_t sub_len;
+
+	if (parse_config_key(var, "bitmappseudomerge", &sub, &sub_len, &key))
+		return 0;
+
+	if (!sub_len)
+		return 0;
+
+	strbuf_add(&buf, sub, sub_len);
+
+	item = string_list_lookup(list, buf.buf);
+	if (!item) {
+		item = string_list_insert(list, buf.buf);
+
+		item->util = xmalloc(sizeof(struct pseudo_merge_group));
+		init_pseudo_merge_group(item->util);
+	}
+
+	group = item->util;
+
+	if (!strcmp(key, "pattern")) {
+		struct strbuf re = STRBUF_INIT;
+
+		free(group->pattern);
+		if (*value != '^')
+			strbuf_addch(&re, '^');
+		strbuf_addstr(&re, value);
+
+		group->pattern = xcalloc(1, sizeof(regex_t));
+		if (regcomp(group->pattern, re.buf, REG_EXTENDED))
+			die(_("failed to load pseudo-merge regex for %s: '%s'"),
+			    sub, re.buf);
+
+		strbuf_release(&re);
+	} else if (!strcmp(key, "decay")) {
+		group->decay = git_config_int(var, value, ctx->kvi);
+		if (group->decay < 0) {
+			warning(_("%s must be non-negative, using default"), var);
+			group->decay = DEFAULT_PSEUDO_MERGE_DECAY;
+		}
+	} else if (!strcmp(key, "samplerate")) {
+		group->sample_rate = git_config_int(var, value, ctx->kvi);
+		if (!(0 <= group->sample_rate && group->sample_rate <= 100)) {
+			warning(_("%s must be between 0 and 100, using default"), var);
+			group->sample_rate = DEFAULT_PSEUDO_MERGE_SAMPLE_RATE;
+		}
+	} else if (!strcmp(key, "threshold")) {
+		if (git_config_expiry_date(&group->threshold, var, value)) {
+			strbuf_release(&buf);
+			return -1;
+		}
+	} else if (!strcmp(key, "maxmerges")) {
+		group->max_merges = git_config_int(var, value, ctx->kvi);
+		if (group->max_merges < 0) {
+			warning(_("%s must be non-negative, using default"), var);
+			group->max_merges = DEFAULT_PSEUDO_MERGE_MAX_MERGES;
+		}
+	} else if (!strcmp(key, "stablethreshold")) {
+		if (git_config_expiry_date(&group->stable_threshold, var, value)) {
+			strbuf_release(&buf);
+			return -1;
+		}
+	} else if (!strcmp(key, "stablesize")) {
+		group->stable_size = git_config_int(var, value, ctx->kvi);
+		if (group->stable_size <= 0) {
+			warning(_("%s must be positive, using default"), var);
+			group->stable_size = DEFAULT_PSEUDO_MERGE_STABLE_SIZE;
+		}
+	}
+
+	strbuf_release(&buf);
+
+	return 0;
+}
+
+void load_pseudo_merges_from_config(struct string_list *list)
+{
+	struct string_list_item *item;
+
+	git_config(pseudo_merge_config, list);
+
+	for_each_string_list_item(item, list) {
+		struct pseudo_merge_group *group = item->util;
+		if (!group->pattern)
+			die(_("pseudo-merge group '%s' missing required pattern"),
+			    item->string);
+		if (group->threshold < group->stable_threshold)
+			die(_("pseudo-merge group '%s' has unstable threshold "
+			      "before stable one"), item->string);
+	}
+}
+
+static int find_pseudo_merge_group_for_ref(const char *refname,
+					   const struct object_id *oid,
+					   int flags UNUSED,
+					   void *_data)
+{
+	struct string_list *list = _data;
+	struct object_id peeled;
+	struct commit *c;
+	uint32_t i;
+	int has_bitmap;
+
+	if (!peel_iterated_oid(oid, &peeled))
+		oid = &peeled;
+
+	c = lookup_commit(the_repository, oid);
+	if (!c)
+		return 0;
+
+	has_bitmap = bitmap_writer_has_bitmapped_object_id(oid);
+
+	for (i = 0; i < list->nr; i++) {
+		struct pseudo_merge_group *group;
+		struct pseudo_merge_matches *matches;
+		struct strbuf group_name = STRBUF_INIT;
+		regmatch_t captures[16];
+		size_t j;
+
+		group = list->items[i].util;
+		if (regexec(group->pattern, refname, ARRAY_SIZE(captures),
+			    captures, 0))
+			continue;
+
+		if (captures[ARRAY_SIZE(captures) - 1].rm_so != -1)
+			warning(_("pseudo-merge regex from config has too many capture "
+				  "groups (max=%"PRIuMAX")"),
+				(uintmax_t)ARRAY_SIZE(captures) - 2);
+
+		for (j = !!group->pattern->re_nsub; j < ARRAY_SIZE(captures); j++) {
+			regmatch_t *match = &captures[j];
+			if (match->rm_so == -1)
+				continue;
+
+			if (group_name.len)
+				strbuf_addch(&group_name, '-');
+
+			strbuf_add(&group_name, refname + match->rm_so,
+				   match->rm_eo - match->rm_so);
+		}
+
+		matches = strmap_get(&group->matches, group_name.buf);
+		if (!matches) {
+			matches = xcalloc(1, sizeof(*matches));
+			strmap_put(&group->matches, strbuf_detach(&group_name, NULL),
+				   matches);
+		}
+
+		if (c->date <= group->stable_threshold) {
+			ALLOC_GROW(matches->stable, matches->stable_nr + 1,
+				   matches->stable_alloc);
+			matches->stable[matches->stable_nr++] = c;
+		} else if (c->date <= group->threshold && !has_bitmap) {
+			ALLOC_GROW(matches->unstable, matches->unstable_nr + 1,
+				   matches->unstable_alloc);
+			matches->unstable[matches->unstable_nr++] = c;
+		}
+
+		strbuf_release(&group_name);
+	}
+
+	return 0;
+}
+
+static struct commit *push_pseudo_merge(struct pseudo_merge_group *group)
+{
+	struct commit *merge;
+
+	ALLOC_GROW(group->merges, group->merges_nr + 1, group->merges_alloc);
+
+	merge = alloc_commit_node(the_repository);
+	merge->object.parsed = 1;
+	merge->object.flags |= BITMAP_PSEUDO_MERGE;
+
+	group->merges[group->merges_nr++] = merge;
+
+	return merge;
+}
+
+static struct pseudo_merge_commit_idx *pseudo_merge_idx(kh_oid_map_t *pseudo_merge_commits,
+							const struct object_id *oid)
+
+{
+	struct pseudo_merge_commit_idx *pmc;
+	khiter_t hash_pos;
+
+	hash_pos = kh_get_oid_map(pseudo_merge_commits, *oid);
+	if (hash_pos == kh_end(pseudo_merge_commits)) {
+		int hash_ret;
+		hash_pos = kh_put_oid_map(pseudo_merge_commits, *oid, &hash_ret);
+
+		CALLOC_ARRAY(pmc, 1);
+
+		kh_value(pseudo_merge_commits, hash_pos) = pmc;
+	} else {
+		pmc = kh_value(pseudo_merge_commits, hash_pos);
+	}
+
+	return pmc;
+}
+
+#define MIN_PSEUDO_MERGE_SIZE 8
+
+static void select_pseudo_merges_1(struct pseudo_merge_group *group,
+				   struct pseudo_merge_matches *matches,
+				   kh_oid_map_t *pseudo_merge_commits,
+				   uint32_t *pseudo_merges_nr)
+{
+	uint32_t i, j;
+	uint32_t stable_merges_nr;
+
+	if (!matches->stable_nr && !matches->unstable_nr)
+		return; /* all tips in this group already have bitmaps */
+
+	stable_merges_nr = matches->stable_nr / group->stable_size;
+	if (matches->stable_nr % group->stable_size)
+		stable_merges_nr++;
+
+	/* make stable_merges_nr pseudo merges for stable commits */
+	for (i = 0, j = 0; i < stable_merges_nr; i++) {
+		struct commit *merge;
+		struct commit_list **p;
+
+		merge = push_pseudo_merge(group);
+		p = &merge->parents;
+
+		do {
+			struct commit *c;
+			struct pseudo_merge_commit_idx *pmc;
+
+			if (j >= matches->stable_nr)
+				break;
+
+			c = matches->stable[j++];
+			pmc = pseudo_merge_idx(pseudo_merge_commits,
+					       &c->object.oid);
+
+			ALLOC_GROW(pmc->pseudo_merge, pmc->nr + 1, pmc->alloc);
+
+			pmc->pseudo_merge[pmc->nr++] = *pseudo_merges_nr;
+			p = commit_list_append(c, p);
+		} while (j % group->stable_size);
+
+		bitmap_writer_push_bitmapped_commit(merge, 1);
+		(*pseudo_merges_nr)++;
+	}
+
+	/* make up to group->max_merges pseudo merges for unstable commits */
+	for (i = 0, j = 0; i < group->max_merges; i++) {
+		struct commit *merge;
+		struct commit_list **p;
+		uint32_t size, end;
+
+		merge = push_pseudo_merge(group);
+		p = &merge->parents;
+
+		size = pseudo_merge_group_size(group, matches, i);
+		end = size < MIN_PSEUDO_MERGE_SIZE ? matches->unstable_nr : j + size;
+
+		for (; j < end && j < matches->unstable_nr; j++) {
+			struct commit *c = matches->unstable[j];
+			struct pseudo_merge_commit_idx *pmc;
+
+			if (!group->sample_rate || j % (100 / group->sample_rate))
+				continue;
+
+			pmc = pseudo_merge_idx(pseudo_merge_commits,
+					       &c->object.oid);
+
+			ALLOC_GROW(pmc->pseudo_merge, pmc->nr + 1, pmc->alloc);
+
+			pmc->pseudo_merge[pmc->nr++] = *pseudo_merges_nr;
+			p = commit_list_append(c, p);
+		}
+
+		bitmap_writer_push_bitmapped_commit(merge, 1);
+		(*pseudo_merges_nr)++;
+		if (end >= matches->unstable_nr)
+			break;
+	}
+}
+
+static int commit_date_cmp(const void *va, const void *vb)
+{
+	timestamp_t a = (*(const struct commit **)va)->date;
+	timestamp_t b = (*(const struct commit **)vb)->date;
+
+	if (a < b)
+		return -1;
+	else if (a > b)
+		return 1;
+	return 0;
+}
+
+static void sort_pseudo_merge_matches(struct pseudo_merge_matches *matches)
+{
+	QSORT(matches->stable, matches->stable_nr, commit_date_cmp);
+	QSORT(matches->unstable, matches->unstable_nr, commit_date_cmp);
+}
+
+void select_pseudo_merges(struct string_list *list,
+			  struct commit **commits, size_t commits_nr,
+			  kh_oid_map_t *pseudo_merge_commits,
+			  uint32_t *pseudo_merges_nr,
+			  unsigned show_progress)
+{
+	struct progress *progress = NULL;
+	uint32_t i;
+
+	if (!list->nr)
+		return;
+
+	if (show_progress)
+		progress = start_progress("Selecting pseudo-merge commits", list->nr);
+
+	for_each_ref(find_pseudo_merge_group_for_ref, list);
+
+	for (i = 0; i < list->nr; i++) {
+		struct pseudo_merge_group *group;
+		struct hashmap_iter iter;
+		struct strmap_entry *e;
+
+		group = list->items[i].util;
+		strmap_for_each_entry(&group->matches, &iter, e) {
+			struct pseudo_merge_matches *matches = e->value;
+
+			sort_pseudo_merge_matches(matches);
+
+			select_pseudo_merges_1(group, matches,
+					       pseudo_merge_commits,
+					       pseudo_merges_nr);
+		}
+
+		display_progress(progress, i + 1);
+	}
+
+	stop_progress(&progress);
+}
diff --git a/pseudo-merge.h b/pseudo-merge.h
index cab8ff6960a..81888731864 100644
--- a/pseudo-merge.h
+++ b/pseudo-merge.h
@@ -2,5 +2,101 @@
 #define PSEUDO_MERGE_H
 
 #include "git-compat-util.h"
+#include "strmap.h"
+#include "khash.h"
+#include "ewah/ewok.h"
+
+struct commit;
+struct string_list;
+struct bitmap_index;
+
+/*
+ * A pseudo-merge group tracks the set of non-bitmapped reference tips
+ * that match the given pattern.
+ *
+ * Within those matches, they are further segmented by concatenating
+ * the text of consecutive capture groups, separated by '-' dash
+ * characters.
+ *
+ * Those groups are then ordered by committer date and partitioned
+ * into individual pseudo-merge(s) according to the decay, max_merges,
+ * sample_rate, and threshold parameters.
+ */
+struct pseudo_merge_group {
+	regex_t *pattern;
+
+	/* capture group(s) -> struct pseudo_merge_matches */
+	struct strmap matches;
+
+	/*
+	 * The individual pseudo-merge(s) that are generated from the
+	 * above array of matches, partitioned according to the below
+	 * parameters.
+	 */
+	struct commit **merges;
+	size_t merges_nr;
+	size_t merges_alloc;
+
+	/*
+	 * Pseudo-merge grouping parameters. See git-config(1) for
+	 * more information.
+	 */
+	float decay;
+	int max_merges;
+	int sample_rate;
+	int stable_size;
+	timestamp_t threshold;
+	timestamp_t stable_threshold;
+};
+
+struct pseudo_merge_matches {
+	struct commit **stable;
+	struct commit **unstable;
+	size_t stable_nr, stable_alloc;
+	size_t unstable_nr, unstable_alloc;
+};
+
+/*
+ * Read the repository's configuration:
+ *
+ *   - bitmapPseudoMerge.<name>.pattern
+ *   - bitmapPseudoMerge.<name>.decay
+ *   - bitmapPseudoMerge.<name>.sampleRate
+ *   - bitmapPseudoMerge.<name>.threshold
+ *   - bitmapPseudoMerge.<name>.maxMerges
+ *   - bitmapPseudoMerge.<name>.stableThreshold
+ *   - bitmapPseudoMerge.<name>.stableSize
+ *
+ * and populates the given `list` with pseudo-merge groups. String
+ * entry keys are the pseudo-merge group names, and the values are
+ * pointers to the pseudo_merge_group structure itself.
+ */
+void load_pseudo_merges_from_config(struct string_list *list);
+
+/*
+ * A pseudo-merge commit index (pseudo_merge_commit_idx) maps a
+ * particular (non-pseudo-merge) commit to the list of pseudo-merge(s)
+ * it appears in.
+ */
+struct pseudo_merge_commit_idx {
+	uint32_t *pseudo_merge;
+	size_t nr, alloc;
+};
+
+/*
+ * Selects pseudo-merges from a list of commits, populating the given
+ * string_list of pseudo-merge groups.
+ *
+ * Populates the pseudo_merge_commits map with a commit_idx
+ * corresponding to each commit in the list. Counts the total number
+ * of pseudo-merges generated.
+ *
+ * Optionally shows a progress meter.
+ */
+void select_pseudo_merges(struct string_list *list,
+			  struct commit **commits, size_t commits_nr,
+			  kh_oid_map_t *pseudo_merge_commits,
+			  uint32_t *pseudo_merges_nr,
+			  unsigned show_progress);
 
 #endif
-- 
2.44.0.303.g1dc5e5b124c



* [PATCH 11/24] pack-bitmap-write.c: select pseudo-merge commits
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
                   ` (9 preceding siblings ...)
  2024-03-20 22:05 ` [PATCH 10/24] pseudo-merge: implement support for selecting pseudo-merge commits Taylor Blau
@ 2024-03-20 22:05 ` Taylor Blau
  2024-03-20 22:05 ` [PATCH 12/24] pack-bitmap-write.c: write pseudo-merge table Taylor Blau
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-03-20 22:05 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Now that the pseudo-merge machinery has learned how to select
non-bitmapped commits and assign them into different pseudo-merge
group(s), invoke this new API from within the pack-bitmap internals and
store the results off.

Note that the selected pseudo-merge commits aren't actually used or
written anywhere yet. This will be done in the following commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
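Not part of the patch: a minimal, standalone sketch of the power-law
sizing that the `decay` documentation below describes, where the n-th
group gets f(n) = C * n^-k commits. It assumes a whole-number exponent
`k` so that it needs nothing beyond the C standard headers. The names
int_pow() and group_size() are hypothetical and do not appear in
pseudo-merge.c.

```c
#include <assert.h>

/* Hypothetical helper: base^exp for a whole-number exponent. */
static double int_pow(double base, unsigned exp)
{
	double r = 1.0;

	while (exp--)
		r *= base;
	return r;
}

/*
 * Size of the i-th (0-indexed) of `groups` pseudo-merges when `total`
 * commits are split according to f(n) = C * n^-k, with C chosen so
 * that the unrounded sizes sum to `total`.
 */
static int group_size(unsigned total, unsigned groups, unsigned k, unsigned i)
{
	double sum = 0.0, C;
	unsigned n;

	assert(groups > 0 && i < groups);

	for (n = 1; n <= groups; n++)
		sum += 1.0 / int_pow(n, k);
	C = total / sum;

	/* round to nearest, as the comment in the previous patch does */
	return (int)(C / int_pow(i + 1, k) + 0.5);
}
```

With total = 10000, groups = 6 and k = 1, this yields the sizes
{ 4082, 2041, 1361, 1020, 816, 680 }; with k = 0 every group has the
same (rounded) size of 1667.
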
 Documentation/config.txt                     |  2 +
 Documentation/config/bitmap-pseudo-merge.txt | 75 ++++++++++++++++++++
 Documentation/technical/bitmap-format.txt    | 26 +++++++
 pack-bitmap-write.c                          | 14 ++++
 4 files changed, 117 insertions(+)
 create mode 100644 Documentation/config/bitmap-pseudo-merge.txt

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 782c2bab906..e5a7170c9e0 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -381,6 +381,8 @@ include::config/apply.txt[]
 
 include::config/attr.txt[]
 
+include::config/bitmap-pseudo-merge.txt[]
+
 include::config/blame.txt[]
 
 include::config/branch.txt[]
diff --git a/Documentation/config/bitmap-pseudo-merge.txt b/Documentation/config/bitmap-pseudo-merge.txt
new file mode 100644
index 00000000000..90b72522046
--- /dev/null
+++ b/Documentation/config/bitmap-pseudo-merge.txt
@@ -0,0 +1,75 @@
+bitmapPseudoMerge.<name>.pattern::
+	Regular expression used to match reference names. Commits
+	pointed to by references matching this pattern (and meeting
+	the below criteria, like `bitmapPseudoMerge.<name>.sampleRate`
+	and `bitmapPseudoMerge.<name>.threshold`) will be considered
+	for inclusion in a pseudo-merge bitmap.
++
+Commits are grouped into pseudo-merge groups based on whether or not
+any reference(s) that point at a given commit match the pattern, which
+is an extended regular expression.
++
+Within a pseudo-merge group, commits may be further grouped into
+sub-groups based on the capture groups in the pattern. These
+sub-groupings are formed from the regular expressions by concatenating
+any capture groups from the regular expression, with a '-' dash in
+between.
++
+For example, if the pattern is `refs/tags/`, then all tags (provided
+they meet the below criteria) will be considered candidates for the
+same pseudo-merge group. However, if the pattern is instead
+`refs/remotes/([0-9]+)/tags/`, then tags from different remotes will
+be grouped into separate pseudo-merge groups, based on the remote
+number.
+
+bitmapPseudoMerge.<name>.decay::
+	Determines the rate at which consecutive pseudo-merge bitmap
+	groups decrease in size. Must be non-negative. This parameter
+	can be thought of as `k` in the function `f(n) = C *
+	n^(-k/100)`, where `f(n)` is the size of the `n`th group.
++
+Setting the decay rate equal to `0` will cause all groups to be the
+same size. Setting the decay rate equal to `100` will cause the `n`th
+group to be `1/n` the size of the initial group.  Higher values of the
+decay rate cause consecutive groups to shrink at an increasing rate.
+The default is `100`.
+
+bitmapPseudoMerge.<name>.sampleRate::
+	Determines the proportion of non-bitmapped commits (among
+	reference tips) which are selected for inclusion in an
+	unstable pseudo-merge bitmap. Must be between `0` and `100`
+	(inclusive). The default is `100`.
+
+bitmapPseudoMerge.<name>.threshold::
+	Determines the minimum age of non-bitmapped commits (among
+	reference tips, as above) which are candidates for inclusion
+	in an unstable pseudo-merge bitmap. The default is
+	`1.week.ago`.
+
+bitmapPseudoMerge.<name>.maxMerges::
+	Determines the maximum number of pseudo-merge commits among
+	which commits may be distributed.
++
+For pseudo-merge groups whose pattern does not contain any capture
+groups, this setting is applied for all commits matching the regular
+expression. For patterns that have one or more capture groups, this
+setting is applied for each distinct capture group.
++
+For example, if your pattern is `refs/tags/`, then this setting
+will distribute all tags into a maximum of `maxMerges` pseudo-merge
+commits. However, if your pattern is, say,
+`refs/remotes/([0-9]+)/tags/`, then this setting will be applied to
+each remote's set of tags individually.
++
+Must be non-negative. The default value is 64.
+
+bitmapPseudoMerge.<name>.stableThreshold::
+	Determines the minimum age of commits (among reference tips,
+	as above, however stable commits are still considered
+	candidates even when they have been covered by a bitmap) which
+	are candidates for a stable pseudo-merge bitmap. The default
+	is `1.month.ago`.
+
+bitmapPseudoMerge.<name>.stableSize::
+	Determines the size (in number of commits) of a stable
+	pseudo-merge bitmap. The default is `512`.
diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index 63a7177ac08..ed7edf98034 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -434,3 +434,29 @@ the end of a `.bitmap` file. The format is as follows:
 
 * An 8-byte unsigned value (in network byte-order) equal to the number
   of bytes in the pseudo-merge section (including this field).
+
+=== Pseudo-merge selection
+
+Pseudo-merge commits are selected among non-bitmapped commits at the
+tip of one or more reference(s). In addition, there are a handful of
+constraints to further refine this selection:
+
+`bitmapPseudoMerge.<name>.decay`:: Defines the "decay rate", which
+corresponds to how quickly (or not) consecutive pseudo-merges decrease
+in size relative to one another.
+
+`bitmapPseudoMerge.<name>.maxMerges`:: Defines the maximum number of
+pseudo-merge commits among which commits may be distributed.
+
+`bitmapPseudoMerge.<name>.sampleRate`:: Defines the percentage of commits
+(matching the above criteria) which are selected.
+
+`bitmapPseudoMerge.<name>.threshold`:: Defines the minimum age of a commit
+in order to be considered for inclusion within one or more pseudo-merge
+bitmaps.
+
+The size of consecutive pseudo-merge groups decays according to a
+power-law decay function, where the size of the `n`-th group is `f(n) =
+C*n^-k`. The value of `C` is chosen accordingly to match the number of
+desired groups, and `k` is 1/100th of the value of
+`bitmapPseudoMerge.<name>.decay`.
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index e46978d494c..db1c38f4e46 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -17,6 +17,7 @@
 #include "trace2.h"
 #include "tree.h"
 #include "tree-walk.h"
+#include "pseudo-merge.h"
 
 struct bitmapped_commit {
 	struct commit *commit;
@@ -39,6 +40,8 @@ struct bitmap_writer {
 	struct bitmapped_commit *selected;
 	unsigned int selected_nr, selected_alloc;
 
+	struct string_list pseudo_merge_groups;
+	kh_oid_map_t *pseudo_merge_commits; /* oid -> pseudo merge(s) */
 	uint32_t pseudo_merges_nr;
 
 	struct progress *progress;
@@ -56,6 +59,11 @@ static inline int bitmap_writer_selected_nr(void)
 void bitmap_writer_init(struct repository *r)
 {
 	writer.bitmaps = kh_init_oid_map();
+	writer.pseudo_merge_commits = kh_init_oid_map();
+
+	string_list_init_dup(&writer.pseudo_merge_groups);
+
+	load_pseudo_merges_from_config(&writer.pseudo_merge_groups);
 }
 
 void bitmap_writer_show_progress(int show)
@@ -686,6 +694,12 @@ void bitmap_writer_select_commits(struct commit **indexed_commits,
 	}
 
 	stop_progress(&writer.progress);
+
+	select_pseudo_merges(&writer.pseudo_merge_groups,
+			     indexed_commits, indexed_commits_nr,
+			     writer.pseudo_merge_commits,
+			     &writer.pseudo_merges_nr,
+			     writer.show_progress);
 }
 
 
-- 
2.44.0.303.g1dc5e5b124c



* [PATCH 12/24] pack-bitmap-write.c: write pseudo-merge table
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
                   ` (10 preceding siblings ...)
  2024-03-20 22:05 ` [PATCH 11/24] pack-bitmap-write.c: select " Taylor Blau
@ 2024-03-20 22:05 ` Taylor Blau
  2024-03-20 22:05 ` [PATCH 13/24] pack-bitmap: extract `read_bitmap()` function Taylor Blau
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-03-20 22:05 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Now that the pack-bitmap writer machinery understands how to select and
store pseudo-merge commits, teach it how to write the new optional
pseudo-merge .bitmap extension.

No readers yet exist for this new extension to the .bitmap format. The
following commits will take any preparatory step(s) necessary before
then implementing the routines necessary to read this new table.

In the meantime, the new `write_pseudo_merges()` function implements
writing this new format as described by a previous commit in
Documentation/technical/bitmap-format.txt.

Writing this table is fairly straightforward and consists of a few
sub-components:

  - a pair of bitmaps for each pseudo-merge (one for the pseudo-merge
    "parents", and another for the objects reachable from those parents)

  - for each commit, the offset of either (a) the pseudo-merge it
    belongs to, or (b) an extended lookup table if it belongs to >1
    pseudo-merge groups

  - if there are any commits belonging to >1 pseudo-merge group, the
    extended lookup tables (which each consist of the number of
    pseudo-merge groups a commit appears in, and then that many 8-byte
    unsigned offsets, one per pseudo-merge)

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
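Not part of the patch: a small sketch of the lookup-table value
described above, where the most-significant bit of the 8-byte field
distinguishes a direct pseudo-merge offset from an offset into the
extended table. The writer side mirrors what write_pseudo_merges()
does with `(uint64_t)1<<63`; the decoding helpers are an assumption
about how a reader could tell the two cases apart, since the reading
code only arrives in later patches. All function names here are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EXT_FLAG ((uint64_t)1 << 63)

/* Hypothetical: pack an offset plus an "extended" marker into one value. */
static uint64_t encode_lookup(uint64_t ofs, int extended)
{
	assert(!(ofs & EXT_FLAG)); /* offset must fit in the low 63 bits */
	return extended ? (ofs | EXT_FLAG) : ofs;
}

/* Hypothetical: does this value point into the extended table? */
static int lookup_is_extended(uint64_t v)
{
	return (v & EXT_FLAG) != 0;
}

/* Hypothetical: recover the offset regardless of the marker bit. */
static uint64_t lookup_offset(uint64_t v)
{
	return v & ~EXT_FLAG;
}
```

Reserving the top bit is why the writer dies with "too many
pseudo-merges" once an extended offset would itself need bit 63.
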
 pack-bitmap-write.c | 128 ++++++++++++++++++++++++++++++++++++++++++++
 pack-bitmap.h       |   1 +
 2 files changed, 129 insertions(+)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index db1c38f4e46..2d1b202fcd9 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -18,6 +18,7 @@
 #include "tree.h"
 #include "tree-walk.h"
 #include "pseudo-merge.h"
+#include "oid-array.h"
 
 struct bitmapped_commit {
 	struct commit *commit;
@@ -748,6 +749,127 @@ static void write_selected_commits_v1(struct hashfile *f,
 	}
 }
 
+static void write_pseudo_merges(struct hashfile *f)
+{
+	struct oid_array commits = OID_ARRAY_INIT;
+	struct bitmap **commits_bitmap = NULL;
+	off_t *pseudo_merge_ofs = NULL;
+	off_t start, table_start, next_ext;
+
+	uint32_t base = bitmap_writer_selected_nr();
+	size_t i, j = 0;
+
+	CALLOC_ARRAY(commits_bitmap, writer.pseudo_merges_nr);
+	CALLOC_ARRAY(pseudo_merge_ofs, writer.pseudo_merges_nr);
+
+	for (i = 0; i < writer.pseudo_merges_nr; i++) {
+		struct bitmapped_commit *merge = &writer.selected[base + i];
+		struct commit_list *p;
+
+		if (!merge->pseudo_merge)
+			BUG("found non-pseudo merge commit at %"PRIuMAX, (uintmax_t)i);
+
+		commits_bitmap[i] = bitmap_new();
+
+		for (p = merge->commit->parents; p; p = p->next)
+			bitmap_set(commits_bitmap[i],
+				   find_object_pos(&p->item->object.oid, NULL));
+	}
+
+	start = hashfile_total(f);
+
+	for (i = 0; i < writer.pseudo_merges_nr; i++) {
+		struct ewah_bitmap *commits_ewah = bitmap_to_ewah(commits_bitmap[i]);
+
+		pseudo_merge_ofs[i] = hashfile_total(f);
+
+		dump_bitmap(f, commits_ewah);
+		dump_bitmap(f, writer.selected[base+i].write_as);
+
+		ewah_free(commits_ewah);
+	}
+
+	next_ext = st_add(hashfile_total(f),
+			  st_mult(kh_size(writer.pseudo_merge_commits),
+				  sizeof(uint64_t)));
+
+	table_start = hashfile_total(f);
+
+	commits.alloc = kh_size(writer.pseudo_merge_commits);
+	CALLOC_ARRAY(commits.oid, commits.alloc);
+
+	for (i = kh_begin(writer.pseudo_merge_commits); i != kh_end(writer.pseudo_merge_commits); i++) {
+		if (!kh_exist(writer.pseudo_merge_commits, i))
+			continue;
+		oid_array_append(&commits, &kh_key(writer.pseudo_merge_commits, i));
+	}
+
+	oid_array_sort(&commits);
+
+	/* write lookup table (non-extended) */
+	for (i = 0; i < commits.nr; i++) {
+		int hash_pos;
+		struct pseudo_merge_commit_idx *c;
+
+		hash_pos = kh_get_oid_map(writer.pseudo_merge_commits,
+					  commits.oid[i]);
+		if (hash_pos == kh_end(writer.pseudo_merge_commits))
+			BUG("could not find pseudo-merge commit %s",
+			    oid_to_hex(&commits.oid[i]));
+
+		c = kh_value(writer.pseudo_merge_commits, hash_pos);
+
+		hashwrite_be32(f, find_object_pos(&commits.oid[i], NULL));
+		if (c->nr == 1)
+			hashwrite_be64(f, pseudo_merge_ofs[c->pseudo_merge[0]]);
+		else if (c->nr > 1) {
+			if (next_ext & ((uint64_t)1<<63))
+				die(_("too many pseudo-merges"));
+			hashwrite_be64(f, next_ext | ((uint64_t)1<<63));
+			next_ext = st_add3(next_ext,
+					   sizeof(uint32_t),
+					   st_mult(c->nr, sizeof(uint64_t)));
+		} else
+			BUG("expected commit '%s' to have at least one "
+			    "pseudo-merge", oid_to_hex(&commits.oid[i]));
+	}
+
+	/* write lookup table (extended) */
+	for (i = 0; i < commits.nr; i++) {
+		int hash_pos;
+		struct pseudo_merge_commit_idx *c;
+
+		hash_pos = kh_get_oid_map(writer.pseudo_merge_commits,
+					  commits.oid[i]);
+		if (hash_pos == kh_end(writer.pseudo_merge_commits))
+			BUG("could not find pseudo-merge commit %s",
+			    oid_to_hex(&commits.oid[i]));
+
+		c = kh_value(writer.pseudo_merge_commits, hash_pos);
+		if (c->nr == 1)
+			continue;
+
+		hashwrite_be32(f, c->nr);
+		for (j = 0; j < c->nr; j++)
+			hashwrite_be64(f, pseudo_merge_ofs[c->pseudo_merge[j]]);
+	}
+
+	/* write positions for all pseudo merges */
+	for (i = 0; i < writer.pseudo_merges_nr; i++)
+		hashwrite_be64(f, pseudo_merge_ofs[i]);
+
+	hashwrite_be32(f, writer.pseudo_merges_nr);
+	hashwrite_be32(f, kh_size(writer.pseudo_merge_commits));
+	hashwrite_be64(f, table_start - start);
+	hashwrite_be64(f, hashfile_total(f) - start + sizeof(uint64_t));
+
+	for (i = 0; i < writer.pseudo_merges_nr; i++)
+		bitmap_free(commits_bitmap[i]);
+
+	free(pseudo_merge_ofs);
+	free(commits_bitmap);
+}
+
 static int table_cmp(const void *_va, const void *_vb, void *_data)
 {
 	uint32_t *commit_positions = _data;
@@ -855,6 +977,9 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 
 	int fd = odb_mkstemp(&tmp_file, "pack/tmp_bitmap_XXXXXX");
 
+	if (writer.pseudo_merges_nr)
+		options |= BITMAP_OPT_PSEUDO_MERGES;
+
 	f = hashfd(fd, tmp_file.buf);
 
 	memcpy(header.magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE));
@@ -886,6 +1011,9 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 
 	write_selected_commits_v1(f, commit_positions, offsets);
 
+	if (options & BITMAP_OPT_PSEUDO_MERGES)
+		write_pseudo_merges(f);
+
 	if (options & BITMAP_OPT_LOOKUP_TABLE)
 		write_lookup_table(f, commit_positions, offsets);
 
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 0f539d79cfd..55527f61cd9 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -37,6 +37,7 @@ enum pack_bitmap_opts {
 	BITMAP_OPT_FULL_DAG = 0x1,
 	BITMAP_OPT_HASH_CACHE = 0x4,
 	BITMAP_OPT_LOOKUP_TABLE = 0x10,
+	BITMAP_OPT_PSEUDO_MERGES = 0x20,
 };
 
 enum pack_bitmap_flags {
-- 
2.44.0.303.g1dc5e5b124c



* [PATCH 13/24] pack-bitmap: extract `read_bitmap()` function
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
                   ` (11 preceding siblings ...)
  2024-03-20 22:05 ` [PATCH 12/24] pack-bitmap-write.c: write pseudo-merge table Taylor Blau
@ 2024-03-20 22:05 ` Taylor Blau
  2024-03-20 22:05 ` [PATCH 14/24] pseudo-merge: scaffolding for reads Taylor Blau
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-03-20 22:05 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The pack-bitmap machinery uses the `read_bitmap_1()` function to read a
bitmap from within the mmap'd region corresponding to the .bitmap file.
As a side-effect of calling this function, `read_bitmap_1()` increments
the `index->map_pos` variable to reflect the number of bytes read.

Extract the core of this routine to a separate function (that operates
over a `const unsigned char *`, a `size_t` and a `size_t *` pointer)
instead of a `struct bitmap_index *` pointer.

This function (called `read_bitmap()`) is part of the pack-bitmap.h API
so that it can be used within the upcoming portion of the implementation
in pseudo-merge.[ch].

Rewrite the existing function, `read_bitmap_1()`, in terms of its more
generic counterpart.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
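Not part of the patch: a minimal sketch of the calling convention that
read_bitmap() adopts, namely a base pointer, the total size of the
mapped region, and a caller-owned cursor that is advanced by the
number of bytes consumed. The read_be32_at() helper is hypothetical
and reads a plain big-endian integer rather than an EWAH bitmap; it is
only meant to show the cursor handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical reader with the same (map, map_size, *pos) shape as
 * read_bitmap(). Returns 0 and advances *pos on success; returns -1
 * and leaves *pos untouched if fewer than 4 bytes remain.
 */
static int read_be32_at(const unsigned char *map, size_t map_size,
			size_t *pos, uint32_t *out)
{
	const unsigned char *p;

	if (*pos > map_size || map_size - *pos < 4)
		return -1;

	p = map + *pos;
	*out = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
	*pos += 4;
	return 0;
}
```

Because the cursor lives with the caller, the same routine can serve
both a `struct bitmap_index` and any other structure that merely holds
a pointer into the mapped file.
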
 pack-bitmap.c | 24 +++++++++++++++---------
 pack-bitmap.h |  2 ++
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 2baeabacee1..b3b6f9aad21 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -129,17 +129,13 @@ static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
 	return composed;
 }
 
-/*
- * Read a bitmap from the current read position on the mmaped
- * index, and increase the read position accordingly
- */
-static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
+struct ewah_bitmap *read_bitmap(const unsigned char *map,
+				size_t map_size, size_t *map_pos)
 {
 	struct ewah_bitmap *b = ewah_pool_new();
 
-	ssize_t bitmap_size = ewah_read_mmap(b,
-		index->map + index->map_pos,
-		index->map_size - index->map_pos);
+	ssize_t bitmap_size = ewah_read_mmap(b, map + *map_pos,
+					     map_size - *map_pos);
 
 	if (bitmap_size < 0) {
 		error(_("failed to load bitmap index (corrupted?)"));
@@ -147,10 +143,20 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
 		return NULL;
 	}
 
-	index->map_pos += bitmap_size;
+	*map_pos += bitmap_size;
+
 	return b;
 }
 
+/*
+ * Read a bitmap from the current read position on the mmaped
+ * index, and increase the read position accordingly
+ */
+static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
+{
+	return read_bitmap(index->map, index->map_size, &index->map_pos);
+}
+
 static uint32_t bitmap_num_objects(struct bitmap_index *index)
 {
 	if (index->midx)
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 55527f61cd9..a5fe4f305ef 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -133,4 +133,6 @@ int bitmap_is_preferred_refname(struct repository *r, const char *refname);
 
 int verify_bitmap_files(struct repository *r);
 
+struct ewah_bitmap *read_bitmap(const unsigned char *map,
+				size_t map_size, size_t *map_pos);
 #endif
-- 
2.44.0.303.g1dc5e5b124c



* [PATCH 14/24] pseudo-merge: scaffolding for reads
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
                   ` (12 preceding siblings ...)
  2024-03-20 22:05 ` [PATCH 13/24] pack-bitmap: extract `read_bitmap()` function Taylor Blau
@ 2024-03-20 22:05 ` Taylor Blau
  2024-03-20 22:05 ` [PATCH 15/24] pack-bitmap.c: read pseudo-merge extension Taylor Blau
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-03-20 22:05 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Implement scaffolding within the new pseudo-merge compilation unit
necessary to use the pseudo-merge API from within the pack-bitmap.c
machinery.

The core of this scaffolding is two-fold:

  - The `pseudo_merge` structure itself, which represents an individual
    pseudo-merge bitmap. It has fields for both bitmaps, as well as
    metadata about its position within the memory-mapped region, and
    a few extra bits indicating whether or not it is satisfied, and
    which bitmap(s), if any, have been read, since they are initialized
    lazily.

  - The `pseudo_merge_map` structure, which holds an array of
    pseudo_merges, as well as a pointer to the memory-mapped region
    containing the pseudo-merge serialization from within a .bitmap
    file.

Note that the `bitmap_index` structure is defined statically within the
pack-bitmap.o compilation unit, so we can't take in a `struct
bitmap_index *`. Instead, wrap the primary components necessary to read
the pseudo-merges in this new structure to avoid exposing the
implementation details of the `bitmap_index` structure.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pseudo-merge.c | 10 ++++++++
 pseudo-merge.h | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+)

diff --git a/pseudo-merge.c b/pseudo-merge.c
index caccef942a1..d18de0a266b 100644
--- a/pseudo-merge.c
+++ b/pseudo-merge.c
@@ -441,3 +441,13 @@ void select_pseudo_merges(struct string_list *list,
 
 	stop_progress(&progress);
 }
+
+void free_pseudo_merge_map(struct pseudo_merge_map *pm)
+{
+	uint32_t i;
+	for (i = 0; i < pm->nr; i++) {
+		ewah_pool_free(pm->v[i].commits);
+		ewah_pool_free(pm->v[i].bitmap);
+	}
+	free(pm->v);
+}
diff --git a/pseudo-merge.h b/pseudo-merge.h
index 81888731864..2f652fc6767 100644
--- a/pseudo-merge.h
+++ b/pseudo-merge.h
@@ -99,4 +99,69 @@ void select_pseudo_merges(struct string_list *list,
 			  uint32_t *pseudo_merges_nr,
 			  unsigned show_progress);
 
+/*
+ * Represents a serialized view of a file containing pseudo-merge(s)
+ * (see Documentation/technical/bitmap-format.txt for a specification
+ * of the format).
+ */
+struct pseudo_merge_map {
+	/*
+	 * An array of pseudo-merge(s), lazily loaded from the .bitmap
+	 * file.
+	 */
+	struct pseudo_merge *v;
+	size_t nr;
+	size_t commits_nr;
+
+	/*
+	 * Pointers into a memory-mapped view of the .bitmap file:
+	 *
+	 *   - map: the beginning of the .bitmap file
+	 *   - commits: the beginning of the pseudo-merge commit index
+	 *   - map_size: the size of the .bitmap file
+	 */
+	const unsigned char *map;
+	const unsigned char *commits;
+
+	size_t map_size;
+};
+
+/*
+ * An individual pseudo-merge, storing a pair of lazily-loaded
+ * bitmaps:
+ *
+ *  - commits: the set of commit(s) that are part of the pseudo-merge
+ *  - bitmap: the set of object(s) reachable from the above set of
+ *    commits.
+ *
+ * The `at` and `bitmap_at` fields are used to store the locations of
+ * each of the above bitmaps in the .bitmap file.
+ */
+struct pseudo_merge {
+	struct ewah_bitmap *commits;
+	struct ewah_bitmap *bitmap;
+
+	off_t at;
+	off_t bitmap_at;
+
+	/*
+	 * `satisfied` indicates whether the given pseudo-merge has been
+	 * used.
+	 *
+	 * `loaded_commits` and `loaded_bitmap` indicate whether the
+	 * respective bitmaps have been loaded and read from the
+	 * .bitmap file.
+	 */
+	unsigned satisfied : 1,
+		 loaded_commits : 1,
+		 loaded_bitmap : 1;
+};
+
+/*
+ * Frees the given pseudo-merge map, releasing any memory held by (a)
+ * parsed EWAH bitmaps, or (b) the array of pseudo-merges itself. Does
+ * not free the memory-mapped view of the .bitmap file.
+ */
+void free_pseudo_merge_map(struct pseudo_merge_map *pm);
+
 #endif
-- 
2.44.0.303.g1dc5e5b124c



* [PATCH 15/24] pack-bitmap.c: read pseudo-merge extension
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
                   ` (13 preceding siblings ...)
  2024-03-20 22:05 ` [PATCH 14/24] pseudo-merge: scaffolding for reads Taylor Blau
@ 2024-03-20 22:05 ` Taylor Blau
  2024-03-20 22:05 ` [PATCH 16/24] pseudo-merge: implement support for reading pseudo-merge commits Taylor Blau
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-03-20 22:05 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Now that the scaffolding for reading the pseudo-merge extension has been
laid, teach the pack-bitmap machinery to read the pseudo-merge extension
when present.

Note that pseudo-merges themselves are not yet used during traversal;
that step will be taken by a future commit.

In the meantime, read the table and initialize the pseudo_merge_map
structure introduced by a previous commit. When the pseudo-merge
extension is present, `load_bitmap_header()` performs basic sanity
checks to make sure that the table is well-formed.
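
As a rough illustration of the trailer layout that the hunk below reads
backwards from the end of the extension, here is a minimal standalone
sketch. The names read_be32(), read_be64(), struct pm_trailer, and
parse_pm_trailer() are hypothetical stand-ins (Git itself uses
get_be32() and get_be64()), and the bound on `nr` is an extra check
added for the sketch, not something this patch performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical big-endian readers standing in for get_be32()/get_be64(). */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/* Hypothetical, simplified view of the fixed-size trailer fields. */
struct pm_trailer {
	uint64_t table_size;          /* size of the whole extension */
	uint64_t commits_ofs;         /* commit lookup table, relative to extension start */
	uint32_t commits_nr;          /* entries in the commit lookup table */
	uint32_t nr;                  /* number of pseudo-merges */
	const unsigned char *offsets; /* nr big-endian 8-byte offsets */
};

#define PM_TRAILER_FIXED_SIZE 24 /* 8 + 8 + 4 + 4 bytes */

/*
 * Parse the trailer that ends at `end`, where `start` is the earliest
 * byte the extension may occupy. Returns 0 on success, -1 if the
 * region is too short for what the trailer claims.
 */
static int parse_pm_trailer(const unsigned char *start,
			    const unsigned char *end,
			    struct pm_trailer *out)
{
	size_t avail;

	assert(start <= end);
	avail = (size_t)(end - start);
	if (avail < PM_TRAILER_FIXED_SIZE)
		return -1;

	out->table_size = read_be64(end - 8);
	if (out->table_size > avail)
		return -1;

	out->commits_ofs = read_be64(end - 16);
	out->commits_nr = read_be32(end - 20);
	out->nr = read_be32(end - 24);

	/* Extra check for this sketch: the offset array must fit too. */
	if (out->nr > (avail - PM_TRAILER_FIXED_SIZE) / sizeof(uint64_t))
		return -1;

	out->offsets = end - PM_TRAILER_FIXED_SIZE -
		(size_t)out->nr * sizeof(uint64_t);
	return 0;
}
```

Reading from the end keeps the fixed-size fields at known negative
offsets regardless of how many pseudo-merges were written.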

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index b3b6f9aad21..e0f191b7581 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -20,6 +20,7 @@
 #include "list-objects-filter-options.h"
 #include "midx.h"
 #include "config.h"
+#include "pseudo-merge.h"
 
 /*
  * An entry on the bitmap index, representing the bitmap for a given
@@ -86,6 +87,9 @@ struct bitmap_index {
 	 */
 	unsigned char *table_lookup;
 
+	/* This contains the pseudo-merge cache within 'map' (if found). */
+	struct pseudo_merge_map pseudo_merges;
+
 	/*
 	 * Extended index.
 	 *
@@ -205,6 +209,41 @@ static int load_bitmap_header(struct bitmap_index *index)
 				index->table_lookup = (void *)(index_end - table_size);
 			index_end -= table_size;
 		}
+
+		if (flags & BITMAP_OPT_PSEUDO_MERGES) {
+			unsigned char *pseudo_merge_ofs;
+			size_t table_size;
+			uint32_t i;
+
+			if (sizeof(table_size) > index_end - index->map - header_size)
+				return error(_("corrupted bitmap index file (too short to fit pseudo-merge table header)"));
+
+			table_size = get_be64(index_end - 8);
+			if (table_size > index_end - index->map - header_size)
+				return error(_("corrupted bitmap index file (too short to fit pseudo-merge table)"));
+
+			if (git_env_bool("GIT_TEST_USE_PSEUDO_MERGES", 1)) {
+				const unsigned char *ext = (index_end - table_size);
+
+				index->pseudo_merges.map = index->map;
+				index->pseudo_merges.map_size = index->map_size;
+				index->pseudo_merges.commits = ext + get_be64(index_end - 16);
+				index->pseudo_merges.commits_nr = get_be32(index_end - 20);
+				index->pseudo_merges.nr = get_be32(index_end - 24);
+
+				CALLOC_ARRAY(index->pseudo_merges.v,
+					     index->pseudo_merges.nr);
+
+				pseudo_merge_ofs = index_end - 24 -
+					(index->pseudo_merges.nr * sizeof(uint64_t));
+				for (i = 0; i < index->pseudo_merges.nr; i++) {
+					index->pseudo_merges.v[i].at = get_be64(pseudo_merge_ofs);
+					pseudo_merge_ofs += sizeof(uint64_t);
+				}
+			}
+
+			index_end -= table_size;
+		}
 	}
 
 	index->entry_count = ntohl(header->entry_count);
-- 
2.44.0.303.g1dc5e5b124c



* [PATCH 16/24] pseudo-merge: implement support for reading pseudo-merge commits
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
                   ` (14 preceding siblings ...)
  2024-03-20 22:05 ` [PATCH 15/24] pack-bitmap.c: read pseudo-merge extension Taylor Blau
@ 2024-03-20 22:05 ` Taylor Blau
  2024-03-20 22:05 ` [PATCH 17/24] ewah: implement `ewah_bitmap_popcount()` Taylor Blau
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-03-20 22:05 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Implement the basic API for reading pseudo-merge bitmaps, which consists
of four basic functions:

  - pseudo_merge_bitmap()
  - use_pseudo_merge()
  - apply_pseudo_merges_for_commit()
  - cascade_pseudo_merges()

These functions are all documented in pseudo-merge.h, but their rough
descriptions are as follows:

  - pseudo_merge_bitmap() reads and inflates the objects EWAH bitmap for
    a given pseudo-merge

  - use_pseudo_merge() does the same as pseudo_merge_bitmap(), but on
    the commits EWAH bitmap, not the objects bitmap

  - apply_pseudo_merges_for_commit() applies all satisfied pseudo-merge
    commits for a given result set, and cascades any yet-unsatisfied
    pseudo-merges if any were applied in the previous step

  - cascade_pseudo_merges() applies all pseudo-merges which are
    satisfied but have not been previously applied, repeating this
    process until no more pseudo-merges can be applied

The core of the API is the latter two functions, which are responsible
for applying pseudo-merges during the object traversal implemented in
the pack-bitmap machinery.

The other two functions (pseudo_merge_bitmap(), and use_pseudo_merge())
are low-level ways to interact with the pseudo-merge machinery, which
will be useful in future commits.
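
To make the "cascading" behavior concrete, here is a toy sketch that
replaces the EWAH and `struct bitmap` types with single 64-bit words.
struct toy_merge, toy_apply(), and toy_cascade() are hypothetical names
used only for illustration; the sketch also ignores the optional roots
bitmap that the real functions accept.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy pseudo-merge: each set is one 64-bit word instead of a bitmap. */
struct toy_merge {
	uint64_t commits; /* "parents" of the pseudo-merge */
	uint64_t objects; /* everything reachable from those parents */
	unsigned satisfied : 1;
};

/* Apply `m` if all of its parents are already in `result`. */
static int toy_apply(struct toy_merge *m, uint64_t *result)
{
	if (m->satisfied)
		return 0;
	if ((m->commits & *result) != m->commits)
		return 0; /* parents are not a subset of result */

	*result |= m->objects;
	m->satisfied = 1;
	return 1;
}

/*
 * Keep sweeping over all pseudo-merges until a full pass applies
 * nothing new. Returns the total number applied.
 */
static int toy_cascade(struct toy_merge *v, size_t nr, uint64_t *result)
{
	int total = 0;
	int any;

	assert(v || !nr);
	do {
		size_t i;

		any = 0;
		for (i = 0; i < nr; i++) {
			if (toy_apply(&v[i], result)) {
				any = 1;
				total++;
			}
		}
	} while (any);

	return total;
}
```

The outer loop is needed because applying one pseudo-merge can set the
bits that satisfy another one that was examined earlier in the same
pass.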

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pseudo-merge.c | 231 +++++++++++++++++++++++++++++++++++++++++++++++++
 pseudo-merge.h |  44 ++++++++++
 2 files changed, 275 insertions(+)

diff --git a/pseudo-merge.c b/pseudo-merge.c
index d18de0a266b..e111c9cd1a6 100644
--- a/pseudo-merge.c
+++ b/pseudo-merge.c
@@ -10,6 +10,7 @@
 #include "commit.h"
 #include "alloc.h"
 #include "progress.h"
+#include "hex.h"
 
 #define DEFAULT_PSEUDO_MERGE_DECAY 1.0f
 #define DEFAULT_PSEUDO_MERGE_MAX_MERGES 64
@@ -451,3 +452,233 @@ void free_pseudo_merge_map(struct pseudo_merge_map *pm)
 	}
 	free(pm->v);
 }
+
+struct pseudo_merge_commit_ext {
+	uint32_t nr;
+	const unsigned char *ptr;
+};
+
+static int pseudo_merge_ext_at(const struct pseudo_merge_map *pm,
+			       struct pseudo_merge_commit_ext *ext, size_t at)
+{
+	if (at >= pm->map_size)
+		return error(_("extended pseudo-merge read out-of-bounds "
+			       "(%"PRIuMAX" >= %"PRIuMAX")"),
+			     (uintmax_t)at, (uintmax_t)pm->map_size);
+
+	ext->nr = get_be32(pm->map + at);
+	ext->ptr = pm->map + at + sizeof(uint32_t);
+
+	return 0;
+}
+
+struct ewah_bitmap *pseudo_merge_bitmap(const struct pseudo_merge_map *pm,
+					struct pseudo_merge *merge)
+{
+	if (!merge->loaded_commits)
+		BUG("cannot use unloaded pseudo-merge bitmap");
+
+	if (!merge->loaded_bitmap) {
+		size_t at = merge->bitmap_at;
+
+		merge->bitmap = read_bitmap(pm->map, pm->map_size, &at);
+		merge->loaded_bitmap = 1;
+	}
+
+	return merge->bitmap;
+}
+
+struct pseudo_merge *use_pseudo_merge(const struct pseudo_merge_map *pm,
+				      struct pseudo_merge *merge)
+{
+	if (!merge->loaded_commits) {
+		size_t pos = merge->at;
+
+		merge->commits = read_bitmap(pm->map, pm->map_size, &pos);
+		merge->bitmap_at = pos;
+		merge->loaded_commits = 1;
+	}
+	return merge;
+}
+
+static struct pseudo_merge *pseudo_merge_at(const struct pseudo_merge_map *pm,
+					    struct object_id *oid,
+					    size_t want)
+{
+	size_t lo = 0;
+	size_t hi = pm->nr;
+
+	while (lo < hi) {
+		size_t mi = lo + (hi - lo) / 2;
+		size_t got = pm->v[mi].at;
+
+		if (got == want)
+			return use_pseudo_merge(pm, &pm->v[mi]);
+		else if (got < want)
+			lo = mi + 1;
+		else
+			hi = mi;
+	}
+
+	warning(_("could not find pseudo-merge for commit %s at offset %"PRIuMAX),
+		oid_to_hex(oid), (uintmax_t)want);
+
+	return NULL;
+}
+
+struct pseudo_merge_commit {
+	uint32_t commit_pos;
+	uint64_t pseudo_merge_ofs;
+};
+
+#define PSEUDO_MERGE_COMMIT_RAWSZ (sizeof(uint32_t)+sizeof(uint64_t))
+
+static void read_pseudo_merge_commit_at(struct pseudo_merge_commit *merge,
+					const unsigned char *at)
+{
+	merge->commit_pos = get_be32(at);
+	merge->pseudo_merge_ofs = get_be64(at + sizeof(uint32_t));
+}
+
+static int nth_pseudo_merge_ext(const struct pseudo_merge_map *pm,
+				struct pseudo_merge_commit_ext *ext,
+				struct pseudo_merge_commit *merge,
+				uint32_t n)
+{
+	size_t ofs;
+
+	if (n >= ext->nr)
+		return error(_("extended pseudo-merge lookup out-of-bounds "
+			       "(%"PRIu32" >= %"PRIu32")"), n, ext->nr);
+
+	ofs = get_be64(ext->ptr + st_mult(n, sizeof(uint64_t)));
+	if (ofs >= pm->map_size)
+		return error(_("out-of-bounds read: (%"PRIuMAX" >= %"PRIuMAX")"),
+			     (uintmax_t)ofs, (uintmax_t)pm->map_size);
+
+	read_pseudo_merge_commit_at(merge, pm->map + ofs);
+
+	return 0;
+}
+
+static unsigned apply_pseudo_merge(const struct pseudo_merge_map *pm,
+				   struct pseudo_merge *merge,
+				   struct bitmap *result,
+				   struct bitmap *roots)
+{
+	if (merge->satisfied)
+		return 0;
+
+	if (!ewah_bitmap_is_subset(merge->commits, roots ? roots : result))
+		return 0;
+
+	bitmap_or_ewah(result, pseudo_merge_bitmap(pm, merge));
+	if (roots)
+		bitmap_or_ewah(roots, pseudo_merge_bitmap(pm, merge));
+	merge->satisfied = 1;
+
+	return 1;
+}
+
+static int pseudo_merge_commit_cmp(const void *va, const void *vb)
+{
+	struct pseudo_merge_commit merge;
+	uint32_t key = *(uint32_t*)va;
+
+	read_pseudo_merge_commit_at(&merge, vb);
+
+	if (key < merge.commit_pos)
+		return -1;
+	if (key > merge.commit_pos)
+		return 1;
+	return 0;
+}
+
+static struct pseudo_merge_commit *find_pseudo_merge(const struct pseudo_merge_map *pm,
+						     uint32_t pos)
+{
+	if (!pm->commits_nr)
+		return NULL;
+
+	return bsearch(&pos, pm->commits, pm->commits_nr,
+		       PSEUDO_MERGE_COMMIT_RAWSZ, pseudo_merge_commit_cmp);
+}
+
+int apply_pseudo_merges_for_commit(const struct pseudo_merge_map *pm,
+				   struct bitmap *result,
+				   struct commit *commit, uint32_t commit_pos)
+{
+	struct pseudo_merge *merge;
+	struct pseudo_merge_commit *merge_commit;
+	int ret = 0;
+
+	merge_commit = find_pseudo_merge(pm, commit_pos);
+	if (!merge_commit)
+		return 0;
+
+	if (merge_commit->pseudo_merge_ofs & ((uint64_t)1<<63)) {
+		struct pseudo_merge_commit_ext ext = { 0 };
+		off_t ofs = merge_commit->pseudo_merge_ofs & ~((uint64_t)1<<63);
+		uint32_t i;
+
+		if (pseudo_merge_ext_at(pm, &ext, ofs) < 0) {
+			warning(_("could not read extended pseudo-merge table "
+				  "for commit %s"),
+				oid_to_hex(&commit->object.oid));
+			return ret;
+		}
+
+		for (i = 0; i < ext.nr; i++) {
+			if (nth_pseudo_merge_ext(pm, &ext, merge_commit, i) < 0)
+				return ret;
+
+			merge = pseudo_merge_at(pm, &commit->object.oid,
+						merge_commit->pseudo_merge_ofs);
+
+			if (!merge)
+				return ret;
+
+			if (apply_pseudo_merge(pm, merge, result, NULL))
+				ret++;
+		}
+	} else {
+		merge = pseudo_merge_at(pm, &commit->object.oid,
+					merge_commit->pseudo_merge_ofs);
+
+		if (!merge)
+			return ret;
+
+		if (apply_pseudo_merge(pm, merge, result, NULL))
+			ret++;
+	}
+
+	if (ret)
+		cascade_pseudo_merges(pm, result, NULL);
+
+	return ret;
+}
+
+int cascade_pseudo_merges(const struct pseudo_merge_map *pm,
+			  struct bitmap *result,
+			  struct bitmap *roots)
+{
+	unsigned any_satisfied;
+	int ret = 0;
+
+	do {
+		struct pseudo_merge *merge;
+		uint32_t i;
+
+		any_satisfied = 0;
+
+		for (i = 0; i < pm->nr; i++) {
+			merge = use_pseudo_merge(pm, &pm->v[i]);
+			if (apply_pseudo_merge(pm, merge, result, roots)) {
+				any_satisfied |= 1;
+				ret++;
+			}
+		}
+	} while (any_satisfied);
+
+	return ret;
+}
diff --git a/pseudo-merge.h b/pseudo-merge.h
index 2f652fc6767..cc14e947e86 100644
--- a/pseudo-merge.h
+++ b/pseudo-merge.h
@@ -164,4 +164,48 @@ struct pseudo_merge {
  */
 void free_pseudo_merge_map(struct pseudo_merge_map *pm);
 
+/*
+ * Loads the bitmap corresponding to the given pseudo-merge from the
+ * map, if it has not already been loaded.
+ */
+struct ewah_bitmap *pseudo_merge_bitmap(const struct pseudo_merge_map *pm,
+					struct pseudo_merge *merge);
+
+/*
+ * Loads the pseudo-merge and its commits bitmap from the given
+ * pseudo-merge map, if it has not already been loaded.
+ */
+struct pseudo_merge *use_pseudo_merge(const struct pseudo_merge_map *pm,
+				      struct pseudo_merge *merge);
+
+/*
+ * Applies pseudo-merge(s) containing the given commit to the bitmap
+ * "result".
+ *
+ * If any pseudo-merge(s) were satisfied, returns the number
+ * satisfied, otherwise returns 0. If any were satisfied, the
+ * remaining unsatisfied pseudo-merges are cascaded (see below).
+ */
+int apply_pseudo_merges_for_commit(const struct pseudo_merge_map *pm,
+				   struct bitmap *result,
+				   struct commit *commit, uint32_t commit_pos);
+
+/*
+ * Applies pseudo-merge(s) which are satisfied according to the
+ * current bitmap in result (or roots, see below). If any
+ * pseudo-merges were satisfied, repeat the process over unsatisfied
+ * pseudo-merge commits until no more pseudo-merges are satisfied.
+ *
+ * Result is the bitmap to which the pseudo-merge(s) are applied.
+ * Roots (if given) is a bitmap of the traversal tip(s) for either
+ * side of a reachability traversal.
+ *
+ * Roots may be given instead of a populated results bitmap at the
+ * beginning of a traversal on either side where the reachability
+ * closure over tips is not yet known.
+ */
+int cascade_pseudo_merges(const struct pseudo_merge_map *pm,
+			  struct bitmap *result,
+			  struct bitmap *roots);
+
 #endif
-- 
2.44.0.303.g1dc5e5b124c



* [PATCH 17/24] ewah: implement `ewah_bitmap_popcount()`
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
                   ` (15 preceding siblings ...)
  2024-03-20 22:05 ` [PATCH 16/24] pseudo-merge: implement support for reading pseudo-merge commits Taylor Blau
@ 2024-03-20 22:05 ` Taylor Blau
  2024-03-20 22:05 ` [PATCH 18/24] pack-bitmap: implement test helpers for pseudo-merge Taylor Blau
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-03-20 22:05 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Some of the pseudo-merge test helpers (which will be introduced in the
following commit) will want to indicate the total number of commits in
or objects reachable from a pseudo-merge.

Implement a popcount() function that operates on EWAH bitmaps to quickly
determine how many bits are set in each of the respective bitmaps.
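
For readers unfamiliar with the approach, the idea is simply to sum a
per-word population count over every uncompressed word the iterator
yields. A minimal sketch over a plain array of words follows;
toy_popcount64() and toy_words_popcount() are hypothetical names, and
the portable bit-clearing loop stands in for ewah_bit_popcount64(),
which may well map to a compiler builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable popcount: clear the lowest set bit until none remain. */
static unsigned toy_popcount64(uint64_t w)
{
	unsigned n = 0;

	while (w) {
		w &= w - 1;
		n++;
	}
	return n;
}

/* Sum the set bits across `nr` already-decompressed 64-bit words. */
static size_t toy_words_popcount(const uint64_t *words, size_t nr)
{
	size_t count = 0;
	size_t i;

	assert(words || !nr);
	for (i = 0; i < nr; i++)
		count += toy_popcount64(words[i]);

	return count;
}
```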

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ewah/bitmap.c | 14 ++++++++++++++
 ewah/ewok.h   |  1 +
 2 files changed, 15 insertions(+)

diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index 5bdae3fb07b..a41fa152cbd 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -212,6 +212,20 @@ size_t bitmap_popcount(struct bitmap *self)
 	return count;
 }
 
+size_t ewah_bitmap_popcount(struct ewah_bitmap *self)
+{
+	struct ewah_iterator it;
+	eword_t word;
+	size_t count = 0;
+
+	ewah_iterator_init(&it, self);
+
+	while (ewah_iterator_next(&word, &it))
+		count += ewah_bit_popcount64(word);
+
+	return count;
+}
+
 int bitmap_is_empty(struct bitmap *self)
 {
 	size_t i;
diff --git a/ewah/ewok.h b/ewah/ewok.h
index c334833b201..d7e9fb67715 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -190,6 +190,7 @@ void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other);
 void bitmap_or(struct bitmap *self, const struct bitmap *other);
 
 size_t bitmap_popcount(struct bitmap *self);
+size_t ewah_bitmap_popcount(struct ewah_bitmap *self);
 int bitmap_is_empty(struct bitmap *self);
 
 #endif
-- 
2.44.0.303.g1dc5e5b124c



* [PATCH 18/24] pack-bitmap: implement test helpers for pseudo-merge
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
                   ` (16 preceding siblings ...)
  2024-03-20 22:05 ` [PATCH 17/24] ewah: implement `ewah_bitmap_popcount()` Taylor Blau
@ 2024-03-20 22:05 ` Taylor Blau
  2024-03-20 22:05 ` [PATCH 19/24] t/test-lib-functions.sh: support `--date` in `test_commit_bulk()` Taylor Blau
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-03-20 22:05 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Implement three new sub-commands for the "bitmap" test-helper:

  - t/helper test-tool bitmap dump-pseudo-merges
  - t/helper test-tool bitmap dump-pseudo-merge-commits <n>
  - t/helper test-tool bitmap dump-pseudo-merge-objects <n>

These three helpers dump the list of pseudo-merges, the "parents" of the
nth pseudo-merge, and the set of objects reachable from those parents,
respectively.

These helpers will be useful in subsequent patches when we add test
coverage for pseudo-merge bitmaps.
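
The two "dump" helpers that take <n> walk every set bit of a bitmap and
translate each bit position into an object ID. The bit-walking part can
be sketched on its own as below, assuming plain 64-bit words in place
of the EWAH iterator; toy_ctz64() and toy_set_bits() are hypothetical
names, and the sketch records positions instead of printing object IDs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count trailing zero bits; the caller must pass a non-zero word. */
static unsigned toy_ctz64(uint64_t w)
{
	unsigned n = 0;

	assert(w);
	while (!(w & 1)) {
		w >>= 1;
		n++;
	}
	return n;
}

/*
 * Store the position of every set bit in `words` into `out` (up to
 * `out_cap` entries) and return the total number of set bits found.
 */
static size_t toy_set_bits(const uint64_t *words, size_t nr,
			   uint32_t *out, size_t out_cap)
{
	size_t n = 0;
	size_t i;
	uint32_t pos = 0;

	for (i = 0; i < nr; i++) {
		uint64_t word = words[i];
		unsigned offset;

		for (offset = 0; offset < 64; offset++) {
			if (!(word >> offset))
				break; /* no set bits left in this word */

			/* Jump straight to the next set bit. */
			offset += toy_ctz64(word >> offset);

			if (n < out_cap)
				out[n] = pos + offset;
			n++;
		}
		pos += 64;
	}

	return n;
}
```

Skipping ahead with a count-trailing-zeros step avoids testing each of
the 64 bits individually when the word is sparse.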

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c          | 126 +++++++++++++++++++++++++++++++++++++++++
 pack-bitmap.h          |   3 +
 t/helper/test-bitmap.c |  34 ++++++++---
 3 files changed, 156 insertions(+), 7 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index e0f191b7581..7188dd75eaf 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2442,6 +2442,132 @@ int test_bitmap_hashes(struct repository *r)
 	return 0;
 }
 
+static void bit_pos_to_object_id(struct bitmap_index *bitmap_git,
+				 uint32_t bit_pos,
+				 struct object_id *oid)
+{
+	uint32_t index_pos;
+
+	if (bitmap_is_midx(bitmap_git))
+		index_pos = pack_pos_to_midx(bitmap_git->midx, bit_pos);
+	else
+		index_pos = pack_pos_to_index(bitmap_git->pack, bit_pos);
+
+	nth_bitmap_object_oid(bitmap_git, oid, index_pos);
+}
+
+int test_bitmap_pseudo_merges(struct repository *r)
+{
+	struct bitmap_index *bitmap_git;
+	uint32_t i;
+
+	bitmap_git = prepare_bitmap_git(r);
+	if (!bitmap_git || !bitmap_git->pseudo_merges.nr)
+		goto cleanup;
+
+	for (i = 0; i < bitmap_git->pseudo_merges.nr; i++) {
+		struct pseudo_merge *merge;
+		struct ewah_bitmap *commits_bitmap, *merge_bitmap;
+
+		merge = use_pseudo_merge(&bitmap_git->pseudo_merges,
+					 &bitmap_git->pseudo_merges.v[i]);
+		commits_bitmap = merge->commits;
+		merge_bitmap = pseudo_merge_bitmap(&bitmap_git->pseudo_merges,
+						   merge);
+
+		printf("at=%"PRIuMAX", commits=%"PRIuMAX", objects=%"PRIuMAX"\n",
+		       (uintmax_t)merge->at,
+		       (uintmax_t)ewah_bitmap_popcount(commits_bitmap),
+		       (uintmax_t)ewah_bitmap_popcount(merge_bitmap));
+	}
+
+cleanup:
+	free_bitmap_index(bitmap_git);
+	return 0;
+}
+
+static void dump_ewah_object_ids(struct bitmap_index *bitmap_git,
+				 struct ewah_bitmap *bitmap)
+
+{
+	struct ewah_iterator it;
+	eword_t word;
+	uint32_t pos = 0;
+
+	ewah_iterator_init(&it, bitmap);
+
+	while (ewah_iterator_next(&word, &it)) {
+		struct object_id oid;
+		uint32_t offset;
+
+		for (offset = 0; offset < BITS_IN_EWORD; offset++) {
+			if (!(word >> offset))
+				break;
+
+			offset += ewah_bit_ctz64(word >> offset);
+
+			bit_pos_to_object_id(bitmap_git, pos + offset, &oid);
+			printf("%s\n", oid_to_hex(&oid));
+		}
+		pos += BITS_IN_EWORD;
+	}
+}
+
+int test_bitmap_pseudo_merge_commits(struct repository *r, uint32_t n)
+{
+	struct bitmap_index *bitmap_git;
+	struct pseudo_merge *merge;
+	int ret = 0;
+
+	bitmap_git = prepare_bitmap_git(r);
+	if (!bitmap_git || !bitmap_git->pseudo_merges.nr)
+		goto cleanup;
+
+	if (n >= bitmap_git->pseudo_merges.nr) {
+		ret = error(_("pseudo-merge index out of range "
+			      "(%"PRIu32" >= %"PRIuMAX")"),
+			    n, (uintmax_t)bitmap_git->pseudo_merges.nr);
+		goto cleanup;
+	}
+
+	merge = use_pseudo_merge(&bitmap_git->pseudo_merges,
+				 &bitmap_git->pseudo_merges.v[n]);
+	dump_ewah_object_ids(bitmap_git, merge->commits);
+
+cleanup:
+	free_bitmap_index(bitmap_git);
+	return ret;
+}
+
+int test_bitmap_pseudo_merge_objects(struct repository *r, uint32_t n)
+{
+	struct bitmap_index *bitmap_git;
+	struct pseudo_merge *merge;
+	int ret = 0;
+
+	bitmap_git = prepare_bitmap_git(r);
+	if (!bitmap_git || !bitmap_git->pseudo_merges.nr)
+		goto cleanup;
+
+	if (n >= bitmap_git->pseudo_merges.nr) {
+		ret = error(_("pseudo-merge index out of range "
+			      "(%"PRIu32" >= %"PRIuMAX")"),
+			    n, (uintmax_t)bitmap_git->pseudo_merges.nr);
+		goto cleanup;
+	}
+
+	merge = use_pseudo_merge(&bitmap_git->pseudo_merges,
+				 &bitmap_git->pseudo_merges.v[n]);
+
+	dump_ewah_object_ids(bitmap_git,
+			     pseudo_merge_bitmap(&bitmap_git->pseudo_merges,
+						 merge));
+
+cleanup:
+	free_bitmap_index(bitmap_git);
+	return ret;
+}
+
 int rebuild_bitmap(const uint32_t *reposition,
 		   struct ewah_bitmap *source,
 		   struct bitmap *dest)
diff --git a/pack-bitmap.h b/pack-bitmap.h
index a5fe4f305ef..25d3b8e604a 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -73,6 +73,9 @@ void traverse_bitmap_commit_list(struct bitmap_index *,
 void test_bitmap_walk(struct rev_info *revs);
 int test_bitmap_commits(struct repository *r);
 int test_bitmap_hashes(struct repository *r);
+int test_bitmap_pseudo_merges(struct repository *r);
+int test_bitmap_pseudo_merge_commits(struct repository *r, uint32_t n);
+int test_bitmap_pseudo_merge_objects(struct repository *r, uint32_t n);
 
 #define GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL \
 	"GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL"
diff --git a/t/helper/test-bitmap.c b/t/helper/test-bitmap.c
index af43ee1cb5e..6af2b42678f 100644
--- a/t/helper/test-bitmap.c
+++ b/t/helper/test-bitmap.c
@@ -13,21 +13,41 @@ static int bitmap_dump_hashes(void)
 	return test_bitmap_hashes(the_repository);
 }
 
+static int bitmap_dump_pseudo_merges(void)
+{
+	return test_bitmap_pseudo_merges(the_repository);
+}
+
+static int bitmap_dump_pseudo_merge_commits(uint32_t n)
+{
+	return test_bitmap_pseudo_merge_commits(the_repository, n);
+}
+
+static int bitmap_dump_pseudo_merge_objects(uint32_t n)
+{
+	return test_bitmap_pseudo_merge_objects(the_repository, n);
+}
+
 int cmd__bitmap(int argc, const char **argv)
 {
 	setup_git_directory();
 
-	if (argc != 2)
-		goto usage;
-
-	if (!strcmp(argv[1], "list-commits"))
+	if (argc == 2 && !strcmp(argv[1], "list-commits"))
 		return bitmap_list_commits();
-	if (!strcmp(argv[1], "dump-hashes"))
+	if (argc == 2 && !strcmp(argv[1], "dump-hashes"))
 		return bitmap_dump_hashes();
+	if (argc == 2 && !strcmp(argv[1], "dump-pseudo-merges"))
+		return bitmap_dump_pseudo_merges();
+	if (argc == 3 && !strcmp(argv[1], "dump-pseudo-merge-commits"))
+		return bitmap_dump_pseudo_merge_commits(atoi(argv[2]));
+	if (argc == 3 && !strcmp(argv[1], "dump-pseudo-merge-objects"))
+		return bitmap_dump_pseudo_merge_objects(atoi(argv[2]));
 
-usage:
 	usage("\ttest-tool bitmap list-commits\n"
-	      "\ttest-tool bitmap dump-hashes");
+	      "\ttest-tool bitmap dump-hashes\n"
+	      "\ttest-tool bitmap dump-pseudo-merges\n"
+	      "\ttest-tool bitmap dump-pseudo-merge-commits <n>\n"
+	      "\ttest-tool bitmap dump-pseudo-merge-objects <n>");
 
 	return -1;
 }
-- 
2.44.0.303.g1dc5e5b124c



* [PATCH 19/24] t/test-lib-functions.sh: support `--date` in `test_commit_bulk()`
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
                   ` (17 preceding siblings ...)
  2024-03-20 22:05 ` [PATCH 18/24] pack-bitmap: implement test helpers for pseudo-merge Taylor Blau
@ 2024-03-20 22:05 ` Taylor Blau
  2024-03-20 22:05 ` [PATCH 20/24] pack-bitmap.c: use pseudo-merges during traversal Taylor Blau
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-03-20 22:05 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

One of the tests we'll want to add for pseudo-merge bitmaps needs to be
able to generate a large number of commits at a specific date.

Support the `--date` option (with identical semantics to the `--date`
option for `test_commit()`) within `test_commit_bulk` as a prerequisite
for that.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/test-lib-functions.sh | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 6eaf116346b..312cc5d4c79 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -458,6 +458,7 @@ test_commit_bulk () {
 	indir=.
 	ref=HEAD
 	n=1
+	notick=
 	message='commit %s'
 	filename='%s.t'
 	contents='content %s'
@@ -488,6 +489,12 @@ test_commit_bulk () {
 			filename="${1#--*=}-%s.t"
 			contents="${1#--*=} %s"
 			;;
+		--date)
+			notick=yes
+			GIT_COMMITTER_DATE="$2"
+			GIT_AUTHOR_DATE="$2"
+			shift
+			;;
 		-*)
 			BUG "invalid test_commit_bulk option: $1"
 			;;
@@ -507,7 +514,10 @@ test_commit_bulk () {
 
 	while test "$total" -gt 0
 	do
-		test_tick &&
+		if test -z "$notick"
+		then
+			test_tick
+		fi &&
 		echo "commit $ref"
 		printf 'author %s <%s> %s\n' \
 			"$GIT_AUTHOR_NAME" \
-- 
2.44.0.303.g1dc5e5b124c



* [PATCH 20/24] pack-bitmap.c: use pseudo-merges during traversal
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
                   ` (18 preceding siblings ...)
  2024-03-20 22:05 ` [PATCH 19/24] t/test-lib-functions.sh: support `--date` in `test_commit_bulk()` Taylor Blau
@ 2024-03-20 22:05 ` Taylor Blau
  2024-03-20 22:06 ` [PATCH 21/24] pack-bitmap: extra trace2 information Taylor Blau
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-03-20 22:05 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Now that all of the groundwork has been laid to support reading and
using pseudo-merges, make use of that work in this commit by teaching
the pack-bitmap machinery to use pseudo-merge(s) when available during
traversal.

The basic operation is as follows:

  - When enumerating objects on either side of a reachability query,
    first see if any subset of the roots satisfies some pseudo-merge
    bitmap. If it does, apply that pseudo-merge bitmap.

  - If any pseudo-merge bitmap(s) were applied in the previous step, OR
    them into the result[^1]. Then repeat the process over all
    pseudo-merge bitmaps (we'll refer to this as "cascading"
    pseudo-merges). Once this is done, OR in the resulting bitmap.

  - If there is no fill-in traversal to be done, return the bitmap for
    that side of the reachability query. If there is fill-in traversal,
    then for each commit we encounter via show_commit(), check to see if
    any unsatisfied pseudo-merge containing that commit as one of its
    parents has been satisfied by the presence of that commit.

    If so, OR in the object set from that pseudo-merge bitmap, and then
    cascade. If not, continue traversal.

A similar implementation is present in the boundary-based bitmap
traversal routines.

[^1]: Importantly, we cannot OR in the entire set of roots along with
  the objects reachable from whatever pseudo-merge bitmaps were
  satisfied.  This may leave some dangling bits corresponding to any
  unsatisfied root(s) getting OR'd into the resulting bitmap, tricking
  other parts of the traversal into thinking we already have a
  reachability closure over those commit(s) when we do not.
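
The point of the footnote can be shown with a toy sketch that uses
single 64-bit words as bitmaps. struct toy_pm and
toy_apply_from_roots() are hypothetical names; unlike the real
apply_pseudo_merge(), this simplified version never modifies the roots
set and does not track a "satisfied" flag.

```c
#include <assert.h>
#include <stdint.h>

/* Toy pseudo-merge over single-word bitmaps. */
struct toy_pm {
	uint64_t commits; /* "parents" of the pseudo-merge */
	uint64_t objects; /* closure reachable from those parents */
};

/*
 * Test satisfaction against `roots`, but OR only the pseudo-merge's
 * object closure into `result`. Roots that no pseudo-merge covers are
 * deliberately left out of `result`, since no reachability closure is
 * known for them yet. Returns 1 if the pseudo-merge was applied.
 */
static int toy_apply_from_roots(const struct toy_pm *m, uint64_t roots,
				uint64_t *result)
{
	assert(m && result);

	if ((m->commits & roots) != m->commits)
		return 0;

	*result |= m->objects;
	return 1;
}
```

With roots {0, 1, 4} and a pseudo-merge whose parents are {0, 1}, the
result gains that pseudo-merge's objects while bit 4 stays clear, so
later stages still know that commit 4 needs a regular bitmap lookup or
fill-in traversal.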

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c                   | 112 ++++++++++-
 t/t5333-pseudo-merge-bitmaps.sh | 325 ++++++++++++++++++++++++++++++++
 2 files changed, 436 insertions(+), 1 deletion(-)
 create mode 100755 t/t5333-pseudo-merge-bitmaps.sh

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 7188dd75eaf..a7c36a977bd 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -114,6 +114,9 @@ struct bitmap_index {
 	unsigned int version;
 };
 
+static int pseudo_merges_satisfied_nr;
+static int pseudo_merges_cascades_nr;
+
 static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
 {
 	struct ewah_bitmap *parent;
@@ -1006,6 +1009,22 @@ static void show_commit(struct commit *commit UNUSED,
 {
 }
 
+static unsigned apply_pseudo_merges_for_commit_1(struct bitmap_index *bitmap_git,
+						 struct bitmap *result,
+						 struct commit *commit,
+						 uint32_t commit_pos)
+{
+	int ret;
+
+	ret = apply_pseudo_merges_for_commit(&bitmap_git->pseudo_merges,
+					     result, commit, commit_pos);
+
+	if (ret)
+		pseudo_merges_satisfied_nr += ret;
+
+	return ret;
+}
+
 static int add_to_include_set(struct bitmap_index *bitmap_git,
 			      struct include_data *data,
 			      struct commit *commit,
@@ -1026,6 +1045,10 @@ static int add_to_include_set(struct bitmap_index *bitmap_git,
 	}
 
 	bitmap_set(data->base, bitmap_pos);
+	if (apply_pseudo_merges_for_commit_1(bitmap_git, data->base, commit,
+					     bitmap_pos))
+		return 0;
+
 	return 1;
 }
 
@@ -1151,6 +1174,20 @@ static void show_boundary_object(struct object *object UNUSED,
 	BUG("should not be called");
 }
 
+static unsigned cascade_pseudo_merges_1(struct bitmap_index *bitmap_git,
+					struct bitmap *result,
+					struct bitmap *roots)
+{
+	int ret = cascade_pseudo_merges(&bitmap_git->pseudo_merges,
+					result, roots);
+	if (ret) {
+		pseudo_merges_cascades_nr++;
+		pseudo_merges_satisfied_nr += ret;
+	}
+
+	return ret;
+}
+
 static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 					    struct rev_info *revs,
 					    struct object_list *roots)
@@ -1160,6 +1197,7 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 	unsigned int i;
 	unsigned int tmp_blobs, tmp_trees, tmp_tags;
 	int any_missing = 0;
+	int existing_bitmaps = 0;
 
 	cb.bitmap_git = bitmap_git;
 	cb.base = bitmap_new();
@@ -1167,6 +1205,25 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 
 	revs->ignore_missing_links = 1;
 
+	if (bitmap_git->pseudo_merges.nr) {
+		struct bitmap *roots_bitmap = bitmap_new();
+		struct object_list *objects = NULL;
+
+		for (objects = roots; objects; objects = objects->next) {
+			struct object *object = objects->item;
+			int pos;
+
+			pos = bitmap_position(bitmap_git, &object->oid);
+			if (pos < 0)
+				continue;
+
+			bitmap_set(roots_bitmap, pos);
+		}
+
+		if (!cascade_pseudo_merges_1(bitmap_git, cb.base, roots_bitmap))
+			bitmap_free(roots_bitmap);
+	}
+
 	/*
 	 * OR in any existing reachability bitmaps among `roots` into
 	 * `cb.base`.
@@ -1178,8 +1235,10 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 			continue;
 
 		if (add_commit_to_bitmap(bitmap_git, &cb.base,
-					 (struct commit *)object))
+					 (struct commit *)object)) {
+			existing_bitmaps = 1;
 			continue;
+		}
 
 		any_missing = 1;
 	}
@@ -1187,6 +1246,9 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 	if (!any_missing)
 		goto cleanup;
 
+	if (existing_bitmaps)
+		cascade_pseudo_merges_1(bitmap_git, cb.base, NULL);
+
 	tmp_blobs = revs->blob_objects;
 	tmp_trees = revs->tree_objects;
 	tmp_tags = revs->blob_objects;
@@ -1242,6 +1304,13 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 	return cb.base;
 }
 
+static void unsatisfy_all_pseudo_merges(struct bitmap_index *bitmap_git)
+{
+	uint32_t i;
+	for (i = 0; i < bitmap_git->pseudo_merges.nr; i++)
+		bitmap_git->pseudo_merges.v[i].satisfied = 0;
+}
+
 static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 				   struct rev_info *revs,
 				   struct object_list *roots,
@@ -1249,9 +1318,32 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 {
 	struct bitmap *base = NULL;
 	int needs_walk = 0;
+	unsigned existing_bitmaps = 0;
 
 	struct object_list *not_mapped = NULL;
 
+	unsatisfy_all_pseudo_merges(bitmap_git);
+
+	if (bitmap_git->pseudo_merges.nr) {
+		struct bitmap *roots_bitmap = bitmap_new();
+		struct object_list *objects = NULL;
+
+		for (objects = roots; objects; objects = objects->next) {
+			struct object *object = objects->item;
+			int pos;
+
+			pos = bitmap_position(bitmap_git, &object->oid);
+			if (pos < 0)
+				continue;
+
+			bitmap_set(roots_bitmap, pos);
+		}
+
+		base = bitmap_new();
+		if (!cascade_pseudo_merges_1(bitmap_git, base, roots_bitmap))
+			bitmap_free(roots_bitmap);
+	}
+
 	/*
 	 * Go through all the roots for the walk. The ones that have bitmaps
 	 * on the bitmap index will be `or`ed together to form an initial
@@ -1262,11 +1354,21 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 	 */
 	while (roots) {
 		struct object *object = roots->item;
+
 		roots = roots->next;
 
+		if (base) {
+			int pos = bitmap_position(bitmap_git, &object->oid);
+			if (pos >= 0 && bitmap_get(base, pos)) {
+				object->flags |= SEEN;
+				continue;
+			}
+		}
+
 		if (object->type == OBJ_COMMIT &&
 		    add_commit_to_bitmap(bitmap_git, &base, (struct commit *)object)) {
 			object->flags |= SEEN;
+			existing_bitmaps = 1;
 			continue;
 		}
 
@@ -1282,6 +1384,9 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 
 	roots = not_mapped;
 
+	if (existing_bitmaps)
+		cascade_pseudo_merges_1(bitmap_git, base, NULL);
+
 	/*
 	 * Let's iterate through all the roots that don't have bitmaps to
 	 * check if we can determine them to be reachable from the existing
@@ -1866,6 +1971,11 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	object_list_free(&wants);
 	object_list_free(&haves);
 
+	trace2_data_intmax("bitmap", the_repository, "pseudo_merges_satisfied",
+			   pseudo_merges_satisfied_nr);
+	trace2_data_intmax("bitmap", the_repository, "pseudo_merges_cascades",
+			   pseudo_merges_cascades_nr);
+
 	return bitmap_git;
 
 cleanup:
diff --git a/t/t5333-pseudo-merge-bitmaps.sh b/t/t5333-pseudo-merge-bitmaps.sh
new file mode 100755
index 00000000000..909c17e301e
--- /dev/null
+++ b/t/t5333-pseudo-merge-bitmaps.sh
@@ -0,0 +1,325 @@
+#!/bin/sh
+
+test_description='pseudo-merge bitmaps'
+
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+
+. ./test-lib.sh
+
+test_pseudo_merges () {
+	test-tool bitmap dump-pseudo-merges
+}
+
+test_pseudo_merge_commits () {
+	test-tool bitmap dump-pseudo-merge-commits "$1"
+}
+
+test_pseudo_merges_satisfied () {
+	test_trace2_data bitmap pseudo_merges_satisfied "$1"
+}
+
+test_pseudo_merges_cascades () {
+	test_trace2_data bitmap pseudo_merges_cascades "$1"
+}
+
+tag_everything () {
+	git rev-list --all --no-object-names >in &&
+	perl -lne '
+		print "create refs/tags/" . $. . " " . $1 if /([0-9a-f]+)/
+	' <in | git update-ref --stdin
+}
+
+test_expect_success 'setup' '
+	test_commit_bulk 512 &&
+	tag_everything
+'
+
+test_expect_success 'bitmap traversal without pseudo-merges' '
+	git repack -adb &&
+
+	git rev-list --count --all --objects >expect &&
+
+	: >trace2.txt &&
+	GIT_TRACE2_EVENT=$PWD/trace2.txt \
+		git rev-list --count --all --objects --use-bitmap-index >actual &&
+
+	test_pseudo_merges_satisfied 0 <trace2.txt &&
+	test_pseudo_merges_cascades 0 <trace2.txt &&
+	test_pseudo_merges >merges &&
+	test_must_be_empty merges &&
+	test_cmp expect actual
+'
+
+test_expect_success 'pseudo-merges accurately represent their objects' '
+	test_config bitmapPseudoMerge.test.pattern "refs/tags/" &&
+	test_config bitmapPseudoMerge.test.maxMerges 8 &&
+	test_config bitmapPseudoMerge.test.stableThreshold never &&
+
+	git repack -adb &&
+
+	test_pseudo_merges >merges &&
+	test_line_count = 8 merges &&
+
+	for i in $(test_seq 0 $(($(wc -l <merges)-1)))
+	do
+		test-tool bitmap dump-pseudo-merge-commits $i >commits &&
+
+		git rev-list --objects --no-object-names --stdin <commits >expect.raw &&
+		test-tool bitmap dump-pseudo-merge-objects $i >actual.raw &&
+
+		sort -u <expect.raw >expect &&
+		sort -u <actual.raw >actual &&
+
+		test_cmp expect actual || return 1
+	done
+'
+
+test_expect_success 'bitmap traversal with pseudo-merges' '
+	: >trace2.txt &&
+	GIT_TRACE2_EVENT=$PWD/trace2.txt \
+		git rev-list --count --all --objects --use-bitmap-index >actual &&
+	git rev-list --count --all --objects >expect &&
+
+	test_pseudo_merges_satisfied 8 <trace2.txt &&
+	test_pseudo_merges_cascades 1 <trace2.txt &&
+	test_cmp expect actual
+'
+
+test_expect_success 'stale bitmap traversal with pseudo-merges' '
+	test_commit other &&
+
+	: >trace2.txt &&
+	GIT_TRACE2_EVENT=$PWD/trace2.txt \
+		git rev-list --count --all --objects --use-bitmap-index >actual &&
+	git rev-list --count --all --objects >expect &&
+
+	test_pseudo_merges_satisfied 8 <trace2.txt &&
+	test_pseudo_merges_cascades 1 <trace2.txt &&
+	test_cmp expect actual
+'
+
+test_expect_success 'bitmapPseudoMerge.sampleRate adjusts commit selection rate' '
+	test_config bitmapPseudoMerge.test.pattern "refs/tags/" &&
+	test_config bitmapPseudoMerge.test.maxMerges 8 &&
+	test_config bitmapPseudoMerge.test.stableThreshold never &&
+
+	commits_nr=$(git rev-list --all --count) &&
+
+	for rate in 100 50 10
+	do
+		git -c bitmapPseudoMerge.test.sampleRate=$rate repack -adb &&
+
+		test_pseudo_merges >merges &&
+		for i in $(test_seq 0 $(($(wc -l <merges)-1)))
+		do
+			test_pseudo_merge_commits $i || return 1
+		done >commits &&
+
+		test-tool bitmap list-commits >bitmaps &&
+		bitmaps_nr="$(wc -l <bitmaps)" &&
+
+		perl -MPOSIX -e "print ceil((\$ARGV[0]/100)*(\$ARGV[1]-\$ARGV[2]))" \
+			"$rate" "$commits_nr" "$bitmaps_nr" >expect &&
+
+		test $(cat expect) -eq $(wc -l <commits) || return 1
+	done
+'
+
+test_expect_success 'bitmapPseudoMerge.threshold excludes newer commits' '
+	git init pseudo-merge-threshold &&
+	(
+		cd pseudo-merge-threshold &&
+
+		new="1672549200" && # 2023-01-01
+		old="1641013200" && # 2022-01-01
+
+		test_commit_bulk --message="new" --date "$new +0000" 128 &&
+		test_commit_bulk --message="old" --date "$old +0000" 128 &&
+		test_tick &&
+
+		tag_everything &&
+
+		git \
+			-c bitmapPseudoMerge.test.pattern="refs/tags/" \
+			-c bitmapPseudoMerge.test.maxMerges=1 \
+			-c bitmapPseudoMerge.test.threshold=$(($new - 1)) \
+			-c bitmapPseudoMerge.test.stableThreshold=never \
+			repack -adb &&
+
+		test_pseudo_merges >merges &&
+		test_line_count = 1 merges &&
+
+		test_pseudo_merge_commits 0 >oids &&
+		git cat-file --batch <oids >commits &&
+
+		test $(wc -l <oids) = $(grep -c "^committer.*$old +0000$" commits)
+	)
+'
+
+test_expect_success 'bitmapPseudoMerge.stableThreshold creates stable groups' '
+	(
+		cd pseudo-merge-threshold &&
+
+		new="1672549200" && # 2023-01-01
+		mid="1654059600" && # 2022-06-01
+		old="1641013200" && # 2022-01-01
+
+		test_commit_bulk --message="mid" --date "$mid +0000" 128 &&
+		test_tick &&
+
+		git for-each-ref --format="delete %(refname)" refs/tags >in &&
+		git update-ref --stdin <in &&
+
+		tag_everything &&
+
+		git \
+			-c bitmapPseudoMerge.test.pattern="refs/tags/" \
+			-c bitmapPseudoMerge.test.maxMerges=1 \
+			-c bitmapPseudoMerge.test.threshold=$(($new - 1)) \
+			-c bitmapPseudoMerge.test.stableThreshold=$(($mid - 1)) \
+			-c bitmapPseudoMerge.test.stableSize=10 \
+			repack -adb &&
+
+		test_pseudo_merges >merges &&
+		merges_nr="$(wc -l <merges)" &&
+
+		for i in $(test_seq $(($merges_nr - 1)))
+		do
+			test_pseudo_merge_commits 0 >oids &&
+			git cat-file --batch <oids >commits &&
+
+			expect="$(grep -c "^committer.*$old +0000$" commits)" &&
+			actual="$(wc -l <oids)" &&
+
+			test $expect = $actual || return 1
+		done &&
+
+		test_pseudo_merge_commits $(($merges_nr - 1)) >oids &&
+		git cat-file --batch <oids >commits &&
+		test $(wc -l <oids) = $(grep -c "^committer.*$mid +0000$" commits)
+	)
+'
+
+test_expect_success 'out of order thresholds are rejected' '
+	test_must_fail git \
+		-c bitmapPseudoMerge.test.pattern="refs/*" \
+		-c bitmapPseudoMerge.test.threshold=1.month.ago \
+		-c bitmapPseudoMerge.test.stableThreshold=1.week.ago \
+		repack -adb 2>err &&
+
+	cat >expect <<-EOF &&
+	fatal: pseudo-merge group ${SQ}test${SQ} has unstable threshold before stable one
+	EOF
+
+	test_cmp expect err
+'
+
+test_expect_success 'pseudo-merge pattern with capture groups' '
+	git init pseudo-merge-captures &&
+	(
+		cd pseudo-merge-captures &&
+
+		test_commit_bulk 128 &&
+		tag_everything &&
+
+		for r in $(test_seq 8)
+		do
+			test_commit_bulk 16 &&
+
+			git rev-list HEAD~16.. >in &&
+
+			perl -lne "print \"create refs/remotes/$r/tags/\$. \$_\"" <in |
+			git update-ref --stdin || return 1
+		done &&
+
+		git \
+			-c bitmapPseudoMerge.tags.pattern="refs/remotes/([0-9]+)/tags/" \
+			-c bitmapPseudoMerge.tags.maxMerges=1 \
+			repack -adb &&
+
+		git for-each-ref --format="%(objectname) %(refname)" >refs &&
+
+		test_pseudo_merges >merges &&
+		for m in $(test_seq 0 $(($(wc -l <merges) - 1)))
+		do
+			test_pseudo_merge_commits $m >oids &&
+			grep -f oids refs |
+			perl -lne "print \$1 if /refs\/remotes\/([0-9]+)/" |
+			sort -u || return 1
+		done >remotes &&
+
+		test $(wc -l <remotes) -eq $(sort -u <remotes | wc -l)
+	)
+'
+
+test_expect_success 'pseudo-merge overlap setup' '
+	git init pseudo-merge-overlap &&
+	(
+		cd pseudo-merge-overlap &&
+
+		test_commit_bulk 256 &&
+		tag_everything &&
+
+		git \
+			-c bitmapPseudoMerge.all.pattern="refs/" \
+			-c bitmapPseudoMerge.all.maxMerges=1 \
+			-c bitmapPseudoMerge.all.stableThreshold=never \
+			-c bitmapPseudoMerge.tags.pattern="refs/tags/" \
+			-c bitmapPseudoMerge.tags.maxMerges=1 \
+			-c bitmapPseudoMerge.tags.stableThreshold=never \
+			repack -adb
+	)
+'
+
+test_expect_success 'pseudo-merge overlap generates overlapping groups' '
+	(
+		cd pseudo-merge-overlap &&
+
+		test_pseudo_merges >merges &&
+		test_line_count = 2 merges &&
+
+		test_pseudo_merge_commits 0 >commits-0.raw &&
+		test_pseudo_merge_commits 1 >commits-1.raw &&
+
+		sort commits-0.raw >commits-0 &&
+		sort commits-1.raw >commits-1 &&
+
+		comm -12 commits-0 commits-1 >overlap &&
+
+		test_line_count -gt 0 overlap
+	)
+'
+
+test_expect_success 'pseudo-merge overlap traversal' '
+	(
+		cd pseudo-merge-overlap &&
+
+		: >trace2.txt &&
+		GIT_TRACE2_EVENT=$PWD/trace2.txt \
+			git rev-list --count --all --objects --use-bitmap-index >actual &&
+		git rev-list --count --all --objects >expect &&
+
+		test_pseudo_merges_satisfied 2 <trace2.txt &&
+		test_pseudo_merges_cascades 1 <trace2.txt &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'pseudo-merge overlap stale traversal' '
+	(
+		cd pseudo-merge-overlap &&
+
+		test_commit other &&
+
+		: >trace2.txt &&
+		GIT_TRACE2_EVENT=$PWD/trace2.txt \
+			git rev-list --count --all --objects --use-bitmap-index >actual &&
+		git rev-list --count --all --objects >expect &&
+
+		test_pseudo_merges_satisfied 2 <trace2.txt &&
+		test_pseudo_merges_cascades 1 <trace2.txt &&
+		test_cmp expect actual
+	)
+'
+
+test_done
-- 
2.44.0.303.g1dc5e5b124c



* [PATCH 21/24] pack-bitmap: extra trace2 information
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
                   ` (19 preceding siblings ...)
  2024-03-20 22:05 ` [PATCH 20/24] pack-bitmap.c: use pseudo-merges during traversal Taylor Blau
@ 2024-03-20 22:06 ` Taylor Blau
  2024-03-20 22:06 ` [PATCH 22/24] ewah: `bitmap_equals_ewah()` Taylor Blau
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-03-20 22:06 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Add some extra trace2 lines to capture the number of bitmap lookups that
are hits versus misses, as well as the number of reachability roots that
have bitmap coverage (versus those that do not).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index a7c36a977bd..be65f637cf5 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -116,6 +116,10 @@ struct bitmap_index {
 
 static int pseudo_merges_satisfied_nr;
 static int pseudo_merges_cascades_nr;
+static int existing_bitmaps_hits_nr;
+static int existing_bitmaps_misses_nr;
+static int roots_with_bitmaps_nr;
+static int roots_without_bitmaps_nr;
 
 static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
 {
@@ -1040,10 +1044,14 @@ static int add_to_include_set(struct bitmap_index *bitmap_git,
 
 	partial = bitmap_for_commit(bitmap_git, commit);
 	if (partial) {
+		existing_bitmaps_hits_nr++;
+
 		bitmap_or_ewah(data->base, partial);
 		return 0;
 	}
 
+	existing_bitmaps_misses_nr++;
+
 	bitmap_set(data->base, bitmap_pos);
 	if (apply_pseudo_merges_for_commit_1(bitmap_git, data->base, commit,
 					     bitmap_pos))
@@ -1099,8 +1107,12 @@ static int add_commit_to_bitmap(struct bitmap_index *bitmap_git,
 {
 	struct ewah_bitmap *or_with = bitmap_for_commit(bitmap_git, commit);
 
-	if (!or_with)
+	if (!or_with) {
+		existing_bitmaps_misses_nr++;
 		return 0;
+	}
+
+	existing_bitmaps_hits_nr++;
 
 	if (!*base)
 		*base = ewah_to_bitmap(or_with);
@@ -1407,8 +1419,12 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 			object->flags &= ~UNINTERESTING;
 			add_pending_object(revs, object, "");
 			needs_walk = 1;
+
+			roots_without_bitmaps_nr++;
 		} else {
 			object->flags |= SEEN;
+
+			roots_with_bitmaps_nr++;
 		}
 	}
 
@@ -1975,6 +1991,14 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 			   pseudo_merges_satisfied_nr);
 	trace2_data_intmax("bitmap", the_repository, "pseudo_merges_cascades",
 			   pseudo_merges_cascades_nr);
+	trace2_data_intmax("bitmap", the_repository, "bitmap/hits",
+			   existing_bitmaps_hits_nr);
+	trace2_data_intmax("bitmap", the_repository, "bitmap/misses",
+			   existing_bitmaps_misses_nr);
+	trace2_data_intmax("bitmap", the_repository, "bitmap/roots_with_bitmap",
+			   roots_with_bitmaps_nr);
+	trace2_data_intmax("bitmap", the_repository, "bitmap/roots_without_bitmap",
+			   roots_without_bitmaps_nr);
 
 	return bitmap_git;
 
-- 
2.44.0.303.g1dc5e5b124c



* [PATCH 22/24] ewah: `bitmap_equals_ewah()`
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
                   ` (20 preceding siblings ...)
  2024-03-20 22:06 ` [PATCH 21/24] pack-bitmap: extra trace2 information Taylor Blau
@ 2024-03-20 22:06 ` Taylor Blau
  2024-03-20 22:06 ` [PATCH 23/24] pseudo-merge: implement support for finding existing merges Taylor Blau
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-03-20 22:06 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Prepare to reuse existing pseudo-merge bitmaps by implementing a
`bitmap_equals_ewah()` helper.

This helper will be used to see if a raw bitmap (containing the set of
parents for some pseudo-merge) is equal to any existing pseudo-merge's
commits bitmap (which are stored as EWAH-compressed bitmaps on disk).
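
The comparison can be sketched without Git's EWAH code. The example
below assumes a deliberately simplified compressed form (a list of
"repeat this word N times" runs) and a hypothetical iterator over it;
struct run, run_iter_next() and words_equal_runs() are illustrative
names, not part of ewah/. It shows the same shape as the helper:
stream words out of the compressed side, compare them one by one
against the raw side, and treat whichever side ends first as padded
with zero words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for a compressed bitmap: each run
 * means "repeat `word` `count` times". This is not Git's EWAH format.
 */
struct run {
	uint64_t word;
	size_t count;
};

struct run_bitmap {
	const struct run *runs;
	size_t nr;
};

/* Hypothetical iterator yielding one decompressed word per call. */
struct run_iter {
	const struct run_bitmap *bm;
	size_t run;   /* index of the current run */
	size_t used;  /* words already yielded from the current run */
};

static void run_iter_init(struct run_iter *it, const struct run_bitmap *bm)
{
	it->bm = bm;
	it->run = 0;
	it->used = 0;
}

/* Stores the next word in *word and returns 1, or returns 0 at the end. */
static int run_iter_next(uint64_t *word, struct run_iter *it)
{
	/* Skip runs that are exhausted (or empty). */
	while (it->run < it->bm->nr &&
	       it->used == it->bm->runs[it->run].count) {
		it->run++;
		it->used = 0;
	}
	if (it->run == it->bm->nr)
		return 0;

	*word = it->bm->runs[it->run].word;
	it->used++;
	return 1;
}

/*
 * Returns 1 if the raw words describe the same bit set as the
 * decompressed stream, treating the shorter side as zero-padded.
 */
static int words_equal_runs(const uint64_t *words, size_t nr,
			    const struct run_bitmap *other)
{
	struct run_iter it;
	uint64_t word;
	size_t i = 0;

	run_iter_init(&it, other);

	while (run_iter_next(&word, &it)) {
		uint64_t mine = i < nr ? words[i++] : 0;
		if (word != mine)
			return 0; /* stop at the first differing word */
	}

	/* Any raw words left over must be all-zero. */
	for (; i < nr; i++)
		if (words[i])
			return 0;

	return 1;
}
```

The early return on the first differing word is what lets callers
reject non-matching candidates cheaply.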

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ewah/bitmap.c | 19 +++++++++++++++++++
 ewah/ewok.h   |  1 +
 2 files changed, 20 insertions(+)

diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index a41fa152cbd..59dc77a08f6 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -261,6 +261,25 @@ int bitmap_equals(struct bitmap *self, struct bitmap *other)
 	return 1;
 }
 
+int bitmap_equals_ewah(struct bitmap *self, struct ewah_bitmap *other)
+{
+	struct ewah_iterator it;
+	eword_t word;
+	size_t i = 0;
+
+	ewah_iterator_init(&it, other);
+
+	while (ewah_iterator_next(&word, &it))
+		if (word != (i < self->word_alloc ? self->words[i++] : 0))
+			return 0;
+
+	for (; i < self->word_alloc; i++)
+		if (self->words[i])
+			return 0;
+
+	return 1;
+}
+
 int bitmap_is_subset(struct bitmap *self, struct bitmap *other)
 {
 	size_t common_size, i;
diff --git a/ewah/ewok.h b/ewah/ewok.h
index d7e9fb67715..0d49ec00618 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -179,6 +179,7 @@ void bitmap_unset(struct bitmap *self, size_t pos);
 int bitmap_get(struct bitmap *self, size_t pos);
 void bitmap_free(struct bitmap *self);
 int bitmap_equals(struct bitmap *self, struct bitmap *other);
+int bitmap_equals_ewah(struct bitmap *self, struct ewah_bitmap *other);
 int bitmap_is_subset(struct bitmap *self, struct bitmap *other);
 int ewah_bitmap_is_subset(struct ewah_bitmap *self, struct bitmap *other);
 
-- 
2.44.0.303.g1dc5e5b124c



* [PATCH 23/24] pseudo-merge: implement support for finding existing merges
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
                   ` (21 preceding siblings ...)
  2024-03-20 22:06 ` [PATCH 22/24] ewah: `bitmap_equals_ewah()` Taylor Blau
@ 2024-03-20 22:06 ` Taylor Blau
  2024-03-20 22:06 ` [PATCH 24/24] t/perf: implement performance tests for pseudo-merge bitmaps Taylor Blau
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-03-20 22:06 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

This patch implements support for reusing existing pseudo-merge commits
when writing bitmaps, whenever an existing pseudo-merge bitmap has
exactly the same set of parents as one that we are about to write.

Note that unstable pseudo-merges are likely to change between
consecutive repacks, and so are generally poor candidates for reuse.
However, stable pseudo-merges (see the configuration option
'bitmapPseudoMerge.<name>.stableThreshold') are by definition unlikely
to change between runs (as they represent long-running branches).

Because there is no index from a *set* of pseudo-merge parents to a
matching pseudo-merge bitmap, we have to construct the bitmap
corresponding to the set of parents for each pending pseudo-merge commit
and see if a matching bitmap exists.

This is technically quadratic in the number of pseudo-merges, but is OK
in practice for a couple of reasons:

  - non-matching pseudo-merge bitmaps are rejected quickly as soon as
    they differ in a single bit

  - already-matched pseudo-merge bitmaps are discarded from subsequent
    rounds of search

  - the number of pseudo-merges is generally small, even for large
    repositories

In order to do this, (a) implement a function that finds a matching
pseudo-merge given some uncompressed bitset describing its parents, (b)
implement a function that computes the bitset of parents for a given
pseudo-merge commit, and (c) call the latter function before computing
the set of reachable objects for some pending pseudo-merge.
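
The lookup described above can be sketched as a toy model. It assumes
each parent set fits in a single 64-bit word, so the bitmap comparison
collapses to one equality test; struct toy_merge and
toy_merge_for_parents() are hypothetical names used only for
illustration. The sketch shows the linear scan per pending pseudo-merge
and how the "satisfied" mark removes an already-matched candidate from
later searches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model: one bit per candidate parent commit. */
struct toy_merge {
	uint64_t commits;  /* set of parents of this stored pseudo-merge */
	int satisfied;     /* already claimed by an earlier lookup */
};

/*
 * Returns the first unclaimed merge whose parent set is exactly
 * `parents`, marking it as claimed, or NULL if there is none.
 */
static struct toy_merge *toy_merge_for_parents(struct toy_merge *v,
					       size_t nr, uint64_t parents)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (v[i].satisfied)
			continue; /* matched in an earlier round */
		if (v[i].commits != parents)
			continue; /* not the exact same set */

		v[i].satisfied = 1;
		return &v[i];
	}

	return NULL;
}
```

Calling this once per pending pseudo-merge is where the quadratic bound
comes from; marking matches as claimed shrinks the candidate set as the
search proceeds.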

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c             | 15 ++++++--
 pack-bitmap.c                   | 32 +++++++++++++++++
 pack-bitmap.h                   |  2 ++
 pseudo-merge.c                  | 55 ++++++++++++++++++++++++++++
 pseudo-merge.h                  |  7 ++++
 t/t5333-pseudo-merge-bitmaps.sh | 64 +++++++++++++++++++++++++++++++++
 6 files changed, 173 insertions(+), 2 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 2d1b202fcd9..fdd84d31a68 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -19,6 +19,10 @@
 #include "tree-walk.h"
 #include "pseudo-merge.h"
 #include "oid-array.h"
+#include "config.h"
+#include "alloc.h"
+#include "refs.h"
+#include "strmap.h"
 
 struct bitmapped_commit {
 	struct commit *commit;
@@ -443,6 +447,7 @@ static int fill_bitmap_tree(struct bitmap *bitmap,
 }
 
 static int reused_bitmaps_nr;
+static int reused_pseudo_merge_bitmaps_nr;
 
 static int fill_bitmap_commit(struct bb_commit *ent,
 			      struct commit *commit,
@@ -467,7 +472,7 @@ static int fill_bitmap_commit(struct bb_commit *ent,
 			struct bitmap *remapped = bitmap_new();
 
 			if (commit->object.flags & BITMAP_PSEUDO_MERGE)
-				old = NULL;
+				old = pseudo_merge_bitmap_for_commit(old_bitmap, c);
 			else
 				old = bitmap_for_commit(old_bitmap, c);
 			/*
@@ -478,7 +483,10 @@ static int fill_bitmap_commit(struct bb_commit *ent,
 			if (old && !rebuild_bitmap(mapping, old, remapped)) {
 				bitmap_or(ent->bitmap, remapped);
 				bitmap_free(remapped);
-				reused_bitmaps_nr++;
+				if (commit->object.flags & BITMAP_PSEUDO_MERGE)
+					reused_pseudo_merge_bitmaps_nr++;
+				else
+					reused_bitmaps_nr++;
 				continue;
 			}
 			bitmap_free(remapped);
@@ -604,6 +612,9 @@ int bitmap_writer_build(struct packing_data *to_pack)
 			    the_repository);
 	trace2_data_intmax("pack-bitmap-write", the_repository,
 			   "building_bitmaps_reused", reused_bitmaps_nr);
+	trace2_data_intmax("pack-bitmap-write", the_repository,
+			   "building_bitmaps_pseudo_merge_reused",
+			   reused_pseudo_merge_bitmaps_nr);
 
 	stop_progress(&writer.progress);
 
diff --git a/pack-bitmap.c b/pack-bitmap.c
index be65f637cf5..5a5f8b7e69f 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1316,6 +1316,37 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 	return cb.base;
 }
 
+struct ewah_bitmap *pseudo_merge_bitmap_for_commit(struct bitmap_index *bitmap_git,
+						   struct commit *commit)
+{
+	struct commit_list *p;
+	struct bitmap *parents;
+	struct pseudo_merge *match = NULL;
+
+	if (!bitmap_git->pseudo_merges.nr)
+		return NULL;
+
+	parents = bitmap_new();
+
+	for (p = commit->parents; p; p = p->next) {
+		int pos = bitmap_position(bitmap_git, &p->item->object.oid);
+		if (pos < 0 || pos >= bitmap_num_objects(bitmap_git))
+			goto done;
+
+		bitmap_set(parents, pos);
+	}
+
+	match = pseudo_merge_for_parents(&bitmap_git->pseudo_merges,
+						parents);
+
+done:
+	bitmap_free(parents);
+	if (match)
+		return pseudo_merge_bitmap(&bitmap_git->pseudo_merges, match);
+
+	return NULL;
+}
+
 static void unsatisfy_all_pseudo_merges(struct bitmap_index *bitmap_git)
 {
 	uint32_t i;
@@ -2808,6 +2839,7 @@ void free_bitmap_index(struct bitmap_index *b)
 		 */
 		close_midx_revindex(b->midx);
 	}
+	free_pseudo_merge_map(&b->pseudo_merges);
 	free(b);
 }
 
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 25d3b8e604a..0fefef39bec 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -119,6 +119,8 @@ int rebuild_bitmap(const uint32_t *reposition,
 		   struct bitmap *dest);
 struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
 				      struct commit *commit);
+struct ewah_bitmap *pseudo_merge_bitmap_for_commit(struct bitmap_index *bitmap_git,
+						   struct commit *commit);
 void bitmap_writer_select_commits(struct commit **indexed_commits,
 				  unsigned int indexed_commits_nr);
 int bitmap_writer_build(struct packing_data *to_pack);
diff --git a/pseudo-merge.c b/pseudo-merge.c
index e111c9cd1a6..9e21fbb5062 100644
--- a/pseudo-merge.c
+++ b/pseudo-merge.c
@@ -682,3 +682,58 @@ int cascade_pseudo_merges(const struct pseudo_merge_map *pm,
 
 	return ret;
 }
+
+struct pseudo_merge *pseudo_merge_for_parents(const struct pseudo_merge_map *pm,
+					      struct bitmap *parents)
+{
+	struct pseudo_merge *match = NULL;
+	size_t i;
+
+	if (!pm->nr)
+		return NULL;
+
+	/*
+	 * NOTE: this loop is quadratic in the worst-case (where no
+	 * matching pseudo-merge bitmaps are found), but in practice
+	 * this is OK for a few reasons:
+	 *
+	 *   - Rejecting pseudo-merge bitmaps that do not match the
+	 *     given commit is done quickly (i.e. `bitmap_equals_ewah()`
+	 *     returns early when we know the two bitmaps aren't equal).
+	 *
+	 *   - Already matched pseudo-merge bitmaps (which we track with
+	 *     the `->satisfied` bit here) are skipped as potential
+	 *     candidates.
+	 *
+	 *   - The number of pseudo-merges should be small (in the
+	 *     hundreds for most repositories).
+	 *
+	 * If in the future this semi-quadratic behavior does become a
+	 * problem, another approach would be to keep track of which
+	 * pseudo-merges are still "viable" after enumerating the
+	 * pseudo-merge commit's parents:
+	 *
+	 *   - A pseudo-merge bitmap becomes non-viable when the bit(s)
+	 *     corresponding to one or more parent(s) of the given
+	 *     commit are not set in a candidate pseudo-merge's commits
+	 *     bitmap.
+	 *
+	 *   - After processing all bits, enumerate the remaining set of
+	 *     viable pseudo-merge bitmaps, and check that their
+	 *     popcount() matches the number of parents in the given
+	 *     commit.
+	 */
+	for (i = 0; i < pm->nr; i++) {
+		struct pseudo_merge *candidate = use_pseudo_merge(pm, &pm->v[i]);
+		if (!candidate || candidate->satisfied)
+			continue;
+		if (!bitmap_equals_ewah(parents, candidate->commits))
+			continue;
+
+		match = candidate;
+		match->satisfied = 1;
+		break;
+	}
+
+	return match;
+}
diff --git a/pseudo-merge.h b/pseudo-merge.h
index cc14e947e86..33acd00a3e5 100644
--- a/pseudo-merge.h
+++ b/pseudo-merge.h
@@ -208,4 +208,11 @@ int cascade_pseudo_merges(const struct pseudo_merge_map *pm,
 			  struct bitmap *result,
 			  struct bitmap *roots);
 
+/*
+ * Returns a pseudo-merge which contains the exact set of commits
+ * listed in the "parents" bitmap, or NULL if none could be found.
+ */
+struct pseudo_merge *pseudo_merge_for_parents(const struct pseudo_merge_map *pm,
+					      struct bitmap *parents);
+
 #endif
diff --git a/t/t5333-pseudo-merge-bitmaps.sh b/t/t5333-pseudo-merge-bitmaps.sh
index 909c17e301e..531f1924af4 100755
--- a/t/t5333-pseudo-merge-bitmaps.sh
+++ b/t/t5333-pseudo-merge-bitmaps.sh
@@ -22,6 +22,10 @@ test_pseudo_merges_cascades () {
 	test_trace2_data bitmap pseudo_merges_cascades "$1"
 }
 
+test_pseudo_merges_reused () {
+	test_trace2_data pack-bitmap-write building_bitmaps_pseudo_merge_reused "$1"
+}
+
 tag_everything () {
 	git rev-list --all --no-object-names >in &&
 	perl -lne '
@@ -322,4 +326,64 @@ test_expect_success 'pseudo-merge overlap stale traversal' '
 	)
 '
 
+test_expect_success 'pseudo-merge reuse' '
+	git init pseudo-merge-reuse &&
+	(
+		cd pseudo-merge-reuse &&
+
+		stable="1641013200" && # 2022-01-01
+		unstable="1672549200" && # 2023-01-01
+
+		for date in $stable $unstable
+		do
+			test_commit_bulk --date "$date +0000" 128 &&
+			test_tick || return 1
+		done &&
+
+		tag_everything &&
+
+		git \
+			-c bitmapPseudoMerge.test.pattern="refs/tags/" \
+			-c bitmapPseudoMerge.test.maxMerges=1 \
+			-c bitmapPseudoMerge.test.threshold=now \
+			-c bitmapPseudoMerge.test.stableThreshold=$(($unstable - 1)) \
+			-c bitmapPseudoMerge.test.stableSize=512 \
+			repack -adb &&
+
+		test_pseudo_merges >merges &&
+		test_line_count = 2 merges &&
+
+		test_pseudo_merge_commits 0 >stable-oids.before &&
+		test_pseudo_merge_commits 1 >unstable-oids.before &&
+
+		: >trace2.txt &&
+		GIT_TRACE2_EVENT=$PWD/trace2.txt git \
+			-c bitmapPseudoMerge.test.pattern="refs/tags/" \
+			-c bitmapPseudoMerge.test.maxMerges=2 \
+			-c bitmapPseudoMerge.test.threshold=now \
+			-c bitmapPseudoMerge.test.stableThreshold=$(($unstable - 1)) \
+			-c bitmapPseudoMerge.test.stableSize=512 \
+			repack -adb &&
+
+		test_pseudo_merges_reused 1 <trace2.txt &&
+
+		test_pseudo_merges >merges &&
+		test_line_count = 3 merges &&
+
+		test_pseudo_merge_commits 0 >stable-oids.after &&
+		for i in 1 2
+		do
+			test_pseudo_merge_commits $i || return 1
+		done >unstable-oids.after &&
+
+		sort -u <stable-oids.before >expect &&
+		sort -u <stable-oids.after >actual &&
+		test_cmp expect actual &&
+
+		sort -u <unstable-oids.before >expect &&
+		sort -u <unstable-oids.after >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
2.44.0.303.g1dc5e5b124c



* [PATCH 24/24] t/perf: implement performance tests for pseudo-merge bitmaps
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
                   ` (22 preceding siblings ...)
  2024-03-20 22:06 ` [PATCH 23/24] pseudo-merge: implement support for finding existing merges Taylor Blau
@ 2024-03-20 22:06 ` Taylor Blau
  2024-03-21 19:50 ` [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Junio C Hamano
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-03-20 22:06 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Implement a straightforward performance test demonstrating the benefit
of pseudo-merge bitmaps by measuring how long it takes to count
reachable objects in a few different scenarios:

  - without bitmaps, to demonstrate a reasonable baseline
  - with bitmaps, but without pseudo-merges
  - with bitmaps and pseudo-merges

Results from running this test on git.git are as follows:

    Test                                                                this tree
    -----------------------------------------------------------------------------------
    5333.2: git rev-list --count --all --objects (no bitmaps)           3.46(3.37+0.09)
    5333.3: git rev-list --count --all --objects (no pseudo-merges)     0.13(0.11+0.01)
    5333.4: git rev-list --count --all --objects (with pseudo-merges)   0.12(0.11+0.01)

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/perf/p5333-pseudo-merge-bitmaps.sh | 32 ++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
 create mode 100755 t/perf/p5333-pseudo-merge-bitmaps.sh

diff --git a/t/perf/p5333-pseudo-merge-bitmaps.sh b/t/perf/p5333-pseudo-merge-bitmaps.sh
new file mode 100755
index 00000000000..4bec409d10e
--- /dev/null
+++ b/t/perf/p5333-pseudo-merge-bitmaps.sh
@@ -0,0 +1,32 @@
+#!/bin/sh
+
+test_description='pseudo-merge bitmaps'
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+test_expect_success 'setup' '
+	git \
+		-c bitmapPseudoMerge.all.pattern="refs/" \
+		-c bitmapPseudoMerge.all.threshold=now \
+		-c bitmapPseudoMerge.all.stableThreshold=never \
+		-c bitmapPseudoMerge.all.maxMerges=64 \
+		-c pack.writeBitmapLookupTable=true \
+		repack -adb
+'
+
+test_perf 'git rev-list --count --all --objects (no bitmaps)' '
+	git rev-list --objects --all
+'
+
+test_perf 'git rev-list --count --all --objects (no pseudo-merges)' '
+	GIT_TEST_USE_PSEUDO_MERGES=0 \
+		git rev-list --objects --all --use-bitmap-index
+'
+
+test_perf 'git rev-list --count --all --objects (with pseudo-merges)' '
+	GIT_TEST_USE_PSEUDO_MERGES=1 \
+		git rev-list --objects --all --use-bitmap-index
+'
+
+test_done
-- 
2.44.0.303.g1dc5e5b124c


* Re: [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
                   ` (23 preceding siblings ...)
  2024-03-20 22:06 ` [PATCH 24/24] t/perf: implement performance tests for pseudo-merge bitmaps Taylor Blau
@ 2024-03-21 19:50 ` Junio C Hamano
  2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 157+ messages in thread
From: Junio C Hamano @ 2024-03-21 19:50 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Elijah Newren

Taylor Blau <me@ttaylorr.com> writes:

> This series implements a new idea in the pack-bitmap machinery called
> "pseudo-merge reachability bitmaps".

When you work on your next series, please describe the feature in a
different order.  Giving name, and then mechanism (from coarse to
detailed), and then finally what it is good for, is a sure way to
discourage people from reading the long description.  What is the
problem, what insight led to a new solution to the problem, and what
the mechanism looks like, is probably an order to keep the reader
engaged better.



* Re: [PATCH 01/24] Documentation/technical: describe pseudo-merge bitmaps format
  2024-03-20 22:05 ` [PATCH 01/24] Documentation/technical: describe pseudo-merge bitmaps format Taylor Blau
@ 2024-03-21 21:24   ` Junio C Hamano
  2024-03-21 22:13     ` Taylor Blau
  0 siblings, 1 reply; 157+ messages in thread
From: Junio C Hamano @ 2024-03-21 21:24 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Elijah Newren

Taylor Blau <me@ttaylorr.com> writes:

> Prepare to implement pseudo-merge bitmaps over the next several commits
> by first describing the serialization format which will store the new
> pseudo-merge bitmaps themselves.

Before talking about what problem, which is not addressed adequately
with existing mechanisms, it will solve?

> +Pseudo-merge bitmaps
> +--------------------
> +
> +If the `BITMAP_OPT_PSEUDO_MERGES` flag is set, a variable number of
> +bytes (preceding the name-hash cache, commit lookup table, and trailing
> +checksum) of the `.bitmap` file is used to store pseudo-merge bitmaps.
> +
> +A "pseudo-merge bitmap" is used to refer to a pair of bitmaps, as
> +follows:
> +
> +Commit bitmap::
> +
> +  A bitmap whose set bits describe the set of commits included in the
> +  pseudo-merge's "merge" bitmap (as below).
> +
> +Merge bitmap::
> +
> +  A bitmap whose set bits describe the reachability closure over the set
> +  of commits in the pseudo-merge's "commits" bitmap (as above). An
> +  identical bitmap would be generated for an octopus merge with the same
> +  set of parents as described in the commits bitmap.
> +
> +Pseudo-merge bitmaps can accelerate bitmap traversals when all commits
> +for a given pseudo-merge are listed on either side of the traversal,
> +either directly (by explicitly asking for them as part of the `HAVES`
> +or `WANTS`) or indirectly (by encountering them during a fill-in
> +traversal).

"either side of" implies there are two sides.  Is it correct to
understand that they are "the side reachable from HAVE" and "the
other side that is reachable from WANT"?

> +=== Use-cases
> +
> +For example, suppose there exists a pseudo-merge bitmap with a large
> +number of commits, all of which are listed in the `WANTS` section of
> +some bitmap traversal query. When pseudo-merge bitmaps are enabled, the
> +bitmap machinery can quickly determine there is a pseudo-merge which
> +satisfies some subset of the wanted objects on either side of the query.

Here you only talk about WANT but still mention "either side of".
How would the HAVE side of the query contribute to the computation?

> +  ** `commits_bitmap`, an EWAH-compressed bitmap describing the set of
> +     commits included in this pseudo-merge.
> +
> +  ** `merge_bitmap`, an EWAH-compressed bitmap describing the union of
> +     the set of objects reachable from all commits listed in the
> +     `commits_bitmap`.

"union of the set of objects reachable from all", meaning if an
object is listed here, all commits in commits_bitmap are guaranteed
to reach that object?  It sounds more like the intersection of sets
than union.

> +* A lookup table, mapping pseudo-merged commits to the pseudo-merges
> +  they belong to. Entries appear in increasing order of each commit's
> +  bit position. Each entry is 12 bytes wide, and is comprised of the
> +  following:

"a pseudo-merged commit" is a new term.  It was explained what "a
pseudo-merge bitmap" is already, and it was explained that "a
pseudo-merge bitmap" consists of a pair of bitmaps (commit bitmap
that records which commit belongs to the "pseudo-merge", and merge
bitmap that records objects reachable from all commits in the commit
bitmap).  But we haven't heard of "a pseudo-merged commit", or what
the verb "to pseudo-merge a commit" means.

Does it merely mean "a commit that is recorded in the commit-bitmap
half of a pseudo-merge bitmap"?  It is still unclear at this point
in the description whether a commit can be part of only one such
commit-bitmap, which makes readers stumble until a few paragraphs
below, where the extended table is introduced to handle a commit
recorded in the commit-bitmap of more than one pseudo-merge bitmap.

I'll stop here for now, but this made me even more convinced that
the presentation order needs to be rethought to sell why this whole
thing is a good idea by telling readers what problem it is solving,
why a new data structure helps and how, etc.  Perhaps you can start
by trying to write a paragraph of description for the topic suitable
for the "What's cooking" report, which needs to do a good elevator
pitch.

Thanks.

> +  ** `commit_pos`, a 4-byte unsigned value (in network byte-order)
> +     containing the bit position for this commit.
> +
> +  ** `offset`, an 8-byte unsigned value (also in network byte-order)
> +  containing either one of two possible offsets, depending on whether or
> +  not the most-significant bit is set.
> +
> +    *** If unset (i.e. `offset & ((uint64_t)1<<63) == 0`), the offset
> +	(relative to the beginning of the `.bitmap` file) at which the
> +	pseudo-merge bitmap for this commit can be read. This indicates
> +	only a single pseudo-merge bitmap contains this commit.
> +
> +    *** If set (i.e. `offset & ((uint64_t)1<<63) != 0`), the offset
> +	(again relative to the beginning of the `.bitmap` file) at which
> +	the extended offset table can be located describing the set of
> +	pseudo-merge bitmaps which contain this commit. This indicates
> +	that multiple pseudo-merge bitmaps contain this commit.
> +
> +* An (optional) extended lookup table (written if and only if there is
> +  at least one commit which appears in more than one pseudo-merge).
> +  There are as many entries as commits which appear in multiple
> +  pseudo-merges. Each entry contains the following:
> +
> +  ** `N`, a 4-byte unsigned value equal to the number of pseudo-merges
> +     which contain a given commit.
> +
> +  ** An array of `N` 8-byte unsigned values, each of which is
> +     interpreted as an offset (relative to the beginning of the
> +     `.bitmap` file) at which a pseudo-merge bitmap for this commit can
> +     be read. These values occur in no particular order.
> +
> +* Positions for all pseudo-merges, each stored as an 8-byte unsigned
> +  value (in network byte-order) containing the offset (relative to the
> +  beginning of the `.bitmap` file) of each consecutive pseudo-merge.
> +
> +* A 4-byte unsigned value (in network byte-order) equal to the number of
> +  pseudo-merges.
> +
> +* A 4-byte unsigned value (in network byte-order) equal to the number of
> +  unique commits which appear in any pseudo-merge.
> +
> +* An 8-byte unsigned value (in network byte-order) equal to the number
> +  of bytes between the start of the pseudo-merge section and the
> +  beginning of the lookup table.
> +
> +* An 8-byte unsigned value (in network byte-order) equal to the number
> +  of bytes in the pseudo-merge section (including this field).
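
The 12-byte lookup table entry quoted above can be illustrated with a
minimal sketch in C. This is not code from the series: the names
`pm_lookup_entry`, `pm_decode_entry`, `read_be` and the two macros are
hypothetical, and clearing the most-significant bit to obtain the usable
offset is an assumption about how a reader would treat the field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; not taken from the series. */
#define PM_ENTRY_WIDTH 12
#define PM_EXTENDED_BIT ((uint64_t)1 << 63)

struct pm_lookup_entry {
	uint32_t commit_pos; /* bit position of the commit */
	uint64_t offset;     /* file offset, with the MSB cleared (assumption) */
	int extended;        /* 1 if offset refers to the extended table */
};

/* Read an n-byte unsigned value stored in network (big-endian) order. */
static uint64_t read_be(const unsigned char *p, size_t n)
{
	uint64_t v = 0;
	size_t i;

	for (i = 0; i < n; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Decode one entry: a 4-byte commit_pos followed by an 8-byte offset. */
static struct pm_lookup_entry pm_decode_entry(const unsigned char *buf)
{
	struct pm_lookup_entry e;
	uint64_t raw;

	assert(buf != NULL);
	e.commit_pos = (uint32_t)read_be(buf, 4);
	raw = read_be(buf + 4, 8);
	/* Parenthesized on purpose: `==` and `!=` bind tighter than `&`. */
	e.extended = (raw & PM_EXTENDED_BIT) != 0;
	e.offset = raw & ~PM_EXTENDED_BIT;
	return e;
}
```

Under these assumptions, the bytes `00 00 00 07 80 00 00 00 00 00 00 20`
would decode to `commit_pos` 7 with the extended flag set and an offset
of 32. Note that in C `==` binds more tightly than `&`, so a literal
transcription of `offset & ((uint64_t)1<<63) == 0` needs the extra
parentheses used in the sketch.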


* Re: [PATCH 01/24] Documentation/technical: describe pseudo-merge bitmaps format
  2024-03-21 21:24   ` Junio C Hamano
@ 2024-03-21 22:13     ` Taylor Blau
  2024-03-21 22:22       ` Junio C Hamano
  0 siblings, 1 reply; 157+ messages in thread
From: Taylor Blau @ 2024-03-21 22:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Elijah Newren

On Thu, Mar 21, 2024 at 02:24:10PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > +Pseudo-merge bitmaps can accelerate bitmap traversals when all commits
> > +for a given pseudo-merge are listed on either side of the traversal,
> > +either directly (by explicitly asking for them as part of the `HAVES`
> > +or `WANTS`) or indirectly (by encountering them during a fill-in
> > +traversal).
>
> "either side of" implies there are two sides.  Is it correct to
> understand that they are "the side reachable from HAVE" and "the
> other side that is reachable from WANT"?

Yes, exactly.

> > +=== Use-cases
> > +
> > +For example, suppose there exists a pseudo-merge bitmap with a large
> > +number of commits, all of which are listed in the `WANTS` section of
> > +some bitmap traversal query. When pseudo-merge bitmaps are enabled, the
> > +bitmap machinery can quickly determine there is a pseudo-merge which
> > +satisfies some subset of the wanted objects on either side of the query.
>
> Here you only talk about WANT but still mention "either side of".
> How would the HAVE side of the query contribute to the computation?

Apologies for the confusion. In the first sentence, I'm talking about a
specific case where all parents of a pseudo-merge commit are listed
in/reachable from the WANTS side of a traversal.

The second sentence describes the general case. The order should be
swapped so that the second sentence comes first, and vice-versa for the
sentence beginning with "For example, [...]".

> > +  ** `commits_bitmap`, an EWAH-compressed bitmap describing the set of
> > +     commits included in this pseudo-merge.
> > +
> > +  ** `merge_bitmap`, an EWAH-compressed bitmap describing the union of
> > +     the set of objects reachable from all commits listed in the
> > +     `commits_bitmap`.
>
> "union of the set of objects reachable from all", meaning if an
> object is listed here, all commits in commits_bitmap are guaranteed
> to reach that object?  It sounds more like the intersection of sets
> than union.

Oops, yes, I definitely meant intersection here. Thanks for a close
read.

> > +* A lookup table, mapping pseudo-merged commits to the pseudo-merges
> > +  they belong to. Entries appear in increasing order of each commit's
> > +  bit position. Each entry is 12 bytes wide, and is comprised of the
> > +  following:
>
> "a pseudo-merged commit" is a new term.  It was explained what "a
> pseudo-merge bitmap" is already, and it was explained that "a
> pseudo-merge bitmap" consists of a pair of bitmaps (commit bitmap
> that records which commit belongs to the "pseudo-merge", and merge
> bitmap that records objects reachable from all commits in the commit
> bitmap).  But we haven't heard of "a pseudo-merged commit", or what
> the verb "to pseudo-merge a commit" means.
>
> Does it merely mean "a commit that is recorded in the commit-bitmap
> half of a pseudo-merge bitmap"?  It is still unclear at this point
> in the description whether a commit can be part of only one such
> commit-bitmap, which makes readers stumble until a few paragraphs
> below, where the extended table is introduced to handle a commit
> recorded in the commit-bitmap of more than one pseudo-merge bitmap.

Sorry for being unclear here. A pseudo-merge "commit" refers to a
conceptual octopus-merge commit whose parents are the ones listed in the
"parents" bitmap of the pseudo-merge bitmap, as defined. The "objects"
bitmap is the set of objects reachable from that imaginary commit, or,
equivalently, the intersection of objects reachable from the commits
listed in the parents bitmap.

I'll make this more clear in the next version, thanks.

> I'll stop here for now, but this made me even more convinced that
> the presentation order needs to be rethought to sell why this whole
> thing is a good idea by telling readers what problem it is solving,
> why a new data structure helps and how, etc.  Perhaps you can start
> by trying to write a paragraph of description for the topic suitable
> for the "What's cooking" report, which needs to do a good elevator
> pitch.

I'm sorry that the ordering was sub-optimal here. For the purposes of
the WC report, I would write the following if I were queuing this topic:

  * tb/pseudo-merge-bitmaps (2024-03-21) 24 commits
   - t/perf: implement performance tests for pseudo-merge bitmaps
   - pseudo-merge: implement support for finding existing merges
   - ewah: `bitmap_equals_ewah()`
   - pack-bitmap: extra trace2 information
   - pack-bitmap.c: use pseudo-merges during traversal
   - t/test-lib-functions.sh: support `--date` in `test_commit_bulk()`
   - pack-bitmap: implement test helpers for pseudo-merge
   - ewah: implement `ewah_bitmap_popcount()`
   - pseudo-merge: implement support for reading pseudo-merge commits
   - pack-bitmap.c: read pseudo-merge extension
   - pseudo-merge: scaffolding for reads
   - pack-bitmap: extract `read_bitmap()` function
   - pack-bitmap-write.c: write pseudo-merge table
   - pack-bitmap-write.c: select pseudo-merge commits
   - pseudo-merge: implement support for selecting pseudo-merge commits
   - pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public
   - pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()`
   - pack-bitmap-write: support storing pseudo-merge commits
   - pseudo-merge.ch: initial commit
   - pack-bitmap: move some initialization to `bitmap_writer_init()`
   - pack-bitmap: drop unused `max_bitmaps` parameter
   - ewah: implement `ewah_bitmap_is_subset()`
   - config: repo_config_get_expiry()
   - Documentation/technical: describe pseudo-merge bitmaps format

   The pack-bitmap machinery has been extended to write bitmaps for
   pseudo-merges, which are imaginary commits which act as octopus
   merges covering groups of the un-bitmapped parts of history at
   reference tips.

Thanks,
Taylor


* Re: [PATCH 01/24] Documentation/technical: describe pseudo-merge bitmaps format
  2024-03-21 22:13     ` Taylor Blau
@ 2024-03-21 22:22       ` Junio C Hamano
  0 siblings, 0 replies; 157+ messages in thread
From: Junio C Hamano @ 2024-03-21 22:22 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Elijah Newren

Taylor Blau <me@ttaylorr.com> writes:

>    The pack-bitmap machinery has been extended to write bitmaps for
>    pseudo-merges, which are imaginary commits which act as octopus
>    merges covering groups of the un-bitmapped parts of history at
>    reference tips.

That is "what the topic does", and does not cover "why does it do
so" and/or "for what effect".

I can sort-of see that, by allowing us to record a pre-combined bitmap
for octopus merges that do not exist, we have more flexibility
compared to the original bitmap machinery, where we can only attach
bitmaps to commits that exist.  What is not clear is what this
additional flexibility is used for.

Does the approach take advantage of that additional flexibility to
reduce redundancy, allowing us to have the same bitmap coverage with
smaller number of bitmaps?

Thanks.


* Re: [PATCH 02/24] config: repo_config_get_expiry()
  2024-03-20 22:05 ` [PATCH 02/24] config: repo_config_get_expiry() Taylor Blau
@ 2024-04-10 17:54   ` Jeff King
  2024-04-29 19:39     ` Taylor Blau
  0 siblings, 1 reply; 157+ messages in thread
From: Jeff King @ 2024-04-10 17:54 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Wed, Mar 20, 2024 at 06:05:05PM -0400, Taylor Blau wrote:

> Introduce a `repo_config_get_expiry()` variant in the style of functions
> introduced by 3b256228a6 (config: read config from a repository object,
> 2017-06-22) to read a single value without requiring the git_config()
> callback-style approach.
> 
> This new function is similar to the existing implementation in
> `git_config_get_expiry()`, however it differs in that it fills out a
> `timestamp_t` value through a pointer, instead of merely checking and
> discarding the result (and returning it as a string).

The existing git_config_get_expiry() is a funny interface. That makes me
wonder how its existing callers handle this issue. In most cases we are
just grabbing values for git-gc to pass along as strings to
sub-programs. The only other case resolves via approxidate immediately,
in get_shared_index_expire_date(). Which sort of leaks the string,
though it is technically stuffed away in a global that we never look at.

So I can see the value of an interface which just returns the parsed
timestamp and handles the string itself. Weirdly we even have
git_config_get_expiry_in_days() which works like that, but always scales
the timestamp! So we could implement that in terms of this new function.

But...

> +int repo_config_get_expiry(struct repository *repo,
> +			   const char *key, const char **dest)
> +{
> +	int ret;
> +
> +	git_config_check_init(repo);
> +
> +	ret = repo_config_get_string(repo, key, (char **)dest);
> +	if (ret)
> +		return ret;
> +	if (strcmp(*dest, "now")) {
> +		timestamp_t now = approxidate("now");
> +		if (approxidate(*dest) >= now)
> +			git_die_config(key, _("Invalid %s: '%s'"), key, *dest);
> +	}
> +	return ret;
> +}

...does this actually do what the commit message says? It looks
identical to git_config_get_expiry() except that it takes a repository
parameter?

> This function will gain its first caller in a subsequent commit to parse
> a "threshold" parameter for excluding too-recent commits from
> pseudo-merge groups.

So presumably you call approxidate() there in that new caller. Looks
like that would be patch 10. But I don't see a call to the new function
at all! It just uses git_config_expiry_date(), which does what you need
(it doesn't use the configset, but it looks like you ended up doing a
config callback approach anyway).

So can this patch be dropped?

-Peff


* Re: [PATCH 03/24] ewah: implement `ewah_bitmap_is_subset()`
  2024-03-20 22:05 ` [PATCH 03/24] ewah: implement `ewah_bitmap_is_subset()` Taylor Blau
@ 2024-04-10 18:05   ` Jeff King
  2024-04-29 19:47     ` Taylor Blau
  0 siblings, 1 reply; 157+ messages in thread
From: Jeff King @ 2024-04-10 18:05 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Wed, Mar 20, 2024 at 06:05:08PM -0400, Taylor Blau wrote:

> In order to know whether a given pseudo-merge (comprised of a "parents"
> and "objects" bitmaps) is "satisfied" and can be OR'd into the bitmap
> result, we need to be able to quickly determine whether the "parents"
> bitmap is a subset of the current set of objects reachable on either
> side of a traversal.
> 
> Implement a helper function to prepare for that, which determines
> whether an EWAH bitmap (the parents bitmap from the pseudo-merge) is a
> subset of a non-EWAH bitmap (in this case, the results bitmap from
> either side of the traversal).
> 
> This function makes use of the EWAH iterator to avoid inflating any part
> of the EWAH bitmap after we determine it is not a subset of the non-EWAH
> bitmap. This "fail-fast" allows us to avoid a potentially large amount
> of wasted effort.

Makes sense, as we have an expanded bitmap_is_subset() already, and this
should be more efficient.
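
To make the "satisfied" condition from the quoted commit message
concrete, here is a toy sketch in C. It assumes a universe of at most 64
objects so that each bitmap fits in a single word. The names
`toy_pseudo_merge` and `toy_apply_pseudo_merges` are hypothetical and do
not come from the series, and the repeat-until-stable loop is an
assumption added for illustration rather than a description of the
actual traversal code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy pseudo-merge over a universe of at most 64 objects (hypothetical). */
struct toy_pseudo_merge {
	uint64_t parents; /* commits that make up the pseudo-merge */
	uint64_t objects; /* objects reachable from those commits */
	int applied;      /* set once the objects have been OR'd in */
};

/*
 * OR in the objects of every satisfied pseudo-merge, i.e. every one
 * whose parents are all present in `result`. Repeat until no further
 * pseudo-merge becomes satisfied. Returns the number applied.
 */
static size_t toy_apply_pseudo_merges(uint64_t *result,
				      struct toy_pseudo_merge *pms, size_t nr)
{
	size_t applied = 0;
	int changed;

	assert(result != NULL);
	do {
		size_t i;

		changed = 0;
		for (i = 0; i < nr; i++) {
			if (pms[i].applied)
				continue;
			if (pms[i].parents & ~*result)
				continue; /* some parent is not in result yet */
			*result |= pms[i].objects;
			pms[i].applied = 1;
			applied++;
			changed = 1;
		}
	} while (changed);
	return applied;
}
```

The check `parents & ~result` is the single-word form of the subset
test: it is non-zero exactly when some parent bit is missing from the
result.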

> diff --git a/ewah/bitmap.c b/ewah/bitmap.c
> index ac7e0af622a..5bdae3fb07b 100644
> --- a/ewah/bitmap.c
> +++ b/ewah/bitmap.c
> @@ -138,6 +138,49 @@ void bitmap_or(struct bitmap *self, const struct bitmap *other)
>  		self->words[i] |= other->words[i];
>  }
>  
> +int ewah_bitmap_is_subset(struct ewah_bitmap *self, struct bitmap *other)

It wasn't immediately obvious to me if we were checking that "other" is
a subset of "self" or vice versa. I wonder if we could use different
names here to make it more clear (though really it matters more in the
declaration, not the implementation).

I think bitmap_is_subset() suffers from the same issue (and is even more
confusing because the two have the same type!). Maybe just a header file
comment would help?

> +{
> +	struct ewah_iterator it;
> +	eword_t word;
> +	size_t i;
> +
> +	ewah_iterator_init(&it, self);
> +
> +	for (i = 0; i < other->word_alloc; i++) {
> +		if (!ewah_iterator_next(&word, &it)) {
> +			/*
> +			 * If we reached the end of `self`, and haven't
> +			 * rejected `self` as a possible subset of
> +			 * `other` yet, then we are done and `self` is
> +			 * indeed a subset of `other`.
> +			 */
> +			return 1;
> +		}
> +		if (word & ~other->words[i]) {
> +			/*
> +			 * Otherwise, compare the next two pairs of
> +			 * words. If the word from `self` has bit(s) not
> +			 * in the word from `other`, `self` is not a
> +			 * proper subset of `other`.
> +			 */
> +			return 0;
> +		}
> +	}

OK, so we expand the ewah as we go, comparing words, and then quit early
if we can. That's the best we can do when comparing to a regular bitmap.
I suspect we could do more clever things for ewah-to-ewah (like saying
"oh, they both have a run of 10,000 zeroes" without expanding), but
that wouldn't be helpful here, as your use case will be comparing
against a bitmap we're building in memory.

I think your use of the phrase "proper subset" is a little confusing
here, as it is not a subset at all, let alone the distinction between a
regular and proper one (in the mathematical definitions).

> +	/*
> +	 * If we got to this point, there may be zero or more words
> +	 * remaining in `self`, with no remaining words left in `other`.
> +	 * If there are any bits set in the remaining word(s) in `self`,
> +	 * then `self` is not a proper subset of `other`.
> +	 */
> +	while (ewah_iterator_next(&word, &it))
> +		if (word)
> +			return 0;

OK, so here we keep expanding to see if there are any bits set, meaning
we may read past a bunch of 0-words that we don't care about. I suspect
this could be optimized to just ask the ewah "are there any bits left?"
but I don't think we have an easy function for that. And it's not clear
to me that it would produce measurable speedups anyway, so probably not
worth worrying about.

As above, ditto on the use of "proper subset" here.
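
The two-phase shape of the quoted function (compare overlapping words,
then scan any leftover words of the candidate subset) can be sketched on
plain, uncompressed word arrays. This is only an illustration under that
simplifying assumption: `words_is_subset` is a hypothetical name, and no
EWAH iterator is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if every bit set in `sub` is also set in `super`, and 0
 * otherwise. Identical sets count as subsets of each other.
 */
static int words_is_subset(const uint64_t *sub, size_t sub_nr,
			   const uint64_t *super, size_t super_nr)
{
	size_t i;

	assert(sub != NULL || sub_nr == 0);
	assert(super != NULL || super_nr == 0);

	/* Phase 1: compare overlapping words, failing fast on a stray bit. */
	for (i = 0; i < sub_nr && i < super_nr; i++)
		if (sub[i] & ~super[i])
			return 0;

	/* Phase 2: a set bit in the tail of `sub` has no counterpart. */
	for (; i < sub_nr; i++)
		if (sub[i])
			return 0;

	return 1;
}
```

Because equal inputs return 1, this implements "subset" rather than
"proper subset", which matches the wording concern raised above.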

-Peff


* Re: [PATCH 04/24] pack-bitmap: drop unused `max_bitmaps` parameter
  2024-03-20 22:05 ` [PATCH 04/24] pack-bitmap: drop unused `max_bitmaps` parameter Taylor Blau
@ 2024-04-10 18:06   ` Jeff King
  0 siblings, 0 replies; 157+ messages in thread
From: Jeff King @ 2024-04-10 18:06 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Wed, Mar 20, 2024 at 06:05:11PM -0400, Taylor Blau wrote:

> The `max_bitmaps` parameter in `bitmap_writer_select_commits()` was
> introduced back in 7cc8f97108 (pack-objects: implement bitmap writing,
> 2013-12-21), making it original to the bitmap implementation in Git
> itself.
> 
> When that patch was merged via 0f9e62e084 (Merge branch
> 'jk/pack-bitmap', 2014-02-27), its sole caller in builtin/pack-objects.c
> passed a value of "-1" for `max_bitmaps`, indicating no limit.
> 
> Since then, the only other caller (in midx.c, added via c528e17966
> (pack-bitmap: write multi-pack bitmaps, 2021-08-31)) also uses a value
> of "-1" for `max_bitmaps`.
> 
> Since no callers have needed a finite limit for the `max_bitmaps`
> parameter in the nearly ten years that have passed since 0f9e62e084, let's
> remove the parameter and any dead pieces of code connected to it.

Great, I'm happy to see dead code being cleaned up. And thanks (as
always) for digging into the history to explain how we got here.

-Peff


* Re: [PATCH 05/24] pack-bitmap: move some initialization to `bitmap_writer_init()`
  2024-03-20 22:05 ` [PATCH 05/24] pack-bitmap: move some initialization to `bitmap_writer_init()` Taylor Blau
@ 2024-04-10 18:10   ` Jeff King
  0 siblings, 0 replies; 157+ messages in thread
From: Jeff King @ 2024-04-10 18:10 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Wed, Mar 20, 2024 at 06:05:13PM -0400, Taylor Blau wrote:

> The pack-bitmap-writer machinery uses an oidmap (backed by khash.h) to
> map from commits selected for bitmaps (by OID) to a bitmapped_commit
> structure (containing the bitmap itself, among other things like its XOR
> offset, etc.)
> 
> This map was initialized at the end of `bitmap_writer_build()`. New
> entries are added in `pack-bitmap-write.c::store_selected()`, which is
> called by the bitmap_builder machinery (which is responsible for
> traversing history and generating the actual bitmaps).
> 
> Reorganize when this field is initialized and when entries are added to
> it so that we can quickly determine whether a commit is a candidate for
> pseudo-merge selection, or not (since it was already selected to receive
> a bitmap, and thus is ineligible for pseudo-merge inclusion).

OK, makes sense. I don't think this should violate any assumptions in
the current bitmap code (and the sanity checks for duplicate/missing
entries in the hash seem right to me).

-Peff


* Re: [PATCH 02/24] config: repo_config_get_expiry()
  2024-04-10 17:54   ` Jeff King
@ 2024-04-29 19:39     ` Taylor Blau
  0 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-04-29 19:39 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano

On Wed, Apr 10, 2024 at 01:54:32PM -0400, Jeff King wrote:
> > This function will gain its first caller in a subsequent commit to parse
> > a "threshold" parameter for excluding too-recent commits from
> > pseudo-merge groups.
>
> So presumably you call approxidate() there in that new caller. Looks
> like that would be patch 10. But I don't see a call to the new function
> at all! It just uses git_config_expiry_date(), which does what you need
> (it doesn't use the configset, but it looks like you ended up doing a
> config callback approach anyway).
>
> So can this patch be dropped?

Wow, yes, definitely -- this patch can be absolutely dropped.

I suspect what happened here is that this patch is a relic from before I
introduced pseudo-merge.c::pseudo_merge_config(), which is a standard
callback for `git_config()`.

I *think* the reason for the change is that I wanted to use the
parse_config_key() function to parse different named pseudo-merge groups
separately. The idea of having multiple named pseudo-merge groups was
introduced after I wrote this patch, and I suspect that I never realized
that this patch could be dropped as a result.

Thanks for catching, I can cleanly just drop this patch from the series
and everything works as expected.

Thanks,
Taylor


* Re: [PATCH 03/24] ewah: implement `ewah_bitmap_is_subset()`
  2024-04-10 18:05   ` Jeff King
@ 2024-04-29 19:47     ` Taylor Blau
  0 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-04-29 19:47 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano

On Wed, Apr 10, 2024 at 02:05:05PM -0400, Jeff King wrote:
> > This function makes use of the EWAH iterator to avoid inflating any part
> > of the EWAH bitmap after we determine it is not a subset of the non-EWAH
> > bitmap. This "fail-fast" allows us to avoid a potentially large amount
> > of wasted effort.
>
> Makes sense, as we have an expanded bitmap_is_subset() already, and this
> should be more efficient.

Yep, the idea is that we already have a deflated non-EWAH bitmap in
memory, but we're comparing it against a potentially large EWAH
compressed bitmap.

If we can determine early that the EWAH bitmap has bits that are *not*
in the non-EWAH bitmap, then we can avoid inflating a large part of the
EWAH bitmap.

> > diff --git a/ewah/bitmap.c b/ewah/bitmap.c
> > index ac7e0af622a..5bdae3fb07b 100644
> > --- a/ewah/bitmap.c
> > +++ b/ewah/bitmap.c
> > @@ -138,6 +138,49 @@ void bitmap_or(struct bitmap *self, const struct bitmap *other)
> >  		self->words[i] |= other->words[i];
> >  }
> >
> > +int ewah_bitmap_is_subset(struct ewah_bitmap *self, struct bitmap *other)
>
> It wasn't immediately obvious to me if we were checking that "other" is
> a subset of "self" or vice versa. I wonder if we could use different
> names here to make it more clear (though really it matters more in the
> declaration, not the implementation).

We check whether 'self' is a subset of 'other', but I'll document it
here for both functions.

> I think bitmap_is_subset() suffers from the same issue (and is even more
> confusing because the two have the same type!). Maybe just a header file
> comment would help?

Yeah, agreed.

> I think your use of the phrase "proper subset" is a little confusing
> here, as it is not a subset at all, let alone the distinction between a
> regular and proper one (in the mathematical definitions).

Thanks for catching, this should definitely say just "subset"
(indicating that 'self' and 'other' can have an identical set of bits in
each and self is still considered a subset of other).

> > +	/*
> > +	 * If we got to this point, there may be zero or more words
> > +	 * remaining in `self`, with no remaining words left in `other`.
> > +	 * If there are any bits set in the remaining word(s) in `self`,
> > +	 * then `self` is not a proper subset of `other`.
> > +	 */
> > +	while (ewah_iterator_next(&word, &it))
> > +		if (word)
> > +			return 0;
>
> OK, so here we keep expanding to see if there are any bits set, meaning
> we may read past a bunch of 0-words that we don't care about. I suspect
> this could be optimized to just ask the ewah "are there any bits left?"
> but I don't think we have an easy function for that. And it's not clear
> to me that it would produce measurable speedups anyway, so probably not
> worth worrying about.

Yep.

> As above, ditto on the use of "proper subset" here.

Thanks, fixed.

Thanks,
Taylor


* [PATCH v2 01/23] Documentation/technical: describe pseudo-merge bitmaps format
  2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
@ 2024-04-29 20:42   ` Taylor Blau
  2024-05-06 11:52     ` Patrick Steinhardt
  2024-04-29 20:43   ` [PATCH v2 02/23] ewah: implement `ewah_bitmap_is_subset()` Taylor Blau
                     ` (22 subsequent siblings)
  23 siblings, 1 reply; 157+ messages in thread
From: Taylor Blau @ 2024-04-29 20:42 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Prepare to implement pseudo-merge bitmaps over the next several commits
by first describing the serialization format which will store the new
pseudo-merge bitmaps themselves.

This format is implemented as an optional extension within the bitmap v1
format, making it compatible with previous versions of Git, as well as
the original .bitmap implementation within JGit.

The format (as well as a general description of pseudo-merge bitmaps,
and motivating use-case(s)) is described in detail in the patch contents
below, but the high-level description is as follows:

  - An array of pseudo-merge bitmaps, each containing a pair of EWAH
    bitmaps: one describing the set of pseudo-merge "parents", and
    another describing the set of object(s) reachable from those
    parents.

  - A lookup table to determine which pseudo-merge(s) a given commit
    appears in. An optional extended lookup table follows when there is
    at least one commit which appears in multiple pseudo-merge groups.

  - Trailing metadata, including the number of pseudo-merge(s), number
    of unique parents, the offset within the .bitmap file for the
    pseudo-merge commit lookup table, and the size of the optional
    extension itself.
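
As a rough illustration of the lookup table, the sketch below decodes
one 12-byte entry (a 4-byte `commit_pos` followed by an 8-byte `offset`,
both in network byte-order, as described in the patch contents). The
struct and function names are hypothetical, and clearing the
most-significant bit to recover the offset is an assumption about how a
reader would use the flag, not something this patch spells out.

```c
#include <stdint.h>

/* Hypothetical in-memory view of one 12-byte lookup table entry. */
struct pm_lookup_entry {
	uint32_t commit_pos; /* bit position of the commit */
	uint64_t offset;     /* offset with the MSB flag cleared (assumed) */
	int is_extended;     /* 1 if `offset` refers to the extended table */
};

/* Read a 4-byte unsigned value stored in network byte-order. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read an 8-byte unsigned value stored in network byte-order. */
static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

static struct pm_lookup_entry parse_lookup_entry(const unsigned char *p)
{
	const uint64_t msb = (uint64_t)1 << 63;
	uint64_t raw = read_be64(p + 4);
	struct pm_lookup_entry e;

	e.commit_pos = read_be32(p);
	/* MSB set: the commit appears in multiple pseudo-merges. */
	e.is_extended = (raw & msb) != 0;
	e.offset = raw & ~msb;
	return e;
}
```

A caller would check `is_extended` first, and then interpret `offset` as
pointing either at a single pseudo-merge bitmap or at an entry in the
extended lookup table.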

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/technical/bitmap-format.txt | 179 ++++++++++++++++++++++
 1 file changed, 179 insertions(+)

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index f5d200939b0..63a7177ac08 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -255,3 +255,182 @@ triplet is -
 	xor_row (4 byte integer, network byte order): ::
 	The position of the triplet whose bitmap is used to compress
 	this one, or `0xffffffff` if no such bitmap exists.
+
+Pseudo-merge bitmaps
+--------------------
+
+If the `BITMAP_OPT_PSEUDO_MERGES` flag is set, a variable number of
+bytes (preceding the name-hash cache, commit lookup table, and trailing
+checksum) of the `.bitmap` file is used to store pseudo-merge bitmaps.
+
+A "pseudo-merge bitmap" is used to refer to a pair of bitmaps, as
+follows:
+
+Commit bitmap::
+
+  A bitmap whose set bits describe the set of commits included in the
+  pseudo-merge's "merge" bitmap (as below).
+
+Merge bitmap::
+
+  A bitmap whose set bits describe the reachability closure over the set
+  of commits in the pseudo-merge's "commits" bitmap (as above). An
+  identical bitmap would be generated for an octopus merge with the same
+  set of parents as described in the commits bitmap.
+
+Pseudo-merge bitmaps can accelerate bitmap traversals when all commits
+for a given pseudo-merge are listed on either side of the traversal,
+either directly (by explicitly asking for them as part of the `HAVES`
+or `WANTS`) or indirectly (by encountering them during a fill-in
+traversal).
+
+=== Use-cases
+
+For example, suppose there exists a pseudo-merge bitmap with a large
+number of commits, all of which are listed in the `WANTS` section of
+some bitmap traversal query. When pseudo-merge bitmaps are enabled, the
+bitmap machinery can quickly determine there is a pseudo-merge which
+satisfies some subset of the wanted objects on either side of the query.
+Then, we can inflate the EWAH-compressed bitmap, and `OR` it in to the
+resulting bitmap. By contrast, without pseudo-merge bitmaps, we would
+have to repeat the decompression and `OR`-ing step over a potentially
+large number of individual bitmaps, which can take proportionally more
+time.
+
+Another benefit of pseudo-merges arises when there is some combination
+of (a) a large number of references, with (b) poor bitmap coverage, and
+(c) deep, nested trees, making fill-in traversal relatively expensive.
+For example, suppose that there are a large enough number of tags where
+bitmapping each of the tags individually is infeasible. Without
+pseudo-merge bitmaps, computing the result of, say, `git rev-list
+--use-bitmap-index --count --objects --tags` would likely require a
+large amount of fill-in traversal. But when a large quantity of those
+tags are stored together in a pseudo-merge bitmap, the bitmap machinery
+can take advantage of the fact that we only care about the union of
+objects reachable from all of those tags, and answer the query much
+faster.
+
+=== File format
+
+If enabled, pseudo-merge bitmaps are stored in an optional section at
+the end of a `.bitmap` file. The format is as follows:
+
+....
++-------------------------------------------+
+|               .bitmap File                |
++-------------------------------------------+
+|                                           |
+|  Pseudo-merge bitmaps (Variable Length)   |
+|  +---------------------------+            |
+|  | commits_bitmap (EWAH)     |            |
+|  +---------------------------+            |
+|  | merge_bitmap (EWAH)       |            |
+|  +---------------------------+            |
+|                                           |
++-------------------------------------------+
+|                                           |
+|  Lookup Table                             |
+|  +------------+--------------+            |
+|  | commit_pos |    offset    |            |
+|  +------------+--------------+            |
+|  |  4 bytes   |   8 bytes    |            |
+|  +------------+--------------+            |
+|                                           |
+|  Offset Cases:                            |
+|  -------------                            |
+|                                           |
+|  1. MSB Unset: single pseudo-merge bitmap |
+|     + offset to pseudo-merge bitmap       |
+|                                           |
+|  2. MSB Set: multiple pseudo-merges       |
+|     + offset to extended lookup table     |
+|                                           |
++-------------------------------------------+
+|                                           |
+|  Extended Lookup Table (Optional)         |
+|                                           |
+|  +----+----------+----------+----------+  |
+|  | N  | Offset 1 |   ....   | Offset N |  |
+|  +----+----------+----------+----------+  |
+|  |    |  8 bytes |   ....   |  8 bytes |  |
+|  +----+----------+----------+----------+  |
+|                                           |
++-------------------------------------------+
+|                                           |
+|  Pseudo-merge Metadata                    |
+|  +------------------+----------------+    |
+|  | # pseudo-merges  | # Commits      |    |
+|  +------------------+----------------+    |
+|  | 4 bytes          | 4 bytes        |    |
+|  +------------------+----------------+    |
+|                                           |
+|  +------------------+----------------+    |
+|  | Lookup offset    | Extension size |    |
+|  +------------------+----------------+    |
+|  | 8 bytes          | 8 bytes        |    |
+|  +------------------+----------------+    |
+|                                           |
++-------------------------------------------+
+....
+
+* One or more pseudo-merge bitmaps, each containing:
+
+  ** `commits_bitmap`, an EWAH-compressed bitmap describing the set of
+     commits included in this pseudo-merge.
+
+  ** `merge_bitmap`, an EWAH-compressed bitmap describing the union of
+     the set of objects reachable from all commits listed in the
+     `commits_bitmap`.
+
+* A lookup table, mapping pseudo-merged commits to the pseudo-merges
+  they belong to. Entries appear in increasing order of each commit's
+  bit position. Each entry is 12 bytes wide, and is comprised of the
+  following:
+
+  ** `commit_pos`, a 4-byte unsigned value (in network byte-order)
+     containing the bit position for this commit.
+
+  ** `offset`, an 8-byte unsigned value (also in network byte-order)
+  containing either one of two possible offsets, depending on whether or
+  not the most-significant bit is set.
+
+    *** If unset (i.e. `(offset & ((uint64_t)1<<63)) == 0`), the offset
+	(relative to the beginning of the `.bitmap` file) at which the
+	pseudo-merge bitmap for this commit can be read. This indicates
+	only a single pseudo-merge bitmap contains this commit.
+
+    *** If set (i.e. `(offset & ((uint64_t)1<<63)) != 0`), the offset
+	(again relative to the beginning of the `.bitmap` file) at which
+	the extended offset table can be located describing the set of
+	pseudo-merge bitmaps which contain this commit. This indicates
+	that multiple pseudo-merge bitmaps contain this commit.
+
+* An (optional) extended lookup table (written if and only if there is
+  at least one commit which appears in more than one pseudo-merge).
+  There are as many entries as commits which appear in multiple
+  pseudo-merges. Each entry contains the following:
+
+  ** `N`, a 4-byte unsigned value equal to the number of pseudo-merges
+     which contain a given commit.
+
+  ** An array of `N` 8-byte unsigned values, each of which is
+     interpreted as an offset (relative to the beginning of the
+     `.bitmap` file) at which a pseudo-merge bitmap for this commit can
+     be read. These values occur in no particular order.
+
+* Positions for all pseudo-merges, each stored as an 8-byte unsigned
+  value (in network byte-order) containing the offset (relative to the
+  beginning of the `.bitmap` file) of each consecutive pseudo-merge.
+
+* A 4-byte unsigned value (in network byte-order) equal to the number of
+  pseudo-merges.
+
+* A 4-byte unsigned value (in network byte-order) equal to the number of
+  unique commits which appear in any pseudo-merge.
+
+* An 8-byte unsigned value (in network byte-order) equal to the number
+  of bytes between the start of the pseudo-merge section and the
+  beginning of the lookup table.
+
+* An 8-byte unsigned value (in network byte-order) equal to the number
+  of bytes in the pseudo-merge section (including this field).
-- 
2.45.0.23.gc6f94b99219
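
The trailing metadata described in the format above (two 4-byte counts
followed by two 8-byte values, all in network byte-order) could be read
along the following lines. This is only a sketch: the struct, its field
names, and `parse_trailer()` are hypothetical, and it assumes the four
fields sit contiguously in the listed order in the final 24 bytes of the
pseudo-merge section.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct; field names are illustrative, not Git's. */
struct pm_trailer {
	uint32_t nr_merges;     /* number of pseudo-merges */
	uint32_t nr_commits;    /* unique commits in any pseudo-merge */
	uint64_t lookup_offset; /* section start to lookup table, in bytes */
	uint64_t ext_size;      /* size of the whole section, in bytes */
};

/* Read an `n`-byte unsigned value stored in network byte-order. */
static uint64_t read_be(const unsigned char *p, size_t n)
{
	uint64_t v = 0;

	while (n--)
		v = (v << 8) | *p++;
	return v;
}

/*
 * `end` points one byte past the last byte of the pseudo-merge section.
 * The four fields are assumed to occupy the final 4 + 4 + 8 + 8 = 24
 * bytes.
 */
static struct pm_trailer parse_trailer(const unsigned char *end)
{
	const unsigned char *p = end - 24;
	struct pm_trailer t;

	t.nr_merges = (uint32_t)read_be(p, 4);
	t.nr_commits = (uint32_t)read_be(p + 4, 4);
	t.lookup_offset = read_be(p + 8, 8);
	t.ext_size = read_be(p + 16, 8);
	return t;
}
```

Since `ext_size` includes its own field, a reader that knows where the
section ends can subtract `ext_size` to find where it begins, and then
add `lookup_offset` to reach the lookup table.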



* [PATCH v2 00/23] pack-bitmap: pseudo-merge reachability bitmaps
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
                   ` (24 preceding siblings ...)
  2024-03-21 19:50 ` [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Junio C Hamano
@ 2024-04-29 20:42 ` Taylor Blau
  2024-04-29 20:42   ` [PATCH v2 01/23] Documentation/technical: describe pseudo-merge bitmaps format Taylor Blau
                     ` (23 more replies)
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
  2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
  27 siblings, 24 replies; 157+ messages in thread
From: Taylor Blau @ 2024-04-29 20:42 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Here is a reroll of my topic to introduce pseudo-merge bitmaps. Much is
unchanged since last time, but notable changes (in response to Peff's
review of the first five or so patches of this topic) include:

  - Rebased onto 2.45, so this is now based on 'master', which is at
    786a3e4b8d (Git 2.45, 2024-04-29) at the time of writing.

  - Dropped patch 2/24 from the first round as it is no longer
    necessary.

  - Introduced some documentation and fixed a couple of comments
    around ewah_bitmap_is_subset() and bitmap_is_subset() to clarify
    which argument is supposed to be a subset of the other.

Otherwise, a range-diff is included below for convenience. Thanks in
advance for your review!

Taylor Blau (23):
  Documentation/technical: describe pseudo-merge bitmaps format
  ewah: implement `ewah_bitmap_is_subset()`
  pack-bitmap: drop unused `max_bitmaps` parameter
  pack-bitmap: move some initialization to `bitmap_writer_init()`
  pseudo-merge.ch: initial commit
  pack-bitmap-write: support storing pseudo-merge commits
  pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()`
  pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public
  pseudo-merge: implement support for selecting pseudo-merge commits
  pack-bitmap-write.c: select pseudo-merge commits
  pack-bitmap-write.c: write pseudo-merge table
  pack-bitmap: extract `read_bitmap()` function
  pseudo-merge: scaffolding for reads
  pack-bitmap.c: read pseudo-merge extension
  pseudo-merge: implement support for reading pseudo-merge commits
  ewah: implement `ewah_bitmap_popcount()`
  pack-bitmap: implement test helpers for pseudo-merge
  t/test-lib-functions.sh: support `--date` in `test_commit_bulk()`
  pack-bitmap.c: use pseudo-merges during traversal
  pack-bitmap: extra trace2 information
  ewah: `bitmap_equals_ewah()`
  pseudo-merge: implement support for finding existing merges
  t/perf: implement performace tests for pseudo-merge bitmaps

 Documentation/config.txt                     |   2 +
 Documentation/config/bitmap-pseudo-merge.txt |  75 ++
 Documentation/technical/bitmap-format.txt    | 205 +++++
 Makefile                                     |   1 +
 builtin/pack-objects.c                       |   3 +-
 ewah/bitmap.c                                |  76 ++
 ewah/ewok.h                                  |   8 +
 midx-write.c                                 |   3 +-
 pack-bitmap-write.c                          | 275 ++++++-
 pack-bitmap.c                                | 359 ++++++++-
 pack-bitmap.h                                |  16 +-
 pseudo-merge.c                               | 739 +++++++++++++++++++
 pseudo-merge.h                               | 218 ++++++
 t/helper/test-bitmap.c                       |  34 +-
 t/perf/p5333-pseudo-merge-bitmaps.sh         |  32 +
 t/t5333-pseudo-merge-bitmaps.sh              | 389 ++++++++++
 t/test-lib-functions.sh                      |  12 +-
 17 files changed, 2386 insertions(+), 61 deletions(-)
 create mode 100644 Documentation/config/bitmap-pseudo-merge.txt
 create mode 100644 pseudo-merge.c
 create mode 100644 pseudo-merge.h
 create mode 100755 t/perf/p5333-pseudo-merge-bitmaps.sh
 create mode 100755 t/t5333-pseudo-merge-bitmaps.sh

Range-diff against v1:
 1:  76e7e3b9cca =  1:  43fd5e35971 Documentation/technical: describe pseudo-merge bitmaps format
 2:  21d8f9dc2b4 <  -:  ----------- config: repo_config_get_expiry()
 3:  1347571ef4c !  2:  290d928325d ewah: implement `ewah_bitmap_is_subset()`
    @@ ewah/bitmap.c: void bitmap_or(struct bitmap *self, const struct bitmap *other)
     +			 * Otherwise, compare the next two pairs of
     +			 * words. If the word from `self` has bit(s) not
     +			 * in the word from `other`, `self` is not a
    -+			 * proper subset of `other`.
    ++			 * subset of `other`.
     +			 */
     +			return 0;
     +		}
    @@ ewah/bitmap.c: void bitmap_or(struct bitmap *self, const struct bitmap *other)
     +	 * If we got to this point, there may be zero or more words
     +	 * remaining in `self`, with no remaining words left in `other`.
     +	 * If there are any bits set in the remaining word(s) in `self`,
    -+	 * then `self` is not a proper subset of `other`.
    ++	 * then `self` is not a subset of `other`.
     +	 */
     +	while (ewah_iterator_next(&word, &it))
     +		if (word)
    @@ ewah/bitmap.c: void bitmap_or(struct bitmap *self, const struct bitmap *other)
      	size_t original_size = self->word_alloc;
     
      ## ewah/ewok.h ##
    -@@ ewah/ewok.h: int bitmap_get(struct bitmap *self, size_t pos);
    +@@ ewah/ewok.h: void bitmap_unset(struct bitmap *self, size_t pos);
    + int bitmap_get(struct bitmap *self, size_t pos);
      void bitmap_free(struct bitmap *self);
      int bitmap_equals(struct bitmap *self, struct bitmap *other);
    ++
    ++/*
    ++ * Both `bitmap_is_subset()` and `ewah_bitmap_is_subset()` return 1 if the set
    ++ * of bits in 'self' are a subset of the bits in 'other'. Returns 0 otherwise.
    ++ */
      int bitmap_is_subset(struct bitmap *self, struct bitmap *other);
     +int ewah_bitmap_is_subset(struct ewah_bitmap *self, struct bitmap *other);
      
 4:  c6a08dae037 !  3:  5160859f7f3 pack-bitmap: drop unused `max_bitmaps` parameter
    @@ builtin/pack-objects.c: static void write_pack_file(void)
      					die(_("failed to write bitmap index"));
      				bitmap_writer_finish(written_list, nr_written,
     
    - ## midx.c ##
    -@@ midx.c: static int write_midx_bitmap(const char *midx_name,
    + ## midx-write.c ##
    +@@ midx-write.c: static int write_midx_bitmap(const char *midx_name,
      	for (i = 0; i < pdata->nr_objects; i++)
      		index[pack_order[i]] = &pdata->objects[i].idx;
      
 5:  a6531656739 !  4:  3d7d930b1c5 pack-bitmap: move some initialization to `bitmap_writer_init()`
    @@ builtin/pack-objects.c: static void write_pack_file(void)
      				bitmap_writer_build_type_index(
      					&to_pack, written_list, nr_written);
     
    - ## midx.c ##
    -@@ midx.c: static int write_midx_bitmap(const char *midx_name,
    + ## midx-write.c ##
    +@@ midx-write.c: static int write_midx_bitmap(const char *midx_name,
      	for (i = 0; i < pdata->nr_objects; i++)
      		index[i] = &pdata->objects[i].idx;
      
 6:  c6f9170af0f =  5:  e7a87cf7d4e pseudo-merge.ch: initial commit
 7:  7acdee2b5f2 =  6:  ee33a703245 pack-bitmap-write: support storing pseudo-merge commits
 8:  4fdd7dda274 =  7:  9c6d09bf874 pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()`
 9:  d74cf3e484d =  8:  dfd4b73d12e pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public
10:  323e1250b24 =  9:  86a1e4b8b9b pseudo-merge: implement support for selecting pseudo-merge commits
11:  bf6b0d8601e = 10:  12b432e3a8a pack-bitmap-write.c: select pseudo-merge commits
12:  4c594f3faa8 = 11:  6ce805d061e pack-bitmap-write.c: write pseudo-merge table
13:  7a31a932ab3 = 12:  60f6b310213 pack-bitmap: extract `read_bitmap()` function
14:  7e4d051f37a = 13:  9465313691b pseudo-merge: scaffolding for reads
15:  7bb644b2b0c = 14:  5894f3d5369 pack-bitmap.c: read pseudo-merge extension
16:  792cc863154 = 15:  7dbee8bcbdf pseudo-merge: implement support for reading pseudo-merge commits
17:  8fb7f7ab37b = 16:  09650aa53e9 ewah: implement `ewah_bitmap_popcount()`
18:  c839e1fed15 = 17:  7b5ea56d053 pack-bitmap: implement test helpers for pseudo-merge
19:  7d3b88e6fd6 = 18:  006abdd1698 t/test-lib-functions.sh: support `--date` in `test_commit_bulk()`
20:  c18694ade2a = 19:  3f85e5b90f5 pack-bitmap.c: use pseudo-merges during traversal
21:  d38ebeba419 = 20:  5fac186df64 pack-bitmap: extra trace2 information
22:  1eb10c190ba ! 21:  b5aea8e57f8 ewah: `bitmap_equals_ewah()`
    @@ ewah/ewok.h: void bitmap_unset(struct bitmap *self, size_t pos);
      void bitmap_free(struct bitmap *self);
      int bitmap_equals(struct bitmap *self, struct bitmap *other);
     +int bitmap_equals_ewah(struct bitmap *self, struct ewah_bitmap *other);
    - int bitmap_is_subset(struct bitmap *self, struct bitmap *other);
    - int ewah_bitmap_is_subset(struct ewah_bitmap *self, struct bitmap *other);
      
    + /*
    +  * Both `bitmap_is_subset()` and `ewah_bitmap_is_subset()` return 1 if the set
23:  4ae4f0eaae5 = 22:  61ddb574285 pseudo-merge: implement support for finding existing merges
24:  a05ad42202d = 23:  2bd830d35dd t/perf: implement performace tests for pseudo-merge bitmaps

base-commit: 786a3e4b8d754d2b14b1208b98eeb0a554ef19a8
-- 
2.45.0.23.gc6f94b99219


* [PATCH v2 02/23] ewah: implement `ewah_bitmap_is_subset()`
  2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
  2024-04-29 20:42   ` [PATCH v2 01/23] Documentation/technical: describe pseudo-merge bitmaps format Taylor Blau
@ 2024-04-29 20:43   ` Taylor Blau
  2024-04-29 20:43   ` [PATCH v2 03/23] pack-bitmap: drop unused `max_bitmaps` parameter Taylor Blau
                     ` (21 subsequent siblings)
  23 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-04-29 20:43 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

In order to know whether a given pseudo-merge (comprised of a "parents"
bitmap and an "objects" bitmap) is "satisfied" and can be OR'd into the
bitmap result, we need to be able to quickly determine whether the
"parents" bitmap is a subset of the current set of objects reachable on
either side of a traversal.

Implement a helper function to prepare for that, which determines
whether an EWAH bitmap (the parents bitmap from the pseudo-merge) is a
subset of a non-EWAH bitmap (in this case, the results bitmap from
either side of the traversal).

This function makes use of the EWAH iterator to avoid inflating any part
of the EWAH bitmap after we determine it is not a subset of the non-EWAH
bitmap. This "fail-fast" allows us to avoid a potentially large amount
of wasted effort.
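
The "fail-fast" idea can be sketched with a toy run-length encoding
standing in for EWAH. Everything here (`struct run`, `struct run_iter`,
`runs_is_subset()`) is hypothetical and much simpler than the real EWAH
iterator; the `decoded` counter exists only to show how few words get
expanded once a stray bit is found.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy run-length encoding: `count` consecutive copies of `word`. */
struct run {
	size_t count;
	uint64_t word;
};

struct run_iter {
	const struct run *runs;
	size_t nr, cur, used;
	size_t decoded; /* how many words have been expanded so far */
};

/* Yield the next decoded word; returns 0 once the input is exhausted. */
static int run_iter_next(struct run_iter *it, uint64_t *out)
{
	/* Skip over runs that are fully consumed (or empty). */
	while (it->cur < it->nr && it->used == it->runs[it->cur].count) {
		it->cur++;
		it->used = 0;
	}
	if (it->cur == it->nr)
		return 0;
	*out = it->runs[it->cur].word;
	it->used++;
	it->decoded++;
	return 1;
}

/* Returns 1 if the compressed input is a subset of `other`. */
static int runs_is_subset(struct run_iter *it,
			  const uint64_t *other, size_t other_nr)
{
	uint64_t word;
	size_t i = 0;

	while (run_iter_next(it, &word)) {
		/* Words past the end of `other` are treated as all-zero. */
		uint64_t theirs = i < other_nr ? other[i] : 0;

		i++;
		if (word & ~theirs)
			return 0; /* stop decoding at the first stray bit */
	}
	return 1;
}
```

If the first decoded word already contains a bit missing from `other`,
the function returns after expanding a single word, no matter how long
the remaining runs are.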

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ewah/bitmap.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 ewah/ewok.h   |  6 ++++++
 2 files changed, 49 insertions(+)

diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index ac7e0af622a..d352fec54ce 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -138,6 +138,49 @@ void bitmap_or(struct bitmap *self, const struct bitmap *other)
 		self->words[i] |= other->words[i];
 }
 
+int ewah_bitmap_is_subset(struct ewah_bitmap *self, struct bitmap *other)
+{
+	struct ewah_iterator it;
+	eword_t word;
+	size_t i;
+
+	ewah_iterator_init(&it, self);
+
+	for (i = 0; i < other->word_alloc; i++) {
+		if (!ewah_iterator_next(&word, &it)) {
+			/*
+			 * If we reached the end of `self`, and haven't
+			 * rejected `self` as a possible subset of
+			 * `other` yet, then we are done and `self` is
+			 * indeed a subset of `other`.
+			 */
+			return 1;
+		}
+		if (word & ~other->words[i]) {
+			/*
+			 * Otherwise, compare the next two pairs of
+			 * words. If the word from `self` has bit(s) not
+			 * in the word from `other`, `self` is not a
+			 * subset of `other`.
+			 */
+			return 0;
+		}
+	}
+
+	/*
+	 * If we got to this point, there may be zero or more words
+	 * remaining in `self`, with no remaining words left in `other`.
+	 * If there are any bits set in the remaining word(s) in `self`,
+	 * then `self` is not a subset of `other`.
+	 */
+	while (ewah_iterator_next(&word, &it))
+		if (word)
+			return 0;
+
+	/* `self` is definitely a subset of `other` */
+	return 1;
+}
+
 void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other)
 {
 	size_t original_size = self->word_alloc;
diff --git a/ewah/ewok.h b/ewah/ewok.h
index c11d76c6f33..2b6c4ac499c 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -179,7 +179,13 @@ void bitmap_unset(struct bitmap *self, size_t pos);
 int bitmap_get(struct bitmap *self, size_t pos);
 void bitmap_free(struct bitmap *self);
 int bitmap_equals(struct bitmap *self, struct bitmap *other);
+
+/*
+ * Both `bitmap_is_subset()` and `ewah_bitmap_is_subset()` return 1 if the set
+ * of bits in 'self' are a subset of the bits in 'other'. Returns 0 otherwise.
+ */
 int bitmap_is_subset(struct bitmap *self, struct bitmap *other);
+int ewah_bitmap_is_subset(struct ewah_bitmap *self, struct bitmap *other);
 
 struct ewah_bitmap * bitmap_to_ewah(struct bitmap *bitmap);
 struct bitmap *ewah_to_bitmap(struct ewah_bitmap *ewah);
-- 
2.45.0.23.gc6f94b99219



* [PATCH v2 03/23] pack-bitmap: drop unused `max_bitmaps` parameter
  2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
  2024-04-29 20:42   ` [PATCH v2 01/23] Documentation/technical: describe pseudo-merge bitmaps format Taylor Blau
  2024-04-29 20:43   ` [PATCH v2 02/23] ewah: implement `ewah_bitmap_is_subset()` Taylor Blau
@ 2024-04-29 20:43   ` Taylor Blau
  2024-04-29 20:43   ` [PATCH v2 04/23] pack-bitmap: move some initialization to `bitmap_writer_init()` Taylor Blau
                     ` (20 subsequent siblings)
  23 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-04-29 20:43 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The `max_bitmaps` parameter in `bitmap_writer_select_commits()` was
introduced back in 7cc8f97108 (pack-objects: implement bitmap writing,
2013-12-21), making it original to the bitmap implementation in Git
itself.

When that patch was merged via 0f9e62e084 (Merge branch
'jk/pack-bitmap', 2014-02-27), its sole caller in builtin/pack-objects.c
passed a value of "-1" for `max_bitmaps`, indicating no limit.

Since then, the only other caller (in midx.c, added via c528e17966
(pack-bitmap: write multi-pack bitmaps, 2021-08-31)) also uses a value
of "-1" for `max_bitmaps`.

Since no callers have needed a finite limit for the `max_bitmaps`
parameter in the nearly ten years that have passed since 0f9e62e084,
let's remove the parameter and any dead pieces of code connected to it.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 2 +-
 midx-write.c           | 2 +-
 pack-bitmap-write.c    | 8 +-------
 pack-bitmap.h          | 2 +-
 4 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index baf0090fc8d..5060ce2dfba 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1359,7 +1359,7 @@ static void write_pack_file(void)
 				stop_progress(&progress_state);
 
 				bitmap_writer_show_progress(progress);
-				bitmap_writer_select_commits(indexed_commits, indexed_commits_nr, -1);
+				bitmap_writer_select_commits(indexed_commits, indexed_commits_nr);
 				if (bitmap_writer_build(&to_pack) < 0)
 					die(_("failed to write bitmap index"));
 				bitmap_writer_finish(written_list, nr_written,
diff --git a/midx-write.c b/midx-write.c
index 65e69d2de78..469cceaa583 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -838,7 +838,7 @@ static int write_midx_bitmap(const char *midx_name,
 	for (i = 0; i < pdata->nr_objects; i++)
 		index[pack_order[i]] = &pdata->objects[i].idx;
 
-	bitmap_writer_select_commits(commits, commits_nr, -1);
+	bitmap_writer_select_commits(commits, commits_nr);
 	ret = bitmap_writer_build(pdata);
 	if (ret < 0)
 		goto cleanup;
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index c6c8f94cc51..c35bc81d00f 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -591,8 +591,7 @@ static int date_compare(const void *_a, const void *_b)
 }
 
 void bitmap_writer_select_commits(struct commit **indexed_commits,
-				  unsigned int indexed_commits_nr,
-				  int max_bitmaps)
+				  unsigned int indexed_commits_nr)
 {
 	unsigned int i = 0, j, next;
 
@@ -615,11 +614,6 @@ void bitmap_writer_select_commits(struct commit **indexed_commits,
 		if (i + next >= indexed_commits_nr)
 			break;
 
-		if (max_bitmaps > 0 && writer.selected_nr >= max_bitmaps) {
-			writer.selected_nr = max_bitmaps;
-			break;
-		}
-
 		if (next == 0) {
 			chosen = indexed_commits[i];
 		} else {
diff --git a/pack-bitmap.h b/pack-bitmap.h
index c7dea13217a..3f96608d5c1 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -110,7 +110,7 @@ int rebuild_bitmap(const uint32_t *reposition,
 struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
 				      struct commit *commit);
 void bitmap_writer_select_commits(struct commit **indexed_commits,
-		unsigned int indexed_commits_nr, int max_bitmaps);
+				  unsigned int indexed_commits_nr);
 int bitmap_writer_build(struct packing_data *to_pack);
 void bitmap_writer_finish(struct pack_idx_entry **index,
 			  uint32_t index_nr,
-- 
2.45.0.23.gc6f94b99219



* [PATCH v2 04/23] pack-bitmap: move some initialization to `bitmap_writer_init()`
  2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
                     ` (2 preceding siblings ...)
  2024-04-29 20:43   ` [PATCH v2 03/23] pack-bitmap: drop unused `max_bitmaps` parameter Taylor Blau
@ 2024-04-29 20:43   ` Taylor Blau
  2024-05-06 11:52     ` Patrick Steinhardt
  2024-04-29 20:43   ` [PATCH v2 05/23] pseudo-merge.ch: initial commit Taylor Blau
                     ` (19 subsequent siblings)
  23 siblings, 1 reply; 157+ messages in thread
From: Taylor Blau @ 2024-04-29 20:43 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The pack-bitmap-writer machinery uses an oidmap (backed by khash.h) to
map from commits selected for bitmaps (by OID) to a bitmapped_commit
structure (containing the bitmap itself, among other things like its XOR
offset, etc.)

This map was initialized at the end of `bitmap_writer_build()`. New
entries are added in `pack-bitmap-write.c::store_selected()`, which is
called by the bitmap_builder machinery (which is responsible for
traversing history and generating the actual bitmaps).

Reorganize when this field is initialized and when entries are added to
it so that we can quickly determine whether a commit is a candidate for
pseudo-merge selection, or not (since it was already selected to receive
a bitmap, and thus is ineligible for pseudo-merge inclusion).

The changes are as follows:

  - Introduce a new `bitmap_writer_init()` function which initializes
    the `writer.bitmaps` field (instead of waiting until the end of
    `bitmap_writer_build()`).

  - Add map entries in `push_bitmapped_commit()` (which is called via
    `bitmap_writer_select_commits()`) with OID keys and NULL values to
    track whether or not we *expect* to write a bitmap for some given
    commit.

  - Validate that a NULL entry is found matching the given key when we
    store a selected bitmap.
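
The resulting "reserve, then fill" pattern can be sketched without
khash. The toy map below is hypothetical (small integer keys stand in
for OIDs, and a fixed-size array stands in for the hash table); it only
illustrates the two phases, and mirrors the diff below in checking that
the key exists at store time.

```c
#include <stddef.h>

#define MAX_SELECTED 16

/* Toy map from integer key to value pointer (stand-in for an OID map). */
struct slot_map {
	unsigned keys[MAX_SELECTED];
	const void *values[MAX_SELECTED];
	size_t nr;
};

/* Returns 1 and sets `*pos` if `key` is present, 0 otherwise. */
static int slot_find(const struct slot_map *m, unsigned key, size_t *pos)
{
	size_t i;

	for (i = 0; i < m->nr; i++) {
		if (m->keys[i] == key) {
			*pos = i;
			return 1;
		}
	}
	return 0;
}

/* Phase 1: reserve. Returns -1 on a duplicate key or a full map. */
static int slot_select(struct slot_map *m, unsigned key)
{
	size_t pos;

	if (slot_find(m, key, &pos) || m->nr == MAX_SELECTED)
		return -1;
	m->keys[m->nr] = key;
	m->values[m->nr] = NULL; /* selected, but nothing stored yet */
	m->nr++;
	return 0;
}

/* Phase 2: fill. Returns -1 if the key was never selected. */
static int slot_store(struct slot_map *m, unsigned key, const void *value)
{
	size_t pos;

	if (!slot_find(m, key, &pos))
		return -1;
	m->values[pos] = value;
	return 0;
}

/* A key counts as "selected" as soon as it has been reserved. */
static int slot_is_selected(const struct slot_map *m, unsigned key)
{
	size_t pos;

	return slot_find(m, key, &pos);
}
```

Because membership is known right after the reserve phase, a later step
can ask whether a commit was already selected before any bitmap has
actually been built for it.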

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c |  1 +
 midx-write.c           |  1 +
 pack-bitmap-write.c    | 23 ++++++++++++++++++-----
 pack-bitmap.h          |  1 +
 4 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 5060ce2dfba..2958cdda499 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1339,6 +1339,7 @@ static void write_pack_file(void)
 				    hash_to_hex(hash));
 
 			if (write_bitmap_index) {
+				bitmap_writer_init(the_repository);
 				bitmap_writer_set_checksum(hash);
 				bitmap_writer_build_type_index(
 					&to_pack, written_list, nr_written);
diff --git a/midx-write.c b/midx-write.c
index 469cceaa583..ed5f8b72b9c 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -819,6 +819,7 @@ static int write_midx_bitmap(const char *midx_name,
 	for (i = 0; i < pdata->nr_objects; i++)
 		index[i] = &pdata->objects[i].idx;
 
+	bitmap_writer_init(the_repository);
 	bitmap_writer_show_progress(flags & MIDX_PROGRESS);
 	bitmap_writer_build_type_index(pdata, index, pdata->nr_objects);
 
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index c35bc81d00f..9bc41a9e145 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -46,6 +46,11 @@ struct bitmap_writer {
 
 static struct bitmap_writer writer;
 
+void bitmap_writer_init(struct repository *r)
+{
+	writer.bitmaps = kh_init_oid_map();
+}
+
 void bitmap_writer_show_progress(int show)
 {
 	writer.show_progress = show;
@@ -117,11 +122,20 @@ void bitmap_writer_build_type_index(struct packing_data *to_pack,
 
 static inline void push_bitmapped_commit(struct commit *commit)
 {
+	int hash_ret;
+	khiter_t hash_pos;
+
 	if (writer.selected_nr >= writer.selected_alloc) {
 		writer.selected_alloc = (writer.selected_alloc + 32) * 2;
 		REALLOC_ARRAY(writer.selected, writer.selected_alloc);
 	}
 
+	hash_pos = kh_put_oid_map(writer.bitmaps, commit->object.oid, &hash_ret);
+	if (!hash_ret)
+		die(_("duplicate entry when writing bitmap index: %s"),
+		    oid_to_hex(&commit->object.oid));
+	kh_value(writer.bitmaps, hash_pos) = NULL;
+
 	writer.selected[writer.selected_nr].commit = commit;
 	writer.selected[writer.selected_nr].bitmap = NULL;
 	writer.selected[writer.selected_nr].flags = 0;
@@ -466,14 +480,14 @@ static void store_selected(struct bb_commit *ent, struct commit *commit)
 {
 	struct bitmapped_commit *stored = &writer.selected[ent->idx];
 	khiter_t hash_pos;
-	int hash_ret;
 
 	stored->bitmap = bitmap_to_ewah(ent->bitmap);
 
-	hash_pos = kh_put_oid_map(writer.bitmaps, commit->object.oid, &hash_ret);
-	if (hash_ret == 0)
-		die("Duplicate entry when writing index: %s",
+	hash_pos = kh_get_oid_map(writer.bitmaps, commit->object.oid);
+	if (hash_pos == kh_end(writer.bitmaps))
+		die(_("attempted to store non-selected commit: '%s'"),
 		    oid_to_hex(&commit->object.oid));
+
 	kh_value(writer.bitmaps, hash_pos) = stored;
 }
 
@@ -488,7 +502,6 @@ int bitmap_writer_build(struct packing_data *to_pack)
 	uint32_t *mapping;
 	int closed = 1; /* until proven otherwise */
 
-	writer.bitmaps = kh_init_oid_map();
 	writer.to_pack = to_pack;
 
 	if (writer.show_progress)
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 3f96608d5c1..dae2d68a338 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -97,6 +97,7 @@ int bitmap_has_oid_in_uninteresting(struct bitmap_index *, const struct object_i
 
 off_t get_disk_usage_from_bitmap(struct bitmap_index *, struct rev_info *);
 
+void bitmap_writer_init(struct repository *r);
 void bitmap_writer_show_progress(int show);
 void bitmap_writer_set_checksum(const unsigned char *sha1);
 void bitmap_writer_build_type_index(struct packing_data *to_pack,
-- 
2.45.0.23.gc6f94b99219



* [PATCH v2 05/23] pseudo-merge.ch: initial commit
  2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
                     ` (3 preceding siblings ...)
  2024-04-29 20:43   ` [PATCH v2 04/23] pack-bitmap: move some initialization to `bitmap_writer_init()` Taylor Blau
@ 2024-04-29 20:43   ` Taylor Blau
  2024-04-29 20:43   ` [PATCH v2 06/23] pack-bitmap-write: support storing pseudo-merge commits Taylor Blau
                     ` (18 subsequent siblings)
  23 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-04-29 20:43 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Add a new (empty) header file and corresponding source file to contain
the implementation for selecting, reading, and applying pseudo-merge
bitmaps.

For now this header and its corresponding implementation are left
empty, but they will evolve over the course of subsequent commit(s).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Makefile       | 1 +
 pseudo-merge.c | 2 ++
 pseudo-merge.h | 6 ++++++
 3 files changed, 9 insertions(+)
 create mode 100644 pseudo-merge.c
 create mode 100644 pseudo-merge.h

diff --git a/Makefile b/Makefile
index 1e31acc72ec..6a3d164fdf8 100644
--- a/Makefile
+++ b/Makefile
@@ -1119,6 +1119,7 @@ LIB_OBJS += prompt.o
 LIB_OBJS += protocol.o
 LIB_OBJS += protocol-caps.o
 LIB_OBJS += prune-packed.o
+LIB_OBJS += pseudo-merge.o
 LIB_OBJS += quote.o
 LIB_OBJS += range-diff.o
 LIB_OBJS += reachable.o
diff --git a/pseudo-merge.c b/pseudo-merge.c
new file mode 100644
index 00000000000..37e037ba272
--- /dev/null
+++ b/pseudo-merge.c
@@ -0,0 +1,2 @@
+#include "git-compat-util.h"
+#include "pseudo-merge.h"
diff --git a/pseudo-merge.h b/pseudo-merge.h
new file mode 100644
index 00000000000..cab8ff6960a
--- /dev/null
+++ b/pseudo-merge.h
@@ -0,0 +1,6 @@
+#ifndef PSEUDO_MERGE_H
+#define PSEUDO_MERGE_H
+
+#include "git-compat-util.h"
+
+#endif
-- 
2.45.0.23.gc6f94b99219



* [PATCH v2 06/23] pack-bitmap-write: support storing pseudo-merge commits
  2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
                     ` (4 preceding siblings ...)
  2024-04-29 20:43   ` [PATCH v2 05/23] pseudo-merge.ch: initial commit Taylor Blau
@ 2024-04-29 20:43   ` Taylor Blau
  2024-05-06 11:52     ` Patrick Steinhardt
  2024-05-13 18:42     ` Jeff King
  2024-04-29 20:43   ` [PATCH v2 07/23] pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()` Taylor Blau
                     ` (17 subsequent siblings)
  23 siblings, 2 replies; 157+ messages in thread
From: Taylor Blau @ 2024-04-29 20:43 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Prepare to write pseudo-merge bitmaps by annotating individual bitmapped
commits (which are represented by the `bitmapped_commit` structure) with
an extra bit indicating whether or not they are a pseudo-merge.

In subsequent commits, pseudo-merge bitmaps will be generated by
allocating a fake commit node with parents covering the full set of
commits represented by the pseudo-merge bitmap. These commits will be
added to the set of "selected" commits as usual, but will be written
specially instead of being included with the rest of the selected
commits.

Mechanically speaking, there are two parts of this change:

  - The bitmapped_commit struct gets a new bit indicating whether it is
    a pseudo-merge, or an ordinary commit selected for bitmaps.

  - A handful of changes to only write out the non-pseudo-merge commits
    when enumerating through the selected array (see the new
    `bitmap_writer_selected_nr()` function). Pseudo-merge commits appear
    after all non-pseudo-merge commits, so it is safe to enumerate
    through the selected array like so:

        for (i = 0; i < bitmap_writer_selected_nr(); i++)
          if (writer.selected[i].pseudo_merge)
            BUG("unexpected pseudo-merge");

    without encountering the BUG().
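
The ordering invariant can be sketched in isolation as follows. This is
a simplified model rather than the writer itself: `struct writer`,
`push()`, `ordinary_nr()`, and `prefix_is_ordinary()` are hypothetical
names, commits are reduced to integer IDs, and the sketch assumes that
the pseudo-merge counter is bumped at push time.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SELECTED 32

struct selected_commit {
	unsigned id;               /* stand-in for the commit */
	unsigned pseudo_merge : 1; /* 1 if this entry is a pseudo-merge */
};

struct writer {
	struct selected_commit selected[MAX_SELECTED];
	size_t selected_nr;
	size_t pseudo_merges_nr;
};

/* Number of ordinary (non-pseudo-merge) entries at the front of the array. */
static size_t ordinary_nr(const struct writer *w)
{
	return w->selected_nr - w->pseudo_merges_nr;
}

static void push(struct writer *w, unsigned id, unsigned pseudo_merge)
{
	/* Ordinary commits must all be pushed before any pseudo-merge. */
	assert(pseudo_merge || !w->pseudo_merges_nr);
	assert(w->selected_nr < MAX_SELECTED);

	w->selected[w->selected_nr].id = id;
	w->selected[w->selected_nr].pseudo_merge = !!pseudo_merge;
	w->selected_nr++;
	if (pseudo_merge)
		w->pseudo_merges_nr++;
}

/* Returns 1 if the first ordinary_nr() entries are all ordinary commits. */
static int prefix_is_ordinary(const struct writer *w)
{
	size_t i;

	for (i = 0; i < ordinary_nr(w); i++)
		if (w->selected[i].pseudo_merge)
			return 0;
	return 1;
}
```

Because pseudo-merges are only ever appended after the ordinary commits,
subtracting their count from the total yields a prefix length that is
safe to iterate over.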

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c | 100 +++++++++++++++++++++++++++++---------------
 pack-bitmap.h       |   1 +
 2 files changed, 67 insertions(+), 34 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 9bc41a9e145..fef02cd745a 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -24,7 +24,7 @@ struct bitmapped_commit {
 	struct ewah_bitmap *write_as;
 	int flags;
 	int xor_offset;
-	uint32_t commit_pos;
+	unsigned pseudo_merge : 1;
 };
 
 struct bitmap_writer {
@@ -39,6 +39,8 @@ struct bitmap_writer {
 	struct bitmapped_commit *selected;
 	unsigned int selected_nr, selected_alloc;
 
+	uint32_t pseudo_merges_nr;
+
 	struct progress *progress;
 	int show_progress;
 	unsigned char pack_checksum[GIT_MAX_RAWSZ];
@@ -46,6 +48,11 @@ struct bitmap_writer {
 
 static struct bitmap_writer writer;
 
+static inline int bitmap_writer_selected_nr(void)
+{
+	return writer.selected_nr - writer.pseudo_merges_nr;
+}
+
 void bitmap_writer_init(struct repository *r)
 {
 	writer.bitmaps = kh_init_oid_map();
@@ -120,25 +127,30 @@ void bitmap_writer_build_type_index(struct packing_data *to_pack,
  * Compute the actual bitmaps
  */
 
-static inline void push_bitmapped_commit(struct commit *commit)
+static void bitmap_writer_push_bitmapped_commit(struct commit *commit,
+						unsigned pseudo_merge)
 {
-	int hash_ret;
-	khiter_t hash_pos;
-
 	if (writer.selected_nr >= writer.selected_alloc) {
 		writer.selected_alloc = (writer.selected_alloc + 32) * 2;
 		REALLOC_ARRAY(writer.selected, writer.selected_alloc);
 	}
 
-	hash_pos = kh_put_oid_map(writer.bitmaps, commit->object.oid, &hash_ret);
-	if (!hash_ret)
-		die(_("duplicate entry when writing bitmap index: %s"),
-		    oid_to_hex(&commit->object.oid));
-	kh_value(writer.bitmaps, hash_pos) = NULL;
+	if (!pseudo_merge) {
+		int hash_ret;
+		khiter_t hash_pos = kh_put_oid_map(writer.bitmaps,
+						   commit->object.oid,
+						   &hash_ret);
+
+		if (!hash_ret)
+			die(_("duplicate entry when writing bitmap index: %s"),
+			    oid_to_hex(&commit->object.oid));
+		kh_value(writer.bitmaps, hash_pos) = NULL;
+	}
 
 	writer.selected[writer.selected_nr].commit = commit;
 	writer.selected[writer.selected_nr].bitmap = NULL;
 	writer.selected[writer.selected_nr].flags = 0;
+	writer.selected[writer.selected_nr].pseudo_merge = pseudo_merge;
 
 	writer.selected_nr++;
 }
@@ -168,16 +180,20 @@ static void compute_xor_offsets(void)
 
 	while (next < writer.selected_nr) {
 		struct bitmapped_commit *stored = &writer.selected[next];
-
 		int best_offset = 0;
 		struct ewah_bitmap *best_bitmap = stored->bitmap;
 		struct ewah_bitmap *test_xor;
 
+		if (stored->pseudo_merge)
+			goto next;
+
 		for (i = 1; i <= MAX_XOR_OFFSET_SEARCH; ++i) {
 			int curr = next - i;
 
 			if (curr < 0)
 				break;
+			if (writer.selected[curr].pseudo_merge)
+				continue;
 
 			test_xor = ewah_pool_new();
 			ewah_xor(writer.selected[curr].bitmap, stored->bitmap, test_xor);
@@ -193,6 +209,7 @@ static void compute_xor_offsets(void)
 			}
 		}
 
+next:
 		stored->xor_offset = best_offset;
 		stored->write_as = best_bitmap;
 
@@ -205,7 +222,8 @@ struct bb_commit {
 	struct bitmap *commit_mask;
 	struct bitmap *bitmap;
 	unsigned selected:1,
-		 maximal:1;
+		 maximal:1,
+		 pseudo_merge:1;
 	unsigned idx; /* within selected array */
 };
 
@@ -243,17 +261,18 @@ static void bitmap_builder_init(struct bitmap_builder *bb,
 	revs.first_parent_only = 1;
 
 	for (i = 0; i < writer->selected_nr; i++) {
-		struct commit *c = writer->selected[i].commit;
-		struct bb_commit *ent = bb_data_at(&bb->data, c);
+		struct bitmapped_commit *bc = &writer->selected[i];
+		struct bb_commit *ent = bb_data_at(&bb->data, bc->commit);
 
 		ent->selected = 1;
 		ent->maximal = 1;
+		ent->pseudo_merge = bc->pseudo_merge;
 		ent->idx = i;
 
 		ent->commit_mask = bitmap_new();
 		bitmap_set(ent->commit_mask, i);
 
-		add_pending_object(&revs, &c->object, "");
+		add_pending_object(&revs, &bc->commit->object, "");
 	}
 
 	if (prepare_revision_walk(&revs))
@@ -430,8 +449,13 @@ static int fill_bitmap_commit(struct bb_commit *ent,
 		struct commit *c = prio_queue_get(queue);
 
 		if (old_bitmap && mapping) {
-			struct ewah_bitmap *old = bitmap_for_commit(old_bitmap, c);
+			struct ewah_bitmap *old;
 			struct bitmap *remapped = bitmap_new();
+
+			if (commit->object.flags & BITMAP_PSEUDO_MERGE)
+				old = NULL;
+			else
+				old = bitmap_for_commit(old_bitmap, c);
 			/*
 			 * If this commit has an old bitmap, then translate that
 			 * bitmap and add its bits to this one. No need to walk
@@ -450,12 +474,14 @@ static int fill_bitmap_commit(struct bb_commit *ent,
 		 * Mark ourselves and queue our tree. The commit
 		 * walk ensures we cover all parents.
 		 */
-		pos = find_object_pos(&c->object.oid, &found);
-		if (!found)
-			return -1;
-		bitmap_set(ent->bitmap, pos);
-		prio_queue_put(tree_queue,
-			       repo_get_commit_tree(the_repository, c));
+		if (!(c->object.flags & BITMAP_PSEUDO_MERGE)) {
+			pos = find_object_pos(&c->object.oid, &found);
+			if (!found)
+				return -1;
+			bitmap_set(ent->bitmap, pos);
+			prio_queue_put(tree_queue,
+				       repo_get_commit_tree(the_repository, c));
+		}
 
 		for (p = c->parents; p; p = p->next) {
 			pos = find_object_pos(&p->item->object.oid, &found);
@@ -483,6 +509,9 @@ static void store_selected(struct bb_commit *ent, struct commit *commit)
 
 	stored->bitmap = bitmap_to_ewah(ent->bitmap);
 
+	if (ent->pseudo_merge)
+		return;
+
 	hash_pos = kh_get_oid_map(writer.bitmaps, commit->object.oid);
 	if (hash_pos == kh_end(writer.bitmaps))
 		die(_("attempted to store non-selected commit: '%s'"),
@@ -612,7 +641,7 @@ void bitmap_writer_select_commits(struct commit **indexed_commits,
 
 	if (indexed_commits_nr < 100) {
 		for (i = 0; i < indexed_commits_nr; ++i)
-			push_bitmapped_commit(indexed_commits[i]);
+			bitmap_writer_push_bitmapped_commit(indexed_commits[i], 0);
 		return;
 	}
 
@@ -645,7 +674,7 @@ void bitmap_writer_select_commits(struct commit **indexed_commits,
 			}
 		}
 
-		push_bitmapped_commit(chosen);
+		bitmap_writer_push_bitmapped_commit(chosen, 0);
 
 		i += next + 1;
 		display_progress(writer.progress, i);
@@ -683,8 +712,11 @@ static void write_selected_commits_v1(struct hashfile *f,
 {
 	int i;
 
-	for (i = 0; i < writer.selected_nr; ++i) {
+	for (i = 0; i < bitmap_writer_selected_nr(); ++i) {
 		struct bitmapped_commit *stored = &writer.selected[i];
+		if (stored->pseudo_merge)
+			BUG("unexpected pseudo-merge among selected: %s",
+			    oid_to_hex(&stored->commit->object.oid));
 
 		if (offsets)
 			offsets[i] = hashfile_total(f);
@@ -718,10 +750,10 @@ static void write_lookup_table(struct hashfile *f,
 	uint32_t i;
 	uint32_t *table, *table_inv;
 
-	ALLOC_ARRAY(table, writer.selected_nr);
-	ALLOC_ARRAY(table_inv, writer.selected_nr);
+	ALLOC_ARRAY(table, bitmap_writer_selected_nr());
+	ALLOC_ARRAY(table_inv, bitmap_writer_selected_nr());
 
-	for (i = 0; i < writer.selected_nr; i++)
+	for (i = 0; i < bitmap_writer_selected_nr(); i++)
 		table[i] = i;
 
 	/*
@@ -729,16 +761,16 @@ static void write_lookup_table(struct hashfile *f,
 	 * bitmap corresponds to j'th bitmapped commit (among the selected
 	 * commits) in lex order of OIDs.
 	 */
-	QSORT_S(table, writer.selected_nr, table_cmp, commit_positions);
+	QSORT_S(table, bitmap_writer_selected_nr(), table_cmp, commit_positions);
 
 	/* table_inv helps us discover that relationship (i'th bitmap
 	 * to j'th commit by j = table_inv[i])
 	 */
-	for (i = 0; i < writer.selected_nr; i++)
+	for (i = 0; i < bitmap_writer_selected_nr(); i++)
 		table_inv[table[i]] = i;
 
 	trace2_region_enter("pack-bitmap-write", "writing_lookup_table", the_repository);
-	for (i = 0; i < writer.selected_nr; i++) {
+	for (i = 0; i < bitmap_writer_selected_nr(); i++) {
 		struct bitmapped_commit *selected = &writer.selected[table[i]];
 		uint32_t xor_offset = selected->xor_offset;
 		uint32_t xor_row;
@@ -809,7 +841,7 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 	memcpy(header.magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE));
 	header.version = htons(default_version);
 	header.options = htons(flags | options);
-	header.entry_count = htonl(writer.selected_nr);
+	header.entry_count = htonl(bitmap_writer_selected_nr());
 	hashcpy(header.checksum, writer.pack_checksum);
 
 	hashwrite(f, &header, sizeof(header) - GIT_MAX_RAWSZ + the_hash_algo->rawsz);
@@ -821,9 +853,9 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 	if (options & BITMAP_OPT_LOOKUP_TABLE)
 		CALLOC_ARRAY(offsets, index_nr);
 
-	ALLOC_ARRAY(commit_positions, writer.selected_nr);
+	ALLOC_ARRAY(commit_positions, bitmap_writer_selected_nr());
 
-	for (i = 0; i < writer.selected_nr; i++) {
+	for (i = 0; i < bitmap_writer_selected_nr(); i++) {
 		struct bitmapped_commit *stored = &writer.selected[i];
 		int commit_pos = oid_pos(&stored->commit->object.oid, index, index_nr, oid_access);
 
diff --git a/pack-bitmap.h b/pack-bitmap.h
index dae2d68a338..ca9acd2f735 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -21,6 +21,7 @@ struct bitmap_disk_header {
 	unsigned char checksum[GIT_MAX_RAWSZ];
 };
 
+#define BITMAP_PSEUDO_MERGE (1u<<21)
 #define NEEDS_BITMAP (1u<<22)
 
 /*
-- 
2.45.0.23.gc6f94b99219



* [PATCH v2 07/23] pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()`
  2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
                     ` (5 preceding siblings ...)
  2024-04-29 20:43   ` [PATCH v2 06/23] pack-bitmap-write: support storing pseudo-merge commits Taylor Blau
@ 2024-04-29 20:43   ` Taylor Blau
  2024-04-29 20:43   ` [PATCH v2 08/23] pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public Taylor Blau
                     ` (16 subsequent siblings)
  23 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-04-29 20:43 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Prepare to implement pseudo-merge bitmap selection by implementing a
necessary new function, `bitmap_writer_has_bitmapped_object_id()`.

This function returns whether or not the bitmap_writer selected the
given object ID for bitmapping. This will allow the pseudo-merge
machinery to reject candidates for pseudo-merges if they have already
been selected as an ordinary bitmap tip.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c | 5 +++++
 pack-bitmap.h       | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index fef02cd745a..c7514a58407 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -123,6 +123,11 @@ void bitmap_writer_build_type_index(struct packing_data *to_pack,
 	}
 }
 
+int bitmap_writer_has_bitmapped_object_id(const struct object_id *oid)
+{
+	return kh_get_oid_map(writer.bitmaps, *oid) != kh_end(writer.bitmaps);
+}
+
 /**
  * Compute the actual bitmaps
  */
diff --git a/pack-bitmap.h b/pack-bitmap.h
index ca9acd2f735..995d664cc89 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -98,6 +98,8 @@ int bitmap_has_oid_in_uninteresting(struct bitmap_index *, const struct object_i
 
 off_t get_disk_usage_from_bitmap(struct bitmap_index *, struct rev_info *);
 
+int bitmap_writer_has_bitmapped_object_id(const struct object_id *oid);
+
 void bitmap_writer_init(struct repository *r);
 void bitmap_writer_show_progress(int show);
 void bitmap_writer_set_checksum(const unsigned char *sha1);
-- 
2.45.0.23.gc6f94b99219



* [PATCH v2 08/23] pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public
  2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
                     ` (6 preceding siblings ...)
  2024-04-29 20:43   ` [PATCH v2 07/23] pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()` Taylor Blau
@ 2024-04-29 20:43   ` Taylor Blau
  2024-05-13 18:50     ` Jeff King
  2024-04-29 20:43   ` [PATCH v2 09/23] pseudo-merge: implement support for selecting pseudo-merge commits Taylor Blau
                     ` (15 subsequent siblings)
  23 siblings, 1 reply; 157+ messages in thread
From: Taylor Blau @ 2024-04-29 20:43 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The pseudo-merge selection code will be added in a subsequent commit,
and will need a way to push the allocated commit structures into the
bitmap writer from a separate compilation unit.

Make the `bitmap_writer_push_bitmapped_commit()` function part of the
pack-bitmap.h header in order to make this possible.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c | 4 ++--
 pack-bitmap.h       | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index c7514a58407..dab5bdea806 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -132,8 +132,8 @@ int bitmap_writer_has_bitmapped_object_id(const struct object_id *oid)
  * Compute the actual bitmaps
  */
 
-static void bitmap_writer_push_bitmapped_commit(struct commit *commit,
-						unsigned pseudo_merge)
+void bitmap_writer_push_bitmapped_commit(struct commit *commit,
+					 unsigned pseudo_merge)
 {
 	if (writer.selected_nr >= writer.selected_alloc) {
 		writer.selected_alloc = (writer.selected_alloc + 32) * 2;
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 995d664cc89..0f539d79cfd 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -99,6 +99,8 @@ int bitmap_has_oid_in_uninteresting(struct bitmap_index *, const struct object_i
 off_t get_disk_usage_from_bitmap(struct bitmap_index *, struct rev_info *);
 
 int bitmap_writer_has_bitmapped_object_id(const struct object_id *oid);
+void bitmap_writer_push_bitmapped_commit(struct commit *commit,
+					 unsigned pseudo_merge);
 
 void bitmap_writer_init(struct repository *r);
 void bitmap_writer_show_progress(int show);
-- 
2.45.0.23.gc6f94b99219



* [PATCH v2 09/23] pseudo-merge: implement support for selecting pseudo-merge commits
  2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
                     ` (7 preceding siblings ...)
  2024-04-29 20:43   ` [PATCH v2 08/23] pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public Taylor Blau
@ 2024-04-29 20:43   ` Taylor Blau
  2024-05-06 11:53     ` Patrick Steinhardt
  2024-05-13 19:03     ` Jeff King
  2024-04-29 20:43   ` [PATCH v2 10/23] pack-bitmap-write.c: select " Taylor Blau
                     ` (14 subsequent siblings)
  23 siblings, 2 replies; 157+ messages in thread
From: Taylor Blau @ 2024-04-29 20:43 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Teach the new pseudo-merge machinery how to select non-bitmapped commits
for inclusion in different pseudo-merge group(s) based on a handful of
criteria.

Pseudo-merges are derived first from named pseudo-merge groups (see the
`bitmapPseudoMerge.<name>.*` configuration options). They are
(optionally) further segmented within an individual pseudo-merge group
based on any capture group(s) within the pseudo-merge group's pattern.

For example, a configuration like so:

    [bitmapPseudoMerge "all"]
        pattern = "refs/"
        threshold = now
        stableThreshold = never
        sampleRate = 100
        maxMerges = 64

would group all non-bitmapped commits into up to 64 individual
pseudo-merge commits.

If you wanted to separate tags from branches when generating
pseudo-merge commits, and further segment them by which fork they
originate from (using the same "refs/virtual/" scheme as in the delta
islands documentation), you would instead write something like:

    [bitmapPseudoMerge "all"]
        pattern = "refs/virtual/([0-9]+)/(heads|tags)/"
        threshold = now
        stableThreshold = never
        sampleRate = 100
        maxMerges = 64

This would generate pseudo-merge group identifiers like "1234-heads"
and "5678-tags" (for branches in fork "1234" and tags in fork "5678",
respectively).
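
To make the naming scheme concrete, here is a minimal standalone sketch
using POSIX <regex.h>. The helper name `pseudo_merge_group_name()` and
its buffer-based interface are assumptions made for the example; the
patch itself builds the name in a strbuf while iterating over refs. As
in the patch, a pattern with no capture groups falls back to the text of
the whole match.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Join the text of each matched capture group with '-' into 'out'.
 * If the pattern has no capture groups, use the whole match instead.
 * Returns 0 on success, -1 if the ref does not match, the pattern
 * fails to compile, or the name does not fit in 'out'.
 */
static int pseudo_merge_group_name(const char *pattern, const char *refname,
				   char *out, size_t out_len)
{
	regex_t re;
	regmatch_t m[16];
	size_t j, len = 0;
	int ret = -1;

	assert(out_len > 0);
	out[0] = '\0';

	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;

	if (!regexec(&re, refname, 16, m, 0)) {
		ret = 0;
		for (j = re.re_nsub ? 1 : 0; j <= re.re_nsub && j < 16; j++) {
			int n;

			if (m[j].rm_so == -1)
				continue; /* group did not participate */

			n = snprintf(out + len, out_len - len, "%s%.*s",
				     len ? "-" : "",
				     (int)(m[j].rm_eo - m[j].rm_so),
				     refname + m[j].rm_so);
			if (n < 0 || (size_t)n >= out_len - len) {
				ret = -1; /* name was truncated */
				break;
			}
			len += (size_t)n;
		}
	}

	regfree(&re);
	return ret;
}
```

With the pattern "^refs/virtual/([0-9]+)/(heads|tags)/", the ref
"refs/virtual/1234/heads/main" maps to "1234-heads", and a ref outside
"refs/virtual/" does not match at all.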

Within pseudo-merge groups, there are a handful of other options used to
control the distribution of matching commits among individual
pseudo-merge commits:

  - bitmapPseudoMerge.<name>.decay
  - bitmapPseudoMerge.<name>.sampleRate
  - bitmapPseudoMerge.<name>.threshold
  - bitmapPseudoMerge.<name>.maxMerges
  - bitmapPseudoMerge.<name>.stableThreshold
  - bitmapPseudoMerge.<name>.stableSize

The decay parameter roughly corresponds to "k" in `f(n) = C*n^(-k/100)`,
where `f(n)` describes the size of the `n`-th pseudo-merge group. The
sample rate controls what percentage of eligible commits are considered
as candidates. The threshold parameter indicates the minimum age (so as
to avoid including too-recent commits in a pseudo-merge group, making it
less likely to be valid). The "maxMerges" parameter sets an upper-bound
on the number of pseudo-merge commits an individual group may contain.
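
The effect of the decay parameter can be sketched with a small
standalone helper. This is an approximation for illustration only: it
takes the exponent as a plain non-negative integer k (much like the
integer exponent accepted by the gitexp() helper in the diff below),
not the k/100 scaling mentioned above, and the names `int_pow()` and
`group_size()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* base^exp for a non-negative integer exponent, by repeated squaring. */
static double int_pow(double base, unsigned exp)
{
	double result = 1.0;

	while (exp) {
		if (exp & 1)
			result *= base;
		base *= base;
		exp >>= 1;
	}
	return result;
}

/*
 * Size of the i-th (0-indexed) pseudo-merge out of 'max_merges', given
 * 'total' candidate commits and an integer decay 'k':
 *
 *   f(n) = C * n^-k, with n = i + 1
 *   C    = total / sum_{n=1}^{max_merges} n^-k
 *
 * The result is rounded to the nearest integer.
 */
static uint32_t group_size(uint32_t total, uint32_t max_merges,
			   unsigned k, uint32_t i)
{
	double sum = 0.0;
	uint32_t n;

	assert(max_merges > 0 && i < max_merges);

	for (n = 1; n <= max_merges; n++)
		sum += 1.0 / int_pow(n, k);

	return (uint32_t)((total / sum) / int_pow(i + 1, k) + 0.5);
}
```

For example, 10,000 commits split over 6 pseudo-merges with k = 1 gives
sizes of 4082, 2041, 1361, 1020, 816, and 680, which sum to 10,000.
With k = 0 every pseudo-merge gets an equal share.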

The latter two "stable"-related parameters control "stable" pseudo-merge
groups, comprised of a fixed number of commits which are older than the
configured "stable threshold" value and may be grouped together in
chunks of "stableSize" in order of age.
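
The chunking of stable commits amounts to a ceiling division. A minimal
sketch, with hypothetical helper names and with commits reduced to a
count:

```c
#include <assert.h>
#include <stdint.h>

/* Number of stable pseudo-merges: ceil(stable_nr / stable_size). */
static uint32_t stable_merges_nr(uint32_t stable_nr, uint32_t stable_size)
{
	assert(stable_size > 0);
	return stable_nr / stable_size + (stable_nr % stable_size ? 1 : 0);
}

/* Number of commits placed in the i-th (0-indexed) stable pseudo-merge. */
static uint32_t stable_merge_len(uint32_t stable_nr, uint32_t stable_size,
				 uint32_t i)
{
	uint32_t start = i * stable_size;

	assert(i < stable_merges_nr(stable_nr, stable_size));
	return stable_nr - start < stable_size ? stable_nr - start
					       : stable_size;
}
```

For instance, 1200 stable commits with a stableSize of 512 would form
three pseudo-merges of 512, 512, and 176 commits.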

This patch implements the aforementioned selection routine, as well as
parsing the relevant configuration options.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pseudo-merge.c | 441 +++++++++++++++++++++++++++++++++++++++++++++++++
 pseudo-merge.h |  96 +++++++++++
 2 files changed, 537 insertions(+)

diff --git a/pseudo-merge.c b/pseudo-merge.c
index 37e037ba272..caccef942a1 100644
--- a/pseudo-merge.c
+++ b/pseudo-merge.c
@@ -1,2 +1,443 @@
 #include "git-compat-util.h"
 #include "pseudo-merge.h"
+#include "date.h"
+#include "oid-array.h"
+#include "strbuf.h"
+#include "config.h"
+#include "string-list.h"
+#include "refs.h"
+#include "pack-bitmap.h"
+#include "commit.h"
+#include "alloc.h"
+#include "progress.h"
+
+#define DEFAULT_PSEUDO_MERGE_DECAY 1.0f
+#define DEFAULT_PSEUDO_MERGE_MAX_MERGES 64
+#define DEFAULT_PSEUDO_MERGE_SAMPLE_RATE 100
+#define DEFAULT_PSEUDO_MERGE_THRESHOLD approxidate("1.week.ago")
+#define DEFAULT_PSEUDO_MERGE_STABLE_THRESHOLD approxidate("1.month.ago")
+#define DEFAULT_PSEUDO_MERGE_STABLE_SIZE 512
+
+static float gitexp(float base, int exp)
+{
+	float result = 1;
+	while (1) {
+		if (exp % 2)
+			result *= base;
+		exp >>= 1;
+		if (!exp)
+			break;
+		base *= base;
+	}
+	return result;
+}
+
+static uint32_t pseudo_merge_group_size(const struct pseudo_merge_group *group,
+					const struct pseudo_merge_matches *matches,
+					uint32_t i)
+{
+	float C = 0.0f;
+	uint32_t n;
+
+	/*
+	 * The size of pseudo-merge groups decays according to a power series,
+	 * which looks like:
+	 *
+	 *   f(n) = C * n^-k
+	 *
+	 * , where 'n' is the n-th pseudo-merge group, 'f(n)' is its size, 'k'
+	 * is the decay rate, and 'C' is a scaling value.
+	 *
+	 * The value of C depends on the number of groups, decay rate, and total
+	 * number of commits. It is computed such that if there are M and N
+	 * total groups and commits, respectively, that:
+	 *
+	 *   N = f(0) + f(1) + ... f(M-1)
+	 *
+	 * Rearranging to isolate C, we get:
+	 *
+	 *   N = \sum_{n=1}^M C / n^k
+	 *
+	 *   N / C = \sum_{n=1}^M n^-k
+	 *
+	 *   C = N / \sum_{n=1}^M n^-k
+	 *
+	 * For example, if we have a decay rate of 'k' being equal to 1.5, 'N'
+	 * total commits equal to 10,000, and 'M' being equal to 6 groups, then
+	 * the (rounded) group sizes are:
+	 *
+	 *   { 5469, 1934, 1053, 684, 489, 372 }
+	 *
+	 * increasing the number of total groups, say to 10, scales the group
+	 * sizes appropriately:
+	 *
+	 *   { 5012, 1772, 964, 626, 448, 341, 271, 221, 186, 158 }
+	 */
+	for (n = 0; n < group->max_merges; n++)
+		C += 1.0f / gitexp(n + 1, group->decay);
+	C = matches->unstable_nr / C;
+
+	return (int)((C / gitexp(i + 1, group->decay)) + 0.5);
+}
+
+static void init_pseudo_merge_group(struct pseudo_merge_group *group)
+{
+	memset(group, 0, sizeof(struct pseudo_merge_group));
+
+	strmap_init_with_options(&group->matches, NULL, 0);
+
+	group->decay = DEFAULT_PSEUDO_MERGE_DECAY;
+	group->max_merges = DEFAULT_PSEUDO_MERGE_MAX_MERGES;
+	group->sample_rate = DEFAULT_PSEUDO_MERGE_SAMPLE_RATE;
+	group->threshold = DEFAULT_PSEUDO_MERGE_THRESHOLD;
+	group->stable_threshold = DEFAULT_PSEUDO_MERGE_STABLE_THRESHOLD;
+	group->stable_size = DEFAULT_PSEUDO_MERGE_STABLE_SIZE;
+}
+
+static int pseudo_merge_config(const char *var, const char *value,
+			       const struct config_context *ctx,
+			       void *cb_data)
+{
+	struct string_list *list = cb_data;
+	struct string_list_item *item;
+	struct pseudo_merge_group *group;
+	struct strbuf buf = STRBUF_INIT;
+	const char *sub, *key;
+	size_t sub_len;
+
+	if (parse_config_key(var, "bitmappseudomerge", &sub, &sub_len, &key))
+		return 0;
+
+	if (!sub_len)
+		return 0;
+
+	strbuf_add(&buf, sub, sub_len);
+
+	item = string_list_lookup(list, buf.buf);
+	if (!item) {
+		item = string_list_insert(list, buf.buf);
+
+		item->util = xmalloc(sizeof(struct pseudo_merge_group));
+		init_pseudo_merge_group(item->util);
+	}
+
+	group = item->util;
+
+	if (!strcmp(key, "pattern")) {
+		struct strbuf re = STRBUF_INIT;
+
+		free(group->pattern);
+		if (*value != '^')
+			strbuf_addch(&re, '^');
+		strbuf_addstr(&re, value);
+
+		group->pattern = xcalloc(1, sizeof(regex_t));
+		if (regcomp(group->pattern, re.buf, REG_EXTENDED))
+			die(_("failed to load pseudo-merge regex for %s: '%s'"),
+			    sub, re.buf);
+
+		strbuf_release(&re);
+	} else if (!strcmp(key, "decay")) {
+		group->decay = git_config_int(var, value, ctx->kvi);
+		if (group->decay < 0) {
+			warning(_("%s must be non-negative, using default"), var);
+			group->decay = DEFAULT_PSEUDO_MERGE_DECAY;
+		}
+	} else if (!strcmp(key, "samplerate")) {
+		group->sample_rate = git_config_int(var, value, ctx->kvi);
+		if (!(0 <= group->sample_rate && group->sample_rate <= 100)) {
+			warning(_("%s must be between 0 and 100, using default"), var);
+			group->sample_rate = DEFAULT_PSEUDO_MERGE_SAMPLE_RATE;
+		}
+	} else if (!strcmp(key, "threshold")) {
+		if (git_config_expiry_date(&group->threshold, var, value)) {
+			strbuf_release(&buf);
+			return -1;
+		}
+	} else if (!strcmp(key, "maxmerges")) {
+		group->max_merges = git_config_int(var, value, ctx->kvi);
+		if (group->max_merges < 0) {
+			warning(_("%s must be non-negative, using default"), var);
+			group->max_merges = DEFAULT_PSEUDO_MERGE_MAX_MERGES;
+		}
+	} else if (!strcmp(key, "stablethreshold")) {
+		if (git_config_expiry_date(&group->stable_threshold, var, value)) {
+			strbuf_release(&buf);
+			return -1;
+		}
+	} else if (!strcmp(key, "stablesize")) {
+		group->stable_size = git_config_int(var, value, ctx->kvi);
+		if (group->stable_size <= 0) {
+			warning(_("%s must be positive, using default"), var);
+			group->stable_size = DEFAULT_PSEUDO_MERGE_STABLE_SIZE;
+		}
+	}
+
+	strbuf_release(&buf);
+
+	return 0;
+}
+
+void load_pseudo_merges_from_config(struct string_list *list)
+{
+	struct string_list_item *item;
+
+	git_config(pseudo_merge_config, list);
+
+	for_each_string_list_item(item, list) {
+		struct pseudo_merge_group *group = item->util;
+		if (!group->pattern)
+			die(_("pseudo-merge group '%s' missing required pattern"),
+			    item->string);
+		if (group->threshold < group->stable_threshold)
+			die(_("pseudo-merge group '%s' has unstable threshold "
+			      "before stable one"), item->string);
+	}
+}
+
+static int find_pseudo_merge_group_for_ref(const char *refname,
+					   const struct object_id *oid,
+					   int flags UNUSED,
+					   void *_data)
+{
+	struct string_list *list = _data;
+	struct object_id peeled;
+	struct commit *c;
+	uint32_t i;
+	int has_bitmap;
+
+	if (!peel_iterated_oid(oid, &peeled))
+		oid = &peeled;
+
+	c = lookup_commit(the_repository, oid);
+	if (!c)
+		return 0;
+
+	has_bitmap = bitmap_writer_has_bitmapped_object_id(oid);
+
+	for (i = 0; i < list->nr; i++) {
+		struct pseudo_merge_group *group;
+		struct pseudo_merge_matches *matches;
+		struct strbuf group_name = STRBUF_INIT;
+		regmatch_t captures[16];
+		size_t j;
+
+		group = list->items[i].util;
+		if (regexec(group->pattern, refname, ARRAY_SIZE(captures),
+			    captures, 0))
+			continue;
+
+		if (captures[ARRAY_SIZE(captures) - 1].rm_so != -1)
+			warning(_("pseudo-merge regex from config has too many capture "
+				  "groups (max=%"PRIuMAX")"),
+				(uintmax_t)ARRAY_SIZE(captures) - 2);
+
+		for (j = !!group->pattern->re_nsub; j < ARRAY_SIZE(captures); j++) {
+			regmatch_t *match = &captures[j];
+			if (match->rm_so == -1)
+				continue;
+
+			if (group_name.len)
+				strbuf_addch(&group_name, '-');
+
+			strbuf_add(&group_name, refname + match->rm_so,
+				   match->rm_eo - match->rm_so);
+		}
+
+		matches = strmap_get(&group->matches, group_name.buf);
+		if (!matches) {
+			matches = xcalloc(1, sizeof(*matches));
+			strmap_put(&group->matches, strbuf_detach(&group_name, NULL),
+				   matches);
+		}
+
+		if (c->date <= group->stable_threshold) {
+			ALLOC_GROW(matches->stable, matches->stable_nr + 1,
+				   matches->stable_alloc);
+			matches->stable[matches->stable_nr++] = c;
+		} else if (c->date <= group->threshold && !has_bitmap) {
+			ALLOC_GROW(matches->unstable, matches->unstable_nr + 1,
+				   matches->unstable_alloc);
+			matches->unstable[matches->unstable_nr++] = c;
+		}
+
+		strbuf_release(&group_name);
+	}
+
+	return 0;
+}
+
+static struct commit *push_pseudo_merge(struct pseudo_merge_group *group)
+{
+	struct commit *merge;
+
+	ALLOC_GROW(group->merges, group->merges_nr + 1, group->merges_alloc);
+
+	merge = alloc_commit_node(the_repository);
+	merge->object.parsed = 1;
+	merge->object.flags |= BITMAP_PSEUDO_MERGE;
+
+	group->merges[group->merges_nr++] = merge;
+
+	return merge;
+}
+
+static struct pseudo_merge_commit_idx *pseudo_merge_idx(kh_oid_map_t *pseudo_merge_commits,
+							const struct object_id *oid)
+
+{
+	struct pseudo_merge_commit_idx *pmc;
+	khiter_t hash_pos;
+
+	hash_pos = kh_get_oid_map(pseudo_merge_commits, *oid);
+	if (hash_pos == kh_end(pseudo_merge_commits)) {
+		int hash_ret;
+		hash_pos = kh_put_oid_map(pseudo_merge_commits, *oid, &hash_ret);
+
+		CALLOC_ARRAY(pmc, 1);
+
+		kh_value(pseudo_merge_commits, hash_pos) = pmc;
+	} else {
+		pmc = kh_value(pseudo_merge_commits, hash_pos);
+	}
+
+	return pmc;
+}
+
+#define MIN_PSEUDO_MERGE_SIZE 8
+
+static void select_pseudo_merges_1(struct pseudo_merge_group *group,
+				   struct pseudo_merge_matches *matches,
+				   kh_oid_map_t *pseudo_merge_commits,
+				   uint32_t *pseudo_merges_nr)
+{
+	uint32_t i, j;
+	uint32_t stable_merges_nr;
+
+	if (!matches->stable_nr && !matches->unstable_nr)
+		return; /* all tips in this group already have bitmaps */
+
+	stable_merges_nr = matches->stable_nr / group->stable_size;
+	if (matches->stable_nr % group->stable_size)
+		stable_merges_nr++;
+
+	/* make stable_merges_nr pseudo merges for stable commits */
+	for (i = 0, j = 0; i < stable_merges_nr; i++) {
+		struct commit *merge;
+		struct commit_list **p;
+
+		merge = push_pseudo_merge(group);
+		p = &merge->parents;
+
+		do {
+			struct commit *c;
+			struct pseudo_merge_commit_idx *pmc;
+
+			if (j >= matches->stable_nr)
+				break;
+
+			c = matches->stable[j++];
+			pmc = pseudo_merge_idx(pseudo_merge_commits,
+					       &c->object.oid);
+
+			ALLOC_GROW(pmc->pseudo_merge, pmc->nr + 1, pmc->alloc);
+
+			pmc->pseudo_merge[pmc->nr++] = *pseudo_merges_nr;
+			p = commit_list_append(c, p);
+		} while (j % group->stable_size);
+
+		bitmap_writer_push_bitmapped_commit(merge, 1);
+		(*pseudo_merges_nr)++;
+	}
+
+	/* make up to group->max_merges pseudo merges for unstable commits */
+	for (i = 0, j = 0; i < group->max_merges; i++) {
+		struct commit *merge;
+		struct commit_list **p;
+		uint32_t size, end;
+
+		merge = push_pseudo_merge(group);
+		p = &merge->parents;
+
+		size = pseudo_merge_group_size(group, matches, i);
+		end = size < MIN_PSEUDO_MERGE_SIZE ? matches->unstable_nr : j + size;
+
+		for (; j < end && j < matches->unstable_nr; j++) {
+			struct commit *c = matches->unstable[j];
+			struct pseudo_merge_commit_idx *pmc;
+
+			if (!group->sample_rate || j % (100 / group->sample_rate))
+				continue;
+
+			pmc = pseudo_merge_idx(pseudo_merge_commits,
+					       &c->object.oid);
+
+			ALLOC_GROW(pmc->pseudo_merge, pmc->nr + 1, pmc->alloc);
+
+			pmc->pseudo_merge[pmc->nr++] = *pseudo_merges_nr;
+			p = commit_list_append(c, p);
+		}
+
+		bitmap_writer_push_bitmapped_commit(merge, 1);
+		(*pseudo_merges_nr)++;
+		if (end >= matches->unstable_nr)
+			break;
+	}
+}
+
+static int commit_date_cmp(const void *va, const void *vb)
+{
+	timestamp_t a = (*(const struct commit **)va)->date;
+	timestamp_t b = (*(const struct commit **)vb)->date;
+
+	if (a < b)
+		return -1;
+	else if (a > b)
+		return 1;
+	return 0;
+}
+
+static void sort_pseudo_merge_matches(struct pseudo_merge_matches *matches)
+{
+	QSORT(matches->stable, matches->stable_nr, commit_date_cmp);
+	QSORT(matches->unstable, matches->unstable_nr, commit_date_cmp);
+}
+
+void select_pseudo_merges(struct string_list *list,
+			  struct commit **commits, size_t commits_nr,
+			  kh_oid_map_t *pseudo_merge_commits,
+			  uint32_t *pseudo_merges_nr,
+			  unsigned show_progress)
+{
+	struct progress *progress = NULL;
+	uint32_t i;
+
+	if (!list->nr)
+		return;
+
+	if (show_progress)
+		progress = start_progress("Selecting pseudo-merge commits", list->nr);
+
+	for_each_ref(find_pseudo_merge_group_for_ref, list);
+
+	for (i = 0; i < list->nr; i++) {
+		struct pseudo_merge_group *group;
+		struct hashmap_iter iter;
+		struct strmap_entry *e;
+
+		group = list->items[i].util;
+		strmap_for_each_entry(&group->matches, &iter, e) {
+			struct pseudo_merge_matches *matches = e->value;
+
+			sort_pseudo_merge_matches(matches);
+
+			select_pseudo_merges_1(group, matches,
+					       pseudo_merge_commits,
+					       pseudo_merges_nr);
+		}
+
+		display_progress(progress, i + 1);
+	}
+
+	stop_progress(&progress);
+}
diff --git a/pseudo-merge.h b/pseudo-merge.h
index cab8ff6960a..81888731864 100644
--- a/pseudo-merge.h
+++ b/pseudo-merge.h
@@ -2,5 +2,101 @@
 #define PSEUDO_MERGE_H
 
 #include "git-compat-util.h"
+#include "strmap.h"
+#include "khash.h"
+#include "ewah/ewok.h"
+
+struct commit;
+struct string_list;
+struct bitmap_index;
+
+/*
+ * A pseudo-merge group tracks the set of non-bitmapped reference tips
+ * that match the given pattern.
+ *
+ * Within those matches, they are further segmented by capture group:
+ * the text of consecutive capture groups is joined with '-' dash
+ * characters to name each sub-group.
+ *
+ * Those groups are then ordered by committer date and partitioned
+ * into individual pseudo-merge(s) according to the decay, max_merges,
+ * sample_rate, and threshold parameters.
+ */
+struct pseudo_merge_group {
+	regex_t *pattern;
+
+	/* capture group(s) -> struct pseudo_merge_matches */
+	struct strmap matches;
+
+	/*
+	 * The individual pseudo-merge(s) that are generated from the
+	 * above array of matches, partitioned according to the below
+	 * parameters.
+	 */
+	struct commit **merges;
+	size_t merges_nr;
+	size_t merges_alloc;
+
+	/*
+	 * Pseudo-merge grouping parameters. See git-config(1) for
+	 * more information.
+	 */
+	float decay;
+	int max_merges;
+	int sample_rate;
+	int stable_size;
+	timestamp_t threshold;
+	timestamp_t stable_threshold;
+};
+
+struct pseudo_merge_matches {
+	struct commit **stable;
+	struct commit **unstable;
+	size_t stable_nr, stable_alloc;
+	size_t unstable_nr, unstable_alloc;
+};
+
+/*
+ * Read the repository's configuration:
+ *
+ *   - bitmapPseudoMerge.<name>.pattern
+ *   - bitmapPseudoMerge.<name>.decay
+ *   - bitmapPseudoMerge.<name>.sampleRate
+ *   - bitmapPseudoMerge.<name>.threshold
+ *   - bitmapPseudoMerge.<name>.maxMerges
+ *   - bitmapPseudoMerge.<name>.stableThreshold
+ *   - bitmapPseudoMerge.<name>.stableSize
+ *
+ * and populates the given `list` with pseudo-merge groups. String
+ * entry keys are the pseudo-merge group names, and the values are
+ * pointers to the pseudo_merge_group structure itself.
+ */
+void load_pseudo_merges_from_config(struct string_list *list);
+
+/*
+ * A pseudo-merge commit index (pseudo_merge_commit_idx) maps a
+ * particular (non-pseudo-merge) commit to the list of pseudo-merge(s)
+ * it appears in.
+ */
+struct pseudo_merge_commit_idx {
+	uint32_t *pseudo_merge;
+	size_t nr, alloc;
+};
+
+/*
+ * Selects pseudo-merges from a list of commits, populating the given
+ * string_list of pseudo-merge groups.
+ *
+ * Populates the pseudo_merge_commits map with a commit_idx
+ * corresponding to each commit in the list. Counts the total number
+ * of pseudo-merges generated.
+ *
+ * Optionally shows a progress meter.
+ */
+void select_pseudo_merges(struct string_list *list,
+			  struct commit **commits, size_t commits_nr,
+			  kh_oid_map_t *pseudo_merge_commits,
+			  uint32_t *pseudo_merges_nr,
+			  unsigned show_progress);
 
 #endif
-- 
2.45.0.23.gc6f94b99219
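
The sub-group naming done by find_pseudo_merge_group_for_ref() in the
patch above can be tried in isolation. What follows is a minimal sketch,
not code from the series: subgroup_name() and MAX_CAPTURES are
hypothetical names, and it assumes plain POSIX regcomp()/regexec() with a
fixed-size output buffer in place of Git's strbuf.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

#define MAX_CAPTURES 16 /* hypothetical; mirrors the size of captures[] */

/*
 * Hypothetical helper: match `refname` against the extended regex
 * `pattern` and write the sub-group name into `out`. Returns 0 on a
 * match, -1 if the pattern does not compile, does not match, or the
 * name does not fit into `out`.
 */
static int subgroup_name(const char *pattern, const char *refname,
			 char *out, size_t out_size)
{
	regex_t re;
	regmatch_t captures[MAX_CAPTURES];
	size_t j, len = 0;
	int ret = -1;

	assert(out && out_size);
	out[0] = '\0';

	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;
	if (regexec(&re, refname, MAX_CAPTURES, captures, 0))
		goto done;

	/*
	 * Start at 1 when the pattern has capture groups (skipping the
	 * whole-match entry at index 0); otherwise use the whole match.
	 */
	for (j = !!re.re_nsub; j < MAX_CAPTURES; j++) {
		size_t n;

		if (captures[j].rm_so == -1)
			continue; /* unused or non-participating group */

		n = (size_t)(captures[j].rm_eo - captures[j].rm_so);
		/* leave room for an optional '-', the text, and the NUL */
		if (len + n + 2 > out_size)
			goto done;

		if (len)
			out[len++] = '-';
		memcpy(out + len, refname + captures[j].rm_so, n);
		len += n;
		out[len] = '\0';
	}
	ret = 0;
done:
	regfree(&re);
	return ret;
}
```

With the pattern `refs/remotes/([0-9]+)/tags/`, the ref
`refs/remotes/42/tags/v1.0` lands in sub-group "42". A pattern without
capture groups, such as `refs/tags/`, uses the text of the whole match
as the name, so all matching refs share one sub-group.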



* [PATCH v2 10/23] pack-bitmap-write.c: select pseudo-merge commits
  2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
                     ` (8 preceding siblings ...)
  2024-04-29 20:43   ` [PATCH v2 09/23] pseudo-merge: implement support for selecting pseudo-merge commits Taylor Blau
@ 2024-04-29 20:43   ` Taylor Blau
  2024-05-06 11:53     ` Patrick Steinhardt
  2024-04-29 20:43   ` [PATCH v2 11/23] pack-bitmap-write.c: write pseudo-merge table Taylor Blau
                     ` (13 subsequent siblings)
  23 siblings, 1 reply; 157+ messages in thread
From: Taylor Blau @ 2024-04-29 20:43 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Now that the pseudo-merge machinery has learned how to select
non-bitmapped commits and assign them into different pseudo-merge
group(s), invoke this new API from within the pack-bitmap internals and
store the results off.

Note that the selected pseudo-merge commits aren't actually used or
written anywhere yet. This will be done in the following commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/config.txt                     |  2 +
 Documentation/config/bitmap-pseudo-merge.txt | 75 ++++++++++++++++++++
 Documentation/technical/bitmap-format.txt    | 26 +++++++
 pack-bitmap-write.c                          | 14 ++++
 4 files changed, 117 insertions(+)
 create mode 100644 Documentation/config/bitmap-pseudo-merge.txt

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 70b448b1326..bbedb7b9a06 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -383,6 +383,8 @@ include::config/apply.txt[]
 
 include::config/attr.txt[]
 
+include::config/bitmap-pseudo-merge.txt[]
+
 include::config/blame.txt[]
 
 include::config/branch.txt[]
diff --git a/Documentation/config/bitmap-pseudo-merge.txt b/Documentation/config/bitmap-pseudo-merge.txt
new file mode 100644
index 00000000000..90b72522046
--- /dev/null
+++ b/Documentation/config/bitmap-pseudo-merge.txt
@@ -0,0 +1,75 @@
+bitmapPseudoMerge.<name>.pattern::
+	Regular expression used to match reference names. Commits
+	pointed to by references matching this pattern (and meeting
+	the below criteria, like `bitmapPseudoMerge.<name>.sampleRate`
+	and `bitmapPseudoMerge.<name>.threshold`) will be considered
+	for inclusion in a pseudo-merge bitmap.
++
+Commits are grouped into pseudo-merge groups based on whether or not
+any reference(s) that point at a given commit match the pattern, which
+is an extended regular expression.
++
+Within a pseudo-merge group, commits may be further grouped into
+sub-groups based on the capture groups in the pattern. These
+sub-groupings are formed from the regular expressions by concatenating
+any capture groups from the regular expression, with a '-' dash in
+between.
++
+For example, if the pattern is `refs/tags/`, then all tags (provided
+they meet the below criteria) will be considered candidates for the
+same pseudo-merge group. However, if the pattern is instead
+`refs/remotes/([0-9]+)/tags/`, then tags from different remotes will
+be grouped into separate pseudo-merge groups, based on the remote
+number.
+
+bitmapPseudoMerge.<name>.decay::
+	Determines the rate at which consecutive pseudo-merge bitmap
+	groups decrease in size. Must be non-negative. This parameter
+	can be thought of as `k` in the function `f(n) = C *
+	n^(-k/100)`, where `f(n)` is the size of the `n`th group.
++
+Setting the decay rate equal to `0` will cause all groups to be the
+same size. Setting the decay rate equal to `100` will cause the `n`th
+group to be `1/n` the size of the initial group.  Higher values of the
+decay rate cause consecutive groups to shrink at an increasing rate.
+The default is `100`.
+
+bitmapPseudoMerge.<name>.sampleRate::
+	Determines the proportion of non-bitmapped commits (among
+	reference tips) which are selected for inclusion in an
+	unstable pseudo-merge bitmap. Must be between `0` and `100`
+	(inclusive). The default is `100`.
+
+bitmapPseudoMerge.<name>.threshold::
+	Determines the minimum age of non-bitmapped commits (among
+	reference tips, as above) which are candidates for inclusion
+	in an unstable pseudo-merge bitmap. The default is
+	`1.week.ago`.
+
+bitmapPseudoMerge.<name>.maxMerges::
+	Determines the maximum number of pseudo-merge commits among
+	which commits may be distributed.
++
+For pseudo-merge groups whose pattern does not contain any capture
+groups, this setting is applied for all commits matching the regular
+expression. For patterns that have one or more capture groups, this
+setting is applied for each distinct capture group.
++
+For example, if your pattern is `refs/tags/`, then this setting
+will distribute all tags into a maximum of `maxMerges` pseudo-merge
+commits. However, if your pattern is, say,
+`refs/remotes/([0-9]+)/tags/`, then this setting will be applied to
+each remote's set of tags individually.
++
+Must be non-negative. The default value is 64.
+
+bitmapPseudoMerge.<name>.stableThreshold::
+	Determines the minimum age of commits (among reference tips,
+	as above, however stable commits are still considered
+	candidates even when they have been covered by a bitmap) which
+	are candidates for a stable pseudo-merge bitmap. The default
+	is `1.month.ago`.
+
+bitmapPseudoMerge.<name>.stableSize::
+	Determines the size (in number of commits) of a stable
+	pseudo-merge bitmap. The default is `512`.
diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index 63a7177ac08..ed7edf98034 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -434,3 +434,29 @@ the end of a `.bitmap` file. The format is as follows:
 
 * An 8-byte unsigned value (in network byte-order) equal to the number
   of bytes in the pseudo-merge section (including this field).
+
+=== Pseudo-merge selection
+
+Pseudo-merge commits are selected among non-bitmapped commits at the
+tip of one or more reference(s). In addition, there are a handful of
+constraints to further refine this selection:
+
+`bitmapPseudoMerge.<name>.decay`:: Defines the "decay rate", which
+corresponds to how quickly (or not) consecutive pseudo-merges decrease
+in size relative to one another.
+
+`bitmapPseudoMerge.<name>.maxMerges`:: Defines the maximum number of
+pseudo-merge groups.
+
+`bitmapPseudoMerge.<name>.sampleRate`:: Defines the percentage of
+commits (matching the above criteria) which are selected.
+
+`bitmapPseudoMerge.<name>.threshold`:: Defines the minimum age of a
+commit in order to be considered for inclusion within one or more
+pseudo-merge bitmaps.
+
+The size of consecutive pseudo-merge groups decays according to a
+power-law decay function, where the size of the `n`-th group is `f(n) =
+C*n^-k`. The value of `C` is chosen to match the number of
+desired groups, and `k` is 1/100th of the value of
+`bitmapPseudoMerge.<name>.decay`.
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index dab5bdea806..e06930e10b9 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -17,6 +17,7 @@
 #include "trace2.h"
 #include "tree.h"
 #include "tree-walk.h"
+#include "pseudo-merge.h"
 
 struct bitmapped_commit {
 	struct commit *commit;
@@ -39,6 +40,8 @@ struct bitmap_writer {
 	struct bitmapped_commit *selected;
 	unsigned int selected_nr, selected_alloc;
 
+	struct string_list pseudo_merge_groups;
+	kh_oid_map_t *pseudo_merge_commits; /* oid -> pseudo merge(s) */
 	uint32_t pseudo_merges_nr;
 
 	struct progress *progress;
@@ -56,6 +59,11 @@ static inline int bitmap_writer_selected_nr(void)
 void bitmap_writer_init(struct repository *r)
 {
 	writer.bitmaps = kh_init_oid_map();
+	writer.pseudo_merge_commits = kh_init_oid_map();
+
+	string_list_init_dup(&writer.pseudo_merge_groups);
+
+	load_pseudo_merges_from_config(&writer.pseudo_merge_groups);
 }
 
 void bitmap_writer_show_progress(int show)
@@ -686,6 +694,12 @@ void bitmap_writer_select_commits(struct commit **indexed_commits,
 	}
 
 	stop_progress(&writer.progress);
+
+	select_pseudo_merges(&writer.pseudo_merge_groups,
+			     indexed_commits, indexed_commits_nr,
+			     writer.pseudo_merge_commits,
+			     &writer.pseudo_merges_nr,
+			     writer.show_progress);
 }
 
 
-- 
2.45.0.23.gc6f94b99219
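
The `threshold`, `stableThreshold` and `stableSize` settings documented
above map onto two small decisions in the selection code from the
previous patch. The sketch below restates them under stated assumptions:
classify_tip(), stable_merge_count() and the `tip_class` enum are
hypothetical names, and `timestamp_t` is a stand-in typedef rather than
Git's own definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t timestamp_t; /* stand-in for Git's timestamp_t */

enum tip_class { TIP_SKIPPED, TIP_STABLE, TIP_UNSTABLE };

/*
 * Decide which bucket a reference tip falls into. Both thresholds are
 * absolute timestamps; a smaller value means "older".
 */
static enum tip_class classify_tip(timestamp_t commit_date, int has_bitmap,
				   timestamp_t threshold,
				   timestamp_t stable_threshold)
{
	/* old enough to be stable: considered even if already bitmapped */
	if (commit_date <= stable_threshold)
		return TIP_STABLE;
	/* otherwise only un-bitmapped tips older than `threshold` count */
	if (commit_date <= threshold && !has_bitmap)
		return TIP_UNSTABLE;
	return TIP_SKIPPED;
}

/* Number of stable pseudo-merges: stable_nr / stable_size, rounded up. */
static size_t stable_merge_count(size_t stable_nr, size_t stable_size)
{
	size_t n;

	assert(stable_size > 0);
	n = stable_nr / stable_size;
	if (stable_nr % stable_size)
		n++;
	return n;
}
```

For instance, 513 stable tips with a `stableSize` of 512 yield two
stable pseudo-merges: one full, and one holding the single leftover tip.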



* [PATCH v2 11/23] pack-bitmap-write.c: write pseudo-merge table
  2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
                     ` (9 preceding siblings ...)
  2024-04-29 20:43   ` [PATCH v2 10/23] pack-bitmap-write.c: select " Taylor Blau
@ 2024-04-29 20:43   ` Taylor Blau
  2024-04-29 20:43   ` [PATCH v2 12/23] pack-bitmap: extract `read_bitmap()` function Taylor Blau
                     ` (12 subsequent siblings)
  23 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-04-29 20:43 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Now that the pack-bitmap writer machinery understands how to select and
store pseudo-merge commits, teach it how to write the new optional
pseudo-merge .bitmap extension.

No readers yet exist for this new extension to the .bitmap format. The
following commits will take any preparatory step(s) necessary before
then implementing the routines necessary to read this new table.

In the meantime, the new `write_pseudo_merges()` function implements
writing this new format as described by a previous commit in
Documentation/technical/bitmap-format.txt.

Writing this table is fairly straightforward and consists of a few
sub-components:

  - a pair of bitmaps for each pseudo-merge (one for the pseudo-merge
    "parents", and another for the objects reachable from those parents)

  - for each commit, the offset of either (a) the pseudo-merge it
    belongs to, or (b) an extended lookup table if it belongs to >1
    pseudo-merge groups

  - if there are any commits belonging to >1 pseudo-merge group, the
    extended lookup tables (which each consist of the number of
    pseudo-merge groups a commit appears in, and then that many 8-byte
    unsigned offsets, one per pseudo-merge)

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c | 128 ++++++++++++++++++++++++++++++++++++++++++++
 pack-bitmap.h       |   1 +
 2 files changed, 129 insertions(+)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index e06930e10b9..d4894ace9ee 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -18,6 +18,7 @@
 #include "tree.h"
 #include "tree-walk.h"
 #include "pseudo-merge.h"
+#include "oid-array.h"
 
 struct bitmapped_commit {
 	struct commit *commit;
@@ -748,6 +749,127 @@ static void write_selected_commits_v1(struct hashfile *f,
 	}
 }
 
+static void write_pseudo_merges(struct hashfile *f)
+{
+	struct oid_array commits = OID_ARRAY_INIT;
+	struct bitmap **commits_bitmap = NULL;
+	off_t *pseudo_merge_ofs = NULL;
+	off_t start, table_start, next_ext;
+
+	uint32_t base = bitmap_writer_selected_nr();
+	size_t i, j = 0;
+
+	CALLOC_ARRAY(commits_bitmap, writer.pseudo_merges_nr);
+	CALLOC_ARRAY(pseudo_merge_ofs, writer.pseudo_merges_nr);
+
+	for (i = 0; i < writer.pseudo_merges_nr; i++) {
+		struct bitmapped_commit *merge = &writer.selected[base + i];
+		struct commit_list *p;
+
+		if (!merge->pseudo_merge)
+			BUG("found non-pseudo merge commit at %"PRIuMAX, (uintmax_t)i);
+
+		commits_bitmap[i] = bitmap_new();
+
+		for (p = merge->commit->parents; p; p = p->next)
+			bitmap_set(commits_bitmap[i],
+				   find_object_pos(&p->item->object.oid, NULL));
+	}
+
+	start = hashfile_total(f);
+
+	for (i = 0; i < writer.pseudo_merges_nr; i++) {
+		struct ewah_bitmap *commits_ewah = bitmap_to_ewah(commits_bitmap[i]);
+
+		pseudo_merge_ofs[i] = hashfile_total(f);
+
+		dump_bitmap(f, commits_ewah);
+		dump_bitmap(f, writer.selected[base+i].write_as);
+
+		ewah_free(commits_ewah);
+	}
+
+	next_ext = st_add(hashfile_total(f),
+			  st_mult(kh_size(writer.pseudo_merge_commits),
+				  sizeof(uint64_t)));
+
+	table_start = hashfile_total(f);
+
+	commits.alloc = kh_size(writer.pseudo_merge_commits);
+	CALLOC_ARRAY(commits.oid, commits.alloc);
+
+	for (i = kh_begin(writer.pseudo_merge_commits); i != kh_end(writer.pseudo_merge_commits); i++) {
+		if (!kh_exist(writer.pseudo_merge_commits, i))
+			continue;
+		oid_array_append(&commits, &kh_key(writer.pseudo_merge_commits, i));
+	}
+
+	oid_array_sort(&commits);
+
+	/* write lookup table (non-extended) */
+	for (i = 0; i < commits.nr; i++) {
+		int hash_pos;
+		struct pseudo_merge_commit_idx *c;
+
+		hash_pos = kh_get_oid_map(writer.pseudo_merge_commits,
+					  commits.oid[i]);
+		if (hash_pos == kh_end(writer.pseudo_merge_commits))
+			BUG("could not find pseudo-merge commit %s",
+			    oid_to_hex(&commits.oid[i]));
+
+		c = kh_value(writer.pseudo_merge_commits, hash_pos);
+
+		hashwrite_be32(f, find_object_pos(&commits.oid[i], NULL));
+		if (c->nr == 1)
+			hashwrite_be64(f, pseudo_merge_ofs[c->pseudo_merge[0]]);
+		else if (c->nr > 1) {
+			if (next_ext & ((uint64_t)1<<63))
+				die(_("too many pseudo-merges"));
+			hashwrite_be64(f, next_ext | ((uint64_t)1<<63));
+			next_ext = st_add3(next_ext,
+					   sizeof(uint32_t),
+					   st_mult(c->nr, sizeof(uint64_t)));
+		} else
+			BUG("expected commit '%s' to have at least one "
+			    "pseudo-merge", oid_to_hex(&commits.oid[i]));
+	}
+
+	/* write lookup table (extended) */
+	for (i = 0; i < commits.nr; i++) {
+		int hash_pos;
+		struct pseudo_merge_commit_idx *c;
+
+		hash_pos = kh_get_oid_map(writer.pseudo_merge_commits,
+					  commits.oid[i]);
+		if (hash_pos == kh_end(writer.pseudo_merge_commits))
+			BUG("could not find pseudo-merge commit %s",
+			    oid_to_hex(&commits.oid[i]));
+
+		c = kh_value(writer.pseudo_merge_commits, hash_pos);
+		if (c->nr == 1)
+			continue;
+
+		hashwrite_be32(f, c->nr);
+		for (j = 0; j < c->nr; j++)
+			hashwrite_be64(f, pseudo_merge_ofs[c->pseudo_merge[j]]);
+	}
+
+	/* write positions for all pseudo merges */
+	for (i = 0; i < writer.pseudo_merges_nr; i++)
+		hashwrite_be64(f, pseudo_merge_ofs[i]);
+
+	hashwrite_be32(f, writer.pseudo_merges_nr);
+	hashwrite_be32(f, kh_size(writer.pseudo_merge_commits));
+	hashwrite_be64(f, table_start - start);
+	hashwrite_be64(f, hashfile_total(f) - start + sizeof(uint64_t));
+
+	for (i = 0; i < writer.pseudo_merges_nr; i++)
+		bitmap_free(commits_bitmap[i]);
+
+	free(pseudo_merge_ofs);
+	free(commits_bitmap);
+}
+
 static int table_cmp(const void *_va, const void *_vb, void *_data)
 {
 	uint32_t *commit_positions = _data;
@@ -855,6 +977,9 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 
 	int fd = odb_mkstemp(&tmp_file, "pack/tmp_bitmap_XXXXXX");
 
+	if (writer.pseudo_merges_nr)
+		options |= BITMAP_OPT_PSEUDO_MERGES;
+
 	f = hashfd(fd, tmp_file.buf);
 
 	memcpy(header.magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE));
@@ -886,6 +1011,9 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 
 	write_selected_commits_v1(f, commit_positions, offsets);
 
+	if (options & BITMAP_OPT_PSEUDO_MERGES)
+		write_pseudo_merges(f);
+
 	if (options & BITMAP_OPT_LOOKUP_TABLE)
 		write_lookup_table(f, commit_positions, offsets);
 
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 0f539d79cfd..55527f61cd9 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -37,6 +37,7 @@ enum pack_bitmap_opts {
 	BITMAP_OPT_FULL_DAG = 0x1,
 	BITMAP_OPT_HASH_CACHE = 0x4,
 	BITMAP_OPT_LOOKUP_TABLE = 0x10,
+	BITMAP_OPT_PSEUDO_MERGES = 0x20,
 };
 
 enum pack_bitmap_flags {
-- 
2.45.0.23.gc6f94b99219



* [PATCH v2 12/23] pack-bitmap: extract `read_bitmap()` function
  2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
                     ` (10 preceding siblings ...)
  2024-04-29 20:43   ` [PATCH v2 11/23] pack-bitmap-write.c: write pseudo-merge table Taylor Blau
@ 2024-04-29 20:43   ` Taylor Blau
  2024-04-29 20:43   ` [PATCH v2 13/23] pseudo-merge: scaffolding for reads Taylor Blau
                     ` (11 subsequent siblings)
  23 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-04-29 20:43 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

The pack-bitmap machinery uses the `read_bitmap_1()` function to read a
bitmap from within the mmap'd region corresponding to the .bitmap file.
As a side-effect of calling this function, `read_bitmap_1()` increments
the `index->map_pos` variable to reflect the number of bytes read.

Extract the core of this routine into a separate function that operates
on a `const unsigned char *`, a `size_t`, and a `size_t *` instead of a
`struct bitmap_index *` pointer.

This function (called `read_bitmap()`) is part of the pack-bitmap.h API
so that it can be used within the upcoming portion of the implementation
in pseudo-merge.[ch].

Rewrite the existing function, `read_bitmap_1()`, in terms of its more
generic counterpart.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 24 +++++++++++++++---------
 pack-bitmap.h |  2 ++
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 35c5ef9d3cd..3519edb896b 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -129,17 +129,13 @@ static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
 	return composed;
 }
 
-/*
- * Read a bitmap from the current read position on the mmaped
- * index, and increase the read position accordingly
- */
-static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
+struct ewah_bitmap *read_bitmap(const unsigned char *map,
+				size_t map_size, size_t *map_pos)
 {
 	struct ewah_bitmap *b = ewah_pool_new();
 
-	ssize_t bitmap_size = ewah_read_mmap(b,
-		index->map + index->map_pos,
-		index->map_size - index->map_pos);
+	ssize_t bitmap_size = ewah_read_mmap(b, map + *map_pos,
+					     map_size - *map_pos);
 
 	if (bitmap_size < 0) {
 		error(_("failed to load bitmap index (corrupted?)"));
@@ -147,10 +143,20 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
 		return NULL;
 	}
 
-	index->map_pos += bitmap_size;
+	*map_pos += bitmap_size;
+
 	return b;
 }
 
+/*
+ * Read a bitmap from the current read position on the mmaped
+ * index, and increase the read position accordingly
+ */
+static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
+{
+	return read_bitmap(index->map, index->map_size, &index->map_pos);
+}
+
 static uint32_t bitmap_num_objects(struct bitmap_index *index)
 {
 	if (index->midx)
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 55527f61cd9..a5fe4f305ef 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -133,4 +133,6 @@ int bitmap_is_preferred_refname(struct repository *r, const char *refname);
 
 int verify_bitmap_files(struct repository *r);
 
+struct ewah_bitmap *read_bitmap(const unsigned char *map,
+				size_t map_size, size_t *map_pos);
 #endif
-- 
2.45.0.23.gc6f94b99219



* [PATCH v2 13/23] pseudo-merge: scaffolding for reads
  2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
                     ` (11 preceding siblings ...)
  2024-04-29 20:43   ` [PATCH v2 12/23] pack-bitmap: extract `read_bitmap()` function Taylor Blau
@ 2024-04-29 20:43   ` Taylor Blau
  2024-04-29 20:43   ` [PATCH v2 14/23] pack-bitmap.c: read pseudo-merge extension Taylor Blau
                     ` (10 subsequent siblings)
  23 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-04-29 20:43 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Implement scaffolding within the new pseudo-merge compilation unit
necessary to use the pseudo-merge API from within the pack-bitmap.c
machinery.

The core of this scaffolding is two-fold:

  - The `pseudo_merge` structure itself, which represents an individual
    pseudo-merge bitmap. It has fields for both bitmaps, as well as
    metadata about its position within the memory-mapped region, and
    a few extra bits indicating whether or not it is satisfied, and
    which bitmap(s), if any, have been read, since they are initialized
    lazily.

  - The `pseudo_merge_map` structure, which holds an array of
    pseudo_merges, as well as a pointer to the memory-mapped region
    containing the pseudo-merge serialization from within a .bitmap
    file.

Note that the `bitmap_index` structure is defined statically within the
pack-bitmap.o compilation unit, so we can't take in a `struct
bitmap_index *`. Instead, wrap the primary components necessary to read
the pseudo-merges in this new structure to avoid exposing the
implementation details of the `bitmap_index` structure.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pseudo-merge.c | 10 ++++++++
 pseudo-merge.h | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+)

diff --git a/pseudo-merge.c b/pseudo-merge.c
index caccef942a1..d18de0a266b 100644
--- a/pseudo-merge.c
+++ b/pseudo-merge.c
@@ -441,3 +441,13 @@ void select_pseudo_merges(struct string_list *list,
 
 	stop_progress(&progress);
 }
+
+void free_pseudo_merge_map(struct pseudo_merge_map *pm)
+{
+	uint32_t i;
+	for (i = 0; i < pm->nr; i++) {
+		ewah_pool_free(pm->v[i].commits);
+		ewah_pool_free(pm->v[i].bitmap);
+	}
+	free(pm->v);
+}
diff --git a/pseudo-merge.h b/pseudo-merge.h
index 81888731864..2f652fc6767 100644
--- a/pseudo-merge.h
+++ b/pseudo-merge.h
@@ -99,4 +99,69 @@ void select_pseudo_merges(struct string_list *list,
 			  uint32_t *pseudo_merges_nr,
 			  unsigned show_progress);
 
+/*
+ * Represents a serialized view of a file containing pseudo-merge(s)
+ * (see Documentation/technical/bitmap-format.txt for a specification
+ * of the format).
+ */
+struct pseudo_merge_map {
+	/*
+	 * An array of pseudo-merge(s), lazily loaded from the .bitmap
+	 * file.
+	 */
+	struct pseudo_merge *v;
+	size_t nr;
+	size_t commits_nr;
+
+	/*
+	 * Pointers into a memory-mapped view of the .bitmap file:
+	 *
+	 *   - map: the beginning of the .bitmap file
+	 *   - commits: the beginning of the pseudo-merge commit index
+	 *   - map_size: the size of the .bitmap file
+	 */
+	const unsigned char *map;
+	const unsigned char *commits;
+
+	size_t map_size;
+};
+
+/*
+ * An individual pseudo-merge, storing a pair of lazily-loaded
+ * bitmaps:
+ *
+ *  - commits: the set of commit(s) that are part of the pseudo-merge
+ *  - bitmap: the set of object(s) reachable from the above set of
+ *    commits.
+ *
+ * The `at` and `bitmap_at` fields are used to store the locations of
+ * each of the above bitmaps in the .bitmap file.
+ */
+struct pseudo_merge {
+	struct ewah_bitmap *commits;
+	struct ewah_bitmap *bitmap;
+
+	off_t at;
+	off_t bitmap_at;
+
+	/*
+	 * `satisfied` indicates whether the given pseudo-merge has been
+	 * used.
+	 *
+	 * `loaded_commits` and `loaded_bitmap` indicate whether the
+	 * respective bitmaps have been loaded and read from the
+	 * .bitmap file.
+	 */
+	unsigned satisfied : 1,
+		 loaded_commits : 1,
+		 loaded_bitmap : 1;
+};
+
+/*
+ * Frees the given pseudo-merge map, releasing any memory held by (a)
+ * parsed EWAH bitmaps, or (b) the array of pseudo-merges itself. Does
+ * not free the memory-mapped view of the .bitmap file.
+ */
+void free_pseudo_merge_map(struct pseudo_merge_map *pm);
+
 #endif
-- 
2.45.0.23.gc6f94b99219
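
The `loaded_commits` and `loaded_bitmap` bits exist because each
pseudo-merge only records where its data lives and parses it on first
use. A rough sketch of that lazy-loading shape follows. The toy_merge
and toy_map structures and toy_merge_commits() are hypothetical
simplifications: "parsing" is reduced to taking a pointer into the
mapped buffer, and the `parses` counter exists only to make the
caching visible.

```c
#include <assert.h>
#include <stddef.h>

struct toy_merge {
	const unsigned char *commits; /* stand-in for a parsed bitmap */
	size_t at;                    /* where its data starts in `map` */
	unsigned loaded_commits : 1;
};

struct toy_map {
	struct toy_merge *v;
	size_t nr;
	const unsigned char *map;
	size_t map_size;
	unsigned parses; /* how many times data was actually "parsed" */
};

/*
 * Return the data of the i-th entry, loading it on first access only.
 * Returns NULL if the recorded offset lies outside the mapped region.
 */
static const unsigned char *toy_merge_commits(struct toy_map *pm, size_t i)
{
	struct toy_merge *merge;

	assert(pm && i < pm->nr);
	merge = &pm->v[i];

	if (!merge->loaded_commits) {
		if (merge->at >= pm->map_size)
			return NULL;
		merge->commits = pm->map + merge->at;
		merge->loaded_commits = 1;
		pm->parses++;
	}
	return merge->commits;
}
```

Repeated lookups of the same entry hit the cached pointer, so the cost
of parsing is only paid for pseudo-merges that a traversal actually
consults.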



* [PATCH v2 14/23] pack-bitmap.c: read pseudo-merge extension
  2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
                     ` (12 preceding siblings ...)
  2024-04-29 20:43   ` [PATCH v2 13/23] pseudo-merge: scaffolding for reads Taylor Blau
@ 2024-04-29 20:43   ` Taylor Blau
  2024-04-29 20:44   ` [PATCH v2 15/23] pseudo-merge: implement support for reading pseudo-merge commits Taylor Blau
                     ` (9 subsequent siblings)
  23 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-04-29 20:43 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Now that the scaffolding for reading the pseudo-merge extension has been
laid, teach the pack-bitmap machinery to read the pseudo-merge extension
when present.

Note that pseudo-merges themselves are not yet used during traversal;
this step will be taken by a future commit.

In the meantime, read the table and initialize the pseudo_merge_map
structure introduced by a previous commit. When the pseudo-merge
extension is present, `load_bitmap_header()` performs basic sanity
checks to make sure that the table is well-formed.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 3519edb896b..fc9c3e2fc43 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -20,6 +20,7 @@
 #include "list-objects-filter-options.h"
 #include "midx.h"
 #include "config.h"
+#include "pseudo-merge.h"
 
 /*
  * An entry on the bitmap index, representing the bitmap for a given
@@ -86,6 +87,9 @@ struct bitmap_index {
 	 */
 	unsigned char *table_lookup;
 
+	/* This contains the pseudo-merge cache within 'map' (if found). */
+	struct pseudo_merge_map pseudo_merges;
+
 	/*
 	 * Extended index.
 	 *
@@ -205,6 +209,41 @@ static int load_bitmap_header(struct bitmap_index *index)
 				index->table_lookup = (void *)(index_end - table_size);
 			index_end -= table_size;
 		}
+
+		if (flags & BITMAP_OPT_PSEUDO_MERGES) {
+			unsigned char *pseudo_merge_ofs;
+			size_t table_size;
+			uint32_t i;
+
+			if (sizeof(table_size) > index_end - index->map - header_size)
+				return error(_("corrupted bitmap index file (too short to fit pseudo-merge table header)"));
+
+			table_size = get_be64(index_end - 8);
+			if (table_size > index_end - index->map - header_size)
+				return error(_("corrupted bitmap index file (too short to fit pseudo-merge table)"));
+
+			if (git_env_bool("GIT_TEST_USE_PSEUDO_MERGES", 1)) {
+				const unsigned char *ext = (index_end - table_size);
+
+				index->pseudo_merges.map = index->map;
+				index->pseudo_merges.map_size = index->map_size;
+				index->pseudo_merges.commits = ext + get_be64(index_end - 16);
+				index->pseudo_merges.commits_nr = get_be32(index_end - 20);
+				index->pseudo_merges.nr = get_be32(index_end - 24);
+
+				CALLOC_ARRAY(index->pseudo_merges.v,
+					     index->pseudo_merges.nr);
+
+				pseudo_merge_ofs = index_end - 24 -
+					(index->pseudo_merges.nr * sizeof(uint64_t));
+				for (i = 0; i < index->pseudo_merges.nr; i++) {
+					index->pseudo_merges.v[i].at = get_be64(pseudo_merge_ofs);
+					pseudo_merge_ofs += sizeof(uint64_t);
+				}
+			}
+
+			index_end -= table_size;
+		}
 	}
 
 	index->entry_count = ntohl(header->entry_count);
-- 
2.45.0.23.gc6f94b99219



* [PATCH v2 15/23] pseudo-merge: implement support for reading pseudo-merge commits
  2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
                     ` (13 preceding siblings ...)
  2024-04-29 20:43   ` [PATCH v2 14/23] pack-bitmap.c: read pseudo-merge extension Taylor Blau
@ 2024-04-29 20:44   ` Taylor Blau
  2024-04-29 20:44   ` [PATCH v2 16/23] ewah: implement `ewah_bitmap_popcount()` Taylor Blau
                     ` (8 subsequent siblings)
  23 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-04-29 20:44 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Implement the basic API for reading pseudo-merge bitmaps, which consists
of four basic functions:

  - pseudo_merge_bitmap()
  - use_pseudo_merge()
  - apply_pseudo_merges_for_commit()
  - cascade_pseudo_merges()

These functions are all documented in pseudo-merge.h, but their rough
descriptions are as follows:

  - pseudo_merge_bitmap() reads and inflates the objects EWAH bitmap for
    a given pseudo-merge

  - use_pseudo_merge() does the same as pseudo_merge_bitmap(), but on
    the commits EWAH bitmap, not the objects bitmap

  - apply_pseudo_merges_for_commit() applies all satisfied pseudo-merge
    commits for a given result set, and cascades any yet-unsatisfied
    pseudo-merges if any were applied in the previous step

  - cascade_pseudo_merges() applies all pseudo-merges which are
    satisfied but have not been previously applied, repeating this
    process until no more pseudo-merges can be applied

The core of the API is the latter two functions, which are responsible
for applying pseudo-merges during the object traversal implemented in
the pack-bitmap machinery.

The other two functions (pseudo_merge_bitmap(), and use_pseudo_merge())
are low-level ways to interact with the pseudo-merge machinery, which
will be useful in future commits.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
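For illustration only (not part of this patch): a toy model of the
cascading described above, assuming each bitmap fits in a single 64-bit
word. The names toy_pseudo_merge, toy_apply and toy_cascade are
hypothetical; the real code operates on EWAH-compressed and uncompressed
bitmaps rather than on plain integers.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy pseudo-merge: one 64-bit word stands in for each bitmap. */
struct toy_pseudo_merge {
	uint64_t commits;   /* bit positions of the pseudo-merge "parents" */
	uint64_t objects;   /* everything reachable from those parents */
	unsigned satisfied; /* set once the pseudo-merge has been applied */
};

/* Applies 'm' if all of its parents are already set in '*result'. */
static int toy_apply(struct toy_pseudo_merge *m, uint64_t *result)
{
	if (m->satisfied)
		return 0;
	if ((m->commits & *result) != m->commits)
		return 0; /* not a subset: some parent is missing */

	*result |= m->objects;
	m->satisfied = 1;
	return 1;
}

/*
 * Repeats passes over all pseudo-merges until one pass completes
 * without applying anything. Returns the total number applied.
 */
int toy_cascade(struct toy_pseudo_merge *v, size_t nr, uint64_t *result)
{
	int applied, total = 0;

	do {
		size_t i;

		applied = 0;
		for (i = 0; i < nr; i++)
			applied += toy_apply(&v[i], result);
		total += applied;
	} while (applied);

	return total;
}
```

When one pseudo-merge's objects include the parents of another that
sits earlier in the array, the earlier one only becomes satisfied on the
following pass, which is why the loop repeats until a pass applies
nothing.
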
 pseudo-merge.c | 231 +++++++++++++++++++++++++++++++++++++++++++++++++
 pseudo-merge.h |  44 ++++++++++
 2 files changed, 275 insertions(+)

diff --git a/pseudo-merge.c b/pseudo-merge.c
index d18de0a266b..e111c9cd1a6 100644
--- a/pseudo-merge.c
+++ b/pseudo-merge.c
@@ -10,6 +10,7 @@
 #include "commit.h"
 #include "alloc.h"
 #include "progress.h"
+#include "hex.h"
 
 #define DEFAULT_PSEUDO_MERGE_DECAY 1.0f
 #define DEFAULT_PSEUDO_MERGE_MAX_MERGES 64
@@ -451,3 +452,233 @@ void free_pseudo_merge_map(struct pseudo_merge_map *pm)
 	}
 	free(pm->v);
 }
+
+struct pseudo_merge_commit_ext {
+	uint32_t nr;
+	const unsigned char *ptr;
+};
+
+static int pseudo_merge_ext_at(const struct pseudo_merge_map *pm,
+			       struct pseudo_merge_commit_ext *ext, size_t at)
+{
+	if (at >= pm->map_size)
+		return error(_("extended pseudo-merge read out-of-bounds "
+			       "(%"PRIuMAX" >= %"PRIuMAX")"),
+			     (uintmax_t)at, (uintmax_t)pm->map_size);
+
+	ext->nr = get_be32(pm->map + at);
+	ext->ptr = pm->map + at + sizeof(uint32_t);
+
+	return 0;
+}
+
+struct ewah_bitmap *pseudo_merge_bitmap(const struct pseudo_merge_map *pm,
+					struct pseudo_merge *merge)
+{
+	if (!merge->loaded_commits)
+		BUG("cannot use unloaded pseudo-merge bitmap");
+
+	if (!merge->loaded_bitmap) {
+		size_t at = merge->bitmap_at;
+
+		merge->bitmap = read_bitmap(pm->map, pm->map_size, &at);
+		merge->loaded_bitmap = 1;
+	}
+
+	return merge->bitmap;
+}
+
+struct pseudo_merge *use_pseudo_merge(const struct pseudo_merge_map *pm,
+				      struct pseudo_merge *merge)
+{
+	if (!merge->loaded_commits) {
+		size_t pos = merge->at;
+
+		merge->commits = read_bitmap(pm->map, pm->map_size, &pos);
+		merge->bitmap_at = pos;
+		merge->loaded_commits = 1;
+	}
+	return merge;
+}
+
+static struct pseudo_merge *pseudo_merge_at(const struct pseudo_merge_map *pm,
+					    struct object_id *oid,
+					    size_t want)
+{
+	size_t lo = 0;
+	size_t hi = pm->nr;
+
+	while (lo < hi) {
+		size_t mi = lo + (hi - lo) / 2;
+		size_t got = pm->v[mi].at;
+
+		if (got == want)
+			return use_pseudo_merge(pm, &pm->v[mi]);
+		else if (got < want)
+			lo = mi + 1;
+		else
+			hi = mi;
+	}
+
+	warning(_("could not find pseudo-merge for commit %s at offset %"PRIuMAX),
+		oid_to_hex(oid), (uintmax_t)want);
+
+	return NULL;
+}
+
+struct pseudo_merge_commit {
+	uint32_t commit_pos;
+	uint64_t pseudo_merge_ofs;
+};
+
+#define PSEUDO_MERGE_COMMIT_RAWSZ (sizeof(uint32_t)+sizeof(uint64_t))
+
+static void read_pseudo_merge_commit_at(struct pseudo_merge_commit *merge,
+					const unsigned char *at)
+{
+	merge->commit_pos = get_be32(at);
+	merge->pseudo_merge_ofs = get_be64(at + sizeof(uint32_t));
+}
+
+static int nth_pseudo_merge_ext(const struct pseudo_merge_map *pm,
+				struct pseudo_merge_commit_ext *ext,
+				struct pseudo_merge_commit *merge,
+				uint32_t n)
+{
+	size_t ofs;
+
+	if (n >= ext->nr)
+		return error(_("extended pseudo-merge lookup out-of-bounds "
+			       "(%"PRIu32" >= %"PRIu32")"), n, ext->nr);
+
+	ofs = get_be64(ext->ptr + st_mult(n, sizeof(uint64_t)));
+	if (ofs >= pm->map_size)
+		return error(_("out-of-bounds read: (%"PRIuMAX" >= %"PRIuMAX")"),
+			     (uintmax_t)ofs, (uintmax_t)pm->map_size);
+
+	read_pseudo_merge_commit_at(merge, pm->map + ofs);
+
+	return 0;
+}
+
+static unsigned apply_pseudo_merge(const struct pseudo_merge_map *pm,
+				   struct pseudo_merge *merge,
+				   struct bitmap *result,
+				   struct bitmap *roots)
+{
+	if (merge->satisfied)
+		return 0;
+
+	if (!ewah_bitmap_is_subset(merge->commits, roots ? roots : result))
+		return 0;
+
+	bitmap_or_ewah(result, pseudo_merge_bitmap(pm, merge));
+	if (roots)
+		bitmap_or_ewah(roots, pseudo_merge_bitmap(pm, merge));
+	merge->satisfied = 1;
+
+	return 1;
+}
+
+static int pseudo_merge_commit_cmp(const void *va, const void *vb)
+{
+	struct pseudo_merge_commit merge;
+	uint32_t key = *(uint32_t*)va;
+
+	read_pseudo_merge_commit_at(&merge, vb);
+
+	if (key < merge.commit_pos)
+		return -1;
+	if (key > merge.commit_pos)
+		return 1;
+	return 0;
+}
+
+static struct pseudo_merge_commit *find_pseudo_merge(const struct pseudo_merge_map *pm,
+						     uint32_t pos)
+{
+	if (!pm->commits_nr)
+		return NULL;
+
+	return bsearch(&pos, pm->commits, pm->commits_nr,
+		       PSEUDO_MERGE_COMMIT_RAWSZ, pseudo_merge_commit_cmp);
+}
+
+int apply_pseudo_merges_for_commit(const struct pseudo_merge_map *pm,
+				   struct bitmap *result,
+				   struct commit *commit, uint32_t commit_pos)
+{
+	struct pseudo_merge *merge;
+	struct pseudo_merge_commit *merge_commit;
+	int ret = 0;
+
+	merge_commit = find_pseudo_merge(pm, commit_pos);
+	if (!merge_commit)
+		return 0;
+
+	if (merge_commit->pseudo_merge_ofs & ((uint64_t)1<<63)) {
+		struct pseudo_merge_commit_ext ext = { 0 };
+		off_t ofs = merge_commit->pseudo_merge_ofs & ~((uint64_t)1<<63);
+		uint32_t i;
+
+		if (pseudo_merge_ext_at(pm, &ext, ofs) < 0) {
+			warning(_("could not read extended pseudo-merge table "
+				  "for commit %s"),
+				oid_to_hex(&commit->object.oid));
+			return ret;
+		}
+
+		for (i = 0; i < ext.nr; i++) {
+			if (nth_pseudo_merge_ext(pm, &ext, merge_commit, i) < 0)
+				return ret;
+
+			merge = pseudo_merge_at(pm, &commit->object.oid,
+						merge_commit->pseudo_merge_ofs);
+
+			if (!merge)
+				return ret;
+
+			if (apply_pseudo_merge(pm, merge, result, NULL))
+				ret++;
+		}
+	} else {
+		merge = pseudo_merge_at(pm, &commit->object.oid,
+					merge_commit->pseudo_merge_ofs);
+
+		if (!merge)
+			return ret;
+
+		if (apply_pseudo_merge(pm, merge, result, NULL))
+			ret++;
+	}
+
+	if (ret)
+		cascade_pseudo_merges(pm, result, NULL);
+
+	return ret;
+}
+
+int cascade_pseudo_merges(const struct pseudo_merge_map *pm,
+			  struct bitmap *result,
+			  struct bitmap *roots)
+{
+	unsigned any_satisfied;
+	int ret = 0;
+
+	do {
+		struct pseudo_merge *merge;
+		uint32_t i;
+
+		any_satisfied = 0;
+
+		for (i = 0; i < pm->nr; i++) {
+			merge = use_pseudo_merge(pm, &pm->v[i]);
+			if (apply_pseudo_merge(pm, merge, result, roots)) {
+				any_satisfied |= 1;
+				ret++;
+			}
+		}
+	} while (any_satisfied);
+
+	return ret;
+}
diff --git a/pseudo-merge.h b/pseudo-merge.h
index 2f652fc6767..cc14e947e86 100644
--- a/pseudo-merge.h
+++ b/pseudo-merge.h
@@ -164,4 +164,48 @@ struct pseudo_merge {
  */
 void free_pseudo_merge_map(struct pseudo_merge_map *pm);
 
+/*
+ * Loads the bitmap corresponding to the given pseudo-merge from the
+ * map, if it has not already been loaded.
+ */
+struct ewah_bitmap *pseudo_merge_bitmap(const struct pseudo_merge_map *pm,
+					struct pseudo_merge *merge);
+
+/*
+ * Loads the pseudo-merge and its commits bitmap from the given
+ * pseudo-merge map, if it has not already been loaded.
+ */
+struct pseudo_merge *use_pseudo_merge(const struct pseudo_merge_map *pm,
+				      struct pseudo_merge *merge);
+
+/*
+ * Applies pseudo-merge(s) containing the given commit to the bitmap
+ * "result".
+ *
+ * If any pseudo-merge(s) were satisfied, returns the number
+ * satisfied, otherwise returns 0. If any were satisfied, the
+ * remaining unsatisfied pseudo-merges are cascaded (see below).
+ */
+int apply_pseudo_merges_for_commit(const struct pseudo_merge_map *pm,
+				   struct bitmap *result,
+				   struct commit *commit, uint32_t commit_pos);
+
+/*
+ * Applies pseudo-merge(s) which are satisfied according to the
+ * current bitmap in result (or roots, see below). If any
+ * pseudo-merges were satisfied, repeat the process over unsatisfied
+ * pseudo-merge commits until no more pseudo-merges are satisfied.
+ *
+ * Result is the bitmap to which the pseudo-merge(s) are applied.
+ * Roots (if given) is a bitmap of the traversal tip(s) for either
+ * side of a reachability traversal.
+ *
+ * Roots may be given instead of a populated results bitmap at the
+ * beginning of a traversal on either side where the reachability
+ * closure over tips is not yet known.
+ */
+int cascade_pseudo_merges(const struct pseudo_merge_map *pm,
+			  struct bitmap *result,
+			  struct bitmap *roots);
+
 #endif
-- 
2.45.0.23.gc6f94b99219



* [PATCH v2 16/23] ewah: implement `ewah_bitmap_popcount()`
  2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
                     ` (14 preceding siblings ...)
  2024-04-29 20:44   ` [PATCH v2 15/23] pseudo-merge: implement support for reading pseudo-merge commits Taylor Blau
@ 2024-04-29 20:44   ` Taylor Blau
  2024-04-29 20:44   ` [PATCH v2 17/23] pack-bitmap: implement test helpers for pseudo-merge Taylor Blau
                     ` (7 subsequent siblings)
  23 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-04-29 20:44 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Some of the pseudo-merge test helpers (which will be introduced in the
following commit) will want to indicate the total number of commits in
or objects reachable from a pseudo-merge.

Implement a popcount() function that operates on EWAH bitmaps to quickly
determine how many bits are set in each of the respective bitmaps.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
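For illustration only (not part of this patch): a minimal sketch of the
same counting idea over an array of already-decompressed 64-bit words.
The names popcount64 and words_popcount are hypothetical, and the
portable bit-clearing loop is an assumption made for self-containment;
the patch itself walks the EWAH bitmap with an iterator and relies on
ewah_bit_popcount64().

```c
#include <stddef.h>
#include <stdint.h>

/* Portable population count for one word (Kernighan's method). */
static unsigned popcount64(uint64_t w)
{
	unsigned n = 0;

	while (w) {
		w &= w - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Sums the set bits over 'nr' already-decompressed words. */
size_t words_popcount(const uint64_t *words, size_t nr)
{
	size_t i, count = 0;

	for (i = 0; i < nr; i++)
		count += popcount64(words[i]);
	return count;
}
```

The cost is linear in the number of decompressed words, so long runs of
empty words still have to be visited one at a time in this form.
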
 ewah/bitmap.c | 14 ++++++++++++++
 ewah/ewok.h   |  1 +
 2 files changed, 15 insertions(+)

diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index d352fec54ce..dc2ca190f12 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -212,6 +212,20 @@ size_t bitmap_popcount(struct bitmap *self)
 	return count;
 }
 
+size_t ewah_bitmap_popcount(struct ewah_bitmap *self)
+{
+	struct ewah_iterator it;
+	eword_t word;
+	size_t count = 0;
+
+	ewah_iterator_init(&it, self);
+
+	while (ewah_iterator_next(&word, &it))
+		count += ewah_bit_popcount64(word);
+
+	return count;
+}
+
 int bitmap_is_empty(struct bitmap *self)
 {
 	size_t i;
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 2b6c4ac499c..7074a6347b7 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -195,6 +195,7 @@ void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other);
 void bitmap_or(struct bitmap *self, const struct bitmap *other);
 
 size_t bitmap_popcount(struct bitmap *self);
+size_t ewah_bitmap_popcount(struct ewah_bitmap *self);
 int bitmap_is_empty(struct bitmap *self);
 
 #endif
-- 
2.45.0.23.gc6f94b99219



* [PATCH v2 17/23] pack-bitmap: implement test helpers for pseudo-merge
  2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
                     ` (15 preceding siblings ...)
  2024-04-29 20:44   ` [PATCH v2 16/23] ewah: implement `ewah_bitmap_popcount()` Taylor Blau
@ 2024-04-29 20:44   ` Taylor Blau
  2024-04-29 20:44   ` [PATCH v2 18/23] t/test-lib-functions.sh: support `--date` in `test_commit_bulk()` Taylor Blau
                     ` (6 subsequent siblings)
  23 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-04-29 20:44 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Implement three new sub-commands for the "bitmap" test-helper:

  - t/helper test-tool bitmap dump-pseudo-merges
  - t/helper test-tool bitmap dump-pseudo-merge-commits <n>
  - t/helper test-tool bitmap dump-pseudo-merge-objects <n>

These three helpers dump the list of pseudo-merges, the "parents" of the
nth pseudo-merge, and the set of objects reachable from those parents,
respectively.

These helpers will be useful in subsequent patches when we add test
coverage for pseudo-merge bitmaps.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
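For illustration only (not part of this patch): a minimal sketch of how
the dump helpers can turn a bitmap into a list of bit positions, skipping
straight to each set bit with a count-trailing-zeros step. The names
ctz64, set_bit_positions and BITS_PER_WORD are hypothetical, and the
output array stands in for the object ID lookup and printf() that the
patch performs for each position.

```c
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD 64

/* Portable count-trailing-zeros; the caller guarantees w != 0. */
static unsigned ctz64(uint64_t w)
{
	unsigned n = 0;

	while (!(w & 1)) {
		w >>= 1;
		n++;
	}
	return n;
}

/*
 * Writes the positions of all set bits in 'words' to 'out' (which
 * holds up to 'out_max' entries) and returns how many were written.
 */
size_t set_bit_positions(const uint64_t *words, size_t nr,
			 uint32_t *out, size_t out_max)
{
	size_t i, n = 0;
	uint32_t pos = 0;

	for (i = 0; i < nr; i++) {
		uint64_t word = words[i];
		unsigned offset;

		for (offset = 0; offset < BITS_PER_WORD; offset++) {
			if (!(word >> offset))
				break; /* no set bits left in this word */

			/* jump to the next set bit at or after 'offset' */
			offset += ctz64(word >> offset);
			if (n < out_max)
				out[n++] = pos + offset;
		}
		pos += BITS_PER_WORD;
	}
	return n;
}
```

Each reported position would then be mapped back to an object ID, which
is the role bit_pos_to_object_id() plays in the diff below.
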
 pack-bitmap.c          | 126 +++++++++++++++++++++++++++++++++++++++++
 pack-bitmap.h          |   3 +
 t/helper/test-bitmap.c |  34 ++++++++---
 3 files changed, 156 insertions(+), 7 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index fc9c3e2fc43..c13074673af 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2443,6 +2443,132 @@ int test_bitmap_hashes(struct repository *r)
 	return 0;
 }
 
+static void bit_pos_to_object_id(struct bitmap_index *bitmap_git,
+				 uint32_t bit_pos,
+				 struct object_id *oid)
+{
+	uint32_t index_pos;
+
+	if (bitmap_is_midx(bitmap_git))
+		index_pos = pack_pos_to_midx(bitmap_git->midx, bit_pos);
+	else
+		index_pos = pack_pos_to_index(bitmap_git->pack, bit_pos);
+
+	nth_bitmap_object_oid(bitmap_git, oid, index_pos);
+}
+
+int test_bitmap_pseudo_merges(struct repository *r)
+{
+	struct bitmap_index *bitmap_git;
+	uint32_t i;
+
+	bitmap_git = prepare_bitmap_git(r);
+	if (!bitmap_git || !bitmap_git->pseudo_merges.nr)
+		goto cleanup;
+
+	for (i = 0; i < bitmap_git->pseudo_merges.nr; i++) {
+		struct pseudo_merge *merge;
+		struct ewah_bitmap *commits_bitmap, *merge_bitmap;
+
+		merge = use_pseudo_merge(&bitmap_git->pseudo_merges,
+					 &bitmap_git->pseudo_merges.v[i]);
+		commits_bitmap = merge->commits;
+		merge_bitmap = pseudo_merge_bitmap(&bitmap_git->pseudo_merges,
+						   merge);
+
+		printf("at=%"PRIuMAX", commits=%"PRIuMAX", objects=%"PRIuMAX"\n",
+		       (uintmax_t)merge->at,
+		       (uintmax_t)ewah_bitmap_popcount(commits_bitmap),
+		       (uintmax_t)ewah_bitmap_popcount(merge_bitmap));
+	}
+
+cleanup:
+	free_bitmap_index(bitmap_git);
+	return 0;
+}
+
+static void dump_ewah_object_ids(struct bitmap_index *bitmap_git,
+				 struct ewah_bitmap *bitmap)
+
+{
+	struct ewah_iterator it;
+	eword_t word;
+	uint32_t pos = 0;
+
+	ewah_iterator_init(&it, bitmap);
+
+	while (ewah_iterator_next(&word, &it)) {
+		struct object_id oid;
+		uint32_t offset;
+
+		for (offset = 0; offset < BITS_IN_EWORD; offset++) {
+			if (!(word >> offset))
+				break;
+
+			offset += ewah_bit_ctz64(word >> offset);
+
+			bit_pos_to_object_id(bitmap_git, pos + offset, &oid);
+			printf("%s\n", oid_to_hex(&oid));
+		}
+		pos += BITS_IN_EWORD;
+	}
+}
+
+int test_bitmap_pseudo_merge_commits(struct repository *r, uint32_t n)
+{
+	struct bitmap_index *bitmap_git;
+	struct pseudo_merge *merge;
+	int ret = 0;
+
+	bitmap_git = prepare_bitmap_git(r);
+	if (!bitmap_git || !bitmap_git->pseudo_merges.nr)
+		goto cleanup;
+
+	if (n >= bitmap_git->pseudo_merges.nr) {
+		ret = error(_("pseudo-merge index out of range "
+			      "(%"PRIu32" >= %"PRIuMAX")"),
+			    n, (uintmax_t)bitmap_git->pseudo_merges.nr);
+		goto cleanup;
+	}
+
+	merge = use_pseudo_merge(&bitmap_git->pseudo_merges,
+				 &bitmap_git->pseudo_merges.v[n]);
+	dump_ewah_object_ids(bitmap_git, merge->commits);
+
+cleanup:
+	free_bitmap_index(bitmap_git);
+	return ret;
+}
+
+int test_bitmap_pseudo_merge_objects(struct repository *r, uint32_t n)
+{
+	struct bitmap_index *bitmap_git;
+	struct pseudo_merge *merge;
+	int ret = 0;
+
+	bitmap_git = prepare_bitmap_git(r);
+	if (!bitmap_git || !bitmap_git->pseudo_merges.nr)
+		goto cleanup;
+
+	if (n >= bitmap_git->pseudo_merges.nr) {
+		ret = error(_("pseudo-merge index out of range "
+			      "(%"PRIu32" >= %"PRIuMAX")"),
+			    n, (uintmax_t)bitmap_git->pseudo_merges.nr);
+		goto cleanup;
+	}
+
+	merge = use_pseudo_merge(&bitmap_git->pseudo_merges,
+				 &bitmap_git->pseudo_merges.v[n]);
+
+	dump_ewah_object_ids(bitmap_git,
+			     pseudo_merge_bitmap(&bitmap_git->pseudo_merges,
+						 merge));
+
+cleanup:
+	free_bitmap_index(bitmap_git);
+	return ret;
+}
+
 int rebuild_bitmap(const uint32_t *reposition,
 		   struct ewah_bitmap *source,
 		   struct bitmap *dest)
diff --git a/pack-bitmap.h b/pack-bitmap.h
index a5fe4f305ef..25d3b8e604a 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -73,6 +73,9 @@ void traverse_bitmap_commit_list(struct bitmap_index *,
 void test_bitmap_walk(struct rev_info *revs);
 int test_bitmap_commits(struct repository *r);
 int test_bitmap_hashes(struct repository *r);
+int test_bitmap_pseudo_merges(struct repository *r);
+int test_bitmap_pseudo_merge_commits(struct repository *r, uint32_t n);
+int test_bitmap_pseudo_merge_objects(struct repository *r, uint32_t n);
 
 #define GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL \
 	"GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL"
diff --git a/t/helper/test-bitmap.c b/t/helper/test-bitmap.c
index af43ee1cb5e..6af2b42678f 100644
--- a/t/helper/test-bitmap.c
+++ b/t/helper/test-bitmap.c
@@ -13,21 +13,41 @@ static int bitmap_dump_hashes(void)
 	return test_bitmap_hashes(the_repository);
 }
 
+static int bitmap_dump_pseudo_merges(void)
+{
+	return test_bitmap_pseudo_merges(the_repository);
+}
+
+static int bitmap_dump_pseudo_merge_commits(uint32_t n)
+{
+	return test_bitmap_pseudo_merge_commits(the_repository, n);
+}
+
+static int bitmap_dump_pseudo_merge_objects(uint32_t n)
+{
+	return test_bitmap_pseudo_merge_objects(the_repository, n);
+}
+
 int cmd__bitmap(int argc, const char **argv)
 {
 	setup_git_directory();
 
-	if (argc != 2)
-		goto usage;
-
-	if (!strcmp(argv[1], "list-commits"))
+	if (argc == 2 && !strcmp(argv[1], "list-commits"))
 		return bitmap_list_commits();
-	if (!strcmp(argv[1], "dump-hashes"))
+	if (argc == 2 && !strcmp(argv[1], "dump-hashes"))
 		return bitmap_dump_hashes();
+	if (argc == 2 && !strcmp(argv[1], "dump-pseudo-merges"))
+		return bitmap_dump_pseudo_merges();
+	if (argc == 3 && !strcmp(argv[1], "dump-pseudo-merge-commits"))
+		return bitmap_dump_pseudo_merge_commits(atoi(argv[2]));
+	if (argc == 3 && !strcmp(argv[1], "dump-pseudo-merge-objects"))
+		return bitmap_dump_pseudo_merge_objects(atoi(argv[2]));
 
-usage:
 	usage("\ttest-tool bitmap list-commits\n"
-	      "\ttest-tool bitmap dump-hashes");
+	      "\ttest-tool bitmap dump-hashes\n"
+	      "\ttest-tool bitmap dump-pseudo-merges\n"
+	      "\ttest-tool bitmap dump-pseudo-merge-commits <n>\n"
+	      "\ttest-tool bitmap dump-pseudo-merge-objects <n>");
 
 	return -1;
 }
-- 
2.45.0.23.gc6f94b99219



* [PATCH v2 18/23] t/test-lib-functions.sh: support `--date` in `test_commit_bulk()`
  2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
                     ` (16 preceding siblings ...)
  2024-04-29 20:44   ` [PATCH v2 17/23] pack-bitmap: implement test helpers for pseudo-merge Taylor Blau
@ 2024-04-29 20:44   ` Taylor Blau
  2024-04-29 20:44   ` [PATCH v2 19/23] pack-bitmap.c: use pseudo-merges during traversal Taylor Blau
                     ` (5 subsequent siblings)
  23 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-04-29 20:44 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

One of the tests we'll want to add for pseudo-merge bitmaps needs to be
able to generate a large number of commits at a specific date.

Support the `--date` option (with identical semantics to the `--date`
option for `test_commit()`) within `test_commit_bulk` as a prerequisite
for that.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/test-lib-functions.sh | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 862d80c9748..16fd585e34b 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -458,6 +458,7 @@ test_commit_bulk () {
 	indir=.
 	ref=HEAD
 	n=1
+	notick=
 	message='commit %s'
 	filename='%s.t'
 	contents='content %s'
@@ -488,6 +489,12 @@ test_commit_bulk () {
 			filename="${1#--*=}-%s.t"
 			contents="${1#--*=} %s"
 			;;
+		--date)
+			notick=yes
+			GIT_COMMITTER_DATE="$2"
+			GIT_AUTHOR_DATE="$2"
+			shift
+			;;
 		-*)
 			BUG "invalid test_commit_bulk option: $1"
 			;;
@@ -507,7 +514,10 @@ test_commit_bulk () {
 
 	while test "$total" -gt 0
 	do
-		test_tick &&
+		if test -z "$notick"
+		then
+			test_tick
+		fi &&
 		echo "commit $ref"
 		printf 'author %s <%s> %s\n' \
 			"$GIT_AUTHOR_NAME" \
-- 
2.45.0.23.gc6f94b99219



* [PATCH v2 19/23] pack-bitmap.c: use pseudo-merges during traversal
  2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
                     ` (17 preceding siblings ...)
  2024-04-29 20:44   ` [PATCH v2 18/23] t/test-lib-functions.sh: support `--date` in `test_commit_bulk()` Taylor Blau
@ 2024-04-29 20:44   ` Taylor Blau
  2024-04-29 20:44   ` [PATCH v2 20/23] pack-bitmap: extra trace2 information Taylor Blau
                     ` (4 subsequent siblings)
  23 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-04-29 20:44 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Now that all of the groundwork has been laid to support reading and
using pseudo-merges, make use of that work in this commit by teaching
the pack-bitmap machinery to use pseudo-merge(s) when available during
traversal.

The basic operation is as follows:

  - When enumerating objects on either side of a reachability query,
    first see if any subset of the roots satisfies some pseudo-merge
    bitmap. If it does, apply that pseudo-merge bitmap.

  - If any pseudo-merge bitmap(s) were applied in the previous step, OR
    them into the result[^1]. Then repeat the process over all
    pseudo-merge bitmaps (we'll refer to this as "cascading"
    pseudo-merges). Once this is done, OR in the resulting bitmap.

  - If there is no fill-in traversal to be done, return the bitmap for
    that side of the reachability query. If there is fill-in traversal,
    then for each commit we encounter via show_commit(), check to see if
    any unsatisfied pseudo-merge containing that commit as one of its
    parents has become satisfied by the presence of that commit.

    If so, OR in the object set from that pseudo-merge bitmap, and then
    cascade. If not, continue traversal.

A similar implementation is present in the boundary-based bitmap
traversal routines.

[^1]: Importantly, we cannot OR in the entire set of roots along with
  the objects reachable from whatever pseudo-merge bitmaps were
  satisfied.  This may leave some dangling bits corresponding to any
  unsatisfied root(s) getting OR'd into the resulting bitmap, tricking
  other parts of the traversal into thinking we already have a
  reachability closure over those commit(s) when we do not.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
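For illustration only (not part of this patch): a toy model of the
footnote above, assuming each bitmap fits in a single 64-bit word. When
a roots bitmap is supplied, satisfaction is tested against the roots,
but only the objects of satisfied pseudo-merges are OR'd into the
result, so an unsatisfied root never leaks into it. The names toy_pm and
toy_cascade_roots are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_pm {
	uint64_t commits;   /* parents of the pseudo-merge */
	uint64_t objects;   /* objects reachable from those parents */
	unsigned satisfied; /* set once applied */
};

/*
 * Cascades over 'v'. When 'roots' is non-NULL, satisfaction is tested
 * against the traversal tips rather than against 'result', and only
 * the objects of satisfied pseudo-merges are OR'd into 'result'.
 */
int toy_cascade_roots(struct toy_pm *v, size_t nr,
		      uint64_t *result, uint64_t *roots)
{
	int applied, total = 0;

	do {
		size_t i;

		applied = 0;
		for (i = 0; i < nr; i++) {
			uint64_t have = roots ? *roots : *result;

			if (v[i].satisfied ||
			    (v[i].commits & have) != v[i].commits)
				continue;

			*result |= v[i].objects;
			if (roots)
				*roots |= v[i].objects;
			v[i].satisfied = 1;
			applied++;
		}
		total += applied;
	} while (applied);

	return total;
}
```

With three roots of which only two form a pseudo-merge, the result ends
up holding that pseudo-merge's objects while the bit for the third root
stays clear, leaving it to be resolved by a stored bitmap or by fill-in
traversal.
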
 pack-bitmap.c                   | 112 ++++++++++-
 t/t5333-pseudo-merge-bitmaps.sh | 325 ++++++++++++++++++++++++++++++++
 2 files changed, 436 insertions(+), 1 deletion(-)
 create mode 100755 t/t5333-pseudo-merge-bitmaps.sh

diff --git a/pack-bitmap.c b/pack-bitmap.c
index c13074673af..e61058dada6 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -114,6 +114,9 @@ struct bitmap_index {
 	unsigned int version;
 };
 
+static int pseudo_merges_satisfied_nr;
+static int pseudo_merges_cascades_nr;
+
 static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
 {
 	struct ewah_bitmap *parent;
@@ -1006,6 +1009,22 @@ static void show_commit(struct commit *commit UNUSED,
 {
 }
 
+static unsigned apply_pseudo_merges_for_commit_1(struct bitmap_index *bitmap_git,
+						 struct bitmap *result,
+						 struct commit *commit,
+						 uint32_t commit_pos)
+{
+	int ret;
+
+	ret = apply_pseudo_merges_for_commit(&bitmap_git->pseudo_merges,
+					     result, commit, commit_pos);
+
+	if (ret)
+		pseudo_merges_satisfied_nr += ret;
+
+	return ret;
+}
+
 static int add_to_include_set(struct bitmap_index *bitmap_git,
 			      struct include_data *data,
 			      struct commit *commit,
@@ -1026,6 +1045,10 @@ static int add_to_include_set(struct bitmap_index *bitmap_git,
 	}
 
 	bitmap_set(data->base, bitmap_pos);
+	if (apply_pseudo_merges_for_commit_1(bitmap_git, data->base, commit,
+					     bitmap_pos))
+		return 0;
+
 	return 1;
 }
 
@@ -1151,6 +1174,20 @@ static void show_boundary_object(struct object *object UNUSED,
 	BUG("should not be called");
 }
 
+static unsigned cascade_pseudo_merges_1(struct bitmap_index *bitmap_git,
+					struct bitmap *result,
+					struct bitmap *roots)
+{
+	int ret = cascade_pseudo_merges(&bitmap_git->pseudo_merges,
+					result, roots);
+	if (ret) {
+		pseudo_merges_cascades_nr++;
+		pseudo_merges_satisfied_nr += ret;
+	}
+
+	return ret;
+}
+
 static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 					    struct rev_info *revs,
 					    struct object_list *roots)
@@ -1160,6 +1197,7 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 	unsigned int i;
 	unsigned int tmp_blobs, tmp_trees, tmp_tags;
 	int any_missing = 0;
+	int existing_bitmaps = 0;
 
 	cb.bitmap_git = bitmap_git;
 	cb.base = bitmap_new();
@@ -1167,6 +1205,25 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 
 	revs->ignore_missing_links = 1;
 
+	if (bitmap_git->pseudo_merges.nr) {
+		struct bitmap *roots_bitmap = bitmap_new();
+		struct object_list *objects = NULL;
+
+		for (objects = roots; objects; objects = objects->next) {
+			struct object *object = objects->item;
+			int pos;
+
+			pos = bitmap_position(bitmap_git, &object->oid);
+			if (pos < 0)
+				continue;
+
+			bitmap_set(roots_bitmap, pos);
+		}
+
+		if (!cascade_pseudo_merges_1(bitmap_git, cb.base, roots_bitmap))
+			bitmap_free(roots_bitmap);
+	}
+
 	/*
 	 * OR in any existing reachability bitmaps among `roots` into
 	 * `cb.base`.
@@ -1178,8 +1235,10 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 			continue;
 
 		if (add_commit_to_bitmap(bitmap_git, &cb.base,
-					 (struct commit *)object))
+					 (struct commit *)object)) {
+			existing_bitmaps = 1;
 			continue;
+		}
 
 		any_missing = 1;
 	}
@@ -1187,6 +1246,9 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 	if (!any_missing)
 		goto cleanup;
 
+	if (existing_bitmaps)
+		cascade_pseudo_merges_1(bitmap_git, cb.base, NULL);
+
 	tmp_blobs = revs->blob_objects;
 	tmp_trees = revs->tree_objects;
 	tmp_tags = revs->blob_objects;
@@ -1242,6 +1304,13 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 	return cb.base;
 }
 
+static void unsatisfy_all_pseudo_merges(struct bitmap_index *bitmap_git)
+{
+	uint32_t i;
+	for (i = 0; i < bitmap_git->pseudo_merges.nr; i++)
+		bitmap_git->pseudo_merges.v[i].satisfied = 0;
+}
+
 static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 				   struct rev_info *revs,
 				   struct object_list *roots,
@@ -1249,9 +1318,32 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 {
 	struct bitmap *base = NULL;
 	int needs_walk = 0;
+	unsigned existing_bitmaps = 0;
 
 	struct object_list *not_mapped = NULL;
 
+	unsatisfy_all_pseudo_merges(bitmap_git);
+
+	if (bitmap_git->pseudo_merges.nr) {
+		struct bitmap *roots_bitmap = bitmap_new();
+		struct object_list *objects = NULL;
+
+		for (objects = roots; objects; objects = objects->next) {
+			struct object *object = objects->item;
+			int pos;
+
+			pos = bitmap_position(bitmap_git, &object->oid);
+			if (pos < 0)
+				continue;
+
+			bitmap_set(roots_bitmap, pos);
+		}
+
+		base = bitmap_new();
+		if (!cascade_pseudo_merges_1(bitmap_git, base, roots_bitmap))
+			bitmap_free(roots_bitmap);
+	}
+
 	/*
 	 * Go through all the roots for the walk. The ones that have bitmaps
 	 * on the bitmap index will be `or`ed together to form an initial
@@ -1262,11 +1354,21 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 	 */
 	while (roots) {
 		struct object *object = roots->item;
+
 		roots = roots->next;
 
+		if (base) {
+			int pos = bitmap_position(bitmap_git, &object->oid);
+			if (pos >= 0 && bitmap_get(base, pos)) {
+				object->flags |= SEEN;
+				continue;
+			}
+		}
+
 		if (object->type == OBJ_COMMIT &&
 		    add_commit_to_bitmap(bitmap_git, &base, (struct commit *)object)) {
 			object->flags |= SEEN;
+			existing_bitmaps = 1;
 			continue;
 		}
 
@@ -1282,6 +1384,9 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 
 	roots = not_mapped;
 
+	if (existing_bitmaps)
+		cascade_pseudo_merges_1(bitmap_git, base, NULL);
+
 	/*
 	 * Let's iterate through all the roots that don't have bitmaps to
 	 * check if we can determine them to be reachable from the existing
@@ -1866,6 +1971,11 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	object_list_free(&wants);
 	object_list_free(&haves);
 
+	trace2_data_intmax("bitmap", the_repository, "pseudo_merges_satisfied",
+			   pseudo_merges_satisfied_nr);
+	trace2_data_intmax("bitmap", the_repository, "pseudo_merges_cascades",
+			   pseudo_merges_cascades_nr);
+
 	return bitmap_git;
 
 cleanup:
diff --git a/t/t5333-pseudo-merge-bitmaps.sh b/t/t5333-pseudo-merge-bitmaps.sh
new file mode 100755
index 00000000000..909c17e301e
--- /dev/null
+++ b/t/t5333-pseudo-merge-bitmaps.sh
@@ -0,0 +1,325 @@
+#!/bin/sh
+
+test_description='pseudo-merge bitmaps'
+
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+
+. ./test-lib.sh
+
+test_pseudo_merges () {
+	test-tool bitmap dump-pseudo-merges
+}
+
+test_pseudo_merge_commits () {
+	test-tool bitmap dump-pseudo-merge-commits "$1"
+}
+
+test_pseudo_merges_satisfied () {
+	test_trace2_data bitmap pseudo_merges_satisfied "$1"
+}
+
+test_pseudo_merges_cascades () {
+	test_trace2_data bitmap pseudo_merges_cascades "$1"
+}
+
+tag_everything () {
+	git rev-list --all --no-object-names >in &&
+	perl -lne '
+		print "create refs/tags/" . $. . " " . $1 if /([0-9a-f]+)/
+	' <in | git update-ref --stdin
+}
+
+test_expect_success 'setup' '
+	test_commit_bulk 512 &&
+	tag_everything
+'
+
+test_expect_success 'bitmap traversal without pseudo-merges' '
+	git repack -adb &&
+
+	git rev-list --count --all --objects >expect &&
+
+	: >trace2.txt &&
+	GIT_TRACE2_EVENT=$PWD/trace2.txt \
+		git rev-list --count --all --objects --use-bitmap-index >actual &&
+
+	test_pseudo_merges_satisfied 0 <trace2.txt &&
+	test_pseudo_merges_cascades 0 <trace2.txt &&
+	test_pseudo_merges >merges &&
+	test_must_be_empty merges &&
+	test_cmp expect actual
+'
+
+test_expect_success 'pseudo-merges accurately represent their objects' '
+	test_config bitmapPseudoMerge.test.pattern "refs/tags/" &&
+	test_config bitmapPseudoMerge.test.maxMerges 8 &&
+	test_config bitmapPseudoMerge.test.stableThreshold never &&
+
+	git repack -adb &&
+
+	test_pseudo_merges >merges &&
+	test_line_count = 8 merges &&
+
+	for i in $(test_seq 0 $(($(wc -l <merges)-1)))
+	do
+		test-tool bitmap dump-pseudo-merge-commits $i >commits &&
+
+		git rev-list --objects --no-object-names --stdin <commits >expect.raw &&
+		test-tool bitmap dump-pseudo-merge-objects $i >actual.raw &&
+
+		sort -u <expect.raw >expect &&
+		sort -u <actual.raw >actual &&
+
+		test_cmp expect actual || return 1
+	done
+'
+
+test_expect_success 'bitmap traversal with pseudo-merges' '
+	: >trace2.txt &&
+	GIT_TRACE2_EVENT=$PWD/trace2.txt \
+		git rev-list --count --all --objects --use-bitmap-index >actual &&
+	git rev-list --count --all --objects >expect &&
+
+	test_pseudo_merges_satisfied 8 <trace2.txt &&
+	test_pseudo_merges_cascades 1 <trace2.txt &&
+	test_cmp expect actual
+'
+
+test_expect_success 'stale bitmap traversal with pseudo-merges' '
+	test_commit other &&
+
+	: >trace2.txt &&
+	GIT_TRACE2_EVENT=$PWD/trace2.txt \
+		git rev-list --count --all --objects --use-bitmap-index >actual &&
+	git rev-list --count --all --objects >expect &&
+
+	test_pseudo_merges_satisfied 8 <trace2.txt &&
+	test_pseudo_merges_cascades 1 <trace2.txt &&
+	test_cmp expect actual
+'
+
+test_expect_success 'bitmapPseudoMerge.sampleRate adjusts commit selection rate' '
+	test_config bitmapPseudoMerge.test.pattern "refs/tags/" &&
+	test_config bitmapPseudoMerge.test.maxMerges 8 &&
+	test_config bitmapPseudoMerge.test.stableThreshold never &&
+
+	commits_nr=$(git rev-list --all --count) &&
+
+	for rate in 100 50 10
+	do
+		git -c bitmapPseudoMerge.test.sampleRate=$rate repack -adb &&
+
+		test_pseudo_merges >merges &&
+		for i in $(test_seq 0 $(($(wc -l <merges)-1)))
+		do
+			test_pseudo_merge_commits $i || return 1
+		done >commits &&
+
+		test-tool bitmap list-commits >bitmaps &&
+		bitmaps_nr="$(wc -l <bitmaps)" &&
+
+		perl -MPOSIX -e "print ceil((\$ARGV[0]/100)*(\$ARGV[1]-\$ARGV[2]))" \
+			"$rate" "$commits_nr" "$bitmaps_nr" >expect &&
+
+		test $(cat expect) -eq $(wc -l <commits) || return 1
+	done
+'
+
+test_expect_success 'bitmapPseudoMerge.threshold excludes newer commits' '
+	git init pseudo-merge-threshold &&
+	(
+		cd pseudo-merge-threshold &&
+
+		new="1672549200" && # 2023-01-01
+		old="1641013200" && # 2022-01-01
+
+		test_commit_bulk --message="new" --date "$new +0000" 128 &&
+		test_commit_bulk --message="old" --date "$old +0000" 128 &&
+		test_tick &&
+
+		tag_everything &&
+
+		git \
+			-c bitmapPseudoMerge.test.pattern="refs/tags/" \
+			-c bitmapPseudoMerge.test.maxMerges=1 \
+			-c bitmapPseudoMerge.test.threshold=$(($new - 1)) \
+			-c bitmapPseudoMerge.test.stableThreshold=never \
+			repack -adb &&
+
+		test_pseudo_merges >merges &&
+		test_line_count = 1 merges &&
+
+		test_pseudo_merge_commits 0 >oids &&
+		git cat-file --batch <oids >commits &&
+
+		test $(wc -l <oids) = $(grep -c "^committer.*$old +0000$" commits)
+	)
+'
+
+test_expect_success 'bitmapPseudoMerge.stableThreshold creates stable groups' '
+	(
+		cd pseudo-merge-threshold &&
+
+		new="1672549200" && # 2023-01-01
+		mid="1654059600" && # 2022-06-01
+		old="1641013200" && # 2022-01-01
+
+		test_commit_bulk --message="mid" --date "$mid +0000" 128 &&
+		test_tick &&
+
+		git for-each-ref --format="delete %(refname)" refs/tags >in &&
+		git update-ref --stdin <in &&
+
+		tag_everything &&
+
+		git \
+			-c bitmapPseudoMerge.test.pattern="refs/tags/" \
+			-c bitmapPseudoMerge.test.maxMerges=1 \
+			-c bitmapPseudoMerge.test.threshold=$(($new - 1)) \
+			-c bitmapPseudoMerge.test.stableThreshold=$(($mid - 1)) \
+			-c bitmapPseudoMerge.test.stableSize=10 \
+			repack -adb &&
+
+		test_pseudo_merges >merges &&
+		merges_nr="$(wc -l <merges)" &&
+
+		for i in $(test_seq $(($merges_nr - 1)))
+		do
+			test_pseudo_merge_commits 0 >oids &&
+			git cat-file --batch <oids >commits &&
+
+			expect="$(grep -c "^committer.*$old +0000$" commits)" &&
+			actual="$(wc -l <oids)" &&
+
+			test $expect = $actual || return 1
+		done &&
+
+		test_pseudo_merge_commits $(($merges_nr - 1)) >oids &&
+		git cat-file --batch <oids >commits &&
+		test $(wc -l <oids) = $(grep -c "^committer.*$mid +0000$" commits)
+	)
+'
+
+test_expect_success 'out of order thresholds are rejected' '
+	test_must_fail git \
+		-c bitmapPseudoMerge.test.pattern="refs/*" \
+		-c bitmapPseudoMerge.test.threshold=1.month.ago \
+		-c bitmapPseudoMerge.test.stableThreshold=1.week.ago \
+		repack -adb 2>err &&
+
+	cat >expect <<-EOF &&
+	fatal: pseudo-merge group ${SQ}test${SQ} has unstable threshold before stable one
+	EOF
+
+	test_cmp expect err
+'
+
+test_expect_success 'pseudo-merge pattern with capture groups' '
+	git init pseudo-merge-captures &&
+	(
+		cd pseudo-merge-captures &&
+
+		test_commit_bulk 128 &&
+		tag_everything &&
+
+		for r in $(test_seq 8)
+		do
+			test_commit_bulk 16 &&
+
+			git rev-list HEAD~16.. >in &&
+
+			perl -lne "print \"create refs/remotes/$r/tags/\$. \$_\"" <in |
+			git update-ref --stdin || return 1
+		done &&
+
+		git \
+			-c bitmapPseudoMerge.tags.pattern="refs/remotes/([0-9]+)/tags/" \
+			-c bitmapPseudoMerge.tags.maxMerges=1 \
+			repack -adb &&
+
+		git for-each-ref --format="%(objectname) %(refname)" >refs &&
+
+		test_pseudo_merges >merges &&
+		for m in $(test_seq 0 $(($(wc -l <merges) - 1)))
+		do
+			test_pseudo_merge_commits $m >oids &&
+			grep -f oids refs |
+			perl -lne "print \$1 if /refs\/remotes\/([0-9]+)/" |
+			sort -u || return 1
+		done >remotes &&
+
+		test $(wc -l <remotes) -eq $(sort -u <remotes | wc -l)
+	)
+'
+
+test_expect_success 'pseudo-merge overlap setup' '
+	git init pseudo-merge-overlap &&
+	(
+		cd pseudo-merge-overlap &&
+
+		test_commit_bulk 256 &&
+		tag_everything &&
+
+		git \
+			-c bitmapPseudoMerge.all.pattern="refs/" \
+			-c bitmapPseudoMerge.all.maxMerges=1 \
+			-c bitmapPseudoMerge.all.stableThreshold=never \
+			-c bitmapPseudoMerge.tags.pattern="refs/tags/" \
+			-c bitmapPseudoMerge.tags.maxMerges=1 \
+			-c bitmapPseudoMerge.tags.stableThreshold=never \
+			repack -adb
+	)
+'
+
+test_expect_success 'pseudo-merge overlap generates overlapping groups' '
+	(
+		cd pseudo-merge-overlap &&
+
+		test_pseudo_merges >merges &&
+		test_line_count = 2 merges &&
+
+		test_pseudo_merge_commits 0 >commits-0.raw &&
+		test_pseudo_merge_commits 1 >commits-1.raw &&
+
+		sort commits-0.raw >commits-0 &&
+		sort commits-1.raw >commits-1 &&
+
+		comm -12 commits-0 commits-1 >overlap &&
+
+		test_line_count -gt 0 overlap
+	)
+'
+
+test_expect_success 'pseudo-merge overlap traversal' '
+	(
+		cd pseudo-merge-overlap &&
+
+		: >trace2.txt &&
+		GIT_TRACE2_EVENT=$PWD/trace2.txt \
+			git rev-list --count --all --objects --use-bitmap-index >actual &&
+		git rev-list --count --all --objects >expect &&
+
+		test_pseudo_merges_satisfied 2 <trace2.txt &&
+		test_pseudo_merges_cascades 1 <trace2.txt &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'pseudo-merge overlap stale traversal' '
+	(
+		cd pseudo-merge-overlap &&
+
+		test_commit other &&
+
+		: >trace2.txt &&
+		GIT_TRACE2_EVENT=$PWD/trace2.txt \
+			git rev-list --count --all --objects --use-bitmap-index >actual &&
+		git rev-list --count --all --objects >expect &&
+
+		test_pseudo_merges_satisfied 2 <trace2.txt &&
+		test_pseudo_merges_cascades 1 <trace2.txt &&
+		test_cmp expect actual
+	)
+'
+
+test_done
-- 
2.45.0.23.gc6f94b99219



* [PATCH v2 20/23] pack-bitmap: extra trace2 information
  2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
                     ` (18 preceding siblings ...)
  2024-04-29 20:44   ` [PATCH v2 19/23] pack-bitmap.c: use pseudo-merges during traversal Taylor Blau
@ 2024-04-29 20:44   ` Taylor Blau
  2024-04-29 20:44   ` [PATCH v2 21/23] ewah: `bitmap_equals_ewah()` Taylor Blau
                     ` (3 subsequent siblings)
  23 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-04-29 20:44 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Add some extra trace2 lines to capture the number of bitmap lookups that
are hits versus misses, as well as the number of reachability roots that
have bitmap coverage (versus those that do not).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index e61058dada6..1966b3b95f1 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -116,6 +116,10 @@ struct bitmap_index {
 
 static int pseudo_merges_satisfied_nr;
 static int pseudo_merges_cascades_nr;
+static int existing_bitmaps_hits_nr;
+static int existing_bitmaps_misses_nr;
+static int roots_with_bitmaps_nr;
+static int roots_without_bitmaps_nr;
 
 static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
 {
@@ -1040,10 +1044,14 @@ static int add_to_include_set(struct bitmap_index *bitmap_git,
 
 	partial = bitmap_for_commit(bitmap_git, commit);
 	if (partial) {
+		existing_bitmaps_hits_nr++;
+
 		bitmap_or_ewah(data->base, partial);
 		return 0;
 	}
 
+	existing_bitmaps_misses_nr++;
+
 	bitmap_set(data->base, bitmap_pos);
 	if (apply_pseudo_merges_for_commit_1(bitmap_git, data->base, commit,
 					     bitmap_pos))
@@ -1099,8 +1107,12 @@ static int add_commit_to_bitmap(struct bitmap_index *bitmap_git,
 {
 	struct ewah_bitmap *or_with = bitmap_for_commit(bitmap_git, commit);
 
-	if (!or_with)
+	if (!or_with) {
+		existing_bitmaps_misses_nr++;
 		return 0;
+	}
+
+	existing_bitmaps_hits_nr++;
 
 	if (!*base)
 		*base = ewah_to_bitmap(or_with);
@@ -1407,8 +1419,12 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 			object->flags &= ~UNINTERESTING;
 			add_pending_object(revs, object, "");
 			needs_walk = 1;
+
+			roots_without_bitmaps_nr++;
 		} else {
 			object->flags |= SEEN;
+
+			roots_with_bitmaps_nr++;
 		}
 	}
 
@@ -1975,6 +1991,14 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 			   pseudo_merges_satisfied_nr);
 	trace2_data_intmax("bitmap", the_repository, "pseudo_merges_cascades",
 			   pseudo_merges_cascades_nr);
+	trace2_data_intmax("bitmap", the_repository, "bitmap/hits",
+			   existing_bitmaps_hits_nr);
+	trace2_data_intmax("bitmap", the_repository, "bitmap/misses",
+			   existing_bitmaps_misses_nr);
+	trace2_data_intmax("bitmap", the_repository, "bitmap/roots_with_bitmap",
+			   roots_with_bitmaps_nr);
+	trace2_data_intmax("bitmap", the_repository, "bitmap/roots_without_bitmap",
+			   roots_without_bitmaps_nr);
 
 	return bitmap_git;
 
-- 
2.45.0.23.gc6f94b99219



* [PATCH v2 21/23] ewah: `bitmap_equals_ewah()`
  2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
                     ` (19 preceding siblings ...)
  2024-04-29 20:44   ` [PATCH v2 20/23] pack-bitmap: extra trace2 information Taylor Blau
@ 2024-04-29 20:44   ` Taylor Blau
  2024-04-29 20:44   ` [PATCH v2 22/23] pseudo-merge: implement support for finding existing merges Taylor Blau
                     ` (2 subsequent siblings)
  23 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-04-29 20:44 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Prepare to reuse existing pseudo-merge bitmaps by implementing a
`bitmap_equals_ewah()` helper.

This helper will be used to see if a raw bitmap (containing the set of
parents for some pseudo-merge) is equal to any existing pseudo-merge's
commits bitmap (which is stored as an EWAH-compressed bitmap on disk).
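
As a rough illustration of the comparison, here is a minimal sketch that
assumes a simplified model: `struct plain_bitmap` and `struct word_stream`
are hypothetical stand-ins (not Git's `struct bitmap` or
`struct ewah_iterator`), and the stream simply yields already-decompressed
64-bit words. Missing words on either side are treated as zero, so two
bitmaps that differ only in trailing zero words compare as equal.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an uncompressed bitmap. */
struct plain_bitmap {
	const uint64_t *words;
	size_t word_alloc;
};

/* Hypothetical stand-in for an iterator over decompressed words. */
struct word_stream {
	const uint64_t *words;
	size_t nr;
	size_t pos;
};

/* Store the next word in *out and return 1, or return 0 when exhausted. */
static int word_stream_next(uint64_t *out, struct word_stream *it)
{
	if (it->pos >= it->nr)
		return 0;
	*out = it->words[it->pos++];
	return 1;
}

/* Return 1 if both sides describe the same set of bits, 0 otherwise. */
static int plain_equals_stream(const struct plain_bitmap *self,
			       struct word_stream *other)
{
	uint64_t word;
	size_t i = 0;

	/* Compare word by word; bail out at the first mismatch. */
	while (word_stream_next(&word, other)) {
		uint64_t mine = i < self->word_alloc ? self->words[i] : 0;
		if (word != mine)
			return 0;
		i++;
	}

	/* Any words left over in the plain bitmap must be empty. */
	for (; i < self->word_alloc; i++)
		if (self->words[i])
			return 0;

	return 1;
}
```

The early return on the first differing word is what keeps rejecting a
non-matching candidate cheap.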

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ewah/bitmap.c | 19 +++++++++++++++++++
 ewah/ewok.h   |  1 +
 2 files changed, 20 insertions(+)

diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index dc2ca190f12..55928dada86 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -261,6 +261,25 @@ int bitmap_equals(struct bitmap *self, struct bitmap *other)
 	return 1;
 }
 
+int bitmap_equals_ewah(struct bitmap *self, struct ewah_bitmap *other)
+{
+	struct ewah_iterator it;
+	eword_t word;
+	size_t i = 0;
+
+	ewah_iterator_init(&it, other);
+
+	while (ewah_iterator_next(&word, &it))
+		if (word != (i < self->word_alloc ? self->words[i++] : 0))
+			return 0;
+
+	for (; i < self->word_alloc; i++)
+		if (self->words[i])
+			return 0;
+
+	return 1;
+}
+
 int bitmap_is_subset(struct bitmap *self, struct bitmap *other)
 {
 	size_t common_size, i;
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 7074a6347b7..5e357e24933 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -179,6 +179,7 @@ void bitmap_unset(struct bitmap *self, size_t pos);
 int bitmap_get(struct bitmap *self, size_t pos);
 void bitmap_free(struct bitmap *self);
 int bitmap_equals(struct bitmap *self, struct bitmap *other);
+int bitmap_equals_ewah(struct bitmap *self, struct ewah_bitmap *other);
 
 /*
  * Both `bitmap_is_subset()` and `ewah_bitmap_is_subset()` return 1 if the set
-- 
2.45.0.23.gc6f94b99219



* [PATCH v2 22/23] pseudo-merge: implement support for finding existing merges
  2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
                     ` (20 preceding siblings ...)
  2024-04-29 20:44   ` [PATCH v2 21/23] ewah: `bitmap_equals_ewah()` Taylor Blau
@ 2024-04-29 20:44   ` Taylor Blau
  2024-04-29 20:44   ` [PATCH v2 23/23] t/perf: implement performance tests for pseudo-merge bitmaps Taylor Blau
  2024-04-30 20:03   ` [PATCH v2 00/23] pack-bitmap: pseudo-merge reachability bitmaps Junio C Hamano
  23 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-04-29 20:44 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

This patch implements support for reusing existing pseudo-merge bitmaps
while writing bitmaps: an existing pseudo-merge bitmap is reused when it
has exactly the same set of parents as one that we are about to write.

Note that unstable pseudo-merges are likely to change between
consecutive repacks, and so are generally poor candidates for reuse.
However, stable pseudo-merges (see the configuration option
'bitmapPseudoMerge.<name>.stableThreshold') are by definition unlikely
to change between runs (as they represent long-running branches).

Because there is no index from a *set* of pseudo-merge parents to a
matching pseudo-merge bitmap, we have to construct the bitmap
corresponding to the set of parents for each pending pseudo-merge commit
and see if a matching bitmap exists.

This is technically quadratic in the number of pseudo-merges, but is OK
in practice for a couple of reasons:

  - non-matching pseudo-merge bitmaps are rejected quickly as soon as
    they differ in a single bit

  - already-matched pseudo-merge bitmaps are discarded from subsequent
    rounds of search

  - the number of pseudo-merges is generally small, even for large
    repositories

In order to do this, (a) implement a function that finds a matching
pseudo-merge given some uncompressed bitset describing its parents, (b)
implement a function that computes the bitset of parents for a given
pseudo-merge commit, and (c) call the latter before computing the set
of reachable objects for some pending pseudo-merge.
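
The search can be sketched roughly as follows. This is a toy model, not
the code in this patch: it assumes every parent set fits in a single
64-bit word, and `struct toy_pseudo_merge`, `toy_parents_bitset()` and
`toy_find_match()` are hypothetical names chosen for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified pseudo-merge: parents fit in one word. */
struct toy_pseudo_merge {
	uint64_t parents;   /* bit i set: object at position i is a parent */
	unsigned satisfied; /* set once this entry has been matched */
};

/*
 * Build the bitset of parents from their bit positions. Returns -1 if
 * any position is out of range (i.e. a parent is unknown to the old
 * bitmap), in which case no reuse is possible.
 */
static int toy_parents_bitset(const int *positions, size_t nr,
			      uint64_t *out)
{
	uint64_t bits = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (positions[i] < 0 || positions[i] >= 64)
			return -1;
		bits |= (uint64_t)1 << positions[i];
	}

	*out = bits;
	return 0;
}

/*
 * Linear scan for an unmatched pseudo-merge with exactly these
 * parents. Matched entries are marked so that later searches skip them.
 */
static struct toy_pseudo_merge *toy_find_match(struct toy_pseudo_merge *v,
					       size_t nr, uint64_t parents)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (v[i].satisfied)
			continue;
		if (v[i].parents != parents)
			continue;

		v[i].satisfied = 1;
		return &v[i];
	}

	return NULL;
}
```

Running one such scan per pending pseudo-merge is where the quadratic
bound comes from; marking matched entries shrinks the candidate set on
each subsequent round.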

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c             | 15 ++++++--
 pack-bitmap.c                   | 32 +++++++++++++++++
 pack-bitmap.h                   |  2 ++
 pseudo-merge.c                  | 55 ++++++++++++++++++++++++++++
 pseudo-merge.h                  |  7 ++++
 t/t5333-pseudo-merge-bitmaps.sh | 64 +++++++++++++++++++++++++++++++++
 6 files changed, 173 insertions(+), 2 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index d4894ace9ee..f7245d7d6fa 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -19,6 +19,10 @@
 #include "tree-walk.h"
 #include "pseudo-merge.h"
 #include "oid-array.h"
+#include "config.h"
+#include "alloc.h"
+#include "refs.h"
+#include "strmap.h"
 
 struct bitmapped_commit {
 	struct commit *commit;
@@ -443,6 +447,7 @@ static int fill_bitmap_tree(struct bitmap *bitmap,
 }
 
 static int reused_bitmaps_nr;
+static int reused_pseudo_merge_bitmaps_nr;
 
 static int fill_bitmap_commit(struct bb_commit *ent,
 			      struct commit *commit,
@@ -467,7 +472,7 @@ static int fill_bitmap_commit(struct bb_commit *ent,
 			struct bitmap *remapped = bitmap_new();
 
 			if (commit->object.flags & BITMAP_PSEUDO_MERGE)
-				old = NULL;
+				old = pseudo_merge_bitmap_for_commit(old_bitmap, c);
 			else
 				old = bitmap_for_commit(old_bitmap, c);
 			/*
@@ -478,7 +483,10 @@ static int fill_bitmap_commit(struct bb_commit *ent,
 			if (old && !rebuild_bitmap(mapping, old, remapped)) {
 				bitmap_or(ent->bitmap, remapped);
 				bitmap_free(remapped);
-				reused_bitmaps_nr++;
+				if (commit->object.flags & BITMAP_PSEUDO_MERGE)
+					reused_pseudo_merge_bitmaps_nr++;
+				else
+					reused_bitmaps_nr++;
 				continue;
 			}
 			bitmap_free(remapped);
@@ -604,6 +612,9 @@ int bitmap_writer_build(struct packing_data *to_pack)
 			    the_repository);
 	trace2_data_intmax("pack-bitmap-write", the_repository,
 			   "building_bitmaps_reused", reused_bitmaps_nr);
+	trace2_data_intmax("pack-bitmap-write", the_repository,
+			   "building_bitmaps_pseudo_merge_reused",
+			   reused_pseudo_merge_bitmaps_nr);
 
 	stop_progress(&writer.progress);
 
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 1966b3b95f1..70230e26479 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1316,6 +1316,37 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 	return cb.base;
 }
 
+struct ewah_bitmap *pseudo_merge_bitmap_for_commit(struct bitmap_index *bitmap_git,
+						   struct commit *commit)
+{
+	struct commit_list *p;
+	struct bitmap *parents;
+	struct pseudo_merge *match = NULL;
+
+	if (!bitmap_git->pseudo_merges.nr)
+		return NULL;
+
+	parents = bitmap_new();
+
+	for (p = commit->parents; p; p = p->next) {
+		int pos = bitmap_position(bitmap_git, &p->item->object.oid);
+		if (pos < 0 || pos >= bitmap_num_objects(bitmap_git))
+			goto done;
+
+		bitmap_set(parents, pos);
+	}
+
+	match = pseudo_merge_for_parents(&bitmap_git->pseudo_merges,
+						parents);
+
+done:
+	bitmap_free(parents);
+	if (match)
+		return pseudo_merge_bitmap(&bitmap_git->pseudo_merges, match);
+
+	return NULL;
+}
+
 static void unsatisfy_all_pseudo_merges(struct bitmap_index *bitmap_git)
 {
 	uint32_t i;
@@ -2809,6 +2840,7 @@ void free_bitmap_index(struct bitmap_index *b)
 		 */
 		close_midx_revindex(b->midx);
 	}
+	free_pseudo_merge_map(&b->pseudo_merges);
 	free(b);
 }
 
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 25d3b8e604a..0fefef39bec 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -119,6 +119,8 @@ int rebuild_bitmap(const uint32_t *reposition,
 		   struct bitmap *dest);
 struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
 				      struct commit *commit);
+struct ewah_bitmap *pseudo_merge_bitmap_for_commit(struct bitmap_index *bitmap_git,
+						   struct commit *commit);
 void bitmap_writer_select_commits(struct commit **indexed_commits,
 				  unsigned int indexed_commits_nr);
 int bitmap_writer_build(struct packing_data *to_pack);
diff --git a/pseudo-merge.c b/pseudo-merge.c
index e111c9cd1a6..9e21fbb5062 100644
--- a/pseudo-merge.c
+++ b/pseudo-merge.c
@@ -682,3 +682,58 @@ int cascade_pseudo_merges(const struct pseudo_merge_map *pm,
 
 	return ret;
 }
+
+struct pseudo_merge *pseudo_merge_for_parents(const struct pseudo_merge_map *pm,
+					      struct bitmap *parents)
+{
+	struct pseudo_merge *match = NULL;
+	size_t i;
+
+	if (!pm->nr)
+		return NULL;
+
+	/*
+	 * NOTE: this loop is quadratic in the worst-case (where no
+	 * matching pseudo-merge bitmaps are found), but in practice
+	 * this is OK for a few reasons:
+	 *
+	 *   - Rejecting pseudo-merge bitmaps that do not match the
+	 *     given commit is done quickly (i.e. `bitmap_equals_ewah()`
+	 *     returns early when we know the two bitmaps aren't equal).
+	 *
+	 *   - Already matched pseudo-merge bitmaps (which we track with
+	 *     the `->satisfied` bit here) are skipped as potential
+	 *     candidates.
+	 *
+	 *   - The number of pseudo-merges should be small (in the
+	 *     hundreds for most repositories).
+	 *
+	 * If in the future this semi-quadratic behavior does become a
+	 * problem, another approach would be to keep track of which
+	 * pseudo-merges are still "viable" after enumerating the
+	 * pseudo-merge commit's parents:
+	 *
+	 *   - A pseudo-merge bitmap becomes non-viable when the bit(s)
+	 *     corresponding to one or more parent(s) of the given
+	 *     commit are not set in a candidate pseudo-merge's commits
+	 *     bitmap.
+	 *
+	 *   - After processing all bits, enumerate the remaining set of
+	 *     viable pseudo-merge bitmaps, and check that their
+	 *     popcount() matches the number of parents in the given
+	 *     commit.
+	 */
+	for (i = 0; i < pm->nr; i++) {
+		struct pseudo_merge *candidate = use_pseudo_merge(pm, &pm->v[i]);
+		if (!candidate || candidate->satisfied)
+			continue;
+		if (!bitmap_equals_ewah(parents, candidate->commits))
+			continue;
+
+		match = candidate;
+		match->satisfied = 1;
+		break;
+	}
+
+	return match;
+}
diff --git a/pseudo-merge.h b/pseudo-merge.h
index cc14e947e86..33acd00a3e5 100644
--- a/pseudo-merge.h
+++ b/pseudo-merge.h
@@ -208,4 +208,11 @@ int cascade_pseudo_merges(const struct pseudo_merge_map *pm,
 			  struct bitmap *result,
 			  struct bitmap *roots);
 
+/*
+ * Returns a pseudo-merge which contains the exact set of commits
+ * listed in the "parents" bitmap, or NULL if none could be found.
+ */
+struct pseudo_merge *pseudo_merge_for_parents(const struct pseudo_merge_map *pm,
+					      struct bitmap *parents);
+
 #endif
diff --git a/t/t5333-pseudo-merge-bitmaps.sh b/t/t5333-pseudo-merge-bitmaps.sh
index 909c17e301e..531f1924af4 100755
--- a/t/t5333-pseudo-merge-bitmaps.sh
+++ b/t/t5333-pseudo-merge-bitmaps.sh
@@ -22,6 +22,10 @@ test_pseudo_merges_cascades () {
 	test_trace2_data bitmap pseudo_merges_cascades "$1"
 }
 
+test_pseudo_merges_reused () {
+	test_trace2_data pack-bitmap-write building_bitmaps_pseudo_merge_reused "$1"
+}
+
 tag_everything () {
 	git rev-list --all --no-object-names >in &&
 	perl -lne '
@@ -322,4 +326,64 @@ test_expect_success 'pseudo-merge overlap stale traversal' '
 	)
 '
 
+test_expect_success 'pseudo-merge reuse' '
+	git init pseudo-merge-reuse &&
+	(
+		cd pseudo-merge-reuse &&
+
+		stable="1641013200" && # 2022-01-01
+		unstable="1672549200" && # 2023-01-01
+
+		for date in $stable $unstable
+		do
+			test_commit_bulk --date "$date +0000" 128 &&
+			test_tick || return 1
+		done &&
+
+		tag_everything &&
+
+		git \
+			-c bitmapPseudoMerge.test.pattern="refs/tags/" \
+			-c bitmapPseudoMerge.test.maxMerges=1 \
+			-c bitmapPseudoMerge.test.threshold=now \
+			-c bitmapPseudoMerge.test.stableThreshold=$(($unstable - 1)) \
+			-c bitmapPseudoMerge.test.stableSize=512 \
+			repack -adb &&
+
+		test_pseudo_merges >merges &&
+		test_line_count = 2 merges &&
+
+		test_pseudo_merge_commits 0 >stable-oids.before &&
+		test_pseudo_merge_commits 1 >unstable-oids.before &&
+
+		: >trace2.txt &&
+		GIT_TRACE2_EVENT=$PWD/trace2.txt git \
+			-c bitmapPseudoMerge.test.pattern="refs/tags/" \
+			-c bitmapPseudoMerge.test.maxMerges=2 \
+			-c bitmapPseudoMerge.test.threshold=now \
+			-c bitmapPseudoMerge.test.stableThreshold=$(($unstable - 1)) \
+			-c bitmapPseudoMerge.test.stableSize=512 \
+			repack -adb &&
+
+		test_pseudo_merges_reused 1 <trace2.txt &&
+
+		test_pseudo_merges >merges &&
+		test_line_count = 3 merges &&
+
+		test_pseudo_merge_commits 0 >stable-oids.after &&
+		for i in 1 2
+		do
+			test_pseudo_merge_commits $i || return 1
+		done >unstable-oids.after &&
+
+		sort -u <stable-oids.before >expect &&
+		sort -u <stable-oids.after >actual &&
+		test_cmp expect actual &&
+
+		sort -u <unstable-oids.before >expect &&
+		sort -u <unstable-oids.after >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
2.45.0.23.gc6f94b99219



* [PATCH v2 23/23] t/perf: implement performance tests for pseudo-merge bitmaps
  2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
                     ` (21 preceding siblings ...)
  2024-04-29 20:44   ` [PATCH v2 22/23] pseudo-merge: implement support for finding existing merges Taylor Blau
@ 2024-04-29 20:44   ` Taylor Blau
  2024-04-30 20:03   ` [PATCH v2 00/23] pack-bitmap: pseudo-merge reachability bitmaps Junio C Hamano
  23 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-04-29 20:44 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Junio C Hamano

Implement a straightforward performance test demonstrating the benefit
of pseudo-merge bitmaps by measuring how long it takes to count
reachable objects in a few different scenarios:

  - without bitmaps, to demonstrate a reasonable baseline
  - with bitmaps, but without pseudo-merges
  - with bitmaps and pseudo-merges

Results from running this test on git.git are as follows:

    Test                                                                this tree
    -----------------------------------------------------------------------------------
    5333.2: git rev-list --count --all --objects (no bitmaps)           3.46(3.37+0.09)
    5333.3: git rev-list --count --all --objects (no pseudo-merges)     0.13(0.11+0.01)
    5333.4: git rev-list --count --all --objects (with pseudo-merges)   0.12(0.11+0.01)

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/perf/p5333-pseudo-merge-bitmaps.sh | 32 ++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
 create mode 100755 t/perf/p5333-pseudo-merge-bitmaps.sh

diff --git a/t/perf/p5333-pseudo-merge-bitmaps.sh b/t/perf/p5333-pseudo-merge-bitmaps.sh
new file mode 100755
index 00000000000..4bec409d10e
--- /dev/null
+++ b/t/perf/p5333-pseudo-merge-bitmaps.sh
@@ -0,0 +1,32 @@
+#!/bin/sh
+
+test_description='pseudo-merge bitmaps'
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+test_expect_success 'setup' '
+	git \
+		-c bitmapPseudoMerge.all.pattern="refs/" \
+		-c bitmapPseudoMerge.all.threshold=now \
+		-c bitmapPseudoMerge.all.stableThreshold=never \
+		-c bitmapPseudoMerge.all.maxMerges=64 \
+		-c pack.writeBitmapLookupTable=true \
+		repack -adb
+'
+
+test_perf 'git rev-list --count --all --objects (no bitmaps)' '
+	git rev-list --objects --all
+'
+
+test_perf 'git rev-list --count --all --objects (no pseudo-merges)' '
+	GIT_TEST_USE_PSEDUO_MERGES=0 \
+		git rev-list --objects --all --use-bitmap-index
+'
+
+test_perf 'git rev-list --count --all --objects (with pseudo-merges)' '
+	GIT_TEST_USE_PSEDUO_MERGES=1 \
+		git rev-list --objects --all --use-bitmap-index
+'
+
+test_done
-- 
2.45.0.23.gc6f94b99219


* Re: [PATCH v2 00/23] pack-bitmap: pseudo-merge reachability bitmaps
  2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
                     ` (22 preceding siblings ...)
  2024-04-29 20:44   ` [PATCH v2 23/23] t/perf: implement performance tests for pseudo-merge bitmaps Taylor Blau
@ 2024-04-30 20:03   ` Junio C Hamano
  2024-05-01 14:40     ` Taylor Blau
  23 siblings, 1 reply; 157+ messages in thread
From: Junio C Hamano @ 2024-04-30 20:03 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Elijah Newren

Taylor Blau <me@ttaylorr.com> writes:

>   - Rebased onto 2.45, so this is now based on 'master', which is at
>     786a3e4b8d (Git 2.45, 2024-04-29) at the time of writing.

Is there any notable reason for the rebase (other than "2.45 is out
now") that needs to be called out?  Something along the lines of
"topic X and Y has graduated and the helper function used by this
topic has changed its external interface"?

Thanks, queued.




* Re: [PATCH v2 00/23] pack-bitmap: pseudo-merge reachability bitmaps
  2024-04-30 20:03   ` [PATCH v2 00/23] pack-bitmap: pseudo-merge reachability bitmaps Junio C Hamano
@ 2024-05-01 14:40     ` Taylor Blau
  0 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-01 14:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Elijah Newren

On Tue, Apr 30, 2024 at 01:03:50PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> >   - Rebased onto 2.45, so this is now based on 'master', which is at
> >     786a3e4b8d (Git 2.45, 2024-04-29) at the time of writing.
>
> Is there any notable reason for the rebase (other than "2.45 is out
> now") that needs to be called out?  Something along the lines of
> "topic X and Y has graduated and the helper function used by this
> topic has changed its external interface"?

It's mostly that 2.45 is out, but there were a couple of topics that
merged that produced minor conflicts that I wanted to resolve for you:

- 625ef1c6f1 (Merge branch 'tb/t7700-fixup', 2024-04-16) introduced a
  minor conflict (both sides add GIT_TEST_MULTI_PACK_INDEX=0 to the
  relevant test invocations).

- d8360a86ed (Merge branch 'tb/midx-write', 2024-04-12) introduced a
  conflict where this branch adds a call to the new
  `bitmap_writer_init()` function in midx.c, but tb/midx-write moved all
  of that code over to midx-write.c

Those are the two main ones, but I mostly just wanted to take care of it
since we're on the other side of 2.45.

> Thanks, queued.

Thanks,
Taylor


* Re: [PATCH v2 01/23] Documentation/technical: describe pseudo-merge bitmaps format
  2024-04-29 20:42   ` [PATCH v2 01/23] Documentation/technical: describe pseudo-merge bitmaps format Taylor Blau
@ 2024-05-06 11:52     ` Patrick Steinhardt
  2024-05-06 16:37       ` Taylor Blau
  0 siblings, 1 reply; 157+ messages in thread
From: Patrick Steinhardt @ 2024-05-06 11:52 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Elijah Newren, Junio C Hamano


On Mon, Apr 29, 2024 at 04:42:54PM -0400, Taylor Blau wrote:
> Prepare to implement pseudo-merge bitmaps over the next several commits
> by first describing the serialization format which will store the new
> pseudo-merge bitmaps themselves.
> 
> This format is implemented as an optional extension within the bitmap v1
> format, making it compatible with previous versions of Git, as well as
> the original .bitmap implementation within JGit.
> 
> The format (as well as a general description of pseudo-merge bitmaps,
> and motivating use-case(s)) is described in detail in the patch contents
> below, but the high-level description is as follows:
> 
>   - An array of pseudo-merge bitmaps, each containing a pair of EWAH
>     bitmaps: one describing the set of pseudo-merge "parents", and
>     another describing the set of object(s) reachable from those
>     parents.
> 
>   - A lookup table to determine which pseudo-merge(s) a given commit
>     appears in. An optional extended lookup table follows when there is
>     at least one commit which appears in multiple pseudo-merge groups.
> 
>   - Trailing metadata, including the number of pseudo-merge(s), number
>     of unique parents, the offset within the .bitmap file for the
>     pseudo-merge commit lookup table, and the size of the optional
>     extension itself.
> 
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  Documentation/technical/bitmap-format.txt | 179 ++++++++++++++++++++++
>  1 file changed, 179 insertions(+)
> 
> diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
> index f5d200939b0..63a7177ac08 100644
> --- a/Documentation/technical/bitmap-format.txt
> +++ b/Documentation/technical/bitmap-format.txt
> @@ -255,3 +255,182 @@ triplet is -
>  	xor_row (4 byte integer, network byte order): ::
>  	The position of the triplet whose bitmap is used to compress
>  	this one, or `0xffffffff` if no such bitmap exists.
> +
> +Pseudo-merge bitmaps
> +--------------------
> +
> +If the `BITMAP_OPT_PSEUDO_MERGES` flag is set, a variable number of
> +bytes (preceding the name-hash cache, commit lookup table, and trailing
> +checksum) of the `.bitmap` file is used to store pseudo-merge bitmaps.

Here you say that the section is supposed to come before some other
sections, whereas the first sentence in the "File format" section says
that it is the last section in a bitmap file.

> +A "pseudo-merge bitmap" is used to refer to a pair of bitmaps, as
> +follows:
> +
> +Commit bitmap::
> +
> +  A bitmap whose set bits describe the set of commits included in the
> +  pseudo-merge's "merge" bitmap (as below).
> +
> +Merge bitmap::
> +
> +  A bitmap whose set bits describe the reachability closure over the set
> +  of commits in the pseudo-merge's "commits" bitmap (as above). An
> +  identical bitmap would be generated for an octopus merge with the same
> +  set of parents as described in the commits bitmap.
> +
> +Pseudo-merge bitmaps can accelerate bitmap traversals when all commits
> +for a given pseudo-merge are listed on either side of the traversal,
> +either directly (by explicitly asking for them as part of the `HAVES`
> +or `WANTS`) or indirectly (by encountering them during a fill-in
> +traversal).
> +
> +=== Use-cases

I feel like starting with the problems that the whole feature is
intended to solve would help the reading flow quite a bit. So I'd move
this whole section up.

> +For example, suppose there exists a pseudo-merge bitmap with a large
> +number of commits, all of which are listed in the `WANTS` section of
> +some bitmap traversal query. When pseudo-merge bitmaps are enabled, the
> +bitmap machinery can quickly determine there is a pseudo-merge which
> +satisfies some subset of the wanted objects on either side of the query.
> +Then, we can inflate the EWAH-compressed bitmap, and `OR` it in to the
> +resulting bitmap. By contrast, without pseudo-merge bitmaps, we would
> +have to repeat the decompression and `OR`-ing step over a potentially
> +large number of individual bitmaps, which can take proportionally more
> +time.
> +
> +Another benefit of pseudo-merges arises when there is some combination
> +of (a) a large number of references, with (b) poor bitmap coverage, and
> +(c) deep, nested trees, making fill-in traversal relatively expensive.
> +For example, suppose that there are a large enough number of tags where
> +bitmapping each of the tags individually is infeasible. Without
> +pseudo-merge bitmaps, computing the result of, say, `git rev-list
> +--use-bitmap-index --count --objects --tags` would likely require a
> +large amount of fill-in traversal. But when a large quantity of those
> +tags are stored together in a pseudo-merge bitmap, the bitmap machinery
> +can take advantage of the fact that we only care about the union of
> +objects reachable from all of those tags, and answer the query much
> +faster.

I would start the explanation with a discussion of the problem before
presenting the solution to those problems. In the current version it's
the other way round, you present a solution to a problem that isn't yet
explained.

It might also be helpful to discuss a bit who is supposed to create
those pseudo-merge bitmaps. Does Git do so automatically for all tags?
Does the admin have to configure this? If the latter, when do you want
to create those and what strategies are there to create them?

> +=== File format
> +
> +If enabled, pseudo-merge bitmaps are stored in an optional section at
> +the end of a `.bitmap` file. The format is as follows:
> +
> +....
> ++-------------------------------------------+
> +|               .bitmap File                |
> ++-------------------------------------------+
> +|                                           |
> +|  Pseudo-merge bitmaps (Variable Length)   |
> +|  +---------------------------+            |
> +|  | commits_bitmap (EWAH)     |            |
> +|  +---------------------------+            |
> +|  | merge_bitmap (EWAH)       |            |
> +|  +---------------------------+            |
> +|                                           |
> ++-------------------------------------------+
> +|                                           |
> +|  Lookup Table                             |
> +|  +------------+--------------+            |
> +|  | commit_pos |    offset    |            |
> +|  +------------+--------------+            |
> +|  |  4 bytes   |   8 bytes    |            |
> +|  +------------+--------------+            |

It's a bit confusing that in the EWAH section above you have the type of
the fields in the same line as the field itself, whereas here you have
them formatted in a separate box. This makes the reader wonder at first
whether this is two or four fields. How about the following instead:

    |  Lookup Table                             |
    |  +---------------------------+            |
    |  | commit_pos (4 bytes)      |            |
    |  +---------------------------+            |
    |  | offset (8 bytes)          |            |
    |  +---------------------------+            |

The same comment applies to the other sections further down.

> +|                                           |
> +|  Offset Cases:                            |
> +|  -------------                            |
> +|                                           |
> +|  1. MSB Unset: single pseudo-merge bitmap |
> +|     + offset to pseudo-merge bitmap       |
> +|                                           |
> +|  2. MSB Set: multiple pseudo-merges       |
> +|     + offset to extended lookup table     |
> +|                                           |
> ++-------------------------------------------+
> +|                                           |
> +|  Extended Lookup Table (Optional)         |
> +|                                           |
> +|  +----+----------+----------+----------+  |
> +|  | N  | Offset 1 |   ....   | Offset N |  |
> +|  +----+----------+----------+----------+  |
> +|  |    |  8 bytes |   ....   |  8 bytes |  |
> +|  +----+----------+----------+----------+  |
> +|                                           |
> ++-------------------------------------------+
> +|                                           |
> +|  Pseudo-merge Metadata                    |
> +|  +------------------+----------------+    |
> +|  | # pseudo-merges  | # Commits      |    |
> +|  +------------------+----------------+    |
> +|  | 4 bytes          | 4 bytes        |    |
> +|  +------------------+----------------+    |
> +|                                           |
> +|  +------------------+----------------+    |
> +|  | Lookup offset    | Extension size |    |
> +|  +------------------+----------------+    |
> +|  | 8 bytes          | 8 bytes        |    |
> +|  +------------------+----------------+    |
> +|                                           |
> ++-------------------------------------------+
> +....
> +
> +* One or more pseudo-merge bitmaps, each containing:

In case you have multiple pseudo-merge bitmaps, is the whole of the
above repeated for each bitmap or is it just parts of it?

> +  ** `commits_bitmap`, an EWAH-compressed bitmap describing the set of
> +     commits included in the this psuedo-merge.
> +
> +  ** `merge_bitmap`, an EWAH-compressed bitmap describing the union of
> +     the set of objects reachable from all commits listed in the
> +     `commits_bitmap`.
> +
> +* A lookup table, mapping pseudo-merged commits to the pseudo-merges
> +  they belong to. Entries appear in increasing order of each commit's
> +  bit position. Each entry is 12 bytes wide, and is comprised of the
> +  following:
> +
> +  ** `commit_pos`, a 4-byte unsigned value (in network byte-order)
> +     containing the bit position for this commit.
> +
> +  ** `offset`, an 8-byte unsigned value (also in network byte-order)
> +  containing either one of two possible offsets, depending on whether or
> +  not the most-significant bit is set.
> +
> +    *** If unset (i.e. `offset & ((uint64_t)1<<63) == 0`), the offset
> +	(relative to the beginning of the `.bitmap` file) at which the
> +	pseudo-merge bitmap for this commit can be read. This indicates
> +	only a single pseudo-merge bitmap contains this commit.
> +
> +    *** If set (i.e. `offset & ((uint64_t)1<<63) != 0`), the offset
> +	(again relative to the beginning of the `.bitmap` file) at which
> +	the extended offset table can be located describing the set of
> +	pseudo-merge bitmaps which contain this commit. This indicates
> +	that multiple pseudo-merge bitmaps contain this commit.
> +
> +* An (optional) extended lookup table (written if and only if there is
> +  at least one commit which appears in more than one pseudo-merge).
> +  There are as many entries as commits which appear in multiple
> +  pseudo-merges. Each entry contains the following:
> +
> +  ** `N`, a 4-byte unsigned value equal to the number of pseudo-merges
> +     which contain a given commit.

How exactly is the given commit identified? Or in other words, given an
entry in the lookup table here, how do I figure out what commit it
belongs to?

> +  ** An array of `N` 8-byte unsigned values, each of which is
> +     interpreted as an offset (relative to the beginning of the
> +     `.bitmap` file) at which a pseudo-merge bitmap for this commit can
> +     be read. These values occur in no particular order.
> +
> +* Positions for all pseudo-merges, each stored as an 8-byte unsigned
> +  value (in network byte-order) containing the offset (relative to the
> +  beginnign of the `.bitmap` file) of each consecutive pseudo-merge.

s/beginnign/beginning
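
One more thing on the `offset` field of the lookup table above: as
written, `offset & ((uint64_t)1<<63) == 0` does not test the top bit in
C, because `==` binds tighter than `&`, so the expression parses as
`offset & (((uint64_t)1<<63) == 0)`. A rough sketch of what I would
expect a reader to do is below. The helper names and the macro are made
up for illustration, and I am assuming that the real offset is obtained
by clearing the most-significant bit, which the text does not spell out:

```c
#include <stdint.h>

/* Hypothetical name for the "multiple pseudo-merges" flag bit. */
#define PSEUDO_MERGE_EXT_BIT ((uint64_t)1 << 63)

/* Hypothetical helper: read an 8-byte value in network byte order. */
static uint64_t read_be64(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Decode the `offset` field of a lookup table entry.
 *
 * Returns 1 if the most-significant bit is set (the offset points at
 * the extended lookup table), 0 otherwise (the offset points at a
 * single pseudo-merge bitmap). The offset with the flag bit cleared is
 * stored in *out; clearing the bit is an assumption of this sketch.
 */
static int decode_lookup_offset(const unsigned char *p, uint64_t *out)
{
	uint64_t raw = read_be64(p);

	*out = raw & ~PSEUDO_MERGE_EXT_BIT;
	return (raw & PSEUDO_MERGE_EXT_BIT) != 0;
}
```

Writing the check as `(offset & ((uint64_t)1<<63)) != 0` in the
documentation would avoid the precedence trap.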

Patrick


* Re: [PATCH v2 04/23] pack-bitmap: move some initialization to `bitmap_writer_init()`
  2024-04-29 20:43   ` [PATCH v2 04/23] pack-bitmap: move some initialization to `bitmap_writer_init()` Taylor Blau
@ 2024-05-06 11:52     ` Patrick Steinhardt
  2024-05-06 18:24       ` Taylor Blau
  0 siblings, 1 reply; 157+ messages in thread
From: Patrick Steinhardt @ 2024-05-06 11:52 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Elijah Newren, Junio C Hamano

On Mon, Apr 29, 2024 at 04:43:08PM -0400, Taylor Blau wrote:
> The pack-bitmap-writer machinery uses a oidmap (backed by khash.h) to
> map from commits selected for bitmaps (by OID) to a bitmapped_commit
> structure (containing the bitmap itself, among other things like its XOR
> offset, etc.)
> 
> This map was initialized at the end of `bitmap_writer_build()`. New
> entries are added in `pack-bitmap-write.c::store_selected()`, which is
> called by the bitmap_builder machinery (which is responsible for
> traversing history and generating the actual bitmaps).
> 
> Reorganize when this field is initialized and when entries are added to
> it so that we can quickly determine whether a commit is a candidate for
> pseudo-merge selection, or not (since it was already selected to receive
> a bitmap, and thus is ineligible for pseudo-merge inclusion).

I feel like this last sentence here could use some more explanation as
the restriction has never been explained before. Is this a strict
requirement, or is this rather "It would be wasted anyway"?

> The changes are as follows:
> 
>   - Introduce a new `bitmap_writer_init()` function which initializes
>     the `writer.bitmaps` field (instead of waiting until the end of
>     `bitmap_writer_build()`).
> 
>   - Add map entries in `push_bitmapped_commit()` (which is called via
>     `bitmap_writer_select_commits()`) with OID keys and NULL values to
>     track whether or not we *expect* to write a bitmap for some given
>     commit.
> 
>   - Validate that a NULL entry is found matching the given key when we
>     store a selected bitmap.

It would be great if this refactoring went way further. Right now it's
quite hard to verify whether the writer has really been initialized in
all the right places because it is a global variable. Ideally, the whole
interface should be refactored to take the writer as input instead,
where `bitmap_writer_init()` would then initialize the local variables.

That'd of course be a bigger refactoring and may or may not be a good
fit for this patch series. But I'd very much love to see such a refactor
as a follow-up series.

[snip]
> diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
> index c35bc81d00f..9bc41a9e145 100644
> --- a/pack-bitmap-write.c
> +++ b/pack-bitmap-write.c
> @@ -46,6 +46,11 @@ struct bitmap_writer {
>  
>  static struct bitmap_writer writer;
>  
> +void bitmap_writer_init(struct repository *r)
> +{
> +	writer.bitmaps = kh_init_oid_map();
> +}

Given the other safety belts, do we also want to BUG here in case the
bitmap has already been initialized?
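
Something along these lines is what I have in mind (untested, and
written with stand-in types so that it compiles outside of our tree;
the `_sketch` names are made up, `calloc()` stands in for
`kh_init_oid_map()`, and the error return stands in for `BUG()`):

```c
#include <stdlib.h>

/* Stand-in for `struct bitmap_writer`; only the field we care about. */
struct bitmap_writer_sketch {
	void *bitmaps;
};

static struct bitmap_writer_sketch writer_sketch;

/*
 * Returns 0 on success and -1 if the writer was already initialized
 * (or if the allocation failed). In-tree, the double-init case would
 * rather be a BUG() than an error return.
 */
static int bitmap_writer_init_sketch(void)
{
	if (writer_sketch.bitmaps)
		return -1; /* in-tree: BUG("bitmap writer already initialized") */

	writer_sketch.bitmaps = calloc(1, 1); /* in-tree: kh_init_oid_map() */
	return writer_sketch.bitmaps ? 0 : -1;
}
```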

Patrick


* Re: [PATCH v2 06/23] pack-bitmap-write: support storing pseudo-merge commits
  2024-04-29 20:43   ` [PATCH v2 06/23] pack-bitmap-write: support storing pseudo-merge commits Taylor Blau
@ 2024-05-06 11:52     ` Patrick Steinhardt
  2024-05-06 18:48       ` Taylor Blau
  2024-05-13 18:42     ` Jeff King
  1 sibling, 1 reply; 157+ messages in thread
From: Patrick Steinhardt @ 2024-05-06 11:52 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Elijah Newren, Junio C Hamano

On Mon, Apr 29, 2024 at 04:43:15PM -0400, Taylor Blau wrote:
[snip]
> @@ -46,6 +48,11 @@ struct bitmap_writer {
>  
>  static struct bitmap_writer writer;
>  
> +static inline int bitmap_writer_selected_nr(void)
> +{
> +	return writer.selected_nr - writer.pseudo_merges_nr;
> +}

This function could use a comment to explain what it actually means.
Like, `bitmap_writer_selected_nr()` is obviously not the same as the
`selected_nr` of the `bitmap_writer`, which is quite confusing. So why
do we subtract values and why are there two different `selected_nr`s?

[snip]
> diff --git a/pack-bitmap.h b/pack-bitmap.h
> index dae2d68a338..ca9acd2f735 100644
> --- a/pack-bitmap.h
> +++ b/pack-bitmap.h
> @@ -21,6 +21,7 @@ struct bitmap_disk_header {
>  	unsigned char checksum[GIT_MAX_RAWSZ];
>  };
>  
> +#define BITMAP_PSEUDO_MERGE (1u<<21)
>  #define NEEDS_BITMAP (1u<<22)

This flag is already used by "builtin/pack-objects.c", which may be fine.
But in any case, shouldn't we update "object.h" with both of these flags?

Patrick


* Re: [PATCH v2 09/23] pseudo-merge: implement support for selecting pseudo-merge commits
  2024-04-29 20:43   ` [PATCH v2 09/23] pseudo-merge: implement support for selecting pseudo-merge commits Taylor Blau
@ 2024-05-06 11:53     ` Patrick Steinhardt
  2024-05-06 19:58       ` Taylor Blau
  2024-05-13 19:03     ` Jeff King
  1 sibling, 1 reply; 157+ messages in thread
From: Patrick Steinhardt @ 2024-05-06 11:53 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Elijah Newren, Junio C Hamano

On Mon, Apr 29, 2024 at 04:43:37PM -0400, Taylor Blau wrote:
[snip]
> +static uint32_t pseudo_merge_group_size(const struct pseudo_merge_group *group,
> +					const struct pseudo_merge_matches *matches,
> +					uint32_t i)
> +{
> +	float C = 0.0f;
> +	uint32_t n;
> +
> +	/*
> +	 * The size of pseudo-merge groups decays according to a power series,
> +	 * which looks like:
> +	 *
> +	 *   f(n) = C * n^-k
> +	 *
> +	 * , where 'n' is the n-th pseudo-merge group, 'f(n)' is its size, 'k'
> +	 * is the decay rate, and 'C' is a scaling value.
> +	 *
> +	 * The value of C depends on the number of groups, decay rate, and total
> +	 * number of commits. It is computed such that if there are M and N
> +	 * total groups and commits, respectively, that:
> +	 *
> +	 *   N = f(0) + f(1) + ... f(M-1)
> +	 *
> +	 * Rearranging to isolate C, we get:
> +	 *
> +	 *   N = \sum_{n=1}^M C / n^k
> +	 *
> +	 *   N / C = \sum_{n=1}^M n^-k
> +	 *
> +	 *   C = N / \sum_{n=1}^M n^-k
> +	 *
> +	 * For example, if we have a decay rate of 'k' being equal to 1.5, 'N'
> +	 * total commits equal to 10,000, and 'M' being equal to 6 groups, then
> +	 * the (rounded) group sizes are:
> +	 *
> +	 *   { 5469, 1934, 1053, 684, 489, 372 }
> +	 *
> +	 * increasing the number of total groups, say to 10, scales the group
> +	 * sizes appropriately:
> +	 *
> +	 *   { 5012, 1772, 964, 626, 448, 341, 271, 221, 186, 158 }
> +	 */
> +	for (n = 0; n < group->max_merges; n++)
> +		C += 1.0f / gitexp(n + 1, group->decay);
> +	C = matches->unstable_nr / C;
> +
> +	return (int)((C / gitexp(i + 1, group->decay)) + 0.5);

Why do we cast the return to `int` when the function returns a
`uint32_t`?
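
Separately, the worked example in the comment is a bit hard to verify
from the text alone. Note that the comment first writes the sum as
`f(0) + f(1) + ... f(M-1)` and then as a sum over n = 1..M; the loop
uses `n + 1`, so I am assuming the latter is what is meant. Under that
assumption, the k = 1.5, N = 10,000, M = 6 case works out roughly as
follows (values rounded to four decimal places):

```latex
\begin{align*}
\sum_{n=1}^{6} n^{-1.5}
  &\approx 1 + 0.3536 + 0.1925 + 0.1250 + 0.0894 + 0.0680 = 1.8285 \\
C &= \frac{N}{\sum_{n=1}^{6} n^{-1.5}}
   \approx \frac{10000}{1.8285} \approx 5469 \\
f(1) &= C \cdot 1^{-1.5} \approx 5469 \\
f(2) &= C \cdot 2^{-1.5} \approx 5469 \cdot 0.3536 \approx 1934
\end{align*}
```

That matches the first two entries of the list in the comment, so the
numbers look right to me; it is only the indexing in the prose that
seems inconsistent.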

> +}
> +
> +static void init_pseudo_merge_group(struct pseudo_merge_group *group)

Nit: Shouldn't the name rather be `pseudo_merge_group_init()`?

[snip]
> +	} else if (!strcmp(key, "decay")) {
> +		group->decay = git_config_int(var, value, ctx->kvi);
> +		if (group->decay < 0) {
> +			warning(_("%s must be non-negative, using default"), var);
> +			group->decay = DEFAULT_PSEUDO_MERGE_DECAY;
> +		}

The decay is a float, and your decay rate examples mention a rate of
1.5f. It's impossible to specify fractional rates though because we use
`git_config_int()`. Should we introduce a new `git_config_float()`
function to implement this properly?

> +	} else if (!strcmp(key, "samplerate")) {
> +		group->sample_rate = git_config_int(var, value, ctx->kvi);
> +		if (!(0 <= group->sample_rate && group->sample_rate <= 100)) {
> +			warning(_("%s must be between 0 and 100, using default"), var);
> +			group->sample_rate = DEFAULT_PSEUDO_MERGE_SAMPLE_RATE;
> +		}
> +	} else if (!strcmp(key, "threshold")) {
> +		if (git_config_expiry_date(&group->threshold, var, value)) {
> +			strbuf_release(&buf);

Instead of having multiple exit paths where we need to release `buf` we
should likely have a common exit path.
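
Roughly like the following (untested, and deliberately a stand-alone
stand-in rather than the real callback: `parse_group_key()` is a made-up
name, a `malloc()`ed copy stands in for the strbuf, and unlike the patch
it treats an out-of-range value as an error just to have a second
failure path):

```c
#include <stdlib.h>
#include <string.h>

/*
 * 'buf' is acquired once and released at a single label, regardless of
 * which branch fails. Returns 0 on success, -1 on error.
 */
static int parse_group_key(const char *key, const char *value,
			   int *sample_rate)
{
	int ret = 0;
	char *buf = malloc(strlen(key) + 1); /* stands in for the strbuf */

	if (!buf)
		return -1;
	strcpy(buf, key);

	if (!strcmp(buf, "samplerate")) {
		int v = atoi(value);

		if (v < 0 || v > 100) {
			ret = -1;
			goto done;
		}
		*sample_rate = v;
	} else {
		ret = -1; /* unknown key */
	}

done:
	free(buf); /* the only place where 'buf' is released */
	return ret;
}
```

With that shape, adding another early return later on cannot forget to
release the buffer.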

[snip]
> +static struct commit *push_pseudo_merge(struct pseudo_merge_group *group)
> +{
> +	struct commit *merge;
> +
> +	ALLOC_GROW(group->merges, group->merges_nr + 1, group->merges_alloc);
> +
> +	merge = alloc_commit_node(the_repository);
> +	merge->object.parsed = 1;

Why can we mark the object as parsed here?

> +	merge->object.flags |= BITMAP_PSEUDO_MERGE;
> +
> +	group->merges[group->merges_nr++] = merge;
> +
> +	return merge;
> +}
> +
> +static struct pseudo_merge_commit_idx *pseudo_merge_idx(kh_oid_map_t *pseudo_merge_commits,
> +							const struct object_id *oid)
> +
> +{
> +	struct pseudo_merge_commit_idx *pmc;
> +	khiter_t hash_pos;
> +
> +	hash_pos = kh_get_oid_map(pseudo_merge_commits, *oid);
> +	if (hash_pos == kh_end(pseudo_merge_commits)) {
> +		int hash_ret;
> +		hash_pos = kh_put_oid_map(pseudo_merge_commits, *oid, &hash_ret);
> +		CALLOC_ARRAY(pmc, 1);
> +
> +		kh_value(pseudo_merge_commits, hash_pos) = pmc;
> +	} else {
> +		pmc = kh_value(pseudo_merge_commits, hash_pos);
> +	}
> +
> +	return pmc;
> +}

Can't we simplify this to the following (untested):

static struct pseudo_merge_commit_idx *pseudo_merge_idx(kh_oid_map_t *pseudo_merge_commits,
                                                       const struct object_id *oid)
{
       struct pseudo_merge_commit_idx *pmc;
       khiter_t hash_pos;
       int hash_ret;

       hash_pos = kh_put_oid_map(pseudo_merge_commits, *oid, &hash_ret);
       if (hash_ret) {
               CALLOC_ARRAY(pmc, 1);
               kh_value(pseudo_merge_commits, hash_pos) = pmc;
       } else {
               pmc = kh_value(pseudo_merge_commits, hash_pos);
       }

       return pmc;
}

> +
> +#define MIN_PSEUDO_MERGE_SIZE 8
> +
> +static void select_pseudo_merges_1(struct pseudo_merge_group *group,
> +				   struct pseudo_merge_matches *matches,
> +				   kh_oid_map_t *pseudo_merge_commits,
> +				   uint32_t *pseudo_merges_nr)
> +{
> +	uint32_t i, j;
> +	uint32_t stable_merges_nr;
> +
> +	if (!matches->stable_nr && !matches->unstable_nr)
> +		return; /* all tips in this group already have bitmaps */

It's nice that there are some comments, but there are quite a lot of
non-obvious things going on in this function that would warrant an
explanation that expands a bit more into what exactly it is that we are
doing here.

I may only be speaking for myself, but I basically have no clue what we
do here :) Something something pseudo merges, I guess. But there is no
in-code explanation at all what a "stable" or "unstable" commit is, how
exactly we match commits and other higher-level ideas.

Patrick


* Re: [PATCH v2 10/23] pack-bitmap-write.c: select pseudo-merge commits
  2024-04-29 20:43   ` [PATCH v2 10/23] pack-bitmap-write.c: select " Taylor Blau
@ 2024-05-06 11:53     ` Patrick Steinhardt
  2024-05-06 20:05       ` Taylor Blau
  0 siblings, 1 reply; 157+ messages in thread
From: Patrick Steinhardt @ 2024-05-06 11:53 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Elijah Newren, Junio C Hamano

On Mon, Apr 29, 2024 at 04:43:41PM -0400, Taylor Blau wrote:
[snip]
> diff --git a/Documentation/config/bitmap-pseudo-merge.txt b/Documentation/config/bitmap-pseudo-merge.txt
> new file mode 100644
> index 00000000000..90b72522046
> --- /dev/null
> +++ b/Documentation/config/bitmap-pseudo-merge.txt
> @@ -0,0 +1,75 @@
> +bitmapPseudoMerge.<name>.pattern::
> +	Regular expression used to match reference names. Commits
> +	pointed to by references matching this pattern (and meeting
> +	the below criteria, like `bitmapPseudoMerge.<name>.sampleRate`
> +	and `bitmapPseudoMerge.<name>.threshold`) will be considered
> +	for inclusion in a pseudo-merge bitmap.
> ++
> +Commits are grouped into pseudo-merge groups based on whether or not
> +any reference(s) that point at a given commit match the pattern, which
> +is an extended regular expression.
> ++
> +Within a pseudo-merge group, commits may be further grouped into
> +sub-groups based on the capture groups in the pattern. These
> +sub-groupings are formed from the regular expressions by concatenating
> +any capture groups from the regular expression, with a '-' dash in
> +between.
> ++
> +For example, if the pattern is `refs/tags/`, then all tags (provided
> +they meet the below criteria) will be considered candidates for the
> +same pseudo-merge group. However, if the pattern is instead
> +`refs/remotes/([0-9])+/tags/`, then tags from different remotes will
> +be grouped into separate pseudo-merge groups, based on the remote
> +number.
> +
> +bitmapPseudoMerge.<name>.decay::
> +	Determines the rate at which consecutive pseudo-merge bitmap
> +	groups decrease in size. Must be non-negative. This parameter
> +	can be thought of as `k` in the function `f(n) = C *
> +	n^(-k/100)`, where `f(n)` is the size of the `n`th group.
> ++
> +Setting the decay rate equal to `0` will cause all groups to be the
> +same size. Setting the decay rate equal to `100` will cause the `n`th
> +group to be `1/n` the size of the initial group.  Higher values of the
> +decay rate cause consecutive groups to shrink at an increasing rate.
> +The default is `100`.
> +
> +bitmapPseudoMerge.<name>.sampleRate::
> +	Determines the proportion of non-bitmapped commits (among
> +	reference tips) which are selected for inclusion in an
> +	unstable pseudo-merge bitmap. Must be between `0` and `100`
> +	(inclusive). The default is `100`.

I think for this config to be actionable for anybody we need to explain
what "unstable" or "stable" bitmaps are and what the tradeoff is that
the user needs to pick here. Like, why would I want to set this higher
or lower than the default value, or modify it at all?

I think the same is true for most of the other parts of the docs here,
as well. We explain what those configs do, but basically leave the
reader on their own to figure out what the real-world consequences are
and why they would even want to configure those in the first place.

I spent quite some time on this series now, so I'll stop reading at this
point. Thanks!

Patrick


* Re: [PATCH v2 01/23] Documentation/technical: describe pseudo-merge bitmaps format
  2024-05-06 11:52     ` Patrick Steinhardt
@ 2024-05-06 16:37       ` Taylor Blau
  2024-05-10 11:46         ` Patrick Steinhardt
  0 siblings, 1 reply; 157+ messages in thread
From: Taylor Blau @ 2024-05-06 16:37 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Elijah Newren, Junio C Hamano

On Mon, May 06, 2024 at 01:52:44PM +0200, Patrick Steinhardt wrote:
> > diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
> > index f5d200939b0..63a7177ac08 100644
> > --- a/Documentation/technical/bitmap-format.txt
> > +++ b/Documentation/technical/bitmap-format.txt
> > @@ -255,3 +255,182 @@ triplet is -
> >  	xor_row (4 byte integer, network byte order): ::
> >  	The position of the triplet whose bitmap is used to compress
> >  	this one, or `0xffffffff` if no such bitmap exists.
> > +
> > +Pseudo-merge bitmaps
> > +--------------------
> > +
> > +If the `BITMAP_OPT_PSEUDO_MERGES` flag is set, a variable number of
> > +bytes (preceding the name-hash cache, commit lookup table, and trailing
> > +checksum) of the `.bitmap` file is used to store pseudo-merge bitmaps.
>
> Here you say that the section is supposed to come before some other
> sections, whereas the first sentence in the "File format" section says
> that it is the last section in a bitmap file.

This is a quirk of the on-disk .bitmap format. New sections are added
before existing sections, so if you were reading the file from beginning
to end, you'd see the pseudo-merges extension, then the lookup table,
then the name-hash cache (assuming all were written).

I think that describing them in the order they were introduced here
makes more sense, leaving their layout within the .bitmap file as an
implementation detail.

If you feel strongly otherwise, let's clean it up outside of this series
since this whole portion of the documentation would need to be
reordered.

> > +A "pseudo-merge bitmap" is used to refer to a pair of bitmaps, as
> > +follows:
> > +
> > +Commit bitmap::
> > +
> > +  A bitmap whose set bits describe the set of commits included in the
> > +  pseudo-merge's "merge" bitmap (as below).
> > +
> > +Merge bitmap::
> > +
> > +  A bitmap whose set bits describe the reachability closure over the set
> > +  of commits in the pseudo-merge's "commits" bitmap (as above). An
> > +  identical bitmap would be generated for an octopus merge with the same
> > +  set of parents as described in the commits bitmap.
> > +
> > +Pseudo-merge bitmaps can accelerate bitmap traversals when all commits
> > +for a given pseudo-merge are listed on either side of the traversal,
> > +either directly (by explicitly asking for them as part of the `HAVES`
> > +or `WANTS`) or indirectly (by encountering them during a fill-in
> > +traversal).
> > +
> > +=== Use-cases
>
> I feel like starting with the problems that the whole feature is
> intended to solve would help the reading flow quite a bit. So I'd move
> this whole section up.

I think we may want something in the middle, more like a "problem
statement". I wrote up a section that aims to do that, and tries to both
briefly describe the problem (first), as well as a small overview of
what the solution is. Let me know what you think:

--- 8< ---
diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index 63a7177ac0..144f377e35 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -263,6 +263,26 @@ If the `BITMAP_OPT_PSEUDO_MERGES` flag is set, a variable number of
 bytes (preceding the name-hash cache, commit lookup table, and trailing
 checksum) of the `.bitmap` file is used to store pseudo-merge bitmaps.

+=== Problem
+
+When a bitmap traversal is performed, the client may have to do some
+amount of "fill-in" traversal to find the set of objects reachable from
+references which do not have bitmap coverage within the repository.
+
+This fill-in traversal can be expensive, and can become a significant
+bottleneck in repositories that have a large number of references with
+poor bitmap coverage. Ideally every single reference would have
+reachability bitmap coverage, but this is also not feasible as doing so
+would reduce cache locality when reading the .bitmap file, and we would
+also spend a significant amount of time XOR'ing individual bitmaps
+together to generate a result.
+
+This section describes "pseudo-merge bitmaps", a new kind of
+reachability bitmap that describes the set of objects reachable from a
+group of references, rather than an individual reference.
+
+=== Overview
+
 A "pseudo-merge bitmap" is used to refer to a pair of bitmaps, as
 follows:
--- >8 ---

> > +For example, suppose there exists a pseudo-merge bitmap with a large
> > +number of commits, all of which are listed in the `WANTS` section of
> > +some bitmap traversal query. When pseudo-merge bitmaps are enabled, the
> > +bitmap machinery can quickly determine there is a pseudo-merge which
> > +satisfies some subset of the wanted objects on either side of the query.
> > +Then, we can inflate the EWAH-compressed bitmap, and `OR` it in to the
> > +resulting bitmap. By contrast, without pseudo-merge bitmaps, we would
> > +have to repeat the decompression and `OR`-ing step over a potentially
> > +large number of individual bitmaps, which can take proportionally more
> > +time.
> > +
> > +Another benefit of pseudo-merges arises when there is some combination
> > +of (a) a large number of references, with (b) poor bitmap coverage, and
> > +(c) deep, nested trees, making fill-in traversal relatively expensive.
> > +For example, suppose that there are a large enough number of tags where
> > +bitmapping each of the tags individually is infeasible. Without
> > +pseudo-merge bitmaps, computing the result of, say, `git rev-list
> > +--use-bitmap-index --count --objects --tags` would likely require a
> > +large amount of fill-in traversal. But when a large quantity of those
> > +tags are stored together in a pseudo-merge bitmap, the bitmap machinery
> > +can take advantage of the fact that we only care about the union of
> > +objects reachable from all of those tags, and answer the query much
> > +faster.
>
> I would start the explanation with a discussion of the problem before
> presenting the solution to those problems. In the current version it's
> the other way round, you present a solution to a problem that isn't yet
> explained.
>
> It might also be helpful to discuss a bit who is supposed to create
> those pseudo-merge bitmaps. Does Git do so automatically for all tags?
> Does the admin have to configure this? If the latter, when do you want
> to create those and what strategies are there to create them?

The pseudo-merge bitmaps are created by Git itself, configured via the
options described later on in this series. I'm happy to add a specific
call-out, but I would rather do it elsewhere outside of
Documentation/technical/bitmap-format.txt, which I think should be
mostly focused on the on-disk format.

> > +=== File format
> > +
> > +If enabled, pseudo-merge bitmaps are stored in an optional section at
> > +the end of a `.bitmap` file. The format is as follows:
> > +
> > +....
> > ++-------------------------------------------+
> > +|               .bitmap File                |
> > ++-------------------------------------------+
> > +|                                           |
> > +|  Pseudo-merge bitmaps (Variable Length)   |
> > +|  +---------------------------+            |
> > +|  | commits_bitmap (EWAH)     |            |
> > +|  +---------------------------+            |
> > +|  | merge_bitmap (EWAH)       |            |
> > +|  +---------------------------+            |
> > +|                                           |
> > ++-------------------------------------------+
> > +|                                           |
> > +|  Lookup Table                             |
> > +|  +------------+--------------+            |
> > +|  | commit_pos |    offset    |            |
> > +|  +------------+--------------+            |
> > +|  |  4 bytes   |   8 bytes    |            |
> > +|  +------------+--------------+            |
>
> It's a bit confusing that in the EWAH section above you have the type of
> the fields in the same line as the field itself, whereas here you have
> them formatted in a separate box. This makes the reader wonder at first
> whether this is two or four fields. How about the following instead:
>
>     |  Lookup Table                             |
>     |  +---------------------------+            |
>     |  | commit_pos (4 bytes)      |            |
>     |  +---------------------------+            |
>     |  | offset (8 bytes)          |            |
>     |  +---------------------------+            |
>
> The same comment applies to the other sections further down.

Much preferable, thanks. I made the change to use the suggested format
here and elsewhere in the document.

> In case you have multiple pseudo-merge bitmaps, is the whole of the
> above repeated for each bitmap or is it just parts of it?

The "pseudo-merge bitmaps" section contains a variable number of pairs
of EWAH bitmaps, one pair for each pseudo-merge bitmap. I think this is
covered below where it says "one or more pseudo-merge bitmaps, each
containing: [...]", but let me know if I should be more explicit.

> > +* An (optional) extended lookup table (written if and only if there is
> > +  at least one commit which appears in more than one pseudo-merge).
> > +  There are as many entries as commits which appear in multiple
> > +  pseudo-merges. Each entry contains the following:
> > +
> > +  ** `N`, a 4-byte unsigned value equal to the number of pseudo-merges
> > +     which contain a given commit.
>
> How exactly is the given commit identified? Or in other words, given an
> entry in the lookup table here, how do I figure out what commit it
> belongs to?

They aren't identified within this section. The extended lookup table is
indexed into via the lookup table with an offset that is stored in the
`offset` field when the MSB is set.
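
As a rough sketch of that indirection (all names below are made up for
illustration and are not taken from the patch; with the MSB clear, the
offset is assumed to point directly at the commit's only pseudo-merge):

```c
#include <stdint.h>

/*
 * Hypothetical names throughout; only the "MSB of offset selects the
 * extended table" rule comes from the format description.
 */
#define SKETCH_EXT_FLAG ((uint64_t)1 << 63)

struct sketch_lookup_entry {
	uint32_t commit_pos; /* bit position of the commit */
	uint64_t offset;     /* MSB set: offset into the extended table */
};

/* Nonzero if the entry redirects into the extended lookup table. */
int sketch_entry_is_extended(const struct sketch_lookup_entry *e)
{
	return (e->offset & SKETCH_EXT_FLAG) != 0;
}

/* The offset with the flag bit masked off. */
uint64_t sketch_entry_offset(const struct sketch_lookup_entry *e)
{
	return e->offset & ~SKETCH_EXT_FLAG;
}
```

So the commit is always identified by `commit_pos` in the regular lookup
table; the extended entry only carries the count and the list of offsets.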

> > +  ** An array of `N` 8-byte unsigned values, each of which is
> > +     interpreted as an offset (relative to the beginning of the
> > +     `.bitmap` file) at which a pseudo-merge bitmap for this commit can
> > +     be read. These values occur in no particular order.
> > +
> > +* Positions for all pseudo-merges, each stored as an 8-byte unsigned
> > +  value (in network byte-order) containing the offset (relative to the
> > +  beginnign of the `.bitmap` file) of each consecutive pseudo-merge.
>
> s/beginnign/beginning

Great catch, thanks!

Thanks,
Taylor

* Re: [PATCH v2 04/23] pack-bitmap: move some initialization to `bitmap_writer_init()`
  2024-05-06 11:52     ` Patrick Steinhardt
@ 2024-05-06 18:24       ` Taylor Blau
  0 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-06 18:24 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Elijah Newren, Junio C Hamano

On Mon, May 06, 2024 at 01:52:50PM +0200, Patrick Steinhardt wrote:
> On Mon, Apr 29, 2024 at 04:43:08PM -0400, Taylor Blau wrote:
> > The pack-bitmap-writer machinery uses an oidmap (backed by khash.h) to
> > map from commits selected for bitmaps (by OID) to a bitmapped_commit
> > structure (containing the bitmap itself, among other things like its XOR
> > offset, etc.)
> >
> > This map was initialized at the end of `bitmap_writer_build()`. New
> > entries are added in `pack-bitmap-write.c::store_selected()`, which is
> > called by the bitmap_builder machinery (which is responsible for
> > traversing history and generating the actual bitmaps).
> >
> > Reorganize when this field is initialized and when entries are added to
> > it so that we can quickly determine whether a commit is a candidate for
> > pseudo-merge selection, or not (since it was already selected to receive
> > a bitmap, and thus is ineligible for pseudo-merge inclusion).
>
> I feel like this last sentence here could use some more explanation as
> the restriction has never been explained before. Is this a strict
> requirement, or is this rather "It would be wasted anyway"?

Thanks, that's a great call out. I reworded this sentence to clarify
that it's redundant, but not a strict requirement. I think that's a
sufficient amount of detail to motivate the change, but not so much that
it distracts from the change at hand.

> > The changes are as follows:
> >
> >   - Introduce a new `bitmap_writer_init()` function which initializes
> >     the `writer.bitmaps` field (instead of waiting until the end of
> >     `bitmap_writer_build()`).
> >
> >   - Add map entries in `push_bitmapped_commit()` (which is called via
> >     `bitmap_writer_select_commits()`) with OID keys and NULL values to
> >     track whether or not we *expect* to write a bitmap for some given
> >     commit.
> >
> >   - Validate that a NULL entry is found matching the given key when we
> >     store a selected bitmap.
>
> It would be great if this refactoring went way further. Right now it's
> quite hard to verify whether the writer has really been initialized in
> all the right places because it is a global variable. Ideally, the whole
> interface should be refactored to take the writer as input instead,
> where `bitmap_writer_init()` would then initialize the local variables.
>
> That'd of course be a bigger refactoring and may or may not be a good
> fit for this patch series. But I'd very much love to see such a refactor
> as a follow-up series.

Yeah, I definitely agree here ;-). I will plan on doing this as a
follow-up to this series.

> [snip]
> > diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
> > index c35bc81d00f..9bc41a9e145 100644
> > --- a/pack-bitmap-write.c
> > +++ b/pack-bitmap-write.c
> > @@ -46,6 +46,11 @@ struct bitmap_writer {
> >
> >  static struct bitmap_writer writer;
> >
> > +void bitmap_writer_init(struct repository *r)
> > +{
> > +	writer.bitmaps = kh_init_oid_map();
> > +}
>
> Given the other safety belts, do we also want to BUG here in case the
> bitmap has already been initialized?

Great suggestion, thanks.
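
Something along these lines, as a standalone sketch only (the struct and
function names are placeholders, and abort() stands in for Git's BUG()):

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the real writer; `bitmaps` would be the khash-backed map. */
struct sketch_writer {
	void *bitmaps;
};

struct sketch_writer sketch_writer_state;

/* Initialize the map once; a second call is a programming error. */
void sketch_writer_init(void)
{
	if (sketch_writer_state.bitmaps) {
		/* Git's BUG() would be used here; abort() stands in for it. */
		fprintf(stderr, "BUG: bitmap writer already initialized\n");
		abort();
	}
	sketch_writer_state.bitmaps = calloc(1, sizeof(int));
	if (!sketch_writer_state.bitmaps) {
		perror("calloc");
		exit(1);
	}
}
```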

Thanks,
Taylor

* Re: [PATCH v2 06/23] pack-bitmap-write: support storing pseudo-merge commits
  2024-05-06 11:52     ` Patrick Steinhardt
@ 2024-05-06 18:48       ` Taylor Blau
  2024-05-10 11:47         ` Patrick Steinhardt
  0 siblings, 1 reply; 157+ messages in thread
From: Taylor Blau @ 2024-05-06 18:48 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Elijah Newren, Junio C Hamano

On Mon, May 06, 2024 at 01:52:56PM +0200, Patrick Steinhardt wrote:
> On Mon, Apr 29, 2024 at 04:43:15PM -0400, Taylor Blau wrote:
> [snip]
> > @@ -46,6 +48,11 @@ struct bitmap_writer {
> >
> >  static struct bitmap_writer writer;
> >
> > +static inline int bitmap_writer_selected_nr(void)
> > +{
> > +	return writer.selected_nr - writer.pseudo_merges_nr;
> > +}
>
> This function could use a comment to explain what its meaning actually is.
> Like, `bitmap_writer_selected_nr()` is obviously not the same as the
> `selected_nr` of the `bitmap_writer`, which is quite confusing. So why
> do we subtract values and why are there two different `selected_nr`s?

selected_nr is the total number of bitmaps we are writing (including
pseudo-merges), and writer.pseudo_merges_nr is the number of those
bitmaps which are pseudo-merges.

I renamed this function to bitmap_writer_nr_selected_commits() which
should clarify things, let me know if that works!
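
To spell out the relationship between the two counters, here is a
self-contained sketch (the struct is a hypothetical stand-in that holds
only the two fields in question, not the real `struct bitmap_writer`):

```c
/* Hypothetical stand-in holding just the two counters discussed above. */
struct sketch_counts {
	unsigned int selected_nr;      /* all bitmaps written, pseudo-merges included */
	unsigned int pseudo_merges_nr; /* how many of those are pseudo-merges */
};

/* Number of ordinary (non-pseudo-merge) commits selected for bitmaps. */
unsigned int sketch_nr_selected_commits(const struct sketch_counts *w)
{
	return w->selected_nr - w->pseudo_merges_nr;
}
```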

> [snip]
> > diff --git a/pack-bitmap.h b/pack-bitmap.h
> > index dae2d68a338..ca9acd2f735 100644
> > --- a/pack-bitmap.h
> > +++ b/pack-bitmap.h
> > @@ -21,6 +21,7 @@ struct bitmap_disk_header {
> >  	unsigned char checksum[GIT_MAX_RAWSZ];
> >  };
> >
> > +#define BITMAP_PSEUDO_MERGE (1u<<21)
> >  #define NEEDS_BITMAP (1u<<22)
>
> This flag is already used by "builtin/pack-objects.c", which may be fine.
> But in any case, shouldn't we update "object.h" with both of these flags?

I can't see where in builtin/pack-objects.c this flag is used. The table
in object.h says that bit 21 is used in:

  - list-objects-filter.c
  - builtin/index-pack.c
  - builtin/unpack-objects.c

But I think those are all fine. We don't call unpack-objects from the
bitmap writing paths, and the same is true of index-pack (since we're
writing the pack out directly).

list-objects-filter.c should also be OK, since I am 99% sure that these
two code paths do not collide, but even if they do, that field is only
set on tree objects from the list-objects-filter.c code path, and the
new bits in pack-bitmap.h are only set on commit objects.

Regardless, let me not forget to update the table in object.h! Thanks
for reminding me.

Thanks,
Taylor

* Re: [PATCH v2 09/23] pseudo-merge: implement support for selecting pseudo-merge commits
  2024-05-06 11:53     ` Patrick Steinhardt
@ 2024-05-06 19:58       ` Taylor Blau
  0 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-06 19:58 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Elijah Newren, Junio C Hamano

On Mon, May 06, 2024 at 01:53:01PM +0200, Patrick Steinhardt wrote:
> > +	for (n = 0; n < group->max_merges; n++)
> > +		C += 1.0f / gitexp(n + 1, group->decay);
> > +	C = matches->unstable_nr / C;
> > +
> > +	return (int)((C / gitexp(i + 1, group->decay)) + 0.5);
>
> Why do we cast the return to `int` when the function returns a
> `uint32_t`?

Oops, great catch. This should cast to a uint32_t, not a signed type.
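
For reference, a standalone sketch of the same power-law sizing with an
unsigned return value. This is a simplification, not the patch itself:
the helper names are invented, and the decay is restricted to a whole
number so that the example does not need pow() from libm.

```c
#include <stdint.h>

/* base^k for a whole-number k; stands in for gitexp() without needing libm. */
double sketch_ipow(double base, unsigned int k)
{
	double r = 1.0;
	while (k--)
		r *= base;
	return r;
}

/*
 * Size of the i-th pseudo-merge (0-based) when `unstable_nr` commits are
 * spread over `max_merges` groups following f(n) = C * n^-k.
 */
uint32_t sketch_pseudo_merge_size(uint32_t unstable_nr, uint32_t max_merges,
				  unsigned int decay, uint32_t i)
{
	double C = 0.0;
	uint32_t n;

	if (!max_merges)
		return 0;

	/* Normalize so the sizes sum to roughly unstable_nr. */
	for (n = 0; n < max_merges; n++)
		C += 1.0 / sketch_ipow(n + 1, decay);
	C = unstable_nr / C;

	/* Round to nearest and return an unsigned value, not an int. */
	return (uint32_t)(C / sketch_ipow(i + 1, decay) + 0.5);
}
```

With 100 unstable commits, three merges, and a decay of 1, the sizes come
out as 55, 27, and 18.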

> > +}
> > +
> > +static void init_pseudo_merge_group(struct pseudo_merge_group *group)
>
> Nit: Shouldn't the name rather be `pseudo_merge_group_init()`?

Sure, I can change that.

> [snip]
> > +	} else if (!strcmp(key, "decay")) {
> > +		group->decay = git_config_int(var, value, ctx->kvi);
> > +		if (group->decay < 0) {
> > +			warning(_("%s must be non-negative, using default"), var);
> > +			group->decay = DEFAULT_PSEUDO_MERGE_DECAY;
> > +		}
>
> The decay is a float, and your decay rate examples mention a rate of
> 1.5f. It's impossible to specify fractional rates though because we use
> `git_config_int()`. Should we introduce a new `git_config_float()`
> function to implement this properly?

Good idea. I had addressed this with the sample rate by making the
configured value a percentage (0-100) that was scaled down by 100, but
for some reason I neglected to do the same here. I'll introduce a new
float parser.
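
Roughly what I have in mind, as a sketch built on strtod(). The helper
name is hypothetical and this is not an existing Git function; the real
parser would hook into the config machinery rather than return -1:

```c
#include <errno.h>
#include <float.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not an existing Git function: parse `value` as a
 * non-negative, finite floating-point number. Returns 0 on success and
 * -1 on any malformed or out-of-range input.
 */
int sketch_parse_decay(const char *value, double *out)
{
	char *end;
	double d;

	if (!value || !*value)
		return -1;

	errno = 0;
	d = strtod(value, &end);
	if (errno || end == value || *end)
		return -1; /* range error, no digits, or trailing garbage */
	if (d != d || d < 0 || d > DBL_MAX)
		return -1; /* NaN, negative, or infinite */

	*out = d;
	return 0;
}
```

That would let a value like "1.5" through while still rejecting negative
or malformed input.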

> > +	} else if (!strcmp(key, "samplerate")) {
> > +		group->sample_rate = git_config_int(var, value, ctx->kvi);
> > +		if (!(0 <= group->sample_rate && group->sample_rate <= 100)) {
> > +			warning(_("%s must be between 0 and 100, using default"), var);
> > +			group->sample_rate = DEFAULT_PSEUDO_MERGE_SAMPLE_RATE;
> > +		}
> > +	} else if (!strcmp(key, "threshold")) {
> > +		if (git_config_expiry_date(&group->threshold, var, value)) {
> > +			strbuf_release(&buf);
>
> Instead of having multiple exit paths where we need to release `buf` we
> should likely have a common exit path.

Good call, thanks!
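
The shape would be something like the following standalone sketch, where
a malloc'd buffer plays the role of the strbuf and every path leaves
through one label. The function name and the default value are
placeholders for illustration:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_DEFAULT_SAMPLE_RATE 100 /* assumed default, for illustration */

int sketch_parse_sample_rate(const char *key, const char *value, int *rate)
{
	size_t i, len = strlen(key);
	char *buf = malloc(len + 1); /* plays the role of the strbuf */
	int ret = 0;

	if (!buf)
		return -1;
	for (i = 0; i < len; i++)
		buf[i] = (char)tolower((unsigned char)key[i]);
	buf[len] = '\0';

	if (!strcmp(buf, "samplerate")) {
		char *end;
		long v = strtol(value, &end, 10);

		if (end == value || *end) {
			ret = -1; /* malformed number */
			goto done;
		}
		if (v < 0 || v > 100)
			v = SKETCH_DEFAULT_SAMPLE_RATE; /* out of range: fall back */
		*rate = (int)v;
	}
	/* Unknown keys are ignored and leave *rate untouched. */

done:
	free(buf); /* the only place the buffer is released */
	return ret;
}
```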

> [snip]
> > +static struct commit *push_pseudo_merge(struct pseudo_merge_group *group)
> > +{
> > +	struct commit *merge;
> > +
> > +	ALLOC_GROW(group->merges, group->merges_nr + 1, group->merges_alloc);
> > +
> > +	merge = alloc_commit_node(the_repository);
> > +	merge->object.parsed = 1;
>
> Why can we mark the object as parsed here?

We have to mark it as parsed since there is no object buffer underlying
this fake commit node. If we try and parse it later on it will fail
since we won't be able to find a corresponding buffer.

> > +	merge->object.flags |= BITMAP_PSEUDO_MERGE;
> > +
> > +	group->merges[group->merges_nr++] = merge;
> > +
> > +	return merge;
> > +}
> > +
> > +static struct pseudo_merge_commit_idx *pseudo_merge_idx(kh_oid_map_t *pseudo_merge_commits,
> > +							const struct object_id *oid)
> > +
> > +{
> > +	struct pseudo_merge_commit_idx *pmc;
> > +	khiter_t hash_pos;
> > +
> > +	hash_pos = kh_get_oid_map(pseudo_merge_commits, *oid);
> > +	if (hash_pos == kh_end(pseudo_merge_commits)) {
> > +		int hash_ret;
> > +		hash_pos = kh_put_oid_map(pseudo_merge_commits, *oid, &hash_ret);
> > +		CALLOC_ARRAY(pmc, 1);
> > +
> > +		kh_value(pseudo_merge_commits, hash_pos) = pmc;
> > +	} else {
> > +		pmc = kh_value(pseudo_merge_commits, hash_pos);
> > +	}
> > +
> > +	return pmc;
> > +}
>
> Can't we simplify this to the following (untested):
>
> static struct pseudo_merge_commit_idx *pseudo_merge_idx(kh_oid_map_t *pseudo_merge_commits,
>                                                        const struct object_id *oid)
> {
>        struct pseudo_merge_commit_idx *pmc;
>        khiter_t hash_pos;
>        int hash_ret;
>
>        hash_pos = kh_put_oid_map(pseudo_merge_commits, *oid, &hash_ret);
>        if (hash_ret) {
>                CALLOC_ARRAY(pmc, 1);
>                kh_value(pseudo_merge_commits, hash_pos) = pmc;
>
>        } else {
>                pmc = kh_value(pseudo_merge_commits, hash_pos);
>        }
>
>        return pmc;
> }

Nice suggestion, I think that should work great.

> > +
> > +#define MIN_PSEUDO_MERGE_SIZE 8
> > +
> > +static void select_pseudo_merges_1(struct pseudo_merge_group *group,
> > +				   struct pseudo_merge_matches *matches,
> > +				   kh_oid_map_t *pseudo_merge_commits,
> > +				   uint32_t *pseudo_merges_nr)
> > +{
> > +	uint32_t i, j;
> > +	uint32_t stable_merges_nr;
> > +
> > +	if (!matches->stable_nr && !matches->unstable_nr)
> > +		return; /* all tips in this group already have bitmaps */
>
> It's nice that there are some comments, but there are quite a lot of
> non-obvious things going on in this function that would warrant an
> explanation that expands a bit more into what exactly it is that we are
> doing here.
>
> I may only be speaking for myself, but I basically have no clue what we
> do here :) Something something pseudo merges, I guess. But there is no
> in-code explanation at all what a "stable" or "unstable" commit is, how
> exactly we match commits and other higher-level ideas.

Very fair. I added some comments throughout to try and make this
function's purpose more clear. Thanks for all of the great review so
far!

Thanks,
Taylor

* Re: [PATCH v2 10/23] pack-bitmap-write.c: select pseudo-merge commits
  2024-05-06 11:53     ` Patrick Steinhardt
@ 2024-05-06 20:05       ` Taylor Blau
  2024-05-10 11:47         ` Patrick Steinhardt
  0 siblings, 1 reply; 157+ messages in thread
From: Taylor Blau @ 2024-05-06 20:05 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Elijah Newren, Junio C Hamano

On Mon, May 06, 2024 at 01:53:07PM +0200, Patrick Steinhardt wrote:
> I think the same is true for most of the other parts of the docs here,
> as well. We explain what those configs do, but basically leave the
> reader on their own to figure out what the real-world consequences are
> and why they would even want to configure those in the first place.

I think that's a fair point, and I'm in the same boat as you are to a
large extent. I have an idea of what these settings do and why you
might want to set them, but since there is no real-world deployment of
this series, I don't have any solid guidelines on when you
should/shouldn't set these settings.

My hope is that there are enough knobs to tweak here that anyone
deploying this feature could find a configuration that works for them.

But I'll try and add some general guidance on why you would want to
change certain settings, etc.

Thanks,
Taylor

* Re: [PATCH v2 01/23] Documentation/technical: describe pseudo-merge bitmaps format
  2024-05-06 16:37       ` Taylor Blau
@ 2024-05-10 11:46         ` Patrick Steinhardt
  2024-05-13 19:47           ` Taylor Blau
  0 siblings, 1 reply; 157+ messages in thread
From: Patrick Steinhardt @ 2024-05-10 11:46 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Elijah Newren, Junio C Hamano

On Mon, May 06, 2024 at 12:37:35PM -0400, Taylor Blau wrote:
> On Mon, May 06, 2024 at 01:52:44PM +0200, Patrick Steinhardt wrote:
> > > diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
> > > index f5d200939b0..63a7177ac08 100644
> > > --- a/Documentation/technical/bitmap-format.txt
> > > +++ b/Documentation/technical/bitmap-format.txt
> > > @@ -255,3 +255,182 @@ triplet is -
> > >  	xor_row (4 byte integer, network byte order): ::
> > >  	The position of the triplet whose bitmap is used to compress
> > >  	this one, or `0xffffffff` if no such bitmap exists.
> > > +
> > > +Pseudo-merge bitmaps
> > > +--------------------
> > > +
> > > +If the `BITMAP_OPT_PSEUDO_MERGES` flag is set, a variable number of
> > > +bytes (preceding the name-hash cache, commit lookup table, and trailing
> > > +checksum) of the `.bitmap` file is used to store pseudo-merge bitmaps.
> >
> > Here you say that the section is supposed to come before some other
> > sections, whereas the first sentence in the "File format" section says
> > that it is the last section in a bitmap file.
> 
> This is a quirk of the on-disk .bitmap format. New sections are added
> before existing sections, so if you were reading the file from beginning
> to end, you'd see the pseudo-merges extension, then the lookup table,
> then the name-hash cache (assuming all were written).
> 
> I think that describing them in the order they were introduced here
> makes more sense, leaving their layout within the .bitmap file as an
> implementation detail.
> 
> If you feel strongly otherwise, let's clean it up outside of this series
> since this whole portion of the documentation would need to be
> reordered.

I don't, thanks for the explanation.

[snip]
> +=== Overview
> +
>  A "pseudo-merge bitmap" is used to refer to a pair of bitmaps, as
>  follows:
> --- >8 ---
> 
> > > +For example, suppose there exists a pseudo-merge bitmap with a large
> > > +number of commits, all of which are listed in the `WANTS` section of
> > > +some bitmap traversal query. When pseudo-merge bitmaps are enabled, the
> > > +bitmap machinery can quickly determine there is a pseudo-merge which
> > > +satisfies some subset of the wanted objects on either side of the query.
> > > +Then, we can inflate the EWAH-compressed bitmap, and `OR` it in to the
> > > +resulting bitmap. By contrast, without pseudo-merge bitmaps, we would
> > > +have to repeat the decompression and `OR`-ing step over a potentially
> > > +large number of individual bitmaps, which can take proportionally more
> > > +time.
> > > +
> > > +Another benefit of pseudo-merges arises when there is some combination
> > > +of (a) a large number of references, with (b) poor bitmap coverage, and
> > > +(c) deep, nested trees, making fill-in traversal relatively expensive.
> > > +For example, suppose that there are a large enough number of tags where
> > > +bitmapping each of the tags individually is infeasible. Without
> > > +pseudo-merge bitmaps, computing the result of, say, `git rev-list
> > > +--use-bitmap-index --count --objects --tags` would likely require a
> > > +large amount of fill-in traversal. But when a large quantity of those
> > > +tags are stored together in a pseudo-merge bitmap, the bitmap machinery
> > > +can take advantage of the fact that we only care about the union of
> > > +objects reachable from all of those tags, and answer the query much
> > > +faster.
> >
> > I would start the explanation with a discussion of the problem before
> > presenting the solution to those problems. In the current version it's
> > the other way round, you present a solution to a problem that isn't yet
> > explained.
> >
> > It might also be helpful to discuss a bit who is supposed to create
> > those pseudo-merge bitmaps. Does Git do so automatically for all tags?
> > Does the admin have to configure this? If the latter, when do you want
> > to create those and what strategies are there to create them?
> 
> The pseudo-merge bitmaps are created by Git itself, configured via the
> options described later on in this series. I'm happy to add a specific
> call-out, but I would rather do it elsewhere outside of
> Documentation/technical/bitmap-format.txt, which I think should be
> mostly focused on the on-disk format.

I think what throws me off here is that you already go into the
non-technical somewhat by explaining their use cases. This causes us to
end up halfway between "We motivate the changes" and "We document the
technical parts, only".

[snip]
> > In case you have multiple pseudo-merge bitmaps, is the whole of the
> > above repeated for each bitmap or is it just parts of it?
> 
> The "pseudo-merge bitmaps" section contains a variable number of pairs
> of EWAH bitmaps, one pair for each pseudo-merge bitmap. I think this is
> covered below where it says "one or more pseudo-merge bitmaps, each
> containing: [...]", but let me know if I should be more explicit.
> 
> > > +* An (optional) extended lookup table (written if and only if there is
> > > +  at least one commit which appears in more than one pseudo-merge).
> > > +  There are as many entries as commits which appear in multiple
> > > +  pseudo-merges. Each entry contains the following:
> > > +
> > > +  ** `N`, a 4-byte unsigned value equal to the number of pseudo-merges
> > > +     which contain a given commit.
> >
> > How exactly is the given commit identified? Or in other words, given an
> > entry in the lookup table here, how do I figure out what commit it
> > belongs to?
> 
> They aren't identified within this section. The extended lookup table is
> indexed into via the lookup table with an offset that is stored in the
> `offset` field when the MSB is set.

Okay. Would this explanation be a good addition to the document?

Patrick

* Re: [PATCH v2 06/23] pack-bitmap-write: support storing pseudo-merge commits
  2024-05-06 18:48       ` Taylor Blau
@ 2024-05-10 11:47         ` Patrick Steinhardt
  0 siblings, 0 replies; 157+ messages in thread
From: Patrick Steinhardt @ 2024-05-10 11:47 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Elijah Newren, Junio C Hamano

On Mon, May 06, 2024 at 02:48:10PM -0400, Taylor Blau wrote:
> On Mon, May 06, 2024 at 01:52:56PM +0200, Patrick Steinhardt wrote:
> > On Mon, Apr 29, 2024 at 04:43:15PM -0400, Taylor Blau wrote:
> > [snip]
> > > @@ -46,6 +48,11 @@ struct bitmap_writer {
> > >
> > >  static struct bitmap_writer writer;
> > >
> > > +static inline int bitmap_writer_selected_nr(void)
> > > +{
> > > +	return writer.selected_nr - writer.pseudo_merges_nr;
> > > +}
> >
> > > This function could use a comment to explain what its meaning actually is.
> > Like, `bitmap_writer_selected_nr()` is obviously not the same as the
> > `selected_nr` of the `bitmap_writer`, which is quite confusing. So why
> > do we subtract values and why are there two different `selected_nr`s?
> 
> selected_nr is the total number of bitmaps we are writing (including
> pseudo-merges), and writer.pseudo_merges_nr is the number of those
> bitmaps which are pseudo-merges.
> 
> I renamed this function to bitmap_writer_nr_selected_commits() which
> should clarify things, let me know if that works!

Yup, that's clearer, thanks.

> > [snip]
> > > diff --git a/pack-bitmap.h b/pack-bitmap.h
> > > index dae2d68a338..ca9acd2f735 100644
> > > --- a/pack-bitmap.h
> > > +++ b/pack-bitmap.h
> > > @@ -21,6 +21,7 @@ struct bitmap_disk_header {
> > >  	unsigned char checksum[GIT_MAX_RAWSZ];
> > >  };
> > >
> > > +#define BITMAP_PSEUDO_MERGE (1u<<21)
> > >  #define NEEDS_BITMAP (1u<<22)
> >
> > This flag is already used by "builtin/pack-objects.c", which may be fine.
> > But in any case, shouldn't we update "object.h" with both of these flags?
> 
> I can't see where in builtin/pack-objects.c this flag is used. The table
> in object.h says that bit 21 is used in:

Yeah, no idea. I must've been seeing ghosts here.

>   - list-objects-filter.c
>   - builtin/index-pack.c
>   - builtin/unpack-objects.c
> 
> But I think those are all fine. We don't call unpack-objects from the
> bitmap writing paths, and the same is true of index-pack (since we're
> writing the pack out directly).
> 
> list-objects-filter.c should also be OK, since I am 99% sure that these
> two code paths do not collide, but even if they do, that field is only
> set on tree objects from the list-objects-filter.c code path, and the
> new bits in pack-bitmap.h are only set on commit objects.

Okay, thanks for the explanation!

Patrick

* Re: [PATCH v2 10/23] pack-bitmap-write.c: select pseudo-merge commits
  2024-05-06 20:05       ` Taylor Blau
@ 2024-05-10 11:47         ` Patrick Steinhardt
  0 siblings, 0 replies; 157+ messages in thread
From: Patrick Steinhardt @ 2024-05-10 11:47 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Elijah Newren, Junio C Hamano

On Mon, May 06, 2024 at 04:05:38PM -0400, Taylor Blau wrote:
> On Mon, May 06, 2024 at 01:53:07PM +0200, Patrick Steinhardt wrote:
> > I think the same is true for most of the other parts of the docs here,
> > as well. We explain what those configs do, but basically leave the
> > reader on their own to figure out what the real-world consequences are
> > and why they would even want to configure those in the first place.
> 
> I think that's a fair point, and I'm in the same boat as you are to a
> large extent. I have an idea of what these settings do and why you
> might want to set them, but since there is no real-world deployment of
> this series, I don't have any solid guidelines on when you
> should/shouldn't set these settings.
> 
> My hope is that there are enough knobs to tweak here that anyone
> deploying this feature could find a configuration that works for them.
> 
> But I'll try and add some general guidance on why you would want to
> change certain settings, etc.

Fair enough, I can certainly relate to that. It's also the same for me
in my patch series that introduces config knobs for the reftable writer
[1]. I have of course benchmarked things, but benchmarking only goes so
far and thus isn't quite reflective of the real world.

Patrick

[1]: https://lore.kernel.org/git/cover.1714630191.git.ps@pks.im/

* Re: [PATCH v2 06/23] pack-bitmap-write: support storing pseudo-merge commits
  2024-04-29 20:43   ` [PATCH v2 06/23] pack-bitmap-write: support storing pseudo-merge commits Taylor Blau
  2024-05-06 11:52     ` Patrick Steinhardt
@ 2024-05-13 18:42     ` Jeff King
  2024-05-13 20:19       ` Taylor Blau
  1 sibling, 1 reply; 157+ messages in thread
From: Jeff King @ 2024-05-13 18:42 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Mon, Apr 29, 2024 at 04:43:15PM -0400, Taylor Blau wrote:

> diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
> index 9bc41a9e145..fef02cd745a 100644
> --- a/pack-bitmap-write.c
> +++ b/pack-bitmap-write.c
> @@ -24,7 +24,7 @@ struct bitmapped_commit {
>  	struct ewah_bitmap *write_as;
>  	int flags;
>  	int xor_offset;
> -	uint32_t commit_pos;
> +	unsigned pseudo_merge : 1;
>  };

The addition of the bit flag here makes sense, but dropping commit_pos
caught me by surprise. But...it looks like that field is simply unused
cruft even before this patch?

It might be worth noting that in the commit message, or better still,
pulling its removal out to a preparatory patch.

> +static inline int bitmap_writer_selected_nr(void)
> +{
> +	return writer.selected_nr - writer.pseudo_merges_nr;
> +}

OK, so now most spots should use this new function instead of looking at
writer.selected_nr directly. But if anybody accidentally uses the old
field directly, it is presumably disastrous. Is it worth renaming it to
make sure we caught all references?

The downside would be that spots which _do_ want the complete
selected_nr would need to be updated to use the new name. It doesn't
look like there are that many, though. OTOH, that means that it's also
easy to inspect them and see that you covered all of the relevant cases
(as far as I can see). I guess the biggest value in changing the field
name would be catching any topics in flight (or long-running forks).

-Peff

* Re: [PATCH v2 08/23] pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public
  2024-04-29 20:43   ` [PATCH v2 08/23] pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public Taylor Blau
@ 2024-05-13 18:50     ` Jeff King
  2024-05-14  0:54       ` Taylor Blau
  0 siblings, 1 reply; 157+ messages in thread
From: Jeff King @ 2024-05-13 18:50 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Mon, Apr 29, 2024 at 04:43:22PM -0400, Taylor Blau wrote:

> The pseudo-merge selection code will be added in a subsequent commit,
> and will need a way to push the allocated commit structures into the
> bitmap writer from a separate compilation unit.
> 
> Make the `bitmap_writer_push_bitmapped_commit()` function part of the
> pack-bitmap.h header in order to make this possible.

I was a little surprised that this function and the one in the previous
commit needed to be public, since this whole topic is restricted to
writing, which is mostly contained to pack-bitmap-write.c. But you've
pulled the pseudo-merge bits out to pseudo-merge.[ch], and they need
access, which makes sense.

One could argue that it could all get stuffed into pack-bitmap-write.c,
but that is already getting to be a pretty large and complex file. So
this is probably the best route.

-Peff

* Re: [PATCH v2 09/23] pseudo-merge: implement support for selecting pseudo-merge commits
  2024-04-29 20:43   ` [PATCH v2 09/23] pseudo-merge: implement support for selecting pseudo-merge commits Taylor Blau
  2024-05-06 11:53     ` Patrick Steinhardt
@ 2024-05-13 19:03     ` Jeff King
  2024-05-14  0:58       ` Taylor Blau
  1 sibling, 1 reply; 157+ messages in thread
From: Jeff King @ 2024-05-13 19:03 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Mon, Apr 29, 2024 at 04:43:37PM -0400, Taylor Blau wrote:

> Teach the new pseudo-merge machinery how to select non-bitmapped commits
> for inclusion in different pseudo-merge group(s) based on a handful of
> criteria.
> 
> Pseudo-merges are derived first from named pseudo-merge groups (see the
> `bitmapPseudoMerge.<name>.*` configuration options). They are
> (optionally) further segmented within an individual pseudo-merge group
> based on any capture group(s) within the pseudo-merge group's pattern.
> 
> For example, a configuration like so:
> 
>     [bitmapPseudoMerge "all"]
>         pattern = "refs/"
>         threshold = now
>         stableThreshold = never
>         sampleRate = 100
>         maxMerges = 64
> 
> would group all non-bitmapped commits into up to 64 individual
> pseudo-merge commits.

I was going to complain that explanatory text like this should probably
go into the documentation, not a commit message. But I see you do later
add documentation. Which seems to happen when this code is actually
wired up to the bitmap-writer. Maybe a moot point now that I figured it
out, but I think we'd be better off with the two commits squashed
together.

And consider whether this commit message can be shortened a lot to just
refer to the embedded docs (and especially if there is any useful info
here that is not covered in the docs, and should be moved there). I do
think some of these explanatory examples are good for users, but we
don't necessarily have a good spot to put them. The git-config
documentation is more of a reference, and huge example sections would
probably bog it down. Maybe in the EXAMPLES section of pack-objects?
It's already a pretty big manpage, though, and this is just one tiny
corner.

So I dunno. I just want to make sure we don't bury useful information in
a commit message that most people won't see (something I've definitely
been guilty of in the past, and which has later caused problems).


I've got to break here in reviewing for the moment, but I think in a lot
of ways this commit is going to be the most interesting one, because the
usefulness of the whole pseudo-merge scheme depends on picking good sets
of commits (ones that cover a lot of ground but have a low chance of
being invalidated). So I'll pick up again with a careful look at this
one.

-Peff

* Re: [PATCH v2 01/23] Documentation/technical: describe pseudo-merge bitmaps format
  2024-05-10 11:46         ` Patrick Steinhardt
@ 2024-05-13 19:47           ` Taylor Blau
  2024-05-14  6:33             ` Patrick Steinhardt
  0 siblings, 1 reply; 157+ messages in thread
From: Taylor Blau @ 2024-05-13 19:47 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Elijah Newren, Junio C Hamano

On Fri, May 10, 2024 at 01:46:59PM +0200, Patrick Steinhardt wrote:
> > > > +* An (optional) extended lookup table (written if and only if there is
> > > > +  at least one commit which appears in more than one pseudo-merge).
> > > > +  There are as many entries as commits which appear in multiple
> > > > +  pseudo-merges. Each entry contains the following:
> > > > +
> > > > +  ** `N`, a 4-byte unsigned value equal to the number of pseudo-merges
> > > > +     which contain a given commit.
> > >
> > > How exactly is the given commit identified? Or in other words, given an
> > > entry in the lookup table here, how do I figure out what commit it
> > > belongs to?
> >
> > They aren't identified within this section. The extended lookup table is
> > indexed into via the lookup table with an offset that is stored in the
> > `offset` field when the MSB is set.
>
> Okay. Would this explanation be a good addition to the document?

I think we already have this written down in the section above. See in
the previous bullet point the section reading "containing either one of
two possible offsets, depending on whether or not the most-significant
bit is set: [...]"

Does that work?
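
For readers following along, the MSB convention can be sketched roughly
as below. This is only an illustration: it assumes the `offset` field is
an 8-byte unsigned value, and the macro and function names are
hypothetical, not taken from the series.

```c
#include <stdint.h>

/* Hypothetical name for the most-significant bit of the offset field. */
#define EXT_LOOKUP_BIT ((uint64_t)1 << 63)

/* Non-zero if the entry points into the extended lookup table. */
static int offset_is_extended(uint64_t offset)
{
	return (offset & EXT_LOOKUP_BIT) != 0;
}

/* The offset itself, with the marker bit masked off. */
static uint64_t offset_value(uint64_t offset)
{
	return offset & ~EXT_LOOKUP_BIT;
}
```

So a reader never has to identify the commit from the extended table
entry: it starts from the commit's lookup table entry and follows the
offset.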

Thanks,
Taylor

* Re: [PATCH v2 06/23] pack-bitmap-write: support storing pseudo-merge commits
  2024-05-13 18:42     ` Jeff King
@ 2024-05-13 20:19       ` Taylor Blau
  0 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-13 20:19 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano

On Mon, May 13, 2024 at 02:42:46PM -0400, Jeff King wrote:
> On Mon, Apr 29, 2024 at 04:43:15PM -0400, Taylor Blau wrote:
>
> > diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
> > index 9bc41a9e145..fef02cd745a 100644
> > --- a/pack-bitmap-write.c
> > +++ b/pack-bitmap-write.c
> > @@ -24,7 +24,7 @@ struct bitmapped_commit {
> >  	struct ewah_bitmap *write_as;
> >  	int flags;
> >  	int xor_offset;
> > -	uint32_t commit_pos;
> > +	unsigned pseudo_merge : 1;
> >  };
>
> The addition of the bit flag here makes sense, but dropping commit_pos
> caught me by surprise. But...it looks like that field is simply unused
> cruft even before this patch?
>
> It might be worth noting that in the commit message, or better still,
> pulling its removal out to a preparatory patch.

Hah, so this is a funny one :-).

I was following your suggestion to pull out the deletion into its own
patch[^1] and starting to dig out back-references to indicate why it was
safe to remove this field.

But the only reference to commit_pos is from 7cc8f971085 (pack-objects:
implement bitmap writing, 2013-12-21), which is the commit that added
this field in the first place. Looking at:

    $ git log -p -S commit_pos 7cc8f971085 -- pack-bitmap-write.c

doesn't really show us anything interesting, either.

But! There is an array called commit_positions, which I suspected was
for holding the values of commit_pos in the same order as they appear in
the writer.selected array.

So I think the right patch is something like this (which I'll put in the
next round of this series):

--- 8< ---
Subject: [PATCH] pack-bitmap-write.c: move commit_positions into commit_pos
 fields

In 7cc8f971085 (pack-objects: implement bitmap writing, 2013-12-21), the
bitmapped_commit struct was introduced, including the 'commit_pos'
field, which has been unused ever since its introduction more than a
decade ago.

Instead, we have used the nearby `commit_positions` array leaving the
bitmapped_commit struct with an unused 4-byte field.

We could drop the `commit_pos` field as unused, and continue to store
the values in the auxiliary array. But we could also drop the array and
store the data for each bitmapped_commit struct inside of the structure
itself, which is what this patch does.

In any spot that we previously read `commit_positions[i]`, we can now
instead read `writer.selected[i].commit_pos`. There are a few spots that
need changing as a result:

  - write_selected_commits_v1() is a simple transformation, since we're
    just reading the field. As a result, the function no longer needs an
    explicit argument to pass the commit_positions array.

  - write_lookup_table() also no longer needs the explicit
    commit_positions array passed in as an argument. But it still needs
    to sort an array of indices into the writer.selected array to read
    them in commit_pos order, so table_cmp() is adjusted accordingly.

  - bitmap_writer_finish() no longer needs to allocate, populate, and
    free the commit_positions table. Instead, we can just write the data
    directly into each struct bitmapped_commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c | 42 ++++++++++++++++--------------------------
 1 file changed, 16 insertions(+), 26 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 473a0fa0d40..26f57e48804 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -679,9 +679,7 @@ static const struct object_id *oid_access(size_t pos, const void *table)
 	return &index[pos]->oid;
 }

-static void write_selected_commits_v1(struct hashfile *f,
-				      uint32_t *commit_positions,
-				      off_t *offsets)
+static void write_selected_commits_v1(struct hashfile *f, off_t *offsets)
 {
 	int i;

@@ -691,7 +689,7 @@ static void write_selected_commits_v1(struct hashfile *f,
 		if (offsets)
 			offsets[i] = hashfile_total(f);

-		hashwrite_be32(f, commit_positions[i]);
+		hashwrite_be32(f, stored->commit_pos);
 		hashwrite_u8(f, stored->xor_offset);
 		hashwrite_u8(f, stored->flags);

@@ -699,23 +697,20 @@ static void write_selected_commits_v1(struct hashfile *f,
 	}
 }

-static int table_cmp(const void *_va, const void *_vb, void *_data)
+static int table_cmp(const void *_va, const void *_vb)
 {
-	uint32_t *commit_positions = _data;
-	uint32_t a = commit_positions[*(uint32_t *)_va];
-	uint32_t b = commit_positions[*(uint32_t *)_vb];
+	struct bitmapped_commit *a = &writer.selected[*(uint32_t *)_va];
+	struct bitmapped_commit *b = &writer.selected[*(uint32_t *)_vb];

-	if (a > b)
+	if (a->commit_pos < b->commit_pos)
+		return -1;
+	else if (a->commit_pos > b->commit_pos)
 		return 1;
-	else if (a < b)
-		return -1;

 	return 0;
 }

-static void write_lookup_table(struct hashfile *f,
-			       uint32_t *commit_positions,
-			       off_t *offsets)
+static void write_lookup_table(struct hashfile *f, off_t *offsets)
 {
 	uint32_t i;
 	uint32_t *table, *table_inv;
@@ -731,7 +726,7 @@ static void write_lookup_table(struct hashfile *f,
 	 * bitmap corresponds to j'th bitmapped commit (among the selected
 	 * commits) in lex order of OIDs.
 	 */
-	QSORT_S(table, writer.selected_nr, table_cmp, commit_positions);
+	QSORT(table, writer.selected_nr, table_cmp);

 	/* table_inv helps us discover that relationship (i'th bitmap
 	 * to j'th commit by j = table_inv[i])
@@ -762,7 +757,7 @@ static void write_lookup_table(struct hashfile *f,
 			xor_row = 0xffffffff;
 		}

-		hashwrite_be32(f, commit_positions[table[i]]);
+		hashwrite_be32(f, writer.selected[table[i]].commit_pos);
 		hashwrite_be64(f, (uint64_t)offsets[table[i]]);
 		hashwrite_be32(f, xor_row);
 	}
@@ -798,7 +793,6 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 	static uint16_t flags = BITMAP_OPT_FULL_DAG;
 	struct strbuf tmp_file = STRBUF_INIT;
 	struct hashfile *f;
-	uint32_t *commit_positions = NULL;
 	off_t *offsets = NULL;
 	uint32_t i;

@@ -823,22 +817,19 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 	if (options & BITMAP_OPT_LOOKUP_TABLE)
 		CALLOC_ARRAY(offsets, index_nr);

-	ALLOC_ARRAY(commit_positions, writer.selected_nr);
-
 	for (i = 0; i < writer.selected_nr; i++) {
 		struct bitmapped_commit *stored = &writer.selected[i];
-		int commit_pos = oid_pos(&stored->commit->object.oid, index, index_nr, oid_access);
+		stored->commit_pos = oid_pos(&stored->commit->object.oid, index,
+					     index_nr, oid_access);

-		if (commit_pos < 0)
+		if (stored->commit_pos < 0)
 			BUG(_("trying to write commit not in index"));
-
-		commit_positions[i] = commit_pos;
 	}

-	write_selected_commits_v1(f, commit_positions, offsets);
+	write_selected_commits_v1(f, offsets);

 	if (options & BITMAP_OPT_LOOKUP_TABLE)
-		write_lookup_table(f, commit_positions, offsets);
+		write_lookup_table(f, offsets);

 	if (options & BITMAP_OPT_HASH_CACHE)
 		write_hash_cache(f, index, index_nr);
@@ -853,6 +844,5 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 		die_errno("unable to rename temporary bitmap file to '%s'", filename);

 	strbuf_release(&tmp_file);
-	free(commit_positions);
 	free(offsets);
 }

--
2.45.0.57.gee4186f79f3

--- >8 ---
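
The index sort that table_cmp() performs in the patch above can be tried
in isolation with something like the following. It is a sketch under
assumptions: `struct commit_sketch`, the `selected` array contents, and
`sort_by_commit_pos()` are stand-ins invented for illustration, and plain
qsort() is used in place of the QSORT() macro.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for struct bitmapped_commit; only the field we sort on. */
struct commit_sketch {
	uint32_t commit_pos;
};

/* Stand-in for writer.selected, with made-up positions. */
static struct commit_sketch selected[] = { { 40 }, { 7 }, { 19 } };

/* Compare two indices into selected[] by the commit_pos they refer to. */
static int table_cmp(const void *_va, const void *_vb)
{
	const struct commit_sketch *a = &selected[*(const uint32_t *)_va];
	const struct commit_sketch *b = &selected[*(const uint32_t *)_vb];

	if (a->commit_pos < b->commit_pos)
		return -1;
	else if (a->commit_pos > b->commit_pos)
		return 1;
	return 0;
}

/* Fill table[] with 0..nr-1, then order those indices by commit_pos. */
static void sort_by_commit_pos(uint32_t *table, size_t nr)
{
	size_t i;

	assert(nr <= sizeof(selected) / sizeof(selected[0]));
	for (i = 0; i < nr; i++)
		table[i] = (uint32_t)i;
	qsort(table, nr, sizeof(*table), table_cmp);
}
```

Because the comparator reads the positions out of the structs
themselves, no context pointer is needed, which is why the patch can
move from QSORT_S() to QSORT().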

> > +static inline int bitmap_writer_selected_nr(void)
> > +{
> > +	return writer.selected_nr - writer.pseudo_merges_nr;
> > +}
>
> OK, so now most spots should use this new function instead of looking at
> writer.selected_nr directly. But if anybody accidentally uses the old
> field directly, it is presumably disastrous. Is it worth renaming it to
> make sure we caught all references?

We only need to check within this file, since the bitmap_writer
structure is defined within the pack-bitmap-write.c compilation unit.

I took a careful look through the file, and am confident that we touched
all of the spots that needed attention.

Thanks,
Taylor

[^1]: If memory serves, that was my original intention when writing this
  series for the first time, but I must have forgotten when I was
  actually splitting out the individual patches and staged the removal
  alongside the rest of this change.

* Re: [PATCH v2 08/23] pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public
  2024-05-13 18:50     ` Jeff King
@ 2024-05-14  0:54       ` Taylor Blau
  0 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-14  0:54 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano

On Mon, May 13, 2024 at 02:50:55PM -0400, Jeff King wrote:
> On Mon, Apr 29, 2024 at 04:43:22PM -0400, Taylor Blau wrote:
>
> > The pseudo-merge selection code will be added in a subsequent commit,
> > and will need a way to push the allocated commit structures into the
> > bitmap writer from a separate compilation unit.
> >
> > Make the `bitmap_writer_push_bitmapped_commit()` function part of the
> > pack-bitmap.h header in order to make this possible.
>
> I was a little surprised that this function and the one in the previous
> commit needed to be public, since this whole topic is restricted to
> writing, which is mostly contained to pack-bitmap-write.c. But you've
> pulled the pseudo-merge bits out to pseudo-merge.[ch], and they need
> access, which makes sense.
>
> One could argue that it could all get stuffed into pack-bitmap-write.c,
> but that is already getting to be a pretty large and complex file. So
> this is probably the best route.

I had originally written the series like that, but the new bits nearly
doubled the line count of pack-bitmap-write.c:

    $ wc -l pack-bitmap-write.c pseudo-merge.c
     1031 pack-bitmap-write.c
      752 pseudo-merge.c
     1783 total

so I ended up splitting it out into pseudo-merge.[ch].

Thanks,
Taylor

* Re: [PATCH v2 09/23] pseudo-merge: implement support for selecting pseudo-merge commits
  2024-05-13 19:03     ` Jeff King
@ 2024-05-14  0:58       ` Taylor Blau
  2024-05-16  8:07         ` Jeff King
  0 siblings, 1 reply; 157+ messages in thread
From: Taylor Blau @ 2024-05-14  0:58 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Junio C Hamano

On Mon, May 13, 2024 at 03:03:40PM -0400, Jeff King wrote:
> On Mon, Apr 29, 2024 at 04:43:37PM -0400, Taylor Blau wrote:
>
> > Teach the new pseudo-merge machinery how to select non-bitmapped commits
> > for inclusion in different pseudo-merge group(s) based on a handful of
> > criteria.
> >
> > Pseudo-merges are derived first from named pseudo-merge groups (see the
> > `bitmapPseudoMerge.<name>.*` configuration options). They are
> > (optionally) further segmented within an individual pseudo-merge group
> > based on any capture group(s) within the pseudo-merge group's pattern.
> >
> > For example, a configuration like so:
> >
> >     [bitmapPseudoMerge "all"]
> >         pattern = "refs/"
> >         threshold = now
> >         stableThreshold = never
> >         sampleRate = 100
> >         maxMerges = 64
> >
> > would group all non-bitmapped commits into up to 64 individual
> > pseudo-merge commits.
>
> I was going to complain that explanatory text like this should probably
> go into the documentation, not a commit message. But I see you do later
> add documentation. Which seems to happen when this code is actually
> wired up to the bitmap-writer. Maybe a moot point now that I figured it
> out, but I think we'd be better off with the two commits squashed
> together.

I dunno. This commit is already rather large, and I like the split of
"here's how we select these things", versus "now we actually start
selecting/writing them".

But maybe it results in a slightly awkward break in the middle that
leaves some of the stuff that would otherwise fit well in the EXAMPLES
section (as you mention below) in a weird limbo state.

> And consider whether this commit message can be shortened a lot to just
> refer to the embedded docs (and especially if there is any useful info
> here that is not covered in the docs, and should be moved there). I do
> think some of these explanatory examples are good for users, but we
> don't necessarily have a good spot to put them. The git-config
> documentation is more of a reference, and huge example sections would
> probably bog it down. Maybe in the EXAMPLES section of pack-objects?
> It's already a pretty big manpage, though, and this is just one tiny
> corner.

There's a good amount of information already in
Documentation/technical/bitmap-format.txt, though perhaps some of the
pieces mentioned here could be added there. Let me know if you think one
is missing something the other has (or if we could move significant
portions of the latter into the former).

> I've got to break here in reviewing for the moment, but I think in a lot
> of ways this commit is going to be the most interesting one, because the
> usefulness of the whole pseudo-merge scheme depends on picking good sets
> of commits (ones that cover a lot of ground but have a low chance of
> being invalidated). So I'll pick up again with a careful look at this
> one.

I agree that this is where things start to get interesting ;-). I'm
looking forward to your review when you get back to it!

Thanks,
Taylor

* Re: [PATCH v2 01/23] Documentation/technical: describe pseudo-merge bitmaps format
  2024-05-13 19:47           ` Taylor Blau
@ 2024-05-14  6:33             ` Patrick Steinhardt
  0 siblings, 0 replies; 157+ messages in thread
From: Patrick Steinhardt @ 2024-05-14  6:33 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Elijah Newren, Junio C Hamano

On Mon, May 13, 2024 at 03:47:01PM -0400, Taylor Blau wrote:
> On Fri, May 10, 2024 at 01:46:59PM +0200, Patrick Steinhardt wrote:
> > > > > +* An (optional) extended lookup table (written if and only if there is
> > > > > +  at least one commit which appears in more than one pseudo-merge).
> > > > > +  There are as many entries as commits which appear in multiple
> > > > > +  pseudo-merges. Each entry contains the following:
> > > > > +
> > > > > +  ** `N`, a 4-byte unsigned value equal to the number of pseudo-merges
> > > > > +     which contain a given commit.
> > > >
> > > > How exactly is the given commit identified? Or in other words, given an
> > > > entry in the lookup table here, how do I figure out what commit it
> > > > belongs to?
> > >
> > > They aren't identified within this section. The extended lookup table is
> > > indexed into via the lookup table with an offset that is stored in the
> > > `offset` field when the MSB is set.
> >
> > Okay. Would this explanation be a good addition to the document?
> 
> I think we already have this written down in the section above. See in
> the previous bullet point the section reading "containing either one of
> two possible offsets, depending on whether or not the most-significant
> bit is set: [...]"
> 
> Does that work?

One could have a back-reference to that section here. But I don't mind
it too much overall.

Thanks!

Patrick

* Re: [PATCH v2 09/23] pseudo-merge: implement support for selecting pseudo-merge commits
  2024-05-14  0:58       ` Taylor Blau
@ 2024-05-16  8:07         ` Jeff King
  2024-05-16 22:43           ` Junio C Hamano
  0 siblings, 1 reply; 157+ messages in thread
From: Jeff King @ 2024-05-16  8:07 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Junio C Hamano

On Mon, May 13, 2024 at 08:58:17PM -0400, Taylor Blau wrote:

> > I was going to complain that explanatory text like this should probably
> > go into the documentation, not a commit message. But I see you do later
> > add documentation. Which seems to happen when this code is actually
> > wired up to the bitmap-writer. Maybe a moot point now that I figured it
> > out, but I think we'd be better off with the two commits squashed
> > together.
> 
> I dunno. This commit is already rather large, and I like the split of
> "here's how we select these things", versus "now we actually start
> selecting/writing them".
> 
> But maybe it results in a slightly awkward break in the middle that
> leaves some of the stuff that would otherwise fit well in the EXAMPLES
> section (as you mention below) in a weird limbo state.

It's not the break to me so much as the fact that you end up explaining
the concepts twice. Is it the same material presented in two ways? Or is
there stuff in one spot that is not in the other? I think the answer is
that it's a little bit of both. And as a reviewer (and an author) it's
hard to put yourself in the shoes of a user who is only going to see
what's in the docs.

So I'd rather even see the first patch as "here's some config; don't
worry about what it does too much, as we'll explain it in the next
commit" and then in the second patch say "go look at the config added by
this patch". And then we know we're looking at the same thing a user
will.

> There's a good amount of information already in
> Documentation/technical/bitmap-format.txt, though perhaps some of the
> pieces mentioned here could be added there. Let me know if you think one
> is missing something the other has (or if we could move significant
> portions of the latter into the former).

I don't think we should expect most users to read anything in
Documentation/technical. Now I don't expect most users to fiddle with
this feature at all. But reading over the config docs added by the
subsequent patch, it's not at all clear to me when I would want to tweak
the knobs or why.

I think there might need to be an "advanced packing concepts"
user-directed manual. Either as part of git-pack-objects(1), or maybe
broken out into its own page ("gitpacking(7)" or similar). Specifically
for this feature, I think it would want to cover:

  - what is this thing, and why would I want it. You cover this in the
    format doc, but I think it makes more sense in a user-directed doc
    (and to leave the format doc strictly as a technical reference).

  - what you wrote in "use cases" there is still, IMHO, introducing
    things in the wrong order for a regular user. They're going to come
    to the documentation either with a specific problem, or with an idea
    to browse around for things that might help them.

    So I feel like it needs to start with the concept and the
    motivation. Something like (assuming we're following a section on
    bitmaps in the "advanced packing" page, something I recognize also
    does not yet exist):

      Reachability bitmaps are most efficient when we have on-disk
      stored bitmaps for one or more of the starting points of a
      traversal. For this reason, Git prefers storing bitmaps for
      commits at the tips of refs, because traversals tend to start with
      those points.

      But if you have a large number of refs, it's not feasible to store
      a bitmap for _every_ ref tip. It takes up space, and just OR-ing
      all of those bitmaps together is expensive.

      One way we can deal with that is to create bitmaps that represent
      _groups_ of refs. When a traversal asks about the entire group,
      then we can use this single bitmap instead of considering each ref
      individually. Because these bitmaps represent the set of objects
      which would be reachable in a hypothetical merge of all of the
      commits, we call them pseudo-merge bitmaps.

   I don't think this is saying anything that your technical doc
   doesn't, but I think it's more important what it _isn't_ saying.
   We don't need to talk about commit bitmaps and merge bitmaps at all.
   We just want the user to have the concept of grouping refs. And then
   that would hopefully lead naturally into "OK, so how do we group
   them".

  - OK, so how do we group them? ;) I think there are two concerns here.
    One is that traversals can only use a pseudo-merge bitmap if _every_
    commit in its group is included in the traversal. So we want to
    group the refs along logical boundaries (e.g., tags vs branches vs
    remotes). Or in the case of shared-object repositories (like
    GitHub's), by boundaries which only the user knows about.

    And two is that we want groups that don't become invalidated when a
    ref changes or is removed. So you want a bunch of old, stable tags
    together, and probably don't want recent branches grouped at all.

    And then when we describe the config knobs you can turn, it should
    be in the user's mind how they can use them to influence those two
    things. For "logical boundaries" part, I think the commit message
    for patch 9 does some of that with the "refs/virtual" example. But
    that's something I don't see as clearly in the config documentation
    added in patch 10.

    The knobs for handling age are more complicated and harder to
    explain. ;) You do mention the power-law decay thing in patch 10,
    but it's in the technical format docs. I think it should be
    somewhere more user-accessible.
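
To illustrate the first concern, here is a toy sketch of the "every
commit in the group must be in the traversal" rule. It assumes bitmaps
small enough to fit in a single 64-bit word and a hypothetical
apply_pseudo_merge() helper; the real code works on EWAH-compressed
bitmaps, and the series adds `ewah_bitmap_is_subset()`, presumably for
this kind of check.

```c
#include <stdint.h>

/*
 * traversal_tips: bit i set if commit i is a starting point of the walk
 * group_commits:  bit i set if commit i belongs to the pseudo-merge
 * group_objects:  objects reachable from the whole group
 *
 * Returns 1 and ORs group_objects into *result only when the group is
 * a subset of the traversal tips; otherwise leaves *result untouched.
 */
static int apply_pseudo_merge(uint64_t traversal_tips,
			      uint64_t group_commits,
			      uint64_t group_objects,
			      uint64_t *result)
{
	if (group_commits & ~traversal_tips)
		return 0; /* some group member is not being traversed */

	*result |= group_objects;
	return 1;
}
```

A group that mixes stable tags with a frequently deleted branch fails
the subset test as soon as that branch goes away, which is the
invalidation problem from the second concern.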

So hopefully that kind of lays out how I'm thinking about it. Both where
the docs go, but what I think are the useful ways to be thinking about
the feature. And not just for users, but as we see if the design is
doing a good job of fulfilling those needs.

I think the name/pattern config you introduce does cover the logical
boundaries in a clean and easy-to-understand way. The "age" stuff is
much fuzzier in my mind. Your power-law decay makes sense to me, though
it does have a lot of knobs, and I don't think we'll really understand
how it performs without real-world experimentation.

I do wonder if something stupidly simple like "just make a single group
including tags older than 3 months, and ignore everything else" (where
"single" is "single per logical boundary defined by the user") would
perform OK in practice. The point is to help "--all" and "--not --all"
when you have a bunch of crufty old refs. So really, the challenge is
mostly just identifying crufty old refs. ;)

But I do think your power-law stuff should be a functional superset of
that. And while it's complicated to reason about where the knobs should
be set, I don't think the code is very complex. And the fact that it
_has_ knobs gives something to tweak and gather data with.

All of which is to say, I guess, that I think the code is going in a
reasonable direction. It's hard to say much more without spending a
bunch of time benchmarking real repositories, their traversal queries,
and so on.

-Peff

* Re: [PATCH v2 09/23] pseudo-merge: implement support for selecting pseudo-merge commits
  2024-05-16  8:07         ` Jeff King
@ 2024-05-16 22:43           ` Junio C Hamano
  0 siblings, 0 replies; 157+ messages in thread
From: Junio C Hamano @ 2024-05-16 22:43 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, Elijah Newren

Jeff King <peff@peff.net> writes:

>     motivation. Something like (assuming we're following a section on
>     bitmaps in the "advanced packing" page, something I recognize also
>     does not yet exist):
>
>       Reachability bitmaps are most efficient when we have on-disk
>       stored bitmaps for one or more of the starting points of a
>       traversal. For this reason, Git prefers storing bitmaps for
>       commits at the tips of refs, because traversals tend to start with
>       those points.
>
>       But if you have a large number of refs, it's not feasible to store
>       a bitmap for _every_ ref tip. It takes up space, and just OR-ing
>       all of those bitmaps together is expensive.
>
>       One way we can deal with that is to create bitmaps that represent
>       _groups_ of refs. When a traversal asks about the entire group,
>       then we can use this single bitmap instead of considering each ref
>       individually. Because these bitmaps represent the set of objects
>       which would be reachable in a hypothetical merge of all of the
>       commits, we call them pseudo-merge bitmaps.

Nicely put.  I wish there were something like the above in the
patches when I read these patches for the first time.  The concept
of "pseudo-merge" was the first hump in the road to understanding.
Eventually I figured it out, but a simple write-up like the above
would have helped readers a lot.

Thanks.

* [PATCH v3 00/30] pack-bitmap: pseudo-merge reachability bitmaps
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
                   ` (25 preceding siblings ...)
  2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
@ 2024-05-21 19:01 ` Taylor Blau
  2024-05-21 19:01   ` [PATCH v3 01/30] object.h: add flags allocated by pack-bitmap.h Taylor Blau
                     ` (25 more replies)
  2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
  27 siblings, 26 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-21 19:01 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Here is another reroll of my topic to introduce pseudo-merge bitmaps.

The implementation is relatively unchanged compared to last time, save
for the review that Patrick provided on the first half or so of this
series.

The notable change from last time is a significant reorganization of the
documentation, description, motivation, and examples around what
pseudo-merge bitmaps are and why you might want to use them. I took Peff's
suggestion to create a new gitpacking(7) manual page which I hope will
eventually grow to cover many of the more advanced concepts related to
packing objects [^1].

The other change is that this series is now based on
'tb/pack-bitmap-write-cleanups'.

Otherwise, a range-diff is included below for convenience. Thanks in
advance for your review!

[^1]: At present, the new manual page is rather bare :-). I want to
  separate the task of collecting all of the existing documentation
  around advanced packing concepts from pseudo-merge bitmaps and not add
  a new dependency here.

Taylor Blau (30):
  object.h: add flags allocated by pack-bitmap.h
  pack-bitmap-write.c: move commit_positions into commit_pos fields
  pack-bitmap: avoid use of static `bitmap_writer`
  pack-bitmap: drop unused `max_bitmaps` parameter
  pack-bitmap-write.c: avoid uninitialized 'write_as' field
  pack-bitmap: introduce `bitmap_writer_free()`
  Documentation/gitpacking.txt: initial commit
  Documentation/gitpacking.txt: describe pseudo-merge bitmaps
  Documentation/technical: describe pseudo-merge bitmaps format
  ewah: implement `ewah_bitmap_is_subset()`
  pack-bitmap: move some initialization to `bitmap_writer_init()`
  pseudo-merge.ch: initial commit
  pack-bitmap-write: support storing pseudo-merge commits
  pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()`
  pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public
  config: introduce git_config_float()
  pseudo-merge: implement support for selecting pseudo-merge commits
  pack-bitmap-write.c: write pseudo-merge table
  pack-bitmap: extract `read_bitmap()` function
  pseudo-merge: scaffolding for reads
  pack-bitmap.c: read pseudo-merge extension
  pseudo-merge: implement support for reading pseudo-merge commits
  ewah: implement `ewah_bitmap_popcount()`
  pack-bitmap: implement test helpers for pseudo-merge
  t/test-lib-functions.sh: support `--date` in `test_commit_bulk()`
  pack-bitmap.c: use pseudo-merges during traversal
  pack-bitmap: extra trace2 information
  ewah: `bitmap_equals_ewah()`
  pseudo-merge: implement support for finding existing merges
  t/perf: implement performance tests for pseudo-merge bitmaps

 Documentation/Makefile                       |   1 +
 Documentation/config.txt                     |   2 +
 Documentation/config/bitmap-pseudo-merge.txt |  90 +++
 Documentation/gitpacking.txt                 | 186 +++++
 Documentation/technical/bitmap-format.txt    | 132 ++++
 Makefile                                     |   1 +
 builtin/pack-objects.c                       |  20 +-
 config.c                                     |   9 +
 config.h                                     |   6 +
 ewah/bitmap.c                                |  76 ++
 ewah/ewok.h                                  |   8 +
 midx-write.c                                 |  17 +-
 object.h                                     |   1 +
 pack-bitmap-write.c                          | 488 ++++++++----
 pack-bitmap.c                                | 359 ++++++++-
 pack-bitmap.h                                |  55 +-
 parse.c                                      |  29 +
 parse.h                                      |   1 +
 pseudo-merge.c                               | 752 +++++++++++++++++++
 pseudo-merge.h                               | 216 ++++++
 t/helper/test-bitmap.c                       |  34 +-
 t/perf/p5333-pseudo-merge-bitmaps.sh         |  32 +
 t/t5333-pseudo-merge-bitmaps.sh              | 387 ++++++++++
 t/test-lib-functions.sh                      |  12 +-
 24 files changed, 2742 insertions(+), 172 deletions(-)
 create mode 100644 Documentation/config/bitmap-pseudo-merge.txt
 create mode 100644 Documentation/gitpacking.txt
 create mode 100644 pseudo-merge.c
 create mode 100644 pseudo-merge.h
 create mode 100755 t/perf/p5333-pseudo-merge-bitmaps.sh
 create mode 100755 t/t5333-pseudo-merge-bitmaps.sh

Range-diff against v2:
 -:  ----------- >  1:  38c96fc1909 object.h: add flags allocated by pack-bitmap.h
 -:  ----------- >  2:  cbedff02ed1 pack-bitmap-write.c: move commit_positions into commit_pos fields
 -:  ----------- >  3:  65ee69acfeb pack-bitmap: avoid use of static `bitmap_writer`
 -:  ----------- >  4:  b38dd5464d5 pack-bitmap: drop unused `max_bitmaps` parameter
 -:  ----------- >  5:  f16175295f5 pack-bitmap-write.c: avoid uninitialized 'write_as' field
 -:  ----------- >  6:  bf65967764f pack-bitmap: introduce `bitmap_writer_free()`
 -:  ----------- >  7:  0f20c9becf4 Documentation/gitpacking.txt: initial commit
 -:  ----------- >  8:  528b591bd84 Documentation/gitpacking.txt: describe pseudo-merge bitmaps
 1:  43fd5e35971 !  9:  12f318b3d7e Documentation/technical: describe pseudo-merge bitmaps format
    @@ Commit message
         format, making it compatible with previous versions of Git, as well as
         the original .bitmap implementation within JGit.
     
    -    The format (as well as a general description of pseudo-merge bitmaps,
    -    and motivating use-case(s)) is described in detail in the patch contents
    -    below, but the high-level description is as follows:
    +    The format is described in detail in the patch contents below, but the
    +    high-level description is as follows:
     
           - An array of pseudo-merge bitmaps, each containing a pair of EWAH
             bitmaps: one describing the set of pseudo-merge "parents", and
    @@ Documentation/technical/bitmap-format.txt: triplet is -
     +bytes (preceding the name-hash cache, commit lookup table, and trailing
     +checksum) of the `.bitmap` file is used to store pseudo-merge bitmaps.
     +
    -+A "pseudo-merge bitmap" is used to refer to a pair of bitmaps, as
    -+follows:
    -+
    -+Commit bitmap::
    -+
    -+  A bitmap whose set bits describe the set of commits included in the
    -+  pseudo-merge's "merge" bitmap (as below).
    -+
    -+Merge bitmap::
    -+
    -+  A bitmap whose set bits describe the reachability closure over the set
    -+  of commits in the pseudo-merge's "commits" bitmap (as above). An
    -+  identical bitmap would be generated for an octopus merge with the same
    -+  set of parents as described in the commits bitmap.
    -+
    -+Pseudo-merge bitmaps can accelerate bitmap traversals when all commits
    -+for a given pseudo-merge are listed on either side of the traversal,
    -+either directly (by explicitly asking for them as part of the `HAVES`
    -+or `WANTS`) or indirectly (by encountering them during a fill-in
    -+traversal).
    -+
    -+=== Use-cases
    -+
    -+For example, suppose there exists a pseudo-merge bitmap with a large
    -+number of commits, all of which are listed in the `WANTS` section of
    -+some bitmap traversal query. When pseudo-merge bitmaps are enabled, the
    -+bitmap machinery can quickly determine there is a pseudo-merge which
    -+satisfies some subset of the wanted objects on either side of the query.
    -+Then, we can inflate the EWAH-compressed bitmap, and `OR` it in to the
    -+resulting bitmap. By contrast, without pseudo-merge bitmaps, we would
    -+have to repeat the decompression and `OR`-ing step over a potentially
    -+large number of individual bitmaps, which can take proportionally more
    -+time.
    -+
    -+Another benefit of pseudo-merges arises when there is some combination
    -+of (a) a large number of references, with (b) poor bitmap coverage, and
    -+(c) deep, nested trees, making fill-in traversal relatively expensive.
    -+For example, suppose that there are a large enough number of tags where
    -+bitmapping each of the tags individually is infeasible. Without
    -+pseudo-merge bitmaps, computing the result of, say, `git rev-list
    -+--use-bitmap-index --count --objects --tags` would likely require a
    -+large amount of fill-in traversal. But when a large quantity of those
    -+tags are stored together in a pseudo-merge bitmap, the bitmap machinery
    -+can take advantage of the fact that we only care about the union of
    -+objects reachable from all of those tags, and answer the query much
    -+faster.
    ++For more information on what pseudo-merges are, why they are useful, and
    ++how to configure them, see the information in linkgit:gitpacking[7].
     +
     +=== File format
     +
    @@ Documentation/technical/bitmap-format.txt: triplet is -
     ++-------------------------------------------+
     +|                                           |
     +|  Lookup Table                             |
    -+|  +------------+--------------+            |
    -+|  | commit_pos |    offset    |            |
    -+|  +------------+--------------+            |
    -+|  |  4 bytes   |   8 bytes    |            |
    ++|  +---------------------------+            |
    ++|  | commit_pos (4 bytes)      |            |
    ++|  +---------------------------+            |
    ++|  | offset (8 bytes)          |            |
     +|  +------------+--------------+            |
     +|                                           |
     +|  Offset Cases:                            |
    @@ Documentation/technical/bitmap-format.txt: triplet is -
     ++-------------------------------------------+
     +|                                           |
     +|  Extended Lookup Table (Optional)         |
    -+|                                           |
     +|  +----+----------+----------+----------+  |
     +|  | N  | Offset 1 |   ....   | Offset N |  |
     +|  +----+----------+----------+----------+  |
    @@ Documentation/technical/bitmap-format.txt: triplet is -
     ++-------------------------------------------+
     +|                                           |
     +|  Pseudo-merge Metadata                    |
    -+|  +------------------+----------------+    |
    -+|  | # pseudo-merges  | # Commits      |    |
    -+|  +------------------+----------------+    |
    -+|  | 4 bytes          | 4 bytes        |    |
    -+|  +------------------+----------------+    |
    -+|                                           |
    -+|  +------------------+----------------+    |
    -+|  | Lookup offset    | Extension size |    |
    -+|  +------------------+----------------+    |
    -+|  | 8 bytes          | 8 bytes        |    |
    -+|  +------------------+----------------+    |
    ++|  +-----------------------------------+    |
    ++|  | # pseudo-merges (4 bytes)         |    |
    ++|  +-----------------------------------+    |
    ++|  | # commits (4 bytes)               |    |
    ++|  +-----------------------------------+    |
    ++|  | Lookup offset (8 bytes)           |    |
    ++|  +-----------------------------------+    |
    ++|  | Extension size (8 bytes)          |    |
    ++|  +-----------------------------------+    |
     +|                                           |
     ++-------------------------------------------+
     +....
    @@ Documentation/technical/bitmap-format.txt: triplet is -
     +
     +* Positions for all pseudo-merges, each stored as an 8-byte unsigned
     +  value (in network byte-order) containing the offset (relative to the
    -+  beginnign of the `.bitmap` file) of each consecutive pseudo-merge.
    ++  beginning of the `.bitmap` file) of each consecutive pseudo-merge.
     +
     +* A 4-byte unsigned value (in network byte-order) equal to the number of
     +  pseudo-merges.
 2:  290d928325d = 10:  40eb6137618 ewah: implement `ewah_bitmap_is_subset()`
 3:  5160859f7f3 <  -:  ----------- pack-bitmap: drop unused `max_bitmaps` parameter
 4:  3d7d930b1c5 <  -:  ----------- pack-bitmap: move some initialization to `bitmap_writer_init()`
 -:  ----------- > 11:  487fb7c6e9c pack-bitmap: move some initialization to `bitmap_writer_init()`
 5:  e7a87cf7d4e = 12:  827732acf99 pseudo-merge.ch: initial commit
 6:  ee33a703245 ! 13:  8608dd1860f pack-bitmap-write: support storing pseudo-merge commits
    @@ Commit message
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    + ## object.h ##
    +@@ object.h: void object_array_init(struct object_array *array);
    +  * reflog.c:                           10--12
    +  * builtin/show-branch.c:    0-------------------------------------------26
    +  * builtin/unpack-objects.c:                                 2021
    +- * pack-bitmap.h:                                                22
    ++ * pack-bitmap.h:                                              2122
    +  */
    + #define FLAG_BITS  28
    + 
    +
      ## pack-bitmap-write.c ##
     @@ pack-bitmap-write.c: struct bitmapped_commit {
    - 	struct ewah_bitmap *write_as;
      	int flags;
      	int xor_offset;
    --	uint32_t commit_pos;
    + 	uint32_t commit_pos;
     +	unsigned pseudo_merge : 1;
      };
      
    - struct bitmap_writer {
    -@@ pack-bitmap-write.c: struct bitmap_writer {
    - 	struct bitmapped_commit *selected;
    - 	unsigned int selected_nr, selected_alloc;
    - 
    -+	uint32_t pseudo_merges_nr;
    -+
    - 	struct progress *progress;
    - 	int show_progress;
    - 	unsigned char pack_checksum[GIT_MAX_RAWSZ];
    -@@ pack-bitmap-write.c: struct bitmap_writer {
    - 
    - static struct bitmap_writer writer;
    - 
    -+static inline int bitmap_writer_selected_nr(void)
    ++static inline int bitmap_writer_nr_selected_commits(struct bitmap_writer *writer)
     +{
    -+	return writer.selected_nr - writer.pseudo_merges_nr;
    ++	return writer->selected_nr - writer->pseudo_merges_nr;
     +}
     +
    - void bitmap_writer_init(struct repository *r)
    + void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r)
      {
    - 	writer.bitmaps = kh_init_oid_map();
    -@@ pack-bitmap-write.c: void bitmap_writer_build_type_index(struct packing_data *to_pack,
    -  * Compute the actual bitmaps
    + 	memset(writer, 0, sizeof(struct bitmap_writer));
    +@@ pack-bitmap-write.c: void bitmap_writer_build_type_index(struct bitmap_writer *writer,
       */
      
    --static inline void push_bitmapped_commit(struct commit *commit)
    -+static void bitmap_writer_push_bitmapped_commit(struct commit *commit,
    -+						unsigned pseudo_merge)
    + static inline void push_bitmapped_commit(struct bitmap_writer *writer,
    +-					 struct commit *commit)
    ++					 struct commit *commit,
    ++					 unsigned pseudo_merge)
      {
     -	int hash_ret;
     -	khiter_t hash_pos;
     -
    - 	if (writer.selected_nr >= writer.selected_alloc) {
    - 		writer.selected_alloc = (writer.selected_alloc + 32) * 2;
    - 		REALLOC_ARRAY(writer.selected, writer.selected_alloc);
    + 	if (writer->selected_nr >= writer->selected_alloc) {
    + 		writer->selected_alloc = (writer->selected_alloc + 32) * 2;
    + 		REALLOC_ARRAY(writer->selected, writer->selected_alloc);
      	}
      
    --	hash_pos = kh_put_oid_map(writer.bitmaps, commit->object.oid, &hash_ret);
    +-	hash_pos = kh_put_oid_map(writer->bitmaps, commit->object.oid,
    +-				  &hash_ret);
     -	if (!hash_ret)
     -		die(_("duplicate entry when writing bitmap index: %s"),
     -		    oid_to_hex(&commit->object.oid));
    --	kh_value(writer.bitmaps, hash_pos) = NULL;
    +-	kh_value(writer->bitmaps, hash_pos) = NULL;
     +	if (!pseudo_merge) {
     +		int hash_ret;
    -+		khiter_t hash_pos = kh_put_oid_map(writer.bitmaps,
    ++		khiter_t hash_pos = kh_put_oid_map(writer->bitmaps,
     +						   commit->object.oid,
     +						   &hash_ret);
     +
     +		if (!hash_ret)
     +			die(_("duplicate entry when writing bitmap index: %s"),
     +			    oid_to_hex(&commit->object.oid));
    -+		kh_value(writer.bitmaps, hash_pos) = NULL;
    ++		kh_value(writer->bitmaps, hash_pos) = NULL;
     +	}
      
    - 	writer.selected[writer.selected_nr].commit = commit;
    - 	writer.selected[writer.selected_nr].bitmap = NULL;
    - 	writer.selected[writer.selected_nr].flags = 0;
    -+	writer.selected[writer.selected_nr].pseudo_merge = pseudo_merge;
    + 	writer->selected[writer->selected_nr].commit = commit;
    + 	writer->selected[writer->selected_nr].bitmap = NULL;
    + 	writer->selected[writer->selected_nr].write_as = NULL;
    + 	writer->selected[writer->selected_nr].flags = 0;
    ++	writer->selected[writer->selected_nr].pseudo_merge = pseudo_merge;
      
    - 	writer.selected_nr++;
    + 	writer->selected_nr++;
      }
    -@@ pack-bitmap-write.c: static void compute_xor_offsets(void)
    +@@ pack-bitmap-write.c: static void compute_xor_offsets(struct bitmap_writer *writer)
      
    - 	while (next < writer.selected_nr) {
    - 		struct bitmapped_commit *stored = &writer.selected[next];
    + 	while (next < writer->selected_nr) {
    + 		struct bitmapped_commit *stored = &writer->selected[next];
     -
      		int best_offset = 0;
      		struct ewah_bitmap *best_bitmap = stored->bitmap;
    @@ pack-bitmap-write.c: static void compute_xor_offsets(void)
      
      			if (curr < 0)
      				break;
    -+			if (writer.selected[curr].pseudo_merge)
    ++			if (writer->selected[curr].pseudo_merge)
     +				continue;
      
      			test_xor = ewah_pool_new();
    - 			ewah_xor(writer.selected[curr].bitmap, stored->bitmap, test_xor);
    -@@ pack-bitmap-write.c: static void compute_xor_offsets(void)
    + 			ewah_xor(writer->selected[curr].bitmap, stored->bitmap, test_xor);
    +@@ pack-bitmap-write.c: static void compute_xor_offsets(struct bitmap_writer *writer)
      			}
      		}
      
    @@ pack-bitmap-write.c: static void bitmap_builder_init(struct bitmap_builder *bb,
      	}
      
      	if (prepare_revision_walk(&revs))
    -@@ pack-bitmap-write.c: static int fill_bitmap_commit(struct bb_commit *ent,
    +@@ pack-bitmap-write.c: static int fill_bitmap_commit(struct bitmap_writer *writer,
      		struct commit *c = prio_queue_get(queue);
      
      		if (old_bitmap && mapping) {
    @@ pack-bitmap-write.c: static int fill_bitmap_commit(struct bb_commit *ent,
      			/*
      			 * If this commit has an old bitmap, then translate that
      			 * bitmap and add its bits to this one. No need to walk
    -@@ pack-bitmap-write.c: static int fill_bitmap_commit(struct bb_commit *ent,
    +@@ pack-bitmap-write.c: static int fill_bitmap_commit(struct bitmap_writer *writer,
      		 * Mark ourselves and queue our tree. The commit
      		 * walk ensures we cover all parents.
      		 */
    --		pos = find_object_pos(&c->object.oid, &found);
    +-		pos = find_object_pos(writer, &c->object.oid, &found);
     -		if (!found)
     -			return -1;
     -		bitmap_set(ent->bitmap, pos);
     -		prio_queue_put(tree_queue,
     -			       repo_get_commit_tree(the_repository, c));
     +		if (!(c->object.flags & BITMAP_PSEUDO_MERGE)) {
    -+			pos = find_object_pos(&c->object.oid, &found);
    ++			pos = find_object_pos(writer, &c->object.oid, &found);
     +			if (!found)
     +				return -1;
     +			bitmap_set(ent->bitmap, pos);
    @@ pack-bitmap-write.c: static int fill_bitmap_commit(struct bb_commit *ent,
     +		}
      
      		for (p = c->parents; p; p = p->next) {
    - 			pos = find_object_pos(&p->item->object.oid, &found);
    -@@ pack-bitmap-write.c: static void store_selected(struct bb_commit *ent, struct commit *commit)
    + 			pos = find_object_pos(writer, &p->item->object.oid,
    +@@ pack-bitmap-write.c: static void store_selected(struct bitmap_writer *writer,
      
      	stored->bitmap = bitmap_to_ewah(ent->bitmap);
      
     +	if (ent->pseudo_merge)
     +		return;
     +
    - 	hash_pos = kh_get_oid_map(writer.bitmaps, commit->object.oid);
    - 	if (hash_pos == kh_end(writer.bitmaps))
    + 	hash_pos = kh_get_oid_map(writer->bitmaps, commit->object.oid);
    + 	if (hash_pos == kh_end(writer->bitmaps))
      		die(_("attempted to store non-selected commit: '%s'"),
    -@@ pack-bitmap-write.c: void bitmap_writer_select_commits(struct commit **indexed_commits,
    +@@ pack-bitmap-write.c: void bitmap_writer_select_commits(struct bitmap_writer *writer,
      
      	if (indexed_commits_nr < 100) {
      		for (i = 0; i < indexed_commits_nr; ++i)
    --			push_bitmapped_commit(indexed_commits[i]);
    -+			bitmap_writer_push_bitmapped_commit(indexed_commits[i], 0);
    +-			push_bitmapped_commit(writer, indexed_commits[i]);
    ++			push_bitmapped_commit(writer, indexed_commits[i], 0);
      		return;
      	}
      
    -@@ pack-bitmap-write.c: void bitmap_writer_select_commits(struct commit **indexed_commits,
    +@@ pack-bitmap-write.c: void bitmap_writer_select_commits(struct bitmap_writer *writer,
      			}
      		}
      
    --		push_bitmapped_commit(chosen);
    -+		bitmap_writer_push_bitmapped_commit(chosen, 0);
    +-		push_bitmapped_commit(writer, chosen);
    ++		push_bitmapped_commit(writer, chosen, 0);
      
      		i += next + 1;
    - 		display_progress(writer.progress, i);
    -@@ pack-bitmap-write.c: static void write_selected_commits_v1(struct hashfile *f,
    + 		display_progress(writer->progress, i);
    +@@ pack-bitmap-write.c: static void write_selected_commits_v1(struct bitmap_writer *writer,
      {
      	int i;
      
    --	for (i = 0; i < writer.selected_nr; ++i) {
    -+	for (i = 0; i < bitmap_writer_selected_nr(); ++i) {
    - 		struct bitmapped_commit *stored = &writer.selected[i];
    +-	for (i = 0; i < writer->selected_nr; ++i) {
    ++	for (i = 0; i < bitmap_writer_nr_selected_commits(writer); ++i) {
    + 		struct bitmapped_commit *stored = &writer->selected[i];
     +		if (stored->pseudo_merge)
     +			BUG("unexpected pseudo-merge among selected: %s",
     +			    oid_to_hex(&stored->commit->object.oid));
      
      		if (offsets)
      			offsets[i] = hashfile_total(f);
    -@@ pack-bitmap-write.c: static void write_lookup_table(struct hashfile *f,
    +@@ pack-bitmap-write.c: static void write_lookup_table(struct bitmap_writer *writer, struct hashfile *f,
      	uint32_t i;
      	uint32_t *table, *table_inv;
      
    --	ALLOC_ARRAY(table, writer.selected_nr);
    --	ALLOC_ARRAY(table_inv, writer.selected_nr);
    -+	ALLOC_ARRAY(table, bitmap_writer_selected_nr());
    -+	ALLOC_ARRAY(table_inv, bitmap_writer_selected_nr());
    +-	ALLOC_ARRAY(table, writer->selected_nr);
    +-	ALLOC_ARRAY(table_inv, writer->selected_nr);
    ++	ALLOC_ARRAY(table, bitmap_writer_nr_selected_commits(writer));
    ++	ALLOC_ARRAY(table_inv, bitmap_writer_nr_selected_commits(writer));
      
    --	for (i = 0; i < writer.selected_nr; i++)
    -+	for (i = 0; i < bitmap_writer_selected_nr(); i++)
    +-	for (i = 0; i < writer->selected_nr; i++)
    ++	for (i = 0; i < bitmap_writer_nr_selected_commits(writer); i++)
      		table[i] = i;
      
      	/*
    -@@ pack-bitmap-write.c: static void write_lookup_table(struct hashfile *f,
    +@@ pack-bitmap-write.c: static void write_lookup_table(struct bitmap_writer *writer, struct hashfile *f,
      	 * bitmap corresponds to j'th bitmapped commit (among the selected
      	 * commits) in lex order of OIDs.
      	 */
    --	QSORT_S(table, writer.selected_nr, table_cmp, commit_positions);
    -+	QSORT_S(table, bitmap_writer_selected_nr(), table_cmp, commit_positions);
    +-	QSORT_S(table, writer->selected_nr, table_cmp, writer);
    ++	QSORT_S(table, bitmap_writer_nr_selected_commits(writer), table_cmp, writer);
      
      	/* table_inv helps us discover that relationship (i'th bitmap
      	 * to j'th commit by j = table_inv[i])
      	 */
    --	for (i = 0; i < writer.selected_nr; i++)
    -+	for (i = 0; i < bitmap_writer_selected_nr(); i++)
    +-	for (i = 0; i < writer->selected_nr; i++)
    ++	for (i = 0; i < bitmap_writer_nr_selected_commits(writer); i++)
      		table_inv[table[i]] = i;
      
      	trace2_region_enter("pack-bitmap-write", "writing_lookup_table", the_repository);
    --	for (i = 0; i < writer.selected_nr; i++) {
    -+	for (i = 0; i < bitmap_writer_selected_nr(); i++) {
    - 		struct bitmapped_commit *selected = &writer.selected[table[i]];
    +-	for (i = 0; i < writer->selected_nr; i++) {
    ++	for (i = 0; i < bitmap_writer_nr_selected_commits(writer); i++) {
    + 		struct bitmapped_commit *selected = &writer->selected[table[i]];
      		uint32_t xor_offset = selected->xor_offset;
      		uint32_t xor_row;
    -@@ pack-bitmap-write.c: void bitmap_writer_finish(struct pack_idx_entry **index,
    +@@ pack-bitmap-write.c: void bitmap_writer_finish(struct bitmap_writer *writer,
      	memcpy(header.magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE));
      	header.version = htons(default_version);
      	header.options = htons(flags | options);
    --	header.entry_count = htonl(writer.selected_nr);
    -+	header.entry_count = htonl(bitmap_writer_selected_nr());
    - 	hashcpy(header.checksum, writer.pack_checksum);
    +-	header.entry_count = htonl(writer->selected_nr);
    ++	header.entry_count = htonl(bitmap_writer_nr_selected_commits(writer));
    + 	hashcpy(header.checksum, writer->pack_checksum);
      
      	hashwrite(f, &header, sizeof(header) - GIT_MAX_RAWSZ + the_hash_algo->rawsz);
    -@@ pack-bitmap-write.c: void bitmap_writer_finish(struct pack_idx_entry **index,
    +@@ pack-bitmap-write.c: void bitmap_writer_finish(struct bitmap_writer *writer,
      	if (options & BITMAP_OPT_LOOKUP_TABLE)
      		CALLOC_ARRAY(offsets, index_nr);
      
    --	ALLOC_ARRAY(commit_positions, writer.selected_nr);
    -+	ALLOC_ARRAY(commit_positions, bitmap_writer_selected_nr());
    - 
    --	for (i = 0; i < writer.selected_nr; i++) {
    -+	for (i = 0; i < bitmap_writer_selected_nr(); i++) {
    - 		struct bitmapped_commit *stored = &writer.selected[i];
    - 		int commit_pos = oid_pos(&stored->commit->object.oid, index, index_nr, oid_access);
    - 
    +-	for (i = 0; i < writer->selected_nr; i++) {
    ++	for (i = 0; i < bitmap_writer_nr_selected_commits(writer); i++) {
    + 		struct bitmapped_commit *stored = &writer->selected[i];
    + 		int commit_pos = oid_pos(&stored->commit->object.oid, index,
    + 					 index_nr, oid_access);
     
      ## pack-bitmap.h ##
     @@ pack-bitmap.h: struct bitmap_disk_header {
    @@ pack-bitmap.h: struct bitmap_disk_header {
      #define NEEDS_BITMAP (1u<<22)
      
      /*
    +@@ pack-bitmap.h: struct bitmap_writer {
    + 	struct bitmapped_commit *selected;
    + 	unsigned int selected_nr, selected_alloc;
    + 
    ++	uint32_t pseudo_merges_nr;
    ++
    + 	struct progress *progress;
    + 	int show_progress;
    + 	unsigned char pack_checksum[GIT_MAX_RAWSZ];
 7:  9c6d09bf874 <  -:  ----------- pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()`
 8:  dfd4b73d12e <  -:  ----------- pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public
 -:  ----------- > 14:  99d2b6872ba pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()`
 -:  ----------- > 15:  e7209c60fa5 pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public
 -:  ----------- > 16:  3070135eb4b config: introduce git_config_float()
 9:  86a1e4b8b9b ! 17:  3029473c094 pseudo-merge: implement support for selecting pseudo-merge commits
    @@ Commit message
         for inclusion in different pseudo-merge group(s) based on a handful of
         criteria.
     
    -    Pseudo-merges are derived first from named pseudo-merge groups (see the
    -    `bitmapPseudoMerge.<name>.*` configuration options). They are
    -    (optionally) further segmented within an individual pseudo-merge group
    -    based on any capture group(s) within the pseudo-merge group's pattern.
    -
    -    For example, a configuration like so:
    -
    -        [bitmapPseudoMerge "all"]
    -            pattern = "refs/"
    -            threshold = now
    -            stableThreshold = never
    -            sampleRate = 100
    -            maxMerges = 64
    -
    -    would group all non-bitmapped commits into up to 64 individual
    -    pseudo-merge commits.
    -
    -    If you wanted to separate tags from branches when generating
    -    pseudo-merge commits, and further segment them by which fork they
    -    originate from (using the same "refs/virtual/" scheme as in the delta
    -    islands documentation), you would instead write something like:
    -
    -        [bitmapPseudoMerge "all"]
    -            pattern = "refs/virtual/([0-9]+)/(heads|tags)/"
    -            threshold = now
    -            stableThreshold = never
    -            sampleRate = 100
    -            maxMerges = 64
    -
    -    Which would generate pseudo-merge group identifiers like "1234-heads",
    -    and "5678-tags" (for branches in fork "1234", and tags in remote "5678",
    -    respectively).
    -
    -    Within pseudo-merge groups, there are a handful of other options used to
    -    control the distribution of matching commits among individual
    -    pseudo-merge commits:
    -
    -      - bitmapPseudoMerge.<name>.decay
    -      - bitmapPseudoMerge.<name>.sampleRate
    -      - bitmapPseudoMerge.<name>.threshold
    -      - bitmapPseudoMerge.<name>.maxMerges
    -      - bitmapPseudoMerge.<name>.stableThreshold
    -      - bitmapPseudoMerge.<name>.stableSize
    -
    -    The decay parameter roughly corresponds to "k" in `f(n) = C*n^(-k/100)`,
    -    where `f(n)` describes the size of the `n`-th pseudo-merge group. The
    -    sample rate controls what percentage of eligible commits are considered
    -    as candidates. The threshold parameter indicates the minimum age (so as
    -    to avoid including too-recent commits in a pseudo-merge group, making it
    -    less likely to be valid). The "maxMerges" parameter sets an upper-bound
    -    on the number of pseudo-merge commits an individual group
    -
    -    The latter two "stable"-related parameters control "stable" pseudo-merge
    -    groups, comprised of a fixed number of commits which are older than the
    -    configured "stable threshold" value and may be grouped together in
    -    chunks of "stableSize" in order of age.
    -
    -    This patch implements the aforementioned selection routine, as well as
    -    parsing the relevant configuration options.
    +    Note that the selected pseudo-merge commits aren't actually used or
    +    written anywhere yet. This will be done in the following commit.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    + ## Documentation/config.txt ##
    +@@ Documentation/config.txt: include::config/apply.txt[]
    + 
    + include::config/attr.txt[]
    + 
    ++include::config/bitmap-pseudo-merge.txt[]
    ++
    + include::config/blame.txt[]
    + 
    + include::config/branch.txt[]
    +
    + ## Documentation/config/bitmap-pseudo-merge.txt (new) ##
    +@@
    ++NOTE: The configuration options in `bitmapPseudoMerge.*` are considered
    ++EXPERIMENTAL and may be subject to change or be removed entirely in the
    ++future.
    ++
    ++bitmapPseudoMerge.<name>.pattern::
    ++	Regular expression used to match reference names. Commits
    ++	pointed to by references matching this pattern (and meeting
    ++	the below criteria, like `bitmapPseudoMerge.<name>.sampleRate`
    ++	and `bitmapPseudoMerge.<name>.threshold`) will be considered
    ++	for inclusion in a pseudo-merge bitmap.
    +++
    ++Commits are grouped into pseudo-merge groups based on whether or not
    ++any reference(s) that point at a given commit match the pattern, which
    ++is an extended regular expression.
    +++
    ++Within a pseudo-merge group, commits may be further grouped into
    ++sub-groups based on the capture groups in the pattern. These
    ++sub-groupings are formed from the regular expressions by concatenating
    ++any capture groups from the regular expression, with a '-' dash in
    ++between.
    +++
    ++For example, if the pattern is `refs/tags/`, then all tags (provided
    ++they meet the below criteria) will be considered candidates for the
    ++same pseudo-merge group. However, if the pattern is instead
    ++`refs/remotes/([0-9]+)/tags/`, then tags from different remotes will
    ++be grouped into separate pseudo-merge groups, based on the remote
    ++number.
    ++
    ++bitmapPseudoMerge.<name>.decay::
    ++	Determines the rate at which consecutive pseudo-merge bitmap
    ++	groups decrease in size. Must be non-negative. This parameter
    ++	can be thought of as `k` in the function `f(n) = C * n^-k`,
    ++	where `f(n)` is the size of the `n`th group.
    +++
    ++Setting the decay rate equal to `0` will cause all groups to be the
    ++same size. Setting the decay rate equal to `1` will cause the `n`th
    ++group to be `1/n` the size of the initial group.  Higher values of the
    ++decay rate cause consecutive groups to shrink at an increasing rate.
    ++The default is `1`.
    +++
    ++If all groups are the same size, groups containing newer commits may
    ++be usable less often than earlier groups, since references pointing
    ++at newer commits are more likely to be updated than references
    ++pointing at older commits.
    ++
    ++bitmapPseudoMerge.<name>.sampleRate::
    ++	Determines the proportion of non-bitmapped commits (among
    ++	reference tips) which are selected for inclusion in an
    ++	unstable pseudo-merge bitmap. Must be between `0` and `1`
    ++	(inclusive). The default is `1`.
    ++
    ++bitmapPseudoMerge.<name>.threshold::
    ++	Determines the minimum age of non-bitmapped commits (among
    ++	reference tips, as above) which are candidates for inclusion
    ++	in an unstable pseudo-merge bitmap. The default is
    ++	`1.week.ago`.
    ++
    ++bitmapPseudoMerge.<name>.maxMerges::
    ++	Determines the maximum number of pseudo-merge commits among
    ++	which commits may be distributed.
    +++
    ++For pseudo-merge groups whose pattern does not contain any capture
    ++groups, this setting is applied for all commits matching the regular
    ++expression. For patterns that have one or more capture groups, this
    ++setting is applied for each distinct capture group.
    +++
    ++For example, if your pattern is `refs/tags/`, then this setting
    ++will distribute all tags into a maximum of `maxMerges` pseudo-merge
    ++commits. However, if your pattern is, say,
    ++`refs/remotes/([0-9]+)/tags/`, then this setting will be applied to
    ++each remote's set of tags individually.
    +++
    ++Must be non-negative. The default value is 64.
    ++
    ++bitmapPseudoMerge.<name>.stableThreshold::
    ++	Determines the minimum age of commits (among reference tips,
    ++	as above, however stable commits are still considered
    ++	candidates even when they have been covered by a bitmap) which
    ++	are candidates for a stable pseudo-merge bitmap. The default
    ++	is `1.month.ago`.
    +++
    ++Setting this threshold to a smaller value (e.g., 1.week.ago) will cause
    ++more stable groups to be generated (which impose a one-time generation
    ++cost) but those groups will likely become stale over time. Using a
    ++larger value incurs the opposite penalty (fewer stable groups which are
    ++more useful).
    ++
    ++bitmapPseudoMerge.<name>.stableSize::
    ++	Determines the size (in number of commits) of a stable
    ++	pseudo-merge bitmap. The default is `512`.
    +
    + ## Documentation/gitpacking.txt ##
    +@@ Documentation/gitpacking.txt: can take advantage of the fact that we only care about the union of
    + objects reachable from all of those tags, and answer the query much
    + faster.
    + 
    ++=== Configuration
    ++
    ++Reference tips are grouped into different pseudo-merge groups according
    ++to two criteria. A reference name matches one or more of the defined
    ++pseudo-merge patterns, and optionally one or more capture groups within
    ++that pattern which further partition the group.
    ++
    ++Within a group, commits may be considered "stable", or "unstable"
    ++depending on their age. These are adjusted by setting the
    ++`bitmapPseudoMerge.<name>.stableThreshold` and
    ++`bitmapPseudoMerge.<name>.threshold` configuration values, respectively.
    ++
    ++All stable commits are grouped into pseudo-merges of equal size
    ++(`bitmapPseudoMerge.<name>.stableSize`). If the `stableSize`
    ++configuration is set to, say, 100, then the first 100 commits (ordered
    ++by committer date) which are older than the `stableThreshold` value will
    ++form one group, the next 100 commits will form another group, and so on.
    ++
    ++Among unstable commits, the pseudo-merge machinery will attempt to
    ++combine older commits into large groups as opposed to newer commits
    ++which will appear in smaller groups. This is based on the heuristic that
    ++references whose tip commit is older are less likely to be modified to
    ++point at a different commit than a reference whose tip commit is newer.
    ++
    ++The size of groups is determined by a power-law decay function, and the
    ++decay parameter roughly corresponds to "k" in `f(n) = C*n^(-k)`,
    ++where `f(n)` describes the size of the `n`-th pseudo-merge group. The
    ++sample rate controls what proportion of eligible commits are considered
    ++as candidates. The threshold parameter indicates the minimum age (so as
    ++to avoid including too-recent commits in a pseudo-merge group, making it
    ++less likely to be valid). The "maxMerges" parameter sets an upper-bound
    ++on the number of pseudo-merge commits an individual group may produce.
    ++
    ++The "stable"-related parameters control "stable" pseudo-merge groups,
    ++comprised of a fixed number of commits which are older than the
    ++configured "stable threshold" value and may be grouped together in
    ++chunks of "stableSize" in order of age.
    ++
    ++The exact configuration for pseudo-merges is as follows:
    ++
    ++include::config/bitmap-pseudo-merge.txt[]
    ++
    ++=== Examples
    ++
    ++Suppose that you have a repository with a large number of references,
    ++and you want a bare-bones configuration of pseudo-merge bitmaps that
    ++will enhance bitmap coverage of the `refs/` namespace. You may start
    ++with a configuration like so:
    ++
    ++    [bitmapPseudoMerge "all"]
    ++	pattern = "refs/"
    ++	threshold = now
    ++	stableThreshold = never
    ++	sampleRate = 1
    ++	maxMerges = 64
    ++
    ++This will create pseudo-merge bitmaps for all references, regardless of
    ++their age, and group them into at most 64 pseudo-merge commits.
    ++
    ++If you wanted to separate tags from branches when generating
    ++pseudo-merge commits, you would instead define the pattern with a
    ++capture group, like so:
    ++
    ++    [bitmapPseudoMerge "all"]
    ++	pattern = "refs/(heads|tags)/"
    ++
    ++Suppose instead that you are working in a fork-network repository, with
    ++each fork specified by some numeric ID, and whose refs reside in
    ++`refs/virtual/NNN/` (where `NNN` is the numeric ID corresponding to some
    ++fork) in the network. In this instance, you may instead write something
    ++like:
    ++
    ++    [bitmapPseudoMerge "all"]
    ++	pattern = "refs/virtual/([0-9]+)/(heads|tags)/"
    ++	threshold = now
    ++	stableThreshold = never
    ++	sampleRate = 1
    ++	maxMerges = 64
    ++
    ++This would generate pseudo-merge group identifiers like "1234-heads"
    ++and "5678-tags" (for branches in fork "1234", and tags in fork "5678",
    ++respectively).
    ++
    + SEE ALSO
    + --------
    + linkgit:git-pack-objects[1]
    +
    + ## pack-bitmap-write.c ##
    +@@
    + #include "trace2.h"
    + #include "tree.h"
    + #include "tree-walk.h"
    ++#include "pseudo-merge.h"
    + 
    + struct bitmapped_commit {
    + 	struct commit *commit;
    +@@ pack-bitmap-write.c: void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r)
    + 	if (writer->bitmaps)
    + 		BUG("bitmap writer already initialized");
    + 	writer->bitmaps = kh_init_oid_map();
    ++	writer->pseudo_merge_commits = kh_init_oid_map();
    ++
    ++	string_list_init_dup(&writer->pseudo_merge_groups);
    ++
    ++	load_pseudo_merges_from_config(&writer->pseudo_merge_groups);
    ++}
    ++
    ++static void free_pseudo_merge_commit_idx(struct pseudo_merge_commit_idx *idx)
    ++{
    ++	if (!idx)
    ++		return;
    ++	free(idx->pseudo_merge);
    ++	free(idx);
    + }
    + 
    + void bitmap_writer_free(struct bitmap_writer *writer)
    + {
    + 	uint32_t i;
    ++	struct pseudo_merge_commit_idx *idx;
    + 
    + 	if (!writer)
    + 		return;
    +@@ pack-bitmap-write.c: void bitmap_writer_free(struct bitmap_writer *writer)
    + 
    + 	kh_destroy_oid_map(writer->bitmaps);
    + 
    ++	kh_foreach_value(writer->pseudo_merge_commits, idx,
    ++			 free_pseudo_merge_commit_idx(idx));
    ++	kh_destroy_oid_map(writer->pseudo_merge_commits);
    ++
    + 	for (i = 0; i < writer->selected_nr; i++) {
    + 		struct bitmapped_commit *bc = &writer->selected[i];
    + 		if (bc->write_as != bc->bitmap)
    +@@ pack-bitmap-write.c: void bitmap_writer_select_commits(struct bitmap_writer *writer,
    + 	}
    + 
    + 	stop_progress(&writer->progress);
    ++
    ++	select_pseudo_merges(writer, indexed_commits, indexed_commits_nr);
    + }
    + 
    + 
    +
    + ## pack-bitmap.h ##
    +@@ pack-bitmap.h: struct bitmap_writer {
    + 	struct bitmapped_commit *selected;
    + 	unsigned int selected_nr, selected_alloc;
    + 
    ++	struct string_list pseudo_merge_groups;
    ++	kh_oid_map_t *pseudo_merge_commits; /* oid -> pseudo merge(s) */
    + 	uint32_t pseudo_merges_nr;
    + 
    + 	struct progress *progress;
    +
      ## pseudo-merge.c ##
     @@
      #include "git-compat-util.h"
    @@ pseudo-merge.c
     +
     +#define DEFAULT_PSEUDO_MERGE_DECAY 1.0f
     +#define DEFAULT_PSEUDO_MERGE_MAX_MERGES 64
    -+#define DEFAULT_PSEUDO_MERGE_SAMPLE_RATE 100
    ++#define DEFAULT_PSEUDO_MERGE_SAMPLE_RATE 1
     +#define DEFAULT_PSEUDO_MERGE_THRESHOLD approxidate("1.week.ago")
     +#define DEFAULT_PSEUDO_MERGE_STABLE_THRESHOLD approxidate("1.month.ago")
     +#define DEFAULT_PSEUDO_MERGE_STABLE_SIZE 512
    @@ pseudo-merge.c
     +		C += 1.0f / gitexp(n + 1, group->decay);
     +	C = matches->unstable_nr / C;
     +
    -+	return (int)((C / gitexp(i + 1, group->decay)) + 0.5);
    ++	return (uint32_t)((C / gitexp(i + 1, group->decay)) + 0.5);
     +}
     +
    -+static void init_pseudo_merge_group(struct pseudo_merge_group *group)
    ++static void pseudo_merge_group_init(struct pseudo_merge_group *group)
     +{
     +	memset(group, 0, sizeof(struct pseudo_merge_group));
     +
    @@ pseudo-merge.c
     +	struct strbuf buf = STRBUF_INIT;
     +	const char *sub, *key;
     +	size_t sub_len;
    ++	int ret = 0;
     +
     +	if (parse_config_key(var, "bitmappseudomerge", &sub, &sub_len, &key))
    -+		return 0;
    ++		goto done;
     +
     +	if (!sub_len)
    -+		return 0;
    ++		goto done;
     +
     +	strbuf_add(&buf, sub, sub_len);
     +
    @@ pseudo-merge.c
     +		item = string_list_insert(list, buf.buf);
     +
     +		item->util = xmalloc(sizeof(struct pseudo_merge_group));
    -+		init_pseudo_merge_group(item->util);
    ++		pseudo_merge_group_init(item->util);
     +	}
     +
     +	group = item->util;
    @@ pseudo-merge.c
     +
     +		strbuf_release(&re);
     +	} else if (!strcmp(key, "decay")) {
    -+		group->decay = git_config_int(var, value, ctx->kvi);
    ++		group->decay = git_config_float(var, value, ctx->kvi);
     +		if (group->decay < 0) {
     +			warning(_("%s must be non-negative, using default"), var);
     +			group->decay = DEFAULT_PSEUDO_MERGE_DECAY;
     +		}
     +	} else if (!strcmp(key, "samplerate")) {
    -+		group->sample_rate = git_config_int(var, value, ctx->kvi);
    -+		if (!(0 <= group->sample_rate && group->sample_rate <= 100)) {
    -+			warning(_("%s must be between 0 and 100, using default"), var);
    ++		group->sample_rate = git_config_float(var, value, ctx->kvi);
    ++		if (!(0 <= group->sample_rate && group->sample_rate <= 1)) {
    ++			warning(_("%s must be between 0 and 1, using default"), var);
     +			group->sample_rate = DEFAULT_PSEUDO_MERGE_SAMPLE_RATE;
     +		}
     +	} else if (!strcmp(key, "threshold")) {
     +		if (git_config_expiry_date(&group->threshold, var, value)) {
    -+			strbuf_release(&buf);
    -+			return -1;
    ++			ret = -1;
    ++			goto done;
     +		}
     +	} else if (!strcmp(key, "maxmerges")) {
     +		group->max_merges = git_config_int(var, value, ctx->kvi);
    @@ pseudo-merge.c
     +		}
     +	} else if (!strcmp(key, "stablethreshold")) {
     +		if (git_config_expiry_date(&group->stable_threshold, var, value)) {
    -+			strbuf_release(&buf);
    -+			return -1;
    ++			ret = -1;
    ++			goto done;
     +		}
     +	} else if (!strcmp(key, "stablesize")) {
     +		group->stable_size = git_config_int(var, value, ctx->kvi);
    @@ pseudo-merge.c
     +		}
     +	}
     +
    ++done:
     +	strbuf_release(&buf);
     +
    -+	return 0;
    ++	return ret;
     +}
     +
     +void load_pseudo_merges_from_config(struct string_list *list)
    @@ pseudo-merge.c
     +					   int flags UNUSED,
     +					   void *_data)
     +{
    -+	struct string_list *list = _data;
    ++	struct bitmap_writer *writer = _data;
     +	struct object_id peeled;
     +	struct commit *c;
     +	uint32_t i;
    @@ pseudo-merge.c
     +	if (!c)
     +		return 0;
     +
    -+	has_bitmap = bitmap_writer_has_bitmapped_object_id(oid);
    ++	has_bitmap = bitmap_writer_has_bitmapped_object_id(writer, oid);
     +
    -+	for (i = 0; i < list->nr; i++) {
    ++	for (i = 0; i < writer->pseudo_merge_groups.nr; i++) {
     +		struct pseudo_merge_group *group;
     +		struct pseudo_merge_matches *matches;
     +		struct strbuf group_name = STRBUF_INIT;
     +		regmatch_t captures[16];
     +		size_t j;
     +
    -+		group = list->items[i].util;
    ++		group = writer->pseudo_merge_groups.items[i].util;
     +		if (regexec(group->pattern, refname, ARRAY_SIZE(captures),
     +			    captures, 0))
     +			continue;
    @@ pseudo-merge.c
     +
     +{
     +	struct pseudo_merge_commit_idx *pmc;
    -+	khiter_t hash_pos;
    -+
    -+	hash_pos = kh_get_oid_map(pseudo_merge_commits, *oid);
    -+	if (hash_pos == kh_end(pseudo_merge_commits)) {
    -+		int hash_ret;
    -+		hash_pos = kh_put_oid_map(pseudo_merge_commits, *oid, &hash_ret);
    ++	int hash_ret;
    ++	khiter_t hash_pos = kh_put_oid_map(pseudo_merge_commits, *oid,
    ++					   &hash_ret);
     +
    ++	if (hash_ret) {
     +		CALLOC_ARRAY(pmc, 1);
    -+
     +		kh_value(pseudo_merge_commits, hash_pos) = pmc;
     +	} else {
     +		pmc = kh_value(pseudo_merge_commits, hash_pos);
    @@ pseudo-merge.c
     +
     +#define MIN_PSEUDO_MERGE_SIZE 8
     +
    -+static void select_pseudo_merges_1(struct pseudo_merge_group *group,
    -+				   struct pseudo_merge_matches *matches,
    -+				   kh_oid_map_t *pseudo_merge_commits,
    -+				   uint32_t *pseudo_merges_nr)
    ++static void select_pseudo_merges_1(struct bitmap_writer *writer,
    ++				   struct pseudo_merge_group *group,
    ++				   struct pseudo_merge_matches *matches)
     +{
     +	uint32_t i, j;
     +	uint32_t stable_merges_nr;
    @@ pseudo-merge.c
     +		merge = push_pseudo_merge(group);
     +		p = &merge->parents;
     +
    ++		/*
    ++		 * For each pseudo-merge created above, add parents to the
    ++		 * allocated commit node from the stable set of commits
    ++		 * (older than the stable threshold).
    ++		 */
     +		do {
     +			struct commit *c;
     +			struct pseudo_merge_commit_idx *pmc;
    @@ pseudo-merge.c
     +				break;
     +
     +			c = matches->stable[j++];
    -+			pmc = pseudo_merge_idx(pseudo_merge_commits,
    ++			/*
    ++			 * Here and below, make sure that we keep our mapping of
    ++			 * commits -> pseudo-merge(s) which include the key'd
    ++			 * commit up-to-date.
    ++			 */
    ++			pmc = pseudo_merge_idx(writer->pseudo_merge_commits,
     +					       &c->object.oid);
     +
     +			ALLOC_GROW(pmc->pseudo_merge, pmc->nr + 1, pmc->alloc);
     +
    -+			pmc->pseudo_merge[pmc->nr++] = *pseudo_merges_nr;
    ++			pmc->pseudo_merge[pmc->nr++] = writer->pseudo_merges_nr;
     +			p = commit_list_append(c, p);
     +		} while (j % group->stable_size);
     +
    -+		bitmap_writer_push_bitmapped_commit(merge, 1);
    -+		(*pseudo_merges_nr)++;
    ++		bitmap_writer_push_commit(writer, merge, 1);
    ++		writer->pseudo_merges_nr++;
     +	}
     +
     +	/* make up to group->max_merges pseudo merges for unstable commits */
    @@ pseudo-merge.c
     +		size = pseudo_merge_group_size(group, matches, i);
     +		end = size < MIN_PSEUDO_MERGE_SIZE ? matches->unstable_nr : j + size;
     +
    ++		/*
    ++		 * For each pseudo-merge commit created above, add parents to
    ++		 * the allocated commit node from the unstable set of commits
    ++		 * (newer than the stable threshold).
    ++		 *
    ++		 * Account for the sample rate, since not every candidate from
    ++		 * the set of unstable commits will be included as a pseudo-merge
    ++		 * parent.
    ++		 */
     +		for (; j < end && j < matches->unstable_nr; j++) {
     +			struct commit *c = matches->unstable[j];
     +			struct pseudo_merge_commit_idx *pmc;
     +
    -+			if (j % (100 / group->sample_rate))
    ++			if (j % (uint32_t)(1.0f / group->sample_rate))
     +				continue;
     +
    -+			pmc = pseudo_merge_idx(pseudo_merge_commits,
    ++			pmc = pseudo_merge_idx(writer->pseudo_merge_commits,
     +					       &c->object.oid);
     +
     +			ALLOC_GROW(pmc->pseudo_merge, pmc->nr + 1, pmc->alloc);
     +
    -+			pmc->pseudo_merge[pmc->nr++] = *pseudo_merges_nr;
    ++			pmc->pseudo_merge[pmc->nr++] = writer->pseudo_merges_nr;
     +			p = commit_list_append(c, p);
     +		}
     +
    -+		bitmap_writer_push_bitmapped_commit(merge, 1);
    -+		(*pseudo_merges_nr)++;
    ++		bitmap_writer_push_commit(writer, merge, 1);
    ++		writer->pseudo_merges_nr++;
     +		if (end >= matches->unstable_nr)
     +			break;
     +	}
    @@ pseudo-merge.c
     +	QSORT(matches->unstable, matches->unstable_nr, commit_date_cmp);
     +}
     +
    -+void select_pseudo_merges(struct string_list *list,
    -+			  struct commit **commits, size_t commits_nr,
    -+			  kh_oid_map_t *pseudo_merge_commits,
    -+			  uint32_t *pseudo_merges_nr,
    -+			  unsigned show_progress)
    ++void select_pseudo_merges(struct bitmap_writer *writer,
    ++			  struct commit **commits, size_t commits_nr)
     +{
     +	struct progress *progress = NULL;
     +	uint32_t i;
     +
    -+	if (!list->nr)
    ++	if (!writer->pseudo_merge_groups.nr)
     +		return;
     +
    -+	if (show_progress)
    -+		progress = start_progress("Selecting pseudo-merge commits", list->nr);
    ++	if (writer->show_progress)
    ++		progress = start_progress("Selecting pseudo-merge commits",
    ++					  writer->pseudo_merge_groups.nr);
     +
    -+	for_each_ref(find_pseudo_merge_group_for_ref, list);
    ++	for_each_ref(find_pseudo_merge_group_for_ref, writer);
     +
    -+	for (i = 0; i < list->nr; i++) {
    ++	for (i = 0; i < writer->pseudo_merge_groups.nr; i++) {
     +		struct pseudo_merge_group *group;
     +		struct hashmap_iter iter;
     +		struct strmap_entry *e;
     +
    -+		group = list->items[i].util;
    ++		group = writer->pseudo_merge_groups.items[i].util;
     +		strmap_for_each_entry(&group->matches, &iter, e) {
     +			struct pseudo_merge_matches *matches = e->value;
     +
     +			sort_pseudo_merge_matches(matches);
     +
    -+			select_pseudo_merges_1(group, matches,
    -+					       pseudo_merge_commits,
    -+					       pseudo_merges_nr);
    ++			select_pseudo_merges_1(writer, group, matches);
     +		}
     +
     +		display_progress(progress, i + 1);
    @@ pseudo-merge.h
     +struct commit;
     +struct string_list;
     +struct bitmap_index;
    ++struct bitmap_writer;
     +
     +/*
     + * A pseudo-merge group tracks the set of non-bitmapped reference tips
    @@ pseudo-merge.h
     +	 */
     +	float decay;
     +	int max_merges;
    -+	int sample_rate;
    ++	float sample_rate;
     +	int stable_size;
     +	timestamp_t threshold;
     +	timestamp_t stable_threshold;
    @@ pseudo-merge.h
     + *
     + * Optionally shows a progress meter.
     + */
    -+void select_pseudo_merges(struct string_list *list,
    -+			  struct commit **commits, size_t commits_nr,
    -+			  kh_oid_map_t *pseudo_merge_commits,
    -+			  uint32_t *pseudo_merges_nr,
    -+			  unsigned show_progress);
    ++void select_pseudo_merges(struct bitmap_writer *writer,
    ++			  struct commit **commits, size_t commits_nr);
      
      #endif
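
As an aside on the `pseudo_merge_group_size()` hunk above, here is a
minimal standalone sketch of how the power-law sizing works out
numerically. It is not the code from the series: it assumes the
normalizing sum runs over `max_merges` groups (the loop header is not
visible in this range-diff), and it uses a hypothetical `sketch_pow()`
helper, limited to whole-number exponents, in place of the series'
`gitexp()`. None of the `sketch_*` names exist in git.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the series' gitexp(): raises `base` to a
 * whole-number power by repeated squaring, so no libm is needed.
 */
static double sketch_pow(double base, unsigned int exp)
{
	double result = 1.0;

	while (exp) {
		if (exp & 1)
			result *= base;
		base *= base;
		exp >>= 1;
	}
	return result;
}

/*
 * Size of the i-th (0-indexed) unstable group, following
 * f(n) = C * n^-k with n = i + 1 and k = decay. C is chosen so that
 * the sizes of all `max_merges` groups sum to roughly `unstable_nr`.
 */
static uint32_t sketch_group_size(uint32_t unstable_nr, uint32_t max_merges,
				  unsigned int decay, uint32_t i)
{
	double C = 0.0;
	uint32_t n;

	if (!max_merges)
		return 0; /* assumption: no groups means nothing to size */

	for (n = 0; n < max_merges; n++)
		C += 1.0 / sketch_pow((double)n + 1, decay);
	C = unstable_nr / C;

	/* round to the nearest whole commit count */
	return (uint32_t)(C / sketch_pow((double)i + 1, decay) + 0.5);
}
```

With 100 unstable commits, four groups and a decay of 1, this gives
groups of 48, 24, 16 and 12 commits; with a decay of 0 all four groups
get 25 commits each.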
10:  12b432e3a8a <  -:  ----------- pack-bitmap-write.c: select pseudo-merge commits
11:  6ce805d061e ! 18:  311226f65c2 pack-bitmap-write.c: write pseudo-merge table
    @@ pack-bitmap-write.c
      
      struct bitmapped_commit {
      	struct commit *commit;
    -@@ pack-bitmap-write.c: static void write_selected_commits_v1(struct hashfile *f,
    +@@ pack-bitmap-write.c: static void write_selected_commits_v1(struct bitmap_writer *writer,
      	}
      }
      
    -+static void write_pseudo_merges(struct hashfile *f)
    ++static void write_pseudo_merges(struct bitmap_writer *writer,
    ++				struct hashfile *f)
     +{
     +	struct oid_array commits = OID_ARRAY_INIT;
     +	struct bitmap **commits_bitmap = NULL;
     +	off_t *pseudo_merge_ofs = NULL;
     +	off_t start, table_start, next_ext;
     +
    -+	uint32_t base = bitmap_writer_selected_nr();
    ++	uint32_t base = bitmap_writer_nr_selected_commits(writer);
     +	size_t i, j = 0;
     +
    -+	CALLOC_ARRAY(commits_bitmap, writer.pseudo_merges_nr);
    -+	CALLOC_ARRAY(pseudo_merge_ofs, writer.pseudo_merges_nr);
    ++	CALLOC_ARRAY(commits_bitmap, writer->pseudo_merges_nr);
    ++	CALLOC_ARRAY(pseudo_merge_ofs, writer->pseudo_merges_nr);
     +
    -+	for (i = 0; i < writer.pseudo_merges_nr; i++) {
    -+		struct bitmapped_commit *merge = &writer.selected[base + i];
    ++	for (i = 0; i < writer->pseudo_merges_nr; i++) {
    ++		struct bitmapped_commit *merge = &writer->selected[base + i];
     +		struct commit_list *p;
     +
     +		if (!merge->pseudo_merge)
    @@ pack-bitmap-write.c: static void write_selected_commits_v1(struct hashfile *f,
     +
     +		for (p = merge->commit->parents; p; p = p->next)
     +			bitmap_set(commits_bitmap[i],
    -+				   find_object_pos(&p->item->object.oid, NULL));
    ++				   find_object_pos(writer, &p->item->object.oid,
    ++						   NULL));
     +	}
     +
     +	start = hashfile_total(f);
     +
    -+	for (i = 0; i < writer.pseudo_merges_nr; i++) {
    ++	for (i = 0; i < writer->pseudo_merges_nr; i++) {
     +		struct ewah_bitmap *commits_ewah = bitmap_to_ewah(commits_bitmap[i]);
     +
     +		pseudo_merge_ofs[i] = hashfile_total(f);
     +
     +		dump_bitmap(f, commits_ewah);
    -+		dump_bitmap(f, writer.selected[base+i].write_as);
    ++		dump_bitmap(f, writer->selected[base+i].write_as);
     +
     +		ewah_free(commits_ewah);
     +	}
     +
     +	next_ext = st_add(hashfile_total(f),
    -+			  st_mult(kh_size(writer.pseudo_merge_commits),
    ++			  st_mult(kh_size(writer->pseudo_merge_commits),
     +				  sizeof(uint64_t)));
     +
     +	table_start = hashfile_total(f);
     +
    -+	commits.alloc = kh_size(writer.pseudo_merge_commits);
    ++	commits.alloc = kh_size(writer->pseudo_merge_commits);
     +	CALLOC_ARRAY(commits.oid, commits.alloc);
     +
    -+	for (i = kh_begin(writer.pseudo_merge_commits); i != kh_end(writer.pseudo_merge_commits); i++) {
    -+		if (!kh_exist(writer.pseudo_merge_commits, i))
    ++	for (i = kh_begin(writer->pseudo_merge_commits); i != kh_end(writer->pseudo_merge_commits); i++) {
    ++		if (!kh_exist(writer->pseudo_merge_commits, i))
     +			continue;
    -+		oid_array_append(&commits, &kh_key(writer.pseudo_merge_commits, i));
    ++		oid_array_append(&commits, &kh_key(writer->pseudo_merge_commits, i));
     +	}
     +
     +	oid_array_sort(&commits);
    @@ pack-bitmap-write.c: static void write_selected_commits_v1(struct hashfile *f,
     +		int hash_pos;
     +		struct pseudo_merge_commit_idx *c;
     +
    -+		hash_pos = kh_get_oid_map(writer.pseudo_merge_commits,
    ++		hash_pos = kh_get_oid_map(writer->pseudo_merge_commits,
     +					  commits.oid[i]);
    -+		if (hash_pos == kh_end(writer.pseudo_merge_commits))
    ++		if (hash_pos == kh_end(writer->pseudo_merge_commits))
     +			BUG("could not find pseudo-merge commit %s",
     +			    oid_to_hex(&commits.oid[i]));
     +
    -+		c = kh_value(writer.pseudo_merge_commits, hash_pos);
    ++		c = kh_value(writer->pseudo_merge_commits, hash_pos);
     +
    -+		hashwrite_be32(f, find_object_pos(&commits.oid[i], NULL));
    ++		hashwrite_be32(f, find_object_pos(writer, &commits.oid[i],
    ++						  NULL));
     +		if (c->nr == 1)
     +			hashwrite_be64(f, pseudo_merge_ofs[c->pseudo_merge[0]]);
     +		else if (c->nr > 1) {
    @@ pack-bitmap-write.c: static void write_selected_commits_v1(struct hashfile *f,
     +		int hash_pos;
     +		struct pseudo_merge_commit_idx *c;
     +
    -+		hash_pos = kh_get_oid_map(writer.pseudo_merge_commits,
    ++		hash_pos = kh_get_oid_map(writer->pseudo_merge_commits,
     +					  commits.oid[i]);
    -+		if (hash_pos == kh_end(writer.pseudo_merge_commits))
    ++		if (hash_pos == kh_end(writer->pseudo_merge_commits))
     +			BUG("could not find pseudo-merge commit %s",
     +			    oid_to_hex(&commits.oid[i]));
     +
    -+		c = kh_value(writer.pseudo_merge_commits, hash_pos);
    ++		c = kh_value(writer->pseudo_merge_commits, hash_pos);
     +		if (c->nr == 1)
     +			continue;
     +
    @@ pack-bitmap-write.c: static void write_selected_commits_v1(struct hashfile *f,
     +	}
     +
     +	/* write positions for all pseudo merges */
    -+	for (i = 0; i < writer.pseudo_merges_nr; i++)
    ++	for (i = 0; i < writer->pseudo_merges_nr; i++)
     +		hashwrite_be64(f, pseudo_merge_ofs[i]);
     +
    -+	hashwrite_be32(f, writer.pseudo_merges_nr);
    -+	hashwrite_be32(f, kh_size(writer.pseudo_merge_commits));
    ++	hashwrite_be32(f, writer->pseudo_merges_nr);
    ++	hashwrite_be32(f, kh_size(writer->pseudo_merge_commits));
     +	hashwrite_be64(f, table_start - start);
     +	hashwrite_be64(f, hashfile_total(f) - start + sizeof(uint64_t));
     +
    -+	for (i = 0; i < writer.pseudo_merges_nr; i++)
    ++	for (i = 0; i < writer->pseudo_merges_nr; i++)
     +		bitmap_free(commits_bitmap[i]);
     +
     +	free(pseudo_merge_ofs);
    @@ pack-bitmap-write.c: static void write_selected_commits_v1(struct hashfile *f,
     +
      static int table_cmp(const void *_va, const void *_vb, void *_data)
      {
    - 	uint32_t *commit_positions = _data;
    -@@ pack-bitmap-write.c: void bitmap_writer_finish(struct pack_idx_entry **index,
    + 	struct bitmap_writer *writer = _data;
    +@@ pack-bitmap-write.c: void bitmap_writer_finish(struct bitmap_writer *writer,
      
      	int fd = odb_mkstemp(&tmp_file, "pack/tmp_bitmap_XXXXXX");
      
    -+	if (writer.pseudo_merges_nr)
    ++	if (writer->pseudo_merges_nr)
     +		options |= BITMAP_OPT_PSEUDO_MERGES;
     +
      	f = hashfd(fd, tmp_file.buf);
      
      	memcpy(header.magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE));
    -@@ pack-bitmap-write.c: void bitmap_writer_finish(struct pack_idx_entry **index,
    +@@ pack-bitmap-write.c: void bitmap_writer_finish(struct bitmap_writer *writer,
      
    - 	write_selected_commits_v1(f, commit_positions, offsets);
    + 	write_selected_commits_v1(writer, f, offsets);
      
     +	if (options & BITMAP_OPT_PSEUDO_MERGES)
    -+		write_pseudo_merges(f);
    ++		write_pseudo_merges(writer, f);
     +
      	if (options & BITMAP_OPT_LOOKUP_TABLE)
    - 		write_lookup_table(f, commit_positions, offsets);
    + 		write_lookup_table(writer, f, offsets);
      
     
      ## pack-bitmap.h ##
12:  60f6b310213 = 19:  55dd7a8023e pack-bitmap: extract `read_bitmap()` function
13:  9465313691b ! 20:  3cc5434e44e pseudo-merge: scaffolding for reads
    @@ Commit message
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## pseudo-merge.c ##
    -@@ pseudo-merge.c: void select_pseudo_merges(struct string_list *list,
    +@@ pseudo-merge.c: void select_pseudo_merges(struct bitmap_writer *writer,
      
      	stop_progress(&progress);
      }
    @@ pseudo-merge.c: void select_pseudo_merges(struct string_list *list,
     +}
     
      ## pseudo-merge.h ##
    -@@ pseudo-merge.h: void select_pseudo_merges(struct string_list *list,
    - 			  uint32_t *pseudo_merges_nr,
    - 			  unsigned show_progress);
    +@@ pseudo-merge.h: struct pseudo_merge_commit_idx {
    + void select_pseudo_merges(struct bitmap_writer *writer,
    + 			  struct commit **commits, size_t commits_nr);
      
     +/*
     + * Represents a serialized view of a file containing pseudo-merge(s)
14:  5894f3d5369 = 21:  7664f5f9648 pack-bitmap.c: read pseudo-merge extension
15:  7dbee8bcbdf = 22:  8ba0a9c5402 pseudo-merge: implement support for reading pseudo-merge commits
16:  09650aa53e9 = 23:  2c02f303b6f ewah: implement `ewah_bitmap_popcount()`
17:  7b5ea56d053 = 24:  82cce72bf55 pack-bitmap: implement test helpers for pseudo-merge
18:  006abdd1698 = 25:  890f6c4b9de t/test-lib-functions.sh: support `--date` in `test_commit_bulk()`
19:  3f85e5b90f5 ! 26:  41691824f78 pack-bitmap.c: use pseudo-merges during traversal
    @@ t/t5333-pseudo-merge-bitmaps.sh (new)
     +
     +test_expect_success 'bitmapPseudoMerge.sampleRate adjusts commit selection rate' '
     +	test_config bitmapPseudoMerge.test.pattern "refs/tags/" &&
    -+	test_config bitmapPseudoMerge.test.maxMerges 8 &&
    ++	test_config bitmapPseudoMerge.test.maxMerges 1 &&
     +	test_config bitmapPseudoMerge.test.stableThreshold never &&
     +
     +	commits_nr=$(git rev-list --all --count) &&
     +
    -+	for rate in 100 50 10
    ++	for rate in 1.0 0.5 0.25
     +	do
     +		git -c bitmapPseudoMerge.test.sampleRate=$rate repack -adb &&
     +
     +		test_pseudo_merges >merges &&
    -+		for i in $(test_seq 0 $(($(wc -l <merges)-1)))
    -+		do
    -+			test_pseudo_merge_commits $i || return 1
    -+		done >commits &&
    ++		test_line_count = 1 merges &&
    ++		test_pseudo_merge_commits 0 >commits &&
     +
     +		test-tool bitmap list-commits >bitmaps &&
     +		bitmaps_nr="$(wc -l <bitmaps)" &&
     +
    -+		perl -MPOSIX -e "print ceil((\$ARGV[0]/100)*(\$ARGV[1]-\$ARGV[2]))" \
    ++		perl -MPOSIX -e "print ceil(\$ARGV[0]*(\$ARGV[1]-\$ARGV[2]))" \
     +			"$rate" "$commits_nr" "$bitmaps_nr" >expect &&
     +
     +		test $(cat expect) -eq $(wc -l <commits) || return 1
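
To see why the test above can expect `ceil(rate * N)` commits, here is
a small sketch of the `j % (uint32_t)(1.0f / sample_rate)` selection
from the pseudo-merge.c hunk. It assumes a single pseudo-merge whose
candidates are indexed from 0 (as with `maxMerges = 1` in the test).
The function name is hypothetical, and the early return for
out-of-range rates is an addition of this sketch, not something taken
from the series.

```c
#include <stdint.h>

/*
 * Count how many of `nr` candidates (indices 0..nr-1) survive
 * sampling when every (1 / sample_rate)-th candidate is kept.
 */
static uint32_t sketch_sampled_count(uint32_t nr, float sample_rate)
{
	uint32_t step, j, kept = 0;

	/* sketch-only guard: avoid dividing by zero below */
	if (sample_rate <= 0.0f || sample_rate > 1.0f)
		return 0;

	step = (uint32_t)(1.0f / sample_rate);
	for (j = 0; j < nr; j++) {
		if (j % step)
			continue; /* skipped by the sample rate */
		kept++;
	}
	return kept;
}
```

For 10 candidates, rates of 1.0, 0.5 and 0.25 keep 10, 5 and 3 commits
respectively, matching `ceil(rate * 10)` in each case.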
20:  5fac186df64 = 27:  a34a60c3ef8 pack-bitmap: extra trace2 information
21:  b5aea8e57f8 = 28:  da2fb5b4b48 ewah: `bitmap_equals_ewah()`
22:  61ddb574285 ! 29:  ff21247281f pseudo-merge: implement support for finding existing merges
    @@ pack-bitmap-write.c
      
      struct bitmapped_commit {
      	struct commit *commit;
    -@@ pack-bitmap-write.c: static int fill_bitmap_tree(struct bitmap *bitmap,
    +@@ pack-bitmap-write.c: static int fill_bitmap_tree(struct bitmap_writer *writer,
      }
      
      static int reused_bitmaps_nr;
     +static int reused_pseudo_merge_bitmaps_nr;
      
    - static int fill_bitmap_commit(struct bb_commit *ent,
    - 			      struct commit *commit,
    -@@ pack-bitmap-write.c: static int fill_bitmap_commit(struct bb_commit *ent,
    + static int fill_bitmap_commit(struct bitmap_writer *writer,
    + 			      struct bb_commit *ent,
    +@@ pack-bitmap-write.c: static int fill_bitmap_commit(struct bitmap_writer *writer,
      			struct bitmap *remapped = bitmap_new();
      
      			if (commit->object.flags & BITMAP_PSEUDO_MERGE)
    @@ pack-bitmap-write.c: static int fill_bitmap_commit(struct bb_commit *ent,
      			else
      				old = bitmap_for_commit(old_bitmap, c);
      			/*
    -@@ pack-bitmap-write.c: static int fill_bitmap_commit(struct bb_commit *ent,
    +@@ pack-bitmap-write.c: static int fill_bitmap_commit(struct bitmap_writer *writer,
      			if (old && !rebuild_bitmap(mapping, old, remapped)) {
      				bitmap_or(ent->bitmap, remapped);
      				bitmap_free(remapped);
    @@ pack-bitmap-write.c: static int fill_bitmap_commit(struct bb_commit *ent,
      				continue;
      			}
      			bitmap_free(remapped);
    -@@ pack-bitmap-write.c: int bitmap_writer_build(struct packing_data *to_pack)
    +@@ pack-bitmap-write.c: int bitmap_writer_build(struct bitmap_writer *writer,
      			    the_repository);
      	trace2_data_intmax("pack-bitmap-write", the_repository,
      			   "building_bitmaps_reused", reused_bitmaps_nr);
    @@ pack-bitmap-write.c: int bitmap_writer_build(struct packing_data *to_pack)
     +			   "building_bitmaps_pseudo_merge_reused",
     +			   reused_pseudo_merge_bitmaps_nr);
      
    - 	stop_progress(&writer.progress);
    + 	stop_progress(&writer->progress);
      
     
      ## pack-bitmap.c ##
    @@ pack-bitmap.h: int rebuild_bitmap(const uint32_t *reposition,
      				      struct commit *commit);
     +struct ewah_bitmap *pseudo_merge_bitmap_for_commit(struct bitmap_index *bitmap_git,
     +						   struct commit *commit);
    - void bitmap_writer_select_commits(struct commit **indexed_commits,
    + void bitmap_writer_select_commits(struct bitmap_writer *writer,
    + 				  struct commit **indexed_commits,
      				  unsigned int indexed_commits_nr);
    - int bitmap_writer_build(struct packing_data *to_pack);
     
      ## pseudo-merge.c ##
     @@ pseudo-merge.c: int cascade_pseudo_merges(const struct pseudo_merge_map *pm,
23:  2bd830d35dd = 30:  6a6d88fa512 t/perf: implement performace tests for pseudo-merge bitmaps
-- 
2.45.1.175.gbea44add9db

* [PATCH v3 01/30] object.h: add flags allocated by pack-bitmap.h
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
@ 2024-05-21 19:01   ` Taylor Blau
  2024-05-21 19:06     ` Taylor Blau
  2024-05-21 19:01   ` [PATCH v3 07/30] Documentation/gitpacking.txt: initial commit Taylor Blau
                     ` (24 subsequent siblings)
  25 siblings, 1 reply; 157+ messages in thread
From: Taylor Blau @ 2024-05-21 19:01 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

In commit 7cc8f971085 (pack-objects: implement bitmap writing,
2013-12-21) the NEEDS_BITMAP flag was introduced into pack-bitmap.h, but
no object flags allocation table existed at the time.

In 208acbfb82f (object.h: centralize object flag allocation, 2014-03-25)
when that table was first introduced, we never added the flags from
7cc8f971085, which has remained the case since.

Rectify this by including the flag bit used by pack-bitmap.h into the
centralized table in object.h.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 object.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/object.h b/object.h
index 9293e703ccc..99b9c8f114c 100644
--- a/object.h
+++ b/object.h
@@ -81,6 +81,7 @@ void object_array_init(struct object_array *array);
  * reflog.c:                           10--12
  * builtin/show-branch.c:    0-------------------------------------------26
  * builtin/unpack-objects.c:                                 2021
+ * pack-bitmap.h:                                                22
  */
 #define FLAG_BITS  28
 
-- 
2.45.1.175.gbea44add9db


* [PATCH v3 07/30] Documentation/gitpacking.txt: initial commit
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
  2024-05-21 19:01   ` [PATCH v3 01/30] object.h: add flags allocated by pack-bitmap.h Taylor Blau
@ 2024-05-21 19:01   ` Taylor Blau
  2024-05-21 19:02   ` [PATCH v3 08/30] Documentation/gitpacking.txt: describe pseudo-merge bitmaps Taylor Blau
                     ` (23 subsequent siblings)
  25 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-21 19:01 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Introduce a new manual page, gitpacking(7) to collect useful information
about advanced packing concepts in Git.

In future commits in this series, this manual page will expand to
describe the new pseudo-merge bitmaps feature, as well as include
examples, relevant configuration bits, use-cases, and so on.

Outside of this series, this manual page may absorb similar pieces from
other parts of Git's documentation about packing.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/Makefile       |  1 +
 Documentation/gitpacking.txt | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)
 create mode 100644 Documentation/gitpacking.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 3f2383a12c7..920b6248aa4 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -51,6 +51,7 @@ MAN7_TXT += gitdiffcore.txt
 MAN7_TXT += giteveryday.txt
 MAN7_TXT += gitfaq.txt
 MAN7_TXT += gitglossary.txt
+MAN7_TXT += gitpacking.txt
 MAN7_TXT += gitnamespaces.txt
 MAN7_TXT += gitremote-helpers.txt
 MAN7_TXT += gitrevisions.txt
diff --git a/Documentation/gitpacking.txt b/Documentation/gitpacking.txt
new file mode 100644
index 00000000000..50e9900d845
--- /dev/null
+++ b/Documentation/gitpacking.txt
@@ -0,0 +1,34 @@
+gitpacking(7)
+=============
+
+NAME
+----
+gitpacking - Advanced concepts related to packing in Git
+
+SYNOPSIS
+--------
+gitpacking
+
+DESCRIPTION
+-----------
+
+This document aims to describe some advanced concepts related to packing
+in Git.
+
+Many concepts are currently described scattered between manual pages of
+various Git commands, including linkgit:git-pack-objects[1],
+linkgit:git-repack[1], and others, as well as linkgit:gitformat-pack[5],
+and parts of the `Documentation/technical` tree.
+
+There are many aspects of packing in Git that are not covered in this
+document that instead live in the aforementioned areas. Over time, those
+scattered bits may coalesce into this document.
+
+SEE ALSO
+--------
+linkgit:git-pack-objects[1]
+linkgit:git-repack[1]
+
+GIT
+---
+Part of the linkgit:git[1] suite
-- 
2.45.1.175.gbea44add9db


* [PATCH v3 08/30] Documentation/gitpacking.txt: describe pseudo-merge bitmaps
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
  2024-05-21 19:01   ` [PATCH v3 01/30] object.h: add flags allocated by pack-bitmap.h Taylor Blau
  2024-05-21 19:01   ` [PATCH v3 07/30] Documentation/gitpacking.txt: initial commit Taylor Blau
@ 2024-05-21 19:02   ` Taylor Blau
  2024-05-21 19:02   ` [PATCH v3 09/30] Documentation/technical: describe pseudo-merge bitmaps format Taylor Blau
                     ` (22 subsequent siblings)
  25 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-21 19:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Add some details to the gitpacking(7) manual page which motivate and
describe pseudo-merge bitmaps.

The exact on-disk format and many of the configuration knobs will be
described in subsequent commits.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/gitpacking.txt | 69 ++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)

diff --git a/Documentation/gitpacking.txt b/Documentation/gitpacking.txt
index 50e9900d845..ff18077129b 100644
--- a/Documentation/gitpacking.txt
+++ b/Documentation/gitpacking.txt
@@ -24,6 +24,75 @@ There are many aspects of packing in Git that are not covered in this
 document that instead live in the aforementioned areas. Over time, those
 scattered bits may coalesce into this document.
 
+== Pseudo-merge bitmaps
+
+=== Background
+
+Reachability bitmaps are most efficient when we have on-disk stored
+bitmaps for one or more of the starting points of a traversal. For this
+reason, Git prefers storing bitmaps for commits at the tips of refs,
+because traversals tend to start with those points.
+
+But if you have a large number of refs, it's not feasible to store a
+bitmap for _every_ ref tip. It takes up space, and just OR-ing all of
+those bitmaps together is expensive.
+
+One way we can deal with that is to create bitmaps that represent
+_groups_ of refs. When a traversal asks about the entire group, then we
+can use this single bitmap instead of considering each ref individually.
+Because these bitmaps represent the set of objects which would be
+reachable in a hypothetical merge of all of the commits, we call them
+pseudo-merge bitmaps.
+
+=== Overview
+
+A "pseudo-merge bitmap" is used to refer to a pair of bitmaps, as
+follows:
+
+Commit bitmap::
+
+  A bitmap whose set bits describe the set of commits included in the
+  pseudo-merge's "merge" bitmap (as below).
+
+Merge bitmap::
+
+  A bitmap whose set bits describe the reachability closure over the set
+  of commits in the pseudo-merge's "commits" bitmap (as above). An
+  identical bitmap would be generated for an octopus merge with the same
+  set of parents as described in the commits bitmap.
+
+Pseudo-merge bitmaps can accelerate bitmap traversals when all commits
+for a given pseudo-merge are listed on either side of the traversal,
+either directly (by explicitly asking for them as part of the `HAVES`
+or `WANTS`) or indirectly (by encountering them during a fill-in
+traversal).
+
+=== Use-cases
+
+For example, suppose there exists a pseudo-merge bitmap with a large
+number of commits, all of which are listed in the `WANTS` section of
+some bitmap traversal query. When pseudo-merge bitmaps are enabled, the
+bitmap machinery can quickly determine there is a pseudo-merge which
+satisfies some subset of the wanted objects on either side of the query.
+Then, we can inflate the EWAH-compressed bitmap, and `OR` it in to the
+resulting bitmap. By contrast, without pseudo-merge bitmaps, we would
+have to repeat the decompression and `OR`-ing step over a potentially
+large number of individual bitmaps, which can take proportionally more
+time.
+
+Another benefit of pseudo-merges arises when there is some combination
+of (a) a large number of references, with (b) poor bitmap coverage, and
+(c) deep, nested trees, making fill-in traversal relatively expensive.
+For example, suppose that there are a large enough number of tags where
+bitmapping each of the tags individually is infeasible. Without
+pseudo-merge bitmaps, computing the result of, say, `git rev-list
+--use-bitmap-index --count --objects --tags` would likely require a
+large amount of fill-in traversal. But when a large quantity of those
+tags are stored together in a pseudo-merge bitmap, the bitmap machinery
+can take advantage of the fact that we only care about the union of
+objects reachable from all of those tags, and answer the query much
+faster.
+
 SEE ALSO
 --------
 linkgit:git-pack-objects[1]
-- 
2.45.1.175.gbea44add9db


* [PATCH v3 09/30] Documentation/technical: describe pseudo-merge bitmaps format
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
                     ` (2 preceding siblings ...)
  2024-05-21 19:02   ` [PATCH v3 08/30] Documentation/gitpacking.txt: describe pseudo-merge bitmaps Taylor Blau
@ 2024-05-21 19:02   ` Taylor Blau
  2024-05-21 19:02   ` [PATCH v3 10/30] ewah: implement `ewah_bitmap_is_subset()` Taylor Blau
                     ` (21 subsequent siblings)
  25 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-21 19:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Prepare to implement pseudo-merge bitmaps over the next several commits
by first describing the serialization format which will store the new
pseudo-merge bitmaps themselves.

This format is implemented as an optional extension within the bitmap v1
format, making it compatible with previous versions of Git, as well as
the original .bitmap implementation within JGit.

The format is described in detail in the patch contents below, but the
high-level description is as follows:

  - An array of pseudo-merge bitmaps, each containing a pair of EWAH
    bitmaps: one describing the set of pseudo-merge "parents", and
    another describing the set of object(s) reachable from those
    parents.

  - A lookup table to determine which pseudo-merge(s) a given commit
    appears in. An optional extended lookup table follows when there is
    at least one commit which appears in multiple pseudo-merge groups.

  - Trailing metadata, including the number of pseudo-merge(s), number
    of unique parents, the offset within the .bitmap file for the
    pseudo-merge commit lookup table, and the size of the optional
    extension itself.
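
To make the fixed-width parts of this layout concrete, here is a minimal
sketch (not the code from this series) that decodes one 12-byte lookup
table entry and the 24-byte trailing metadata, following the field order
given in the format description in the patch below. The `pm_`-prefixed
names and the `read_be32()`/`read_be64()` helpers are hypothetical, and
clearing the most-significant bit before using the offset is an
assumption of this sketch rather than something the format text states:

```c
#include <stdint.h>

/* Hypothetical decoded form of one 12-byte lookup table entry. */
struct pm_lookup_entry {
	uint32_t commit_pos; /* bit position of the commit */
	uint64_t offset;     /* file offset, with the MSB cleared (assumed) */
	int extended;        /* non-zero if the MSB was set */
};

/* Hypothetical decoded form of the 24-byte trailing metadata. */
struct pm_trailer {
	uint32_t pseudo_merges_nr; /* number of pseudo-merges */
	uint32_t commits_nr;       /* unique commits in any pseudo-merge */
	uint64_t lookup_offset;    /* section start to lookup table */
	uint64_t extension_size;   /* size of the whole section */
};

/* Read a 4-byte unsigned value stored in network byte order. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read an 8-byte unsigned value stored in network byte order. */
static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/* Decode the entry starting at `buf` (at least 12 readable bytes). */
static void pm_parse_lookup_entry(const unsigned char *buf,
				  struct pm_lookup_entry *out)
{
	const uint64_t msb = (uint64_t)1 << 63;
	uint64_t raw = read_be64(buf + 4);

	out->commit_pos = read_be32(buf);
	out->extended = (raw & msb) != 0;
	out->offset = raw & ~msb;
}

/* `end` points one byte past the end of the pseudo-merge section. */
static void pm_parse_trailer(const unsigned char *end,
			     struct pm_trailer *out)
{
	const unsigned char *p = end - 24;

	out->pseudo_merges_nr = read_be32(p);
	out->commits_nr = read_be32(p + 4);
	out->lookup_offset = read_be64(p + 8);
	out->extension_size = read_be64(p + 16);
}
```

When `extended` is non-zero, a reader would go on to consult the
extended lookup table at `offset` instead of reading a pseudo-merge
bitmap there directly.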

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/technical/bitmap-format.txt | 132 ++++++++++++++++++++++
 1 file changed, 132 insertions(+)

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index f5d200939b0..ee7775a2586 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -255,3 +255,135 @@ triplet is -
 	xor_row (4 byte integer, network byte order): ::
 	The position of the triplet whose bitmap is used to compress
 	this one, or `0xffffffff` if no such bitmap exists.
+
+Pseudo-merge bitmaps
+--------------------
+
+If the `BITMAP_OPT_PSEUDO_MERGES` flag is set, a variable number of
+bytes (preceding the name-hash cache, commit lookup table, and trailing
+checksum) of the `.bitmap` file is used to store pseudo-merge bitmaps.
+
+For more information on what pseudo-merges are, why they are useful, and
+how to configure them, see the information in linkgit:gitpacking[7].
+
+=== File format
+
+If enabled, pseudo-merge bitmaps are stored in an optional section at
+the end of a `.bitmap` file. The format is as follows:
+
+....
++-------------------------------------------+
+|               .bitmap File                |
++-------------------------------------------+
+|                                           |
+|  Pseudo-merge bitmaps (Variable Length)   |
+|  +---------------------------+            |
+|  | commits_bitmap (EWAH)     |            |
+|  +---------------------------+            |
+|  | merge_bitmap (EWAH)       |            |
+|  +---------------------------+            |
+|                                           |
++-------------------------------------------+
+|                                           |
+|  Lookup Table                             |
+|  +---------------------------+            |
+|  | commit_pos (4 bytes)      |            |
+|  +---------------------------+            |
+|  | offset (8 bytes)          |            |
+|  +------------+--------------+            |
+|                                           |
+|  Offset Cases:                            |
+|  -------------                            |
+|                                           |
+|  1. MSB Unset: single pseudo-merge bitmap |
+|     + offset to pseudo-merge bitmap       |
+|                                           |
+|  2. MSB Set: multiple pseudo-merges       |
+|     + offset to extended lookup table     |
+|                                           |
++-------------------------------------------+
+|                                           |
+|  Extended Lookup Table (Optional)         |
+|  +----+----------+----------+----------+  |
+|  | N  | Offset 1 |   ....   | Offset N |  |
+|  +----+----------+----------+----------+  |
+|  |    |  8 bytes |   ....   |  8 bytes |  |
+|  +----+----------+----------+----------+  |
+|                                           |
++-------------------------------------------+
+|                                           |
+|  Pseudo-merge Metadata                    |
+|  +-----------------------------------+    |
+|  | # pseudo-merges (4 bytes)         |    |
+|  +-----------------------------------+    |
+|  | # commits (4 bytes)               |    |
+|  +-----------------------------------+    |
+|  | Lookup offset (8 bytes)           |    |
+|  +-----------------------------------+    |
+|  | Extension size (8 bytes)          |    |
+|  +-----------------------------------+    |
+|                                           |
++-------------------------------------------+
+....
+
+* One or more pseudo-merge bitmaps, each containing:
+
+  ** `commits_bitmap`, an EWAH-compressed bitmap describing the set of
+     commits included in this pseudo-merge.
+
+  ** `merge_bitmap`, an EWAH-compressed bitmap describing the union of
+     the set of objects reachable from all commits listed in the
+     `commits_bitmap`.
+
+* A lookup table, mapping pseudo-merged commits to the pseudo-merges
+  they belong to. Entries appear in increasing order of each commit's
+  bit position. Each entry is 12 bytes wide, and is comprised of the
+  following:
+
+  ** `commit_pos`, a 4-byte unsigned value (in network byte-order)
+     containing the bit position for this commit.
+
+  ** `offset`, an 8-byte unsigned value (also in network byte-order)
+  containing either one of two possible offsets, depending on whether or
+  not the most-significant bit is set.
+
+    *** If unset (i.e. `(offset & ((uint64_t)1<<63)) == 0`), the offset
+	(relative to the beginning of the `.bitmap` file) at which the
+	pseudo-merge bitmap for this commit can be read. This indicates
+	only a single pseudo-merge bitmap contains this commit.
+
+    *** If set (i.e. `(offset & ((uint64_t)1<<63)) != 0`), the offset
+	(again relative to the beginning of the `.bitmap` file) at which
+	the extended offset table can be located describing the set of
+	pseudo-merge bitmaps which contain this commit. This indicates
+	that multiple pseudo-merge bitmaps contain this commit.
+
+* An (optional) extended lookup table (written if and only if there is
+  at least one commit which appears in more than one pseudo-merge).
+  There are as many entries as commits which appear in multiple
+  pseudo-merges. Each entry contains the following:
+
+  ** `N`, a 4-byte unsigned value equal to the number of pseudo-merges
+     which contain a given commit.
+
+  ** An array of `N` 8-byte unsigned values, each of which is
+     interpreted as an offset (relative to the beginning of the
+     `.bitmap` file) at which a pseudo-merge bitmap for this commit can
+     be read. These values occur in no particular order.
+
+* Positions for all pseudo-merges, each stored as an 8-byte unsigned
+  value (in network byte-order) containing the offset (relative to the
+  beginning of the `.bitmap` file) of each consecutive pseudo-merge.
+
+* A 4-byte unsigned value (in network byte-order) equal to the number of
+  pseudo-merges.
+
+* A 4-byte unsigned value (in network byte-order) equal to the number of
+  unique commits which appear in any pseudo-merge.
+
+* An 8-byte unsigned value (in network byte-order) equal to the number
+  of bytes between the start of the pseudo-merge section and the
+  beginning of the lookup table.
+
+* An 8-byte unsigned value (in network byte-order) equal to the number
+  of bytes in the pseudo-merge section (including this field).
-- 
2.45.1.175.gbea44add9db


* [PATCH v3 10/30] ewah: implement `ewah_bitmap_is_subset()`
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
                     ` (3 preceding siblings ...)
  2024-05-21 19:02   ` [PATCH v3 09/30] Documentation/technical: describe pseudo-merge bitmaps format Taylor Blau
@ 2024-05-21 19:02   ` Taylor Blau
  2024-05-21 19:02   ` [PATCH v3 11/30] pack-bitmap: move some initialization to `bitmap_writer_init()` Taylor Blau
                     ` (20 subsequent siblings)
  25 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-21 19:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

In order to know whether a given pseudo-merge (comprising a "parents"
bitmap and an "objects" bitmap) is "satisfied" and can be OR'd into the
bitmap result, we need to be able to quickly determine whether the
"parents" bitmap is a subset of the current set of objects reachable on
either side of a traversal.

Implement a helper function to prepare for that, which determines
whether an EWAH bitmap (the parents bitmap from the pseudo-merge) is a
subset of a non-EWAH bitmap (in this case, the results bitmap from
either side of the traversal).

This function makes use of the EWAH iterator to avoid inflating any part
of the EWAH bitmap after we determine it is not a subset of the non-EWAH
bitmap. This "fail-fast" allows us to avoid a potentially large amount
of wasted effort.
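
For illustration, the way such a subset test would gate the OR step can
be sketched on plain, uncompressed word arrays. This is only a sketch
under simplifying assumptions: `sketch_apply_pseudo_merge()` is a
hypothetical name, all three arrays are assumed to have the same length,
and no EWAH decoding is modelled.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 1 and ORs `objects` into `result` if every bit set in
 * `parents` is already set in `result`. Returns 0 and leaves `result`
 * untouched otherwise. All three arrays hold `nr` words.
 */
static int sketch_apply_pseudo_merge(uint64_t *result,
				     const uint64_t *parents,
				     const uint64_t *objects,
				     size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (parents[i] & ~result[i])
			return 0; /* fail fast: some parent is missing */

	for (i = 0; i < nr; i++)
		result[i] |= objects[i];
	return 1;
}
```

Running the subset check to completion before touching `result` means a
pseudo-merge that is not satisfied never leaks objects into the
traversal result.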

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ewah/bitmap.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 ewah/ewok.h   |  6 ++++++
 2 files changed, 49 insertions(+)

diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index ac7e0af622a..d352fec54ce 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -138,6 +138,49 @@ void bitmap_or(struct bitmap *self, const struct bitmap *other)
 		self->words[i] |= other->words[i];
 }
 
+int ewah_bitmap_is_subset(struct ewah_bitmap *self, struct bitmap *other)
+{
+	struct ewah_iterator it;
+	eword_t word;
+	size_t i;
+
+	ewah_iterator_init(&it, self);
+
+	for (i = 0; i < other->word_alloc; i++) {
+		if (!ewah_iterator_next(&word, &it)) {
+			/*
+			 * If we reached the end of `self`, and haven't
+			 * rejected `self` as a possible subset of
+			 * `other` yet, then we are done and `self` is
+			 * indeed a subset of `other`.
+			 */
+			return 1;
+		}
+		if (word & ~other->words[i]) {
+			/*
+			 * Otherwise, compare the next two pairs of
+			 * words. If the word from `self` has bit(s) not
+			 * in the word from `other`, `self` is not a
+			 * subset of `other`.
+			 */
+			return 0;
+		}
+	}
+
+	/*
+	 * If we got to this point, there may be zero or more words
+	 * remaining in `self`, with no remaining words left in `other`.
+	 * If there are any bits set in the remaining word(s) in `self`,
+	 * then `self` is not a subset of `other`.
+	 */
+	while (ewah_iterator_next(&word, &it))
+		if (word)
+			return 0;
+
+	/* `self` is definitely a subset of `other` */
+	return 1;
+}
+
 void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other)
 {
 	size_t original_size = self->word_alloc;
diff --git a/ewah/ewok.h b/ewah/ewok.h
index c11d76c6f33..2b6c4ac499c 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -179,7 +179,13 @@ void bitmap_unset(struct bitmap *self, size_t pos);
 int bitmap_get(struct bitmap *self, size_t pos);
 void bitmap_free(struct bitmap *self);
 int bitmap_equals(struct bitmap *self, struct bitmap *other);
+
+/*
+ * Both `bitmap_is_subset()` and `ewah_bitmap_is_subset()` return 1 if the set
+ * of bits in 'self' are a subset of the bits in 'other'. Returns 0 otherwise.
+ */
 int bitmap_is_subset(struct bitmap *self, struct bitmap *other);
+int ewah_bitmap_is_subset(struct ewah_bitmap *self, struct bitmap *other);
 
 struct ewah_bitmap * bitmap_to_ewah(struct bitmap *bitmap);
 struct bitmap *ewah_to_bitmap(struct ewah_bitmap *ewah);
-- 
2.45.1.175.gbea44add9db


* [PATCH v3 11/30] pack-bitmap: move some initialization to `bitmap_writer_init()`
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
                     ` (4 preceding siblings ...)
  2024-05-21 19:02   ` [PATCH v3 10/30] ewah: implement `ewah_bitmap_is_subset()` Taylor Blau
@ 2024-05-21 19:02   ` Taylor Blau
  2024-05-21 19:02   ` [PATCH v3 12/30] pseudo-merge.ch: initial commit Taylor Blau
                     ` (19 subsequent siblings)
  25 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-21 19:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

The pack-bitmap-writer machinery uses an oidmap (backed by khash.h) to
map from commits selected for bitmaps (by OID) to a bitmapped_commit
structure (containing the bitmap itself, among other things like its XOR
offset, etc.).

This map was initialized at the end of `bitmap_writer_build()`. New
entries are added in `pack-bitmap-write.c::store_selected()`, which is
called by the bitmap_builder machinery (which is responsible for
traversing history and generating the actual bitmaps).

Reorganize when this field is initialized and when entries are added to
it so that we can quickly determine whether a commit is a candidate for
pseudo-merge selection, or not (since it was already selected to receive
a bitmap, and thus storing it in a pseudo-merge would be redundant).

The changes are as follows:

  - Introduce a new `bitmap_writer_init()` function which initializes
    the `writer.bitmaps` field (instead of waiting until the end of
    `bitmap_writer_build()`).

  - Add map entries in `push_bitmapped_commit()` (which is called via
    `bitmap_writer_select_commits()`) with OID keys and NULL values to
    track whether or not we *expect* to write a bitmap for some given
    commit.

  - Validate that a NULL entry is found matching the given key when we
    store a selected bitmap.
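
The resulting two-phase protocol (reserve a NULL slot at selection time,
fill it in at store time) can be sketched with a toy map. Everything
here is hypothetical: the `sketch_` names, the integer keys standing in
for OIDs, and the fixed-size linear-scan table standing in for the
khash-backed oidmap.

```c
#include <stddef.h>

#define SKETCH_MAX 16

/* Toy stand-in for the oidmap: integer keys, linear search. */
struct sketch_map {
	int keys[SKETCH_MAX];
	const void *values[SKETCH_MAX];
	size_t nr;
};

/* Returns the slot holding `key`, or -1 if it was never selected. */
static int sketch_find(const struct sketch_map *m, int key)
{
	size_t i;

	for (i = 0; i < m->nr; i++)
		if (m->keys[i] == key)
			return (int)i;
	return -1;
}

/*
 * Selection phase: reserve a slot with a NULL value. Returns -1 on a
 * duplicate key (or a full table), 0 on success.
 */
static int sketch_select(struct sketch_map *m, int key)
{
	if (m->nr >= SKETCH_MAX || sketch_find(m, key) >= 0)
		return -1;
	m->keys[m->nr] = key;
	m->values[m->nr] = NULL;
	m->nr++;
	return 0;
}

/*
 * Store phase: fill in a previously reserved slot. Returns -1 if the
 * key was never selected, 0 on success.
 */
static int sketch_store(struct sketch_map *m, int key, const void *value)
{
	int pos = sketch_find(m, key);

	if (pos < 0)
		return -1;
	m->values[pos] = value;
	return 0;
}
```

Because every selected key is present from the selection phase onward, a
lookup answers "was this commit already selected?" before any bitmap has
been built, which is the property the pseudo-merge selection needs.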

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c |  3 ++-
 midx-write.c           |  2 +-
 pack-bitmap-write.c    | 24 ++++++++++++++++++------
 pack-bitmap.h          |  2 +-
 4 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 26a6d0d7919..6209264e60c 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1340,7 +1340,8 @@ static void write_pack_file(void)
 				    hash_to_hex(hash));
 
 			if (write_bitmap_index) {
-				bitmap_writer_init(&bitmap_writer);
+				bitmap_writer_init(&bitmap_writer,
+						   the_repository);
 				bitmap_writer_set_checksum(&bitmap_writer, hash);
 				bitmap_writer_build_type_index(&bitmap_writer,
 					&to_pack, written_list, nr_written);
diff --git a/midx-write.c b/midx-write.c
index 7c0c08c64b2..c747d1a6af3 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -820,7 +820,7 @@ static int write_midx_bitmap(const char *midx_name,
 	for (i = 0; i < pdata->nr_objects; i++)
 		index[i] = &pdata->objects[i].idx;
 
-	bitmap_writer_init(&writer);
+	bitmap_writer_init(&writer, the_repository);
 	bitmap_writer_show_progress(&writer, flags & MIDX_PROGRESS);
 	bitmap_writer_build_type_index(&writer, pdata, index,
 				       pdata->nr_objects);
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 6cae670412c..d8870155831 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -27,9 +27,12 @@ struct bitmapped_commit {
 	uint32_t commit_pos;
 };
 
-void bitmap_writer_init(struct bitmap_writer *writer)
+void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r)
 {
 	memset(writer, 0, sizeof(struct bitmap_writer));
+	if (writer->bitmaps)
+		BUG("bitmap writer already initialized");
+	writer->bitmaps = kh_init_oid_map();
 }
 
 void bitmap_writer_free(struct bitmap_writer *writer)
@@ -128,11 +131,21 @@ void bitmap_writer_build_type_index(struct bitmap_writer *writer,
 static inline void push_bitmapped_commit(struct bitmap_writer *writer,
 					 struct commit *commit)
 {
+	int hash_ret;
+	khiter_t hash_pos;
+
 	if (writer->selected_nr >= writer->selected_alloc) {
 		writer->selected_alloc = (writer->selected_alloc + 32) * 2;
 		REALLOC_ARRAY(writer->selected, writer->selected_alloc);
 	}
 
+	hash_pos = kh_put_oid_map(writer->bitmaps, commit->object.oid,
+				  &hash_ret);
+	if (!hash_ret)
+		die(_("duplicate entry when writing bitmap index: %s"),
+		    oid_to_hex(&commit->object.oid));
+	kh_value(writer->bitmaps, hash_pos) = NULL;
+
 	writer->selected[writer->selected_nr].commit = commit;
 	writer->selected[writer->selected_nr].bitmap = NULL;
 	writer->selected[writer->selected_nr].write_as = NULL;
@@ -483,14 +496,14 @@ static void store_selected(struct bitmap_writer *writer,
 {
 	struct bitmapped_commit *stored = &writer->selected[ent->idx];
 	khiter_t hash_pos;
-	int hash_ret;
 
 	stored->bitmap = bitmap_to_ewah(ent->bitmap);
 
-	hash_pos = kh_put_oid_map(writer->bitmaps, commit->object.oid, &hash_ret);
-	if (hash_ret == 0)
-		die("Duplicate entry when writing index: %s",
+	hash_pos = kh_get_oid_map(writer->bitmaps, commit->object.oid);
+	if (hash_pos == kh_end(writer->bitmaps))
+		die(_("attempted to store non-selected commit: '%s'"),
 		    oid_to_hex(&commit->object.oid));
+
 	kh_value(writer->bitmaps, hash_pos) = stored;
 }
 
@@ -506,7 +519,6 @@ int bitmap_writer_build(struct bitmap_writer *writer,
 	uint32_t *mapping;
 	int closed = 1; /* until proven otherwise */
 
-	writer->bitmaps = kh_init_oid_map();
 	writer->to_pack = to_pack;
 
 	if (writer->show_progress)
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 3091095f336..f87e60153dd 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -114,7 +114,7 @@ struct bitmap_writer {
 	unsigned char pack_checksum[GIT_MAX_RAWSZ];
 };
 
-void bitmap_writer_init(struct bitmap_writer *writer);
+void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r);
 void bitmap_writer_show_progress(struct bitmap_writer *writer, int show);
 void bitmap_writer_set_checksum(struct bitmap_writer *writer,
 				const unsigned char *sha1);
-- 
2.45.1.175.gbea44add9db


* [PATCH v3 12/30] pseudo-merge.ch: initial commit
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
                     ` (5 preceding siblings ...)
  2024-05-21 19:02   ` [PATCH v3 11/30] pack-bitmap: move some initialization to `bitmap_writer_init()` Taylor Blau
@ 2024-05-21 19:02   ` Taylor Blau
  2024-05-21 19:02   ` [PATCH v3 13/30] pack-bitmap-write: support storing pseudo-merge commits Taylor Blau
                     ` (18 subsequent siblings)
  25 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-21 19:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Add a new (empty) header file to contain the implementation for
selecting, reading, and applying pseudo-merge bitmaps.

For now this header and its corresponding implementation are left
empty, but they will evolve over the course of subsequent commit(s).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Makefile       | 1 +
 pseudo-merge.c | 2 ++
 pseudo-merge.h | 6 ++++++
 3 files changed, 9 insertions(+)
 create mode 100644 pseudo-merge.c
 create mode 100644 pseudo-merge.h

diff --git a/Makefile b/Makefile
index 0285db56306..4705a69f57f 100644
--- a/Makefile
+++ b/Makefile
@@ -1105,6 +1105,7 @@ LIB_OBJS += prompt.o
 LIB_OBJS += protocol.o
 LIB_OBJS += protocol-caps.o
 LIB_OBJS += prune-packed.o
+LIB_OBJS += pseudo-merge.o
 LIB_OBJS += quote.o
 LIB_OBJS += range-diff.o
 LIB_OBJS += reachable.o
diff --git a/pseudo-merge.c b/pseudo-merge.c
new file mode 100644
index 00000000000..37e037ba272
--- /dev/null
+++ b/pseudo-merge.c
@@ -0,0 +1,2 @@
+#include "git-compat-util.h"
+#include "pseudo-merge.h"
diff --git a/pseudo-merge.h b/pseudo-merge.h
new file mode 100644
index 00000000000..cab8ff6960a
--- /dev/null
+++ b/pseudo-merge.h
@@ -0,0 +1,6 @@
+#ifndef PSEUDO_MERGE_H
+#define PSEUDO_MERGE_H
+
+#include "git-compat-util.h"
+
+#endif
-- 
2.45.1.175.gbea44add9db


* [PATCH v3 13/30] pack-bitmap-write: support storing pseudo-merge commits
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
                     ` (6 preceding siblings ...)
  2024-05-21 19:02   ` [PATCH v3 12/30] pseudo-merge.ch: initial commit Taylor Blau
@ 2024-05-21 19:02   ` Taylor Blau
  2024-05-21 19:02   ` [PATCH v3 14/30] pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()` Taylor Blau
                     ` (17 subsequent siblings)
  25 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-21 19:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Prepare to write pseudo-merge bitmaps by annotating individual bitmapped
commits (which are represented by the `bitmapped_commit` structure) with
an extra bit indicating whether or not they are a pseudo-merge.

In subsequent commits, pseudo-merge bitmaps will be generated by
allocating a fake commit node with parents covering the full set of
commits represented by the pseudo-merge bitmap. These commits will be
added to the set of "selected" commits as usual, but will be written
specially instead of being included with the rest of the selected
commits.

Mechanically speaking, there are two parts of this change:

  - The bitmapped_commit struct gets a new bit indicating whether it is
    a pseudo-merge, or an ordinary commit selected for bitmaps.

  - A handful of changes to only write out the non-pseudo-merge commits
    when enumerating through the selected array (see the new
    `bitmap_writer_nr_selected_commits()` function). Pseudo-merge
    commits appear after all non-pseudo-merge commits, so it is safe to
    enumerate through the selected array like so:

        for (i = 0; i < bitmap_writer_nr_selected_commits(writer); i++)
          if (writer->selected[i].pseudo_merge)
            BUG("unexpected pseudo-merge");

    without encountering the BUG().

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
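The layout invariant described above can be sketched in isolation. This
is a minimal illustration only: the `toy_*` names are hypothetical
stand-ins, not the structures from pack-bitmap-write.c, and only the
fields needed for the check are kept.

```c
#include <assert.h>

/* Hypothetical, stripped-down stand-in for `struct bitmapped_commit`. */
struct toy_commit {
	unsigned pseudo_merge : 1;
};

/* Hypothetical, stripped-down stand-in for `struct bitmap_writer`. */
struct toy_writer {
	struct toy_commit *selected;
	unsigned int selected_nr;      /* ordinary commits + pseudo-merges */
	unsigned int pseudo_merges_nr; /* pseudo-merges only */
};

/* Same arithmetic as bitmap_writer_nr_selected_commits() in this patch. */
static unsigned int toy_nr_selected_commits(const struct toy_writer *w)
{
	return w->selected_nr - w->pseudo_merges_nr;
}

/* Returns 1 if every pseudo-merge sits after all ordinary commits. */
static int toy_layout_ok(const struct toy_writer *w)
{
	unsigned int i, n = toy_nr_selected_commits(w);

	for (i = 0; i < n; i++)
		if (w->selected[i].pseudo_merge)
			return 0;
	for (; i < w->selected_nr; i++)
		if (!w->selected[i].pseudo_merge)
			return 0;
	return 1;
}
```

With this layout, a loop bounded by the ordinary-commit count never
visits a pseudo-merge, which is why the writers below can keep indexing
the selected array from zero.
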
 object.h            |  2 +-
 pack-bitmap-write.c | 96 +++++++++++++++++++++++++++++----------------
 pack-bitmap.h       |  3 ++
 3 files changed, 67 insertions(+), 34 deletions(-)

diff --git a/object.h b/object.h
index 99b9c8f114c..e6f9e89d3c5 100644
--- a/object.h
+++ b/object.h
@@ -81,7 +81,7 @@ void object_array_init(struct object_array *array);
  * reflog.c:                           10--12
  * builtin/show-branch.c:    0-------------------------------------------26
  * builtin/unpack-objects.c:                                 2021
- * pack-bitmap.h:                                                22
+ * pack-bitmap.h:                                              2122
  */
 #define FLAG_BITS  28
 
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index d8870155831..60eb1e71c98 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -25,8 +25,14 @@ struct bitmapped_commit {
 	int flags;
 	int xor_offset;
 	uint32_t commit_pos;
+	unsigned pseudo_merge : 1;
 };
 
+static inline int bitmap_writer_nr_selected_commits(struct bitmap_writer *writer)
+{
+	return writer->selected_nr - writer->pseudo_merges_nr;
+}
+
 void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r)
 {
 	memset(writer, 0, sizeof(struct bitmap_writer));
@@ -129,27 +135,31 @@ void bitmap_writer_build_type_index(struct bitmap_writer *writer,
  */
 
 static inline void push_bitmapped_commit(struct bitmap_writer *writer,
-					 struct commit *commit)
+					 struct commit *commit,
+					 unsigned pseudo_merge)
 {
-	int hash_ret;
-	khiter_t hash_pos;
-
 	if (writer->selected_nr >= writer->selected_alloc) {
 		writer->selected_alloc = (writer->selected_alloc + 32) * 2;
 		REALLOC_ARRAY(writer->selected, writer->selected_alloc);
 	}
 
-	hash_pos = kh_put_oid_map(writer->bitmaps, commit->object.oid,
-				  &hash_ret);
-	if (!hash_ret)
-		die(_("duplicate entry when writing bitmap index: %s"),
-		    oid_to_hex(&commit->object.oid));
-	kh_value(writer->bitmaps, hash_pos) = NULL;
+	if (!pseudo_merge) {
+		int hash_ret;
+		khiter_t hash_pos = kh_put_oid_map(writer->bitmaps,
+						   commit->object.oid,
+						   &hash_ret);
+
+		if (!hash_ret)
+			die(_("duplicate entry when writing bitmap index: %s"),
+			    oid_to_hex(&commit->object.oid));
+		kh_value(writer->bitmaps, hash_pos) = NULL;
+	}
 
 	writer->selected[writer->selected_nr].commit = commit;
 	writer->selected[writer->selected_nr].bitmap = NULL;
 	writer->selected[writer->selected_nr].write_as = NULL;
 	writer->selected[writer->selected_nr].flags = 0;
+	writer->selected[writer->selected_nr].pseudo_merge = pseudo_merge;
 
 	writer->selected_nr++;
 }
@@ -180,16 +190,20 @@ static void compute_xor_offsets(struct bitmap_writer *writer)
 
 	while (next < writer->selected_nr) {
 		struct bitmapped_commit *stored = &writer->selected[next];
-
 		int best_offset = 0;
 		struct ewah_bitmap *best_bitmap = stored->bitmap;
 		struct ewah_bitmap *test_xor;
 
+		if (stored->pseudo_merge)
+			goto next;
+
 		for (i = 1; i <= MAX_XOR_OFFSET_SEARCH; ++i) {
 			int curr = next - i;
 
 			if (curr < 0)
 				break;
+			if (writer->selected[curr].pseudo_merge)
+				continue;
 
 			test_xor = ewah_pool_new();
 			ewah_xor(writer->selected[curr].bitmap, stored->bitmap, test_xor);
@@ -205,6 +219,7 @@ static void compute_xor_offsets(struct bitmap_writer *writer)
 			}
 		}
 
+next:
 		stored->xor_offset = best_offset;
 		stored->write_as = best_bitmap;
 
@@ -217,7 +232,8 @@ struct bb_commit {
 	struct bitmap *commit_mask;
 	struct bitmap *bitmap;
 	unsigned selected:1,
-		 maximal:1;
+		 maximal:1,
+		 pseudo_merge:1;
 	unsigned idx; /* within selected array */
 };
 
@@ -255,17 +271,18 @@ static void bitmap_builder_init(struct bitmap_builder *bb,
 	revs.first_parent_only = 1;
 
 	for (i = 0; i < writer->selected_nr; i++) {
-		struct commit *c = writer->selected[i].commit;
-		struct bb_commit *ent = bb_data_at(&bb->data, c);
+		struct bitmapped_commit *bc = &writer->selected[i];
+		struct bb_commit *ent = bb_data_at(&bb->data, bc->commit);
 
 		ent->selected = 1;
 		ent->maximal = 1;
+		ent->pseudo_merge = bc->pseudo_merge;
 		ent->idx = i;
 
 		ent->commit_mask = bitmap_new();
 		bitmap_set(ent->commit_mask, i);
 
-		add_pending_object(&revs, &c->object, "");
+		add_pending_object(&revs, &bc->commit->object, "");
 	}
 
 	if (prepare_revision_walk(&revs))
@@ -444,8 +461,13 @@ static int fill_bitmap_commit(struct bitmap_writer *writer,
 		struct commit *c = prio_queue_get(queue);
 
 		if (old_bitmap && mapping) {
-			struct ewah_bitmap *old = bitmap_for_commit(old_bitmap, c);
+			struct ewah_bitmap *old;
 			struct bitmap *remapped = bitmap_new();
+
+			if (commit->object.flags & BITMAP_PSEUDO_MERGE)
+				old = NULL;
+			else
+				old = bitmap_for_commit(old_bitmap, c);
 			/*
 			 * If this commit has an old bitmap, then translate that
 			 * bitmap and add its bits to this one. No need to walk
@@ -464,12 +486,14 @@ static int fill_bitmap_commit(struct bitmap_writer *writer,
 		 * Mark ourselves and queue our tree. The commit
 		 * walk ensures we cover all parents.
 		 */
-		pos = find_object_pos(writer, &c->object.oid, &found);
-		if (!found)
-			return -1;
-		bitmap_set(ent->bitmap, pos);
-		prio_queue_put(tree_queue,
-			       repo_get_commit_tree(the_repository, c));
+		if (!(c->object.flags & BITMAP_PSEUDO_MERGE)) {
+			pos = find_object_pos(writer, &c->object.oid, &found);
+			if (!found)
+				return -1;
+			bitmap_set(ent->bitmap, pos);
+			prio_queue_put(tree_queue,
+				       repo_get_commit_tree(the_repository, c));
+		}
 
 		for (p = c->parents; p; p = p->next) {
 			pos = find_object_pos(writer, &p->item->object.oid,
@@ -499,6 +523,9 @@ static void store_selected(struct bitmap_writer *writer,
 
 	stored->bitmap = bitmap_to_ewah(ent->bitmap);
 
+	if (ent->pseudo_merge)
+		return;
+
 	hash_pos = kh_get_oid_map(writer->bitmaps, commit->object.oid);
 	if (hash_pos == kh_end(writer->bitmaps))
 		die(_("attempted to store non-selected commit: '%s'"),
@@ -631,7 +658,7 @@ void bitmap_writer_select_commits(struct bitmap_writer *writer,
 
 	if (indexed_commits_nr < 100) {
 		for (i = 0; i < indexed_commits_nr; ++i)
-			push_bitmapped_commit(writer, indexed_commits[i]);
+			push_bitmapped_commit(writer, indexed_commits[i], 0);
 		return;
 	}
 
@@ -664,7 +691,7 @@ void bitmap_writer_select_commits(struct bitmap_writer *writer,
 			}
 		}
 
-		push_bitmapped_commit(writer, chosen);
+		push_bitmapped_commit(writer, chosen, 0);
 
 		i += next + 1;
 		display_progress(writer->progress, i);
@@ -701,8 +728,11 @@ static void write_selected_commits_v1(struct bitmap_writer *writer,
 {
 	int i;
 
-	for (i = 0; i < writer->selected_nr; ++i) {
+	for (i = 0; i < bitmap_writer_nr_selected_commits(writer); ++i) {
 		struct bitmapped_commit *stored = &writer->selected[i];
+		if (stored->pseudo_merge)
+			BUG("unexpected pseudo-merge among selected: %s",
+			    oid_to_hex(&stored->commit->object.oid));
 
 		if (offsets)
 			offsets[i] = hashfile_total(f);
@@ -735,10 +765,10 @@ static void write_lookup_table(struct bitmap_writer *writer, struct hashfile *f,
 	uint32_t i;
 	uint32_t *table, *table_inv;
 
-	ALLOC_ARRAY(table, writer->selected_nr);
-	ALLOC_ARRAY(table_inv, writer->selected_nr);
+	ALLOC_ARRAY(table, bitmap_writer_nr_selected_commits(writer));
+	ALLOC_ARRAY(table_inv, bitmap_writer_nr_selected_commits(writer));
 
-	for (i = 0; i < writer->selected_nr; i++)
+	for (i = 0; i < bitmap_writer_nr_selected_commits(writer); i++)
 		table[i] = i;
 
 	/*
@@ -746,16 +776,16 @@ static void write_lookup_table(struct bitmap_writer *writer, struct hashfile *f,
 	 * bitmap corresponds to j'th bitmapped commit (among the selected
 	 * commits) in lex order of OIDs.
 	 */
-	QSORT_S(table, writer->selected_nr, table_cmp, writer);
+	QSORT_S(table, bitmap_writer_nr_selected_commits(writer), table_cmp, writer);
 
 	/* table_inv helps us discover that relationship (i'th bitmap
 	 * to j'th commit by j = table_inv[i])
 	 */
-	for (i = 0; i < writer->selected_nr; i++)
+	for (i = 0; i < bitmap_writer_nr_selected_commits(writer); i++)
 		table_inv[table[i]] = i;
 
 	trace2_region_enter("pack-bitmap-write", "writing_lookup_table", the_repository);
-	for (i = 0; i < writer->selected_nr; i++) {
+	for (i = 0; i < bitmap_writer_nr_selected_commits(writer); i++) {
 		struct bitmapped_commit *selected = &writer->selected[table[i]];
 		uint32_t xor_offset = selected->xor_offset;
 		uint32_t xor_row;
@@ -827,7 +857,7 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
 	memcpy(header.magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE));
 	header.version = htons(default_version);
 	header.options = htons(flags | options);
-	header.entry_count = htonl(writer->selected_nr);
+	header.entry_count = htonl(bitmap_writer_nr_selected_commits(writer));
 	hashcpy(header.checksum, writer->pack_checksum);
 
 	hashwrite(f, &header, sizeof(header) - GIT_MAX_RAWSZ + the_hash_algo->rawsz);
@@ -839,7 +869,7 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
 	if (options & BITMAP_OPT_LOOKUP_TABLE)
 		CALLOC_ARRAY(offsets, index_nr);
 
-	for (i = 0; i < writer->selected_nr; i++) {
+	for (i = 0; i < bitmap_writer_nr_selected_commits(writer); i++) {
 		struct bitmapped_commit *stored = &writer->selected[i];
 		int commit_pos = oid_pos(&stored->commit->object.oid, index,
 					 index_nr, oid_access);
diff --git a/pack-bitmap.h b/pack-bitmap.h
index f87e60153dd..6937a0f090f 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -21,6 +21,7 @@ struct bitmap_disk_header {
 	unsigned char checksum[GIT_MAX_RAWSZ];
 };
 
+#define BITMAP_PSEUDO_MERGE (1u<<21)
 #define NEEDS_BITMAP (1u<<22)
 
 /*
@@ -109,6 +110,8 @@ struct bitmap_writer {
 	struct bitmapped_commit *selected;
 	unsigned int selected_nr, selected_alloc;
 
+	uint32_t pseudo_merges_nr;
+
 	struct progress *progress;
 	int show_progress;
 	unsigned char pack_checksum[GIT_MAX_RAWSZ];
-- 
2.45.1.175.gbea44add9db



* [PATCH v3 14/30] pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()`
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
                     ` (7 preceding siblings ...)
  2024-05-21 19:02   ` [PATCH v3 13/30] pack-bitmap-write: support storing pseudo-merge commits Taylor Blau
@ 2024-05-21 19:02   ` Taylor Blau
  2024-05-21 19:02   ` [PATCH v3 15/30] pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public Taylor Blau
                     ` (16 subsequent siblings)
  25 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-21 19:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Prepare to implement pseudo-merge bitmap selection by implementing a
necessary new function, `bitmap_writer_has_bitmapped_object_id()`.

This function returns whether or not the bitmap_writer selected the
given object ID for bitmapping. This will allow the pseudo-merge
machinery to reject candidates for pseudo-merges if they have already
been selected as an ordinary bitmap tip.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c | 6 ++++++
 pack-bitmap.h       | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 60eb1e71c98..299aa8af6f5 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -130,6 +130,12 @@ void bitmap_writer_build_type_index(struct bitmap_writer *writer,
 	}
 }
 
+int bitmap_writer_has_bitmapped_object_id(struct bitmap_writer *writer,
+					  const struct object_id *oid)
+{
+	return kh_get_oid_map(writer->bitmaps, *oid) != kh_end(writer->bitmaps);
+}
+
 /**
  * Compute the actual bitmaps
  */
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 6937a0f090f..e175f28e0de 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -125,6 +125,8 @@ void bitmap_writer_build_type_index(struct bitmap_writer *writer,
 				    struct packing_data *to_pack,
 				    struct pack_idx_entry **index,
 				    uint32_t index_nr);
+int bitmap_writer_has_bitmapped_object_id(struct bitmap_writer *writer,
+					  const struct object_id *oid);
 uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
 				struct packing_data *mapping);
 int rebuild_bitmap(const uint32_t *reposition,
-- 
2.45.1.175.gbea44add9db



* [PATCH v3 15/30] pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
                     ` (8 preceding siblings ...)
  2024-05-21 19:02   ` [PATCH v3 14/30] pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()` Taylor Blau
@ 2024-05-21 19:02   ` Taylor Blau
  2024-05-21 19:02   ` [PATCH v3 16/30] config: introduce git_config_float() Taylor Blau
                     ` (15 subsequent siblings)
  25 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-21 19:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

The pseudo-merge selection code will be added in a subsequent commit,
and will need a way to push the allocated commit structures into the
bitmap writer from a separate compilation unit.

Rename the static `push_bitmapped_commit()` function to
`bitmap_writer_push_commit()` and declare it in the pack-bitmap.h header
in order to make this possible.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c | 9 ++++-----
 pack-bitmap.h       | 2 ++
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 299aa8af6f5..bc19b33ad16 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -140,9 +140,8 @@ int bitmap_writer_has_bitmapped_object_id(struct bitmap_writer *writer,
  * Compute the actual bitmaps
  */
 
-static inline void push_bitmapped_commit(struct bitmap_writer *writer,
-					 struct commit *commit,
-					 unsigned pseudo_merge)
+void bitmap_writer_push_commit(struct bitmap_writer *writer,
+			       struct commit *commit, unsigned pseudo_merge)
 {
 	if (writer->selected_nr >= writer->selected_alloc) {
 		writer->selected_alloc = (writer->selected_alloc + 32) * 2;
@@ -664,7 +663,7 @@ void bitmap_writer_select_commits(struct bitmap_writer *writer,
 
 	if (indexed_commits_nr < 100) {
 		for (i = 0; i < indexed_commits_nr; ++i)
-			push_bitmapped_commit(writer, indexed_commits[i], 0);
+			bitmap_writer_push_commit(writer, indexed_commits[i], 0);
 		return;
 	}
 
@@ -697,7 +696,7 @@ void bitmap_writer_select_commits(struct bitmap_writer *writer,
 			}
 		}
 
-		push_bitmapped_commit(writer, chosen, 0);
+		bitmap_writer_push_commit(writer, chosen, 0);
 
 		i += next + 1;
 		display_progress(writer->progress, i);
diff --git a/pack-bitmap.h b/pack-bitmap.h
index e175f28e0de..a7e2f56c971 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -127,6 +127,8 @@ void bitmap_writer_build_type_index(struct bitmap_writer *writer,
 				    uint32_t index_nr);
 int bitmap_writer_has_bitmapped_object_id(struct bitmap_writer *writer,
 					  const struct object_id *oid);
+void bitmap_writer_push_commit(struct bitmap_writer *writer,
+			       struct commit *commit, unsigned pseudo_merge);
 uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
 				struct packing_data *mapping);
 int rebuild_bitmap(const uint32_t *reposition,
-- 
2.45.1.175.gbea44add9db



* [PATCH v3 16/30] config: introduce git_config_float()
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
                     ` (9 preceding siblings ...)
  2024-05-21 19:02   ` [PATCH v3 15/30] pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public Taylor Blau
@ 2024-05-21 19:02   ` Taylor Blau
  2024-05-23 10:02     ` Jeff King
  2024-05-21 19:02   ` [PATCH v3 17/30] pseudo-merge: implement support for selecting pseudo-merge commits Taylor Blau
                     ` (14 subsequent siblings)
  25 siblings, 1 reply; 157+ messages in thread
From: Taylor Blau @ 2024-05-21 19:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Future commits will want to parse a floating point value from
configuration, but we have no way to parse such a value prior to this
patch.

The core of the routine is implemented in git_parse_float(). Unlike
git_parse_unsigned() and git_parse_signed(), however, the function
implemented here only works on type "float", and not related types like
"double", or "long double".

This is because "double" and "long double" use different functions to
convert from ASCII strings to floating point values (strtod() and
strtold(), respectively). Likewise, there is no pointer type that can
assign to any of these values (except for "void *"), so the only way to
define this trio of functions would be with a macro expansion that is
parameterized over the floating point type and conversion function.

That is all doable, but likely to be overkill given our current need,
which is only to parse floats.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
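The parsing flow can be tried outside of Git with a small standalone
sketch. It follows the same steps as git_parse_float() in this patch
(strtof(), range check, empty-parse check, unit suffix), but
`toy_unit_factor()` is a simplified, hypothetical stand-in for
get_unit_factor() that is assumed here to accept only an empty suffix
or a case-insensitive "k", "m", or "g".

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <strings.h>

/* Hypothetical stand-in for get_unit_factor(); returns 0 if unknown. */
static uintmax_t toy_unit_factor(const char *end)
{
	if (!*end)
		return 1;
	if (!strcasecmp(end, "k"))
		return 1024;
	if (!strcasecmp(end, "m"))
		return 1024 * 1024;
	if (!strcasecmp(end, "g"))
		return 1024 * 1024 * 1024;
	return 0;
}

/* Returns 1 and stores the value on success, 0 (with errno set) on failure. */
static int toy_parse_float(const char *value, float *ret)
{
	char *end;
	float val;
	uintmax_t factor;

	if (!value || !*value) {
		errno = EINVAL;
		return 0;
	}

	errno = 0;
	val = strtof(value, &end);
	if (errno == ERANGE)
		return 0;          /* overflow or underflow */
	if (end == value) {
		errno = EINVAL;    /* nothing was parsed */
		return 0;
	}
	factor = toy_unit_factor(end);
	if (!factor) {
		errno = EINVAL;    /* trailing garbage after the number */
		return 0;
	}
	*ret = val * factor;
	return 1;
}
```

For example, "0.5" parses to 0.5, "1.5k" parses to 1536, and both
"abc" and "1.0x" are rejected.
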
 config.c |  9 +++++++++
 config.h |  6 ++++++
 parse.c  | 29 +++++++++++++++++++++++++++++
 parse.h  |  1 +
 4 files changed, 45 insertions(+)

diff --git a/config.c b/config.c
index 77a0fd2d80e..ee681fda34b 100644
--- a/config.c
+++ b/config.c
@@ -1243,6 +1243,15 @@ ssize_t git_config_ssize_t(const char *name, const char *value,
 	return ret;
 }
 
+float git_config_float(const char *name, const char *value,
+		       const struct key_value_info *kvi)
+{
+	float ret;
+	if (!git_parse_float(value, &ret))
+		die_bad_number(name, value, kvi);
+	return ret;
+}
+
 static const struct fsync_component_name {
 	const char *name;
 	enum fsync_component component_bits;
diff --git a/config.h b/config.h
index f4966e37494..b0d1baba95a 100644
--- a/config.h
+++ b/config.h
@@ -261,6 +261,12 @@ unsigned long git_config_ulong(const char *, const char *,
 ssize_t git_config_ssize_t(const char *, const char *,
 			   const struct key_value_info *);
 
+/**
+ * Identical to `git_config_int`, but for floating point values.
+ */
+float git_config_float(const char *, const char *,
+		       const struct key_value_info *);
+
 /**
  * Same as `git_config_bool`, except that integers are returned as-is, and
  * an `is_bool` flag is unset.
diff --git a/parse.c b/parse.c
index 42d691a0fbb..a5967e80910 100644
--- a/parse.c
+++ b/parse.c
@@ -125,6 +125,35 @@ int git_parse_ssize_t(const char *value, ssize_t *ret)
 	return 1;
 }
 
+int git_parse_float(const char *value, float *ret)
+{
+	char *end;
+	float val;
+	uintmax_t factor;
+
+	if (!value || !*value) {
+		errno = EINVAL;
+		return 0;
+	}
+
+	errno = 0;
+	val = strtof(value, &end);
+	if (errno == ERANGE)
+		return 0;
+	if (end == value) {
+		errno = EINVAL;
+		return 0;
+	}
+	factor = get_unit_factor(end);
+	if (!factor) {
+		errno = EINVAL;
+		return 0;
+	}
+	val *= factor;
+	*ret = val;
+	return 1;
+}
+
 int git_parse_maybe_bool_text(const char *value)
 {
 	if (!value)
diff --git a/parse.h b/parse.h
index 07d2193d698..7df82c5f5b8 100644
--- a/parse.h
+++ b/parse.h
@@ -6,6 +6,7 @@ int git_parse_ssize_t(const char *, ssize_t *);
 int git_parse_ulong(const char *, unsigned long *);
 int git_parse_int(const char *value, int *ret);
 int git_parse_int64(const char *value, int64_t *ret);
+int git_parse_float(const char *value, float *ret);
 
 /**
  * Same as `git_config_bool`, except that it returns -1 on error rather
-- 
2.45.1.175.gbea44add9db



* [PATCH v3 17/30] pseudo-merge: implement support for selecting pseudo-merge commits
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
                     ` (10 preceding siblings ...)
  2024-05-21 19:02   ` [PATCH v3 16/30] config: introduce git_config_float() Taylor Blau
@ 2024-05-21 19:02   ` Taylor Blau
  2024-05-23 10:12     ` Jeff King
  2024-05-21 19:02   ` [PATCH v3 18/30] pack-bitmap-write.c: write pseudo-merge table Taylor Blau
                     ` (13 subsequent siblings)
  25 siblings, 1 reply; 157+ messages in thread
From: Taylor Blau @ 2024-05-21 19:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Teach the new pseudo-merge machinery how to select non-bitmapped commits
for inclusion in different pseudo-merge group(s) based on a handful of
criteria.

Note that the selected pseudo-merge commits aren't actually used or
written anywhere yet. This will be done in the following commit.
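
To illustrate the decay behavior documented in this patch, the size of
the n-th group relative to the first is f(n)/f(1) = n^-k, so the scaling
constant C cancels out. The sketch below is an illustration only and
assumes a non-negative integer decay rate to avoid depending on libm;
the actual setting is a float, and `toy_relative_group_size()` is a
hypothetical helper, not part of pseudo-merge.c.

```c
#include <assert.h>

/*
 * Size of the n-th group (n >= 1) relative to the first group, for an
 * integer decay rate k >= 0: n^-k, computed by repeated division.
 */
static double toy_relative_group_size(unsigned n, unsigned k)
{
	double size = 1.0;
	unsigned i;

	for (i = 0; i < k; i++)
		size /= n;
	return size;
}
```

A decay of 0 yields equally sized groups, and a decay of 1 makes the
fourth group one quarter the size of the first.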

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
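The sub-grouping rule documented below (concatenate the capture groups
of the pattern with a '-' in between) can be sketched with POSIX
extended regular expressions. This is not the code from pseudo-merge.c;
`toy_group_key()` and its fixed limit of eight match slots are
assumptions made for illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Match `refname` against the extended regex `pattern` and write the
 * capture groups, joined by '-', into `out`. Returns 1 on a match and
 * 0 otherwise. A pattern without capture groups yields an empty key.
 */
static int toy_group_key(const char *pattern, const char *refname,
			 char *out, size_t outlen)
{
	regex_t re;
	regmatch_t m[8];
	size_t i, len = 0;
	int matched;

	if (!outlen || regcomp(&re, pattern, REG_EXTENDED))
		return 0;
	out[0] = '\0';

	matched = !regexec(&re, refname, 8, m, 0);
	for (i = 1; matched && i < 8 && i <= re.re_nsub; i++) {
		int n;

		if (m[i].rm_so < 0)
			continue; /* group did not participate in the match */
		n = snprintf(out + len, outlen - len, "%s%.*s",
			     len ? "-" : "",
			     (int)(m[i].rm_eo - m[i].rm_so),
			     refname + m[i].rm_so);
		if (n < 0 || (size_t)n >= outlen - len)
			break; /* key truncated; stop appending */
		len += n;
	}

	regfree(&re);
	return matched;
}
```

With the pattern `refs/virtual/([0-9]+)/(heads|tags)/`, the reference
`refs/virtual/1234/heads/main` produces the key "1234-heads".
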
 Documentation/config.txt                     |   2 +
 Documentation/config/bitmap-pseudo-merge.txt |  90 ++++
 Documentation/gitpacking.txt                 |  83 ++++
 pack-bitmap-write.c                          |  21 +
 pack-bitmap.h                                |   2 +
 pseudo-merge.c                               | 454 +++++++++++++++++++
 pseudo-merge.h                               |  94 ++++
 7 files changed, 746 insertions(+)
 create mode 100644 Documentation/config/bitmap-pseudo-merge.txt

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 6f649c997c0..caa34311214 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -384,6 +384,8 @@ include::config/apply.txt[]
 
 include::config/attr.txt[]
 
+include::config/bitmap-pseudo-merge.txt[]
+
 include::config/blame.txt[]
 
 include::config/branch.txt[]
diff --git a/Documentation/config/bitmap-pseudo-merge.txt b/Documentation/config/bitmap-pseudo-merge.txt
new file mode 100644
index 00000000000..d4a2023b84a
--- /dev/null
+++ b/Documentation/config/bitmap-pseudo-merge.txt
@@ -0,0 +1,90 @@
+NOTE: The configuration options in `bitmapPseudoMerge.*` are considered
+EXPERIMENTAL and may be subject to change or be removed entirely in the
+future.
+
+bitmapPseudoMerge.<name>.pattern::
+	Regular expression used to match reference names. Commits
+	pointed to by references matching this pattern (and meeting
+	the below criteria, like `bitmapPseudoMerge.<name>.sampleRate`
+	and `bitmapPseudoMerge.<name>.threshold`) will be considered
+	for inclusion in a pseudo-merge bitmap.
++
+Commits are grouped into pseudo-merge groups based on whether or not
+any reference(s) that point at a given commit match the pattern, which
+is an extended regular expression.
++
+Within a pseudo-merge group, commits may be further grouped into
+sub-groups based on the capture groups in the pattern. These
+sub-groupings are formed from the regular expressions by concatenating
+any capture groups from the regular expression, with a '-' dash in
+between.
++
+For example, if the pattern is `refs/tags/`, then all tags (provided
+they meet the below criteria) will be considered candidates for the
+same pseudo-merge group. However, if the pattern is instead
+`refs/remotes/([0-9]+)/tags/`, then tags from different remotes will
+be grouped into separate pseudo-merge groups, based on the remote
+number.
+
+bitmapPseudoMerge.<name>.decay::
+	Determines the rate at which consecutive pseudo-merge bitmap
+	groups decrease in size. Must be non-negative. This parameter
+	can be thought of as `k` in the function `f(n) = C * n^-k`,
+	where `f(n)` is the size of the `n`th group.
++
+Setting the decay rate equal to `0` will cause all groups to be the
+same size. Setting the decay rate equal to `1` will cause the `n`th
+group to be `1/n` the size of the initial group.  Higher values of the
+decay rate cause consecutive groups to shrink at an increasing rate.
+The default is `1`.
++
+If all groups are the same size, groups containing newer commits are
+likely to be usable less often than groups containing older commits,
+since references pointing at newer commits tend to be updated more
+often than references pointing at older commits.
+
+bitmapPseudoMerge.<name>.sampleRate::
+	Determines the proportion of non-bitmapped commits (among
+	reference tips) which are selected for inclusion in an
+	unstable pseudo-merge bitmap. Must be between `0` and `1`
+	(inclusive). The default is `1`.
+
+bitmapPseudoMerge.<name>.threshold::
+	Determines the minimum age of non-bitmapped commits (among
+	reference tips, as above) which are candidates for inclusion
+	in an unstable pseudo-merge bitmap. The default is
+	`1.week.ago`.
+
+bitmapPseudoMerge.<name>.maxMerges::
+	Determines the maximum number of pseudo-merge commits among
+	which commits may be distributed.
++
+For pseudo-merge groups whose pattern does not contain any capture
+groups, this setting is applied for all commits matching the regular
+expression. For patterns that have one or more capture groups, this
+setting is applied for each distinct capture group.
++
+For example, if your pattern is `refs/tags/`, then this setting
+will distribute all tags into a maximum of `maxMerges` pseudo-merge
+commits. However, if your pattern is, say,
+`refs/remotes/([0-9]+)/tags/`, then this setting will be applied to
+each remote's set of tags individually.
++
+Must be non-negative. The default value is 64.
+
+bitmapPseudoMerge.<name>.stableThreshold::
+	Determines the minimum age of commits (among reference tips,
+	as above, however stable commits are still considered
+	candidates even when they have been covered by a bitmap) which
+	are candidates for a stable pseudo-merge bitmap. The default
+	is `1.month.ago`.
++
+Setting this threshold to a smaller value (e.g., 1.week.ago) will cause
+more stable groups to be generated (which impose a one-time generation
+cost) but those groups will likely become stale over time. Using a
+larger value incurs the opposite penalty (fewer stable groups which are
+more useful).
+
+bitmapPseudoMerge.<name>.stableSize::
+	Determines the size (in number of commits) of a stable
+	pseudo-merge bitmap. The default is `512`.
diff --git a/Documentation/gitpacking.txt b/Documentation/gitpacking.txt
index ff18077129b..1ed645ff910 100644
--- a/Documentation/gitpacking.txt
+++ b/Documentation/gitpacking.txt
@@ -93,6 +93,89 @@ can take advantage of the fact that we only care about the union of
 objects reachable from all of those tags, and answer the query much
 faster.
 
+=== Configuration
+
+Reference tips are grouped into different pseudo-merge groups according
+to two criteria. A reference name matches one or more of the defined
+pseudo-merge patterns, and optionally one or more capture groups within
+that pattern which further partition the group.
+
+Within a group, commits may be considered "stable", or "unstable"
+depending on their age. These are adjusted by setting the
+`bitmapPseudoMerge.<name>.stableThreshold` and
+`bitmapPseudoMerge.<name>.threshold` configuration values, respectively.
+
+All stable commits are grouped into pseudo-merges of equal size
+(`bitmapPseudoMerge.<name>.stableSize`). If the `stableSize`
+configuration is set to, say, 100, then the first 100 commits (ordered
+by committer date) which are older than the `stableThreshold` value will
+form one group, the next 100 commits will form another group, and so on.
+
+Among unstable commits, the pseudo-merge machinery will attempt to
+combine older commits into large groups as opposed to newer commits
+which will appear in smaller groups. This is based on the heuristic that
+references whose tip commit is older are less likely to be modified to
+point at a different commit than a reference whose tip commit is newer.
+
+The size of groups is determined by a power-law decay function, and the
+decay parameter roughly corresponds to "k" in `f(n) = C*n^(-k)`,
+where `f(n)` describes the size of the `n`-th pseudo-merge group. The
+sample rate controls what proportion of eligible commits are considered
+as candidates. The threshold parameter indicates the minimum age (so as
+to avoid including too-recent commits in a pseudo-merge group, making it
+less likely to be valid). The "maxMerges" parameter sets an upper bound
+on the number of pseudo-merge commits an individual group may contain.
+
+The "stable"-related parameters control "stable" pseudo-merge groups,
+comprised of a fixed number of commits which are older than the
+configured "stable threshold" value and may be grouped together in
+chunks of "stableSize" in order of age.
+
+The exact configuration for pseudo-merges is as follows:
+
+include::config/bitmap-pseudo-merge.txt[]
+
+=== Examples
+
+Suppose that you have a repository with a large number of references,
+and you want a bare-bones configuration of pseudo-merge bitmaps that
+will enhance bitmap coverage of the `refs/` namespace. You may start
+with a configuration like so:
+
+    [bitmapPseudoMerge "all"]
+	pattern = "refs/"
+	threshold = now
+	stableThreshold = never
+	sampleRate = 1
+	maxMerges = 64
+
+This will create pseudo-merge bitmaps for all references, regardless of
+their age, and group them into 64 pseudo-merge commits.
+
+If you wanted to separate tags from branches when generating
+pseudo-merge commits, you would instead define the pattern with a
+capture group, like so:
+
+    [bitmapPseudoMerge "all"]
+	pattern = "refs/(heads|tags)/"
+
+Suppose instead that you are working in a fork-network repository, with
+each fork specified by some numeric ID, and whose refs reside in
+`refs/virtual/NNN/` (where `NNN` is the numeric ID corresponding to some
+fork) in the network. In this instance, you may instead write something
+like:
+
+    [bitmapPseudoMerge "all"]
+	pattern = "refs/virtual/([0-9]+)/(heads|tags)/"
+	threshold = now
+	stableThreshold = never
+	sampleRate = 1
+	maxMerges = 64
+
+This would generate pseudo-merge group identifiers like "1234-heads"
+and "5678-tags" (for branches in fork "1234" and tags in fork "5678",
+respectively).
+
 SEE ALSO
 --------
 linkgit:git-pack-objects[1]
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index bc19b33ad16..d5884ea5e9c 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -17,6 +17,7 @@
 #include "trace2.h"
 #include "tree.h"
 #include "tree-walk.h"
+#include "pseudo-merge.h"
 
 struct bitmapped_commit {
 	struct commit *commit;
@@ -39,11 +40,25 @@ void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r)
 	if (writer->bitmaps)
 		BUG("bitmap writer already initialized");
 	writer->bitmaps = kh_init_oid_map();
+	writer->pseudo_merge_commits = kh_init_oid_map();
+
+	string_list_init_dup(&writer->pseudo_merge_groups);
+
+	load_pseudo_merges_from_config(&writer->pseudo_merge_groups);
+}
+
+static void free_pseudo_merge_commit_idx(struct pseudo_merge_commit_idx *idx)
+{
+	if (!idx)
+		return;
+	free(idx->pseudo_merge);
+	free(idx);
 }
 
 void bitmap_writer_free(struct bitmap_writer *writer)
 {
 	uint32_t i;
+	struct pseudo_merge_commit_idx *idx;
 
 	if (!writer)
 		return;
@@ -55,6 +70,10 @@ void bitmap_writer_free(struct bitmap_writer *writer)
 
 	kh_destroy_oid_map(writer->bitmaps);
 
+	kh_foreach_value(writer->pseudo_merge_commits, idx,
+			 free_pseudo_merge_commit_idx(idx));
+	kh_destroy_oid_map(writer->pseudo_merge_commits);
+
 	for (i = 0; i < writer->selected_nr; i++) {
 		struct bitmapped_commit *bc = &writer->selected[i];
 		if (bc->write_as != bc->bitmap)
@@ -703,6 +722,8 @@ void bitmap_writer_select_commits(struct bitmap_writer *writer,
 	}
 
 	stop_progress(&writer->progress);
+
+	select_pseudo_merges(writer, indexed_commits, indexed_commits_nr);
 }
 
 
diff --git a/pack-bitmap.h b/pack-bitmap.h
index a7e2f56c971..1e730ea1e54 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -110,6 +110,8 @@ struct bitmap_writer {
 	struct bitmapped_commit *selected;
 	unsigned int selected_nr, selected_alloc;
 
+	struct string_list pseudo_merge_groups;
+	kh_oid_map_t *pseudo_merge_commits; /* oid -> pseudo merge(s) */
 	uint32_t pseudo_merges_nr;
 
 	struct progress *progress;
diff --git a/pseudo-merge.c b/pseudo-merge.c
index 37e037ba272..4be730563eb 100644
--- a/pseudo-merge.c
+++ b/pseudo-merge.c
@@ -1,2 +1,456 @@
 #include "git-compat-util.h"
 #include "pseudo-merge.h"
+#include "date.h"
+#include "oid-array.h"
+#include "strbuf.h"
+#include "config.h"
+#include "string-list.h"
+#include "refs.h"
+#include "pack-bitmap.h"
+#include "commit.h"
+#include "alloc.h"
+#include "progress.h"
+
+#define DEFAULT_PSEUDO_MERGE_DECAY 1.0f
+#define DEFAULT_PSEUDO_MERGE_MAX_MERGES 64
+#define DEFAULT_PSEUDO_MERGE_SAMPLE_RATE 1
+#define DEFAULT_PSEUDO_MERGE_THRESHOLD approxidate("1.week.ago")
+#define DEFAULT_PSEUDO_MERGE_STABLE_THRESHOLD approxidate("1.month.ago")
+#define DEFAULT_PSEUDO_MERGE_STABLE_SIZE 512
+
+static float gitexp(float base, int exp)
+{
+	float result = 1;
+	while (1) {
+		if (exp % 2)
+			result *= base;
+		exp >>= 1;
+		if (!exp)
+			break;
+		base *= base;
+	}
+	return result;
+}
+
+static uint32_t pseudo_merge_group_size(const struct pseudo_merge_group *group,
+					const struct pseudo_merge_matches *matches,
+					uint32_t i)
+{
+	float C = 0.0f;
+	uint32_t n;
+
+	/*
+	 * The size of pseudo-merge groups decays according to a power series,
+	 * which looks like:
+	 *
+	 *   f(n) = C * n^-k
+	 *
+	 * , where 'n' is the n-th pseudo-merge group, 'f(n)' is its size, 'k'
+	 * is the decay rate, and 'C' is a scaling value.
+	 *
+	 * The value of C depends on the number of groups, decay rate, and total
+	 * number of commits. It is computed such that if there are M and N
+	 * total groups and commits, respectively, that:
+	 *
+	 *   N = f(1) + f(2) + ... + f(M)
+	 *
+	 * Rearranging to isolate C, we get:
+	 *
+	 *   N = \sum_{n=1}^M C / n^k
+	 *
+	 *   N / C = \sum_{n=1}^M n^-k
+	 *
+	 *   C = N / \sum_{n=1}^M n^-k
+	 *
+	 * For example, if we have a decay rate of 'k' being equal to 1.5, 'N'
+	 * total commits equal to 10,000, and 'M' being equal to 6 groups, then
+	 * the (rounded) group sizes are:
+	 *
+	 *   { 5469, 1934, 1053, 684, 489, 372 }
+	 *
+	 * increasing the number of total groups, say to 10, scales the group
+	 * sizes appropriately:
+	 *
+	 *   { 5012, 1772, 964, 626, 448, 341, 271, 221, 186, 158 }
+	 */
+	for (n = 0; n < group->max_merges; n++)
+		C += 1.0f / gitexp(n + 1, group->decay);
+	C = matches->unstable_nr / C;
+
+	return (uint32_t)((C / gitexp(i + 1, group->decay)) + 0.5);
+}
+
+static void pseudo_merge_group_init(struct pseudo_merge_group *group)
+{
+	memset(group, 0, sizeof(struct pseudo_merge_group));
+
+	strmap_init_with_options(&group->matches, NULL, 0);
+
+	group->decay = DEFAULT_PSEUDO_MERGE_DECAY;
+	group->max_merges = DEFAULT_PSEUDO_MERGE_MAX_MERGES;
+	group->sample_rate = DEFAULT_PSEUDO_MERGE_SAMPLE_RATE;
+	group->threshold = DEFAULT_PSEUDO_MERGE_THRESHOLD;
+	group->stable_threshold = DEFAULT_PSEUDO_MERGE_STABLE_THRESHOLD;
+	group->stable_size = DEFAULT_PSEUDO_MERGE_STABLE_SIZE;
+}
+
+static int pseudo_merge_config(const char *var, const char *value,
+			       const struct config_context *ctx,
+			       void *cb_data)
+{
+	struct string_list *list = cb_data;
+	struct string_list_item *item;
+	struct pseudo_merge_group *group;
+	struct strbuf buf = STRBUF_INIT;
+	const char *sub, *key;
+	size_t sub_len;
+	int ret = 0;
+
+	if (parse_config_key(var, "bitmappseudomerge", &sub, &sub_len, &key))
+		goto done;
+
+	if (!sub_len)
+		goto done;
+
+	strbuf_add(&buf, sub, sub_len);
+
+	item = string_list_lookup(list, buf.buf);
+	if (!item) {
+		item = string_list_insert(list, buf.buf);
+
+		item->util = xmalloc(sizeof(struct pseudo_merge_group));
+		pseudo_merge_group_init(item->util);
+	}
+
+	group = item->util;
+
+	if (!strcmp(key, "pattern")) {
+		struct strbuf re = STRBUF_INIT;
+
+		free(group->pattern);
+		if (*value != '^')
+			strbuf_addch(&re, '^');
+		strbuf_addstr(&re, value);
+
+		group->pattern = xcalloc(1, sizeof(regex_t));
+		if (regcomp(group->pattern, re.buf, REG_EXTENDED))
+			die(_("failed to load pseudo-merge regex for %s: '%s'"),
+			    sub, re.buf);
+
+		strbuf_release(&re);
+	} else if (!strcmp(key, "decay")) {
+		group->decay = git_config_float(var, value, ctx->kvi);
+		if (group->decay < 0) {
+			warning(_("%s must be non-negative, using default"), var);
+			group->decay = DEFAULT_PSEUDO_MERGE_DECAY;
+		}
+	} else if (!strcmp(key, "samplerate")) {
+		group->sample_rate = git_config_float(var, value, ctx->kvi);
+		if (!(0 <= group->sample_rate && group->sample_rate <= 1)) {
+			warning(_("%s must be between 0 and 1, using default"), var);
+			group->sample_rate = DEFAULT_PSEUDO_MERGE_SAMPLE_RATE;
+		}
+	} else if (!strcmp(key, "threshold")) {
+		if (git_config_expiry_date(&group->threshold, var, value)) {
+			ret = -1;
+			goto done;
+		}
+	} else if (!strcmp(key, "maxmerges")) {
+		group->max_merges = git_config_int(var, value, ctx->kvi);
+		if (group->max_merges < 0) {
+			warning(_("%s must be non-negative, using default"), var);
+			group->max_merges = DEFAULT_PSEUDO_MERGE_MAX_MERGES;
+		}
+	} else if (!strcmp(key, "stablethreshold")) {
+		if (git_config_expiry_date(&group->stable_threshold, var, value)) {
+			ret = -1;
+			goto done;
+		}
+	} else if (!strcmp(key, "stablesize")) {
+		group->stable_size = git_config_int(var, value, ctx->kvi);
+		if (group->stable_size <= 0) {
+			warning(_("%s must be positive, using default"), var);
+			group->stable_size = DEFAULT_PSEUDO_MERGE_STABLE_SIZE;
+		}
+	}
+
+done:
+	strbuf_release(&buf);
+
+	return ret;
+}
+
+void load_pseudo_merges_from_config(struct string_list *list)
+{
+	struct string_list_item *item;
+
+	git_config(pseudo_merge_config, list);
+
+	for_each_string_list_item(item, list) {
+		struct pseudo_merge_group *group = item->util;
+		if (!group->pattern)
+			die(_("pseudo-merge group '%s' missing required pattern"),
+			    item->string);
+		if (group->threshold < group->stable_threshold)
+			die(_("pseudo-merge group '%s' has unstable threshold "
+			      "before stable one"), item->string);
+	}
+}
+
+static int find_pseudo_merge_group_for_ref(const char *refname,
+					   const struct object_id *oid,
+					   int flags UNUSED,
+					   void *_data)
+{
+	struct bitmap_writer *writer = _data;
+	struct object_id peeled;
+	struct commit *c;
+	uint32_t i;
+	int has_bitmap;
+
+	if (!peel_iterated_oid(oid, &peeled))
+		oid = &peeled;
+
+	c = lookup_commit(the_repository, oid);
+	if (!c)
+		return 0;
+
+	has_bitmap = bitmap_writer_has_bitmapped_object_id(writer, oid);
+
+	for (i = 0; i < writer->pseudo_merge_groups.nr; i++) {
+		struct pseudo_merge_group *group;
+		struct pseudo_merge_matches *matches;
+		struct strbuf group_name = STRBUF_INIT;
+		regmatch_t captures[16];
+		size_t j;
+
+		group = writer->pseudo_merge_groups.items[i].util;
+		if (regexec(group->pattern, refname, ARRAY_SIZE(captures),
+			    captures, 0))
+			continue;
+
+		if (captures[ARRAY_SIZE(captures) - 1].rm_so != -1)
+			warning(_("pseudo-merge regex from config has too many capture "
+				  "groups (max=%"PRIuMAX")"),
+				(uintmax_t)ARRAY_SIZE(captures) - 2);
+
+		for (j = !!group->pattern->re_nsub; j < ARRAY_SIZE(captures); j++) {
+			regmatch_t *match = &captures[j];
+			if (match->rm_so == -1)
+				continue;
+
+			if (group_name.len)
+				strbuf_addch(&group_name, '-');
+
+			strbuf_add(&group_name, refname + match->rm_so,
+				   match->rm_eo - match->rm_so);
+		}
+
+		matches = strmap_get(&group->matches, group_name.buf);
+		if (!matches) {
+			matches = xcalloc(1, sizeof(*matches));
+			strmap_put(&group->matches, strbuf_detach(&group_name, NULL),
+				   matches);
+		}
+
+		if (c->date <= group->stable_threshold) {
+			ALLOC_GROW(matches->stable, matches->stable_nr + 1,
+				   matches->stable_alloc);
+			matches->stable[matches->stable_nr++] = c;
+		} else if (c->date <= group->threshold && !has_bitmap) {
+			ALLOC_GROW(matches->unstable, matches->unstable_nr + 1,
+				   matches->unstable_alloc);
+			matches->unstable[matches->unstable_nr++] = c;
+		}
+
+		strbuf_release(&group_name);
+	}
+
+	return 0;
+}
+
+static struct commit *push_pseudo_merge(struct pseudo_merge_group *group)
+{
+	struct commit *merge;
+
+	ALLOC_GROW(group->merges, group->merges_nr + 1, group->merges_alloc);
+
+	merge = alloc_commit_node(the_repository);
+	merge->object.parsed = 1;
+	merge->object.flags |= BITMAP_PSEUDO_MERGE;
+
+	group->merges[group->merges_nr++] = merge;
+
+	return merge;
+}
+
+static struct pseudo_merge_commit_idx *pseudo_merge_idx(kh_oid_map_t *pseudo_merge_commits,
+							const struct object_id *oid)
+
+{
+	struct pseudo_merge_commit_idx *pmc;
+	int hash_ret;
+	khiter_t hash_pos = kh_put_oid_map(pseudo_merge_commits, *oid,
+					   &hash_ret);
+
+	if (hash_ret) {
+		CALLOC_ARRAY(pmc, 1);
+		kh_value(pseudo_merge_commits, hash_pos) = pmc;
+	} else {
+		pmc = kh_value(pseudo_merge_commits, hash_pos);
+	}
+
+	return pmc;
+}
+
+#define MIN_PSEUDO_MERGE_SIZE 8
+
+static void select_pseudo_merges_1(struct bitmap_writer *writer,
+				   struct pseudo_merge_group *group,
+				   struct pseudo_merge_matches *matches)
+{
+	uint32_t i, j;
+	uint32_t stable_merges_nr;
+
+	if (!matches->stable_nr && !matches->unstable_nr)
+		return; /* all tips in this group already have bitmaps */
+
+	stable_merges_nr = matches->stable_nr / group->stable_size;
+	if (matches->stable_nr % group->stable_size)
+		stable_merges_nr++;
+
+	/* make stable_merges_nr pseudo merges for stable commits */
+	for (i = 0, j = 0; i < stable_merges_nr; i++) {
+		struct commit *merge;
+		struct commit_list **p;
+
+		merge = push_pseudo_merge(group);
+		p = &merge->parents;
+
+		/*
+		 * For each pseudo-merge created above, add parents to the
+		 * allocated commit node from the stable set of commits
+		 * (those at least as old as the stable threshold).
+		 */
+		do {
+			struct commit *c;
+			struct pseudo_merge_commit_idx *pmc;
+
+			if (j >= matches->stable_nr)
+				break;
+
+			c = matches->stable[j++];
+			/*
+			 * Here and below, make sure that we keep our mapping of
+			 * commits -> pseudo-merge(s) which include the key'd
+			 * commit up-to-date.
+			 */
+			pmc = pseudo_merge_idx(writer->pseudo_merge_commits,
+					       &c->object.oid);
+
+			ALLOC_GROW(pmc->pseudo_merge, pmc->nr + 1, pmc->alloc);
+
+			pmc->pseudo_merge[pmc->nr++] = writer->pseudo_merges_nr;
+			p = commit_list_append(c, p);
+		} while (j % group->stable_size);
+
+		bitmap_writer_push_commit(writer, merge, 1);
+		writer->pseudo_merges_nr++;
+	}
+
+	/* make up to group->max_merges pseudo merges for unstable commits */
+	for (i = 0, j = 0; i < group->max_merges; i++) {
+		struct commit *merge;
+		struct commit_list **p;
+		uint32_t size, end;
+
+		merge = push_pseudo_merge(group);
+		p = &merge->parents;
+
+		size = pseudo_merge_group_size(group, matches, i);
+		end = size < MIN_PSEUDO_MERGE_SIZE ? matches->unstable_nr : j + size;
+
+		/*
+		 * For each pseudo-merge commit created above, add parents to
+		 * the allocated commit node from the unstable set of commits
+		 * (newer than the stable threshold).
+		 *
+		 * Account for the sample rate, since not every candidate from
+		 * the set of unstable commits will be included as a
+		 * pseudo-merge parent.
+		 */
+		for (; j < end && j < matches->unstable_nr; j++) {
+			struct commit *c = matches->unstable[j];
+			struct pseudo_merge_commit_idx *pmc;
+
+			if (j % (uint32_t)(1.0f / group->sample_rate))
+				continue;
+
+			pmc = pseudo_merge_idx(writer->pseudo_merge_commits,
+					       &c->object.oid);
+
+			ALLOC_GROW(pmc->pseudo_merge, pmc->nr + 1, pmc->alloc);
+
+			pmc->pseudo_merge[pmc->nr++] = writer->pseudo_merges_nr;
+			p = commit_list_append(c, p);
+		}
+
+		bitmap_writer_push_commit(writer, merge, 1);
+		writer->pseudo_merges_nr++;
+		if (end >= matches->unstable_nr)
+			break;
+	}
+}
+
+static int commit_date_cmp(const void *va, const void *vb)
+{
+	timestamp_t a = (*(const struct commit **)va)->date;
+	timestamp_t b = (*(const struct commit **)vb)->date;
+
+	if (a < b)
+		return -1;
+	else if (a > b)
+		return 1;
+	return 0;
+}
+
+static void sort_pseudo_merge_matches(struct pseudo_merge_matches *matches)
+{
+	QSORT(matches->stable, matches->stable_nr, commit_date_cmp);
+	QSORT(matches->unstable, matches->unstable_nr, commit_date_cmp);
+}
+
+void select_pseudo_merges(struct bitmap_writer *writer,
+			  struct commit **commits, size_t commits_nr)
+{
+	struct progress *progress = NULL;
+	uint32_t i;
+
+	if (!writer->pseudo_merge_groups.nr)
+		return;
+
+	if (writer->show_progress)
+		progress = start_progress("Selecting pseudo-merge commits",
+					  writer->pseudo_merge_groups.nr);
+
+	for_each_ref(find_pseudo_merge_group_for_ref, writer);
+
+	for (i = 0; i < writer->pseudo_merge_groups.nr; i++) {
+		struct pseudo_merge_group *group;
+		struct hashmap_iter iter;
+		struct strmap_entry *e;
+
+		group = writer->pseudo_merge_groups.items[i].util;
+		strmap_for_each_entry(&group->matches, &iter, e) {
+			struct pseudo_merge_matches *matches = e->value;
+
+			sort_pseudo_merge_matches(matches);
+
+			select_pseudo_merges_1(writer, group, matches);
+		}
+
+		display_progress(progress, i + 1);
+	}
+
+	stop_progress(&progress);
+}
diff --git a/pseudo-merge.h b/pseudo-merge.h
index cab8ff6960a..cab54daf14b 100644
--- a/pseudo-merge.h
+++ b/pseudo-merge.h
@@ -2,5 +2,99 @@
 #define PSEUDO_MERGE_H
 
 #include "git-compat-util.h"
+#include "strmap.h"
+#include "khash.h"
+#include "ewah/ewok.h"
+
+struct commit;
+struct string_list;
+struct bitmap_index;
+struct bitmap_writer;
+
+/*
+ * A pseudo-merge group tracks the set of non-bitmapped reference tips
+ * that match the given pattern.
+ *
+ * Within those matches, they are further segmented by their capture
+ * groups: the text of consecutive capture groups is joined with '-'
+ * dash characters to name each segment.
+ *
+ * Those groups are then ordered by committer date and partitioned
+ * into individual pseudo-merge(s) according to the decay, max_merges,
+ * sample_rate, and threshold parameters.
+ */
+struct pseudo_merge_group {
+	regex_t *pattern;
+
+	/* capture group(s) -> struct pseudo_merge_matches */
+	struct strmap matches;
+
+	/*
+	 * The individual pseudo-merge(s) that are generated from the
+	 * above array of matches, partitioned according to the below
+	 * parameters.
+	 */
+	struct commit **merges;
+	size_t merges_nr;
+	size_t merges_alloc;
+
+	/*
+	 * Pseudo-merge grouping parameters. See git-config(1) for
+	 * more information.
+	 */
+	float decay;
+	int max_merges;
+	float sample_rate;
+	int stable_size;
+	timestamp_t threshold;
+	timestamp_t stable_threshold;
+};
+
+struct pseudo_merge_matches {
+	struct commit **stable;
+	struct commit **unstable;
+	size_t stable_nr, stable_alloc;
+	size_t unstable_nr, unstable_alloc;
+};
+
+/*
+ * Read the repository's configuration:
+ *
+ *   - bitmapPseudoMerge.<name>.pattern
+ *   - bitmapPseudoMerge.<name>.decay
+ *   - bitmapPseudoMerge.<name>.sampleRate
+ *   - bitmapPseudoMerge.<name>.threshold
+ *   - bitmapPseudoMerge.<name>.maxMerges
+ *   - bitmapPseudoMerge.<name>.stableThreshold
+ *   - bitmapPseudoMerge.<name>.stableSize
+ *
+ * and populates the given `list` with pseudo-merge groups. String
+ * entry keys are the pseudo-merge group names, and the values are
+ * pointers to the pseudo_merge_group structure itself.
+ */
+void load_pseudo_merges_from_config(struct string_list *list);
+
+/*
+ * A pseudo-merge commit index (pseudo_merge_commit_idx) maps a
+ * particular (non-pseudo-merge) commit to the list of pseudo-merge(s)
+ * it appears in.
+ */
+struct pseudo_merge_commit_idx {
+	uint32_t *pseudo_merge;
+	size_t nr, alloc;
+};
+
+/*
+ * Selects pseudo-merges from a list of commits, populating the given
+ * string_list of pseudo-merge groups.
+ *
+ * Populates the pseudo_merge_commits map with a commit_idx
+ * corresponding to each commit in the list. Counts the total number
+ * of pseudo-merges generated.
+ *
+ * Optionally shows a progress meter.
+ */
+void select_pseudo_merges(struct bitmap_writer *writer,
+			  struct commit **commits, size_t commits_nr);
 
 #endif
-- 
2.45.1.175.gbea44add9db
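
The decay formula documented in pseudo_merge_group_size() above can be
tried out in isolation. What follows is only a minimal sketch, not the
patch's code: `int_pow()` and `sketch_group_size()` are hypothetical
names, the decay rate is restricted to non-negative integers to keep
the example free of libm, and doubles are used instead of floats.

```c
#include <stdint.h>

/* Raise base to a non-negative integer power by repeated squaring. */
static double int_pow(double base, unsigned int exp)
{
	double result = 1.0;

	while (exp) {
		if (exp & 1)
			result *= base;
		base *= base;
		exp >>= 1;
	}
	return result;
}

/*
 * Size of the i-th (0-based) of `max_merges` groups when `total`
 * commits are split so that the n-th group (1-based) has size
 * C * n^-k, with k = `decay`. Hypothetical helper: integer decay only.
 */
static uint32_t sketch_group_size(uint32_t total, unsigned int max_merges,
				  unsigned int decay, unsigned int i)
{
	double sum = 0.0;
	unsigned int n;

	if (!max_merges)
		return 0; /* no groups, so nothing to size */

	for (n = 1; n <= max_merges; n++)
		sum += 1.0 / int_pow(n, decay);

	/* C = N / \sum_{n=1}^M n^-k, then round f(i + 1) to nearest. */
	return (uint32_t)((total / sum) / int_pow(i + 1, decay) + 0.5);
}
```

With total = 10000, max_merges = 6 and decay = 1, the six sizes come
out as 4082, 2041, 1361, 1020, 816 and 680, which sum to 10000.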



* [PATCH v3 18/30] pack-bitmap-write.c: write pseudo-merge table
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
                     ` (11 preceding siblings ...)
  2024-05-21 19:02   ` [PATCH v3 17/30] pseudo-merge: implement support for selecting pseudo-merge commits Taylor Blau
@ 2024-05-21 19:02   ` Taylor Blau
  2024-05-21 19:02   ` [PATCH v3 19/30] pack-bitmap: extract `read_bitmap()` function Taylor Blau
                     ` (12 subsequent siblings)
  25 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-21 19:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Now that the pack-bitmap writer machinery understands how to select and
store pseudo-merge commits, teach it how to write the new optional
pseudo-merge .bitmap extension.

No readers yet exist for this new extension to the .bitmap format. The
following commits will take any preparatory step(s) necessary before
then implementing the routines necessary to read this new table.

In the meantime, the new `write_pseudo_merges()` function implements
writing this new format as described by a previous commit in
Documentation/technical/bitmap-format.txt.

Writing this table is fairly straightforward and consists of a few
sub-components:

  - a pair of bitmaps for each pseudo-merge (one for the pseudo-merge
    "parents", and another for the objects reachable from those parents)

  - for each commit, the offset of either (a) the pseudo-merge it
    belongs to, or (b) an extended lookup table if it belongs to >1
    pseudo-merge groups

  - if there are any commits belonging to >1 pseudo-merge group, the
    extended lookup tables (which each consist of the number of
    pseudo-merge groups a commit appears in as a 4-byte unsigned
    value, and then that many 8-byte unsigned offsets)
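
The way a lookup entry distinguishes a direct offset from a pointer into
the extended table can be sketched as follows. This is an illustration
under assumptions, not the writer's code: the helper names and
`EXTENDED_FLAG` are hypothetical, and the only facts taken from the
patch below are that the most-significant bit of the 8-byte value marks
an extended entry, and that each extended entry is a 4-byte count
followed by that many 8-byte offsets.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for the MSB that marks an extended entry. */
#define EXTENDED_FLAG ((uint64_t)1 << 63)

/* Build the 8-byte lookup value; `offset` must not have the MSB set. */
static uint64_t encode_lookup_entry(uint64_t offset, int extended)
{
	return extended ? (offset | EXTENDED_FLAG) : offset;
}

/* Does this entry point into the extended lookup table? */
static int lookup_entry_is_extended(uint64_t entry)
{
	return (entry & EXTENDED_FLAG) != 0;
}

/* Strip the flag to recover the offset itself. */
static uint64_t lookup_entry_offset(uint64_t entry)
{
	return entry & ~EXTENDED_FLAG;
}

/* Bytes used by one extended entry: 4-byte count + nr 8-byte offsets. */
static size_t extended_entry_size(size_t nr)
{
	return sizeof(uint32_t) + nr * sizeof(uint64_t);
}
```

Reserving the top bit as a flag is why the writer has to refuse offsets
that already have that bit set.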

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c | 131 ++++++++++++++++++++++++++++++++++++++++++++
 pack-bitmap.h       |   1 +
 2 files changed, 132 insertions(+)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index d5884ea5e9c..47250398aa2 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -18,6 +18,7 @@
 #include "tree.h"
 #include "tree-walk.h"
 #include "pseudo-merge.h"
+#include "oid-array.h"
 
 struct bitmapped_commit {
 	struct commit *commit;
@@ -771,6 +772,130 @@ static void write_selected_commits_v1(struct bitmap_writer *writer,
 	}
 }
 
+static void write_pseudo_merges(struct bitmap_writer *writer,
+				struct hashfile *f)
+{
+	struct oid_array commits = OID_ARRAY_INIT;
+	struct bitmap **commits_bitmap = NULL;
+	off_t *pseudo_merge_ofs = NULL;
+	off_t start, table_start, next_ext;
+
+	uint32_t base = bitmap_writer_nr_selected_commits(writer);
+	size_t i, j = 0;
+
+	CALLOC_ARRAY(commits_bitmap, writer->pseudo_merges_nr);
+	CALLOC_ARRAY(pseudo_merge_ofs, writer->pseudo_merges_nr);
+
+	for (i = 0; i < writer->pseudo_merges_nr; i++) {
+		struct bitmapped_commit *merge = &writer->selected[base + i];
+		struct commit_list *p;
+
+		if (!merge->pseudo_merge)
+			BUG("found non-pseudo merge commit at %"PRIuMAX, (uintmax_t)i);
+
+		commits_bitmap[i] = bitmap_new();
+
+		for (p = merge->commit->parents; p; p = p->next)
+			bitmap_set(commits_bitmap[i],
+				   find_object_pos(writer, &p->item->object.oid,
+						   NULL));
+	}
+
+	start = hashfile_total(f);
+
+	for (i = 0; i < writer->pseudo_merges_nr; i++) {
+		struct ewah_bitmap *commits_ewah = bitmap_to_ewah(commits_bitmap[i]);
+
+		pseudo_merge_ofs[i] = hashfile_total(f);
+
+		dump_bitmap(f, commits_ewah);
+		dump_bitmap(f, writer->selected[base+i].write_as);
+
+		ewah_free(commits_ewah);
+	}
+
+	next_ext = st_add(hashfile_total(f),
+			  st_mult(kh_size(writer->pseudo_merge_commits),
+				  sizeof(uint64_t)));
+
+	table_start = hashfile_total(f);
+
+	commits.alloc = kh_size(writer->pseudo_merge_commits);
+	CALLOC_ARRAY(commits.oid, commits.alloc);
+
+	for (i = kh_begin(writer->pseudo_merge_commits); i != kh_end(writer->pseudo_merge_commits); i++) {
+		if (!kh_exist(writer->pseudo_merge_commits, i))
+			continue;
+		oid_array_append(&commits, &kh_key(writer->pseudo_merge_commits, i));
+	}
+
+	oid_array_sort(&commits);
+
+	/* write lookup table (non-extended) */
+	for (i = 0; i < commits.nr; i++) {
+		int hash_pos;
+		struct pseudo_merge_commit_idx *c;
+
+		hash_pos = kh_get_oid_map(writer->pseudo_merge_commits,
+					  commits.oid[i]);
+		if (hash_pos == kh_end(writer->pseudo_merge_commits))
+			BUG("could not find pseudo-merge commit %s",
+			    oid_to_hex(&commits.oid[i]));
+
+		c = kh_value(writer->pseudo_merge_commits, hash_pos);
+
+		hashwrite_be32(f, find_object_pos(writer, &commits.oid[i],
+						  NULL));
+		if (c->nr == 1)
+			hashwrite_be64(f, pseudo_merge_ofs[c->pseudo_merge[0]]);
+		else if (c->nr > 1) {
+			if (next_ext & ((uint64_t)1<<63))
+				die(_("too many pseudo-merges"));
+			hashwrite_be64(f, next_ext | ((uint64_t)1<<63));
+			next_ext = st_add3(next_ext,
+					   sizeof(uint32_t),
+					   st_mult(c->nr, sizeof(uint64_t)));
+		} else
+			BUG("expected commit '%s' to have at least one "
+			    "pseudo-merge", oid_to_hex(&commits.oid[i]));
+	}
+
+	/* write lookup table (extended) */
+	for (i = 0; i < commits.nr; i++) {
+		int hash_pos;
+		struct pseudo_merge_commit_idx *c;
+
+		hash_pos = kh_get_oid_map(writer->pseudo_merge_commits,
+					  commits.oid[i]);
+		if (hash_pos == kh_end(writer->pseudo_merge_commits))
+			BUG("could not find pseudo-merge commit %s",
+			    oid_to_hex(&commits.oid[i]));
+
+		c = kh_value(writer->pseudo_merge_commits, hash_pos);
+		if (c->nr == 1)
+			continue;
+
+		hashwrite_be32(f, c->nr);
+		for (j = 0; j < c->nr; j++)
+			hashwrite_be64(f, pseudo_merge_ofs[c->pseudo_merge[j]]);
+	}
+
+	/* write positions for all pseudo merges */
+	for (i = 0; i < writer->pseudo_merges_nr; i++)
+		hashwrite_be64(f, pseudo_merge_ofs[i]);
+
+	hashwrite_be32(f, writer->pseudo_merges_nr);
+	hashwrite_be32(f, kh_size(writer->pseudo_merge_commits));
+	hashwrite_be64(f, table_start - start);
+	hashwrite_be64(f, hashfile_total(f) - start + sizeof(uint64_t));
+
+	for (i = 0; i < writer->pseudo_merges_nr; i++)
+		bitmap_free(commits_bitmap[i]);
+
+	free(pseudo_merge_ofs);
+	free(commits_bitmap);
+}
+
 static int table_cmp(const void *_va, const void *_vb, void *_data)
 {
 	struct bitmap_writer *writer = _data;
@@ -878,6 +1003,9 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
 
 	int fd = odb_mkstemp(&tmp_file, "pack/tmp_bitmap_XXXXXX");
 
+	if (writer->pseudo_merges_nr)
+		options |= BITMAP_OPT_PSEUDO_MERGES;
+
 	f = hashfd(fd, tmp_file.buf);
 
 	memcpy(header.magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE));
@@ -907,6 +1035,9 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
 
 	write_selected_commits_v1(writer, f, offsets);
 
+	if (options & BITMAP_OPT_PSEUDO_MERGES)
+		write_pseudo_merges(writer, f);
+
 	if (options & BITMAP_OPT_LOOKUP_TABLE)
 		write_lookup_table(writer, f, offsets);
 
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 1e730ea1e54..db9ae554fa8 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -37,6 +37,7 @@ enum pack_bitmap_opts {
 	BITMAP_OPT_FULL_DAG = 0x1,
 	BITMAP_OPT_HASH_CACHE = 0x4,
 	BITMAP_OPT_LOOKUP_TABLE = 0x10,
+	BITMAP_OPT_PSEUDO_MERGES = 0x20,
 };
 
 enum pack_bitmap_flags {
-- 
2.45.1.175.gbea44add9db



* [PATCH v3 19/30] pack-bitmap: extract `read_bitmap()` function
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
                     ` (12 preceding siblings ...)
  2024-05-21 19:02   ` [PATCH v3 18/30] pack-bitmap-write.c: write pseudo-merge table Taylor Blau
@ 2024-05-21 19:02   ` Taylor Blau
  2024-05-21 19:02   ` [PATCH v3 20/30] pseudo-merge: scaffolding for reads Taylor Blau
                     ` (11 subsequent siblings)
  25 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-21 19:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

The pack-bitmap machinery uses the `read_bitmap_1()` function to read a
bitmap from within the mmap'd region corresponding to the .bitmap file.
As a side-effect of calling this function, `read_bitmap_1()` increments
the `index->map_pos` variable to reflect the number of bytes read.

Extract the core of this routine to a separate function (that operates
over a `const unsigned char *`, a `size_t` and a `size_t *` pointer)
instead of a `struct bitmap_index *` pointer.

This function (called `read_bitmap()`) is part of the pack-bitmap.h API
so that it can be used within the upcoming portion of the implementation
in pseudo-merge.[ch].

Rewrite the existing function, `read_bitmap_1()`, in terms of its more
generic counterpart.
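
The shape of this refactoring (a generic reader that takes a buffer, its
size, and a cursor, plus a thin wrapper that supplies those from a
struct) can be sketched with a toy decoder. Everything here is a
hypothetical stand-in: `read_be32_at()` plays the role of
`read_bitmap()` but decodes a 4-byte big-endian integer rather than an
EWAH bitmap, and `struct toy_index` stands in for `struct bitmap_index`.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Generic reader: decode one big-endian 32-bit value at *pos and
 * advance *pos past it. Returns 0 on success, or -1 (leaving *pos
 * untouched) if fewer than 4 bytes remain.
 */
static int read_be32_at(const unsigned char *map, size_t map_size,
			size_t *pos, uint32_t *out)
{
	const unsigned char *p;

	if (*pos > map_size || map_size - *pos < 4)
		return -1;

	p = map + *pos;
	*out = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
	*pos += 4;
	return 0;
}

/* Toy stand-in for the index structure that owns the cursor. */
struct toy_index {
	const unsigned char *map;
	size_t map_size;
	size_t map_pos;
};

/* Struct-based wrapper, written in terms of the generic reader. */
static int toy_index_read_be32(struct toy_index *index, uint32_t *out)
{
	return read_be32_at(index->map, index->map_size,
			    &index->map_pos, out);
}
```

Callers that do not have the struct can then keep their own cursor and
call the generic function directly.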

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 24 +++++++++++++++---------
 pack-bitmap.h |  2 ++
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 35c5ef9d3cd..3519edb896b 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -129,17 +129,13 @@ static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
 	return composed;
 }
 
-/*
- * Read a bitmap from the current read position on the mmaped
- * index, and increase the read position accordingly
- */
-static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
+struct ewah_bitmap *read_bitmap(const unsigned char *map,
+				size_t map_size, size_t *map_pos)
 {
 	struct ewah_bitmap *b = ewah_pool_new();
 
-	ssize_t bitmap_size = ewah_read_mmap(b,
-		index->map + index->map_pos,
-		index->map_size - index->map_pos);
+	ssize_t bitmap_size = ewah_read_mmap(b, map + *map_pos,
+					     map_size - *map_pos);
 
 	if (bitmap_size < 0) {
 		error(_("failed to load bitmap index (corrupted?)"));
@@ -147,10 +143,20 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
 		return NULL;
 	}
 
-	index->map_pos += bitmap_size;
+	*map_pos += bitmap_size;
+
 	return b;
 }
 
+/*
+ * Read a bitmap from the current read position on the mmaped
+ * index, and increase the read position accordingly
+ */
+static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
+{
+	return read_bitmap(index->map, index->map_size, &index->map_pos);
+}
+
 static uint32_t bitmap_num_objects(struct bitmap_index *index)
 {
 	if (index->midx)
diff --git a/pack-bitmap.h b/pack-bitmap.h
index db9ae554fa8..21aabf805ea 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -160,4 +160,6 @@ int bitmap_is_preferred_refname(struct repository *r, const char *refname);
 
 int verify_bitmap_files(struct repository *r);
 
+struct ewah_bitmap *read_bitmap(const unsigned char *map,
+				size_t map_size, size_t *map_pos);
 #endif
-- 
2.45.1.175.gbea44add9db



* [PATCH v3 20/30] pseudo-merge: scaffolding for reads
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
                     ` (13 preceding siblings ...)
  2024-05-21 19:02   ` [PATCH v3 19/30] pack-bitmap: extract `read_bitmap()` function Taylor Blau
@ 2024-05-21 19:02   ` Taylor Blau
  2024-05-21 19:02   ` [PATCH v3 21/30] pack-bitmap.c: read pseudo-merge extension Taylor Blau
                     ` (10 subsequent siblings)
  25 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-21 19:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Implement scaffolding within the new pseudo-merge compilation unit
necessary to use the pseudo-merge API from within the pack-bitmap.c
machinery.

The core of this scaffolding is two-fold:

  - The `pseudo_merge` structure itself, which represents an individual
    pseudo-merge bitmap. It has fields for both bitmaps, as well as
    metadata about its position within the memory-mapped region, and
    a few extra bits indicating whether or not it is satisfied, and
    which bitmap(s), if any, have been read, since they are
    initialized lazily.

  - The `pseudo_merge_map` structure, which holds an array of
    pseudo_merges, as well as a pointer to the memory-mapped region
    containing the pseudo-merge serialization from within a .bitmap
    file.

Note that the `bitmap_index` structure is defined statically within the
pack-bitmap.o compilation unit, so we can't take in a `struct
bitmap_index *`. Instead, wrap the primary components necessary to read
the pseudo-merges in this new structure to avoid exposing the
implementation details of the `bitmap_index` structure.
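
The lazy initialization mentioned above (an offset recorded up front,
with a "loaded" bit guarding the actual read) might look roughly like
the sketch below. It is not the series' implementation, which is added
in later patches: `struct lazy_entry` and `lazy_entry_get()` are
hypothetical, and a single byte stands in for a parsed bitmap.

```c
#include <stddef.h>

/* Toy analogue of an entry whose payload is read on first use. */
struct lazy_entry {
	size_t at;           /* offset of the serialized value in the map */
	unsigned char value; /* decoded value, valid only once loaded */
	unsigned loaded : 1; /* set after the first successful read */
};

/*
 * Return a pointer to the decoded value, reading it from `map` on
 * first use only. Returns NULL if `at` lies outside the map.
 */
static const unsigned char *lazy_entry_get(const unsigned char *map,
					   size_t map_size,
					   struct lazy_entry *e)
{
	if (!e->loaded) {
		if (e->at >= map_size)
			return NULL;
		e->value = map[e->at];
		e->loaded = 1;
	}
	return &e->value;
}
```

Once the flag is set, later calls return the cached value without
touching the map again.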

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pseudo-merge.c | 10 ++++++++
 pseudo-merge.h | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+)

diff --git a/pseudo-merge.c b/pseudo-merge.c
index 4be730563eb..1aca70ecdfb 100644
--- a/pseudo-merge.c
+++ b/pseudo-merge.c
@@ -454,3 +454,13 @@ void select_pseudo_merges(struct bitmap_writer *writer,
 
 	stop_progress(&progress);
 }
+
+void free_pseudo_merge_map(struct pseudo_merge_map *pm)
+{
+	uint32_t i;
+	for (i = 0; i < pm->nr; i++) {
+		ewah_pool_free(pm->v[i].commits);
+		ewah_pool_free(pm->v[i].bitmap);
+	}
+	free(pm->v);
+}
diff --git a/pseudo-merge.h b/pseudo-merge.h
index cab54daf14b..a3f0243062c 100644
--- a/pseudo-merge.h
+++ b/pseudo-merge.h
@@ -97,4 +97,69 @@ struct pseudo_merge_commit_idx {
 void select_pseudo_merges(struct bitmap_writer *writer,
 			  struct commit **commits, size_t commits_nr);
 
+/*
+ * Represents a serialized view of a file containing pseudo-merge(s)
+ * (see Documentation/technical/bitmap-format.txt for a specification
+ * of the format).
+ */
+struct pseudo_merge_map {
+	/*
+	 * An array of pseudo-merge(s), lazily loaded from the .bitmap
+	 * file.
+	 */
+	struct pseudo_merge *v;
+	size_t nr;
+	size_t commits_nr;
+
+	/*
+	 * Pointers into a memory-mapped view of the .bitmap file:
+	 *
+	 *   - map: the beginning of the .bitmap file
+	 *   - commits: the beginning of the pseudo-merge commit index
+	 *   - map_size: the size of the .bitmap file
+	 */
+	const unsigned char *map;
+	const unsigned char *commits;
+
+	size_t map_size;
+};
+
+/*
+ * An individual pseudo-merge, storing a pair of lazily-loaded
+ * bitmaps:
+ *
+ *  - commits: the set of commit(s) that are part of the pseudo-merge
+ *  - bitmap: the set of object(s) reachable from the above set of
+ *    commits.
+ *
+ * The `at` and `bitmap_at` fields are used to store the locations of
+ * each of the above bitmaps in the .bitmap file.
+ */
+struct pseudo_merge {
+	struct ewah_bitmap *commits;
+	struct ewah_bitmap *bitmap;
+
+	off_t at;
+	off_t bitmap_at;
+
+	/*
+	 * `satisfied` indicates whether the given pseudo-merge has been
+	 * used.
+	 *
+	 * `loaded_commits` and `loaded_bitmap` indicate whether the
+	 * respective bitmaps have been loaded and read from the
+	 * .bitmap file.
+	 */
+	unsigned satisfied : 1,
+		 loaded_commits : 1,
+		 loaded_bitmap : 1;
+};
+
+/*
+ * Frees the given pseudo-merge map, releasing any memory held by (a)
+ * parsed EWAH bitmaps, or (b) the array of pseudo-merges itself. Does
+ * not free the memory-mapped view of the .bitmap file.
+ */
+void free_pseudo_merge_map(struct pseudo_merge_map *pm);
+
 #endif
-- 
2.45.1.175.gbea44add9db



* [PATCH v3 21/30] pack-bitmap.c: read pseudo-merge extension
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
                     ` (14 preceding siblings ...)
  2024-05-21 19:02   ` [PATCH v3 20/30] pseudo-merge: scaffolding for reads Taylor Blau
@ 2024-05-21 19:02   ` Taylor Blau
  2024-05-21 19:02   ` [PATCH v3 22/30] pseudo-merge: implement support for reading pseudo-merge commits Taylor Blau
                     ` (9 subsequent siblings)
  25 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-21 19:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Now that the scaffolding for reading the pseudo-merge extension has been
laid, teach the pack-bitmap machinery to read the pseudo-merge extension
when present.

Note that pseudo-merges themselves are not yet used during traversal;
that step will be taken by a future commit.

In the meantime, read the table and initialize the pseudo_merge_map
structure introduced by a previous commit. When the pseudo-merge
extension is present, `load_bitmap_header()` performs basic sanity
checks to make sure that the table is well-formed.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
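For illustration only (not part of the patch): the trailer that
load_bitmap_header() reads backwards from index_end can be sketched in
isolation as below. The field order follows the get_be64()/get_be32()
calls in this patch. The names struct pm_trailer, parse_pm_trailer(),
read_be32() and read_be64() are hypothetical, and the two extra bounds
checks (table_size of at least 24 bytes, merges_nr fitting inside the
table) are assumptions that go beyond what the patch itself verifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for git's get_be32()/get_be64(): read big-endian integers. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/* Hypothetical summary of the trailer; not a structure from the series. */
struct pm_trailer {
	uint64_t table_size;  /* size of the whole extension, in bytes */
	uint64_t commits_ofs; /* lookup table offset from the extension start */
	uint32_t commits_nr;  /* entries in the commit lookup table */
	uint32_t merges_nr;   /* number of pseudo-merges */
	const unsigned char *merge_ofs; /* merges_nr big-endian 64-bit offsets */
};

/*
 * Parse the trailer that ends at `end`; `avail` is the number of bytes
 * before `end` that may belong to the extension. Returns 0 on success
 * and -1 if the table cannot fit.
 */
int parse_pm_trailer(const unsigned char *end, size_t avail,
		     struct pm_trailer *out)
{
	assert(end && out);

	if (avail < 24)
		return -1;

	out->table_size = read_be64(end - 8);
	if (out->table_size > avail || out->table_size < 24)
		return -1;

	out->commits_ofs = read_be64(end - 16);
	out->commits_nr = read_be32(end - 20);
	out->merges_nr = read_be32(end - 24);

	/* Assumed extra check: the offset array must fit in the table. */
	if (out->merges_nr > (out->table_size - 24) / 8)
		return -1;

	out->merge_ofs = end - 24 - (size_t)out->merges_nr * 8;
	return 0;
}
```

Reading the fixed-size fields from the end first is what lets the
variable-length array of per-merge offsets be located without any
forward scan.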
 pack-bitmap.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 3519edb896b..fc9c3e2fc43 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -20,6 +20,7 @@
 #include "list-objects-filter-options.h"
 #include "midx.h"
 #include "config.h"
+#include "pseudo-merge.h"
 
 /*
  * An entry on the bitmap index, representing the bitmap for a given
@@ -86,6 +87,9 @@ struct bitmap_index {
 	 */
 	unsigned char *table_lookup;
 
+	/* This contains the pseudo-merge cache within 'map' (if found). */
+	struct pseudo_merge_map pseudo_merges;
+
 	/*
 	 * Extended index.
 	 *
@@ -205,6 +209,41 @@ static int load_bitmap_header(struct bitmap_index *index)
 				index->table_lookup = (void *)(index_end - table_size);
 			index_end -= table_size;
 		}
+
+		if (flags & BITMAP_OPT_PSEUDO_MERGES) {
+			unsigned char *pseudo_merge_ofs;
+			size_t table_size;
+			uint32_t i;
+
+			if (sizeof(table_size) > index_end - index->map - header_size)
+				return error(_("corrupted bitmap index file (too short to fit pseudo-merge table header)"));
+
+			table_size = get_be64(index_end - 8);
+			if (table_size > index_end - index->map - header_size)
+				return error(_("corrupted bitmap index file (too short to fit pseudo-merge table)"));
+
+			if (git_env_bool("GIT_TEST_USE_PSEUDO_MERGES", 1)) {
+				const unsigned char *ext = (index_end - table_size);
+
+				index->pseudo_merges.map = index->map;
+				index->pseudo_merges.map_size = index->map_size;
+				index->pseudo_merges.commits = ext + get_be64(index_end - 16);
+				index->pseudo_merges.commits_nr = get_be32(index_end - 20);
+				index->pseudo_merges.nr = get_be32(index_end - 24);
+
+				CALLOC_ARRAY(index->pseudo_merges.v,
+					     index->pseudo_merges.nr);
+
+				pseudo_merge_ofs = index_end - 24 -
+					(index->pseudo_merges.nr * sizeof(uint64_t));
+				for (i = 0; i < index->pseudo_merges.nr; i++) {
+					index->pseudo_merges.v[i].at = get_be64(pseudo_merge_ofs);
+					pseudo_merge_ofs += sizeof(uint64_t);
+				}
+			}
+
+			index_end -= table_size;
+		}
 	}
 
 	index->entry_count = ntohl(header->entry_count);
-- 
2.45.1.175.gbea44add9db



* [PATCH v3 22/30] pseudo-merge: implement support for reading pseudo-merge commits
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
                     ` (15 preceding siblings ...)
  2024-05-21 19:02   ` [PATCH v3 21/30] pack-bitmap.c: read pseudo-merge extension Taylor Blau
@ 2024-05-21 19:02   ` Taylor Blau
  2024-05-23 10:40     ` Jeff King
  2024-05-21 19:02   ` [PATCH v3 23/30] ewah: implement `ewah_bitmap_popcount()` Taylor Blau
                     ` (8 subsequent siblings)
  25 siblings, 1 reply; 157+ messages in thread
From: Taylor Blau @ 2024-05-21 19:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Implement the basic API for reading pseudo-merge bitmaps, which consists
of four basic functions:

  - pseudo_merge_bitmap()
  - use_pseudo_merge()
  - apply_pseudo_merges_for_commit()
  - cascade_pseudo_merges()

These functions are all documented in pseudo-merge.h, but their rough
descriptions are as follows:

  - pseudo_merge_bitmap() reads and inflates the objects EWAH bitmap for
    a given pseudo-merge

  - use_pseudo_merge() does the same as pseudo_merge_bitmap(), but on
    the commits EWAH bitmap, not the objects bitmap

  - apply_pseudo_merges_for_commit() applies all satisfied pseudo-merge
    commits for a given result set, and cascades any yet-unsatisfied
    pseudo-merges if any were applied in the previous step

  - cascade_pseudo_merges() applies all pseudo-merges which are
    satisfied but have not been previously applied, repeating this
    process until no more pseudo-merges can be applied

The core of the API is the latter two functions, which are responsible
for applying pseudo-merges during the object traversal implemented in
the pack-bitmap machinery.

The other two functions (pseudo_merge_bitmap(), and use_pseudo_merge())
are low-level ways to interact with the pseudo-merge machinery, which
will be useful in future commits.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
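For illustration only (not part of the patch): the "apply, then
cascade" behaviour described above amounts to a fixed-point loop. The
sketch below models it with plain 64-bit masks instead of EWAH bitmaps
and only covers the case where no separate roots bitmap is given. The
names struct toy_merge, toy_apply() and toy_cascade() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy pseudo-merge over a universe of at most 64 objects (an assumption
 * made for illustration; the series uses EWAH-compressed bitmaps).
 */
struct toy_merge {
	uint64_t commits; /* bit positions of the pseudo-merge "parents" */
	uint64_t objects; /* everything reachable from those parents */
	int satisfied;    /* set once the merge has been applied */
};

/* Apply `m` if every one of its parents is already set in `*result`. */
static int toy_apply(struct toy_merge *m, uint64_t *result)
{
	if (m->satisfied)
		return 0;
	if (m->commits & ~*result)
		return 0; /* some parent is missing: not a subset */

	*result |= m->objects;
	m->satisfied = 1;
	return 1;
}

/*
 * Keep sweeping over all pseudo-merges until a full pass applies
 * nothing. Returns the total number of pseudo-merges applied.
 */
int toy_cascade(struct toy_merge *v, size_t nr, uint64_t *result)
{
	int total = 0, progress;

	assert(result && (v || !nr));

	do {
		size_t i;

		progress = 0;
		for (i = 0; i < nr; i++) {
			if (toy_apply(&v[i], result)) {
				progress = 1;
				total++;
			}
		}
	} while (progress);

	return total;
}
```

The outer loop is needed because applying one pseudo-merge can set the
bits that make an earlier entry in the array satisfiable, so a single
pass is not always enough.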
 pseudo-merge.c | 231 +++++++++++++++++++++++++++++++++++++++++++++++++
 pseudo-merge.h |  44 ++++++++++
 2 files changed, 275 insertions(+)

diff --git a/pseudo-merge.c b/pseudo-merge.c
index 1aca70ecdfb..0f50ac6183e 100644
--- a/pseudo-merge.c
+++ b/pseudo-merge.c
@@ -10,6 +10,7 @@
 #include "commit.h"
 #include "alloc.h"
 #include "progress.h"
+#include "hex.h"
 
 #define DEFAULT_PSEUDO_MERGE_DECAY 1.0f
 #define DEFAULT_PSEUDO_MERGE_MAX_MERGES 64
@@ -464,3 +465,233 @@ void free_pseudo_merge_map(struct pseudo_merge_map *pm)
 	}
 	free(pm->v);
 }
+
+struct pseudo_merge_commit_ext {
+	uint32_t nr;
+	const unsigned char *ptr;
+};
+
+static int pseudo_merge_ext_at(const struct pseudo_merge_map *pm,
+			       struct pseudo_merge_commit_ext *ext, size_t at)
+{
+	if (at >= pm->map_size)
+		return error(_("extended pseudo-merge read out-of-bounds "
+			       "(%"PRIuMAX" >= %"PRIuMAX")"),
+			     (uintmax_t)at, (uintmax_t)pm->map_size);
+
+	ext->nr = get_be32(pm->map + at);
+	ext->ptr = pm->map + at + sizeof(uint32_t);
+
+	return 0;
+}
+
+struct ewah_bitmap *pseudo_merge_bitmap(const struct pseudo_merge_map *pm,
+					struct pseudo_merge *merge)
+{
+	if (!merge->loaded_commits)
+		BUG("cannot use unloaded pseudo-merge bitmap");
+
+	if (!merge->loaded_bitmap) {
+		size_t at = merge->bitmap_at;
+
+		merge->bitmap = read_bitmap(pm->map, pm->map_size, &at);
+		merge->loaded_bitmap = 1;
+	}
+
+	return merge->bitmap;
+}
+
+struct pseudo_merge *use_pseudo_merge(const struct pseudo_merge_map *pm,
+				      struct pseudo_merge *merge)
+{
+	if (!merge->loaded_commits) {
+		size_t pos = merge->at;
+
+		merge->commits = read_bitmap(pm->map, pm->map_size, &pos);
+		merge->bitmap_at = pos;
+		merge->loaded_commits = 1;
+	}
+	return merge;
+}
+
+static struct pseudo_merge *pseudo_merge_at(const struct pseudo_merge_map *pm,
+					    struct object_id *oid,
+					    size_t want)
+{
+	size_t lo = 0;
+	size_t hi = pm->nr;
+
+	while (lo < hi) {
+		size_t mi = lo + (hi - lo) / 2;
+		size_t got = pm->v[mi].at;
+
+		if (got == want)
+			return use_pseudo_merge(pm, &pm->v[mi]);
+		else if (got < want)
+			lo = mi + 1;
+		else
+			hi = mi;
+	}
+
+	warning(_("could not find pseudo-merge for commit %s at offset %"PRIuMAX),
+		oid_to_hex(oid), (uintmax_t)want);
+
+	return NULL;
+}
+
+struct pseudo_merge_commit {
+	uint32_t commit_pos;
+	uint64_t pseudo_merge_ofs;
+};
+
+#define PSEUDO_MERGE_COMMIT_RAWSZ (sizeof(uint32_t)+sizeof(uint64_t))
+
+static void read_pseudo_merge_commit_at(struct pseudo_merge_commit *merge,
+					const unsigned char *at)
+{
+	merge->commit_pos = get_be32(at);
+	merge->pseudo_merge_ofs = get_be64(at + sizeof(uint32_t));
+}
+
+static int nth_pseudo_merge_ext(const struct pseudo_merge_map *pm,
+				struct pseudo_merge_commit_ext *ext,
+				struct pseudo_merge_commit *merge,
+				uint32_t n)
+{
+	size_t ofs;
+
+	if (n >= ext->nr)
+		return error(_("extended pseudo-merge lookup out-of-bounds "
+			       "(%"PRIu32" >= %"PRIu32")"), n, ext->nr);
+
+	ofs = get_be64(ext->ptr + st_mult(n, sizeof(uint64_t)));
+	if (ofs >= pm->map_size)
+		return error(_("out-of-bounds read: (%"PRIuMAX" >= %"PRIuMAX")"),
+			     (uintmax_t)ofs, (uintmax_t)pm->map_size);
+
+	read_pseudo_merge_commit_at(merge, pm->map + ofs);
+
+	return 0;
+}
+
+static unsigned apply_pseudo_merge(const struct pseudo_merge_map *pm,
+				   struct pseudo_merge *merge,
+				   struct bitmap *result,
+				   struct bitmap *roots)
+{
+	if (merge->satisfied)
+		return 0;
+
+	if (!ewah_bitmap_is_subset(merge->commits, roots ? roots : result))
+		return 0;
+
+	bitmap_or_ewah(result, pseudo_merge_bitmap(pm, merge));
+	if (roots)
+		bitmap_or_ewah(roots, pseudo_merge_bitmap(pm, merge));
+	merge->satisfied = 1;
+
+	return 1;
+}
+
+static int pseudo_merge_commit_cmp(const void *va, const void *vb)
+{
+	struct pseudo_merge_commit merge;
+	uint32_t key = *(uint32_t*)va;
+
+	read_pseudo_merge_commit_at(&merge, vb);
+
+	if (key < merge.commit_pos)
+		return -1;
+	if (key > merge.commit_pos)
+		return 1;
+	return 0;
+}
+
+static struct pseudo_merge_commit *find_pseudo_merge(const struct pseudo_merge_map *pm,
+						     uint32_t pos)
+{
+	if (!pm->commits_nr)
+		return NULL;
+
+	return bsearch(&pos, pm->commits, pm->commits_nr,
+		       PSEUDO_MERGE_COMMIT_RAWSZ, pseudo_merge_commit_cmp);
+}
+
+int apply_pseudo_merges_for_commit(const struct pseudo_merge_map *pm,
+				   struct bitmap *result,
+				   struct commit *commit, uint32_t commit_pos)
+{
+	struct pseudo_merge *merge;
+	struct pseudo_merge_commit *merge_commit;
+	int ret = 0;
+
+	merge_commit = find_pseudo_merge(pm, commit_pos);
+	if (!merge_commit)
+		return 0;
+
+	if (merge_commit->pseudo_merge_ofs & ((uint64_t)1<<63)) {
+		struct pseudo_merge_commit_ext ext = { 0 };
+		off_t ofs = merge_commit->pseudo_merge_ofs & ~((uint64_t)1<<63);
+		uint32_t i;
+
+		if (pseudo_merge_ext_at(pm, &ext, ofs) < 0) {
+			warning(_("could not read extended pseudo-merge table "
+				  "for commit %s"),
+				oid_to_hex(&commit->object.oid));
+			return ret;
+		}
+
+		for (i = 0; i < ext.nr; i++) {
+			if (nth_pseudo_merge_ext(pm, &ext, merge_commit, i) < 0)
+				return ret;
+
+			merge = pseudo_merge_at(pm, &commit->object.oid,
+						merge_commit->pseudo_merge_ofs);
+
+			if (!merge)
+				return ret;
+
+			if (apply_pseudo_merge(pm, merge, result, NULL))
+				ret++;
+		}
+	} else {
+		merge = pseudo_merge_at(pm, &commit->object.oid,
+					merge_commit->pseudo_merge_ofs);
+
+		if (!merge)
+			return ret;
+
+		if (apply_pseudo_merge(pm, merge, result, NULL))
+			ret++;
+	}
+
+	if (ret)
+		cascade_pseudo_merges(pm, result, NULL);
+
+	return ret;
+}
+
+int cascade_pseudo_merges(const struct pseudo_merge_map *pm,
+			  struct bitmap *result,
+			  struct bitmap *roots)
+{
+	unsigned any_satisfied;
+	int ret = 0;
+
+	do {
+		struct pseudo_merge *merge;
+		uint32_t i;
+
+		any_satisfied = 0;
+
+		for (i = 0; i < pm->nr; i++) {
+			merge = use_pseudo_merge(pm, &pm->v[i]);
+			if (apply_pseudo_merge(pm, merge, result, roots)) {
+				any_satisfied |= 1;
+				ret++;
+			}
+		}
+	} while (any_satisfied);
+
+	return ret;
+}
diff --git a/pseudo-merge.h b/pseudo-merge.h
index a3f0243062c..c00b622be4b 100644
--- a/pseudo-merge.h
+++ b/pseudo-merge.h
@@ -162,4 +162,48 @@ struct pseudo_merge {
  */
 void free_pseudo_merge_map(struct pseudo_merge_map *pm);
 
+/*
+ * Loads the bitmap corresponding to the given pseudo-merge from the
+ * map, if it has not already been loaded.
+ */
+struct ewah_bitmap *pseudo_merge_bitmap(const struct pseudo_merge_map *pm,
+					struct pseudo_merge *merge);
+
+/*
+ * Loads the pseudo-merge and its commits bitmap from the given
+ * pseudo-merge map, if it has not already been loaded.
+ */
+struct pseudo_merge *use_pseudo_merge(const struct pseudo_merge_map *pm,
+				      struct pseudo_merge *merge);
+
+/*
+ * Applies pseudo-merge(s) containing the given commit to the bitmap
+ * "result".
+ *
+ * If any pseudo-merge(s) were satisfied, returns the number
+ * satisfied, otherwise returns 0. If any were satisfied, the
+ * remaining unsatisfied pseudo-merges are cascaded (see below).
+ */
+int apply_pseudo_merges_for_commit(const struct pseudo_merge_map *pm,
+				   struct bitmap *result,
+				   struct commit *commit, uint32_t commit_pos);
+
+/*
+ * Applies pseudo-merge(s) which are satisfied according to the
+ * current bitmap in result (or roots, see below). If any
+ * pseudo-merges were satisfied, repeat the process over unsatisfied
+ * pseudo-merge commits until no more pseudo-merges are satisfied.
+ *
+ * Result is the bitmap to which the pseudo-merge(s) are applied.
+ * Roots (if given) is a bitmap of the traversal tip(s) for either
+ * side of a reachability traversal.
+ *
+ * Roots may be given instead of a populated results bitmap at the
+ * beginning of a traversal on either side where the reachability
+ * closure over tips is not yet known.
+ */
+int cascade_pseudo_merges(const struct pseudo_merge_map *pm,
+			  struct bitmap *result,
+			  struct bitmap *roots);
+
 #endif
-- 
2.45.1.175.gbea44add9db



* [PATCH v3 23/30] ewah: implement `ewah_bitmap_popcount()`
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
                     ` (16 preceding siblings ...)
  2024-05-21 19:02   ` [PATCH v3 22/30] pseudo-merge: implement support for reading pseudo-merge commits Taylor Blau
@ 2024-05-21 19:02   ` Taylor Blau
  2024-05-21 19:02   ` [PATCH v3 24/30] pack-bitmap: implement test helpers for pseudo-merge Taylor Blau
                     ` (7 subsequent siblings)
  25 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-21 19:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Some of the pseudo-merge test helpers (which will be introduced in the
following commit) will want to indicate the total number of commits in
or objects reachable from a pseudo-merge.

Implement a popcount() function that operates on EWAH bitmaps to quickly
determine how many bits are set in each of the respective bitmaps.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
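For illustration only (not part of the patch): counting set bits word by
word looks roughly like the sketch below. It assumes the bitmap has
already been decompressed into plain 64-bit words, which is what
ewah_iterator_next() hands back one word at a time. The names
popcount64() and words_popcount() are hypothetical, and the bit-clearing
loop merely stands in for ewah_bit_popcount64().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable popcount: each iteration clears the lowest set bit. */
static unsigned popcount64(uint64_t x)
{
	unsigned n = 0;

	while (x) {
		x &= x - 1;
		n++;
	}
	return n;
}

/* Sum the number of set bits across `nr` decompressed words. */
size_t words_popcount(const uint64_t *words, size_t nr)
{
	size_t count = 0, i;

	assert(words || !nr);

	for (i = 0; i < nr; i++)
		count += popcount64(words[i]);

	return count;
}
```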
 ewah/bitmap.c | 14 ++++++++++++++
 ewah/ewok.h   |  1 +
 2 files changed, 15 insertions(+)

diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index d352fec54ce..dc2ca190f12 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -212,6 +212,20 @@ size_t bitmap_popcount(struct bitmap *self)
 	return count;
 }
 
+size_t ewah_bitmap_popcount(struct ewah_bitmap *self)
+{
+	struct ewah_iterator it;
+	eword_t word;
+	size_t count = 0;
+
+	ewah_iterator_init(&it, self);
+
+	while (ewah_iterator_next(&word, &it))
+		count += ewah_bit_popcount64(word);
+
+	return count;
+}
+
 int bitmap_is_empty(struct bitmap *self)
 {
 	size_t i;
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 2b6c4ac499c..7074a6347b7 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -195,6 +195,7 @@ void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other);
 void bitmap_or(struct bitmap *self, const struct bitmap *other);
 
 size_t bitmap_popcount(struct bitmap *self);
+size_t ewah_bitmap_popcount(struct ewah_bitmap *self);
 int bitmap_is_empty(struct bitmap *self);
 
 #endif
-- 
2.45.1.175.gbea44add9db



* [PATCH v3 24/30] pack-bitmap: implement test helpers for pseudo-merge
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
                     ` (17 preceding siblings ...)
  2024-05-21 19:02   ` [PATCH v3 23/30] ewah: implement `ewah_bitmap_popcount()` Taylor Blau
@ 2024-05-21 19:02   ` Taylor Blau
  2024-05-21 19:02   ` [PATCH v3 25/30] t/test-lib-functions.sh: support `--date` in `test_commit_bulk()` Taylor Blau
                     ` (6 subsequent siblings)
  25 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-21 19:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Implement three new sub-commands for the "bitmap" test-helper:

  - t/helper test-tool bitmap dump-pseudo-merges
  - t/helper test-tool bitmap dump-pseudo-merge-commits <n>
  - t/helper test-tool bitmap dump-pseudo-merge-objects <n>

These three helpers dump the list of pseudo-merges, the "parents" of the
nth pseudo-merge, and the set of objects reachable from those parents,
respectively.

These helpers will be useful in subsequent patches when we add test
coverage for pseudo-merge bitmaps.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
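For illustration only (not part of the patch): the two dump helpers that
take <n> print one object ID per set bit, and the bit-walking loop they
share can be tried in isolation with the sketch below. It uses plain
64-bit words instead of an EWAH iterator and collects bit positions
rather than printing object IDs. The names ctz64() and
set_bit_positions() are hypothetical; ctz64() stands in for
ewah_bit_ctz64().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count trailing zero bits; the argument must be non-zero. */
static unsigned ctz64(uint64_t x)
{
	unsigned n = 0;

	assert(x != 0);
	while (!(x & 1)) {
		x >>= 1;
		n++;
	}
	return n;
}

/*
 * Store the positions of all set bits in `out` (up to `out_cap` entries)
 * and return how many bits are set in total.
 */
size_t set_bit_positions(const uint64_t *words, size_t nr_words,
			 uint32_t *out, size_t out_cap)
{
	size_t n = 0, w;

	for (w = 0; w < nr_words; w++) {
		uint64_t word = words[w];
		unsigned offset;

		for (offset = 0; offset < 64; offset++) {
			if (!(word >> offset))
				break; /* no set bits left in this word */

			/* Jump straight to the next set bit. */
			offset += ctz64(word >> offset);

			if (n < out_cap)
				out[n] = (uint32_t)(w * 64 + offset);
			n++;
		}
	}

	return n;
}
```

Skipping ahead with a count-trailing-zeros step keeps the inner loop
proportional to the number of set bits rather than to the word width.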
 pack-bitmap.c          | 126 +++++++++++++++++++++++++++++++++++++++++
 pack-bitmap.h          |   3 +
 t/helper/test-bitmap.c |  34 ++++++++---
 3 files changed, 156 insertions(+), 7 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index fc9c3e2fc43..c13074673af 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2443,6 +2443,132 @@ int test_bitmap_hashes(struct repository *r)
 	return 0;
 }
 
+static void bit_pos_to_object_id(struct bitmap_index *bitmap_git,
+				 uint32_t bit_pos,
+				 struct object_id *oid)
+{
+	uint32_t index_pos;
+
+	if (bitmap_is_midx(bitmap_git))
+		index_pos = pack_pos_to_midx(bitmap_git->midx, bit_pos);
+	else
+		index_pos = pack_pos_to_index(bitmap_git->pack, bit_pos);
+
+	nth_bitmap_object_oid(bitmap_git, oid, index_pos);
+}
+
+int test_bitmap_pseudo_merges(struct repository *r)
+{
+	struct bitmap_index *bitmap_git;
+	uint32_t i;
+
+	bitmap_git = prepare_bitmap_git(r);
+	if (!bitmap_git || !bitmap_git->pseudo_merges.nr)
+		goto cleanup;
+
+	for (i = 0; i < bitmap_git->pseudo_merges.nr; i++) {
+		struct pseudo_merge *merge;
+		struct ewah_bitmap *commits_bitmap, *merge_bitmap;
+
+		merge = use_pseudo_merge(&bitmap_git->pseudo_merges,
+					 &bitmap_git->pseudo_merges.v[i]);
+		commits_bitmap = merge->commits;
+		merge_bitmap = pseudo_merge_bitmap(&bitmap_git->pseudo_merges,
+						   merge);
+
+		printf("at=%"PRIuMAX", commits=%"PRIuMAX", objects=%"PRIuMAX"\n",
+		       (uintmax_t)merge->at,
+		       (uintmax_t)ewah_bitmap_popcount(commits_bitmap),
+		       (uintmax_t)ewah_bitmap_popcount(merge_bitmap));
+	}
+
+cleanup:
+	free_bitmap_index(bitmap_git);
+	return 0;
+}
+
+static void dump_ewah_object_ids(struct bitmap_index *bitmap_git,
+				 struct ewah_bitmap *bitmap)
+
+{
+	struct ewah_iterator it;
+	eword_t word;
+	uint32_t pos = 0;
+
+	ewah_iterator_init(&it, bitmap);
+
+	while (ewah_iterator_next(&word, &it)) {
+		struct object_id oid;
+		uint32_t offset;
+
+		for (offset = 0; offset < BITS_IN_EWORD; offset++) {
+			if (!(word >> offset))
+				break;
+
+			offset += ewah_bit_ctz64(word >> offset);
+
+			bit_pos_to_object_id(bitmap_git, pos + offset, &oid);
+			printf("%s\n", oid_to_hex(&oid));
+		}
+		pos += BITS_IN_EWORD;
+	}
+}
+
+int test_bitmap_pseudo_merge_commits(struct repository *r, uint32_t n)
+{
+	struct bitmap_index *bitmap_git;
+	struct pseudo_merge *merge;
+	int ret = 0;
+
+	bitmap_git = prepare_bitmap_git(r);
+	if (!bitmap_git || !bitmap_git->pseudo_merges.nr)
+		goto cleanup;
+
+	if (n >= bitmap_git->pseudo_merges.nr) {
+		ret = error(_("pseudo-merge index out of range "
+			      "(%"PRIu32" >= %"PRIuMAX")"),
+			    n, (uintmax_t)bitmap_git->pseudo_merges.nr);
+		goto cleanup;
+	}
+
+	merge = use_pseudo_merge(&bitmap_git->pseudo_merges,
+				 &bitmap_git->pseudo_merges.v[n]);
+	dump_ewah_object_ids(bitmap_git, merge->commits);
+
+cleanup:
+	free_bitmap_index(bitmap_git);
+	return ret;
+}
+
+int test_bitmap_pseudo_merge_objects(struct repository *r, uint32_t n)
+{
+	struct bitmap_index *bitmap_git;
+	struct pseudo_merge *merge;
+	int ret = 0;
+
+	bitmap_git = prepare_bitmap_git(r);
+	if (!bitmap_git || !bitmap_git->pseudo_merges.nr)
+		goto cleanup;
+
+	if (n >= bitmap_git->pseudo_merges.nr) {
+		ret = error(_("pseudo-merge index out of range "
+			      "(%"PRIu32" >= %"PRIuMAX")"),
+			    n, (uintmax_t)bitmap_git->pseudo_merges.nr);
+		goto cleanup;
+	}
+
+	merge = use_pseudo_merge(&bitmap_git->pseudo_merges,
+				 &bitmap_git->pseudo_merges.v[n]);
+
+	dump_ewah_object_ids(bitmap_git,
+			     pseudo_merge_bitmap(&bitmap_git->pseudo_merges,
+						 merge));
+
+cleanup:
+	free_bitmap_index(bitmap_git);
+	return ret;
+}
+
 int rebuild_bitmap(const uint32_t *reposition,
 		   struct ewah_bitmap *source,
 		   struct bitmap *dest)
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 21aabf805ea..4466b5ad0fb 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -73,6 +73,9 @@ void traverse_bitmap_commit_list(struct bitmap_index *,
 void test_bitmap_walk(struct rev_info *revs);
 int test_bitmap_commits(struct repository *r);
 int test_bitmap_hashes(struct repository *r);
+int test_bitmap_pseudo_merges(struct repository *r);
+int test_bitmap_pseudo_merge_commits(struct repository *r, uint32_t n);
+int test_bitmap_pseudo_merge_objects(struct repository *r, uint32_t n);
 
 #define GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL \
 	"GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL"
diff --git a/t/helper/test-bitmap.c b/t/helper/test-bitmap.c
index af43ee1cb5e..6af2b42678f 100644
--- a/t/helper/test-bitmap.c
+++ b/t/helper/test-bitmap.c
@@ -13,21 +13,41 @@ static int bitmap_dump_hashes(void)
 	return test_bitmap_hashes(the_repository);
 }
 
+static int bitmap_dump_pseudo_merges(void)
+{
+	return test_bitmap_pseudo_merges(the_repository);
+}
+
+static int bitmap_dump_pseudo_merge_commits(uint32_t n)
+{
+	return test_bitmap_pseudo_merge_commits(the_repository, n);
+}
+
+static int bitmap_dump_pseudo_merge_objects(uint32_t n)
+{
+	return test_bitmap_pseudo_merge_objects(the_repository, n);
+}
+
 int cmd__bitmap(int argc, const char **argv)
 {
 	setup_git_directory();
 
-	if (argc != 2)
-		goto usage;
-
-	if (!strcmp(argv[1], "list-commits"))
+	if (argc == 2 && !strcmp(argv[1], "list-commits"))
 		return bitmap_list_commits();
-	if (!strcmp(argv[1], "dump-hashes"))
+	if (argc == 2 && !strcmp(argv[1], "dump-hashes"))
 		return bitmap_dump_hashes();
+	if (argc == 2 && !strcmp(argv[1], "dump-pseudo-merges"))
+		return bitmap_dump_pseudo_merges();
+	if (argc == 3 && !strcmp(argv[1], "dump-pseudo-merge-commits"))
+		return bitmap_dump_pseudo_merge_commits(atoi(argv[2]));
+	if (argc == 3 && !strcmp(argv[1], "dump-pseudo-merge-objects"))
+		return bitmap_dump_pseudo_merge_objects(atoi(argv[2]));
 
-usage:
 	usage("\ttest-tool bitmap list-commits\n"
-	      "\ttest-tool bitmap dump-hashes");
+	      "\ttest-tool bitmap dump-hashes\n"
+	      "\ttest-tool bitmap dump-pseudo-merges\n"
+	      "\ttest-tool bitmap dump-pseudo-merge-commits <n>\n"
+	      "\ttest-tool bitmap dump-pseudo-merge-objects <n>");
 
 	return -1;
 }
-- 
2.45.1.175.gbea44add9db



* [PATCH v3 25/30] t/test-lib-functions.sh: support `--date` in `test_commit_bulk()`
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
                     ` (18 preceding siblings ...)
  2024-05-21 19:02   ` [PATCH v3 24/30] pack-bitmap: implement test helpers for pseudo-merge Taylor Blau
@ 2024-05-21 19:02   ` Taylor Blau
  2024-05-23 10:42     ` Jeff King
  2024-05-21 19:03   ` [PATCH v3 26/30] pack-bitmap.c: use pseudo-merges during traversal Taylor Blau
                     ` (5 subsequent siblings)
  25 siblings, 1 reply; 157+ messages in thread
From: Taylor Blau @ 2024-05-21 19:02 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

One of the tests we'll want to add for pseudo-merge bitmaps needs to be
able to generate a large number of commits at a specific date.

Support the `--date` option (with identical semantics to the `--date`
option for `test_commit()`) within `test_commit_bulk` as a prerequisite
for that.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/test-lib-functions.sh | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 862d80c9748..16fd585e34b 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -458,6 +458,7 @@ test_commit_bulk () {
 	indir=.
 	ref=HEAD
 	n=1
+	notick=
 	message='commit %s'
 	filename='%s.t'
 	contents='content %s'
@@ -488,6 +489,12 @@ test_commit_bulk () {
 			filename="${1#--*=}-%s.t"
 			contents="${1#--*=} %s"
 			;;
+		--date)
+			notick=yes
+			GIT_COMMITTER_DATE="$2"
+			GIT_AUTHOR_DATE="$2"
+			shift
+			;;
 		-*)
 			BUG "invalid test_commit_bulk option: $1"
 			;;
@@ -507,7 +514,10 @@ test_commit_bulk () {
 
 	while test "$total" -gt 0
 	do
-		test_tick &&
+		if test -z "$notick"
+		then
+			test_tick
+		fi &&
 		echo "commit $ref"
 		printf 'author %s <%s> %s\n' \
 			"$GIT_AUTHOR_NAME" \
-- 
2.45.1.175.gbea44add9db



* [PATCH v3 26/30] pack-bitmap.c: use pseudo-merges during traversal
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
                     ` (19 preceding siblings ...)
  2024-05-21 19:02   ` [PATCH v3 25/30] t/test-lib-functions.sh: support `--date` in `test_commit_bulk()` Taylor Blau
@ 2024-05-21 19:03   ` Taylor Blau
  2024-05-23 10:48     ` Jeff King
  2024-05-21 19:03   ` [PATCH v3 27/30] pack-bitmap: extra trace2 information Taylor Blau
                     ` (4 subsequent siblings)
  25 siblings, 1 reply; 157+ messages in thread
From: Taylor Blau @ 2024-05-21 19:03 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Now that all of the groundwork has been laid to support reading and
using pseudo-merges, make use of that work in this commit by teaching
the pack-bitmap machinery to use pseudo-merge(s) when available during
traversal.

The basic operation is as follows:

  - When enumerating objects on either side of a reachability query,
    first see if any subset of the roots satisfies some pseudo-merge
    bitmap. If it does, apply that pseudo-merge bitmap.

  - If any pseudo-merge bitmap(s) were applied in the previous step, OR
    them into the result[^1]. Then repeat the process over all
    pseudo-merge bitmaps (we'll refer to this as "cascading"
    pseudo-merges). Once this is done, OR in the resulting bitmap.

  - If there is no fill-in traversal to be done, return the bitmap for
    that side of the reachability query. If there is fill-in traversal,
    then for each commit we encounter via show_commit(), check to see if
    any unsatisfied pseudo-merge containing that commit as one of its
    parents has been made satisfied by the presence of that commit.

    If so, OR in the object set from that pseudo-merge bitmap, and then
    cascade. If not, continue traversal.

A similar implementation is present in the boundary-based bitmap
traversal routines.

[^1]: Importantly, we cannot OR in the entire set of roots along with
  the objects reachable from whatever pseudo-merge bitmaps were
  satisfied.  This may leave some dangling bits corresponding to any
  unsatisfied root(s) getting OR'd into the resulting bitmap, tricking
  other parts of the traversal into thinking we already have a
  reachability closure over those commit(s) when we do not.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
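For illustration only (not part of the patch): the point made in the
footnote can be seen with a toy model that uses 64-bit masks in place of
real bitmaps. Pseudo-merges are tested against the roots, but only the
objects of satisfied pseudo-merges reach the result, so a root that no
pseudo-merge covers keeps its bit clear there and is still handled by
the later steps. The names struct toy_merge and toy_seed_from_roots()
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_merge {
	uint64_t commits; /* bits of the tips the pseudo-merge covers */
	uint64_t objects; /* bits of everything reachable from them */
	int satisfied;
};

/*
 * Seed `*result` from the traversal tips in `*roots`. A pseudo-merge
 * applies when all of its commits are set in `*roots`; its objects are
 * then OR'd into both bitmaps so that further pseudo-merges can cascade.
 * Bits of roots that no pseudo-merge covers are deliberately NOT copied
 * into `*result`. Returns the number of pseudo-merges applied.
 */
int toy_seed_from_roots(struct toy_merge *v, size_t nr,
			uint64_t *roots, uint64_t *result)
{
	int total = 0, progress;

	assert(roots && result);

	do {
		size_t i;

		progress = 0;
		for (i = 0; i < nr; i++) {
			if (v[i].satisfied || (v[i].commits & ~*roots))
				continue;

			*result |= v[i].objects;
			*roots |= v[i].objects;
			v[i].satisfied = 1;
			progress = 1;
			total++;
		}
	} while (progress);

	return total;
}
```

With roots A, B and C (bits 0, 1 and 2) and a single pseudo-merge over
A and B, the result ends up holding only that pseudo-merge's objects.
Bit 2 stays clear, which is what tells the rest of the traversal that C
has no reachability closure yet.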
 pack-bitmap.c                   | 112 ++++++++++-
 t/t5333-pseudo-merge-bitmaps.sh | 323 ++++++++++++++++++++++++++++++++
 2 files changed, 434 insertions(+), 1 deletion(-)
 create mode 100755 t/t5333-pseudo-merge-bitmaps.sh

diff --git a/pack-bitmap.c b/pack-bitmap.c
index c13074673af..e61058dada6 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -114,6 +114,9 @@ struct bitmap_index {
 	unsigned int version;
 };
 
+static int pseudo_merges_satisfied_nr;
+static int pseudo_merges_cascades_nr;
+
 static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
 {
 	struct ewah_bitmap *parent;
@@ -1006,6 +1009,22 @@ static void show_commit(struct commit *commit UNUSED,
 {
 }
 
+static unsigned apply_pseudo_merges_for_commit_1(struct bitmap_index *bitmap_git,
+						 struct bitmap *result,
+						 struct commit *commit,
+						 uint32_t commit_pos)
+{
+	int ret;
+
+	ret = apply_pseudo_merges_for_commit(&bitmap_git->pseudo_merges,
+					     result, commit, commit_pos);
+
+	if (ret)
+		pseudo_merges_satisfied_nr += ret;
+
+	return ret;
+}
+
 static int add_to_include_set(struct bitmap_index *bitmap_git,
 			      struct include_data *data,
 			      struct commit *commit,
@@ -1026,6 +1045,10 @@ static int add_to_include_set(struct bitmap_index *bitmap_git,
 	}
 
 	bitmap_set(data->base, bitmap_pos);
+	if (apply_pseudo_merges_for_commit_1(bitmap_git, data->base, commit,
+					     bitmap_pos))
+		return 0;
+
 	return 1;
 }
 
@@ -1151,6 +1174,20 @@ static void show_boundary_object(struct object *object UNUSED,
 	BUG("should not be called");
 }
 
+static unsigned cascade_pseudo_merges_1(struct bitmap_index *bitmap_git,
+					struct bitmap *result,
+					struct bitmap *roots)
+{
+	int ret = cascade_pseudo_merges(&bitmap_git->pseudo_merges,
+					result, roots);
+	if (ret) {
+		pseudo_merges_cascades_nr++;
+		pseudo_merges_satisfied_nr += ret;
+	}
+
+	return ret;
+}
+
 static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 					    struct rev_info *revs,
 					    struct object_list *roots)
@@ -1160,6 +1197,7 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 	unsigned int i;
 	unsigned int tmp_blobs, tmp_trees, tmp_tags;
 	int any_missing = 0;
+	int existing_bitmaps = 0;
 
 	cb.bitmap_git = bitmap_git;
 	cb.base = bitmap_new();
@@ -1167,6 +1205,25 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 
 	revs->ignore_missing_links = 1;
 
+	if (bitmap_git->pseudo_merges.nr) {
+		struct bitmap *roots_bitmap = bitmap_new();
+		struct object_list *objects = NULL;
+
+		for (objects = roots; objects; objects = objects->next) {
+			struct object *object = objects->item;
+			int pos;
+
+			pos = bitmap_position(bitmap_git, &object->oid);
+			if (pos < 0)
+				continue;
+
+			bitmap_set(roots_bitmap, pos);
+		}
+
+		if (!cascade_pseudo_merges_1(bitmap_git, cb.base, roots_bitmap))
+			bitmap_free(roots_bitmap);
+	}
+
 	/*
 	 * OR in any existing reachability bitmaps among `roots` into
 	 * `cb.base`.
@@ -1178,8 +1235,10 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 			continue;
 
 		if (add_commit_to_bitmap(bitmap_git, &cb.base,
-					 (struct commit *)object))
+					 (struct commit *)object)) {
+			existing_bitmaps = 1;
 			continue;
+		}
 
 		any_missing = 1;
 	}
@@ -1187,6 +1246,9 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 	if (!any_missing)
 		goto cleanup;
 
+	if (existing_bitmaps)
+		cascade_pseudo_merges_1(bitmap_git, cb.base, NULL);
+
 	tmp_blobs = revs->blob_objects;
 	tmp_trees = revs->tree_objects;
 	tmp_tags = revs->blob_objects;
@@ -1242,6 +1304,13 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 	return cb.base;
 }
 
+static void unsatisfy_all_pseudo_merges(struct bitmap_index *bitmap_git)
+{
+	uint32_t i;
+	for (i = 0; i < bitmap_git->pseudo_merges.nr; i++)
+		bitmap_git->pseudo_merges.v[i].satisfied = 0;
+}
+
 static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 				   struct rev_info *revs,
 				   struct object_list *roots,
@@ -1249,9 +1318,32 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 {
 	struct bitmap *base = NULL;
 	int needs_walk = 0;
+	unsigned existing_bitmaps = 0;
 
 	struct object_list *not_mapped = NULL;
 
+	unsatisfy_all_pseudo_merges(bitmap_git);
+
+	if (bitmap_git->pseudo_merges.nr) {
+		struct bitmap *roots_bitmap = bitmap_new();
+		struct object_list *objects = NULL;
+
+		for (objects = roots; objects; objects = objects->next) {
+			struct object *object = objects->item;
+			int pos;
+
+			pos = bitmap_position(bitmap_git, &object->oid);
+			if (pos < 0)
+				continue;
+
+			bitmap_set(roots_bitmap, pos);
+		}
+
+		base = bitmap_new();
+		if (!cascade_pseudo_merges_1(bitmap_git, base, roots_bitmap))
+			bitmap_free(roots_bitmap);
+	}
+
 	/*
 	 * Go through all the roots for the walk. The ones that have bitmaps
 	 * on the bitmap index will be `or`ed together to form an initial
@@ -1262,11 +1354,21 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 	 */
 	while (roots) {
 		struct object *object = roots->item;
+
 		roots = roots->next;
 
+		if (base) {
+			int pos = bitmap_position(bitmap_git, &object->oid);
+			if (pos > 0 && bitmap_get(base, pos)) {
+				object->flags |= SEEN;
+				continue;
+			}
+		}
+
 		if (object->type == OBJ_COMMIT &&
 		    add_commit_to_bitmap(bitmap_git, &base, (struct commit *)object)) {
 			object->flags |= SEEN;
+			existing_bitmaps = 1;
 			continue;
 		}
 
@@ -1282,6 +1384,9 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 
 	roots = not_mapped;
 
+	if (existing_bitmaps)
+		cascade_pseudo_merges_1(bitmap_git, base, NULL);
+
 	/*
 	 * Let's iterate through all the roots that don't have bitmaps to
 	 * check if we can determine them to be reachable from the existing
@@ -1866,6 +1971,11 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	object_list_free(&wants);
 	object_list_free(&haves);
 
+	trace2_data_intmax("bitmap", the_repository, "pseudo_merges_satisfied",
+			   pseudo_merges_satisfied_nr);
+	trace2_data_intmax("bitmap", the_repository, "pseudo_merges_cascades",
+			   pseudo_merges_cascades_nr);
+
 	return bitmap_git;
 
 cleanup:
diff --git a/t/t5333-pseudo-merge-bitmaps.sh b/t/t5333-pseudo-merge-bitmaps.sh
new file mode 100755
index 00000000000..3a7dc7278a7
--- /dev/null
+++ b/t/t5333-pseudo-merge-bitmaps.sh
@@ -0,0 +1,323 @@
+#!/bin/sh
+
+test_description='pseudo-merge bitmaps'
+
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+
+. ./test-lib.sh
+
+test_pseudo_merges () {
+	test-tool bitmap dump-pseudo-merges
+}
+
+test_pseudo_merge_commits () {
+	test-tool bitmap dump-pseudo-merge-commits "$1"
+}
+
+test_pseudo_merges_satisfied () {
+	test_trace2_data bitmap pseudo_merges_satisfied "$1"
+}
+
+test_pseudo_merges_cascades () {
+	test_trace2_data bitmap pseudo_merges_cascades "$1"
+}
+
+tag_everything () {
+	git rev-list --all --no-object-names >in &&
+	perl -lne '
+		print "create refs/tags/" . $. . " " . $1 if /([0-9a-f]+)/
+	' <in | git update-ref --stdin
+}
+
+test_expect_success 'setup' '
+	test_commit_bulk 512 &&
+	tag_everything
+'
+
+test_expect_success 'bitmap traversal without pseudo-merges' '
+	git repack -adb &&
+
+	git rev-list --count --all --objects >expect &&
+
+	: >trace2.txt &&
+	GIT_TRACE2_EVENT=$PWD/trace2.txt \
+		git rev-list --count --all --objects --use-bitmap-index >actual &&
+
+	test_pseudo_merges_satisfied 0 <trace2.txt &&
+	test_pseudo_merges_cascades 0 <trace2.txt &&
+	test_pseudo_merges >merges &&
+	test_must_be_empty merges &&
+	test_cmp expect actual
+'
+
+test_expect_success 'pseudo-merges accurately represent their objects' '
+	test_config bitmapPseudoMerge.test.pattern "refs/tags/" &&
+	test_config bitmapPseudoMerge.test.maxMerges 8 &&
+	test_config bitmapPseudoMerge.test.stableThreshold never &&
+
+	git repack -adb &&
+
+	test_pseudo_merges >merges &&
+	test_line_count = 8 merges &&
+
+	for i in $(test_seq 0 $(($(wc -l <merges)-1)))
+	do
+		test-tool bitmap dump-pseudo-merge-commits $i >commits &&
+
+		git rev-list --objects --no-object-names --stdin <commits >expect.raw &&
+		test-tool bitmap dump-pseudo-merge-objects $i >actual.raw &&
+
+		sort -u <expect.raw >expect &&
+		sort -u <actual.raw >actual &&
+
+		test_cmp expect actual || return 1
+	done
+'
+
+test_expect_success 'bitmap traversal with pseudo-merges' '
+	: >trace2.txt &&
+	GIT_TRACE2_EVENT=$PWD/trace2.txt \
+		git rev-list --count --all --objects --use-bitmap-index >actual &&
+	git rev-list --count --all --objects >expect &&
+
+	test_pseudo_merges_satisfied 8 <trace2.txt &&
+	test_pseudo_merges_cascades 1 <trace2.txt &&
+	test_cmp expect actual
+'
+
+test_expect_success 'stale bitmap traversal with pseudo-merges' '
+	test_commit other &&
+
+	: >trace2.txt &&
+	GIT_TRACE2_EVENT=$PWD/trace2.txt \
+		git rev-list --count --all --objects --use-bitmap-index >actual &&
+	git rev-list --count --all --objects >expect &&
+
+	test_pseudo_merges_satisfied 8 <trace2.txt &&
+	test_pseudo_merges_cascades 1 <trace2.txt &&
+	test_cmp expect actual
+'
+
+test_expect_success 'bitmapPseudoMerge.sampleRate adjusts commit selection rate' '
+	test_config bitmapPseudoMerge.test.pattern "refs/tags/" &&
+	test_config bitmapPseudoMerge.test.maxMerges 1 &&
+	test_config bitmapPseudoMerge.test.stableThreshold never &&
+
+	commits_nr=$(git rev-list --all --count) &&
+
+	for rate in 1.0 0.5 0.25
+	do
+		git -c bitmapPseudoMerge.test.sampleRate=$rate repack -adb &&
+
+		test_pseudo_merges >merges &&
+		test_line_count = 1 merges &&
+		test_pseudo_merge_commits 0 >commits &&
+
+		test-tool bitmap list-commits >bitmaps &&
+		bitmaps_nr="$(wc -l <bitmaps)" &&
+
+		perl -MPOSIX -e "print ceil(\$ARGV[0]*(\$ARGV[1]-\$ARGV[2]))" \
+			"$rate" "$commits_nr" "$bitmaps_nr" >expect &&
+
+		test $(cat expect) -eq $(wc -l <commits) || return 1
+	done
+'
+
+test_expect_success 'bitmapPseudoMerge.threshold excludes newer commits' '
+	git init pseudo-merge-threshold &&
+	(
+		cd pseudo-merge-threshold &&
+
+		new="1672549200" && # 2023-01-01
+		old="1641013200" && # 2022-01-01
+
+		test_commit_bulk --message="new" --date "$new +0000" 128 &&
+		test_commit_bulk --message="old" --date "$old +0000" 128 &&
+		test_tick &&
+
+		tag_everything &&
+
+		git \
+			-c bitmapPseudoMerge.test.pattern="refs/tags/" \
+			-c bitmapPseudoMerge.test.maxMerges=1 \
+			-c bitmapPseudoMerge.test.threshold=$(($new - 1)) \
+			-c bitmapPseudoMerge.test.stableThreshold=never \
+			repack -adb &&
+
+		test_pseudo_merges >merges &&
+		test_line_count = 1 merges &&
+
+		test_pseudo_merge_commits 0 >oids &&
+		git cat-file --batch <oids >commits &&
+
+		test $(wc -l <oids) = $(grep -c "^committer.*$old +0000$" commits)
+	)
+'
+
+test_expect_success 'bitmapPseudoMerge.stableThreshold creates stable groups' '
+	(
+		cd pseudo-merge-threshold &&
+
+		new="1672549200" && # 2023-01-01
+		mid="1654059600" && # 2022-06-01
+		old="1641013200" && # 2022-01-01
+
+		test_commit_bulk --message="mid" --date "$mid +0000" 128 &&
+		test_tick &&
+
+		git for-each-ref --format="delete %(refname)" refs/tags >in &&
+		git update-ref --stdin <in &&
+
+		tag_everything &&
+
+		git \
+			-c bitmapPseudoMerge.test.pattern="refs/tags/" \
+			-c bitmapPseudoMerge.test.maxMerges=1 \
+			-c bitmapPseudoMerge.test.threshold=$(($new - 1)) \
+			-c bitmapPseudoMerge.test.stableThreshold=$(($mid - 1)) \
+			-c bitmapPseudoMerge.test.stableSize=10 \
+			repack -adb &&
+
+		test_pseudo_merges >merges &&
+		merges_nr="$(wc -l <merges)" &&
+
+		for i in $(test_seq $(($merges_nr - 1)))
+		do
+			test_pseudo_merge_commits 0 >oids &&
+			git cat-file --batch <oids >commits &&
+
+			expect="$(grep -c "^committer.*$old +0000$" commits)" &&
+			actual="$(wc -l <oids)" &&
+
+			test $expect = $actual || return 1
+		done &&
+
+		test_pseudo_merge_commits $(($merges_nr - 1)) >oids &&
+		git cat-file --batch <oids >commits &&
+		test $(wc -l <oids) = $(grep -c "^committer.*$mid +0000$" commits)
+	)
+'
+
+test_expect_success 'out of order thresholds are rejected' '
+	test_must_fail git \
+		-c bitmapPseudoMerge.test.pattern="refs/*" \
+		-c bitmapPseudoMerge.test.threshold=1.month.ago \
+		-c bitmapPseudoMerge.test.stableThreshold=1.week.ago \
+		repack -adb 2>err &&
+
+	cat >expect <<-EOF &&
+	fatal: pseudo-merge group ${SQ}test${SQ} has unstable threshold before stable one
+	EOF
+
+	test_cmp expect err
+'
+
+test_expect_success 'pseudo-merge pattern with capture groups' '
+	git init pseudo-merge-captures &&
+	(
+		cd pseudo-merge-captures &&
+
+		test_commit_bulk 128 &&
+		tag_everything &&
+
+		for r in $(test_seq 8)
+		do
+			test_commit_bulk 16 &&
+
+			git rev-list HEAD~16.. >in &&
+
+			perl -lne "print \"create refs/remotes/$r/tags/\$. \$_\"" <in |
+			git update-ref --stdin || return 1
+		done &&
+
+		git \
+			-c bitmapPseudoMerge.tags.pattern="refs/remotes/([0-9]+)/tags/" \
+			-c bitmapPseudoMerge.tags.maxMerges=1 \
+			repack -adb &&
+
+		git for-each-ref --format="%(objectname) %(refname)" >refs &&
+
+		test_pseudo_merges >merges &&
+		for m in $(test_seq 0 $(($(wc -l <merges) - 1)))
+		do
+			test_pseudo_merge_commits $m >oids &&
+			grep -f oids refs |
+			perl -lne "print \$1 if /refs\/remotes\/([0-9]+)/" |
+			sort -u || return 1
+		done >remotes &&
+
+		test $(wc -l <remotes) -eq $(sort -u <remotes | wc -l)
+	)
+'
+
+test_expect_success 'pseudo-merge overlap setup' '
+	git init pseudo-merge-overlap &&
+	(
+		cd pseudo-merge-overlap &&
+
+		test_commit_bulk 256 &&
+		tag_everything &&
+
+		git \
+			-c bitmapPseudoMerge.all.pattern="refs/" \
+			-c bitmapPseudoMerge.all.maxMerges=1 \
+			-c bitmapPseudoMerge.all.stableThreshold=never \
+			-c bitmapPseudoMerge.tags.pattern="refs/tags/" \
+			-c bitmapPseudoMerge.tags.maxMerges=1 \
+			-c bitmapPseudoMerge.tags.stableThreshold=never \
+			repack -adb
+	)
+'
+
+test_expect_success 'pseudo-merge overlap generates overlapping groups' '
+	(
+		cd pseudo-merge-overlap &&
+
+		test_pseudo_merges >merges &&
+		test_line_count = 2 merges &&
+
+		test_pseudo_merge_commits 0 >commits-0.raw &&
+		test_pseudo_merge_commits 1 >commits-1.raw &&
+
+		sort commits-0.raw >commits-0 &&
+		sort commits-1.raw >commits-1 &&
+
+		comm -12 commits-0 commits-1 >overlap &&
+
+		test_line_count -gt 0 overlap
+	)
+'
+
+test_expect_success 'pseudo-merge overlap traversal' '
+	(
+		cd pseudo-merge-overlap &&
+
+		: >trace2.txt &&
+		GIT_TRACE2_EVENT=$PWD/trace2.txt \
+			git rev-list --count --all --objects --use-bitmap-index >actual &&
+		git rev-list --count --all --objects >expect &&
+
+		test_pseudo_merges_satisfied 2 <trace2.txt &&
+		test_pseudo_merges_cascades 1 <trace2.txt &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'pseudo-merge overlap stale traversal' '
+	(
+		cd pseudo-merge-overlap &&
+
+		test_commit other &&
+
+		: >trace2.txt &&
+		GIT_TRACE2_EVENT=$PWD/trace2.txt \
+			git rev-list --count --all --objects --use-bitmap-index >actual &&
+		git rev-list --count --all --objects >expect &&
+
+		test_pseudo_merges_satisfied 2 <trace2.txt &&
+		test_pseudo_merges_cascades 1 <trace2.txt &&
+		test_cmp expect actual
+	)
+'
+
+test_done
-- 
2.45.1.175.gbea44add9db



* [PATCH v3 27/30] pack-bitmap: extra trace2 information
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
                     ` (20 preceding siblings ...)
  2024-05-21 19:03   ` [PATCH v3 26/30] pack-bitmap.c: use pseudo-merges during traversal Taylor Blau
@ 2024-05-21 19:03   ` Taylor Blau
  2024-05-21 19:03   ` [PATCH v3 28/30] ewah: `bitmap_equals_ewah()` Taylor Blau
                     ` (3 subsequent siblings)
  25 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-21 19:03 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Add some extra trace2 lines to capture the number of bitmap lookups that
are hits versus misses, as well as the number of reachability roots that
have bitmap coverage (versus those that do not).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index e61058dada6..1966b3b95f1 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -116,6 +116,10 @@ struct bitmap_index {
 
 static int pseudo_merges_satisfied_nr;
 static int pseudo_merges_cascades_nr;
+static int existing_bitmaps_hits_nr;
+static int existing_bitmaps_misses_nr;
+static int roots_with_bitmaps_nr;
+static int roots_without_bitmaps_nr;
 
 static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
 {
@@ -1040,10 +1044,14 @@ static int add_to_include_set(struct bitmap_index *bitmap_git,
 
 	partial = bitmap_for_commit(bitmap_git, commit);
 	if (partial) {
+		existing_bitmaps_hits_nr++;
+
 		bitmap_or_ewah(data->base, partial);
 		return 0;
 	}
 
+	existing_bitmaps_misses_nr++;
+
 	bitmap_set(data->base, bitmap_pos);
 	if (apply_pseudo_merges_for_commit_1(bitmap_git, data->base, commit,
 					     bitmap_pos))
@@ -1099,8 +1107,12 @@ static int add_commit_to_bitmap(struct bitmap_index *bitmap_git,
 {
 	struct ewah_bitmap *or_with = bitmap_for_commit(bitmap_git, commit);
 
-	if (!or_with)
+	if (!or_with) {
+		existing_bitmaps_misses_nr++;
 		return 0;
+	}
+
+	existing_bitmaps_hits_nr++;
 
 	if (!*base)
 		*base = ewah_to_bitmap(or_with);
@@ -1407,8 +1419,12 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 			object->flags &= ~UNINTERESTING;
 			add_pending_object(revs, object, "");
 			needs_walk = 1;
+
+			roots_without_bitmaps_nr++;
 		} else {
 			object->flags |= SEEN;
+
+			roots_with_bitmaps_nr++;
 		}
 	}
 
@@ -1975,6 +1991,14 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 			   pseudo_merges_satisfied_nr);
 	trace2_data_intmax("bitmap", the_repository, "pseudo_merges_cascades",
 			   pseudo_merges_cascades_nr);
+	trace2_data_intmax("bitmap", the_repository, "bitmap/hits",
+			   existing_bitmaps_hits_nr);
+	trace2_data_intmax("bitmap", the_repository, "bitmap/misses",
+			   existing_bitmaps_misses_nr);
+	trace2_data_intmax("bitmap", the_repository, "bitmap/roots_with_bitmap",
+			   roots_with_bitmaps_nr);
+	trace2_data_intmax("bitmap", the_repository, "bitmap/roots_without_bitmap",
+			   roots_without_bitmaps_nr);
 
 	return bitmap_git;
 
-- 
2.45.1.175.gbea44add9db



* [PATCH v3 28/30] ewah: `bitmap_equals_ewah()`
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
                     ` (21 preceding siblings ...)
  2024-05-21 19:03   ` [PATCH v3 27/30] pack-bitmap: extra trace2 information Taylor Blau
@ 2024-05-21 19:03   ` Taylor Blau
  2024-05-21 19:03   ` [PATCH v3 29/30] pseudo-merge: implement support for finding existing merges Taylor Blau
                     ` (2 subsequent siblings)
  25 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-21 19:03 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Prepare to reuse existing pseudo-merge bitmaps by implementing a
`bitmap_equals_ewah()` helper.

This helper will be used to see if a raw bitmap (containing the set of
parents for some pseudo-merge) is equal to any existing pseudo-merge's
commits bitmap (which are stored as EWAH-compressed bitmaps on disk).
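
As a rough illustration of the comparison described above, here is a
minimal sketch. It assumes a simplified run-length model ("struct run",
a hypothetical stand-in, not the real EWAH encoding) and a hypothetical
function name; it only shows the idea of walking the compressed side
word by word, treating missing trailing words as zero, and bailing out
at the first mismatch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a compressed bitmap: `count` copies of `word`. */
struct run {
	uint64_t word;
	size_t count;
};

/*
 * Return 1 if the uncompressed words[0..nr) describe the same bit set
 * as the run-length stream runs[0..runs_nr), treating words missing
 * from either side as zero. Return 0 at the first mismatch.
 */
static int words_equal_runs(const uint64_t *words, size_t nr,
			    const struct run *runs, size_t runs_nr)
{
	size_t i = 0, r, k;

	assert(words || !nr);

	/* Expand each run and compare it against the raw words. */
	for (r = 0; r < runs_nr; r++) {
		for (k = 0; k < runs[r].count; k++, i++) {
			uint64_t have = i < nr ? words[i] : 0;
			if (have != runs[r].word)
				return 0;
		}
	}

	/* Any raw words past the end of the stream must be empty. */
	for (; i < nr; i++)
		if (words[i])
			return 0;

	return 1;
}
```

The early return is what makes rejecting a non-matching candidate cheap:
the walk stops at the first differing word rather than decompressing the
whole bitmap.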

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ewah/bitmap.c | 19 +++++++++++++++++++
 ewah/ewok.h   |  1 +
 2 files changed, 20 insertions(+)

diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index dc2ca190f12..55928dada86 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -261,6 +261,25 @@ int bitmap_equals(struct bitmap *self, struct bitmap *other)
 	return 1;
 }
 
+int bitmap_equals_ewah(struct bitmap *self, struct ewah_bitmap *other)
+{
+	struct ewah_iterator it;
+	eword_t word;
+	size_t i = 0;
+
+	ewah_iterator_init(&it, other);
+
+	while (ewah_iterator_next(&word, &it))
+		if (word != (i < self->word_alloc ? self->words[i++] : 0))
+			return 0;
+
+	for (; i < self->word_alloc; i++)
+		if (self->words[i])
+			return 0;
+
+	return 1;
+}
+
 int bitmap_is_subset(struct bitmap *self, struct bitmap *other)
 {
 	size_t common_size, i;
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 7074a6347b7..5e357e24933 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -179,6 +179,7 @@ void bitmap_unset(struct bitmap *self, size_t pos);
 int bitmap_get(struct bitmap *self, size_t pos);
 void bitmap_free(struct bitmap *self);
 int bitmap_equals(struct bitmap *self, struct bitmap *other);
+int bitmap_equals_ewah(struct bitmap *self, struct ewah_bitmap *other);
 
 /*
  * Both `bitmap_is_subset()` and `ewah_bitmap_is_subset()` return 1 if the set
-- 
2.45.1.175.gbea44add9db



* [PATCH v3 29/30] pseudo-merge: implement support for finding existing merges
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
                     ` (22 preceding siblings ...)
  2024-05-21 19:03   ` [PATCH v3 28/30] ewah: `bitmap_equals_ewah()` Taylor Blau
@ 2024-05-21 19:03   ` Taylor Blau
  2024-05-21 19:03   ` [PATCH v3 30/30] t/perf: implement performance tests for pseudo-merge bitmaps Taylor Blau
  2024-05-23 11:05   ` [PATCH v3 00/30] pack-bitmap: pseudo-merge reachability bitmaps Jeff King
  25 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-21 19:03 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

This patch implements support for reusing an existing pseudo-merge
bitmap when writing bitmaps, provided that it has exactly the same set
of parents as the pseudo-merge that we are about to write.

Note that unstable pseudo-merges are likely to change between
consecutive repacks, and so are generally poor candidates for reuse.
However, stable pseudo-merges (see the configuration option
'bitmapPseudoMerge.<name>.stableThreshold') are by definition unlikely
to change between runs (as they represent long-running branches).

Because there is no index from a *set* of pseudo-merge parents to a
matching pseudo-merge bitmap, we have to construct the bitmap
corresponding to the set of parents for each pending pseudo-merge commit
and see if a matching bitmap exists.

This is technically quadratic in the number of pseudo-merges, but is OK
in practice for a couple of reasons:

  - non-matching pseudo-merge bitmaps are rejected quickly as soon as
    they differ in a single bit

  - already-matched pseudo-merge bitmaps are discarded from subsequent
    rounds of search

  - the number of pseudo-merges is generally small, even for large
    repositories

In order to do this, implement (a) a function that finds a matching
pseudo-merge given some uncompressed bitset describing its parents, and
(b) a function that computes the bitset of parents for a given
pseudo-merge commit. Then (c) call the latter before computing the set
of reachable objects for some pending pseudo-merge.
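
The search described above can be sketched roughly as follows. This is
an illustration only: "struct pm_sketch" and find_by_parents() are
hypothetical names, and the parent set is assumed to fit in a single
64-bit word so that equality is a plain integer comparison (the patch
itself compares an uncompressed bitmap against EWAH-compressed ones).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified pseudo-merge: parents fit in one 64-bit word. */
struct pm_sketch {
	uint64_t parents;	/* bit i set => object i is a parent */
	unsigned satisfied;	/* already matched by an earlier lookup */
};

/*
 * Linear scan for a pseudo-merge whose parent set equals `parents`.
 * Candidates that were already matched are skipped, and a new match
 * is marked so that later lookups do not consider it again.
 */
static struct pm_sketch *find_by_parents(struct pm_sketch *v, size_t nr,
					 uint64_t parents)
{
	size_t i;

	assert(v || !nr);

	for (i = 0; i < nr; i++) {
		if (v[i].satisfied)
			continue;
		if (v[i].parents != parents)
			continue;

		v[i].satisfied = 1;
		return &v[i];
	}

	return NULL;
}
```

Calling this once per pending pseudo-merge is where the quadratic bound
comes from; marking matches as satisfied shrinks the candidate set on
each successful lookup.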

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c             | 15 ++++++--
 pack-bitmap.c                   | 32 +++++++++++++++++
 pack-bitmap.h                   |  2 ++
 pseudo-merge.c                  | 55 ++++++++++++++++++++++++++++
 pseudo-merge.h                  |  7 ++++
 t/t5333-pseudo-merge-bitmaps.sh | 64 +++++++++++++++++++++++++++++++++
 6 files changed, 173 insertions(+), 2 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 47250398aa2..6e8060f8a0b 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -19,6 +19,10 @@
 #include "tree-walk.h"
 #include "pseudo-merge.h"
 #include "oid-array.h"
+#include "config.h"
+#include "alloc.h"
+#include "refs.h"
+#include "strmap.h"
 
 struct bitmapped_commit {
 	struct commit *commit;
@@ -465,6 +469,7 @@ static int fill_bitmap_tree(struct bitmap_writer *writer,
 }
 
 static int reused_bitmaps_nr;
+static int reused_pseudo_merge_bitmaps_nr;
 
 static int fill_bitmap_commit(struct bitmap_writer *writer,
 			      struct bb_commit *ent,
@@ -490,7 +495,7 @@ static int fill_bitmap_commit(struct bitmap_writer *writer,
 			struct bitmap *remapped = bitmap_new();
 
 			if (commit->object.flags & BITMAP_PSEUDO_MERGE)
-				old = NULL;
+				old = pseudo_merge_bitmap_for_commit(old_bitmap, c);
 			else
 				old = bitmap_for_commit(old_bitmap, c);
 			/*
@@ -501,7 +506,10 @@ static int fill_bitmap_commit(struct bitmap_writer *writer,
 			if (old && !rebuild_bitmap(mapping, old, remapped)) {
 				bitmap_or(ent->bitmap, remapped);
 				bitmap_free(remapped);
-				reused_bitmaps_nr++;
+				if (commit->object.flags & BITMAP_PSEUDO_MERGE)
+					reused_pseudo_merge_bitmaps_nr++;
+				else
+					reused_bitmaps_nr++;
 				continue;
 			}
 			bitmap_free(remapped);
@@ -631,6 +639,9 @@ int bitmap_writer_build(struct bitmap_writer *writer,
 			    the_repository);
 	trace2_data_intmax("pack-bitmap-write", the_repository,
 			   "building_bitmaps_reused", reused_bitmaps_nr);
+	trace2_data_intmax("pack-bitmap-write", the_repository,
+			   "building_bitmaps_pseudo_merge_reused",
+			   reused_pseudo_merge_bitmaps_nr);
 
 	stop_progress(&writer->progress);
 
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 1966b3b95f1..70230e26479 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1316,6 +1316,37 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 	return cb.base;
 }
 
+struct ewah_bitmap *pseudo_merge_bitmap_for_commit(struct bitmap_index *bitmap_git,
+						   struct commit *commit)
+{
+	struct commit_list *p;
+	struct bitmap *parents;
+	struct pseudo_merge *match = NULL;
+
+	if (!bitmap_git->pseudo_merges.nr)
+		return NULL;
+
+	parents = bitmap_new();
+
+	for (p = commit->parents; p; p = p->next) {
+		int pos = bitmap_position(bitmap_git, &p->item->object.oid);
+		if (pos < 0 || pos >= bitmap_num_objects(bitmap_git))
+			goto done;
+
+		bitmap_set(parents, pos);
+	}
+
+	match = pseudo_merge_for_parents(&bitmap_git->pseudo_merges,
+						parents);
+
+done:
+	bitmap_free(parents);
+	if (match)
+		return pseudo_merge_bitmap(&bitmap_git->pseudo_merges, match);
+
+	return NULL;
+}
+
 static void unsatisfy_all_pseudo_merges(struct bitmap_index *bitmap_git)
 {
 	uint32_t i;
@@ -2809,6 +2840,7 @@ void free_bitmap_index(struct bitmap_index *b)
 		 */
 		close_midx_revindex(b->midx);
 	}
+	free_pseudo_merge_map(&b->pseudo_merges);
 	free(b);
 }
 
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 4466b5ad0fb..1171e6d9893 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -142,6 +142,8 @@ int rebuild_bitmap(const uint32_t *reposition,
 		   struct bitmap *dest);
 struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
 				      struct commit *commit);
+struct ewah_bitmap *pseudo_merge_bitmap_for_commit(struct bitmap_index *bitmap_git,
+						   struct commit *commit);
 void bitmap_writer_select_commits(struct bitmap_writer *writer,
 				  struct commit **indexed_commits,
 				  unsigned int indexed_commits_nr);
diff --git a/pseudo-merge.c b/pseudo-merge.c
index 0f50ac6183e..36a617f64e6 100644
--- a/pseudo-merge.c
+++ b/pseudo-merge.c
@@ -695,3 +695,58 @@ int cascade_pseudo_merges(const struct pseudo_merge_map *pm,
 
 	return ret;
 }
+
+struct pseudo_merge *pseudo_merge_for_parents(const struct pseudo_merge_map *pm,
+					      struct bitmap *parents)
+{
+	struct pseudo_merge *match = NULL;
+	size_t i;
+
+	if (!pm->nr)
+		return NULL;
+
+	/*
+	 * NOTE: this loop is quadratic in the worst-case (where no
+	 * matching pseudo-merge bitmaps are found), but in practice
+	 * this is OK for a few reasons:
+	 *
+	 *   - Rejecting pseudo-merge bitmaps that do not match the
+	 *     given commit is done quickly (i.e. `bitmap_equals_ewah()`
+	 *     returns early when we know the two bitmaps aren't equal).
+	 *
+	 *   - Already matched pseudo-merge bitmaps (which we track with
+	 *     the `->satisfied` bit here) are skipped as potential
+	 *     candidates.
+	 *
+	 *   - The number of pseudo-merges should be small (in the
+	 *     hundreds for most repositories).
+	 *
+	 * If in the future this semi-quadratic behavior does become a
+	 * problem, another approach would be to keep track of which
+	 * pseudo-merges are still "viable" after enumerating the
+	 * pseudo-merge commit's parents:
+	 *
+	 *   - A pseudo-merge bitmap becomes non-viable when the bit(s)
+	 *     corresponding to one or more parent(s) of the given
+	 *     commit are not set in a candidate pseudo-merge's commits
+	 *     bitmap.
+	 *
+	 *   - After processing all bits, enumerate the remaining set of
+	 *     viable pseudo-merge bitmaps, and check that their
+	 *     popcount() matches the number of parents in the given
+	 *     commit.
+	 */
+	for (i = 0; i < pm->nr; i++) {
+		struct pseudo_merge *candidate = use_pseudo_merge(pm, &pm->v[i]);
+		if (!candidate || candidate->satisfied)
+			continue;
+		if (!bitmap_equals_ewah(parents, candidate->commits))
+			continue;
+
+		match = candidate;
+		match->satisfied = 1;
+		break;
+	}
+
+	return match;
+}
diff --git a/pseudo-merge.h b/pseudo-merge.h
index c00b622be4b..62fde979015 100644
--- a/pseudo-merge.h
+++ b/pseudo-merge.h
@@ -206,4 +206,11 @@ int cascade_pseudo_merges(const struct pseudo_merge_map *pm,
 			  struct bitmap *result,
 			  struct bitmap *roots);
 
+/*
+ * Returns a pseudo-merge which contains the exact set of commits
+ * listed in the "parents" bitmap, or NULL if none could be found.
+ */
+struct pseudo_merge *pseudo_merge_for_parents(const struct pseudo_merge_map *pm,
+					      struct bitmap *parents);
+
 #endif
diff --git a/t/t5333-pseudo-merge-bitmaps.sh b/t/t5333-pseudo-merge-bitmaps.sh
index 3a7dc7278a7..7ae4b7a35b7 100755
--- a/t/t5333-pseudo-merge-bitmaps.sh
+++ b/t/t5333-pseudo-merge-bitmaps.sh
@@ -22,6 +22,10 @@ test_pseudo_merges_cascades () {
 	test_trace2_data bitmap pseudo_merges_cascades "$1"
 }
 
+test_pseudo_merges_reused () {
+	test_trace2_data pack-bitmap-write building_bitmaps_pseudo_merge_reused "$1"
+}
+
 tag_everything () {
 	git rev-list --all --no-object-names >in &&
 	perl -lne '
@@ -320,4 +324,64 @@ test_expect_success 'pseudo-merge overlap stale traversal' '
 	)
 '
 
+test_expect_success 'pseudo-merge reuse' '
+	git init pseudo-merge-reuse &&
+	(
+		cd pseudo-merge-reuse &&
+
+		stable="1641013200" && # 2022-01-01
+		unstable="1672549200" && # 2023-01-01
+
+		for date in $stable $unstable
+		do
+			test_commit_bulk --date "$date +0000" 128 &&
+			test_tick || return 1
+		done &&
+
+		tag_everything &&
+
+		git \
+			-c bitmapPseudoMerge.test.pattern="refs/tags/" \
+			-c bitmapPseudoMerge.test.maxMerges=1 \
+			-c bitmapPseudoMerge.test.threshold=now \
+			-c bitmapPseudoMerge.test.stableThreshold=$(($unstable - 1)) \
+			-c bitmapPseudoMerge.test.stableSize=512 \
+			repack -adb &&
+
+		test_pseudo_merges >merges &&
+		test_line_count = 2 merges &&
+
+		test_pseudo_merge_commits 0 >stable-oids.before &&
+		test_pseudo_merge_commits 1 >unstable-oids.before &&
+
+		: >trace2.txt &&
+		GIT_TRACE2_EVENT=$PWD/trace2.txt git \
+			-c bitmapPseudoMerge.test.pattern="refs/tags/" \
+			-c bitmapPseudoMerge.test.maxMerges=2 \
+			-c bitmapPseudoMerge.test.threshold=now \
+			-c bitmapPseudoMerge.test.stableThreshold=$(($unstable - 1)) \
+			-c bitmapPseudoMerge.test.stableSize=512 \
+			repack -adb &&
+
+		test_pseudo_merges_reused 1 <trace2.txt &&
+
+		test_pseudo_merges >merges &&
+		test_line_count = 3 merges &&
+
+		test_pseudo_merge_commits 0 >stable-oids.after &&
+		for i in 1 2
+		do
+			test_pseudo_merge_commits $i || return 1
+		done >unstable-oids.after &&
+
+		sort -u <stable-oids.before >expect &&
+		sort -u <stable-oids.after >actual &&
+		test_cmp expect actual &&
+
+		sort -u <unstable-oids.before >expect &&
+		sort -u <unstable-oids.after >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
2.45.1.175.gbea44add9db



* [PATCH v3 30/30] t/perf: implement performance tests for pseudo-merge bitmaps
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
                     ` (23 preceding siblings ...)
  2024-05-21 19:03   ` [PATCH v3 29/30] pseudo-merge: implement support for finding existing merges Taylor Blau
@ 2024-05-21 19:03   ` Taylor Blau
  2024-05-23 10:54     ` Jeff King
  2024-05-23 11:05   ` [PATCH v3 00/30] pack-bitmap: pseudo-merge reachability bitmaps Jeff King
  25 siblings, 1 reply; 157+ messages in thread
From: Taylor Blau @ 2024-05-21 19:03 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Implement a straightforward performance test demonstrating the benefit
of pseudo-merge bitmaps by measuring how long it takes to count
reachable objects in a few different scenarios:

  - without bitmaps, to demonstrate a reasonable baseline
  - with bitmaps, but without pseudo-merges
  - with bitmaps and pseudo-merges

Results from running this test on git.git are as follows:

    Test                                                                this tree
    -----------------------------------------------------------------------------------
    5333.2: git rev-list --count --all --objects (no bitmaps)           3.46(3.37+0.09)
    5333.3: git rev-list --count --all --objects (no pseudo-merges)     0.13(0.11+0.01)
    5333.4: git rev-list --count --all --objects (with pseudo-merges)   0.12(0.11+0.01)

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/perf/p5333-pseudo-merge-bitmaps.sh | 32 ++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
 create mode 100755 t/perf/p5333-pseudo-merge-bitmaps.sh

diff --git a/t/perf/p5333-pseudo-merge-bitmaps.sh b/t/perf/p5333-pseudo-merge-bitmaps.sh
new file mode 100755
index 00000000000..4bec409d10e
--- /dev/null
+++ b/t/perf/p5333-pseudo-merge-bitmaps.sh
@@ -0,0 +1,32 @@
+#!/bin/sh
+
+test_description='pseudo-merge bitmaps'
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+test_expect_success 'setup' '
+	git \
+		-c bitmapPseudoMerge.all.pattern="refs/" \
+		-c bitmapPseudoMerge.all.threshold=now \
+		-c bitmapPseudoMerge.all.stableThreshold=never \
+		-c bitmapPseudoMerge.all.maxMerges=64 \
+		-c pack.writeBitmapLookupTable=true \
+		repack -adb
+'
+
+test_perf 'git rev-list --count --all --objects (no bitmaps)' '
+	git rev-list --objects --all
+'
+
+test_perf 'git rev-list --count --all --objects (no pseudo-merges)' '
+	GIT_TEST_USE_PSEUDO_MERGES=0 \
+		git rev-list --objects --all --use-bitmap-index
+'
+
+test_perf 'git rev-list --count --all --objects (with pseudo-merges)' '
+	GIT_TEST_USE_PSEUDO_MERGES=1 \
+		git rev-list --objects --all --use-bitmap-index
+'
+
+test_done
-- 
2.45.1.175.gbea44add9db


* Re: [PATCH v3 01/30] object.h: add flags allocated by pack-bitmap.h
  2024-05-21 19:01   ` [PATCH v3 01/30] object.h: add flags allocated by pack-bitmap.h Taylor Blau
@ 2024-05-21 19:06     ` Taylor Blau
  0 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-21 19:06 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

On Tue, May 21, 2024 at 03:01:48PM -0400, Taylor Blau wrote:
> In commit 7cc8f971085 (pack-objects: implement bitmap writing,
> 2013-12-21) the NEEDS_BITMAP flag was introduced into pack-bitmap.h, but
> no object flags allocation table existed at the time.

Oops. When I prepared these patches for the list, I told format-patch
that the base was origin/master, not @{u}. The correct upstream, on
which patches 7 and onward are based, is 'tb/pack-bitmap-write-cleanups',
as I wrote in the cover letter.

Junio: please ignore this patch and apply patches 7-30 on top of your
'tb/pack-bitmap-write-cleanups'.

Thanks,
Taylor


* Re: [PATCH v3 16/30] config: introduce git_config_float()
  2024-05-21 19:02   ` [PATCH v3 16/30] config: introduce git_config_float() Taylor Blau
@ 2024-05-23 10:02     ` Jeff King
  2024-05-23 17:51       ` Taylor Blau
  0 siblings, 1 reply; 157+ messages in thread
From: Jeff King @ 2024-05-23 10:02 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Patrick Steinhardt, Junio C Hamano

On Tue, May 21, 2024 at 03:02:29PM -0400, Taylor Blau wrote:

> Future commits will want to parse a floating point value from
> configuration, but we have no way to parse such a value prior to this
> patch.
> 
> The core of the routine is implemented in git_parse_float(). Unlike
> git_parse_unsigned() and git_parse_signed(), however, the function
> implemented here only works on type "float", and not related types like
> "double", or "long double".
> 
> This is because "double" and "long double" use different functions to
> convert from ASCII strings to floating point values (strtod() and
> strtold(), respectively). Likewise, there is no pointer type that can
> assign to any of these values (except for "void *"), so the only way to
> define this trio of functions would be with a macro expansion that is
> parameterized over the floating point type and conversion function.

I agree it doesn't make sense to support both. But if we have to choose
between the two, should we just use "double"?

I doubt you need the extra precision for your case, but I also doubt
that the speed/storage benefits of "float" would matter. And support for
"float" in C is kind of weird. There is no "float" specifier for printf.
And according to my copy of strtof(3), until C99 we only had strtod()!
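
To make that concrete, here is a minimal sketch of a double-based parser.
The name parse_double_strict() is made up for illustration; it is not
git_parse_float() and it ignores any unit-suffix handling the real
config code might want. It only shows that plain strtod() is enough:

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of git): parse all of "value" as a
 * double. Returns 1 on success and stores the result in *ret; returns
 * 0 on empty input, trailing garbage, or out-of-range values.
 */
static int parse_double_strict(const char *value, double *ret)
{
	char *end;
	double val;

	if (!value || !*value)
		return 0;

	errno = 0;
	val = strtod(value, &end);
	if (errno == ERANGE)
		return 0;	/* overflow or underflow */
	if (end == value || *end)
		return 0;	/* nothing parsed, or junk after the number */

	*ret = val;
	return 1;
}
```

A caller would pass the raw config string and a pointer to a double, and
treat a zero return as a malformed value.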


Regarding using non-integers at all, I do wonder how much we need them.
We've usually stuck to integers in other spots, even if it means a sort
of pseudo-fixed-point (e.g., rename scores). Looking ahead, you're using
these for the power-series knobs. I guess it would be pretty confusing
to try to force integers there. I dunno. Not really an objection, but I
just wonder if it was something you considered.

-Peff

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH v3 17/30] pseudo-merge: implement support for selecting pseudo-merge commits
  2024-05-21 19:02   ` [PATCH v3 17/30] pseudo-merge: implement support for selecting pseudo-merge commits Taylor Blau
@ 2024-05-23 10:12     ` Jeff King
  2024-05-23 17:56       ` Taylor Blau
  0 siblings, 1 reply; 157+ messages in thread
From: Jeff King @ 2024-05-23 10:12 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Patrick Steinhardt, Junio C Hamano

On Tue, May 21, 2024 at 03:02:32PM -0400, Taylor Blau wrote:

> diff --git a/Documentation/config/bitmap-pseudo-merge.txt b/Documentation/config/bitmap-pseudo-merge.txt
> new file mode 100644
> index 00000000000..d4a2023b84a
> --- /dev/null
> +++ b/Documentation/config/bitmap-pseudo-merge.txt
> @@ -0,0 +1,90 @@
> +NOTE: The configuration options in `bitmapPseudoMerge.*` are considered
> +EXPERIMENTAL and may be subject to change or be removed entirely in the
> +future.

I'm happy to see this all marked as experimental. We really don't know
what selection approach will be best yet, and it's good not to lock
ourselves in.

I wasn't sure how this would format via asciidoc, since we're in the
middle of a list of variables. It...kind of looks like the note goes
under the previous entry (for attr.tree) in the text manpage. Though
looking at the docbook, I _think_ it's actually outside of that, and
it's just how the roff ends up indenting it?

I don't know if it's worth spending too much time on, but maybe there's
an easy way to do it differently (I couldn't think of one).

This paragraph might also be a good place to refer to gitpacking(7).

> diff --git a/Documentation/gitpacking.txt b/Documentation/gitpacking.txt
> index ff18077129b..1ed645ff910 100644
> --- a/Documentation/gitpacking.txt
> +++ b/Documentation/gitpacking.txt

Thanks, I think this new file and the content you added address all of
my documentation complaints.

I think it will also be a good place to discuss bitmaps in general, but
you were IMHO wise to not do that in this series, which is already quite
big. :)

-Peff

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH v3 22/30] pseudo-merge: implement support for reading pseudo-merge commits
  2024-05-21 19:02   ` [PATCH v3 22/30] pseudo-merge: implement support for reading pseudo-merge commits Taylor Blau
@ 2024-05-23 10:40     ` Jeff King
  2024-05-23 18:09       ` Taylor Blau
  0 siblings, 1 reply; 157+ messages in thread
From: Jeff King @ 2024-05-23 10:40 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Patrick Steinhardt, Junio C Hamano

On Tue, May 21, 2024 at 03:02:48PM -0400, Taylor Blau wrote:

> These functions are all documented in pseudo-merge.h, but their rough
> descriptions are as follows:
> 
>   - pseudo_merge_bitmap() reads and inflates the objects EWAH bitmap for
>     a given pseudo-merge
> 
>   - use_pseudo_merge() does the same as pseudo_merge_bitmap(), but on
>     the commits EWAH bitmap, not the objects bitmap
> 
>   - apply_pseudo_merges_for_commit() applies all satisfied pseudo-merge
>     commits for a given result set, and cascades any yet-unsatisfied
>     pseudo-merges if any were applied in the previous step
> 
>   - cascade_pseudo_merges() applies all pseudo-merges which are
>     satisfied but have not been previously applied, repeating this
>     process until no more pseudo-merges can be applied

OK, so I think this commit is getting into the meat of how the new
bitmaps will be used. Just to restate it from a high-level to make sure
I understand, I think it is:

  1. When we are traversing (or even before we traverse and just know
     our tips), we can always say "hey, I have a commit in the bitmap;
     does this satisfy any pseudo-merges?". Where "satisfy" is "all of
     the commits pseudo-merged for that bitmap are already in our
     result". And if so, then we can use the pseudo-merge bitmap by
     OR-ing it in.

     And that's apply_pseudo_merges_for_commit().

  2. That "OR" operation may likewise open up new options, so we
     recurse. And that's the "cascade" function.
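
As a rough sketch of those two steps (a toy model, not the code in this
patch): assume one bit per object so that a whole bitmap fits in a
uint64_t, and ignore the per-commit lookup that the real
apply_pseudo_merges_for_commit() does. All toy_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model: one bit per object, so a whole bitmap fits in a uint64_t. */
struct toy_pseudo_merge {
	uint64_t commits;	/* bits of the commits this pseudo-merge covers */
	uint64_t objects;	/* bits of everything reachable from them */
	int applied;
};

/*
 * Apply every not-yet-applied pseudo-merge whose commits are all set in
 * *result, OR-ing in its objects. Returns how many were applied.
 */
static int toy_apply_satisfied(struct toy_pseudo_merge *pms, size_t nr,
			       uint64_t *result)
{
	int applied = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (pms[i].applied)
			continue;
		if ((*result & pms[i].commits) != pms[i].commits)
			continue;	/* some covered commit is missing */
		*result |= pms[i].objects;
		pms[i].applied = 1;
		applied++;
	}
	return applied;
}

/* Keep applying until a full pass applies nothing new. */
static int toy_cascade(struct toy_pseudo_merge *pms, size_t nr,
		       uint64_t *result)
{
	int total = 0, n;

	while ((n = toy_apply_satisfied(pms, nr, result)) > 0)
		total += n;
	return total;
}
```

The loop terminates because each pseudo-merge can be applied at most
once, so there are at most nr productive passes.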

This commit is not yet adding the calls into this code for part (1). I
think there's an open question there of overhead; i.e., how expensive it
is to check whether each pseudo-merge is satisfied. And whether it makes
sense to just do it once at the beginning of the traversal (with the
provided tips) or to keep checking as we traverse (more expensive, but
it makes it more likely to use an older pseudo-merge bitmap that's had
new history built on top of some of its commits).

But those calls should come later, so let's read on.

> +static int pseudo_merge_ext_at(const struct pseudo_merge_map *pm,
> +			       struct pseudo_merge_commit_ext *ext, size_t at)
> +{
> +	if (at >= pm->map_size)
> +		return error(_("extended pseudo-merge read out-of-bounds "
> +			       "(%"PRIuMAX" >= %"PRIuMAX")"),
> +			     (uintmax_t)at, (uintmax_t)pm->map_size);
> +
> +	ext->nr = get_be32(pm->map + at);
> +	ext->ptr = pm->map + at + sizeof(uint32_t);
> +
> +	return 0;
> +}

I was happy to see the boundary check here. Do we need a length check,
too? We'd need at least four bytes here for the uint32_t. Does map_size
include the trailing hash? If not, then it might provide a bit of slop
(we'd read garbage, but never go outside the mmap).

I guess the ">=" in the size check implies that we have at least one
byte, but I don't think anything promises that we're correctly 4-byte
aligned.

The rest of the length check is here:

> +struct ewah_bitmap *pseudo_merge_bitmap(const struct pseudo_merge_map *pm,
> +					struct pseudo_merge *merge)
> +{
> +	if (!merge->loaded_commits)
> +		BUG("cannot use unloaded pseudo-merge bitmap");
> +
> +	if (!merge->loaded_bitmap) {
> +		size_t at = merge->bitmap_at;
> +
> +		merge->bitmap = read_bitmap(pm->map, pm->map_size, &at);
> +		merge->loaded_bitmap = 1;
> +	}
> +
> +	return merge->bitmap;
> +}

When we call read_bitmap(), it knows where the end is, and it's
careful to avoid reading past it. Good.

-Peff

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH v3 25/30] t/test-lib-functions.sh: support `--date` in `test_commit_bulk()`
  2024-05-21 19:02   ` [PATCH v3 25/30] t/test-lib-functions.sh: support `--date` in `test_commit_bulk()` Taylor Blau
@ 2024-05-23 10:42     ` Jeff King
  2024-05-23 15:45       ` Junio C Hamano
  0 siblings, 1 reply; 157+ messages in thread
From: Jeff King @ 2024-05-23 10:42 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Patrick Steinhardt, Junio C Hamano

On Tue, May 21, 2024 at 03:02:59PM -0400, Taylor Blau wrote:

> diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
> index 862d80c9748..16fd585e34b 100644
> --- a/t/test-lib-functions.sh
> +++ b/t/test-lib-functions.sh
> @@ -458,6 +458,7 @@ test_commit_bulk () {
>  	indir=.
>  	ref=HEAD
>  	n=1
> +	notick=
>  	message='commit %s'
>  	filename='%s.t'
>  	contents='content %s'
> @@ -488,6 +489,12 @@ test_commit_bulk () {
>  			filename="${1#--*=}-%s.t"
>  			contents="${1#--*=} %s"
>  			;;
> +		--date)
> +			notick=yes
> +			GIT_COMMITTER_DATE="$2"
> +			GIT_AUTHOR_DATE="$2"
> +			shift
> +			;;

This gives all of the bulk commits the same date. Which is kind of
unrealistic. Conceivably you'd want to set the starting date at some
old spot, and then tick forward from there. It may not matter much in
practice, though.

-Peff

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH v3 26/30] pack-bitmap.c: use pseudo-merges during traversal
  2024-05-21 19:03   ` [PATCH v3 26/30] pack-bitmap.c: use pseudo-merges during traversal Taylor Blau
@ 2024-05-23 10:48     ` Jeff King
  2024-05-23 18:23       ` Taylor Blau
  0 siblings, 1 reply; 157+ messages in thread
From: Jeff King @ 2024-05-23 10:48 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Patrick Steinhardt, Junio C Hamano

On Tue, May 21, 2024 at 03:03:03PM -0400, Taylor Blau wrote:

> The basic operation is as follows:
> 
>   - When enumerating objects on either side of a reachability query,
>     first see if any subset of the roots satisfies some pseudo-merge
>     bitmap. If it does, apply that pseudo-merge bitmap.
> 
>   - If any pseudo-merge bitmap(s) were applied in the previous step, OR
>     them into the result[^1]. Then repeat the process over all
>     pseudo-merge bitmaps (we'll refer to this as "cascading"
>     pseudo-merges). Once this is done, OR in the resulting bitmap.
> 
>   - If there is no fill-in traversal to be done, return the bitmap for
>     that side of the reachability query. If there is fill-in traversal,
>     then for each commit we encounter via show_commit(), check to see if
>     any unsatisfied pseudo-merges containing that commit as one of its
>     parents has been made satisfied by the presence of that commit.
> 
>     If so, OR in the object set from that pseudo-merge bitmap, and then
>     cascade. If not, continue traversal.

Ah, OK. This is the high-level overview I was looking for in the earlier
commit. ;) I think it is fine here. I just hadn't gotten to it yet (and
I think it is much better stated than what I wrote in my earlier
response).

> [^1]: Importantly, we cannot OR in the entire set of roots along with
>   the objects reachable from whatever pseudo-merge bitmaps were
>   satisfied.  This may leave some dangling bits corresponding to any
>   unsatisfied root(s) getting OR'd into the resulting bitmap, tricking
>   other parts of the traversal into thinking we already have a
>   reachability closure over those commit(s) when we do not.

I think I know how you made this realization. :)

The code itself looks as I'd expect, and the entry points are nice and
clean.
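
For anyone following along, the hazard in that footnote can be modeled
in a few lines. This is a toy with one bit per object, where a set bit
is taken to mean "already covered, no need to walk it"; the toy_* names
are hypothetical and nothing here is taken from pack-bitmap.c.

```c
#include <stdint.h>

/*
 * "roots" are the tips being asked about, "pm_commits" the commits a
 * pseudo-merge covers, "pm_objects" everything reachable from them.
 *
 * Returns the bitmap to start from: only the pseudo-merge's objects if
 * it is satisfied, nothing otherwise. Roots outside the pseudo-merge
 * are deliberately left unset so a fill-in traversal still walks them.
 */
static uint64_t toy_seed_result(uint64_t roots, uint64_t pm_commits,
				uint64_t pm_objects)
{
	if ((roots & pm_commits) != pm_commits)
		return 0;
	return pm_objects;
}

/* Roots whose bits are not yet in the result still need a real walk. */
static uint64_t toy_roots_to_walk(uint64_t roots, uint64_t result)
{
	return roots & ~result;
}
```

With three roots of which the pseudo-merge covers two, the third root
stays in toy_roots_to_walk(). OR-ing the whole root set into the result
first would make that function return 0, and the third root's history
would never be filled in.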

-Peff

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH v3 30/30] t/perf: implement performace tests for pseudo-merge bitmaps
  2024-05-21 19:03   ` [PATCH v3 30/30] t/perf: implement performace tests for pseudo-merge bitmaps Taylor Blau
@ 2024-05-23 10:54     ` Jeff King
  2024-05-23 19:53       ` Taylor Blau
  0 siblings, 1 reply; 157+ messages in thread
From: Jeff King @ 2024-05-23 10:54 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Patrick Steinhardt, Junio C Hamano

On Tue, May 21, 2024 at 03:03:17PM -0400, Taylor Blau wrote:

> Implement a straightforward performance test demonstrating the benefit
> of pseudo-merge bitmaps by measuring how long it takes to count
> reachable objects in a few different scenarios:
> 
>   - without bitmaps, to demonstrate a reasonable baseline
>   - with bitmaps, but without pseudo-merges
>   - with bitmaps and pseudo-merges
> 
> Results from running this test on git.git are as follows:
> 
>     Test                                                                this tree
>     -----------------------------------------------------------------------------------
>     5333.2: git rev-list --count --all --objects (no bitmaps)           3.46(3.37+0.09)
>     5333.3: git rev-list --count --all --objects (no pseudo-merges)     0.13(0.11+0.01)
>     5333.4: git rev-list --count --all --objects (with pseudo-merges)   0.12(0.11+0.01)

That's not a very exciting result. I have a feeling that your git.git is
not a very interesting test case. We'd want a lot of refs, and ones that
are old and have bushy history that is not included in the more recent
branches. So something like a bunch of old unmerged pull request heads,
for example. ;) Do you have more interesting numbers for something like
that?

-Peff

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH v3 00/30] pack-bitmap: pseudo-merge reachability bitmaps
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
                     ` (24 preceding siblings ...)
  2024-05-21 19:03   ` [PATCH v3 30/30] t/perf: implement performace tests for pseudo-merge bitmaps Taylor Blau
@ 2024-05-23 11:05   ` Jeff King
  2024-05-23 20:04     ` Taylor Blau
  2024-05-23 20:42     ` Taylor Blau
  25 siblings, 2 replies; 157+ messages in thread
From: Jeff King @ 2024-05-23 11:05 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Patrick Steinhardt, Junio C Hamano

On Tue, May 21, 2024 at 03:01:38PM -0400, Taylor Blau wrote:

> Here is another reroll of my topic to introduce pseudo-merge bitmaps.

OK, I got through the whole thing. I left a few small comments, but
mostly just observations. Overall, the shape of it looks pretty good.
The much bigger question to me is: does it work?

The perf results you showed at the end are underwhelming, but I think
that's mostly because it's not an interesting repository. I think it
would be nice to see at least some point-in-time benchmarks on a
single repository.

But much more interesting to me is how it performs in the real world in
aggregate, over time:

  - how often / how much do pseudo-merges speed up queries in the real
    world. Clones/fetches, but also reachability queries. Could
    connectivity checks use bitmaps with this?

  - how often do pseudo-merge groups get invalidated by refs changing
    (and thus we lose the speedups from above)?

  - what's the cost like to generate them initially?

  - what's the cost like for subsequent repacks? Does the selection /
    grouping algorithm do a good job of keeping the older, larger groups
    stable (so that we can reuse them verbatim)?

I know you don't have those answers yet, and I know there's some
chicken-and-egg with getting this integrated so that you can start to
explore that. So I mostly reviewed this with an eye towards:

  - does the idea make sense (I think it does, but I'm kind of biased)

  - are the patches going to hurt anybody who isn't using the new
    feature (I think the answer is no)

  - does the on-disk representation seem right, since that is hard to
    change later (I didn't see any issues)

  - does the implementation look clean and plausibly correct (yes, but
    what I'm getting at is that I didn't pore over all of the new code
    with a microscope. Mostly I think the proof is in the pudding that
    it provides the same correct answers more quickly).

So to my eyes it looks good to move forward and let people start playing
with it. The big "experimental" warning in the config is good. Maybe
we'd want one in gitpacking(7), too?

I did wonder briefly what the backup plan is if there are problems with
the on-disk format (or worst case, if it turns out to be a dead end).
We've allocated a flag bit which I think we'd need to respect forever
(though as an optional extension, it's OK to understand and ignore it).
If we needed a "pseudo-merge bitmaps v2", how would that work? I think
we'd have to allocate another bit in the flags field.

I wonder if the start of the pseudo-merge section should have a 4-byte
version/flags field itself? I don't think that's something we've done
before, and maybe it's overkill. I dunno. It's just a lot easier to do
now than later.
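
Purely to illustrate what I mean, and not as a proposal for the actual
bytes: assume a hypothetical layout where the section starts with a
4-byte big-endian version number. A reader could then check it like
this and ignore the (optional) extension when it does not recognize it.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PSEUDO_MERGE_VERSION 1

/*
 * Hypothetical layout, not the format in this series: the first four
 * bytes of the section hold a big-endian version number.
 *
 * Returns 0 and sets *body to the offset just past the header if the
 * version is one we understand, or -1 if the section is too short or
 * the version is unknown.
 */
static int toy_check_section_version(const unsigned char *sec, size_t len,
				     size_t *body)
{
	uint32_t version;

	if (len < 4)
		return -1;

	version = ((uint32_t)sec[0] << 24) | ((uint32_t)sec[1] << 16) |
		  ((uint32_t)sec[2] << 8) | (uint32_t)sec[3];
	if (version != TOY_PSEUDO_MERGE_VERSION)
		return -1;

	*body = 4;
	return 0;
}
```

On a -1 the caller would fall back to ordinary bitmaps rather than
treating the file as corrupt.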

-Peff

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH v3 25/30] t/test-lib-functions.sh: support `--date` in `test_commit_bulk()`
  2024-05-23 10:42     ` Jeff King
@ 2024-05-23 15:45       ` Junio C Hamano
  2024-05-23 18:23         ` Taylor Blau
  0 siblings, 1 reply; 157+ messages in thread
From: Junio C Hamano @ 2024-05-23 15:45 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, Elijah Newren, Patrick Steinhardt

Jeff King <peff@peff.net> writes:

> On Tue, May 21, 2024 at 03:02:59PM -0400, Taylor Blau wrote:
>
>> diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
>> index 862d80c9748..16fd585e34b 100644
>> --- a/t/test-lib-functions.sh
>> +++ b/t/test-lib-functions.sh
>> @@ -458,6 +458,7 @@ test_commit_bulk () {
>>  	indir=.
>>  	ref=HEAD
>>  	n=1
>> +	notick=
>>  	message='commit %s'
>>  	filename='%s.t'
>>  	contents='content %s'
>> @@ -488,6 +489,12 @@ test_commit_bulk () {
>>  			filename="${1#--*=}-%s.t"
>>  			contents="${1#--*=} %s"
>>  			;;
>> +		--date)
>> +			notick=yes
>> +			GIT_COMMITTER_DATE="$2"
>> +			GIT_AUTHOR_DATE="$2"
>> +			shift
>> +			;;
>
> This gives all of the bulk commits the same date. Which is kind of
> unrealistic.

Yeah, giving this helper function a "--notick" option, without
adding this "--date" option, is a better design, I suspect.

The callers Taylor expected to use --date can set the _DATE
variables and pass "--notick", and callers that want the same
timestamp without caring which exact timestamp can just pass
"--notick" without futzing with _DATE variables.

Thanks.

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH v3 16/30] config: introduce git_config_float()
  2024-05-23 10:02     ` Jeff King
@ 2024-05-23 17:51       ` Taylor Blau
  0 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 17:51 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Patrick Steinhardt, Junio C Hamano

On Thu, May 23, 2024 at 06:02:25AM -0400, Jeff King wrote:
> On Tue, May 21, 2024 at 03:02:29PM -0400, Taylor Blau wrote:
>
> > Future commits will want to parse a floating point value from
> > configuration, but we have no way to parse such a value prior to this
> > patch.
> >
> > The core of the routine is implemented in git_parse_float(). Unlike
> > git_parse_unsigned() and git_parse_signed(), however, the function
> > implemented here only works on type "float", and not related types like
> > "double", or "long double".
> >
> > This is because "double" and "long double" use different functions to
> > convert from ASCII strings to floating point values (strtod() and
> > strtold(), respectively). Likewise, there is no pointer type that can
> > assign to any of these values (except for "void *"), so the only way to
> > define this trio of functions would be with a macro expansion that is
> > parameterized over the floating point type and conversion function.
>
> I agree it doesn't make sense to support both. But if we have to choose
> between the two, should we just use "double"?

Yeah, I share your feeling that there is no great need to support
double-precision floats here, but there's also no reason to artificially
limit ourselves to single-precision ones, either.

> Regarding using non-integers at all, I do wonder how much we need them.
> We've usually stuck to integers in other spots, even if it means a sort
> of pseudo-fixed-point (e.g., rename scores). Looking ahead, you're using
> these for the power-series knobs. I guess it would be pretty confusing
> to try to force integers there. I dunno. Not really an objection, but I
> just wonder if it was something you considered.

I had originally written the series that way, but Patrick suggested that
I use floating point numbers instead ;-).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH v3 17/30] pseudo-merge: implement support for selecting pseudo-merge commits
  2024-05-23 10:12     ` Jeff King
@ 2024-05-23 17:56       ` Taylor Blau
  0 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 17:56 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Patrick Steinhardt, Junio C Hamano

On Thu, May 23, 2024 at 06:12:48AM -0400, Jeff King wrote:
> On Tue, May 21, 2024 at 03:02:32PM -0400, Taylor Blau wrote:
>
> > diff --git a/Documentation/config/bitmap-pseudo-merge.txt b/Documentation/config/bitmap-pseudo-merge.txt
> > new file mode 100644
> > index 00000000000..d4a2023b84a
> > --- /dev/null
> > +++ b/Documentation/config/bitmap-pseudo-merge.txt
> > @@ -0,0 +1,90 @@
> > +NOTE: The configuration options in `bitmapPseudoMerge.*` are considered
> > +EXPERIMENTAL and may be subject to change or be removed entirely in the
> > +future.
>
> I'm happy to see this all marked as experimental. We really don't know
> what selection approach will be best yet, and it's good not to lock
> ourselves in.
>
> I wasn't sure how this would format via asciidoc, since we're in the
> middle of a list of variables. It...kind of looks like the note goes
> under the previous entry (for attr.tree) in the text manpage. Though
> looking at the docbook, I _think_ it's actually outside of that, and
> it's just how the roff ends up indenting it?

I think that you're right that this is a roff indentation thing. Looking
at the html version rendered by asciidoc, it places the NOTE in an
admonition block [1] that is separate from the previous attr.tree entry.

> I don't know if it's worth spending too much time on, but maybe there's
> an easy way to do it differently (I couldn't think of one).
>
> This paragraph might also be a good place to refer to gitpacking(7).

Great idea, thanks.

> > diff --git a/Documentation/gitpacking.txt b/Documentation/gitpacking.txt
> > index ff18077129b..1ed645ff910 100644
> > --- a/Documentation/gitpacking.txt
> > +++ b/Documentation/gitpacking.txt
>
> Thanks, I think this new file and the content you added address all of
> my documentation complaints.

Good :-).

> I think it will also be a good place to discuss bitmaps in general, but
> you were IMHO wise to not do that in this series, which is already quite
> big. :)

Thanks. Yeah, I would like to add more about bitmaps and other advanced
packing concepts in general to gitpacking(7), but let's do so outside of
this already-gigantic series ;-).

Thanks,
Taylor

[1]: https://docs.asciidoctor.org/asciidoc/latest/blocks/admonitions/

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH v3 22/30] pseudo-merge: implement support for reading pseudo-merge commits
  2024-05-23 10:40     ` Jeff King
@ 2024-05-23 18:09       ` Taylor Blau
  0 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 18:09 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Patrick Steinhardt, Junio C Hamano

On Thu, May 23, 2024 at 06:40:00AM -0400, Jeff King wrote:
> OK, so I think this commit is getting into the meat of how the new
> bitmaps will be used. Just to restate it from a high-level to make sure
> I understand, I think it is:
>
>   1. When we are traversing (or even before we traverse and just know
>      our tips), we can always say "hey, I have a commit in the bitmap;
>      does this satisfy any pseudo-merges?". Where "satisfy" is "all of
>      the commits pseudo-merged for that bitmap are already in our
>      result". And if so, then we can use the pseudo-merge bitmap by
>      OR-ing it in.
>
>      And that's apply_pseudo_merges_for_commit().
>
>   2. That "OR" operation may likewise open up new options, so we
>      recurse. And that's the "cascade" function.

Exactly. I think implicit in the above is that your (2) is also a
recursive step, since each cascade step may open us up to new
pseudo-merges, which themselves may reach objects which satisfy other
pseudo-merges, and so on.

> > +static int pseudo_merge_ext_at(const struct pseudo_merge_map *pm,
> > +			       struct pseudo_merge_commit_ext *ext, size_t at)
> > +{
> > +	if (at >= pm->map_size)
> > +		return error(_("extended pseudo-merge read out-of-bounds "
> > +			       "(%"PRIuMAX" >= %"PRIuMAX")"),
> > +			     (uintmax_t)at, (uintmax_t)pm->map_size);
> > +
> > +	ext->nr = get_be32(pm->map + at);
> > +	ext->ptr = pm->map + at + sizeof(uint32_t);
> > +
> > +	return 0;
> > +}
>
> I was happy to see the boundary check here. Do we need a length check,
> too? We'd need at least four bytes here for the uint32_t. Does map_size
> include the trailing hash? If not, then it might provide a bit of slop
> (we'd read garbage, but never go outside the mmap).
>
> I guess the ">=" in the size check implies that we have at least one
> byte, but I don't think anything promises that we're correctly 4-byte
> aligned.

Yeah, we could read into the trailing hash area, which would just be
garbage from our perspective. But I think that adding a length check is
easy enough to do, something like:

--- 8< ---
diff --git a/pseudo-merge.c b/pseudo-merge.c
index b539791396..7d13101149 100644
--- a/pseudo-merge.c
+++ b/pseudo-merge.c
@@ -478,6 +478,10 @@ static int pseudo_merge_ext_at(const struct pseudo_merge_map *pm,
 		return error(_("extended pseudo-merge read out-of-bounds "
 			       "(%"PRIuMAX" >= %"PRIuMAX")"),
 			     (uintmax_t)at, (uintmax_t)pm->map_size);
+	if (at + 4 >= pm->map_size)
+		return error(_("extended pseudo-merge entry is too short "
+			       "(%"PRIuMAX" >= %"PRIuMAX")"),
+			     (uintmax_t)(at + 4), (uintmax_t)pm->map_size);

 	ext->nr = get_be32(pm->map + at);
 	ext->ptr = pm->map + at + sizeof(uint32_t);
--- >8 ---
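
A stand-alone variant of that kind of check, as a sketch with
hypothetical names: it subtracts instead of adding, so the comparison
cannot wrap around for a huge 'at'. It also accepts a 4-byte read that
ends exactly at the end of the map, which is slightly more permissive
than the '>=' above; which of the two is wanted depends on whether any
bytes are required to follow the count.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: is it safe to read "len" bytes starting at
 * offset "at" from a buffer of "size" bytes? Written with a
 * subtraction so that a huge "at" cannot overflow the comparison.
 */
static int toy_range_ok(size_t at, size_t len, size_t size)
{
	return len <= size && at <= size - len;
}

/* Read a 4-byte big-endian count at "at"; returns -1 if out of bounds. */
static int toy_read_be32_at(const unsigned char *map, size_t size,
			    size_t at, uint32_t *out)
{
	if (!toy_range_ok(at, sizeof(uint32_t), size))
		return -1;

	*out = ((uint32_t)map[at] << 24) |
	       ((uint32_t)map[at + 1] << 16) |
	       ((uint32_t)map[at + 2] << 8) |
	       (uint32_t)map[at + 3];
	return 0;
}
```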

> The rest of the length check is here:
>
> > +struct ewah_bitmap *pseudo_merge_bitmap(const struct pseudo_merge_map *pm,
> > +					struct pseudo_merge *merge)
> > +{
> > +	if (!merge->loaded_commits)
> > +		BUG("cannot use unloaded pseudo-merge bitmap");
> > +
> > +	if (!merge->loaded_bitmap) {
> > +		size_t at = merge->bitmap_at;
> > +
> > +		merge->bitmap = read_bitmap(pm->map, pm->map_size, &at);
> > +		merge->loaded_bitmap = 1;
> > +	}
> > +
> > +	return merge->bitmap;
> > +}
>
> When we call read_bitmap(), it knows where the end is, and it's
> careful to avoid reading past it. Good.

Yep, thanks for double checking.

Thanks,
Taylor

^ permalink raw reply related	[flat|nested] 157+ messages in thread

* Re: [PATCH v3 25/30] t/test-lib-functions.sh: support `--date` in `test_commit_bulk()`
  2024-05-23 15:45       ` Junio C Hamano
@ 2024-05-23 18:23         ` Taylor Blau
  0 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 18:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git, Elijah Newren, Patrick Steinhardt

On Thu, May 23, 2024 at 08:45:24AM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > On Tue, May 21, 2024 at 03:02:59PM -0400, Taylor Blau wrote:
> >
> >> diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
> >> index 862d80c9748..16fd585e34b 100644
> >> --- a/t/test-lib-functions.sh
> >> +++ b/t/test-lib-functions.sh
> >> @@ -458,6 +458,7 @@ test_commit_bulk () {
> >>  	indir=.
> >>  	ref=HEAD
> >>  	n=1
> >> +	notick=
> >>  	message='commit %s'
> >>  	filename='%s.t'
> >>  	contents='content %s'
> >> @@ -488,6 +489,12 @@ test_commit_bulk () {
> >>  			filename="${1#--*=}-%s.t"
> >>  			contents="${1#--*=} %s"
> >>  			;;
> >> +		--date)
> >> +			notick=yes
> >> +			GIT_COMMITTER_DATE="$2"
> >> +			GIT_AUTHOR_DATE="$2"
> >> +			shift
> >> +			;;
> >
> > This gives all of the bulk commits the same date. Which is kind of
> > unrealistic.
>
> Yeah, giving this helper function a "--notick" option, without
> adding this "--date" option, is a better design, I suspect.

Agreed. Thanks, both.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH v3 26/30] pack-bitmap.c: use pseudo-merges during traversal
  2024-05-23 10:48     ` Jeff King
@ 2024-05-23 18:23       ` Taylor Blau
  0 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 18:23 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Patrick Steinhardt, Junio C Hamano

On Thu, May 23, 2024 at 06:48:37AM -0400, Jeff King wrote:
> On Tue, May 21, 2024 at 03:03:03PM -0400, Taylor Blau wrote:
>
> > The basic operation is as follows:
> >
> >   - When enumerating objects on either side of a reachability query,
> >     first see if any subset of the roots satisfies some pseudo-merge
> >     bitmap. If it does, apply that pseudo-merge bitmap.
> >
> >   - If any pseudo-merge bitmap(s) were applied in the previous step, OR
> >     them into the result[^1]. Then repeat the process over all
> >     pseudo-merge bitmaps (we'll refer to this as "cascading"
> >     pseudo-merges). Once this is done, OR in the resulting bitmap.
> >
> >   - If there is no fill-in traversal to be done, return the bitmap for
> >     that side of the reachability query. If there is fill-in traversal,
> >     then for each commit we encounter via show_commit(), check to see if
> >     any unsatisfied pseudo-merges containing that commit as one of its
> >     parents has been made satisfied by the presence of that commit.
> >
> >     If so, OR in the object set from that pseudo-merge bitmap, and then
> >     cascade. If not, continue traversal.
>
> Ah, OK. This is the high-level overview I was looking for in the earlier
> commit. ;) I think it is fine here. I just hadn't gotten to it yet (and
> I think it is much better stated than what I wrote in my earlier
> response).

Good, thanks.

> > [^1]: Importantly, we cannot OR in the entire set of roots along with
> >   the objects reachable from whatever pseudo-merge bitmaps were
> >   satisfied.  This may leave some dangling bits corresponding to any
> >   unsatisfied root(s) getting OR'd into the resulting bitmap, tricking
> >   other parts of the traversal into thinking we already have a
> >   reachability closure over those commit(s) when we do not.
>
> I think I know how you made this realization. :)

;-).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 157+ messages in thread

* Re: [PATCH v3 30/30] t/perf: implement performace tests for pseudo-merge bitmaps
  2024-05-23 10:54     ` Jeff King
@ 2024-05-23 19:53       ` Taylor Blau
  2024-05-25  3:13         ` Jeff King
  0 siblings, 1 reply; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 19:53 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Patrick Steinhardt, Junio C Hamano

On Thu, May 23, 2024 at 06:54:22AM -0400, Jeff King wrote:
> On Tue, May 21, 2024 at 03:03:17PM -0400, Taylor Blau wrote:
>
> > Implement a straightforward performance test demonstrating the benefit
> > of pseudo-merge bitmaps by measuring how long it takes to count
> > reachable objects in a few different scenarios:
> >
> >   - without bitmaps, to demonstrate a reasonable baseline
> >   - with bitmaps, but without pseudo-merges
> >   - with bitmaps and pseudo-merges
> >
> > Results from running this test on git.git are as follows:
> >
> >     Test                                                                this tree
> >     -----------------------------------------------------------------------------------
> >     5333.2: git rev-list --count --all --objects (no bitmaps)           3.46(3.37+0.09)
> >     5333.3: git rev-list --count --all --objects (no pseudo-merges)     0.13(0.11+0.01)
> >     5333.4: git rev-list --count --all --objects (with pseudo-merges)   0.12(0.11+0.01)
>
> That's not a very exciting result.

I suspect some of it might have to do with:

--- 8< ---
diff --git a/t/perf/p5333-pseudo-merge-bitmaps.sh b/t/perf/p5333-pseudo-merge-bitmaps.sh
index 4bec409d10..2e8b1d2635 100755
--- a/t/perf/p5333-pseudo-merge-bitmaps.sh
+++ b/t/perf/p5333-pseudo-merge-bitmaps.sh
@@ -20,12 +20,12 @@ test_perf 'git rev-list --count --all --objects (no bitmaps)' '
 '

 test_perf 'git rev-list --count --all --objects (no pseudo-merges)' '
-	GIT_TEST_USE_PSEDUO_MERGES=0 \
+	GIT_TEST_USE_PSEUDO_MERGES=0 \
 		git rev-list --objects --all --use-bitmap-index
 '

 test_perf 'git rev-list --count --all --objects (with pseudo-merges)' '
-	GIT_TEST_USE_PSEDUO_MERGES=1 \
+	GIT_TEST_USE_PSEUDO_MERGES=1 \
 		git rev-list --objects --all --use-bitmap-index
 '
--- >8 ---

Sure enough, that shows us a little gap between the "no pseudo-merges"
and "with pseudo-merges" case:

```
Test                                                                this tree
-----------------------------------------------------------------------------------
5333.2: git rev-list --count --all --objects (no bitmaps)           3.54(3.45+0.08)
5333.3: git rev-list --count --all --objects (no pseudo-merges)     0.43(0.40+0.03)
5333.4: git rev-list --count --all --objects (with pseudo-merges)   0.12(0.11+0.01)
```
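
To spell out why the typo went unnoticed, here is a minimal sketch (not
code from this series) of a boolean environment lookup with a default.
`env_bool_sketch()` is a hypothetical stand-in for the real helper, and
it assumes the pseudo-merge toggle defaults to "enabled", which is
consistent with the two near-identical timings in the first table:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical, simplified boolean environment lookup: returns `def`
 * when the variable is unset, 0 when it is exactly "0", and 1 for any
 * other value.
 */
static int env_bool_sketch(const char *name, int def)
{
	const char *v = getenv(name);

	if (!v)
		return def;	/* unset names fall back silently */
	return strcmp(v, "0") != 0;
}
```

The test exported the misspelled name, so a lookup of the correctly
spelled one would find nothing and fall back to the default in both the
"no pseudo-merges" and "with pseudo-merges" cases.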

> I have a feeling that your git.git is not a very interesting test
> case. We'd want a lot of refs, and ones that are old and have bushy
> history that is not included in the more recent branches. So something
> like a bunch of old unmerged pull request heads, for example. ;) Do
> you have more interesting numbers for something like that?

Indeed, here's one for a private repository which meets those criteria:

```
Test                                                                this tree
---------------------------------------------------------------------------------------
5333.1: git rev-list --count --all --objects (no bitmaps)           122.29(121.31+0.97)
5333.2: git rev-list --count --all --objects (no pseudo-merges)     21.88(21.30+0.58)
5333.3: git rev-list --count --all --objects (with pseudo-merges)   5.05(4.77+0.28)
```

Thanks,
Taylor

* Re: [PATCH v3 00/30] pack-bitmap: pseudo-merge reachability bitmaps
  2024-05-23 11:05   ` [PATCH v3 00/30] pack-bitmap: pseudo-merge reachability bitmaps Jeff King
@ 2024-05-23 20:04     ` Taylor Blau
  2024-05-25  3:15       ` Jeff King
  2024-05-23 20:42     ` Taylor Blau
  1 sibling, 1 reply; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 20:04 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Patrick Steinhardt, Junio C Hamano

On Thu, May 23, 2024 at 07:05:32AM -0400, Jeff King wrote:
> I wonder if the start of the pseudo-merge section should have a 4-byte
> version/flags field itself? I don't think that's something we've done
> before, and maybe it's overkill. I dunno. It's just a lot easier to do
> now than later.

I think the tricky thing here would be that the extension itself is a
variable size, so every version would have to put the "extension size"
field in the same place.

Otherwise, an older Git client which doesn't understand a future version
of the pseudo-merge extension wouldn't know how large the extension is,
and wouldn't be able to adjust the index_end field appropriately to skip
over it.

Of course, we could make it a convention that says "all versions have to
place the extension size field at the same relative offset", but it
feels weird to read some of the extension while not understanding the
whole thing.

I'm definitely not saying that I think the specification ought to be set
in stone forever, but I think any changes would want to be behind a new
bitmap extension, not a version within the same extension.
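
To make the concern concrete, here is a minimal sketch (not code from
the series) of how a reader could step over an extension it does not
understand, provided the size field sits at a fixed position. The
layout is an assumption for illustration only: the total extension size
(including the size field itself) is taken to live in the last 8 bytes
before `index_end`, and `skip_trailing_extension()` is a hypothetical
name:

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 64-bit value stored in network (big-endian) byte order. */
static uint64_t read_be64(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Hypothetical layout: the extension's total size in bytes (size field
 * included) occupies the 8 bytes just before `index_end`. Returns the
 * adjusted `index_end`, or NULL if the size is implausible.
 */
static const unsigned char *skip_trailing_extension(const unsigned char *map,
						    const unsigned char *index_end)
{
	uint64_t ext_size;

	if ((size_t)(index_end - map) < 8)
		return NULL;
	ext_size = read_be64(index_end - 8);
	if (ext_size < 8 || ext_size > (uint64_t)(index_end - map))
		return NULL;
	return index_end - ext_size;
}
```

This only works if every version keeps the size field in that same
place, which is exactly the convention discussed above.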

Thanks,
Taylor

* Re: [PATCH v3 00/30] pack-bitmap: pseudo-merge reachability bitmaps
  2024-05-23 11:05   ` [PATCH v3 00/30] pack-bitmap: pseudo-merge reachability bitmaps Jeff King
  2024-05-23 20:04     ` Taylor Blau
@ 2024-05-23 20:42     ` Taylor Blau
  1 sibling, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 20:42 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Elijah Newren, Patrick Steinhardt, Junio C Hamano

On Thu, May 23, 2024 at 07:05:32AM -0400, Jeff King wrote:
> On Tue, May 21, 2024 at 03:01:38PM -0400, Taylor Blau wrote:
>
> > Here is another reroll of my topic to introduce pseudo-merge bitmaps.
>
> OK, I got through the whole thing. I left a few small comments, but
> mostly just observations. Overall, the shape of it looks pretty good.
> The much bigger question to me is: does it work?

Thanks for working through this series, I appreciate you taking a close
look, especially since it's on the longer end.

> The perf results you showed at the end are underwhelming, but I think
> that's mostly because it's not an interesting repository. I think it
> would be nice to see at least some point-in-time benchmarks on a
> single repository.

I'd say it's mostly because I misspelt "pseudo" as "pseduo", but either
way ;-).

> I know you don't have those answers yet, and I know there's some
> chicken-and-egg with getting this integrated so that you can start to
> explore that. So I mostly reviewed this with an eye towards:
>
>   - does the idea make sense (I think it does, but I'm kind of biased)
>
>   - are the patches going to hurt anybody who isn't using the new
>     feature (I think the answer is no)
>
>   - does the on-disk representation seem right, since that is hard to
>     change later (I didn't see any issues)
>
>   - does the implementation look clean and plausibly correct (yes, but
>     what I'm getting at is that I didn't pore over all of the new code
>     with a microscope. Mostly I think the proof is in the pudding that
>     it provides the same correct answers more quickly).
>
> So to my eyes it looks good to move forward and let people start playing
> with it. The big "experimental" warning in the config is good. Maybe
> we'd want one in gitpacking(7), too?

I think adding a matching warning in gitpacking(7) is a good idea. Your
feeling matches my own here that the burden of this series on those not
using pseudo-merge bitmaps is negligible, and that it unblocks those
who want to move forward with testing this out in the wild.

I'll send a v4 shortly with the minor changes that I picked up from your
last read-through, and then I think we are good to go here.

Thanks,
Taylor

* [PATCH v4 00/24] pack-bitmap: pseudo-merge reachability bitmaps
  2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
                   ` (26 preceding siblings ...)
  2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
@ 2024-05-23 21:26 ` Taylor Blau
  2024-05-23 21:26   ` [PATCH v4 01/24] Documentation/gitpacking.txt: initial commit Taylor Blau
                     ` (24 more replies)
  27 siblings, 25 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 21:26 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Here is another reroll of my topic to introduce pseudo-merge bitmaps.

The implementation is still relatively unchanged compared to last time,
save for the review that Peff provided on the remaining parts of this
series.

As usual, there is a range-diff below, but the significant changes since
last time are as follows:

  - replace git_config_float() with git_config_double() (and matching
    tweaks in the callers)

  - add a NOTE to gitpacking(7) reflecting that the pseudo-merge bitmaps
    feature is considered experimental

  - add a length check when reading extended pseudo merges via the
    pseudo_merge_ext_at() function

  - replace the new `--date` option for `test_commit_bulk` with a
    `--notick` option (and set the GIT_COMMITTER_DATE values
    appropriately at the callers)

  - fix broken performance tests due to a typo on "pseudo", and include
    results from a large repository.
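
As a rough illustration of the first item, a strtod()-based parser
might look like the sketch below. This is a simplified sketch rather
than the code in the series: `parse_double_sketch()` is a hypothetical
name, it omits the unit-factor handling, and it rejects any trailing
characters outright:

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical, simplified parser: returns 1 and stores the result in
 * *ret on success, or 0 if the input is empty, out of range, or not
 * entirely a number.
 */
static int parse_double_sketch(const char *value, double *ret)
{
	char *end;
	double val;

	if (!value || !*value)
		return 0;

	errno = 0;
	val = strtod(value, &end);
	if (errno == ERANGE)
		return 0;	/* overflow or underflow */
	if (end == value || *end != '\0')
		return 0;	/* nothing parsed, or trailing garbage */

	*ret = val;
	return 1;
}
```

Parsing "float" or "long double" instead would mean swapping in
strtof() or strtold() and a different output pointer type, which is why
a single function cannot cover all three without a macro.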

The series is still based on 'tb/pack-bitmap-write-cleanups'.

Taylor Blau (24):
  Documentation/gitpacking.txt: initial commit
  Documentation/gitpacking.txt: describe pseudo-merge bitmaps
  Documentation/technical: describe pseudo-merge bitmaps format
  ewah: implement `ewah_bitmap_is_subset()`
  pack-bitmap: move some initialization to `bitmap_writer_init()`
  pseudo-merge.ch: initial commit
  pack-bitmap-write: support storing pseudo-merge commits
  pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()`
  pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public
  config: introduce `git_config_double()`
  pseudo-merge: implement support for selecting pseudo-merge commits
  pack-bitmap-write.c: write pseudo-merge table
  pack-bitmap: extract `read_bitmap()` function
  pseudo-merge: scaffolding for reads
  pack-bitmap.c: read pseudo-merge extension
  pseudo-merge: implement support for reading pseudo-merge commits
  ewah: implement `ewah_bitmap_popcount()`
  pack-bitmap: implement test helpers for pseudo-merge
  t/test-lib-functions.sh: support `--notick` in `test_commit_bulk()`
  pack-bitmap.c: use pseudo-merges during traversal
  pack-bitmap: extra trace2 information
  ewah: `bitmap_equals_ewah()`
  pseudo-merge: implement support for finding existing merges
  t/perf: implement performance tests for pseudo-merge bitmaps

 Documentation/Makefile                       |   1 +
 Documentation/config.txt                     |   2 +
 Documentation/config/bitmap-pseudo-merge.txt |  91 +++
 Documentation/gitpacking.txt                 | 189 +++++
 Documentation/technical/bitmap-format.txt    | 132 ++++
 Makefile                                     |   1 +
 builtin/pack-objects.c                       |   3 +-
 config.c                                     |   9 +
 config.h                                     |   7 +
 ewah/bitmap.c                                |  76 ++
 ewah/ewok.h                                  |   8 +
 midx-write.c                                 |   2 +-
 object.h                                     |   2 +-
 pack-bitmap-write.c                          | 274 ++++++-
 pack-bitmap.c                                | 359 ++++++++-
 pack-bitmap.h                                |  19 +-
 parse.c                                      |  29 +
 parse.h                                      |   1 +
 pseudo-merge.c                               | 756 +++++++++++++++++++
 pseudo-merge.h                               | 216 ++++++
 t/helper/test-bitmap.c                       |  34 +-
 t/perf/p5333-pseudo-merge-bitmaps.sh         |  32 +
 t/t5333-pseudo-merge-bitmaps.sh              | 393 ++++++++++
 t/test-lib-functions.sh                      |   9 +-
 24 files changed, 2590 insertions(+), 55 deletions(-)
 create mode 100644 Documentation/config/bitmap-pseudo-merge.txt
 create mode 100644 Documentation/gitpacking.txt
 create mode 100644 pseudo-merge.c
 create mode 100644 pseudo-merge.h
 create mode 100755 t/perf/p5333-pseudo-merge-bitmaps.sh
 create mode 100755 t/t5333-pseudo-merge-bitmaps.sh

Range-diff against v3:
 -:  ----------- >  1:  0f20c9becf4 Documentation/gitpacking.txt: initial commit
 1:  528b591bd84 !  2:  48afaa74928 Documentation/gitpacking.txt: describe pseudo-merge bitmaps
    @@ Documentation/gitpacking.txt: There are many aspects of packing in Git that are
      
     +== Pseudo-merge bitmaps
     +
    ++NOTE: Pseudo-merge bitmaps are considered an experimental feature, so
    ++the configuration and many of the ideas are subject to change.
    ++
     +=== Background
     +
     +Reachability bitmaps are most efficient when we have on-disk stored
 2:  12f318b3d7e =  3:  44046f83c1a Documentation/technical: describe pseudo-merge bitmaps format
 3:  40eb6137618 =  4:  211d6f14128 ewah: implement `ewah_bitmap_is_subset()`
 4:  487fb7c6e9c =  5:  650cac2dcf9 pack-bitmap: move some initialization to `bitmap_writer_init()`
 5:  827732acf99 =  6:  6647d8832ce pseudo-merge.ch: initial commit
 6:  8608dd1860f =  7:  e8ef1ef5ee4 pack-bitmap-write: support storing pseudo-merge commits
 7:  99d2b6872ba =  8:  fe458728c8a pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()`
 8:  e7209c60fa5 =  9:  6bf372f4020 pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public
 9:  3070135eb4b ! 10:  6c77671ae9c config: introduce git_config_float()
    @@ Metadata
     Author: Taylor Blau <me@ttaylorr.com>
     
      ## Commit message ##
    -    config: introduce git_config_float()
    +    config: introduce `git_config_double()`
     
    -    Future commits will want to parse a floating point value from
    -    configuration, but we have no way to parse such a value prior to this
    -    patch.
    +    Future commits will want to parse a double-precision floating point
    +    value from configuration, but we have no way to parse such a value prior
    +    to this patch.
     
    -    The core of the routine is implemented in git_parse_float(). Unlike
    +    The core of the routine is implemented in git_parse_double(). Unlike
         git_parse_unsigned() and git_parse_signed(), however, the function
    -    implemented here only works on type "float", and not related types like
    -    "double", or "long double".
    +    implemented here only works on type "double", and not related types like
    +    "float", or "long double".
     
    -    This is because "double" and "long double" use different functions to
    -    convert from ASCII strings to floating point values (strtod() and
    +    This is because "float" and "long double" use different functions to
    +    convert from ASCII strings to floating point values (strtof() and
         strtold(), respectively). Likewise, there is no pointer type that can
         assign to any of these values (except for "void *"), so the only way to
         define this trio of functions would be with a macro expansion that is
         parameterized over the floating point type and conversion function.
     
         That is all doable, but likely to be overkill given our current needs,
    -    which is only to parse floats.
    +    which is only to parse double-precision floats.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    @@ config.c: ssize_t git_config_ssize_t(const char *name, const char *value,
      	return ret;
      }
      
    -+float git_config_float(const char *name, const char *value,
    -+		       const struct key_value_info *kvi)
    ++double git_config_double(const char *name, const char *value,
    ++			 const struct key_value_info *kvi)
     +{
    -+	float ret;
    -+	if (!git_parse_float(value, &ret))
    ++	double ret;
    ++	if (!git_parse_double(value, &ret))
     +		die_bad_number(name, value, kvi);
     +	return ret;
     +}
    @@ config.h: unsigned long git_config_ulong(const char *, const char *,
      			   const struct key_value_info *);
      
     +/**
    -+ * Identical to `git_config_int`, but for floating point values.
    ++ * Identical to `git_config_int`, but for double-precision floating point
    ++ * values.
     + */
    -+float git_config_float(const char *, const char *,
    -+		       const struct key_value_info *);
    ++double git_config_double(const char *, const char *,
    ++			 const struct key_value_info *);
     +
      /**
       * Same as `git_config_bool`, except that integers are returned as-is, and
    @@ parse.c: int git_parse_ssize_t(const char *value, ssize_t *ret)
      	return 1;
      }
      
    -+int git_parse_float(const char *value, float *ret)
    ++int git_parse_double(const char *value, double *ret)
     +{
     +	char *end;
    -+	float val;
    ++	double val;
     +	uintmax_t factor;
     +
     +	if (!value || !*value) {
    @@ parse.c: int git_parse_ssize_t(const char *value, ssize_t *ret)
     +	}
     +
     +	errno = 0;
    -+	val = strtof(value, &end);
    ++	val = strtod(value, &end);
     +	if (errno == ERANGE)
     +		return 0;
     +	if (end == value) {
    @@ parse.h: int git_parse_ssize_t(const char *, ssize_t *);
      int git_parse_ulong(const char *, unsigned long *);
      int git_parse_int(const char *value, int *ret);
      int git_parse_int64(const char *value, int64_t *ret);
    -+int git_parse_float(const char *value, float *ret);
    ++int git_parse_double(const char *value, double *ret);
      
      /**
       * Same as `git_config_bool`, except that it returns -1 on error rather
10:  3029473c094 ! 11:  180072ce848 pseudo-merge: implement support for selecting pseudo-merge commits
    @@ Documentation/config/bitmap-pseudo-merge.txt (new)
     @@
     +NOTE: The configuration options in `bitmapPseudoMerge.*` are considered
     +EXPERIMENTAL and may be subject to change or be removed entirely in the
    -+future.
    ++future. For more information about the pseudo-merge bitmap feature, see
    ++the "Pseudo-merge bitmaps" section of linkgit:gitpacking[7].
     +
     +bitmapPseudoMerge.<name>.pattern::
     +	Regular expression used to match reference names. Commits
    @@ pseudo-merge.c
     +#include "alloc.h"
     +#include "progress.h"
     +
    -+#define DEFAULT_PSEUDO_MERGE_DECAY 1.0f
    ++#define DEFAULT_PSEUDO_MERGE_DECAY 1.0
     +#define DEFAULT_PSEUDO_MERGE_MAX_MERGES 64
     +#define DEFAULT_PSEUDO_MERGE_SAMPLE_RATE 1
     +#define DEFAULT_PSEUDO_MERGE_THRESHOLD approxidate("1.week.ago")
     +#define DEFAULT_PSEUDO_MERGE_STABLE_THRESHOLD approxidate("1.month.ago")
     +#define DEFAULT_PSEUDO_MERGE_STABLE_SIZE 512
     +
    -+static float gitexp(float base, int exp)
    ++static double gitexp(double base, int exp)
     +{
    -+	float result = 1;
    ++	double result = 1;
     +	while (1) {
     +		if (exp % 2)
     +			result *= base;
    @@ pseudo-merge.c
     +					const struct pseudo_merge_matches *matches,
     +					uint32_t i)
     +{
    -+	float C = 0.0f;
    ++	double C = 0.0f;
     +	uint32_t n;
     +
     +	/*
    @@ pseudo-merge.c
     +	 *   { 5012, 1772, 964, 626, 448, 341, 271, 221, 186, 158 }
     +	 */
     +	for (n = 0; n < group->max_merges; n++)
    -+		C += 1.0f / gitexp(n + 1, group->decay);
    ++		C += 1.0 / gitexp(n + 1, group->decay);
     +	C = matches->unstable_nr / C;
     +
     +	return (uint32_t)((C / gitexp(i + 1, group->decay)) + 0.5);
    @@ pseudo-merge.c
     +
     +		strbuf_release(&re);
     +	} else if (!strcmp(key, "decay")) {
    -+		group->decay = git_config_float(var, value, ctx->kvi);
    ++		group->decay = git_config_double(var, value, ctx->kvi);
     +		if (group->decay < 0) {
     +			warning(_("%s must be non-negative, using default"), var);
     +			group->decay = DEFAULT_PSEUDO_MERGE_DECAY;
     +		}
     +	} else if (!strcmp(key, "samplerate")) {
    -+		group->sample_rate = git_config_float(var, value, ctx->kvi);
    ++		group->sample_rate = git_config_double(var, value, ctx->kvi);
     +		if (!(0 <= group->sample_rate && group->sample_rate <= 1)) {
     +			warning(_("%s must be between 0 and 1, using default"), var);
     +			group->sample_rate = DEFAULT_PSEUDO_MERGE_SAMPLE_RATE;
    @@ pseudo-merge.c
     +			struct commit *c = matches->unstable[j];
     +			struct pseudo_merge_commit_idx *pmc;
     +
    -+			if (j % (uint32_t)(1.0f / group->sample_rate))
    ++			if (j % (uint32_t)(1.0 / group->sample_rate))
     +				continue;
     +
     +			pmc = pseudo_merge_idx(writer->pseudo_merge_commits,
    @@ pseudo-merge.h
     +	 * Pseudo-merge grouping parameters. See git-config(1) for
     +	 * more information.
     +	 */
    -+	float decay;
    ++	double decay;
     +	int max_merges;
    -+	float sample_rate;
    ++	double sample_rate;
     +	int stable_size;
     +	timestamp_t threshold;
     +	timestamp_t stable_threshold;
11:  311226f65c2 = 12:  90df19e43f5 pack-bitmap-write.c: write pseudo-merge table
12:  55dd7a8023e = 13:  c653a10f8e4 pack-bitmap: extract `read_bitmap()` function
13:  3cc5434e44e = 14:  435ac048003 pseudo-merge: scaffolding for reads
14:  7664f5f9648 = 15:  fa7a948964c pack-bitmap.c: read pseudo-merge extension
15:  8ba0a9c5402 ! 16:  3a72e66cb69 pseudo-merge: implement support for reading pseudo-merge commits
    @@ pseudo-merge.c
      #include "progress.h"
     +#include "hex.h"
      
    - #define DEFAULT_PSEUDO_MERGE_DECAY 1.0f
    + #define DEFAULT_PSEUDO_MERGE_DECAY 1.0
      #define DEFAULT_PSEUDO_MERGE_MAX_MERGES 64
     @@ pseudo-merge.c: void free_pseudo_merge_map(struct pseudo_merge_map *pm)
      	}
    @@ pseudo-merge.c: void free_pseudo_merge_map(struct pseudo_merge_map *pm)
     +		return error(_("extended pseudo-merge read out-of-bounds "
     +			       "(%"PRIuMAX" >= %"PRIuMAX")"),
     +			     (uintmax_t)at, (uintmax_t)pm->map_size);
    ++	if (at + 4 >= pm->map_size)
    ++		return error(_("extended pseudo-merge entry is too short "
    ++			       "(%"PRIuMAX" >= %"PRIuMAX")"),
    ++			     (uintmax_t)(at + 4), (uintmax_t)pm->map_size);
     +
     +	ext->nr = get_be32(pm->map + at);
     +	ext->ptr = pm->map + at + sizeof(uint32_t);
16:  2c02f303b6f = 17:  42a836fda8a ewah: implement `ewah_bitmap_popcount()`
17:  82cce72bf55 = 18:  06ba1a5bbfd pack-bitmap: implement test helpers for pseudo-merge
18:  890f6c4b9de ! 19:  936f6d1b7e3 t/test-lib-functions.sh: support `--date` in `test_commit_bulk()`
    @@ Metadata
     Author: Taylor Blau <me@ttaylorr.com>
     
      ## Commit message ##
    -    t/test-lib-functions.sh: support `--date` in `test_commit_bulk()`
    +    t/test-lib-functions.sh: support `--notick` in `test_commit_bulk()`
     
         One of the tests we'll want to add for pseudo-merge bitmaps needs to be
         able to generate a large number of commits at a specific date.
     
    -    Support the `--date` option (with identical semantics to the `--date`
    -    option for `test_commit()`) within `test_commit_bulk` as a prerequisite
    -    for that.
    +    Support the `--notick` option (with identical semantics to the
    +    `--notick` option for `test_commit()`) within `test_commit_bulk` as a
    +    prerequisite for that. Callers can then set the various _DATE variables
    +    themselves.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    @@ t/test-lib-functions.sh: test_commit_bulk () {
      			filename="${1#--*=}-%s.t"
      			contents="${1#--*=} %s"
      			;;
    -+		--date)
    ++		--notick)
     +			notick=yes
    -+			GIT_COMMITTER_DATE="$2"
    -+			GIT_AUTHOR_DATE="$2"
    -+			shift
     +			;;
      		-*)
      			BUG "invalid test_commit_bulk option: $1"
19:  41691824f78 ! 20:  cad38608aae pack-bitmap.c: use pseudo-merges during traversal
    @@ t/t5333-pseudo-merge-bitmaps.sh (new)
     +		new="1672549200" && # 2023-01-01
     +		old="1641013200" && # 2022-01-01
     +
    -+		test_commit_bulk --message="new" --date "$new +0000" 128 &&
    -+		test_commit_bulk --message="old" --date "$old +0000" 128 &&
    -+		test_tick &&
    ++		GIT_COMMITTER_DATE="$new +0000" &&
    ++		export GIT_COMMITTER_DATE &&
    ++		test_commit_bulk --message="new" --notick 128 &&
    ++
    ++		GIT_COMMITTER_DATE="$old +0000" &&
    ++		export GIT_COMMITTER_DATE &&
    ++		test_commit_bulk --message="old" --notick 128 &&
     +
     +		tag_everything &&
     +
    @@ t/t5333-pseudo-merge-bitmaps.sh (new)
     +		mid="1654059600" && # 2022-06-01
     +		old="1641013200" && # 2022-01-01
     +
    -+		test_commit_bulk --message="mid" --date "$mid +0000" 128 &&
    -+		test_tick &&
    ++		GIT_COMMITTER_DATE="$mid +0000" &&
    ++		export GIT_COMMITTER_DATE &&
    ++		test_commit_bulk --message="mid" --notick 128 &&
     +
     +		git for-each-ref --format="delete %(refname)" refs/tags >in &&
     +		git update-ref --stdin <in &&
20:  a34a60c3ef8 = 21:  9240b06a7d8 pack-bitmap: extra trace2 information
21:  da2fb5b4b48 = 22:  625596a1432 ewah: `bitmap_equals_ewah()`
22:  ff21247281f ! 23:  fdd506d4544 pseudo-merge: implement support for finding existing merges
    @@ t/t5333-pseudo-merge-bitmaps.sh: test_expect_success 'pseudo-merge overlap stale
     +		stable="1641013200" && # 2022-01-01
     +		unstable="1672549200" && # 2023-01-01
     +
    -+		for date in $stable $unstable
    -+		do
    -+			test_commit_bulk --date "$date +0000" 128 &&
    -+			test_tick || return 1
    -+		done &&
    ++		GIT_COMMITTER_DATE="$stable +0000" &&
    ++		export GIT_COMMITTER_DATE &&
    ++		test_commit_bulk --notick 128 &&
    ++		GIT_COMMITTER_DATE="$unstable +0000" &&
    ++		export GIT_COMMITTER_DATE &&
    ++		test_commit_bulk --notick 128 &&
     +
     +		tag_everything &&
     +
23:  6a6d88fa512 ! 24:  cf0316ad0e9 t/perf: implement performace tests for pseudo-merge bitmaps
    @@ Metadata
     Author: Taylor Blau <me@ttaylorr.com>
     
      ## Commit message ##
    -    t/perf: implement performace tests for pseudo-merge bitmaps
    +    t/perf: implement performance tests for pseudo-merge bitmaps
     
         Implement a straightforward performance test demonstrating the benefit
         of pseudo-merge bitmaps by measuring how long it takes to count
    @@ Commit message
     
             Test                                                                this tree
             -----------------------------------------------------------------------------------
    -        5333.2: git rev-list --count --all --objects (no bitmaps)           3.46(3.37+0.09)
    -        5333.3: git rev-list --count --all --objects (no pseudo-merges)     0.13(0.11+0.01)
    +        5333.2: git rev-list --count --all --objects (no bitmaps)           3.54(3.45+0.08)
    +        5333.3: git rev-list --count --all --objects (no pseudo-merges)     0.43(0.40+0.03)
             5333.4: git rev-list --count --all --objects (with pseudo-merges)   0.12(0.11+0.01)
     
    +    On a private repository which is much larger, and has many spikey parts
    +    of history that aren't merged into the 'master' branch, the results are
    +    as follows:
    +
    +        Test                                                                this tree
    +        ---------------------------------------------------------------------------------------
    +        5333.1: git rev-list --count --all --objects (no bitmaps)           122.29(121.31+0.97)
    +        5333.2: git rev-list --count --all --objects (no pseudo-merges)     21.88(21.30+0.58)
    +        5333.3: git rev-list --count --all --objects (with pseudo-merges)   5.05(4.77+0.28)
    +
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## t/perf/p5333-pseudo-merge-bitmaps.sh (new) ##
    @@ t/perf/p5333-pseudo-merge-bitmaps.sh (new)
     +'
     +
     +test_perf 'git rev-list --count --all --objects (no pseudo-merges)' '
    -+	GIT_TEST_USE_PSEDUO_MERGES=0 \
    ++	GIT_TEST_USE_PSEUDO_MERGES=0 \
     +		git rev-list --objects --all --use-bitmap-index
     +'
     +
     +test_perf 'git rev-list --count --all --objects (with pseudo-merges)' '
    -+	GIT_TEST_USE_PSEDUO_MERGES=1 \
    ++	GIT_TEST_USE_PSEUDO_MERGES=1 \
     +		git rev-list --objects --all --use-bitmap-index
     +'
     +

base-commit: bf65967764f34adc2ca00d4c8195840ad3e4e127
-- 
2.45.1.175.gcf0316ad0e9

* [PATCH v4 01/24] Documentation/gitpacking.txt: initial commit
  2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
@ 2024-05-23 21:26   ` Taylor Blau
  2024-05-23 21:26   ` [PATCH v4 02/24] Documentation/gitpacking.txt: describe pseudo-merge bitmaps Taylor Blau
                     ` (23 subsequent siblings)
  24 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 21:26 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Introduce a new manual page, gitpacking(7) to collect useful information
about advanced packing concepts in Git.

In future commits in this series, this manual page will expand to
describe the new pseudo-merge bitmaps feature, as well as include
examples, relevant configuration bits, use-cases, and so on.

Outside of this series, this manual page may absorb similar pieces from
other parts of Git's documentation about packing.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/Makefile       |  1 +
 Documentation/gitpacking.txt | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)
 create mode 100644 Documentation/gitpacking.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 3f2383a12c7..920b6248aa4 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -51,6 +51,7 @@ MAN7_TXT += gitdiffcore.txt
 MAN7_TXT += giteveryday.txt
 MAN7_TXT += gitfaq.txt
 MAN7_TXT += gitglossary.txt
+MAN7_TXT += gitpacking.txt
 MAN7_TXT += gitnamespaces.txt
 MAN7_TXT += gitremote-helpers.txt
 MAN7_TXT += gitrevisions.txt
diff --git a/Documentation/gitpacking.txt b/Documentation/gitpacking.txt
new file mode 100644
index 00000000000..50e9900d845
--- /dev/null
+++ b/Documentation/gitpacking.txt
@@ -0,0 +1,34 @@
+gitpacking(7)
+=============
+
+NAME
+----
+gitpacking - Advanced concepts related to packing in Git
+
+SYNOPSIS
+--------
+gitpacking
+
+DESCRIPTION
+-----------
+
+This document aims to describe some advanced concepts related to packing
+in Git.
+
+Many concepts are currently scattered across the manual pages of
+various Git commands, including linkgit:git-pack-objects[1],
+linkgit:git-repack[1], and others, as well as linkgit:gitformat-pack[5],
+and parts of the `Documentation/technical` tree.
+
+There are many aspects of packing in Git that are not covered in this
+document that instead live in the aforementioned areas. Over time, those
+scattered bits may coalesce into this document.
+
+SEE ALSO
+--------
+linkgit:git-pack-objects[1]
+linkgit:git-repack[1]
+
+GIT
+---
+Part of the linkgit:git[1] suite
-- 
2.45.1.175.gcf0316ad0e9


* [PATCH v4 02/24] Documentation/gitpacking.txt: describe pseudo-merge bitmaps
  2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
  2024-05-23 21:26   ` [PATCH v4 01/24] Documentation/gitpacking.txt: initial commit Taylor Blau
@ 2024-05-23 21:26   ` Taylor Blau
  2024-05-23 21:26   ` [PATCH v4 03/24] Documentation/technical: describe pseudo-merge bitmaps format Taylor Blau
                     ` (22 subsequent siblings)
  24 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 21:26 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Add some details to the gitpacking(7) manual page which motivate and
describe pseudo-merge bitmaps.

The exact on-disk format and many of the configuration knobs will be
described in subsequent commits.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/gitpacking.txt | 72 ++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/Documentation/gitpacking.txt b/Documentation/gitpacking.txt
index 50e9900d845..f24396f0173 100644
--- a/Documentation/gitpacking.txt
+++ b/Documentation/gitpacking.txt
@@ -24,6 +24,78 @@ There are many aspects of packing in Git that are not covered in this
 document that instead live in the aforementioned areas. Over time, those
 scattered bits may coalesce into this document.
 
+== Pseudo-merge bitmaps
+
+NOTE: Pseudo-merge bitmaps are considered an experimental feature, so
+the configuration and many of the ideas are subject to change.
+
+=== Background
+
+Reachability bitmaps are most efficient when we have on-disk stored
+bitmaps for one or more of the starting points of a traversal. For this
+reason, Git prefers storing bitmaps for commits at the tips of refs,
+because traversals tend to start with those points.
+
+But if you have a large number of refs, it's not feasible to store a
+bitmap for _every_ ref tip. It takes up space, and just OR-ing all of
+those bitmaps together is expensive.
+
+One way we can deal with that is to create bitmaps that represent
+_groups_ of refs. When a traversal asks about the entire group, then we
+can use this single bitmap instead of considering each ref individually.
+Because these bitmaps represent the set of objects which would be
+reachable in a hypothetical merge of all of the commits, we call them
+pseudo-merge bitmaps.
+
+=== Overview
+
+A "pseudo-merge bitmap" is used to refer to a pair of bitmaps, as
+follows:
+
+Commit bitmap::
+
+  A bitmap whose set bits describe the set of commits included in the
+  pseudo-merge's "merge" bitmap (as below).
+
+Merge bitmap::
+
+  A bitmap whose set bits describe the reachability closure over the set
+  of commits in the pseudo-merge's "commits" bitmap (as above). An
+  identical bitmap would be generated for an octopus merge with the same
+  set of parents as described in the commits bitmap.
+
+Pseudo-merge bitmaps can accelerate bitmap traversals when all commits
+for a given pseudo-merge are listed on either side of the traversal,
+either directly (by explicitly asking for them as part of the `HAVES`
+or `WANTS`) or indirectly (by encountering them during a fill-in
+traversal).
+
+=== Use-cases
+
+For example, suppose there exists a pseudo-merge bitmap with a large
+number of commits, all of which are listed in the `WANTS` section of
+some bitmap traversal query. When pseudo-merge bitmaps are enabled, the
+bitmap machinery can quickly determine there is a pseudo-merge which
+satisfies some subset of the wanted objects on either side of the query.
+Then, we can inflate the EWAH-compressed bitmap, and `OR` it in to the
+resulting bitmap. By contrast, without pseudo-merge bitmaps, we would
+have to repeat the decompression and `OR`-ing step over a potentially
+large number of individual bitmaps, which can take proportionally more
+time.
+
+Another benefit of pseudo-merges arises when there is some combination
+of (a) a large number of references, with (b) poor bitmap coverage, and
+(c) deep, nested trees, making fill-in traversal relatively expensive.
+For example, suppose that there are a large enough number of tags where
+bitmapping each of the tags individually is infeasible. Without
+pseudo-merge bitmaps, computing the result of, say, `git rev-list
+--use-bitmap-index --count --objects --tags` would likely require a
+large amount of fill-in traversal. But when a large quantity of those
+tags are stored together in a pseudo-merge bitmap, the bitmap machinery
+can take advantage of the fact that we only care about the union of
+objects reachable from all of those tags, and answer the query much
+faster.
+
 SEE ALSO
 --------
 linkgit:git-pack-objects[1]
-- 
2.45.1.175.gcf0316ad0e9



* [PATCH v4 03/24] Documentation/technical: describe pseudo-merge bitmaps format
  2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
  2024-05-23 21:26   ` [PATCH v4 01/24] Documentation/gitpacking.txt: initial commit Taylor Blau
  2024-05-23 21:26   ` [PATCH v4 02/24] Documentation/gitpacking.txt: describe pseudo-merge bitmaps Taylor Blau
@ 2024-05-23 21:26   ` Taylor Blau
  2024-05-23 21:26   ` [PATCH v4 04/24] ewah: implement `ewah_bitmap_is_subset()` Taylor Blau
                     ` (21 subsequent siblings)
  24 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 21:26 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Prepare to implement pseudo-merge bitmaps over the next several commits
by first describing the serialization format which will store the new
pseudo-merge bitmaps themselves.

This format is implemented as an optional extension within the bitmap v1
format, making it compatible with previous versions of Git, as well as
the original .bitmap implementation within JGit.

The format is described in detail in the patch contents below, but the
high-level description is as follows:

  - An array of pseudo-merge bitmaps, each containing a pair of EWAH
    bitmaps: one describing the set of pseudo-merge "parents", and
    another describing the set of object(s) reachable from those
    parents.

  - A lookup table to determine which pseudo-merge(s) a given commit
    appears in. An optional extended lookup table follows when there is
    at least one commit which appears in multiple pseudo-merge groups.

  - Trailing metadata, including the number of pseudo-merge(s), number
    of unique parents, the offset within the .bitmap file for the
    pseudo-merge commit lookup table, and the size of the optional
    extension itself.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
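For readers who want to see the 12-byte lookup table entry in code, here
is a minimal decoding sketch. It is not taken from this series; the
`toy_*` names are hypothetical, and it assumes that when the
most-significant bit of `offset` is set, the remaining 63 bits carry the
offset of the extended lookup table.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_EXT_FLAG ((uint64_t)1 << 63)

struct toy_lookup_entry {
	uint32_t commit_pos; /* bit position of the commit */
	uint64_t offset;     /* file offset, with the MSB cleared */
	int extended;        /* 1 if `offset` points at the extended table */
};

/* Read a 4-byte unsigned value in network byte order. */
static uint32_t toy_get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read an 8-byte unsigned value in network byte order. */
static uint64_t toy_get_be64(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Decode one entry: 4 bytes of commit_pos followed by 8 bytes of offset. */
static struct toy_lookup_entry toy_parse_lookup_entry(const unsigned char *buf)
{
	struct toy_lookup_entry e;
	uint64_t raw;

	assert(buf);

	e.commit_pos = toy_get_be32(buf);
	raw = toy_get_be64(buf + 4);
	e.extended = (raw & TOY_EXT_FLAG) != 0;
	e.offset = raw & ~TOY_EXT_FLAG;
	return e;
}
```

Note the parentheses around the mask test: in C, `==` and `!=` bind more
tightly than `&`.
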
 Documentation/technical/bitmap-format.txt | 132 ++++++++++++++++++++++
 1 file changed, 132 insertions(+)

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index f5d200939b0..ee7775a2586 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -255,3 +255,135 @@ triplet is -
 	xor_row (4 byte integer, network byte order): ::
 	The position of the triplet whose bitmap is used to compress
 	this one, or `0xffffffff` if no such bitmap exists.
+
+Pseudo-merge bitmaps
+--------------------
+
+If the `BITMAP_OPT_PSEUDO_MERGES` flag is set, a variable number of
+bytes (preceding the name-hash cache, commit lookup table, and trailing
+checksum) of the `.bitmap` file is used to store pseudo-merge bitmaps.
+
+For more information on what pseudo-merges are, why they are useful, and
+how to configure them, see the information in linkgit:gitpacking[7].
+
+=== File format
+
+If enabled, pseudo-merge bitmaps are stored in an optional section at
+the end of a `.bitmap` file. The format is as follows:
+
+....
++-------------------------------------------+
+|               .bitmap File                |
++-------------------------------------------+
+|                                           |
+|  Pseudo-merge bitmaps (Variable Length)   |
+|  +---------------------------+            |
+|  | commits_bitmap (EWAH)     |            |
+|  +---------------------------+            |
+|  | merge_bitmap (EWAH)       |            |
+|  +---------------------------+            |
+|                                           |
++-------------------------------------------+
+|                                           |
+|  Lookup Table                             |
+|  +---------------------------+            |
+|  | commit_pos (4 bytes)      |            |
+|  +---------------------------+            |
+|  | offset (8 bytes)          |            |
+|  +------------+--------------+            |
+|                                           |
+|  Offset Cases:                            |
+|  -------------                            |
+|                                           |
+|  1. MSB Unset: single pseudo-merge bitmap |
+|     + offset to pseudo-merge bitmap       |
+|                                           |
+|  2. MSB Set: multiple pseudo-merges       |
+|     + offset to extended lookup table     |
+|                                           |
++-------------------------------------------+
+|                                           |
+|  Extended Lookup Table (Optional)         |
+|  +----+----------+----------+----------+  |
+|  | N  | Offset 1 |   ....   | Offset N |  |
+|  +----+----------+----------+----------+  |
+|  |    |  8 bytes |   ....   |  8 bytes |  |
+|  +----+----------+----------+----------+  |
+|                                           |
++-------------------------------------------+
+|                                           |
+|  Pseudo-merge Metadata                    |
+|  +-----------------------------------+    |
+|  | # pseudo-merges (4 bytes)         |    |
+|  +-----------------------------------+    |
+|  | # commits (4 bytes)               |    |
+|  +-----------------------------------+    |
+|  | Lookup offset (8 bytes)           |    |
+|  +-----------------------------------+    |
+|  | Extension size (8 bytes)          |    |
+|  +-----------------------------------+    |
+|                                           |
++-------------------------------------------+
+....
+
+* One or more pseudo-merge bitmaps, each containing:
+
+  ** `commits_bitmap`, an EWAH-compressed bitmap describing the set of
+     commits included in this pseudo-merge.
+
+  ** `merge_bitmap`, an EWAH-compressed bitmap describing the union of
+     the set of objects reachable from all commits listed in the
+     `commits_bitmap`.
+
+* A lookup table, mapping pseudo-merged commits to the pseudo-merges
+  they belong to. Entries appear in increasing order of each commit's
+  bit position. Each entry is 12 bytes wide, and is comprised of the
+  following:
+
+  ** `commit_pos`, a 4-byte unsigned value (in network byte-order)
+     containing the bit position for this commit.
+
+  ** `offset`, an 8-byte unsigned value (also in network byte-order)
+  containing either one of two possible offsets, depending on whether or
+  not the most-significant bit is set.
+
+    *** If unset (i.e. `(offset & ((uint64_t)1<<63)) == 0`), the offset
+	(relative to the beginning of the `.bitmap` file) at which the
+	pseudo-merge bitmap for this commit can be read. This indicates
+	only a single pseudo-merge bitmap contains this commit.
+
+    *** If set (i.e. `(offset & ((uint64_t)1<<63)) != 0`), the offset
+	(again relative to the beginning of the `.bitmap` file) at which
+	the extended offset table can be located describing the set of
+	pseudo-merge bitmaps which contain this commit. This indicates
+	that multiple pseudo-merge bitmaps contain this commit.
+
+* An (optional) extended lookup table (written if and only if there is
+  at least one commit which appears in more than one pseudo-merge).
+  There are as many entries as commits which appear in multiple
+  pseudo-merges. Each entry contains the following:
+
+  ** `N`, a 4-byte unsigned value equal to the number of pseudo-merges
+     which contain a given commit.
+
+  ** An array of `N` 8-byte unsigned values, each of which is
+     interpreted as an offset (relative to the beginning of the
+     `.bitmap` file) at which a pseudo-merge bitmap for this commit can
+     be read. These values occur in no particular order.
+
+* Positions for all pseudo-merges, each stored as an 8-byte unsigned
+  value (in network byte-order) containing the offset (relative to the
+  beginning of the `.bitmap` file) of each consecutive pseudo-merge.
+
+* A 4-byte unsigned value (in network byte-order) equal to the number of
+  pseudo-merges.
+
+* A 4-byte unsigned value (in network byte-order) equal to the number of
+  unique commits which appear in any pseudo-merge.
+
+* An 8-byte unsigned value (in network byte-order) equal to the number
+  of bytes between the start of the pseudo-merge section and the
+  beginning of the lookup table.
+
+* An 8-byte unsigned value (in network byte-order) equal to the number
+  of bytes in the pseudo-merge section (including this field).
-- 
2.45.1.175.gcf0316ad0e9



* [PATCH v4 04/24] ewah: implement `ewah_bitmap_is_subset()`
  2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
                     ` (2 preceding siblings ...)
  2024-05-23 21:26   ` [PATCH v4 03/24] Documentation/technical: describe pseudo-merge bitmaps format Taylor Blau
@ 2024-05-23 21:26   ` Taylor Blau
  2024-05-23 21:26   ` [PATCH v4 05/24] pack-bitmap: move some initialization to `bitmap_writer_init()` Taylor Blau
                     ` (20 subsequent siblings)
  24 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 21:26 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

In order to know whether a given pseudo-merge (comprised of a "parents"
and an "objects" bitmap) is "satisfied" and can be OR'd into the bitmap
result, we need to be able to quickly determine whether the "parents"
bitmap is a subset of the current set of objects reachable on either
side of a traversal.

Implement a helper function to prepare for that, which determines
whether an EWAH bitmap (the parents bitmap from the pseudo-merge) is a
subset of a non-EWAH bitmap (in this case, the results bitmap from
either side of the traversal).

This function makes use of the EWAH iterator to avoid inflating any part
of the EWAH bitmap after we determine it is not a subset of the non-EWAH
bitmap. This "fail-fast" allows us to avoid a potentially large amount
of wasted effort.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
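The subset rule itself, stripped of the EWAH iterator, can be sketched
on plain word arrays as follows. This is only an illustration under the
assumption that words past the end of `other` count as all-zero;
`toy_is_subset()` is a hypothetical name, and unlike the real helper it
has no compressed input to avoid inflating.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if every bit set in `self` is also set in `other`, and 0
 * otherwise. The arrays may have different lengths; words missing from
 * `other` are treated as zero.
 */
static int toy_is_subset(const uint64_t *self, size_t self_nr,
			 const uint64_t *other, size_t other_nr)
{
	size_t i;

	assert(self || !self_nr);

	for (i = 0; i < self_nr; i++) {
		uint64_t other_word = i < other_nr ? other[i] : 0;

		/* Fail fast on the first bit in `self` missing from `other`. */
		if (self[i] & ~other_word)
			return 0;
	}
	return 1;
}
```

The early return is what the commit message calls "fail-fast": no word
after the first mismatch is examined.
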
 ewah/bitmap.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 ewah/ewok.h   |  6 ++++++
 2 files changed, 49 insertions(+)

diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index ac7e0af622a..d352fec54ce 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -138,6 +138,49 @@ void bitmap_or(struct bitmap *self, const struct bitmap *other)
 		self->words[i] |= other->words[i];
 }
 
+int ewah_bitmap_is_subset(struct ewah_bitmap *self, struct bitmap *other)
+{
+	struct ewah_iterator it;
+	eword_t word;
+	size_t i;
+
+	ewah_iterator_init(&it, self);
+
+	for (i = 0; i < other->word_alloc; i++) {
+		if (!ewah_iterator_next(&word, &it)) {
+			/*
+			 * If we reached the end of `self`, and haven't
+			 * rejected `self` as a possible subset of
+			 * `other` yet, then we are done and `self` is
+			 * indeed a subset of `other`.
+			 */
+			return 1;
+		}
+		if (word & ~other->words[i]) {
+			/*
+			 * Otherwise, compare the next two pairs of
+			 * words. If the word from `self` has bit(s) not
+			 * in the word from `other`, `self` is not a
+			 * subset of `other`.
+			 */
+			return 0;
+		}
+	}
+
+	/*
+	 * If we got to this point, there may be zero or more words
+	 * remaining in `self`, with no remaining words left in `other`.
+	 * If there are any bits set in the remaining word(s) in `self`,
+	 * then `self` is not a subset of `other`.
+	 */
+	while (ewah_iterator_next(&word, &it))
+		if (word)
+			return 0;
+
+	/* `self` is definitely a subset of `other` */
+	return 1;
+}
+
 void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other)
 {
 	size_t original_size = self->word_alloc;
diff --git a/ewah/ewok.h b/ewah/ewok.h
index c11d76c6f33..2b6c4ac499c 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -179,7 +179,13 @@ void bitmap_unset(struct bitmap *self, size_t pos);
 int bitmap_get(struct bitmap *self, size_t pos);
 void bitmap_free(struct bitmap *self);
 int bitmap_equals(struct bitmap *self, struct bitmap *other);
+
+/*
+ * Both `bitmap_is_subset()` and `ewah_bitmap_is_subset()` return 1 if the set
+ * of bits in 'self' are a subset of the bits in 'other'. Returns 0 otherwise.
+ */
 int bitmap_is_subset(struct bitmap *self, struct bitmap *other);
+int ewah_bitmap_is_subset(struct ewah_bitmap *self, struct bitmap *other);
 
 struct ewah_bitmap * bitmap_to_ewah(struct bitmap *bitmap);
 struct bitmap *ewah_to_bitmap(struct ewah_bitmap *ewah);
-- 
2.45.1.175.gcf0316ad0e9



* [PATCH v4 05/24] pack-bitmap: move some initialization to `bitmap_writer_init()`
  2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
                     ` (3 preceding siblings ...)
  2024-05-23 21:26   ` [PATCH v4 04/24] ewah: implement `ewah_bitmap_is_subset()` Taylor Blau
@ 2024-05-23 21:26   ` Taylor Blau
  2024-05-23 21:26   ` [PATCH v4 06/24] pseudo-merge.ch: initial commit Taylor Blau
                     ` (19 subsequent siblings)
  24 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 21:26 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

The pack-bitmap-writer machinery uses an oidmap (backed by khash.h) to
map from commits selected for bitmaps (by OID) to a bitmapped_commit
structure (containing the bitmap itself, among other things like its XOR
offset, etc.)

This map was initialized at the end of `bitmap_writer_build()`. New
entries are added in `pack-bitmap-write.c::store_selected()`, which is
called by the bitmap_builder machinery (which is responsible for
traversing history and generating the actual bitmaps).

Reorganize when this field is initialized and when entries are added to
it so that we can quickly determine whether a commit is a candidate for
pseudo-merge selection, or not (since it was already selected to receive
a bitmap, and thus storing it in a pseudo-merge would be redundant).

The changes are as follows:

  - Introduce a new `bitmap_writer_init()` function which initializes
    the `writer.bitmaps` field (instead of waiting until the end of
    `bitmap_writer_build()`).

  - Add map entries in `push_bitmapped_commit()` (which is called via
    `bitmap_writer_select_commits()`) with OID keys and NULL values to
    track whether or not we *expect* to write a bitmap for some given
    commit.

  - Validate that a NULL entry is found matching the given key when we
    store a selected bitmap.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
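The "reserve with NULL, fill in later" pattern described above can be
sketched with a toy map. This is not the khash-based code in the patch:
`toy_map` and its helpers are hypothetical, and keys are small integers
standing in for object IDs.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX 8

/* A key is "selected" once reserved, even while its value is still NULL. */
struct toy_map {
	unsigned char present[TOY_MAX];
	const void *value[TOY_MAX];
};

/* Reserve `key` with a NULL value. Returns -1 if it was already present. */
static int toy_reserve(struct toy_map *m, unsigned key)
{
	assert(key < TOY_MAX);
	if (m->present[key])
		return -1; /* duplicate entry */
	m->present[key] = 1;
	m->value[key] = NULL;
	return 0;
}

/* Fill in a reserved key. Returns -1 if the key was never reserved. */
static int toy_store(struct toy_map *m, unsigned key, const void *v)
{
	assert(key < TOY_MAX);
	if (!m->present[key])
		return -1; /* attempted to store a non-selected key */
	m->value[key] = v;
	return 0;
}

/* Membership can be answered before any value has been stored. */
static int toy_is_selected(const struct toy_map *m, unsigned key)
{
	assert(key < TOY_MAX);
	return m->present[key];
}
```

Separating membership from the stored value is what lets later code ask
"was this commit already selected?" before any bitmap has been built.
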
 builtin/pack-objects.c |  3 ++-
 midx-write.c           |  2 +-
 pack-bitmap-write.c    | 24 ++++++++++++++++++------
 pack-bitmap.h          |  2 +-
 4 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 26a6d0d7919..6209264e60c 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1340,7 +1340,8 @@ static void write_pack_file(void)
 				    hash_to_hex(hash));
 
 			if (write_bitmap_index) {
-				bitmap_writer_init(&bitmap_writer);
+				bitmap_writer_init(&bitmap_writer,
+						   the_repository);
 				bitmap_writer_set_checksum(&bitmap_writer, hash);
 				bitmap_writer_build_type_index(&bitmap_writer,
 					&to_pack, written_list, nr_written);
diff --git a/midx-write.c b/midx-write.c
index 7c0c08c64b2..c747d1a6af3 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -820,7 +820,7 @@ static int write_midx_bitmap(const char *midx_name,
 	for (i = 0; i < pdata->nr_objects; i++)
 		index[i] = &pdata->objects[i].idx;
 
-	bitmap_writer_init(&writer);
+	bitmap_writer_init(&writer, the_repository);
 	bitmap_writer_show_progress(&writer, flags & MIDX_PROGRESS);
 	bitmap_writer_build_type_index(&writer, pdata, index,
 				       pdata->nr_objects);
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 6cae670412c..d8870155831 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -27,9 +27,12 @@ struct bitmapped_commit {
 	uint32_t commit_pos;
 };
 
-void bitmap_writer_init(struct bitmap_writer *writer)
+void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r)
 {
 	memset(writer, 0, sizeof(struct bitmap_writer));
+	if (writer->bitmaps)
+		BUG("bitmap writer already initialized");
+	writer->bitmaps = kh_init_oid_map();
 }
 
 void bitmap_writer_free(struct bitmap_writer *writer)
@@ -128,11 +131,21 @@ void bitmap_writer_build_type_index(struct bitmap_writer *writer,
 static inline void push_bitmapped_commit(struct bitmap_writer *writer,
 					 struct commit *commit)
 {
+	int hash_ret;
+	khiter_t hash_pos;
+
 	if (writer->selected_nr >= writer->selected_alloc) {
 		writer->selected_alloc = (writer->selected_alloc + 32) * 2;
 		REALLOC_ARRAY(writer->selected, writer->selected_alloc);
 	}
 
+	hash_pos = kh_put_oid_map(writer->bitmaps, commit->object.oid,
+				  &hash_ret);
+	if (!hash_ret)
+		die(_("duplicate entry when writing bitmap index: %s"),
+		    oid_to_hex(&commit->object.oid));
+	kh_value(writer->bitmaps, hash_pos) = NULL;
+
 	writer->selected[writer->selected_nr].commit = commit;
 	writer->selected[writer->selected_nr].bitmap = NULL;
 	writer->selected[writer->selected_nr].write_as = NULL;
@@ -483,14 +496,14 @@ static void store_selected(struct bitmap_writer *writer,
 {
 	struct bitmapped_commit *stored = &writer->selected[ent->idx];
 	khiter_t hash_pos;
-	int hash_ret;
 
 	stored->bitmap = bitmap_to_ewah(ent->bitmap);
 
-	hash_pos = kh_put_oid_map(writer->bitmaps, commit->object.oid, &hash_ret);
-	if (hash_ret == 0)
-		die("Duplicate entry when writing index: %s",
+	hash_pos = kh_get_oid_map(writer->bitmaps, commit->object.oid);
+	if (hash_pos == kh_end(writer->bitmaps))
+		die(_("attempted to store non-selected commit: '%s'"),
 		    oid_to_hex(&commit->object.oid));
+
 	kh_value(writer->bitmaps, hash_pos) = stored;
 }
 
@@ -506,7 +519,6 @@ int bitmap_writer_build(struct bitmap_writer *writer,
 	uint32_t *mapping;
 	int closed = 1; /* until proven otherwise */
 
-	writer->bitmaps = kh_init_oid_map();
 	writer->to_pack = to_pack;
 
 	if (writer->show_progress)
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 3091095f336..f87e60153dd 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -114,7 +114,7 @@ struct bitmap_writer {
 	unsigned char pack_checksum[GIT_MAX_RAWSZ];
 };
 
-void bitmap_writer_init(struct bitmap_writer *writer);
+void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r);
 void bitmap_writer_show_progress(struct bitmap_writer *writer, int show);
 void bitmap_writer_set_checksum(struct bitmap_writer *writer,
 				const unsigned char *sha1);
-- 
2.45.1.175.gcf0316ad0e9



* [PATCH v4 06/24] pseudo-merge.ch: initial commit
  2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
                     ` (4 preceding siblings ...)
  2024-05-23 21:26   ` [PATCH v4 05/24] pack-bitmap: move some initialization to `bitmap_writer_init()` Taylor Blau
@ 2024-05-23 21:26   ` Taylor Blau
  2024-05-23 21:26   ` [PATCH v4 07/24] pack-bitmap-write: support storing pseudo-merge commits Taylor Blau
                     ` (18 subsequent siblings)
  24 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 21:26 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Add a new (empty) header file to contain the implementation for
selecting, reading, and applying pseudo-merge bitmaps.

For now this header and its corresponding implementation are left
empty, but they will evolve over the course of subsequent commit(s).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Makefile       | 1 +
 pseudo-merge.c | 2 ++
 pseudo-merge.h | 6 ++++++
 3 files changed, 9 insertions(+)
 create mode 100644 pseudo-merge.c
 create mode 100644 pseudo-merge.h

diff --git a/Makefile b/Makefile
index 0285db56306..4705a69f57f 100644
--- a/Makefile
+++ b/Makefile
@@ -1105,6 +1105,7 @@ LIB_OBJS += prompt.o
 LIB_OBJS += protocol.o
 LIB_OBJS += protocol-caps.o
 LIB_OBJS += prune-packed.o
+LIB_OBJS += pseudo-merge.o
 LIB_OBJS += quote.o
 LIB_OBJS += range-diff.o
 LIB_OBJS += reachable.o
diff --git a/pseudo-merge.c b/pseudo-merge.c
new file mode 100644
index 00000000000..37e037ba272
--- /dev/null
+++ b/pseudo-merge.c
@@ -0,0 +1,2 @@
+#include "git-compat-util.h"
+#include "pseudo-merge.h"
diff --git a/pseudo-merge.h b/pseudo-merge.h
new file mode 100644
index 00000000000..cab8ff6960a
--- /dev/null
+++ b/pseudo-merge.h
@@ -0,0 +1,6 @@
+#ifndef PSEUDO_MERGE_H
+#define PSEUDO_MERGE_H
+
+#include "git-compat-util.h"
+
+#endif
-- 
2.45.1.175.gcf0316ad0e9



* [PATCH v4 07/24] pack-bitmap-write: support storing pseudo-merge commits
  2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
                     ` (5 preceding siblings ...)
  2024-05-23 21:26   ` [PATCH v4 06/24] pseudo-merge.ch: initial commit Taylor Blau
@ 2024-05-23 21:26   ` Taylor Blau
  2024-05-23 21:26   ` [PATCH v4 08/24] pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()` Taylor Blau
                     ` (17 subsequent siblings)
  24 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 21:26 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Prepare to write pseudo-merge bitmaps by annotating individual bitmapped
commits (which are represented by the `bitmapped_commit` structure) with
an extra bit indicating whether or not they are a pseudo-merge.

In subsequent commits, pseudo-merge bitmaps will be generated by
allocating a fake commit node with parents covering the full set of
commits represented by the pseudo-merge bitmap. These commits will be
added to the set of "selected" commits as usual, but will be written
specially instead of being included with the rest of the selected
commits.

Mechanically speaking, there are two parts of this change:

  - The bitmapped_commit struct gets a new bit indicating whether it is
    a pseudo-merge, or an ordinary commit selected for bitmaps.

  - A handful of changes to only write out the non-pseudo-merge commits
    when enumerating through the selected array (see the new
    `bitmap_writer_nr_selected_commits()` function). Pseudo-merge
    commits appear after all non-pseudo-merge commits, so it is safe to
    enumerate through the selected array like so:

        for (i = 0; i < bitmap_writer_nr_selected_commits(writer); i++)
          if (writer->selected[i].pseudo_merge)
            BUG("unexpected pseudo-merge");

    without encountering the BUG().

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
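The layout invariant this patch relies on (ordinary commits first,
pseudo-merges at the tail of the selected array) can be sketched as
below. The `toy_*` types and helpers are hypothetical simplifications,
not the structures from pack-bitmap-write.c.

```c
#include <assert.h>
#include <stddef.h>

struct toy_entry {
	unsigned pseudo_merge : 1;
};

struct toy_writer {
	struct toy_entry *selected;
	size_t selected_nr;      /* ordinary commits plus pseudo-merges */
	size_t pseudo_merges_nr; /* entries at the tail of `selected` */
};

/* Number of ordinary (non-pseudo-merge) selected commits. */
static size_t toy_nr_selected_commits(const struct toy_writer *w)
{
	assert(w->pseudo_merges_nr <= w->selected_nr);
	return w->selected_nr - w->pseudo_merges_nr;
}

/* Return 1 if all ordinary commits precede all pseudo-merges. */
static int toy_layout_is_valid(const struct toy_writer *w)
{
	size_t i, nr = toy_nr_selected_commits(w);

	for (i = 0; i < w->selected_nr; i++)
		if (w->selected[i].pseudo_merge != (i >= nr))
			return 0;
	return 1;
}
```

With this layout, any loop bounded by the count of ordinary commits
never visits a pseudo-merge entry.
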
 object.h            |  2 +-
 pack-bitmap-write.c | 96 +++++++++++++++++++++++++++++----------------
 pack-bitmap.h       |  3 ++
 3 files changed, 67 insertions(+), 34 deletions(-)

diff --git a/object.h b/object.h
index 99b9c8f114c..e6f9e89d3c5 100644
--- a/object.h
+++ b/object.h
@@ -81,7 +81,7 @@ void object_array_init(struct object_array *array);
  * reflog.c:                           10--12
  * builtin/show-branch.c:    0-------------------------------------------26
  * builtin/unpack-objects.c:                                 2021
- * pack-bitmap.h:                                                22
+ * pack-bitmap.h:                                              2122
  */
 #define FLAG_BITS  28
 
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index d8870155831..60eb1e71c98 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -25,8 +25,14 @@ struct bitmapped_commit {
 	int flags;
 	int xor_offset;
 	uint32_t commit_pos;
+	unsigned pseudo_merge : 1;
 };
 
+static inline int bitmap_writer_nr_selected_commits(struct bitmap_writer *writer)
+{
+	return writer->selected_nr - writer->pseudo_merges_nr;
+}
+
 void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r)
 {
 	memset(writer, 0, sizeof(struct bitmap_writer));
@@ -129,27 +135,31 @@ void bitmap_writer_build_type_index(struct bitmap_writer *writer,
  */
 
 static inline void push_bitmapped_commit(struct bitmap_writer *writer,
-					 struct commit *commit)
+					 struct commit *commit,
+					 unsigned pseudo_merge)
 {
-	int hash_ret;
-	khiter_t hash_pos;
-
 	if (writer->selected_nr >= writer->selected_alloc) {
 		writer->selected_alloc = (writer->selected_alloc + 32) * 2;
 		REALLOC_ARRAY(writer->selected, writer->selected_alloc);
 	}
 
-	hash_pos = kh_put_oid_map(writer->bitmaps, commit->object.oid,
-				  &hash_ret);
-	if (!hash_ret)
-		die(_("duplicate entry when writing bitmap index: %s"),
-		    oid_to_hex(&commit->object.oid));
-	kh_value(writer->bitmaps, hash_pos) = NULL;
+	if (!pseudo_merge) {
+		int hash_ret;
+		khiter_t hash_pos = kh_put_oid_map(writer->bitmaps,
+						   commit->object.oid,
+						   &hash_ret);
+
+		if (!hash_ret)
+			die(_("duplicate entry when writing bitmap index: %s"),
+			    oid_to_hex(&commit->object.oid));
+		kh_value(writer->bitmaps, hash_pos) = NULL;
+	}
 
 	writer->selected[writer->selected_nr].commit = commit;
 	writer->selected[writer->selected_nr].bitmap = NULL;
 	writer->selected[writer->selected_nr].write_as = NULL;
 	writer->selected[writer->selected_nr].flags = 0;
+	writer->selected[writer->selected_nr].pseudo_merge = pseudo_merge;
 
 	writer->selected_nr++;
 }
@@ -180,16 +190,20 @@ static void compute_xor_offsets(struct bitmap_writer *writer)
 
 	while (next < writer->selected_nr) {
 		struct bitmapped_commit *stored = &writer->selected[next];
-
 		int best_offset = 0;
 		struct ewah_bitmap *best_bitmap = stored->bitmap;
 		struct ewah_bitmap *test_xor;
 
+		if (stored->pseudo_merge)
+			goto next;
+
 		for (i = 1; i <= MAX_XOR_OFFSET_SEARCH; ++i) {
 			int curr = next - i;
 
 			if (curr < 0)
 				break;
+			if (writer->selected[curr].pseudo_merge)
+				continue;
 
 			test_xor = ewah_pool_new();
 			ewah_xor(writer->selected[curr].bitmap, stored->bitmap, test_xor);
@@ -205,6 +219,7 @@ static void compute_xor_offsets(struct bitmap_writer *writer)
 			}
 		}
 
+next:
 		stored->xor_offset = best_offset;
 		stored->write_as = best_bitmap;
 
@@ -217,7 +232,8 @@ struct bb_commit {
 	struct bitmap *commit_mask;
 	struct bitmap *bitmap;
 	unsigned selected:1,
-		 maximal:1;
+		 maximal:1,
+		 pseudo_merge:1;
 	unsigned idx; /* within selected array */
 };
 
@@ -255,17 +271,18 @@ static void bitmap_builder_init(struct bitmap_builder *bb,
 	revs.first_parent_only = 1;
 
 	for (i = 0; i < writer->selected_nr; i++) {
-		struct commit *c = writer->selected[i].commit;
-		struct bb_commit *ent = bb_data_at(&bb->data, c);
+		struct bitmapped_commit *bc = &writer->selected[i];
+		struct bb_commit *ent = bb_data_at(&bb->data, bc->commit);
 
 		ent->selected = 1;
 		ent->maximal = 1;
+		ent->pseudo_merge = bc->pseudo_merge;
 		ent->idx = i;
 
 		ent->commit_mask = bitmap_new();
 		bitmap_set(ent->commit_mask, i);
 
-		add_pending_object(&revs, &c->object, "");
+		add_pending_object(&revs, &bc->commit->object, "");
 	}
 
 	if (prepare_revision_walk(&revs))
@@ -444,8 +461,13 @@ static int fill_bitmap_commit(struct bitmap_writer *writer,
 		struct commit *c = prio_queue_get(queue);
 
 		if (old_bitmap && mapping) {
-			struct ewah_bitmap *old = bitmap_for_commit(old_bitmap, c);
+			struct ewah_bitmap *old;
 			struct bitmap *remapped = bitmap_new();
+
+			if (commit->object.flags & BITMAP_PSEUDO_MERGE)
+				old = NULL;
+			else
+				old = bitmap_for_commit(old_bitmap, c);
 			/*
 			 * If this commit has an old bitmap, then translate that
 			 * bitmap and add its bits to this one. No need to walk
@@ -464,12 +486,14 @@ static int fill_bitmap_commit(struct bitmap_writer *writer,
 		 * Mark ourselves and queue our tree. The commit
 		 * walk ensures we cover all parents.
 		 */
-		pos = find_object_pos(writer, &c->object.oid, &found);
-		if (!found)
-			return -1;
-		bitmap_set(ent->bitmap, pos);
-		prio_queue_put(tree_queue,
-			       repo_get_commit_tree(the_repository, c));
+		if (!(c->object.flags & BITMAP_PSEUDO_MERGE)) {
+			pos = find_object_pos(writer, &c->object.oid, &found);
+			if (!found)
+				return -1;
+			bitmap_set(ent->bitmap, pos);
+			prio_queue_put(tree_queue,
+				       repo_get_commit_tree(the_repository, c));
+		}
 
 		for (p = c->parents; p; p = p->next) {
 			pos = find_object_pos(writer, &p->item->object.oid,
@@ -499,6 +523,9 @@ static void store_selected(struct bitmap_writer *writer,
 
 	stored->bitmap = bitmap_to_ewah(ent->bitmap);
 
+	if (ent->pseudo_merge)
+		return;
+
 	hash_pos = kh_get_oid_map(writer->bitmaps, commit->object.oid);
 	if (hash_pos == kh_end(writer->bitmaps))
 		die(_("attempted to store non-selected commit: '%s'"),
@@ -631,7 +658,7 @@ void bitmap_writer_select_commits(struct bitmap_writer *writer,
 
 	if (indexed_commits_nr < 100) {
 		for (i = 0; i < indexed_commits_nr; ++i)
-			push_bitmapped_commit(writer, indexed_commits[i]);
+			push_bitmapped_commit(writer, indexed_commits[i], 0);
 		return;
 	}
 
@@ -664,7 +691,7 @@ void bitmap_writer_select_commits(struct bitmap_writer *writer,
 			}
 		}
 
-		push_bitmapped_commit(writer, chosen);
+		push_bitmapped_commit(writer, chosen, 0);
 
 		i += next + 1;
 		display_progress(writer->progress, i);
@@ -701,8 +728,11 @@ static void write_selected_commits_v1(struct bitmap_writer *writer,
 {
 	int i;
 
-	for (i = 0; i < writer->selected_nr; ++i) {
+	for (i = 0; i < bitmap_writer_nr_selected_commits(writer); ++i) {
 		struct bitmapped_commit *stored = &writer->selected[i];
+		if (stored->pseudo_merge)
+			BUG("unexpected pseudo-merge among selected: %s",
+			    oid_to_hex(&stored->commit->object.oid));
 
 		if (offsets)
 			offsets[i] = hashfile_total(f);
@@ -735,10 +765,10 @@ static void write_lookup_table(struct bitmap_writer *writer, struct hashfile *f,
 	uint32_t i;
 	uint32_t *table, *table_inv;
 
-	ALLOC_ARRAY(table, writer->selected_nr);
-	ALLOC_ARRAY(table_inv, writer->selected_nr);
+	ALLOC_ARRAY(table, bitmap_writer_nr_selected_commits(writer));
+	ALLOC_ARRAY(table_inv, bitmap_writer_nr_selected_commits(writer));
 
-	for (i = 0; i < writer->selected_nr; i++)
+	for (i = 0; i < bitmap_writer_nr_selected_commits(writer); i++)
 		table[i] = i;
 
 	/*
@@ -746,16 +776,16 @@ static void write_lookup_table(struct bitmap_writer *writer, struct hashfile *f,
 	 * bitmap corresponds to j'th bitmapped commit (among the selected
 	 * commits) in lex order of OIDs.
 	 */
-	QSORT_S(table, writer->selected_nr, table_cmp, writer);
+	QSORT_S(table, bitmap_writer_nr_selected_commits(writer), table_cmp, writer);
 
 	/* table_inv helps us discover that relationship (i'th bitmap
 	 * to j'th commit by j = table_inv[i])
 	 */
-	for (i = 0; i < writer->selected_nr; i++)
+	for (i = 0; i < bitmap_writer_nr_selected_commits(writer); i++)
 		table_inv[table[i]] = i;
 
 	trace2_region_enter("pack-bitmap-write", "writing_lookup_table", the_repository);
-	for (i = 0; i < writer->selected_nr; i++) {
+	for (i = 0; i < bitmap_writer_nr_selected_commits(writer); i++) {
 		struct bitmapped_commit *selected = &writer->selected[table[i]];
 		uint32_t xor_offset = selected->xor_offset;
 		uint32_t xor_row;
@@ -827,7 +857,7 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
 	memcpy(header.magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE));
 	header.version = htons(default_version);
 	header.options = htons(flags | options);
-	header.entry_count = htonl(writer->selected_nr);
+	header.entry_count = htonl(bitmap_writer_nr_selected_commits(writer));
 	hashcpy(header.checksum, writer->pack_checksum);
 
 	hashwrite(f, &header, sizeof(header) - GIT_MAX_RAWSZ + the_hash_algo->rawsz);
@@ -839,7 +869,7 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
 	if (options & BITMAP_OPT_LOOKUP_TABLE)
 		CALLOC_ARRAY(offsets, index_nr);
 
-	for (i = 0; i < writer->selected_nr; i++) {
+	for (i = 0; i < bitmap_writer_nr_selected_commits(writer); i++) {
 		struct bitmapped_commit *stored = &writer->selected[i];
 		int commit_pos = oid_pos(&stored->commit->object.oid, index,
 					 index_nr, oid_access);
diff --git a/pack-bitmap.h b/pack-bitmap.h
index f87e60153dd..6937a0f090f 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -21,6 +21,7 @@ struct bitmap_disk_header {
 	unsigned char checksum[GIT_MAX_RAWSZ];
 };
 
+#define BITMAP_PSEUDO_MERGE (1u<<21)
 #define NEEDS_BITMAP (1u<<22)
 
 /*
@@ -109,6 +110,8 @@ struct bitmap_writer {
 	struct bitmapped_commit *selected;
 	unsigned int selected_nr, selected_alloc;
 
+	uint32_t pseudo_merges_nr;
+
 	struct progress *progress;
 	int show_progress;
 	unsigned char pack_checksum[GIT_MAX_RAWSZ];
-- 
2.45.1.175.gcf0316ad0e9



* [PATCH v4 08/24] pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()`
  2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
                     ` (6 preceding siblings ...)
  2024-05-23 21:26   ` [PATCH v4 07/24] pack-bitmap-write: support storing pseudo-merge commits Taylor Blau
@ 2024-05-23 21:26   ` Taylor Blau
  2024-05-23 21:26   ` [PATCH v4 09/24] pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public Taylor Blau
                     ` (16 subsequent siblings)
  24 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 21:26 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Prepare to implement pseudo-merge bitmap selection by implementing a
necessary new function, `bitmap_writer_has_bitmapped_object_id()`.

This function returns whether or not the bitmap_writer selected the
given object ID for bitmapping. This will allow the pseudo-merge
machinery to reject candidates for pseudo-merges if they have already
been selected as an ordinary bitmap tip.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c | 6 ++++++
 pack-bitmap.h       | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 60eb1e71c98..299aa8af6f5 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -130,6 +130,12 @@ void bitmap_writer_build_type_index(struct bitmap_writer *writer,
 	}
 }
 
+int bitmap_writer_has_bitmapped_object_id(struct bitmap_writer *writer,
+					  const struct object_id *oid)
+{
+	return kh_get_oid_map(writer->bitmaps, *oid) != kh_end(writer->bitmaps);
+}
+
 /**
  * Compute the actual bitmaps
  */
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 6937a0f090f..e175f28e0de 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -125,6 +125,8 @@ void bitmap_writer_build_type_index(struct bitmap_writer *writer,
 				    struct packing_data *to_pack,
 				    struct pack_idx_entry **index,
 				    uint32_t index_nr);
+int bitmap_writer_has_bitmapped_object_id(struct bitmap_writer *writer,
+					  const struct object_id *oid);
 uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
 				struct packing_data *mapping);
 int rebuild_bitmap(const uint32_t *reposition,
-- 
2.45.1.175.gcf0316ad0e9



* [PATCH v4 09/24] pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public
  2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
                     ` (7 preceding siblings ...)
  2024-05-23 21:26   ` [PATCH v4 08/24] pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()` Taylor Blau
@ 2024-05-23 21:26   ` Taylor Blau
  2024-05-23 21:26   ` [PATCH v4 10/24] config: introduce `git_config_double()` Taylor Blau
                     ` (15 subsequent siblings)
  24 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 21:26 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

The pseudo-merge selection code will be added in a subsequent commit,
and will need a way to push the allocated commit structures into the
bitmap writer from a separate compilation unit.

Rename `push_bitmapped_commit()` to `bitmap_writer_push_commit()` and
declare it in the pack-bitmap.h header in order to make this possible.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c | 9 ++++-----
 pack-bitmap.h       | 2 ++
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 299aa8af6f5..bc19b33ad16 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -140,9 +140,8 @@ int bitmap_writer_has_bitmapped_object_id(struct bitmap_writer *writer,
  * Compute the actual bitmaps
  */
 
-static inline void push_bitmapped_commit(struct bitmap_writer *writer,
-					 struct commit *commit,
-					 unsigned pseudo_merge)
+void bitmap_writer_push_commit(struct bitmap_writer *writer,
+			       struct commit *commit, unsigned pseudo_merge)
 {
 	if (writer->selected_nr >= writer->selected_alloc) {
 		writer->selected_alloc = (writer->selected_alloc + 32) * 2;
@@ -664,7 +663,7 @@ void bitmap_writer_select_commits(struct bitmap_writer *writer,
 
 	if (indexed_commits_nr < 100) {
 		for (i = 0; i < indexed_commits_nr; ++i)
-			push_bitmapped_commit(writer, indexed_commits[i], 0);
+			bitmap_writer_push_commit(writer, indexed_commits[i], 0);
 		return;
 	}
 
@@ -697,7 +696,7 @@ void bitmap_writer_select_commits(struct bitmap_writer *writer,
 			}
 		}
 
-		push_bitmapped_commit(writer, chosen, 0);
+		bitmap_writer_push_commit(writer, chosen, 0);
 
 		i += next + 1;
 		display_progress(writer->progress, i);
diff --git a/pack-bitmap.h b/pack-bitmap.h
index e175f28e0de..a7e2f56c971 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -127,6 +127,8 @@ void bitmap_writer_build_type_index(struct bitmap_writer *writer,
 				    uint32_t index_nr);
 int bitmap_writer_has_bitmapped_object_id(struct bitmap_writer *writer,
 					  const struct object_id *oid);
+void bitmap_writer_push_commit(struct bitmap_writer *writer,
+			       struct commit *commit, unsigned pseudo_merge);
 uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
 				struct packing_data *mapping);
 int rebuild_bitmap(const uint32_t *reposition,
-- 
2.45.1.175.gcf0316ad0e9



* [PATCH v4 10/24] config: introduce `git_config_double()`
  2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
                     ` (8 preceding siblings ...)
  2024-05-23 21:26   ` [PATCH v4 09/24] pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public Taylor Blau
@ 2024-05-23 21:26   ` Taylor Blau
  2024-05-23 21:26   ` [PATCH v4 11/24] pseudo-merge: implement support for selecting pseudo-merge commits Taylor Blau
                     ` (14 subsequent siblings)
  24 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 21:26 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Future commits will want to parse a double-precision floating point
value from configuration, but we have no way to parse such a value prior
to this patch.

The core of the routine is implemented in git_parse_double(). Unlike
git_parse_unsigned() and git_parse_signed(), however, the function
implemented here only works on type "double", and not related types like
"float", or "long double".

This is because "float" and "long double" use different functions to
convert from ASCII strings to floating point values (strtof() and
strtold(), respectively). Likewise, there is no pointer type that can
assign to any of these values (except for "void *"), so the only way to
define this trio of functions would be with a macro expansion that is
parameterized over the floating point type and conversion function.

That is all doable, but likely to be overkill given our current need,
which is only to parse double-precision floats.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
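
For readers who want to try the parsing behavior described above outside
of git.git, here is a minimal standalone sketch. The names
sketch_unit_factor() and sketch_parse_double() are hypothetical, and the
unit helper is only assumed to behave like git's k/m/g suffix handling;
it is not the code from this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <strings.h>

/* Hypothetical stand-in for a unit-suffix helper; returns 0 on junk. */
uintmax_t sketch_unit_factor(const char *end)
{
	if (!*end)
		return 1;
	if (!strcasecmp(end, "k"))
		return 1024;
	if (!strcasecmp(end, "m"))
		return 1024 * 1024;
	if (!strcasecmp(end, "g"))
		return 1024 * 1024 * 1024;
	return 0;
}

/* Returns 1 and stores the parsed value in *ret, or 0 on failure. */
int sketch_parse_double(const char *value, double *ret)
{
	char *end;
	double val;
	uintmax_t factor;

	assert(ret);
	if (!value || !*value) {
		errno = EINVAL;
		return 0;
	}

	errno = 0;
	val = strtod(value, &end);
	if (errno == ERANGE)
		return 0;	/* overflow or underflow */
	if (end == value) {
		errno = EINVAL;	/* no digits were consumed */
		return 0;
	}

	factor = sketch_unit_factor(end);
	if (!factor) {
		errno = EINVAL;	/* trailing garbage after the number */
		return 0;
	}

	*ret = val * factor;
	return 1;
}
```

Note that strtod() also accepts inputs such as "inf" and "nan", so a
caller that needs a bounded value still has to range-check the result.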
 config.c |  9 +++++++++
 config.h |  7 +++++++
 parse.c  | 29 +++++++++++++++++++++++++++++
 parse.h  |  1 +
 4 files changed, 46 insertions(+)

diff --git a/config.c b/config.c
index 77a0fd2d80e..7df89f17275 100644
--- a/config.c
+++ b/config.c
@@ -1243,6 +1243,15 @@ ssize_t git_config_ssize_t(const char *name, const char *value,
 	return ret;
 }
 
+double git_config_double(const char *name, const char *value,
+			 const struct key_value_info *kvi)
+{
+	double ret;
+	if (!git_parse_double(value, &ret))
+		die_bad_number(name, value, kvi);
+	return ret;
+}
+
 static const struct fsync_component_name {
 	const char *name;
 	enum fsync_component component_bits;
diff --git a/config.h b/config.h
index f4966e37494..f5f306f373d 100644
--- a/config.h
+++ b/config.h
@@ -261,6 +261,13 @@ unsigned long git_config_ulong(const char *, const char *,
 ssize_t git_config_ssize_t(const char *, const char *,
 			   const struct key_value_info *);
 
+/**
+ * Identical to `git_config_int`, but for double-precision floating point
+ * values.
+ */
+double git_config_double(const char *, const char *,
+			 const struct key_value_info *);
+
 /**
  * Same as `git_config_bool`, except that integers are returned as-is, and
  * an `is_bool` flag is unset.
diff --git a/parse.c b/parse.c
index 42d691a0fbb..7a60a4f816c 100644
--- a/parse.c
+++ b/parse.c
@@ -125,6 +125,35 @@ int git_parse_ssize_t(const char *value, ssize_t *ret)
 	return 1;
 }
 
+int git_parse_double(const char *value, double *ret)
+{
+	char *end;
+	double val;
+	uintmax_t factor;
+
+	if (!value || !*value) {
+		errno = EINVAL;
+		return 0;
+	}
+
+	errno = 0;
+	val = strtod(value, &end);
+	if (errno == ERANGE)
+		return 0;
+	if (end == value) {
+		errno = EINVAL;
+		return 0;
+	}
+	factor = get_unit_factor(end);
+	if (!factor) {
+		errno = EINVAL;
+		return 0;
+	}
+	val *= factor;
+	*ret = val;
+	return 1;
+}
+
 int git_parse_maybe_bool_text(const char *value)
 {
 	if (!value)
diff --git a/parse.h b/parse.h
index 07d2193d698..6bb9a54d9ac 100644
--- a/parse.h
+++ b/parse.h
@@ -6,6 +6,7 @@ int git_parse_ssize_t(const char *, ssize_t *);
 int git_parse_ulong(const char *, unsigned long *);
 int git_parse_int(const char *value, int *ret);
 int git_parse_int64(const char *value, int64_t *ret);
+int git_parse_double(const char *value, double *ret);
 
 /**
  * Same as `git_config_bool`, except that it returns -1 on error rather
-- 
2.45.1.175.gcf0316ad0e9



* [PATCH v4 11/24] pseudo-merge: implement support for selecting pseudo-merge commits
  2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
                     ` (9 preceding siblings ...)
  2024-05-23 21:26   ` [PATCH v4 10/24] config: introduce `git_config_double()` Taylor Blau
@ 2024-05-23 21:26   ` Taylor Blau
  2024-05-25  3:22     ` Jeff King
  2024-05-23 21:26   ` [PATCH v4 12/24] pack-bitmap-write.c: write pseudo-merge table Taylor Blau
                     ` (13 subsequent siblings)
  24 siblings, 1 reply; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 21:26 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Teach the new pseudo-merge machinery how to select non-bitmapped commits
for inclusion in different pseudo-merge group(s) based on a handful of
criteria.

Note that the selected pseudo-merge commits aren't actually used or
written anywhere yet. This will be done in the following commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
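
To make the power-law sizing documented in this patch concrete, here is
a small standalone sketch of the arithmetic f(n) = C * n^-k with C
chosen so that the sizes sum to roughly the number of candidate commits.
The helper names are hypothetical, and the sketch is restricted to
integer decay rates so that it does not depend on libm.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: base^exp for a non-negative integer exponent. */
double sketch_ipow(double base, unsigned exp)
{
	double result = 1.0;

	while (exp) {
		if (exp & 1)
			result *= base;
		base *= base;
		exp >>= 1;
	}
	return result;
}

/*
 * Size of the i-th group (0-based) when `total` commits are spread over
 * `max_merges` groups with integer decay rate `k`, using n = i + 1.
 */
uint32_t sketch_group_size(uint32_t total, uint32_t max_merges,
			   unsigned k, uint32_t i)
{
	double sum = 0.0;
	uint32_t n;

	assert(max_merges > 0 && i < max_merges);

	for (n = 1; n <= max_merges; n++)
		sum += 1.0 / sketch_ipow(n, k);

	/* C = N / sum_{n=1}^{M} n^-k; adding 0.5 rounds to nearest */
	return (uint32_t)((total / sum) / sketch_ipow(i + 1, k) + 0.5);
}
```

With 10,000 commits, 6 groups, and k = 1, this yields group sizes of
4082, 2041, 1361, 1020, 816, and 680. With k = 0 every group has the
same size (1667 in this case).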
 Documentation/config.txt                     |   2 +
 Documentation/config/bitmap-pseudo-merge.txt |  91 ++++
 Documentation/gitpacking.txt                 |  83 ++++
 pack-bitmap-write.c                          |  21 +
 pack-bitmap.h                                |   2 +
 pseudo-merge.c                               | 454 +++++++++++++++++++
 pseudo-merge.h                               |  94 ++++
 7 files changed, 747 insertions(+)
 create mode 100644 Documentation/config/bitmap-pseudo-merge.txt
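
The sub-group naming described in the documentation added by this patch
(capture groups joined with a '-') can be illustrated with plain POSIX
regex calls. This is a sketch only: sketch_subgroup_name() is a
hypothetical helper, it handles only patterns that contain capture
groups, and it writes into a fixed caller-supplied buffer rather than a
strbuf.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper: match refname against an extended regex and join
 * the text of its capture groups with '-'. Returns 0 on a match and -1
 * otherwise (including when the pattern fails to compile).
 */
int sketch_subgroup_name(const char *pattern, const char *refname,
			 char *out, size_t out_len)
{
	regex_t re;
	regmatch_t m[16];
	size_t j, used = 0;
	int ret = -1;

	assert(out && out_len);
	out[0] = '\0';

	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;

	if (!regexec(&re, refname, 16, m, 0)) {
		ret = 0;
		/* m[0] is the whole match; capture groups start at m[1] */
		for (j = 1; j < 16; j++) {
			size_t len;

			if (m[j].rm_so == -1)
				continue;
			len = (size_t)(m[j].rm_eo - m[j].rm_so);
			if (used + len + 2 > out_len)
				break;	/* truncate rather than overflow */
			if (used)
				out[used++] = '-';
			memcpy(out + used, refname + m[j].rm_so, len);
			used += len;
			out[used] = '\0';
		}
	}

	regfree(&re);
	return ret;
}
```

Given the pattern "^refs/virtual/([0-9]+)/(heads|tags)/" and the ref
"refs/virtual/1234/heads/main", the buffer ends up holding "1234-heads".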

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 6f649c997c0..caa34311214 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -384,6 +384,8 @@ include::config/apply.txt[]
 
 include::config/attr.txt[]
 
+include::config/bitmap-pseudo-merge.txt[]
+
 include::config/blame.txt[]
 
 include::config/branch.txt[]
diff --git a/Documentation/config/bitmap-pseudo-merge.txt b/Documentation/config/bitmap-pseudo-merge.txt
new file mode 100644
index 00000000000..1f264eca99b
--- /dev/null
+++ b/Documentation/config/bitmap-pseudo-merge.txt
@@ -0,0 +1,91 @@
+NOTE: The configuration options in `bitmapPseudoMerge.*` are considered
+EXPERIMENTAL and may be subject to change or be removed entirely in the
+future. For more information about the pseudo-merge bitmap feature, see
+the "Pseudo-merge bitmaps" section of linkgit:gitpacking[7].
+
+bitmapPseudoMerge.<name>.pattern::
+	Regular expression used to match reference names. Commits
+	pointed to by references matching this pattern (and meeting
+	the below criteria, like `bitmapPseudoMerge.<name>.sampleRate`
+	and `bitmapPseudoMerge.<name>.threshold`) will be considered
+	for inclusion in a pseudo-merge bitmap.
++
+Commits are grouped into pseudo-merge groups based on whether or not
+any reference(s) that point at a given commit match the pattern, which
+is an extended regular expression.
++
+Within a pseudo-merge group, commits may be further grouped into
+sub-groups based on the capture groups in the pattern. These
+sub-groupings are formed from the regular expressions by concatenating
+any capture groups from the regular expression, with a '-' dash in
+between.
++
+For example, if the pattern is `refs/tags/`, then all tags (provided
+they meet the below criteria) will be considered candidates for the
+same pseudo-merge group. However, if the pattern is instead
+`refs/remotes/([0-9]+)/tags/`, then tags from different remotes will
+be grouped into separate pseudo-merge groups, based on the remote
+number.
+
+bitmapPseudoMerge.<name>.decay::
+	Determines the rate at which consecutive pseudo-merge bitmap
+	groups decrease in size. Must be non-negative. This parameter
+	can be thought of as `k` in the function `f(n) = C * n^-k`,
+	where `f(n)` is the size of the `n`th group.
++
+Setting the decay rate equal to `0` will cause all groups to be the
+same size. Setting the decay rate equal to `1` will cause the `n`th
+group to be `1/n` the size of the initial group.  Higher values of the
+decay rate cause consecutive groups to shrink at an increasing rate.
+The default is `1`.
++
+If all groups are the same size, groups containing newer commits may
+be usable less often than earlier groups, since references pointing at
+newer commits are more likely to be updated than references pointing
+at older commits.
+
+bitmapPseudoMerge.<name>.sampleRate::
+	Determines the proportion of non-bitmapped commits (among
+	reference tips) which are selected for inclusion in an
+	unstable pseudo-merge bitmap. Must be between `0` and `1`
+	(inclusive). The default is `1`.
+
+bitmapPseudoMerge.<name>.threshold::
+	Determines the minimum age of non-bitmapped commits (among
+	reference tips, as above) which are candidates for inclusion
+	in an unstable pseudo-merge bitmap. The default is
+	`1.week.ago`.
+
+bitmapPseudoMerge.<name>.maxMerges::
+	Determines the maximum number of pseudo-merge commits among
+	which commits may be distributed.
++
+For pseudo-merge groups whose pattern does not contain any capture
+groups, this setting is applied for all commits matching the regular
+expression. For patterns that have one or more capture groups, this
+setting is applied for each distinct capture group.
++
+For example, if your pattern is `refs/tags/`, then this setting will
+distribute all tags into a maximum of `maxMerges` pseudo-merge
+commits. However, if your pattern is, say,
+`refs/remotes/([0-9]+)/tags/`, then this setting will be applied to
+each remote's set of tags individually.
++
+Must be non-negative. The default value is 64.
+
+bitmapPseudoMerge.<name>.stableThreshold::
+	Determines the minimum age of commits (among reference tips,
+	as above; however, stable commits are still considered
+	candidates even when they have been covered by a bitmap) which
+	are candidates for a stable pseudo-merge bitmap. The default
+	is `1.month.ago`.
++
+Setting this threshold to a smaller value (e.g., 1.week.ago) will cause
+more stable groups to be generated (which impose a one-time generation
+cost) but those groups will likely become stale over time. Using a
+larger value incurs the opposite penalty (fewer stable groups which are
+more useful).
+
+bitmapPseudoMerge.<name>.stableSize::
+	Determines the size (in number of commits) of a stable
+	pseudo-merge bitmap. The default is `512`.
diff --git a/Documentation/gitpacking.txt b/Documentation/gitpacking.txt
index f24396f0173..4a6fcba6f72 100644
--- a/Documentation/gitpacking.txt
+++ b/Documentation/gitpacking.txt
@@ -96,6 +96,89 @@ can take advantage of the fact that we only care about the union of
 objects reachable from all of those tags, and answer the query much
 faster.
 
+=== Configuration
+
+Reference tips are grouped into different pseudo-merge groups according
+to two criteria. A reference name matches one or more of the defined
+pseudo-merge patterns, and optionally one or more capture groups within
+that pattern which further partition the group.
+
+Within a group, commits may be considered "stable", or "unstable"
+depending on their age. These are adjusted by setting the
+`bitmapPseudoMerge.<name>.stableThreshold` and
+`bitmapPseudoMerge.<name>.threshold` configuration values, respectively.
+
+All stable commits are grouped into pseudo-merges of equal size
+(`bitmapPseudoMerge.<name>.stableSize`). If the `stableSize`
+configuration is set to, say, 100, then the first 100 commits (ordered
+by committer date) which are older than the `stableThreshold` value will
+form one group, the next 100 commits will form another group, and so on.
+
+Among unstable commits, the pseudo-merge machinery will attempt to
+combine older commits into large groups as opposed to newer commits
+which will appear in smaller groups. This is based on the heuristic that
+references whose tip commit is older are less likely to be modified to
+point at a different commit than a reference whose tip commit is newer.
+
+The size of groups is determined by a power-law decay function, and the
+decay parameter corresponds to "k" in `f(n) = C*n^(-k)`, where `f(n)`
+describes the size of the `n`-th pseudo-merge group. The sample rate
+controls what proportion of eligible commits are considered as
+candidates. The threshold parameter indicates the minimum age (so as to
+avoid including too-recent commits in a pseudo-merge group, making it
+less likely to be valid). The "maxMerges" parameter sets an upper-bound
+on the number of pseudo-merge commits an individual group may contain.
+
+The "stable"-related parameters control "stable" pseudo-merge groups,
+comprised of a fixed number of commits which are older than the
+configured "stable threshold" value and may be grouped together in
+chunks of "stableSize" in order of age.
+
+The exact configuration for pseudo-merges is as follows:
+
+include::config/bitmap-pseudo-merge.txt[]
+
+=== Examples
+
+Suppose that you have a repository with a large number of references,
+and you want a bare-bones configuration of pseudo-merge bitmaps that
+will enhance bitmap coverage of the `refs/` namespace. You may start
+with a configuration like so:
+
+    [bitmapPseudoMerge "all"]
+	pattern = "refs/"
+	threshold = now
+	stableThreshold = never
+	sampleRate = 1
+	maxMerges = 64
+
+This will create pseudo-merge bitmaps for all references, regardless of
+their age, and group them into 64 pseudo-merge commits.
+
+If you wanted to separate tags from branches when generating
+pseudo-merge commits, you would instead define the pattern with a
+capture group, like so:
+
+    [bitmapPseudoMerge "all"]
+	pattern = "refs/(heads|tags)/"
+
+Suppose instead that you are working in a fork-network repository, with
+each fork specified by some numeric ID, and whose refs reside in
+`refs/virtual/NNN/` (where `NNN` is the numeric ID corresponding to some
+fork) in the network. In this instance, you may instead write something
+like:
+
+    [bitmapPseudoMerge "all"]
+	pattern = "refs/virtual/([0-9]+)/(heads|tags)/"
+	threshold = now
+	stableThreshold = never
+	sampleRate = 1
+	maxMerges = 64
+
+Which would generate pseudo-merge group identifiers like "1234-heads",
+and "5678-tags" (for branches in fork "1234", and tags in fork "5678",
+respectively).
+
 SEE ALSO
 --------
 linkgit:git-pack-objects[1]
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index bc19b33ad16..d5884ea5e9c 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -17,6 +17,7 @@
 #include "trace2.h"
 #include "tree.h"
 #include "tree-walk.h"
+#include "pseudo-merge.h"
 
 struct bitmapped_commit {
 	struct commit *commit;
@@ -39,11 +40,25 @@ void bitmap_writer_init(struct bitmap_writer *writer, struct repository *r)
 	if (writer->bitmaps)
 		BUG("bitmap writer already initialized");
 	writer->bitmaps = kh_init_oid_map();
+	writer->pseudo_merge_commits = kh_init_oid_map();
+
+	string_list_init_dup(&writer->pseudo_merge_groups);
+
+	load_pseudo_merges_from_config(&writer->pseudo_merge_groups);
+}
+
+static void free_pseudo_merge_commit_idx(struct pseudo_merge_commit_idx *idx)
+{
+	if (!idx)
+		return;
+	free(idx->pseudo_merge);
+	free(idx);
 }
 
 void bitmap_writer_free(struct bitmap_writer *writer)
 {
 	uint32_t i;
+	struct pseudo_merge_commit_idx *idx;
 
 	if (!writer)
 		return;
@@ -55,6 +70,10 @@ void bitmap_writer_free(struct bitmap_writer *writer)
 
 	kh_destroy_oid_map(writer->bitmaps);
 
+	kh_foreach_value(writer->pseudo_merge_commits, idx,
+			 free_pseudo_merge_commit_idx(idx));
+	kh_destroy_oid_map(writer->pseudo_merge_commits);
+
 	for (i = 0; i < writer->selected_nr; i++) {
 		struct bitmapped_commit *bc = &writer->selected[i];
 		if (bc->write_as != bc->bitmap)
@@ -703,6 +722,8 @@ void bitmap_writer_select_commits(struct bitmap_writer *writer,
 	}
 
 	stop_progress(&writer->progress);
+
+	select_pseudo_merges(writer, indexed_commits, indexed_commits_nr);
 }
 
 
diff --git a/pack-bitmap.h b/pack-bitmap.h
index a7e2f56c971..1e730ea1e54 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -110,6 +110,8 @@ struct bitmap_writer {
 	struct bitmapped_commit *selected;
 	unsigned int selected_nr, selected_alloc;
 
+	struct string_list pseudo_merge_groups;
+	kh_oid_map_t *pseudo_merge_commits; /* oid -> pseudo merge(s) */
 	uint32_t pseudo_merges_nr;
 
 	struct progress *progress;
diff --git a/pseudo-merge.c b/pseudo-merge.c
index 37e037ba272..0f6854c753f 100644
--- a/pseudo-merge.c
+++ b/pseudo-merge.c
@@ -1,2 +1,456 @@
 #include "git-compat-util.h"
 #include "pseudo-merge.h"
+#include "date.h"
+#include "oid-array.h"
+#include "strbuf.h"
+#include "config.h"
+#include "string-list.h"
+#include "refs.h"
+#include "pack-bitmap.h"
+#include "commit.h"
+#include "alloc.h"
+#include "progress.h"
+
+#define DEFAULT_PSEUDO_MERGE_DECAY 1.0
+#define DEFAULT_PSEUDO_MERGE_MAX_MERGES 64
+#define DEFAULT_PSEUDO_MERGE_SAMPLE_RATE 1
+#define DEFAULT_PSEUDO_MERGE_THRESHOLD approxidate("1.week.ago")
+#define DEFAULT_PSEUDO_MERGE_STABLE_THRESHOLD approxidate("1.month.ago")
+#define DEFAULT_PSEUDO_MERGE_STABLE_SIZE 512
+
+static double gitexp(double base, int exp)
+{
+	double result = 1;
+	while (1) {
+		if (exp % 2)
+			result *= base;
+		exp >>= 1;
+		if (!exp)
+			break;
+		base *= base;
+	}
+	return result;
+}
+
+static uint32_t pseudo_merge_group_size(const struct pseudo_merge_group *group,
+					const struct pseudo_merge_matches *matches,
+					uint32_t i)
+{
+	double C = 0.0f;
+	uint32_t n;
+
+	/*
+	 * The size of pseudo-merge groups decays according to a power law,
+	 * which looks like:
+	 *
+	 *   f(n) = C * n^-k
+	 *
+	 * , where 'n' is the n-th pseudo-merge group, 'f(n)' is its size, 'k'
+	 * is the decay rate, and 'C' is a scaling value.
+	 *
+	 * The value of C depends on the number of groups, decay rate, and total
+	 * number of commits. It is computed such that if there are M and N
+	 * total groups and commits, respectively, that:
+	 *
+	 *   N = f(1) + f(2) + ... + f(M)
+	 *
+	 * Rearranging to isolate C, we get:
+	 *
+	 *   N = \sum_{n=1}^M C / n^k
+	 *
+	 *   N / C = \sum_{n=1}^M n^-k
+	 *
+	 *   C = N / \sum_{n=1}^M n^-k
+	 *
+	 * For example, if we have a decay rate of 'k' being equal to 1.5, 'N'
+	 * total commits equal to 10,000, and 'M' being equal to 6 groups, then
+	 * the (rounded) group sizes are:
+	 *
+	 *   { 5469, 1934, 1053, 684, 489, 372 }
+	 *
+	 * increasing the number of total groups, say to 10, scales the group
+	 * sizes appropriately:
+	 *
+	 *   { 5012, 1772, 964, 626, 448, 341, 271, 221, 186, 158 }
+	 */
+	for (n = 0; n < group->max_merges; n++)
+		C += 1.0 / gitexp(n + 1, group->decay);
+	C = matches->unstable_nr / C;
+
+	return (uint32_t)((C / gitexp(i + 1, group->decay)) + 0.5);
+}
+
+static void pseudo_merge_group_init(struct pseudo_merge_group *group)
+{
+	memset(group, 0, sizeof(struct pseudo_merge_group));
+
+	strmap_init_with_options(&group->matches, NULL, 0);
+
+	group->decay = DEFAULT_PSEUDO_MERGE_DECAY;
+	group->max_merges = DEFAULT_PSEUDO_MERGE_MAX_MERGES;
+	group->sample_rate = DEFAULT_PSEUDO_MERGE_SAMPLE_RATE;
+	group->threshold = DEFAULT_PSEUDO_MERGE_THRESHOLD;
+	group->stable_threshold = DEFAULT_PSEUDO_MERGE_STABLE_THRESHOLD;
+	group->stable_size = DEFAULT_PSEUDO_MERGE_STABLE_SIZE;
+}
+
+static int pseudo_merge_config(const char *var, const char *value,
+			       const struct config_context *ctx,
+			       void *cb_data)
+{
+	struct string_list *list = cb_data;
+	struct string_list_item *item;
+	struct pseudo_merge_group *group;
+	struct strbuf buf = STRBUF_INIT;
+	const char *sub, *key;
+	size_t sub_len;
+	int ret = 0;
+
+	if (parse_config_key(var, "bitmappseudomerge", &sub, &sub_len, &key))
+		goto done;
+
+	if (!sub_len)
+		goto done;
+
+	strbuf_add(&buf, sub, sub_len);
+
+	item = string_list_lookup(list, buf.buf);
+	if (!item) {
+		item = string_list_insert(list, buf.buf);
+
+		item->util = xmalloc(sizeof(struct pseudo_merge_group));
+		pseudo_merge_group_init(item->util);
+	}
+
+	group = item->util;
+
+	if (!strcmp(key, "pattern")) {
+		struct strbuf re = STRBUF_INIT;
+
+		free(group->pattern);
+		if (*value != '^')
+			strbuf_addch(&re, '^');
+		strbuf_addstr(&re, value);
+
+		group->pattern = xcalloc(1, sizeof(regex_t));
+		if (regcomp(group->pattern, re.buf, REG_EXTENDED))
+			die(_("failed to load pseudo-merge regex for %s: '%s'"),
+			    sub, re.buf);
+
+		strbuf_release(&re);
+	} else if (!strcmp(key, "decay")) {
+		group->decay = git_config_double(var, value, ctx->kvi);
+		if (group->decay < 0) {
+			warning(_("%s must be non-negative, using default"), var);
+			group->decay = DEFAULT_PSEUDO_MERGE_DECAY;
+		}
+	} else if (!strcmp(key, "samplerate")) {
+		group->sample_rate = git_config_double(var, value, ctx->kvi);
+		if (!(0 <= group->sample_rate && group->sample_rate <= 1)) {
+			warning(_("%s must be between 0 and 1, using default"), var);
+			group->sample_rate = DEFAULT_PSEUDO_MERGE_SAMPLE_RATE;
+		}
+	} else if (!strcmp(key, "threshold")) {
+		if (git_config_expiry_date(&group->threshold, var, value)) {
+			ret = -1;
+			goto done;
+		}
+	} else if (!strcmp(key, "maxmerges")) {
+		group->max_merges = git_config_int(var, value, ctx->kvi);
+		if (group->max_merges < 0) {
+			warning(_("%s must be non-negative, using default"), var);
+			group->max_merges = DEFAULT_PSEUDO_MERGE_MAX_MERGES;
+		}
+	} else if (!strcmp(key, "stablethreshold")) {
+		if (git_config_expiry_date(&group->stable_threshold, var, value)) {
+			ret = -1;
+			goto done;
+		}
+	} else if (!strcmp(key, "stablesize")) {
+		group->stable_size = git_config_int(var, value, ctx->kvi);
+		if (group->stable_size <= 0) {
+			warning(_("%s must be positive, using default"), var);
+			group->stable_size = DEFAULT_PSEUDO_MERGE_STABLE_SIZE;
+		}
+	}
+
+done:
+	strbuf_release(&buf);
+
+	return ret;
+}
+
+void load_pseudo_merges_from_config(struct string_list *list)
+{
+	struct string_list_item *item;
+
+	git_config(pseudo_merge_config, list);
+
+	for_each_string_list_item(item, list) {
+		struct pseudo_merge_group *group = item->util;
+		if (!group->pattern)
+			die(_("pseudo-merge group '%s' missing required pattern"),
+			    item->string);
+		if (group->threshold < group->stable_threshold)
+			die(_("pseudo-merge group '%s' has unstable threshold "
+			      "before stable one"), item->string);
+	}
+}
+
+static int find_pseudo_merge_group_for_ref(const char *refname,
+					   const struct object_id *oid,
+					   int flags UNUSED,
+					   void *_data)
+{
+	struct bitmap_writer *writer = _data;
+	struct object_id peeled;
+	struct commit *c;
+	uint32_t i;
+	int has_bitmap;
+
+	if (!peel_iterated_oid(oid, &peeled))
+		oid = &peeled;
+
+	c = lookup_commit(the_repository, oid);
+	if (!c)
+		return 0;
+
+	has_bitmap = bitmap_writer_has_bitmapped_object_id(writer, oid);
+
+	for (i = 0; i < writer->pseudo_merge_groups.nr; i++) {
+		struct pseudo_merge_group *group;
+		struct pseudo_merge_matches *matches;
+		struct strbuf group_name = STRBUF_INIT;
+		regmatch_t captures[16];
+		size_t j;
+
+		group = writer->pseudo_merge_groups.items[i].util;
+		if (regexec(group->pattern, refname, ARRAY_SIZE(captures),
+			    captures, 0))
+			continue;
+
+		if (captures[ARRAY_SIZE(captures) - 1].rm_so != -1)
+			warning(_("pseudo-merge regex from config has too many capture "
+				  "groups (max=%"PRIuMAX")"),
+				(uintmax_t)ARRAY_SIZE(captures) - 2);
+
+		for (j = !!group->pattern->re_nsub; j < ARRAY_SIZE(captures); j++) {
+			regmatch_t *match = &captures[j];
+			if (match->rm_so == -1)
+				continue;
+
+			if (group_name.len)
+				strbuf_addch(&group_name, '-');
+
+			strbuf_add(&group_name, refname + match->rm_so,
+				   match->rm_eo - match->rm_so);
+		}
+
+		matches = strmap_get(&group->matches, group_name.buf);
+		if (!matches) {
+			matches = xcalloc(1, sizeof(*matches));
+			strmap_put(&group->matches, strbuf_detach(&group_name, NULL),
+				   matches);
+		}
+
+		if (c->date <= group->stable_threshold) {
+			ALLOC_GROW(matches->stable, matches->stable_nr + 1,
+				   matches->stable_alloc);
+			matches->stable[matches->stable_nr++] = c;
+		} else if (c->date <= group->threshold && !has_bitmap) {
+			ALLOC_GROW(matches->unstable, matches->unstable_nr + 1,
+				   matches->unstable_alloc);
+			matches->unstable[matches->unstable_nr++] = c;
+		}
+
+		strbuf_release(&group_name);
+	}
+
+	return 0;
+}
+
+static struct commit *push_pseudo_merge(struct pseudo_merge_group *group)
+{
+	struct commit *merge;
+
+	ALLOC_GROW(group->merges, group->merges_nr + 1, group->merges_alloc);
+
+	merge = alloc_commit_node(the_repository);
+	merge->object.parsed = 1;
+	merge->object.flags |= BITMAP_PSEUDO_MERGE;
+
+	group->merges[group->merges_nr++] = merge;
+
+	return merge;
+}
+
+static struct pseudo_merge_commit_idx *pseudo_merge_idx(kh_oid_map_t *pseudo_merge_commits,
+							const struct object_id *oid)
+
+{
+	struct pseudo_merge_commit_idx *pmc;
+	int hash_ret;
+	khiter_t hash_pos = kh_put_oid_map(pseudo_merge_commits, *oid,
+					   &hash_ret);
+
+	if (hash_ret) {
+		CALLOC_ARRAY(pmc, 1);
+		kh_value(pseudo_merge_commits, hash_pos) = pmc;
+	} else {
+		pmc = kh_value(pseudo_merge_commits, hash_pos);
+	}
+
+	return pmc;
+}
+
+#define MIN_PSEUDO_MERGE_SIZE 8
+
+static void select_pseudo_merges_1(struct bitmap_writer *writer,
+				   struct pseudo_merge_group *group,
+				   struct pseudo_merge_matches *matches)
+{
+	uint32_t i, j;
+	uint32_t stable_merges_nr;
+
+	if (!matches->stable_nr && !matches->unstable_nr)
+		return; /* all tips in this group already have bitmaps */
+
+	stable_merges_nr = matches->stable_nr / group->stable_size;
+	if (matches->stable_nr % group->stable_size)
+		stable_merges_nr++;
+
+	/* make stable_merges_nr pseudo merges for stable commits */
+	for (i = 0, j = 0; i < stable_merges_nr; i++) {
+		struct commit *merge;
+		struct commit_list **p;
+
+		merge = push_pseudo_merge(group);
+		p = &merge->parents;
+
+		/*
+		 * For each pseudo-merge created above, add parents to the
+		 * allocated commit node from the stable set of commits
+		 * (older than the stable threshold).
+		 */
+		do {
+			struct commit *c;
+			struct pseudo_merge_commit_idx *pmc;
+
+			if (j >= matches->stable_nr)
+				break;
+
+			c = matches->stable[j++];
+			/*
+			 * Here and below, make sure that we keep our mapping of
+			 * commits -> pseudo-merge(s) which include the key'd
+			 * commit up-to-date.
+			 */
+			pmc = pseudo_merge_idx(writer->pseudo_merge_commits,
+					       &c->object.oid);
+
+			ALLOC_GROW(pmc->pseudo_merge, pmc->nr + 1, pmc->alloc);
+
+			pmc->pseudo_merge[pmc->nr++] = writer->pseudo_merges_nr;
+			p = commit_list_append(c, p);
+		} while (j % group->stable_size);
+
+		bitmap_writer_push_commit(writer, merge, 1);
+		writer->pseudo_merges_nr++;
+	}
+
+	/* make up to group->max_merges pseudo merges for unstable commits */
+	for (i = 0, j = 0; i < group->max_merges; i++) {
+		struct commit *merge;
+		struct commit_list **p;
+		uint32_t size, end;
+
+		merge = push_pseudo_merge(group);
+		p = &merge->parents;
+
+		size = pseudo_merge_group_size(group, matches, i);
+		end = size < MIN_PSEUDO_MERGE_SIZE ? matches->unstable_nr : j + size;
+
+		/*
+		 * For each pseudo-merge commit created above, add parents to
+		 * the allocated commit node from the unstable set of commits
+		 * (un-bitmapped, newer than the stable threshold).
+		 *
+		 * Account for the sample rate, since not every candidate from
+		 * the set of unstable commits will be included as a
+		 * pseudo-merge parent.
+		 */
+		for (; j < end && j < matches->unstable_nr; j++) {
+			struct commit *c = matches->unstable[j];
+			struct pseudo_merge_commit_idx *pmc;
+
+			if (j % (uint32_t)(1.0 / group->sample_rate))
+				continue;
+
+			pmc = pseudo_merge_idx(writer->pseudo_merge_commits,
+					       &c->object.oid);
+
+			ALLOC_GROW(pmc->pseudo_merge, pmc->nr + 1, pmc->alloc);
+
+			pmc->pseudo_merge[pmc->nr++] = writer->pseudo_merges_nr;
+			p = commit_list_append(c, p);
+		}
+
+		bitmap_writer_push_commit(writer, merge, 1);
+		writer->pseudo_merges_nr++;
+		if (end >= matches->unstable_nr)
+			break;
+	}
+}
+
+static int commit_date_cmp(const void *va, const void *vb)
+{
+	timestamp_t a = (*(const struct commit **)va)->date;
+	timestamp_t b = (*(const struct commit **)vb)->date;
+
+	if (a < b)
+		return -1;
+	else if (a > b)
+		return 1;
+	return 0;
+}
+
+static void sort_pseudo_merge_matches(struct pseudo_merge_matches *matches)
+{
+	QSORT(matches->stable, matches->stable_nr, commit_date_cmp);
+	QSORT(matches->unstable, matches->unstable_nr, commit_date_cmp);
+}
+
+void select_pseudo_merges(struct bitmap_writer *writer,
+			  struct commit **commits, size_t commits_nr)
+{
+	struct progress *progress = NULL;
+	uint32_t i;
+
+	if (!writer->pseudo_merge_groups.nr)
+		return;
+
+	if (writer->show_progress)
+		progress = start_progress("Selecting pseudo-merge commits",
+					  writer->pseudo_merge_groups.nr);
+
+	for_each_ref(find_pseudo_merge_group_for_ref, writer);
+
+	for (i = 0; i < writer->pseudo_merge_groups.nr; i++) {
+		struct pseudo_merge_group *group;
+		struct hashmap_iter iter;
+		struct strmap_entry *e;
+
+		group = writer->pseudo_merge_groups.items[i].util;
+		strmap_for_each_entry(&group->matches, &iter, e) {
+			struct pseudo_merge_matches *matches = e->value;
+
+			sort_pseudo_merge_matches(matches);
+
+			select_pseudo_merges_1(writer, group, matches);
+		}
+
+		display_progress(progress, i + 1);
+	}
+
+	stop_progress(&progress);
+}
diff --git a/pseudo-merge.h b/pseudo-merge.h
index cab8ff6960a..f809cf42aeb 100644
--- a/pseudo-merge.h
+++ b/pseudo-merge.h
@@ -2,5 +2,99 @@
 #define PSEUDO_MERGE_H
 
 #include "git-compat-util.h"
+#include "strmap.h"
+#include "khash.h"
+#include "ewah/ewok.h"
+
+struct commit;
+struct string_list;
+struct bitmap_index;
+struct bitmap_writer;
+
+/*
+ * A pseudo-merge group tracks the set of non-bitmapped reference tips
+ * that match the given pattern.
+ *
+ * Within those matches, they are further segmented by capture
+ * group: the text of consecutive capture groups is joined with
+ * '-' dash characters to name each segment.
+ *
+ * Those groups are then ordered by committer date and partitioned
+ * into individual pseudo-merge(s) according to the decay, max_merges,
+ * sample_rate, and threshold parameters.
+ */
+struct pseudo_merge_group {
+	regex_t *pattern;
+
+	/* capture group(s) -> struct pseudo_merge_matches */
+	struct strmap matches;
+
+	/*
+	 * The individual pseudo-merge(s) that are generated from the
+	 * above array of matches, partitioned according to the below
+	 * parameters.
+	 */
+	struct commit **merges;
+	size_t merges_nr;
+	size_t merges_alloc;
+
+	/*
+	 * Pseudo-merge grouping parameters. See git-config(1) for
+	 * more information.
+	 */
+	double decay;
+	int max_merges;
+	double sample_rate;
+	int stable_size;
+	timestamp_t threshold;
+	timestamp_t stable_threshold;
+};
+
+struct pseudo_merge_matches {
+	struct commit **stable;
+	struct commit **unstable;
+	size_t stable_nr, stable_alloc;
+	size_t unstable_nr, unstable_alloc;
+};
+
+/*
+ * Read the repository's configuration:
+ *
+ *   - bitmapPseudoMerge.<name>.pattern
+ *   - bitmapPseudoMerge.<name>.decay
+ *   - bitmapPseudoMerge.<name>.sampleRate
+ *   - bitmapPseudoMerge.<name>.threshold
+ *   - bitmapPseudoMerge.<name>.maxMerges
+ *   - bitmapPseudoMerge.<name>.stableThreshold
+ *   - bitmapPseudoMerge.<name>.stableSize
+ *
+ * and populates the given `list` with pseudo-merge groups. String
+ * entry keys are the pseudo-merge group names, and the values are
+ * pointers to the pseudo_merge_group structure itself.
+ */
+void load_pseudo_merges_from_config(struct string_list *list);
+
+/*
+ * A pseudo-merge commit index (pseudo_merge_commit_idx) maps a
+ * particular (non-pseudo-merge) commit to the list of pseudo-merge(s)
+ * it appears in.
+ */
+struct pseudo_merge_commit_idx {
+	uint32_t *pseudo_merge;
+	size_t nr, alloc;
+};
+
+/*
+ * Selects pseudo-merges from a list of commits, populating the given
+ * string_list of pseudo-merge groups.
+ *
+ * Populates the pseudo_merge_commits map with a commit_idx
+ * corresponding to each commit in the list. Counts the total number
+ * of pseudo-merges generated.
+ *
+ * Optionally shows a progress meter.
+ */
+void select_pseudo_merges(struct bitmap_writer *writer,
+			  struct commit **commits, size_t commits_nr);
 
 #endif
-- 
2.45.1.175.gcf0316ad0e9



* [PATCH v4 12/24] pack-bitmap-write.c: write pseudo-merge table
  2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
                     ` (10 preceding siblings ...)
  2024-05-23 21:26   ` [PATCH v4 11/24] pseudo-merge: implement support for selecting pseudo-merge commits Taylor Blau
@ 2024-05-23 21:26   ` Taylor Blau
  2024-05-23 21:26   ` [PATCH v4 13/24] pack-bitmap: extract `read_bitmap()` function Taylor Blau
                     ` (12 subsequent siblings)
  24 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 21:26 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Now that the pack-bitmap writer machinery understands how to select and
store pseudo-merge commits, teach it how to write the new optional
pseudo-merge .bitmap extension.

No readers yet exist for this new extension to the .bitmap format. The
following commits will take any preparatory step(s) necessary before
then implementing the routines necessary to read this new table.

In the meantime, the new `write_pseudo_merges()` function implements
writing this new format as described by a previous commit in
Documentation/technical/bitmap-format.txt.

Writing this table is fairly straightforward and consists of a few
sub-components:

  - a pair of bitmaps for each pseudo-merge (one for the pseudo-merge
    "parents", and another for the objects reachable from those parents)

  - for each commit, the offset of either (a) the pseudo-merge it
    belongs to, or (b) an extended lookup table if it belongs to >1
    pseudo-merge groups

  - if there are any commits belonging to >1 pseudo-merge group, the
    extended lookup tables (which each consist of the number of
    pseudo-merge groups a commit appears in, and then that many 8-byte
    unsigned offsets)
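
The lookup entries above can be sketched as follows. This is only an
illustration of the bit layout used by `write_pseudo_merges()` (an
8-byte big-endian offset whose most-significant bit marks an extended
entry); the helper names are hypothetical and are not part of this
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; only the bit layout follows the patch. */
#define EXT_FLAG ((uint64_t)1 << 63)

/* Store a 64-bit value in big-endian byte order. */
static void put_be64_sketch(unsigned char *out, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		out[i] = (unsigned char)(v >> (56 - 8 * i));
}

/* Load a 64-bit big-endian value. */
static uint64_t get_be64_sketch(const unsigned char *in)
{
	uint64_t v = 0;
	for (int i = 0; i < 8; i++)
		v = (v << 8) | in[i];
	return v;
}

/*
 * Encode the offset half of a lookup entry. A commit in exactly one
 * pseudo-merge stores that pseudo-merge's offset directly; a commit in
 * several stores the offset of its extended entry with the high bit set.
 */
static void encode_lookup_offset(unsigned char *out, uint64_t ofs, int extended)
{
	assert(!(ofs & EXT_FLAG)); /* the high bit is reserved for the flag */
	put_be64_sketch(out, extended ? (ofs | EXT_FLAG) : ofs);
}

/* Decode an offset, reporting whether it points at an extended entry. */
static uint64_t decode_lookup_offset(const unsigned char *in, int *extended)
{
	uint64_t raw = get_be64_sketch(in);
	*extended = !!(raw & EXT_FLAG);
	return raw & ~EXT_FLAG;
}
```

A reader would test the flag first and only then decide whether the
remaining 63 bits name a pseudo-merge or an extended lookup entry.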

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c | 131 ++++++++++++++++++++++++++++++++++++++++++++
 pack-bitmap.h       |   1 +
 2 files changed, 132 insertions(+)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index d5884ea5e9c..47250398aa2 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -18,6 +18,7 @@
 #include "tree.h"
 #include "tree-walk.h"
 #include "pseudo-merge.h"
+#include "oid-array.h"
 
 struct bitmapped_commit {
 	struct commit *commit;
@@ -771,6 +772,130 @@ static void write_selected_commits_v1(struct bitmap_writer *writer,
 	}
 }
 
+static void write_pseudo_merges(struct bitmap_writer *writer,
+				struct hashfile *f)
+{
+	struct oid_array commits = OID_ARRAY_INIT;
+	struct bitmap **commits_bitmap = NULL;
+	off_t *pseudo_merge_ofs = NULL;
+	off_t start, table_start, next_ext;
+
+	uint32_t base = bitmap_writer_nr_selected_commits(writer);
+	size_t i, j = 0;
+
+	CALLOC_ARRAY(commits_bitmap, writer->pseudo_merges_nr);
+	CALLOC_ARRAY(pseudo_merge_ofs, writer->pseudo_merges_nr);
+
+	for (i = 0; i < writer->pseudo_merges_nr; i++) {
+		struct bitmapped_commit *merge = &writer->selected[base + i];
+		struct commit_list *p;
+
+		if (!merge->pseudo_merge)
+			BUG("found non-pseudo merge commit at %"PRIuMAX, (uintmax_t)i);
+
+		commits_bitmap[i] = bitmap_new();
+
+		for (p = merge->commit->parents; p; p = p->next)
+			bitmap_set(commits_bitmap[i],
+				   find_object_pos(writer, &p->item->object.oid,
+						   NULL));
+	}
+
+	start = hashfile_total(f);
+
+	for (i = 0; i < writer->pseudo_merges_nr; i++) {
+		struct ewah_bitmap *commits_ewah = bitmap_to_ewah(commits_bitmap[i]);
+
+		pseudo_merge_ofs[i] = hashfile_total(f);
+
+		dump_bitmap(f, commits_ewah);
+		dump_bitmap(f, writer->selected[base+i].write_as);
+
+		ewah_free(commits_ewah);
+	}
+
+	next_ext = st_add(hashfile_total(f),
+			  st_mult(kh_size(writer->pseudo_merge_commits),
+				  sizeof(uint64_t)));
+
+	table_start = hashfile_total(f);
+
+	commits.alloc = kh_size(writer->pseudo_merge_commits);
+	CALLOC_ARRAY(commits.oid, commits.alloc);
+
+	for (i = kh_begin(writer->pseudo_merge_commits); i != kh_end(writer->pseudo_merge_commits); i++) {
+		if (!kh_exist(writer->pseudo_merge_commits, i))
+			continue;
+		oid_array_append(&commits, &kh_key(writer->pseudo_merge_commits, i));
+	}
+
+	oid_array_sort(&commits);
+
+	/* write lookup table (non-extended) */
+	for (i = 0; i < commits.nr; i++) {
+		int hash_pos;
+		struct pseudo_merge_commit_idx *c;
+
+		hash_pos = kh_get_oid_map(writer->pseudo_merge_commits,
+					  commits.oid[i]);
+		if (hash_pos == kh_end(writer->pseudo_merge_commits))
+			BUG("could not find pseudo-merge commit %s",
+			    oid_to_hex(&commits.oid[i]));
+
+		c = kh_value(writer->pseudo_merge_commits, hash_pos);
+
+		hashwrite_be32(f, find_object_pos(writer, &commits.oid[i],
+						  NULL));
+		if (c->nr == 1)
+			hashwrite_be64(f, pseudo_merge_ofs[c->pseudo_merge[0]]);
+		else if (c->nr > 1) {
+			if (next_ext & ((uint64_t)1<<63))
+				die(_("too many pseudo-merges"));
+			hashwrite_be64(f, next_ext | ((uint64_t)1<<63));
+			next_ext = st_add3(next_ext,
+					   sizeof(uint32_t),
+					   st_mult(c->nr, sizeof(uint64_t)));
+		} else
+			BUG("expected commit '%s' to have at least one "
+			    "pseudo-merge", oid_to_hex(&commits.oid[i]));
+	}
+
+	/* write lookup table (extended) */
+	for (i = 0; i < commits.nr; i++) {
+		int hash_pos;
+		struct pseudo_merge_commit_idx *c;
+
+		hash_pos = kh_get_oid_map(writer->pseudo_merge_commits,
+					  commits.oid[i]);
+		if (hash_pos == kh_end(writer->pseudo_merge_commits))
+			BUG("could not find pseudo-merge commit %s",
+			    oid_to_hex(&commits.oid[i]));
+
+		c = kh_value(writer->pseudo_merge_commits, hash_pos);
+		if (c->nr == 1)
+			continue;
+
+		hashwrite_be32(f, c->nr);
+		for (j = 0; j < c->nr; j++)
+			hashwrite_be64(f, pseudo_merge_ofs[c->pseudo_merge[j]]);
+	}
+
+	/* write positions for all pseudo merges */
+	for (i = 0; i < writer->pseudo_merges_nr; i++)
+		hashwrite_be64(f, pseudo_merge_ofs[i]);
+
+	hashwrite_be32(f, writer->pseudo_merges_nr);
+	hashwrite_be32(f, kh_size(writer->pseudo_merge_commits));
+	hashwrite_be64(f, table_start - start);
+	hashwrite_be64(f, hashfile_total(f) - start + sizeof(uint64_t));
+
+	for (i = 0; i < writer->pseudo_merges_nr; i++)
+		bitmap_free(commits_bitmap[i]);
+
+	free(pseudo_merge_ofs);
+	free(commits_bitmap);
+}
+
 static int table_cmp(const void *_va, const void *_vb, void *_data)
 {
 	struct bitmap_writer *writer = _data;
@@ -878,6 +1003,9 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
 
 	int fd = odb_mkstemp(&tmp_file, "pack/tmp_bitmap_XXXXXX");
 
+	if (writer->pseudo_merges_nr)
+		options |= BITMAP_OPT_PSEUDO_MERGES;
+
 	f = hashfd(fd, tmp_file.buf);
 
 	memcpy(header.magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE));
@@ -907,6 +1035,9 @@ void bitmap_writer_finish(struct bitmap_writer *writer,
 
 	write_selected_commits_v1(writer, f, offsets);
 
+	if (options & BITMAP_OPT_PSEUDO_MERGES)
+		write_pseudo_merges(writer, f);
+
 	if (options & BITMAP_OPT_LOOKUP_TABLE)
 		write_lookup_table(writer, f, offsets);
 
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 1e730ea1e54..db9ae554fa8 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -37,6 +37,7 @@ enum pack_bitmap_opts {
 	BITMAP_OPT_FULL_DAG = 0x1,
 	BITMAP_OPT_HASH_CACHE = 0x4,
 	BITMAP_OPT_LOOKUP_TABLE = 0x10,
+	BITMAP_OPT_PSEUDO_MERGES = 0x20,
 };
 
 enum pack_bitmap_flags {
-- 
2.45.1.175.gcf0316ad0e9



* [PATCH v4 13/24] pack-bitmap: extract `read_bitmap()` function
  2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
                     ` (11 preceding siblings ...)
  2024-05-23 21:26   ` [PATCH v4 12/24] pack-bitmap-write.c: write pseudo-merge table Taylor Blau
@ 2024-05-23 21:26   ` Taylor Blau
  2024-05-23 21:26   ` [PATCH v4 14/24] pseudo-merge: scaffolding for reads Taylor Blau
                     ` (11 subsequent siblings)
  24 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 21:26 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

The pack-bitmap machinery uses the `read_bitmap_1()` function to read a
bitmap from within the mmap'd region corresponding to the .bitmap file.
As a side-effect of calling this function, `read_bitmap_1()` increments
the `index->map_pos` variable to reflect the number of bytes read.

Extract the core of this routine to a separate function that operates
over a `const unsigned char *`, a `size_t`, and a `size_t *` pointer
instead of a `struct bitmap_index *` pointer.

This function (called `read_bitmap()`) is part of the pack-bitmap.h API
so that it can be used within the upcoming portion of the implementation
in pseudo-merge.[ch].

Rewrite the existing function, `read_bitmap_1()`, in terms of its more
generic counterpart.
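
The shape of this refactoring can be sketched in isolation. The code
below is a toy under stated assumptions, not the code from this patch:
`read_u32_at()` and `struct toy_index` are hypothetical stand-ins for
`read_bitmap()` and `struct bitmap_index`, and the "bitmap" is reduced
to a single 4-byte big-endian integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Generic reader: parse a 4-byte big-endian value at *pos and advance
 * *pos past it. Returns -1 without moving *pos if the map is too short.
 */
static int read_u32_at(const unsigned char *map, size_t map_size,
		       size_t *pos, uint32_t *out)
{
	assert(map && pos && out);
	if (*pos > map_size || map_size - *pos < 4)
		return -1;
	*out = ((uint32_t)map[*pos] << 24) |
	       ((uint32_t)map[*pos + 1] << 16) |
	       ((uint32_t)map[*pos + 2] << 8) |
	       (uint32_t)map[*pos + 3];
	*pos += 4;
	return 0;
}

/* Hypothetical stand-in for a structure that owns the map and cursor. */
struct toy_index {
	const unsigned char *map;
	size_t map_size;
	size_t map_pos;
};

/* The structure-specific reader becomes a thin wrapper. */
static int toy_index_read_u32(struct toy_index *index, uint32_t *out)
{
	return read_u32_at(index->map, index->map_size,
			   &index->map_pos, out);
}
```

Callers that do not have the owning structure can keep their own
position variable and call the generic function directly.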

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 24 +++++++++++++++---------
 pack-bitmap.h |  2 ++
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 35c5ef9d3cd..3519edb896b 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -129,17 +129,13 @@ static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
 	return composed;
 }
 
-/*
- * Read a bitmap from the current read position on the mmaped
- * index, and increase the read position accordingly
- */
-static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
+struct ewah_bitmap *read_bitmap(const unsigned char *map,
+				size_t map_size, size_t *map_pos)
 {
 	struct ewah_bitmap *b = ewah_pool_new();
 
-	ssize_t bitmap_size = ewah_read_mmap(b,
-		index->map + index->map_pos,
-		index->map_size - index->map_pos);
+	ssize_t bitmap_size = ewah_read_mmap(b, map + *map_pos,
+					     map_size - *map_pos);
 
 	if (bitmap_size < 0) {
 		error(_("failed to load bitmap index (corrupted?)"));
@@ -147,10 +143,20 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
 		return NULL;
 	}
 
-	index->map_pos += bitmap_size;
+	*map_pos += bitmap_size;
+
 	return b;
 }
 
+/*
+ * Read a bitmap from the current read position on the mmaped
+ * index, and increase the read position accordingly
+ */
+static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
+{
+	return read_bitmap(index->map, index->map_size, &index->map_pos);
+}
+
 static uint32_t bitmap_num_objects(struct bitmap_index *index)
 {
 	if (index->midx)
diff --git a/pack-bitmap.h b/pack-bitmap.h
index db9ae554fa8..21aabf805ea 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -160,4 +160,6 @@ int bitmap_is_preferred_refname(struct repository *r, const char *refname);
 
 int verify_bitmap_files(struct repository *r);
 
+struct ewah_bitmap *read_bitmap(const unsigned char *map,
+				size_t map_size, size_t *map_pos);
 #endif
-- 
2.45.1.175.gcf0316ad0e9



* [PATCH v4 14/24] pseudo-merge: scaffolding for reads
  2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
                     ` (12 preceding siblings ...)
  2024-05-23 21:26   ` [PATCH v4 13/24] pack-bitmap: extract `read_bitmap()` function Taylor Blau
@ 2024-05-23 21:26   ` Taylor Blau
  2024-05-23 21:26   ` [PATCH v4 15/24] pack-bitmap.c: read pseudo-merge extension Taylor Blau
                     ` (10 subsequent siblings)
  24 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 21:26 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Implement scaffolding within the new pseudo-merge compilation unit
necessary to use the pseudo-merge API from within the pack-bitmap.c
machinery.

The core of this scaffolding is two-fold:

  - The `pseudo_merge` structure itself, which represents an individual
    pseudo-merge bitmap. It has fields for both bitmaps, as well as
    metadata about its position within the memory-mapped region, and
    a few extra bits indicating whether or not it is satisfied, and
    which bitmap(s), if any, have been read, since they are initialized
    lazily.

  - The `pseudo_merge_map` structure, which holds an array of
    pseudo_merges, as well as a pointer to the memory-mapped region
    containing the pseudo-merge serialization from within a .bitmap
    file.

Note that the `bitmap_index` structure is defined statically within the
pack-bitmap.o compilation unit, so we can't take in a `struct
bitmap_index *`. Instead, wrap the primary components necessary to read
the pseudo-merges in this new structure to avoid exposing the
implementation details of the `bitmap_index` structure.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pseudo-merge.c | 10 ++++++++
 pseudo-merge.h | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+)

diff --git a/pseudo-merge.c b/pseudo-merge.c
index 0f6854c753f..f0080d53c03 100644
--- a/pseudo-merge.c
+++ b/pseudo-merge.c
@@ -454,3 +454,13 @@ void select_pseudo_merges(struct bitmap_writer *writer,
 
 	stop_progress(&progress);
 }
+
+void free_pseudo_merge_map(struct pseudo_merge_map *pm)
+{
+	uint32_t i;
+	for (i = 0; i < pm->nr; i++) {
+		ewah_pool_free(pm->v[i].commits);
+		ewah_pool_free(pm->v[i].bitmap);
+	}
+	free(pm->v);
+}
diff --git a/pseudo-merge.h b/pseudo-merge.h
index f809cf42aeb..e9216baace8 100644
--- a/pseudo-merge.h
+++ b/pseudo-merge.h
@@ -97,4 +97,69 @@ struct pseudo_merge_commit_idx {
 void select_pseudo_merges(struct bitmap_writer *writer,
 			  struct commit **commits, size_t commits_nr);
 
+/*
+ * Represents a serialized view of a file containing pseudo-merge(s)
+ * (see Documentation/technical/bitmap-format.txt for a specification
+ * of the format).
+ */
+struct pseudo_merge_map {
+	/*
+	 * An array of pseudo-merge(s), lazily loaded from the .bitmap
+	 * file.
+	 */
+	struct pseudo_merge *v;
+	size_t nr;
+	size_t commits_nr;
+
+	/*
+	 * Pointers into a memory-mapped view of the .bitmap file:
+	 *
+	 *   - map: the beginning of the .bitmap file
+	 *   - commits: the beginning of the pseudo-merge commit index
+	 *   - map_size: the size of the .bitmap file
+	 */
+	const unsigned char *map;
+	const unsigned char *commits;
+
+	size_t map_size;
+};
+
+/*
+ * An individual pseudo-merge, storing a pair of lazily-loaded
+ * bitmaps:
+ *
+ *  - commits: the set of commit(s) that are part of the pseudo-merge
+ *  - bitmap: the set of object(s) reachable from the above set of
+ *    commits.
+ *
+ * The `at` and `bitmap_at` fields are used to store the locations of
+ * each of the above bitmaps in the .bitmap file.
+ */
+struct pseudo_merge {
+	struct ewah_bitmap *commits;
+	struct ewah_bitmap *bitmap;
+
+	off_t at;
+	off_t bitmap_at;
+
+	/*
+	 * `satisfied` indicates whether the given pseudo-merge has been
+	 * used.
+	 *
+	 * `loaded_commits` and `loaded_bitmap` indicate whether the
+	 * respective bitmaps have been loaded and read from the
+	 * .bitmap file.
+	 */
+	unsigned satisfied : 1,
+		 loaded_commits : 1,
+		 loaded_bitmap : 1;
+};
+
+/*
+ * Frees the given pseudo-merge map, releasing any memory held by (a)
+ * parsed EWAH bitmaps, or (b) the array of pseudo-merges itself. Does
+ * not free the memory-mapped view of the .bitmap file.
+ */
+void free_pseudo_merge_map(struct pseudo_merge_map *pm);
+
 #endif
-- 
2.45.1.175.gcf0316ad0e9



* [PATCH v4 15/24] pack-bitmap.c: read pseudo-merge extension
  2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
                     ` (13 preceding siblings ...)
  2024-05-23 21:26   ` [PATCH v4 14/24] pseudo-merge: scaffolding for reads Taylor Blau
@ 2024-05-23 21:26   ` Taylor Blau
  2024-05-23 21:26   ` [PATCH v4 16/24] pseudo-merge: implement support for reading pseudo-merge commits Taylor Blau
                     ` (9 subsequent siblings)
  24 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 21:26 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Now that the scaffolding for reading the pseudo-merge extension has been
laid, teach the pack-bitmap machinery to read the pseudo-merge extension
when present.

Note that pseudo-merges themselves are not yet used during traversal;
this step will be taken by a future commit.

In the meantime, read the table and initialize the pseudo_merge_map
structure introduced by a previous commit. When the pseudo-merge
extension is present, `load_bitmap_header()` performs basic sanity
checks to make sure that the table is well-formed.
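
As a rough illustration of those checks, the fixed-size fields at the
end of the table can be parsed as below. The field positions mirror
the reads in this patch (table size at end-8, commit index offset at
end-16, number of commits at end-20, number of pseudo-merges at
end-24); `struct trailer_sketch`, `parse_trailer()` and the byte-order
helpers are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t be32_sketch(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t be64_sketch(const unsigned char *p)
{
	return ((uint64_t)be32_sketch(p) << 32) | be32_sketch(p + 4);
}

struct trailer_sketch {
	uint32_t nr;          /* number of pseudo-merges */
	uint32_t commits_nr;  /* number of commits in the lookup table */
	uint64_t commits_ofs; /* lookup table offset within the extension */
	uint64_t table_size;  /* size of the whole extension */
};

/*
 * Parse the trailer that ends at `end`, given that at most
 * `end - start` bytes are available. Returns -1 if the region is too
 * short for the trailer or for the table size that the trailer claims.
 */
static int parse_trailer(const unsigned char *start,
			 const unsigned char *end,
			 struct trailer_sketch *t)
{
	assert(start <= end);
	if ((size_t)(end - start) < 24)
		return -1;

	t->table_size = be64_sketch(end - 8);
	t->commits_ofs = be64_sketch(end - 16);
	t->commits_nr = be32_sketch(end - 20);
	t->nr = be32_sketch(end - 24);

	if (t->table_size > (uint64_t)(end - start))
		return -1;
	return 0;
}
```

Reading from the end is what lets the extension be located without
knowing where it starts: the last 8 bytes give its total size.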

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 3519edb896b..fc9c3e2fc43 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -20,6 +20,7 @@
 #include "list-objects-filter-options.h"
 #include "midx.h"
 #include "config.h"
+#include "pseudo-merge.h"
 
 /*
  * An entry on the bitmap index, representing the bitmap for a given
@@ -86,6 +87,9 @@ struct bitmap_index {
 	 */
 	unsigned char *table_lookup;
 
+	/* This contains the pseudo-merge cache within 'map' (if found). */
+	struct pseudo_merge_map pseudo_merges;
+
 	/*
 	 * Extended index.
 	 *
@@ -205,6 +209,41 @@ static int load_bitmap_header(struct bitmap_index *index)
 				index->table_lookup = (void *)(index_end - table_size);
 			index_end -= table_size;
 		}
+
+		if (flags & BITMAP_OPT_PSEUDO_MERGES) {
+			unsigned char *pseudo_merge_ofs;
+			size_t table_size;
+			uint32_t i;
+
+			if (sizeof(table_size) > index_end - index->map - header_size)
+				return error(_("corrupted bitmap index file (too short to fit pseudo-merge table header)"));
+
+			table_size = get_be64(index_end - 8);
+			if (table_size > index_end - index->map - header_size)
+				return error(_("corrupted bitmap index file (too short to fit pseudo-merge table)"));
+
+			if (git_env_bool("GIT_TEST_USE_PSEUDO_MERGES", 1)) {
+				const unsigned char *ext = (index_end - table_size);
+
+				index->pseudo_merges.map = index->map;
+				index->pseudo_merges.map_size = index->map_size;
+				index->pseudo_merges.commits = ext + get_be64(index_end - 16);
+				index->pseudo_merges.commits_nr = get_be32(index_end - 20);
+				index->pseudo_merges.nr = get_be32(index_end - 24);
+
+				CALLOC_ARRAY(index->pseudo_merges.v,
+					     index->pseudo_merges.nr);
+
+				pseudo_merge_ofs = index_end - 24 -
+					(index->pseudo_merges.nr * sizeof(uint64_t));
+				for (i = 0; i < index->pseudo_merges.nr; i++) {
+					index->pseudo_merges.v[i].at = get_be64(pseudo_merge_ofs);
+					pseudo_merge_ofs += sizeof(uint64_t);
+				}
+			}
+
+			index_end -= table_size;
+		}
 	}
 
 	index->entry_count = ntohl(header->entry_count);
-- 
2.45.1.175.gcf0316ad0e9



* [PATCH v4 16/24] pseudo-merge: implement support for reading pseudo-merge commits
  2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
                     ` (14 preceding siblings ...)
  2024-05-23 21:26   ` [PATCH v4 15/24] pack-bitmap.c: read pseudo-merge extension Taylor Blau
@ 2024-05-23 21:26   ` Taylor Blau
  2024-05-23 21:27   ` [PATCH v4 17/24] ewah: implement `ewah_bitmap_popcount()` Taylor Blau
                     ` (8 subsequent siblings)
  24 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 21:26 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Implement the basic API for reading pseudo-merge bitmaps, which consists
of four basic functions:

  - pseudo_merge_bitmap()
  - use_pseudo_merge()
  - apply_pseudo_merges_for_commit()
  - cascade_pseudo_merges()

These functions are all documented in pseudo-merge.h, but their rough
descriptions are as follows:

  - pseudo_merge_bitmap() reads and inflates the objects EWAH bitmap for
    a given pseudo-merge

  - use_pseudo_merge() does the same as pseudo_merge_bitmap(), but on
    the commits EWAH bitmap, not the objects bitmap

  - apply_pseudo_merges_for_commit() applies all satisfied pseudo-merge
    commits for a given result set, and cascades any yet-unsatisfied
    pseudo-merges if any were applied in the previous step

  - cascade_pseudo_merges() applies all pseudo-merges which are
    satisfied but have not been previously applied, repeating this
    process until no more pseudo-merges can be applied

The core of the API is the latter two functions, which are responsible
for applying pseudo-merges during the object traversal implemented in
the pack-bitmap machinery.

The other two functions (pseudo_merge_bitmap() and use_pseudo_merge())
are low-level ways to interact with the pseudo-merge machinery, which
will be useful in future commits.
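
The cascading step can be sketched with plain 64-bit words standing in
for bitmaps. This is a simplified model under stated assumptions, not
the implementation in this patch: `struct toy_pseudo_merge`,
`toy_apply()` and `toy_cascade()` are hypothetical names, and real
pseudo-merges use EWAH bitmaps rather than `uint64_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_pseudo_merge {
	uint64_t commits;  /* bits of the pseudo-merge's parents */
	uint64_t objects;  /* bits reachable from those parents */
	unsigned satisfied : 1;
};

/*
 * Apply one pseudo-merge if all of its parents are already in the
 * result and it has not been applied before. Returns 1 if applied.
 */
static int toy_apply(struct toy_pseudo_merge *m, uint64_t *result)
{
	if (m->satisfied)
		return 0;
	if ((m->commits & *result) != m->commits)
		return 0;
	*result |= m->objects;
	m->satisfied = 1;
	return 1;
}

/*
 * Keep applying pseudo-merges until a full pass applies none, since
 * each application may satisfy another pseudo-merge. Returns the
 * total number applied.
 */
static int toy_cascade(struct toy_pseudo_merge *v, size_t nr,
		       uint64_t *result)
{
	int total = 0, applied;

	assert(v || !nr);
	do {
		applied = 0;
		for (size_t i = 0; i < nr; i++)
			applied += toy_apply(&v[i], result);
		total += applied;
	} while (applied);

	return total;
}
```

The loop terminates because each pseudo-merge is applied at most once,
so there can be no more than `nr` productive passes.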

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pseudo-merge.c | 235 +++++++++++++++++++++++++++++++++++++++++++++++++
 pseudo-merge.h |  44 +++++++++
 2 files changed, 279 insertions(+)

diff --git a/pseudo-merge.c b/pseudo-merge.c
index f0080d53c03..7d131011497 100644
--- a/pseudo-merge.c
+++ b/pseudo-merge.c
@@ -10,6 +10,7 @@
 #include "commit.h"
 #include "alloc.h"
 #include "progress.h"
+#include "hex.h"
 
 #define DEFAULT_PSEUDO_MERGE_DECAY 1.0
 #define DEFAULT_PSEUDO_MERGE_MAX_MERGES 64
@@ -464,3 +465,237 @@ void free_pseudo_merge_map(struct pseudo_merge_map *pm)
 	}
 	free(pm->v);
 }
+
+struct pseudo_merge_commit_ext {
+	uint32_t nr;
+	const unsigned char *ptr;
+};
+
+static int pseudo_merge_ext_at(const struct pseudo_merge_map *pm,
+			       struct pseudo_merge_commit_ext *ext, size_t at)
+{
+	if (at >= pm->map_size)
+		return error(_("extended pseudo-merge read out-of-bounds "
+			       "(%"PRIuMAX" >= %"PRIuMAX")"),
+			     (uintmax_t)at, (uintmax_t)pm->map_size);
+	if (at + 4 >= pm->map_size)
+		return error(_("extended pseudo-merge entry is too short "
+			       "(%"PRIuMAX" >= %"PRIuMAX")"),
+			     (uintmax_t)(at + 4), (uintmax_t)pm->map_size);
+
+	ext->nr = get_be32(pm->map + at);
+	ext->ptr = pm->map + at + sizeof(uint32_t);
+
+	return 0;
+}
+
+struct ewah_bitmap *pseudo_merge_bitmap(const struct pseudo_merge_map *pm,
+					struct pseudo_merge *merge)
+{
+	if (!merge->loaded_commits)
+		BUG("cannot use unloaded pseudo-merge bitmap");
+
+	if (!merge->loaded_bitmap) {
+		size_t at = merge->bitmap_at;
+
+		merge->bitmap = read_bitmap(pm->map, pm->map_size, &at);
+		merge->loaded_bitmap = 1;
+	}
+
+	return merge->bitmap;
+}
+
+struct pseudo_merge *use_pseudo_merge(const struct pseudo_merge_map *pm,
+				      struct pseudo_merge *merge)
+{
+	if (!merge->loaded_commits) {
+		size_t pos = merge->at;
+
+		merge->commits = read_bitmap(pm->map, pm->map_size, &pos);
+		merge->bitmap_at = pos;
+		merge->loaded_commits = 1;
+	}
+	return merge;
+}
+
+static struct pseudo_merge *pseudo_merge_at(const struct pseudo_merge_map *pm,
+					    struct object_id *oid,
+					    size_t want)
+{
+	size_t lo = 0;
+	size_t hi = pm->nr;
+
+	while (lo < hi) {
+		size_t mi = lo + (hi - lo) / 2;
+		size_t got = pm->v[mi].at;
+
+		if (got == want)
+			return use_pseudo_merge(pm, &pm->v[mi]);
+		else if (got < want)
+			lo = mi + 1;
+		else
+			hi = mi;
+	}
+
+	warning(_("could not find pseudo-merge for commit %s at offset %"PRIuMAX),
+		oid_to_hex(oid), (uintmax_t)want);
+
+	return NULL;
+}
+
+struct pseudo_merge_commit {
+	uint32_t commit_pos;
+	uint64_t pseudo_merge_ofs;
+};
+
+#define PSEUDO_MERGE_COMMIT_RAWSZ (sizeof(uint32_t)+sizeof(uint64_t))
+
+static void read_pseudo_merge_commit_at(struct pseudo_merge_commit *merge,
+					const unsigned char *at)
+{
+	merge->commit_pos = get_be32(at);
+	merge->pseudo_merge_ofs = get_be64(at + sizeof(uint32_t));
+}
+
+static int nth_pseudo_merge_ext(const struct pseudo_merge_map *pm,
+				struct pseudo_merge_commit_ext *ext,
+				struct pseudo_merge_commit *merge,
+				uint32_t n)
+{
+	size_t ofs;
+
+	if (n >= ext->nr)
+		return error(_("extended pseudo-merge lookup out-of-bounds "
+			       "(%"PRIu32" >= %"PRIu32")"), n, ext->nr);
+
+	ofs = get_be64(ext->ptr + st_mult(n, sizeof(uint64_t)));
+	if (ofs >= pm->map_size)
+		return error(_("out-of-bounds read: (%"PRIuMAX" >= %"PRIuMAX")"),
+			     (uintmax_t)ofs, (uintmax_t)pm->map_size);
+
+	read_pseudo_merge_commit_at(merge, pm->map + ofs);
+
+	return 0;
+}
+
+static unsigned apply_pseudo_merge(const struct pseudo_merge_map *pm,
+				   struct pseudo_merge *merge,
+				   struct bitmap *result,
+				   struct bitmap *roots)
+{
+	if (merge->satisfied)
+		return 0;
+
+	if (!ewah_bitmap_is_subset(merge->commits, roots ? roots : result))
+		return 0;
+
+	bitmap_or_ewah(result, pseudo_merge_bitmap(pm, merge));
+	if (roots)
+		bitmap_or_ewah(roots, pseudo_merge_bitmap(pm, merge));
+	merge->satisfied = 1;
+
+	return 1;
+}
+
+static int pseudo_merge_commit_cmp(const void *va, const void *vb)
+{
+	struct pseudo_merge_commit merge;
+	uint32_t key = *(uint32_t*)va;
+
+	read_pseudo_merge_commit_at(&merge, vb);
+
+	if (key < merge.commit_pos)
+		return -1;
+	if (key > merge.commit_pos)
+		return 1;
+	return 0;
+}
+
+static struct pseudo_merge_commit *find_pseudo_merge(const struct pseudo_merge_map *pm,
+						     uint32_t pos)
+{
+	if (!pm->commits_nr)
+		return NULL;
+
+	return bsearch(&pos, pm->commits, pm->commits_nr,
+		       PSEUDO_MERGE_COMMIT_RAWSZ, pseudo_merge_commit_cmp);
+}
+
+int apply_pseudo_merges_for_commit(const struct pseudo_merge_map *pm,
+				   struct bitmap *result,
+				   struct commit *commit, uint32_t commit_pos)
+{
+	struct pseudo_merge *merge;
+	struct pseudo_merge_commit *merge_commit;
+	int ret = 0;
+
+	merge_commit = find_pseudo_merge(pm, commit_pos);
+	if (!merge_commit)
+		return 0;
+
+	if (merge_commit->pseudo_merge_ofs & ((uint64_t)1<<63)) {
+		struct pseudo_merge_commit_ext ext = { 0 };
+		off_t ofs = merge_commit->pseudo_merge_ofs & ~((uint64_t)1<<63);
+		uint32_t i;
+
+		if (pseudo_merge_ext_at(pm, &ext, ofs) < 0) {
+			warning(_("could not read extended pseudo-merge table "
+				  "for commit %s"),
+				oid_to_hex(&commit->object.oid));
+			return ret;
+		}
+
+		for (i = 0; i < ext.nr; i++) {
+			if (nth_pseudo_merge_ext(pm, &ext, merge_commit, i) < 0)
+				return ret;
+
+			merge = pseudo_merge_at(pm, &commit->object.oid,
+						merge_commit->pseudo_merge_ofs);
+
+			if (!merge)
+				return ret;
+
+			if (apply_pseudo_merge(pm, merge, result, NULL))
+				ret++;
+		}
+	} else {
+		merge = pseudo_merge_at(pm, &commit->object.oid,
+					merge_commit->pseudo_merge_ofs);
+
+		if (!merge)
+			return ret;
+
+		if (apply_pseudo_merge(pm, merge, result, NULL))
+			ret++;
+	}
+
+	if (ret)
+		cascade_pseudo_merges(pm, result, NULL);
+
+	return ret;
+}
+
+int cascade_pseudo_merges(const struct pseudo_merge_map *pm,
+			  struct bitmap *result,
+			  struct bitmap *roots)
+{
+	unsigned any_satisfied;
+	int ret = 0;
+
+	do {
+		struct pseudo_merge *merge;
+		uint32_t i;
+
+		any_satisfied = 0;
+
+		for (i = 0; i < pm->nr; i++) {
+			merge = use_pseudo_merge(pm, &pm->v[i]);
+			if (apply_pseudo_merge(pm, merge, result, roots)) {
+				any_satisfied |= 1;
+				ret++;
+			}
+		}
+	} while (any_satisfied);
+
+	return ret;
+}
diff --git a/pseudo-merge.h b/pseudo-merge.h
index e9216baace8..755edc054ae 100644
--- a/pseudo-merge.h
+++ b/pseudo-merge.h
@@ -162,4 +162,48 @@ struct pseudo_merge {
  */
 void free_pseudo_merge_map(struct pseudo_merge_map *pm);
 
+/*
+ * Loads the bitmap corresponding to the given pseudo-merge from the
+ * map, if it has not already been loaded.
+ */
+struct ewah_bitmap *pseudo_merge_bitmap(const struct pseudo_merge_map *pm,
+					struct pseudo_merge *merge);
+
+/*
+ * Loads the pseudo-merge and its commits bitmap from the given
+ * pseudo-merge map, if it has not already been loaded.
+ */
+struct pseudo_merge *use_pseudo_merge(const struct pseudo_merge_map *pm,
+				      struct pseudo_merge *merge);
+
+/*
+ * Applies pseudo-merge(s) containing the given commit to the bitmap
+ * "result".
+ *
+ * If any pseudo-merge(s) were satisfied, returns the number
+ * satisfied, otherwise returns 0. If any were satisfied, the
+ * remaining unsatisfied pseudo-merges are cascaded (see below).
+ */
+int apply_pseudo_merges_for_commit(const struct pseudo_merge_map *pm,
+				   struct bitmap *result,
+				   struct commit *commit, uint32_t commit_pos);
+
+/*
+ * Applies pseudo-merge(s) which are satisfied according to the
+ * current bitmap in result (or roots, see below). If any
+ * pseudo-merges were satisfied, repeat the process over unsatisfied
+ * pseudo-merge commits until no more pseudo-merges are satisfied.
+ *
+ * Result is the bitmap to which the pseudo-merge(s) are applied.
+ * Roots (if given) is a bitmap of the traversal tip(s) for either
+ * side of a reachability traversal.
+ *
+ * Roots may be given instead of a populated results bitmap at the
+ * beginning of a traversal on either side where the reachability
+ * closure over tips is not yet known.
+ */
+int cascade_pseudo_merges(const struct pseudo_merge_map *pm,
+			  struct bitmap *result,
+			  struct bitmap *roots);
+
 #endif
-- 
2.45.1.175.gcf0316ad0e9



* [PATCH v4 17/24] ewah: implement `ewah_bitmap_popcount()`
  2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
                     ` (15 preceding siblings ...)
  2024-05-23 21:26   ` [PATCH v4 16/24] pseudo-merge: implement support for reading pseudo-merge commits Taylor Blau
@ 2024-05-23 21:27   ` Taylor Blau
  2024-05-23 21:27   ` [PATCH v4 18/24] pack-bitmap: implement test helpers for pseudo-merge Taylor Blau
                     ` (7 subsequent siblings)
  24 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 21:27 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Some of the pseudo-merge test helpers (which will be introduced in the
following commit) will want to indicate the total number of commits in
or objects reachable from a pseudo-merge.

Implement a popcount() function that operates on EWAH bitmaps to quickly
determine how many bits are set in each of the respective bitmaps.
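
As a rough illustration of the idea (not the code in this patch), the
sketch below sums per-word popcounts over a plain, uncompressed array of
64-bit words. The names popcount64() and words_popcount() are
hypothetical; the actual implementation iterates over the EWAH bitmap
with ewah_iterator_next() and uses ewah_bit_popcount64() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count the set bits in one 64-bit word. */
static unsigned popcount64(uint64_t x)
{
	unsigned n = 0;

	while (x) {
		x &= x - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Hypothetical stand-in for the EWAH version: sum the per-word
 * popcounts over an uncompressed array of `nr` words.
 */
static size_t words_popcount(const uint64_t *words, size_t nr)
{
	size_t i, count = 0;

	assert(words || !nr);

	for (i = 0; i < nr; i++)
		count += popcount64(words[i]);
	return count;
}
```

The loop has the same shape as the one added to ewah/bitmap.c: visit
each decompressed word once and accumulate its bit count.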

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ewah/bitmap.c | 14 ++++++++++++++
 ewah/ewok.h   |  1 +
 2 files changed, 15 insertions(+)

diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index d352fec54ce..dc2ca190f12 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -212,6 +212,20 @@ size_t bitmap_popcount(struct bitmap *self)
 	return count;
 }
 
+size_t ewah_bitmap_popcount(struct ewah_bitmap *self)
+{
+	struct ewah_iterator it;
+	eword_t word;
+	size_t count = 0;
+
+	ewah_iterator_init(&it, self);
+
+	while (ewah_iterator_next(&word, &it))
+		count += ewah_bit_popcount64(word);
+
+	return count;
+}
+
 int bitmap_is_empty(struct bitmap *self)
 {
 	size_t i;
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 2b6c4ac499c..7074a6347b7 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -195,6 +195,7 @@ void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other);
 void bitmap_or(struct bitmap *self, const struct bitmap *other);
 
 size_t bitmap_popcount(struct bitmap *self);
+size_t ewah_bitmap_popcount(struct ewah_bitmap *self);
 int bitmap_is_empty(struct bitmap *self);
 
 #endif
-- 
2.45.1.175.gcf0316ad0e9



* [PATCH v4 18/24] pack-bitmap: implement test helpers for pseudo-merge
  2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
                     ` (16 preceding siblings ...)
  2024-05-23 21:27   ` [PATCH v4 17/24] ewah: implement `ewah_bitmap_popcount()` Taylor Blau
@ 2024-05-23 21:27   ` Taylor Blau
  2024-05-23 21:27   ` [PATCH v4 19/24] t/test-lib-functions.sh: support `--notick` in `test_commit_bulk()` Taylor Blau
                     ` (6 subsequent siblings)
  24 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 21:27 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Implement three new sub-commands for the "bitmap" test-helper:

  - t/helper/test-tool bitmap dump-pseudo-merges
  - t/helper/test-tool bitmap dump-pseudo-merge-commits <n>
  - t/helper/test-tool bitmap dump-pseudo-merge-objects <n>

These three helpers dump the list of pseudo-merges, the "parents" of the
nth pseudo-merge, and the set of objects reachable from those parents,
respectively.

These helpers will be useful in subsequent patches when we add test
coverage for pseudo-merge bitmaps.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c          | 126 +++++++++++++++++++++++++++++++++++++++++
 pack-bitmap.h          |   3 +
 t/helper/test-bitmap.c |  34 ++++++++---
 3 files changed, 156 insertions(+), 7 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index fc9c3e2fc43..c13074673af 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2443,6 +2443,132 @@ int test_bitmap_hashes(struct repository *r)
 	return 0;
 }
 
+static void bit_pos_to_object_id(struct bitmap_index *bitmap_git,
+				 uint32_t bit_pos,
+				 struct object_id *oid)
+{
+	uint32_t index_pos;
+
+	if (bitmap_is_midx(bitmap_git))
+		index_pos = pack_pos_to_midx(bitmap_git->midx, bit_pos);
+	else
+		index_pos = pack_pos_to_index(bitmap_git->pack, bit_pos);
+
+	nth_bitmap_object_oid(bitmap_git, oid, index_pos);
+}
+
+int test_bitmap_pseudo_merges(struct repository *r)
+{
+	struct bitmap_index *bitmap_git;
+	uint32_t i;
+
+	bitmap_git = prepare_bitmap_git(r);
+	if (!bitmap_git || !bitmap_git->pseudo_merges.nr)
+		goto cleanup;
+
+	for (i = 0; i < bitmap_git->pseudo_merges.nr; i++) {
+		struct pseudo_merge *merge;
+		struct ewah_bitmap *commits_bitmap, *merge_bitmap;
+
+		merge = use_pseudo_merge(&bitmap_git->pseudo_merges,
+					 &bitmap_git->pseudo_merges.v[i]);
+		commits_bitmap = merge->commits;
+		merge_bitmap = pseudo_merge_bitmap(&bitmap_git->pseudo_merges,
+						   merge);
+
+		printf("at=%"PRIuMAX", commits=%"PRIuMAX", objects=%"PRIuMAX"\n",
+		       (uintmax_t)merge->at,
+		       (uintmax_t)ewah_bitmap_popcount(commits_bitmap),
+		       (uintmax_t)ewah_bitmap_popcount(merge_bitmap));
+	}
+
+cleanup:
+	free_bitmap_index(bitmap_git);
+	return 0;
+}
+
+static void dump_ewah_object_ids(struct bitmap_index *bitmap_git,
+				 struct ewah_bitmap *bitmap)
+
+{
+	struct ewah_iterator it;
+	eword_t word;
+	uint32_t pos = 0;
+
+	ewah_iterator_init(&it, bitmap);
+
+	while (ewah_iterator_next(&word, &it)) {
+		struct object_id oid;
+		uint32_t offset;
+
+		for (offset = 0; offset < BITS_IN_EWORD; offset++) {
+			if (!(word >> offset))
+				break;
+
+			offset += ewah_bit_ctz64(word >> offset);
+
+			bit_pos_to_object_id(bitmap_git, pos + offset, &oid);
+			printf("%s\n", oid_to_hex(&oid));
+		}
+		pos += BITS_IN_EWORD;
+	}
+}
+
+int test_bitmap_pseudo_merge_commits(struct repository *r, uint32_t n)
+{
+	struct bitmap_index *bitmap_git;
+	struct pseudo_merge *merge;
+	int ret = 0;
+
+	bitmap_git = prepare_bitmap_git(r);
+	if (!bitmap_git || !bitmap_git->pseudo_merges.nr)
+		goto cleanup;
+
+	if (n >= bitmap_git->pseudo_merges.nr) {
+		ret = error(_("pseudo-merge index out of range "
+			      "(%"PRIu32" >= %"PRIuMAX")"),
+			    n, (uintmax_t)bitmap_git->pseudo_merges.nr);
+		goto cleanup;
+	}
+
+	merge = use_pseudo_merge(&bitmap_git->pseudo_merges,
+				 &bitmap_git->pseudo_merges.v[n]);
+	dump_ewah_object_ids(bitmap_git, merge->commits);
+
+cleanup:
+	free_bitmap_index(bitmap_git);
+	return ret;
+}
+
+int test_bitmap_pseudo_merge_objects(struct repository *r, uint32_t n)
+{
+	struct bitmap_index *bitmap_git;
+	struct pseudo_merge *merge;
+	int ret = 0;
+
+	bitmap_git = prepare_bitmap_git(r);
+	if (!bitmap_git || !bitmap_git->pseudo_merges.nr)
+		goto cleanup;
+
+	if (n >= bitmap_git->pseudo_merges.nr) {
+		ret = error(_("pseudo-merge index out of range "
+			      "(%"PRIu32" >= %"PRIuMAX")"),
+			    n, (uintmax_t)bitmap_git->pseudo_merges.nr);
+		goto cleanup;
+	}
+
+	merge = use_pseudo_merge(&bitmap_git->pseudo_merges,
+				 &bitmap_git->pseudo_merges.v[n]);
+
+	dump_ewah_object_ids(bitmap_git,
+			     pseudo_merge_bitmap(&bitmap_git->pseudo_merges,
+						 merge));
+
+cleanup:
+	free_bitmap_index(bitmap_git);
+	return ret;
+}
+
 int rebuild_bitmap(const uint32_t *reposition,
 		   struct ewah_bitmap *source,
 		   struct bitmap *dest)
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 21aabf805ea..4466b5ad0fb 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -73,6 +73,9 @@ void traverse_bitmap_commit_list(struct bitmap_index *,
 void test_bitmap_walk(struct rev_info *revs);
 int test_bitmap_commits(struct repository *r);
 int test_bitmap_hashes(struct repository *r);
+int test_bitmap_pseudo_merges(struct repository *r);
+int test_bitmap_pseudo_merge_commits(struct repository *r, uint32_t n);
+int test_bitmap_pseudo_merge_objects(struct repository *r, uint32_t n);
 
 #define GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL \
 	"GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL"
diff --git a/t/helper/test-bitmap.c b/t/helper/test-bitmap.c
index af43ee1cb5e..6af2b42678f 100644
--- a/t/helper/test-bitmap.c
+++ b/t/helper/test-bitmap.c
@@ -13,21 +13,41 @@ static int bitmap_dump_hashes(void)
 	return test_bitmap_hashes(the_repository);
 }
 
+static int bitmap_dump_pseudo_merges(void)
+{
+	return test_bitmap_pseudo_merges(the_repository);
+}
+
+static int bitmap_dump_pseudo_merge_commits(uint32_t n)
+{
+	return test_bitmap_pseudo_merge_commits(the_repository, n);
+}
+
+static int bitmap_dump_pseudo_merge_objects(uint32_t n)
+{
+	return test_bitmap_pseudo_merge_objects(the_repository, n);
+}
+
 int cmd__bitmap(int argc, const char **argv)
 {
 	setup_git_directory();
 
-	if (argc != 2)
-		goto usage;
-
-	if (!strcmp(argv[1], "list-commits"))
+	if (argc == 2 && !strcmp(argv[1], "list-commits"))
 		return bitmap_list_commits();
-	if (!strcmp(argv[1], "dump-hashes"))
+	if (argc == 2 && !strcmp(argv[1], "dump-hashes"))
 		return bitmap_dump_hashes();
+	if (argc == 2 && !strcmp(argv[1], "dump-pseudo-merges"))
+		return bitmap_dump_pseudo_merges();
+	if (argc == 3 && !strcmp(argv[1], "dump-pseudo-merge-commits"))
+		return bitmap_dump_pseudo_merge_commits(atoi(argv[2]));
+	if (argc == 3 && !strcmp(argv[1], "dump-pseudo-merge-objects"))
+		return bitmap_dump_pseudo_merge_objects(atoi(argv[2]));
 
-usage:
 	usage("\ttest-tool bitmap list-commits\n"
-	      "\ttest-tool bitmap dump-hashes");
+	      "\ttest-tool bitmap dump-hashes\n"
+	      "\ttest-tool bitmap dump-pseudo-merges\n"
+	      "\ttest-tool bitmap dump-pseudo-merge-commits <n>\n"
+	      "\ttest-tool bitmap dump-pseudo-merge-objects <n>");
 
 	return -1;
 }
-- 
2.45.1.175.gcf0316ad0e9



* [PATCH v4 19/24] t/test-lib-functions.sh: support `--notick` in `test_commit_bulk()`
  2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
                     ` (17 preceding siblings ...)
  2024-05-23 21:27   ` [PATCH v4 18/24] pack-bitmap: implement test helpers for pseudo-merge Taylor Blau
@ 2024-05-23 21:27   ` Taylor Blau
  2024-05-25  3:25     ` Jeff King
  2024-05-23 21:27   ` [PATCH v4 20/24] pack-bitmap.c: use pseudo-merges during traversal Taylor Blau
                     ` (5 subsequent siblings)
  24 siblings, 1 reply; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 21:27 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

One of the tests we'll want to add for pseudo-merge bitmaps needs to be
able to generate a large number of commits at a specific date.

Support the `--notick` option (with identical semantics to the
`--notick` option for `test_commit()`) within `test_commit_bulk` as a
prerequisite for that. Callers can then set the various _DATE variables
themselves.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/test-lib-functions.sh | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 862d80c9748..427b375b392 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -458,6 +458,7 @@ test_commit_bulk () {
 	indir=.
 	ref=HEAD
 	n=1
+	notick=
 	message='commit %s'
 	filename='%s.t'
 	contents='content %s'
@@ -488,6 +489,9 @@ test_commit_bulk () {
 			filename="${1#--*=}-%s.t"
 			contents="${1#--*=} %s"
 			;;
+		--notick)
+			notick=yes
+			;;
 		-*)
 			BUG "invalid test_commit_bulk option: $1"
 			;;
@@ -507,7 +511,10 @@ test_commit_bulk () {
 
 	while test "$total" -gt 0
 	do
-		test_tick &&
+		if test -z "$notick"
+		then
+			test_tick
+		fi &&
 		echo "commit $ref"
 		printf 'author %s <%s> %s\n' \
 			"$GIT_AUTHOR_NAME" \
-- 
2.45.1.175.gcf0316ad0e9



* [PATCH v4 20/24] pack-bitmap.c: use pseudo-merges during traversal
  2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
                     ` (18 preceding siblings ...)
  2024-05-23 21:27   ` [PATCH v4 19/24] t/test-lib-functions.sh: support `--notick` in `test_commit_bulk()` Taylor Blau
@ 2024-05-23 21:27   ` Taylor Blau
  2024-05-23 21:27   ` [PATCH v4 21/24] pack-bitmap: extra trace2 information Taylor Blau
                     ` (4 subsequent siblings)
  24 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 21:27 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Now that all of the groundwork has been laid to support reading and
using pseudo-merges, make use of that work in this commit by teaching
the pack-bitmap machinery to use pseudo-merge(s) when available during
traversal.

The basic operation is as follows:

  - When enumerating objects on either side of a reachability query,
    first see if any subset of the roots satisfies some pseudo-merge
    bitmap. If it does, apply that pseudo-merge bitmap.

  - If any pseudo-merge bitmap(s) were applied in the previous step, OR
    them into the result[^1]. Then repeat the process over all
    pseudo-merge bitmaps (we'll refer to this as "cascading"
    pseudo-merges). Once this is done, OR in the resulting bitmap.

  - If there is no fill-in traversal to be done, return the bitmap for
    that side of the reachability query. If there is fill-in traversal,
    then for each commit we encounter via show_commit(), check whether
    any unsatisfied pseudo-merge containing that commit as one of its
    parents has become satisfied by the presence of that commit.

    If so, OR in the object set from that pseudo-merge bitmap, and then
    cascade. If not, continue traversal.

A similar implementation is present in the boundary-based bitmap
traversal routines.

[^1]: Importantly, we cannot OR in the entire set of roots along with
  the objects reachable from whatever pseudo-merge bitmaps were
  satisfied.  This may leave some dangling bits corresponding to any
  unsatisfied root(s) getting OR'd into the resulting bitmap, tricking
  other parts of the traversal into thinking we already have a
  reachability closure over those commit(s) when we do not.
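
The "cascading" step described above can be sketched with toy 64-bit
masks standing in for bitmaps. Everything here (struct toy_merge,
toy_apply(), toy_cascade()) is hypothetical and simplified: it tracks a
single result mask, whereas the real cascade_pseudo_merges() can also
check satisfaction against a separate roots bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a pseudo-merge; all names are hypothetical. */
struct toy_merge {
	uint64_t parents; /* bits of the commits this pseudo-merge covers */
	uint64_t objects; /* bits of everything reachable from them */
	int satisfied;
};

/* Apply one pseudo-merge if all of its parents are already in `result`. */
static int toy_apply(struct toy_merge *m, uint64_t *result)
{
	if (m->satisfied)
		return 0;
	if ((m->parents & *result) != m->parents)
		return 0; /* not a subset: some parent is missing */

	*result |= m->objects;
	m->satisfied = 1;
	return 1;
}

/* Repeat full passes until one applies nothing; return the number applied. */
static int toy_cascade(struct toy_merge *v, size_t nr, uint64_t *result)
{
	int any, total = 0;

	assert(v || !nr);

	do {
		size_t i;

		any = 0;
		for (i = 0; i < nr; i++) {
			if (toy_apply(&v[i], result)) {
				any = 1;
				total++;
			}
		}
	} while (any);

	return total;
}
```

A second pass matters because applying one pseudo-merge can set the
bits that another, earlier-listed pseudo-merge was still missing.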

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c                   | 112 ++++++++++-
 t/t5333-pseudo-merge-bitmaps.sh | 328 ++++++++++++++++++++++++++++++++
 2 files changed, 439 insertions(+), 1 deletion(-)
 create mode 100755 t/t5333-pseudo-merge-bitmaps.sh

diff --git a/pack-bitmap.c b/pack-bitmap.c
index c13074673af..e61058dada6 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -114,6 +114,9 @@ struct bitmap_index {
 	unsigned int version;
 };
 
+static int pseudo_merges_satisfied_nr;
+static int pseudo_merges_cascades_nr;
+
 static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
 {
 	struct ewah_bitmap *parent;
@@ -1006,6 +1009,22 @@ static void show_commit(struct commit *commit UNUSED,
 {
 }
 
+static unsigned apply_pseudo_merges_for_commit_1(struct bitmap_index *bitmap_git,
+						 struct bitmap *result,
+						 struct commit *commit,
+						 uint32_t commit_pos)
+{
+	int ret;
+
+	ret = apply_pseudo_merges_for_commit(&bitmap_git->pseudo_merges,
+					     result, commit, commit_pos);
+
+	if (ret)
+		pseudo_merges_satisfied_nr += ret;
+
+	return ret;
+}
+
 static int add_to_include_set(struct bitmap_index *bitmap_git,
 			      struct include_data *data,
 			      struct commit *commit,
@@ -1026,6 +1045,10 @@ static int add_to_include_set(struct bitmap_index *bitmap_git,
 	}
 
 	bitmap_set(data->base, bitmap_pos);
+	if (apply_pseudo_merges_for_commit_1(bitmap_git, data->base, commit,
+					     bitmap_pos))
+		return 0;
+
 	return 1;
 }
 
@@ -1151,6 +1174,20 @@ static void show_boundary_object(struct object *object UNUSED,
 	BUG("should not be called");
 }
 
+static unsigned cascade_pseudo_merges_1(struct bitmap_index *bitmap_git,
+					struct bitmap *result,
+					struct bitmap *roots)
+{
+	int ret = cascade_pseudo_merges(&bitmap_git->pseudo_merges,
+					result, roots);
+	if (ret) {
+		pseudo_merges_cascades_nr++;
+		pseudo_merges_satisfied_nr += ret;
+	}
+
+	return ret;
+}
+
 static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 					    struct rev_info *revs,
 					    struct object_list *roots)
@@ -1160,6 +1197,7 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 	unsigned int i;
 	unsigned int tmp_blobs, tmp_trees, tmp_tags;
 	int any_missing = 0;
+	int existing_bitmaps = 0;
 
 	cb.bitmap_git = bitmap_git;
 	cb.base = bitmap_new();
@@ -1167,6 +1205,25 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 
 	revs->ignore_missing_links = 1;
 
+	if (bitmap_git->pseudo_merges.nr) {
+		struct bitmap *roots_bitmap = bitmap_new();
+		struct object_list *objects = NULL;
+
+		for (objects = roots; objects; objects = objects->next) {
+			struct object *object = objects->item;
+			int pos;
+
+			pos = bitmap_position(bitmap_git, &object->oid);
+			if (pos < 0)
+				continue;
+
+			bitmap_set(roots_bitmap, pos);
+		}
+
+		if (!cascade_pseudo_merges_1(bitmap_git, cb.base, roots_bitmap))
+			bitmap_free(roots_bitmap);
+	}
+
 	/*
 	 * OR in any existing reachability bitmaps among `roots` into
 	 * `cb.base`.
@@ -1178,8 +1235,10 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 			continue;
 
 		if (add_commit_to_bitmap(bitmap_git, &cb.base,
-					 (struct commit *)object))
+					 (struct commit *)object)) {
+			existing_bitmaps = 1;
 			continue;
+		}
 
 		any_missing = 1;
 	}
@@ -1187,6 +1246,9 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 	if (!any_missing)
 		goto cleanup;
 
+	if (existing_bitmaps)
+		cascade_pseudo_merges_1(bitmap_git, cb.base, NULL);
+
 	tmp_blobs = revs->blob_objects;
 	tmp_trees = revs->tree_objects;
 	tmp_tags = revs->blob_objects;
@@ -1242,6 +1304,13 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 	return cb.base;
 }
 
+static void unsatisfy_all_pseudo_merges(struct bitmap_index *bitmap_git)
+{
+	uint32_t i;
+	for (i = 0; i < bitmap_git->pseudo_merges.nr; i++)
+		bitmap_git->pseudo_merges.v[i].satisfied = 0;
+}
+
 static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 				   struct rev_info *revs,
 				   struct object_list *roots,
@@ -1249,9 +1318,32 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 {
 	struct bitmap *base = NULL;
 	int needs_walk = 0;
+	unsigned existing_bitmaps = 0;
 
 	struct object_list *not_mapped = NULL;
 
+	unsatisfy_all_pseudo_merges(bitmap_git);
+
+	if (bitmap_git->pseudo_merges.nr) {
+		struct bitmap *roots_bitmap = bitmap_new();
+		struct object_list *objects = NULL;
+
+		for (objects = roots; objects; objects = objects->next) {
+			struct object *object = objects->item;
+			int pos;
+
+			pos = bitmap_position(bitmap_git, &object->oid);
+			if (pos < 0)
+				continue;
+
+			bitmap_set(roots_bitmap, pos);
+		}
+
+		base = bitmap_new();
+		if (!cascade_pseudo_merges_1(bitmap_git, base, roots_bitmap))
+			bitmap_free(roots_bitmap);
+	}
+
 	/*
 	 * Go through all the roots for the walk. The ones that have bitmaps
 	 * on the bitmap index will be `or`ed together to form an initial
@@ -1262,11 +1354,21 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 	 */
 	while (roots) {
 		struct object *object = roots->item;
+
 		roots = roots->next;
 
+		if (base) {
+			int pos = bitmap_position(bitmap_git, &object->oid);
+			if (pos >= 0 && bitmap_get(base, pos)) {
+				object->flags |= SEEN;
+				continue;
+			}
+		}
+
 		if (object->type == OBJ_COMMIT &&
 		    add_commit_to_bitmap(bitmap_git, &base, (struct commit *)object)) {
 			object->flags |= SEEN;
+			existing_bitmaps = 1;
 			continue;
 		}
 
@@ -1282,6 +1384,9 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 
 	roots = not_mapped;
 
+	if (existing_bitmaps)
+		cascade_pseudo_merges_1(bitmap_git, base, NULL);
+
 	/*
 	 * Let's iterate through all the roots that don't have bitmaps to
 	 * check if we can determine them to be reachable from the existing
@@ -1866,6 +1971,11 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	object_list_free(&wants);
 	object_list_free(&haves);
 
+	trace2_data_intmax("bitmap", the_repository, "pseudo_merges_satisfied",
+			   pseudo_merges_satisfied_nr);
+	trace2_data_intmax("bitmap", the_repository, "pseudo_merges_cascades",
+			   pseudo_merges_cascades_nr);
+
 	return bitmap_git;
 
 cleanup:
diff --git a/t/t5333-pseudo-merge-bitmaps.sh b/t/t5333-pseudo-merge-bitmaps.sh
new file mode 100755
index 00000000000..4c9aebcffdc
--- /dev/null
+++ b/t/t5333-pseudo-merge-bitmaps.sh
@@ -0,0 +1,328 @@
+#!/bin/sh
+
+test_description='pseudo-merge bitmaps'
+
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+
+. ./test-lib.sh
+
+test_pseudo_merges () {
+	test-tool bitmap dump-pseudo-merges
+}
+
+test_pseudo_merge_commits () {
+	test-tool bitmap dump-pseudo-merge-commits "$1"
+}
+
+test_pseudo_merges_satisfied () {
+	test_trace2_data bitmap pseudo_merges_satisfied "$1"
+}
+
+test_pseudo_merges_cascades () {
+	test_trace2_data bitmap pseudo_merges_cascades "$1"
+}
+
+tag_everything () {
+	git rev-list --all --no-object-names >in &&
+	perl -lne '
+		print "create refs/tags/" . $. . " " . $1 if /([0-9a-f]+)/
+	' <in | git update-ref --stdin
+}
+
+test_expect_success 'setup' '
+	test_commit_bulk 512 &&
+	tag_everything
+'
+
+test_expect_success 'bitmap traversal without pseudo-merges' '
+	git repack -adb &&
+
+	git rev-list --count --all --objects >expect &&
+
+	: >trace2.txt &&
+	GIT_TRACE2_EVENT=$PWD/trace2.txt \
+		git rev-list --count --all --objects --use-bitmap-index >actual &&
+
+	test_pseudo_merges_satisfied 0 <trace2.txt &&
+	test_pseudo_merges_cascades 0 <trace2.txt &&
+	test_pseudo_merges >merges &&
+	test_must_be_empty merges &&
+	test_cmp expect actual
+'
+
+test_expect_success 'pseudo-merges accurately represent their objects' '
+	test_config bitmapPseudoMerge.test.pattern "refs/tags/" &&
+	test_config bitmapPseudoMerge.test.maxMerges 8 &&
+	test_config bitmapPseudoMerge.test.stableThreshold never &&
+
+	git repack -adb &&
+
+	test_pseudo_merges >merges &&
+	test_line_count = 8 merges &&
+
+	for i in $(test_seq 0 $(($(wc -l <merges)-1)))
+	do
+		test-tool bitmap dump-pseudo-merge-commits $i >commits &&
+
+		git rev-list --objects --no-object-names --stdin <commits >expect.raw &&
+		test-tool bitmap dump-pseudo-merge-objects $i >actual.raw &&
+
+		sort -u <expect.raw >expect &&
+		sort -u <actual.raw >actual &&
+
+		test_cmp expect actual || return 1
+	done
+'
+
+test_expect_success 'bitmap traversal with pseudo-merges' '
+	: >trace2.txt &&
+	GIT_TRACE2_EVENT=$PWD/trace2.txt \
+		git rev-list --count --all --objects --use-bitmap-index >actual &&
+	git rev-list --count --all --objects >expect &&
+
+	test_pseudo_merges_satisfied 8 <trace2.txt &&
+	test_pseudo_merges_cascades 1 <trace2.txt &&
+	test_cmp expect actual
+'
+
+test_expect_success 'stale bitmap traversal with pseudo-merges' '
+	test_commit other &&
+
+	: >trace2.txt &&
+	GIT_TRACE2_EVENT=$PWD/trace2.txt \
+		git rev-list --count --all --objects --use-bitmap-index >actual &&
+	git rev-list --count --all --objects >expect &&
+
+	test_pseudo_merges_satisfied 8 <trace2.txt &&
+	test_pseudo_merges_cascades 1 <trace2.txt &&
+	test_cmp expect actual
+'
+
+test_expect_success 'bitmapPseudoMerge.sampleRate adjusts commit selection rate' '
+	test_config bitmapPseudoMerge.test.pattern "refs/tags/" &&
+	test_config bitmapPseudoMerge.test.maxMerges 1 &&
+	test_config bitmapPseudoMerge.test.stableThreshold never &&
+
+	commits_nr=$(git rev-list --all --count) &&
+
+	for rate in 1.0 0.5 0.25
+	do
+		git -c bitmapPseudoMerge.test.sampleRate=$rate repack -adb &&
+
+		test_pseudo_merges >merges &&
+		test_line_count = 1 merges &&
+		test_pseudo_merge_commits 0 >commits &&
+
+		test-tool bitmap list-commits >bitmaps &&
+		bitmaps_nr="$(wc -l <bitmaps)" &&
+
+		perl -MPOSIX -e "print ceil(\$ARGV[0]*(\$ARGV[1]-\$ARGV[2]))" \
+			"$rate" "$commits_nr" "$bitmaps_nr" >expect &&
+
+		test $(cat expect) -eq $(wc -l <commits) || return 1
+	done
+'
+
+test_expect_success 'bitmapPseudoMerge.threshold excludes newer commits' '
+	git init pseudo-merge-threshold &&
+	(
+		cd pseudo-merge-threshold &&
+
+		new="1672549200" && # 2023-01-01
+		old="1641013200" && # 2022-01-01
+
+		GIT_COMMITTER_DATE="$new +0000" &&
+		export GIT_COMMITTER_DATE &&
+		test_commit_bulk --message="new" --notick 128 &&
+
+		GIT_COMMITTER_DATE="$old +0000" &&
+		export GIT_COMMITTER_DATE &&
+		test_commit_bulk --message="old" --notick 128 &&
+
+		tag_everything &&
+
+		git \
+			-c bitmapPseudoMerge.test.pattern="refs/tags/" \
+			-c bitmapPseudoMerge.test.maxMerges=1 \
+			-c bitmapPseudoMerge.test.threshold=$(($new - 1)) \
+			-c bitmapPseudoMerge.test.stableThreshold=never \
+			repack -adb &&
+
+		test_pseudo_merges >merges &&
+		test_line_count = 1 merges &&
+
+		test_pseudo_merge_commits 0 >oids &&
+		git cat-file --batch <oids >commits &&
+
+		test $(wc -l <oids) = $(grep -c "^committer.*$old +0000$" commits)
+	)
+'
+
+test_expect_success 'bitmapPseudoMerge.stableThreshold creates stable groups' '
+	(
+		cd pseudo-merge-threshold &&
+
+		new="1672549200" && # 2023-01-01
+		mid="1654059600" && # 2022-06-01
+		old="1641013200" && # 2022-01-01
+
+		GIT_COMMITTER_DATE="$mid +0000" &&
+		export GIT_COMMITTER_DATE &&
+		test_commit_bulk --message="mid" --notick 128 &&
+
+		git for-each-ref --format="delete %(refname)" refs/tags >in &&
+		git update-ref --stdin <in &&
+
+		tag_everything &&
+
+		git \
+			-c bitmapPseudoMerge.test.pattern="refs/tags/" \
+			-c bitmapPseudoMerge.test.maxMerges=1 \
+			-c bitmapPseudoMerge.test.threshold=$(($new - 1)) \
+			-c bitmapPseudoMerge.test.stableThreshold=$(($mid - 1)) \
+			-c bitmapPseudoMerge.test.stableSize=10 \
+			repack -adb &&
+
+		test_pseudo_merges >merges &&
+		merges_nr="$(wc -l <merges)" &&
+
+		for i in $(test_seq $(($merges_nr - 1)))
+		do
+			test_pseudo_merge_commits 0 >oids &&
+			git cat-file --batch <oids >commits &&
+
+			expect="$(grep -c "^committer.*$old +0000$" commits)" &&
+			actual="$(wc -l <oids)" &&
+
+			test $expect = $actual || return 1
+		done &&
+
+		test_pseudo_merge_commits $(($merges_nr - 1)) >oids &&
+		git cat-file --batch <oids >commits &&
+		test $(wc -l <oids) = $(grep -c "^committer.*$mid +0000$" commits)
+	)
+'
+
+test_expect_success 'out of order thresholds are rejected' '
+	test_must_fail git \
+		-c bitmapPseudoMerge.test.pattern="refs/*" \
+		-c bitmapPseudoMerge.test.threshold=1.month.ago \
+		-c bitmapPseudoMerge.test.stableThreshold=1.week.ago \
+		repack -adb 2>err &&
+
+	cat >expect <<-EOF &&
+	fatal: pseudo-merge group ${SQ}test${SQ} has unstable threshold before stable one
+	EOF
+
+	test_cmp expect err
+'
+
+test_expect_success 'pseudo-merge pattern with capture groups' '
+	git init pseudo-merge-captures &&
+	(
+		cd pseudo-merge-captures &&
+
+		test_commit_bulk 128 &&
+		tag_everything &&
+
+		for r in $(test_seq 8)
+		do
+			test_commit_bulk 16 &&
+
+			git rev-list HEAD~16.. >in &&
+
+			perl -lne "print \"create refs/remotes/$r/tags/\$. \$_\"" <in |
+			git update-ref --stdin || return 1
+		done &&
+
+		git \
+			-c bitmapPseudoMerge.tags.pattern="refs/remotes/([0-9]+)/tags/" \
+			-c bitmapPseudoMerge.tags.maxMerges=1 \
+			repack -adb &&
+
+		git for-each-ref --format="%(objectname) %(refname)" >refs &&
+
+		test_pseudo_merges >merges &&
+		for m in $(test_seq 0 $(($(wc -l <merges) - 1)))
+		do
+			test_pseudo_merge_commits $m >oids &&
+			grep -f oids refs |
+			perl -lne "print \$1 if /refs\/remotes\/([0-9]+)/" |
+			sort -u || return 1
+		done >remotes &&
+
+		test $(wc -l <remotes) -eq $(sort -u <remotes | wc -l)
+	)
+'
+
+test_expect_success 'pseudo-merge overlap setup' '
+	git init pseudo-merge-overlap &&
+	(
+		cd pseudo-merge-overlap &&
+
+		test_commit_bulk 256 &&
+		tag_everything &&
+
+		git \
+			-c bitmapPseudoMerge.all.pattern="refs/" \
+			-c bitmapPseudoMerge.all.maxMerges=1 \
+			-c bitmapPseudoMerge.all.stableThreshold=never \
+			-c bitmapPseudoMerge.tags.pattern="refs/tags/" \
+			-c bitmapPseudoMerge.tags.maxMerges=1 \
+			-c bitmapPseudoMerge.tags.stableThreshold=never \
+			repack -adb
+	)
+'
+
+test_expect_success 'pseudo-merge overlap generates overlapping groups' '
+	(
+		cd pseudo-merge-overlap &&
+
+		test_pseudo_merges >merges &&
+		test_line_count = 2 merges &&
+
+		test_pseudo_merge_commits 0 >commits-0.raw &&
+		test_pseudo_merge_commits 1 >commits-1.raw &&
+
+		sort commits-0.raw >commits-0 &&
+		sort commits-1.raw >commits-1 &&
+
+		comm -12 commits-0 commits-1 >overlap &&
+
+		test_line_count -gt 0 overlap
+	)
+'
+
+test_expect_success 'pseudo-merge overlap traversal' '
+	(
+		cd pseudo-merge-overlap &&
+
+		: >trace2.txt &&
+		GIT_TRACE2_EVENT=$PWD/trace2.txt \
+			git rev-list --count --all --objects --use-bitmap-index >actual &&
+		git rev-list --count --all --objects >expect &&
+
+		test_pseudo_merges_satisfied 2 <trace2.txt &&
+		test_pseudo_merges_cascades 1 <trace2.txt &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'pseudo-merge overlap stale traversal' '
+	(
+		cd pseudo-merge-overlap &&
+
+		test_commit other &&
+
+		: >trace2.txt &&
+		GIT_TRACE2_EVENT=$PWD/trace2.txt \
+			git rev-list --count --all --objects --use-bitmap-index >actual &&
+		git rev-list --count --all --objects >expect &&
+
+		test_pseudo_merges_satisfied 2 <trace2.txt &&
+		test_pseudo_merges_cascades 1 <trace2.txt &&
+		test_cmp expect actual
+	)
+'
+
+test_done
-- 
2.45.1.175.gcf0316ad0e9



* [PATCH v4 21/24] pack-bitmap: extra trace2 information
  2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
                     ` (19 preceding siblings ...)
  2024-05-23 21:27   ` [PATCH v4 20/24] pack-bitmap.c: use pseudo-merges during traversal Taylor Blau
@ 2024-05-23 21:27   ` Taylor Blau
  2024-05-23 21:27   ` [PATCH v4 22/24] ewah: `bitmap_equals_ewah()` Taylor Blau
                     ` (3 subsequent siblings)
  24 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 21:27 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Add some extra trace2 lines to capture the number of bitmap lookups that
are hits versus misses, as well as the number of reachability roots that
have bitmap coverage (versus those that do not).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index e61058dada6..1966b3b95f1 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -116,6 +116,10 @@ struct bitmap_index {
 
 static int pseudo_merges_satisfied_nr;
 static int pseudo_merges_cascades_nr;
+static int existing_bitmaps_hits_nr;
+static int existing_bitmaps_misses_nr;
+static int roots_with_bitmaps_nr;
+static int roots_without_bitmaps_nr;
 
 static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
 {
@@ -1040,10 +1044,14 @@ static int add_to_include_set(struct bitmap_index *bitmap_git,
 
 	partial = bitmap_for_commit(bitmap_git, commit);
 	if (partial) {
+		existing_bitmaps_hits_nr++;
+
 		bitmap_or_ewah(data->base, partial);
 		return 0;
 	}
 
+	existing_bitmaps_misses_nr++;
+
 	bitmap_set(data->base, bitmap_pos);
 	if (apply_pseudo_merges_for_commit_1(bitmap_git, data->base, commit,
 					     bitmap_pos))
@@ -1099,8 +1107,12 @@ static int add_commit_to_bitmap(struct bitmap_index *bitmap_git,
 {
 	struct ewah_bitmap *or_with = bitmap_for_commit(bitmap_git, commit);
 
-	if (!or_with)
+	if (!or_with) {
+		existing_bitmaps_misses_nr++;
 		return 0;
+	}
+
+	existing_bitmaps_hits_nr++;
 
 	if (!*base)
 		*base = ewah_to_bitmap(or_with);
@@ -1407,8 +1419,12 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 			object->flags &= ~UNINTERESTING;
 			add_pending_object(revs, object, "");
 			needs_walk = 1;
+
+			roots_without_bitmaps_nr++;
 		} else {
 			object->flags |= SEEN;
+
+			roots_with_bitmaps_nr++;
 		}
 	}
 
@@ -1975,6 +1991,14 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 			   pseudo_merges_satisfied_nr);
 	trace2_data_intmax("bitmap", the_repository, "pseudo_merges_cascades",
 			   pseudo_merges_cascades_nr);
+	trace2_data_intmax("bitmap", the_repository, "bitmap/hits",
+			   existing_bitmaps_hits_nr);
+	trace2_data_intmax("bitmap", the_repository, "bitmap/misses",
+			   existing_bitmaps_misses_nr);
+	trace2_data_intmax("bitmap", the_repository, "bitmap/roots_with_bitmap",
+			   roots_with_bitmaps_nr);
+	trace2_data_intmax("bitmap", the_repository, "bitmap/roots_without_bitmap",
+			   roots_without_bitmaps_nr);
 
 	return bitmap_git;
 
-- 
2.45.1.175.gcf0316ad0e9



* [PATCH v4 22/24] ewah: `bitmap_equals_ewah()`
  2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
                     ` (20 preceding siblings ...)
  2024-05-23 21:27   ` [PATCH v4 21/24] pack-bitmap: extra trace2 information Taylor Blau
@ 2024-05-23 21:27   ` Taylor Blau
  2024-05-23 21:27   ` [PATCH v4 23/24] pseudo-merge: implement support for finding existing merges Taylor Blau
                     ` (2 subsequent siblings)
  24 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 21:27 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Prepare to reuse existing pseudo-merge bitmaps by implementing a
`bitmap_equals_ewah()` helper.

This helper will be used to see if a raw bitmap (containing the set of
parents for some pseudo-merge) is equal to any existing pseudo-merge's
commits bitmap (which are stored as EWAH-compressed bitmaps on disk).
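
As a rough illustration of the comparison described above (not the code in
this patch), here is a minimal sketch that checks an uncompressed word array
against a run-length encoded bitmap. The `struct run` type and the
`raw_equals_runs()` name are hypothetical stand-ins for the real EWAH
structures and iterator; the point is only that the compressed side is
walked word by word, missing trailing words count as zero, and the first
mismatch ends the comparison.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a compressed bitmap: each run repeats one word. */
struct run {
	uint64_t word;
	size_t count;
};

/*
 * Return 1 if the uncompressed words[0..nr) describe the same bit set as
 * the run-length encoded runs[0..runs_nr). Words missing from the end of
 * either side are treated as zero.
 */
int raw_equals_runs(const uint64_t *words, size_t nr,
		    const struct run *runs, size_t runs_nr)
{
	size_t i = 0, r, k;

	for (r = 0; r < runs_nr; r++) {
		for (k = 0; k < runs[r].count; k++) {
			uint64_t have = i < nr ? words[i] : 0;

			if (have != runs[r].word)
				return 0; /* first differing word: stop early */
			i++;
		}
	}

	/* Any words left over in the raw bitmap must be empty. */
	for (; i < nr; i++)
		if (words[i])
			return 0;

	return 1;
}
```

Decompressing the stored bitmap up front would also work, but walking it in
place avoids an allocation per candidate and lets a mismatch bail out after
a single word.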

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 ewah/bitmap.c | 19 +++++++++++++++++++
 ewah/ewok.h   |  1 +
 2 files changed, 20 insertions(+)

diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index dc2ca190f12..55928dada86 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -261,6 +261,25 @@ int bitmap_equals(struct bitmap *self, struct bitmap *other)
 	return 1;
 }
 
+int bitmap_equals_ewah(struct bitmap *self, struct ewah_bitmap *other)
+{
+	struct ewah_iterator it;
+	eword_t word;
+	size_t i = 0;
+
+	ewah_iterator_init(&it, other);
+
+	while (ewah_iterator_next(&word, &it))
+		if (word != (i < self->word_alloc ? self->words[i++] : 0))
+			return 0;
+
+	for (; i < self->word_alloc; i++)
+		if (self->words[i])
+			return 0;
+
+	return 1;
+}
+
 int bitmap_is_subset(struct bitmap *self, struct bitmap *other)
 {
 	size_t common_size, i;
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 7074a6347b7..5e357e24933 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -179,6 +179,7 @@ void bitmap_unset(struct bitmap *self, size_t pos);
 int bitmap_get(struct bitmap *self, size_t pos);
 void bitmap_free(struct bitmap *self);
 int bitmap_equals(struct bitmap *self, struct bitmap *other);
+int bitmap_equals_ewah(struct bitmap *self, struct ewah_bitmap *other);
 
 /*
  * Both `bitmap_is_subset()` and `ewah_bitmap_is_subset()` return 1 if the set
-- 
2.45.1.175.gcf0316ad0e9



* [PATCH v4 23/24] pseudo-merge: implement support for finding existing merges
  2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
                     ` (21 preceding siblings ...)
  2024-05-23 21:27   ` [PATCH v4 22/24] ewah: `bitmap_equals_ewah()` Taylor Blau
@ 2024-05-23 21:27   ` Taylor Blau
  2024-05-23 21:27   ` [PATCH v4 24/24] t/perf: implement performance tests for pseudo-merge bitmaps Taylor Blau
  2024-05-25  3:26   ` [PATCH v4 00/24] pack-bitmap: pseudo-merge reachability bitmaps Jeff King
  24 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 21:27 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

This patch implements support for reusing existing pseudo-merge commits
when writing bitmaps, provided there is an existing pseudo-merge bitmap
with exactly the same set of parents as the one we are about to write.

Note that unstable pseudo-merges are likely to change between
consecutive repacks, and so are generally poor candidates for reuse.
However, stable pseudo-merges (see the configuration option
'bitmapPseudoMerge.<name>.stableThreshold') are by definition unlikely
to change between runs (as they represent long-running branches).

Because there is no index from a *set* of pseudo-merge parents to a
matching pseudo-merge bitmap, we have to construct the bitmap
corresponding to the set of parents for each pending pseudo-merge commit
and see if a matching bitmap exists.

This is technically quadratic in the number of pseudo-merges, but is OK
in practice for a couple of reasons:

  - non-matching pseudo-merge bitmaps are rejected quickly as soon as
    they differ in a single bit

  - already-matched pseudo-merge bitmaps are discarded from subsequent
    rounds of search

  - the number of pseudo-merges is generally small, even for large
    repositories

In order to do this, implement (a) a function that finds a matching
pseudo-merge given some uncompressed bitset describing its parents and
(b) a function that computes the bitset of parents for a given
pseudo-merge commit, and then (c) call the latter before computing the
set of reachable objects for some pending pseudo-merge.
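
To make the shape of steps (a) and (b) concrete, here is a simplified
sketch. It is not the implementation in this patch: it assumes every parent
position fits in a single 64-bit word, and `struct candidate`,
`parents_to_bitset()` and `find_existing()` are hypothetical names used only
for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one existing pseudo-merge. */
struct candidate {
	uint64_t parents; /* bit i set: the object at position i is a parent */
	int matched;      /* already reused by an earlier lookup */
};

/*
 * Step (b): build the parent bitset from object positions. Returns 0 if
 * any position is out of range (no bitset can be built), 1 otherwise.
 */
int parents_to_bitset(const int *pos, size_t nr, uint64_t *out)
{
	uint64_t bits = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (pos[i] < 0 || pos[i] >= 64)
			return 0;
		bits |= (uint64_t)1 << pos[i];
	}

	*out = bits;
	return 1;
}

/*
 * Step (a): linear scan for an unmatched candidate with exactly the same
 * parents. A hit is marked so later lookups skip it.
 */
struct candidate *find_existing(struct candidate *v, size_t nr,
				uint64_t parents)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (v[i].matched || v[i].parents != parents)
			continue;
		v[i].matched = 1;
		return &v[i];
	}

	return NULL;
}
```

Marking a hit as matched is what keeps repeated lookups from returning the
same candidate twice, at the cost of making the lookup stateful.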

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c             | 15 +++++++-
 pack-bitmap.c                   | 32 ++++++++++++++++
 pack-bitmap.h                   |  2 +
 pseudo-merge.c                  | 55 ++++++++++++++++++++++++++++
 pseudo-merge.h                  |  7 ++++
 t/t5333-pseudo-merge-bitmaps.sh | 65 +++++++++++++++++++++++++++++++++
 6 files changed, 174 insertions(+), 2 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 47250398aa2..6e8060f8a0b 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -19,6 +19,10 @@
 #include "tree-walk.h"
 #include "pseudo-merge.h"
 #include "oid-array.h"
+#include "config.h"
+#include "alloc.h"
+#include "refs.h"
+#include "strmap.h"
 
 struct bitmapped_commit {
 	struct commit *commit;
@@ -465,6 +469,7 @@ static int fill_bitmap_tree(struct bitmap_writer *writer,
 }
 
 static int reused_bitmaps_nr;
+static int reused_pseudo_merge_bitmaps_nr;
 
 static int fill_bitmap_commit(struct bitmap_writer *writer,
 			      struct bb_commit *ent,
@@ -490,7 +495,7 @@ static int fill_bitmap_commit(struct bitmap_writer *writer,
 			struct bitmap *remapped = bitmap_new();
 
 			if (commit->object.flags & BITMAP_PSEUDO_MERGE)
-				old = NULL;
+				old = pseudo_merge_bitmap_for_commit(old_bitmap, c);
 			else
 				old = bitmap_for_commit(old_bitmap, c);
 			/*
@@ -501,7 +506,10 @@ static int fill_bitmap_commit(struct bitmap_writer *writer,
 			if (old && !rebuild_bitmap(mapping, old, remapped)) {
 				bitmap_or(ent->bitmap, remapped);
 				bitmap_free(remapped);
-				reused_bitmaps_nr++;
+				if (commit->object.flags & BITMAP_PSEUDO_MERGE)
+					reused_pseudo_merge_bitmaps_nr++;
+				else
+					reused_bitmaps_nr++;
 				continue;
 			}
 			bitmap_free(remapped);
@@ -631,6 +639,9 @@ int bitmap_writer_build(struct bitmap_writer *writer,
 			    the_repository);
 	trace2_data_intmax("pack-bitmap-write", the_repository,
 			   "building_bitmaps_reused", reused_bitmaps_nr);
+	trace2_data_intmax("pack-bitmap-write", the_repository,
+			   "building_bitmaps_pseudo_merge_reused",
+			   reused_pseudo_merge_bitmaps_nr);
 
 	stop_progress(&writer->progress);
 
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 1966b3b95f1..70230e26479 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1316,6 +1316,37 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 	return cb.base;
 }
 
+struct ewah_bitmap *pseudo_merge_bitmap_for_commit(struct bitmap_index *bitmap_git,
+						   struct commit *commit)
+{
+	struct commit_list *p;
+	struct bitmap *parents;
+	struct pseudo_merge *match = NULL;
+
+	if (!bitmap_git->pseudo_merges.nr)
+		return NULL;
+
+	parents = bitmap_new();
+
+	for (p = commit->parents; p; p = p->next) {
+		int pos = bitmap_position(bitmap_git, &p->item->object.oid);
+		if (pos < 0 || pos >= bitmap_num_objects(bitmap_git))
+			goto done;
+
+		bitmap_set(parents, pos);
+	}
+
+	match = pseudo_merge_for_parents(&bitmap_git->pseudo_merges,
+						parents);
+
+done:
+	bitmap_free(parents);
+	if (match)
+		return pseudo_merge_bitmap(&bitmap_git->pseudo_merges, match);
+
+	return NULL;
+}
+
 static void unsatisfy_all_pseudo_merges(struct bitmap_index *bitmap_git)
 {
 	uint32_t i;
@@ -2809,6 +2840,7 @@ void free_bitmap_index(struct bitmap_index *b)
 		 */
 		close_midx_revindex(b->midx);
 	}
+	free_pseudo_merge_map(&b->pseudo_merges);
 	free(b);
 }
 
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 4466b5ad0fb..1171e6d9893 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -142,6 +142,8 @@ int rebuild_bitmap(const uint32_t *reposition,
 		   struct bitmap *dest);
 struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
 				      struct commit *commit);
+struct ewah_bitmap *pseudo_merge_bitmap_for_commit(struct bitmap_index *bitmap_git,
+						   struct commit *commit);
 void bitmap_writer_select_commits(struct bitmap_writer *writer,
 				  struct commit **indexed_commits,
 				  unsigned int indexed_commits_nr);
diff --git a/pseudo-merge.c b/pseudo-merge.c
index 7d131011497..a117520996c 100644
--- a/pseudo-merge.c
+++ b/pseudo-merge.c
@@ -699,3 +699,58 @@ int cascade_pseudo_merges(const struct pseudo_merge_map *pm,
 
 	return ret;
 }
+
+struct pseudo_merge *pseudo_merge_for_parents(const struct pseudo_merge_map *pm,
+					      struct bitmap *parents)
+{
+	struct pseudo_merge *match = NULL;
+	size_t i;
+
+	if (!pm->nr)
+		return NULL;
+
+	/*
+	 * NOTE: this loop is quadratic in the worst-case (where no
+	 * matching pseudo-merge bitmaps are found), but in practice
+	 * this is OK for a few reasons:
+	 *
+	 *   - Rejecting pseudo-merge bitmaps that do not match the
+	 *     given commit is done quickly (i.e. `bitmap_equals_ewah()`
+	 *     returns early when we know the two bitmaps aren't equal).
+	 *
+	 *   - Already matched pseudo-merge bitmaps (which we track with
+	 *     the `->satisfied` bit here) are skipped as potential
+	 *     candidates.
+	 *
+	 *   - The number of pseudo-merges should be small (in the
+	 *     hundreds for most repositories).
+	 *
+	 * If in the future this semi-quadratic behavior does become a
+	 * problem, another approach would be to keep track of which
+	 * pseudo-merges are still "viable" after enumerating the
+	 * pseudo-merge commit's parents:
+	 *
+	 *   - A pseudo-merge bitmap becomes non-viable when the bit(s)
+	 *     corresponding to one or more parent(s) of the given
+	 *     commit are not set in a candidate pseudo-merge's commits
+	 *     bitmap.
+	 *
+	 *   - After processing all bits, enumerate the remaining set of
+	 *     viable pseudo-merge bitmaps, and check that their
+	 *     popcount() matches the number of parents in the given
+	 *     commit.
+	 */
+	for (i = 0; i < pm->nr; i++) {
+		struct pseudo_merge *candidate = use_pseudo_merge(pm, &pm->v[i]);
+		if (!candidate || candidate->satisfied)
+			continue;
+		if (!bitmap_equals_ewah(parents, candidate->commits))
+			continue;
+
+		match = candidate;
+		match->satisfied = 1;
+		break;
+	}
+
+	return match;
+}
diff --git a/pseudo-merge.h b/pseudo-merge.h
index 755edc054ae..2aca01d0566 100644
--- a/pseudo-merge.h
+++ b/pseudo-merge.h
@@ -206,4 +206,11 @@ int cascade_pseudo_merges(const struct pseudo_merge_map *pm,
 			  struct bitmap *result,
 			  struct bitmap *roots);
 
+/*
+ * Returns a pseudo-merge which contains the exact set of commits
+ * listed in the "parents" bitmap, or NULL if none could be found.
+ */
+struct pseudo_merge *pseudo_merge_for_parents(const struct pseudo_merge_map *pm,
+					      struct bitmap *parents);
+
 #endif
diff --git a/t/t5333-pseudo-merge-bitmaps.sh b/t/t5333-pseudo-merge-bitmaps.sh
index 4c9aebcffdc..f052f395a77 100755
--- a/t/t5333-pseudo-merge-bitmaps.sh
+++ b/t/t5333-pseudo-merge-bitmaps.sh
@@ -22,6 +22,10 @@ test_pseudo_merges_cascades () {
 	test_trace2_data bitmap pseudo_merges_cascades "$1"
 }
 
+test_pseudo_merges_reused () {
+	test_trace2_data pack-bitmap-write building_bitmaps_pseudo_merge_reused "$1"
+}
+
 tag_everything () {
 	git rev-list --all --no-object-names >in &&
 	perl -lne '
@@ -325,4 +329,65 @@ test_expect_success 'pseudo-merge overlap stale traversal' '
 	)
 '
 
+test_expect_success 'pseudo-merge reuse' '
+	git init pseudo-merge-reuse &&
+	(
+		cd pseudo-merge-reuse &&
+
+		stable="1641013200" && # 2022-01-01
+		unstable="1672549200" && # 2023-01-01
+
+		GIT_COMMITTER_DATE="$stable +0000" &&
+		export GIT_COMMITTER_DATE &&
+		test_commit_bulk --notick 128 &&
+		GIT_COMMITTER_DATE="$unstable +0000" &&
+		export GIT_COMMITTER_DATE &&
+		test_commit_bulk --notick 128 &&
+
+		tag_everything &&
+
+		git \
+			-c bitmapPseudoMerge.test.pattern="refs/tags/" \
+			-c bitmapPseudoMerge.test.maxMerges=1 \
+			-c bitmapPseudoMerge.test.threshold=now \
+			-c bitmapPseudoMerge.test.stableThreshold=$(($unstable - 1)) \
+			-c bitmapPseudoMerge.test.stableSize=512 \
+			repack -adb &&
+
+		test_pseudo_merges >merges &&
+		test_line_count = 2 merges &&
+
+		test_pseudo_merge_commits 0 >stable-oids.before &&
+		test_pseudo_merge_commits 1 >unstable-oids.before &&
+
+		: >trace2.txt &&
+		GIT_TRACE2_EVENT=$PWD/trace2.txt git \
+			-c bitmapPseudoMerge.test.pattern="refs/tags/" \
+			-c bitmapPseudoMerge.test.maxMerges=2 \
+			-c bitmapPseudoMerge.test.threshold=now \
+			-c bitmapPseudoMerge.test.stableThreshold=$(($unstable - 1)) \
+			-c bitmapPseudoMerge.test.stableSize=512 \
+			repack -adb &&
+
+		test_pseudo_merges_reused 1 <trace2.txt &&
+
+		test_pseudo_merges >merges &&
+		test_line_count = 3 merges &&
+
+		test_pseudo_merge_commits 0 >stable-oids.after &&
+		for i in 1 2
+		do
+			test_pseudo_merge_commits $i || return 1
+		done >unstable-oids.after &&
+
+		sort -u <stable-oids.before >expect &&
+		sort -u <stable-oids.after >actual &&
+		test_cmp expect actual &&
+
+		sort -u <unstable-oids.before >expect &&
+		sort -u <unstable-oids.after >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
2.45.1.175.gcf0316ad0e9



* [PATCH v4 24/24] t/perf: implement performance tests for pseudo-merge bitmaps
  2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
                     ` (22 preceding siblings ...)
  2024-05-23 21:27   ` [PATCH v4 23/24] pseudo-merge: implement support for finding existing merges Taylor Blau
@ 2024-05-23 21:27   ` Taylor Blau
  2024-05-25  3:26   ` [PATCH v4 00/24] pack-bitmap: pseudo-merge reachability bitmaps Jeff King
  24 siblings, 0 replies; 157+ messages in thread
From: Taylor Blau @ 2024-05-23 21:27 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano

Implement a straightforward performance test demonstrating the benefit
of pseudo-merge bitmaps by measuring how long it takes to count
reachable objects in a few different scenarios:

  - without bitmaps, to demonstrate a reasonable baseline
  - with bitmaps, but without pseudo-merges
  - with bitmaps and pseudo-merges

Results from running this test on git.git are as follows:

    Test                                                                this tree
    -----------------------------------------------------------------------------------
    5333.2: git rev-list --count --all --objects (no bitmaps)           3.54(3.45+0.08)
    5333.3: git rev-list --count --all --objects (no pseudo-merges)     0.43(0.40+0.03)
    5333.4: git rev-list --count --all --objects (with pseudo-merges)   0.12(0.11+0.01)

On a private repository which is much larger, and has many spikey parts
of history that aren't merged into the 'master' branch, the results are
as follows:

    Test                                                                this tree
    ---------------------------------------------------------------------------------------
    5333.1: git rev-list --count --all --objects (no bitmaps)           122.29(121.31+0.97)
    5333.2: git rev-list --count --all --objects (no pseudo-merges)     21.88(21.30+0.58)
    5333.3: git rev-list --count --all --objects (with pseudo-merges)   5.05(4.77+0.28)

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/perf/p5333-pseudo-merge-bitmaps.sh | 32 ++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
 create mode 100755 t/perf/p5333-pseudo-merge-bitmaps.sh

diff --git a/t/perf/p5333-pseudo-merge-bitmaps.sh b/t/perf/p5333-pseudo-merge-bitmaps.sh
new file mode 100755
index 00000000000..2e8b1d2635e
--- /dev/null
+++ b/t/perf/p5333-pseudo-merge-bitmaps.sh
@@ -0,0 +1,32 @@
+#!/bin/sh
+
+test_description='pseudo-merge bitmaps'
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+test_expect_success 'setup' '
+	git \
+		-c bitmapPseudoMerge.all.pattern="refs/" \
+		-c bitmapPseudoMerge.all.threshold=now \
+		-c bitmapPseudoMerge.all.stableThreshold=never \
+		-c bitmapPseudoMerge.all.maxMerges=64 \
+		-c pack.writeBitmapLookupTable=true \
+		repack -adb
+'
+
+test_perf 'git rev-list --count --all --objects (no bitmaps)' '
+	git rev-list --objects --all
+'
+
+test_perf 'git rev-list --count --all --objects (no pseudo-merges)' '
+	GIT_TEST_USE_PSEUDO_MERGES=0 \
+		git rev-list --objects --all --use-bitmap-index
+'
+
+test_perf 'git rev-list --count --all --objects (with pseudo-merges)' '
+	GIT_TEST_USE_PSEUDO_MERGES=1 \
+		git rev-list --objects --all --use-bitmap-index
+'
+
+test_done
-- 
2.45.1.175.gcf0316ad0e9


* Re: [PATCH v3 30/30] t/perf: implement performace tests for pseudo-merge bitmaps
  2024-05-23 19:53       ` Taylor Blau
@ 2024-05-25  3:13         ` Jeff King
  0 siblings, 0 replies; 157+ messages in thread
From: Jeff King @ 2024-05-25  3:13 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Patrick Steinhardt, Junio C Hamano

On Thu, May 23, 2024 at 03:53:08PM -0400, Taylor Blau wrote:

> -	GIT_TEST_USE_PSEDUO_MERGES=0 \
> +	GIT_TEST_USE_PSEUDO_MERGES=0 \

Whoops.

> Sure enough, that shows us a little gap between the "no pseudo-merges"
> and "with pseudo-merges" case:
> 
> ```
> Test                                                                this tree
> -----------------------------------------------------------------------------------
> 5333.2: git rev-list --count --all --objects (no bitmaps)           3.54(3.45+0.08)
> 5333.3: git rev-list --count --all --objects (no pseudo-merges)     0.43(0.40+0.03)
> 5333.4: git rev-list --count --all --objects (with pseudo-merges)   0.12(0.11+0.01)
> ```

OK, that seems more like it. 300ms is nice, but there's just not that
much improvement to make here.

This one is much more exciting:

> ```
> Test                                                                this tree
> ---------------------------------------------------------------------------------------
> 5333.1: git rev-list --count --all --objects (no bitmaps)           122.29(121.31+0.97)
> 5333.2: git rev-list --count --all --objects (no pseudo-merges)     21.88(21.30+0.58)
> 5333.3: git rev-list --count --all --objects (with pseudo-merges)   5.05(4.77+0.28)
> ```

Very nice improvement.

I wonder what we spend the final 5s on.  Maybe just book-keeping to
assemble all the tips (and maybe even parse tip commits? I can't
remember if we ever optimized that out). Anyway, that's all out of scope
for your series. Getting rid of the expensive traversal would let us
focus on those final bits. ;)

-Peff


* Re: [PATCH v3 00/30] pack-bitmap: pseudo-merge reachability bitmaps
  2024-05-23 20:04     ` Taylor Blau
@ 2024-05-25  3:15       ` Jeff King
  0 siblings, 0 replies; 157+ messages in thread
From: Jeff King @ 2024-05-25  3:15 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Patrick Steinhardt, Junio C Hamano

On Thu, May 23, 2024 at 04:04:15PM -0400, Taylor Blau wrote:

> On Thu, May 23, 2024 at 07:05:32AM -0400, Jeff King wrote:
> > I wonder if the start of the pseudo-merge section should have a 4-byte
> > version/flags field itself? I don't think that's something we've done
> > before, and maybe it's overkill. I dunno. It's just a lot easier to do
> > now than later.
> 
> I think the tricky thing here would be that the extension itself is a
> variable size, so every version would have to put the "extension size"
> field in the same place.
> 
> Otherwise, an older Git client which doesn't understand a future version
> of the pseudo-merge extension wouldn't know how large the extension is,
> and wouldn't be able to adjust the index_end field appropriately to skip
> over it.
> 
> Of course, we could make it a convention that says "all versions have to
> place the extension size field at the same relative offset", but it
> feels weird to read some of the extension while not understanding the
> whole thing.

Ah, yeah, I didn't think of that. That definitely complicates things.

It certainly would be possible to have a version+size header at the
start. Which is...basically reinventing the chunk format. Let's not
worry about it for now, and as a long-term thing we might consider
moving the bitmap format over to the chunk style.

-Peff


* Re: [PATCH v4 11/24] pseudo-merge: implement support for selecting pseudo-merge commits
  2024-05-23 21:26   ` [PATCH v4 11/24] pseudo-merge: implement support for selecting pseudo-merge commits Taylor Blau
@ 2024-05-25  3:22     ` Jeff King
  0 siblings, 0 replies; 157+ messages in thread
From: Jeff King @ 2024-05-25  3:22 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Patrick Steinhardt, Junio C Hamano

On Thu, May 23, 2024 at 05:26:42PM -0400, Taylor Blau wrote:

> +static uint32_t pseudo_merge_group_size(const struct pseudo_merge_group *group,
> +					const struct pseudo_merge_matches *matches,
> +					uint32_t i)
> +{
> +	double C = 0.0f;

This version mostly drops the "f" from floating point constants, since
they're now doubles. But this one doesn't.

I don't think it really matters in practice (the number obviously fits
in a float, and then it ends up as a double in the variable), so it's
really just a style / readability question.

Not worth a re-roll IMHO.

-Peff


* Re: [PATCH v4 19/24] t/test-lib-functions.sh: support `--notick` in `test_commit_bulk()`
  2024-05-23 21:27   ` [PATCH v4 19/24] t/test-lib-functions.sh: support `--notick` in `test_commit_bulk()` Taylor Blau
@ 2024-05-25  3:25     ` Jeff King
  0 siblings, 0 replies; 157+ messages in thread
From: Jeff King @ 2024-05-25  3:25 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Patrick Steinhardt, Junio C Hamano

On Thu, May 23, 2024 at 05:27:08PM -0400, Taylor Blau wrote:

> One of the tests we'll want to add for pseudo-merge bitmaps needs to be
> able to generate a large number of commits at a specific date.
> 
> Support the `--notick` option (with identical semantics to the
> `--notick` option for `test_commit()`) within `test_commit_bulk` as a
> prerequisite for that. Callers can then set the various _DATE variables
> themselves.

Looks good. I expected to see you add "--date" _also_, but it looks like
you just set $GIT_COMMITTER_DATE yourself. Which I think is fine for our
purposes here.

-Peff


* Re: [PATCH v4 00/24] pack-bitmap: pseudo-merge reachability bitmaps
  2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
                     ` (23 preceding siblings ...)
  2024-05-23 21:27   ` [PATCH v4 24/24] t/perf: implement performance tests for pseudo-merge bitmaps Taylor Blau
@ 2024-05-25  3:26   ` Jeff King
  24 siblings, 0 replies; 157+ messages in thread
From: Jeff King @ 2024-05-25  3:26 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Patrick Steinhardt, Junio C Hamano

On Thu, May 23, 2024 at 05:26:06PM -0400, Taylor Blau wrote:

> Here is another reroll of my topic to introduce pseudo-merge bitmaps.
> 
> The implementation is still relatively unchanged compared to last time,
> save for the review that Peff provided on the remaining parts of this
> series.

Thanks, this version looks good to me!

-Peff


end of thread, other threads:[~2024-05-25  3:26 UTC | newest]

Thread overview: 157+ messages
2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
2024-03-20 22:05 ` [PATCH 01/24] Documentation/technical: describe pseudo-merge bitmaps format Taylor Blau
2024-03-21 21:24   ` Junio C Hamano
2024-03-21 22:13     ` Taylor Blau
2024-03-21 22:22       ` Junio C Hamano
2024-03-20 22:05 ` [PATCH 02/24] config: repo_config_get_expiry() Taylor Blau
2024-04-10 17:54   ` Jeff King
2024-04-29 19:39     ` Taylor Blau
2024-03-20 22:05 ` [PATCH 03/24] ewah: implement `ewah_bitmap_is_subset()` Taylor Blau
2024-04-10 18:05   ` Jeff King
2024-04-29 19:47     ` Taylor Blau
2024-03-20 22:05 ` [PATCH 04/24] pack-bitmap: drop unused `max_bitmaps` parameter Taylor Blau
2024-04-10 18:06   ` Jeff King
2024-03-20 22:05 ` [PATCH 05/24] pack-bitmap: move some initialization to `bitmap_writer_init()` Taylor Blau
2024-04-10 18:10   ` Jeff King
2024-03-20 22:05 ` [PATCH 06/24] pseudo-merge.ch: initial commit Taylor Blau
2024-03-20 22:05 ` [PATCH 07/24] pack-bitmap-write: support storing pseudo-merge commits Taylor Blau
2024-03-20 22:05 ` [PATCH 08/24] pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()` Taylor Blau
2024-03-20 22:05 ` [PATCH 09/24] pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public Taylor Blau
2024-03-20 22:05 ` [PATCH 10/24] pseudo-merge: implement support for selecting pseudo-merge commits Taylor Blau
2024-03-20 22:05 ` [PATCH 11/24] pack-bitmap-write.c: select " Taylor Blau
2024-03-20 22:05 ` [PATCH 12/24] pack-bitmap-write.c: write pseudo-merge table Taylor Blau
2024-03-20 22:05 ` [PATCH 13/24] pack-bitmap: extract `read_bitmap()` function Taylor Blau
2024-03-20 22:05 ` [PATCH 14/24] pseudo-merge: scaffolding for reads Taylor Blau
2024-03-20 22:05 ` [PATCH 15/24] pack-bitmap.c: read pseudo-merge extension Taylor Blau
2024-03-20 22:05 ` [PATCH 16/24] pseudo-merge: implement support for reading pseudo-merge commits Taylor Blau
2024-03-20 22:05 ` [PATCH 17/24] ewah: implement `ewah_bitmap_popcount()` Taylor Blau
2024-03-20 22:05 ` [PATCH 18/24] pack-bitmap: implement test helpers for pseudo-merge Taylor Blau
2024-03-20 22:05 ` [PATCH 19/24] t/test-lib-functions.sh: support `--date` in `test_commit_bulk()` Taylor Blau
2024-03-20 22:05 ` [PATCH 20/24] pack-bitmap.c: use pseudo-merges during traversal Taylor Blau
2024-03-20 22:06 ` [PATCH 21/24] pack-bitmap: extra trace2 information Taylor Blau
2024-03-20 22:06 ` [PATCH 22/24] ewah: `bitmap_equals_ewah()` Taylor Blau
2024-03-20 22:06 ` [PATCH 23/24] pseudo-merge: implement support for finding existing merges Taylor Blau
2024-03-20 22:06 ` [PATCH 24/24] t/perf: implement performace tests for pseudo-merge bitmaps Taylor Blau
2024-03-21 19:50 ` [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Junio C Hamano
2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
2024-04-29 20:42   ` [PATCH v2 01/23] Documentation/technical: describe pseudo-merge bitmaps format Taylor Blau
2024-05-06 11:52     ` Patrick Steinhardt
2024-05-06 16:37       ` Taylor Blau
2024-05-10 11:46         ` Patrick Steinhardt
2024-05-13 19:47           ` Taylor Blau
2024-05-14  6:33             ` Patrick Steinhardt
2024-04-29 20:43   ` [PATCH v2 02/23] ewah: implement `ewah_bitmap_is_subset()` Taylor Blau
2024-04-29 20:43   ` [PATCH v2 03/23] pack-bitmap: drop unused `max_bitmaps` parameter Taylor Blau
2024-04-29 20:43   ` [PATCH v2 04/23] pack-bitmap: move some initialization to `bitmap_writer_init()` Taylor Blau
2024-05-06 11:52     ` Patrick Steinhardt
2024-05-06 18:24       ` Taylor Blau
2024-04-29 20:43   ` [PATCH v2 05/23] pseudo-merge.ch: initial commit Taylor Blau
2024-04-29 20:43   ` [PATCH v2 06/23] pack-bitmap-write: support storing pseudo-merge commits Taylor Blau
2024-05-06 11:52     ` Patrick Steinhardt
2024-05-06 18:48       ` Taylor Blau
2024-05-10 11:47         ` Patrick Steinhardt
2024-05-13 18:42     ` Jeff King
2024-05-13 20:19       ` Taylor Blau
2024-04-29 20:43   ` [PATCH v2 07/23] pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()` Taylor Blau
2024-04-29 20:43   ` [PATCH v2 08/23] pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public Taylor Blau
2024-05-13 18:50     ` Jeff King
2024-05-14  0:54       ` Taylor Blau
2024-04-29 20:43   ` [PATCH v2 09/23] pseudo-merge: implement support for selecting pseudo-merge commits Taylor Blau
2024-05-06 11:53     ` Patrick Steinhardt
2024-05-06 19:58       ` Taylor Blau
2024-05-13 19:03     ` Jeff King
2024-05-14  0:58       ` Taylor Blau
2024-05-16  8:07         ` Jeff King
2024-05-16 22:43           ` Junio C Hamano
2024-04-29 20:43   ` [PATCH v2 10/23] pack-bitmap-write.c: select " Taylor Blau
2024-05-06 11:53     ` Patrick Steinhardt
2024-05-06 20:05       ` Taylor Blau
2024-05-10 11:47         ` Patrick Steinhardt
2024-04-29 20:43   ` [PATCH v2 11/23] pack-bitmap-write.c: write pseudo-merge table Taylor Blau
2024-04-29 20:43   ` [PATCH v2 12/23] pack-bitmap: extract `read_bitmap()` function Taylor Blau
2024-04-29 20:43   ` [PATCH v2 13/23] pseudo-merge: scaffolding for reads Taylor Blau
2024-04-29 20:43   ` [PATCH v2 14/23] pack-bitmap.c: read pseudo-merge extension Taylor Blau
2024-04-29 20:44   ` [PATCH v2 15/23] pseudo-merge: implement support for reading pseudo-merge commits Taylor Blau
2024-04-29 20:44   ` [PATCH v2 16/23] ewah: implement `ewah_bitmap_popcount()` Taylor Blau
2024-04-29 20:44   ` [PATCH v2 17/23] pack-bitmap: implement test helpers for pseudo-merge Taylor Blau
2024-04-29 20:44   ` [PATCH v2 18/23] t/test-lib-functions.sh: support `--date` in `test_commit_bulk()` Taylor Blau
2024-04-29 20:44   ` [PATCH v2 19/23] pack-bitmap.c: use pseudo-merges during traversal Taylor Blau
2024-04-29 20:44   ` [PATCH v2 20/23] pack-bitmap: extra trace2 information Taylor Blau
2024-04-29 20:44   ` [PATCH v2 21/23] ewah: `bitmap_equals_ewah()` Taylor Blau
2024-04-29 20:44   ` [PATCH v2 22/23] pseudo-merge: implement support for finding existing merges Taylor Blau
2024-04-29 20:44   ` [PATCH v2 23/23] t/perf: implement performace tests for pseudo-merge bitmaps Taylor Blau
2024-04-30 20:03   ` [PATCH v2 00/23] pack-bitmap: pseudo-merge reachability bitmaps Junio C Hamano
2024-05-01 14:40     ` Taylor Blau
2024-05-21 19:01 ` [PATCH v3 00/30] " Taylor Blau
2024-05-21 19:01   ` [PATCH v3 01/30] object.h: add flags allocated by pack-bitmap.h Taylor Blau
2024-05-21 19:06     ` Taylor Blau
2024-05-21 19:01   ` [PATCH v3 07/30] Documentation/gitpacking.txt: initial commit Taylor Blau
2024-05-21 19:02   ` [PATCH v3 08/30] Documentation/gitpacking.txt: describe pseudo-merge bitmaps Taylor Blau
2024-05-21 19:02   ` [PATCH v3 09/30] Documentation/technical: describe pseudo-merge bitmaps format Taylor Blau
2024-05-21 19:02   ` [PATCH v3 10/30] ewah: implement `ewah_bitmap_is_subset()` Taylor Blau
2024-05-21 19:02   ` [PATCH v3 11/30] pack-bitmap: move some initialization to `bitmap_writer_init()` Taylor Blau
2024-05-21 19:02   ` [PATCH v3 12/30] pseudo-merge.ch: initial commit Taylor Blau
2024-05-21 19:02   ` [PATCH v3 13/30] pack-bitmap-write: support storing pseudo-merge commits Taylor Blau
2024-05-21 19:02   ` [PATCH v3 14/30] pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()` Taylor Blau
2024-05-21 19:02   ` [PATCH v3 15/30] pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public Taylor Blau
2024-05-21 19:02   ` [PATCH v3 16/30] config: introduce git_config_float() Taylor Blau
2024-05-23 10:02     ` Jeff King
2024-05-23 17:51       ` Taylor Blau
2024-05-21 19:02   ` [PATCH v3 17/30] pseudo-merge: implement support for selecting pseudo-merge commits Taylor Blau
2024-05-23 10:12     ` Jeff King
2024-05-23 17:56       ` Taylor Blau
2024-05-21 19:02   ` [PATCH v3 18/30] pack-bitmap-write.c: write pseudo-merge table Taylor Blau
2024-05-21 19:02   ` [PATCH v3 19/30] pack-bitmap: extract `read_bitmap()` function Taylor Blau
2024-05-21 19:02   ` [PATCH v3 20/30] pseudo-merge: scaffolding for reads Taylor Blau
2024-05-21 19:02   ` [PATCH v3 21/30] pack-bitmap.c: read pseudo-merge extension Taylor Blau
2024-05-21 19:02   ` [PATCH v3 22/30] pseudo-merge: implement support for reading pseudo-merge commits Taylor Blau
2024-05-23 10:40     ` Jeff King
2024-05-23 18:09       ` Taylor Blau
2024-05-21 19:02   ` [PATCH v3 23/30] ewah: implement `ewah_bitmap_popcount()` Taylor Blau
2024-05-21 19:02   ` [PATCH v3 24/30] pack-bitmap: implement test helpers for pseudo-merge Taylor Blau
2024-05-21 19:02   ` [PATCH v3 25/30] t/test-lib-functions.sh: support `--date` in `test_commit_bulk()` Taylor Blau
2024-05-23 10:42     ` Jeff King
2024-05-23 15:45       ` Junio C Hamano
2024-05-23 18:23         ` Taylor Blau
2024-05-21 19:03   ` [PATCH v3 26/30] pack-bitmap.c: use pseudo-merges during traversal Taylor Blau
2024-05-23 10:48     ` Jeff King
2024-05-23 18:23       ` Taylor Blau
2024-05-21 19:03   ` [PATCH v3 27/30] pack-bitmap: extra trace2 information Taylor Blau
2024-05-21 19:03   ` [PATCH v3 28/30] ewah: `bitmap_equals_ewah()` Taylor Blau
2024-05-21 19:03   ` [PATCH v3 29/30] pseudo-merge: implement support for finding existing merges Taylor Blau
2024-05-21 19:03   ` [PATCH v3 30/30] t/perf: implement performace tests for pseudo-merge bitmaps Taylor Blau
2024-05-23 10:54     ` Jeff King
2024-05-23 19:53       ` Taylor Blau
2024-05-25  3:13         ` Jeff King
2024-05-23 11:05   ` [PATCH v3 00/30] pack-bitmap: pseudo-merge reachability bitmaps Jeff King
2024-05-23 20:04     ` Taylor Blau
2024-05-25  3:15       ` Jeff King
2024-05-23 20:42     ` Taylor Blau
2024-05-23 21:26 ` [PATCH v4 00/24] " Taylor Blau
2024-05-23 21:26   ` [PATCH v4 01/24] Documentation/gitpacking.txt: initial commit Taylor Blau
2024-05-23 21:26   ` [PATCH v4 02/24] Documentation/gitpacking.txt: describe pseudo-merge bitmaps Taylor Blau
2024-05-23 21:26   ` [PATCH v4 03/24] Documentation/technical: describe pseudo-merge bitmaps format Taylor Blau
2024-05-23 21:26   ` [PATCH v4 04/24] ewah: implement `ewah_bitmap_is_subset()` Taylor Blau
2024-05-23 21:26   ` [PATCH v4 05/24] pack-bitmap: move some initialization to `bitmap_writer_init()` Taylor Blau
2024-05-23 21:26   ` [PATCH v4 06/24] pseudo-merge.ch: initial commit Taylor Blau
2024-05-23 21:26   ` [PATCH v4 07/24] pack-bitmap-write: support storing pseudo-merge commits Taylor Blau
2024-05-23 21:26   ` [PATCH v4 08/24] pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()` Taylor Blau
2024-05-23 21:26   ` [PATCH v4 09/24] pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public Taylor Blau
2024-05-23 21:26   ` [PATCH v4 10/24] config: introduce `git_config_double()` Taylor Blau
2024-05-23 21:26   ` [PATCH v4 11/24] pseudo-merge: implement support for selecting pseudo-merge commits Taylor Blau
2024-05-25  3:22     ` Jeff King
2024-05-23 21:26   ` [PATCH v4 12/24] pack-bitmap-write.c: write pseudo-merge table Taylor Blau
2024-05-23 21:26   ` [PATCH v4 13/24] pack-bitmap: extract `read_bitmap()` function Taylor Blau
2024-05-23 21:26   ` [PATCH v4 14/24] pseudo-merge: scaffolding for reads Taylor Blau
2024-05-23 21:26   ` [PATCH v4 15/24] pack-bitmap.c: read pseudo-merge extension Taylor Blau
2024-05-23 21:26   ` [PATCH v4 16/24] pseudo-merge: implement support for reading pseudo-merge commits Taylor Blau
2024-05-23 21:27   ` [PATCH v4 17/24] ewah: implement `ewah_bitmap_popcount()` Taylor Blau
2024-05-23 21:27   ` [PATCH v4 18/24] pack-bitmap: implement test helpers for pseudo-merge Taylor Blau
2024-05-23 21:27   ` [PATCH v4 19/24] t/test-lib-functions.sh: support `--notick` in `test_commit_bulk()` Taylor Blau
2024-05-25  3:25     ` Jeff King
2024-05-23 21:27   ` [PATCH v4 20/24] pack-bitmap.c: use pseudo-merges during traversal Taylor Blau
2024-05-23 21:27   ` [PATCH v4 21/24] pack-bitmap: extra trace2 information Taylor Blau
2024-05-23 21:27   ` [PATCH v4 22/24] ewah: `bitmap_equals_ewah()` Taylor Blau
2024-05-23 21:27   ` [PATCH v4 23/24] pseudo-merge: implement support for finding existing merges Taylor Blau
2024-05-23 21:27   ` [PATCH v4 24/24] t/perf: implement performance tests for pseudo-merge bitmaps Taylor Blau
2024-05-25  3:26   ` [PATCH v4 00/24] pack-bitmap: pseudo-merge reachability bitmaps Jeff King