* [PATCH 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format
@ 2022-06-20 12:33 Abhradeep Chakraborty via GitGitGadget
  2022-06-20 12:33 ` [PATCH 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
                   ` (6 more replies)
  0 siblings, 7 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-20 12:33 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Kaartic Sivaram, Abhradeep Chakraborty

When parsing the .bitmap file, git loads all the bitmaps one by one, even
if some of them are not needed. We can remove this overhead by loading only
the necessary bitmaps. A lookup table extension can solve this issue.

The proposed table has:

 * a list of nr_entries object ids. These objects are commits that have
   bitmaps. Ids are stored in lexicographic order (so they can be binary
   searched).
 * a list of <offset, xor-offset> pairs (4-byte integers, network-byte
   order). The i'th pair denotes the offset and xor-offset (respectively) of
   the bitmap of the i'th commit in the previous list. Both pieces of
   information are necessary because only in this way can a bitmap be found
   without parsing all the bitmaps.
 * a 4-byte integer for table-specific flags (none exist currently).

Whenever git wants to parse the bitmap for a specific commit, it will first
refer to the table and look up the offset and xor-offset for that commit.
Git will then parse the bitmap located at the offset position. The
xor-offset can be used to find the xor-bitmap for the bitmap (if any). This
process is recursive and ends when the xor-offset is null (stored as
0xffffffff in this series), i.e. when there is no xor-bitmap left.
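
The recursive lookup described above can be sketched roughly as follows.
This is only an illustration, not code from the series: `struct
table_entry`, `NO_XOR` and `xor_chain()` are hypothetical names, and the
table is assumed to be already decoded into host byte order, with the second
field holding a table position as in the format documentation of patch 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Sentinel meaning "this bitmap is not xor'd with anything". */
#define NO_XOR 0xffffffffu

/* Hypothetical decoded view of one table entry (not a git struct). */
struct table_entry {
	uint32_t offset;  /* where the commit's bitmap starts in the file */
	uint32_t xor_pos; /* table position of the xor base, or NO_XOR */
};

/*
 * Follow the xor chain starting at table position `pos`, storing each
 * visited position in `chain`. Returns the number of positions stored,
 * or 0 if a position is out of range or the chain is longer than `max`
 * (which also bounds the walk if the table contains a cycle).
 */
size_t xor_chain(const struct table_entry *table, size_t nr,
		 uint32_t pos, uint32_t *chain, size_t max)
{
	size_t n = 0;

	while (pos != NO_XOR) {
		if (pos >= nr || n == max)
			return 0;
		chain[n++] = pos;
		pos = table[pos].xor_pos;
	}
	return n;
}
```

A reader would then load the bitmaps at `table[chain[k]].offset`, starting
from the last element of the chain, so that each xor base is available
before the bitmap that depends on it.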

Abhradeep Chakraborty (5):
  Documentation/technical: describe bitmap lookup table extension
  pack-bitmap: prepare to read lookup table extension
  pack-bitmap-write.c: write lookup table extension
  bitmap-commit-table: add tests for the bitmap lookup table
  bitmap-lookup-table: add performance tests

Taylor Blau (1):
  builtin/pack-objects.c: learn pack.writeBitmapLookupTable

 Documentation/config/pack.txt             |   7 +
 Documentation/technical/bitmap-format.txt |  31 ++++
 builtin/pack-objects.c                    |   8 +
 pack-bitmap-write.c                       |  59 +++++++-
 pack-bitmap.c                             | 172 +++++++++++++++++++++-
 pack-bitmap.h                             |   1 +
 t/perf/p5310-pack-bitmaps.sh              |  60 +++++---
 t/perf/p5326-multi-pack-bitmaps.sh        |  55 ++++---
 t/t5310-pack-bitmaps.sh                   |  14 ++
 t/t5326-multi-pack-bitmaps.sh             |  19 +++
 10 files changed, 375 insertions(+), 51 deletions(-)


base-commit: 5699ec1b0aec51b9e9ba5a2785f65970c5a95d84
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1266%2FAbhra303%2Fbitmap-commit-table-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1266/Abhra303/bitmap-commit-table-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1266
-- 
gitgitgadget


* [PATCH 1/6] Documentation/technical: describe bitmap lookup table extension
  2022-06-20 12:33 [PATCH 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
@ 2022-06-20 12:33 ` Abhradeep Chakraborty via GitGitGadget
  2022-06-20 16:56   ` Derrick Stolee
                     ` (2 more replies)
  2022-06-20 12:33 ` [PATCH 2/6] pack-bitmap: prepare to read " Abhradeep Chakraborty via GitGitGadget
                   ` (5 subsequent siblings)
  6 siblings, 3 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-20 12:33 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Abhradeep Chakraborty,
	Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

When reading a bitmap file, git loads each and every bitmap one by one
even if not all of the bitmaps are required. A "bitmap lookup table"
extension to the bitmap format can reduce the overhead of loading
bitmaps. It stores a list of bitmapped commit oids, along with their
offset and xor offset. This way git can load only the necessary bitmaps
without loading the previous bitmaps.

Add some information for the new "bitmap lookup table" extension in the
bitmap-format documentation.
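
As a rough illustration of the layout described in this patch, the size of
the table follows directly from the number of bitmapped commits and the hash
length. The helper below is a sketch under that reading of the format;
`lookup_table_size()` is a hypothetical name, and the 64-bit result is a
choice made here so that the product cannot wrap for large entry counts.

```c
#include <stdint.h>

/*
 * Size in bytes of a lookup table holding `nr_entries` commits:
 *   nr_entries object names of `hash_len` bytes each,
 *   nr_entries <offset, xor> pairs of two 4-byte integers,
 *   one trailing 4-byte flags word.
 */
uint64_t lookup_table_size(uint32_t nr_entries, uint32_t hash_len)
{
	uint64_t per_entry = (uint64_t)hash_len + 2 * sizeof(uint32_t);

	return (uint64_t)nr_entries * per_entry + sizeof(uint32_t);
}
```

For example, three SHA-1 (20-byte) entries would take 3 * (20 + 8) + 4 = 88
bytes at the end of the `.bitmap` file.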

Co-Authored-by: Taylor Blau <ttaylorr@github.com>
Mentored-by: Taylor Blau <ttaylorr@github.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 Documentation/technical/bitmap-format.txt | 31 +++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index 04b3ec21785..34e98787b78 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -67,6 +67,14 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
 			pack/MIDX. The format and meaning of the name-hash is
 			described below.
 
+			** {empty}
+			BITMAP_OPT_LOOKUP_TABLE (0x10) : :::
+			If present, the end of the bitmap file contains a table
+			containing a list of `N` object ids, a list of pairs of
+			offset and xor offset of respective objects, and a 4-byte
+			integer denoting the flags (currently none). The format
+			and meaning of the table is described below.
+
 		4-byte entry count (network byte order)
 
 			The total count of entries (bitmapped commits) in this bitmap index.
@@ -205,3 +213,26 @@ Note that this hashing scheme is tied to the BITMAP_OPT_HASH_CACHE flag.
 If implementations want to choose a different hashing scheme, they are
 free to do so, but MUST allocate a new header flag (because comparing
 hashes made under two different schemes would be pointless).
+
+Commit lookup table
+-------------------
+
+If the BITMAP_OPT_LOOKUP_TABLE flag is set, the end of the `.bitmap`
+contains a lookup table specifying the positions of commits which have a
+bitmap.
+
+For a `.bitmap` containing `nr_entries` reachability bitmaps, the format
+is as follows:
+
+	- `nr_entries` object names.
+
+	- `nr_entries` pairs of 4-byte integers, each in network order.
+	  The first holds the offset from which that commit's bitmap can
+	  be read. The second holds the position, in the lexicographically
+	  sorted list of object names above, of the commit whose bitmap
+	  the current bitmap is xor'd with, or 0xffffffff if the current
+	  bitmap is not xor'd with anything.
+
+	- One 4-byte network byte order integer specifying
+	  table-specific flags. None exist currently, so this is always
+	  "0".
-- 
gitgitgadget



* [PATCH 2/6] pack-bitmap: prepare to read lookup table extension
  2022-06-20 12:33 [PATCH 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
  2022-06-20 12:33 ` [PATCH 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
@ 2022-06-20 12:33 ` Abhradeep Chakraborty via GitGitGadget
  2022-06-20 20:49   ` Derrick Stolee
  2022-06-20 22:06   ` Taylor Blau
  2022-06-20 12:33 ` [PATCH 3/6] pack-bitmap-write.c: write " Abhradeep Chakraborty via GitGitGadget
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-20 12:33 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Abhradeep Chakraborty,
	Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

The bitmap lookup table extension lets git parse only the necessary
bitmaps without loading the previous bitmaps one by one.

Teach git to read and use the bitmap lookup table extension.
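
The core of the read side is a binary search over the packed, sorted object
names, followed by reading big-endian integers from the matching slot. The
standalone sketch below shows that idea outside of git's own helpers.
`find_commit_pos()` and `read_be32()` are hypothetical names, and `HASH_LEN`
is fixed at 20 (the SHA-1 raw size) only for illustration, whereas the patch
uses `the_hash_algo->rawsz`.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HASH_LEN 20 /* assumption: SHA-1 raw object name length */

/* Compare two raw object names byte by byte. */
static int oid_cmp(const void *a, const void *b)
{
	return memcmp(a, b, HASH_LEN);
}

/*
 * Binary-search `needle` in `nr` object names stored back to back in
 * `oids` (sorted lexicographically). Returns the table position, or -1
 * if the commit has no bitmap.
 */
long find_commit_pos(const unsigned char *oids, size_t nr,
		     const unsigned char *needle)
{
	const unsigned char *found = bsearch(needle, oids, nr, HASH_LEN,
					     oid_cmp);

	return found ? (long)((found - oids) / HASH_LEN) : -1;
}

/* Decode a 4-byte network-order (big-endian) integer. */
uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

With a position `pos` in hand, the offset and xor fields of that commit
would sit at byte `pos * 8` and `pos * 8 + 4` of the pairs section, which is
the arithmetic `bitmap_offset_pos()` and `xor_position_pos()` perform in the
patch below.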

Co-Authored-by: Taylor Blau <ttaylorr@github.com>
Mentored-by: Taylor Blau <ttaylorr@github.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 pack-bitmap.c | 172 ++++++++++++++++++++++++++++++++++++++++++++++++--
 pack-bitmap.h |   1 +
 2 files changed, 166 insertions(+), 7 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 36134222d7a..d5e5973a79f 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -15,6 +15,7 @@
 #include "list-objects-filter-options.h"
 #include "midx.h"
 #include "config.h"
+#include "hash-lookup.h"
 
 /*
  * An entry on the bitmap index, representing the bitmap for a given
@@ -82,6 +83,13 @@ struct bitmap_index {
 	/* The checksum of the packfile or MIDX; points into map. */
 	const unsigned char *checksum;
 
+	/*
+	 * If not NULL, these point into the various commit table sections
+	 * (within map).
+	 */
+	unsigned char *table_lookup;
+	unsigned char *table_offsets;
+
 	/*
 	 * Extended index.
 	 *
@@ -185,6 +193,24 @@ static int load_bitmap_header(struct bitmap_index *index)
 			index->hashes = (void *)(index_end - cache_size);
 			index_end -= cache_size;
 		}
+
+		if (flags & BITMAP_OPT_LOOKUP_TABLE &&
+		    git_env_bool("GIT_READ_COMMIT_TABLE", 1)) {
+			uint32_t entry_count = ntohl(header->entry_count);
+			uint32_t table_size =
+				(entry_count * the_hash_algo->rawsz) /* oids */ +
+				(entry_count * sizeof(uint32_t)) /* offsets */ +
+				(entry_count * sizeof(uint32_t)) /* xor offsets */ +
+				(sizeof(uint32_t)) /* flags */;
+
+			if (table_size > index_end - index->map - header_size)
+				return error("corrupted bitmap index file (too short to fit commit table)");
+
+			index->table_lookup = (void *)(index_end - table_size);
+			index->table_offsets = index->table_lookup + the_hash_algo->rawsz * entry_count;
+
+			index_end -= table_size;
+		}
 	}
 
 	index->entry_count = ntohl(header->entry_count);
@@ -470,7 +496,7 @@ static int load_bitmap(struct bitmap_index *bitmap_git)
 		!(bitmap_git->tags = read_bitmap_1(bitmap_git)))
 		goto failed;
 
-	if (load_bitmap_entries_v1(bitmap_git) < 0)
+	if (!bitmap_git->table_lookup && load_bitmap_entries_v1(bitmap_git) < 0)
 		goto failed;
 
 	return 0;
@@ -557,14 +583,145 @@ struct include_data {
 	struct bitmap *seen;
 };
 
-struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
-				      struct commit *commit)
+static struct stored_bitmap *stored_bitmap_for_commit(struct bitmap_index *bitmap_git,
+						      struct commit *commit,
+						      uint32_t *pos_hint);
+
+static inline const unsigned char *bitmap_oid_pos(struct bitmap_index *bitmap_git,
+						  uint32_t pos)
+{
+	return bitmap_git->table_lookup + (pos * the_hash_algo->rawsz);
+}
+
+static inline const void *bitmap_offset_pos(struct bitmap_index *bitmap_git,
+					    uint32_t pos)
+{
+	return bitmap_git->table_offsets + (pos * 2 * sizeof(uint32_t));
+}
+
+static inline const void *xor_position_pos(struct bitmap_index *bitmap_git,
+					   uint32_t pos)
+{
+	return (unsigned char*) bitmap_offset_pos(bitmap_git, pos) + sizeof(uint32_t);
+}
+
+static int bitmap_lookup_cmp(const void *_va, const void *_vb)
+{
+	return hashcmp(_va, _vb);
+}
+
+static int bitmap_table_lookup(struct bitmap_index *bitmap_git,
+			       struct object_id *oid,
+			       uint32_t *commit_pos)
+{
+	unsigned char *found = bsearch(oid->hash, bitmap_git->table_lookup,
+				       bitmap_git->entry_count,
+				       the_hash_algo->rawsz, bitmap_lookup_cmp);
+	if (found)
+		*commit_pos = (found - bitmap_git->table_lookup) / the_hash_algo->rawsz;
+	return !!found;
+}
+
+static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_git,
+						    struct object_id *oid,
+						    uint32_t commit_pos)
+{
+	uint32_t xor_pos;
+	off_t bitmap_ofs;
+
+	int flags;
+	struct ewah_bitmap *bitmap;
+	struct stored_bitmap *xor_bitmap;
+
+	bitmap_ofs = get_be32(bitmap_offset_pos(bitmap_git, commit_pos));
+	xor_pos = get_be32(xor_position_pos(bitmap_git, commit_pos));
+
+	/*
+	 * Lazily load the xor'd bitmap if required (and we haven't done so
+	 * already). Make sure to pass the xor'd bitmap's position along as a
+	 * hint to avoid an unnecessary binary search in
+	 * stored_bitmap_for_commit().
+	 */
+	if (xor_pos == 0xffffffff) {
+		xor_bitmap = NULL;
+	} else {
+		struct commit *xor_commit;
+		struct object_id xor_oid;
+
+		oidread(&xor_oid, bitmap_oid_pos(bitmap_git, xor_pos));
+
+		xor_commit = lookup_commit(the_repository, &xor_oid);
+		if (!xor_commit)
+			return NULL;
+
+		xor_bitmap = stored_bitmap_for_commit(bitmap_git, xor_commit,
+						      &xor_pos);
+	}
+
+	/*
+	 * Don't bother reading the commit's index position or its xor
+	 * offset:
+	 *
+	 *   - The commit's index position is irrelevant to us, since
+	 *     load_bitmap_entries_v1 only uses it to learn the object
+	 *     id which is used to compute the hashmap's key. We already
+	 *     have an object id, so no need to look it up again.
+	 *
+	 *   - The xor_offset is unusable for us, since it specifies how
+	 *     many entries previous to ours we should look at. This
+	 *     makes sense when reading the bitmaps sequentially (as in
+	 *     load_bitmap_entries_v1()), since we can keep track of
+	 *     each bitmap as we read them.
+	 *
+	 *     But it can't work for us, since the bitmaps don't have a
+	 *     fixed size. So we learn the position of the xor'd bitmap
+	 *     from the commit table (and resolve it to a bitmap in the
+	 *     above if-statement).
+	 *
+	 * Instead, we can skip ahead and immediately read the flags and
+	 * ewah bitmap.
+	 */
+	bitmap_git->map_pos = bitmap_ofs + sizeof(uint32_t) + sizeof(uint8_t);
+	flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
+	bitmap = read_bitmap_1(bitmap_git);
+	if (!bitmap)
+		return NULL;
+
+	return store_bitmap(bitmap_git, bitmap, oid, xor_bitmap, flags);
+}
+
+static struct stored_bitmap *stored_bitmap_for_commit(struct bitmap_index *bitmap_git,
+						      struct commit *commit,
+						      uint32_t *pos_hint)
 {
 	khiter_t hash_pos = kh_get_oid_map(bitmap_git->bitmaps,
 					   commit->object.oid);
-	if (hash_pos >= kh_end(bitmap_git->bitmaps))
+	if (hash_pos >= kh_end(bitmap_git->bitmaps)) {
+		uint32_t commit_pos;
+		if (!bitmap_git->table_lookup)
+			return NULL;
+
+		/* NEEDSWORK: cache misses aren't recorded. */
+		if (pos_hint)
+			commit_pos = *pos_hint;
+		else if (!bitmap_table_lookup(bitmap_git,
+					      &commit->object.oid,
+					      &commit_pos))
+			return NULL;
+		return lazy_bitmap_for_commit(bitmap_git, &commit->object.oid,
+					      commit_pos);
+	}
+	return kh_value(bitmap_git->bitmaps, hash_pos);
+}
+
+struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
+				      struct commit *commit)
+{
+	struct stored_bitmap *sb = stored_bitmap_for_commit(bitmap_git, commit,
+							    NULL);
+	if (!sb)
 		return NULL;
-	return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos));
+	return lookup_stored_bitmap(sb);
 }
 
 static inline int bitmap_position_extended(struct bitmap_index *bitmap_git,
@@ -1699,8 +1856,9 @@ void test_bitmap_walk(struct rev_info *revs)
 	if (revs->pending.nr != 1)
 		die("you must specify exactly one commit to test");
 
-	fprintf(stderr, "Bitmap v%d test (%d entries loaded)\n",
-		bitmap_git->version, bitmap_git->entry_count);
+	if (!bitmap_git->table_lookup)
+		fprintf(stderr, "Bitmap v%d test (%d entries loaded)\n",
+			bitmap_git->version, bitmap_git->entry_count);
 
 	root = revs->pending.objects[0].item;
 	bm = bitmap_for_commit(bitmap_git, (struct commit *)root);
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 3d3ddd77345..37f86787a4d 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -26,6 +26,7 @@ struct bitmap_disk_header {
 enum pack_bitmap_opts {
 	BITMAP_OPT_FULL_DAG = 1,
 	BITMAP_OPT_HASH_CACHE = 4,
+	BITMAP_OPT_LOOKUP_TABLE = 16,
 };
 
 enum pack_bitmap_flags {
-- 
gitgitgadget



* [PATCH 3/6] pack-bitmap-write.c: write lookup table extension
  2022-06-20 12:33 [PATCH 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
  2022-06-20 12:33 ` [PATCH 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
  2022-06-20 12:33 ` [PATCH 2/6] pack-bitmap: prepare to read " Abhradeep Chakraborty via GitGitGadget
@ 2022-06-20 12:33 ` Abhradeep Chakraborty via GitGitGadget
  2022-06-20 22:16   ` Taylor Blau
  2022-06-20 12:33 ` [PATCH 4/6] builtin/pack-objects.c: learn pack.writeBitmapLookupTable Taylor Blau via GitGitGadget
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-20 12:33 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Abhradeep Chakraborty,
	Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

Teach git to write the bitmap lookup table extension. The table has the
following information:

    - `N` object ids, one for each bitmapped commit

    - A list of <offset, xor-offset> pairs; the i'th pair denotes the
      offset and xor-offset of the i'th commit in the previous list.

    - A 4-byte integer denoting the flags
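
The writer has to translate each bitmap's relative xor offset (how many
entries back in write order) into a position in the sorted table. A sketch
of that translation, using a sort permutation and its inverse, is shown
below. It is not the patch's code: `build_xor_positions()` is a hypothetical
helper, plain integers stand in for object ids, and an insertion sort
replaces QSORT() to keep the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_XOR 0xffffffffu

/*
 * keys[i]       sort key standing in for the oid of the i'th commit in
 *               write order
 * xor_offset[i] how many entries back its xor base is (0 = none)
 * table[j]      (out) write-order index of the j'th entry in sorted order
 * table_inv[i]  (out) sorted position of write-order index i
 * xor_pos[j]    (out) sorted position of the xor base of the j'th sorted
 *               entry, or NO_XOR
 */
void build_xor_positions(const uint32_t *keys, const uint8_t *xor_offset,
			 size_t nr, uint32_t *table, uint32_t *table_inv,
			 uint32_t *xor_pos)
{
	size_t i, j;

	/* Insertion sort of write-order indices by key. */
	for (i = 0; i < nr; i++) {
		j = i;
		while (j > 0 && keys[table[j - 1]] > keys[i]) {
			table[j] = table[j - 1];
			j--;
		}
		table[j] = (uint32_t)i;
	}

	/* Invert the permutation. */
	for (j = 0; j < nr; j++)
		table_inv[table[j]] = (uint32_t)j;

	/* Map "N entries back in write order" to a sorted position. */
	for (j = 0; j < nr; j++) {
		uint32_t w = table[j];

		xor_pos[j] = xor_offset[w] ? table_inv[w - xor_offset[w]]
					   : NO_XOR;
	}
}
```

The inverse permutation is what makes the last loop a constant-time lookup
per entry, instead of a search through the sorted list for each xor base.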

Co-authored-by: Taylor Blau <ttaylorr@github.com>
Mentored-by: Taylor Blau <ttaylorr@github.com>
Co-mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 pack-bitmap-write.c | 59 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 57 insertions(+), 2 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index c43375bd344..9e88a64dd65 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -650,7 +650,8 @@ static const struct object_id *oid_access(size_t pos, const void *table)
 
 static void write_selected_commits_v1(struct hashfile *f,
 				      struct pack_idx_entry **index,
-				      uint32_t index_nr)
+				      uint32_t index_nr,
+				      off_t *offsets)
 {
 	int i;
 
@@ -663,6 +664,9 @@ static void write_selected_commits_v1(struct hashfile *f,
 		if (commit_pos < 0)
 			BUG("trying to write commit not in index");
 
+		if (offsets)
+			offsets[i] = hashfile_total(f);
+
 		hashwrite_be32(f, commit_pos);
 		hashwrite_u8(f, stored->xor_offset);
 		hashwrite_u8(f, stored->flags);
@@ -671,6 +675,49 @@ static void write_selected_commits_v1(struct hashfile *f,
 	}
 }
 
+static int table_cmp(const void *_va, const void *_vb)
+{
+	return oidcmp(&writer.selected[*(uint32_t*)_va].commit->object.oid,
+		      &writer.selected[*(uint32_t*)_vb].commit->object.oid);
+}
+
+static void write_lookup_table(struct hashfile *f,
+			       off_t *offsets)
+{
+	uint32_t i;
+	uint32_t flags = 0;
+	uint32_t *table, *table_inv;
+
+	ALLOC_ARRAY(table, writer.selected_nr);
+	ALLOC_ARRAY(table_inv, writer.selected_nr);
+
+	for (i = 0; i < writer.selected_nr; i++)
+		table[i] = i;
+	QSORT(table, writer.selected_nr, table_cmp);
+	for (i = 0; i < writer.selected_nr; i++)
+		table_inv[table[i]] = i;
+
+	for (i = 0; i < writer.selected_nr; i++) {
+		struct bitmapped_commit *selected = &writer.selected[table[i]];
+		struct object_id *oid = &selected->commit->object.oid;
+
+		hashwrite(f, oid->hash, the_hash_algo->rawsz);
+	}
+	for (i = 0; i < writer.selected_nr; i++) {
+		struct bitmapped_commit *selected = &writer.selected[table[i]];
+
+		hashwrite_be32(f, offsets[table[i]]);
+		hashwrite_be32(f, selected->xor_offset
+			       ? table_inv[table[i] - selected->xor_offset]
+			       : 0xffffffff);
+	}
+
+	hashwrite_be32(f, flags);
+
+	free(table);
+	free(table_inv);
+}
+
 static void write_hash_cache(struct hashfile *f,
 			     struct pack_idx_entry **index,
 			     uint32_t index_nr)
@@ -695,6 +742,7 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 {
 	static uint16_t default_version = 1;
 	static uint16_t flags = BITMAP_OPT_FULL_DAG;
+	off_t *offsets = NULL;
 	struct strbuf tmp_file = STRBUF_INIT;
 	struct hashfile *f;
 
@@ -715,8 +763,14 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 	dump_bitmap(f, writer.trees);
 	dump_bitmap(f, writer.blobs);
 	dump_bitmap(f, writer.tags);
-	write_selected_commits_v1(f, index, index_nr);
 
+	if (options & BITMAP_OPT_LOOKUP_TABLE)
+		CALLOC_ARRAY(offsets, index_nr);
+
+	write_selected_commits_v1(f, index, index_nr, offsets);
+
+	if (options & BITMAP_OPT_LOOKUP_TABLE)
+		write_lookup_table(f, offsets);
 	if (options & BITMAP_OPT_HASH_CACHE)
 		write_hash_cache(f, index, index_nr);
 
@@ -730,4 +784,5 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 		die_errno("unable to rename temporary bitmap file to '%s'", filename);
 
 	strbuf_release(&tmp_file);
+	free(offsets);
 }
-- 
gitgitgadget



* [PATCH 4/6] builtin/pack-objects.c: learn pack.writeBitmapLookupTable
  2022-06-20 12:33 [PATCH 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
                   ` (2 preceding siblings ...)
  2022-06-20 12:33 ` [PATCH 3/6] pack-bitmap-write.c: write " Abhradeep Chakraborty via GitGitGadget
@ 2022-06-20 12:33 ` Taylor Blau via GitGitGadget
  2022-06-20 22:18   ` Taylor Blau
  2022-06-20 12:33 ` [PATCH 5/6] bitmap-commit-table: add tests for the bitmap lookup table Abhradeep Chakraborty via GitGitGadget
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 162+ messages in thread
From: Taylor Blau via GitGitGadget @ 2022-06-20 12:33 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Kaartic Sivaram, Abhradeep Chakraborty, Taylor Blau

From: Taylor Blau <ttaylorr@github.com>

Teach git a config option, 'pack.writeBitmapLookupTable', that lets users
enable or disable the bitmap lookup table extension.

Signed-off-by: Taylor Blau <ttaylorr@github.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 Documentation/config/pack.txt | 7 +++++++
 builtin/pack-objects.c        | 8 ++++++++
 2 files changed, 15 insertions(+)

diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index ad7f73a1ead..e12008d2415 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -164,6 +164,13 @@ When writing a multi-pack reachability bitmap, no new namehashes are
 computed; instead, any namehashes stored in an existing bitmap are
 permuted into their appropriate location when writing a new bitmap.
 
+pack.writeBitmapLookupTable::
+	When true, git will include a "lookup table" section in the
+	bitmap index (if one is written). This table is used to defer
+	loading individual bitmaps as late as possible. This can be
+	beneficial in repositories which have relatively large bitmap
+	indexes. Defaults to false.
+
 pack.writeReverseIndex::
 	When true, git will write a corresponding .rev file (see:
 	link:../technical/pack-format.html[Documentation/technical/pack-format.txt])
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index cc5f41086da..3ba20301980 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3148,6 +3148,14 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 		else
 			write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE;
 	}
+
+	if (!strcmp(k, "pack.writebitmaplookuptable")) {
+		if (git_config_bool(k, v))
+			write_bitmap_options |= BITMAP_OPT_LOOKUP_TABLE;
+		else
+			write_bitmap_options &= ~BITMAP_OPT_LOOKUP_TABLE;
+	}
+
 	if (!strcmp(k, "pack.usebitmaps")) {
 		use_bitmap_index_default = git_config_bool(k, v);
 		return 0;
-- 
gitgitgadget



* [PATCH 5/6] bitmap-commit-table: add tests for the bitmap lookup table
  2022-06-20 12:33 [PATCH 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
                   ` (3 preceding siblings ...)
  2022-06-20 12:33 ` [PATCH 4/6] builtin/pack-objects.c: learn pack.writeBitmapLookupTable Taylor Blau via GitGitGadget
@ 2022-06-20 12:33 ` Abhradeep Chakraborty via GitGitGadget
  2022-06-22 16:54   ` Taylor Blau
  2022-06-20 12:33 ` [PATCH 6/6] bitmap-lookup-table: add performance tests Abhradeep Chakraborty via GitGitGadget
  2022-06-26 13:10 ` [PATCH v2 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
  6 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-20 12:33 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Abhradeep Chakraborty,
	Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

Add tests that exercise the newly implemented lookup table.

Mentored-by: Taylor Blau <ttaylorr@github.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 t/t5310-pack-bitmaps.sh       | 14 ++++++++++++++
 t/t5326-multi-pack-bitmaps.sh | 19 +++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index f775fc1ce69..f05d3e6ace7 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -43,6 +43,20 @@ test_expect_success 'full repack creates bitmaps' '
 
 basic_bitmap_tests
 
+test_expect_success 'using lookup table does not affect basic bitmap tests' '
+	test_config pack.writeBitmapLookupTable true &&
+	git repack -adb
+'
+basic_bitmap_tests
+
+test_expect_success 'using lookup table does not parse each entry one by one' '
+	test_config pack.writeBitmapLookupTable true &&
+	git repack -adb &&
+	git rev-list --test-bitmap HEAD 2>out &&
+	grep "Found bitmap for" out &&
+	! grep "Bitmap v1 test " out
+'
+
 test_expect_success 'incremental repack fails when bitmaps are requested' '
 	test_commit more-1 &&
 	test_must_fail git repack -d 2>err &&
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 4fe57414c13..85fbdf5e4bb 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -306,5 +306,24 @@ test_expect_success 'graceful fallback when missing reverse index' '
 		! grep "ignoring extra bitmap file" err
 	)
 '
+test_expect_success 'multi-pack-index write --bitmap writes lookup table if enabled' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+		test_commit_bulk 106 &&
+
+		git repack -d &&
+
+		git config pack.writeBitmapLookupTable true &&
+		git multi-pack-index write --bitmap &&
+
+		git rev-list --test-bitmap HEAD 2>out &&
+		grep "Found bitmap for" out &&
+		! grep "Bitmap v1 test " out
+
+	)
+'
 
 test_done
-- 
gitgitgadget



* [PATCH 6/6] bitmap-lookup-table: add performance tests
  2022-06-20 12:33 [PATCH 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
                   ` (4 preceding siblings ...)
  2022-06-20 12:33 ` [PATCH 5/6] bitmap-commit-table: add tests for the bitmap lookup table Abhradeep Chakraborty via GitGitGadget
@ 2022-06-20 12:33 ` Abhradeep Chakraborty via GitGitGadget
  2022-06-22 17:14   ` Taylor Blau
  2022-06-26 13:10 ` [PATCH v2 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
  6 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-20 12:33 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Abhradeep Chakraborty,
	Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

Add performance tests for the bitmap lookup table extension.

Mentored-by: Taylor Blau <ttaylorr@github.com>
Co-mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 t/perf/p5310-pack-bitmaps.sh       | 60 +++++++++++++++++++-----------
 t/perf/p5326-multi-pack-bitmaps.sh | 55 +++++++++++++++++----------
 2 files changed, 73 insertions(+), 42 deletions(-)

diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index 7ad4f237bc3..a8d9414de92 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -10,10 +10,11 @@ test_perf_large_repo
 # since we want to be able to compare bitmap-aware
 # git versus non-bitmap git
 #
-# We intentionally use the deprecated pack.writebitmaps
+# We intentionally use the deprecated pack.writeBitmaps
 # config so that we can test against older versions of git.
 test_expect_success 'setup bitmap config' '
-	git config pack.writebitmaps true
+	git config pack.writeBitmaps true &&
+	git config pack.writeReverseIndex true
 '
 
 # we need to create the tag up front such that it is covered by the repack and
@@ -28,27 +29,42 @@ test_perf 'repack to disk' '
 
 test_full_bitmap
 
-test_expect_success 'create partial bitmap state' '
-	# pick a commit to represent the repo tip in the past
-	cutoff=$(git rev-list HEAD~100 -1) &&
-	orig_tip=$(git rev-parse HEAD) &&
-
-	# now kill off all of the refs and pretend we had
-	# just the one tip
-	rm -rf .git/logs .git/refs/* .git/packed-refs &&
-	git update-ref HEAD $cutoff &&
-
-	# and then repack, which will leave us with a nice
-	# big bitmap pack of the "old" history, and all of
-	# the new history will be loose, as if it had been pushed
-	# up incrementally and exploded via unpack-objects
-	git repack -Ad &&
-
-	# and now restore our original tip, as if the pushes
-	# had happened
-	git update-ref HEAD $orig_tip
+test_expect_success 'use lookup table' '
+	git config pack.writeBitmapLookupTable true
 '
 
-test_partial_bitmap
+test_perf 'repack to disk (lookup table)' '
+    git repack -adb
+'
+
+test_full_bitmap
+
+for i in false true
+do
+	$i && lookup=" (lookup table)"
+	test_expect_success "create partial bitmap state$lookup" '
+		git config pack.writeBitmapLookupTable '"$i"' &&
+		# pick a commit to represent the repo tip in the past
+		cutoff=$(git rev-list HEAD~100 -1) &&
+		orig_tip=$(git rev-parse HEAD) &&
+
+		# now kill off all of the refs and pretend we had
+		# just the one tip
+		rm -rf .git/logs .git/refs/* .git/packed-refs &&
+		git update-ref HEAD $cutoff &&
+
+		# and then repack, which will leave us with a nice
+		# big bitmap pack of the "old" history, and all of
+		# the new history will be loose, as if it had been pushed
+		# up incrementally and exploded via unpack-objects
+		git repack -Ad &&
+
+		# and now restore our original tip, as if the pushes
+		# had happened
+		git update-ref HEAD $orig_tip
+	'
+
+	test_partial_bitmap
+done
 
 test_done
diff --git a/t/perf/p5326-multi-pack-bitmaps.sh b/t/perf/p5326-multi-pack-bitmaps.sh
index f2fa228f16a..9001eb4533e 100755
--- a/t/perf/p5326-multi-pack-bitmaps.sh
+++ b/t/perf/p5326-multi-pack-bitmaps.sh
@@ -26,27 +26,42 @@ test_expect_success 'drop pack bitmap' '
 
 test_full_bitmap
 
-test_expect_success 'create partial bitmap state' '
-	# pick a commit to represent the repo tip in the past
-	cutoff=$(git rev-list HEAD~100 -1) &&
-	orig_tip=$(git rev-parse HEAD) &&
-
-	# now pretend we have just one tip
-	rm -rf .git/logs .git/refs/* .git/packed-refs &&
-	git update-ref HEAD $cutoff &&
-
-	# and then repack, which will leave us with a nice
-	# big bitmap pack of the "old" history, and all of
-	# the new history will be loose, as if it had been pushed
-	# up incrementally and exploded via unpack-objects
-	git repack -Ad &&
-	git multi-pack-index write --bitmap &&
-
-	# and now restore our original tip, as if the pushes
-	# had happened
-	git update-ref HEAD $orig_tip
+test_expect_success 'use lookup table' '
+	git config pack.writeBitmapLookupTable true
 '
 
-test_partial_bitmap
+test_perf 'setup multi-pack-index (lookup table)' '
+	git multi-pack-index write --bitmap
+'
+
+test_full_bitmap
+
+for i in false true
+do
+	$i && lookup=" (lookup table)"
+	test_expect_success "create partial bitmap state$lookup" '
+		git config pack.writeBitmapLookupTable '"$i"' &&
+		# pick a commit to represent the repo tip in the past
+		cutoff=$(git rev-list HEAD~100 -1) &&
+		orig_tip=$(git rev-parse HEAD) &&
+
+		# now pretend we have just one tip
+		rm -rf .git/logs .git/refs/* .git/packed-refs &&
+		git update-ref HEAD $cutoff &&
+
+		# and then repack, which will leave us with a nice
+		# big bitmap pack of the "old" history, and all of
+		# the new history will be loose, as if it had been pushed
+		# up incrementally and exploded via unpack-objects
+		git repack -Ad &&
+		git multi-pack-index write --bitmap &&
+
+		# and now restore our original tip, as if the pushes
+		# had happened
+		git update-ref HEAD $orig_tip
+	'
+
+	test_partial_bitmap
+done
 
 test_done
-- 
gitgitgadget


* Re: [PATCH 1/6] Documentation/technical: describe bitmap lookup table extension
  2022-06-20 12:33 ` [PATCH 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
@ 2022-06-20 16:56   ` Derrick Stolee
  2022-06-20 17:09     ` Taylor Blau
  2022-06-21  8:23     ` Abhradeep Chakraborty
  2022-06-20 17:21   ` Taylor Blau
  2022-06-20 20:21   ` Derrick Stolee
  2 siblings, 2 replies; 162+ messages in thread
From: Derrick Stolee @ 2022-06-20 16:56 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget, git
  Cc: Taylor Blau, Kaartic Sivaram, Abhradeep Chakraborty

On 6/20/2022 8:33 AM, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> 
> When reading bitmap file, git loads each and every bitmap one by one
> even if all the bitmaps are not required. A "bitmap lookup table"
> extension to the bitmap format can reduce the overhead of loading
> bitmaps which stores a list of bitmapped commit oids, along with their
> offset and xor offset. This way git can load only the neccesary bitmaps
> without loading the previous bitmaps.
> 
> Add some information for the new "bitmap lookup table" extension in the
> bitmap-format documentation.


> @@ -67,6 +67,14 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
>  			pack/MIDX. The format and meaning of the name-hash is
>  			described below.
>  
> +			** {empty}
> +			BITMAP_OPT_LOOKUP_TABLE (0xf) : :::
> +			If present, the end of the bitmap file contains a table
> +			containing a list of `N` object ids, a list of pairs of
> +			offset and xor offset of respective objects, and 4-byte
> +			integer denoting the flags (currently none). The format
> +			and meaning of the table is described below.
> +

Here, you are adding a new flag that indicates that the end of the file
contains this extra extension. This works because the size of the
extension is predictable. As long as any future extensions are also of
a predictable size, then we can continue adding them via flags in this
way.

This is better than updating the full file format to do something like
using the chunk format API, especially because this format is shared
across other tools (JGit being mentioned frequently).

It might be worth mentioning in your commit message what happens when an
older version of Git (or JGit) notices this flag. Does it refuse to
operate on the .bitmap file? Does it give a warning or die? It would be
nice if this extension could be ignored (it seems like adding the extra
data at the end does not stop the bitmap data from being understood).

> +
> +Commit lookup table
> +-------------------
> +
> +If the BITMAP_OPT_LOOKUP_TABLE flag is set, the end of the `.bitmap`
> +contains a lookup table specifying the positions of commits which have a
> +bitmap.

Perhaps it would be better to say "the last N * (HASH_LEN + 8) + 4 bytes
preceding the trailing hash" or something? This gives us a concrete way
to compute the start of the table, while also being clear that the table
is included in the trailing hash.
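
As a rough sketch of that computation (not code from the patch): the
helper name below is hypothetical, and it assumes the lookup table is
the only extension and sits directly before the trailing checksum.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: byte offset at which the lookup table starts,
 * assuming no other extension lies between the table and the trailing
 * checksum. Returns 0 if the file is too short to hold the table.
 * Overflow checks are omitted for brevity.
 */
static size_t lookup_table_start(size_t file_size, size_t hash_len,
				 uint32_t nr_entries)
{
	/* N * (HASH_LEN + 8) + 4: oids, <offset, xor> pairs, flags */
	size_t table_size = (size_t)nr_entries * (hash_len + 8) + 4;

	if (file_size < hash_len || file_size - hash_len < table_size)
		return 0;
	return file_size - hash_len - table_size;
}
```

With a 20-byte hash and 10 entries the table occupies 284 bytes, so in
a 1000-byte file it would start at offset 696.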

> +For a `.bitmap` containing `nr_entries` reachability bitmaps, the format
> +is as follows:
> +
> +	- `nr_entries` object names.

Could you expand on this to say that these objects are commit OIDs, one
for each bitmap in the file? Are they sorted in lexicographical order for
binary search, or are we expecting to read the entire table into a
hashtable in-memory?

> +	- `nr_entries` pairs of 4-byte integers, each in network order.
> +	  The first holds the offset from which that commit's bitmap can
> +	  be read. The second number holds the position of the commit
> +	  whose bitmap the current bitmap is xor'd with in lexicographic
> +	  order, or 0xffffffff if the current commit is not xor'd with
> +	  anything.

Interesting to give the xor chains directions here. You say "position"
here for the second commit: do you mean within the list of object names
as opposed to the offset? That would make the most sense so we can trace
the full list of XORs we need to make all at once.

Are .bitmap files already constrained to 4GB, so these 32-bit offsets
make sense? Using 64-bit offsets would be a small cost here, I think,
without needing to do any fancy "overflow" tables that could introduce
a variable-length extension.

> +	- One 4-byte network byte order integer specifying
> +	  table-specific flags. None exist currently, so this is always
> +	  "0".

I'm guessing this is at the end of the extension because a future flag
could modify the length of the extension, so we need the flags to be
in a predictable location. Could we make that clear somewhere?

How does Git react to seeing flags here that it does not recognize?
It seems that Git should ignore the lookup table but continue using the
rest of the .bitmap file as it did before, yes?

Thanks,
-Stolee


* Re: [PATCH 1/6] Documentation/technical: describe bitmap lookup table extension
  2022-06-20 16:56   ` Derrick Stolee
@ 2022-06-20 17:09     ` Taylor Blau
  2022-06-21  8:31       ` Abhradeep Chakraborty
  2022-06-21  8:23     ` Abhradeep Chakraborty
  1 sibling, 1 reply; 162+ messages in thread
From: Taylor Blau @ 2022-06-20 17:09 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Kaartic Sivaram,
	Abhradeep Chakraborty

On Mon, Jun 20, 2022 at 12:56:27PM -0400, Derrick Stolee wrote:
> On 6/20/2022 8:33 AM, Abhradeep Chakraborty via GitGitGadget wrote:
> > From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> >
> > When reading bitmap file, git loads each and every bitmap one by one
> > even if all the bitmaps are not required. A "bitmap lookup table"
> > extension to the bitmap format can reduce the overhead of loading
> > bitmaps which stores a list of bitmapped commit oids, along with their
> > offset and xor offset. This way git can load only the neccesary bitmaps
> > without loading the previous bitmaps.
> >
> > Add some information for the new "bitmap lookup table" extension in the
> > bitmap-format documentation.
>
>
> > @@ -67,6 +67,14 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
> >  			pack/MIDX. The format and meaning of the name-hash is
> >  			described below.
> >
> > +			** {empty}
> > +			BITMAP_OPT_LOOKUP_TABLE (0xf) : :::
> > +			If present, the end of the bitmap file contains a table
> > +			containing a list of `N` object ids, a list of pairs of
> > +			offset and xor offset of respective objects, and 4-byte
> > +			integer denoting the flags (currently none). The format
> > +			and meaning of the table is described below.
> > +
>
> Here, you are adding a new flag that indicates that the end of the file
> contains this extra extension. This works because the size of the
> extension is predictable. As long as any future extensions are also of
> a predictable size, then we can continue adding them via flags in this
> way.

Right; any extensions that are added to the existing .bitmap format must
have a size that is predictable in order for readers to locate the next
extension, if any.

> This is better than updating the full file format to do something like
> like use the chunk format API, especially because this format is shared
> across other tools (JGit being mentioned frequently).

Agreed. Abhradeep and I discussed whether or not it was worth exploring
a new .bitmap format, and the consensus we reached was that it may be
required in the future (if we explored a compression scheme other than
EWAH or made some other backwards-incompatible change), but as of yet it
isn't necessary. So we avoided it to eliminate unnecessary churn,
especially of on-disk formats.

> It might be worth mentioning in your commit message what happens when an
> older version of Git (or JGit) notices this flag. Does it refuse to
> operate on the .bitmap file? Does it give a warning or die? It would be
> nice if this extension could be ignored (it seems like adding the extra
> data at the end does not stop the bitmap data from being understood).

I agree. The bitmap reader does not warn or die when it sees
unrecognized extensions, that way new extensions can be added without
rendering all previously-written bitmaps useless. But in order to
understand an extension on bit N, the reader must also understand
extensions N-1, N-2, and so on (in order to locate the end of
extension N).
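
To illustrate why the sizes must be known, here is a minimal sketch of
a reader walking backwards from the trailing checksum. The function
name and the SKETCH_* constants are made up for this example, and it
assumes the hash cache takes 4 bytes per object and is the extension
closest to the checksum, with the lookup table before it.

```c
#include <stddef.h>
#include <stdint.h>

enum {
	SKETCH_OPT_HASH_CACHE = 0x4,
	SKETCH_OPT_LOOKUP_TABLE = 0x10,
};

/*
 * Hypothetical sketch: returns the offset at which the bitmap entries
 * end, i.e. where the first trailing extension begins. Each extension
 * is skipped by subtracting its (predictable) size. Bounds and
 * overflow checks are omitted for brevity.
 */
static size_t bitmap_data_end(size_t file_size, size_t hash_len,
			      uint32_t flags, uint32_t nr_objects,
			      uint32_t nr_entries)
{
	size_t end = file_size - hash_len; /* skip trailing checksum */

	if (flags & SKETCH_OPT_HASH_CACHE)
		end -= (size_t)nr_objects * 4;
	if (flags & SKETCH_OPT_LOOKUP_TABLE)
		end -= (size_t)nr_entries * (hash_len + 8) + 4;
	return end;
}
```

A reader that did not know the size of the hash cache could not
subtract it, and so could not find where the lookup table starts.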

> > +	- `nr_entries` pairs of 4-byte integers, each in network order.
> > +	  The first holds the offset from which that commit's bitmap can
> > +	  be read. The second number holds the position of the commit
> > +	  whose bitmap the current bitmap is xor'd with in lexicographic
> > +	  order, or 0xffffffff if the current commit is not xor'd with
> > +	  anything.
>
> Interesting to give the xor chains directions here. You say "position"
> here for the second commit: do you mean within the list of object names
> as opposed to the offset? That would make the most sense so we can trace
> the full list of XORs we need to make all at once.
>
> Are .bitmap files already constrained to 4GB, so these 32-bit offsets
> make sense? Using 64-bit offsets would be a small cost here, I think,
> without needing to do any fancy "overflow" tables that could introduce
> a variable-length extension.

Yeah, we should support >4GB bitmaps here. An overflow table could work,
but I agree with Stolee that in practice it won't matter. Most .bitmap
files that I've looked at in the wild have around 500 entries at most,
and are usually small. So the cost of widening this section isn't a big
deal.

But note that the entry count is only one component of the bitmap size:
the individual entry lengths obviously matter too. And in repositories
whose bitmaps exceed 500 entries, the entries themselves are often
several million bits long (before compression) already. So it is
certainly possible to exceed 4GB without having an astronomical entry
count.

So doubling the width of this extension might add an extra 250 KiB or
so, which is negligible.

I would much rather see us do that in cases where it makes sense (small
number of entries, minimal cost to wider records, etc.) than adding
unnecessary complexity via an extra lookup table for >4GB offsets.

> > +	- One 4-byte network byte order integer specifying
> > +	  table-specific flags. None exist currently, so this is always
> > +	  "0".
>
> I'm guessing this is at the end of the extension because a future flag
> could modify the length of the extension, so we need the flags to be
> in a predictable location. Could we make that clear somewhere?

I can't remember what I had on my mind when I wrote this ;-).

Abhradeep -- do you have any thoughts about what this might be used for?
I'll try to remember it myself, but I imagine that we could just as
easily remove this altogether and avoid the confusion.

> How does Git react to seeing flags here that it does not recognize?
> It seems that Git should ignore the lookup table but continue using the
> rest of the .bitmap file as it did before, yes?

(See above).

Thanks,
Taylor


* Re: [PATCH 1/6] Documentation/technical: describe bitmap lookup table extension
  2022-06-20 12:33 ` [PATCH 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
  2022-06-20 16:56   ` Derrick Stolee
@ 2022-06-20 17:21   ` Taylor Blau
  2022-06-21  9:22     ` Abhradeep Chakraborty
  2022-06-20 20:21   ` Derrick Stolee
  2 siblings, 1 reply; 162+ messages in thread
From: Taylor Blau @ 2022-06-20 17:21 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget
  Cc: git, Kaartic Sivaram, Abhradeep Chakraborty

On Mon, Jun 20, 2022 at 12:33:09PM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
>
> When reading bitmap file, git loads each and every bitmap one by one
> even if all the bitmaps are not required. A "bitmap lookup table"
> extension to the bitmap format can reduce the overhead of loading
> bitmaps which stores a list of bitmapped commit oids, along with their
> offset and xor offset. This way git can load only the neccesary bitmaps
> without loading the previous bitmaps.

Well put. It might help to have a concrete example of where we expect
this to help and not help. I suspect that some of this will show up in
your work updating the perf suite to use this new table, but I imagine
that we'll find something like:

    In cases where the result can be read or computed without
    significant additional traversal (e.g., all commits of interest
    already have bitmaps computed), we can save some time loading and
    parsing a majority of the bitmap file that we will never read.

    But in cases where the bitmaps are out-of-date, or there is
    significant traversal required to go from the reference tips to
    what's contained in the .bitmap file, this table provides minimal
    benefit (or something).

Of course, you should verify that that is actually true before we insert
it into the commit message as such ;-). But that sort of information may
help readers understand what the purpose of this change is towards the
beginning of the series.

> Add some information for the new "bitmap lookup table" extension in the
> bitmap-format documentation.
>
> Co-Authored-by: Taylor Blau <ttaylorr@github.com>
> Mentored-by: Taylor Blau <ttaylorr@github.com>

Here and elsewhere: I typically use my <me@ttaylorr.com> address when
contributing to Git. So any trailers that mention my email or commits
that you send on my behalf should use that address, too.

> Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
> Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> ---
>  Documentation/technical/bitmap-format.txt | 31 +++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
>
> diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
> index 04b3ec21785..34e98787b78 100644
> --- a/Documentation/technical/bitmap-format.txt
> +++ b/Documentation/technical/bitmap-format.txt
> @@ -67,6 +67,14 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
>  			pack/MIDX. The format and meaning of the name-hash is
>  			described below.
>
> +			** {empty}
> +			BITMAP_OPT_LOOKUP_TABLE (0xf) : :::

Is the space between "(0xf)" and the first ":" intentional? Similarly,
should there be two or three colons at the end (either "::" or ":::")?

> +			If present, the end of the bitmap file contains a table
> +			containing a list of `N` object ids, a list of pairs of
> +			offset and xor offset of respective objects, and 4-byte
> +			integer denoting the flags (currently none). The format
> +			and meaning of the table is described below.
> +

I remember we had a brief off-list discussion about whether we should
store the full object IDs in the offset table, or whether we could store
their pack- or index-relative ordering. Is there a reason to prefer one
or the other?

I don't think we need to explain the choice fully in the documentation
in this patch, but it may be worth thinking about separately
nonetheless. We can store either order and convert it to an object ID in
constant time.

To figure out which is best, I would recommend trying a few different
choices here and seeing how they do or don't impact your performance
testing.

>  		4-byte entry count (network byte order)
>
>  			The total count of entries (bitmapped commits) in this bitmap index.
> @@ -205,3 +213,26 @@ Note that this hashing scheme is tied to the BITMAP_OPT_HASH_CACHE flag.
>  If implementations want to choose a different hashing scheme, they are
>  free to do so, but MUST allocate a new header flag (because comparing
>  hashes made under two different schemes would be pointless).
> +
> +Commit lookup table
> +-------------------
> +
> +If the BITMAP_OPT_LOOKUP_TABLE flag is set, the end of the `.bitmap`
> +contains a lookup table specifying the positions of commits which have a
> +bitmap.
> +
> +For a `.bitmap` containing `nr_entries` reachability bitmaps, the format
> +is as follows:
> +
> +	- `nr_entries` object names.
> +
> +	- `nr_entries` pairs of 4-byte integers, each in network order.
> +	  The first holds the offset from which that commit's bitmap can
> +	  be read. The second number holds the position of the commit
> +	  whose bitmap the current bitmap is xor'd with in lexicographic
> +	  order, or 0xffffffff if the current commit is not xor'd with
> +	  anything.

A couple of small thoughts here. I wonder if we'd get better locality if
we made each record look something like:

    (object_id, offset, xor_pos)

Where object_id is either 20- or 4-bytes long (depending if we store the
full object ID, or some 4-byte identifier that allows us to discover
it), offset is 8 bytes long, and xor_pos is 4-bytes (since in practice
we don't support packs or MIDXs which have more than 2^32-1 objects).

In the event that this table doesn't fit into a single cache line, I
think we'll get better performance out of reading it by not forcing the
cache to evict itself whenever we need to refer back to the object_id.

> +	- One 4-byte network byte order integer specifying
> +	  table-specific flags. None exist currently, so this is always
> +	  "0".

I mentioned in my reply to Stolee earlier, but I think that we should
either (a) try to remember what this is for and document it, or (b)
remove it.

Thanks,
Taylor


* Re: [PATCH 1/6] Documentation/technical: describe bitmap lookup table extension
  2022-06-20 12:33 ` [PATCH 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
  2022-06-20 16:56   ` Derrick Stolee
  2022-06-20 17:21   ` Taylor Blau
@ 2022-06-20 20:21   ` Derrick Stolee
  2022-06-21 10:08     ` Abhradeep Chakraborty
  2 siblings, 1 reply; 162+ messages in thread
From: Derrick Stolee @ 2022-06-20 20:21 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget, git
  Cc: Taylor Blau, Kaartic Sivaram, Abhradeep Chakraborty

On 6/20/2022 8:33 AM, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

> +			** {empty}
> +			BITMAP_OPT_LOOKUP_TABLE (0xf) : :::

I think you mean 0x10 (b_1_0000) instead of 0xf (b_1111).

I noticed when looking at the constant in patch 2.
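
A small sketch of the difference, using the values 1, 4 and 16 from the
enum in patch 2 (the SKETCH_* names and is_single_bit() are
hypothetical, for illustration only):

```c
enum pack_bitmap_opts_sketch {
	SKETCH_OPT_FULL_DAG = 0x1,
	SKETCH_OPT_HASH_CACHE = 0x4,
	SKETCH_OPT_LOOKUP_TABLE = 0x10,
};

/* Nonzero iff `v` has exactly one bit set, i.e. is usable as a flag. */
static int is_single_bit(unsigned int v)
{
	return v && !(v & (v - 1));
}
```

0x10 passes this check, while 0xf sets four bits at once and overlaps
both the 0x1 and 0x4 flags.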

Thanks,
-Stolee


* Re: [PATCH 2/6] pack-bitmap: prepare to read lookup table extension
  2022-06-20 12:33 ` [PATCH 2/6] pack-bitmap: prepare to read " Abhradeep Chakraborty via GitGitGadget
@ 2022-06-20 20:49   ` Derrick Stolee
  2022-06-21 10:28     ` Abhradeep Chakraborty
  2022-06-20 22:06   ` Taylor Blau
  1 sibling, 1 reply; 162+ messages in thread
From: Derrick Stolee @ 2022-06-20 20:49 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget, git
  Cc: Taylor Blau, Kaartic Sivaram, Abhradeep Chakraborty

On 6/20/2022 8:33 AM, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> 
> Bitmap lookup table extension can let git to parse only the necessary
> bitmaps without loading the previous bitmaps one by one.

Here is an attempt to reword this a bit:

  The bitmap lookup table extension was documented by an earlier
  change, but Git does not yet know how to parse that information.
  The extension allows parsing a smaller portion of the bitmap
  file in order to find bitmaps for specific commits.

 
> Teach git to read and use the bitmap lookup table extension.

Normally, I don't mind doing the read portion after the write
portion, but it would be nice to have them in the opposite
order so we can test writing the extension _and Git ignoring
the extension_ before implementing the parsing. As it stands,
most of the code in this patch is untested until patch 5.

General outline attempt:

1. Document the format.
2. Write the extension if the flag is given.
3. Add pack.writeBitmapLookupTable and add tests that write
   the lookup table (and do other bitmap reads on that data).
4. Read the lookup table. The tests from step 3 already cover
   acting upon the lookup table. (Perhaps we add a mode here
   that disables GIT_READ_COMMIT_TABLE since that is not used
   anywhere else.)
5. Performance tests.

> +		if (flags & BITMAP_OPT_LOOKUP_TABLE &&
> +		    git_env_bool("GIT_READ_COMMIT_TABLE", 1)) {

This environment variable does not appear to be used or
documented anywhere. Do we really want to use it as a way
to disable reading the lookup table in general? Or would it be
better to have a GIT_TEST_* variable for disabling the read
during testing?

> +			uint32_t entry_count = ntohl(header->entry_count);
> +			uint32_t table_size =
> +				(entry_count * the_hash_algo->rawsz) /* oids */ +
> +				(entry_count * sizeof(uint32_t)) /* offsets */ +
> +				(entry_count * sizeof(uint32_t)) /* xor offsets */ +
> +				(sizeof(uint32_t)) /* flags */;

Here, uint32_t is probably fine, but maybe we should just use
size_t instead? Should we use st_mult() and st_add() everywhere?
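
For illustration, a standalone sketch of the checked computation.
checked_mul() and checked_add() are hypothetical stand-ins for
st_mult() and st_add(); they report overflow through the return value,
whereas Git's helpers die.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for st_mult(): returns 0 on overflow. */
static int checked_mul(size_t a, size_t b, size_t *out)
{
	if (a && b > SIZE_MAX / a)
		return 0;
	*out = a * b;
	return 1;
}

/* Hypothetical stand-in for st_add(): returns 0 on overflow. */
static int checked_add(size_t a, size_t b, size_t *out)
{
	if (b > SIZE_MAX - a)
		return 0;
	*out = a + b;
	return 1;
}

/* Table size in bytes: oids + <offset, xor> pairs + flags word. */
static int lookup_table_size(uint32_t entry_count, size_t hash_len,
			     size_t *out)
{
	size_t per_entry, body;

	if (!checked_add(hash_len, 2 * sizeof(uint32_t), &per_entry))
		return 0;
	if (!checked_mul(entry_count, per_entry, &body))
		return 0;
	return checked_add(body, sizeof(uint32_t), out);
}
```

With 32-bit arithmetic the unchecked product wraps silently: an
entry_count of 0x10000000 times 28 bytes per entry truncates to
0xC0000000.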

Note: you're using the_hash_algo->rawsz here, which makes sense
because the bitmap format doesn't specify which hash algorithm is
used. Just making this note to say that we should include the hash
algorithm as a value in the bitmap format when we increment the
format version (in the future).

> +			if (table_size > index_end - index->map - header_size)
> +				return error("corrupted bitmap index file (too short to fit commit table)");
> +
> +			index->table_lookup = (void *)(index_end - table_size);
> +			index->table_offsets = index->table_lookup + the_hash_algo->rawsz * entry_count;

st_mult(), st_add()? Or, should we assume safety now?

> +			index_end -= table_size;
> +		}

> -	if (load_bitmap_entries_v1(bitmap_git) < 0)
> +	if (!bitmap_git->table_lookup && load_bitmap_entries_v1(bitmap_git) < 0)
>  		goto failed;

Ok, don't load these entries pre-emptively if we have the lookup table.

> +static struct stored_bitmap *stored_bitmap_for_commit(struct bitmap_index *bitmap_git,
> +						      struct commit *commit,
> +						      uint32_t *pos_hint);

I see that we have a two-method recursion loop. Please move this
declaration to immediately before lazy_bitmap_for_commit() so it
is declared as late as possible.

> +static inline const unsigned char *bitmap_oid_pos(struct bitmap_index *bitmap_git,
> +						  uint32_t pos)
> +{
> +	return bitmap_git->table_lookup + (pos * the_hash_algo->rawsz);
> +}

I would call this "bitmap_hash_pos()" because we are getting a raw
hash and not a 'struct object_id'. Do you want a helper that fills
a 'struct object_id', perhaps passed-by-reference?

> +static inline const void *bitmap_offset_pos(struct bitmap_index *bitmap_git,
> +					    uint32_t pos)

> +static inline const void *xor_position_pos(struct bitmap_index *bitmap_git,
> +					   uint32_t pos)

These two helpers should probably return a size_t and uint32_t
instead of a pointer. Let these do get_be[32|64]() on the computed
pointer.
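
Something along these lines, as a sketch only. read_be32() stands in
for get_be32(), the *_at() names are hypothetical, and the layout
assumes entry `pos` is the 8-byte pair <offset, xor position> as
described in the format document from patch 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for Git's get_be32(): read a big-endian 32-bit value. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Offset of the bitmap for the commit at table position `pos`. */
static size_t bitmap_offset_at(const unsigned char *table_offsets,
			       uint32_t pos)
{
	return read_be32(table_offsets + (size_t)pos * 8);
}

/* Table position of the XOR base, or 0xffffffff if there is none. */
static uint32_t xor_position_at(const unsigned char *table_offsets,
				uint32_t pos)
{
	return read_be32(table_offsets + (size_t)pos * 8 + 4);
}
```

Callers then get decoded values directly and never touch the raw
pointers.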

> +static int bitmap_table_lookup(struct bitmap_index *bitmap_git,
> +			       struct object_id *oid,
> +			       uint32_t *commit_pos)
> +{
> +	unsigned char *found = bsearch(oid->hash, bitmap_git->table_lookup,
> +				       bitmap_git->entry_count,
> +				       the_hash_algo->rawsz, bitmap_lookup_cmp);
> +	if (found)
> +		*commit_pos = (found - bitmap_git->table_lookup) / the_hash_algo->rawsz;
> +	return !!found;

Ok, we are running binary search and converting the pointer into a
position.

Frequently, these kinds of searches return an int, but use a negative
value to indicate that the value was not found. Using an int in this
way would restrict us to 2^31 bitmaps instead of 2^32, so maybe it
is not worth matching that practice.
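
For reference, a standalone sketch of the found-flag-plus-position
style that keeps the full uint32_t range. It is modelled on the hunk
above but is not the patch's code; SKETCH_HASH_LEN hard-codes a 20-byte
hash and the function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_HASH_LEN 20 /* assumed raw hash size for this sketch */

static int hash_cmp(const void *a, const void *b)
{
	return memcmp(a, b, SKETCH_HASH_LEN);
}

/*
 * Binary-search a sorted array of `nr` raw hashes. Returns 1 and sets
 * *pos when `hash` is found, 0 otherwise (leaving *pos untouched).
 */
static int table_lookup(const unsigned char *table, uint32_t nr,
			const unsigned char *hash, uint32_t *pos)
{
	const unsigned char *found = bsearch(hash, table, nr,
					     SKETCH_HASH_LEN, hash_cmp);
	if (found)
		*pos = (uint32_t)((found - table) / SKETCH_HASH_LEN);
	return !!found;
}
```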

> +static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_git,
> +						    struct object_id *oid,
> +						    uint32_t commit_pos)
> +{
> +	uint32_t xor_pos;
> +	off_t bitmap_ofs;
> +
> +	int flags;
> +	struct ewah_bitmap *bitmap;
> +	struct stored_bitmap *xor_bitmap;
> +
> +	bitmap_ofs = get_be32(bitmap_offset_pos(bitmap_git, commit_pos));
> +	xor_pos = get_be32(xor_position_pos(bitmap_git, commit_pos));

These lines become simpler with a change in the helper methods'
prototypes, as I recommended higher up.

> +	/*
> +	 * Lazily load the xor'd bitmap if required (and we haven't done so
> +	 * already). Make sure to pass the xor'd bitmap's position along as a
> +	 * hint to avoid an unnecessary binary search in
> +	 * stored_bitmap_for_commit().
> +	 */
> +	if (xor_pos == 0xffffffff) {
> +		xor_bitmap = NULL;
> +	} else {
> +		struct commit *xor_commit;
> +		struct object_id xor_oid;
> +
> +		oidread(&xor_oid, bitmap_oid_pos(bitmap_git, xor_pos));
> +
> +		xor_commit = lookup_commit(the_repository, &xor_oid);
> +		if (!xor_commit)
> +			return NULL;
> +
> +		xor_bitmap = stored_bitmap_for_commit(bitmap_git, xor_commit,
> +						      &xor_pos);
> +	}

This is using an interesting type of tail-recursion. We might be
better off using a loop with a stack: push to the stack the commit
positions of the XOR bitmaps. At the very bottom, we get a bitmap
without an XOR base. Then, pop off the stack, modifying the bitmap
with XOR operations as we go. (Perhaps we also store these bitmaps
in-memory along the way?) Finally, we have the necessary bitmap.

This iterative approach avoids possible stack exhaustion if there
are long XOR chains in the file.
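
A toy sketch of the loop I have in mind. It uses 64-bit words in place
of EWAH bitmaps, and every name in it (toy_entry, resolve_bitmap,
MAX_CHAIN) is made up for the example. In this toy the XOR order does
not matter, but the stack mirrors where a real reader would store each
intermediate bitmap while popping.

```c
#include <stdint.h>

#define NO_XOR 0xffffffffu
#define MAX_CHAIN 64 /* arbitrary cap chosen for this sketch */

struct toy_entry {
	uint64_t stored;  /* bits as stored on disk (possibly XOR'd) */
	uint32_t xor_pos; /* table position of the XOR base, or NO_XOR */
};

/*
 * Resolve the bitmap at `pos` without recursion. Returns 0 on success,
 * or -1 if the chain is longer than MAX_CHAIN (which also catches
 * cycles in a corrupt table).
 */
static int resolve_bitmap(const struct toy_entry *table, uint32_t pos,
			  uint64_t *out)
{
	uint32_t stack[MAX_CHAIN];
	int depth = 0;
	uint64_t bits = 0;

	/* Walk down the chain, pushing positions until there is no base. */
	for (;;) {
		if (depth == MAX_CHAIN)
			return -1;
		stack[depth++] = pos;
		if (table[pos].xor_pos == NO_XOR)
			break;
		pos = table[pos].xor_pos;
	}

	/* Pop from the base upward, applying each XOR as we go. */
	while (depth)
		bits ^= table[stack[--depth]].stored;

	*out = bits;
	return 0;
}
```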

> +
> +	/*
> +	 * Don't bother reading the commit's index position or its xor
> +	 * offset:
> +	 *
> +	 *   - The commit's index position is irrelevant to us, since
> +	 *     load_bitmap_entries_v1 only uses it to learn the object
> +	 *     id which is used to compute the hashmap's key. We already
> +	 *     have an object id, so no need to look it up again.
> +	 *
> +	 *   - The xor_offset is unusable for us, since it specifies how
> +	 *     many entries previous to ours we should look at. This
> +	 *     makes sense when reading the bitmaps sequentially (as in
> +	 *     load_bitmap_entries_v1()), since we can keep track of
> +	 *     each bitmap as we read them.
> +	 *
> +	 *     But it can't work for us, since the bitmap's don't have a
> +	 *     fixed size. So we learn the position of the xor'd bitmap
> +	 *     from the commit table (and resolve it to a bitmap in the
> +	 *     above if-statement).
> +	 *
> +	 * Instead, we can skip ahead and immediately read the flags and
> +	 * ewah bitmap.
> +	 */
> +	bitmap_git->map_pos = bitmap_ofs + sizeof(uint32_t) + sizeof(uint8_t);
> +	flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
> +	bitmap = read_bitmap_1(bitmap_git);
> +	if (!bitmap)
> +		return NULL;
> +
> +	return store_bitmap(bitmap_git, bitmap, oid, xor_bitmap, flags);

Looks like we'd want to call store_bitmap() while popping the stack
in the loop I recommended above.

> +}
> +
> +static struct stored_bitmap *stored_bitmap_for_commit(struct bitmap_index *bitmap_git,
> +						      struct commit *commit,
> +						      uint32_t *pos_hint)
>  {
>  	khiter_t hash_pos = kh_get_oid_map(bitmap_git->bitmaps,
>  					   commit->object.oid);
> -	if (hash_pos >= kh_end(bitmap_git->bitmaps))
> +	if (hash_pos >= kh_end(bitmap_git->bitmaps)) {
> +		uint32_t commit_pos;
> +		if (!bitmap_git->table_lookup)
> +			return NULL;
> +
> +		/* NEEDSWORK: cache misses aren't recorded. */
> +		if (pos_hint)
> +			commit_pos = *pos_hint;
> +		else if (!bitmap_table_lookup(bitmap_git,
> +					      &commit->object.oid,
> +					      &commit_pos))
> +			return NULL;
> +		return lazy_bitmap_for_commit(bitmap_git, &commit->object.oid,
> +					      commit_pos);

The extra bonus of going incremental is that we don't have recursion
across two methods, which I always find difficult to reason about.

> +	}
> +	return kh_value(bitmap_git->bitmaps, hash_pos);
> +}

> @@ -26,6 +26,7 @@ struct bitmap_disk_header {
>  enum pack_bitmap_opts {
>  	BITMAP_OPT_FULL_DAG = 1,
>  	BITMAP_OPT_HASH_CACHE = 4,
> +	BITMAP_OPT_LOOKUP_TABLE = 16,

Perhaps it is time to use hexadecimal representation here to match the
file format document?

Thanks,
-Stolee


* Re: [PATCH 2/6] pack-bitmap: prepare to read lookup table extension
  2022-06-20 12:33 ` [PATCH 2/6] pack-bitmap: prepare to read " Abhradeep Chakraborty via GitGitGadget
  2022-06-20 20:49   ` Derrick Stolee
@ 2022-06-20 22:06   ` Taylor Blau
  2022-06-21 11:52     ` Abhradeep Chakraborty
  1 sibling, 1 reply; 162+ messages in thread
From: Taylor Blau @ 2022-06-20 22:06 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget
  Cc: git, Kaartic Sivaram, Abhradeep Chakraborty

On Mon, Jun 20, 2022 at 12:33:10PM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
>
> Bitmap lookup table extension can let git to parse only the necessary
> bitmaps without loading the previous bitmaps one by one.
>
> Teach git to read and use the bitmap lookup table extension.
>
> Co-Authored-by: Taylor Blau <ttaylorr@github.com>
> Mentored-by: Taylor Blau <ttaylorr@github.com>
> Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
> Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> ---
>  pack-bitmap.c | 172 ++++++++++++++++++++++++++++++++++++++++++++++++--
>  pack-bitmap.h |   1 +
>  2 files changed, 166 insertions(+), 7 deletions(-)
>
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 36134222d7a..d5e5973a79f 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -15,6 +15,7 @@
>  #include "list-objects-filter-options.h"
>  #include "midx.h"
>  #include "config.h"
> +#include "hash-lookup.h"
>
>  /*
>   * An entry on the bitmap index, representing the bitmap for a given
> @@ -82,6 +83,13 @@ struct bitmap_index {
>  	/* The checksum of the packfile or MIDX; points into map. */
>  	const unsigned char *checksum;
>
> +	/*
> +	 * If not NULL, these point into the various commit table sections
> +	 * (within map).
> +	 */
> +	unsigned char *table_lookup;
> +	unsigned char *table_offsets;
> +

If table_offsets ends up being a list of just offsets, we could assign
this to the appropriate type, e.g., 'uint64_t *'. We would want to
avoid using a type whose width is platform dependent, like off_t.

But if you end up taking my suggestion from a previous response (of
making each entry in the offset table a triple of commit, offset, and
xor position), make sure to _not_ get tempted to define a struct and
assign table_lookup to be a pointer of that structure type.

That's because even though the struct *should* be packed as you expect,
the packing is mostly up to the compiler, so you can't guarantee its
members won't have padding between them or at the end of the struct for
alignment purposes.
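
A sketch of what I mean, decoding each field byte by byte instead of
overlaying a struct on the mmap. It assumes the hypothetical triple
layout from my earlier reply (20-byte hash, 8-byte offset, 4-byte xor
position, all big-endian); the names are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_HASH_LEN 20
#define SKETCH_RECORD_LEN (SKETCH_HASH_LEN + 8 + 4) /* on-disk size */

/* In-memory form only; its sizeof may differ from the on-disk size. */
struct lookup_record {
	unsigned char hash[SKETCH_HASH_LEN];
	uint64_t offset;
	uint32_t xor_pos;
};

/* Decode the record at table position `pos` field by field. */
static void decode_record(const unsigned char *table, uint32_t pos,
			  struct lookup_record *out)
{
	const unsigned char *p = table + (size_t)pos * SKETCH_RECORD_LEN;
	int i;

	memcpy(out->hash, p, SKETCH_HASH_LEN);
	p += SKETCH_HASH_LEN;

	out->offset = 0;
	for (i = 0; i < 8; i++)
		out->offset = (out->offset << 8) | *p++;

	out->xor_pos = 0;
	for (i = 0; i < 4; i++)
		out->xor_pos = (out->xor_pos << 8) | *p++;
}
```

The stride between records comes from SKETCH_RECORD_LEN, never from
sizeof(struct lookup_record), so compiler padding cannot shift the
fields.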

>  	/*
>  	 * Extended index.
>  	 *
> @@ -185,6 +193,24 @@ static int load_bitmap_header(struct bitmap_index *index)
>  			index->hashes = (void *)(index_end - cache_size);
>  			index_end -= cache_size;
>  		}
> +
> +		if (flags & BITMAP_OPT_LOOKUP_TABLE &&
> +		    git_env_bool("GIT_READ_COMMIT_TABLE", 1)) {

What is the purpose of the GIT_READ_COMMIT_TABLE environment variable? I
assume that it's to make it easier to run tests (especially performance
ones) with and without access to the lookup table. If so, we should
document that (lightly) in the commit message, and rename this to be
GIT_TEST_READ_COMMIT_TABLE to indicate that it shouldn't be used outside
of tests.

> +			uint32_t entry_count = ntohl(header->entry_count);
> +			uint32_t table_size =
> +				(entry_count * the_hash_algo->rawsz) /* oids */ +
> +				(entry_count * sizeof(uint32_t)) /* offsets */ +
> +				(entry_count * sizeof(uint32_t)) /* xor offsets */ +
> +				(sizeof(uint32_t)) /* flags */;

entry_count is definitely a 4-byte integer, so uint32_t is the right
type. But I think table_size should be a size_t, and computations on it
should be more strictly checked. Perhaps something like:

    size_t table_size = sizeof(uint32_t); /* flags */
    table_size = st_add(table_size, st_mult(entry_count, the_hash_algo->rawsz)); /* oids */
    table_size = st_add(table_size, st_mult(entry_count, sizeof(uint32_t))); /* offsets */
    table_size = st_add(table_size, st_mult(entry_count, sizeof(uint32_t))); /* xor offsets */

or even:

    size_t table_size = sizeof(uint32_t); /* flags */
    table_size = st_add(table_size,
                        st_mult(entry_count,
                                the_hash_algo->rawsz + /* oids */
                                sizeof(uint32_t) + /* offsets*/
                                sizeof(uint32_t) /* xor offsets */
                               ));
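
For anyone unfamiliar with the st_*() helpers: they die() when the
size_t arithmetic would wrap around. Below is a standalone sketch of the
same idea that returns an error instead of dying. checked_add(),
checked_mult() and lookup_table_size() are hypothetical names, not Git's
API, and the table layout is the one proposed in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a + b in *out; return -1 instead if the sum would wrap. */
static int checked_add(size_t a, size_t b, size_t *out)
{
	assert(out);
	if (a > SIZE_MAX - b)
		return -1;
	*out = a + b;
	return 0;
}

/* Store a * b in *out; return -1 instead if the product would wrap. */
static int checked_mult(size_t a, size_t b, size_t *out)
{
	assert(out);
	if (b && a > SIZE_MAX / b)
		return -1;
	*out = a * b;
	return 0;
}

/* flags (4 bytes) + entry_count * (oid + offset + xor offset). */
static int lookup_table_size(uint32_t entry_count, size_t hash_len, size_t *out)
{
	size_t per_entry, entries;

	if (checked_add(hash_len, 2 * sizeof(uint32_t), &per_entry) ||
	    checked_mult(entry_count, per_entry, &entries))
		return -1;
	return checked_add(entries, sizeof(uint32_t), out);
}
```

With a 20-byte hash and three entries this gives 3 * 28 + 4 bytes; the
point of the checks is that a corrupt entry_count cannot silently
produce a small table_size.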

> +			if (table_size > index_end - index->map - header_size)
> +				return error("corrupted bitmap index file (too short to fit commit table)");
> +
> +			index->table_lookup = (void *)(index_end - table_size);
> +			index->table_offsets = index->table_lookup + the_hash_algo->rawsz * entry_count;
> +
> +			index_end -= table_size;
> +		}

Looks good.

> @@ -470,7 +496,7 @@ static int load_bitmap(struct bitmap_index *bitmap_git)
>  		!(bitmap_git->tags = read_bitmap_1(bitmap_git)))
>  		goto failed;
>
> -	if (load_bitmap_entries_v1(bitmap_git) < 0)
> +	if (!bitmap_git->table_lookup && load_bitmap_entries_v1(bitmap_git) < 0)
>  		goto failed;

No need to load each of the bitmaps individually via
load_bitmap_entries_v1() if we have a lookup table. That function
doesn't do any other initialization that we depend on, so it's OK to
just avoid calling it altogether.

>  	return 0;
> @@ -557,14 +583,145 @@ struct include_data {
>  	struct bitmap *seen;
>  };
>
> -struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
> -				      struct commit *commit)
> +static struct stored_bitmap *stored_bitmap_for_commit(struct bitmap_index *bitmap_git,
> +						      struct commit *commit,
> +						      uint32_t *pos_hint);
> +
> +static inline const unsigned char *bitmap_oid_pos(struct bitmap_index *bitmap_git,
> +						  uint32_t pos)
> +{
> +	return bitmap_git->table_lookup + (pos * the_hash_algo->rawsz);
> +}
> +
> +static inline const void *bitmap_offset_pos(struct bitmap_index *bitmap_git,
> +					    uint32_t pos)
> +{
> +	return bitmap_git->table_offsets + (pos * 2 * sizeof(uint32_t));
> +}
> +
> +static inline const void *xor_position_pos(struct bitmap_index *bitmap_git,
> +					   uint32_t pos)
> +{
> +	return (unsigned char*) bitmap_offset_pos(bitmap_git, pos) + sizeof(uint32_t);
> +}
> +
> +static int bitmap_lookup_cmp(const void *_va, const void *_vb)
> +{
> +	return hashcmp(_va, _vb);
> +}

All makes sense. Some light documentation might help explain what this
comparator function is used for (the bsearch() call below in
bitmap_table_lookup()), although I suspect that this function will get
slightly more complicated if you pack the table contents as I suggest,
in which case more documentation will definitely help.

> +
> +static int bitmap_table_lookup(struct bitmap_index *bitmap_git,
> +			       struct object_id *oid,
> +			       uint32_t *commit_pos)
> +{
> +	unsigned char *found = bsearch(oid->hash, bitmap_git->table_lookup,
> +				       bitmap_git->entry_count,
> +				       the_hash_algo->rawsz, bitmap_lookup_cmp);
> +	if (found)
> +		*commit_pos = (found - bitmap_git->table_lookup) / the_hash_algo->rawsz;

If you end up changing the type of bitmap_git->table_lookup, make sure
that you scale the result of the pointer arithmetic accordingly, or cast
down to an 'unsigned char *' before you do any math.

> +	return !!found;
> +}
> +
> +static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_git,
> +						    struct object_id *oid,
> +						    uint32_t commit_pos)
> +{
> +	uint32_t xor_pos;
> +	off_t bitmap_ofs;
> +
> +	int flags;
> +	struct ewah_bitmap *bitmap;
> +	struct stored_bitmap *xor_bitmap;
> +
> +	bitmap_ofs = get_be32(bitmap_offset_pos(bitmap_git, commit_pos));
> +	xor_pos = get_be32(xor_position_pos(bitmap_git, commit_pos));
> +
> +	/*
> +	 * Lazily load the xor'd bitmap if required (and we haven't done so
> +	 * already). Make sure to pass the xor'd bitmap's position along as a
> +	 * hint to avoid an unnecessary binary search in
> +	 * stored_bitmap_for_commit().
> +	 */
> +	if (xor_pos == 0xffffffff) {
> +		xor_bitmap = NULL;
> +	} else {
> +		struct commit *xor_commit;
> +		struct object_id xor_oid;
> +
> +		oidread(&xor_oid, bitmap_oid_pos(bitmap_git, xor_pos));

Interesting; this is a point that I forgot about from the original
patch. xor_pos is an index (not an offset) into the list of commits in
the table of contents, in the order they appear in that table. We should be
clear about (a) what that order is, and (b) that xor_pos is an index
into that order.

The rest of this function looks good to me.

> +static struct stored_bitmap *stored_bitmap_for_commit(struct bitmap_index *bitmap_git,
> +						      struct commit *commit,
> +						      uint32_t *pos_hint)
>  {
>  	khiter_t hash_pos = kh_get_oid_map(bitmap_git->bitmaps,
>  					   commit->object.oid);
> -	if (hash_pos >= kh_end(bitmap_git->bitmaps))
> +	if (hash_pos >= kh_end(bitmap_git->bitmaps)) {
> +		uint32_t commit_pos;
> +		if (!bitmap_git->table_lookup)
> +			return NULL;

I was going to suggest moving this check into the caller
bitmap_for_commit() and making it a BUG() to call
stored_bitmap_for_commit() with a NULL bitmap_git->table_lookup pointer.

And I think this makes sense... if we return NULL here, then we know
that we definitely don't have a stored bitmap, since there's no table to
look it up in and we have already loaded everything else. So we
propagate that NULL to the return value of bitmap_for_commit(), and that
makes sense. Good.

> +		/* NEEDSWORK: cache misses aren't recorded. */

Yeah. The problem here is that we can't record every commit that
_doesn't_ have a bitmap every time we return NULL from one of these
queries, since there are arbitrarily many such commits that don't have
bitmaps.

We could approximate it using a Bloom filter or something, and much of
that code is already written and could be interesting to try and reuse.

But I wonder if we could get by with something simpler, though, which
would cause us to load all bitmaps from the lookup table after a fixed
number of cache misses (at which point we should force ourselves to load
everything and just read everything out of a single O(1) lookup in the
stored bitmap table).

That may or may not be a good idea, and the threshold will probably be
highly dependent on the system. So it may not even be worth it, but I
think it's an interesting area to experiment in and think a little more
about.
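
Roughly, the simpler scheme could look like the sketch below. This is
only an illustration of the counting logic: struct miss_tracker,
note_miss() and the MISS_LIMIT value are all made up here, and a real
threshold would have to come out of measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical threshold; a real value would need measuring. */
#define MISS_LIMIT 3

struct miss_tracker {
	uint32_t misses;
	int loaded_all;	/* set once we give up on lazy loading */
};

/*
 * Record one failed lookup. Returns 1 exactly once, when the caller
 * should fall back to loading every bitmap eagerly; 0 otherwise.
 */
static int note_miss(struct miss_tracker *t)
{
	assert(t);
	if (t->loaded_all)
		return 0;
	if (++t->misses < MISS_LIMIT)
		return 0;
	t->loaded_all = 1;
	return 1;
}
```

After the fallback fires, every later query would be answered from the
in-memory stored bitmap table, so further misses cost nothing extra.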

> +		if (pos_hint)
> +			commit_pos = *pos_hint;

How does this commit_pos work again? I confess I have forgotten since I
wrote some of this code a while ago... :-).

> @@ -1699,8 +1856,9 @@ void test_bitmap_walk(struct rev_info *revs)
>  	if (revs->pending.nr != 1)
>  		die("you must specify exactly one commit to test");
>
> -	fprintf(stderr, "Bitmap v%d test (%d entries loaded)\n",
> -		bitmap_git->version, bitmap_git->entry_count);
> +	if (!bitmap_git->table_lookup)
> +		fprintf(stderr, "Bitmap v%d test (%d entries loaded)\n",
> +			bitmap_git->version, bitmap_git->entry_count);

Should we print this regardless of whether or not there is a lookup
table? We should be able to learn the entry count either way.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH 3/6] pack-bitmap-write.c: write lookup table extension
  2022-06-20 12:33 ` [PATCH 3/6] pack-bitmap-write.c: write " Abhradeep Chakraborty via GitGitGadget
@ 2022-06-20 22:16   ` Taylor Blau
  2022-06-21 12:50     ` Abhradeep Chakraborty
  0 siblings, 1 reply; 162+ messages in thread
From: Taylor Blau @ 2022-06-20 22:16 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget
  Cc: git, Kaartic Sivaram, Abhradeep Chakraborty

On Mon, Jun 20, 2022 at 12:33:11PM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
>
> Teach git to write bitmap lookup table extension. The table has the
> following information:
>
>     - `N` no of Object ids of each bitmapped commits

s/no/number, s/Object/object, s/ids/IDs, and s/commits/commit

>     - A list of offset, xor-offset pair; the i'th pair denotes the
>       offsets and xor-offsets of i'th commit in the previous list.

s/pair/pairs

>     - 4-byte integer denoting the flags
>
> Co-authored-by: Taylor Blau <ttaylorr@github.com>
> Mentored-by: Taylor Blau <ttaylorr@github.com>
> Co-mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
> Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> ---
>  pack-bitmap-write.c | 59 +++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 57 insertions(+), 2 deletions(-)
>
> diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
> index c43375bd344..9e88a64dd65 100644
> --- a/pack-bitmap-write.c
> +++ b/pack-bitmap-write.c
> @@ -650,7 +650,8 @@ static const struct object_id *oid_access(size_t pos, const void *table)
>
>  static void write_selected_commits_v1(struct hashfile *f,
>  				      struct pack_idx_entry **index,
> -				      uint32_t index_nr)
> +				      uint32_t index_nr,
> +				      off_t *offsets)
>  {
>  	int i;
>
> @@ -663,6 +664,9 @@ static void write_selected_commits_v1(struct hashfile *f,
>  		if (commit_pos < 0)
>  			BUG("trying to write commit not in index");
>
> +		if (offsets)
> +			offsets[i] = hashfile_total(f);
> +

Makes sense; we record the offset for the ith commit as however many
bytes we've already written into the hashfile up to this point, since
the next byte written begins that commit's bitmap entry (well, the few
header bytes that precede the bitmap itself, anyway).

>  		hashwrite_be32(f, commit_pos);
>  		hashwrite_u8(f, stored->xor_offset);
>  		hashwrite_u8(f, stored->flags);
> @@ -671,6 +675,49 @@ static void write_selected_commits_v1(struct hashfile *f,
>  	}
>  }
>
> +static int table_cmp(const void *_va, const void *_vb)
> +{
> +	return oidcmp(&writer.selected[*(uint32_t*)_va].commit->object.oid,
> +		      &writer.selected[*(uint32_t*)_vb].commit->object.oid);

This implementation looks right to me, but perhaps we should expand it
out from the one-liner here to make it more readable. Perhaps something
like:

    static int table_cmp(const void *_va, const void *_vb)
    {
      struct commit *c1 = writer.selected[*(uint32_t*)_va].commit;
      struct commit *c2 = writer.selected[*(uint32_t*)_vb].commit;

      return oidcmp(&c1->object.oid, &c2->object.oid);
    }

which is arguably slightly more readable than the one-liner (but I don't
feel that strongly about it.)

> +static void write_lookup_table(struct hashfile *f,
> +			       off_t *offsets)
> +{
> +	uint32_t i;
> +	uint32_t flags = 0;
> +	uint32_t *table, *table_inv;
> +
> +	ALLOC_ARRAY(table, writer.selected_nr);
> +	ALLOC_ARRAY(table_inv, writer.selected_nr);
> +
> +	for (i = 0; i < writer.selected_nr; i++)
> +		table[i] = i;
> +	QSORT(table, writer.selected_nr, table_cmp);
> +	for (i = 0; i < writer.selected_nr; i++)
> +		table_inv[table[i]] = i;

Right... so table[0] will give us the index into writer.selected of the
commit with the earliest OID in lexicographic order. And table_inv goes
the other way around: table_inv[i] will tell us the lexicographic
position of the commit at writer.selected[i].
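
For readers following along, the permutation and its inverse can be
tried in isolation with the sketch below. It is not the patch's code:
small integers stand in for OIDs, plain qsort() stands in for QSORT(),
and sort_keys, key_cmp() and build_tables() are names invented for the
example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for writer.selected: sort keys in "bitmap order". */
static const unsigned *sort_keys;

static int key_cmp(const void *va, const void *vb)
{
	unsigned a = sort_keys[*(const uint32_t *)va];
	unsigned b = sort_keys[*(const uint32_t *)vb];

	return (a > b) - (a < b);
}

/*
 * Fill table[] so that table[lex_pos] is the bitmap-order index, and
 * table_inv[] so that table_inv[bitmap_pos] is the lexicographic position.
 */
static void build_tables(const unsigned *keys, uint32_t nr,
			 uint32_t *table, uint32_t *table_inv)
{
	uint32_t i;

	assert(keys && table && table_inv);
	sort_keys = keys;
	for (i = 0; i < nr; i++)
		table[i] = i;
	qsort(table, nr, sizeof(*table), key_cmp);
	for (i = 0; i < nr; i++)
		table_inv[table[i]] = i;
}
```

With keys { 30, 10, 40, 20 } the smallest key sits at index 1, so
table[0] is 1 and table_inv[1] is 0; the two arrays always undo each
other.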

> +	for (i = 0; i < writer.selected_nr; i++) {
> +		struct bitmapped_commit *selected = &writer.selected[table[i]];
> +		struct object_id *oid = &selected->commit->object.oid;
> +
> +		hashwrite(f, oid->hash, the_hash_algo->rawsz);
> +	}
> +	for (i = 0; i < writer.selected_nr; i++) {
> +		struct bitmapped_commit *selected = &writer.selected[table[i]];
> +
> +		hashwrite_be32(f, offsets[table[i]]);
> +		hashwrite_be32(f, selected->xor_offset
> +			       ? table_inv[table[i] - selected->xor_offset]

...which we need to discover the position of the XOR'd bitmap. Though
I'm not sure if I remember why `table[i] - selected->xor_offset` is
right and not `i - selected->xor_offset`.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH 4/6] builtin/pack-objects.c: learn pack.writeBitmapLookupTable
  2022-06-20 12:33 ` [PATCH 4/6] builtin/pack-objects.c: learn pack.writeBitmapLookupTable Taylor Blau via GitGitGadget
@ 2022-06-20 22:18   ` Taylor Blau
  0 siblings, 0 replies; 162+ messages in thread
From: Taylor Blau @ 2022-06-20 22:18 UTC (permalink / raw)
  To: Taylor Blau via GitGitGadget
  Cc: git, Kaartic Sivaram, Abhradeep Chakraborty, Taylor Blau

On Mon, Jun 20, 2022 at 12:33:12PM +0000, Taylor Blau via GitGitGadget wrote:
> From: Taylor Blau <ttaylorr@github.com>
>
> Teach git to provide a way for users to enable/disable bitmap lookup
> table extension by providing a config option named 'writeBitmapLookupTable'.
>
> Signed-off-by: Taylor Blau <ttaylorr@github.com>
> Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> ---
>  Documentation/config/pack.txt | 7 +++++++
>  builtin/pack-objects.c        | 8 ++++++++
>  2 files changed, 15 insertions(+)
>
> diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
> index ad7f73a1ead..e12008d2415 100644
> --- a/Documentation/config/pack.txt
> +++ b/Documentation/config/pack.txt
> @@ -164,6 +164,13 @@ When writing a multi-pack reachability bitmap, no new namehashes are
>  computed; instead, any namehashes stored in an existing bitmap are
>  permuted into their appropriate location when writing a new bitmap.
>
> +pack.writeBitmapLookupTable::
> +	When true, git will include a "lookup table" section in the

s/git/Git (I typically use "git" when talking about the command-line
tool, and Git when talking about the project as a proper noun).

> +	bitmap index (if one is written). This table is used to defer
> +	loading individual bitmaps as late as possible. This can be
> +	beneficial in repositories which have relatively large bitmap
> +	indexes. Defaults to false.

Is there a reason that we would want to default to "false" here? Perhaps
in the first version or two we would want this to be an opt-in (since
there is no publicly documented way to opt-out of reading the extension
once it is written).

We should make sure to enable this by default at some point in the
future.

>  pack.writeReverseIndex::

...since it's easy to forget ;-).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH 1/6] Documentation/technical: describe bitmap lookup table extension
  2022-06-20 16:56   ` Derrick Stolee
  2022-06-20 17:09     ` Taylor Blau
@ 2022-06-21  8:23     ` Abhradeep Chakraborty
  1 sibling, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-21  8:23 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Abhradeep Chakraborty, Git, Kaartic Sivaraam, Taylor Blau

Derrick Stolee <derrickstolee@github.com> wrote:

> It might be worth mentioning in your commit message what happens when an
> older version of Git (or JGit) notices this flag. Does it refuse to
> operate on the .bitmap file? Does it give a warning or die? It would be
> nice if this extension could be ignored (it seems like adding the extra
> data at the end does not stop the bitmap data from being understood).

No, it doesn't refuse to operate on the .bitmap file. It just ignores the
extension. Will update the commit message.

> Perhaps it would be better to say "the last N * (HASH_LEN + 8) + 4 bytes
> preceding the trailing hash" or something? This gives us a concrete way
> to compute the start of the table, while also being clear that the table
> is included in the trailing hash.

Hmm, well said. Will update it.

> Could you expand that these objects are commit OIDs, one for each bitmap
> in the file. Are they sorted in lexicographical order for binary search,
> or are we expecting to read the entire table into a hashtable in-memory?

Yeah, of course! They are sorted in lexicographical order for binary search.


> Interesting to give the xor chains directions here. You say "position"
> here for the second commit: do you mean within the list of object names
> as opposed to the offset? That would make the most sense so we can trace
> the full list of XORs we need to make all at once.

I think I blundered here. I forgot that the xor-offset is relative to the
current bitmap. The current proposed code takes it as ABSOLUTE value and
tries to find the commit on that position (in the list of commit ids). So,
there are two faults in my code - (1) As the xor-offset has an upper limit
(which is 10 probably; not sure), any of the first 10 commits is always
selected. (2) As xor-offsets are relative to the current bitmap, it depends
on the order of the bitmaps. These bitmaps are ordered by the date of their
corresponding commit and commit ids in the lookup table are ordered
lexicographically. So, we can't use that xor-offset to find the xor'd
commit position.

Will fix it.

> Are .bitmap files already constrained to 4GB, so these 32-bit offsets
> make sense? Using 64-bit offsets would be a small cost here, I think,
> without needing to do any fancy "overflow" tables that could introduce
> a variable-length extension.

I think you're right. I should use 64-bit types here.

> I'm guessing this is at the end of the extension because a future flag
> could modify the length of the extension, so we need the flags to be
> in a predictable location. Could we make that clear somewhere?

Flags are at the end of this extension.

Thanks :)

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH 1/6] Documentation/technical: describe bitmap lookup table extension
  2022-06-20 17:09     ` Taylor Blau
@ 2022-06-21  8:31       ` Abhradeep Chakraborty
  2022-06-22 16:26         ` Taylor Blau
  0 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-21  8:31 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Abhradeep Chakraborty, Git, Kaartic Sivaraam, Derrick Stolee

Taylor Blau <me@ttaylorr.com> wrote:

> Abhradeep -- do you have any thoughts about what this might be used for?
> I'll try to remember it myself, but I imagine that we could just as
> easily remove this altogether and avoid the confusion.

Honestly, I never understood the logic behind adding this flag option.
I thought you had a reason to do that. I was even thinking of cutting
it down to 1 byte. I will remove it then.

Thanks :)

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH 1/6] Documentation/technical: describe bitmap lookup table extension
  2022-06-20 17:21   ` Taylor Blau
@ 2022-06-21  9:22     ` Abhradeep Chakraborty
  2022-06-22 16:29       ` Taylor Blau
  0 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-21  9:22 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Abhradeep Chakraborty, Git, Kaartic Sivaraam, Derrick Stolee

Taylor Blau <me@ttaylorr.com> wrote:

>     In cases where the result can be read or computed without
>     significant additional traversal (e.g., all commits of interest
>     already have bitmaps computed), we can save some time loading and
>     parsing a majority of the bitmap file that we will never read.
>
>     But in cases where the bitmaps are out-of-date, or there is
>     significant traversal required to go from the reference tips to
>     what's contained in the .bitmap file, this table provides minimal
>     benefit (or something).
>
> Of course, you should verify that that is actually true before we insert
> it into the commit message as such ;-). But that sort of information may
> help readers understand what the purpose of this change is towards the
> beginning of the series.

The performance tests cover commands like "git rev-list --count
--objects --all", "simulated clone", "simulated fetch" etc. I tested
with both the Git and Linux repositories. In both cases, the average cost
"without lookup table" is bigger than "with lookup table". The margin of
difference is bigger for Linux. Though I need to fix the calculation
of xor-offset (see my reply to Derrick), the fix should not affect the
performance too much. So, what you're saying is true. I don't think I
wrote a test for the out-of-date bitmap case, though.

> Here and elsewhere: I typically use my <me@ttaylorr.com> address when
> contributing to Git. So any trailers that mention my email or commits
> that you send on my behalf should use that address, too.

Ohh, sorry! Will fix it.

> Is the space between "(0xf)" and the first ":" intentional? Similarly,
> should there be two or three colons at the end (either "::" or ":::")?

Yes, it is intentional. My previous patch (formatting bitmap-format.txt)
uses nested description lists. ":::" means it is a level 3 description list.
The space is required; otherwise asciidoc will assume that it is a level 4
description list.

> I remember we had a brief off-list discussion about whether we should
> store the full object IDs in the offset table, or whether we could store
> their pack- or index-relative ordering. Is there a reason to prefer one
> or the other?
>
> I don't think we need to explain the choice fully in the documentation
> in this patch, but it may be worth thinking about separately
> nonetheless. We can store either order and convert it to an object ID in
> constant time.
>
> To figure out which is best, I would recommend trying a few different
> choices here and seeing how they do or don't impact your performance
> testing.

I think at that time I thought it would add the extra cost of computing
the actual commit ids from those index positions. So, I didn't go
further here.

I still have a feeling that there is some way to get rid of this
list of commit ids. But at the same time, I do not want to add
extra computation to the code.

> A couple of small thoughts here. I wonder if we'd get better locality if
> we made each record look something like:
>
>     (object_id, offset, xor_pos)
>
> Where object_id is either 20- or 4-bytes long (depending if we store the
> full object ID, or some 4-byte identifier that allows us to discover
> it), offset is 8 bytes long, and xor_pos is 4-bytes (since in practice
> we don't support packs or MIDXs which have more than 2^32-1 objects).
>
> In the event that this table doesn't fit into a single cache line, I
> think we'll get better performance out of reading it by not forcing the
> cache to evict itself whenever we need to refer back to the object_id.

Ok, will look into it.

> I mentioned in my reply to Stolee earlier, but I think that we should
> either (a) try to remember what this is for and document it, or (b)
> remove it.

Let us for now remove it.

Thanks :)

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH 1/6] Documentation/technical: describe bitmap lookup table extension
  2022-06-20 20:21   ` Derrick Stolee
@ 2022-06-21 10:08     ` Abhradeep Chakraborty
  2022-06-22 16:30       ` Taylor Blau
  0 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-21 10:08 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Abhradeep Chakraborty, Git, Kaartic Sivaraam, Taylor Blau

Derrick Stolee <derrickstolee@github.com> wrote:

> I think you mean 0x10 (b_1_0000) instead of 0xf (b_1111).
>
> I noticed when looking at the constant in patch 2.

Yes, you're right. It's kind of an embarrassment for me :)

If the flag was 0xf it would enable all the extensions.
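
A tiny sketch of why the two values behave differently when tested as
bits. The constant names and has_option() are made up for the example;
0x1 and 0x4 are the existing full-DAG and hash-cache option bits as I
understand the format document, and 0x10 is the value suggested above.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names; values as discussed in this thread. */
#define OPT_FULL_DAG      0x1
#define OPT_HASH_CACHE    0x4
#define OPT_LOOKUP_TABLE  0x10

/* True only if every bit of `opt` is set in `flags`. */
static int has_option(uint32_t flags, uint32_t opt)
{
	assert(opt);
	return (flags & opt) == opt;
}
```

A header carrying 0xf would look as if every option in the low four
bits were set, while 0x10 sets one new bit and leaves the others alone.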


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH 2/6] pack-bitmap: prepare to read lookup table extension
  2022-06-20 20:49   ` Derrick Stolee
@ 2022-06-21 10:28     ` Abhradeep Chakraborty
  0 siblings, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-21 10:28 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Abhradeep Chakraborty, Git, Kaartic Sivaraam, Taylor Blau

Derrick Stolee <derrickstolee@github.com> wrote:

> Here is an attempt to reword this a bit:
>
>   The bitmap lookup table extension was documented by an earlier
>   change, but Git does not yet know how to parse that information.
>   The extension allows parsing a smaller portion of the bitmap
>   file in order to find bitmaps for specific commits.

Got it. Thanks.

> This environment variable does not appear to be used or
> documented anywhere. Do we really want to use it as a way
> to disable reading the lookup table in general? Or would it be
> better to have a GIT_TEST_* variable for disabling the read
> during testing?

GIT_TEST_* is perfect. This was mainly for testing purpose.

> Here, uint32_t is probably fine, but maybe we should just use
> size_t instead? Should we use st_mult() and st_add() everywhere?

Yeah, it would be better to use st_*().

> I see that we have a two-method recursion loop. Please move this
> declaration to immediately before lazy_bitmap_for_commit() so it
> is declared as late as possible.

Ok.

> These two helpers should probably return a size_t and uint32_t
> instead of a pointer. Let these do get_be[32|64]() on the computed
> pointer.

Ok.

> This is using an interesting type of tail-recursion. We might be
> better off using a loop with a stack: push to the stack the commit
> positions of the XOR bitmaps. At the very bottom, we get a bitmap
> without an XOR base. Then, pop off the stack, modifying the bitmap
> with XOR operations as we go. (Perhaps we also store these bitmaps
> in-memory along the way?) Finally, we have the necessary bitmap.

Hmm, got the point. I need to fix the xor-offset related issue first
(that I mentioned earlier) before doing this.
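
To check my understanding of the loop-with-a-stack idea, here is a
simplified sketch. It is not the real code: a single 64-bit word stands
in for an EWAH bitmap, and resolve_bitmap(), NO_XOR and MAX_CHAIN are
names and limits invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define NO_XOR 0xffffffffu
#define MAX_CHAIN 16	/* hypothetical bound on chain length */

/*
 * raw[i] is the stored word for entry i; if xor_pos[i] != NO_XOR it must
 * be XORed with the resolved word of entry xor_pos[i]. Returns 0 and
 * stores the resolved word in *out, or -1 if the chain is too long.
 */
static int resolve_bitmap(const uint64_t *raw, const uint32_t *xor_pos,
			  uint32_t pos, uint64_t *out)
{
	uint32_t stack[MAX_CHAIN];
	int depth = 0;
	uint64_t result;

	assert(raw && xor_pos && out);

	/* Walk down to the entry that has no XOR base, remembering the path. */
	while (xor_pos[pos] != NO_XOR) {
		if (depth == MAX_CHAIN)
			return -1;
		stack[depth++] = pos;
		pos = xor_pos[pos];
	}

	/* Unwind: apply each stored word on top of its resolved base. */
	result = raw[pos];
	while (depth)
		result ^= raw[stack[--depth]];

	*out = result;
	return 0;
}
```

For plain words the order of the XORs does not change the final value;
unwinding from the base upward matters once each intermediate bitmap is
also stored in memory along the way. The depth limit additionally turns
a corrupt, cyclic chain into an error instead of an endless loop.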

> Perhaps it is time to use hexadecimal representation here to match the
> file format document?

Yeah, of course!

Thanks :)

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH 2/6] pack-bitmap: prepare to read lookup table extension
  2022-06-20 22:06   ` Taylor Blau
@ 2022-06-21 11:52     ` Abhradeep Chakraborty
  2022-06-22 16:49       ` Taylor Blau
  0 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-21 11:52 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Abhradeep Chakraborty, Git, Kaartic Sivaraam, Derrick Stolee

Taylor Blau <me@ttaylorr.com> wrote:

> What is the purpose of the GIT_READ_COMMIT_TABLE environment variable? I
> assume that it's to make it easier to run tests (especially performance
> ones) with and without access to the lookup table. If so, we should
> document that (lightly) in the commit message, and rename this to be
> GIT_TEST_READ_COMMIT_TABLE to indicate that it shouldn't be used outside
> of tests.

This is mainly for testing, GIT_TEST_READ_COMMIT_TABLE is perfect.


> All makes sense. Some light documentation might help explain what this
> comparator function is used for (the bsearch() call below in
> bitmap_table_lookup()), although I suspect that this function will get
> slightly more complicated if you pack the table contents as I suggest,
> in which case more documentation will definitely help.

Ok.

> Interesting; this is a point that I forgot about from the original
> patch. xor_pos is an index (not an offset) into the list of commits in
> the table of contents in the order appear in that table. We should be
> clear about (a) what that order is, and (b) that xor_pos is an index
> into that order.

This is exactly what I said in my first reply. I made a mistake here.
As xor_pos is relative to the current bitmap, it depends on the bitmap
entry order. These two orders are not the same. One is ordered by date, the
other is lexicographically ordered. I will fix it.

> Yeah. The problem here is that we can't record every commit that
> _doesn't_ have a bitmap every time we return NULL from one of these
> queries, since there are arbitrarily many such commits that don't have
> bitmaps.
>
> We could approximate it using a Bloom filter or something, and much of
> that code is already written and could be interesting to try and reuse.
> But I wonder if we could get by with something simpler, though, which
> would cause us to load all bitmaps from the lookup table after a fixed
> number of cache misses (at which point we should force ourselves to load
> everything and just read everything out of a single O(1) lookup in the
> stored bitmap table).
>
> That may or may not be a good idea, and the threshold will probably be
> highly dependent on the system. So it may not even be worth it, but I
> think it's an interesting area to experiemnt in and think a little more
> about.

Now I get the point. I wonder what happens if we leave it as it is. How
much will it affect the code?

> How does this commit_pos work again? I confess I have forgetten since I
> wrote some of this code a while ago... :-).

It uses a recursive strategy. The first call to `stored_bitmap_for_commit`
does not have a `pos_hint`. So, it uses `bitmap_table_lookup` to find
the commit position in the list and calls `lazy_bitmap_for_commit`.
That function gets the offset and xor-offset using the commit id's
position in the list. If an xor-offset exists, it uses this xor-offset to
get the xor-bitmap by calling `stored_bitmap_for_commit` again, but this time
`pos_hint` is the xor-offset. This goes on until the last non-xor bitmap is found.

As I said before, xor-offset should be an absolute value to make it work
correctly.

> Should we print this regardless of whether or not there is a lookup
> table? We should be able to learn the entry count either way.

No, this is necessary. "Bitmap v1 test (%d entries loaded)" means
all the bitmap entries have been loaded. It is basically for the
`load_bitmap_entries_v1` function, which loads all the bitmaps
one by one. But if there is a lookup table, the `prepare_bitmap_git`
function will not load every entry, and thus printing the above
line would be wrong.

Thanks :)

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH 3/6] pack-bitmap-write.c: write lookup table extension
  2022-06-20 22:16   ` Taylor Blau
@ 2022-06-21 12:50     ` Abhradeep Chakraborty
  2022-06-22 16:51       ` Taylor Blau
  0 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-21 12:50 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Abhradeep Chakraborty, Git, Kaartic Sivaraam, Derrick Stolee

Taylor Blau <me@ttaylorr.com> wrote:

> I'm not sure if I remember why `table[i] - selected->xor_offset` is
> right and not `i - selected->xor_offset`.

Even I got confused myself! Before sending the patch to the mailing
list, I was clear about that. That's why I didn't catch the so-called
mistake I have been reporting until now. Thanks Taylor for asking
the question!

I should add a comment before the line so that people can understand it.
Let us parse `table_inv[table[i] - selected->xor_offset]` -

Suppose the bitmap entries look like this -

Bitmap 0 (for commit 0)
Bitmap 1 (for commit 1)
Bitmap 2 (for commit 2)
Bitmap 3 (for commit 3)
.
.
.
Bitmap 20 (for commit 20)

These bitmaps are ordered by the date of their corresponding commit.
The `table` array maps a commit's lexicographic position to its bitmap
position. `table_inv` stores the reverse (i.e. it maps bitmap position to
lexicographic position). For example, if commit 4 is lexicographically first
among all the commits, then `table[0]` is 4. Similarly, `table[1]` is 2,
`table[2]` is 1, etc. Likewise, `table_inv[4]` is 0, `table_inv[2]` is 1, etc.

Now suppose commit 4's bitmap has an xor-relation with commit 2's bitmap.
So, the xor-offset for bitmap 4 is 2, and `table[0] - selected->xor_offset`
is equal to 4 - 2 = 2. It points to commit 2. Now, 2 is in bitmap
order. We need to convert it into lexicographic order. So, `table_inv[2]`
gives us the lexicographic position of commit 2, i.e. 1.

Long story short, there is no issue regarding xor_offset. This xor_offset
is not relative to the current commit. It is absolute.
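
As a rough illustration of the mapping described above, here is a small
C sketch. It is not the code from the patch: `build_table_inv` and
`xor_lex_pos` are hypothetical helper names, and the sketch assumes that
`table` is a permutation of 0..nr-1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): fill table_inv so that
 * table_inv[table[i]] == i for every lexicographic position i.
 */
static void build_table_inv(const uint32_t *table, uint32_t *table_inv,
			    size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		assert(table[i] < nr); /* table must be a permutation */
		table_inv[table[i]] = (uint32_t)i;
	}
}

/*
 * Hypothetical helper: given lexicographic position i and the
 * xor_offset of its bitmap, return the lexicographic position of the
 * commit whose bitmap it has to be XORed with.
 */
static uint32_t xor_lex_pos(const uint32_t *table, const uint32_t *table_inv,
			    size_t i, uint32_t xor_offset)
{
	uint32_t bitmap_pos = table[i]; /* lexicographic -> bitmap order */

	assert(xor_offset <= bitmap_pos);
	/* step back in bitmap order, then convert to lexicographic order */
	return table_inv[bitmap_pos - xor_offset];
}
```

With a made-up `table` of {4, 2, 1, 0, 3}, `table_inv[4]` is 0 and
`table_inv[2]` is 1, and `xor_lex_pos(table, table_inv, 0, 2)` gives 1,
which matches the walk-through above.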

Sorry for the initial claim :)

* Re: [PATCH 1/6] Documentation/technical: describe bitmap lookup table extension
  2022-06-21  8:31       ` Abhradeep Chakraborty
@ 2022-06-22 16:26         ` Taylor Blau
  0 siblings, 0 replies; 162+ messages in thread
From: Taylor Blau @ 2022-06-22 16:26 UTC (permalink / raw)
  To: Abhradeep Chakraborty; +Cc: Git, Kaartic Sivaraam, Derrick Stolee

On Tue, Jun 21, 2022 at 02:01:14PM +0530, Abhradeep Chakraborty wrote:
> Taylor Blau <me@ttaylorr.com> wrote:
>
> > Abhradeep -- do you have any thoughts about what this might be used for?
> > I'll try to remember it myself, but I imagine that we could just as
> > easily remove this altogether and avoid the confusion.
>
> Honestly, I never understood the logic behind adding this flag option.
> I thought you had a reason to do that. I was even thinking of cutting
> it down to 1 byte. I will remove it then.

I think removing it makes more sense. Since many of the other fields are
4-bytes wide, it's important for alignment purposes that those fields
have addresses which are a multiple of four (relative to the start of
the region, hence the 4-byte wide flags field).

But I'd just as soon get rid of it, so I think that makes sense to me.

Thanks,
Taylor

* Re: [PATCH 1/6] Documentation/technical: describe bitmap lookup table extension
  2022-06-21  9:22     ` Abhradeep Chakraborty
@ 2022-06-22 16:29       ` Taylor Blau
  2022-06-22 16:45         ` Abhradeep Chakraborty
  0 siblings, 1 reply; 162+ messages in thread
From: Taylor Blau @ 2022-06-22 16:29 UTC (permalink / raw)
  To: Abhradeep Chakraborty; +Cc: Git, Kaartic Sivaraam, Derrick Stolee

On Tue, Jun 21, 2022 at 02:52:53PM +0530, Abhradeep Chakraborty wrote:
> Taylor Blau <me@ttaylorr.com> wrote:
> > I remember we had a brief off-list discussion about whether we should
> > store the full object IDs in the offset table, or whether we could store
> > their pack- or index-relative ordering. Is there a reason to prefer one
> > or the other?
> >
> > I don't think we need to explain the choice fully in the documentation
> > in this patch, but it may be worth thinking about separately
> > nonetheless. We can store either order and convert it to an object ID in
> > constant time.
> >
> > To figure out which is best, I would recommend trying a few different
> > choices here and seeing how they do or don't impact your performance
> > testing.
>
> I think at that time I thought it would add the extra cost of computing
> the actual commit ids from those index positions. So, I didn't go
> further here.

It should be negligible relative to everything else, I would imagine.
The function that converts an index position into an object ID is
`nth_packed_object_id()`.

> I still have a feeling that there is some way to get rid of this
> list of commit ids. But at the same time, I do not want to add
> extra computation to the code.

I'm hoping that the additional complexity is minor. And if we can save
some extra bytes that aren't necessary in the first place without
compromising on performance, I think that's worthwhile to do.

Thanks,
Taylor

* Re: [PATCH 1/6] Documentation/technical: describe bitmap lookup table extension
  2022-06-21 10:08     ` Abhradeep Chakraborty
@ 2022-06-22 16:30       ` Taylor Blau
  0 siblings, 0 replies; 162+ messages in thread
From: Taylor Blau @ 2022-06-22 16:30 UTC (permalink / raw)
  To: Abhradeep Chakraborty; +Cc: Derrick Stolee, Git, Kaartic Sivaraam

On Tue, Jun 21, 2022 at 03:38:00PM +0530, Abhradeep Chakraborty wrote:
> Derrick Stolee <derrickstolee@github.com> wrote:
>
> > I think you mean 0x10 (b_1_0000) instead of 0xf (b_1111).
> >
> > I noticed when looking at the constant in patch 2.
>
> Yes, you're right. It's kind of embarrassing for me :)

It happens ;). Let's use 0x10 instead.
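
For readers following along, a small C sketch of why the value matters.
The flag names below are prefixed with TOY_ and are not Git's own
definitions; the values 0x1 and 0x4 for the two older options are
assumed here. A flag has to occupy exactly one bit, and 0xf (binary
1111) would overlap every option in the low four bits.

```c
#include <assert.h>

/* Toy flag values for illustration only (names are hypothetical). */
enum toy_bitmap_flags {
	TOY_OPT_FULL_DAG = 0x1,
	TOY_OPT_HASH_CACHE = 0x4,
	TOY_OPT_LOOKUP_TABLE = 0x10
};

/* True if exactly one bit is set in v. */
static int is_single_bit(unsigned v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* True if the two flag values share at least one bit. */
static int flags_overlap(unsigned a, unsigned b)
{
	return (a & b) != 0;
}

/* Sanity check for a candidate flag against the flags already in use. */
static int is_usable_flag(unsigned candidate, unsigned in_use)
{
	assert(in_use != 0);
	return is_single_bit(candidate) && !flags_overlap(candidate, in_use);
}
```

Under these assumptions, `is_usable_flag(0x10, 0x1 | 0x4)` holds while
`is_usable_flag(0xf, 0x1 | 0x4)` does not.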

Thanks,
Taylor

* Re: [PATCH 1/6] Documentation/technical: describe bitmap lookup table extension
  2022-06-22 16:29       ` Taylor Blau
@ 2022-06-22 16:45         ` Abhradeep Chakraborty
  0 siblings, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-22 16:45 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Abhradeep Chakraborty, Git, Kaartic Sivaraam, Derrick Stolee

Taylor Blau <me@ttaylorr.com> wrote:

> It should be negligible relative to everything else, I would imagine.
> The function that converts an index position into an object ID is
> `nth_packed_object_id()`.
>
> > I still have a feeling that there is some way to get rid of this
> > list of commit ids. But at the same time, I do not want to add
> > extra computation to the code.
>
> I'm hoping that the additional complexity is minor. And if we can save
> some extra bytes that aren't necessary in the first place without
> compromising on performance, I think that's worthwhile to do.

Ok. I will look into it then.

Most of the review comments have been addressed. I hope to submit the
next version soon.

Thanks :)

* Re: [PATCH 2/6] pack-bitmap: prepare to read lookup table extension
  2022-06-21 11:52     ` Abhradeep Chakraborty
@ 2022-06-22 16:49       ` Taylor Blau
  2022-06-22 17:18         ` Abhradeep Chakraborty
  0 siblings, 1 reply; 162+ messages in thread
From: Taylor Blau @ 2022-06-22 16:49 UTC (permalink / raw)
  To: Abhradeep Chakraborty; +Cc: Git, Kaartic Sivaraam, Derrick Stolee

On Tue, Jun 21, 2022 at 05:22:12PM +0530, Abhradeep Chakraborty wrote:
> Taylor Blau <me@ttaylorr.com> wrote:
> > Yeah. The problem here is that we can't record every commit that
> > _doesn't_ have a bitmap every time we return NULL from one of these
> > queries, since there are arbitrarily many such commits that don't have
> > bitmaps.
> >
> > We could approximate it using a Bloom filter or something, and much of
> > that code is already written and could be interesting to try and reuse.
> > But I wonder if we could get by with something simpler, though, which
> > would cause us to load all bitmaps from the lookup table after a fixed
> > number of cache misses (at which point we should force ourselves to load
> > everything and just read everything out of a single O(1) lookup in the
> > stored bitmap table).
> >
> > That may or may not be a good idea, and the threshold will probably be
> > highly dependent on the system. So it may not even be worth it, but I
> > think it's an interesting area to experiment in and think a little more
> > about.
>
> Now I get the point. I wonder, what if we leave it as it is? How much will
> it affect the code?

I'm not sure, and I think that it depends a lot on the repository and
query that we're running.

I'd imagine that the effect is probably measurable, but small. Each hash
lookup is cheap, but if there are many such lookups (a large proportion
of which end up resulting in "no, we haven't loaded this bitmap yet" and
then "...because no such bitmap exists for that commit") at some point
it is worth it to fault all of the commits that _do_ have bitmaps in and
answer authoritatively.

In other words, right now we have to do two queries when a commit
doesn't have a bitmap stored:

  - first, a lookup to see whether we have already loaded a bitmap for
    that commit

  - then, a subsequent lookup to see whether the .bitmap file itself has
    a bitmap for that commit, but we just haven't loaded it yet

If we knew that we had loaded all of the bitmaps in the file, then we
could simplify the above two queries into one, since whatever the first
one returns is enough to know whether or not a bitmap exists at all.
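
To make the idea concrete, here is a toy C sketch of the two-stage
lookup with a miss threshold. None of this is from pack-bitmap.c: the
struct, the function names, and the threshold of 3 are all made up for
illustration, and plain arrays stand in for the hash table and the
on-disk lookup table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_COMMITS 8
#define MISS_THRESHOLD 3 /* arbitrary value chosen for this sketch */

struct toy_bitmap_index {
	bool on_disk[NR_COMMITS]; /* stand-in for the table in .bitmap */
	bool loaded[NR_COMMITS];  /* stand-in for the in-memory hash */
	bool all_loaded;
	unsigned misses;
	unsigned table_queries;   /* counts second-stage lookups */
};

/* Fault in every bitmap that exists on disk. */
static void toy_load_all(struct toy_bitmap_index *idx)
{
	size_t i;

	for (i = 0; i < NR_COMMITS; i++)
		idx->loaded[i] = idx->on_disk[i];
	idx->all_loaded = true;
}

/* Returns true if a bitmap exists for `commit`, loading it if needed. */
static bool toy_bitmap_for_commit(struct toy_bitmap_index *idx, size_t commit)
{
	assert(commit < NR_COMMITS);

	if (idx->loaded[commit])
		return true;  /* first query: already loaded */
	if (idx->all_loaded)
		return false; /* first query is now authoritative */

	idx->table_queries++; /* second query: consult the table */
	if (idx->on_disk[commit]) {
		idx->loaded[commit] = true;
		return true;
	}
	if (++idx->misses >= MISS_THRESHOLD)
		toy_load_all(idx);
	return false;
}
```

After the third miss everything is loaded, so later queries for
commits without a bitmap are answered by the first check alone. Whether
such a threshold pays off in practice would have to be measured.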

> > How does this commit_pos work again? I confess I have forgotten since I
> > wrote some of this code a while ago... :-).
>
> It uses a recursive strategy. The first call to the `stored_bitmap_for_commit`
> function does not have `pos_hint`. So, it uses `bitmap_table_lookup` to find
> the commit position in the list and calls the `lazy_bitmap_for_commit`
> function. This function gets the offset and xor-offset using the commit id's
> position in the list. If an xor-offset exists, it uses this xor-offset to
> get the xor-bitmap by calling `stored_bitmap_for_commit` again, but this time
> with the xor-offset as `pos_hint`. This goes on until the last non-xor bitmap is found.

Ahhh. Thanks for refreshing my memory. I wonder if you think there is a
convenient way to work some of this into a short comment to help other
readers in the future, too.

> As I said before, xor-offset should be an absolute value to make it work
> correctly.

Yep, makes sense.

> > Should we print this regardless of whether or not there is a lookup
> > table? We should be able to learn the entry count either way.
>
> No, this is necessary. "Bitmap v1 test (%d entries loaded)" means
> all the bitmap entries have been loaded. It is basically for the
> `load_bitmap_entries_bitmap_v1` function, which loads all the bitmaps
> one by one. But if there is a lookup table, the `prepare_bitmap_git`
> function will not load every entry, and thus printing the above
> line would be wrong.

Right, that part makes sense to me. But I wonder if we should still
print something, perhaps just "Bitmap v1 test" or "Bitmap v1 test (%d
entries)" omitting the "loaded" part.

Thanks,
Taylor

* Re: [PATCH 3/6] pack-bitmap-write.c: write lookup table extension
  2022-06-21 12:50     ` Abhradeep Chakraborty
@ 2022-06-22 16:51       ` Taylor Blau
  0 siblings, 0 replies; 162+ messages in thread
From: Taylor Blau @ 2022-06-22 16:51 UTC (permalink / raw)
  To: Abhradeep Chakraborty; +Cc: Git, Kaartic Sivaraam, Derrick Stolee

On Tue, Jun 21, 2022 at 06:20:54PM +0530, Abhradeep Chakraborty wrote:
> Taylor Blau <me@ttaylorr.com> wrote:
>
> > I'm not sure if I remember why `table[i] - selected->xor_offset` is
> > right and not `i - selected->xor_offset`.
>
> I got confused myself! Before sending the patch to the mailing
> list, I was clear about this. That is why I did not catch the so-called
> mistake that I have been pointing out until now. Thanks, Taylor, for asking
> the question!
>
> I should add a comment before the line so that people can understand it.
> Let us break down `table_inv[table[i] - selected->xor_offset]`:
>
> Suppose the bitmap entries look like this:
>
> Bitmap 0 (for commit 0)
> Bitmap 1 (for commit 1)
> Bitmap 2 (for commit 2)
> Bitmap 3 (for commit 3)
> .
> .
> .
> Bitmap 20 (for commit 20)
>
> These bitmaps are ordered by the date of their corresponding commit.
> The `table` array maps a commit's lexicographic position to its bitmap
> position. `table_inv` stores the reverse (i.e. it maps bitmap position to
> lexicographic position). For example, if commit 4 is lexicographically first
> among all the commits, then `table[0]` is 4. Similarly, `table[1]` is 2,
> `table[2]` is 1, etc. Likewise, `table_inv[4]` is 0, `table_inv[2]` is 1, etc.
>
> Now suppose commit 4's bitmap has an xor-relation with commit 2's bitmap.
> So, the xor-offset for bitmap 4 is 2, and `table[0] - selected->xor_offset`
> is equal to 4 - 2 = 2. It points to commit 2. Now, 2 is in bitmap
> order. We need to convert it into lexicographic order. So, `table_inv[2]`
> gives us the lexicographic position of commit 2, i.e. 1.
>
> Long story short, there is no issue regarding xor_offset. This xor_offset
> is not relative to the current commit. It is absolute.
>
> Sorry for the initial claim :)

Ahhhhh. Makes perfect sense. Thanks!

Thanks,
Taylor

* Re: [PATCH 5/6] bitmap-commit-table: add tests for the bitmap lookup table
  2022-06-20 12:33 ` [PATCH 5/6] bitmap-commit-table: add tests for the bitmap lookup table Abhradeep Chakraborty via GitGitGadget
@ 2022-06-22 16:54   ` Taylor Blau
  0 siblings, 0 replies; 162+ messages in thread
From: Taylor Blau @ 2022-06-22 16:54 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget
  Cc: git, Kaartic Sivaram, Abhradeep Chakraborty

On Mon, Jun 20, 2022 at 12:33:13PM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
>
> Add tests to check the working of the newly implemented lookup table.
>
> Mentored-by: Taylor Blau <ttaylorr@github.com>
> Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
> Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> ---
>  t/t5310-pack-bitmaps.sh       | 14 ++++++++++++++
>  t/t5326-multi-pack-bitmaps.sh | 19 +++++++++++++++++++
>  2 files changed, 33 insertions(+)
>
> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
> index f775fc1ce69..f05d3e6ace7 100755
> --- a/t/t5310-pack-bitmaps.sh
> +++ b/t/t5310-pack-bitmaps.sh
> @@ -43,6 +43,20 @@ test_expect_success 'full repack creates bitmaps' '
>
>  basic_bitmap_tests
>
> +test_expect_success 'using lookup table does not affect basic bitmap tests' '
> +	test_config pack.writeBitmapLookupTable true &&
> +	git repack -adb
> +'

Whether or not we end up making pack.writeBitmapLookupTable be "true" by
default, I wonder if we should just set it to "true" whenever we write a
bitmap in this file, and then adjust whether or not we *read* the lookup
table with the GIT_TEST_ environment variable you introduced a few
commits back.

Thinking on it more, though, I don't think it makes a huge practical
difference for the code here in "t", since these repositories are tiny
and repacking them or rewriting their bitmaps is cheap.

But in the performance tests it probably makes a bigger difference.

> +basic_bitmap_tests
> +
> +test_expect_success 'using lookup table does not let each entries to be parsed one by one' '
> +	test_config pack.writeBitmapLookupTable true &&
> +	git repack -adb &&
> +	git rev-list --test-bitmap HEAD 2>out &&
> +	grep "Found bitmap for" out &&
> +	! grep "Bitmap v1 test "
> +'
> +
>  test_expect_success 'incremental repack fails when bitmaps are requested' '
>  	test_commit more-1 &&
>  	test_must_fail git repack -d 2>err &&
> diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
> index 4fe57414c13..85fbdf5e4bb 100755
> --- a/t/t5326-multi-pack-bitmaps.sh
> +++ b/t/t5326-multi-pack-bitmaps.sh
> @@ -306,5 +306,24 @@ test_expect_success 'graceful fallback when missing reverse index' '
>  		! grep "ignoring extra bitmap file" err
>  	)
>  '
> +test_expect_success 'multi-pack-index write --bitmap writes lookup table if enabled' '
> +	rm -fr repo &&
> +	git init repo &&
> +	test_when_finished "rm -fr repo" &&
> +	(
> +		cd repo &&
> +		test_commit_bulk 106 &&

Is there a reason we need to write this many commits? I think this is
copied from a test further up which deals explicitly with a case where
there are too many commits to write bitmaps for all of them (hence we
need to write more commits than 100 or so).

But I think for our purposes here we just need a single commit, written
into a single pack, which is covered with a MIDX.

So it should suffice to do something like:

    test_commit base &&
    git repack -ad &&
    git config pack.writeBitmapLookupTable true &&
    git multi-pack-index write --bitmap &&
    [...]

instead of what's written here.

Thanks,
Taylor

* Re: [PATCH 6/6] bitmap-lookup-table: add performance tests
  2022-06-20 12:33 ` [PATCH 6/6] bitmap-lookup-table: add performance tests Abhradeep Chakraborty via GitGitGadget
@ 2022-06-22 17:14   ` Taylor Blau
  0 siblings, 0 replies; 162+ messages in thread
From: Taylor Blau @ 2022-06-22 17:14 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget
  Cc: git, Kaartic Sivaram, Abhradeep Chakraborty

On Mon, Jun 20, 2022 at 12:33:14PM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
>
> Add performance tests for bitmap lookup table extension.

These tests look good, though I left a few notes below which boil down
to recommending a separate commit to set pack.writeReverseIndex=true,
and some suggestions for how to clean up the diff in the two performance
scripts you modified.

I would be interested to see the relevant results from running these
perf scripts on a reasonably large-sized repository, e.g. the kernel or
similar.

For the next version of this series, would you mind running these
scripts and including the results in this commit message?

> Mentored-by: Taylor Blau <ttaylorr@github.com>
> Co-mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
> Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> ---
>  t/perf/p5310-pack-bitmaps.sh       | 60 +++++++++++++++++++-----------
>  t/perf/p5326-multi-pack-bitmaps.sh | 55 +++++++++++++++++----------
>  2 files changed, 73 insertions(+), 42 deletions(-)
>
> diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
> index 7ad4f237bc3..a8d9414de92 100755
> --- a/t/perf/p5310-pack-bitmaps.sh
> +++ b/t/perf/p5310-pack-bitmaps.sh
> @@ -10,10 +10,11 @@ test_perf_large_repo
>  # since we want to be able to compare bitmap-aware
>  # git versus non-bitmap git
>  #
> -# We intentionally use the deprecated pack.writebitmaps
> +# We intentionally use the deprecated pack.writeBitmaps
>  # config so that we can test against older versions of git.
>  test_expect_success 'setup bitmap config' '
> -	git config pack.writebitmaps true
> +	git config pack.writeBitmaps true &&
> +	git config pack.writeReverseIndex true

I suspect that eliminating the overhead of generating the reverse index
in memory is important to see the effect of this test. We should make
sure that this is done in a separate step, so that when we compare two
commits, both have a reverse index written.

That being said, we should probably make reverse indexes the default
anyway, since they help significantly with all kinds of things (really,
any operation which has to generate a reverse index in memory, like
preparing a pack to push, the '%(objectsize:disk)' cat-file formatting
atom, and so on).

So at a minimum I would suggest extracting a separate commit here which
sets pack.writeReverseIndex to true for this test. That way the commit
prior to this has reverse indexes written, and comparing "this commit"
to "the previous one" is isolating the effect of just the lookup table.

But as a useful side project, it would be worthwhile to investigate
setting this to true by default everywhere, perhaps after this series
has settled a little more (or if you are blocked / want something else
to do).

>  '
>
>  # we need to create the tag up front such that it is covered by the repack and
> @@ -28,27 +29,42 @@ test_perf 'repack to disk' '
>
>  test_full_bitmap
>
> -test_expect_success 'create partial bitmap state' '
> -	# pick a commit to represent the repo tip in the past
> -	cutoff=$(git rev-list HEAD~100 -1) &&
> -	orig_tip=$(git rev-parse HEAD) &&
> -
> -	# now kill off all of the refs and pretend we had
> -	# just the one tip
> -	rm -rf .git/logs .git/refs/* .git/packed-refs &&
> -	git update-ref HEAD $cutoff &&
> -
> -	# and then repack, which will leave us with a nice
> -	# big bitmap pack of the "old" history, and all of
> -	# the new history will be loose, as if it had been pushed
> -	# up incrementally and exploded via unpack-objects
> -	git repack -Ad &&
> -
> -	# and now restore our original tip, as if the pushes
> -	# had happened
> -	git update-ref HEAD $orig_tip
> +test_perf 'use lookup table' '
> +    git config pack.writeBitmapLookupTable true
>  '

This part doesn't need to use 'test_perf', since we don't care about the
performance of running "git config". Instead, using
`test_expect_success` is more appropriate here.

> -test_partial_bitmap
> +test_perf 'repack to disk (lookup table)' '
> +    git repack -adb
> +'
> +
> +test_full_bitmap
> +
> +for i in false true
> +do
> +	$i && lookup=" (lookup table)"
> +	test_expect_success "create partial bitmap state$lookup" '
> +		git config pack.writeBitmapLookupTable '"$i"' &&
> +		# pick a commit to represent the repo tip in the past
> +		cutoff=$(git rev-list HEAD~100 -1) &&
> +		orig_tip=$(git rev-parse HEAD) &&
> +
> +		# now kill off all of the refs and pretend we had
> +		# just the one tip
> +		rm -rf .git/logs .git/refs/* .git/packed-refs &&
> +		git update-ref HEAD $cutoff &&
> +
> +		# and then repack, which will leave us with a nice
> +		# big bitmap pack of the "old" history, and all of
> +		# the new history will be loose, as if it had been pushed
> +		# up incrementally and exploded via unpack-objects
> +		git repack -Ad &&
> +
> +		# and now restore our original tip, as if the pushes
> +		# had happened
> +		git update-ref HEAD $orig_tip
> +	'
> +
> +	test_partial_bitmap
> +done

Could we extract the body of this loop into a function whose first
argument is either true/false? I think that would improve readability
here, and potentially clean up the diff a little bit.

For what it's worth, I don't think we need to do anything fancier for
the test name other than:


    test_partial_bitmap () {
      local enabled="$1"
      test_expect_success "create partial bitmap state (lookup=$enabled)" '
        git config pack.writeBitmapLookupTable "$enabled" &&
        [...]
      '
    }

    test_partial_bitmap false
    test_partial_bitmap true

or something.

> +for i in false true
> +do
> +	$i && lookup=" (lookup table)"
> +	test_expect_success "create partial bitmap state$lookup" '
> +		git config pack.writeBitmapLookupTable '"$i"' &&
> +		# pick a commit to represent the repo tip in the past
> +		cutoff=$(git rev-list HEAD~100 -1) &&
> +		orig_tip=$(git rev-parse HEAD) &&
> +
> +		# now pretend we have just one tip
> +		rm -rf .git/logs .git/refs/* .git/packed-refs &&
> +		git update-ref HEAD $cutoff &&
> +
> +		# and then repack, which will leave us with a nice
> +		# big bitmap pack of the "old" history, and all of
> +		# the new history will be loose, as if it had been pushed
> +		# up incrementally and exploded via unpack-objects
> +		git repack -Ad &&
> +		git multi-pack-index write --bitmap &&
> +
> +		# and now restore our original tip, as if the pushes
> +		# had happened
> +		git update-ref HEAD $orig_tip
> +	'
> +
> +	test_partial_bitmap
> +done

Same note here.

Thanks,
Taylor

* Re: [PATCH 2/6] pack-bitmap: prepare to read lookup table extension
  2022-06-22 16:49       ` Taylor Blau
@ 2022-06-22 17:18         ` Abhradeep Chakraborty
  2022-06-22 21:34           ` Taylor Blau
  0 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-22 17:18 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Abhradeep Chakraborty, Git, Kaartic Sivaraam, Derrick Stolee

Taylor Blau <me@ttaylorr.com> wrote:

> In other words, right now we have to do two queries when a commit
> doesn't have a bitmap stored:
>
>   - first, a lookup to see whether we have already loaded a bitmap for
>     that commit
>
>   - then, a subsequent lookup to see whether the .bitmap file itself has
>     a bitmap for that commit, but we just haven't loaded it yet
>
> If we knew that we had loaded all of the bitmaps in the file, then we
> could simplify the above two queries into one, since whatever the first
> one returns is enough to know whether or not a bitmap exists at all.

Hmm, agreed.

> Ahhh. Thanks for refreshing my memory. I wonder if you think there is a
> convenient way to work some of this into a short comment to help other
> readers in the future, too.

Actually, Derrick has suggested going with an iterative approach[1] instead of
a recursive approach. What's your view on it?

> Right, that part makes sense to me. But I wonder if we should still
> print something, perhaps just "Bitmap v1 test" or "Bitmap v1 test (%d
> entries)" omitting the "loaded" part.

Yeah, of course we can!

Thanks :)

[1] https://lore.kernel.org/git/92dc6860-ff35-0989-5114-fe1e220ca10c@github.com/

* Re: [PATCH 2/6] pack-bitmap: prepare to read lookup table extension
  2022-06-22 17:18         ` Abhradeep Chakraborty
@ 2022-06-22 21:34           ` Taylor Blau
  0 siblings, 0 replies; 162+ messages in thread
From: Taylor Blau @ 2022-06-22 21:34 UTC (permalink / raw)
  To: Abhradeep Chakraborty; +Cc: Git, Kaartic Sivaraam, Derrick Stolee

On Wed, Jun 22, 2022 at 10:48:14PM +0530, Abhradeep Chakraborty wrote:
> > Ahhh. Thanks for refreshing my memory. I wonder if you think there is a
> > convenient way to work some of this into a short comment to help other
> > readers in the future, too.
>
> Actually, Derrick has suggested going with an iterative approach[1] instead of
> a recursive approach. What's your view on it?

I don't have a strong feeling about it. In practice, we seem to top out
at ~500 bitmaps or so for large-ish repositories, so I would be
surprised to see this result in stack exhaustion even in the worst case
(every bitmap xor'd with the previous one, forming a long chain).

But it doesn't hurt to be defensive, so I think it's worth it as long as
you don't find the implementation too complex.
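
For what it's worth, here is a toy C sketch of what an iterative walk
over an xor chain could look like. It is only an illustration under
simplifying assumptions: a single 32-bit word stands in for a whole
bitmap, and `toy_entry`, `toy_resolve` and the `NO_XOR` sentinel are
invented names, not anything in pack-bitmap.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_XOR UINT32_MAX /* sentinel chosen for this sketch */

struct toy_entry {
	uint32_t stored;  /* bits as stored (possibly XOR-compressed) */
	uint32_t xor_pos; /* entry to XOR against, or NO_XOR */
};

/*
 * Resolve entry `pos` without recursion. Returns 0 on success, or -1
 * if a position is out of range or the chain visits more than `nr`
 * entries (which can only happen if the chain has a cycle).
 */
static int toy_resolve(const struct toy_entry *entries, size_t nr,
		       size_t pos, uint32_t *out)
{
	uint32_t result = 0;
	size_t steps = 0;

	assert(out != NULL);

	for (;;) {
		if (pos >= nr || steps++ >= nr)
			return -1;
		/* XOR is associative, so the chain can be folded in order */
		result ^= entries[pos].stored;
		if (entries[pos].xor_pos == NO_XOR)
			break;
		pos = entries[pos].xor_pos;
	}
	*out = result;
	return 0;
}
```

The real bitmaps are compressed, so an actual implementation would more
likely collect the chain first and then apply the XORs from the base
bitmap upwards; the loop bound above is just one cheap way to stay
defensive against a corrupt chain.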

> [1] https://lore.kernel.org/git/92dc6860-ff35-0989-5114-fe1e220ca10c@github.com/

Thanks,
Taylor

* [PATCH v2 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format
  2022-06-20 12:33 [PATCH 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
                   ` (5 preceding siblings ...)
  2022-06-20 12:33 ` [PATCH 6/6] bitmap-lookup-table: add performance tests Abhradeep Chakraborty via GitGitGadget
@ 2022-06-26 13:10 ` Abhradeep Chakraborty via GitGitGadget
  2022-06-26 13:10   ` [PATCH v2 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
                     ` (6 more replies)
  6 siblings, 7 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-26 13:10 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee, Abhradeep Chakraborty

When parsing the .bitmap file, git loads all the bitmaps one by one even if
some of the bitmaps are not necessary. We can remove this overhead by
loading only the necessary bitmaps. A look up table extension can solve this
issue.

Changes since v1:

This is the second version, which addresses all (I think) of the review
comments. Please let me know if any comments have not been addressed :)

 * The table size is decreased and the format has also changed. It now
   contains nr_entries triplets of size 4+8+4 bytes. Each triplet contains
   (1) a 4-byte commit position (in the pack-index or midx), (2) an 8-byte
   offset, and (3) a 4-byte xor triplet position (i.e. the position of the
   triplet whose bitmap the current triplet's bitmap has to be xor'd with).
 * Performance tests are split into two commits. The first contains the
   actual performance tests and the second enables pack.writeReverseIndex
   (as suggested by Taylor).
 * st_*() functions are used.
 * The commit order is changed according to Derrick's suggestion.
 * An iterative approach is used instead of a recursive approach to parse
   xor bitmaps (as suggested by Derrick).
 * Some minor bugs from the previous version are fixed.
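
As a rough sketch of the new triplet layout (not the code in this
series), reading the i'th triplet could look like the C below. The
struct and helper names are hypothetical, and the sketch assumes the
fields are stored in network byte order, as the v1 table was.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TRIPLET_WIDTH (4 + 8 + 4)

struct toy_triplet {
	uint32_t commit_pos; /* position in the pack-index or midx */
	uint64_t offset;     /* offset of the bitmap in the .bitmap file */
	uint32_t xor_pos;    /* position of the triplet to xor with */
};

/* Read a 4-byte big-endian (network byte order) integer. */
static uint32_t toy_read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read an 8-byte big-endian integer. */
static uint64_t toy_read_be64(const unsigned char *p)
{
	return ((uint64_t)toy_read_be32(p) << 32) | toy_read_be32(p + 4);
}

/* Decode the i'th triplet from the start of the table. */
static struct toy_triplet toy_read_triplet(const unsigned char *table,
					   size_t i)
{
	const unsigned char *p;
	struct toy_triplet t;

	assert(table != NULL);
	p = table + i * TOY_TRIPLET_WIDTH;
	t.commit_pos = toy_read_be32(p);
	t.offset = toy_read_be64(p + 4);
	t.xor_pos = toy_read_be32(p + 12);
	return t;
}
```

Because every triplet has the same width, the i'th one can be reached
directly at `i * 16` bytes from the start of the table, which is what
makes a binary search over commit positions possible.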

Initial version:

The proposed table has:

 * a list of nr_entries object ids. These objects are commits that have
   bitmaps. Ids are stored in lexicographic order (for better searching).
 * a list of <offset, xor-offset> pairs (4-byte integers, network-byte
   order). The i'th pair denotes the offset and xor-offset (respectively) of
   the bitmap of the i'th commit in the previous list. These two pieces of
   information are necessary because only in this way can bitmaps be found
   without parsing all the bitmaps.
 * a 4-byte integer for table specific flags (none exists currently).

Whenever git wants to parse the bitmap for a specific commit, it will first
refer to the table and will look for the offset and xor-offset for that
commit. Git will then try to parse the bitmap located at the offset
position. The xor-offset can be used to find the xor-bitmap for the
bitmap (if any).

Abhradeep Chakraborty (6):
  Documentation/technical: describe bitmap lookup table extension
  pack-bitmap-write.c: write lookup table extension
  pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  pack-bitmap: prepare to read lookup table extension
  bitmap-lookup-table: add performance tests for lookup table
  p5310-pack-bitmaps.sh: enable pack.writeReverseIndex for testing

 Documentation/config/pack.txt             |   7 +
 Documentation/technical/bitmap-format.txt |  41 +++++
 builtin/multi-pack-index.c                |   8 +
 builtin/pack-objects.c                    |  10 +-
 midx.c                                    |   3 +
 midx.h                                    |   1 +
 pack-bitmap-write.c                       |  74 ++++++++-
 pack-bitmap.c                             | 193 ++++++++++++++++++++--
 pack-bitmap.h                             |   5 +-
 t/perf/p5310-pack-bitmaps.sh              |  66 ++++----
 t/perf/p5326-multi-pack-bitmaps.sh        |  93 ++++++-----
 t/t5310-pack-bitmaps.sh                   |  10 +-
 t/t5326-multi-pack-bitmaps.sh             |  14 ++
 13 files changed, 439 insertions(+), 86 deletions(-)


base-commit: 39c15e485575089eb77c769f6da02f98a55905e0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1266%2FAbhra303%2Fbitmap-commit-table-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1266/Abhra303/bitmap-commit-table-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1266

Range-diff vs v1:

 1:  2e22ca5069a ! 1:  4d11be66cfa Documentation/technical: describe bitmap lookup table extension
     @@ Commit message
          When reading bitmap file, git loads each and every bitmap one by one
          even if all the bitmaps are not required. A "bitmap lookup table"
          extension to the bitmap format can reduce the overhead of loading
     -    bitmaps which stores a list of bitmapped commit oids, along with their
     -    offset and xor offset. This way git can load only the neccesary bitmaps
     -    without loading the previous bitmaps.
     +    bitmaps which stores a list of bitmapped commit id pos (in the midx
     +    or pack, along with their offset and xor offset. This way git can
     +    load only the neccesary bitmaps without loading the previous bitmaps.
     +
     +    The older version of Git ignores the lookup table extension and doesn't
     +    throw any kind of warning or error while parsing the bitmap file.
      
          Add some information for the new "bitmap lookup table" extension in the
          bitmap-format documentation.
      
     -    Co-Authored-by: Taylor Blau <ttaylorr@github.com>
     -    Mentored-by: Taylor Blau <ttaylorr@github.com>
     +    Co-Authored-by: Taylor Blau <me@ttaylorr.com>
     +    Mentored-by: Taylor Blau <me@ttaylorr.com>
          Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
          Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
      
     @@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cac
       			described below.
       
      +			** {empty}
     -+			BITMAP_OPT_LOOKUP_TABLE (0xf) : :::
     ++			BITMAP_OPT_LOOKUP_TABLE (0x10): :::
      +			If present, the end of the bitmap file contains a table
     -+			containing a list of `N` object ids, a list of pairs of
     -+			offset and xor offset of respective objects, and 4-byte
     -+			integer denoting the flags (currently none). The format
     -+			and meaning of the table is described below.
     ++			containing a list of `N` <commit pos, offset, xor offset>
     ++			triplets. The format and meaning of the table is described
     ++			below.
     +++
     ++NOTE: This xor_offset is different from the bitmap's xor_offset.
     ++Bitmap's xor_offset is relative i.e. it tells how many bitmaps we have
     ++to go back from the current bitmap. Lookup table's xor_offset tells the
     ++position of the triplet in the list whose bitmap the current commit's
     ++bitmap has to be xor'd with.
      +
       		4-byte entry count (network byte order)
       
     @@ Documentation/technical/bitmap-format.txt: Note that this hashing scheme is tied
      +Commit lookup table
      +-------------------
      +
     -+If the BITMAP_OPT_LOOKUP_TABLE flag is set, the end of the `.bitmap`
     -+contains a lookup table specifying the positions of commits which have a
     -+bitmap.
     ++If the BITMAP_OPT_LOOKUP_TABLE flag is set, the last `N * (4 + 8 + 4)`
     ++bytes (preceding the name-hash cache and trailing hash) of the `.bitmap`
     ++file contain a lookup table specifying the information needed to get
     ++the desired bitmap from the entries without parsing previous,
     ++unnecessary bitmaps.
      +
     -+For a `.bitmap` containing `nr_entries` reachability bitmaps, the format
     -+is as follows:
     ++For a `.bitmap` containing `nr_entries` reachability bitmaps, the table
     ++contains a list of `nr_entries` <commit pos, offset, xor offset> triplets.
     ++The content of the i'th triplet is:
      +
     -+	- `nr_entries` object names.
     ++	* {empty}
     ++	commit pos (4 byte integer, network byte order): ::
     ++	It stores the object position of the commit (in the midx or pack index)
     ++	to which the i'th bitmap in the bitmap entries belongs.
      +
     -+	- `nr_entries` pairs of 4-byte integers, each in network order.
     -+	  The first holds the offset from which that commit's bitmap can
     -+	  be read. The second number holds the position of the commit
     -+	  whose bitmap the current bitmap is xor'd with in lexicographic
     -+	  order, or 0xffffffff if the current commit is not xor'd with
     -+	  anything.
     ++	* {empty}
     ++	offset (8 byte integer, network byte order): ::
     ++	The offset from which that commit's bitmap can be read.
      +
     -+	- One 4-byte network byte order integer specifying
     -+	  table-specific flags. None exist currently, so this is always
     -+	  "0".
     ++	* {empty}
     ++	xor offset (4 byte integer, network byte order): ::
     ++	It holds the position of the triplet whose bitmap the current
     ++	bitmap needs to be xor'd with. If the current triplet's bitmap
     ++	does not have any xor bitmap, it defaults to 0xffffffff.
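The triplet layout described above (4-byte commit pos, 8-byte offset, 4-byte xor offset, all in network byte order) can be decoded as in the minimal sketch below. This is an illustration only, not code from the series: `struct triplet`, `read_be32`, `read_be64`, `read_triplet`, `collect_xor_chain` and `NO_XOR` are hypothetical names, and the table is assumed to be already located in memory as `nr` contiguous 16-byte entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRIPLET_SZ (4 + 8 + 4)	/* commit pos + offset + xor offset */
#define NO_XOR 0xffffffffu	/* "no xor bitmap" marker from the format text */

/* Hypothetical in-memory view of one lookup table entry. */
struct triplet {
	uint32_t commit_pos;
	uint64_t offset;
	uint32_t xor_pos;
};

/* Read big-endian (network byte order) integers byte by byte. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/* Decode the i'th triplet of a table holding nr triplets. */
static struct triplet read_triplet(const unsigned char *table,
				   uint32_t nr, uint32_t i)
{
	const unsigned char *p = table + (size_t)i * TRIPLET_SZ;
	struct triplet t;

	assert(i < nr);
	t.commit_pos = read_be32(p);
	t.offset = read_be64(p + 4);
	t.xor_pos = read_be32(p + 12);
	return t;
}

/*
 * Follow xor links starting at triplet `start`, storing each visited
 * position in chain[] (capacity nr). Returns the chain length, or 0 if
 * a link points outside the table or the chain grows past nr entries
 * (which would indicate a cycle).
 */
static uint32_t collect_xor_chain(const unsigned char *table, uint32_t nr,
				  uint32_t start, uint32_t *chain)
{
	uint32_t len = 0, pos = start;

	while (pos != NO_XOR) {
		if (pos >= nr || len == nr)
			return 0;
		chain[len++] = pos;
		pos = read_triplet(table, nr, pos).xor_pos;
	}
	return len;
}
```

A reader would then presumably walk `chain[]` from its last element back to its first, so that each base bitmap is loaded before the bitmaps that are xor'd against it. The bounds and cycle checks are an addition of this sketch and are not claimed to match the series.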
 3:  ed91ebf69a8 ! 2:  d118f1d45e6 pack-bitmap-write.c: write lookup table extension
     @@ Metadata
       ## Commit message ##
          pack-bitmap-write.c: write lookup table extension
      
     -    Teach git to write bitmap lookup table extension. The table has the
     -    following information:
     +    The bitmap lookup table extension was documented by an earlier
     +    change, but Git does not yet know how to write that extension.
      
     -        - `N` no of Object ids of each bitmapped commits
     +    Teach Git to write the bitmap lookup table extension. The table
     +    contains a list of `N` `<commit pos, offset, xor offset>` triplets.
     +    These triplets are sorted by commit pos (ascending order). The
     +    meaning of each field in the i'th triplet is given below:
      
     -        - A list of offset, xor-offset pair; the i'th pair denotes the
     -          offsets and xor-offsets of i'th commit in the previous list.
     +      - Commit pos is the position of the commit in the pack-index
     +        (or midx) to which the i'th bitmap belongs. It is a 4 byte
     +        network byte order integer.
      
     -        - 4-byte integer denoting the flags
     +      - offset is the position of the i'th bitmap.
      
     -    Co-authored-by: Taylor Blau <ttaylorr@github.com>
     -    Mentored-by: Taylor Blau <ttaylorr@github.com>
     +      - xor offset denotes the position of the triplet whose bitmap
     +        the current triplet's bitmap needs to be xor'd with.
     +
     +    Co-authored-by: Taylor Blau <me@ttaylorr.com>
     +    Mentored-by: Taylor Blau <me@ttaylorr.com>
          Co-mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
          Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
      
     @@ pack-bitmap-write.c: static const struct object_id *oid_access(size_t pos, const
       				      struct pack_idx_entry **index,
      -				      uint32_t index_nr)
      +				      uint32_t index_nr,
     -+				      off_t *offsets)
     ++				      uint64_t *offsets,
     ++				      uint32_t *commit_positions)
       {
       	int i;
       
     @@ pack-bitmap-write.c: static void write_selected_commits_v1(struct hashfile *f,
       
      +		if (offsets)
      +			offsets[i] = hashfile_total(f);
     ++		if (commit_positions)
     ++			commit_positions[i] = commit_pos;
      +
       		hashwrite_be32(f, commit_pos);
       		hashwrite_u8(f, stored->xor_offset);
     @@ pack-bitmap-write.c: static void write_selected_commits_v1(struct hashfile *f,
       	}
       }
       
     -+static int table_cmp(const void *_va, const void *_vb)
     ++static int table_cmp(const void *_va, const void *_vb, void *commit_positions)
      +{
     -+	return oidcmp(&writer.selected[*(uint32_t*)_va].commit->object.oid,
     -+		      &writer.selected[*(uint32_t*)_vb].commit->object.oid);
     ++	int8_t result = 0;
     ++	uint32_t *positions = (uint32_t *) commit_positions;
     ++	uint32_t a = positions[*(uint32_t *)_va];
     ++	uint32_t b = positions[*(uint32_t *)_vb];
     ++
     ++	if (a > b)
     ++		result = 1;
     ++	else if (a < b)
     ++		result = -1;
     ++	else
     ++		result = 0;
     ++
     ++	return result;
      +}
      +
      +static void write_lookup_table(struct hashfile *f,
     -+			       off_t *offsets)
     ++			       uint64_t *offsets,
     ++			       uint32_t *commit_positions)
      +{
      +	uint32_t i;
     -+	uint32_t flags = 0;
      +	uint32_t *table, *table_inv;
      +
      +	ALLOC_ARRAY(table, writer.selected_nr);
     @@ pack-bitmap-write.c: static void write_selected_commits_v1(struct hashfile *f,
      +
      +	for (i = 0; i < writer.selected_nr; i++)
      +		table[i] = i;
     -+	QSORT(table, writer.selected_nr, table_cmp);
     ++
     ++	QSORT_S(table, writer.selected_nr, table_cmp, commit_positions);
     ++
      +	for (i = 0; i < writer.selected_nr; i++)
      +		table_inv[table[i]] = i;
      +
      +	for (i = 0; i < writer.selected_nr; i++) {
      +		struct bitmapped_commit *selected = &writer.selected[table[i]];
     -+		struct object_id *oid = &selected->commit->object.oid;
     ++		uint32_t xor_offset = selected->xor_offset;
      +
     -+		hashwrite(f, oid->hash, the_hash_algo->rawsz);
     ++		hashwrite_be32(f, commit_positions[table[i]]);
     ++		hashwrite_be64(f, offsets[table[i]]);
     ++		hashwrite_be32(f, xor_offset ?
     ++				table_inv[table[i] - xor_offset]: 0xffffffff);
      +	}
     -+	for (i = 0; i < writer.selected_nr; i++) {
     -+		struct bitmapped_commit *selected = &writer.selected[table[i]];
     -+
     -+		hashwrite_be32(f, offsets[table[i]]);
     -+		hashwrite_be32(f, selected->xor_offset
     -+			       ? table_inv[table[i] - selected->xor_offset]
     -+			       : 0xffffffff);
     -+	}
     -+
     -+	hashwrite_be32(f, flags);
      +
      +	free(table);
      +	free(table_inv);
     @@ pack-bitmap-write.c: void bitmap_writer_finish(struct pack_idx_entry **index,
       {
       	static uint16_t default_version = 1;
       	static uint16_t flags = BITMAP_OPT_FULL_DAG;
     -+	off_t *offsets = NULL;
     ++	uint64_t *offsets = NULL;
     ++	uint32_t *commit_positions = NULL;
       	struct strbuf tmp_file = STRBUF_INIT;
       	struct hashfile *f;
       
     @@ pack-bitmap-write.c: void bitmap_writer_finish(struct pack_idx_entry **index,
       	dump_bitmap(f, writer.tags);
      -	write_selected_commits_v1(f, index, index_nr);
       
     -+	if (options & BITMAP_OPT_LOOKUP_TABLE)
     ++	if (options & BITMAP_OPT_LOOKUP_TABLE) {
      +		CALLOC_ARRAY(offsets, index_nr);
     ++		CALLOC_ARRAY(commit_positions, index_nr);
     ++	}
      +
     -+	write_selected_commits_v1(f, index, index_nr, offsets);
     ++	write_selected_commits_v1(f, index, index_nr, offsets, commit_positions);
      +
      +	if (options & BITMAP_OPT_LOOKUP_TABLE)
     -+		write_lookup_table(f, offsets);
     ++		write_lookup_table(f, offsets, commit_positions);
       	if (options & BITMAP_OPT_HASH_CACHE)
       		write_hash_cache(f, index, index_nr);
       
     @@ pack-bitmap-write.c: void bitmap_writer_finish(struct pack_idx_entry **index,
       
       	strbuf_release(&tmp_file);
      +	free(offsets);
     ++	free(commit_positions);
       }
     +
     + ## pack-bitmap.h ##
     +@@ pack-bitmap.h: struct bitmap_disk_header {
     + #define NEEDS_BITMAP (1u<<22)
     + 
     + enum pack_bitmap_opts {
     +-	BITMAP_OPT_FULL_DAG = 1,
     +-	BITMAP_OPT_HASH_CACHE = 4,
     ++	BITMAP_OPT_FULL_DAG = 0x1,
     ++	BITMAP_OPT_HASH_CACHE = 0x4,
     ++	BITMAP_OPT_LOOKUP_TABLE = 0x10,
     + };
     + 
     + enum pack_bitmap_flags {
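The writer above has to turn each bitmap's relative `xor_offset` (how many entries back in write order) into an absolute row in the sorted lookup table. The sketch below shows that step in isolation, under stated assumptions: it uses plain `qsort` with a file-scope key pointer instead of `QSORT_S`, and `translate_xor_offsets`, `cmp_by_key`, `sort_keys` and `NO_XOR` are hypothetical names that do not appear in the series.

```c
#include <stdint.h>
#include <stdlib.h>

#define NO_XOR 0xffffffffu

/* Keys consulted by the comparator; set before calling qsort(). */
static const uint32_t *sort_keys;

static int cmp_by_key(const void *va, const void *vb)
{
	uint32_t a = sort_keys[*(const uint32_t *)va];
	uint32_t b = sort_keys[*(const uint32_t *)vb];

	return (a > b) - (a < b);
}

/*
 * commit_pos[w] and xor_offset[w] describe the w'th bitmap in write
 * order. On return, order[i] is the write-order index stored in table
 * row i, and xor_row[i] is the table row of its xor base (or NO_XOR).
 * Returns 0 on success, -1 on allocation failure.
 */
static int translate_xor_offsets(const uint32_t *commit_pos,
				 const uint8_t *xor_offset, uint32_t nr,
				 uint32_t *order, uint32_t *xor_row)
{
	uint32_t i, *inv;

	if (!nr)
		return 0;
	inv = malloc(nr * sizeof(*inv));
	if (!inv)
		return -1;

	/* Sort write-order indices by commit position. */
	for (i = 0; i < nr; i++)
		order[i] = i;
	sort_keys = commit_pos;
	qsort(order, nr, sizeof(*order), cmp_by_key);

	/* inv[] maps a write-order index back to its table row. */
	for (i = 0; i < nr; i++)
		inv[order[i]] = i;

	for (i = 0; i < nr; i++) {
		uint32_t w = order[i];

		xor_row[i] = xor_offset[w] ? inv[w - xor_offset[w]] : NO_XOR;
	}

	free(inv);
	return 0;
}
```

The inverse permutation is what makes the translation cheap: the xor base is found by subtracting in write order and then mapping the result to its sorted row, with no searching.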
 4:  661c1137e1c ! 3:  7786dc879f0 builtin/pack-objects.c: learn pack.writeBitmapLookupTable
     @@
       ## Metadata ##
     -Author: Taylor Blau <ttaylorr@github.com>
     +Author: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
      
       ## Commit message ##
     -    builtin/pack-objects.c: learn pack.writeBitmapLookupTable
     +    pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
      
          Teach git to provide a way for users to enable/disable bitmap lookup
          table extension by providing a config option named 'writeBitmapLookupTable'.
     +    Default is true.
      
     -    Signed-off-by: Taylor Blau <ttaylorr@github.com>
     +    Also add a test to verify writing of the lookup table.
     +
     +    Co-Authored-by: Taylor Blau <me@ttaylorr.com>
          Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
     +    Mentored-by: Taylor Blau <me@ttaylorr.com>
     +    Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
      
       ## Documentation/config/pack.txt ##
      @@ Documentation/config/pack.txt: When writing a multi-pack reachability bitmap, no new namehashes are
     @@ Documentation/config/pack.txt: When writing a multi-pack reachability bitmap, no
      +	bitmap index (if one is written). This table is used to defer
      +	loading individual bitmaps as late as possible. This can be
      +	beneficial in repositories which have relatively large bitmap
     -+	indexes. Defaults to false.
     ++	indexes. Defaults to true.
      +
       pack.writeReverseIndex::
       	When true, git will write a corresponding .rev file (see:
       	link:../technical/pack-format.html[Documentation/technical/pack-format.txt])
      
     + ## builtin/multi-pack-index.c ##
     +@@ builtin/multi-pack-index.c: static int git_multi_pack_index_write_config(const char *var, const char *value,
     + 			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
     + 	}
     + 
     ++	if (!strcmp(var, "pack.writebitmaplookuptable")) {
     ++		if (git_config_bool(var, value))
     ++			opts.flags |= MIDX_WRITE_BITMAP_LOOKUP_TABLE;
     ++		else
     ++			opts.flags &= ~MIDX_WRITE_BITMAP_LOOKUP_TABLE;
     ++	}
     ++
     + 	/*
     + 	 * We should never make a fall-back call to 'git_default_config', since
     + 	 * this was already called in 'cmd_multi_pack_index()'.
     +@@ builtin/multi-pack-index.c: static int cmd_multi_pack_index_write(int argc, const char **argv)
     + 	};
     + 
     + 	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
     ++	opts.flags |= MIDX_WRITE_BITMAP_LOOKUP_TABLE;
     + 
     + 	git_config(git_multi_pack_index_write_config, NULL);
     + 
     +
       ## builtin/pack-objects.c ##
     +@@ builtin/pack-objects.c: static enum {
     + 	WRITE_BITMAP_QUIET,
     + 	WRITE_BITMAP_TRUE,
     + } write_bitmap_index;
     +-static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE;
     ++static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE | BITMAP_OPT_LOOKUP_TABLE;
     + 
     + static int exclude_promisor_objects;
     + 
      @@ builtin/pack-objects.c: static int git_pack_config(const char *k, const char *v, void *cb)
       		else
       			write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE;
     @@ builtin/pack-objects.c: static int git_pack_config(const char *k, const char *v,
       	if (!strcmp(k, "pack.usebitmaps")) {
       		use_bitmap_index_default = git_config_bool(k, v);
       		return 0;
     +
     + ## midx.c ##
     +@@ midx.c: static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
     + 	if (flags & MIDX_WRITE_BITMAP_HASH_CACHE)
     + 		options |= BITMAP_OPT_HASH_CACHE;
     + 
     ++	if (flags & MIDX_WRITE_BITMAP_LOOKUP_TABLE)
     ++		options |= BITMAP_OPT_LOOKUP_TABLE;
     ++
     + 	prepare_midx_packing_data(&pdata, ctx);
     + 
     + 	commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx);
     +
     + ## midx.h ##
     +@@ midx.h: struct multi_pack_index {
     + #define MIDX_WRITE_REV_INDEX (1 << 1)
     + #define MIDX_WRITE_BITMAP (1 << 2)
     + #define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3)
     ++#define MIDX_WRITE_BITMAP_LOOKUP_TABLE (1 << 4)
     + 
     + const unsigned char *get_midx_checksum(struct multi_pack_index *m);
     + void get_midx_filename(struct strbuf *out, const char *object_dir);
     +
     + ## pack-bitmap-write.c ##
     +@@ pack-bitmap-write.c: static void write_lookup_table(struct hashfile *f,
     + 	for (i = 0; i < writer.selected_nr; i++)
     + 		table_inv[table[i]] = i;
     + 
     ++	trace2_region_enter("pack-bitmap-write", "writing_lookup_table", the_repository);
     + 	for (i = 0; i < writer.selected_nr; i++) {
     + 		struct bitmapped_commit *selected = &writer.selected[table[i]];
     + 		uint32_t xor_offset = selected->xor_offset;
     +@@ pack-bitmap-write.c: static void write_lookup_table(struct hashfile *f,
     + 
     + 	free(table);
     + 	free(table_inv);
     ++	trace2_region_leave("pack-bitmap-write", "writing_lookup_table", the_repository);
     + }
     + 
     + static void write_hash_cache(struct hashfile *f,
     +
     + ## t/t5310-pack-bitmaps.sh ##
     +@@ t/t5310-pack-bitmaps.sh: test_expect_success 'full repack creates bitmaps' '
     + 	ls .git/objects/pack/ | grep bitmap >output &&
     + 	test_line_count = 1 output &&
     + 	grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace &&
     +-	grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace
     ++	grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace &&
     ++	grep "\"label\":\"writing_lookup_table\"" trace
     + '
     + 
     + basic_bitmap_tests
     +
     + ## t/t5326-multi-pack-bitmaps.sh ##
     +@@ t/t5326-multi-pack-bitmaps.sh: test_expect_success 'graceful fallback when missing reverse index' '
     + 	)
     + '
     + 
     ++test_expect_success 'multi-pack-index write writes lookup table if enabled' '
     ++	rm -fr repo &&
     ++	git init repo &&
     ++	test_when_finished "rm -fr repo" &&
     ++	(
     ++		cd repo &&
     ++		test_commit base &&
     ++		git repack -ad &&
     ++		GIT_TRACE2_EVENT="$(pwd)/trace" \
     ++			git multi-pack-index write --bitmap &&
     ++		grep "\"label\":\"writing_lookup_table\"" trace
     ++	)
     ++'
     + test_done
 2:  d139a4c48aa ! 4:  4fbfcff8a20 pack-bitmap: prepare to read lookup table extension
     @@ Metadata
       ## Commit message ##
          pack-bitmap: prepare to read lookup table extension
      
     -    Bitmap lookup table extension can let git to parse only the necessary
     -    bitmaps without loading the previous bitmaps one by one.
     +    An earlier change taught Git to write the bitmap lookup table, but
     +    Git does not yet know how to parse it.
      
     -    Teach git to read and use the bitmap lookup table extension.
     +    Teach Git to parse the existing bitmap lookup table. Older
     +    versions of Git are not affected by it; they ignore the
     +    lookup table.
      
     -    Co-Authored-by: Taylor Blau <ttaylorr@github.com>
     -    Mentored-by: Taylor Blau <ttaylorr@github.com>
     -    Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
          Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
     +    Mentored-by: Taylor Blau <me@ttaylorr.com>
     +    Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
      
       ## pack-bitmap.c ##
     -@@
     - #include "list-objects-filter-options.h"
     - #include "midx.h"
     - #include "config.h"
     -+#include "hash-lookup.h"
     - 
     - /*
     -  * An entry on the bitmap index, representing the bitmap for a given
      @@ pack-bitmap.c: struct bitmap_index {
       	/* The checksum of the packfile or MIDX; points into map. */
       	const unsigned char *checksum;
       
      +	/*
     -+	 * If not NULL, these point into the various commit table sections
     ++	 * If not NULL, this points into the commit table extension
      +	 * (within map).
      +	 */
      +	unsigned char *table_lookup;
     -+	unsigned char *table_offsets;
      +
       	/*
       	 * Extended index.
     @@ pack-bitmap.c: static int load_bitmap_header(struct bitmap_index *index)
       		}
      +
      +		if (flags & BITMAP_OPT_LOOKUP_TABLE &&
     -+		    git_env_bool("GIT_READ_COMMIT_TABLE", 1)) {
     -+			uint32_t entry_count = ntohl(header->entry_count);
     -+			uint32_t table_size =
     -+				(entry_count * the_hash_algo->rawsz) /* oids */ +
     -+				(entry_count * sizeof(uint32_t)) /* offsets */ +
     -+				(entry_count * sizeof(uint32_t)) /* xor offsets */ +
     -+				(sizeof(uint32_t)) /* flags */;
     ++			git_env_bool("GIT_TEST_READ_COMMIT_TABLE", 1)) {
     ++			size_t table_size = 0;
     ++			size_t triplet_sz = st_add3(sizeof(uint32_t),    /* commit position */
     ++							sizeof(uint64_t),    /* offset */
     ++							sizeof(uint32_t));    /* xor offset */
      +
     ++			table_size = st_add(table_size,
     ++					st_mult(ntohl(header->entry_count),
     ++						triplet_sz));
      +			if (table_size > index_end - index->map - header_size)
     -+				return error("corrupted bitmap index file (too short to fit commit table)");
     -+
     ++				return error("corrupted bitmap index file (too short to fit lookup table)");
      +			index->table_lookup = (void *)(index_end - table_size);
     -+			index->table_offsets = index->table_lookup + the_hash_algo->rawsz * entry_count;
     -+
      +			index_end -= table_size;
      +		}
       	}
       
       	index->entry_count = ntohl(header->entry_count);
     +@@ pack-bitmap.c: static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
     + 
     + 	hash_pos = kh_put_oid_map(index->bitmaps, stored->oid, &ret);
     + 
     +-	/* a 0 return code means the insertion succeeded with no changes,
     +-	 * because the SHA1 already existed on the map. this is bad, there
     +-	 * shouldn't be duplicated commits in the index */
     ++	/* A 0 return code means the insertion succeeded with no changes,
     ++	 * because the SHA1 already existed on the map. If lookup table
     ++	 * is NULL, this is bad, there shouldn't be duplicated commits
     ++	 * in the index.
     ++	 *
     ++	 * If table_lookup exists, that means the desired bitmap is already
     ++	 * loaded. Either this bitmap has been stored directly or another
     ++	 * bitmap has a direct or indirect xor relation with it. */
     + 	if (ret == 0) {
     +-		error("Duplicate entry in bitmap index: %s", oid_to_hex(oid));
     +-		return NULL;
     ++		if (!index->table_lookup) {
     ++			error("Duplicate entry in bitmap index: %s", oid_to_hex(oid));
     ++			return NULL;
     ++		}
     ++		return kh_value(index->bitmaps, hash_pos);
     + 	}
     + 
     + 	kh_value(index->bitmaps, hash_pos) = stored;
      @@ pack-bitmap.c: static int load_bitmap(struct bitmap_index *bitmap_git)
       		!(bitmap_git->tags = read_bitmap_1(bitmap_git)))
       		goto failed;
     @@ pack-bitmap.c: struct include_data {
       	struct bitmap *seen;
       };
       
     --struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
     --				      struct commit *commit)
     -+static struct stored_bitmap *stored_bitmap_for_commit(struct bitmap_index *bitmap_git,
     -+						      struct commit *commit,
     -+						      uint32_t *pos_hint);
     -+
     -+static inline const unsigned char *bitmap_oid_pos(struct bitmap_index *bitmap_git,
     -+						  uint32_t pos)
     ++static inline const void *bitmap_get_triplet(struct bitmap_index *bitmap_git, uint32_t xor_pos)
      +{
     -+	return bitmap_git->table_lookup + (pos * the_hash_algo->rawsz);
     ++	size_t triplet_sz = st_add3(sizeof(uint32_t), sizeof(uint64_t), sizeof(uint32_t));
     ++	const void *p = bitmap_git->table_lookup + st_mult(xor_pos, triplet_sz);
     ++	return p;
      +}
      +
     -+static inline const void *bitmap_offset_pos(struct bitmap_index *bitmap_git,
     -+					    uint32_t pos)
     ++static uint64_t triplet_get_offset(const void *triplet)
      +{
     -+	return bitmap_git->table_offsets + (pos * 2 * sizeof(uint32_t));
     ++	const void *p = (unsigned char*) triplet + sizeof(uint32_t);
     ++	return get_be64(p);
      +}
      +
     -+static inline const void *xor_position_pos(struct bitmap_index *bitmap_git,
     -+					   uint32_t pos)
     ++static uint32_t triplet_get_xor_pos(const void *triplet)
      +{
     -+	return (unsigned char*) bitmap_offset_pos(bitmap_git, pos) + sizeof(uint32_t);
     ++	const void *p = (unsigned char*) triplet + st_add(sizeof(uint32_t), sizeof(uint64_t));
     ++	return get_be32(p);
      +}
      +
     -+static int bitmap_lookup_cmp(const void *_va, const void *_vb)
     ++static int triplet_cmp(const void *va, const void *vb)
      +{
     -+	return hashcmp(_va, _vb);
     ++	int result = 0;
     ++	uint32_t *a = (uint32_t *) va;
     ++	uint32_t b = get_be32(vb);
     ++	if (*a > b)
     ++		result = 1;
     ++	else if (*a < b)
     ++		result = -1;
     ++	else
     ++		result = 0;
     ++
     ++	return result;
      +}
      +
     -+static int bitmap_table_lookup(struct bitmap_index *bitmap_git,
     -+			       struct object_id *oid,
     -+			       uint32_t *commit_pos)
     ++static uint32_t bsearch_pos(struct bitmap_index *bitmap_git, struct object_id *oid,
     ++						uint32_t *result)
      +{
     -+	unsigned char *found = bsearch(oid->hash, bitmap_git->table_lookup,
     -+				       bitmap_git->entry_count,
     -+				       the_hash_algo->rawsz, bitmap_lookup_cmp);
     -+	if (found)
     -+		*commit_pos = (found - bitmap_git->table_lookup) / the_hash_algo->rawsz;
     -+	return !!found;
     ++	int found;
     ++
     ++	if (bitmap_git->midx)
     ++		found = bsearch_midx(oid, bitmap_git->midx, result);
     ++	else
     ++		found = bsearch_pack(oid, bitmap_git->pack, result);
     ++
     ++	return found;
      +}
      +
      +static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_git,
     -+						    struct object_id *oid,
     -+						    uint32_t commit_pos)
     ++					  struct commit *commit)
      +{
     -+	uint32_t xor_pos;
     -+	off_t bitmap_ofs;
     -+
     ++	uint32_t commit_pos, xor_pos;
     ++	uint64_t offset;
      +	int flags;
     ++	const void *triplet = NULL;
     ++	struct object_id *oid = &commit->object.oid;
      +	struct ewah_bitmap *bitmap;
     -+	struct stored_bitmap *xor_bitmap;
     ++	struct stored_bitmap *xor_bitmap = NULL;
     ++	size_t triplet_sz = st_add3(sizeof(uint32_t), sizeof(uint64_t), sizeof(uint32_t));
      +
     -+	bitmap_ofs = get_be32(bitmap_offset_pos(bitmap_git, commit_pos));
     -+	xor_pos = get_be32(xor_position_pos(bitmap_git, commit_pos));
     ++	int found = bsearch_pos(bitmap_git, oid, &commit_pos);
      +
     -+	/*
     -+	 * Lazily load the xor'd bitmap if required (and we haven't done so
     -+	 * already). Make sure to pass the xor'd bitmap's position along as a
     -+	 * hint to avoid an unnecessary binary search in
     -+	 * stored_bitmap_for_commit().
     -+	 */
     -+	if (xor_pos == 0xffffffff) {
     -+		xor_bitmap = NULL;
     -+	} else {
     -+		struct commit *xor_commit;
     ++	if (!found)
     ++		return NULL;
     ++
     ++	triplet = bsearch(&commit_pos, bitmap_git->table_lookup, bitmap_git->entry_count,
     ++						triplet_sz, triplet_cmp);
     ++	if (!triplet)
     ++		return NULL;
     ++
     ++	offset = triplet_get_offset(triplet);
     ++	xor_pos = triplet_get_xor_pos(triplet);
     ++
     ++	if (xor_pos != 0xffffffff) {
     ++		int xor_flags;
     ++		uint64_t offset_xor;
     ++		uint32_t *xor_positions;
      +		struct object_id xor_oid;
     ++		size_t size = 0;
      +
     -+		oidread(&xor_oid, bitmap_oid_pos(bitmap_git, xor_pos));
     ++		ALLOC_ARRAY(xor_positions, bitmap_git->entry_count);
     ++		while (xor_pos != 0xffffffff) {
     ++			xor_positions[size++] = xor_pos;
     ++			triplet = bitmap_get_triplet(bitmap_git, xor_pos);
     ++			xor_pos = triplet_get_xor_pos(triplet);
     ++		}
      +
     -+		xor_commit = lookup_commit(the_repository, &xor_oid);
     -+		if (!xor_commit)
     -+			return NULL;
     ++		while (size){
     ++			xor_pos = xor_positions[size - 1];
     ++			triplet = bitmap_get_triplet(bitmap_git, xor_pos);
     ++			commit_pos = get_be32(triplet);
     ++			offset_xor = triplet_get_offset(triplet);
     ++
     ++			if (nth_bitmap_object_oid(bitmap_git, &xor_oid, commit_pos) < 0) {
     ++				free(xor_positions);
     ++				return NULL;
     ++			}
     ++
     ++			bitmap_git->map_pos = offset_xor + sizeof(uint32_t) + sizeof(uint8_t);
     ++			xor_flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
     ++			bitmap = read_bitmap_1(bitmap_git);
     ++
     ++			if (!bitmap){
     ++				free(xor_positions);
     ++				return NULL;
     ++			}
     ++
     ++			xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_oid, xor_bitmap, xor_flags);
     ++			size--;
     ++		}
      +
     -+		xor_bitmap = stored_bitmap_for_commit(bitmap_git, xor_commit,
     -+						      &xor_pos);
     ++		free(xor_positions);
      +	}
      +
     -+	/*
     -+	 * Don't bother reading the commit's index position or its xor
     -+	 * offset:
     -+	 *
     -+	 *   - The commit's index position is irrelevant to us, since
     -+	 *     load_bitmap_entries_v1 only uses it to learn the object
     -+	 *     id which is used to compute the hashmap's key. We already
     -+	 *     have an object id, so no need to look it up again.
     -+	 *
     -+	 *   - The xor_offset is unusable for us, since it specifies how
     -+	 *     many entries previous to ours we should look at. This
     -+	 *     makes sense when reading the bitmaps sequentially (as in
     -+	 *     load_bitmap_entries_v1()), since we can keep track of
     -+	 *     each bitmap as we read them.
     -+	 *
     -+	 *     But it can't work for us, since the bitmap's don't have a
     -+	 *     fixed size. So we learn the position of the xor'd bitmap
     -+	 *     from the commit table (and resolve it to a bitmap in the
     -+	 *     above if-statement).
     -+	 *
     -+	 * Instead, we can skip ahead and immediately read the flags and
     -+	 * ewah bitmap.
     -+	 */
     -+	bitmap_git->map_pos = bitmap_ofs + sizeof(uint32_t) + sizeof(uint8_t);
     ++	bitmap_git->map_pos = offset + sizeof(uint32_t) + sizeof(uint8_t);
      +	flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
      +	bitmap = read_bitmap_1(bitmap_git);
     ++
      +	if (!bitmap)
      +		return NULL;
      +
      +	return store_bitmap(bitmap_git, bitmap, oid, xor_bitmap, flags);
      +}
      +
     -+static struct stored_bitmap *stored_bitmap_for_commit(struct bitmap_index *bitmap_git,
     -+						      struct commit *commit,
     -+						      uint32_t *pos_hint)
     + struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
     + 				      struct commit *commit)
       {
       	khiter_t hash_pos = kh_get_oid_map(bitmap_git->bitmaps,
       					   commit->object.oid);
      -	if (hash_pos >= kh_end(bitmap_git->bitmaps))
     +-		return NULL;
      +	if (hash_pos >= kh_end(bitmap_git->bitmaps)) {
     -+		uint32_t commit_pos;
     ++		struct stored_bitmap *bitmap = NULL;
      +		if (!bitmap_git->table_lookup)
      +			return NULL;
      +
     -+		/* NEEDSWORK: cache misses aren't recorded. */
     -+		if (pos_hint)
     -+			commit_pos = *pos_hint;
     -+		else if (!bitmap_table_lookup(bitmap_git,
     -+					      &commit->object.oid,
     -+					      &commit_pos))
     ++		/* NEEDSWORK: cache misses aren't recorded */
     ++		bitmap = lazy_bitmap_for_commit(bitmap_git, commit);
     ++		if(!bitmap)
      +			return NULL;
     -+		return lazy_bitmap_for_commit(bitmap_git, &commit->object.oid,
     -+					      commit_pos);
     ++		return lookup_stored_bitmap(bitmap);
      +	}
     -+	return kh_value(bitmap_git->bitmaps, hash_pos);
     -+}
     -+
     -+struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
     -+				      struct commit *commit)
     -+{
     -+	struct stored_bitmap *sb = stored_bitmap_for_commit(bitmap_git, commit,
     -+							    NULL);
     -+	if (!sb)
     - 		return NULL;
     --	return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos));
     -+	return lookup_stored_bitmap(sb);
     + 	return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos));
       }
       
     - static inline int bitmap_position_extended(struct bitmap_index *bitmap_git,
      @@ pack-bitmap.c: void test_bitmap_walk(struct rev_info *revs)
       	if (revs->pending.nr != 1)
       		die("you must specify exactly one commit to test");
       
      -	fprintf(stderr, "Bitmap v%d test (%d entries loaded)\n",
     --		bitmap_git->version, bitmap_git->entry_count);
     ++	fprintf(stderr, "Bitmap v%d test (%d entries)\n",
     + 		bitmap_git->version, bitmap_git->entry_count);
     + 
      +	if (!bitmap_git->table_lookup)
      +		fprintf(stderr, "Bitmap v%d test (%d entries loaded)\n",
      +			bitmap_git->version, bitmap_git->entry_count);
     - 
     ++
       	root = revs->pending.objects[0].item;
       	bm = bitmap_for_commit(bitmap_git, (struct commit *)root);
     + 
     +@@ pack-bitmap.c: void test_bitmap_walk(struct rev_info *revs)
     + 
     + int test_bitmap_commits(struct repository *r)
     + {
     +-	struct bitmap_index *bitmap_git = prepare_bitmap_git(r);
     ++	struct bitmap_index *bitmap_git = NULL;
     + 	struct object_id oid;
     + 	MAYBE_UNUSED void *value;
     + 
     ++	/* As this function is only used to print bitmap selected
     ++	 * commits, we don't have to read the commit table.
     ++	 */
     ++	setenv("GIT_TEST_READ_COMMIT_TABLE", "0", 1);
     ++
     ++	bitmap_git = prepare_bitmap_git(r);
     + 	if (!bitmap_git)
     + 		die("failed to load bitmap indexes");
     + 
     +@@ pack-bitmap.c: int test_bitmap_commits(struct repository *r)
     + 		printf("%s\n", oid_to_hex(&oid));
     + 	});
     + 
     ++	setenv("GIT_TEST_READ_COMMIT_TABLE", "1", 1);
     + 	free_bitmap_index(bitmap_git);
     + 
     + 	return 0;
      
     - ## pack-bitmap.h ##
     -@@ pack-bitmap.h: struct bitmap_disk_header {
     - enum pack_bitmap_opts {
     - 	BITMAP_OPT_FULL_DAG = 1,
     - 	BITMAP_OPT_HASH_CACHE = 4,
     -+	BITMAP_OPT_LOOKUP_TABLE = 16,
     - };
     + ## t/t5310-pack-bitmaps.sh ##
     +@@ t/t5310-pack-bitmaps.sh: test_expect_success 'full repack creates bitmaps' '
     + 	grep "\"label\":\"writing_lookup_table\"" trace
     + '
     + 
     ++test_expect_success 'using lookup table loads only necessary bitmaps' '
     ++	git rev-list --test-bitmap HEAD 2>out &&
     ++	! grep "Bitmap v1 test (106 entries loaded)" out &&
     ++	grep "Found bitmap for" out
     ++'
     ++
     + basic_bitmap_tests
       
     - enum pack_bitmap_flags {
     + test_expect_success 'incremental repack fails when bitmaps are requested' '
     +@@ t/t5310-pack-bitmaps.sh: test_expect_success 'pack reuse respects --incremental' '
     + 
     + test_expect_success 'truncated bitmap fails gracefully (ewah)' '
     + 	test_config pack.writebitmaphashcache false &&
     ++	test_config pack.writebitmaplookuptable false &&
     + 	git repack -ad &&
     + 	git rev-list --use-bitmap-index --count --all >expect &&
     + 	bitmap=$(ls .git/objects/pack/*.bitmap) &&
     +
     + ## t/t5326-multi-pack-bitmaps.sh ##
     +@@ t/t5326-multi-pack-bitmaps.sh: test_expect_success 'multi-pack-index write writes lookup table if enabled' '
     + 		grep "\"label\":\"writing_lookup_table\"" trace
     + 	)
     + '
     ++
     + test_done
 5:  a404779a30f < -:  ----------- bitmap-commit-table: add tests for the bitmap lookup table
 6:  f5f725a3fe2 ! 5:  96c0041688f bitmap-lookup-table: add performance tests
     @@ Metadata
      Author: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
      
       ## Commit message ##
     -    bitmap-lookup-table: add performance tests
     +    bitmap-lookup-table: add performance tests for lookup table
      
     -    Add performance tests for bitmap lookup table extension.
     +    Add performance tests to verify the performance of lookup table.
     +
     +    The lookup table makes Git run faster in most cases. Below is the
     +    result of `t/perf/p5310-pack-bitmaps.sh`; `t/perf/p5326-multi-pack-bitmaps.sh`
     +    gives similar results. The repository used in the test is the Linux kernel.
     +
     +    Test                                                      this tree
     +    --------------------------------------------------------------------------
     +    5310.4: repack to disk (lookup=false)                   295.94(250.45+15.24)
     +    5310.5: simulated clone                                 12.52(5.07+1.40)
     +    5310.6: simulated fetch                                 1.89(2.94+0.24)
     +    5310.7: pack to file (bitmap)                           41.39(20.33+7.20)
     +    5310.8: rev-list (commits)                              0.98(0.59+0.12)
     +    5310.9: rev-list (objects)                              3.40(3.27+0.10)
     +    5310.10: rev-list with tag negated via --not            0.07(0.02+0.04)
     +             --all (objects)
     +    5310.11: rev-list with negative tag (objects)           0.23(0.16+0.06)
     +    5310.12: rev-list count with blob:none                  0.26(0.18+0.07)
     +    5310.13: rev-list count with blob:limit=1k              6.45(5.94+0.37)
     +    5310.14: rev-list count with tree:0                     0.26(0.18+0.07)
     +    5310.15: simulated partial clone                        4.99(3.19+0.45)
     +    5310.19: repack to disk (lookup=true)                   269.67(174.70+21.33)
     +    5310.20: simulated clone                                11.03(5.07+1.11)
     +    5310.21: simulated fetch                                0.79(0.79+0.17)
     +    5310.22: pack to file (bitmap)                          43.03(20.28+7.43)
     +    5310.23: rev-list (commits)                             0.86(0.54+0.09)
     +    5310.24: rev-list (objects)                             3.35(3.26+0.07)
     +    5310.25: rev-list with tag negated via --not            0.05(0.00+0.03)
     +             --all (objects)
     +    5310.26: rev-list with negative tag (objects)           0.22(0.16+0.05)
     +    5310.27: rev-list count with blob:none                  0.22(0.16+0.05)
     +    5310.28: rev-list count with blob:limit=1k              6.45(5.87+0.31)
     +    5310.29: rev-list count with tree:0                     0.22(0.16+0.05)
     +    5310.30: simulated partial clone                        5.17(3.12+0.48)
     +
     +    Tests 4-15 are run without using the lookup table. The same tests
     +    are repeated in 16-30 (using the lookup table).
      
     -    Mentored-by: Taylor Blau <ttaylorr@github.com>
     -    Co-mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
          Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
     +    Mentored-by: Taylor Blau <me@ttaylorr.com>
     +    Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
      
       ## t/perf/p5310-pack-bitmaps.sh ##
     -@@ t/perf/p5310-pack-bitmaps.sh: test_perf_large_repo
     - # since we want to be able to compare bitmap-aware
     - # git versus non-bitmap git
     - #
     --# We intentionally use the deprecated pack.writebitmaps
     -+# We intentionally use the deprecated pack.writeBitmaps
     - # config so that we can test against older versions of git.
     - test_expect_success 'setup bitmap config' '
     --	git config pack.writebitmaps true
     -+	git config pack.writeBitmaps true &&
     -+	git config pack.writeReverseIndex true
     +@@ t/perf/p5310-pack-bitmaps.sh: test_expect_success 'setup bitmap config' '
     + 	git config pack.writebitmaps true
       '
       
     - # we need to create the tag up front such that it is covered by the repack and
     -@@ t/perf/p5310-pack-bitmaps.sh: test_perf 'repack to disk' '
     - 
     - test_full_bitmap
     - 
     +-# we need to create the tag up front such that it is covered by the repack and
     +-# thus by generated bitmaps.
     +-test_expect_success 'create tags' '
     +-	git tag --message="tag pointing to HEAD" perf-tag HEAD
     +-'
     +-
     +-test_perf 'repack to disk' '
     +-	git repack -ad
     +-'
     +-
     +-test_full_bitmap
     +-
      -test_expect_success 'create partial bitmap state' '
      -	# pick a commit to represent the repo tip in the past
      -	cutoff=$(git rev-list HEAD~100 -1) &&
     @@ t/perf/p5310-pack-bitmaps.sh: test_perf 'repack to disk' '
      -	# and now restore our original tip, as if the pushes
      -	# had happened
      -	git update-ref HEAD $orig_tip
     -+test_perf 'use lookup table' '
     -+    git config pack.writeBitmapLookupTable true
     - '
     - 
     +-'
     +-
      -test_partial_bitmap
     -+test_perf 'repack to disk (lookup table)' '
     -+    git repack -adb
     -+'
     ++test_bitmap () {
     ++    local enabled="$1"
      +
     -+test_full_bitmap
     ++	# we need to create the tag up front such that it is covered by the repack and
     ++	# thus by generated bitmaps.
     ++	test_expect_success 'create tags' '
     ++		git tag --message="tag pointing to HEAD" perf-tag HEAD
     ++	'
      +
     -+for i in false true
     -+do
     -+	$i && lookup=" (lookup table)"
     -+	test_expect_success "create partial bitmap state$lookup" '
     -+		git config pack.writeBitmapLookupTable '"$i"' &&
     ++	test_expect_success "use lookup table: $enabled" '
     ++		git config pack.writeBitmapLookupTable '"$enabled"'
     ++	'
     ++
     ++	test_perf "repack to disk (lookup=$enabled)" '
     ++		git repack -ad
     ++	'
     ++
     ++	test_full_bitmap
     ++
     ++    test_expect_success "create partial bitmap state (lookup=$enabled)" '
      +		# pick a commit to represent the repo tip in the past
      +		cutoff=$(git rev-list HEAD~100 -1) &&
      +		orig_tip=$(git rev-parse HEAD) &&
     @@ t/perf/p5310-pack-bitmaps.sh: test_perf 'repack to disk' '
      +		# and now restore our original tip, as if the pushes
      +		# had happened
      +		git update-ref HEAD $orig_tip
     -+	'
     ++    '
     ++}
      +
     -+	test_partial_bitmap
     -+done
     ++test_bitmap false
     ++test_bitmap true
       
       test_done
      
       ## t/perf/p5326-multi-pack-bitmaps.sh ##
     -@@ t/perf/p5326-multi-pack-bitmaps.sh: test_expect_success 'drop pack bitmap' '
     +@@ t/perf/p5326-multi-pack-bitmaps.sh: test_description='Tests performance using midx bitmaps'
       
     - test_full_bitmap
     + test_perf_large_repo
       
     +-# we need to create the tag up front such that it is covered by the repack and
     +-# thus by generated bitmaps.
     +-test_expect_success 'create tags' '
     +-	git tag --message="tag pointing to HEAD" perf-tag HEAD
     +-'
     +-
     +-test_expect_success 'start with bitmapped pack' '
     +-	git repack -adb
     +-'
     +-
     +-test_perf 'setup multi-pack index' '
     +-	git multi-pack-index write --bitmap
     +-'
     +-
     +-test_expect_success 'drop pack bitmap' '
     +-	rm -f .git/objects/pack/pack-*.bitmap
     +-'
     +-
     +-test_full_bitmap
     +-
      -test_expect_success 'create partial bitmap state' '
      -	# pick a commit to represent the repo tip in the past
      -	cutoff=$(git rev-list HEAD~100 -1) &&
     @@ t/perf/p5326-multi-pack-bitmaps.sh: test_expect_success 'drop pack bitmap' '
      -	# and now restore our original tip, as if the pushes
      -	# had happened
      -	git update-ref HEAD $orig_tip
     -+test_expect_success 'use lookup table' '
     -+	git config pack.writeBitmapLookupTable true
     - '
     - 
     +-'
     +-
      -test_partial_bitmap
     -+test_perf 'setup multi-pack-index (lookup table)' '
     -+	git multi-pack-index write --bitmap
     -+'
     ++test_bitmap () {
     ++    local enabled="$1"
     ++
     ++	# we need to create the tag up front such that it is covered by the repack and
     ++	# thus by generated bitmaps.
     ++	test_expect_success 'create tags' '
     ++		git tag --message="tag pointing to HEAD" perf-tag HEAD
     ++	'
      +
     -+test_full_bitmap
     ++	test_expect_success "use lookup table: $enabled" '
     ++		git config pack.writeBitmapLookupTable '"$enabled"'
     ++	'
     ++
     ++	test_expect_success "start with bitmapped pack (lookup=$enabled)" '
     ++		git repack -adb
     ++	'
     ++
     ++	test_perf "setup multi-pack index (lookup=$enabled)" '
     ++		git multi-pack-index write --bitmap
     ++	'
      +
     -+for i in false true
     -+do
     -+	$i && lookup=" (lookup table)"
     -+	test_expect_success "create partial bitmap state$lookup" '
     -+		git config pack.writeBitmapLookupTable '"$i"' &&
     ++	test_expect_success "drop pack bitmap (lookup=$enabled)" '
     ++		rm -f .git/objects/pack/pack-*.bitmap
     ++	'
     ++
     ++	test_full_bitmap
     ++
     ++    test_expect_success "create partial bitmap state (lookup=$enabled)" '
      +		# pick a commit to represent the repo tip in the past
      +		cutoff=$(git rev-list HEAD~100 -1) &&
      +		orig_tip=$(git rev-parse HEAD) &&
     @@ t/perf/p5326-multi-pack-bitmaps.sh: test_expect_success 'drop pack bitmap' '
      +		# and now restore our original tip, as if the pushes
      +		# had happened
      +		git update-ref HEAD $orig_tip
     -+	'
     ++    '
     ++}
      +
     -+	test_partial_bitmap
     -+done
     ++test_bitmap false
     ++test_bitmap true
       
       test_done
 -:  ----------- > 6:  fe556b58814 p5310-pack-bitmaps.sh: enable pack.writeReverseIndex for testing

-- 
gitgitgadget


* [PATCH v2 1/6] Documentation/technical: describe bitmap lookup table extension
  2022-06-26 13:10 ` [PATCH v2 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
@ 2022-06-26 13:10   ` Abhradeep Chakraborty via GitGitGadget
  2022-06-27 14:18     ` Derrick Stolee
  2022-06-26 13:10   ` [PATCH v2 2/6] pack-bitmap-write.c: write " Abhradeep Chakraborty via GitGitGadget
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-26 13:10 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee,
	Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

When reading a bitmap file, Git loads every bitmap one by one, even if
only some of them are required. A "bitmap lookup table" extension to
the bitmap format can reduce this overhead. The table stores, for each
bitmapped commit, its position (in the midx or pack index) along with
the offset and xor offset of its bitmap. This way Git can load only the
necessary bitmaps without parsing the bitmaps that precede them.

Older versions of Git ignore the lookup table extension and do not
emit any warning or error while parsing the bitmap file.

Add some information for the new "bitmap lookup table" extension in the
bitmap-format documentation.

Co-Authored-by: Taylor Blau <me@ttaylorr.com>
Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
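Not part of the commit: to make the layout easier to review, here is a
rough sketch of how a reader could decode the i'th triplet documented
in this patch. It assumes the 4 + 8 + 4 byte layout and network byte
order stated in the documentation below. The names `lookup_triplet`,
`read_triplet()`, `read_be32()`, `read_be64()`, `TRIPLET_SIZE` and
`NO_XOR` are made up for illustration and are not Git's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; none of this is Git's code. */
#define TRIPLET_SIZE 16          /* 4 (commit pos) + 8 (offset) + 4 (xor offset) */
#define NO_XOR 0xffffffffu       /* "no xor bitmap" marker from the documentation */

struct lookup_triplet {
	uint32_t commit_pos; /* object position in the pack index or midx */
	uint64_t offset;     /* where the commit's bitmap can be read */
	uint32_t xor_pos;    /* position of the xor base triplet, or NO_XOR */
};

/* Read a 4-byte big-endian (network byte order) integer. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read an 8-byte big-endian integer as two 4-byte halves. */
static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/* Decode the i'th triplet of a table that starts at `table`. */
static struct lookup_triplet read_triplet(const unsigned char *table, size_t i)
{
	const unsigned char *p;
	struct lookup_triplet t;

	assert(table != NULL);
	p = table + i * TRIPLET_SIZE;
	t.commit_pos = read_be32(p);
	t.offset = read_be64(p + 4);
	t.xor_pos = read_be32(p + 12);
	return t;
}
```

A caller would compare `xor_pos` against `NO_XOR` before following it
to another triplet.
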
 Documentation/technical/bitmap-format.txt | 41 +++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index 04b3ec21785..7d4e450d3d8 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -67,6 +67,19 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
 			pack/MIDX. The format and meaning of the name-hash is
 			described below.
 
+			** {empty}
+			BITMAP_OPT_LOOKUP_TABLE (0x10): :::
+			If present, the end of the bitmap file contains a table
+			containing a list of `N` <commit pos, offset, xor offset>
+			triplets. The format and meaning of the table is described
+			below.
++
+NOTE: This xor_offset is different from the bitmap's xor_offset.
+The bitmap's xor_offset is relative, i.e. it tells how many bitmaps we
+have to go back from the current bitmap. The lookup table's xor_offset
+gives the position in the list of the triplet whose bitmap the current
+commit's bitmap has to be xor'ed with.
+
 		4-byte entry count (network byte order)
 
 			The total count of entries (bitmapped commits) in this bitmap index.
@@ -205,3 +218,31 @@ Note that this hashing scheme is tied to the BITMAP_OPT_HASH_CACHE flag.
 If implementations want to choose a different hashing scheme, they are
 free to do so, but MUST allocate a new header flag (because comparing
 hashes made under two different schemes would be pointless).
+
+Commit lookup table
+-------------------
+
+If the BITMAP_OPT_LOOKUP_TABLE flag is set, the last `N * (4 + 8 + 4)`
+bytes (preceding the name-hash cache and trailing hash) of the `.bitmap`
+file contain a lookup table specifying the information needed to get
+the desired bitmap from the entries without parsing previous unnecessary
+bitmaps.
+
+For a `.bitmap` containing `nr_entries` reachability bitmaps, the table
+contains a list of `nr_entries` <commit pos, offset, xor offset> triplets.
+The content of the i'th triplet is -
+
+	* {empty}
+	commit pos (4 byte integer, network byte order): ::
+	It stores the object position of the commit (in the midx or pack index)
+	to which the i'th bitmap in the bitmap entries belongs.
+
+	* {empty}
+	offset (8 byte integer, network byte order): ::
+	The offset from which that commit's bitmap can be read.
+
+	* {empty}
+	xor offset (4 byte integer, network byte order): ::
+	It holds the position of the triplet whose bitmap the current
+	bitmap needs to be xor'ed with. If the current triplet's bitmap
+	does not have any xor bitmap, it defaults to 0xffffffff.
-- 
gitgitgadget



* [PATCH v2 2/6] pack-bitmap-write.c: write lookup table extension
  2022-06-26 13:10 ` [PATCH v2 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
  2022-06-26 13:10   ` [PATCH v2 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
@ 2022-06-26 13:10   ` Abhradeep Chakraborty via GitGitGadget
  2022-06-27 14:35     ` Derrick Stolee
  2022-06-27 16:05     ` Taylor Blau
  2022-06-26 13:10   ` [PATCH v2 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests Abhradeep Chakraborty via GitGitGadget
                     ` (4 subsequent siblings)
  6 siblings, 2 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-26 13:10 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee,
	Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

The bitmap lookup table extension was documented by an earlier
change, but Git does not yet know how to write that extension.

Teach Git to write the bitmap lookup table extension. The table
contains a list of `N` `<commit pos, offset, xor offset>` triplets.
These triplets are sorted by their commit pos (in ascending order).
The meaning of each field in the i'th triplet is given below:

  - Commit pos is the position of the commit in the pack-index
    (or midx) to which the i'th bitmap belongs. It is a 4 byte
    network byte order integer.

  - offset is the position in the bitmap file at which the i'th
    bitmap starts. It is an 8 byte network byte order integer.

  - xor offset denotes the position of the triplet whose bitmap
    the current triplet's bitmap needs to be xor'ed with.

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
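Not part of the commit: the step that is easy to misread here is the
translation of each bitmap's relative xor offset (counted in write
order) into a row number of the sorted table. Below is a small
standalone sketch of that idea. It is not the patch's code:
`build_xor_rows()`, `struct row`, `row_cmp()` and `NO_XOR` are
hypothetical names, and it sorts (commit pos, write index) pairs with
plain qsort() instead of sorting an index array with QSORT_S() as the
patch does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NO_XOR 0xffffffffu /* hypothetical name for the "no xor" marker */

/* One table row before the xor row is known (illustrative only). */
struct row {
	uint32_t commit_pos; /* sort key */
	uint32_t write_idx;  /* order in which the bitmap was written */
};

static int row_cmp(const void *a, const void *b)
{
	const struct row *ra = a, *rb = b;

	return (ra->commit_pos > rb->commit_pos) -
	       (ra->commit_pos < rb->commit_pos);
}

/*
 * commit_pos[] and xor_offset[] are indexed by write order; xor_offset[w]
 * says how many bitmaps to go back from bitmap w (0 means none).
 * On return, xor_row[i] holds the table row of the xor base for table
 * row i, or NO_XOR. Returns 0 on success, -1 on allocation failure.
 */
static int build_xor_rows(const uint32_t *commit_pos,
			  const uint8_t *xor_offset,
			  uint32_t nr, uint32_t *xor_row)
{
	struct row *rows;
	uint32_t *row_of; /* row_of[w]: table row of the w'th written bitmap */
	uint32_t i;

	if (!nr)
		return 0;
	rows = malloc(nr * sizeof(*rows));
	row_of = malloc(nr * sizeof(*row_of));
	if (!rows || !row_of) {
		free(rows);
		free(row_of);
		return -1;
	}

	for (i = 0; i < nr; i++) {
		rows[i].commit_pos = commit_pos[i];
		rows[i].write_idx = i;
	}
	qsort(rows, nr, sizeof(*rows), row_cmp);

	/* Invert the permutation so write order can be mapped to rows. */
	for (i = 0; i < nr; i++)
		row_of[rows[i].write_idx] = i;

	for (i = 0; i < nr; i++) {
		uint32_t w = rows[i].write_idx;

		/* A relative offset cannot reach before the first bitmap. */
		assert(xor_offset[w] <= w);
		xor_row[i] = xor_offset[w] ? row_of[w - xor_offset[w]] : NO_XOR;
	}

	free(rows);
	free(row_of);
	return 0;
}
```

Here `row_of` plays the role of `table_inv` in the patch: it maps a
write-order index back to the row that bitmap occupies after sorting.
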
 pack-bitmap-write.c | 72 +++++++++++++++++++++++++++++++++++++++++++--
 pack-bitmap.h       |  5 ++--
 2 files changed, 73 insertions(+), 4 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index c43375bd344..899a4a941e1 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -650,7 +650,9 @@ static const struct object_id *oid_access(size_t pos, const void *table)
 
 static void write_selected_commits_v1(struct hashfile *f,
 				      struct pack_idx_entry **index,
-				      uint32_t index_nr)
+				      uint32_t index_nr,
+				      uint64_t *offsets,
+				      uint32_t *commit_positions)
 {
 	int i;
 
@@ -663,6 +665,11 @@ static void write_selected_commits_v1(struct hashfile *f,
 		if (commit_pos < 0)
 			BUG("trying to write commit not in index");
 
+		if (offsets)
+			offsets[i] = hashfile_total(f);
+		if (commit_positions)
+			commit_positions[i] = commit_pos;
+
 		hashwrite_be32(f, commit_pos);
 		hashwrite_u8(f, stored->xor_offset);
 		hashwrite_u8(f, stored->flags);
@@ -671,6 +678,55 @@ static void write_selected_commits_v1(struct hashfile *f,
 	}
 }
 
+static int table_cmp(const void *_va, const void *_vb, void *commit_positions)
+{
+	int8_t result = 0;
+	uint32_t *positions = (uint32_t *) commit_positions;
+	uint32_t a = positions[*(uint32_t *)_va];
+	uint32_t b = positions[*(uint32_t *)_vb];
+
+	if (a > b)
+		result = 1;
+	else if (a < b)
+		result = -1;
+	else
+		result = 0;
+
+	return result;
+}
+
+static void write_lookup_table(struct hashfile *f,
+			       uint64_t *offsets,
+			       uint32_t *commit_positions)
+{
+	uint32_t i;
+	uint32_t *table, *table_inv;
+
+	ALLOC_ARRAY(table, writer.selected_nr);
+	ALLOC_ARRAY(table_inv, writer.selected_nr);
+
+	for (i = 0; i < writer.selected_nr; i++)
+		table[i] = i;
+
+	QSORT_S(table, writer.selected_nr, table_cmp, commit_positions);
+
+	for (i = 0; i < writer.selected_nr; i++)
+		table_inv[table[i]] = i;
+
+	for (i = 0; i < writer.selected_nr; i++) {
+		struct bitmapped_commit *selected = &writer.selected[table[i]];
+		uint32_t xor_offset = selected->xor_offset;
+
+		hashwrite_be32(f, commit_positions[table[i]]);
+		hashwrite_be64(f, offsets[table[i]]);
+		hashwrite_be32(f, xor_offset ?
+				table_inv[table[i] - xor_offset]: 0xffffffff);
+	}
+
+	free(table);
+	free(table_inv);
+}
+
 static void write_hash_cache(struct hashfile *f,
 			     struct pack_idx_entry **index,
 			     uint32_t index_nr)
@@ -695,6 +751,8 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 {
 	static uint16_t default_version = 1;
 	static uint16_t flags = BITMAP_OPT_FULL_DAG;
+	uint64_t *offsets = NULL;
+	uint32_t *commit_positions = NULL;
 	struct strbuf tmp_file = STRBUF_INIT;
 	struct hashfile *f;
 
@@ -715,8 +773,16 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 	dump_bitmap(f, writer.trees);
 	dump_bitmap(f, writer.blobs);
 	dump_bitmap(f, writer.tags);
-	write_selected_commits_v1(f, index, index_nr);
 
+	if (options & BITMAP_OPT_LOOKUP_TABLE) {
+		CALLOC_ARRAY(offsets, index_nr);
+		CALLOC_ARRAY(commit_positions, index_nr);
+	}
+
+	write_selected_commits_v1(f, index, index_nr, offsets, commit_positions);
+
+	if (options & BITMAP_OPT_LOOKUP_TABLE)
+		write_lookup_table(f, offsets, commit_positions);
 	if (options & BITMAP_OPT_HASH_CACHE)
 		write_hash_cache(f, index, index_nr);
 
@@ -730,4 +796,6 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 		die_errno("unable to rename temporary bitmap file to '%s'", filename);
 
 	strbuf_release(&tmp_file);
+	free(offsets);
+	free(commit_positions);
 }
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 3d3ddd77345..67a9d0fc303 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -24,8 +24,9 @@ struct bitmap_disk_header {
 #define NEEDS_BITMAP (1u<<22)
 
 enum pack_bitmap_opts {
-	BITMAP_OPT_FULL_DAG = 1,
-	BITMAP_OPT_HASH_CACHE = 4,
+	BITMAP_OPT_FULL_DAG = 0x1,
+	BITMAP_OPT_HASH_CACHE = 0x4,
+	BITMAP_OPT_LOOKUP_TABLE = 0x10,
 };
 
 enum pack_bitmap_flags {
-- 
gitgitgadget



* [PATCH v2 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-06-26 13:10 ` [PATCH v2 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
  2022-06-26 13:10   ` [PATCH v2 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
  2022-06-26 13:10   ` [PATCH v2 2/6] pack-bitmap-write.c: write " Abhradeep Chakraborty via GitGitGadget
@ 2022-06-26 13:10   ` Abhradeep Chakraborty via GitGitGadget
  2022-06-27 14:43     ` Derrick Stolee
  2022-06-27 17:47     ` Taylor Blau
  2022-06-26 13:10   ` [PATCH v2 4/6] pack-bitmap: prepare to read lookup table extension Abhradeep Chakraborty via GitGitGadget
                     ` (3 subsequent siblings)
  6 siblings, 2 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-26 13:10 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee,
	Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

Teach Git to let users enable or disable the bitmap lookup table
extension through a config option named 'pack.writeBitmapLookupTable'.
The default is true.

Also add tests to verify that the lookup table is written.

Co-Authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
---
 Documentation/config/pack.txt |  7 +++++++
 builtin/multi-pack-index.c    |  8 ++++++++
 builtin/pack-objects.c        | 10 +++++++++-
 midx.c                        |  3 +++
 midx.h                        |  1 +
 pack-bitmap-write.c           |  2 ++
 t/t5310-pack-bitmaps.sh       |  3 ++-
 t/t5326-multi-pack-bitmaps.sh | 13 +++++++++++++
 8 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index ad7f73a1ead..6e1f454c4d6 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -164,6 +164,13 @@ When writing a multi-pack reachability bitmap, no new namehashes are
 computed; instead, any namehashes stored in an existing bitmap are
 permuted into their appropriate location when writing a new bitmap.
 
+pack.writeBitmapLookupTable::
+	When true, git will include a "lookup table" section in the
+	bitmap index (if one is written). This table is used to defer
+	loading individual bitmaps as late as possible. This can be
+	beneficial in repositories which have relatively large bitmap
+	indexes. Defaults to true.
+
 pack.writeReverseIndex::
 	When true, git will write a corresponding .rev file (see:
 	link:../technical/pack-format.html[Documentation/technical/pack-format.txt])
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 5edbb7fe86e..3757616f09c 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -87,6 +87,13 @@ static int git_multi_pack_index_write_config(const char *var, const char *value,
 			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
 	}
 
+	if (!strcmp(var, "pack.writebitmaplookuptable")) {
+		if (git_config_bool(var, value))
+			opts.flags |= MIDX_WRITE_BITMAP_LOOKUP_TABLE;
+		else
+			opts.flags &= ~MIDX_WRITE_BITMAP_LOOKUP_TABLE;
+	}
+
 	/*
 	 * We should never make a fall-back call to 'git_default_config', since
 	 * this was already called in 'cmd_multi_pack_index()'.
@@ -123,6 +130,7 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
 	};
 
 	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
+	opts.flags |= MIDX_WRITE_BITMAP_LOOKUP_TABLE;
 
 	git_config(git_multi_pack_index_write_config, NULL);
 
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 39e28cfcafc..d6a33fd486c 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -228,7 +228,7 @@ static enum {
 	WRITE_BITMAP_QUIET,
 	WRITE_BITMAP_TRUE,
 } write_bitmap_index;
-static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE;
+static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE | BITMAP_OPT_LOOKUP_TABLE;
 
 static int exclude_promisor_objects;
 
@@ -3148,6 +3148,14 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 		else
 			write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE;
 	}
+
+	if (!strcmp(k, "pack.writebitmaplookuptable")) {
+		if (git_config_bool(k, v))
+			write_bitmap_options |= BITMAP_OPT_LOOKUP_TABLE;
+		else
+			write_bitmap_options &= ~BITMAP_OPT_LOOKUP_TABLE;
+	}
+
 	if (!strcmp(k, "pack.usebitmaps")) {
 		use_bitmap_index_default = git_config_bool(k, v);
 		return 0;
diff --git a/midx.c b/midx.c
index 5f0dd386b02..9c26d04bfde 100644
--- a/midx.c
+++ b/midx.c
@@ -1072,6 +1072,9 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
 	if (flags & MIDX_WRITE_BITMAP_HASH_CACHE)
 		options |= BITMAP_OPT_HASH_CACHE;
 
+	if (flags & MIDX_WRITE_BITMAP_LOOKUP_TABLE)
+		options |= BITMAP_OPT_LOOKUP_TABLE;
+
 	prepare_midx_packing_data(&pdata, ctx);
 
 	commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx);
diff --git a/midx.h b/midx.h
index 22e8e53288e..5578cd7b835 100644
--- a/midx.h
+++ b/midx.h
@@ -47,6 +47,7 @@ struct multi_pack_index {
 #define MIDX_WRITE_REV_INDEX (1 << 1)
 #define MIDX_WRITE_BITMAP (1 << 2)
 #define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3)
+#define MIDX_WRITE_BITMAP_LOOKUP_TABLE (1 << 4)
 
 const unsigned char *get_midx_checksum(struct multi_pack_index *m);
 void get_midx_filename(struct strbuf *out, const char *object_dir);
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 899a4a941e1..79be0cf80e6 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -713,6 +713,7 @@ static void write_lookup_table(struct hashfile *f,
 	for (i = 0; i < writer.selected_nr; i++)
 		table_inv[table[i]] = i;
 
+	trace2_region_enter("pack-bitmap-write", "writing_lookup_table", the_repository);
 	for (i = 0; i < writer.selected_nr; i++) {
 		struct bitmapped_commit *selected = &writer.selected[table[i]];
 		uint32_t xor_offset = selected->xor_offset;
@@ -725,6 +726,7 @@ static void write_lookup_table(struct hashfile *f,
 
 	free(table);
 	free(table_inv);
+	trace2_region_leave("pack-bitmap-write", "writing_lookup_table", the_repository);
 }
 
 static void write_hash_cache(struct hashfile *f,
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index f775fc1ce69..c669ed959e9 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -38,7 +38,8 @@ test_expect_success 'full repack creates bitmaps' '
 	ls .git/objects/pack/ | grep bitmap >output &&
 	test_line_count = 1 output &&
 	grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace &&
-	grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace
+	grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace &&
+	grep "\"label\":\"writing_lookup_table\"" trace
 '
 
 basic_bitmap_tests
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 4fe57414c13..43be49617b8 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -307,4 +307,17 @@ test_expect_success 'graceful fallback when missing reverse index' '
 	)
 '
 
+test_expect_success 'multi-pack-index write writes lookup table if enabled' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+		test_commit base &&
+		git repack -ad &&
+		GIT_TRACE2_EVENT="$(pwd)/trace" \
+			git multi-pack-index write --bitmap &&
+		grep "\"label\":\"writing_lookup_table\"" trace
+	)
+'
 test_done
-- 
gitgitgadget



* [PATCH v2 4/6] pack-bitmap: prepare to read lookup table extension
  2022-06-26 13:10 ` [PATCH v2 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
                     ` (2 preceding siblings ...)
  2022-06-26 13:10   ` [PATCH v2 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests Abhradeep Chakraborty via GitGitGadget
@ 2022-06-26 13:10   ` Abhradeep Chakraborty via GitGitGadget
  2022-06-27 15:12     ` Derrick Stolee
  2022-06-27 21:38     ` Taylor Blau
  2022-06-26 13:10   ` [PATCH v2 5/6] bitmap-lookup-table: add performance tests for lookup table Abhradeep Chakraborty via GitGitGadget
                     ` (2 subsequent siblings)
  6 siblings, 2 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-26 13:10 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee,
	Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

An earlier change taught Git to write the bitmap lookup table, but
Git does not yet know how to parse it.

Teach Git to parse the existing bitmap lookup table. Older versions
of Git are not affected by it; those versions ignore the lookup table.

Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
---
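Not part of the commit: the header parsing in this patch has to decide
whether a table of `entry_count` triplets can fit before it points
into the mapped file. The standalone sketch below shows one way to
phrase that check, under the assumption that `index_end` marks the end
of the region not yet claimed by other trailing sections.
`locate_table()` and `TRIPLET_SIZE` are hypothetical names, and the
overflow guard is written by hand here rather than with Git's st_add()
and st_mult() helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRIPLET_SIZE 16 /* 4 + 8 + 4 bytes per triplet (illustrative name) */

/*
 * Return a pointer to the first triplet, or NULL when a table of
 * nr_entries triplets cannot fit between the header and index_end.
 */
static const unsigned char *locate_table(const unsigned char *map,
					 const unsigned char *index_end,
					 size_t header_size,
					 uint32_t nr_entries)
{
	size_t avail, table_size;

	assert(map != NULL && index_end >= map);

	avail = (size_t)(index_end - map);
	if (avail < header_size)
		return NULL; /* file is shorter than its own header */
	avail -= header_size;

	if (nr_entries > SIZE_MAX / TRIPLET_SIZE)
		return NULL; /* the multiplication below would overflow */
	table_size = (size_t)nr_entries * TRIPLET_SIZE;

	if (table_size > avail)
		return NULL; /* too short to fit the lookup table */

	return index_end - table_size;
}
```

Checking that the header fits before subtracting keeps the unsigned
arithmetic from wrapping around on a truncated file.
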
 pack-bitmap.c                 | 193 ++++++++++++++++++++++++++++++++--
 t/t5310-pack-bitmaps.sh       |   7 ++
 t/t5326-multi-pack-bitmaps.sh |   1 +
 3 files changed, 191 insertions(+), 10 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 36134222d7a..9e09c5824fc 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -82,6 +82,12 @@ struct bitmap_index {
 	/* The checksum of the packfile or MIDX; points into map. */
 	const unsigned char *checksum;
 
+	/*
+	 * If not NULL, this points into the commit table extension
+	 * (within map).
+	 */
+	unsigned char *table_lookup;
+
 	/*
 	 * Extended index.
 	 *
@@ -185,6 +191,22 @@ static int load_bitmap_header(struct bitmap_index *index)
 			index->hashes = (void *)(index_end - cache_size);
 			index_end -= cache_size;
 		}
+
+		if (flags & BITMAP_OPT_LOOKUP_TABLE &&
+			git_env_bool("GIT_TEST_READ_COMMIT_TABLE", 1)) {
+			size_t table_size = 0;
+			size_t triplet_sz = st_add3(sizeof(uint32_t),    /* commit position */
+							sizeof(uint64_t),    /* offset */
+							sizeof(uint32_t));    /* xor offset */
+
+			table_size = st_add(table_size,
+					st_mult(ntohl(header->entry_count),
+						triplet_sz));
+			if (table_size > index_end - index->map - header_size)
+				return error("corrupted bitmap index file (too short to fit lookup table)");
+			index->table_lookup = (void *)(index_end - table_size);
+			index_end -= table_size;
+		}
 	}
 
 	index->entry_count = ntohl(header->entry_count);
@@ -211,12 +233,20 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
 
 	hash_pos = kh_put_oid_map(index->bitmaps, stored->oid, &ret);
 
-	/* a 0 return code means the insertion succeeded with no changes,
-	 * because the SHA1 already existed on the map. this is bad, there
-	 * shouldn't be duplicated commits in the index */
+	/* A 0 return code means the insertion succeeded with no changes,
+	 * because the SHA1 already existed on the map. If lookup table
+	 * is NULL, this is bad, there shouldn't be duplicated commits
+	 * in the index.
+	 *
+	 * If table_lookup exists, that means the desired bitmap is already
+	 * loaded. Either this bitmap has been stored directly or another
+	 * bitmap has a direct or indirect xor relation with it. */
 	if (ret == 0) {
-		error("Duplicate entry in bitmap index: %s", oid_to_hex(oid));
-		return NULL;
+		if (!index->table_lookup) {
+			error("Duplicate entry in bitmap index: %s", oid_to_hex(oid));
+			return NULL;
+		}
+		return kh_value(index->bitmaps, hash_pos);
 	}
 
 	kh_value(index->bitmaps, hash_pos) = stored;
@@ -470,7 +500,7 @@ static int load_bitmap(struct bitmap_index *bitmap_git)
 		!(bitmap_git->tags = read_bitmap_1(bitmap_git)))
 		goto failed;
 
-	if (load_bitmap_entries_v1(bitmap_git) < 0)
+	if (!bitmap_git->table_lookup && load_bitmap_entries_v1(bitmap_git) < 0)
 		goto failed;
 
 	return 0;
@@ -557,13 +587,145 @@ struct include_data {
 	struct bitmap *seen;
 };
 
+static inline const void *bitmap_get_triplet(struct bitmap_index *bitmap_git, uint32_t xor_pos)
+{
+	size_t triplet_sz = st_add3(sizeof(uint32_t), sizeof(uint64_t), sizeof(uint32_t));
+	const void *p = bitmap_git->table_lookup + st_mult(xor_pos, triplet_sz);
+	return p;
+}
+
+static uint64_t triplet_get_offset(const void *triplet)
+{
+	const void *p = (unsigned char*) triplet + sizeof(uint32_t);
+	return get_be64(p);
+}
+
+static uint32_t triplet_get_xor_pos(const void *triplet)
+{
+	const void *p = (unsigned char*) triplet + st_add(sizeof(uint32_t), sizeof(uint64_t));
+	return get_be32(p);
+}
+
+static int triplet_cmp(const void *va, const void *vb)
+{
+	int result = 0;
+	uint32_t *a = (uint32_t *) va;
+	uint32_t b = get_be32(vb);
+	if (*a > b)
+		result = 1;
+	else if (*a < b)
+		result = -1;
+	else
+		result = 0;
+
+	return result;
+}
+
+static uint32_t bsearch_pos(struct bitmap_index *bitmap_git, struct object_id *oid,
+						uint32_t *result)
+{
+	int found;
+
+	if (bitmap_git->midx)
+		found = bsearch_midx(oid, bitmap_git->midx, result);
+	else
+		found = bsearch_pack(oid, bitmap_git->pack, result);
+
+	return found;
+}
+
+static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_git,
+					  struct commit *commit)
+{
+	uint32_t commit_pos, xor_pos;
+	uint64_t offset;
+	int flags;
+	const void *triplet = NULL;
+	struct object_id *oid = &commit->object.oid;
+	struct ewah_bitmap *bitmap;
+	struct stored_bitmap *xor_bitmap = NULL;
+	size_t triplet_sz = st_add3(sizeof(uint32_t), sizeof(uint64_t), sizeof(uint32_t));
+
+	int found = bsearch_pos(bitmap_git, oid, &commit_pos);
+
+	if (!found)
+		return NULL;
+
+	triplet = bsearch(&commit_pos, bitmap_git->table_lookup, bitmap_git->entry_count,
+						triplet_sz, triplet_cmp);
+	if (!triplet)
+		return NULL;
+
+	offset = triplet_get_offset(triplet);
+	xor_pos = triplet_get_xor_pos(triplet);
+
+	if (xor_pos != 0xffffffff) {
+		int xor_flags;
+		uint64_t offset_xor;
+		uint32_t *xor_positions;
+		struct object_id xor_oid;
+		size_t size = 0;
+
+		ALLOC_ARRAY(xor_positions, bitmap_git->entry_count);
+		while (xor_pos != 0xffffffff) {
+			xor_positions[size++] = xor_pos;
+			triplet = bitmap_get_triplet(bitmap_git, xor_pos);
+			xor_pos = triplet_get_xor_pos(triplet);
+		}
+
+		while (size) {
+			xor_pos = xor_positions[size - 1];
+			triplet = bitmap_get_triplet(bitmap_git, xor_pos);
+			commit_pos = get_be32(triplet);
+			offset_xor = triplet_get_offset(triplet);
+
+			if (nth_bitmap_object_oid(bitmap_git, &xor_oid, commit_pos) < 0) {
+				free(xor_positions);
+				return NULL;
+			}
+
+			bitmap_git->map_pos = offset_xor + sizeof(uint32_t) + sizeof(uint8_t);
+			xor_flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
+			bitmap = read_bitmap_1(bitmap_git);
+
+			if (!bitmap) {
+				free(xor_positions);
+				return NULL;
+			}
+
+			xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_oid, xor_bitmap, xor_flags);
+			size--;
+		}
+
+		free(xor_positions);
+	}
+
+	bitmap_git->map_pos = offset + sizeof(uint32_t) + sizeof(uint8_t);
+	flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
+	bitmap = read_bitmap_1(bitmap_git);
+
+	if (!bitmap)
+		return NULL;
+
+	return store_bitmap(bitmap_git, bitmap, oid, xor_bitmap, flags);
+}
+
 struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
 				      struct commit *commit)
 {
 	khiter_t hash_pos = kh_get_oid_map(bitmap_git->bitmaps,
 					   commit->object.oid);
-	if (hash_pos >= kh_end(bitmap_git->bitmaps))
-		return NULL;
+	if (hash_pos >= kh_end(bitmap_git->bitmaps)) {
+		struct stored_bitmap *bitmap = NULL;
+		if (!bitmap_git->table_lookup)
+			return NULL;
+
+		/* NEEDSWORK: cache misses aren't recorded */
+		bitmap = lazy_bitmap_for_commit(bitmap_git, commit);
+		if (!bitmap)
+			return NULL;
+		return lookup_stored_bitmap(bitmap);
+	}
 	return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos));
 }
 
@@ -1699,9 +1861,13 @@ void test_bitmap_walk(struct rev_info *revs)
 	if (revs->pending.nr != 1)
 		die("you must specify exactly one commit to test");
 
-	fprintf(stderr, "Bitmap v%d test (%d entries loaded)\n",
+	fprintf(stderr, "Bitmap v%d test (%d entries)\n",
 		bitmap_git->version, bitmap_git->entry_count);
 
+	if (!bitmap_git->table_lookup)
+		fprintf(stderr, "Bitmap v%d test (%d entries loaded)\n",
+			bitmap_git->version, bitmap_git->entry_count);
+
 	root = revs->pending.objects[0].item;
 	bm = bitmap_for_commit(bitmap_git, (struct commit *)root);
 
@@ -1753,10 +1919,16 @@ void test_bitmap_walk(struct rev_info *revs)
 
 int test_bitmap_commits(struct repository *r)
 {
-	struct bitmap_index *bitmap_git = prepare_bitmap_git(r);
+	struct bitmap_index *bitmap_git = NULL;
 	struct object_id oid;
 	MAYBE_UNUSED void *value;
 
+	/* As this function is only used to print bitmap selected
+	 * commits, we don't have to read the commit table.
+	 */
+	setenv("GIT_TEST_READ_COMMIT_TABLE", "0", 1);
+
+	bitmap_git = prepare_bitmap_git(r);
 	if (!bitmap_git)
 		die("failed to load bitmap indexes");
 
@@ -1764,6 +1936,7 @@ int test_bitmap_commits(struct repository *r)
 		printf("%s\n", oid_to_hex(&oid));
 	});
 
+	setenv("GIT_TEST_READ_COMMIT_TABLE", "1", 1);
 	free_bitmap_index(bitmap_git);
 
 	return 0;
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index c669ed959e9..10d7691d973 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -42,6 +42,12 @@ test_expect_success 'full repack creates bitmaps' '
 	grep "\"label\":\"writing_lookup_table\"" trace
 '
 
+test_expect_success 'using lookup table loads only necessary bitmaps' '
+	git rev-list --test-bitmap HEAD 2>out &&
+	! grep "Bitmap v1 test (106 entries loaded)" out &&
+	grep "Found bitmap for" out
+'
+
 basic_bitmap_tests
 
 test_expect_success 'incremental repack fails when bitmaps are requested' '
@@ -255,6 +261,7 @@ test_expect_success 'pack reuse respects --incremental' '
 
 test_expect_success 'truncated bitmap fails gracefully (ewah)' '
 	test_config pack.writebitmaphashcache false &&
+	test_config pack.writebitmaplookuptable false &&
 	git repack -ad &&
 	git rev-list --use-bitmap-index --count --all >expect &&
 	bitmap=$(ls .git/objects/pack/*.bitmap) &&
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 43be49617b8..7d36dbcf722 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -320,4 +320,5 @@ test_expect_success 'multi-pack-index write writes lookup table if enabled' '
 		grep "\"label\":\"writing_lookup_table\"" trace
 	)
 '
+
 test_done
-- 
gitgitgadget


* [PATCH v2 5/6] bitmap-lookup-table: add performance tests for lookup table
  2022-06-26 13:10 ` [PATCH v2 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
                     ` (3 preceding siblings ...)
  2022-06-26 13:10   ` [PATCH v2 4/6] pack-bitmap: prepare to read lookup table extension Abhradeep Chakraborty via GitGitGadget
@ 2022-06-26 13:10   ` Abhradeep Chakraborty via GitGitGadget
  2022-06-27 21:53     ` Taylor Blau
  2022-06-26 13:10   ` [PATCH v2 6/6] p5310-pack-bitmaps.sh: enable pack.writeReverseIndex for testing Abhradeep Chakraborty via GitGitGadget
  2022-07-04  8:46   ` [PATCH v3 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
  6 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-26 13:10 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee,
	Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

Add performance tests to measure the effect of the lookup table.

The lookup table makes Git run faster in most cases. Below are the
results of `t/perf/p5310-pack-bitmaps.sh`; `t/perf/p5326-multi-pack-bitmaps.sh`
gives similar results. The repository used in the test is the Linux kernel.

Test                                                      this tree
--------------------------------------------------------------------------
5310.4: repack to disk (lookup=false)                   295.94(250.45+15.24)
5310.5: simulated clone                                 12.52(5.07+1.40)
5310.6: simulated fetch                                 1.89(2.94+0.24)
5310.7: pack to file (bitmap)                           41.39(20.33+7.20)
5310.8: rev-list (commits)                              0.98(0.59+0.12)
5310.9: rev-list (objects)                              3.40(3.27+0.10)
5310.10: rev-list with tag negated via --not            0.07(0.02+0.04)
         --all (objects)
5310.11: rev-list with negative tag (objects)           0.23(0.16+0.06)
5310.12: rev-list count with blob:none                  0.26(0.18+0.07)
5310.13: rev-list count with blob:limit=1k              6.45(5.94+0.37)
5310.14: rev-list count with tree:0                     0.26(0.18+0.07)
5310.15: simulated partial clone                        4.99(3.19+0.45)
5310.19: repack to disk (lookup=true)                   269.67(174.70+21.33)
5310.20: simulated clone                                11.03(5.07+1.11)
5310.21: simulated fetch                                0.79(0.79+0.17)
5310.22: pack to file (bitmap)                          43.03(20.28+7.43)
5310.23: rev-list (commits)                             0.86(0.54+0.09)
5310.24: rev-list (objects)                             3.35(3.26+0.07)
5310.25: rev-list with tag negated via --not            0.05(0.00+0.03)
         --all (objects)
5310.26: rev-list with negative tag (objects)           0.22(0.16+0.05)
5310.27: rev-list count with blob:none                  0.22(0.16+0.05)
5310.28: rev-list count with blob:limit=1k              6.45(5.87+0.31)
5310.29: rev-list count with tree:0                     0.22(0.16+0.05)
5310.30: simulated partial clone                        5.17(3.12+0.48)

Tests 4-15 run without the lookup table. The same tests are
repeated as 19-30 with the lookup table enabled.

Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
---
 t/perf/p5310-pack-bitmaps.sh       | 77 ++++++++++++++-----------
 t/perf/p5326-multi-pack-bitmaps.sh | 93 ++++++++++++++++--------------
 2 files changed, 94 insertions(+), 76 deletions(-)

diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index 7ad4f237bc3..6ff42bdd391 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -16,39 +16,48 @@ test_expect_success 'setup bitmap config' '
 	git config pack.writebitmaps true
 '
 
-# we need to create the tag up front such that it is covered by the repack and
-# thus by generated bitmaps.
-test_expect_success 'create tags' '
-	git tag --message="tag pointing to HEAD" perf-tag HEAD
-'
-
-test_perf 'repack to disk' '
-	git repack -ad
-'
-
-test_full_bitmap
-
-test_expect_success 'create partial bitmap state' '
-	# pick a commit to represent the repo tip in the past
-	cutoff=$(git rev-list HEAD~100 -1) &&
-	orig_tip=$(git rev-parse HEAD) &&
-
-	# now kill off all of the refs and pretend we had
-	# just the one tip
-	rm -rf .git/logs .git/refs/* .git/packed-refs &&
-	git update-ref HEAD $cutoff &&
-
-	# and then repack, which will leave us with a nice
-	# big bitmap pack of the "old" history, and all of
-	# the new history will be loose, as if it had been pushed
-	# up incrementally and exploded via unpack-objects
-	git repack -Ad &&
-
-	# and now restore our original tip, as if the pushes
-	# had happened
-	git update-ref HEAD $orig_tip
-'
-
-test_partial_bitmap
+test_bitmap () {
+    local enabled="$1"
+
+	# we need to create the tag up front such that it is covered by the repack and
+	# thus by generated bitmaps.
+	test_expect_success 'create tags' '
+		git tag --message="tag pointing to HEAD" perf-tag HEAD
+	'
+
+	test_expect_success "use lookup table: $enabled" '
+		git config pack.writeBitmapLookupTable '"$enabled"'
+	'
+
+	test_perf "repack to disk (lookup=$enabled)" '
+		git repack -ad
+	'
+
+	test_full_bitmap
+
+    test_expect_success "create partial bitmap state (lookup=$enabled)" '
+		# pick a commit to represent the repo tip in the past
+		cutoff=$(git rev-list HEAD~100 -1) &&
+		orig_tip=$(git rev-parse HEAD) &&
+
+		# now kill off all of the refs and pretend we had
+		# just the one tip
+		rm -rf .git/logs .git/refs/* .git/packed-refs &&
+		git update-ref HEAD $cutoff &&
+
+		# and then repack, which will leave us with a nice
+		# big bitmap pack of the "old" history, and all of
+		# the new history will be loose, as if it had been pushed
+		# up incrementally and exploded via unpack-objects
+		git repack -Ad &&
+
+		# and now restore our original tip, as if the pushes
+		# had happened
+		git update-ref HEAD $orig_tip
+    '
+}
+
+test_bitmap false
+test_bitmap true
 
 test_done
diff --git a/t/perf/p5326-multi-pack-bitmaps.sh b/t/perf/p5326-multi-pack-bitmaps.sh
index f2fa228f16a..d67e7437493 100755
--- a/t/perf/p5326-multi-pack-bitmaps.sh
+++ b/t/perf/p5326-multi-pack-bitmaps.sh
@@ -6,47 +6,56 @@ test_description='Tests performance using midx bitmaps'
 
 test_perf_large_repo
 
-# we need to create the tag up front such that it is covered by the repack and
-# thus by generated bitmaps.
-test_expect_success 'create tags' '
-	git tag --message="tag pointing to HEAD" perf-tag HEAD
-'
-
-test_expect_success 'start with bitmapped pack' '
-	git repack -adb
-'
-
-test_perf 'setup multi-pack index' '
-	git multi-pack-index write --bitmap
-'
-
-test_expect_success 'drop pack bitmap' '
-	rm -f .git/objects/pack/pack-*.bitmap
-'
-
-test_full_bitmap
-
-test_expect_success 'create partial bitmap state' '
-	# pick a commit to represent the repo tip in the past
-	cutoff=$(git rev-list HEAD~100 -1) &&
-	orig_tip=$(git rev-parse HEAD) &&
-
-	# now pretend we have just one tip
-	rm -rf .git/logs .git/refs/* .git/packed-refs &&
-	git update-ref HEAD $cutoff &&
-
-	# and then repack, which will leave us with a nice
-	# big bitmap pack of the "old" history, and all of
-	# the new history will be loose, as if it had been pushed
-	# up incrementally and exploded via unpack-objects
-	git repack -Ad &&
-	git multi-pack-index write --bitmap &&
-
-	# and now restore our original tip, as if the pushes
-	# had happened
-	git update-ref HEAD $orig_tip
-'
-
-test_partial_bitmap
+test_bitmap () {
+    local enabled="$1"
+
+	# we need to create the tag up front such that it is covered by the repack and
+	# thus by generated bitmaps.
+	test_expect_success 'create tags' '
+		git tag --message="tag pointing to HEAD" perf-tag HEAD
+	'
+
+	test_expect_success "use lookup table: $enabled" '
+		git config pack.writeBitmapLookupTable '"$enabled"'
+	'
+
+	test_expect_success "start with bitmapped pack (lookup=$enabled)" '
+		git repack -adb
+	'
+
+	test_perf "setup multi-pack index (lookup=$enabled)" '
+		git multi-pack-index write --bitmap
+	'
+
+	test_expect_success "drop pack bitmap (lookup=$enabled)" '
+		rm -f .git/objects/pack/pack-*.bitmap
+	'
+
+	test_full_bitmap
+
+    test_expect_success "create partial bitmap state (lookup=$enabled)" '
+		# pick a commit to represent the repo tip in the past
+		cutoff=$(git rev-list HEAD~100 -1) &&
+		orig_tip=$(git rev-parse HEAD) &&
+
+		# now pretend we have just one tip
+		rm -rf .git/logs .git/refs/* .git/packed-refs &&
+		git update-ref HEAD $cutoff &&
+
+		# and then repack, which will leave us with a nice
+		# big bitmap pack of the "old" history, and all of
+		# the new history will be loose, as if it had been pushed
+		# up incrementally and exploded via unpack-objects
+		git repack -Ad &&
+		git multi-pack-index write --bitmap &&
+
+		# and now restore our original tip, as if the pushes
+		# had happened
+		git update-ref HEAD $orig_tip
+    '
+}
+
+test_bitmap false
+test_bitmap true
 
 test_done
-- 
gitgitgadget


* [PATCH v2 6/6] p5310-pack-bitmaps.sh: enable pack.writeReverseIndex for testing
  2022-06-26 13:10 ` [PATCH v2 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
                     ` (4 preceding siblings ...)
  2022-06-26 13:10   ` [PATCH v2 5/6] bitmap-lookup-table: add performance tests for lookup table Abhradeep Chakraborty via GitGitGadget
@ 2022-06-26 13:10   ` Abhradeep Chakraborty via GitGitGadget
  2022-06-27 21:50     ` Taylor Blau
  2022-07-04  8:46   ` [PATCH v3 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
  6 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-06-26 13:10 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee,
	Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

Set pack.writeReverseIndex to true to see the effect of writing
the reverse index in the existing bitmap tests (with and without
the lookup table).

Below are the results of the performance test. Times are in
seconds.

Test                                             this tree
-------------------------------------------------------------------
5310.4: repack to disk (lookup=false)           294.92(257.60+14.29)
5310.5: simulated clone                         14.97(8.95+1.31)
5310.6: simulated fetch                         1.64(2.77+0.20)
5310.7: pack to file (bitmap)                   41.76(29.33+6.77)
5310.8: rev-list (commits)                      0.71(0.49+0.09)
5310.9: rev-list (objects)                      4.65(4.55+0.09)
5310.10: rev-list with tag negated via --not    0.08(0.02+0.05)
         --all (objects)
5310.11: rev-list with negative tag (objects)   0.06(0.01+0.04)
5310.12: rev-list count with blob:none          0.09(0.03+0.05)
5310.13: rev-list count with blob:limit=1k      7.58(7.06+0.33)
5310.14: rev-list count with tree:0             0.09(0.03+0.06)
5310.15: simulated partial clone                8.64(8.04+0.35)
5310.19: repack to disk (lookup=true)           249.86(191.57+19.50)
5310.20: simulated clone                        13.67(8.83+1.06)
5310.21: simulated fetch                        0.50(0.63+0.13)
5310.22: pack to file (bitmap)                  41.24(28.99+6.67)
5310.23: rev-list (commits)                     0.67(0.50+0.07)
5310.24: rev-list (objects)                     4.88(4.79+0.08)
5310.25: rev-list with tag negated via --not    0.04(0.00+0.03)
         --all (objects)
5310.26: rev-list with negative tag (objects)   0.05(0.00+0.04)
5310.27: rev-list count with blob:none          0.05(0.01+0.03)
5310.28: rev-list count with blob:limit=1k      8.02(7.16+0.34)
5310.29: rev-list count with tree:0             0.05(0.01+0.04)
5310.30: simulated partial clone                8.57(8.16+0.32)

Tests 4-15 run without the lookup table. The rest repeat the
same tests with the lookup table enabled.

Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
---
 t/perf/p5310-pack-bitmaps.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index 6ff42bdd391..9848c5d5040 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -13,7 +13,8 @@ test_perf_large_repo
 # We intentionally use the deprecated pack.writebitmaps
 # config so that we can test against older versions of git.
 test_expect_success 'setup bitmap config' '
-	git config pack.writebitmaps true
+	git config pack.writebitmaps true &&
+	git config pack.writeReverseIndex true
 '
 
 test_bitmap () {
-- 
gitgitgadget

* Re: [PATCH v2 1/6] Documentation/technical: describe bitmap lookup table extension
  2022-06-26 13:10   ` [PATCH v2 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
@ 2022-06-27 14:18     ` Derrick Stolee
  2022-06-27 15:48       ` Taylor Blau
  2022-06-27 16:51       ` Abhradeep Chakraborty
  0 siblings, 2 replies; 162+ messages in thread
From: Derrick Stolee @ 2022-06-27 14:18 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget, git
  Cc: Taylor Blau, Kaartic Sivaram, Abhradeep Chakraborty

On 6/26/2022 9:10 AM, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> 
> When reading bitmap file, git loads each and every bitmap one by one
> even if all the bitmaps are not required. A "bitmap lookup table"
> extension to the bitmap format can reduce the overhead of loading
> bitmaps which stores a list of bitmapped commit id pos (in the midx
> or pack, along with their offset and xor offset. This way git can
> load only the neccesary bitmaps without loading the previous bitmaps.

s/neccesary/necessary/

> +			** {empty}
> +			BITMAP_OPT_LOOKUP_TABLE (0x10): :::
> +			If present, the end of the bitmap file contains a table
> +			containing a list of `N` <commit pos, offset, xor offset>

(Note that "commit pos" and "xor offset" here don't have underscores, but
your discussion below does use "xor_offset" with underscores.)

> +			triplets. The format and meaning of the table is described
> +			below.
> ++
> +NOTE: This xor_offset is different from the bitmap's xor_offset.
> +Bitmap's xor_offset is relative i.e. it tells how many bitmaps we have
> +to go back from the current bitmap. Lookup table's xor_offset tells the
> +position of the triplet in the list whose bitmap the current commit's
> +bitmap have to xor with.

I found this difficult to parse. Here is an attempt at a rewording. Please
let me know if I misunderstood something when reading your version:

  NOTE: The xor_offset stored in the BITMAP_OPT_LOOKUP_TABLE is different
  from the xor_offset used in the bitmap data table. The xor_offset in this
  table indicates the row number within this table of the commit whose
  bitmap is used for the XOR computation with the current commit's stored
  bitmap to create the proper logical reachability bitmap.

This does make me think that "xor_offset" should really be "xor_row" or
something like that.

>  		4-byte entry count (network byte order)
>  
>  			The total count of entries (bitmapped commits) in this bitmap index.
> @@ -205,3 +218,31 @@ Note that this hashing scheme is tied to the BITMAP_OPT_HASH_CACHE flag.
>  If implementations want to choose a different hashing scheme, they are
>  free to do so, but MUST allocate a new header flag (because comparing
>  hashes made under two different schemes would be pointless).
> +
> +Commit lookup table
> +-------------------
> +
> +If the BITMAP_OPT_LOOKUP_TABLE flag is set, the last `N * (4 + 8 + 4)`
> +(preceding the name-hash cache and trailing hash) of the `.bitmap` file
> +contains a lookup table specifying the information needed to get the
> +desired bitmap from the entries without parsing previous unnecessary
> +bitmaps.
> +
> +For a `.bitmap` containing `nr_entries` reachability bitmaps, the table
> +contains a list of `nr_entries` <commit pos, offset, xor offset> triplets.
> +The content of i'th triplet is -
> +
> +	* {empty}
> +	commit pos (4 byte integer, network byte order): ::
> +	It stores the object position of the commit (in the midx or pack index)
> +	to which the i'th bitmap in the bitmap entries belongs.

Ok, we are saving some space here, but relying on looking into the pack-index
or multi-pack-index to get the actual commit OID.

Since this is sorted by the order that stores the bitmaps, binary search will
no longer work on this list (unless we enforce that on the rest of the bitmap
file). I am going to expect that you parse this table into a hashmap in order
to allow fast commit lookups. I'll keep an eye out for that implementation.

> +	* {empty}
> +	offset (8 byte integer, network byte order): ::
> +	The offset from which that commit's bitmap can be read.
> +
> +	* {empty}
> +	xor offset (4 byte integer, network byte order): ::
> +	It holds the position of the triplet with whose bitmap the
> +	current bitmap need to xor. If the current triplet's bitmap
> +	do not have any xor bitmap, it defaults to 0xffffffff.

This last sentence seems backward. Perhaps:

  If the value is 0xffffffff, then the current bitmap has no xor bitmap.
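
To make the row layout concrete, here is a rough standalone sketch of
decoding one 16-byte triplet by hand. It assumes only what the quoted text
states: a 4-byte commit pos, an 8-byte offset and a 4-byte xor offset, all
in network byte order, with 0xffffffff meaning "no xor bitmap". The names
lookup_triplet, decode_triplet and NO_XOR_ROW are invented for this
illustration and are not from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRIPLET_SIZE 16         /* 4 + 8 + 4 bytes per row */
#define NO_XOR_ROW 0xffffffffu  /* "no xor bitmap" marker */

/* Hypothetical in-memory view of one lookup table row. */
struct lookup_triplet {
	uint32_t commit_pos; /* position in the pack index or midx */
	uint64_t offset;     /* where the commit's bitmap can be read */
	uint32_t xor_row;    /* row of the xor base, or NO_XOR_ROW */
};

/* Read a 4-byte big-endian (network byte order) integer. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read an 8-byte big-endian integer as two 4-byte halves. */
static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/* Decode row number `row` of a table whose first byte is `table`. */
struct lookup_triplet decode_triplet(const unsigned char *table, uint32_t row)
{
	const unsigned char *p;
	struct lookup_triplet t;

	assert(table != NULL);
	p = table + (size_t)row * TRIPLET_SIZE;

	t.commit_pos = read_be32(p);
	t.offset = read_be64(p + 4);
	t.xor_row = read_be32(p + 12);
	return t;
}
```

Decoding row 0 of a buffer that begins
00 00 00 05 | 00 00 00 00 00 00 01 00 | ff ff ff ff
would give commit pos 5, offset 256, and no xor base.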

Thanks,
-Stolee

* Re: [PATCH v2 2/6] pack-bitmap-write.c: write lookup table extension
  2022-06-26 13:10   ` [PATCH v2 2/6] pack-bitmap-write.c: write " Abhradeep Chakraborty via GitGitGadget
@ 2022-06-27 14:35     ` Derrick Stolee
  2022-06-27 16:12       ` Taylor Blau
  2022-06-27 17:10       ` Abhradeep Chakraborty
  2022-06-27 16:05     ` Taylor Blau
  1 sibling, 2 replies; 162+ messages in thread
From: Derrick Stolee @ 2022-06-27 14:35 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget, git
  Cc: Taylor Blau, Kaartic Sivaram, Abhradeep Chakraborty

On 6/26/2022 9:10 AM, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> 
> The bitmap lookup table extension was documentated by an earlier

s/documentated/documented/

> change, but Git does not yet knowhow to write that extension.

s/knowhow/know how/

> +static int table_cmp(const void *_va, const void *_vb, void *commit_positions)
> +{
> +	int8_t result = 0;
> +	uint32_t *positions = (uint32_t *) commit_positions;

nit: drop the space between the cast and commit_positions.

> +	uint32_t a = positions[*(uint32_t *)_va];
> +	uint32_t b = positions[*(uint32_t *)_vb];
> +
> +	if (a > b)
> +		result = 1;
> +	else if (a < b)
> +		result = -1;
> +	else
> +		result = 0;
> +
> +	return result;
> +}

Ok, here you are sorting by commit OID (indirectly by the order in the
[multi-]pack-index). I suppose that I misunderstood in the previous
patch, so that could use some more specific language, maybe.
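
With the rows ordered by commit position, a plain binary search over the
raw table should be enough, with no hashmap needed. A minimal sketch of
that idea, assuming 16-byte rows whose first four bytes hold the
big-endian commit position. find_row and cmp_pos_to_row are hypothetical
names for this example; patch 4/6 has its own comparator, triplet_cmp.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ROW_SIZE 16 /* 4-byte commit pos + 8-byte offset + 4-byte xor row */

/* The commit position is the first big-endian 4-byte field of a row. */
static uint32_t row_commit_pos(const unsigned char *row)
{
	return ((uint32_t)row[0] << 24) | ((uint32_t)row[1] << 16) |
	       ((uint32_t)row[2] << 8) | (uint32_t)row[3];
}

/*
 * bsearch() comparator: the key is a host-order uint32_t, the element
 * is a raw on-disk row, so only the element needs byte swapping.
 */
static int cmp_pos_to_row(const void *key, const void *row)
{
	uint32_t want = *(const uint32_t *)key;
	uint32_t have = row_commit_pos(row);

	if (want < have)
		return -1;
	if (want > have)
		return 1;
	return 0;
}

/* Return the row for commit_pos, or NULL if that commit has no bitmap. */
const unsigned char *find_row(const unsigned char *table, size_t nr_rows,
			      uint32_t commit_pos)
{
	assert(table != NULL);
	return bsearch(&commit_pos, table, nr_rows, ROW_SIZE, cmp_pos_to_row);
}
```

This only works if the writer really does emit rows in increasing commit
position order, which is why the documentation should say so explicitly.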

> +static void write_lookup_table(struct hashfile *f,
> +			       uint64_t *offsets,
> +			       uint32_t *commit_positions)
> +{
> +	uint32_t i;
> +	uint32_t *table, *table_inv;
> +
> +	ALLOC_ARRAY(table, writer.selected_nr);
> +	ALLOC_ARRAY(table_inv, writer.selected_nr);
> +
> +	for (i = 0; i < writer.selected_nr; i++)
> +		table[i] = i;
> +
> +	QSORT_S(table, writer.selected_nr, table_cmp, commit_positions);

At the end of this sort, table[j] = i means that the ith bitmap corresponds
to the jth bitmapped commit in lex order of OIDs.

> +	for (i = 0; i < writer.selected_nr; i++)
> +		table_inv[table[i]] = i;

And table_inv helps us discover that relationship (ith bitmap to jth commit
by j = table_inv[i]).

> +	for (i = 0; i < writer.selected_nr; i++) {
> +		struct bitmapped_commit *selected = &writer.selected[table[i]];
> +		uint32_t xor_offset = selected->xor_offset;

Here, xor_offset is "number of bitmaps in relationship to the current bitmap"

> +		hashwrite_be32(f, commit_positions[table[i]]);
> +		hashwrite_be64(f, offsets[table[i]]);
> +		hashwrite_be32(f, xor_offset ?
> +				table_inv[table[i] - xor_offset]: 0xffffffff);

Which means that if "k = table[i] - xor_offset", then the xor base is the
kth bitmap, and table_inv[k] gets us the position in this table of that
bitmap's commit.

(It's also strange to me that the offset is being _subtracted_, but I guess
the bitmap format requires the xor base to appear first so the offset does
not need to be a negative number ever.)

This last line is a bit complex.

	uint32_t xor_offset = selected->xor_offset;
	uint32_t xor_row = 0xffffffff;

	if (xor_offset) {
		uint32_t xor_order = table[i] - xor_offset;
		xor_row = table_inv[xor_order];
	}

...then we can "hashwrite_be32(f, xor_row);" when necessary. I'm not sure
that we need the "uint32_t xor_order" inside the "if (xor_offset)" block,
but splitting it helps add clarity to the multi-step computation.
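
As a standalone illustration of the same index arithmetic (not a drop-in
for the patch), the sketch below builds table and table_inv and derives
one xor_row per table row. Assumptions: compute_xor_rows and NO_XOR_ROW
are made-up names, an insertion sort stands in for QSORT_S, and the
inputs are plain arrays rather than the writer state.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NO_XOR_ROW 0xffffffffu /* value written when there is no xor base */

/* Sort bitmap indexes by the commit position they refer to. */
static void sort_by_position(uint32_t *table, uint32_t nr,
			     const uint32_t *commit_positions)
{
	for (uint32_t i = 1; i < nr; i++) {
		uint32_t v = table[i];
		uint32_t j = i;

		while (j > 0 &&
		       commit_positions[table[j - 1]] > commit_positions[v]) {
			table[j] = table[j - 1];
			j--;
		}
		table[j] = v;
	}
}

/*
 * commit_positions[b] and xor_offsets[b] describe the b-th bitmap in
 * file order; xor_offsets[b] counts how many bitmaps back its xor base
 * is (0 for none). xor_rows[i] receives the table row of the xor base
 * for the i-th row of the sorted lookup table.
 */
void compute_xor_rows(const uint32_t *commit_positions,
		      const uint32_t *xor_offsets,
		      uint32_t nr, uint32_t *xor_rows)
{
	uint32_t *table, *table_inv;

	if (!nr)
		return;

	table = malloc(nr * sizeof(*table));
	table_inv = malloc(nr * sizeof(*table_inv));
	assert(table && table_inv);

	for (uint32_t i = 0; i < nr; i++)
		table[i] = i;
	sort_by_position(table, nr, commit_positions);

	/* table[row] = bitmap index, so table_inv[bitmap index] = row. */
	for (uint32_t i = 0; i < nr; i++)
		table_inv[table[i]] = i;

	for (uint32_t i = 0; i < nr; i++) {
		uint32_t xor_offset = xor_offsets[table[i]];
		uint32_t xor_row = NO_XOR_ROW;

		if (xor_offset) {
			/* The xor base always precedes the bitmap in the file. */
			assert(xor_offset <= table[i]);
			uint32_t xor_order = table[i] - xor_offset;
			xor_row = table_inv[xor_order];
		}
		xor_rows[i] = xor_row;
	}

	free(table);
	free(table_inv);
}
```

For commit positions {30, 10, 20} and xor offsets {0, 1, 1}, the rows
come out as {2, 0, 0xffffffff}: the first table row (bitmap 1) is based
on bitmap 0, which sorts last and so lives in row 2.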

>  enum pack_bitmap_opts {
> -	BITMAP_OPT_FULL_DAG = 1,
> -	BITMAP_OPT_HASH_CACHE = 4,
> +	BITMAP_OPT_FULL_DAG = 0x1,
> +	BITMAP_OPT_HASH_CACHE = 0x4,
> +	BITMAP_OPT_LOOKUP_TABLE = 0x10,
>  };

Excellent.

Thanks,
-Stolee

* Re: [PATCH v2 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-06-26 13:10   ` [PATCH v2 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests Abhradeep Chakraborty via GitGitGadget
@ 2022-06-27 14:43     ` Derrick Stolee
  2022-06-27 17:42       ` Abhradeep Chakraborty
  2022-06-27 17:47     ` Taylor Blau
  1 sibling, 1 reply; 162+ messages in thread
From: Derrick Stolee @ 2022-06-27 14:43 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget, git
  Cc: Taylor Blau, Kaartic Sivaram, Abhradeep Chakraborty

On 6/26/2022 9:10 AM, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> 
> Teach git to provide a way for users to enable/disable bitmap lookup
> table extension by providing a config option named 'writeBitmapLookupTable'.
> Default is true.

I wonder if it makes sense to have it default to 'false' for now, but to
change that default after the feature has been shipped and running in
production for a while.

> Also add test to verify writting of lookup table.

s/writting/writing/

> +pack.writeBitmapLookupTable::
> +	When true, git will include a "lookup table" section in the

I think you should either use "Git" when talking about the software
generally, OR use "`git repack --write-bitmap-index` will include..."

> +	bitmap index (if one is written). This table is used to defer
> +	loading individual bitmaps as late as possible. This can be
> +	beneficial in repositories which have relatively large bitmap

s/which/that/

(I'm pretty sure that "that" is better. We're trying to restrict the set
of repositories we are talking about, not implying that all repositories
have this property.)

> +	indexes. Defaults to true.
> +

> --- a/pack-bitmap-write.c
> +++ b/pack-bitmap-write.c
> @@ -713,6 +713,7 @@ static void write_lookup_table(struct hashfile *f,
>  	for (i = 0; i < writer.selected_nr; i++)
>  		table_inv[table[i]] = i;
>  
> +	trace2_region_enter("pack-bitmap-write", "writing_lookup_table", the_repository);
>  	for (i = 0; i < writer.selected_nr; i++) {
>  		struct bitmapped_commit *selected = &writer.selected[table[i]];
>  		uint32_t xor_offset = selected->xor_offset;
> @@ -725,6 +726,7 @@ static void write_lookup_table(struct hashfile *f,
>  
>  	free(table);
>  	free(table_inv);
> +	trace2_region_leave("pack-bitmap-write", "writing_lookup_table", the_repository);
>  }

These lines seem misplaced. Maybe they were meant for the previous
patch?

Thanks,
-Stolee

* Re: [PATCH v2 4/6] pack-bitmap: prepare to read lookup table extension
  2022-06-26 13:10   ` [PATCH v2 4/6] pack-bitmap: prepare to read lookup table extension Abhradeep Chakraborty via GitGitGadget
@ 2022-06-27 15:12     ` Derrick Stolee
  2022-06-27 18:06       ` [PATCH v2 4/6] pack-bitmap: prepare to read lookup table Abhradeep Chakraborty
  2022-06-27 21:49       ` [PATCH v2 4/6] pack-bitmap: prepare to read lookup table extension Taylor Blau
  2022-06-27 21:38     ` Taylor Blau
  1 sibling, 2 replies; 162+ messages in thread
From: Derrick Stolee @ 2022-06-27 15:12 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget, git
  Cc: Taylor Blau, Kaartic Sivaram, Abhradeep Chakraborty

On 6/26/2022 9:10 AM, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> 
> Earlier change teaches Git to write bitmap lookup table. But Git
> does not know how to parse them.
> 
> Teach Git to parse the existing bitmap lookup table. The older
> versions of git are not affected by it. Those versions ignore the
> lookup table.
> 
> Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> Mentored-by: Taylor Blau <me@ttaylorr.com>
> Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>

I didn't check the previous patches, but your sign-off should be the
last line of the message. (You are signing off on all previous content,
and any later content is not covered by your sign-off.)

> +
> +		if (flags & BITMAP_OPT_LOOKUP_TABLE &&
> +			git_env_bool("GIT_TEST_READ_COMMIT_TABLE", 1)) {

nit: This alignment should use four spaces at the end so the second phrase
matches the start of the previous phrase. Like this:

		if (flags & BITMAP_OPT_LOOKUP_TABLE &&
		    git_env_bool("GIT_TEST_READ_COMMIT_TABLE", 1)) {

Perhaps it looked right in your editor because it renders tabs as 4 spaces
instead of 8 spaces.

> +			size_t table_size = 0;
> +			size_t triplet_sz = st_add3(sizeof(uint32_t),    /* commit position */
> +							sizeof(uint64_t),    /* offset */
> +							sizeof(uint32_t));    /* xor offset */

The 4- vs 8-space tab view would also explain the alignment here:

			size_t triplet_sz = st_add3(sizeof(uint32_t),  /* commit position */
						    sizeof(uint64_t),  /* offset */
						    sizeof(uint32_t)); /* xor offset */

(I also modified the comment alignment.)

Of course, since these values are constants and have no risk of overflowing,
perhaps we can drop st_add3() here:


			size_t triplet_sz = sizeof(uint32_t) + /* commit position */
					    sizeof(uint64_t) + /* offset */
					    sizeof(uint32_t);  /* xor offset */

> +			table_size = st_add(table_size,
> +					st_mult(ntohl(header->entry_count),
> +						triplet_sz));

Here, we _do_ want to keep the st_mult(). Is the st_add() still necessary? It
seems this is a leftover from the previous version that had the 4-byte flag
data.

We set table_size to zero above. We could drop that initialization and instead
have this after the "size_t triplet_sz" definition:

			size_t table_size = st_mult(ntohl(header->entry_count),
						    triplet_sz);
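
To make the distinction concrete (constants can simply be added, while a
count read from the file needs a checked multiplication), here is a
minimal standalone sketch. checked_mul() and lookup_table_size() are
hypothetical names for illustration only; Git's st_mult() dies on
overflow, whereas this sketch returns -1 instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One <commit pos, offset, xor pos> triplet; constants cannot overflow. */
#define TRIPLET_SZ (sizeof(uint32_t) + sizeof(uint64_t) + sizeof(uint32_t))

/*
 * Hypothetical stand-in for st_mult(): store a * b in *out and return 0,
 * or return -1 if the product would not fit in a size_t.
 */
static int checked_mul(size_t a, size_t b, size_t *out)
{
	assert(out != NULL);
	if (b && a > SIZE_MAX / b)
		return -1;
	*out = a * b;
	return 0;
}

/* entry_count comes from the file, so the multiplication is checked. */
static int lookup_table_size(uint32_t entry_count, size_t *out)
{
	return checked_mul(entry_count, TRIPLET_SZ, out);
}
```

With three entries this yields 48 bytes (3 * 16); a product that cannot
fit in a size_t is reported instead of silently wrapping.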

> +			if (table_size > index_end - index->map - header_size)
> +				return error("corrupted bitmap index file (too short to fit lookup table)");

Please add "_(...)" around the error message so it can be translated.

> +			index->table_lookup = (void *)(index_end - table_size);
> +			index_end -= table_size;
> +		}
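
The bounds check and the carving of the table from the end of the mapped
file, as quoted above, can be sketched in isolation. This is only an
illustration: carve_table() is a hypothetical helper, and the extra guard
for a region shorter than the header is an assumption that is not in the
patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Carve table_size bytes off the end of the region [map, *end).
 * The first header_size bytes are reserved for the header.
 * Returns the start of the table and moves *end down to it, or
 * returns NULL (leaving *end untouched) if the table does not fit.
 */
static const unsigned char *carve_table(const unsigned char *map,
					size_t header_size,
					const unsigned char **end,
					size_t table_size)
{
	size_t avail;

	assert(map && end && *end >= map);
	if ((size_t)(*end - map) < header_size)
		return NULL;
	avail = (size_t)(*end - map) - header_size;
	if (table_size > avail)
		return NULL;
	*end -= table_size;
	return *end;
}
```

Because the end pointer only moves on success, a caller can report the
"too short to fit lookup table" error without having modified any state.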

> -	/* a 0 return code means the insertion succeeded with no changes,
> -	 * because the SHA1 already existed on the map. this is bad, there
> -	 * shouldn't be duplicated commits in the index */
> +	/* A 0 return code means the insertion succeeded with no changes,
> +	 * because the SHA1 already existed on the map. If lookup table
> +	 * is NULL, this is bad, there shouldn't be duplicated commits
> +	 * in the index.
> +	 *
> +	 * If table_lookup exists, that means the desired bitmap is already
> +	 * loaded. Either this bitmap has been stored directly or another
> +	 * bitmap has a direct or indirect xor relation with it. */

If we are modifying this multi-line comment, then we should reformat it to
match convention:

	/*
	 * The first sentence starts after the comment start
	 * so it has symmetry with the comment end which is on
	 * its own line.
	 */

>  	if (ret == 0) {
> -		error("Duplicate entry in bitmap index: %s", oid_to_hex(oid));
> -		return NULL;
> +		if (!index->table_lookup) {
> +			error("Duplicate entry in bitmap index: %s", oid_to_hex(oid));

Error messages should start with a lowercase letter. Please also add
translation markers "_(...)".

> +static uint32_t triplet_get_xor_pos(const void *triplet)
> +{
> +	const void *p = (unsigned char*) triplet + st_add(sizeof(uint32_t), sizeof(uint64_t));

This st_add() is not necessary since the constants will not overflow.

> +	return get_be32(p);
> +}
> +
> +static int triplet_cmp(const void *va, const void *vb)
> +{
> +	int result = 0;
> +	uint32_t *a = (uint32_t *) va;
> +	uint32_t b = get_be32(vb);
> +	if (*a > b)
> +		result = 1;
> +	else if (*a < b)
> +		result = -1;
> +	else
> +		result = 0;
> +
> +	return result;
> +}
> +
> +static uint32_t bsearch_pos(struct bitmap_index *bitmap_git, struct object_id *oid,
> +						uint32_t *result)

Strange wrapping. Perhaps

static uint32_t bsearch_pos(struct bitmap_index *bitmap_git,
			    struct object_id *oid,
			    uint32_t *result)

> +{
> +	int found;
> +
> +	if (bitmap_git->midx)
> +		found = bsearch_midx(oid, bitmap_git->midx, result);
> +	else
> +		found = bsearch_pack(oid, bitmap_git->pack, result);
> +
> +	return found;

Here, we are doing a binary search on the entire list of packed objects, which could
use quite a few more hops than a binary search on the bitmapped commits.

> +static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_git,
> +					  struct commit *commit)
...
> +	int found = bsearch_pos(bitmap_git, oid, &commit_pos);
> +
> +	if (!found)
> +		return NULL;
> +
> +	triplet = bsearch(&commit_pos, bitmap_git->table_lookup, bitmap_git->entry_count,
> +						triplet_sz, triplet_cmp);

But I see, you are searching the pack-index for the position in the index, and _then_
searching the bitmap lookup table based on that position value.

I expected something different: binary search on the triplets where the comparison is
made by looking up the OID from the [multi-]pack-index and comparing that OID to the
commit OID we are looking for.

I'm not convinced that the binary search I had in mind is meaningfully faster than
what you've implemented here, so I'm happy to leave it as you have it. We can investigate
if that full search on the pack-index matters at all (it probably doesn't).
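
For reference, the second step of the approach in the patch (searching
the lookup table by commit position once that position is known) can be
sketched standalone. The names read_be32(), triplet_cmp_pos() and
find_triplet() are hypothetical; the 16-byte triplet layout follows the
format described in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TRIPLET_SZ 16 /* 4 (commit pos) + 8 (offset) + 4 (xor pos) */

/* Read a 32-bit big-endian integer from raw bytes. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* bsearch() comparator: key is a host-order position, element a raw triplet. */
static int triplet_cmp_pos(const void *key, const void *triplet)
{
	uint32_t a = *(const uint32_t *)key;
	uint32_t b = read_be32(triplet);

	if (a < b)
		return -1;
	if (a > b)
		return 1;
	return 0;
}

/* Return the triplet for commit_pos, or NULL if that commit has no bitmap. */
static const unsigned char *find_triplet(const unsigned char *table,
					 size_t nr, uint32_t commit_pos)
{
	assert(table || !nr);
	return bsearch(&commit_pos, table, nr, TRIPLET_SZ, triplet_cmp_pos);
}
```

This only works because the triplets are written in ascending commit
position order, which is the same as lexicographic OID order.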

> +	if (!triplet)
> +		return NULL;
> +
> +	offset = triplet_get_offset(triplet);
> +	xor_pos = triplet_get_xor_pos(triplet);
> +
> +	if (xor_pos != 0xffffffff) {
> +		int xor_flags;
> +		uint64_t offset_xor;
> +		uint32_t *xor_positions;
> +		struct object_id xor_oid;
> +		size_t size = 0;
> +
> +		ALLOC_ARRAY(xor_positions, bitmap_git->entry_count);

While there is potential that this is wasteful, it's probably not that huge,
so we can start with the "maximum XOR depth" and then reconsider a smaller
allocation in the future.

> +		while (xor_pos != 0xffffffff) {

We should consider also ensuring that "size < bitmap_git->entry_count".
Better yet, create an xor_positions_alloc variable that is initialized
to the entry_count value.

"size" should probably be xor_positions_nr.

> +			xor_positions[size++] = xor_pos;
> +			triplet = bitmap_get_triplet(bitmap_git, xor_pos);
> +			xor_pos = triplet_get_xor_pos(triplet);
> +		}

(at this point, "if (xor_positions_nr >= xor_positions_alloc)", then error
out since the file must be malformed with an XOR loop.)
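
A bounded walk of this kind might look like the sketch below. It is not
taken from the patch: collect_xor_chain() is a hypothetical helper and
the table is reduced to a plain array of xor positions, so that the loop
detection can be seen on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_XOR 0xffffffffu

/*
 * xor_row[i] is the row whose bitmap row i is XORed against, or NO_XOR.
 * Push the chain of XOR bases for `start` onto `stack` (capacity nr).
 * Returns the number of rows pushed, or -1 if a row is out of range or
 * the chain is longer than nr, which can only happen with an XOR loop.
 */
static int collect_xor_chain(const uint32_t *xor_row, uint32_t nr,
			     uint32_t start, uint32_t *stack)
{
	uint32_t stack_nr = 0;
	uint32_t row;

	assert(xor_row && stack && start < nr);
	row = xor_row[start];
	while (row != NO_XOR) {
		if (row >= nr || stack_nr >= nr)
			return -1;
		stack[stack_nr++] = row;
		row = xor_row[row];
	}
	return (int)stack_nr;
}
```

A well-formed chain can visit at most nr - 1 other rows, so reaching the
capacity is a reliable sign that the file is malformed.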

> +		while (size){

nit: ") {"

> +			xor_pos = xor_positions[size - 1];
> +			triplet = bitmap_get_triplet(bitmap_git, xor_pos);
> +			commit_pos = get_be32(triplet);
> +			offset_xor = triplet_get_offset(triplet);
> +
> +			if (nth_bitmap_object_oid(bitmap_git, &xor_oid, commit_pos) < 0) {
> +				free(xor_positions);
> +				return NULL;
> +			}
> +
> +			bitmap_git->map_pos = offset_xor + sizeof(uint32_t) + sizeof(uint8_t);
> +			xor_flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
> +			bitmap = read_bitmap_1(bitmap_git);
> +
> +			if (!bitmap){

nit: ") {"

> +				free(xor_positions);
> +				return NULL;
> +			}
> +
> +			xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_oid, xor_bitmap, xor_flags);

Since we are storing the bitmap here as we "pop" the stack, should we be
looking for a stored bitmap while pushing to the stack in the previous loop?
That would save time when using multiple bitmaps with common XOR bases.

(Of course, we want to be careful that we do not create a recursive loop,
but instead _only_ look at the in-memory bitmaps that already exist.)
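
One way to picture that idea is the sketch below, which stops pushing as
soon as it reaches a base that is already in memory. Everything here is
hypothetical: collect_until_cached() and the loaded[] flag array stand
in for whatever lookup of already-stored bitmaps the real code would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_XOR 0xffffffffu

/*
 * Like a plain chain walk, but stop at the first XOR base that is
 * already loaded (loaded[row] != 0). That row is stored in *base and
 * is not pushed; *base is NO_XOR if no loaded base was found.
 * Returns the number of rows pushed, or -1 on a malformed chain.
 */
static int collect_until_cached(const uint32_t *xor_row,
				const unsigned char *loaded, uint32_t nr,
				uint32_t start, uint32_t *stack,
				uint32_t *base)
{
	uint32_t stack_nr = 0;
	uint32_t row;

	assert(xor_row && loaded && stack && base && start < nr);
	*base = NO_XOR;
	row = xor_row[start];
	while (row != NO_XOR) {
		if (row >= nr || stack_nr >= nr)
			return -1;
		if (loaded[row]) {
			*base = row;
			break;
		}
		stack[stack_nr++] = row;
		row = xor_row[row];
	}
	return (int)stack_nr;
}
```

Only the rows left on the stack then need to be read from disk, with the
row in *base serving as the starting XOR bitmap.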

> +			size--;
> +		}
> +
> +		free(xor_positions);
> +	}
> +
> +	bitmap_git->map_pos = offset + sizeof(uint32_t) + sizeof(uint8_t);
> +	flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
> +	bitmap = read_bitmap_1(bitmap_git);
> +
> +	if (!bitmap)
> +		return NULL;
> +
> +	return store_bitmap(bitmap_git, bitmap, oid, xor_bitmap, flags);
> +}
> +

I'm happy with the structure of this iterative algorithm!

I'll pause my review here for now.

Thanks,
-Stolee

* Re: [PATCH v2 1/6] Documentation/technical: describe bitmap lookup table extension
  2022-06-27 14:18     ` Derrick Stolee
@ 2022-06-27 15:48       ` Taylor Blau
  2022-06-27 16:51       ` Abhradeep Chakraborty
  1 sibling, 0 replies; 162+ messages in thread
From: Taylor Blau @ 2022-06-27 15:48 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Kaartic Sivaram,
	Abhradeep Chakraborty

On Mon, Jun 27, 2022 at 10:18:51AM -0400, Derrick Stolee wrote:
> On 6/26/2022 9:10 AM, Abhradeep Chakraborty via GitGitGadget wrote:
> > +			triplets. The format and meaning of the table is described
> > +			below.
> > ++
> > +NOTE: This xor_offset is different from the bitmap's xor_offset.
> > +Bitmap's xor_offset is relative i.e. it tells how many bitmaps we have
> > +to go back from the current bitmap. Lookup table's xor_offset tells the
> > +position of the triplet in the list whose bitmap the current commit's
> > +bitmap have to xor with.
>
> I found this difficult to parse. Here is an attempt at a rewording. Please
> let me know if I misunderstood something when reading your version:
>
>   NOTE: The xor_offset stored in the BITMAP_OPT_LOOKUP_TABLE is different
>   from the xor_offset used in the bitmap data table. The xor_offset in this
>   table indicates the row number within this table of the commit whose
>   bitmap is used for the XOR computation with the current commit's stored
>   bitmap to create the proper logical reachability bitmap.
>
> This does make me think that "xor_offset" should really be "xor_row" or
> something like that.

To be fair, I found Stolee's version equally difficult to parse. I
wonder if something like the following would be clearer:

    NOTE: Unlike the xor_offset used to compress an individual bitmap,
    this value stores an *absolute* index into the lookup table, not a
    location relative to the current entry.

> > +For a `.bitmap` containing `nr_entries` reachability bitmaps, the table
> > +contains a list of `nr_entries` <commit pos, offset, xor offset> triplets.
> > +The content of i'th triplet is -
> > +
> > +	* {empty}
> > +	commit pos (4 byte integer, network byte order): ::
> > +	It stores the object position of the commit (in the midx or pack index)
> > +	to which the i'th bitmap in the bitmap entries belongs.
>
> Ok, we are saving some space here, but relying on looking into the pack-index
> or multi-pack-index to get the actual commit OID.
>
> Since this is sorted by the order that stores the bitmaps, binary search will
> no longer work on this list (unless we enforce that on the rest of the bitmap
> file). I am going to expect that you parse this table into a hashmap in order
> to allow fast commit lookups. I'll keep an eye out for that implementation.

The main purpose of this series is to avoid having to construct such a
table ahead of time. This is more or less akin to what the existing
implementation already does in load_bitmap_entries_v1(), though that
function has to read (but not decompress!) all bitmaps.

But I disagree that this isn't binary searchable. The object positions
are in MIDX or pack .idx order, so they are sorted lexicographically.
The comparator could take an object_id as its key, convert each "commit
pos" field to an object_id, and call oidcmp().

Or we could go the other way (as it looks like Abhradeep did in a later
patch) and convert the key's object_id into the index or MIDX-relative
position, and search for that.
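
The first option (keying the search on the OID and resolving each
"commit pos" through the index) can be sketched as follows. This is an
illustration under simplifying assumptions: the index is a flat array of
20-byte OIDs, the table is reduced to its commit positions, and
find_row_by_oid() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OID_SZ 20 /* SHA-1 sized; an assumption for this sketch */

/*
 * idx: OIDs of OID_SZ bytes each, sorted lexicographically (stands in
 *      for the pack .idx or MIDX).
 * commit_pos: ascending positions into idx, one per lookup-table row.
 * Returns the row whose commit has the given oid, or -1 if none does.
 */
static int find_row_by_oid(const unsigned char *idx,
			   const uint32_t *commit_pos, uint32_t nr_rows,
			   const unsigned char *oid)
{
	uint32_t lo = 0, hi = nr_rows;

	assert(idx && commit_pos && oid);
	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		const unsigned char *mid_oid =
			idx + (size_t)commit_pos[mid] * OID_SZ;
		int cmp = memcmp(oid, mid_oid, OID_SZ);

		if (!cmp)
			return (int)mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -1;
}
```

The number of comparisons depends only on the number of bitmapped
commits, at the cost of one index lookup per comparison.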

> > +	* {empty}
> > +	offset (8 byte integer, network byte order): ::
> > +	The offset from which that commit's bitmap can be read.
> > +
> > +	* {empty}
> > +	xor offset (4 byte integer, network byte order): ::
> > +	It holds the position of the triplet with whose bitmap the
> > +	current bitmap need to xor. If the current triplet's bitmap
> > +	do not have any xor bitmap, it defaults to 0xffffffff.
>
> This last sentence seems backward. Perhaps:
>
>   If the value is 0xffffffff, then the current bitmap has no xor bitmap.

Perhaps even more concisely:

    The position of a triplet whose bitmap is used to compress this one,
    or 0xffffffff if no such bitmap exists.
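
As a concrete reading of the triplet layout discussed above (4-byte
commit pos, 8-byte offset, 4-byte xor position, all in network byte
order), a minimal decoding sketch could look like this. The struct and
function names are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

struct triplet {
	uint32_t commit_pos; /* position in the pack index or MIDX */
	uint64_t offset;     /* where the commit's bitmap can be read */
	uint32_t xor_row;    /* row of the XOR base, or 0xffffffff */
};

/* Read a 32-bit big-endian integer from raw bytes. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read a 64-bit big-endian integer from raw bytes. */
static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/* Decode one 16-byte on-disk triplet into host byte order. */
static struct triplet parse_triplet(const unsigned char *p)
{
	struct triplet t;

	assert(p);
	t.commit_pos = read_be32(p);
	t.offset = read_be64(p + 4);
	t.xor_row = read_be32(p + 12);
	return t;
}
```

A triplet whose last four bytes are all 0xff decodes to xor_row ==
0xffffffff, meaning the bitmap is stored without an XOR base.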

Thanks,
Taylor

* Re: [PATCH v2 2/6] pack-bitmap-write.c: write lookup table extension
  2022-06-26 13:10   ` [PATCH v2 2/6] pack-bitmap-write.c: write " Abhradeep Chakraborty via GitGitGadget
  2022-06-27 14:35     ` Derrick Stolee
@ 2022-06-27 16:05     ` Taylor Blau
  2022-06-27 18:29       ` Abhradeep Chakraborty
  1 sibling, 1 reply; 162+ messages in thread
From: Taylor Blau @ 2022-06-27 16:05 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget
  Cc: git, Kaartic Sivaram, Derrick Stolee, Abhradeep Chakraborty

On Sun, Jun 26, 2022 at 01:10:13PM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
>
> The bitmap lookup table extension was documentated by an earlier
> change, but Git does not yet knowhow to write that extension.
>
> Teach git to write bitmap lookup table extension. The table contains
> the list of `N` <commit pos, offset, xor offset>` triplets. These
> triplets are sorted according to their commit pos (ascending order).
> The meaning of each data in the i'th triplet is given below:
>
>   - Commit pos is the position of the commit in the pack-index
>     (or midx) to which the i'th bitmap belongs. It is a 4 byte
>     network byte order integer.
>
>   - offset is the position of the i'th bitmap.
>
>   - xor offset denotes the position of the triplet with whose
>     bitmap the current triplet's bitmap need to xor with.
>
> Co-authored-by: Taylor Blau <me@ttaylorr.com>
> Mentored-by: Taylor Blau <me@ttaylorr.com>
> Co-mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
> Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> ---
>  pack-bitmap-write.c | 72 +++++++++++++++++++++++++++++++++++++++++++--
>  pack-bitmap.h       |  5 ++--
>  2 files changed, 73 insertions(+), 4 deletions(-)
>
> diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
> index c43375bd344..899a4a941e1 100644
> --- a/pack-bitmap-write.c
> +++ b/pack-bitmap-write.c
> @@ -650,7 +650,9 @@ static const struct object_id *oid_access(size_t pos, const void *table)
>
>  static void write_selected_commits_v1(struct hashfile *f,
>  				      struct pack_idx_entry **index,
> -				      uint32_t index_nr)
> +				      uint32_t index_nr,
> +				      uint64_t *offsets,

We should probably leave this as a pointer to an off_t, since that is a
more appropriate type for keeping track of an offset within a file (and
indeed it is the return type of hashfile_total()).

But since it's platform-dependent, we should make sure to cast it to a
uint64_t before writing it as part of the lookup table.
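
A small sketch of that conversion, assuming a POSIX off_t: the offset is
kept in its native type and only widened to a fixed 64-bit value at the
point where it is serialized. put_be64() and put_offset() are
hypothetical helpers writing to a buffer rather than to a hashfile.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h> /* off_t (POSIX) */

/* Store v at out[0..7] in network (big-endian) byte order. */
static void put_be64(unsigned char *out, uint64_t v)
{
	int i;

	for (i = 7; i >= 0; i--) {
		out[i] = (unsigned char)(v & 0xff);
		v >>= 8;
	}
}

/* Widen a platform-dependent off_t explicitly before writing it. */
static void put_offset(unsigned char *out, off_t offset)
{
	assert(offset >= 0);
	put_be64(out, (uint64_t)offset);
}
```

The on-disk width then stays at 8 bytes regardless of how wide off_t is
on the platform that wrote the file.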

> +				      uint32_t *commit_positions)
>  {
>  	int i;
>
> @@ -663,6 +665,11 @@ static void write_selected_commits_v1(struct hashfile *f,
>  		if (commit_pos < 0)
>  			BUG("trying to write commit not in index");
>
> +		if (offsets)
> +			offsets[i] = hashfile_total(f);

This makes sense to store here, since we can't easily recover this
information later on.

> +		if (commit_positions)
> +			commit_positions[i] = commit_pos;

This one I'm not as sure about. It would be nice to not have
write_selected_commits_v1() be responsible for writing this down, too.
And I think it's easy enough to recover later on, since we're just doing
a search over "index" (see above the "oid_pos" call).

I think that oid_pos() call could be hidden behind a function that takes
an object_id pointer, an index (double pointer) of pack_idx_entry
structs, and a length.

Its implementation would be something like:

    static int commit_bitmap_writer_pos(struct object_id *oid,
                                        struct pack_idx_entry **index,
                                        uint32_t index_nr)
    {
        return oid_pos(oid, index, index_nr, oid_access);
    }

and then we could replace any lookups like commit_positions[i] with a
call that first maps `i` to the appropriate object_id in selected commit
order.

That would be strictly less efficient, but not in a way that I think
matters, and it would definitely be cleaner to not rely on a side-effect
of write_selected_commits_v1().

Something in the middle there would be to have write_lookup_table()
assemble that list of commit_positions itself, something like:

    uint32_t *commit_positions;

    ALLOC_ARRAY(commit_positions, writer.selected_nr);

    for (i = 0; i < writer.selected_nr; i++) {
        int pos = oid_pos(&writer.selected[i].commit->object.oid,
                          index, index_nr, oid_access);
        if (pos < 0)
            BUG("trying to write commit not in index");
        commit_positions[i] = pos;
    }

    ...

    free(commit_positions);

That at least removes a side-effect from the implementation of
write_selected_commits_v1() and brings the creation of the
commit_positions array closer to where it's being used, while still
maintaining the constant-time lookups. So that may be a good
alternative, but I'm curious to hear your thoughts.

> +static int table_cmp(const void *_va, const void *_vb, void *commit_positions)

OK, so this is sorting the table in order of the commit positions. I
would rename the commit_positions parameter to something like "void
*_data", and then have commit_positions be the result of the cast, like
"uint32_t *commit_positions = _data";

> +{
> +	int8_t result = 0;

int8_t isn't an often used type in Git's codebase, but we can get rid of
this variable altogether and just return immediately from each case,
e.g.:

    if (a < b)
        return -1;
    else if (a > b)
        return 1;
    return 0;

or similar.

> +	uint32_t *positions = (uint32_t *) commit_positions;

An explicit cast isn't needed here since you're converting from void *.

> +static void write_lookup_table(struct hashfile *f,
> +			       uint64_t *offsets,
> +			       uint32_t *commit_positions)
> +{
> +	uint32_t i;
> +	uint32_t *table, *table_inv;
> +
> +	ALLOC_ARRAY(table, writer.selected_nr);
> +	ALLOC_ARRAY(table_inv, writer.selected_nr);
> +
> +	for (i = 0; i < writer.selected_nr; i++)
> +		table[i] = i;
> +
> +	QSORT_S(table, writer.selected_nr, table_cmp, commit_positions);

I think the construction of table and table_inv could definitely benefit
from a comment here indicating what they're used for and what they
contain (e.g., "table maps abc to xyz").

> +	for (i = 0; i < writer.selected_nr; i++)
> +		table_inv[table[i]] = i;
> +
> +	for (i = 0; i < writer.selected_nr; i++) {
> +		struct bitmapped_commit *selected = &writer.selected[table[i]];
> +		uint32_t xor_offset = selected->xor_offset;
> +
> +		hashwrite_be32(f, commit_positions[table[i]]);
> +		hashwrite_be64(f, offsets[table[i]]);
> +		hashwrite_be32(f, xor_offset ?
> +				table_inv[table[i] - xor_offset]: 0xffffffff);

Nit: missing space before ':'.

Thanks,
Taylor

* Re: [PATCH v2 2/6] pack-bitmap-write.c: write lookup table extension
  2022-06-27 14:35     ` Derrick Stolee
@ 2022-06-27 16:12       ` Taylor Blau
  2022-06-27 17:10       ` Abhradeep Chakraborty
  1 sibling, 0 replies; 162+ messages in thread
From: Taylor Blau @ 2022-06-27 16:12 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Kaartic Sivaram,
	Abhradeep Chakraborty

On Mon, Jun 27, 2022 at 10:35:25AM -0400, Derrick Stolee wrote:
> On 6/26/2022 9:10 AM, Abhradeep Chakraborty via GitGitGadget wrote:
>
> > +	uint32_t a = positions[*(uint32_t *)_va];
> > +	uint32_t b = positions[*(uint32_t *)_vb];
> > +
> > +	if (a > b)
> > +		result = 1;
> > +	else if (a < b)
> > +		result = -1;
> > +	else
> > +		result = 0;
> > +
> > +	return result;
> > +}
>
> Ok, here you are sorting by commit OID (indirectly by the order in the
> [multi-]pack-index). I suppose that I misunderstood in the previous
> patch, so that could use some more specific language, maybe.

Yeah, I agree that some more specific language could be used, with the
main idea being there that we make it clearer that the list of tuples is
still sorted (and can be binary searched).

> > +	for (i = 0; i < writer.selected_nr; i++)
> > +		table[i] = i;
> > +
> > +	QSORT_S(table, writer.selected_nr, table_cmp, commit_positions);
>
> At the end of this sort, table[j] = i means that the ith bitmap corresponds
> to the jth bitmapped commit in lex order of OIDs.
>
> > +	for (i = 0; i < writer.selected_nr; i++)
> > +		table_inv[table[i]] = i;
>
> And table_inv helps us discover that relationship (ith bitmap to jth commit
> by j = table_inv[i]).

These are both great descriptions and should give an idea of what sort
of information is worth putting into a comment.
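
The relationship between the two arrays can also be shown with a small
standalone sketch. It does not mirror the patch exactly: the patch sorts
indices with QSORT_S() and a context pointer, while this sketch sorts
(commit position, bitmap index) pairs with plain qsort(), since standard
qsort() has no context argument. build_tables() and struct row are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct row {
	uint32_t commit_pos; /* position in the pack index or MIDX */
	uint32_t bitmap_idx; /* index of the bitmap in write order */
};

static int row_cmp(const void *va, const void *vb)
{
	const struct row *a = va;
	const struct row *b = vb;

	if (a->commit_pos < b->commit_pos)
		return -1;
	if (a->commit_pos > b->commit_pos)
		return 1;
	return 0;
}

/*
 * Fill table[] so that table[j] is the bitmap written at row j of the
 * lookup table, and table_inv[] so that table_inv[i] is the row of
 * bitmap i. Returns 0 on success, -1 on allocation failure.
 */
static int build_tables(const uint32_t *commit_positions, uint32_t nr,
			uint32_t *table, uint32_t *table_inv)
{
	struct row *rows;
	uint32_t i;

	if (!nr)
		return 0;
	assert(commit_positions && table && table_inv);
	rows = malloc(nr * sizeof(*rows));
	if (!rows)
		return -1;

	for (i = 0; i < nr; i++) {
		rows[i].commit_pos = commit_positions[i];
		rows[i].bitmap_idx = i;
	}
	qsort(rows, nr, sizeof(*rows), row_cmp);

	for (i = 0; i < nr; i++) {
		table[i] = rows[i].bitmap_idx;
		table_inv[table[i]] = i;
	}
	free(rows);
	return 0;
}
```

For commit positions {30, 10, 20} in bitmap order, table becomes
{1, 2, 0} and table_inv becomes {2, 0, 1}.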
>
> > +	for (i = 0; i < writer.selected_nr; i++) {
> > +		struct bitmapped_commit *selected = &writer.selected[table[i]];
> > +		uint32_t xor_offset = selected->xor_offset;
>
> Here, xor_offset is "number of bitmaps in relationship to the current bitmap"

It's an offset to an earlier commit which must be used to XOR-decompress the
current one (if any).

> > +		hashwrite_be32(f, commit_positions[table[i]]);
> > +		hashwrite_be64(f, offsets[table[i]]);
> > +		hashwrite_be32(f, xor_offset ?
> > +				table_inv[table[i] - xor_offset]: 0xffffffff);
>
> Which means that if "k = table[i] - xor_offset" that the xor base is the kth
> bitmap. table_inv[k] gets us the position in this table of that bitmap's
> commit.

Yes, exactly. Abhradeep: this is also worth commenting ;-).

> (It's also strange to me that the offset is being _subtracted_, but I guess
> the bitmap format requires the xor base to appear first so the offset does
> not need to be a negative number ever.)

You're right, this follows from the fact that the XOR bases must come
before the commits that must use them to decompress themselves. From
Documentation/technical/bitmap-format.txt:

    This number is always positive, and hence entries are always xor'ed
    with **previous** bitmaps, not bitmaps that will come afterwards in
    the index.

> This last line is a bit complex.
>
> 	uint32_t xor_offset = selected->xor_offset;
> 	uint32_t xor_row = 0xffffffff;
>
> 	if (xor_offset) {
> 		uint32_t xor_order = table[i] - xor_offset;
> 		xor_row = table_inv[xor_order];
> 	}
>
> ...then we can "hashwrite_be32(f, xor_row);" when necessary. I'm not sure
> that we need the "uint32_t xor_order" inside the "if (xor_offset)" block,
> but splitting it helps add clarity to the multi-step computation.

I had the same thought, though I would also say that xor_row should be
declared, not initialized, and the "else" block of "if (xor_offset)"
should set it to 0xffffffff to make the relationship between xor_offset
and the value written a little clearer.
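
Put together, the if/else form might look like the following sketch. The
function xor_row_for() is hypothetical and takes the bitmap index
(table[i] in the patch) directly, so that the conversion from a relative
xor_offset to an absolute table row can be read in isolation.

```c
#include <assert.h>
#include <stdint.h>

#define NO_XOR 0xffffffffu

/*
 * bitmap_idx: index of the current bitmap in write order
 * xor_offset: how many bitmaps back the XOR base is (0 means none)
 * table_inv:  maps a bitmap index to its row in the lookup table
 */
static uint32_t xor_row_for(uint32_t bitmap_idx, uint32_t xor_offset,
			    const uint32_t *table_inv)
{
	uint32_t xor_row;

	if (xor_offset) {
		uint32_t xor_order;

		/* XOR bases always precede the bitmaps that use them. */
		assert(table_inv && xor_offset <= bitmap_idx);
		xor_order = bitmap_idx - xor_offset;
		xor_row = table_inv[xor_order];
	} else {
		xor_row = NO_XOR;
	}
	return xor_row;
}
```

With table_inv = {2, 0, 1}, bitmap 2 with xor_offset 1 has its base at
bitmap 1, which sits in row 0 of the lookup table.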

Thanks,
Taylor

* Re: [PATCH v2 1/6] Documentation/technical: describe bitmap lookup table extension
  2022-06-27 14:18     ` Derrick Stolee
  2022-06-27 15:48       ` Taylor Blau
@ 2022-06-27 16:51       ` Abhradeep Chakraborty
  1 sibling, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-27 16:51 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Abhradeep Chakraborty, Git, Kaartic Sivaraam, Taylor Blau

Derrick Stolee <derrickstolee@github.com> wrote:

> I found this difficult to parse. Here is an attempt at a rewording. Please
> let me know if I misunderstood something when reading your version:
>
>   NOTE: The xor_offset stored in the BITMAP_OPT_LOOKUP_TABLE is different
>   from the xor_offset used in the bitmap data table. The xor_offset in this
>   table indicates the row number within this table of the commit whose
>   bitmap is used for the XOR computation with the current commit's stored
>   bitmap to create the proper logical reachability bitmap.
>
> This does make me think that "xor_offset" should really be "xor_row" or
> something like that.

Thanks. `xor_row` seems nice to me.

> > +	* {empty}
> > +	commit pos (4 byte integer, network byte order): ::
> > +	It stores the object position of the commit (in the midx or pack index)
> > +	to which the i'th bitmap in the bitmap entries belongs.
>
> Ok, we are saving some space here, but relying on looking into the pack-index
> or multi-pack-index to get the actual commit OID.

Seems like I didn't update this particular part. At the time of writing this
patch, I was clear that I would store these triplets in the bitmap's order.
But when I started to implement the "read" part, I realised that these triplets
need to be ordered in ascending order. So I did update the "write extension"
patch but somehow missed this particular part.

Just to be clear, bitmaps are sorted by their commit's date (as far as I know).
Bitmaps for recent commits come before bitmaps for older commits. So these
two orders are not the same. Thus a hashmap would not work here.

Will update this portion.

Thanks :)

* Re: [PATCH v2 2/6] pack-bitmap-write.c: write lookup table extension
  2022-06-27 14:35     ` Derrick Stolee
  2022-06-27 16:12       ` Taylor Blau
@ 2022-06-27 17:10       ` Abhradeep Chakraborty
  1 sibling, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-27 17:10 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Abhradeep Chakraborty, Git, Kaartic Sivaraam, Taylor Blau

Derrick Stolee <derrickstolee@github.com> wrote:

> Which means that if "k = table[i] - xor_offset" that the xor base is the kth
> bitmap. table_inv[k] gets us the position in this table of that bitmap's
> commit.
>
> (It's also strange to me that the offset is being _subtracted_, but I guess
> the bitmap format requires the xor base to appear first so the offset does
> not need to be a negative number ever.)
>
> This last line is a bit complex.
>
> 	uint32_t xor_offset = selected->xor_offset;
> 	uint32_t xor_row = 0xffffffff;
>
>	if (xor_offset) {
>		uint32_t xor_order = table[i] - xor_offset;
>		xor_row = table_inv[xor_order];
>	}
>
> ...then we can "hashwrite_be32(f, xor_row);" when necessary. I'm not sure
> that we need the "uint32_t xor_order" inside the "if (xor_offset)" block,
> but splitting it helps add clarity to the multi-step computation.

Got it. Will add comments too.

Thanks :)

* [PATCH v2 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-06-27 14:43     ` Derrick Stolee
@ 2022-06-27 17:42       ` Abhradeep Chakraborty
  2022-06-27 17:49         ` Taylor Blau
  0 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-27 17:42 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Abhradeep Chakraborty, Git, Kaartic Sivaraam, Taylor Blau

Derrick Stolee <derrickstolee@github.com> wrote:

> I wonder if it makes sense to have it default to 'false' for now, but to
> change that default after the feature has been shipped and running in
> production for a while.

I do not have any opinion. If most reviewers agree on it, I will surely
set it to false.

> I think you should either use "Git" when talking about the software
> generally, OR use "`git repack --write-bitmap-index` will include..."

Ohh, yeah! Thanks for pointing that out.

> s/which/that/
>
> (I'm pretty sure that "that" is better. We're trying to restrict the set
> of repositories we are talking about, not implying that all repositories
> have this property.)

Ok.

> These lines seem misplaced. Maybe they were meant for the previous
> patch?

I mainly used it for testing purposes. That's why I included it in
this patch. But I got your point and will move it to the previous
patch.

Thanks :)

* Re: [PATCH v2 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-06-26 13:10   ` [PATCH v2 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests Abhradeep Chakraborty via GitGitGadget
  2022-06-27 14:43     ` Derrick Stolee
@ 2022-06-27 17:47     ` Taylor Blau
  2022-06-27 18:39       ` Abhradeep Chakraborty
  1 sibling, 1 reply; 162+ messages in thread
From: Taylor Blau @ 2022-06-27 17:47 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget
  Cc: git, Kaartic Sivaram, Derrick Stolee, Abhradeep Chakraborty

On Sun, Jun 26, 2022 at 01:10:14PM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
>
> Teach git to provide a way for users to enable/disable bitmap lookup
> table extension by providing a config option named 'writeBitmapLookupTable'.
> Default is true.
>
> Also add test to verify writting of lookup table.
>
> Co-Authored-by: Taylor Blau <me@ttaylorr.com>
> Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> Mentored-by: Taylor Blau <me@ttaylorr.com>
> Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>

I think that this was covered earlier in the review of this round, but
in general your Signed-off-by (often abbreviated as "S-o-b") should come
last. The order should be chronological, so I'd probably suggest
something like:

    Mentored-by: Taylor Blau <me@ttaylorr.com>
    Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
    Co-Authored-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

> diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
> index 5edbb7fe86e..3757616f09c 100644
> --- a/builtin/multi-pack-index.c
> +++ b/builtin/multi-pack-index.c
> @@ -87,6 +87,13 @@ static int git_multi_pack_index_write_config(const char *var, const char *value,
>  			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
>  	}
>
> +	if (!strcmp(var, "pack.writebitmaplookuptable")) {
> +		if (git_config_bool(var, value))
> +			opts.flags |= MIDX_WRITE_BITMAP_LOOKUP_TABLE;
> +		else
> +			opts.flags &= ~MIDX_WRITE_BITMAP_LOOKUP_TABLE;
> +	}
> +
>  	/*
>  	 * We should never make a fall-back call to 'git_default_config', since
>  	 * this was already called in 'cmd_multi_pack_index()'.
> @@ -123,6 +130,7 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
>  	};
>
>  	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
> +	opts.flags |= MIDX_WRITE_BITMAP_LOOKUP_TABLE;

I wonder if this should respect pack.writeBitmapLookupTable, too.
Probably both of them should take into account their separate
configuration values, but cleaning up the hashcache one can be done
separately outside of this series.

Everything else looks good.

> diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
> index 899a4a941e1..79be0cf80e6 100644
> --- a/pack-bitmap-write.c
> +++ b/pack-bitmap-write.c
> @@ -713,6 +713,7 @@ static void write_lookup_table(struct hashfile *f,
>  	for (i = 0; i < writer.selected_nr; i++)
>  		table_inv[table[i]] = i;
>
> +	trace2_region_enter("pack-bitmap-write", "writing_lookup_table", the_repository);
>  	for (i = 0; i < writer.selected_nr; i++) {
>  		struct bitmapped_commit *selected = &writer.selected[table[i]];
>  		uint32_t xor_offset = selected->xor_offset;
> @@ -725,6 +726,7 @@ static void write_lookup_table(struct hashfile *f,
>
>  	free(table);
>  	free(table_inv);
> +	trace2_region_leave("pack-bitmap-write", "writing_lookup_table", the_repository);

This region may make more sense to include in the previous commit,
though I don't have a strong feeling about it.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v2 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-06-27 17:42       ` Abhradeep Chakraborty
@ 2022-06-27 17:49         ` Taylor Blau
  0 siblings, 0 replies; 162+ messages in thread
From: Taylor Blau @ 2022-06-27 17:49 UTC (permalink / raw)
  To: Abhradeep Chakraborty; +Cc: Derrick Stolee, Git, Kaartic Sivaraam

On Mon, Jun 27, 2022 at 11:12:30PM +0530, Abhradeep Chakraborty wrote:
> Derrick Stolee <derrickstolee@github.com> wrote:
>
> > I wonder if it makes sense to have it default to 'false' for now, but to
> > change that default after the feature has been shipped and running in
> > production for a while.
>
> I do not have any opinion. If most reviewers agree on it, I will surely
> set it to false.

I think it's definitely a safe approach. I don't have a huge concern
about enabling it earlier, but I don't think we're in a huge rush to add
a new feature here, either.

So I'd be fine to ship this with the default being disabled (IOW, *not*
writing the lookup table). That should give us a window where we can
shake out whatever bugs there are, as is often the case when working
with the bitmap code ;).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v2 4/6] pack-bitmap: prepare to read lookup table
  2022-06-27 15:12     ` Derrick Stolee
@ 2022-06-27 18:06       ` Abhradeep Chakraborty
  2022-06-27 18:32         ` Derrick Stolee
  2022-06-27 21:49       ` [PATCH v2 4/6] pack-bitmap: prepare to read lookup table extension Taylor Blau
  1 sibling, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-27 18:06 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Abhradeep Chakraborty, Git, Kaartic Sivaraam, Taylor Blau

Derrick Stolee <derrickstolee@github.com> wrote:

> I didn't check the previous patches, but your sign-off should be the
> last line of the message. (You are signing off on all previous content,
> and any later content is not covered by your sign-off.)

Ohhh, got it. I didn't know about it before.

> nit: This alignment should use four spaces at the end so the second phrase
> matches the start of the previous phrase. Like this:
>
>		if (flags & BITMAP_OPT_LOOKUP_TABLE &&
>		    git_env_bool("GIT_TEST_READ_COMMIT_TABLE", 1)) {
>
> Perhaps it looked right in your editor because it renders tabs as 4 spaces
> instead of 8 spaces.

I don't know why, but my editor sometimes does weird things with alignment.
I generally use VS Code. But for alignment-related problems, I sometimes have
to use the vi editor.

> Here, we _do_ want to keep the st_mult(). Is the st_add() still necessary? It
> seems this is a leftover from the previous version that had the 4-byte flag
> data.
>
> We set table_size to zero above. We could drop that initialization and instead
> have this after the "size_t triplet_sz" definition:
>
>			size_t table_size = st_mult(ntohl(header->entry_count),
>						    triplet_sz);

Yes, you're right. Will update.

> I expected something different: binary search on the triplets where the comparison is
> made by looking up the OID from the [multi-]pack-index and comparing that OID to the
> commit OID we are looking for.
>
> I'm not convinced that the binary search I had in mind is meaningfully faster than
> what you've implemented here, so I'm happy to leave it as you have it. We can investigate
> if that full search on the pack-index matters at all (it probably doesn't).

Good idea! Thanks!

> While there is potential that this is wasteful, it's probably not that huge,
> so we can start with the "maximum XOR depth" and then reconsider a smaller
> allocation in the future.

Ok.

> We should consider ensuring that also "size < bitmap_git->entry_count".
> Better yet, create an xor_positions_alloc variable that is initialized
> to the entry_count value.
>
> "size" should probably be xor_positions_nr.
> 
> > +			xor_positions[size++] = xor_pos;
> > +			triplet = bitmap_get_triplet(bitmap_git, xor_pos);
> > +			xor_pos = triplet_get_xor_pos(triplet);
> > +		}
> 
> (at this point, "if (xor_positions_nr >= xor_positions_alloc)", then error
> out since the file must be malformed with an XOR loop.)

Got it.

> Since we are storing the bitmap here as we "pop" the stack, should we be
> looking for a stored bitmap while pushing to the stack in the previous loop?
> That would save time when using multiple bitmaps with common XOR bases.

Yeah, I am also thinking about it. Will give it a try.

Thanks :)

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v2 2/6] pack-bitmap-write.c: write lookup table extension
  2022-06-27 16:05     ` Taylor Blau
@ 2022-06-27 18:29       ` Abhradeep Chakraborty
  0 siblings, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-27 18:29 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Abhradeep Chakraborty, Git, Kaartic Sivaraam, Derrick Stolee

Taylor Blau <me@ttaylorr.com> wrote:

> We should probably leave this as a pointer to an off_t, since that is a
> more appropriate type for keeping track of an offset within a file (and
> indeed it is the return type of hashfile_total()).
> 
> But since it's platform-dependent, we should make sure to cast it to a
> uint64_t before writing it as part of the lookup table.

Hmm, will make the necessary changes.
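
A rough sketch of the cast-then-serialize step described above. The helper names `put_be64_sketch()` and `write_offset_sketch()` are hypothetical stand-ins (not Git's own helpers) and only illustrate the byte order:

```c
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical helper: store a 64-bit value in network (big-endian) order. */
static void put_be64_sketch(unsigned char *out, uint64_t value)
{
	int i;

	for (i = 7; i >= 0; i--) {
		out[i] = (unsigned char)(value & 0xff);
		value >>= 8;
	}
}

/*
 * Track the offset as an off_t, and widen it to a fixed-size uint64_t
 * only at the point where it is serialized.
 */
static void write_offset_sketch(unsigned char *out, off_t offset)
{
	put_be64_sketch(out, (uint64_t)offset);
}
```

Widening at the write site keeps the on-disk field at 8 bytes regardless of how wide off_t is on the platform.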

> That at least removes a side-effect from the implementation of
> write_selected_commits_v1() and brings the creation of the
> commit_positions array closer to where it's being used, while still
> maintaining the constant-time lookups. So that may be a good
> alternative, but I'm curious of your thoughts.

Sounds good to me :)

> I think the construction of table and table_inv could definitely benefit
> from a comment here indicating what they're used for and what they
> contain (e.g., "table maps abc to xyz").

Yeah, true. Will add comments.

Thanks for the other suggestions also :)

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v2 4/6] pack-bitmap: prepare to read lookup table
  2022-06-27 18:06       ` [PATCH v2 4/6] pack-bitmap: prepare to read lookup table Abhradeep Chakraborty
@ 2022-06-27 18:32         ` Derrick Stolee
  0 siblings, 0 replies; 162+ messages in thread
From: Derrick Stolee @ 2022-06-27 18:32 UTC (permalink / raw)
  To: Abhradeep Chakraborty; +Cc: Git, Kaartic Sivaraam, Taylor Blau

On 6/27/2022 2:06 PM, Abhradeep Chakraborty wrote:
> Derrick Stolee <derrickstolee@github.com> wrote:
>> nit: This alignment should use four spaces at the end so the second phrase
>> matches the start of the previous phrase. Like this:
>>
>> 		if (flags & BITMAP_OPT_LOOKUP_TABLE &&
>> 		    git_env_bool("GIT_TEST_READ_COMMIT_TABLE", 1)) {
>>
>> Perhaps it looked right in your editor because it renders tabs as 4 spaces
>> instead of 8 spaces.
> 
> I don't know why but my editor sometimes do some weird things for alignments.
> I generally use VS Code. But for alignment related problems, sometimes I have
> to use vi editor.

I also use VS Code, and I noticed a few spacing issues recently, especially
in .txt files.

I submitted a patch [1] to improve the contrib/vscode/init.sh script, which
adds some helpful config settings to your Git workspace. Please take a look
and see how it works for you.

Thanks,
-Stolee

[1] https://lore.kernel.org/git/pull.1271.git.1656354587496.gitgitgadget@gmail.com

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v2 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-06-27 17:47     ` Taylor Blau
@ 2022-06-27 18:39       ` Abhradeep Chakraborty
  2022-06-29 20:11         ` Taylor Blau
  0 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-27 18:39 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Abhradeep Chakraborty, Git, Kaartic Sivaraam, Derrick Stolee

Taylor Blau <me@ttaylorr.com> wrote:

> Probably both of them should take into account their separate
> configuration values, but cleaning up the hashcache one can be done
> separately outside of this series.

Actually, it does respect the `pack.writebitmaplookuptable` config.
As pack.writebitmaplookuptable is true by default (for this patch
series), this line enables it by default. If `pack.writebitmaplookuptable`
is set to false, the proposed change in the `git_multi_pack_index_write_config`
function disables this flag.

> This region may make more sense to include in the previous commit,
> though I don't have a strong feeling about it.

Ok. Will move it to the previous patch.

Thanks :)

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v2 4/6] pack-bitmap: prepare to read lookup table extension
  2022-06-26 13:10   ` [PATCH v2 4/6] pack-bitmap: prepare to read lookup table extension Abhradeep Chakraborty via GitGitGadget
  2022-06-27 15:12     ` Derrick Stolee
@ 2022-06-27 21:38     ` Taylor Blau
  2022-06-28 19:25       ` Abhradeep Chakraborty
  1 sibling, 1 reply; 162+ messages in thread
From: Taylor Blau @ 2022-06-27 21:38 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget
  Cc: git, Kaartic Sivaram, Derrick Stolee, Abhradeep Chakraborty

On Sun, Jun 26, 2022 at 01:10:15PM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
>
> Earlier change teaches Git to write bitmap lookup table. But Git
> does not know how to parse them.
>
> Teach Git to parse the existing bitmap lookup table. The older
> versions of git are not affected by it. Those versions ignore the

s/git/Git

> lookup table.
>
> Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> Mentored-by: Taylor Blau <me@ttaylorr.com>
> Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
> ---
>  pack-bitmap.c                 | 193 ++++++++++++++++++++++++++++++++--
>  t/t5310-pack-bitmaps.sh       |   7 ++
>  t/t5326-multi-pack-bitmaps.sh |   1 +
>  3 files changed, 191 insertions(+), 10 deletions(-)
>
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 36134222d7a..9e09c5824fc 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -82,6 +82,12 @@ struct bitmap_index {
>  	/* The checksum of the packfile or MIDX; points into map. */
>  	const unsigned char *checksum;
>
> +	/*
> +	 * If not NULL, this point into the commit table extension
> +	 * (within map).

It may be worth replacing "within map" with "within the memory mapped
region `map`" to make clear that this points somewhere within the mmap.

> +	 */
> +	unsigned char *table_lookup;
> +


> @@ -185,6 +191,22 @@ static int load_bitmap_header(struct bitmap_index *index)
>  			index->hashes = (void *)(index_end - cache_size);
>  			index_end -= cache_size;
>  		}
> +
> +		if (flags & BITMAP_OPT_LOOKUP_TABLE &&
> +			git_env_bool("GIT_TEST_READ_COMMIT_TABLE", 1)) {

I should have commented on this in an earlier round, but I wonder what
the behavior should be when we have BITMAP_OPT_LOOKUP_TABLE in our
flags, but GIT_TEST_READ_COMMIT_TABLE is disabled.

Right now, it doesn't matter, since there aren't any flags in bits above
BITMAP_OPT_LOOKUP_TABLE. But in the future, if there was some
BITMAP_OPT_FOO that was newer than BITMAP_OPT_LOOKUP_TABLE, we would
want to be able to read it without needing to read the lookup table.

At least, I think that should be true, though I would be interested to
hear if anybody has a differing opinion there.

> +			size_t table_size = 0;
> +			size_t triplet_sz = st_add3(sizeof(uint32_t),    /* commit position */
> +							sizeof(uint64_t),    /* offset */
> +							sizeof(uint32_t));    /* xor offset */

I don't think we need a st_add3() call here, since the size of these
three types is known to be small and thus won't overflow the available
range of size_t.

> +			table_size = st_add(table_size,
> +					st_mult(ntohl(header->entry_count),
> +						triplet_sz));

And table_size here is going to start off at zero, so the outer st_add()
call isn't necessary, either. This should instead be:

    size_t table_size = st_mult(ntohl(header->entry_count),
                                sizeof(uint32_t) + sizeof(uint64_t) + sizeof(uint32_t));

It might be nice to have triplet_sz #define'd somewhere else, since
there are a handful of declarations in this patch that are all
identical. Probably something like:

    #define BITMAP_LOOKUP_TABLE_RECORD_WIDTH (sizeof(uint32_t) + sizeof(uint64_t) + sizeof(uint32_t))

or even:

    /*
     * The width in bytes of a single record in the lookup table
     * extension:
     *
     *   (commit_pos, offset, xor_pos)
     *
     * whose fields are 32-, 64-, and 32-bits wide, respectively.
     */
    #define BITMAP_LOOKUP_TABLE_RECORD_WIDTH (16)
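
To make the size computation concrete, here is a small standalone sketch. `RECORD_WIDTH_SKETCH`, `checked_mult_sketch()` and `lookup_table_size_sketch()` are hypothetical names; the checked multiply returns an error code purely for illustration, whereas st_mult() in Git dies on overflow:

```c
#include <stddef.h>
#include <stdint.h>

/* commit_pos (4 bytes) + offset (8 bytes) + xor_pos (4 bytes) */
#define RECORD_WIDTH_SKETCH 16

/*
 * Hypothetical overflow-checked multiply: returns 0 and stores the
 * product on success, or -1 if a * b would overflow size_t.
 */
static int checked_mult_sketch(size_t a, size_t b, size_t *out)
{
	if (b && a > SIZE_MAX / b)
		return -1;
	*out = a * b;
	return 0;
}

/* Size in bytes of a lookup table holding entry_count records. */
static int lookup_table_size_sketch(uint32_t entry_count, size_t *out)
{
	return checked_mult_sketch(entry_count, RECORD_WIDTH_SKETCH, out);
}
```

Only the multiplication needs a check: the record width is a small constant, while entry_count comes from the file.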

> +			if (table_size > index_end - index->map - header_size)
> +				return error("corrupted bitmap index file (too short to fit lookup table)");

If we decide to still recognize the lookup table extension without
*reading* from it when GIT_TEST_READ_COMMIT_TABLE is unset, I think we
should do something like:

    if (git_env_bool("GIT_TEST_READ_COMMIT_TABLE", 1))
        index->table_lookup = (void *)(index_end - table_size);
    index_end -= table_size;

...where the subtraction on index_end happens unconditionally.
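
Spelled out as a standalone sketch, with a hypothetical `struct index_sketch` standing in for the real bitmap_index and assuming the header size has already been validated against the mapped region:

```c
#include <stddef.h>

struct index_sketch {
	const unsigned char *map;          /* start of the mapped file */
	const unsigned char *table_lookup; /* NULL when the table is not read */
};

/*
 * Carve a trailing extension of table_size bytes off the region that
 * ends at *index_end. The end pointer always moves, so anything located
 * relative to it is still found; the table pointer is set only when
 * read_table is nonzero. Returns -1 if the extension does not fit.
 */
static int carve_lookup_table_sketch(struct index_sketch *index,
				     const unsigned char **index_end,
				     size_t header_size, size_t table_size,
				     int read_table)
{
	/* Assumes header_size <= *index_end - index->map. */
	size_t available = (size_t)(*index_end - index->map) - header_size;

	if (table_size > available)
		return -1;
	if (read_table)
		index->table_lookup = *index_end - table_size;
	*index_end -= table_size;
	return 0;
}
```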

> +static inline const void *bitmap_get_triplet(struct bitmap_index *bitmap_git, uint32_t xor_pos)
> +{
> +	size_t triplet_sz = st_add3(sizeof(uint32_t), sizeof(uint64_t), sizeof(uint32_t));

Same note about the #define constant here.

> +	const void *p = bitmap_git->table_lookup + st_mult(xor_pos, triplet_sz);

And this can be returned directly. Just:

    return bitmap_git->table_lookup + st_mult(xor_pos, BITMAP_LOOKUP_TABLE_RECORD_WIDTH);

although I wonder: why "xor_pos" and not just "pos" here?

> +static uint64_t triplet_get_offset(const void *triplet)
> +{
> +	const void *p = (unsigned char*) triplet + sizeof(uint32_t);
> +	return get_be64(p);
> +}
> +
> +static uint32_t triplet_get_xor_pos(const void *triplet)
> +{
> +	const void *p = (unsigned char*) triplet + st_add(sizeof(uint32_t), sizeof(uint64_t));
> +	return get_be32(p);
> +}

I wonder if we could get rid of these functions altogether and return a
small structure like:

    struct bitmap_lookup_table_record {
        uint32_t commit_pos;
        uint64_t offset;
        uint32_t xor_pos;
    };

or similar.
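
One possible shape for that, as a sketch only: `parse_record_sketch()` and the `read_be*_sketch()` helpers are hypothetical and stand in for whatever big-endian readers the real code would use.

```c
#include <stdint.h>

struct bitmap_lookup_table_record {
	uint32_t commit_pos;
	uint64_t offset;
	uint32_t xor_pos;
};

static uint32_t read_be32_sketch(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t read_be64_sketch(const unsigned char *p)
{
	return ((uint64_t)read_be32_sketch(p) << 32) | read_be32_sketch(p + 4);
}

/* Decode one 16-byte on-disk record into host byte order. */
static struct bitmap_lookup_table_record
parse_record_sketch(const unsigned char *raw)
{
	struct bitmap_lookup_table_record rec;

	rec.commit_pos = read_be32_sketch(raw);
	rec.offset = read_be64_sketch(raw + 4);
	rec.xor_pos = read_be32_sketch(raw + 12);
	return rec;
}
```

The record is decoded field by field rather than overlaid on the mapped bytes, since the in-memory struct may contain padding and the on-disk fields are big-endian.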

> +static int triplet_cmp(const void *va, const void *vb)
> +{
> +	int result = 0;
> +	uint32_t *a = (uint32_t *) va;
> +	uint32_t b = get_be32(vb);

Hmm. This is a little tricky to read. Here we're expecting "va" to hold
commit_pos from below, and "vb" to be a pointer at a lookup record.
Everything here is right, though I wonder if a comment or two might
clarify why one is "*(uint32_t *)va" and the other is "get_be32(vb)".

> +	if (*a > b)
> +		result = 1;
> +	else if (*a < b)
> +		result = -1;
> +	else
> +		result = 0;

Let's just return the result of the comparison directly here. And while
I'm looking at it, I think we can avoid dereferencing "a" on each use,
and instead just dereference va on assignment after casting, e.g.:

    uint32_t a = *(uint32_t*)va;
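
Putting both suggestions together, a standalone sketch might look like the following. The `_sketch` names and the local big-endian reader are hypothetical; the record layout (a 4-byte big-endian commit position at the start of each 16-byte record) follows the patch.

```c
#include <stdint.h>
#include <stdlib.h>

#define RECORD_WIDTH_SKETCH 16

static uint32_t read_be32_sketch(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * bsearch() passes the key first and a table element second. The key is
 * a host-order uint32_t; the element is a raw record whose first four
 * bytes hold the commit position in network byte order.
 */
static int triplet_cmp_sketch(const void *va, const void *vb)
{
	uint32_t a = *(const uint32_t *)va;
	uint32_t b = read_be32_sketch(vb);

	return (a > b) - (a < b);
}

/* Returns a pointer to the matching record, or NULL if there is none. */
static const unsigned char *find_record_sketch(const unsigned char *table,
					       size_t nr, uint32_t commit_pos)
{
	return bsearch(&commit_pos, table, nr, RECORD_WIDTH_SKETCH,
		       triplet_cmp_sketch);
}
```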

> +static uint32_t bsearch_pos(struct bitmap_index *bitmap_git, struct object_id *oid,
> +						uint32_t *result)
> +{
> +	int found;
> +
> +	if (bitmap_git->midx)

Nit: let's use the bitmap_is_midx() helper here instead of looking at
bitmap_git->midx directly.

> +		found = bsearch_midx(oid, bitmap_git->midx, result);
> +	else
> +		found = bsearch_pack(oid, bitmap_git->pack, result);
> +
> +	return found;
> +}

Makes sense.

> +static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_git,
> +					  struct commit *commit)
> +{
> +	uint32_t commit_pos, xor_pos;
> +	uint64_t offset;
> +	int flags;
> +	const void *triplet = NULL;
> +	struct object_id *oid = &commit->object.oid;
> +	struct ewah_bitmap *bitmap;
> +	struct stored_bitmap *xor_bitmap = NULL;
> +	size_t triplet_sz = st_add3(sizeof(uint32_t), sizeof(uint64_t), sizeof(uint32_t));
> +
> +	int found = bsearch_pos(bitmap_git, oid, &commit_pos);
> +
> +	if (!found)
> +		return NULL;
> +
> +	triplet = bsearch(&commit_pos, bitmap_git->table_lookup, bitmap_git->entry_count,
> +						triplet_sz, triplet_cmp);
> +	if (!triplet)
> +		return NULL;

OK. If you don't mind, I'm going to "think aloud" while I read through
this function to make sure that we're on the same page.

First thing is to convert the commit OID we're looking for into its
position within the corresponding pack index or MIDX file so that we can
use it as a search key to locate in the lookup table. If we didn't find
anything, or the commit doesn't exist in our pack / MIDX, nothing to do.

> +
> +	offset = triplet_get_offset(triplet);
> +	xor_pos = triplet_get_xor_pos(triplet);

Otherwise, record its offset and XOR "offset".

> +
> +	if (xor_pos != 0xffffffff) {
> +		int xor_flags;
> +		uint64_t offset_xor;
> +		uint32_t *xor_positions;
> +		struct object_id xor_oid;
> +		size_t size = 0;
> +
> +		ALLOC_ARRAY(xor_positions, bitmap_git->entry_count);

If we are XOR'd with another bitmap, make a stack of those bitmaps so
that we can decompress ourself.

I'm a little surprised that we're allocating an array as large as
bitmap_git->entry_count. It's not wrong, but it does waste some bytes
since we likely don't often have these long chains of XOR'd bitmaps.

We should instead allocate a smaller array and grow it over time (search
for examples of ALLOC_GROW() to see the canonical way to do this in
Git's codebase).
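
As a standalone illustration of growing the stack on demand, here is a sketch that uses plain realloc() rather than ALLOC_GROW(). All names are hypothetical, the growth formula is what I understand alloc_nr() to compute, and the chain-length guard is an assumption about how a malformed file with an XOR loop could be rejected:

```c
#include <stdint.h>
#include <stdlib.h>

#define NO_XOR_SKETCH 0xffffffff

/*
 * Collect the chain of XOR positions starting at xor_pos into a heap
 * array that grows on demand. xor_of[i] is the XOR position of entry i,
 * or NO_XOR_SKETCH. Returns the chain length, or -1 on allocation
 * failure, on an out-of-range position, or when the chain would exceed
 * nr_entries (which can only happen if the chain loops).
 */
static long collect_xor_chain_sketch(const uint32_t *xor_of, size_t nr_entries,
				     uint32_t xor_pos, uint32_t **chain)
{
	size_t nr = 0, alloc = 0;

	*chain = NULL;
	while (xor_pos != NO_XOR_SKETCH) {
		if (xor_pos >= nr_entries || nr >= nr_entries)
			goto fail;
		if (nr == alloc) {
			size_t new_alloc = (alloc + 16) * 3 / 2;
			uint32_t *tmp = realloc(*chain,
						new_alloc * sizeof(**chain));
			if (!tmp)
				goto fail;
			*chain = tmp;
			alloc = new_alloc;
		}
		(*chain)[nr++] = xor_pos;
		xor_pos = xor_of[xor_pos];
	}
	return (long)nr;
fail:
	free(*chain);
	*chain = NULL;
	return -1;
}
```

The caller owns the returned array and frees it after popping the entries in reverse order.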

> +		while (xor_pos != 0xffffffff) {
> +			xor_positions[size++] = xor_pos;
> +			triplet = bitmap_get_triplet(bitmap_git, xor_pos);
> +			xor_pos = triplet_get_xor_pos(triplet);
> +		}
> +
> +		while (size){

Nit: missing space after ")" and before "{".

> +			xor_pos = xor_positions[size - 1];
> +			triplet = bitmap_get_triplet(bitmap_git, xor_pos);

We already have to get the triplets in the loop above, and then we dig
them back out here. Would it be easier to keep track of a list of
pointers into the mmaped region instead of looking up these triplets
each time?

> +			commit_pos = get_be32(triplet);
> +			offset_xor = triplet_get_offset(triplet);
> +
> +			if (nth_bitmap_object_oid(bitmap_git, &xor_oid, commit_pos) < 0) {

Should it be an error if we can't look up the object's ID here? I'd
think so.

> +				free(xor_positions);
> +				return NULL;
> +			}
> +
> +			bitmap_git->map_pos = offset_xor + sizeof(uint32_t) + sizeof(uint8_t);
> +			xor_flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
> +			bitmap = read_bitmap_1(bitmap_git);
> +
> +			if (!bitmap){

Nit: missing space between ")" and "{".

> +				free(xor_positions);
> +				return NULL;
> +			}
> +
> +			xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_oid, xor_bitmap, xor_flags);
> +			size--;

Makes sense. Nicely done!

> +		}
> +
> +		free(xor_positions);
> +	}
> +
> +	bitmap_git->map_pos = offset + sizeof(uint32_t) + sizeof(uint8_t);
> +	flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
> +	bitmap = read_bitmap_1(bitmap_git);

Great, and now we can finally read the original bitmap that we wanted
to...

> +	if (!bitmap)
> +		return NULL;
> +
> +	return store_bitmap(bitmap_git, bitmap, oid, xor_bitmap, flags);

...and XOR it with the thing we built up in the loop. Very nicely done.
Do we have a good way to make sure that we're testing this code in CI?
It *seems* correct to me, but of course, we should have a computer check
that this produces OK results, not a human ;).

> +}
> +
>  struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
>  				      struct commit *commit)
>  {
>  	khiter_t hash_pos = kh_get_oid_map(bitmap_git->bitmaps,
>  					   commit->object.oid);
> -	if (hash_pos >= kh_end(bitmap_git->bitmaps))
> -		return NULL;
> +	if (hash_pos >= kh_end(bitmap_git->bitmaps)) {
> +		struct stored_bitmap *bitmap = NULL;
> +		if (!bitmap_git->table_lookup)
> +			return NULL;
> +
> +		/* NEEDSWORK: cache misses aren't recorded */

For what it's worth, I think that it's completely fine to leave this as
a NEEDSWORK for the purposes of this series. I think we plausibly could
improve this in certain scenarios by finding some threshold on cache
misses when we should just fault in all bitmaps, but that can easily be
done on top.

> +		bitmap = lazy_bitmap_for_commit(bitmap_git, commit);
> +		if(!bitmap)

Nit: missing space between "if" and "(".

> +			return NULL;
> +		return lookup_stored_bitmap(bitmap);
> +	}
>  	return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos));
>  }
>
> @@ -1699,9 +1861,13 @@ void test_bitmap_walk(struct rev_info *revs)
>  	if (revs->pending.nr != 1)
>  		die("you must specify exactly one commit to test");
>
> -	fprintf(stderr, "Bitmap v%d test (%d entries loaded)\n",
> +	fprintf(stderr, "Bitmap v%d test (%d entries)\n",
>  		bitmap_git->version, bitmap_git->entry_count);
>
> +	if (!bitmap_git->table_lookup)
> +		fprintf(stderr, "Bitmap v%d test (%d entries loaded)\n",
> +			bitmap_git->version, bitmap_git->entry_count);
> +

I think we should probably print just one or the other here, perhaps
like:

    fprintf(stderr, "Bitmap v%d test (%d entries%s)\n",
            bitmap_git->version,
            bitmap_git->entry_count,
            bitmap_git->table_lookup ? "" : " loaded");

>  	root = revs->pending.objects[0].item;
>  	bm = bitmap_for_commit(bitmap_git, (struct commit *)root);
>
> @@ -1753,10 +1919,16 @@ void test_bitmap_walk(struct rev_info *revs)
>
>  int test_bitmap_commits(struct repository *r)
>  {
> -	struct bitmap_index *bitmap_git = prepare_bitmap_git(r);
> +	struct bitmap_index *bitmap_git = NULL;
>  	struct object_id oid;
>  	MAYBE_UNUSED void *value;
>
> +	/* As this function is only used to print bitmap selected
> +	 * commits, we don't have to read the commit table.
> +	 */
> +	setenv("GIT_TEST_READ_COMMIT_TABLE", "0", 1);
> +
> +	bitmap_git = prepare_bitmap_git(r);
>  	if (!bitmap_git)
>  		die("failed to load bitmap indexes");
>
> @@ -1764,6 +1936,7 @@ int test_bitmap_commits(struct repository *r)
>  		printf("%s\n", oid_to_hex(&oid));
>  	});
>
> +	setenv("GIT_TEST_READ_COMMIT_TABLE", "1", 1);
>  	free_bitmap_index(bitmap_git);

Hmm. I'm not sure I follow the purpose of tweaking
GIT_TEST_READ_COMMIT_TABLE like this with setenv(). Are we trying to
avoid reading the lookup table? If so, why? I'd rather avoid
manipulating the environment directly like this, and instead have a
function we could call to fault in all of the bitmaps (when a lookup
table exists, otherwise do nothing).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v2 4/6] pack-bitmap: prepare to read lookup table extension
  2022-06-27 15:12     ` Derrick Stolee
  2022-06-27 18:06       ` [PATCH v2 4/6] pack-bitmap: prepare to read lookup table Abhradeep Chakraborty
@ 2022-06-27 21:49       ` Taylor Blau
  2022-06-28  8:59         ` [PATCH v2 4/6] pack-bitmap: prepare to read lookup table Abhradeep Chakraborty
  1 sibling, 1 reply; 162+ messages in thread
From: Taylor Blau @ 2022-06-27 21:49 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Kaartic Sivaram,
	Abhradeep Chakraborty

On Mon, Jun 27, 2022 at 11:12:09AM -0400, Derrick Stolee wrote:
> On 6/26/2022 9:10 AM, Abhradeep Chakraborty via GitGitGadget wrote:
> > +			table_size = st_add(table_size,
> > +					st_mult(ntohl(header->entry_count),
> > +						triplet_sz));
>
> Here, we _do_ want to keep the st_mult(). Is the st_add() still necessary? It
> seems this is a leftover from the previous version that had the 4-byte flag
> data.
>
> We set table_size to zero above. We could drop that initialization and instead
> have this after the "size_t triplet_sz" definition:
>
> 			size_t table_size = st_mult(ntohl(header->entry_count),
> 						    triplet_sz);

Well put, thank you.

> > +			if (table_size > index_end - index->map - header_size)
> > +				return error("corrupted bitmap index file (too short to fit lookup table)");
>
> Please add "_(...)" around the error message so it can be translated.

I missed this in my own review, but yes: this is a good practice.

> > +	if (bitmap_git->midx)
> > +		found = bsearch_midx(oid, bitmap_git->midx, result);
> > +	else
> > +		found = bsearch_pack(oid, bitmap_git->pack, result);
> > +
> > +	return found;
>
> Here, we are doing a binary search on the entire list of packed objects, which could
> use quite a few more hops than a binary search on the bitmapped commits.

I think this is the best we can do if we make the key to our bsearch
through the lookup table be an index into the pack index / MIDX. But...

> > +static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_git,
> > +					  struct commit *commit)
> ...
> > +	int found = bsearch_pos(bitmap_git, oid, &commit_pos);
> > +
> > +	if (!found)
> > +		return NULL;
> > +
> > +	triplet = bsearch(&commit_pos, bitmap_git->table_lookup, bitmap_git->entry_count,
> > +						triplet_sz, triplet_cmp);
>
> But I see, you are searching the pack-index for the position in the index, and _then_
> searching the bitmap lookup table based on that position value.
>
> I expected something different: binary search on the triplets where the comparison is
> made by looking up the OID from the [multi-]pack-index and comparing that OID to the
> commit OID we are looking for.
>
> I'm not convinced that the binary search I had in mind is meaningfully faster than
> what you've implemented here, so I'm happy to leave it as you have it. We can investigate
> if that full search on the pack-index matters at all (it probably doesn't).

...exactly my thoughts, too. It's possible that it would be faster to
key this search on the object_id "oid" above, and then convert each of
the entries in the lookup table from a uint32_t into an object_id by
calling nth_bitmap_object_oid() repeatedly.

I *think* that what Abhradeep wrote here is going to be faster more
often than not since it makes more efficient use of the page cache
rather than switching between reads across different memory mapped
regions at each point in the binary search.

But of course that depends on a number of factors. Abhradeep: if you're
up for it, I think it would be worth trying it both ways and seeing if
one produces a meaningful speed-up or slow-down over the other. Like I
said: my guess is that what you have now will be faster, but I don't
have a clear sense that that is true without trying it both ways ;-).
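
For reference, the OID-keyed alternative could look roughly like this toy model. Everything here is hypothetical: short strings stand in for object IDs, `index_oids[]` for the pack index or MIDX, and `table_pos[]` for the commit positions stored in the lookup table.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Toy model: index_oids[] is sorted by OID and table_pos[] is ascending.
 * Because the index is sorted by OID, ascending positions imply
 * ascending OIDs, so the table can be binary-searched by OID directly.
 * Returns the table slot for oid, or -1 if the commit has no bitmap.
 */
static long search_table_by_oid_sketch(const char *const *index_oids,
				       const uint32_t *table_pos,
				       size_t nr, const char *oid)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		/* One extra hop per step: table entry -> OID in the index. */
		int cmp = strcmp(oid, index_oids[table_pos[mid]]);

		if (!cmp)
			return (long)mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -1;
}
```

Each step touches both the table and the index, which is the cross-region access pattern mentioned above; the position-keyed search touches only the table once the position is known.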

> > +	if (!triplet)
> > +		return NULL;
> > +
> > +	offset = triplet_get_offset(triplet);
> > +	xor_pos = triplet_get_xor_pos(triplet);
> > +
> > +	if (xor_pos != 0xffffffff) {
> > +		int xor_flags;
> > +		uint64_t offset_xor;
> > +		uint32_t *xor_positions;
> > +		struct object_id xor_oid;
> > +		size_t size = 0;
> > +
> > +		ALLOC_ARRAY(xor_positions, bitmap_git->entry_count);
>
> While there is potential that this is wasteful, it's probably not that huge,
> so we can start with the "maximum XOR depth" and then reconsider a smaller
> allocation in the future.

There is no maximum XOR depth, to my knowledge. We do have a maximum XOR
*offset*, which says we cannot XOR-compress a bitmap with an entry more
than 160 entries away from the current one. But in theory every commit
could be XOR compressed with the one immediately preceding it, so the
maximum depth could be as long as the entry_count itself.

I think starting off with a small array and then letting it grow
according to alloc_nr() would be fine here, since it will grow more and
more each time, so the amount of times we have to reallocate the buffer
will tail off over time.

If we were really concerned about it, we could treat the buffer as a
static pointer and reuse it over time (making sure to clear out the
portions of it we're going to reuse, or otherwise ensuring that we don't
read old data). But I doubt it matters much either way in practice: the
individual records are small (at just 4 bytes each) and entry_count is
often less than 1,000, so I think this probably has a vanishingly small
impact.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v2 6/6] p5310-pack-bitmaps.sh: enable pack.writeReverseIndex for testing
  2022-06-26 13:10   ` [PATCH v2 6/6] p5310-pack-bitmaps.sh: enable pack.writeReverseIndex for testing Abhradeep Chakraborty via GitGitGadget
@ 2022-06-27 21:50     ` Taylor Blau
  2022-06-28  8:01       ` Abhradeep Chakraborty
  0 siblings, 1 reply; 162+ messages in thread
From: Taylor Blau @ 2022-06-27 21:50 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget
  Cc: git, Kaartic Sivaram, Derrick Stolee, Abhradeep Chakraborty

On Sun, Jun 26, 2022 at 01:10:17PM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
>
> Enable pack.writeReverseIndex to true to see the effect of writing
> the reverse index in the existing bitmap tests (with and without
> lookup table).

I think we should swap the order of these final two patches, since we're
primarily interested in the difference between using a reverse index
with and without the lookup table.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v2 5/6] bitmap-lookup-table: add performance tests for lookup table
  2022-06-26 13:10   ` [PATCH v2 5/6] bitmap-lookup-table: add performance tests for lookup table Abhradeep Chakraborty via GitGitGadget
@ 2022-06-27 21:53     ` Taylor Blau
  2022-06-28  7:58       ` Abhradeep Chakraborty
  0 siblings, 1 reply; 162+ messages in thread
From: Taylor Blau @ 2022-06-27 21:53 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget
  Cc: git, Kaartic Sivaram, Derrick Stolee, Abhradeep Chakraborty

On Sun, Jun 26, 2022 at 01:10:16PM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
>
> Add performance tests to verify the performance of lookup table.
>
> Lookup table makes Git run faster in most of the cases. Below is the
> result of `t/perf/p5310-pack-bitmaps.sh`. `perf/p5326-multi-pack-bitmaps.sh`
> gives similar results. The repository used in the test is the Linux kernel.
>
> Test                                                      this tree
> --------------------------------------------------------------------------
> 5310.4: repack to disk (lookup=false)                   295.94(250.45+15.24)
> 5310.5: simulated clone                                 12.52(5.07+1.40)
> 5310.6: simulated fetch                                 1.89(2.94+0.24)
> 5310.7: pack to file (bitmap)                           41.39(20.33+7.20)
> 5310.8: rev-list (commits)                              0.98(0.59+0.12)
> 5310.9: rev-list (objects)                              3.40(3.27+0.10)
> 5310.10: rev-list with tag negated via --not            0.07(0.02+0.04)
>          --all (objects)
> 5310.11: rev-list with negative tag (objects)           0.23(0.16+0.06)
> 5310.12: rev-list count with blob:none                  0.26(0.18+0.07)
> 5310.13: rev-list count with blob:limit=1k              6.45(5.94+0.37)
> 5310.14: rev-list count with tree:0                     0.26(0.18+0.07)
> 5310.15: simulated partial clone                        4.99(3.19+0.45)
> 5310.19: repack to disk (lookup=true)                   269.67(174.70+21.33)
> 5310.20: simulated clone                                11.03(5.07+1.11)
> 5310.21: simulated fetch                                0.79(0.79+0.17)
> 5310.22: pack to file (bitmap)                          43.03(20.28+7.43)
> 5310.23: rev-list (commits)                             0.86(0.54+0.09)
> 5310.24: rev-list (objects)                             3.35(3.26+0.07)
> 5310.25: rev-list with tag negated via --not            0.05(0.00+0.03)
>          --all (objects)
> 5310.26: rev-list with negative tag (objects)           0.22(0.16+0.05)
> 5310.27: rev-list count with blob:none                  0.22(0.16+0.05)
> 5310.28: rev-list count with blob:limit=1k              6.45(5.87+0.31)
> 5310.29: rev-list count with tree:0                     0.22(0.16+0.05)
> 5310.30: simulated partial clone                        5.17(3.12+0.48)
>
> Test 4-15 are tested without using lookup table. Same tests are
> repeated in 16-30 (using lookup table).
>
> Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> Mentored-by: Taylor Blau <me@ttaylorr.com>
> Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
> ---
>  t/perf/p5310-pack-bitmaps.sh       | 77 ++++++++++++++-----------
>  t/perf/p5326-multi-pack-bitmaps.sh | 93 ++++++++++++++++--------------
>  2 files changed, 94 insertions(+), 76 deletions(-)
>
> diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
> index 7ad4f237bc3..6ff42bdd391 100755
> --- a/t/perf/p5310-pack-bitmaps.sh
> +++ b/t/perf/p5310-pack-bitmaps.sh
> @@ -16,39 +16,48 @@ test_expect_success 'setup bitmap config' '
>  	git config pack.writebitmaps true
>  '
>
> -# we need to create the tag up front such that it is covered by the repack and
> -# thus by generated bitmaps.
> -test_expect_success 'create tags' '
> -	git tag --message="tag pointing to HEAD" perf-tag HEAD
> -'
> -
> -test_perf 'repack to disk' '
> -	git repack -ad
> -'
> -
> -test_full_bitmap
> -
> -test_expect_success 'create partial bitmap state' '
> -	# pick a commit to represent the repo tip in the past
> -	cutoff=$(git rev-list HEAD~100 -1) &&
> -	orig_tip=$(git rev-parse HEAD) &&
> -
> -	# now kill off all of the refs and pretend we had
> -	# just the one tip
> -	rm -rf .git/logs .git/refs/* .git/packed-refs &&
> -	git update-ref HEAD $cutoff &&
> -
> -	# and then repack, which will leave us with a nice
> -	# big bitmap pack of the "old" history, and all of
> -	# the new history will be loose, as if it had been pushed
> -	# up incrementally and exploded via unpack-objects
> -	git repack -Ad &&
> -
> -	# and now restore our original tip, as if the pushes
> -	# had happened
> -	git update-ref HEAD $orig_tip
> -'
> -
> -test_partial_bitmap
> +test_bitmap () {
> +    local enabled="$1"
> +
> +	# we need to create the tag up front such that it is covered by the repack and
> +	# thus by generated bitmaps.
> +	test_expect_success 'create tags' '
> +		git tag --message="tag pointing to HEAD" perf-tag HEAD
> +	'

I think this "create tags" step can happen outside of the test_bitmap()
function, since it should only need to be done once, right?

> +	test_expect_success "use lookup table: $enabled" '
> +		git config pack.writeBitmapLookupTable '"$enabled"'
> +	'
> +
> +	test_perf "repack to disk (lookup=$enabled)" '
> +		git repack -ad
> +	'

And I think these two tests could be combined, since this could just
become:

    git -c pack.writeBitmapLookupTable="$enabled" repack -ad

right?

> +	test_full_bitmap
> +
> +    test_expect_success "create partial bitmap state (lookup=$enabled)" '

There is some funky spacing going on here, at least in my email client.
Could you double check that tabs are used consistently here?

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v2 5/6] bitmap-lookup-table: add performance tests for lookup table
  2022-06-27 21:53     ` Taylor Blau
@ 2022-06-28  7:58       ` Abhradeep Chakraborty
  2022-06-29 20:40         ` Taylor Blau
  0 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-28  7:58 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Abhradeep Chakraborty, Git, Kaartic Sivaraam, Derrick Stolee

Taylor Blau <me@ttaylorr.com> wrote:

> I think this "create tags" step can happen outside of the test_bitmap()
> function, since it should only need to be done once, right?

Yeah, I think so too. That's why I first tried to keep it out of the
function, but for some reason one test then fails:

  perf 24 - rev-list with tag negated via --not --all (objects):
  running: 
  		git rev-list perf-tag --not --all --use-bitmap-index --objects >/dev/null
	
  fatal: ambiguous argument 'perf-tag': unknown revision or path not in the working tree.
  Use '--' to separate paths from revisions, like this:
  'git <command> [<revision>...] -- [<file>...]'
  not ok 24 - rev-list with tag negated via --not --all (objects)

One thing to note here is that the first `test_bitmap` call always
passes, but the second `test_bitmap` call fails with the above error.
It fails regardless of the parameters passed to the second `test_bitmap`.

If I put it inside the function, it doesn't throw any error!

For this reason, I put it into the function. Do you have any idea
why this happens?

> And I think these two tests could be combined, since this could just
> become:
>
>    git -c pack.writeBitmapLookupTable="$enabled" repack -ad
>
> right?

Yeah, sure.

> There is some funky spacing going on here, at least in my email client.
> Could you double check that tabs are used consistently here?

This is due to my editor's spacing issues. All seems fine when I look at
it in my editor. But actually it is not. Fixing it.

Thanks :)

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v2 6/6] p5310-pack-bitmaps.sh: enable pack.writeReverseIndex for testing
  2022-06-27 21:50     ` Taylor Blau
@ 2022-06-28  8:01       ` Abhradeep Chakraborty
  0 siblings, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-28  8:01 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Abhradeep Chakraborty, Git, Kaartic Sivaraam, Derrick Stolee

Taylor Blau <me@ttaylorr.com> wrote:

> I think we should swap the order of these final two patches, since we're
> primarily interested in the difference between using a reverse index
> with and without the lookup table.

Ok. Thanks :)

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v2 4/6] pack-bitmap: prepare to read lookup table
  2022-06-27 21:49       ` [PATCH v2 4/6] pack-bitmap: prepare to read lookup table extension Taylor Blau
@ 2022-06-28  8:59         ` Abhradeep Chakraborty
  2022-06-29 20:22           ` Taylor Blau
  0 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-28  8:59 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Abhradeep Chakraborty, Git, Kaartic Sivaraam, Derrick Stolee

Taylor Blau <me@ttaylorr.com> wrote:

> ...exactly my thoughts, too. It's possible that it would be faster to
> key this search on the object_id "oid" above, and then convert each of
> the entries in the lookup table from a uint32_t into an object_id by
> calling nth_bitmap_object_oid() repeatedly.
>
> I *think* that what Abhradeep wrote here is going to be faster more
> often than not since it makes more efficient use of the page cache
> rather than switching between reads across different memory mapped
> regions at each point in the binary search.
>
> But of course that depends on a number of factors. Abhradeep: if you're
> up for it, I think it would be worth trying it both ways and seeing if
> one produces a meaningful speed-up or slow-down over the other. Like I
> said: my guess is that what you have now will be faster, but I don't
> have a clear sense that that is true without trying it both ways ;-).

Ok. Let me try it both ways. I think my version does less searching
and less computation, so I would like to stick with it. But I would
also like to try the other one so that we can pick the better of the
two.

> I think starting off with a small array and then letting it grow
> according to alloc_nr() would be fine here, since it will grow more and
> more each time, so the amount of times we have to reallocate the buffer
> will tail off over time.

What should be the size of that array?

> If we were really concerned about it, we could treat the buffer as a
> static pointer and reuse it over time (making sure to clear out the
> portions of it we're going to reuse, or otherwise ensuring that we don't
> read old data). But I doubt it matters much either way in practice: the
> individual records are small (at just 4 bytes each) and entry_count is
> often less than 1,000, so I think this probably has a vanishingly small
> impact.

Before submitting it to the mailing list, I did use the ALLOC_GROW
macro. But my version was worse than yours: on every iteration I was
reallocating the array to hold `size+1` positions. I later dropped
that code because it could be very expensive.

Then I wrote this code. As `table` array and `table_inv` array allocate
this size of arrays (though all the indices are used), I thought it
would not be a problem if I use an array of this size for a small amount
of time.

Honestly, I don't like to realloc arrays. Because as far as I can remember,
realloc allocates a new array internally and copies the items from the old
array to the new array. This irritates me.

But at the same time, it is also true that in most cases we might not need
this amount of space.

Thanks :)

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v2 4/6] pack-bitmap: prepare to read lookup table extension
  2022-06-27 21:38     ` Taylor Blau
@ 2022-06-28 19:25       ` Abhradeep Chakraborty
  2022-06-29 20:37         ` Taylor Blau
  0 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-28 19:25 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Abhradeep Chakraborty, Git, Kaartic Sivaraam, Derrick Stolee


Ohh, sorry! Looks like I missed this comment!

Taylor Blau <me@ttaylorr.com> wrote:

> It may be worth replacing "within map" to "within the memory mapped
> region `map`" to make clear that this points somewhere within the mmap.

Ok.

> I should have commented on this in an earlier round, but I wonder what
> the behavior should be when we have BITMAP_OPT_LOOKUP_TABLE in our
> flags, but GIT_TEST_READ_COMMIT_TABLE is disabled.
>
> Right now, it doesn't matter, since there aren't any flags in bits above
> BITMAP_OPT_LOOKUP_TABLE. But in the future, if there was some
> BITMAP_OPT_FOO that was newer than BITMAP_OPT_LOOKUP_TABLE, we would
> want to be able to read it without needing to read the lookup table.
>
> At least, I think that should be true, though I would be interested to
> hear if anybody has a differing opinion there.

Oh right! I didn't think about it. In that case, we should still subtract
the table size from the last index_size. That way, these sections will
not overlap.

> And table_size here is going to start off at zero, so the outer st_add()
> call isn't necessary, either. This should instead be:
>
>     size_t table_size = st_mult(ntohl(header->entry_count),
>                                 sizeof(uint32_t) + sizeof(uint64_t) + sizeof(uint32_t));
>
> It might be nice to have triplet_sz #define'd somewhere else, since
> there are a handful of declarations in this patch that are all
> identical. Probably something like:
>
>     #define BITMAP_LOOKUP_TABLE_RECORD_WIDTH (sizeof(uint32_t) + sizeof(uint64_t) + sizeof(uint32_t))
>
> or even:
>
>     /*
>      * The width in bytes of a single record in the lookup table
>      * extension:
>      *
>      *   (commit_pos, offset, xor_pos)
>      *
>      * whose fields are 32-, 64-, and 32-bits wide, respectively.
>      */
>      #define BITMAP_LOOKUP_TABLE_RECORD_WIDTH (16)

Seems perfect to me.

> if we decide to still recognize the lookup table extension without
> *reading* from it when GIT_TEST_READ_COMMIT_TABLE is unset, I think we
> should do something like:
>
>     if (git_env_bool("GIT_TEST_READ_COMMIT_TABLE", 1))
>         index->table_lookup = (void *)(index_end - table_size);
>     index_end -= table_size;
>
> ...where the subtraction on index_end happens unconditionally.

Right. Thanks!
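
To check my understanding, here is a minimal sketch of that idea. It is
not the code from the patch: `struct index_view` and
`claim_lookup_table()` are hypothetical names made up for this example,
and the 16-byte record width is the value discussed above. The point is
only that the table's bytes are always carved off the end of the index,
while the pointer to the table is set only when we intend to read it.

```c
#include <stddef.h>
#include <stdint.h>

/* On-disk width of one (commit_pos, offset, xor_pos) record, per the thread. */
#define LOOKUP_RECORD_WIDTH 16

/* Hypothetical stand-in for the relevant fields of the bitmap index. */
struct index_view {
	const unsigned char *map;          /* start of the mapped file */
	const unsigned char *index_end;    /* end of the region not yet claimed */
	const unsigned char *table_lookup; /* NULL when the table is not read */
};

/*
 * Reserve the lookup table at the end of the index. The size is always
 * subtracted from index_end so that later sections never overlap the
 * table; the table pointer is recorded only when read_table is nonzero.
 * Returns 0 on success, -1 if the table cannot fit.
 */
static int claim_lookup_table(struct index_view *view, uint32_t entry_count,
			      int read_table)
{
	size_t avail = (size_t)(view->index_end - view->map);
	size_t table_size;

	/* Guard the multiplication against overflow on small size_t. */
	if (entry_count > SIZE_MAX / LOOKUP_RECORD_WIDTH)
		return -1;
	table_size = (size_t)entry_count * LOOKUP_RECORD_WIDTH;
	if (table_size > avail)
		return -1;

	if (read_table)
		view->table_lookup = view->index_end - table_size;
	view->index_end -= table_size; /* unconditional */
	return 0;
}
```

With this shape, a reader that ignores the table still ends up with the
same index_end as one that uses it.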

> I wonder if we could get rid of these functions altogether and return a
> small structure like:
>
>     struct bitmap_lookup_table_record {
>         uint32_t commit_pos;
>         uint64_t offset;
>         uint32_t xor_pos;
>     };
>
> or similar.

Ok.
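
Something like the following is what I have in mind, as a rough sketch
only. The struct mirrors the one you proposed; `load_be32()`,
`load_be64()` and `decode_record()` are hypothetical helpers written for
this example (Git has its own get_be32() and get_be64()). It assumes the
16-byte on-disk layout of 4-byte commit_pos, 8-byte offset, 4-byte
xor_pos, all in network byte order.

```c
#include <stdint.h>

/* In-memory form of one lookup table record. */
struct lookup_record {
	uint32_t commit_pos;
	uint64_t offset;
	uint32_t xor_pos;
};

/* Read a big-endian 32-bit value from an unaligned byte pointer. */
static uint32_t load_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read a big-endian 64-bit value as two 32-bit halves. */
static uint64_t load_be64(const unsigned char *p)
{
	return ((uint64_t)load_be32(p) << 32) | load_be32(p + 4);
}

/* Decode the 16 bytes at p into host-order fields. */
static struct lookup_record decode_record(const unsigned char *p)
{
	struct lookup_record rec;

	rec.commit_pos = load_be32(p);
	rec.offset = load_be64(p + 4);
	rec.xor_pos = load_be32(p + 12);
	return rec;
}
```

Note that the compiler may pad the struct, so its sizeof should not be
used as the on-disk record width; the explicit constant is safer.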

> Hmm. This is a little tricky to read. Here we're expecting "va" to hold
> commit_pos from below, and "vb" to be a pointer at a lookup record.
> Everything here is right, though I wonder if a comment or two might
> clarify why one is "*(uint32_t *)va" and the other is "get_be32(vb)".

Sure. Will add comments.

> Nit: let's use the bitmap_is_midx() helper here instead of looking at
> bitmap_git->midx directly.

Ok.

> First thing is to convert the commit OID we're looking for into its
> position within the corresponding pack index or MIDX file so that we can
> use it as a search key to locate in the lookup table. If we didn't find
> anything, or the commit doesn't exist in our pack / MIDX, nothing to do.
>
> > +
> > +	offset = triplet_get_offset(triplet);
> > +	xor_pos = triplet_get_xor_pos(triplet);
>
> Otherwise, record its offset and XOR "offset".

Exactly!

> We already have to get the triplets in the loop above, and then we dig
> them back out here. Would it be easier to keep track of a list of
> pointers into the mmaped region instead of looking up these triplets
> each time?

Sure. It might be a good idea. Thanks.

> > +			commit_pos = get_be32(triplet);
> > +			offset_xor = triplet_get_offset(triplet);
> > +
> > +			if (nth_bitmap_object_oid(bitmap_git, &xor_oid, commit_pos) < 0) {
>
> Should it be an error if we can't look up the object's ID here? I'd
> think so.

I am not sure about it either. On principle, I think it is better to
throw an error here.

> Do we have a good way to make sure that we're testing this code in CI?
> It *seems* correct to me, but of course, we should have a computer check
> that this produces OK results, not a human ;).

My current test file changes should test this code. Since the lookup
table is currently enabled by default, all the existing tests that write
and read bitmaps use this lookup table, and all of those test cases
should pass. So I think it is being tested in CI. Do you have a good
idea for testing it better?

> Hmm. I'm not sure I follow the purpose of tweaking
> GIT_TEST_READ_COMMIT_TABLE like this with setenv(). Are we trying to
> avoid reading the lookup table? If so, why? I'd rather avoid
> manipulating the environment directly like this, and instead have a
> function we could call to fault in all of the bitmaps (when a lookup
> table exists, otherwise do nothing).

The problem was that the `test-tool bitmap list-commit` command was
not printing any commits (the error that I told you about before). It
is because of this function. As the lookup table is enabled by default,
the `prepare_bitmap_git` function doesn't load every bitmap entry, and
thus the code below in this function doesn't produce the bitmapped
commit list (because the hash table is not populated).

	kh_foreach(bitmap_git->bitmaps, oid, value, {
		printf("%s\n", oid_to_hex(&oid));
	});

So, the simplest fix I found was this. Should I write a function instead
(as you suggested here)?

Thanks :)

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v2 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-06-27 18:39       ` Abhradeep Chakraborty
@ 2022-06-29 20:11         ` Taylor Blau
  0 siblings, 0 replies; 162+ messages in thread
From: Taylor Blau @ 2022-06-29 20:11 UTC (permalink / raw)
  To: Abhradeep Chakraborty; +Cc: Git, Kaartic Sivaraam, Derrick Stolee

On Tue, Jun 28, 2022 at 12:09:23AM +0530, Abhradeep Chakraborty wrote:
> Taylor Blau <me@ttaylorr.com> wrote:
>
> > Probably both of them should take into account their separate
> > configuration values, but cleaning up the hashcache one can be done
> > separately outside of this series.
>
> Actually, it does respect the `pack.writebitmaplookuptable` config.
> As pack.writebitmaplookuptable is by default true (for this patch
> Series), this line enables it by default. If `pack.writebitmaplookuptable`
> Set to false, the proposed change in the `git_multi_pack_index_write_config`
> function disables this flag.

Aha, you're absolutely right. I missed the earlier hunk. Thanks for
pointing it out, this part looks fine to me.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v2 4/6] pack-bitmap: prepare to read lookup table
  2022-06-28  8:59         ` [PATCH v2 4/6] pack-bitmap: prepare to read lookup table Abhradeep Chakraborty
@ 2022-06-29 20:22           ` Taylor Blau
  2022-06-30  6:58             ` [PATCH v2 4/6] pack-bitmap: prepare to read lookup table extension Abhradeep Chakraborty
  0 siblings, 1 reply; 162+ messages in thread
From: Taylor Blau @ 2022-06-29 20:22 UTC (permalink / raw)
  To: Abhradeep Chakraborty; +Cc: Git, Kaartic Sivaraam, Derrick Stolee

On Tue, Jun 28, 2022 at 02:29:50PM +0530, Abhradeep Chakraborty wrote:
> Taylor Blau <me@ttaylorr.com> wrote:
>
> > ...exactly my thoughts, too. It's possible that it would be faster to
> > key this search on the object_id "oid" above, and then convert each of
> > the entries in the lookup table from a uint32_t into an object_id by
> > calling nth_bitmap_object_oid() repeatedly.
> >
> > I *think* that what Abhradeep wrote here is going to be faster more
> > often than not since it makes more efficient use of the page cache
> > rather than switching between reads across different memory mapped
> > regions at each point in the binary search.
> >
> > But of course that depends on a number of factors. Abhradeep: if you're
> > up for it, I think it would be worth trying it both ways and seeing if
> > one produces a meaningful speed-up or slow-down over the other. Like I
> > said: my guess is that what you have now will be faster, but I don't
> > have a clear sense that that is true without trying it both ways ;-).
>
> Ok. Let me try both the ways. In my opinion, I think my version has
> less searching and less computation. So, I want to stick with this
> version. But I also like to try the other one once so that we can
> get the best out of these two.

Yeah, I agree with your general sense that the version as written is
going to be faster. We're comparing a smaller datatype (IOW, a 4-byte
integer that can be checked for equality in a single instruction,
instead of comparing two 20-byte OIDs), and likely flushing the cache
far less often.

But having two concrete implementations to compare will help us know for
a fact that our intuition is correct.
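
For reference, the integer-keyed search can be sketched roughly like
this. This is an illustration rather than the code in the patch:
`read_be32()`, `record_cmp()` and `find_record()` are hypothetical
names, and it assumes 16-byte records sorted by a leading big-endian
commit position. The key stays in host byte order while each record is
decoded on the fly, which is why the two sides of the comparison are
read differently.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* 4-byte commit_pos, 8-byte offset, 4-byte xor_pos. */
#define RECORD_WIDTH 16

/* Read a big-endian 32-bit value; a stand-in for Git's get_be32(). */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * bsearch() comparator: va points at the host-order search key, vb
 * points at an on-disk record whose first four bytes are the
 * big-endian commit position.
 */
static int record_cmp(const void *va, const void *vb)
{
	uint32_t key = *(const uint32_t *)va;
	uint32_t pos = read_be32(vb);

	if (key < pos)
		return -1;
	if (key > pos)
		return 1;
	return 0;
}

/* Return the record for commit_pos, or NULL if it has no bitmap. */
static const unsigned char *find_record(const unsigned char *table,
					size_t nr, uint32_t commit_pos)
{
	return bsearch(&commit_pos, table, nr, RECORD_WIDTH, record_cmp);
}
```

Every probe touches only the table itself, so the search never has to
jump into the pack index to resolve an object ID.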

I'll be curious to see what you find here!

> > I think starting off with a small array and then letting it grow
> > according to alloc_nr() would be fine here, since it will grow more and
> > more each time, so the amount of times we have to reallocate the buffer
> > will tail off over time.
>
> What should be the size of that array?

I think some small power of 2 would be a reasonable choice here.

> > If we were really concerned about it, we could treat the buffer as a
> > static pointer and reuse it over time (making sure to clear out the
> > portions of it we're going to reuse, or otherwise ensuring that we don't
> > read old data). But I doubt it matters much either way in practice: the
> > individual records are small (at just 4 bytes each) and entry_count is
> > often less than 1,000, so I think this probably has a vanishingly small
> > impact.
>
> Before submitting it to the mailing list, I did use the ALLOC_GROW macro
> function. But my version was worse than yours. For every iteration I was
> reallocating the array to support `size+1` positions. But later I drop
> the code as this might be very much expensive.

That shouldn't be the case. When you have a chance, take a look at the
alloc_nr macro, which shows how much memory we allocate at each
step:

    #define alloc_nr(x) (((x)+16)*3/2)

Suppose we allocated 16 slots initially, so nr (the number of entries
stored in the list) is 0 and alloc (the number of entries allocated) is
16. Then when we try to add the 17th item, we'll pass 16 to alloc_nr
which will allocate 48 slots. Then 96, then 168, and so on.

We only have to reallocate and copy the array when it is full (that is,
when adding an entry would push nr past alloc), which should be fairly
infrequent, and happens less and less often the larger the array grows.
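
To make that concrete, here is a small sketch of the growth pattern. The
alloc_nr() macro is the one quoted above; `struct pos_list` and
`pos_list_append()` are hypothetical names invented for this example and
only approximate what ALLOC_GROW does. The `reallocs` counter exists
purely to show how rarely the buffer is resized.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Same growth formula as quoted above. */
#define alloc_nr(x) (((x) + 16) * 3 / 2)

/* Hypothetical growable list of 32-bit positions. */
struct pos_list {
	uint32_t *items;
	size_t nr;       /* entries in use */
	size_t alloc;    /* entries allocated */
	size_t reallocs; /* number of times the buffer grew */
};

/* Append one value, growing only when the buffer is full. */
static int pos_list_append(struct pos_list *list, uint32_t value)
{
	if (list->nr + 1 > list->alloc) {
		size_t new_alloc = alloc_nr(list->alloc);
		uint32_t *grown;

		if (new_alloc < list->nr + 1)
			new_alloc = list->nr + 1;
		grown = realloc(list->items, new_alloc * sizeof(*grown));
		if (!grown)
			return -1;
		list->items = grown;
		list->alloc = new_alloc;
		list->reallocs++;
	}
	list->items[list->nr++] = value;
	return 0;
}
```

Starting from an empty list, appending 1,000 entries grows the buffer
through 24, 60, 114, 195, 316, 498, 771 and 1180 slots, so only eight
reallocations happen in total.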

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v2 4/6] pack-bitmap: prepare to read lookup table extension
  2022-06-28 19:25       ` Abhradeep Chakraborty
@ 2022-06-29 20:37         ` Taylor Blau
  2022-06-29 20:41           ` Taylor Blau
  2022-06-30  8:35           ` Abhradeep Chakraborty
  0 siblings, 2 replies; 162+ messages in thread
From: Taylor Blau @ 2022-06-29 20:37 UTC (permalink / raw)
  To: Abhradeep Chakraborty; +Cc: Git, Kaartic Sivaraam, Derrick Stolee

On Wed, Jun 29, 2022 at 12:55:55AM +0530, Abhradeep Chakraborty wrote:
> > > +			commit_pos = get_be32(triplet);
> > > +			offset_xor = triplet_get_offset(triplet);
> > > +
> > > +			if (nth_bitmap_object_oid(bitmap_git, &xor_oid, commit_pos) < 0) {
> >
> > Should it be an error if we can't look up the object's ID here? I'd
> > think so.
>
> I also am not sure about it. Morally, I think it is better to throw
> An error here.

Yeah.

> > Do we have a good way to make sure that we're testing this code in CI?
> > It *seems* correct to me, but of course, we should have a computer check
> > that this produces OK results, not a human ;).
>
> My current test file changes should test this code. As for now, the lookup
> Table is enabled by default, all the existing tests that include write and
> read bitmaps uses this lookup table. So, all the test case scenarios should
> Pass. So, I think it is being tested in CI. Do you have a good idea to test
> It better?

I think having some indication (maybe via a trace2 region?) that we're
actually executing this code would be good. Although it's going to be
*really* noisy, so probably not a good idea to do that in general.

Stolee runs some coverage tests that show lines that we aren't
exercising via tests. So making sure that this doesn't show up in that
report when you run it locally would be good.

See some information from him about how to run those tests locally here:

    https://lore.kernel.org/git/00a57a1d-0566-8f54-26b2-0f3558bde88d@github.com/

(TL;DR: run `make coverage-test` and make sure that these lines don't
show up ;-)).

> > Hmm. I'm not sure I follow the purpose of tweaking
> > GIT_TEST_READ_COMMIT_TABLE like this with setenv(). Are we trying to
> > avoid reading the lookup table? If so, why? I'd rather avoid
> > manipulating the environment directly like this, and instead have a
> > function we could call to fault in all of the bitmaps (when a lookup
> > table exists, otherwise do nothing).
>
> The problem was that the `test-tool bitmap list-commit` command was
> Not printing any commits (the error that I notified you before). It
> is because of this function. As lookup table is enabled by default,
> `prepare_bitmap_git` function doesn't load each bitmap entries and
> thus the below code in this function doesn't provide the bitmapped
> commit list (because Hashtable didn't generated).
>
>         kh_foreach(bitmap_git->bitmaps, oid, value, {
> 		printf("%s\n", oid_to_hex(&oid));
> 	});
>
> So, the simplest fix I found was this. Should I make a function then
> (Which you suggested here)?

I see. I remember that issue, but I think we should go about fixing it
in a different way. Instead of tricking the code into loading all
bitmaps by pretending the lookup table doesn't exist, we should have a
function that forces loading in all bitmaps from the lookup table, if
one exists. If the lookup table doesn't exist, or we have already loaded
its entries, then that function can be a no-op.

If we had something like that, we could call that function from within
`test_bitmap_commits()` before reading the keys and values out of
`bitmap_git->bitmaps`.

An alternative approach would be to read the table directly when it
exists, perhaps something like this:

--- 8< ---

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 9e09c5824f..3bda059b9f 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1921,22 +1921,28 @@ int test_bitmap_commits(struct repository *r)
 {
 	struct bitmap_index *bitmap_git = NULL;
 	struct object_id oid;
-	MAYBE_UNUSED void *value;
-
-	/* As this function is only used to print bitmap selected
-	 * commits, we don't have to read the commit table.
-	 */
-	setenv("GIT_TEST_READ_COMMIT_TABLE", "0", 1);

 	bitmap_git = prepare_bitmap_git(r);
 	if (!bitmap_git)
 		die("failed to load bitmap indexes");

-	kh_foreach(bitmap_git->bitmaps, oid, value, {
-		printf("%s\n", oid_to_hex(&oid));
-	});
+	if (bitmap_git->table_lookup) {
+		uint32_t i, commit_pos;
+		for (i = 0; i < bitmap_git->entry_count; i++) {
+			commit_pos = get_be32(bitmap_get_triplet(bitmap_git, i));
+			if (nth_bitmap_object_oid(bitmap_git, &oid,
+						  commit_pos) < 0)
+				return error("could not find commit at "
+					     "position %"PRIu32, commit_pos);
+			printf("%s\n", oid_to_hex(&oid));
+		}
+	} else {
+		MAYBE_UNUSED void *value;
+		kh_foreach(bitmap_git->bitmaps, oid, value, {
+			printf("%s\n", oid_to_hex(&oid));
+		});
+	}

-	setenv("GIT_TEST_READ_COMMIT_TABLE", "1", 1);
 	free_bitmap_index(bitmap_git);

 	return 0;

--- >8 ---

Thanks,
Taylor

^ permalink raw reply related	[flat|nested] 162+ messages in thread

* Re: [PATCH v2 5/6] bitmap-lookup-table: add performance tests for lookup table
  2022-06-28  7:58       ` Abhradeep Chakraborty
@ 2022-06-29 20:40         ` Taylor Blau
  0 siblings, 0 replies; 162+ messages in thread
From: Taylor Blau @ 2022-06-29 20:40 UTC (permalink / raw)
  To: Abhradeep Chakraborty; +Cc: Git, Kaartic Sivaraam, Derrick Stolee

On Tue, Jun 28, 2022 at 01:28:43PM +0530, Abhradeep Chakraborty wrote:
> Taylor Blau <me@ttaylorr.com> wrote:
>
> > I think this "create tags" step can happen outside of the test_bitmap()
> > function, since it should only need to be done once, right?
>
> Yeah, I also think the same. That's why I tried to not include in the
> Function but for some reason, one test is failing -
>
>   perf 24 - rev-list with tag negated via --not --all (objects):
>   running:
>   		git rev-list perf-tag --not --all --use-bitmap-index --objects >/dev/null
>
>   fatal: ambiguous argument 'perf-tag': unknown revision or path not in the working tree.
>   Use '--' to separate paths from revisions, like this:
>   'git <command> [<revision>...] -- [<file>...]'
>   not ok 24 - rev-list with tag negated via --not --all (objects)
>
> One thing to note here is that the first `test_bitmap` call always
> Passes. But the second `test_bitmap` call fails due to above error.
> It throws error irrespective of any parameters for second `test_bitmap`.
>
> If I put it inside the function it doesn't throw any error!
>
> For this reason, I put it into the function. Do you have any idea
> why this happend?

I think that it's because we delete all of the refs in the test that
creates a partial bitmap state. So anything that relies on perf-tag
existing after that test runs will definitely not work :).

My original suggestion was misguided there, unless we wanted to make the
aforementioned test (the one that creates the partial bitmap state)
restore the ref state after it finishes running, but I don't think
that's worthwhile.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v2 4/6] pack-bitmap: prepare to read lookup table extension
  2022-06-29 20:37         ` Taylor Blau
@ 2022-06-29 20:41           ` Taylor Blau
  2022-06-30  8:35           ` Abhradeep Chakraborty
  1 sibling, 0 replies; 162+ messages in thread
From: Taylor Blau @ 2022-06-29 20:41 UTC (permalink / raw)
  To: Abhradeep Chakraborty; +Cc: Git, Kaartic Sivaraam, Derrick Stolee

On Wed, Jun 29, 2022 at 04:37:38PM -0400, Taylor Blau wrote:
> +				return error("could not find commit at "
> +					     "position %"PRIu32, commit_pos);

Oops. Pretend that I marked this string for translation ;-).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v2 4/6] pack-bitmap: prepare to read lookup table extension
  2022-06-29 20:22           ` Taylor Blau
@ 2022-06-30  6:58             ` Abhradeep Chakraborty
  0 siblings, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-30  6:58 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Abhradeep Chakraborty, Git, Kaartic Sivaraam, Derrick Stolee


Taylor Blau <me@ttaylorr.com> wrote:

> That shouldn't be the case. When you have a chance, take a look at the
> alloc_nr macro, which shows how much memory we allocate at each
> step:
>
>     #define alloc_nr(x) (((x)+16)*3/2)
>
> Suppose we allocated 16 slots initially, so nr (the number of entries
> stored in the list) is 0 and alloc (the number of entries allocated) is
> 16. Then when we try to add the 17th item, we'll pass 16 to alloc_nr
> which will allocate 48 slots. Then 96, then 168, and so on.
>
> We only have to reallocate and copy the array when nr > alloc, which
> should be fairly infrequently, and happens less and less often the
> larger the array grows.

Ohh, I misunderstood the ALLOC_GROW function. Thanks!

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v2 4/6] pack-bitmap: prepare to read lookup table extension
  2022-06-29 20:37         ` Taylor Blau
  2022-06-29 20:41           ` Taylor Blau
@ 2022-06-30  8:35           ` Abhradeep Chakraborty
  1 sibling, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-06-30  8:35 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Abhradeep Chakraborty, Git, Kaartic Sivaraam, Derrick Stolee


Taylor Blau <me@ttaylorr.com> wrote:

> I think having some indication (maybe via a trace2 region?) that we're
> actually executing this code would be good. Although it's going to be
> *really* noisy, so probably not a good idea to do that in general.
>
> Stolee runs some coverage tests that show lines that we aren't
> exercising via tests. So making sure that this doesn't show up in that
> report when you run it locally would be good.
>
> See some information from him about how to run those tests locally here:
>
>     https://lore.kernel.org/git/00a57a1d-0566-8f54-26b2-0f3558bde88d@github.com/
>
> (TL;DR: run `make coverage-test` and make sure that these lines don't
> show up ;-)).

Got it. Thanks.

> If we had something like that, we could call that function from within
> `test_bitmap_commits()` before reading the keys and values out of
> `bitmap_git->bitmaps`.
>
> An alternative approach would be to read the table directly when it
> exists, perhaps something like this:

I think we have a simpler fix than what you suggested here. What if
we do it this way:

    if (bitmap_git->table_lookup) {
	if (load_bitmap_entries_v1(bitmap_git) < 0)
	    die(_("failed to load bitmap indexes"));
    }

Is this okay for you?

^ permalink raw reply	[flat|nested] 162+ messages in thread

* [PATCH v3 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format
  2022-06-26 13:10 ` [PATCH v2 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
                     ` (5 preceding siblings ...)
  2022-06-26 13:10   ` [PATCH v2 6/6] p5310-pack-bitmaps.sh: enable pack.writeReverseIndex for testing Abhradeep Chakraborty via GitGitGadget
@ 2022-07-04  8:46   ` Abhradeep Chakraborty via GitGitGadget
  2022-07-04  8:46     ` [PATCH v3 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
                       ` (8 more replies)
  6 siblings, 9 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-07-04  8:46 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee, Abhradeep Chakraborty

When parsing the .bitmap file, Git loads all the bitmaps one by one even if
some of the bitmaps are not necessary. We can remove this overhead by
loading only the necessary bitmaps. A lookup table extension can solve this
issue.

Changes since v1:

This is the second version, which addresses all (I think) of the review
comments. Please let me know if any comments were not addressed :)

 * The table size is decreased and the format has also changed. It now
   contains nr_entries triplets of size 4+8+4 bytes. Each triplet contains
   the following: (1) a 4-byte commit position (in the pack-index or
   midx), (2) an 8-byte offset, and (3) a 4-byte xor triplet position
   (i.e. the position of the triplet whose bitmap the current triplet's
   bitmap has to be xor'ed with).
 * Performance tests are split into two commits. The first contains the
   actual performance tests and the second enables pack.writeReverseIndex
   (as suggested by Taylor).
 * st_*() functions are used.
 * The commit order is changed according to Derrick's suggestion.
 * An iterative approach is used instead of a recursive approach to parse
   xor bitmaps (as suggested by Derrick).
 * Some minor bugs from the previous version are fixed.
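
To illustrate the 4+8+4-byte layout, here is a minimal sketch of decoding
one triplet and finding a commit by its position. All names
(lookup_triplet, read_be32, read_be64, read_triplet, find_triplet) are
hypothetical and are not the ones used in the patches. The sketch assumes
the table is an in-memory byte buffer sorted by ascending commit
position, with all integers in network byte order.

```c
#include <stddef.h>
#include <stdint.h>

#define TRIPLET_SIZE (4 + 8 + 4)

/* Hypothetical in-memory view of one lookup table entry. */
struct lookup_triplet {
	uint32_t commit_pos; /* position in the pack-index or midx */
	uint64_t offset;     /* where the commit's bitmap starts */
	uint32_t xor_row;    /* row of the xor base, or 0xffffffff */
};

/* Decode a 4-byte big-endian (network byte order) integer. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode an 8-byte big-endian integer. */
static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/* Decode the i'th triplet of the table. */
static struct lookup_triplet read_triplet(const unsigned char *table,
					  uint32_t i)
{
	const unsigned char *p = table + (size_t)i * TRIPLET_SIZE;
	struct lookup_triplet t;

	t.commit_pos = read_be32(p);
	t.offset = read_be64(p + 4);
	t.xor_row = read_be32(p + 12);
	return t;
}

/*
 * Binary search for commit_pos among nr triplets. Returns 1 and fills
 * *out if found, 0 otherwise.
 */
static int find_triplet(const unsigned char *table, uint32_t nr,
			uint32_t commit_pos, struct lookup_triplet *out)
{
	uint32_t lo = 0, hi = nr;

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		struct lookup_triplet t = read_triplet(table, mid);

		if (t.commit_pos == commit_pos) {
			*out = t;
			return 1;
		}
		if (t.commit_pos < commit_pos)
			lo = mid + 1;
		else
			hi = mid;
	}
	return 0;
}
```

Because each entry has a fixed size, the i'th triplet can be addressed
directly, which is what makes the binary search possible without parsing
any bitmap.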

Initial version:

The proposed table has:

 * a list of nr_entries object ids. These objects are commits that have
   bitmaps. Ids are stored in lexicographic order (for better searching).
 * a list of <offset, xor-offset> pairs (4-byte integers, network-byte
   order). The i'th pair denotes the offset and xor-offset (respectively) of
   the bitmap of the i'th commit in the previous list. Both values are
   necessary because only this way can a bitmap be found without parsing
   all the bitmaps.
 * a 4-byte integer for table-specific flags (none exist currently).

Whenever Git wants to parse the bitmap for a specific commit, it will first
refer to the table and look up the offset and xor-offset for that
commit. Git will then try to parse the bitmap located at the offset
position. The xor-offset can be used to find the xor-bitmap for the
bitmap (if any).
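
A rough sketch of following that chain iteratively rather than
recursively, using the current format's convention that each row stores
the row of its xor base and 0xffffffff means "no base". The function name
collect_xor_chain and the flat xor_rows array are assumptions made for
this illustration; the actual patch may be structured differently.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_XOR_ROW 0xffffffffu

/*
 * Hypothetical helper: starting at `row`, record every row on the xor
 * chain into `chain` (requested row first, base bitmap last).
 *
 * Returns the chain length, or -1 if a row is out of range or the chain
 * does not fit in `max` entries (for example because of a cycle in a
 * corrupt table).
 */
static int collect_xor_chain(const uint32_t *xor_rows, uint32_t nr,
			     uint32_t row, uint32_t *chain, size_t max)
{
	size_t n = 0;

	while (row != NO_XOR_ROW) {
		if (row >= nr || n == max)
			return -1;
		chain[n++] = row;
		row = xor_rows[row];
	}
	return (int)n;
}
```

A caller would then load the bitmaps in reverse order of the chain, so
that each bitmap's xor base is already available when it is decoded.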

Abhradeep Chakraborty (6):
  Documentation/technical: describe bitmap lookup table extension
  pack-bitmap-write.c: write lookup table extension
  pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  pack-bitmap: prepare to read lookup table extension
  bitmap-lookup-table: add performance tests for lookup table
  p5310-pack-bitmaps.sh: remove pack.writeReverseIndex

 Documentation/config/pack.txt             |   7 +
 Documentation/technical/bitmap-format.txt |  39 ++
 builtin/multi-pack-index.c                |   7 +
 builtin/pack-objects.c                    |   8 +
 midx.c                                    |   3 +
 midx.h                                    |   1 +
 pack-bitmap-write.c                       | 112 ++-
 pack-bitmap.c                             | 266 +++++++-
 pack-bitmap.h                             |  14 +-
 t/perf/p5310-pack-bitmaps.sh              |  77 ++-
 t/perf/p5326-multi-pack-bitmaps.sh        |  93 +--
 t/t5310-pack-bitmaps.sh                   | 786 ++++++++++++----------
 t/t5311-pack-bitmaps-shallow.sh           |  53 +-
 t/t5326-multi-pack-bitmaps.sh             | 421 +++++++-----
 t/t5327-multi-pack-bitmaps-rev.sh         |   9 +
 15 files changed, 1238 insertions(+), 658 deletions(-)


base-commit: 39c15e485575089eb77c769f6da02f98a55905e0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1266%2FAbhra303%2Fbitmap-commit-table-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1266/Abhra303/bitmap-commit-table-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1266

Range-diff vs v2:

 1:  4d11be66cfa ! 1:  f72bf11e6ef Documentation/technical: describe bitmap lookup table extension
     @@ Metadata
       ## Commit message ##
          Documentation/technical: describe bitmap lookup table extension
      
     -    When reading bitmap file, git loads each and every bitmap one by one
     +    When reading bitmap file, Git loads each and every bitmap one by one
          even if all the bitmaps are not required. A "bitmap lookup table"
          extension to the bitmap format can reduce the overhead of loading
          bitmaps which stores a list of bitmapped commit id pos (in the midx
          or pack, along with their offset and xor offset. This way git can
     -    load only the neccesary bitmaps without loading the previous bitmaps.
     +    load only the necessary bitmaps without loading the previous bitmaps.
      
     -    The older version of Git ignores the lookup table extension and doesn't
     +    Older versions of Git ignore the lookup table extension and don't
          throw any kind of warning or error while parsing the bitmap file.
      
          Add some information for the new "bitmap lookup table" extension in the
          bitmap-format documentation.
      
     -    Co-Authored-by: Taylor Blau <me@ttaylorr.com>
          Mentored-by: Taylor Blau <me@ttaylorr.com>
          Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
     +    Co-Authored-by: Taylor Blau <me@ttaylorr.com>
          Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
      
       ## Documentation/technical/bitmap-format.txt ##
     @@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cac
      +			** {empty}
      +			BITMAP_OPT_LOOKUP_TABLE (0x10): :::
      +			If present, the end of the bitmap file contains a table
     -+			containing a list of `N` <commit pos, offset, xor offset>
     ++			containing a list of `N` <commit_pos, offset, xor_row>
      +			triplets. The format and meaning of the table is described
      +			below.
      ++
     -+NOTE: This xor_offset is different from the bitmap's xor_offset.
     -+Bitmap's xor_offset is relative i.e. it tells how many bitmaps we have
     -+to go back from the current bitmap. Lookup table's xor_offset tells the
     -+position of the triplet in the list whose bitmap the current commit's
     -+bitmap have to xor with.
     ++NOTE: Unlike the xor_offset used to compress an individual bitmap,
     ++`xor_row` stores an *absolute* index into the lookup table, not a location
     ++relative to the current entry.
      +
       		4-byte entry count (network byte order)
       
     @@ Documentation/technical/bitmap-format.txt: Note that this hashing scheme is tied
      +-------------------
      +
      +If the BITMAP_OPT_LOOKUP_TABLE flag is set, the last `N * (4 + 8 + 4)`
     -+(preceding the name-hash cache and trailing hash) of the `.bitmap` file
     -+contains a lookup table specifying the information needed to get the
     -+desired bitmap from the entries without parsing previous unnecessary
     ++bytes (preceding the name-hash cache and trailing hash) of the `.bitmap`
     ++file contains a lookup table specifying the information needed to get
     ++the desired bitmap from the entries without parsing previous unnecessary
      +bitmaps.
      +
      +For a `.bitmap` containing `nr_entries` reachability bitmaps, the table
     -+contains a list of `nr_entries` <commit pos, offset, xor offset> triplets.
     -+The content of i'th triplet is -
     ++contains a list of `nr_entries` <commit_pos, offset, xor_row> triplets
     ++(sorted in the ascending order of `commit_pos`). The content of i'th
     ++triplet is -
      +
      +	* {empty}
     -+	commit pos (4 byte integer, network byte order): ::
     -+	It stores the object position of the commit (in the midx or pack index)
     -+	to which the i'th bitmap in the bitmap entries belongs.
     ++	commit_pos (4 byte integer, network byte order): ::
     ++	It stores the object position of a commit (in the midx or pack
     ++	index).
      +
      +	* {empty}
      +	offset (8 byte integer, network byte order): ::
      +	The offset from which that commit's bitmap can be read.
      +
      +	* {empty}
     -+	xor offset (4 byte integer, network byte order): ::
     -+	It holds the position of the triplet with whose bitmap the
     -+	current bitmap need to xor. If the current triplet's bitmap
     -+	do not have any xor bitmap, it defaults to 0xffffffff.
     ++	xor_row (4 byte integer, network byte order): ::
     ++	The position of the triplet whose bitmap is used to compress
     ++	this one, or `0xffffffff` if no such bitmap exists.
 2:  d118f1d45e6 ! 2:  5e9b985e39b pack-bitmap-write.c: write lookup table extension
     @@ Metadata
       ## Commit message ##
          pack-bitmap-write.c: write lookup table extension
      
     -    The bitmap lookup table extension was documentated by an earlier
     -    change, but Git does not yet knowhow to write that extension.
     +    The bitmap lookup table extension was documented by an earlier
     +    change, but Git does not yet know how to write that extension.
      
     -    Teach git to write bitmap lookup table extension. The table contains
     -    the list of `N` <commit pos, offset, xor offset>` triplets. These
     +    Teach Git to write bitmap lookup table extension. The table contains
     +    the list of `N` <commit_pos, offset, xor_row>` triplets. These
          triplets are sorted according to their commit pos (ascending order).
          The meaning of each data in the i'th triplet is given below:
      
     -      - Commit pos is the position of the commit in the pack-index
     -        (or midx) to which the i'th bitmap belongs. It is a 4 byte
     -        network byte order integer.
     +      - commit_pos stores commit position (in the pack-index or midx).
     +        It is a 4 byte network byte order unsigned integer.
      
     -      - offset is the position of the i'th bitmap.
     +      - offset is the position (in the bitmap file) from which that
     +        commit's bitmap can be read.
      
     -      - xor offset denotes the position of the triplet with whose
     -        bitmap the current triplet's bitmap need to xor with.
     +      - xor_row is the position of the triplet in the lookup table
     +        whose bitmap is used to compress this bitmap, or `0xffffffff`
     +        if no such bitmap exists.
      
     -    Co-authored-by: Taylor Blau <me@ttaylorr.com>
          Mentored-by: Taylor Blau <me@ttaylorr.com>
          Co-mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
     +    Co-authored-by: Taylor Blau <me@ttaylorr.com>
          Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
      
       ## pack-bitmap-write.c ##
      @@ pack-bitmap-write.c: static const struct object_id *oid_access(size_t pos, const void *table)
     + 	return &index[pos]->oid;
     + }
       
     ++static int commit_bitmap_writer_pos(struct object_id *oid,
     ++				    struct pack_idx_entry **index,
     ++				    uint32_t index_nr)
     ++{
     ++	return oid_pos(oid, index, index_nr, oid_access);
     ++}
     ++
       static void write_selected_commits_v1(struct hashfile *f,
       				      struct pack_idx_entry **index,
      -				      uint32_t index_nr)
      +				      uint32_t index_nr,
     -+				      uint64_t *offsets,
     -+				      uint32_t *commit_positions)
     ++				      off_t *offsets)
       {
       	int i;
       
      @@ pack-bitmap-write.c: static void write_selected_commits_v1(struct hashfile *f,
     + 		struct bitmapped_commit *stored = &writer.selected[i];
     + 
     + 		int commit_pos =
     +-			oid_pos(&stored->commit->object.oid, index, index_nr, oid_access);
     ++			commit_bitmap_writer_pos(&stored->commit->object.oid, index, index_nr);
     + 
       		if (commit_pos < 0)
       			BUG("trying to write commit not in index");
       
      +		if (offsets)
      +			offsets[i] = hashfile_total(f);
     -+		if (commit_positions)
     -+			commit_positions[i] = commit_pos;
      +
       		hashwrite_be32(f, commit_pos);
       		hashwrite_u8(f, stored->xor_offset);
     @@ pack-bitmap-write.c: static void write_selected_commits_v1(struct hashfile *f,
       	}
       }
       
     -+static int table_cmp(const void *_va, const void *_vb, void *commit_positions)
     ++static int table_cmp(const void *_va, const void *_vb, void *_data)
      +{
     -+	int8_t result = 0;
     -+	uint32_t *positions = (uint32_t *) commit_positions;
     -+	uint32_t a = positions[*(uint32_t *)_va];
     -+	uint32_t b = positions[*(uint32_t *)_vb];
     ++	uint32_t *commit_positions = _data;
     ++	uint32_t a = commit_positions[*(uint32_t *)_va];
     ++	uint32_t b = commit_positions[*(uint32_t *)_vb];
      +
      +	if (a > b)
     -+		result = 1;
     ++		return 1;
      +	else if (a < b)
     -+		result = -1;
     -+	else
     -+		result = 0;
     ++		return -1;
      +
     -+	return result;
     ++	return 0;
      +}
      +
      +static void write_lookup_table(struct hashfile *f,
     -+			       uint64_t *offsets,
     -+			       uint32_t *commit_positions)
     ++			       struct pack_idx_entry **index,
     ++			       uint32_t index_nr,
     ++			       off_t *offsets)
      +{
      +	uint32_t i;
     -+	uint32_t *table, *table_inv;
     ++	uint32_t *table, *table_inv, *commit_positions;
      +
      +	ALLOC_ARRAY(table, writer.selected_nr);
      +	ALLOC_ARRAY(table_inv, writer.selected_nr);
     ++	ALLOC_ARRAY(commit_positions, writer.selected_nr);
     ++
     ++	/* store the index positions of the commits */
     ++	for (i = 0; i < writer.selected_nr; i++) {
     ++		int pos = commit_bitmap_writer_pos(&writer.selected[i].commit->object.oid,
     ++						   index, index_nr);
     ++		if (pos < 0)
     ++			BUG(_("trying to write commit not in index"));
     ++
     ++		commit_positions[i] = pos;
     ++	}
      +
      +	for (i = 0; i < writer.selected_nr; i++)
      +		table[i] = i;
      +
     ++	/*
     ++	 * At the end of this sort table[j] = i means that the i'th
     ++	 * bitmap corresponds to j'th bitmapped commit in lex order of
     ++	 * OIDs.
     ++	 */
      +	QSORT_S(table, writer.selected_nr, table_cmp, commit_positions);
      +
     ++	/* table_inv helps us discover that relationship (i'th bitmap
     ++	 * to j'th commit by j = table_inv[i])
     ++	 */
      +	for (i = 0; i < writer.selected_nr; i++)
      +		table_inv[table[i]] = i;
      +
     ++	trace2_region_enter("pack-bitmap-write", "writing_lookup_table", the_repository);
      +	for (i = 0; i < writer.selected_nr; i++) {
      +		struct bitmapped_commit *selected = &writer.selected[table[i]];
      +		uint32_t xor_offset = selected->xor_offset;
     ++		uint32_t xor_row;
     ++
     ++		if (xor_offset) {
     ++			/*
     ++			 * xor_index stores the index (in the bitmap entries)
     ++			 * of the corresponding xor bitmap. But we need to convert
     ++			 * this index into lookup table's index. So, table_inv[xor_index]
     ++			 * gives us the index position w.r.t. the lookup table.
     ++			 *
     ++			 * If "k = table[i] - xor_offset" then the xor base is the k'th
     ++			 * bitmap. `table_inv[k]` gives us the position of that bitmap
     ++			 * in the lookup table.
     ++			 */
     ++			uint32_t xor_index = table[i] - xor_offset;
     ++			xor_row = table_inv[xor_index];
     ++		} else {
     ++			xor_row = 0xffffffff;
     ++		}
      +
      +		hashwrite_be32(f, commit_positions[table[i]]);
     -+		hashwrite_be64(f, offsets[table[i]]);
     -+		hashwrite_be32(f, xor_offset ?
     -+				table_inv[table[i] - xor_offset]: 0xffffffff);
     ++		hashwrite_be64(f, (uint64_t)offsets[table[i]]);
     ++		hashwrite_be32(f, xor_row);
      +	}
     ++	trace2_region_leave("pack-bitmap-write", "writing_lookup_table", the_repository);
      +
      +	free(table);
      +	free(table_inv);
     ++	free(commit_positions);
      +}
      +
       static void write_hash_cache(struct hashfile *f,
     @@ pack-bitmap-write.c: void bitmap_writer_finish(struct pack_idx_entry **index,
       {
       	static uint16_t default_version = 1;
       	static uint16_t flags = BITMAP_OPT_FULL_DAG;
     -+	uint64_t *offsets = NULL;
     -+	uint32_t *commit_positions = NULL;
     ++	off_t *offsets = NULL;
       	struct strbuf tmp_file = STRBUF_INIT;
       	struct hashfile *f;
       
     @@ pack-bitmap-write.c: void bitmap_writer_finish(struct pack_idx_entry **index,
       	dump_bitmap(f, writer.blobs);
       	dump_bitmap(f, writer.tags);
      -	write_selected_commits_v1(f, index, index_nr);
     - 
     -+	if (options & BITMAP_OPT_LOOKUP_TABLE) {
     ++
     ++	if (options & BITMAP_OPT_LOOKUP_TABLE)
      +		CALLOC_ARRAY(offsets, index_nr);
     -+		CALLOC_ARRAY(commit_positions, index_nr);
     -+	}
      +
     -+	write_selected_commits_v1(f, index, index_nr, offsets, commit_positions);
     ++	write_selected_commits_v1(f, index, index_nr, offsets);
      +
      +	if (options & BITMAP_OPT_LOOKUP_TABLE)
     -+		write_lookup_table(f, offsets, commit_positions);
     ++		write_lookup_table(f, index, index_nr, offsets);
     + 
       	if (options & BITMAP_OPT_HASH_CACHE)
       		write_hash_cache(f, index, index_nr);
     - 
      @@ pack-bitmap-write.c: void bitmap_writer_finish(struct pack_idx_entry **index,
       		die_errno("unable to rename temporary bitmap file to '%s'", filename);
       
       	strbuf_release(&tmp_file);
      +	free(offsets);
     -+	free(commit_positions);
       }
      
       ## pack-bitmap.h ##
 3:  7786dc879f0 ! 3:  3dc40cc7f73 pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
     @@ Metadata
       ## Commit message ##
          pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
      
     -    Teach git to provide a way for users to enable/disable bitmap lookup
     +    Teach Git to provide a way for users to enable/disable bitmap lookup
          table extension by providing a config option named 'writeBitmapLookupTable'.
     -    Default is true.
     +    Default is false.
      
          Also add test to verify writing of lookup table.
      
     -    Co-Authored-by: Taylor Blau <me@ttaylorr.com>
     -    Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
          Mentored-by: Taylor Blau <me@ttaylorr.com>
          Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
     +    Co-Authored-by: Taylor Blau <me@ttaylorr.com>
     +    Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
      
       ## Documentation/config/pack.txt ##
      @@ Documentation/config/pack.txt: When writing a multi-pack reachability bitmap, no new namehashes are
     @@ Documentation/config/pack.txt: When writing a multi-pack reachability bitmap, no
       permuted into their appropriate location when writing a new bitmap.
       
      +pack.writeBitmapLookupTable::
     -+	When true, git will include a "lookup table" section in the
     ++	When true, Git will include a "lookup table" section in the
      +	bitmap index (if one is written). This table is used to defer
      +	loading individual bitmaps as late as possible. This can be
     -+	beneficial in repositories which have relatively large bitmap
     -+	indexes. Defaults to true.
     ++	beneficial in repositories that have relatively large bitmap
     ++	indexes. Defaults to false.
      +
       pack.writeReverseIndex::
       	When true, git will write a corresponding .rev file (see:
     @@ builtin/multi-pack-index.c: static int git_multi_pack_index_write_config(const c
       	/*
       	 * We should never make a fall-back call to 'git_default_config', since
       	 * this was already called in 'cmd_multi_pack_index()'.
     -@@ builtin/multi-pack-index.c: static int cmd_multi_pack_index_write(int argc, const char **argv)
     - 	};
     - 
     - 	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
     -+	opts.flags |= MIDX_WRITE_BITMAP_LOOKUP_TABLE;
     - 
     - 	git_config(git_multi_pack_index_write_config, NULL);
     - 
      
       ## builtin/pack-objects.c ##
     -@@ builtin/pack-objects.c: static enum {
     - 	WRITE_BITMAP_QUIET,
     - 	WRITE_BITMAP_TRUE,
     - } write_bitmap_index;
     --static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE;
     -+static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE | BITMAP_OPT_LOOKUP_TABLE;
     - 
     - static int exclude_promisor_objects;
     - 
      @@ builtin/pack-objects.c: static int git_pack_config(const char *k, const char *v, void *cb)
       		else
       			write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE;
     @@ midx.h: struct multi_pack_index {
       const unsigned char *get_midx_checksum(struct multi_pack_index *m);
       void get_midx_filename(struct strbuf *out, const char *object_dir);
      
     - ## pack-bitmap-write.c ##
     -@@ pack-bitmap-write.c: static void write_lookup_table(struct hashfile *f,
     - 	for (i = 0; i < writer.selected_nr; i++)
     - 		table_inv[table[i]] = i;
     - 
     -+	trace2_region_enter("pack-bitmap-write", "writing_lookup_table", the_repository);
     - 	for (i = 0; i < writer.selected_nr; i++) {
     - 		struct bitmapped_commit *selected = &writer.selected[table[i]];
     - 		uint32_t xor_offset = selected->xor_offset;
     -@@ pack-bitmap-write.c: static void write_lookup_table(struct hashfile *f,
     - 
     - 	free(table);
     - 	free(table_inv);
     -+	trace2_region_leave("pack-bitmap-write", "writing_lookup_table", the_repository);
     + ## t/t5310-pack-bitmaps.sh ##
     +@@ t/t5310-pack-bitmaps.sh: has_any () {
     + 	grep -Ff "$1" "$2"
       }
       
     - static void write_hash_cache(struct hashfile *f,
     -
     - ## t/t5310-pack-bitmaps.sh ##
     -@@ t/t5310-pack-bitmaps.sh: test_expect_success 'full repack creates bitmaps' '
     - 	ls .git/objects/pack/ | grep bitmap >output &&
     - 	test_line_count = 1 output &&
     - 	grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace &&
     +-setup_bitmap_history
     +-
     +-test_expect_success 'setup writing bitmaps during repack' '
     +-	git config repack.writeBitmaps true
     +-'
     +-
     +-test_expect_success 'full repack creates bitmaps' '
     +-	GIT_TRACE2_EVENT="$(pwd)/trace" \
     ++test_bitmap_cases () {
     ++	writeLookupTable=false
     ++	for i in "$@"
     ++	do
     ++		case "$i" in
     ++		"pack.writeBitmapLookupTable") writeLookupTable=true;;
     ++		esac
     ++	done
     ++
     ++	test_expect_success 'setup test repository' '
     ++		rm -fr * .git &&
     ++		git init &&
     ++		git config pack.writeBitmapLookupTable '"$writeLookupTable"'
     ++	'
     ++	setup_bitmap_history
     ++
     ++	test_expect_success 'setup writing bitmaps during repack' '
     ++		git config repack.writeBitmaps true
     ++	'
     ++
     ++	test_expect_success 'full repack creates bitmaps' '
     ++		GIT_TRACE2_EVENT="$(pwd)/trace" \
     ++			git repack -ad &&
     ++		ls .git/objects/pack/ | grep bitmap >output &&
     ++		test_line_count = 1 output &&
     ++		grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace &&
     ++		grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace
     ++	'
     ++
     ++	basic_bitmap_tests
     ++
     ++	test_expect_success 'pack-objects respects --local (non-local loose)' '
     ++		git init --bare alt.git &&
     ++		echo $(pwd)/alt.git/objects >.git/objects/info/alternates &&
     ++		echo content1 >file1 &&
     ++		# non-local loose object which is not present in bitmapped pack
     ++		altblob=$(GIT_DIR=alt.git git hash-object -w file1) &&
     ++		# non-local loose object which is also present in bitmapped pack
     ++		git cat-file blob $blob | GIT_DIR=alt.git git hash-object -w --stdin &&
     ++		git add file1 &&
     ++		test_tick &&
     ++		git commit -m commit_file1 &&
     ++		echo HEAD | git pack-objects --local --stdout --revs >1.pack &&
     ++		git index-pack 1.pack &&
     ++		list_packed_objects 1.idx >1.objects &&
     ++		printf "%s\n" "$altblob" "$blob" >nonlocal-loose &&
     ++		! has_any nonlocal-loose 1.objects
     ++	'
     ++
     ++	test_expect_success 'pack-objects respects --honor-pack-keep (local non-bitmapped pack)' '
     ++		echo content2 >file2 &&
     ++		blob2=$(git hash-object -w file2) &&
     ++		git add file2 &&
     ++		test_tick &&
     ++		git commit -m commit_file2 &&
     ++		printf "%s\n" "$blob2" "$bitmaptip" >keepobjects &&
     ++		pack2=$(git pack-objects pack2 <keepobjects) &&
     ++		mv pack2-$pack2.* .git/objects/pack/ &&
     ++		>.git/objects/pack/pack2-$pack2.keep &&
     ++		rm $(objpath $blob2) &&
     ++		echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >2a.pack &&
     ++		git index-pack 2a.pack &&
     ++		list_packed_objects 2a.idx >2a.objects &&
     ++		! has_any keepobjects 2a.objects
     ++	'
     ++
     ++	test_expect_success 'pack-objects respects --local (non-local pack)' '
     ++		mv .git/objects/pack/pack2-$pack2.* alt.git/objects/pack/ &&
     ++		echo HEAD | git pack-objects --local --stdout --revs >2b.pack &&
     ++		git index-pack 2b.pack &&
     ++		list_packed_objects 2b.idx >2b.objects &&
     ++		! has_any keepobjects 2b.objects
     ++	'
     ++
     ++	test_expect_success 'pack-objects respects --honor-pack-keep (local bitmapped pack)' '
     ++		ls .git/objects/pack/ | grep bitmap >output &&
     ++		test_line_count = 1 output &&
     ++		packbitmap=$(basename $(cat output) .bitmap) &&
     ++		list_packed_objects .git/objects/pack/$packbitmap.idx >packbitmap.objects &&
     ++		test_when_finished "rm -f .git/objects/pack/$packbitmap.keep" &&
     ++		>.git/objects/pack/$packbitmap.keep &&
     ++		echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >3a.pack &&
     ++		git index-pack 3a.pack &&
     ++		list_packed_objects 3a.idx >3a.objects &&
     ++		! has_any packbitmap.objects 3a.objects
     ++	'
     ++
     ++	test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' '
     ++		mv .git/objects/pack/$packbitmap.* alt.git/objects/pack/ &&
     ++		rm -f .git/objects/pack/multi-pack-index &&
     ++		test_when_finished "mv alt.git/objects/pack/$packbitmap.* .git/objects/pack/" &&
     ++		echo HEAD | git pack-objects --local --stdout --revs >3b.pack &&
     ++		git index-pack 3b.pack &&
     ++		list_packed_objects 3b.idx >3b.objects &&
     ++		! has_any packbitmap.objects 3b.objects
     ++	'
     ++
     ++	test_expect_success 'pack-objects to file can use bitmap' '
     ++		# make sure we still have 1 bitmap index from previous tests
     ++		ls .git/objects/pack/ | grep bitmap >output &&
     ++		test_line_count = 1 output &&
     ++		# verify equivalent packs are generated with/without using bitmap index
     ++		packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) &&
     ++		packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) &&
     ++		list_packed_objects packa-$packasha1.idx >packa.objects &&
     ++		list_packed_objects packb-$packbsha1.idx >packb.objects &&
     ++		test_cmp packa.objects packb.objects
     ++	'
     ++
     ++	test_expect_success 'full repack, reusing previous bitmaps' '
     + 		git repack -ad &&
     +-	ls .git/objects/pack/ | grep bitmap >output &&
     +-	test_line_count = 1 output &&
     +-	grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace &&
      -	grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace
     -+	grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace &&
     -+	grep "\"label\":\"writing_lookup_table\"" trace
     +-'
     ++		ls .git/objects/pack/ | grep bitmap >output &&
     ++		test_line_count = 1 output
     ++	'
     ++
     ++	test_expect_success 'fetch (full bitmap)' '
     ++		git --git-dir=clone.git fetch origin second:second &&
     ++		git rev-parse HEAD >expect &&
     ++		git --git-dir=clone.git rev-parse HEAD >actual &&
     ++		test_cmp expect actual
     ++	'
     ++
     ++	test_expect_success 'create objects for missing-HAVE tests' '
     ++		blob=$(echo "missing have" | git hash-object -w --stdin) &&
     ++		tree=$(printf "100644 blob $blob\tfile\n" | git mktree) &&
     ++		parent=$(echo parent | git commit-tree $tree) &&
     ++		commit=$(echo commit | git commit-tree $tree -p $parent) &&
     ++		cat >revs <<-EOF
     ++		HEAD
     ++		^HEAD^
     ++		^$commit
     ++		EOF
     ++	'
     ++
     ++	test_expect_success 'pack-objects respects --incremental' '
     ++		cat >revs2 <<-EOF &&
     ++		HEAD
     ++		$commit
     ++		EOF
     ++		git pack-objects --incremental --stdout --revs <revs2 >4.pack &&
     ++		git index-pack 4.pack &&
     ++		list_packed_objects 4.idx >4.objects &&
     ++		test_line_count = 4 4.objects &&
     ++		git rev-list --objects $commit >revlist &&
     ++		cut -d" " -f1 revlist |sort >objects &&
     ++		test_cmp 4.objects objects
     ++	'
     ++
     ++	test_expect_success 'pack with missing blob' '
     ++		rm $(objpath $blob) &&
     ++		git pack-objects --stdout --revs <revs >/dev/null
     ++	'
     ++
     ++	test_expect_success 'pack with missing tree' '
     ++		rm $(objpath $tree) &&
     ++		git pack-objects --stdout --revs <revs >/dev/null
     ++	'
     ++
     ++	test_expect_success 'pack with missing parent' '
     ++		rm $(objpath $parent) &&
     ++		git pack-objects --stdout --revs <revs >/dev/null
     ++	'
     ++
     ++	test_expect_success JGIT,SHA1 'we can read jgit bitmaps' '
     ++		git clone --bare . compat-jgit.git &&
     ++		(
     ++			cd compat-jgit.git &&
     ++			rm -f objects/pack/*.bitmap &&
     ++			jgit gc &&
     ++			git rev-list --test-bitmap HEAD
     ++		)
     ++	'
     ++
     ++	test_expect_success JGIT,SHA1 'jgit can read our bitmaps' '
     ++		git clone --bare . compat-us.git &&
     ++		(
     ++			cd compat-us.git &&
     ++			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
     ++			git repack -adb &&
     ++			# jgit gc will barf if it does not like our bitmaps
     ++			jgit gc
     ++		)
     ++	'
     ++
     ++	test_expect_success 'splitting packs does not generate bogus bitmaps' '
     ++		test-tool genrandom foo $((1024 * 1024)) >rand &&
     ++		git add rand &&
     ++		git commit -m "commit with big file" &&
     ++		git -c pack.packSizeLimit=500k repack -adb &&
     ++		git init --bare no-bitmaps.git &&
     ++		git -C no-bitmaps.git fetch .. HEAD
     ++	'
     ++
     ++	test_expect_success 'set up reusable pack' '
     ++		rm -f .git/objects/pack/*.keep &&
     ++		git repack -adb &&
     ++		reusable_pack () {
     ++			git for-each-ref --format="%(objectname)" |
     ++			git pack-objects --delta-base-offset --revs --stdout "$@"
     ++		}
     ++	'
     ++
     ++	test_expect_success 'pack reuse respects --honor-pack-keep' '
     ++		test_when_finished "rm -f .git/objects/pack/*.keep" &&
     ++		for i in .git/objects/pack/*.pack
     ++		do
     ++			>${i%.pack}.keep || return 1
     ++		done &&
     ++		reusable_pack --honor-pack-keep >empty.pack &&
     ++		git index-pack empty.pack &&
     ++		git show-index <empty.idx >actual &&
     ++		test_must_be_empty actual
     ++	'
     ++
     ++	test_expect_success 'pack reuse respects --local' '
     ++		mv .git/objects/pack/* alt.git/objects/pack/ &&
     ++		test_when_finished "mv alt.git/objects/pack/* .git/objects/pack/" &&
     ++		reusable_pack --local >empty.pack &&
     ++		git index-pack empty.pack &&
     ++		git show-index <empty.idx >actual &&
     ++		test_must_be_empty actual
     ++	'
     ++
     ++	test_expect_success 'pack reuse respects --incremental' '
     ++		reusable_pack --incremental >empty.pack &&
     ++		git index-pack empty.pack &&
     ++		git show-index <empty.idx >actual &&
     ++		test_must_be_empty actual
     ++	'
     ++
     ++	test_expect_success 'truncated bitmap fails gracefully (ewah)' '
     ++		test_config pack.writebitmaphashcache false &&
     ++		git repack -ad &&
     ++		git rev-list --use-bitmap-index --count --all >expect &&
     ++		bitmap=$(ls .git/objects/pack/*.bitmap) &&
     ++		test_when_finished "rm -f $bitmap" &&
     ++		test_copy_bytes 256 <$bitmap >$bitmap.tmp &&
     ++		mv -f $bitmap.tmp $bitmap &&
     ++		git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
     ++		test_cmp expect actual &&
     ++		test_i18ngrep corrupt.ewah.bitmap stderr
     ++	'
     ++
     ++	test_expect_success 'truncated bitmap fails gracefully (cache)' '
     ++		git repack -ad &&
     ++		git rev-list --use-bitmap-index --count --all >expect &&
     ++		bitmap=$(ls .git/objects/pack/*.bitmap) &&
     ++		test_when_finished "rm -f $bitmap" &&
     ++		test_copy_bytes 512 <$bitmap >$bitmap.tmp &&
     ++		mv -f $bitmap.tmp $bitmap &&
     ++		git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
     ++		test_cmp expect actual &&
     ++		test_i18ngrep corrupted.bitmap.index stderr
     ++	'
     ++
     ++	# Create a state of history with these properties:
     ++	#
     ++	#  - refs that allow a client to fetch some new history, while sharing some old
     ++	#    history with the server; we use branches delta-reuse-old and
     ++	#    delta-reuse-new here
     ++	#
     ++	#  - the new history contains an object that is stored on the server as a delta
     ++	#    against a base that is in the old history
     ++	#
     ++	#  - the base object is not immediately reachable from the tip of the old
     ++	#    history; finding it would involve digging down through history we know the
     ++	#    other side has
     ++	#
     ++	# This should result in a state where fetching from old->new would not
     ++	# traditionally reuse the on-disk delta (because we'd have to dig to realize
     ++	# that the client has it), but we will do so if bitmaps can tell us cheaply
     ++	# that the other side has it.
     ++	test_expect_success 'set up thin delta-reuse parent' '
     ++		# This first commit contains the buried base object.
     ++		test-tool genrandom delta 16384 >file &&
     ++		git add file &&
     ++		git commit -m "delta base" &&
     ++		base=$(git rev-parse --verify HEAD:file) &&
     ++
     ++		# These intermediate commits bury the base back in history.
     ++		# This becomes the "old" state.
     ++		for i in 1 2 3 4 5
     ++		do
     ++			echo $i >file &&
     ++			git commit -am "intermediate $i" || return 1
     ++		done &&
     ++		git branch delta-reuse-old &&
     ++
     ++		# And now our new history has a delta against the buried base. Note
     ++		# that this must be smaller than the original file, since pack-objects
     ++		# prefers to create deltas from smaller objects to larger.
     ++		test-tool genrandom delta 16300 >file &&
     ++		git commit -am "delta result" &&
     ++		delta=$(git rev-parse --verify HEAD:file) &&
     ++		git branch delta-reuse-new &&
     ++
     ++		# Repack with bitmaps and double check that we have the expected delta
     ++		# relationship.
     ++		git repack -adb &&
     ++		have_delta $delta $base
     ++	'
     ++
     ++	# Now we can sanity-check the non-bitmap behavior (that the server is not able
     ++	# to reuse the delta). This isn't strictly something we care about, so this
     ++	# test could be scrapped in the future. But it makes sure that the next test is
     ++	# actually triggering the feature we want.
     ++	#
     ++	# Note that our tools for working with on-the-wire "thin" packs are limited. So
     ++	# we actually perform the fetch, retain the resulting pack, and inspect the
     ++	# result.
     ++	test_expect_success 'fetch without bitmaps ignores delta against old base' '
     ++		test_config pack.usebitmaps false &&
     ++		test_when_finished "rm -rf client.git" &&
     ++		git init --bare client.git &&
     ++		(
     ++			cd client.git &&
     ++			git config transfer.unpackLimit 1 &&
     ++			git fetch .. delta-reuse-old:delta-reuse-old &&
     ++			git fetch .. delta-reuse-new:delta-reuse-new &&
     ++			have_delta $delta $ZERO_OID
     ++		)
     ++	'
     ++
     ++	# And do the same for the bitmap case, where we do expect to find the delta.
     ++	test_expect_success 'fetch with bitmaps can reuse old base' '
     ++		test_config pack.usebitmaps true &&
     ++		test_when_finished "rm -rf client.git" &&
     ++		git init --bare client.git &&
     ++		(
     ++			cd client.git &&
     ++			git config transfer.unpackLimit 1 &&
     ++			git fetch .. delta-reuse-old:delta-reuse-old &&
     ++			git fetch .. delta-reuse-new:delta-reuse-new &&
     ++			have_delta $delta $base
     ++		)
     ++	'
     ++
     ++	test_expect_success 'pack.preferBitmapTips' '
     ++		git init repo &&
     ++		test_when_finished "rm -fr repo" &&
     ++		(
     ++			cd repo &&
     ++			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
     ++
     ++			# create enough commits that not all receive bitmap
     ++			# coverage even if they are all at the tip of some reference.
     ++			test_commit_bulk --message="%s" 103 &&
     ++
     ++			git rev-list HEAD >commits.raw &&
     ++			sort <commits.raw >commits &&
     ++
     ++			git log --format="create refs/tags/%s %H" HEAD >refs &&
     ++			git update-ref --stdin <refs &&
     ++
     ++			git repack -adb &&
     ++			test-tool bitmap list-commits | sort >bitmaps &&
     ++
     ++			# remember which commits did not receive bitmaps
     ++			comm -13 bitmaps commits >before &&
     ++			test_file_not_empty before &&
     ++
     ++			# mark the commits which did not receive bitmaps as preferred,
     ++			# and generate the bitmap again
     ++			perl -pe "s{^}{create refs/tags/include/$. }" <before |
     ++				git update-ref --stdin &&
     ++			git -c pack.preferBitmapTips=refs/tags/include repack -adb &&
     ++
     ++			# finally, check that the commit(s) without bitmap coverage
     ++			# are not the same ones as before
     ++			test-tool bitmap list-commits | sort >bitmaps &&
     ++			comm -13 bitmaps commits >after &&
     ++
     ++			! test_cmp before after
     ++		)
     ++	'
     ++
     ++	test_expect_success 'complains about multiple pack bitmaps' '
     ++		rm -fr repo &&
     ++		git init repo &&
     ++		test_when_finished "rm -fr repo" &&
     ++		(
     ++			cd repo &&
     ++			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
     ++
     ++			test_commit base &&
     ++
     ++			git repack -adb &&
     ++			bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
     ++			mv "$bitmap" "$bitmap.bak" &&
     ++
     ++			test_commit other &&
     ++			git repack -ab &&
     ++
     ++			mv "$bitmap.bak" "$bitmap" &&
     ++
     ++			find .git/objects/pack -type f -name "*.pack" >packs &&
     ++			find .git/objects/pack -type f -name "*.bitmap" >bitmaps &&
     ++			test_line_count = 2 packs &&
     ++			test_line_count = 2 bitmaps &&
     ++
     ++			git rev-list --use-bitmap-index HEAD 2>err &&
     ++			grep "ignoring extra bitmap file" err
     ++		)
     ++	'
     ++}
     + 
     +-basic_bitmap_tests
     ++test_bitmap_cases
     + 
     + test_expect_success 'incremental repack fails when bitmaps are requested' '
     + 	test_commit more-1 &&
     +@@ t/t5310-pack-bitmaps.sh: test_expect_success 'incremental repack can disable bitmaps' '
     + 	git repack -d --no-write-bitmap-index
     + '
     + 
     +-test_expect_success 'pack-objects respects --local (non-local loose)' '
     +-	git init --bare alt.git &&
     +-	echo $(pwd)/alt.git/objects >.git/objects/info/alternates &&
     +-	echo content1 >file1 &&
     +-	# non-local loose object which is not present in bitmapped pack
     +-	altblob=$(GIT_DIR=alt.git git hash-object -w file1) &&
     +-	# non-local loose object which is also present in bitmapped pack
     +-	git cat-file blob $blob | GIT_DIR=alt.git git hash-object -w --stdin &&
     +-	git add file1 &&
     +-	test_tick &&
     +-	git commit -m commit_file1 &&
     +-	echo HEAD | git pack-objects --local --stdout --revs >1.pack &&
     +-	git index-pack 1.pack &&
     +-	list_packed_objects 1.idx >1.objects &&
     +-	printf "%s\n" "$altblob" "$blob" >nonlocal-loose &&
     +-	! has_any nonlocal-loose 1.objects
     +-'
     +-
     +-test_expect_success 'pack-objects respects --honor-pack-keep (local non-bitmapped pack)' '
     +-	echo content2 >file2 &&
     +-	blob2=$(git hash-object -w file2) &&
     +-	git add file2 &&
     +-	test_tick &&
     +-	git commit -m commit_file2 &&
     +-	printf "%s\n" "$blob2" "$bitmaptip" >keepobjects &&
     +-	pack2=$(git pack-objects pack2 <keepobjects) &&
     +-	mv pack2-$pack2.* .git/objects/pack/ &&
     +-	>.git/objects/pack/pack2-$pack2.keep &&
     +-	rm $(objpath $blob2) &&
     +-	echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >2a.pack &&
     +-	git index-pack 2a.pack &&
     +-	list_packed_objects 2a.idx >2a.objects &&
     +-	! has_any keepobjects 2a.objects
     +-'
     +-
     +-test_expect_success 'pack-objects respects --local (non-local pack)' '
     +-	mv .git/objects/pack/pack2-$pack2.* alt.git/objects/pack/ &&
     +-	echo HEAD | git pack-objects --local --stdout --revs >2b.pack &&
     +-	git index-pack 2b.pack &&
     +-	list_packed_objects 2b.idx >2b.objects &&
     +-	! has_any keepobjects 2b.objects
     +-'
     +-
     +-test_expect_success 'pack-objects respects --honor-pack-keep (local bitmapped pack)' '
     +-	ls .git/objects/pack/ | grep bitmap >output &&
     +-	test_line_count = 1 output &&
     +-	packbitmap=$(basename $(cat output) .bitmap) &&
     +-	list_packed_objects .git/objects/pack/$packbitmap.idx >packbitmap.objects &&
     +-	test_when_finished "rm -f .git/objects/pack/$packbitmap.keep" &&
     +-	>.git/objects/pack/$packbitmap.keep &&
     +-	echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >3a.pack &&
     +-	git index-pack 3a.pack &&
     +-	list_packed_objects 3a.idx >3a.objects &&
     +-	! has_any packbitmap.objects 3a.objects
     +-'
     +-
     +-test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' '
     +-	mv .git/objects/pack/$packbitmap.* alt.git/objects/pack/ &&
     +-	rm -f .git/objects/pack/multi-pack-index &&
     +-	test_when_finished "mv alt.git/objects/pack/$packbitmap.* .git/objects/pack/" &&
     +-	echo HEAD | git pack-objects --local --stdout --revs >3b.pack &&
     +-	git index-pack 3b.pack &&
     +-	list_packed_objects 3b.idx >3b.objects &&
     +-	! has_any packbitmap.objects 3b.objects
     +-'
     +-
     +-test_expect_success 'pack-objects to file can use bitmap' '
     +-	# make sure we still have 1 bitmap index from previous tests
     +-	ls .git/objects/pack/ | grep bitmap >output &&
     +-	test_line_count = 1 output &&
     +-	# verify equivalent packs are generated with/without using bitmap index
     +-	packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) &&
     +-	packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) &&
     +-	list_packed_objects packa-$packasha1.idx >packa.objects &&
     +-	list_packed_objects packb-$packbsha1.idx >packb.objects &&
     +-	test_cmp packa.objects packb.objects
     +-'
     +-
     +-test_expect_success 'full repack, reusing previous bitmaps' '
     +-	git repack -ad &&
     +-	ls .git/objects/pack/ | grep bitmap >output &&
     +-	test_line_count = 1 output
     +-'
     +-
     +-test_expect_success 'fetch (full bitmap)' '
     +-	git --git-dir=clone.git fetch origin second:second &&
     +-	git rev-parse HEAD >expect &&
     +-	git --git-dir=clone.git rev-parse HEAD >actual &&
     +-	test_cmp expect actual
     +-'
     +-
     +-test_expect_success 'create objects for missing-HAVE tests' '
     +-	blob=$(echo "missing have" | git hash-object -w --stdin) &&
     +-	tree=$(printf "100644 blob $blob\tfile\n" | git mktree) &&
     +-	parent=$(echo parent | git commit-tree $tree) &&
     +-	commit=$(echo commit | git commit-tree $tree -p $parent) &&
     +-	cat >revs <<-EOF
     +-	HEAD
     +-	^HEAD^
     +-	^$commit
     +-	EOF
     +-'
     +-
     +-test_expect_success 'pack-objects respects --incremental' '
     +-	cat >revs2 <<-EOF &&
     +-	HEAD
     +-	$commit
     +-	EOF
     +-	git pack-objects --incremental --stdout --revs <revs2 >4.pack &&
     +-	git index-pack 4.pack &&
     +-	list_packed_objects 4.idx >4.objects &&
     +-	test_line_count = 4 4.objects &&
     +-	git rev-list --objects $commit >revlist &&
     +-	cut -d" " -f1 revlist |sort >objects &&
     +-	test_cmp 4.objects objects
     +-'
     +-
     +-test_expect_success 'pack with missing blob' '
     +-	rm $(objpath $blob) &&
     +-	git pack-objects --stdout --revs <revs >/dev/null
     +-'
     ++test_bitmap_cases "pack.writeBitmapLookupTable"
     + 
     +-test_expect_success 'pack with missing tree' '
     +-	rm $(objpath $tree) &&
     +-	git pack-objects --stdout --revs <revs >/dev/null
     +-'
     +-
     +-test_expect_success 'pack with missing parent' '
     +-	rm $(objpath $parent) &&
     +-	git pack-objects --stdout --revs <revs >/dev/null
     +-'
     +-
     +-test_expect_success JGIT,SHA1 'we can read jgit bitmaps' '
     +-	git clone --bare . compat-jgit.git &&
     +-	(
     +-		cd compat-jgit.git &&
     +-		rm -f objects/pack/*.bitmap &&
     +-		jgit gc &&
     +-		git rev-list --test-bitmap HEAD
     +-	)
     +-'
     +-
     +-test_expect_success JGIT,SHA1 'jgit can read our bitmaps' '
     +-	git clone --bare . compat-us.git &&
     +-	(
     +-		cd compat-us.git &&
     +-		git repack -adb &&
     +-		# jgit gc will barf if it does not like our bitmaps
     +-		jgit gc
     +-	)
     +-'
     +-
     +-test_expect_success 'splitting packs does not generate bogus bitmaps' '
     +-	test-tool genrandom foo $((1024 * 1024)) >rand &&
     +-	git add rand &&
     +-	git commit -m "commit with big file" &&
     +-	git -c pack.packSizeLimit=500k repack -adb &&
     +-	git init --bare no-bitmaps.git &&
     +-	git -C no-bitmaps.git fetch .. HEAD
     +-'
     +-
     +-test_expect_success 'set up reusable pack' '
     +-	rm -f .git/objects/pack/*.keep &&
     +-	git repack -adb &&
     +-	reusable_pack () {
     +-		git for-each-ref --format="%(objectname)" |
     +-		git pack-objects --delta-base-offset --revs --stdout "$@"
     +-	}
     +-'
     +-
     +-test_expect_success 'pack reuse respects --honor-pack-keep' '
     +-	test_when_finished "rm -f .git/objects/pack/*.keep" &&
     +-	for i in .git/objects/pack/*.pack
     +-	do
     +-		>${i%.pack}.keep || return 1
     +-	done &&
     +-	reusable_pack --honor-pack-keep >empty.pack &&
     +-	git index-pack empty.pack &&
     +-	git show-index <empty.idx >actual &&
     +-	test_must_be_empty actual
     +-'
     +-
     +-test_expect_success 'pack reuse respects --local' '
     +-	mv .git/objects/pack/* alt.git/objects/pack/ &&
     +-	test_when_finished "mv alt.git/objects/pack/* .git/objects/pack/" &&
     +-	reusable_pack --local >empty.pack &&
     +-	git index-pack empty.pack &&
     +-	git show-index <empty.idx >actual &&
     +-	test_must_be_empty actual
     +-'
     +-
     +-test_expect_success 'pack reuse respects --incremental' '
     +-	reusable_pack --incremental >empty.pack &&
     +-	git index-pack empty.pack &&
     +-	git show-index <empty.idx >actual &&
     +-	test_must_be_empty actual
     +-'
     +-
     +-test_expect_success 'truncated bitmap fails gracefully (ewah)' '
     +-	test_config pack.writebitmaphashcache false &&
     +-	git repack -ad &&
     +-	git rev-list --use-bitmap-index --count --all >expect &&
     +-	bitmap=$(ls .git/objects/pack/*.bitmap) &&
     +-	test_when_finished "rm -f $bitmap" &&
     +-	test_copy_bytes 256 <$bitmap >$bitmap.tmp &&
     +-	mv -f $bitmap.tmp $bitmap &&
     +-	git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
     +-	test_cmp expect actual &&
     +-	test_i18ngrep corrupt.ewah.bitmap stderr
     +-'
     +-
     +-test_expect_success 'truncated bitmap fails gracefully (cache)' '
     +-	git repack -ad &&
     +-	git rev-list --use-bitmap-index --count --all >expect &&
     +-	bitmap=$(ls .git/objects/pack/*.bitmap) &&
     +-	test_when_finished "rm -f $bitmap" &&
     +-	test_copy_bytes 512 <$bitmap >$bitmap.tmp &&
     +-	mv -f $bitmap.tmp $bitmap &&
     +-	git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
     +-	test_cmp expect actual &&
     +-	test_i18ngrep corrupted.bitmap.index stderr
     +-'
     +-
     +-# Create a state of history with these properties:
     +-#
     +-#  - refs that allow a client to fetch some new history, while sharing some old
     +-#    history with the server; we use branches delta-reuse-old and
     +-#    delta-reuse-new here
     +-#
     +-#  - the new history contains an object that is stored on the server as a delta
     +-#    against a base that is in the old history
     +-#
     +-#  - the base object is not immediately reachable from the tip of the old
     +-#    history; finding it would involve digging down through history we know the
     +-#    other side has
     +-#
     +-# This should result in a state where fetching from old->new would not
     +-# traditionally reuse the on-disk delta (because we'd have to dig to realize
     +-# that the client has it), but we will do so if bitmaps can tell us cheaply
     +-# that the other side has it.
     +-test_expect_success 'set up thin delta-reuse parent' '
     +-	# This first commit contains the buried base object.
     +-	test-tool genrandom delta 16384 >file &&
     +-	git add file &&
     +-	git commit -m "delta base" &&
     +-	base=$(git rev-parse --verify HEAD:file) &&
     +-
     +-	# These intermediate commits bury the base back in history.
     +-	# This becomes the "old" state.
     +-	for i in 1 2 3 4 5
     +-	do
     +-		echo $i >file &&
     +-		git commit -am "intermediate $i" || return 1
     +-	done &&
     +-	git branch delta-reuse-old &&
     +-
     +-	# And now our new history has a delta against the buried base. Note
     +-	# that this must be smaller than the original file, since pack-objects
     +-	# prefers to create deltas from smaller objects to larger.
     +-	test-tool genrandom delta 16300 >file &&
     +-	git commit -am "delta result" &&
     +-	delta=$(git rev-parse --verify HEAD:file) &&
     +-	git branch delta-reuse-new &&
     +-
     +-	# Repack with bitmaps and double check that we have the expected delta
     +-	# relationship.
     +-	git repack -adb &&
     +-	have_delta $delta $base
     +-'
     +-
     +-# Now we can sanity-check the non-bitmap behavior (that the server is not able
     +-# to reuse the delta). This isn't strictly something we care about, so this
     +-# test could be scrapped in the future. But it makes sure that the next test is
     +-# actually triggering the feature we want.
     +-#
     +-# Note that our tools for working with on-the-wire "thin" packs are limited. So
     +-# we actually perform the fetch, retain the resulting pack, and inspect the
     +-# result.
     +-test_expect_success 'fetch without bitmaps ignores delta against old base' '
     +-	test_config pack.usebitmaps false &&
     +-	test_when_finished "rm -rf client.git" &&
     +-	git init --bare client.git &&
     +-	(
     +-		cd client.git &&
     +-		git config transfer.unpackLimit 1 &&
     +-		git fetch .. delta-reuse-old:delta-reuse-old &&
     +-		git fetch .. delta-reuse-new:delta-reuse-new &&
     +-		have_delta $delta $ZERO_OID
     +-	)
     +-'
     +-
     +-# And do the same for the bitmap case, where we do expect to find the delta.
     +-test_expect_success 'fetch with bitmaps can reuse old base' '
     +-	test_config pack.usebitmaps true &&
     +-	test_when_finished "rm -rf client.git" &&
     +-	git init --bare client.git &&
     +-	(
     +-		cd client.git &&
     +-		git config transfer.unpackLimit 1 &&
     +-		git fetch .. delta-reuse-old:delta-reuse-old &&
     +-		git fetch .. delta-reuse-new:delta-reuse-new &&
     +-		have_delta $delta $base
     +-	)
     +-'
     +-
     +-test_expect_success 'pack.preferBitmapTips' '
     +-	git init repo &&
     +-	test_when_finished "rm -fr repo" &&
     +-	(
     +-		cd repo &&
     +-
     +-		# create enough commits that not all are receive bitmap
     +-		# coverage even if they are all at the tip of some reference.
     +-		test_commit_bulk --message="%s" 103 &&
     +-
     +-		git rev-list HEAD >commits.raw &&
     +-		sort <commits.raw >commits &&
     +-
     +-		git log --format="create refs/tags/%s %H" HEAD >refs &&
     +-		git update-ref --stdin <refs &&
     +-
     +-		git repack -adb &&
     +-		test-tool bitmap list-commits | sort >bitmaps &&
     +-
     +-		# remember which commits did not receive bitmaps
     +-		comm -13 bitmaps commits >before &&
     +-		test_file_not_empty before &&
     +-
     +-		# mark the commits which did not receive bitmaps as preferred,
     +-		# and generate the bitmap again
     +-		perl -pe "s{^}{create refs/tags/include/$. }" <before |
     +-			git update-ref --stdin &&
     +-		git -c pack.preferBitmapTips=refs/tags/include repack -adb &&
     +-
     +-		# finally, check that the commit(s) without bitmap coverage
     +-		# are not the same ones as before
     +-		test-tool bitmap list-commits | sort >bitmaps &&
     +-		comm -13 bitmaps commits >after &&
     +-
     +-		! test_cmp before after
     +-	)
     +-'
     +-
     +-test_expect_success 'complains about multiple pack bitmaps' '
     +-	rm -fr repo &&
     +-	git init repo &&
     +-	test_when_finished "rm -fr repo" &&
     +-	(
     +-		cd repo &&
     +-
     +-		test_commit base &&
     +-
     +-		git repack -adb &&
     +-		bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
     +-		mv "$bitmap" "$bitmap.bak" &&
     +-
     +-		test_commit other &&
     +-		git repack -ab &&
     +-
     +-		mv "$bitmap.bak" "$bitmap" &&
     +-
     +-		find .git/objects/pack -type f -name "*.pack" >packs &&
     +-		find .git/objects/pack -type f -name "*.bitmap" >bitmaps &&
     +-		test_line_count = 2 packs &&
     +-		test_line_count = 2 bitmaps &&
     +-
     +-		git rev-list --use-bitmap-index HEAD 2>err &&
     +-		grep "ignoring extra bitmap file" err
     +-	)
     ++test_expect_success 'verify writing bitmap lookup table when enabled' '
     ++	GIT_TRACE2_EVENT="$(pwd)/trace2" \
     ++		git repack -ad &&
     ++	grep "\"label\":\"writing_lookup_table\"" trace2
       '
       
     - basic_bitmap_tests
     + test_done
     +
     + ## t/t5311-pack-bitmaps-shallow.sh ##
     +@@ t/t5311-pack-bitmaps-shallow.sh: test_description='check bitmap operation with shallow repositories'
     + # the tree for A. But in a shallow one, we've grafted away
     + # A, and fetching A to B requires that the other side send
     + # us the tree for file=1.
     +-test_expect_success 'setup shallow repo' '
     +-	echo 1 >file &&
     +-	git add file &&
     +-	git commit -m orig &&
     +-	echo 2 >file &&
     +-	git commit -a -m update &&
     +-	git clone --no-local --bare --depth=1 . shallow.git &&
     +-	echo 1 >file &&
     +-	git commit -a -m repeat
     +-'
     +-
     +-test_expect_success 'turn on bitmaps in the parent' '
     +-	git repack -adb
     +-'
     +-
     +-test_expect_success 'shallow fetch from bitmapped repo' '
     +-	(cd shallow.git && git fetch)
     +-'
     ++test_shallow_bitmaps () {
     ++	writeLookupTable=false
     ++
     ++	for i in "$@"
     ++	do
     ++		case $i in
     ++		"pack.writeBitmapLookupTable") writeLookupTable=true;;
     ++		esac
     ++	done
     ++
     ++	test_expect_success 'setup shallow repo' '
     ++		rm -rf * .git &&
     ++		git init &&
     ++		git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
     ++		echo 1 >file &&
     ++		git add file &&
     ++		git commit -m orig &&
     ++		echo 2 >file &&
     ++		git commit -a -m update &&
     ++		git clone --no-local --bare --depth=1 . shallow.git &&
     ++		echo 1 >file &&
     ++		git commit -a -m repeat
     ++	'
     ++
     ++	test_expect_success 'turn on bitmaps in the parent' '
     ++		git repack -adb
     ++	'
     ++
     ++	test_expect_success 'shallow fetch from bitmapped repo' '
     ++		(cd shallow.git && git fetch)
     ++	'
     ++}
     ++
     ++test_shallow_bitmaps
     ++
     + 
     + test_done
      
       ## t/t5326-multi-pack-bitmaps.sh ##
     -@@ t/t5326-multi-pack-bitmaps.sh: test_expect_success 'graceful fallback when missing reverse index' '
     - 	)
     - '
     +@@ t/t5326-multi-pack-bitmaps.sh: GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
     + sane_unset GIT_TEST_MIDX_WRITE_REV
     + sane_unset GIT_TEST_MIDX_READ_RIDX
       
     +-midx_bitmap_core
     +-
     + bitmap_reuse_tests() {
     + 	from=$1
     + 	to=$2
     ++	writeLookupTable=false
     ++
     ++	for i in "$@"
     ++	do
     ++		case $i in
     ++		"pack.writeBitmapLookupTable") writeLookupTable=true;;
     ++		esac
     ++	done
     + 
     + 	test_expect_success "setup pack reuse tests ($from -> $to)" '
     + 		rm -fr repo &&
     + 		git init repo &&
     + 		(
     + 			cd repo &&
     ++			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
     + 			test_commit_bulk 16 &&
     + 			git tag old-tip &&
     + 
     +@@ t/t5326-multi-pack-bitmaps.sh: bitmap_reuse_tests() {
     + 	test_expect_success "build bitmap from existing ($from -> $to)" '
     + 		(
     + 			cd repo &&
     ++			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
     + 			test_commit_bulk --id=further 16 &&
     + 			git tag new-tip &&
     + 
     +@@ t/t5326-multi-pack-bitmaps.sh: bitmap_reuse_tests() {
     + 	test_expect_success "verify resulting bitmaps ($from -> $to)" '
     + 		(
     + 			cd repo &&
     ++			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
     + 			git for-each-ref &&
     + 			git rev-list --test-bitmap refs/tags/old-tip &&
     + 			git rev-list --test-bitmap refs/tags/new-tip
     +@@ t/t5326-multi-pack-bitmaps.sh: bitmap_reuse_tests() {
     + 	'
     + }
     + 
     +-bitmap_reuse_tests 'pack' 'MIDX'
     +-bitmap_reuse_tests 'MIDX' 'pack'
     +-bitmap_reuse_tests 'MIDX' 'MIDX'
     ++test_midx_bitmap_cases () {
     ++	writeLookupTable=false
     ++	writeBitmapLookupTable=
     ++
     ++	for i in "$@"
     ++	do
     ++		case $i in
     ++		"pack.writeBitmapLookupTable")
     ++			writeLookupTable=true
     ++			writeBitmapLookupTable="$i"
     ++			;;
     ++		esac
     ++	done
     ++
     ++	test_expect_success 'setup test_repository' '
     ++		rm -rf * .git &&
     ++		git init &&
     ++		git config pack.writeBitmapLookupTable '"$writeLookupTable"'
     ++	'
     + 
     +-test_expect_success 'missing object closure fails gracefully' '
     +-	rm -fr repo &&
     +-	git init repo &&
     +-	test_when_finished "rm -fr repo" &&
     +-	(
     +-		cd repo &&
     ++	midx_bitmap_core
     + 
     +-		test_commit loose &&
     +-		test_commit packed &&
     ++	bitmap_reuse_tests 'pack' 'MIDX' "$writeBitmapLookupTable"
     ++	bitmap_reuse_tests 'MIDX' 'pack' "$writeBitmapLookupTable"
     ++	bitmap_reuse_tests 'MIDX' 'MIDX' "$writeBitmapLookupTable"
     + 
     +-		# Do not pass "--revs"; we want a pack without the "loose"
     +-		# commit.
     +-		git pack-objects $objdir/pack/pack <<-EOF &&
     +-		$(git rev-parse packed)
     +-		EOF
     ++	test_expect_success 'missing object closure fails gracefully' '
     ++		rm -fr repo &&
     ++		git init repo &&
     ++		test_when_finished "rm -fr repo" &&
     ++		(
     ++			cd repo &&
     ++			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
     + 
     +-		test_must_fail git multi-pack-index write --bitmap 2>err &&
     +-		grep "doesn.t have full closure" err &&
     +-		test_path_is_missing $midx
     +-	)
     +-'
     ++			test_commit loose &&
     ++			test_commit packed &&
     + 
     +-midx_bitmap_partial_tests
     ++			# Do not pass "--revs"; we want a pack without the "loose"
     ++			# commit.
     ++			git pack-objects $objdir/pack/pack <<-EOF &&
     ++			$(git rev-parse packed)
     ++			EOF
     + 
     +-test_expect_success 'removing a MIDX clears stale bitmaps' '
     +-	rm -fr repo &&
     +-	git init repo &&
     +-	test_when_finished "rm -fr repo" &&
     +-	(
     +-		cd repo &&
     +-		test_commit base &&
     +-		git repack &&
     +-		git multi-pack-index write --bitmap &&
     ++			test_must_fail git multi-pack-index write --bitmap 2>err &&
     ++			grep "doesn.t have full closure" err &&
     ++			test_path_is_missing $midx
     ++		)
     ++	'
     + 
     +-		# Write a MIDX and bitmap; remove the MIDX but leave the bitmap.
     +-		stale_bitmap=$midx-$(midx_checksum $objdir).bitmap &&
     +-		rm $midx &&
     ++	midx_bitmap_partial_tests
     + 
     +-		# Then write a new MIDX.
     +-		test_commit new &&
     +-		git repack &&
     +-		git multi-pack-index write --bitmap &&
     ++	test_expect_success 'removing a MIDX clears stale bitmaps' '
     ++		rm -fr repo &&
     ++		git init repo &&
     ++		test_when_finished "rm -fr repo" &&
     ++		(
     ++			cd repo &&
     ++			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
     ++			test_commit base &&
     ++			git repack &&
     ++			git multi-pack-index write --bitmap &&
     ++
     ++			# Write a MIDX and bitmap; remove the MIDX but leave the bitmap.
     ++			stale_bitmap=$midx-$(midx_checksum $objdir).bitmap &&
     ++			rm $midx &&
     ++
     ++			# Then write a new MIDX.
     ++			test_commit new &&
     ++			git repack &&
     ++			git multi-pack-index write --bitmap &&
     ++
     ++			test_path_is_file $midx &&
     ++			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
     ++			test_path_is_missing $stale_bitmap
     ++		)
     ++	'
     + 
     +-		test_path_is_file $midx &&
     +-		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
     +-		test_path_is_missing $stale_bitmap
     +-	)
     +-'
     ++	test_expect_success 'pack.preferBitmapTips' '
     ++		git init repo &&
     ++		test_when_finished "rm -fr repo" &&
     ++		(
     ++			cd repo &&
     ++			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
     + 
     +-test_expect_success 'pack.preferBitmapTips' '
     +-	git init repo &&
     +-	test_when_finished "rm -fr repo" &&
     +-	(
     +-		cd repo &&
     ++			test_commit_bulk --message="%s" 103 &&
     + 
     +-		test_commit_bulk --message="%s" 103 &&
     ++			git log --format="%H" >commits.raw &&
     ++			sort <commits.raw >commits &&
     + 
     +-		git log --format="%H" >commits.raw &&
     +-		sort <commits.raw >commits &&
     ++			git log --format="create refs/tags/%s %H" HEAD >refs &&
     ++			git update-ref --stdin <refs &&
     + 
     +-		git log --format="create refs/tags/%s %H" HEAD >refs &&
     +-		git update-ref --stdin <refs &&
     ++			git multi-pack-index write --bitmap &&
     ++			test_path_is_file $midx &&
     ++			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
     + 
     +-		git multi-pack-index write --bitmap &&
     +-		test_path_is_file $midx &&
     +-		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
     ++			test-tool bitmap list-commits | sort >bitmaps &&
     ++			comm -13 bitmaps commits >before &&
     ++			test_line_count = 1 before &&
     + 
     +-		test-tool bitmap list-commits | sort >bitmaps &&
     +-		comm -13 bitmaps commits >before &&
     +-		test_line_count = 1 before &&
     ++			perl -ne "printf(\"create refs/tags/include/%d \", $.); print" \
     ++				<before | git update-ref --stdin &&
     + 
     +-		perl -ne "printf(\"create refs/tags/include/%d \", $.); print" \
     +-			<before | git update-ref --stdin &&
     ++			rm -fr $midx-$(midx_checksum $objdir).bitmap &&
     ++			rm -fr $midx &&
     + 
     +-		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
     +-		rm -fr $midx &&
     ++			git -c pack.preferBitmapTips=refs/tags/include \
     ++				multi-pack-index write --bitmap &&
     ++			test-tool bitmap list-commits | sort >bitmaps &&
     ++			comm -13 bitmaps commits >after &&
     + 
     +-		git -c pack.preferBitmapTips=refs/tags/include \
     +-			multi-pack-index write --bitmap &&
     +-		test-tool bitmap list-commits | sort >bitmaps &&
     +-		comm -13 bitmaps commits >after &&
     ++			! test_cmp before after
     ++		)
     ++	'
     + 
     +-		! test_cmp before after
     +-	)
     +-'
     ++	test_expect_success 'writing a bitmap with --refs-snapshot' '
     ++		git init repo &&
     ++		test_when_finished "rm -fr repo" &&
     ++		(
     ++			cd repo &&
     ++			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
     + 
     +-test_expect_success 'writing a bitmap with --refs-snapshot' '
     +-	git init repo &&
     +-	test_when_finished "rm -fr repo" &&
     +-	(
     +-		cd repo &&
     ++			test_commit one &&
     ++			test_commit two &&
     + 
     +-		test_commit one &&
     +-		test_commit two &&
     ++			git rev-parse one >snapshot &&
     + 
     +-		git rev-parse one >snapshot &&
     ++			git repack -ad &&
     + 
     +-		git repack -ad &&
     ++			# First, write a MIDX which sees both refs/tags/one and
     ++			# refs/tags/two (causing both of those commits to receive
     ++			# bitmaps).
     ++			git multi-pack-index write --bitmap &&
     + 
     +-		# First, write a MIDX which sees both refs/tags/one and
     +-		# refs/tags/two (causing both of those commits to receive
     +-		# bitmaps).
     +-		git multi-pack-index write --bitmap &&
     ++			test_path_is_file $midx &&
     ++			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
     + 
     +-		test_path_is_file $midx &&
     +-		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
     ++			test-tool bitmap list-commits | sort >bitmaps &&
     ++			grep "$(git rev-parse one)" bitmaps &&
     ++			grep "$(git rev-parse two)" bitmaps &&
     + 
     +-		test-tool bitmap list-commits | sort >bitmaps &&
     +-		grep "$(git rev-parse one)" bitmaps &&
     +-		grep "$(git rev-parse two)" bitmaps &&
     ++			rm -fr $midx-$(midx_checksum $objdir).bitmap &&
     ++			rm -fr $midx &&
     + 
     +-		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
     +-		rm -fr $midx &&
     ++			# Then again, but with a refs snapshot which only sees
     ++			# refs/tags/one.
     ++			git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
     + 
     +-		# Then again, but with a refs snapshot which only sees
     +-		# refs/tags/one.
     +-		git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
     ++			test_path_is_file $midx &&
     ++			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
     + 
     +-		test_path_is_file $midx &&
     +-		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
     ++			test-tool bitmap list-commits | sort >bitmaps &&
     ++			grep "$(git rev-parse one)" bitmaps &&
     ++			! grep "$(git rev-parse two)" bitmaps
     ++		)
     ++	'
     + 
     +-		test-tool bitmap list-commits | sort >bitmaps &&
     +-		grep "$(git rev-parse one)" bitmaps &&
     +-		! grep "$(git rev-parse two)" bitmaps
     +-	)
     +-'
     ++	test_expect_success 'write a bitmap with --refs-snapshot (preferred tips)' '
     ++		git init repo &&
     ++		test_when_finished "rm -fr repo" &&
     ++		(
     ++			cd repo &&
     ++			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
     + 
     +-test_expect_success 'write a bitmap with --refs-snapshot (preferred tips)' '
     +-	git init repo &&
     +-	test_when_finished "rm -fr repo" &&
     +-	(
     +-		cd repo &&
     ++			test_commit_bulk --message="%s" 103 &&
     + 
     +-		test_commit_bulk --message="%s" 103 &&
     ++			git log --format="%H" >commits.raw &&
     ++			sort <commits.raw >commits &&
     + 
     +-		git log --format="%H" >commits.raw &&
     +-		sort <commits.raw >commits &&
     ++			git log --format="create refs/tags/%s %H" HEAD >refs &&
     ++			git update-ref --stdin <refs &&
     + 
     +-		git log --format="create refs/tags/%s %H" HEAD >refs &&
     +-		git update-ref --stdin <refs &&
     ++			git multi-pack-index write --bitmap &&
     ++			test_path_is_file $midx &&
     ++			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
     + 
     +-		git multi-pack-index write --bitmap &&
     +-		test_path_is_file $midx &&
     +-		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
     ++			test-tool bitmap list-commits | sort >bitmaps &&
     ++			comm -13 bitmaps commits >before &&
     ++			test_line_count = 1 before &&
     + 
     +-		test-tool bitmap list-commits | sort >bitmaps &&
     +-		comm -13 bitmaps commits >before &&
     +-		test_line_count = 1 before &&
     ++			(
     ++				grep -vf before commits.raw &&
     ++				# mark missing commits as preferred
     ++				sed "s/^/+/" before
     ++			) >snapshot &&
     + 
     ++			rm -fr $midx-$(midx_checksum $objdir).bitmap &&
     ++			rm -fr $midx &&
     ++
     ++			git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
     ++			test-tool bitmap list-commits | sort >bitmaps &&
     ++			comm -13 bitmaps commits >after &&
     ++
     ++			! test_cmp before after
     ++		)
     ++	'
     ++
     ++	test_expect_success 'hash-cache values are propagated from pack bitmaps' '
     ++		rm -fr repo &&
     ++		git init repo &&
     ++		test_when_finished "rm -fr repo" &&
     + 		(
     +-			grep -vf before commits.raw &&
     +-			# mark missing commits as preferred
     +-			sed "s/^/+/" before
     +-		) >snapshot &&
     ++			cd repo &&
     ++			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
     + 
     +-		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
     +-		rm -fr $midx &&
     ++			test_commit base &&
     ++			test_commit base2 &&
     ++			git repack -adb &&
     + 
     +-		git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
     +-		test-tool bitmap list-commits | sort >bitmaps &&
     +-		comm -13 bitmaps commits >after &&
     ++			test-tool bitmap dump-hashes >pack.raw &&
     ++			test_file_not_empty pack.raw &&
     ++			sort pack.raw >pack.hashes &&
     + 
     +-		! test_cmp before after
     +-	)
     +-'
     ++			test_commit new &&
     ++			git repack &&
     ++			git multi-pack-index write --bitmap &&
     + 
     +-test_expect_success 'hash-cache values are propagated from pack bitmaps' '
     +-	rm -fr repo &&
     +-	git init repo &&
     +-	test_when_finished "rm -fr repo" &&
     +-	(
     +-		cd repo &&
     ++			test-tool bitmap dump-hashes >midx.raw &&
     ++			sort midx.raw >midx.hashes &&
     + 
     +-		test_commit base &&
     +-		test_commit base2 &&
     +-		git repack -adb &&
     ++			# ensure that every namehash in the pack bitmap can be found in
     ++			# the midx bitmap (i.e., that there are no oid-namehash pairs
     ++			# unique to the pack bitmap).
     ++			comm -23 pack.hashes midx.hashes >dropped.hashes &&
     ++			test_must_be_empty dropped.hashes
     ++		)
     ++	'
     + 
     +-		test-tool bitmap dump-hashes >pack.raw &&
     +-		test_file_not_empty pack.raw &&
     +-		sort pack.raw >pack.hashes &&
     ++	test_expect_success 'no .bitmap is written without any objects' '
     ++		rm -fr repo &&
     ++		git init repo &&
     ++		test_when_finished "rm -fr repo" &&
     ++		(
     ++			cd repo &&
     ++			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
     + 
     +-		test_commit new &&
     +-		git repack &&
     +-		git multi-pack-index write --bitmap &&
     ++			empty="$(git pack-objects $objdir/pack/pack </dev/null)" &&
     ++			cat >packs <<-EOF &&
     ++			pack-$empty.idx
     ++			EOF
     + 
     +-		test-tool bitmap dump-hashes >midx.raw &&
     +-		sort midx.raw >midx.hashes &&
     ++			git multi-pack-index write --bitmap --stdin-packs \
     ++				<packs 2>err &&
     + 
     +-		# ensure that every namehash in the pack bitmap can be found in
     +-		# the midx bitmap (i.e., that there are no oid-namehash pairs
     +-		# unique to the pack bitmap).
     +-		comm -23 pack.hashes midx.hashes >dropped.hashes &&
     +-		test_must_be_empty dropped.hashes
     +-	)
     +-'
     ++			grep "bitmap without any objects" err &&
     + 
     +-test_expect_success 'no .bitmap is written without any objects' '
     +-	rm -fr repo &&
     +-	git init repo &&
     +-	test_when_finished "rm -fr repo" &&
     +-	(
     +-		cd repo &&
     ++			test_path_is_file $midx &&
     ++			test_path_is_missing $midx-$(midx_checksum $objdir).bitmap
     ++		)
     ++	'
     ++
     ++	test_expect_success 'graceful fallback when missing reverse index' '
     ++		rm -fr repo &&
     ++		git init repo &&
     ++		test_when_finished "rm -fr repo" &&
     ++		(
     ++			cd repo &&
     ++			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
     + 
     +-		empty="$(git pack-objects $objdir/pack/pack </dev/null)" &&
     +-		cat >packs <<-EOF &&
     +-		pack-$empty.idx
     +-		EOF
     ++			test_commit base &&
     + 
     +-		git multi-pack-index write --bitmap --stdin-packs \
     +-			<packs 2>err &&
     ++			# write a pack and MIDX bitmap containing base
     ++			git repack -adb &&
     ++			git multi-pack-index write --bitmap &&
     + 
     +-		grep "bitmap without any objects" err &&
     ++			GIT_TEST_MIDX_READ_RIDX=0 \
     ++				git rev-list --use-bitmap-index HEAD 2>err &&
     ++			! grep "ignoring extra bitmap file" err
     ++		)
     ++	'
     ++}
     + 
     +-		test_path_is_file $midx &&
     +-		test_path_is_missing $midx-$(midx_checksum $objdir).bitmap
     +-	)
     +-'
     ++test_midx_bitmap_cases
     ++
     ++test_midx_bitmap_cases "pack.writeBitmapLookupTable"
     + 
     +-test_expect_success 'graceful fallback when missing reverse index' '
      +test_expect_success 'multi-pack-index write writes lookup table if enabled' '
     -+	rm -fr repo &&
     -+	git init repo &&
     -+	test_when_finished "rm -fr repo" &&
     -+	(
     -+		cd repo &&
     -+		test_commit base &&
     + 	rm -fr repo &&
     + 	git init repo &&
     + 	test_when_finished "rm -fr repo" &&
     + 	(
     + 		cd repo &&
     +-
     + 		test_commit base &&
     +-
     +-		# write a pack and MIDX bitmap containing base
     +-		git repack -adb &&
     +-		git multi-pack-index write --bitmap &&
     +-
     +-		GIT_TEST_MIDX_READ_RIDX=0 \
     +-			git rev-list --use-bitmap-index HEAD 2>err &&
     +-		! grep "ignoring extra bitmap file" err
     ++		git config pack.writeBitmapLookupTable true &&
      +		git repack -ad &&
      +		GIT_TRACE2_EVENT="$(pwd)/trace" \
      +			git multi-pack-index write --bitmap &&
      +		grep "\"label\":\"writing_lookup_table\"" trace
     -+	)
     + 	)
     + '
     + 
     +
     + ## t/t5327-multi-pack-bitmaps-rev.sh ##
     +@@ t/t5327-multi-pack-bitmaps-rev.sh: export GIT_TEST_MIDX_READ_RIDX
     + midx_bitmap_core rev
     + midx_bitmap_partial_tests rev
     + 
     ++test_expect_success 'reinitialize the repository with lookup table enabled' '
     ++    rm -fr * .git &&
     ++    git init &&
     ++    git config pack.writeBitmapLookupTable true
      +'
     ++
     ++midx_bitmap_core rev
     ++midx_bitmap_partial_tests rev
     ++
       test_done
 4:  4fbfcff8a20 ! 4:  e64362621d2 pack-bitmap: prepare to read lookup table extension
     @@ Commit message
          does not know how to parse them.
      
          Teach Git to parse the existing bitmap lookup table. The older
     -    versions of git are not affected by it. Those versions ignore the
     +    versions of Git are not affected by it. Those versions ignore the
          lookup table.
      
     -    Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
          Mentored-by: Taylor Blau <me@ttaylorr.com>
          Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
     +    Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
      
       ## pack-bitmap.c ##
      @@ pack-bitmap.c: struct bitmap_index {
     @@ pack-bitmap.c: struct bitmap_index {
       
      +	/*
      +	 * If not NULL, this points into the commit table extension
     -+	 * (within map).
     ++	 * (within the memory mapped region `map`).
      +	 */
      +	unsigned char *table_lookup;
      +
     @@ pack-bitmap.c: static int load_bitmap_header(struct bitmap_index *index)
       			index_end -= cache_size;
       		}
      +
     -+		if (flags & BITMAP_OPT_LOOKUP_TABLE &&
     -+			git_env_bool("GIT_TEST_READ_COMMIT_TABLE", 1)) {
     -+			size_t table_size = 0;
     -+			size_t triplet_sz = st_add3(sizeof(uint32_t),    /* commit position */
     -+							sizeof(uint64_t),    /* offset */
     -+							sizeof(uint32_t));    /* xor offset */
     -+
     -+			table_size = st_add(table_size,
     -+					st_mult(ntohl(header->entry_count),
     -+						triplet_sz));
     ++		if (flags & BITMAP_OPT_LOOKUP_TABLE) {
     ++			size_t table_size = st_mult(ntohl(header->entry_count),
     ++						    BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH);
      +			if (table_size > index_end - index->map - header_size)
     -+				return error("corrupted bitmap index file (too short to fit lookup table)");
     -+			index->table_lookup = (void *)(index_end - table_size);
     ++				return error(_("corrupted bitmap index file (too short to fit lookup table)"));
     ++			if (git_env_bool("GIT_TEST_READ_COMMIT_TABLE", 1))
     ++				index->table_lookup = (void *)(index_end - table_size);
      +			index_end -= table_size;
      +		}
       	}
     @@ pack-bitmap.c: static struct stored_bitmap *store_bitmap(struct bitmap_index *in
      -	/* a 0 return code means the insertion succeeded with no changes,
      -	 * because the SHA1 already existed on the map. this is bad, there
      -	 * shouldn't be duplicated commits in the index */
     -+	/* A 0 return code means the insertion succeeded with no changes,
     -+	 * because the SHA1 already existed on the map. If lookup table
     -+	 * is NULL, this is bad, there shouldn't be duplicated commits
     -+	 * in the index.
     -+	 *
     -+	 * If table_lookup exists, that means the desired bitmap is already
     -+	 * loaded. Either this bitmap has been stored directly or another
     -+	 * bitmap has a direct or indirect xor relation with it. */
     ++	/*
     ++	 * A 0 return code means the insertion succeeded with no changes,
     ++	 * because the SHA1 already existed on the map. This is bad, there
     ++	 * shouldn't be duplicated commits in the index.
     ++	 */
       	if (ret == 0) {
      -		error("Duplicate entry in bitmap index: %s", oid_to_hex(oid));
     --		return NULL;
     -+		if (!index->table_lookup) {
     -+			error("Duplicate entry in bitmap index: %s", oid_to_hex(oid));
     -+			return NULL;
     -+		}
     -+		return kh_value(index->bitmaps, hash_pos);
     ++		error(_("duplicate entry in bitmap index: %s"), oid_to_hex(oid));
     + 		return NULL;
       	}
       
     - 	kh_value(index->bitmaps, hash_pos) = stored;
      @@ pack-bitmap.c: static int load_bitmap(struct bitmap_index *bitmap_git)
       		!(bitmap_git->tags = read_bitmap_1(bitmap_git)))
       		goto failed;
     @@ pack-bitmap.c: struct include_data {
       	struct bitmap *seen;
       };
       
     -+static inline const void *bitmap_get_triplet(struct bitmap_index *bitmap_git, uint32_t xor_pos)
     -+{
     -+	size_t triplet_sz = st_add3(sizeof(uint32_t), sizeof(uint64_t), sizeof(uint32_t));
     -+	const void *p = bitmap_git->table_lookup + st_mult(xor_pos, triplet_sz);
     -+	return p;
     -+}
     -+
     -+static uint64_t triplet_get_offset(const void *triplet)
     -+{
     -+	const void *p = (unsigned char*) triplet + sizeof(uint32_t);
     -+	return get_be64(p);
     -+}
     ++struct bitmap_lookup_table_triplet {
     ++	uint32_t commit_pos;
     ++	uint64_t offset;
     ++	uint32_t xor_row;
     ++};
      +
     -+static uint32_t triplet_get_xor_pos(const void *triplet)
     ++struct bitmap_lookup_table_xor_item {
     ++	struct object_id oid;
     ++	uint64_t offset;
     ++};
     ++
     ++/*
     ++ * This function gets the raw triplet from the `pos`'th row of the
     ++ * lookup table and fills `triplet` with that data.
     ++ */
     ++static int lookup_table_get_triplet(struct bitmap_index *bitmap_git,
     ++				    uint32_t pos,
     ++				    struct bitmap_lookup_table_triplet *triplet)
      +{
     -+	const void *p = (unsigned char*) triplet + st_add(sizeof(uint32_t), sizeof(uint64_t));
     -+	return get_be32(p);
     ++	unsigned char *p = NULL;
     ++	if (pos >= bitmap_git->entry_count)
     ++		return error(_("corrupt bitmap lookup table: triplet position out of index"));
     ++
     ++	p = bitmap_git->table_lookup + st_mult(pos, BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH);
     ++
     ++	triplet->commit_pos = get_be32(p);
     ++	p += sizeof(uint32_t);
     ++	triplet->offset = get_be64(p);
     ++	p += sizeof(uint64_t);
     ++	triplet->xor_row = get_be32(p);
     ++	return 0;
      +}
      +
     ++/*
     ++ * Searches for a matching triplet. `va` is a pointer
     ++ * to the wanted commit position value. `vb` points to
     ++ * a triplet in the lookup table. The first 4 bytes of each
     ++ * triplet (pointed to by `vb`) are compared with `*va`.
     ++ */
      +static int triplet_cmp(const void *va, const void *vb)
      +{
     -+	int result = 0;
     -+	uint32_t *a = (uint32_t *) va;
     ++
     ++	uint32_t a = *(uint32_t *)va;
      +	uint32_t b = get_be32(vb);
     -+	if (*a > b)
     -+		result = 1;
     -+	else if (*a < b)
     -+		result = -1;
     -+	else
     -+		result = 0;
     ++	if (a > b)
     ++		return 1;
     ++	else if (a < b)
     ++		return -1;
      +
     -+	return result;
     ++	return 0;
      +}
      +
     -+static uint32_t bsearch_pos(struct bitmap_index *bitmap_git, struct object_id *oid,
     -+						uint32_t *result)
     ++static uint32_t bsearch_pos(struct bitmap_index *bitmap_git,
     ++			    struct object_id *oid,
     ++			    uint32_t *result)
      +{
      +	int found;
      +
     -+	if (bitmap_git->midx)
     ++	if (bitmap_is_midx(bitmap_git))
      +		found = bsearch_midx(oid, bitmap_git->midx, result);
      +	else
      +		found = bsearch_pack(oid, bitmap_git->pack, result);
     @@ pack-bitmap.c: struct include_data {
      +	return found;
      +}
      +
     ++/*
     ++ * `bsearch_triplet` function searches for the raw triplet having
     ++ * commit position same as `commit_pos` and fills `triplet`
     ++ * object from the raw triplet. Returns 1 on success and 0
     ++ * on failure.
     ++ */
     ++static int bsearch_triplet(uint32_t *commit_pos,
     ++			   struct bitmap_index *bitmap_git,
     ++			   struct bitmap_lookup_table_triplet *triplet)
     ++{
     ++	unsigned char *p = bsearch(commit_pos, bitmap_git->table_lookup, bitmap_git->entry_count,
     ++				   BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH, triplet_cmp);
     ++
     ++	if (!p)
     ++		return 0;
     ++	triplet->commit_pos = get_be32(p);
     ++	p += sizeof(uint32_t);
     ++	triplet->offset = get_be64(p);
     ++	p += sizeof(uint64_t);
     ++	triplet->xor_row = get_be32(p);
     ++	return 1;
     ++}
     ++
      +static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_git,
      +					  struct commit *commit)
      +{
     -+	uint32_t commit_pos, xor_pos;
     ++	uint32_t commit_pos, xor_row;
      +	uint64_t offset;
      +	int flags;
     -+	const void *triplet = NULL;
     ++	struct bitmap_lookup_table_triplet triplet;
      +	struct object_id *oid = &commit->object.oid;
      +	struct ewah_bitmap *bitmap;
      +	struct stored_bitmap *xor_bitmap = NULL;
     -+	size_t triplet_sz = st_add3(sizeof(uint32_t), sizeof(uint64_t), sizeof(uint32_t));
      +
      +	int found = bsearch_pos(bitmap_git, oid, &commit_pos);
      +
      +	if (!found)
      +		return NULL;
      +
     -+	triplet = bsearch(&commit_pos, bitmap_git->table_lookup, bitmap_git->entry_count,
     -+						triplet_sz, triplet_cmp);
     -+	if (!triplet)
     ++	if (!bsearch_triplet(&commit_pos, bitmap_git, &triplet))
      +		return NULL;
      +
     -+	offset = triplet_get_offset(triplet);
     -+	xor_pos = triplet_get_xor_pos(triplet);
     ++	offset = triplet.offset;
     ++	xor_row = triplet.xor_row;
      +
     -+	if (xor_pos != 0xffffffff) {
     ++	if (xor_row != 0xffffffff) {
      +		int xor_flags;
     ++		khiter_t hash_pos;
      +		uint64_t offset_xor;
     -+		uint32_t *xor_positions;
     -+		struct object_id xor_oid;
     -+		size_t size = 0;
     -+
     -+		ALLOC_ARRAY(xor_positions, bitmap_git->entry_count);
     -+		while (xor_pos != 0xffffffff) {
     -+			xor_positions[size++] = xor_pos;
     -+			triplet = bitmap_get_triplet(bitmap_git, xor_pos);
     -+			xor_pos = triplet_get_xor_pos(triplet);
     ++		struct bitmap_lookup_table_xor_item *xor_items;
     ++		struct bitmap_lookup_table_xor_item xor_item;
     ++		size_t xor_items_nr = 0, xor_items_alloc = 64;
     ++
     ++		ALLOC_ARRAY(xor_items, xor_items_alloc);
     ++		while (xor_row != 0xffffffff) {
     ++			struct object_id xor_oid;
     ++
     ++			if (xor_items_nr + 1 >= bitmap_git->entry_count) {
     ++				free(xor_items);
     ++				error(_("corrupt bitmap lookup table: xor chain exceeds entry count"));
     ++				return NULL;
     ++			}
     ++
     ++			if (lookup_table_get_triplet(bitmap_git, xor_row, &triplet) < 0)
     ++				return NULL;
     ++
     ++			offset_xor = triplet.offset;
     ++
     ++			if (nth_bitmap_object_oid(bitmap_git, &xor_oid, triplet.commit_pos) < 0) {
     ++				free(xor_items);
     ++				error(_("corrupt bitmap lookup table: commit index %u out of range"),
     ++					triplet.commit_pos);
     ++				return NULL;
     ++			}
     ++
     ++			hash_pos = kh_get_oid_map(bitmap_git->bitmaps, xor_oid);
     ++
     ++			/*
     ++			 * If the desired bitmap is already stored, we don't need
     ++			 * to iterate further, because the bitmaps that must be
     ++			 * parsed before this one have already been stored. So,
     ++			 * assign this stored bitmap to xor_bitmap.
     ++			 */
     ++			if (hash_pos < kh_end(bitmap_git->bitmaps) &&
     ++			    (xor_bitmap = kh_value(bitmap_git->bitmaps, hash_pos)))
     ++				break;
     ++
     ++			ALLOC_GROW(xor_items, xor_items_nr + 1, xor_items_alloc);
     ++			xor_items[xor_items_nr++] = (struct bitmap_lookup_table_xor_item) {.oid = xor_oid,
     ++											   .offset = offset_xor};
     ++			xor_row = triplet.xor_row;
      +		}
      +
     -+		while (size){
     -+			xor_pos = xor_positions[size - 1];
     -+			triplet = bitmap_get_triplet(bitmap_git, xor_pos);
     -+			commit_pos = get_be32(triplet);
     -+			offset_xor = triplet_get_offset(triplet);
     ++		while (xor_items_nr) {
     ++			xor_item = xor_items[xor_items_nr - 1];
     ++			offset_xor = xor_item.offset;
      +
     -+			if (nth_bitmap_object_oid(bitmap_git, &xor_oid, commit_pos) < 0) {
     -+				free(xor_positions);
     ++			bitmap_git->map_pos = offset_xor;
     ++			if (bitmap_git->map_size - bitmap_git->map_pos < 6) {
     ++				error(_("corrupt ewah bitmap: truncated header for bitmap of commit \"%s\""),
     ++					oid_to_hex(&xor_item.oid));
     ++				free(xor_items);
      +				return NULL;
      +			}
      +
     -+			bitmap_git->map_pos = offset_xor + sizeof(uint32_t) + sizeof(uint8_t);
     ++			bitmap_git->map_pos = bitmap_git->map_pos + sizeof(uint32_t) + sizeof(uint8_t);
      +			xor_flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
      +			bitmap = read_bitmap_1(bitmap_git);
      +
     -+			if (!bitmap){
     -+				free(xor_positions);
     ++			if (!bitmap) {
     ++				free(xor_items);
      +				return NULL;
      +			}
      +
     -+			xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_oid, xor_bitmap, xor_flags);
     -+			size--;
     ++			xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_item.oid, xor_bitmap, xor_flags);
     ++			xor_items_nr--;
      +		}
      +
     -+		free(xor_positions);
     ++		free(xor_items);
      +	}
      +
     -+	bitmap_git->map_pos = offset + sizeof(uint32_t) + sizeof(uint8_t);
     ++	bitmap_git->map_pos = offset;
     ++	if (bitmap_git->map_size - bitmap_git->map_pos < 6) {
     ++		error(_("corrupt ewah bitmap: truncated header for bitmap of commit \"%s\""),
     ++			oid_to_hex(oid));
     ++		return NULL;
     ++	}
     ++
     ++	bitmap_git->map_pos = bitmap_git->map_pos + sizeof(uint32_t) + sizeof(uint8_t);
      +	flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
      +	bitmap = read_bitmap_1(bitmap_git);
      +
     @@ pack-bitmap.c: struct include_data {
      +		if (!bitmap_git->table_lookup)
      +			return NULL;
      +
     ++		trace2_region_enter("pack-bitmap", "reading_lookup_table", the_repository);
      +		/* NEEDSWORK: cache misses aren't recorded */
      +		bitmap = lazy_bitmap_for_commit(bitmap_git, commit);
     -+		if(!bitmap)
     ++		trace2_region_leave("pack-bitmap", "reading_lookup_table", the_repository);
     ++		if (!bitmap)
      +			return NULL;
      +		return lookup_stored_bitmap(bitmap);
      +	}
     @@ pack-bitmap.c: void test_bitmap_walk(struct rev_info *revs)
       		die("you must specify exactly one commit to test");
       
      -	fprintf(stderr, "Bitmap v%d test (%d entries loaded)\n",
     -+	fprintf(stderr, "Bitmap v%d test (%d entries)\n",
     - 		bitmap_git->version, bitmap_git->entry_count);
     +-		bitmap_git->version, bitmap_git->entry_count);
     ++	fprintf(stderr, "Bitmap v%d test (%d entries%s)",
     ++		bitmap_git->version,
     ++		bitmap_git->entry_count,
     ++		bitmap_git->table_lookup ? "" : " loaded");
       
     -+	if (!bitmap_git->table_lookup)
     -+		fprintf(stderr, "Bitmap v%d test (%d entries loaded)\n",
     -+			bitmap_git->version, bitmap_git->entry_count);
     -+
       	root = revs->pending.objects[0].item;
       	bm = bitmap_for_commit(bitmap_git, (struct commit *)root);
     - 
      @@ pack-bitmap.c: void test_bitmap_walk(struct rev_info *revs)
       
       int test_bitmap_commits(struct repository *r)
       {
      -	struct bitmap_index *bitmap_git = prepare_bitmap_git(r);
     -+	struct bitmap_index *bitmap_git = NULL;
       	struct object_id oid;
       	MAYBE_UNUSED void *value;
     - 
     -+	/* As this function is only used to print bitmap selected
     ++	struct bitmap_index *bitmap_git = prepare_bitmap_git(r);
     ++
     ++	/*
     ++	 * As this function is only used to print bitmap selected
      +	 * commits, we don't have to read the commit table.
      +	 */
     -+	setenv("GIT_TEST_READ_COMMIT_TABLE", "0", 1);
     -+
     -+	bitmap_git = prepare_bitmap_git(r);
     + 
       	if (!bitmap_git)
       		die("failed to load bitmap indexes");
       
     -@@ pack-bitmap.c: int test_bitmap_commits(struct repository *r)
     ++	if (bitmap_git->table_lookup) {
     ++		if (load_bitmap_entries_v1(bitmap_git) < 0)
     ++			die(_("failed to load bitmap indexes"));
     ++	}
     ++
     + 	kh_foreach(bitmap_git->bitmaps, oid, value, {
       		printf("%s\n", oid_to_hex(&oid));
       	});
     +
     + ## pack-bitmap.h ##
     +@@ pack-bitmap.h: struct bitmap_disk_header {
       
     -+	setenv("GIT_TEST_READ_COMMIT_TABLE", "1", 1);
     - 	free_bitmap_index(bitmap_git);
     + #define NEEDS_BITMAP (1u<<22)
       
     - 	return 0;
     ++/*
     ++ * The width in bytes of a single triplet in the lookup table
     ++ * extension:
     ++ *     (commit_pos, offset, xor_row)
     ++ *
     ++ * whose fields are 32, 64, and 32 bits wide, respectively.
     ++ */
     ++#define BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH (16)
     ++
     + enum pack_bitmap_opts {
     + 	BITMAP_OPT_FULL_DAG = 0x1,
     + 	BITMAP_OPT_HASH_CACHE = 0x4,
      
       ## t/t5310-pack-bitmaps.sh ##
     -@@ t/t5310-pack-bitmaps.sh: test_expect_success 'full repack creates bitmaps' '
     - 	grep "\"label\":\"writing_lookup_table\"" trace
     +@@ t/t5310-pack-bitmaps.sh: test_bitmap_cases () {
     + 
     + 	test_expect_success 'truncated bitmap fails gracefully (ewah)' '
     + 		test_config pack.writebitmaphashcache false &&
     ++		test_config pack.writebitmaplookuptable false &&
     + 		git repack -ad &&
     + 		git rev-list --use-bitmap-index --count --all >expect &&
     + 		bitmap=$(ls .git/objects/pack/*.bitmap) &&
     +@@ t/t5310-pack-bitmaps.sh: test_bitmap_cases () {
     + 	'
     + 
     + 	test_expect_success 'truncated bitmap fails gracefully (cache)' '
     ++		git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
     + 		git repack -ad &&
     + 		git rev-list --use-bitmap-index --count --all >expect &&
     + 		bitmap=$(ls .git/objects/pack/*.bitmap) &&
     +@@ t/t5310-pack-bitmaps.sh: test_expect_success 'verify writing bitmap lookup table when enabled' '
     + 	grep "\"label\":\"writing_lookup_table\"" trace2
       '
       
     -+test_expect_success 'using lookup table loads only necessary bitmaps' '
     -+	git rev-list --test-bitmap HEAD 2>out &&
     -+	! grep "Bitmap v1 test (106 entries loaded)" out &&
     -+	grep "Found bitmap for" out
     ++test_expect_success 'lookup table is actually used to traverse objects' '
     ++	git repack -adb &&
     ++	GIT_TRACE2_EVENT="$(pwd)/trace3" \
     ++		git rev-list --use-bitmap-index --count --all &&
     ++	grep "\"label\":\"reading_lookup_table\"" trace3
      +'
      +
     - basic_bitmap_tests
     - 
     - test_expect_success 'incremental repack fails when bitmaps are requested' '
     -@@ t/t5310-pack-bitmaps.sh: test_expect_success 'pack reuse respects --incremental' '
     - 
     - test_expect_success 'truncated bitmap fails gracefully (ewah)' '
     - 	test_config pack.writebitmaphashcache false &&
     -+	test_config pack.writebitmaplookuptable false &&
     - 	git repack -ad &&
     - 	git rev-list --use-bitmap-index --count --all >expect &&
     - 	bitmap=$(ls .git/objects/pack/*.bitmap) &&
     -
     - ## t/t5326-multi-pack-bitmaps.sh ##
     -@@ t/t5326-multi-pack-bitmaps.sh: test_expect_success 'multi-pack-index write writes lookup table if enabled' '
     - 		grep "\"label\":\"writing_lookup_table\"" trace
     - 	)
     - '
     ++test_expect_success 'truncated bitmap fails gracefully (lookup table)' '
     ++	test_config pack.writebitmaphashcache false &&
     ++	git repack -adb &&
     ++	git rev-list --use-bitmap-index --count --all >expect &&
     ++	bitmap=$(ls .git/objects/pack/*.bitmap) &&
     ++	test_when_finished "rm -f $bitmap" &&
     ++	test_copy_bytes 512 <$bitmap >$bitmap.tmp &&
     ++	mv -f $bitmap.tmp $bitmap &&
     ++	git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
     ++	test_cmp expect actual &&
     ++	test_i18ngrep corrupted.bitmap.index stderr
     ++'
      +
       test_done
 5:  96c0041688f ! 5:  a155c1e2eba bitmap-lookup-table: add performance tests for lookup table
     @@ Metadata
       ## Commit message ##
          bitmap-lookup-table: add performance tests for lookup table
      
     -    Add performance tests to verify the performance of lookup table.
     +    Add performance tests to verify the performance of lookup table with
     +    `pack.writeReverseIndex` enabled. This is to check the performance
     +    when the above configuration is set.
      
          The lookup table makes Git run faster in most cases. Below is the
          result of `t/perf/p5310-pack-bitmaps.sh`. `t/perf/p5326-multi-pack-bitmaps.sh`
          gives similar results. The repository used in the test is the Linux kernel.
      
          Test                                                      this tree
     -    --------------------------------------------------------------------------
     -    5310.4: repack to disk (lookup=false)                   295.94(250.45+15.24)
     -    5310.5: simulated clone                                 12.52(5.07+1.40)
     -    5310.6: simulated fetch                                 1.89(2.94+0.24)
     -    5310.7: pack to file (bitmap)                           41.39(20.33+7.20)
     -    5310.8: rev-list (commits)                              0.98(0.59+0.12)
     -    5310.9: rev-list (objects)                              3.40(3.27+0.10)
     +    ---------------------------------------------------------------------------
     +    5310.4: repack to disk (lookup=false)                   296.55(256.53+14.52)
     +    5310.5: simulated clone                                 15.64(8.88+1.39)
     +    5310.6: simulated fetch                                 1.65(2.75+0.20)
     +    5310.7: pack to file (bitmap)                           48.71(30.20+7.58)
     +    5310.8: rev-list (commits)                              0.61(0.41+0.08)
     +    5310.9: rev-list (objects)                              4.38(4.26+0.09)
          5310.10: rev-list with tag negated via --not            0.07(0.02+0.04)
                   --all (objects)
     -    5310.11: rev-list with negative tag (objects)           0.23(0.16+0.06)
     -    5310.12: rev-list count with blob:none                  0.26(0.18+0.07)
     -    5310.13: rev-list count with blob:limit=1k              6.45(5.94+0.37)
     -    5310.14: rev-list count with tree:0                     0.26(0.18+0.07)
     -    5310.15: simulated partial clone                        4.99(3.19+0.45)
     -    5310.19: repack to disk (lookup=true)                   269.67(174.70+21.33)
     -    5310.20: simulated clone                                11.03(5.07+1.11)
     -    5310.21: simulated fetch                                0.79(0.79+0.17)
     -    5310.22: pack to file (bitmap)                          43.03(20.28+7.43)
     -    5310.23: rev-list (commits)                             0.86(0.54+0.09)
     -    5310.24: rev-list (objects)                             3.35(3.26+0.07)
     -    5310.25: rev-list with tag negated via --not            0.05(0.00+0.03)
     +    5310.11: rev-list with negative tag (objects)           0.05(0.01+0.03)
     +    5310.12: rev-list count with blob:none                  0.08(0.03+0.04)
     +    5310.13: rev-list count with blob:limit=1k              7.29(6.92+0.30)
     +    5310.14: rev-list count with tree:0                     0.08(0.03+0.04)
     +    5310.15: simulated partial clone                        9.45(8.12+0.41)
     +    5310.19: repack to disk (lookup=true)                   255.92(188.13+20.47)
     +    5310.20: simulated clone                                13.78(8.84+1.09)
     +    5310.21: simulated fetch                                0.52(0.63+0.14)
     +    5310.22: pack to file (bitmap)                          44.34(28.94+6.84)
     +    5310.23: rev-list (commits)                             0.48(0.31+0.06)
     +    5310.24: rev-list (objects)                             4.02(3.93+0.07)
     +    5310.25: rev-list with tag negated via --not            0.04(0.00+0.03)
                   --all (objects)
     -    5310.26: rev-list with negative tag (objects)           0.22(0.16+0.05)
     -    5310.27: rev-list count with blob:none                  0.22(0.16+0.05)
     -    5310.28: rev-list count with blob:limit=1k              6.45(5.87+0.31)
     -    5310.29: rev-list count with tree:0                     0.22(0.16+0.05)
     -    5310.30: simulated partial clone                        5.17(3.12+0.48)
     +    5310.26: rev-list with negative tag (objects)           0.04(0.00+0.03)
     +    5310.27: rev-list count with blob:none                  0.04(0.01+0.03)
     +    5310.28: rev-list count with blob:limit=1k              6.48(6.23+0.22)
     +    5310.29: rev-list count with tree:0                     0.04(0.01+0.03)
     +    5310.30: simulated partial clone                        8.30(7.21+0.36)
      
          Tests 4-15 run without the lookup table. The same tests are
          repeated in 19-30 (using the lookup table).
      
     -    Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
          Mentored-by: Taylor Blau <me@ttaylorr.com>
          Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
     +    Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
      
       ## t/perf/p5310-pack-bitmaps.sh ##
     -@@ t/perf/p5310-pack-bitmaps.sh: test_expect_success 'setup bitmap config' '
     - 	git config pack.writebitmaps true
     +@@ t/perf/p5310-pack-bitmaps.sh: test_perf_large_repo
     + # We intentionally use the deprecated pack.writebitmaps
     + # config so that we can test against older versions of git.
     + test_expect_success 'setup bitmap config' '
     +-	git config pack.writebitmaps true
     ++	git config pack.writebitmaps true &&
     ++	git config pack.writeReverseIndex true
       '
       
      -# we need to create the tag up front such that it is covered by the repack and
     @@ t/perf/p5310-pack-bitmaps.sh: test_expect_success 'setup bitmap config' '
      -test_expect_success 'create tags' '
      -	git tag --message="tag pointing to HEAD" perf-tag HEAD
      -'
     --
     ++test_bitmap () {
     ++	local enabled="$1"
     + 
      -test_perf 'repack to disk' '
      -	git repack -ad
      -'
     --
     ++	# we need to create the tag up front such that it is covered by the repack and
     ++	# thus by generated bitmaps.
     ++	test_expect_success 'create tags' '
     ++		git tag --message="tag pointing to HEAD" perf-tag HEAD
     ++	'
     + 
      -test_full_bitmap
     --
     ++	test_expect_success "use lookup table: $enabled" '
     ++		git config pack.writeBitmapLookupTable '"$enabled"'
     ++	'
     + 
      -test_expect_success 'create partial bitmap state' '
      -	# pick a commit to represent the repo tip in the past
      -	cutoff=$(git rev-list HEAD~100 -1) &&
      -	orig_tip=$(git rev-parse HEAD) &&
     --
     ++	test_perf "repack to disk (lookup=$enabled)" '
     ++		git repack -ad
     ++	'
     + 
      -	# now kill off all of the refs and pretend we had
      -	# just the one tip
      -	rm -rf .git/logs .git/refs/* .git/packed-refs &&
      -	git update-ref HEAD $cutoff &&
     --
     ++	test_full_bitmap
     + 
      -	# and then repack, which will leave us with a nice
      -	# big bitmap pack of the "old" history, and all of
      -	# the new history will be loose, as if it had been pushed
      -	# up incrementally and exploded via unpack-objects
      -	git repack -Ad &&
     --
     ++	test_expect_success "create partial bitmap state (lookup=$enabled)" '
     ++		# pick a commit to represent the repo tip in the past
     ++		cutoff=$(git rev-list HEAD~100 -1) &&
     ++		orig_tip=$(git rev-parse HEAD) &&
     + 
      -	# and now restore our original tip, as if the pushes
      -	# had happened
      -	git update-ref HEAD $orig_tip
      -'
     --
     --test_partial_bitmap
     -+test_bitmap () {
     -+    local enabled="$1"
     -+
     -+	# we need to create the tag up front such that it is covered by the repack and
     -+	# thus by generated bitmaps.
     -+	test_expect_success 'create tags' '
     -+		git tag --message="tag pointing to HEAD" perf-tag HEAD
     -+	'
     -+
     -+	test_expect_success "use lookup table: $enabled" '
     -+		git config pack.writeBitmapLookupTable '"$enabled"'
     -+	'
     -+
     -+	test_perf "repack to disk (lookup=$enabled)" '
     -+		git repack -ad
     -+	'
     -+
     -+	test_full_bitmap
     -+
     -+    test_expect_success "create partial bitmap state (lookup=$enabled)" '
     -+		# pick a commit to represent the repo tip in the past
     -+		cutoff=$(git rev-list HEAD~100 -1) &&
     -+		orig_tip=$(git rev-parse HEAD) &&
     -+
      +		# now kill off all of the refs and pretend we had
      +		# just the one tip
      +		rm -rf .git/logs .git/refs/* .git/packed-refs &&
     @@ t/perf/p5310-pack-bitmaps.sh: test_expect_success 'setup bitmap config' '
      +		# and now restore our original tip, as if the pushes
      +		# had happened
      +		git update-ref HEAD $orig_tip
     -+    '
     ++	'
      +}
     -+
     + 
     +-test_partial_bitmap
      +test_bitmap false
      +test_bitmap true
       
     @@ t/perf/p5326-multi-pack-bitmaps.sh: test_description='Tests performance using mi
      -
      -test_partial_bitmap
      +test_bitmap () {
     -+    local enabled="$1"
     ++	local enabled="$1"
      +
      +	# we need to create the tag up front such that it is covered by the repack and
      +	# thus by generated bitmaps.
     @@ t/perf/p5326-multi-pack-bitmaps.sh: test_description='Tests performance using mi
      +
      +	test_full_bitmap
      +
     -+    test_expect_success "create partial bitmap state (lookup=$enabled)" '
     ++	test_expect_success "create partial bitmap state (lookup=$enabled)" '
      +		# pick a commit to represent the repo tip in the past
      +		cutoff=$(git rev-list HEAD~100 -1) &&
      +		orig_tip=$(git rev-parse HEAD) &&
     @@ t/perf/p5326-multi-pack-bitmaps.sh: test_description='Tests performance using mi
      +		# and now restore our original tip, as if the pushes
      +		# had happened
      +		git update-ref HEAD $orig_tip
     -+    '
     ++	'
      +}
      +
      +test_bitmap false
 6:  fe556b58814 ! 6:  4f9f1049485 p5310-pack-bitmaps.sh: enable pack.writeReverseIndex for testing
     @@ Metadata
      Author: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
      
       ## Commit message ##
     -    p5310-pack-bitmaps.sh: enable pack.writeReverseIndex for testing
     +    p5310-pack-bitmaps.sh: remove pack.writeReverseIndex
      
     -    Enable pack.writeReverseIndex to true to see the effect of writing
     -    the reverse index in the existing bitmap tests (with and without
     -    lookup table).
     +    The previous change enables the `pack.writereverseindex` to see
     +    the effect of writing reverse index in the performance test.
     +
     +    Remove the `pack.writeReverseIndex` configuration.
      
          Below is the result of performance test. Output format is in
          seconds.
      
     -    Test                                             this tree
     -    -------------------------------------------------------------------
     -    5310.4: repack to disk (lookup=false)           294.92(257.60+14.29)
     -    5310.5: simulated clone                         14.97(8.95+1.31)
     -    5310.6: simulated fetch                         1.64(2.77+0.20)
     -    5310.7: pack to file (bitmap)                   41.76(29.33+6.77)
     -    5310.8: rev-list (commits)                      0.71(0.49+0.09)
     -    5310.9: rev-list (objects)                      4.65(4.55+0.09)
     -    5310.10: rev-list with tag negated via --not    0.08(0.02+0.05)
     +    Test                                                  this tree
     +    ------------------------------------------------------------------------
     +    5310.4: repack to disk (lookup=false)               293.80(251.30+14.30)
     +    5310.5: simulated clone                             12.50(5.15+1.36)
     +    5310.6: simulated fetch                             1.83(2.90+0.23)
     +    5310.7: pack to file (bitmap)                       39.70(20.25+7.14)
     +    5310.8: rev-list (commits)                          1.00(0.60+0.13)
     +    5310.9: rev-list (objects)                          4.11(4.00+0.10)
     +    5310.10: rev-list with tag negated via --not        0.07(0.02+0.05)
                   --all (objects)
     -    5310.11: rev-list with negative tag (objects)   0.06(0.01+0.04)
     -    5310.12: rev-list count with blob:none          0.09(0.03+0.05)
     -    5310.13: rev-list count with blob:limit=1k      7.58(7.06+0.33)
     -    5310.14: rev-list count with tree:0             0.09(0.03+0.06)
     -    5310.15: simulated partial clone                8.64(8.04+0.35)
     -    5310.19: repack to disk (lookup=true)           249.86(191.57+19.50)
     -    5310.20: simulated clone                        13.67(8.83+1.06)
     -    5310.21: simulated fetch                        0.50(0.63+0.13)
     -    5310.22: pack to file (bitmap)                  41.24(28.99+6.67)
     -    5310.23: rev-list (commits)                     0.67(0.50+0.07)
     -    5310.24: rev-list (objects)                     4.88(4.79+0.08)
     -    5310.25: rev-list with tag negated via --not    0.04(0.00+0.03)
     +    5310.11: rev-list with negative tag (objects)       0.23(0.16+0.06)
     +    5310.12: rev-list count with blob:none              0.27(0.18+0.08)
     +    5310.13: rev-list count with blob:limit=1k          6.41(5.98+0.41)
     +    5310.14: rev-list count with tree:0                 0.26(0.18+0.07)
     +    5310.15: simulated partial clone                    4.34(3.29+0.37)
     +    5310.19: repack to disk (lookup=true)               250.93(171.97+20.78)
     +    5310.20: simulated clone                            10.80(5.14+1.06)
     +    5310.21: simulated fetch                            0.71(0.79+0.16)
     +    5310.22: pack to file (bitmap)                      39.49(20.19+6.98)
     +    5310.23: rev-list (commits)                         0.81(0.48+0.09)
     +    5310.24: rev-list (objects)                         3.48(3.38+0.09)
     +    5310.25: rev-list with tag negated via --not        0.04(0.00+0.03)
                   --all (objects)
     -    5310.26: rev-list with negative tag (objects)   0.05(0.00+0.04)
     -    5310.27: rev-list count with blob:none          0.05(0.01+0.03)
     -    5310.28: rev-list count with blob:limit=1k      8.02(7.16+0.34)
     -    5310.29: rev-list count with tree:0             0.05(0.01+0.04)
     -    5310.30: simulated partial clone                8.57(8.16+0.32)
     +    5310.26: rev-list with negative tag (objects)       0.22(0.16+0.05)
     +    5310.27: rev-list count with blob:none              0.22(0.16+0.05)
     +    5310.28: rev-list count with blob:limit=1k          6.21(5.76+0.29)
     +    5310.29: rev-list count with tree:0                 0.23(0.16+0.06)
     +    5310.30: simulated partial clone                    4.53(3.14+0.39)
      
          Tests 4-15 run without the lookup table. The rest are a
          repetition of the previous tests, but using the lookup table.
      
     -    Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
          Mentored-by: Taylor Blau <me@ttaylorr.com>
          Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
     +    Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
      
       ## t/perf/p5310-pack-bitmaps.sh ##
      @@ t/perf/p5310-pack-bitmaps.sh: test_perf_large_repo
       # We intentionally use the deprecated pack.writebitmaps
       # config so that we can test against older versions of git.
       test_expect_success 'setup bitmap config' '
     --	git config pack.writebitmaps true
     -+	git config pack.writebitmaps true &&
     -+	git config pack.writeReverseIndex true
     +-	git config pack.writebitmaps true &&
     +-	git config pack.writeReverseIndex true
     ++	git config pack.writebitmaps true
       '
       
       test_bitmap () {

-- 
gitgitgadget

* [PATCH v3 1/6] Documentation/technical: describe bitmap lookup table extension
  2022-07-04  8:46   ` [PATCH v3 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
@ 2022-07-04  8:46     ` Abhradeep Chakraborty via GitGitGadget
  2022-07-08 16:38       ` Philip Oakley
  2022-07-04  8:46     ` [PATCH v3 2/6] pack-bitmap-write.c: write " Abhradeep Chakraborty via GitGitGadget
                       ` (7 subsequent siblings)
  8 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-07-04  8:46 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee,
	Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

When reading a bitmap file, Git loads every bitmap one by one even if
not all of them are required. A "bitmap lookup table" extension to the
bitmap format can reduce this overhead. The table stores a list of
bitmapped commit positions (in the midx or pack index), along with
their offset and xor offset. This way Git can load only the necessary
bitmaps without loading the previous bitmaps.
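
As a rough sketch of how a reader might use such a table (this is not
the code from this series), the snippet below decodes 16-byte
<commit_pos, offset, xor_row> rows, binary-searches them by commit
position, and follows the xor chain to collect the offsets that need to
be read. All names here (`struct triplet`, `find_row`,
`collect_offsets`, `write_triplet`) are hypothetical and chosen only for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRIPLET_WIDTH (4 + 8 + 4)	/* commit_pos + offset + xor_row */
#define NO_XOR_ROW 0xffffffffu		/* "no xor base" marker */

struct triplet {
	uint32_t commit_pos;
	uint64_t offset;
	uint32_t xor_row;
};

static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t get_be64(const unsigned char *p)
{
	return ((uint64_t)get_be32(p) << 32) | get_be32(p + 4);
}

static void put_be32(unsigned char *p, uint32_t v)
{
	p[0] = v >> 24; p[1] = v >> 16; p[2] = v >> 8; p[3] = v;
}

/* Encode one row in network byte order (helper for building test data). */
static void write_triplet(unsigned char *table, uint32_t row,
			  uint32_t commit_pos, uint64_t offset, uint32_t xor_row)
{
	unsigned char *p = table + (size_t)row * TRIPLET_WIDTH;

	assert(table != NULL);
	put_be32(p, commit_pos);
	put_be32(p + 4, (uint32_t)(offset >> 32));
	put_be32(p + 8, (uint32_t)offset);
	put_be32(p + 12, xor_row);
}

static struct triplet read_triplet(const unsigned char *table, uint32_t row)
{
	const unsigned char *p = table + (size_t)row * TRIPLET_WIDTH;
	struct triplet t;

	t.commit_pos = get_be32(p);
	t.offset = get_be64(p + 4);
	t.xor_row = get_be32(p + 12);
	return t;
}

/* Binary search by commit_pos; returns 0 and sets *row on a hit, else -1. */
static int find_row(const unsigned char *table, uint32_t nr,
		    uint32_t commit_pos, uint32_t *row)
{
	uint32_t lo = 0, hi = nr;

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		uint32_t pos = get_be32(table + (size_t)mid * TRIPLET_WIDTH);

		if (pos == commit_pos) {
			*row = mid;
			return 0;
		}
		if (pos < commit_pos)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -1;
}

/*
 * Store the offset of the requested bitmap followed by the offsets of
 * its xor bases; returns how many were stored. out_cap bounds the walk,
 * so a malformed (cyclic) table cannot loop forever.
 */
static size_t collect_offsets(const unsigned char *table, uint32_t nr,
			      uint32_t commit_pos, uint64_t *out, size_t out_cap)
{
	uint32_t row;
	size_t n = 0;

	if (find_row(table, nr, commit_pos, &row) < 0)
		return 0;
	while (row != NO_XOR_ROW && row < nr && n < out_cap) {
		struct triplet t = read_triplet(table, row);

		out[n++] = t.offset;
		row = t.xor_row;
	}
	return n;
}
```

Because the rows are sorted by commit position, only the rows on the
xor chain of the requested commit are touched; all other bitmaps stay
unparsed.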

Older versions of Git ignore the lookup table extension and don't
throw any kind of warning or error while parsing the bitmap file.

Add some information for the new "bitmap lookup table" extension in the
bitmap-format documentation.

Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Co-Authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 Documentation/technical/bitmap-format.txt | 39 +++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index 04b3ec21785..c30dc177643 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -67,6 +67,17 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
 			pack/MIDX. The format and meaning of the name-hash is
 			described below.
 
+			** {empty}
+			BITMAP_OPT_LOOKUP_TABLE (0x10): :::
+			If present, the end of the bitmap file contains a table
+			containing a list of `N` <commit_pos, offset, xor_row>
+			triplets. The format and meaning of the table is described
+			below.
++
+NOTE: Unlike the xor_offset used to compress an individual bitmap,
+`xor_row` stores an *absolute* index into the lookup table, not a location
+relative to the current entry.
+
 		4-byte entry count (network byte order)
 
 			The total count of entries (bitmapped commits) in this bitmap index.
@@ -205,3 +216,31 @@ Note that this hashing scheme is tied to the BITMAP_OPT_HASH_CACHE flag.
 If implementations want to choose a different hashing scheme, they are
 free to do so, but MUST allocate a new header flag (because comparing
 hashes made under two different schemes would be pointless).
+
+Commit lookup table
+-------------------
+
+If the BITMAP_OPT_LOOKUP_TABLE flag is set, the last `N * (4 + 8 + 4)`
+bytes (preceding the name-hash cache and trailing hash) of the `.bitmap`
+file contain a lookup table specifying the information needed to get
+the desired bitmap from the entries without parsing previous unnecessary
+bitmaps.
+
+For a `.bitmap` containing `nr_entries` reachability bitmaps, the table
+contains a list of `nr_entries` <commit_pos, offset, xor_row> triplets
+(sorted in the ascending order of `commit_pos`). The content of i'th
+triplet is -
+
+	* {empty}
+	commit_pos (4 byte integer, network byte order): ::
+	It stores the object position of a commit (in the midx or pack
+	index).
+
+	* {empty}
+	offset (8 byte integer, network byte order): ::
+	The offset from which that commit's bitmap can be read.
+
+	* {empty}
+	xor_row (4 byte integer, network byte order): ::
+	The position of the triplet whose bitmap is used to compress
+	this one, or `0xffffffff` if no such bitmap exists.
-- 
gitgitgadget


* [PATCH v3 2/6] pack-bitmap-write.c: write lookup table extension
  2022-07-04  8:46   ` [PATCH v3 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
  2022-07-04  8:46     ` [PATCH v3 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
@ 2022-07-04  8:46     ` Abhradeep Chakraborty via GitGitGadget
  2022-07-14 23:26       ` Taylor Blau
                         ` (2 more replies)
  2022-07-04  8:46     ` [PATCH v3 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests Abhradeep Chakraborty via GitGitGadget
                       ` (6 subsequent siblings)
  8 siblings, 3 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-07-04  8:46 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee,
	Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

The bitmap lookup table extension was documented by an earlier
change, but Git does not yet know how to write that extension.

Teach Git to write the bitmap lookup table extension. The table contains
a list of `N` `<commit_pos, offset, xor_row>` triplets. These
triplets are sorted by commit position (ascending order).
The meaning of each field in the i'th triplet is given below:

  - commit_pos stores commit position (in the pack-index or midx).
    It is a 4 byte network byte order unsigned integer.

  - offset is the position (in the bitmap file) from which that
    commit's bitmap can be read.

  - xor_row is the position of the triplet in the lookup table
    whose bitmap is used to compress this bitmap, or `0xffffffff`
    if no such bitmap exists.
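
The conversion from a relative xor offset (counted in write order) to
an absolute `xor_row` (counted in lookup table order) can be sketched
roughly as below. This is a simplified, standalone illustration rather
than the code in this patch: `struct entry`, `struct keyed` and
`compute_xor_rows` are hypothetical names, and plain `qsort()` stands
in for the sorting helper used in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NO_XOR_ROW 0xffffffffu

/* Hypothetical in-memory form of one bitmap, in the order it was written. */
struct entry {
	uint32_t commit_pos;	/* position in the pack index or midx */
	uint32_t xor_offset;	/* 0 = no base, else "this many bitmaps back" */
};

struct keyed {
	uint32_t commit_pos;
	uint32_t write_idx;
};

static int keyed_cmp(const void *va, const void *vb)
{
	const struct keyed *a = va, *b = vb;

	if (a->commit_pos > b->commit_pos)
		return 1;
	if (a->commit_pos < b->commit_pos)
		return -1;
	return 0;
}

/*
 * For each lookup table row i (rows sorted by commit_pos), store in
 * row_to_write[i] the write-order index of its bitmap and in
 * xor_rows[i] the row of its xor base, or NO_XOR_ROW.
 */
static void compute_xor_rows(const struct entry *e, uint32_t nr,
			     uint32_t *row_to_write, uint32_t *xor_rows)
{
	struct keyed *k;
	uint32_t *write_to_row;
	uint32_t i;

	if (!nr)
		return;
	k = malloc(nr * sizeof(*k));
	write_to_row = malloc(nr * sizeof(*write_to_row));
	if (!k || !write_to_row)
		abort();

	for (i = 0; i < nr; i++) {
		k[i].commit_pos = e[i].commit_pos;
		k[i].write_idx = i;
	}
	qsort(k, nr, sizeof(*k), keyed_cmp);

	/* Build the permutation and its inverse. */
	for (i = 0; i < nr; i++) {
		row_to_write[i] = k[i].write_idx;
		write_to_row[k[i].write_idx] = i;
	}

	for (i = 0; i < nr; i++) {
		uint32_t w = row_to_write[i];
		uint32_t xo = e[w].xor_offset;

		if (xo) {
			/* the base was written xo bitmaps before this one */
			assert(xo <= w);
			xor_rows[i] = write_to_row[w - xo];
		} else {
			xor_rows[i] = NO_XOR_ROW;
		}
	}

	free(k);
	free(write_to_row);
}
```

The inverse permutation is what makes the stored value absolute: a
reader can jump straight to row `xor_row` without knowing the order in
which the bitmaps were originally written.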

Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 pack-bitmap-write.c | 112 ++++++++++++++++++++++++++++++++++++++++++--
 pack-bitmap.h       |   5 +-
 2 files changed, 112 insertions(+), 5 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index c43375bd344..4a0edd746bc 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -648,9 +648,17 @@ static const struct object_id *oid_access(size_t pos, const void *table)
 	return &index[pos]->oid;
 }
 
+static int commit_bitmap_writer_pos(struct object_id *oid,
+				    struct pack_idx_entry **index,
+				    uint32_t index_nr)
+{
+	return oid_pos(oid, index, index_nr, oid_access);
+}
+
 static void write_selected_commits_v1(struct hashfile *f,
 				      struct pack_idx_entry **index,
-				      uint32_t index_nr)
+				      uint32_t index_nr,
+				      off_t *offsets)
 {
 	int i;
 
@@ -658,11 +666,14 @@ static void write_selected_commits_v1(struct hashfile *f,
 		struct bitmapped_commit *stored = &writer.selected[i];
 
 		int commit_pos =
-			oid_pos(&stored->commit->object.oid, index, index_nr, oid_access);
+			commit_bitmap_writer_pos(&stored->commit->object.oid, index, index_nr);
 
 		if (commit_pos < 0)
 			BUG("trying to write commit not in index");
 
+		if (offsets)
+			offsets[i] = hashfile_total(f);
+
 		hashwrite_be32(f, commit_pos);
 		hashwrite_u8(f, stored->xor_offset);
 		hashwrite_u8(f, stored->flags);
@@ -671,6 +682,92 @@ static void write_selected_commits_v1(struct hashfile *f,
 	}
 }
 
+static int table_cmp(const void *_va, const void *_vb, void *_data)
+{
+	uint32_t *commit_positions = _data;
+	uint32_t a = commit_positions[*(uint32_t *)_va];
+	uint32_t b = commit_positions[*(uint32_t *)_vb];
+
+	if (a > b)
+		return 1;
+	else if (a < b)
+		return -1;
+
+	return 0;
+}
+
+static void write_lookup_table(struct hashfile *f,
+			       struct pack_idx_entry **index,
+			       uint32_t index_nr,
+			       off_t *offsets)
+{
+	uint32_t i;
+	uint32_t *table, *table_inv, *commit_positions;
+
+	ALLOC_ARRAY(table, writer.selected_nr);
+	ALLOC_ARRAY(table_inv, writer.selected_nr);
+	ALLOC_ARRAY(commit_positions, writer.selected_nr);
+
+	/* store the index positions of the commits */
+	for (i = 0; i < writer.selected_nr; i++) {
+		int pos = commit_bitmap_writer_pos(&writer.selected[i].commit->object.oid,
+						   index, index_nr);
+		if (pos < 0)
+			BUG(_("trying to write commit not in index"));
+
+		commit_positions[i] = pos;
+	}
+
+	for (i = 0; i < writer.selected_nr; i++)
+		table[i] = i;
+
+	/*
+	 * At the end of this sort table[j] = i means that the i'th
+	 * bitmap corresponds to j'th bitmapped commit in lex order of
+	 * OIDs.
+	 */
+	QSORT_S(table, writer.selected_nr, table_cmp, commit_positions);
+
+	/* table_inv helps us discover that relationship (i'th bitmap
+	 * to j'th commit by j = table_inv[i])
+	 */
+	for (i = 0; i < writer.selected_nr; i++)
+		table_inv[table[i]] = i;
+
+	trace2_region_enter("pack-bitmap-write", "writing_lookup_table", the_repository);
+	for (i = 0; i < writer.selected_nr; i++) {
+		struct bitmapped_commit *selected = &writer.selected[table[i]];
+		uint32_t xor_offset = selected->xor_offset;
+		uint32_t xor_row;
+
+		if (xor_offset) {
+			/*
+			 * xor_index stores the index (in the bitmap entries)
+			 * of the corresponding xor bitmap. But we need to convert
+			 * this index into lookup table's index. So, table_inv[xor_index]
+			 * gives us the index position w.r.t. the lookup table.
+			 *
+			 * If "k = table[i] - xor_offset" then the xor base is the k'th
+			 * bitmap. `table_inv[k]` gives us the position of that bitmap
+			 * in the lookup table.
+			 */
+			uint32_t xor_index = table[i] - xor_offset;
+			xor_row = table_inv[xor_index];
+		} else {
+			xor_row = 0xffffffff;
+		}
+
+		hashwrite_be32(f, commit_positions[table[i]]);
+		hashwrite_be64(f, (uint64_t)offsets[table[i]]);
+		hashwrite_be32(f, xor_row);
+	}
+	trace2_region_leave("pack-bitmap-write", "writing_lookup_table", the_repository);
+
+	free(table);
+	free(table_inv);
+	free(commit_positions);
+}
+
 static void write_hash_cache(struct hashfile *f,
 			     struct pack_idx_entry **index,
 			     uint32_t index_nr)
@@ -695,6 +792,7 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 {
 	static uint16_t default_version = 1;
 	static uint16_t flags = BITMAP_OPT_FULL_DAG;
+	off_t *offsets = NULL;
 	struct strbuf tmp_file = STRBUF_INIT;
 	struct hashfile *f;
 
@@ -715,7 +813,14 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 	dump_bitmap(f, writer.trees);
 	dump_bitmap(f, writer.blobs);
 	dump_bitmap(f, writer.tags);
-	write_selected_commits_v1(f, index, index_nr);
+
+	if (options & BITMAP_OPT_LOOKUP_TABLE)
+		CALLOC_ARRAY(offsets, index_nr);
+
+	write_selected_commits_v1(f, index, index_nr, offsets);
+
+	if (options & BITMAP_OPT_LOOKUP_TABLE)
+		write_lookup_table(f, index, index_nr, offsets);
 
 	if (options & BITMAP_OPT_HASH_CACHE)
 		write_hash_cache(f, index, index_nr);
@@ -730,4 +835,5 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 		die_errno("unable to rename temporary bitmap file to '%s'", filename);
 
 	strbuf_release(&tmp_file);
+	free(offsets);
 }
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 3d3ddd77345..67a9d0fc303 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -24,8 +24,9 @@ struct bitmap_disk_header {
 #define NEEDS_BITMAP (1u<<22)
 
 enum pack_bitmap_opts {
-	BITMAP_OPT_FULL_DAG = 1,
-	BITMAP_OPT_HASH_CACHE = 4,
+	BITMAP_OPT_FULL_DAG = 0x1,
+	BITMAP_OPT_HASH_CACHE = 0x4,
+	BITMAP_OPT_LOOKUP_TABLE = 0x10,
 };
 
 enum pack_bitmap_flags {
-- 
gitgitgadget


* [PATCH v3 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-07-04  8:46   ` [PATCH v3 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
  2022-07-04  8:46     ` [PATCH v3 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
  2022-07-04  8:46     ` [PATCH v3 2/6] pack-bitmap-write.c: write " Abhradeep Chakraborty via GitGitGadget
@ 2022-07-04  8:46     ` Abhradeep Chakraborty via GitGitGadget
  2022-07-04  8:46     ` [PATCH v3 4/6] pack-bitmap: prepare to read lookup table extension Abhradeep Chakraborty via GitGitGadget
                       ` (5 subsequent siblings)
  8 siblings, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-07-04  8:46 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee,
	Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

Teach Git to provide a way for users to enable/disable the bitmap lookup
table extension through a config option named 'pack.writeBitmapLookupTable'.
The default is false.

Also add tests to verify writing of the lookup table.
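
For illustration only, the way a boolean config value maps onto the
bitmap option flags can be sketched in isolation as below. The enum
values mirror the ones used in this series; `parse_bool` is a
hypothetical, much simplified stand-in for Git's own boolean parsing,
and `apply_config` is a made-up name for this sketch.

```c
#include <assert.h>
#include <string.h>

enum pack_bitmap_opts {
	BITMAP_OPT_FULL_DAG = 0x1,
	BITMAP_OPT_HASH_CACHE = 0x4,
	BITMAP_OPT_LOOKUP_TABLE = 0x10,
};

/* Simplified assumption: only a few spellings of "true" are recognized. */
static int parse_bool(const char *value)
{
	assert(value != NULL);
	return !strcmp(value, "true") || !strcmp(value, "yes") ||
	       !strcmp(value, "on") || !strcmp(value, "1");
}

/* Set or clear the lookup table bit; other keys leave options untouched. */
static unsigned apply_config(unsigned options, const char *key,
			     const char *value)
{
	if (!strcmp(key, "pack.writebitmaplookuptable")) {
		if (parse_bool(value))
			options |= BITMAP_OPT_LOOKUP_TABLE;
		else
			options &= ~(unsigned)BITMAP_OPT_LOOKUP_TABLE;
	}
	return options;
}
```

Clearing the bit explicitly on a false value matters because a later
config entry may need to override an earlier one that enabled the
option.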

Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Co-Authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 Documentation/config/pack.txt     |   7 +
 builtin/multi-pack-index.c        |   7 +
 builtin/pack-objects.c            |   8 +
 midx.c                            |   3 +
 midx.h                            |   1 +
 t/t5310-pack-bitmaps.sh           | 792 ++++++++++++++++--------------
 t/t5311-pack-bitmaps-shallow.sh   |  53 +-
 t/t5326-multi-pack-bitmaps.sh     | 421 +++++++++-------
 t/t5327-multi-pack-bitmaps-rev.sh |   9 +
 9 files changed, 720 insertions(+), 581 deletions(-)

diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index ad7f73a1ead..b955ca572ec 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -164,6 +164,13 @@ When writing a multi-pack reachability bitmap, no new namehashes are
 computed; instead, any namehashes stored in an existing bitmap are
 permuted into their appropriate location when writing a new bitmap.
 
+pack.writeBitmapLookupTable::
+	When true, Git will include a "lookup table" section in the
+	bitmap index (if one is written). This table is used to defer
+	loading individual bitmaps as late as possible. This can be
+	beneficial in repositories that have relatively large bitmap
+	indexes. Defaults to false.
+
 pack.writeReverseIndex::
 	When true, git will write a corresponding .rev file (see:
 	link:../technical/pack-format.html[Documentation/technical/pack-format.txt])
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 5edbb7fe86e..55402b46f41 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -87,6 +87,13 @@ static int git_multi_pack_index_write_config(const char *var, const char *value,
 			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
 	}
 
+	if (!strcmp(var, "pack.writebitmaplookuptable")) {
+		if (git_config_bool(var, value))
+			opts.flags |= MIDX_WRITE_BITMAP_LOOKUP_TABLE;
+		else
+			opts.flags &= ~MIDX_WRITE_BITMAP_LOOKUP_TABLE;
+	}
+
 	/*
 	 * We should never make a fall-back call to 'git_default_config', since
 	 * this was already called in 'cmd_multi_pack_index()'.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 39e28cfcafc..46e26774963 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3148,6 +3148,14 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 		else
 			write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE;
 	}
+
+	if (!strcmp(k, "pack.writebitmaplookuptable")) {
+		if (git_config_bool(k, v))
+			write_bitmap_options |= BITMAP_OPT_LOOKUP_TABLE;
+		else
+			write_bitmap_options &= ~BITMAP_OPT_LOOKUP_TABLE;
+	}
+
 	if (!strcmp(k, "pack.usebitmaps")) {
 		use_bitmap_index_default = git_config_bool(k, v);
 		return 0;
diff --git a/midx.c b/midx.c
index 5f0dd386b02..9c26d04bfde 100644
--- a/midx.c
+++ b/midx.c
@@ -1072,6 +1072,9 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
 	if (flags & MIDX_WRITE_BITMAP_HASH_CACHE)
 		options |= BITMAP_OPT_HASH_CACHE;
 
+	if (flags & MIDX_WRITE_BITMAP_LOOKUP_TABLE)
+		options |= BITMAP_OPT_LOOKUP_TABLE;
+
 	prepare_midx_packing_data(&pdata, ctx);
 
 	commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx);
diff --git a/midx.h b/midx.h
index 22e8e53288e..5578cd7b835 100644
--- a/midx.h
+++ b/midx.h
@@ -47,6 +47,7 @@ struct multi_pack_index {
 #define MIDX_WRITE_REV_INDEX (1 << 1)
 #define MIDX_WRITE_BITMAP (1 << 2)
 #define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3)
+#define MIDX_WRITE_BITMAP_LOOKUP_TABLE (1 << 4)
 
 const unsigned char *get_midx_checksum(struct multi_pack_index *m);
 void get_midx_filename(struct strbuf *out, const char *object_dir);
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index f775fc1ce69..c0607172827 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -26,22 +26,413 @@ has_any () {
 	grep -Ff "$1" "$2"
 }
 
-setup_bitmap_history
-
-test_expect_success 'setup writing bitmaps during repack' '
-	git config repack.writeBitmaps true
-'
-
-test_expect_success 'full repack creates bitmaps' '
-	GIT_TRACE2_EVENT="$(pwd)/trace" \
+test_bitmap_cases () {
+	writeLookupTable=false
+	for i in "$@"
+	do
+		case "$i" in
+		"pack.writeBitmapLookupTable") writeLookupTable=true;;
+		esac
+	done
+
+	test_expect_success 'setup test repository' '
+		rm -fr * .git &&
+		git init &&
+		git config pack.writeBitmapLookupTable '"$writeLookupTable"'
+	'
+	setup_bitmap_history
+
+	test_expect_success 'setup writing bitmaps during repack' '
+		git config repack.writeBitmaps true
+	'
+
+	test_expect_success 'full repack creates bitmaps' '
+		GIT_TRACE2_EVENT="$(pwd)/trace" \
+			git repack -ad &&
+		ls .git/objects/pack/ | grep bitmap >output &&
+		test_line_count = 1 output &&
+		grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace &&
+		grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace
+	'
+
+	basic_bitmap_tests
+
+	test_expect_success 'pack-objects respects --local (non-local loose)' '
+		git init --bare alt.git &&
+		echo $(pwd)/alt.git/objects >.git/objects/info/alternates &&
+		echo content1 >file1 &&
+		# non-local loose object which is not present in bitmapped pack
+		altblob=$(GIT_DIR=alt.git git hash-object -w file1) &&
+		# non-local loose object which is also present in bitmapped pack
+		git cat-file blob $blob | GIT_DIR=alt.git git hash-object -w --stdin &&
+		git add file1 &&
+		test_tick &&
+		git commit -m commit_file1 &&
+		echo HEAD | git pack-objects --local --stdout --revs >1.pack &&
+		git index-pack 1.pack &&
+		list_packed_objects 1.idx >1.objects &&
+		printf "%s\n" "$altblob" "$blob" >nonlocal-loose &&
+		! has_any nonlocal-loose 1.objects
+	'
+
+	test_expect_success 'pack-objects respects --honor-pack-keep (local non-bitmapped pack)' '
+		echo content2 >file2 &&
+		blob2=$(git hash-object -w file2) &&
+		git add file2 &&
+		test_tick &&
+		git commit -m commit_file2 &&
+		printf "%s\n" "$blob2" "$bitmaptip" >keepobjects &&
+		pack2=$(git pack-objects pack2 <keepobjects) &&
+		mv pack2-$pack2.* .git/objects/pack/ &&
+		>.git/objects/pack/pack2-$pack2.keep &&
+		rm $(objpath $blob2) &&
+		echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >2a.pack &&
+		git index-pack 2a.pack &&
+		list_packed_objects 2a.idx >2a.objects &&
+		! has_any keepobjects 2a.objects
+	'
+
+	test_expect_success 'pack-objects respects --local (non-local pack)' '
+		mv .git/objects/pack/pack2-$pack2.* alt.git/objects/pack/ &&
+		echo HEAD | git pack-objects --local --stdout --revs >2b.pack &&
+		git index-pack 2b.pack &&
+		list_packed_objects 2b.idx >2b.objects &&
+		! has_any keepobjects 2b.objects
+	'
+
+	test_expect_success 'pack-objects respects --honor-pack-keep (local bitmapped pack)' '
+		ls .git/objects/pack/ | grep bitmap >output &&
+		test_line_count = 1 output &&
+		packbitmap=$(basename $(cat output) .bitmap) &&
+		list_packed_objects .git/objects/pack/$packbitmap.idx >packbitmap.objects &&
+		test_when_finished "rm -f .git/objects/pack/$packbitmap.keep" &&
+		>.git/objects/pack/$packbitmap.keep &&
+		echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >3a.pack &&
+		git index-pack 3a.pack &&
+		list_packed_objects 3a.idx >3a.objects &&
+		! has_any packbitmap.objects 3a.objects
+	'
+
+	test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' '
+		mv .git/objects/pack/$packbitmap.* alt.git/objects/pack/ &&
+		rm -f .git/objects/pack/multi-pack-index &&
+		test_when_finished "mv alt.git/objects/pack/$packbitmap.* .git/objects/pack/" &&
+		echo HEAD | git pack-objects --local --stdout --revs >3b.pack &&
+		git index-pack 3b.pack &&
+		list_packed_objects 3b.idx >3b.objects &&
+		! has_any packbitmap.objects 3b.objects
+	'
+
+	test_expect_success 'pack-objects to file can use bitmap' '
+		# make sure we still have 1 bitmap index from previous tests
+		ls .git/objects/pack/ | grep bitmap >output &&
+		test_line_count = 1 output &&
+		# verify equivalent packs are generated with/without using bitmap index
+		packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) &&
+		packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) &&
+		list_packed_objects packa-$packasha1.idx >packa.objects &&
+		list_packed_objects packb-$packbsha1.idx >packb.objects &&
+		test_cmp packa.objects packb.objects
+	'
+
+	test_expect_success 'full repack, reusing previous bitmaps' '
 		git repack -ad &&
-	ls .git/objects/pack/ | grep bitmap >output &&
-	test_line_count = 1 output &&
-	grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace &&
-	grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace
-'
+		ls .git/objects/pack/ | grep bitmap >output &&
+		test_line_count = 1 output
+	'
+
+	test_expect_success 'fetch (full bitmap)' '
+		git --git-dir=clone.git fetch origin second:second &&
+		git rev-parse HEAD >expect &&
+		git --git-dir=clone.git rev-parse HEAD >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success 'create objects for missing-HAVE tests' '
+		blob=$(echo "missing have" | git hash-object -w --stdin) &&
+		tree=$(printf "100644 blob $blob\tfile\n" | git mktree) &&
+		parent=$(echo parent | git commit-tree $tree) &&
+		commit=$(echo commit | git commit-tree $tree -p $parent) &&
+		cat >revs <<-EOF
+		HEAD
+		^HEAD^
+		^$commit
+		EOF
+	'
+
+	test_expect_success 'pack-objects respects --incremental' '
+		cat >revs2 <<-EOF &&
+		HEAD
+		$commit
+		EOF
+		git pack-objects --incremental --stdout --revs <revs2 >4.pack &&
+		git index-pack 4.pack &&
+		list_packed_objects 4.idx >4.objects &&
+		test_line_count = 4 4.objects &&
+		git rev-list --objects $commit >revlist &&
+		cut -d" " -f1 revlist |sort >objects &&
+		test_cmp 4.objects objects
+	'
+
+	test_expect_success 'pack with missing blob' '
+		rm $(objpath $blob) &&
+		git pack-objects --stdout --revs <revs >/dev/null
+	'
+
+	test_expect_success 'pack with missing tree' '
+		rm $(objpath $tree) &&
+		git pack-objects --stdout --revs <revs >/dev/null
+	'
+
+	test_expect_success 'pack with missing parent' '
+		rm $(objpath $parent) &&
+		git pack-objects --stdout --revs <revs >/dev/null
+	'
+
+	test_expect_success JGIT,SHA1 'we can read jgit bitmaps' '
+		git clone --bare . compat-jgit.git &&
+		(
+			cd compat-jgit.git &&
+			rm -f objects/pack/*.bitmap &&
+			jgit gc &&
+			git rev-list --test-bitmap HEAD
+		)
+	'
+
+	test_expect_success JGIT,SHA1 'jgit can read our bitmaps' '
+		git clone --bare . compat-us.git &&
+		(
+			cd compat-us.git &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
+			git repack -adb &&
+			# jgit gc will barf if it does not like our bitmaps
+			jgit gc
+		)
+	'
+
+	test_expect_success 'splitting packs does not generate bogus bitmaps' '
+		test-tool genrandom foo $((1024 * 1024)) >rand &&
+		git add rand &&
+		git commit -m "commit with big file" &&
+		git -c pack.packSizeLimit=500k repack -adb &&
+		git init --bare no-bitmaps.git &&
+		git -C no-bitmaps.git fetch .. HEAD
+	'
+
+	test_expect_success 'set up reusable pack' '
+		rm -f .git/objects/pack/*.keep &&
+		git repack -adb &&
+		reusable_pack () {
+			git for-each-ref --format="%(objectname)" |
+			git pack-objects --delta-base-offset --revs --stdout "$@"
+		}
+	'
+
+	test_expect_success 'pack reuse respects --honor-pack-keep' '
+		test_when_finished "rm -f .git/objects/pack/*.keep" &&
+		for i in .git/objects/pack/*.pack
+		do
+			>${i%.pack}.keep || return 1
+		done &&
+		reusable_pack --honor-pack-keep >empty.pack &&
+		git index-pack empty.pack &&
+		git show-index <empty.idx >actual &&
+		test_must_be_empty actual
+	'
+
+	test_expect_success 'pack reuse respects --local' '
+		mv .git/objects/pack/* alt.git/objects/pack/ &&
+		test_when_finished "mv alt.git/objects/pack/* .git/objects/pack/" &&
+		reusable_pack --local >empty.pack &&
+		git index-pack empty.pack &&
+		git show-index <empty.idx >actual &&
+		test_must_be_empty actual
+	'
+
+	test_expect_success 'pack reuse respects --incremental' '
+		reusable_pack --incremental >empty.pack &&
+		git index-pack empty.pack &&
+		git show-index <empty.idx >actual &&
+		test_must_be_empty actual
+	'
+
+	test_expect_success 'truncated bitmap fails gracefully (ewah)' '
+		test_config pack.writebitmaphashcache false &&
+		git repack -ad &&
+		git rev-list --use-bitmap-index --count --all >expect &&
+		bitmap=$(ls .git/objects/pack/*.bitmap) &&
+		test_when_finished "rm -f $bitmap" &&
+		test_copy_bytes 256 <$bitmap >$bitmap.tmp &&
+		mv -f $bitmap.tmp $bitmap &&
+		git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
+		test_cmp expect actual &&
+		test_i18ngrep corrupt.ewah.bitmap stderr
+	'
+
+	test_expect_success 'truncated bitmap fails gracefully (cache)' '
+		git repack -ad &&
+		git rev-list --use-bitmap-index --count --all >expect &&
+		bitmap=$(ls .git/objects/pack/*.bitmap) &&
+		test_when_finished "rm -f $bitmap" &&
+		test_copy_bytes 512 <$bitmap >$bitmap.tmp &&
+		mv -f $bitmap.tmp $bitmap &&
+		git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
+		test_cmp expect actual &&
+		test_i18ngrep corrupted.bitmap.index stderr
+	'
+
+	# Create a state of history with these properties:
+	#
+	#  - refs that allow a client to fetch some new history, while sharing some old
+	#    history with the server; we use branches delta-reuse-old and
+	#    delta-reuse-new here
+	#
+	#  - the new history contains an object that is stored on the server as a delta
+	#    against a base that is in the old history
+	#
+	#  - the base object is not immediately reachable from the tip of the old
+	#    history; finding it would involve digging down through history we know the
+	#    other side has
+	#
+	# This should result in a state where fetching from old->new would not
+	# traditionally reuse the on-disk delta (because we'd have to dig to realize
+	# that the client has it), but we will do so if bitmaps can tell us cheaply
+	# that the other side has it.
+	test_expect_success 'set up thin delta-reuse parent' '
+		# This first commit contains the buried base object.
+		test-tool genrandom delta 16384 >file &&
+		git add file &&
+		git commit -m "delta base" &&
+		base=$(git rev-parse --verify HEAD:file) &&
+
+		# These intermediate commits bury the base back in history.
+		# This becomes the "old" state.
+		for i in 1 2 3 4 5
+		do
+			echo $i >file &&
+			git commit -am "intermediate $i" || return 1
+		done &&
+		git branch delta-reuse-old &&
+
+		# And now our new history has a delta against the buried base. Note
+		# that this must be smaller than the original file, since pack-objects
+		# prefers to create deltas from smaller objects to larger.
+		test-tool genrandom delta 16300 >file &&
+		git commit -am "delta result" &&
+		delta=$(git rev-parse --verify HEAD:file) &&
+		git branch delta-reuse-new &&
+
+		# Repack with bitmaps and double check that we have the expected delta
+		# relationship.
+		git repack -adb &&
+		have_delta $delta $base
+	'
+
+	# Now we can sanity-check the non-bitmap behavior (that the server is not able
+	# to reuse the delta). This isn't strictly something we care about, so this
+	# test could be scrapped in the future. But it makes sure that the next test is
+	# actually triggering the feature we want.
+	#
+	# Note that our tools for working with on-the-wire "thin" packs are limited. So
+	# we actually perform the fetch, retain the resulting pack, and inspect the
+	# result.
+	test_expect_success 'fetch without bitmaps ignores delta against old base' '
+		test_config pack.usebitmaps false &&
+		test_when_finished "rm -rf client.git" &&
+		git init --bare client.git &&
+		(
+			cd client.git &&
+			git config transfer.unpackLimit 1 &&
+			git fetch .. delta-reuse-old:delta-reuse-old &&
+			git fetch .. delta-reuse-new:delta-reuse-new &&
+			have_delta $delta $ZERO_OID
+		)
+	'
+
+	# And do the same for the bitmap case, where we do expect to find the delta.
+	test_expect_success 'fetch with bitmaps can reuse old base' '
+		test_config pack.usebitmaps true &&
+		test_when_finished "rm -rf client.git" &&
+		git init --bare client.git &&
+		(
+			cd client.git &&
+			git config transfer.unpackLimit 1 &&
+			git fetch .. delta-reuse-old:delta-reuse-old &&
+			git fetch .. delta-reuse-new:delta-reuse-new &&
+			have_delta $delta $base
+		)
+	'
+
+	test_expect_success 'pack.preferBitmapTips' '
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
+
+			# create enough commits that not all receive bitmap
+			# coverage even if they are all at the tip of some reference.
+			test_commit_bulk --message="%s" 103 &&
+
+			git rev-list HEAD >commits.raw &&
+			sort <commits.raw >commits &&
+
+			git log --format="create refs/tags/%s %H" HEAD >refs &&
+			git update-ref --stdin <refs &&
+
+			git repack -adb &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+
+			# remember which commits did not receive bitmaps
+			comm -13 bitmaps commits >before &&
+			test_file_not_empty before &&
+
+			# mark the commits which did not receive bitmaps as preferred,
+			# and generate the bitmap again
+			perl -pe "s{^}{create refs/tags/include/$. }" <before |
+				git update-ref --stdin &&
+			git -c pack.preferBitmapTips=refs/tags/include repack -adb &&
+
+			# finally, check that the commit(s) without bitmap coverage
+			# are not the same ones as before
+			test-tool bitmap list-commits | sort >bitmaps &&
+			comm -13 bitmaps commits >after &&
+
+			! test_cmp before after
+		)
+	'
+
+	test_expect_success 'complains about multiple pack bitmaps' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
+
+			test_commit base &&
+
+			git repack -adb &&
+			bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
+			mv "$bitmap" "$bitmap.bak" &&
+
+			test_commit other &&
+			git repack -ab &&
+
+			mv "$bitmap.bak" "$bitmap" &&
+
+			find .git/objects/pack -type f -name "*.pack" >packs &&
+			find .git/objects/pack -type f -name "*.bitmap" >bitmaps &&
+			test_line_count = 2 packs &&
+			test_line_count = 2 bitmaps &&
+
+			git rev-list --use-bitmap-index HEAD 2>err &&
+			grep "ignoring extra bitmap file" err
+		)
+	'
+}
 
-basic_bitmap_tests
+test_bitmap_cases
 
 test_expect_success 'incremental repack fails when bitmaps are requested' '
 	test_commit more-1 &&
@@ -54,375 +445,12 @@ test_expect_success 'incremental repack can disable bitmaps' '
 	git repack -d --no-write-bitmap-index
 '
 
-test_expect_success 'pack-objects respects --local (non-local loose)' '
-	git init --bare alt.git &&
-	echo $(pwd)/alt.git/objects >.git/objects/info/alternates &&
-	echo content1 >file1 &&
-	# non-local loose object which is not present in bitmapped pack
-	altblob=$(GIT_DIR=alt.git git hash-object -w file1) &&
-	# non-local loose object which is also present in bitmapped pack
-	git cat-file blob $blob | GIT_DIR=alt.git git hash-object -w --stdin &&
-	git add file1 &&
-	test_tick &&
-	git commit -m commit_file1 &&
-	echo HEAD | git pack-objects --local --stdout --revs >1.pack &&
-	git index-pack 1.pack &&
-	list_packed_objects 1.idx >1.objects &&
-	printf "%s\n" "$altblob" "$blob" >nonlocal-loose &&
-	! has_any nonlocal-loose 1.objects
-'
-
-test_expect_success 'pack-objects respects --honor-pack-keep (local non-bitmapped pack)' '
-	echo content2 >file2 &&
-	blob2=$(git hash-object -w file2) &&
-	git add file2 &&
-	test_tick &&
-	git commit -m commit_file2 &&
-	printf "%s\n" "$blob2" "$bitmaptip" >keepobjects &&
-	pack2=$(git pack-objects pack2 <keepobjects) &&
-	mv pack2-$pack2.* .git/objects/pack/ &&
-	>.git/objects/pack/pack2-$pack2.keep &&
-	rm $(objpath $blob2) &&
-	echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >2a.pack &&
-	git index-pack 2a.pack &&
-	list_packed_objects 2a.idx >2a.objects &&
-	! has_any keepobjects 2a.objects
-'
-
-test_expect_success 'pack-objects respects --local (non-local pack)' '
-	mv .git/objects/pack/pack2-$pack2.* alt.git/objects/pack/ &&
-	echo HEAD | git pack-objects --local --stdout --revs >2b.pack &&
-	git index-pack 2b.pack &&
-	list_packed_objects 2b.idx >2b.objects &&
-	! has_any keepobjects 2b.objects
-'
-
-test_expect_success 'pack-objects respects --honor-pack-keep (local bitmapped pack)' '
-	ls .git/objects/pack/ | grep bitmap >output &&
-	test_line_count = 1 output &&
-	packbitmap=$(basename $(cat output) .bitmap) &&
-	list_packed_objects .git/objects/pack/$packbitmap.idx >packbitmap.objects &&
-	test_when_finished "rm -f .git/objects/pack/$packbitmap.keep" &&
-	>.git/objects/pack/$packbitmap.keep &&
-	echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >3a.pack &&
-	git index-pack 3a.pack &&
-	list_packed_objects 3a.idx >3a.objects &&
-	! has_any packbitmap.objects 3a.objects
-'
-
-test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' '
-	mv .git/objects/pack/$packbitmap.* alt.git/objects/pack/ &&
-	rm -f .git/objects/pack/multi-pack-index &&
-	test_when_finished "mv alt.git/objects/pack/$packbitmap.* .git/objects/pack/" &&
-	echo HEAD | git pack-objects --local --stdout --revs >3b.pack &&
-	git index-pack 3b.pack &&
-	list_packed_objects 3b.idx >3b.objects &&
-	! has_any packbitmap.objects 3b.objects
-'
-
-test_expect_success 'pack-objects to file can use bitmap' '
-	# make sure we still have 1 bitmap index from previous tests
-	ls .git/objects/pack/ | grep bitmap >output &&
-	test_line_count = 1 output &&
-	# verify equivalent packs are generated with/without using bitmap index
-	packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) &&
-	packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) &&
-	list_packed_objects packa-$packasha1.idx >packa.objects &&
-	list_packed_objects packb-$packbsha1.idx >packb.objects &&
-	test_cmp packa.objects packb.objects
-'
-
-test_expect_success 'full repack, reusing previous bitmaps' '
-	git repack -ad &&
-	ls .git/objects/pack/ | grep bitmap >output &&
-	test_line_count = 1 output
-'
-
-test_expect_success 'fetch (full bitmap)' '
-	git --git-dir=clone.git fetch origin second:second &&
-	git rev-parse HEAD >expect &&
-	git --git-dir=clone.git rev-parse HEAD >actual &&
-	test_cmp expect actual
-'
-
-test_expect_success 'create objects for missing-HAVE tests' '
-	blob=$(echo "missing have" | git hash-object -w --stdin) &&
-	tree=$(printf "100644 blob $blob\tfile\n" | git mktree) &&
-	parent=$(echo parent | git commit-tree $tree) &&
-	commit=$(echo commit | git commit-tree $tree -p $parent) &&
-	cat >revs <<-EOF
-	HEAD
-	^HEAD^
-	^$commit
-	EOF
-'
-
-test_expect_success 'pack-objects respects --incremental' '
-	cat >revs2 <<-EOF &&
-	HEAD
-	$commit
-	EOF
-	git pack-objects --incremental --stdout --revs <revs2 >4.pack &&
-	git index-pack 4.pack &&
-	list_packed_objects 4.idx >4.objects &&
-	test_line_count = 4 4.objects &&
-	git rev-list --objects $commit >revlist &&
-	cut -d" " -f1 revlist |sort >objects &&
-	test_cmp 4.objects objects
-'
-
-test_expect_success 'pack with missing blob' '
-	rm $(objpath $blob) &&
-	git pack-objects --stdout --revs <revs >/dev/null
-'
+test_bitmap_cases "pack.writeBitmapLookupTable"
 
-test_expect_success 'pack with missing tree' '
-	rm $(objpath $tree) &&
-	git pack-objects --stdout --revs <revs >/dev/null
-'
-
-test_expect_success 'pack with missing parent' '
-	rm $(objpath $parent) &&
-	git pack-objects --stdout --revs <revs >/dev/null
-'
-
-test_expect_success JGIT,SHA1 'we can read jgit bitmaps' '
-	git clone --bare . compat-jgit.git &&
-	(
-		cd compat-jgit.git &&
-		rm -f objects/pack/*.bitmap &&
-		jgit gc &&
-		git rev-list --test-bitmap HEAD
-	)
-'
-
-test_expect_success JGIT,SHA1 'jgit can read our bitmaps' '
-	git clone --bare . compat-us.git &&
-	(
-		cd compat-us.git &&
-		git repack -adb &&
-		# jgit gc will barf if it does not like our bitmaps
-		jgit gc
-	)
-'
-
-test_expect_success 'splitting packs does not generate bogus bitmaps' '
-	test-tool genrandom foo $((1024 * 1024)) >rand &&
-	git add rand &&
-	git commit -m "commit with big file" &&
-	git -c pack.packSizeLimit=500k repack -adb &&
-	git init --bare no-bitmaps.git &&
-	git -C no-bitmaps.git fetch .. HEAD
-'
-
-test_expect_success 'set up reusable pack' '
-	rm -f .git/objects/pack/*.keep &&
-	git repack -adb &&
-	reusable_pack () {
-		git for-each-ref --format="%(objectname)" |
-		git pack-objects --delta-base-offset --revs --stdout "$@"
-	}
-'
-
-test_expect_success 'pack reuse respects --honor-pack-keep' '
-	test_when_finished "rm -f .git/objects/pack/*.keep" &&
-	for i in .git/objects/pack/*.pack
-	do
-		>${i%.pack}.keep || return 1
-	done &&
-	reusable_pack --honor-pack-keep >empty.pack &&
-	git index-pack empty.pack &&
-	git show-index <empty.idx >actual &&
-	test_must_be_empty actual
-'
-
-test_expect_success 'pack reuse respects --local' '
-	mv .git/objects/pack/* alt.git/objects/pack/ &&
-	test_when_finished "mv alt.git/objects/pack/* .git/objects/pack/" &&
-	reusable_pack --local >empty.pack &&
-	git index-pack empty.pack &&
-	git show-index <empty.idx >actual &&
-	test_must_be_empty actual
-'
-
-test_expect_success 'pack reuse respects --incremental' '
-	reusable_pack --incremental >empty.pack &&
-	git index-pack empty.pack &&
-	git show-index <empty.idx >actual &&
-	test_must_be_empty actual
-'
-
-test_expect_success 'truncated bitmap fails gracefully (ewah)' '
-	test_config pack.writebitmaphashcache false &&
-	git repack -ad &&
-	git rev-list --use-bitmap-index --count --all >expect &&
-	bitmap=$(ls .git/objects/pack/*.bitmap) &&
-	test_when_finished "rm -f $bitmap" &&
-	test_copy_bytes 256 <$bitmap >$bitmap.tmp &&
-	mv -f $bitmap.tmp $bitmap &&
-	git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
-	test_cmp expect actual &&
-	test_i18ngrep corrupt.ewah.bitmap stderr
-'
-
-test_expect_success 'truncated bitmap fails gracefully (cache)' '
-	git repack -ad &&
-	git rev-list --use-bitmap-index --count --all >expect &&
-	bitmap=$(ls .git/objects/pack/*.bitmap) &&
-	test_when_finished "rm -f $bitmap" &&
-	test_copy_bytes 512 <$bitmap >$bitmap.tmp &&
-	mv -f $bitmap.tmp $bitmap &&
-	git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
-	test_cmp expect actual &&
-	test_i18ngrep corrupted.bitmap.index stderr
-'
-
-# Create a state of history with these properties:
-#
-#  - refs that allow a client to fetch some new history, while sharing some old
-#    history with the server; we use branches delta-reuse-old and
-#    delta-reuse-new here
-#
-#  - the new history contains an object that is stored on the server as a delta
-#    against a base that is in the old history
-#
-#  - the base object is not immediately reachable from the tip of the old
-#    history; finding it would involve digging down through history we know the
-#    other side has
-#
-# This should result in a state where fetching from old->new would not
-# traditionally reuse the on-disk delta (because we'd have to dig to realize
-# that the client has it), but we will do so if bitmaps can tell us cheaply
-# that the other side has it.
-test_expect_success 'set up thin delta-reuse parent' '
-	# This first commit contains the buried base object.
-	test-tool genrandom delta 16384 >file &&
-	git add file &&
-	git commit -m "delta base" &&
-	base=$(git rev-parse --verify HEAD:file) &&
-
-	# These intermediate commits bury the base back in history.
-	# This becomes the "old" state.
-	for i in 1 2 3 4 5
-	do
-		echo $i >file &&
-		git commit -am "intermediate $i" || return 1
-	done &&
-	git branch delta-reuse-old &&
-
-	# And now our new history has a delta against the buried base. Note
-	# that this must be smaller than the original file, since pack-objects
-	# prefers to create deltas from smaller objects to larger.
-	test-tool genrandom delta 16300 >file &&
-	git commit -am "delta result" &&
-	delta=$(git rev-parse --verify HEAD:file) &&
-	git branch delta-reuse-new &&
-
-	# Repack with bitmaps and double check that we have the expected delta
-	# relationship.
-	git repack -adb &&
-	have_delta $delta $base
-'
-
-# Now we can sanity-check the non-bitmap behavior (that the server is not able
-# to reuse the delta). This isn't strictly something we care about, so this
-# test could be scrapped in the future. But it makes sure that the next test is
-# actually triggering the feature we want.
-#
-# Note that our tools for working with on-the-wire "thin" packs are limited. So
-# we actually perform the fetch, retain the resulting pack, and inspect the
-# result.
-test_expect_success 'fetch without bitmaps ignores delta against old base' '
-	test_config pack.usebitmaps false &&
-	test_when_finished "rm -rf client.git" &&
-	git init --bare client.git &&
-	(
-		cd client.git &&
-		git config transfer.unpackLimit 1 &&
-		git fetch .. delta-reuse-old:delta-reuse-old &&
-		git fetch .. delta-reuse-new:delta-reuse-new &&
-		have_delta $delta $ZERO_OID
-	)
-'
-
-# And do the same for the bitmap case, where we do expect to find the delta.
-test_expect_success 'fetch with bitmaps can reuse old base' '
-	test_config pack.usebitmaps true &&
-	test_when_finished "rm -rf client.git" &&
-	git init --bare client.git &&
-	(
-		cd client.git &&
-		git config transfer.unpackLimit 1 &&
-		git fetch .. delta-reuse-old:delta-reuse-old &&
-		git fetch .. delta-reuse-new:delta-reuse-new &&
-		have_delta $delta $base
-	)
-'
-
-test_expect_success 'pack.preferBitmapTips' '
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
-
-		# create enough commits that not all are receive bitmap
-		# coverage even if they are all at the tip of some reference.
-		test_commit_bulk --message="%s" 103 &&
-
-		git rev-list HEAD >commits.raw &&
-		sort <commits.raw >commits &&
-
-		git log --format="create refs/tags/%s %H" HEAD >refs &&
-		git update-ref --stdin <refs &&
-
-		git repack -adb &&
-		test-tool bitmap list-commits | sort >bitmaps &&
-
-		# remember which commits did not receive bitmaps
-		comm -13 bitmaps commits >before &&
-		test_file_not_empty before &&
-
-		# mark the commits which did not receive bitmaps as preferred,
-		# and generate the bitmap again
-		perl -pe "s{^}{create refs/tags/include/$. }" <before |
-			git update-ref --stdin &&
-		git -c pack.preferBitmapTips=refs/tags/include repack -adb &&
-
-		# finally, check that the commit(s) without bitmap coverage
-		# are not the same ones as before
-		test-tool bitmap list-commits | sort >bitmaps &&
-		comm -13 bitmaps commits >after &&
-
-		! test_cmp before after
-	)
-'
-
-test_expect_success 'complains about multiple pack bitmaps' '
-	rm -fr repo &&
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
-
-		test_commit base &&
-
-		git repack -adb &&
-		bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
-		mv "$bitmap" "$bitmap.bak" &&
-
-		test_commit other &&
-		git repack -ab &&
-
-		mv "$bitmap.bak" "$bitmap" &&
-
-		find .git/objects/pack -type f -name "*.pack" >packs &&
-		find .git/objects/pack -type f -name "*.bitmap" >bitmaps &&
-		test_line_count = 2 packs &&
-		test_line_count = 2 bitmaps &&
-
-		git rev-list --use-bitmap-index HEAD 2>err &&
-		grep "ignoring extra bitmap file" err
-	)
+test_expect_success 'verify writing bitmap lookup table when enabled' '
+	GIT_TRACE2_EVENT="$(pwd)/trace2" \
+		git repack -ad &&
+	grep "\"label\":\"writing_lookup_table\"" trace2
 '
 
 test_done
diff --git a/t/t5311-pack-bitmaps-shallow.sh b/t/t5311-pack-bitmaps-shallow.sh
index 872a95df338..f74c6a2da47 100755
--- a/t/t5311-pack-bitmaps-shallow.sh
+++ b/t/t5311-pack-bitmaps-shallow.sh
@@ -17,23 +17,40 @@ test_description='check bitmap operation with shallow repositories'
 # the tree for A. But in a shallow one, we've grafted away
 # A, and fetching A to B requires that the other side send
 # us the tree for file=1.
-test_expect_success 'setup shallow repo' '
-	echo 1 >file &&
-	git add file &&
-	git commit -m orig &&
-	echo 2 >file &&
-	git commit -a -m update &&
-	git clone --no-local --bare --depth=1 . shallow.git &&
-	echo 1 >file &&
-	git commit -a -m repeat
-'
-
-test_expect_success 'turn on bitmaps in the parent' '
-	git repack -adb
-'
-
-test_expect_success 'shallow fetch from bitmapped repo' '
-	(cd shallow.git && git fetch)
-'
+test_shallow_bitmaps () {
+	writeLookupTable=false
+
+	for i in "$@"
+	do
+		case $i in
+		"pack.writeBitmapLookupTable") writeLookupTable=true;;
+		esac
+	done
+
+	test_expect_success 'setup shallow repo' '
+		rm -rf * .git &&
+		git init &&
+		git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
+		echo 1 >file &&
+		git add file &&
+		git commit -m orig &&
+		echo 2 >file &&
+		git commit -a -m update &&
+		git clone --no-local --bare --depth=1 . shallow.git &&
+		echo 1 >file &&
+		git commit -a -m repeat
+	'
+
+	test_expect_success 'turn on bitmaps in the parent' '
+		git repack -adb
+	'
+
+	test_expect_success 'shallow fetch from bitmapped repo' '
+		(cd shallow.git && git fetch)
+	'
+}
+
+test_shallow_bitmaps
+
 
 test_done
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 4fe57414c13..3b206adcee6 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -15,17 +15,24 @@ GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
 sane_unset GIT_TEST_MIDX_WRITE_REV
 sane_unset GIT_TEST_MIDX_READ_RIDX
 
-midx_bitmap_core
-
 bitmap_reuse_tests() {
 	from=$1
 	to=$2
+	writeLookupTable=false
+
+	for i in "$@"
+	do
+		case $i in
+		"pack.writeBitmapLookupTable") writeLookupTable=true;;
+		esac
+	done
 
 	test_expect_success "setup pack reuse tests ($from -> $to)" '
 		rm -fr repo &&
 		git init repo &&
 		(
 			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 			test_commit_bulk 16 &&
 			git tag old-tip &&
 
@@ -43,6 +50,7 @@ bitmap_reuse_tests() {
 	test_expect_success "build bitmap from existing ($from -> $to)" '
 		(
 			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 			test_commit_bulk --id=further 16 &&
 			git tag new-tip &&
 
@@ -59,6 +67,7 @@ bitmap_reuse_tests() {
 	test_expect_success "verify resulting bitmaps ($from -> $to)" '
 		(
 			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 			git for-each-ref &&
 			git rev-list --test-bitmap refs/tags/old-tip &&
 			git rev-list --test-bitmap refs/tags/new-tip
@@ -66,244 +75,294 @@ bitmap_reuse_tests() {
 	'
 }
 
-bitmap_reuse_tests 'pack' 'MIDX'
-bitmap_reuse_tests 'MIDX' 'pack'
-bitmap_reuse_tests 'MIDX' 'MIDX'
+test_midx_bitmap_cases () {
+	writeLookupTable=false
+	writeBitmapLookupTable=
+
+	for i in "$@"
+	do
+		case $i in
+		"pack.writeBitmapLookupTable")
+			writeLookupTable=true
+			writeBitmapLookupTable="$i"
+			;;
+		esac
+	done
+
+	test_expect_success 'setup test_repository' '
+		rm -rf * .git &&
+		git init &&
+		git config pack.writeBitmapLookupTable '"$writeLookupTable"'
+	'
 
-test_expect_success 'missing object closure fails gracefully' '
-	rm -fr repo &&
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
+	midx_bitmap_core
 
-		test_commit loose &&
-		test_commit packed &&
+	bitmap_reuse_tests 'pack' 'MIDX' "$writeBitmapLookupTable"
+	bitmap_reuse_tests 'MIDX' 'pack' "$writeBitmapLookupTable"
+	bitmap_reuse_tests 'MIDX' 'MIDX' "$writeBitmapLookupTable"
 
-		# Do not pass "--revs"; we want a pack without the "loose"
-		# commit.
-		git pack-objects $objdir/pack/pack <<-EOF &&
-		$(git rev-parse packed)
-		EOF
+	test_expect_success 'missing object closure fails gracefully' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-		test_must_fail git multi-pack-index write --bitmap 2>err &&
-		grep "doesn.t have full closure" err &&
-		test_path_is_missing $midx
-	)
-'
+			test_commit loose &&
+			test_commit packed &&
 
-midx_bitmap_partial_tests
+			# Do not pass "--revs"; we want a pack without the "loose"
+			# commit.
+			git pack-objects $objdir/pack/pack <<-EOF &&
+			$(git rev-parse packed)
+			EOF
 
-test_expect_success 'removing a MIDX clears stale bitmaps' '
-	rm -fr repo &&
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
-		test_commit base &&
-		git repack &&
-		git multi-pack-index write --bitmap &&
+			test_must_fail git multi-pack-index write --bitmap 2>err &&
+			grep "doesn.t have full closure" err &&
+			test_path_is_missing $midx
+		)
+	'
 
-		# Write a MIDX and bitmap; remove the MIDX but leave the bitmap.
-		stale_bitmap=$midx-$(midx_checksum $objdir).bitmap &&
-		rm $midx &&
+	midx_bitmap_partial_tests
 
-		# Then write a new MIDX.
-		test_commit new &&
-		git repack &&
-		git multi-pack-index write --bitmap &&
+	test_expect_success 'removing a MIDX clears stale bitmaps' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
+			test_commit base &&
+			git repack &&
+			git multi-pack-index write --bitmap &&
+
+			# Write a MIDX and bitmap; remove the MIDX but leave the bitmap.
+			stale_bitmap=$midx-$(midx_checksum $objdir).bitmap &&
+			rm $midx &&
+
+			# Then write a new MIDX.
+			test_commit new &&
+			git repack &&
+			git multi-pack-index write --bitmap &&
+
+			test_path_is_file $midx &&
+			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+			test_path_is_missing $stale_bitmap
+		)
+	'
 
-		test_path_is_file $midx &&
-		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
-		test_path_is_missing $stale_bitmap
-	)
-'
+	test_expect_success 'pack.preferBitmapTips' '
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-test_expect_success 'pack.preferBitmapTips' '
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
+			test_commit_bulk --message="%s" 103 &&
 
-		test_commit_bulk --message="%s" 103 &&
+			git log --format="%H" >commits.raw &&
+			sort <commits.raw >commits &&
 
-		git log --format="%H" >commits.raw &&
-		sort <commits.raw >commits &&
+			git log --format="create refs/tags/%s %H" HEAD >refs &&
+			git update-ref --stdin <refs &&
 
-		git log --format="create refs/tags/%s %H" HEAD >refs &&
-		git update-ref --stdin <refs &&
+			git multi-pack-index write --bitmap &&
+			test_path_is_file $midx &&
+			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
 
-		git multi-pack-index write --bitmap &&
-		test_path_is_file $midx &&
-		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+			comm -13 bitmaps commits >before &&
+			test_line_count = 1 before &&
 
-		test-tool bitmap list-commits | sort >bitmaps &&
-		comm -13 bitmaps commits >before &&
-		test_line_count = 1 before &&
+			perl -ne "printf(\"create refs/tags/include/%d \", $.); print" \
+				<before | git update-ref --stdin &&
 
-		perl -ne "printf(\"create refs/tags/include/%d \", $.); print" \
-			<before | git update-ref --stdin &&
+			rm -fr $midx-$(midx_checksum $objdir).bitmap &&
+			rm -fr $midx &&
 
-		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
-		rm -fr $midx &&
+			git -c pack.preferBitmapTips=refs/tags/include \
+				multi-pack-index write --bitmap &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+			comm -13 bitmaps commits >after &&
 
-		git -c pack.preferBitmapTips=refs/tags/include \
-			multi-pack-index write --bitmap &&
-		test-tool bitmap list-commits | sort >bitmaps &&
-		comm -13 bitmaps commits >after &&
+			! test_cmp before after
+		)
+	'
 
-		! test_cmp before after
-	)
-'
+	test_expect_success 'writing a bitmap with --refs-snapshot' '
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-test_expect_success 'writing a bitmap with --refs-snapshot' '
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
+			test_commit one &&
+			test_commit two &&
 
-		test_commit one &&
-		test_commit two &&
+			git rev-parse one >snapshot &&
 
-		git rev-parse one >snapshot &&
+			git repack -ad &&
 
-		git repack -ad &&
+			# First, write a MIDX which see both refs/tags/one and
+			# refs/tags/two (causing both of those commits to receive
+			# bitmaps).
+			git multi-pack-index write --bitmap &&
 
-		# First, write a MIDX which see both refs/tags/one and
-		# refs/tags/two (causing both of those commits to receive
-		# bitmaps).
-		git multi-pack-index write --bitmap &&
+			test_path_is_file $midx &&
+			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
 
-		test_path_is_file $midx &&
-		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+			grep "$(git rev-parse one)" bitmaps &&
+			grep "$(git rev-parse two)" bitmaps &&
 
-		test-tool bitmap list-commits | sort >bitmaps &&
-		grep "$(git rev-parse one)" bitmaps &&
-		grep "$(git rev-parse two)" bitmaps &&
+			rm -fr $midx-$(midx_checksum $objdir).bitmap &&
+			rm -fr $midx &&
 
-		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
-		rm -fr $midx &&
+			# Then again, but with a refs snapshot which only sees
+			# refs/tags/one.
+			git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
 
-		# Then again, but with a refs snapshot which only sees
-		# refs/tags/one.
-		git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
+			test_path_is_file $midx &&
+			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
 
-		test_path_is_file $midx &&
-		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+			grep "$(git rev-parse one)" bitmaps &&
+			! grep "$(git rev-parse two)" bitmaps
+		)
+	'
 
-		test-tool bitmap list-commits | sort >bitmaps &&
-		grep "$(git rev-parse one)" bitmaps &&
-		! grep "$(git rev-parse two)" bitmaps
-	)
-'
+	test_expect_success 'write a bitmap with --refs-snapshot (preferred tips)' '
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-test_expect_success 'write a bitmap with --refs-snapshot (preferred tips)' '
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
+			test_commit_bulk --message="%s" 103 &&
 
-		test_commit_bulk --message="%s" 103 &&
+			git log --format="%H" >commits.raw &&
+			sort <commits.raw >commits &&
 
-		git log --format="%H" >commits.raw &&
-		sort <commits.raw >commits &&
+			git log --format="create refs/tags/%s %H" HEAD >refs &&
+			git update-ref --stdin <refs &&
 
-		git log --format="create refs/tags/%s %H" HEAD >refs &&
-		git update-ref --stdin <refs &&
+			git multi-pack-index write --bitmap &&
+			test_path_is_file $midx &&
+			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
 
-		git multi-pack-index write --bitmap &&
-		test_path_is_file $midx &&
-		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+			comm -13 bitmaps commits >before &&
+			test_line_count = 1 before &&
 
-		test-tool bitmap list-commits | sort >bitmaps &&
-		comm -13 bitmaps commits >before &&
-		test_line_count = 1 before &&
+			(
+				grep -vf before commits.raw &&
+				# mark missing commits as preferred
+				sed "s/^/+/" before
+			) >snapshot &&
 
+			rm -fr $midx-$(midx_checksum $objdir).bitmap &&
+			rm -fr $midx &&
+
+			git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+			comm -13 bitmaps commits >after &&
+
+			! test_cmp before after
+		)
+	'
+
+	test_expect_success 'hash-cache values are propagated from pack bitmaps' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
 		(
-			grep -vf before commits.raw &&
-			# mark missing commits as preferred
-			sed "s/^/+/" before
-		) >snapshot &&
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
-		rm -fr $midx &&
+			test_commit base &&
+			test_commit base2 &&
+			git repack -adb &&
 
-		git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
-		test-tool bitmap list-commits | sort >bitmaps &&
-		comm -13 bitmaps commits >after &&
+			test-tool bitmap dump-hashes >pack.raw &&
+			test_file_not_empty pack.raw &&
+			sort pack.raw >pack.hashes &&
 
-		! test_cmp before after
-	)
-'
+			test_commit new &&
+			git repack &&
+			git multi-pack-index write --bitmap &&
 
-test_expect_success 'hash-cache values are propagated from pack bitmaps' '
-	rm -fr repo &&
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
+			test-tool bitmap dump-hashes >midx.raw &&
+			sort midx.raw >midx.hashes &&
 
-		test_commit base &&
-		test_commit base2 &&
-		git repack -adb &&
+			# ensure that every namehash in the pack bitmap can be found in
+			# the midx bitmap (i.e., that there are no oid-namehash pairs
+			# unique to the pack bitmap).
+			comm -23 pack.hashes midx.hashes >dropped.hashes &&
+			test_must_be_empty dropped.hashes
+		)
+	'
 
-		test-tool bitmap dump-hashes >pack.raw &&
-		test_file_not_empty pack.raw &&
-		sort pack.raw >pack.hashes &&
+	test_expect_success 'no .bitmap is written without any objects' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-		test_commit new &&
-		git repack &&
-		git multi-pack-index write --bitmap &&
+			empty="$(git pack-objects $objdir/pack/pack </dev/null)" &&
+			cat >packs <<-EOF &&
+			pack-$empty.idx
+			EOF
 
-		test-tool bitmap dump-hashes >midx.raw &&
-		sort midx.raw >midx.hashes &&
+			git multi-pack-index write --bitmap --stdin-packs \
+				<packs 2>err &&
 
-		# ensure that every namehash in the pack bitmap can be found in
-		# the midx bitmap (i.e., that there are no oid-namehash pairs
-		# unique to the pack bitmap).
-		comm -23 pack.hashes midx.hashes >dropped.hashes &&
-		test_must_be_empty dropped.hashes
-	)
-'
+			grep "bitmap without any objects" err &&
 
-test_expect_success 'no .bitmap is written without any objects' '
-	rm -fr repo &&
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
+			test_path_is_file $midx &&
+			test_path_is_missing $midx-$(midx_checksum $objdir).bitmap
+		)
+	'
+
+	test_expect_success 'graceful fallback when missing reverse index' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-		empty="$(git pack-objects $objdir/pack/pack </dev/null)" &&
-		cat >packs <<-EOF &&
-		pack-$empty.idx
-		EOF
+			test_commit base &&
 
-		git multi-pack-index write --bitmap --stdin-packs \
-			<packs 2>err &&
+			# write a pack and MIDX bitmap containing base
+			git repack -adb &&
+			git multi-pack-index write --bitmap &&
 
-		grep "bitmap without any objects" err &&
+			GIT_TEST_MIDX_READ_RIDX=0 \
+				git rev-list --use-bitmap-index HEAD 2>err &&
+			! grep "ignoring extra bitmap file" err
+		)
+	'
+}
 
-		test_path_is_file $midx &&
-		test_path_is_missing $midx-$(midx_checksum $objdir).bitmap
-	)
-'
+test_midx_bitmap_cases
+
+test_midx_bitmap_cases "pack.writeBitmapLookupTable"
 
-test_expect_success 'graceful fallback when missing reverse index' '
+test_expect_success 'multi-pack-index write writes lookup table if enabled' '
 	rm -fr repo &&
 	git init repo &&
 	test_when_finished "rm -fr repo" &&
 	(
 		cd repo &&
-
 		test_commit base &&
-
-		# write a pack and MIDX bitmap containing base
-		git repack -adb &&
-		git multi-pack-index write --bitmap &&
-
-		GIT_TEST_MIDX_READ_RIDX=0 \
-			git rev-list --use-bitmap-index HEAD 2>err &&
-		! grep "ignoring extra bitmap file" err
+		git config pack.writeBitmapLookupTable true &&
+		git repack -ad &&
+		GIT_TRACE2_EVENT="$(pwd)/trace" \
+			git multi-pack-index write --bitmap &&
+		grep "\"label\":\"writing_lookup_table\"" trace
 	)
 '
 
diff --git a/t/t5327-multi-pack-bitmaps-rev.sh b/t/t5327-multi-pack-bitmaps-rev.sh
index d30ba632c87..d01c61c0c7e 100755
--- a/t/t5327-multi-pack-bitmaps-rev.sh
+++ b/t/t5327-multi-pack-bitmaps-rev.sh
@@ -20,4 +20,13 @@ export GIT_TEST_MIDX_READ_RIDX
 midx_bitmap_core rev
 midx_bitmap_partial_tests rev
 
+test_expect_success 'reinitialize the repository with lookup table enabled' '
+	rm -fr * .git &&
+	git init &&
+	git config pack.writeBitmapLookupTable true
+'
+
+midx_bitmap_core rev
+midx_bitmap_partial_tests rev
+
 test_done
-- 
gitgitgadget



* [PATCH v3 4/6] pack-bitmap: prepare to read lookup table extension
  2022-07-04  8:46   ` [PATCH v3 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
                       ` (2 preceding siblings ...)
  2022-07-04  8:46     ` [PATCH v3 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests Abhradeep Chakraborty via GitGitGadget
@ 2022-07-04  8:46     ` Abhradeep Chakraborty via GitGitGadget
  2022-07-15  2:46       ` Taylor Blau
  2022-07-04  8:46     ` [PATCH v3 5/6] bitmap-lookup-table: add performance tests for lookup table Abhradeep Chakraborty via GitGitGadget
                       ` (4 subsequent siblings)
  8 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-07-04  8:46 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee,
	Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

An earlier change taught Git to write the bitmap lookup table, but
Git does not yet know how to parse it.

Teach Git to parse the bitmap lookup table. Older versions of Git
are not affected by this change; they simply ignore the lookup
table.
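
As a rough illustration of what parsing a row involves, the standalone
sketch below decodes one 16-byte (commit_pos, offset, xor_row) triplet
and finds a row by commit position with a binary search. It assumes the
rows are sorted by commit_pos, which the bsearch() call in this patch
implies. The names `read_be32`, `read_be64`, `row_cmp` and
`find_triplet` are hypothetical stand-ins, not Git's own helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TRIPLET_WIDTH 16 /* 4 (commit_pos) + 8 (offset) + 4 (xor_row) */

struct triplet {
	uint32_t commit_pos;
	uint64_t offset;
	uint32_t xor_row;
};

/* Hypothetical stand-in for a big-endian 32-bit read. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical stand-in for a big-endian 64-bit read. */
static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/* Compare the wanted commit position with the first 4 bytes of a raw row. */
static int row_cmp(const void *key, const void *row)
{
	uint32_t a = *(const uint32_t *)key;
	uint32_t b = read_be32(row);

	return (a > b) - (a < b);
}

/* Returns 1 and fills `out` if a row for `commit_pos` exists, 0 otherwise. */
int find_triplet(const unsigned char *table, size_t nr_rows,
		 uint32_t commit_pos, struct triplet *out)
{
	const unsigned char *p;

	assert(table || !nr_rows);
	if (!nr_rows)
		return 0;

	p = bsearch(&commit_pos, table, nr_rows, TRIPLET_WIDTH, row_cmp);
	if (!p)
		return 0;

	out->commit_pos = read_be32(p);
	out->offset = read_be64(p + 4);
	out->xor_row = read_be32(p + 12);
	return 1;
}
```

An xor_row of 0xffffffff in the decoded triplet would mean the bitmap
has no xor base, matching the sentinel checked in the patch.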

Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 pack-bitmap.c           | 266 ++++++++++++++++++++++++++++++++++++++--
 pack-bitmap.h           |   9 ++
 t/t5310-pack-bitmaps.sh |  22 ++++
 3 files changed, 287 insertions(+), 10 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 36134222d7a..e22bbbdc60e 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -82,6 +82,12 @@ struct bitmap_index {
 	/* The checksum of the packfile or MIDX; points into map. */
 	const unsigned char *checksum;
 
+	/*
+	 * If not NULL, this points into the commit table extension
+	 * (within the memory mapped region `map`).
+	 */
+	unsigned char *table_lookup;
+
 	/*
 	 * Extended index.
 	 *
@@ -185,6 +191,16 @@ static int load_bitmap_header(struct bitmap_index *index)
 			index->hashes = (void *)(index_end - cache_size);
 			index_end -= cache_size;
 		}
+
+		if (flags & BITMAP_OPT_LOOKUP_TABLE) {
+			size_t table_size = st_mult(ntohl(header->entry_count),
+						    BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH);
+			if (table_size > index_end - index->map - header_size)
+				return error(_("corrupted bitmap index file (too short to fit lookup table)"));
+			if (git_env_bool("GIT_TEST_READ_COMMIT_TABLE", 1))
+				index->table_lookup = (void *)(index_end - table_size);
+			index_end -= table_size;
+		}
 	}
 
 	index->entry_count = ntohl(header->entry_count);
@@ -211,11 +227,13 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
 
 	hash_pos = kh_put_oid_map(index->bitmaps, stored->oid, &ret);
 
-	/* a 0 return code means the insertion succeeded with no changes,
-	 * because the SHA1 already existed on the map. this is bad, there
-	 * shouldn't be duplicated commits in the index */
+	/*
+	 * A 0 return code means the insertion succeeded with no changes,
+	 * because the SHA1 already existed on the map. This is bad, there
+	 * shouldn't be duplicated commits in the index.
+	 */
 	if (ret == 0) {
-		error("Duplicate entry in bitmap index: %s", oid_to_hex(oid));
+		error(_("duplicate entry in bitmap index: %s"), oid_to_hex(oid));
 		return NULL;
 	}
 
@@ -470,7 +488,7 @@ static int load_bitmap(struct bitmap_index *bitmap_git)
 		!(bitmap_git->tags = read_bitmap_1(bitmap_git)))
 		goto failed;
 
-	if (load_bitmap_entries_v1(bitmap_git) < 0)
+	if (!bitmap_git->table_lookup && load_bitmap_entries_v1(bitmap_git) < 0)
 		goto failed;
 
 	return 0;
@@ -557,13 +575,229 @@ struct include_data {
 	struct bitmap *seen;
 };
 
+struct bitmap_lookup_table_triplet {
+	uint32_t commit_pos;
+	uint64_t offset;
+	uint32_t xor_row;
+};
+
+struct bitmap_lookup_table_xor_item {
+	struct object_id oid;
+	uint64_t offset;
+};
+
+/*
+ * This function gets the raw triplet from the `pos`'th row in the
+ * lookup table and fills that data into `triplet`.
+ */
+static int lookup_table_get_triplet(struct bitmap_index *bitmap_git,
+				    uint32_t pos,
+				    struct bitmap_lookup_table_triplet *triplet)
+{
+	unsigned char *p = NULL;
+	if (pos >= bitmap_git->entry_count)
+		return error(_("corrupt bitmap lookup table: triplet position out of index"));
+
+	p = bitmap_git->table_lookup + st_mult(pos, BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH);
+
+	triplet->commit_pos = get_be32(p);
+	p += sizeof(uint32_t);
+	triplet->offset = get_be64(p);
+	p += sizeof(uint64_t);
+	triplet->xor_row = get_be32(p);
+	return 0;
+}
+
+/*
+ * Searches for a matching triplet. `va` is a pointer
+ * to the wanted commit position value. `vb` points to
+ * a triplet in lookup table. The first 4 bytes of each
+ * triplet (pointed to by `vb`) are compared with `*va`.
+ */
+static int triplet_cmp(const void *va, const void *vb)
+{
+
+	uint32_t a = *(uint32_t *)va;
+	uint32_t b = get_be32(vb);
+	if (a > b)
+		return 1;
+	else if (a < b)
+		return -1;
+
+	return 0;
+}
+
+static int bsearch_pos(struct bitmap_index *bitmap_git,
+		       struct object_id *oid,
+		       uint32_t *result)
+{
+	int found;
+
+	if (bitmap_is_midx(bitmap_git))
+		found = bsearch_midx(oid, bitmap_git->midx, result);
+	else
+		found = bsearch_pack(oid, bitmap_git->pack, result);
+
+	return found;
+}
+
+/*
+ * `bsearch_triplet` function searches for the raw triplet having
+ * commit position same as `commit_pos` and fills `triplet`
+ * object from the raw triplet. Returns 1 on success and 0
+ * on failure.
+ */
+static int bsearch_triplet(uint32_t *commit_pos,
+			   struct bitmap_index *bitmap_git,
+			   struct bitmap_lookup_table_triplet *triplet)
+{
+	unsigned char *p = bsearch(commit_pos, bitmap_git->table_lookup, bitmap_git->entry_count,
+				   BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH, triplet_cmp);
+
+	if (!p)
+		return 0;
+	triplet->commit_pos = get_be32(p);
+	p += sizeof(uint32_t);
+	triplet->offset = get_be64(p);
+	p += sizeof(uint64_t);
+	triplet->xor_row = get_be32(p);
+	return 1;
+}
+
+static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_git,
+					  struct commit *commit)
+{
+	uint32_t commit_pos, xor_row;
+	uint64_t offset;
+	int flags;
+	struct bitmap_lookup_table_triplet triplet;
+	struct object_id *oid = &commit->object.oid;
+	struct ewah_bitmap *bitmap;
+	struct stored_bitmap *xor_bitmap = NULL;
+
+	int found = bsearch_pos(bitmap_git, oid, &commit_pos);
+
+	if (!found)
+		return NULL;
+
+	if (!bsearch_triplet(&commit_pos, bitmap_git, &triplet))
+		return NULL;
+
+	offset = triplet.offset;
+	xor_row = triplet.xor_row;
+
+	if (xor_row != 0xffffffff) {
+		int xor_flags;
+		khiter_t hash_pos;
+		uint64_t offset_xor;
+		struct bitmap_lookup_table_xor_item *xor_items;
+		struct bitmap_lookup_table_xor_item xor_item;
+		size_t xor_items_nr = 0, xor_items_alloc = 64;
+
+		ALLOC_ARRAY(xor_items, xor_items_alloc);
+		while (xor_row != 0xffffffff) {
+			struct object_id xor_oid;
+
+			if (xor_items_nr + 1 >= bitmap_git->entry_count) {
+				free(xor_items);
+				error(_("corrupt bitmap lookup table: xor chain exceeds entry count"));
+				return NULL;
+			}
+
+			if (lookup_table_get_triplet(bitmap_git, xor_row, &triplet) < 0)
+				return NULL;
+
+			offset_xor = triplet.offset;
+
+			if (nth_bitmap_object_oid(bitmap_git, &xor_oid, triplet.commit_pos) < 0) {
+				free(xor_items);
+				error(_("corrupt bitmap lookup table: commit index %u out of range"),
+					triplet.commit_pos);
+				return NULL;
+			}
+
+			hash_pos = kh_get_oid_map(bitmap_git->bitmaps, xor_oid);
+
+			/*
+			 * If the desired bitmap is already stored, we don't
+			 * need to iterate further: every bitmap that must be
+			 * parsed before this one has already been stored as
+			 * well. So, assign this stored bitmap to xor_bitmap
+			 * and stop walking the chain.
+			 */
+			if (hash_pos < kh_end(bitmap_git->bitmaps) &&
+			    (xor_bitmap = kh_value(bitmap_git->bitmaps, hash_pos)))
+				break;
+
+			ALLOC_GROW(xor_items, xor_items_nr + 1, xor_items_alloc);
+			xor_items[xor_items_nr++] = (struct bitmap_lookup_table_xor_item) {.oid = xor_oid,
+											   .offset = offset_xor};
+			xor_row = triplet.xor_row;
+		}
+
+		while (xor_items_nr) {
+			xor_item = xor_items[xor_items_nr - 1];
+			offset_xor = xor_item.offset;
+
+			bitmap_git->map_pos = offset_xor;
+			if (bitmap_git->map_size - bitmap_git->map_pos < 6) {
+				error(_("corrupt ewah bitmap: truncated header for bitmap of commit \"%s\""),
+					oid_to_hex(&xor_item.oid));
+				free(xor_items);
+				return NULL;
+			}
+
+			bitmap_git->map_pos = bitmap_git->map_pos + sizeof(uint32_t) + sizeof(uint8_t);
+			xor_flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
+			bitmap = read_bitmap_1(bitmap_git);
+
+			if (!bitmap) {
+				free(xor_items);
+				return NULL;
+			}
+
+			xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_item.oid, xor_bitmap, xor_flags);
+			xor_items_nr--;
+		}
+
+		free(xor_items);
+	}
+
+	bitmap_git->map_pos = offset;
+	if (bitmap_git->map_size - bitmap_git->map_pos < 6) {
+		error(_("corrupt ewah bitmap: truncated header for bitmap of commit \"%s\""),
+			oid_to_hex(oid));
+		return NULL;
+	}
+
+	bitmap_git->map_pos = bitmap_git->map_pos + sizeof(uint32_t) + sizeof(uint8_t);
+	flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
+	bitmap = read_bitmap_1(bitmap_git);
+
+	if (!bitmap)
+		return NULL;
+
+	return store_bitmap(bitmap_git, bitmap, oid, xor_bitmap, flags);
+}
+
 struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
 				      struct commit *commit)
 {
 	khiter_t hash_pos = kh_get_oid_map(bitmap_git->bitmaps,
 					   commit->object.oid);
-	if (hash_pos >= kh_end(bitmap_git->bitmaps))
-		return NULL;
+	if (hash_pos >= kh_end(bitmap_git->bitmaps)) {
+		struct stored_bitmap *bitmap = NULL;
+		if (!bitmap_git->table_lookup)
+			return NULL;
+
+		trace2_region_enter("pack-bitmap", "reading_lookup_table", the_repository);
+		/* NEEDSWORK: cache misses aren't recorded */
+		bitmap = lazy_bitmap_for_commit(bitmap_git, commit);
+		trace2_region_leave("pack-bitmap", "reading_lookup_table", the_repository);
+		if (!bitmap)
+			return NULL;
+		return lookup_stored_bitmap(bitmap);
+	}
 	return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos));
 }
 
@@ -1699,8 +1933,10 @@ void test_bitmap_walk(struct rev_info *revs)
 	if (revs->pending.nr != 1)
 		die("you must specify exactly one commit to test");
 
-	fprintf(stderr, "Bitmap v%d test (%d entries loaded)\n",
-		bitmap_git->version, bitmap_git->entry_count);
+	fprintf(stderr, "Bitmap v%d test (%d entries%s)\n",
+		bitmap_git->version,
+		bitmap_git->entry_count,
+		bitmap_git->table_lookup ? "" : " loaded");
 
 	root = revs->pending.objects[0].item;
 	bm = bitmap_for_commit(bitmap_git, (struct commit *)root);
@@ -1753,13 +1989,23 @@ void test_bitmap_walk(struct rev_info *revs)
 
 int test_bitmap_commits(struct repository *r)
 {
-	struct bitmap_index *bitmap_git = prepare_bitmap_git(r);
 	struct object_id oid;
 	MAYBE_UNUSED void *value;
+	struct bitmap_index *bitmap_git = prepare_bitmap_git(r);
+
+	/*
+	 * As this function is only used to print bitmap selected
+	 * commits, we don't have to read the commit table.
+	 */
 
 	if (!bitmap_git)
 		die("failed to load bitmap indexes");
 
+	if (bitmap_git->table_lookup) {
+		if (load_bitmap_entries_v1(bitmap_git) < 0)
+			die(_("failed to load bitmap indexes"));
+	}
+
 	kh_foreach(bitmap_git->bitmaps, oid, value, {
 		printf("%s\n", oid_to_hex(&oid));
 	});
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 67a9d0fc303..9278f71ac91 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -23,6 +23,15 @@ struct bitmap_disk_header {
 
 #define NEEDS_BITMAP (1u<<22)
 
+/*
+ * The width in bytes of a single triplet in the lookup table
+ * extension:
+ *     (commit_pos, offset, xor_row)
+ *
+ * whose fields are 32, 64, and 32 bits wide, respectively.
+ */
+#define BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH (16)
+
 enum pack_bitmap_opts {
 	BITMAP_OPT_FULL_DAG = 0x1,
 	BITMAP_OPT_HASH_CACHE = 0x4,
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index c0607172827..7e50f8e7653 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -258,6 +258,7 @@ test_bitmap_cases () {
 
 	test_expect_success 'truncated bitmap fails gracefully (ewah)' '
 		test_config pack.writebitmaphashcache false &&
+		test_config pack.writebitmaplookuptable false &&
 		git repack -ad &&
 		git rev-list --use-bitmap-index --count --all >expect &&
 		bitmap=$(ls .git/objects/pack/*.bitmap) &&
@@ -270,6 +271,7 @@ test_bitmap_cases () {
 	'
 
 	test_expect_success 'truncated bitmap fails gracefully (cache)' '
+		git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 		git repack -ad &&
 		git rev-list --use-bitmap-index --count --all >expect &&
 		bitmap=$(ls .git/objects/pack/*.bitmap) &&
@@ -453,4 +455,24 @@ test_expect_success 'verify writing bitmap lookup table when enabled' '
 	grep "\"label\":\"writing_lookup_table\"" trace2
 '
 
+test_expect_success 'lookup table is actually used to traverse objects' '
+	git repack -adb &&
+	GIT_TRACE2_EVENT="$(pwd)/trace3" \
+		git rev-list --use-bitmap-index --count --all &&
+	grep "\"label\":\"reading_lookup_table\"" trace3
+'
+
+test_expect_success 'truncated bitmap fails gracefully (lookup table)' '
+	test_config pack.writebitmaphashcache false &&
+	git repack -adb &&
+	git rev-list --use-bitmap-index --count --all >expect &&
+	bitmap=$(ls .git/objects/pack/*.bitmap) &&
+	test_when_finished "rm -f $bitmap" &&
+	test_copy_bytes 512 <$bitmap >$bitmap.tmp &&
+	mv -f $bitmap.tmp $bitmap &&
+	git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
+	test_cmp expect actual &&
+	test_i18ngrep corrupted.bitmap.index stderr
+'
+
 test_done
-- 
gitgitgadget



* [PATCH v3 5/6] bitmap-lookup-table: add performance tests for lookup table
  2022-07-04  8:46   ` [PATCH v3 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
                       ` (3 preceding siblings ...)
  2022-07-04  8:46     ` [PATCH v3 4/6] pack-bitmap: prepare to read lookup table extension Abhradeep Chakraborty via GitGitGadget
@ 2022-07-04  8:46     ` Abhradeep Chakraborty via GitGitGadget
  2022-07-15  2:53       ` Taylor Blau
  2022-07-04  8:46     ` [PATCH v3 6/6] p5310-pack-bitmaps.sh: remove pack.writeReverseIndex Abhradeep Chakraborty via GitGitGadget
                       ` (3 subsequent siblings)
  8 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-07-04  8:46 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee,
	Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

Add performance tests to measure the performance of the lookup table
with `pack.writeReverseIndex` enabled.

The lookup table makes Git run faster in most cases. Below are the
results of `t/perf/p5310-pack-bitmaps.sh`;
`t/perf/p5326-multi-pack-bitmaps.sh` gives similar results. The
repository used in the test is the Linux kernel.

Test                                                      this tree
---------------------------------------------------------------------------
5310.4: repack to disk (lookup=false)                   296.55(256.53+14.52)
5310.5: simulated clone                                 15.64(8.88+1.39)
5310.6: simulated fetch                                 1.65(2.75+0.20)
5310.7: pack to file (bitmap)                           48.71(30.20+7.58)
5310.8: rev-list (commits)                              0.61(0.41+0.08)
5310.9: rev-list (objects)                              4.38(4.26+0.09)
5310.10: rev-list with tag negated via --not            0.07(0.02+0.04)
         --all (objects)
5310.11: rev-list with negative tag (objects)           0.05(0.01+0.03)
5310.12: rev-list count with blob:none                  0.08(0.03+0.04)
5310.13: rev-list count with blob:limit=1k              7.29(6.92+0.30)
5310.14: rev-list count with tree:0                     0.08(0.03+0.04)
5310.15: simulated partial clone                        9.45(8.12+0.41)
5310.19: repack to disk (lookup=true)                   255.92(188.13+20.47)
5310.20: simulated clone                                13.78(8.84+1.09)
5310.21: simulated fetch                                0.52(0.63+0.14)
5310.22: pack to file (bitmap)                          44.34(28.94+6.84)
5310.23: rev-list (commits)                             0.48(0.31+0.06)
5310.24: rev-list (objects)                             4.02(3.93+0.07)
5310.25: rev-list with tag negated via --not            0.04(0.00+0.03)
         --all (objects)
5310.26: rev-list with negative tag (objects)           0.04(0.00+0.03)
5310.27: rev-list count with blob:none                  0.04(0.01+0.03)
5310.28: rev-list count with blob:limit=1k              6.48(6.23+0.22)
5310.29: rev-list count with tree:0                     0.04(0.01+0.03)
5310.30: simulated partial clone                        8.30(7.21+0.36)

Tests 4-15 run without the lookup table. The same tests are
repeated in 19-30 (using the lookup table).

Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 t/perf/p5310-pack-bitmaps.sh       | 66 ++++++++++++---------
 t/perf/p5326-multi-pack-bitmaps.sh | 93 ++++++++++++++++--------------
 2 files changed, 89 insertions(+), 70 deletions(-)

diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index 7ad4f237bc3..1ad3c3f14c6 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -13,42 +13,52 @@ test_perf_large_repo
 # We intentionally use the deprecated pack.writebitmaps
 # config so that we can test against older versions of git.
 test_expect_success 'setup bitmap config' '
-	git config pack.writebitmaps true
+	git config pack.writebitmaps true &&
+	git config pack.writeReverseIndex true
 '
 
-# we need to create the tag up front such that it is covered by the repack and
-# thus by generated bitmaps.
-test_expect_success 'create tags' '
-	git tag --message="tag pointing to HEAD" perf-tag HEAD
-'
+test_bitmap () {
+	local enabled="$1"
 
-test_perf 'repack to disk' '
-	git repack -ad
-'
+	# we need to create the tag up front such that it is covered by the repack and
+	# thus by generated bitmaps.
+	test_expect_success 'create tags' '
+		git tag --message="tag pointing to HEAD" perf-tag HEAD
+	'
 
-test_full_bitmap
+	test_expect_success "use lookup table: $enabled" '
+		git config pack.writeBitmapLookupTable '"$enabled"'
+	'
 
-test_expect_success 'create partial bitmap state' '
-	# pick a commit to represent the repo tip in the past
-	cutoff=$(git rev-list HEAD~100 -1) &&
-	orig_tip=$(git rev-parse HEAD) &&
+	test_perf "repack to disk (lookup=$enabled)" '
+		git repack -ad
+	'
 
-	# now kill off all of the refs and pretend we had
-	# just the one tip
-	rm -rf .git/logs .git/refs/* .git/packed-refs &&
-	git update-ref HEAD $cutoff &&
+	test_full_bitmap
 
-	# and then repack, which will leave us with a nice
-	# big bitmap pack of the "old" history, and all of
-	# the new history will be loose, as if it had been pushed
-	# up incrementally and exploded via unpack-objects
-	git repack -Ad &&
+	test_expect_success "create partial bitmap state (lookup=$enabled)" '
+		# pick a commit to represent the repo tip in the past
+		cutoff=$(git rev-list HEAD~100 -1) &&
+		orig_tip=$(git rev-parse HEAD) &&
 
-	# and now restore our original tip, as if the pushes
-	# had happened
-	git update-ref HEAD $orig_tip
-'
+		# now kill off all of the refs and pretend we had
+		# just the one tip
+		rm -rf .git/logs .git/refs/* .git/packed-refs &&
+		git update-ref HEAD $cutoff &&
+
+		# and then repack, which will leave us with a nice
+		# big bitmap pack of the "old" history, and all of
+		# the new history will be loose, as if it had been pushed
+		# up incrementally and exploded via unpack-objects
+		git repack -Ad &&
+
+		# and now restore our original tip, as if the pushes
+		# had happened
+		git update-ref HEAD $orig_tip
+	'
+}
 
-test_partial_bitmap
+test_bitmap false
+test_bitmap true
 
 test_done
diff --git a/t/perf/p5326-multi-pack-bitmaps.sh b/t/perf/p5326-multi-pack-bitmaps.sh
index f2fa228f16a..c8cc68185a1 100755
--- a/t/perf/p5326-multi-pack-bitmaps.sh
+++ b/t/perf/p5326-multi-pack-bitmaps.sh
@@ -6,47 +6,56 @@ test_description='Tests performance using midx bitmaps'
 
 test_perf_large_repo
 
-# we need to create the tag up front such that it is covered by the repack and
-# thus by generated bitmaps.
-test_expect_success 'create tags' '
-	git tag --message="tag pointing to HEAD" perf-tag HEAD
-'
-
-test_expect_success 'start with bitmapped pack' '
-	git repack -adb
-'
-
-test_perf 'setup multi-pack index' '
-	git multi-pack-index write --bitmap
-'
-
-test_expect_success 'drop pack bitmap' '
-	rm -f .git/objects/pack/pack-*.bitmap
-'
-
-test_full_bitmap
-
-test_expect_success 'create partial bitmap state' '
-	# pick a commit to represent the repo tip in the past
-	cutoff=$(git rev-list HEAD~100 -1) &&
-	orig_tip=$(git rev-parse HEAD) &&
-
-	# now pretend we have just one tip
-	rm -rf .git/logs .git/refs/* .git/packed-refs &&
-	git update-ref HEAD $cutoff &&
-
-	# and then repack, which will leave us with a nice
-	# big bitmap pack of the "old" history, and all of
-	# the new history will be loose, as if it had been pushed
-	# up incrementally and exploded via unpack-objects
-	git repack -Ad &&
-	git multi-pack-index write --bitmap &&
-
-	# and now restore our original tip, as if the pushes
-	# had happened
-	git update-ref HEAD $orig_tip
-'
-
-test_partial_bitmap
+test_bitmap () {
+	local enabled="$1"
+
+	# we need to create the tag up front such that it is covered by the repack and
+	# thus by generated bitmaps.
+	test_expect_success 'create tags' '
+		git tag --message="tag pointing to HEAD" perf-tag HEAD
+	'
+
+	test_expect_success "use lookup table: $enabled" '
+		git config pack.writeBitmapLookupTable '"$enabled"'
+	'
+
+	test_expect_success "start with bitmapped pack (lookup=$enabled)" '
+		git repack -adb
+	'
+
+	test_perf "setup multi-pack index (lookup=$enabled)" '
+		git multi-pack-index write --bitmap
+	'
+
+	test_expect_success "drop pack bitmap (lookup=$enabled)" '
+		rm -f .git/objects/pack/pack-*.bitmap
+	'
+
+	test_full_bitmap
+
+	test_expect_success "create partial bitmap state (lookup=$enabled)" '
+		# pick a commit to represent the repo tip in the past
+		cutoff=$(git rev-list HEAD~100 -1) &&
+		orig_tip=$(git rev-parse HEAD) &&
+
+		# now pretend we have just one tip
+		rm -rf .git/logs .git/refs/* .git/packed-refs &&
+		git update-ref HEAD $cutoff &&
+
+		# and then repack, which will leave us with a nice
+		# big bitmap pack of the "old" history, and all of
+		# the new history will be loose, as if it had been pushed
+		# up incrementally and exploded via unpack-objects
+		git repack -Ad &&
+		git multi-pack-index write --bitmap &&
+
+		# and now restore our original tip, as if the pushes
+		# had happened
+		git update-ref HEAD $orig_tip
+	'
+}
+
+test_bitmap false
+test_bitmap true
 
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 162+ messages in thread

* [PATCH v3 6/6] p5310-pack-bitmaps.sh: remove pack.writeReverseIndex
  2022-07-04  8:46   ` [PATCH v3 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
                       ` (4 preceding siblings ...)
  2022-07-04  8:46     ` [PATCH v3 5/6] bitmap-lookup-table: add performance tests for lookup table Abhradeep Chakraborty via GitGitGadget
@ 2022-07-04  8:46     ` Abhradeep Chakraborty via GitGitGadget
  2022-07-04 16:35     ` [PATCH v3 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty
                       ` (2 subsequent siblings)
  8 siblings, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-07-04  8:46 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee,
	Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

The previous change enabled `pack.writeReverseIndex` to show the
effect of writing the reverse index in the performance test.

Remove the `pack.writeReverseIndex` configuration.

Below is the result of the performance test. Times are in
seconds.

Test                                                  this tree
------------------------------------------------------------------------
5310.4: repack to disk (lookup=false)               293.80(251.30+14.30)
5310.5: simulated clone                             12.50(5.15+1.36)
5310.6: simulated fetch                             1.83(2.90+0.23)
5310.7: pack to file (bitmap)                       39.70(20.25+7.14)
5310.8: rev-list (commits)                          1.00(0.60+0.13)
5310.9: rev-list (objects)                          4.11(4.00+0.10)
5310.10: rev-list with tag negated via --not        0.07(0.02+0.05)
         --all (objects)
5310.11: rev-list with negative tag (objects)       0.23(0.16+0.06)
5310.12: rev-list count with blob:none              0.27(0.18+0.08)
5310.13: rev-list count with blob:limit=1k          6.41(5.98+0.41)
5310.14: rev-list count with tree:0                 0.26(0.18+0.07)
5310.15: simulated partial clone                    4.34(3.29+0.37)
5310.19: repack to disk (lookup=true)               250.93(171.97+20.78)
5310.20: simulated clone                            10.80(5.14+1.06)
5310.21: simulated fetch                            0.71(0.79+0.16)
5310.22: pack to file (bitmap)                      39.49(20.19+6.98)
5310.23: rev-list (commits)                         0.81(0.48+0.09)
5310.24: rev-list (objects)                         3.48(3.38+0.09)
5310.25: rev-list with tag negated via --not        0.04(0.00+0.03)
         --all (objects)
5310.26: rev-list with negative tag (objects)       0.22(0.16+0.05)
5310.27: rev-list count with blob:none              0.22(0.16+0.05)
5310.28: rev-list count with blob:limit=1k          6.21(5.76+0.29)
5310.29: rev-list count with tree:0                 0.23(0.16+0.06)
5310.30: simulated partial clone                    4.53(3.14+0.39)

Tests 4-15 run without the lookup table. The rest repeat the
same tests with the lookup table enabled.

Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 t/perf/p5310-pack-bitmaps.sh | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index 1ad3c3f14c6..ac5b7341e8e 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -13,8 +13,7 @@ test_perf_large_repo
 # We intentionally use the deprecated pack.writebitmaps
 # config so that we can test against older versions of git.
 test_expect_success 'setup bitmap config' '
-	git config pack.writebitmaps true &&
-	git config pack.writeReverseIndex true
+	git config pack.writebitmaps true
 '
 
 test_bitmap () {
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 162+ messages in thread

* Re: [PATCH v3 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format
  2022-07-04  8:46   ` [PATCH v3 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
                       ` (5 preceding siblings ...)
  2022-07-04  8:46     ` [PATCH v3 6/6] p5310-pack-bitmaps.sh: remove pack.writeReverseIndex Abhradeep Chakraborty via GitGitGadget
@ 2022-07-04 16:35     ` Abhradeep Chakraborty
  2022-07-06 19:21     ` Junio C Hamano
  2022-07-20 14:05     ` [PATCH v4 " Abhradeep Chakraborty via GitGitGadget
  8 siblings, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-07-04 16:35 UTC (permalink / raw)
  To: Git; +Cc: Abhradeep Chakraborty, Taylor Blau, Kaartic Sivaraam, Derrick Stolee


Oops, I forgot to edit the PR cover letter!

I am adding it here. Sorry for that :P

Changes since v2:

 * Issues related to log messages are fixed.
 * `pack.writeBitmapLookupTable` is now disabled by default.
 * Documentation is improved.
 * `xor_row` is used instead of `xor_pos` in triplets.
 * In `pack-bitmap-write.c`, `off_t *` is used for the `offsets` array
   (instead of `uint64_t *`).
 * `struct bitmap_lookup_table_triplet` is introduced, and functions
   like `triplet_get_offset()` and `triplet_get_xor_pos()` are removed.
 * `table_size` is now subtracted from `index_end` irrespective of
   the value of `GIT_TEST_READ_COMMIT_TABLE`.
 * The xor stack filling loop stops iterating if a xor bitmap is already
   stored/parsed.
 * The stack now stores `bitmap_lookup_table_xor_item` items
   instead of plain xor_row.
 * Bitmap-related test files are reformatted to allow repeating tests
   with the bitmap extension enabled.
 * Comments are added.
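
The xor stack behaviour listed above can be illustrated with a minimal
sketch. This is not the code from the series: `struct row`, `resolve()`
and `MAX_ROWS` are hypothetical names, and a single 64-bit word stands in
for a whole bitmap. It only shows the idea of walking `xor_row` links
until an already-parsed bitmap (or a row with no base) is reached, then
unwinding the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_XOR_ROW 0xffffffffu
#define MAX_ROWS 8 /* hypothetical bound, for this sketch only */

/* Hypothetical in-memory row; one word stands in for a whole bitmap. */
struct row {
	uint32_t xor_row;  /* absolute row of the base, or NO_XOR_ROW */
	uint64_t stored;   /* bits as stored (possibly XORed with the base) */
	uint64_t resolved; /* decoded bits, valid only when parsed != 0 */
	int parsed;
};

static uint64_t resolve(struct row *rows, uint32_t want)
{
	uint32_t stack[MAX_ROWS];
	size_t depth = 0;
	uint32_t cur = want;

	/* Fill the stack; stop at a row that is already parsed. */
	while (!rows[cur].parsed) {
		assert(depth < MAX_ROWS);
		stack[depth++] = cur;
		if (rows[cur].xor_row == NO_XOR_ROW)
			break;
		cur = rows[cur].xor_row;
	}

	/* Unwind: each row's base is resolved before the row itself. */
	while (depth > 0) {
		struct row *r = &rows[stack[--depth]];

		r->resolved = r->stored;
		if (r->xor_row != NO_XOR_ROW)
			r->resolved ^= rows[r->xor_row].resolved;
		r->parsed = 1;
	}
	return rows[want].resolved;
}
```

Resolving one row also marks every base on its chain as parsed, so a
later request for any of those rows returns without refilling the stack.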

Thanks :)

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v3 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format
  2022-07-04  8:46   ` [PATCH v3 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
                       ` (6 preceding siblings ...)
  2022-07-04 16:35     ` [PATCH v3 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty
@ 2022-07-06 19:21     ` Junio C Hamano
  2022-07-07  8:48       ` Abhradeep Chakraborty
  2022-07-20 14:05     ` [PATCH v4 " Abhradeep Chakraborty via GitGitGadget
  8 siblings, 1 reply; 162+ messages in thread
From: Junio C Hamano @ 2022-07-06 19:21 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget
  Cc: git, Taylor Blau, Kaartic Sivaram, Derrick Stolee, Abhradeep Chakraborty

"Abhradeep Chakraborty via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> When parsing the .bitmap file, git loads all the bitmaps one by one even if
> some of the bitmaps are not necessary. We can remove this overhead by
> loading only the necessary bitmaps. A look up table extension can solve this
> issue.
>
> Changes since v1:
>
> This is the second version which addressed all (I think) the reviews. Please
> notify me if some reviews are not addressed :)

Is this the second version that is labeled as "v3" ;-)?

>  Documentation/technical/bitmap-format.txt |  39 ++

I haven't tried merging it yet, but doesn't [1/6] overlap with and
semantically depend on your other series that touch the formatting
of this file?

Thanks.

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v3 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format
  2022-07-06 19:21     ` Junio C Hamano
@ 2022-07-07  8:48       ` Abhradeep Chakraborty
  2022-07-07 18:09         ` Kaartic Sivaraam
  0 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-07-07  8:48 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Abhradeep Chakraborty, Git, Taylor Blau, Kaartic Sivaraam,
	Derrick Stolee


Junio C Hamano <gitster@pobox.com> wrote:

> > This is the second version which addressed all (I think) the reviews. Please
> > notify me if some reviews are not addressed :)
>
> Is this the second version that is labeled as "v3" ;-)?

Hi Junio,

No, it is actually the third version. I forgot to update the cover
letter :P.

I am using GitGitGadget on GitHub to submit PRs, and it
uses the PR description as the cover letter.

So the PR description must be updated before submitting a new
version of a patch set, which I missed this time.

I wrote a reply comment[1] where you can find a summary of all the
new changes.

[1] https://lore.kernel.org/git/20220704163506.76162-1-chakrabortyabhradeep79@gmail.com/

> >  Documentation/technical/bitmap-format.txt |  39 ++
>
> I haven't tried merging it yet, but doesn't [1/6] overlap with and
> semantically depend on your other series that touch the formatting
> of this file?

Correct, [1/6] indeed depends on my previous patch series[2] and it
is assuming that that series has already been merged. As far as it seems,
it will not create any merge conflicts while merging but I am not sure.
This would be interesting to see.

[2] https://lore.kernel.org/git/pull.1246.v4.git.1655355834.gitgitgadget@gmail.com/

Thanks :)

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v3 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format
  2022-07-07  8:48       ` Abhradeep Chakraborty
@ 2022-07-07 18:09         ` Kaartic Sivaraam
  2022-07-07 18:42           ` Abhradeep Chakraborty
  0 siblings, 1 reply; 162+ messages in thread
From: Kaartic Sivaraam @ 2022-07-07 18:09 UTC (permalink / raw)
  To: Abhradeep Chakraborty, Junio C Hamano; +Cc: Git, Taylor Blau, Derrick Stolee

On 07-07-2022 14:18, Abhradeep Chakraborty wrote:
> 
> Junio C Hamano <gitster@pobox.com> wrote:
> 
>>>  Documentation/technical/bitmap-format.txt |  39 ++
>>
>> I haven't tried merging it yet, but doesn't [1/6] overlap with and
>> semantically depend on your other series that touch the formatting
>> of this file?
> 
> Correct, [1/6] indeed depends on my previous patch series[2] and it
> is assuming that that series has already been merged.

I suppose it's the opposite. A quick check shows that the patch applies
cleanly over 'master' but fails to apply over 'next' which has the
changes from your other patch series. So, the base branch for [1/6]
is 'master'. The other 5 patches clearly don't conflict.

> As far as it seems,
> it will not create any merge conflicts while merging but I am not sure.
> This would be interesting to see.
>

Since the first hunk of 1/6 and your other series touch the same area
of Documentation/technical/bitmap-format.txt, the changes conflict.
Junio might be able to handle this one. If not, you would need to look
into separate 1/6 and based it over your other series to avoid the
conflict.

--
Sivaraam

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v3 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format
  2022-07-07 18:09         ` Kaartic Sivaraam
@ 2022-07-07 18:42           ` Abhradeep Chakraborty
  0 siblings, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-07-07 18:42 UTC (permalink / raw)
  To: Kaartic Sivaraam
  Cc: Abhradeep Chakraborty, Git, Junio C Hamano, Taylor Blau, Derrick Stolee


Kaartic Sivaraam <kaartic.sivaraam@gmail.com> wrote:

>> Correct, [1/6] indeed depends on my previous patch series[2] and it
>> is assuming that that series has already been merged.
>
> I suppose it's the opposite. A quick check shows that the patch applies
> cleanly over 'master' but fails to apply over 'next' which has the
> changes from your other patch series. So, the base branch for [1/6]
> is 'master'. The other 5 patches clearly don't conflict.

Actually, by saying "[1/6] indeed depends on my previous patch series[2]
and it is assuming that that series has already been merged.", I meant
that the format followed in this patch (e.g. description list,
indentation etc.) depends on the format changes introduced in that
patch series.

As for the base branch, yes, you're right. The base branch is
'master'.

> Since the first hunk of 1/6 and your other series touch the same area
> of Documentation/technical/bitmap-format.txt, the changes conflict.
> Junio might be able to handle this one. If not, you would need to look
> into separate 1/6 and based it over your other series to avoid the
> conflict.

Oh, I see. I have no problem doing that :)
Let me know if Junio faces any problem fixing the conflict.

Thanks :)

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v3 1/6] Documentation/technical: describe bitmap lookup table extension
  2022-07-04  8:46     ` [PATCH v3 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
@ 2022-07-08 16:38       ` Philip Oakley
  2022-07-09  7:53         ` Abhradeep Chakraborty
  0 siblings, 1 reply; 162+ messages in thread
From: Philip Oakley @ 2022-07-08 16:38 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget, git
  Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee, Abhradeep Chakraborty

Hi Abhradeep,

On 04/07/2022 09:46, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
>
> When reading bitmap file, Git loads each and every bitmap one by one
> even if all the bitmaps are not required. A "bitmap lookup table"
> extension to the bitmap format can reduce the overhead of loading
> bitmaps which stores a list of bitmapped commit id pos (in the midx
> or pack, along with their offset and xor offset. This way git can
> load only the necessary bitmaps without loading the previous bitmaps.
>
> Older versions of Git ignore the lookup table extension and don't
> throw any kind of warning or error while parsing the bitmap file.
>
> Add some information for the new "bitmap lookup table" extension in the
> bitmap-format documentation.

Not sure if this is new in this extension, but should there be a link or
two to the basics of XOR compression and some of the bitmap look up
techniques?

It's not always obvious if these techniques are 'heuristic' and only
have partial commit data, or they have all the commits listed, Nor
how/why they work. My point is more about giving new readers a hand-up
in their understanding, rather than simple implementation details for
those who already know what is going on. For example, are there any
external articles that you found helpful in getting started that could
be referenced somewhere in the docs?

Separately I'm preparing a short series on adding 'reachability bitmap'
and 'commit graph' (among other stuff) to the glossary as part of giving
folks [0] stepping stones to cross the chasm of understanding

Philip

[0] me included;-)
>
> Mentored-by: Taylor Blau <me@ttaylorr.com>
> Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
> Co-Authored-by: Taylor Blau <me@ttaylorr.com>
> Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> ---
>  Documentation/technical/bitmap-format.txt | 39 +++++++++++++++++++++++
>  1 file changed, 39 insertions(+)
>
> diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
> index 04b3ec21785..c30dc177643 100644
> --- a/Documentation/technical/bitmap-format.txt
> +++ b/Documentation/technical/bitmap-format.txt
> @@ -67,6 +67,17 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
>  			pack/MIDX. The format and meaning of the name-hash is
>  			described below.
>  
> +			** {empty}
> +			BITMAP_OPT_LOOKUP_TABLE (0x10): :::
> +			If present, the end of the bitmap file contains a table
> +			containing a list of `N` <commit_pos, offset, xor_row>
> +			triplets. The format and meaning of the table is described
> +			below.
> ++
> +NOTE: Unlike the xor_offset used to compress an individual bitmap,
> +`xor_row` stores an *absolute* index into the lookup table, not a location
> +relative to the current entry.
> +
>  		4-byte entry count (network byte order)
>  
>  			The total count of entries (bitmapped commits) in this bitmap index.
> @@ -205,3 +216,31 @@ Note that this hashing scheme is tied to the BITMAP_OPT_HASH_CACHE flag.
>  If implementations want to choose a different hashing scheme, they are
>  free to do so, but MUST allocate a new header flag (because comparing
>  hashes made under two different schemes would be pointless).
> +
> +Commit lookup table
> +-------------------
> +
> +If the BITMAP_OPT_LOOKUP_TABLE flag is set, the last `N * (4 + 8 + 4)`
> +bytes (preceding the name-hash cache and trailing hash) of the `.bitmap`
> +file contains a lookup table specifying the information needed to get
> +the desired bitmap from the entries without parsing previous unnecessary
> +bitmaps.
> +
> +For a `.bitmap` containing `nr_entries` reachability bitmaps, the table
> +contains a list of `nr_entries` <commit_pos, offset, xor_row> triplets
> +(sorted in the ascending order of `commit_pos`). The content of i'th
> +triplet is -
> +
> +	* {empty}
> +	commit_pos (4 byte integer, network byte order): ::
> +	It stores the object position of a commit (in the midx or pack
> +	index).
> +
> +	* {empty}
> +	offset (8 byte integer, network byte order): ::
> +	The offset from which that commit's bitmap can be read.
> +
> +	* {empty}
> +	xor_row (4 byte integer, network byte order): ::
> +	The position of the triplet whose bitmap is used to compress
> +	this one, or `0xffffffff` if no such bitmap exists.
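
The triplet layout in the quoted patch (4-byte `commit_pos`, 8-byte
`offset`, 4-byte `xor_row`, all in network byte order, sorted by
`commit_pos`) could be decoded roughly as follows. This is a sketch
under those stated sizes only; `struct lookup_triplet`, `read_triplet()`
and `find_row()` are hypothetical names and are not taken from the
series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRIPLET_WIDTH (4 + 8 + 4) /* commit_pos + offset + xor_row */
#define NO_XOR_ROW 0xffffffffu    /* "no such bitmap" marker */

/* Hypothetical decoded form of one table row. */
struct lookup_triplet {
	uint32_t commit_pos;
	uint64_t offset;
	uint32_t xor_row;
};

static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/* Decode the row-th triplet of the table that starts at `table`. */
static void read_triplet(const unsigned char *table, size_t row,
			 struct lookup_triplet *out)
{
	const unsigned char *p = table + row * TRIPLET_WIDTH;

	assert(table && out);
	out->commit_pos = read_be32(p);
	out->offset = read_be64(p + 4);
	out->xor_row = read_be32(p + 12);
}

/* Binary search on commit_pos; returns the row index, or -1 if absent. */
static long find_row(const unsigned char *table, size_t nr_entries,
		     uint32_t commit_pos)
{
	size_t lo = 0, hi = nr_entries;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		struct lookup_triplet t;

		read_triplet(table, mid, &t);
		if (t.commit_pos == commit_pos)
			return (long)mid;
		if (t.commit_pos < commit_pos)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -1;
}
```

Because the rows are fixed-width and sorted, a reader can find one
commit's row in O(log N) steps without parsing any bitmap.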


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v3 1/6] Documentation/technical: describe bitmap lookup table extension
  2022-07-08 16:38       ` Philip Oakley
@ 2022-07-09  7:53         ` Abhradeep Chakraborty
  2022-07-10 15:01           ` Philip Oakley
  0 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-07-09  7:53 UTC (permalink / raw)
  To: Philip Oakley
  Cc: Abhradeep Chakraborty, Git, Kaartic Sivaraam, Taylor Blau,
	Derrick Stolee, Junio C Hamano


Hello Philip,

Philip Oakley <philipoakley@iee.email> wrote:

> Not sure if this is new in this extension, but should there be a link or
> two to the basics of XOR compression and some of the bitmap look up
> techniques?
>
> It's not always obvious if these techniques are 'heuristic' and only
> have partial commit data, or they have all the commits listed, Nor
> how/why they work. My point is more about giving new readers a hand-up
> in their understanding, rather than simple implementation details for
> those who already know what is going on. For example, are there any
> external articles that you found helpful in getting started that could
> be referenced somewhere in the docs?

As this series is only about adding a lookup-table extension (and not
about bitmap itself), I am not sure whether it's good to include those
things in this series. But I agree with your point that it should be
able build a logical understanding among the new readers.

There are some external articles[1] which talk about bitmap internals.
But I think it would be better if we can make a new doc file (may be
`Documentation/technical/reachability-bitmaps.txt` or similar) rather
than putting those details in the `bitmap-format.txt` (As the name 
suggests, this file should only contain format details of bitmaps).
That file would provide the answers of "Why bitmaps", "how they are
stored",  "How they are fetched", "how they work with pack-objects,
git-fetch, midx etc.", "Detailed explanation of each bitmap extension"
, and lastly "what are the future works" (if any).

What do you think?

> Separately I'm preparing a short series on adding 'reachability bitmap'
> and 'commit graph' (among other stuff) to the glossary as part of giving
> folks [0] stepping stones to cross the chasm of understanding

Great!

Thanks :)

[1] https://github.blog/2015-09-22-counting-objects/, https://github.blog/2021-04-29-scaling-monorepo-maintenance/

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v3 1/6] Documentation/technical: describe bitmap lookup table extension
  2022-07-09  7:53         ` Abhradeep Chakraborty
@ 2022-07-10 15:01           ` Philip Oakley
  2022-07-14 23:15             ` Taylor Blau
  2022-07-15 18:48             ` Abhradeep Chakraborty
  0 siblings, 2 replies; 162+ messages in thread
From: Philip Oakley @ 2022-07-10 15:01 UTC (permalink / raw)
  To: Abhradeep Chakraborty
  Cc: Git, Kaartic Sivaraam, Taylor Blau, Derrick Stolee, Junio C Hamano

On 09/07/2022 08:53, Abhradeep Chakraborty wrote:
> Hello Philip,
>
> Philip Oakley <philipoakley@iee.email> wrote:
>
>> Not sure if this is new in this extension, but should there be a link or
>> two to the basics of XOR compression and some of the bitmap look up
>> techniques?
>>
>> It's not always obvious if these techniques are 'heuristic' and only
>> have partial commit data, or they have all the commits listed, Nor
>> how/why they work. My point is more about giving new readers a hand-up
>> in their understanding, rather than simple implementation details for
>> those who already know what is going on. For example, are there any
>> external articles that you found helpful in getting started that could
>> be referenced somewhere in the docs?
> As this series is only about adding a lookup-table extension (and not
> about bitmap itself), I am not sure whether it's good to include those
> things in this series. 

Thanks for the clarification. I must have slight misread some of the
discussions and falsely thought it was the XOR compression (which is a
technique I wasn't really aware of), that was being provided by the
extension - Where would it be best for me to look up the background to
your "extension" project?


> But I agree with your point that it should be
> able build a logical understanding among the new readers.

*nod*
>
> There are some external articles[1] which talk about bitmap internals.
> But I think it would be better if we can make a new doc file (may be
> `Documentation/technical/reachability-bitmaps.txt` or similar) rather
> than putting those details in the `bitmap-format.txt` 

Thanks for the two links. In general I agree about the format document.

> (As the name 
> suggests, this file should only contain format details of bitmaps).
> That file would provide the answers of "Why bitmaps", "how they are
> stored",  "How they are fetched", "how they work with pack-objects,
> git-fetch, midx etc.", "Detailed explanation of each bitmap extension"
> , and lastly "what are the future works" (if any).

One thing I've realised on reflection is that I'm unclear how the
'reachability bitmaps' and the 'commit-graph file' techniques relate to
each other (and to the ODB DAG), and what features they pick out within
their heuristic, explained at just enough level to allow folks to
appreciate what the options that select them will do for their use case.

>
> What do you think?

I'd be happy to collate contributions, suggestions and thoughts.

Trying to create these good introductory descriptions can be really
difficult, as you can only step into the same river once (the 'reading
for the first time problem' of not being able to un-hear the
explanations of others when reading a 2nd draft...)
>
>> Separately I'm preparing a short series on adding 'reachability bitmap'
>> and 'commit graph' (among other stuff) to the glossary as part of giving
>> folks [0] stepping stones to cross the chasm of understanding
> Great!
>
> Thanks :)
>
> [1] https://github.blog/2015-09-22-counting-objects/, https://github.blog/2021-04-29-scaling-monorepo-maintenance/
Thank you.

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v3 1/6] Documentation/technical: describe bitmap lookup table extension
  2022-07-10 15:01           ` Philip Oakley
@ 2022-07-14 23:15             ` Taylor Blau
  2022-07-15 10:36               ` Philip Oakley
  2022-07-15 18:48             ` Abhradeep Chakraborty
  1 sibling, 1 reply; 162+ messages in thread
From: Taylor Blau @ 2022-07-14 23:15 UTC (permalink / raw)
  To: Philip Oakley
  Cc: Abhradeep Chakraborty, Git, Kaartic Sivaraam, Derrick Stolee,
	Junio C Hamano

On Sun, Jul 10, 2022 at 04:01:11PM +0100, Philip Oakley wrote:
> >> Not sure if this is new in this extension, but should there be a link or
> >> two to the basics of XOR compression and some of the bitmap look up
> >> techniques?
> >>
> >> It's not always obvious if these techniques are 'heuristic' and only
> >> have partial commit data, or they have all the commits listed, Nor
> >> how/why they work. My point is more about giving new readers a hand-up
> >> in their understanding, rather than simple implementation details for
> >> those who already know what is going on. For example, are there any
> >> external articles that you found helpful in getting started that could
> >> be referenced somewhere in the docs?
> > As this series is only about adding a lookup-table extension (and not
> > about bitmap itself), I am not sure whether it's good to include those
> > things in this series.
>
> Thanks for the clarification. I must have slight misread some of the
> discussions and falsely thought it was the XOR compression (which is a
> technique I wasn't really aware of), that was being provided by the
> extension - Where would it be best for me to look up the background to
> your "extension" project?

Yeah, Abhradeep is right that the XOR compression isn't new, we already
serialize bitmaps with optional XOR offsets. The gist is that we give an
offset of some previous bitmap that is used to compress the current one
by XORing the bits in the current bitmap with the previous one. These
XOR-compressed bitmaps are often sparse, so they compress well and
reduce the overall size of the .bitmap.

A slightly more detailed overview can be found in
Documentation/technical/bitmap-format.txt under the bullet point reading
"1-byte XOR-offset".
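
To make the idea concrete, here is a minimal sketch of XOR-ing one bitmap
against a base. It assumes plain, uncompressed 64-bit words rather than
the EWAH-compressed bitmaps that Git actually stores, and the helper
names `xor_with_base()` and `count_set_bits()` are made up for
illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count set bits in a word-array bitmap (portable, no compiler builtins). */
static size_t count_set_bits(const uint64_t *words, size_t nr)
{
	size_t total = 0;

	for (size_t i = 0; i < nr; i++) {
		uint64_t w = words[i];

		while (w) {
			w &= w - 1; /* clear the lowest set bit */
			total++;
		}
	}
	return total;
}

/*
 * XOR `base` into `bitmap` in place. Applying it once to a full bitmap
 * gives the (usually sparse) delta to store; applying it again to that
 * delta restores the full bitmap, because (a ^ b) ^ b == a.
 */
static void xor_with_base(uint64_t *bitmap, const uint64_t *base, size_t nr)
{
	assert(bitmap && base);
	for (size_t i = 0; i < nr; i++)
		bitmap[i] ^= base[i];
}
```

If two commits reach almost the same set of objects, the XOR of their
bitmaps has only a handful of bits set, which is what makes the stored
form compress well.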

Thanks,
Taylor


* Re: [PATCH v3 2/6] pack-bitmap-write.c: write lookup table extension
  2022-07-04  8:46     ` [PATCH v3 2/6] pack-bitmap-write.c: write " Abhradeep Chakraborty via GitGitGadget
@ 2022-07-14 23:26       ` Taylor Blau
  2022-07-15  2:22       ` Taylor Blau
  2022-07-18  8:59       ` Martin Ågren
  2 siblings, 0 replies; 162+ messages in thread
From: Taylor Blau @ 2022-07-14 23:26 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget
  Cc: git, Kaartic Sivaram, Derrick Stolee, Abhradeep Chakraborty

On Mon, Jul 04, 2022 at 08:46:12AM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
>
> The bitmap lookup table extension was documented by an earlier
> change, but Git does not yet know how to write that extension.

This and the first patch both look in great shape to me. I haven't had a
chance to take a close look through the remaining four patches, but I
anticipate that they are in similarly-good shape.

I'll have some more time to finish reviewing this tomorrow morning. I
want to give it a closer inspection this round to make sure that
everything is correct (and that we're assembling the various orderings
the right way by stepping through it in a debugger, etc., etc.).

Thanks for all of your patience :-).

Thanks,
Taylor


* Re: [PATCH v3 2/6] pack-bitmap-write.c: write lookup table extension
  2022-07-04  8:46     ` [PATCH v3 2/6] pack-bitmap-write.c: write " Abhradeep Chakraborty via GitGitGadget
  2022-07-14 23:26       ` Taylor Blau
@ 2022-07-15  2:22       ` Taylor Blau
  2022-07-15 15:58         ` Abhradeep Chakraborty
  2022-07-18  8:59       ` Martin Ågren
  2 siblings, 1 reply; 162+ messages in thread
From: Taylor Blau @ 2022-07-15  2:22 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget
  Cc: git, Kaartic Sivaram, Derrick Stolee, Abhradeep Chakraborty

Hi Abhradeep,

I just wanted to make absolutely sure that I understood what the
implementation in this patch was doing, since I think generating and
converting between all of these different orderings is by far the most
confusing component of this series.

On Mon, Jul 04, 2022 at 08:46:12AM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
>
> The bitmap lookup table extension was documented by an earlier
> change, but Git does not yet know how to write that extension.
> +static int table_cmp(const void *_va, const void *_vb, void *_data)
> +{
> +	uint32_t *commit_positions = _data;
> +	uint32_t a = commit_positions[*(uint32_t *)_va];
> +	uint32_t b = commit_positions[*(uint32_t *)_vb];
> +
> +	if (a > b)
> +		return 1;
> +	else if (a < b)
> +		return -1;
> +
> +	return 0;
> +}

Let's skip the above part for now, and just look at the implementation
of write_lookup_table():

> +static void write_lookup_table(struct hashfile *f,
> +			       struct pack_idx_entry **index,
> +			       uint32_t index_nr,
> +			       off_t *offsets)
> +{
> +	uint32_t i;
> +	uint32_t *table, *table_inv, *commit_positions;
> +
> +	ALLOC_ARRAY(table, writer.selected_nr);
> +	ALLOC_ARRAY(table_inv, writer.selected_nr);
> +	ALLOC_ARRAY(commit_positions, writer.selected_nr);

Makes sense.

> +	/* store the index positions of the commits */
> +	for (i = 0; i < writer.selected_nr; i++) {
> +		int pos = commit_bitmap_writer_pos(&writer.selected[i].commit->object.oid,
> +						   index, index_nr);
> +		if (pos < 0)
> +			BUG(_("trying to write commit not in index"));
> +
> +		commit_positions[i] = pos;
> +	}

By the end of this loop, we have an array `commit_positions` which maps
the ith selected commit to its lexical position among all objects in the
bitmap. IOW, `commit_positions[i] = j` means the `i`th selected commit
can be found at index `j` among all objects in the pack/MIDX in their
lexical order.

> +	for (i = 0; i < writer.selected_nr; i++)
> +		table[i] = i;

At this point, table[i] = i.

> +	/*
> +	 * At the end of this sort table[j] = i means that the i'th
> +	 * bitmap corresponds to j'th bitmapped commit in lex order of
> +	 * OIDs.
> +	 */
> +	QSORT_S(table, writer.selected_nr, table_cmp, commit_positions);

And then we sort table by treating its values as indexes into
`commit_positions`. Here's where I'm not sure that I follow what's going
on. You say above that `table[j] = i`, where `i` corresponds to the
order of selected commits, and `j` is in lexical order.

If that's the case, then I'd expect that printing `index[table[j]]` for
increasing `j` would output OIDs in increasing lexical order. But that
doesn't quite seem to be the case. From a debugger session that has a
breakpoint after computing and sorting table, along with building
`table_inv`:

    (gdb) p oid_to_hex(&index[table[0]]->oid)
    $17 = 0x555555983ea0 <hexbuffer> "0006763074748d43b539c1c8e8882c08034ab178"
    (gdb) p oid_to_hex(&index[table[1]]->oid)
    $18 = 0x555555983ee1 <hexbuffer+65> "001ce83dd43f03dcfc67f29d38922e4a9682aab0"
    (gdb) p oid_to_hex(&index[table[2]]->oid)
    $19 = 0x555555983f22 <hexbuffer+130> "002db882ece2ab6a240e495a169c6e06422289c8"
    (gdb) p oid_to_hex(&index[table[3]]->oid)
    $20 = 0x555555983f63 <hexbuffer+195> "0007a5feb040e1ff704f3ad636619ddca3e7382b"

that doesn't look like the OIDs are increasing in lexical order.

I'm not quite sure if I'm even looking at the right thing, or if this is
to be expected, or if the comment isn't quite accurate. If you could
help clarify what's going on here, that would be great.

> +	/* table_inv helps us discover that relationship (i'th bitmap
> +	 * to j'th commit by j = table_inv[i])
> +	 */
> +	for (i = 0; i < writer.selected_nr; i++)
> +		table_inv[table[i]] = i;

This part makes sense, as does the rest of the implementation.
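
For anyone following along, the sort-then-invert pattern can be sketched
in isolation. This is only an illustration: it uses the standard
`qsort()` with a file-scope pointer in place of Git's `QSORT_S()`, and
`build_tables()`, `cmp_by_key()` and `sort_keys` are hypothetical names,
with `keys` standing in for `commit_positions`:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Context for the comparator; standard qsort() has no user-data argument. */
static const uint32_t *sort_keys;

static int cmp_by_key(const void *va, const void *vb)
{
	uint32_t a = sort_keys[*(const uint32_t *)va];
	uint32_t b = sort_keys[*(const uint32_t *)vb];

	return (a > b) - (a < b);
}

/*
 * Fill table[] so that table[j] is the selection-order index of the
 * entry with the j'th smallest key, and table_inv[] so that
 * table_inv[i] is the sorted rank of selection-order entry i.
 */
static void build_tables(const uint32_t *keys, uint32_t nr,
			 uint32_t *table, uint32_t *table_inv)
{
	uint32_t i;

	assert(keys && table && table_inv);
	for (i = 0; i < nr; i++)
		table[i] = i;

	sort_keys = keys;
	qsort(table, nr, sizeof(*table), cmp_by_key);

	for (i = 0; i < nr; i++)
		table_inv[table[i]] = i;
}
```

In this sketch it is `keys[table[j]]` that increases with `j`. The
values in `table` are selection-order indexes, so they are only
meaningful as subscripts into arrays laid out in selection order.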

Thanks,
Taylor


* Re: [PATCH v3 4/6] pack-bitmap: prepare to read lookup table extension
  2022-07-04  8:46     ` [PATCH v3 4/6] pack-bitmap: prepare to read lookup table extension Abhradeep Chakraborty via GitGitGadget
@ 2022-07-15  2:46       ` Taylor Blau
  2022-07-15 16:38         ` Abhradeep Chakraborty
  0 siblings, 1 reply; 162+ messages in thread
From: Taylor Blau @ 2022-07-15  2:46 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget
  Cc: git, Kaartic Sivaram, Derrick Stolee, Abhradeep Chakraborty

On Mon, Jul 04, 2022 at 08:46:14AM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
> +/*
> + * Searches for a matching triplet. `va` is a pointer
> + * to the wanted commit position value. `vb` points to
> + * a triplet in lookup table. The first 4 bytes of each
> + * triplet (pointed by `vb`) are compared with `*va`.
> + */
> +static int triplet_cmp(const void *va, const void *vb)
> +{
> +
> +	uint32_t a = *(uint32_t *)va;

The comment you added is definitely helpful, but I still think that this
line is a little magical. `*va` isn't really a pointer to a `uint32_t`,
but a pointer to the start of a triplet, which just *happens* to have a
4-byte integer at the beginning of it.

I don't think there's a way to improve this much more than we already
have, though. Populating a triplet struct to just dereference the first
field feels wasteful and slow. So I think what you have here makes sense
to me.

> +static uint32_t bsearch_pos(struct bitmap_index *bitmap_git,
> +			    struct object_id *oid,
> +			    uint32_t *result)
> +{
> +	int found;
> +
> +	if (bitmap_is_midx(bitmap_git))
> +		found = bsearch_midx(oid, bitmap_git->midx, result);
> +	else
> +		found = bsearch_pack(oid, bitmap_git->pack, result);
> +
> +	return found;
> +}
> +
> +/*
> + * `bsearch_triplet` function searches for the raw triplet having
> + * commit position same as `commit_pos` and fills `triplet`
> + * object from the raw triplet. Returns 1 on success and 0
> + * on failure.
> + */
> +static int bsearch_triplet(uint32_t *commit_pos,
> +			   struct bitmap_index *bitmap_git,
> +			   struct bitmap_lookup_table_triplet *triplet)
> +{
> +	unsigned char *p = bsearch(commit_pos, bitmap_git->table_lookup, bitmap_git->entry_count,
> +				   BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH, triplet_cmp);
> +
> +	if (!p)
> +		return 0;
> +	triplet->commit_pos = get_be32(p);
> +	p += sizeof(uint32_t);
> +	triplet->offset = get_be64(p);
> +	p += sizeof(uint64_t);
> +	triplet->xor_row = get_be32(p);
> +	return 1;
> +}

This implementation jumped out as being quite similar to
`lookup_table_get_triplet()`. Ultimately they both end up filling a
triplet struct based on some position `p` within the bitmap. The main
difference being that in `lookup_table_get_triplet()`, `p` comes from a
numeric position which indexes into the table, while in
`bsearch_triplet()` the position `p` is given to us by a call to
`bsearch()`.

I wonder if it would be worth extracting the common part of: given a
pointer `p` and a triplet struct, read the triplet beginning at `p` into
the struct.

`lookup_table_get_triplet()` could compute `p` and then return the
result of calling the new auxiliary function with that `p`. Similarly
for `bsearch_triplet()`, it would call that auxiliary function with the
pointer it got from calling `bsearch()`, or return `0` if no match was
found.

It's a minor point, but I think it would help us clean up the
implementation a little bit.
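
Something along these lines is what I have in mind. It is a rough,
standalone sketch rather than a patch: `struct triplet`, `read_be32()`,
`read_be64()` and `read_triplet()` are placeholder names (the real code
would keep using `get_be32()`/`get_be64()` and the series' own struct),
and the 16-byte layout is inferred from the reads in `bsearch_triplet()`
above:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the series' struct bitmap_lookup_table_triplet. */
struct triplet {
	uint32_t commit_pos;
	uint64_t offset;
	uint32_t xor_row;
};

static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/* Decode one 16-byte on-disk triplet starting at `p` into `out`. */
static void read_triplet(const unsigned char *p, struct triplet *out)
{
	assert(p && out);
	out->commit_pos = read_be32(p);
	out->offset = read_be64(p + 4);
	out->xor_row = read_be32(p + 12);
}
```

Both callers would then differ only in how they come up with `p`.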

> +
> +static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_git,
> +					  struct commit *commit)
> +{
> +	uint32_t commit_pos, xor_row;
> +	uint64_t offset;
> +	int flags;
> +	struct bitmap_lookup_table_triplet triplet;
> +	struct object_id *oid = &commit->object.oid;
> +	struct ewah_bitmap *bitmap;
> +	struct stored_bitmap *xor_bitmap = NULL;
> +
> +	int found = bsearch_pos(bitmap_git, oid, &commit_pos);
> +
> +	if (!found)
> +		return NULL;
> +
> +	if (!bsearch_triplet(&commit_pos, bitmap_git, &triplet))
> +		return NULL;
> +
> +	offset = triplet.offset;
> +	xor_row = triplet.xor_row;
> +
> +	if (xor_row != 0xffffffff) {
> +		int xor_flags;
> +		khiter_t hash_pos;
> +		uint64_t offset_xor;
> +		struct bitmap_lookup_table_xor_item *xor_items;
> +		struct bitmap_lookup_table_xor_item xor_item;
> +		size_t xor_items_nr = 0, xor_items_alloc = 64;
> +
> +		ALLOC_ARRAY(xor_items, xor_items_alloc);

This ALLOC_ARRAY() looks great to me. I wonder if we could amortize the
cost of allocating in this (somewhat) hot function by treating the
`xor_items` array as a reusable static buffer where we reset
xor_items_nr to 0 when entering this function.
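
Roughly, the pattern would look like the sketch below. It is written
against plain `realloc()` instead of `ALLOC_GROW()` so that it stands on
its own, and `struct xor_item`, `scratch_reset()` and `scratch_push()`
are invented names with a simplified item type:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified version of the per-chain item. */
struct xor_item {
	uint64_t offset;
};

/* The buffer outlives any single call, so its allocation is reused. */
static struct xor_item *scratch;
static size_t scratch_nr, scratch_alloc;

/* Forget the previous contents but keep the allocation. */
static void scratch_reset(void)
{
	scratch_nr = 0;
}

/* Append one item; returns 0 on success, -1 if growing the buffer fails. */
static int scratch_push(uint64_t offset)
{
	if (scratch_nr == scratch_alloc) {
		size_t new_alloc = scratch_alloc ? 2 * scratch_alloc : 64;
		struct xor_item *grown = realloc(scratch, new_alloc * sizeof(*grown));

		if (!grown)
			return -1;
		scratch = grown;
		scratch_alloc = new_alloc;
	}
	assert(scratch_nr < scratch_alloc);
	scratch[scratch_nr++].offset = offset;
	return 0;
}
```

The trade-off is that a static buffer like this is neither reentrant
nor thread-safe, so it only makes sense if the function is never called
recursively or concurrently.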

> +		while (xor_row != 0xffffffff) {
> +			struct object_id xor_oid;
> +
> +			if (xor_items_nr + 1 >= bitmap_git->entry_count) {
> +				free(xor_items);
> +				error(_("corrupt bitmap lookup table: xor chain exceed entry count"));

I think we can probably `die()` here, we're pretty much out of luck in
this case.

> +				return NULL;
> +			}
> +
> +			if (lookup_table_get_triplet(bitmap_git, xor_row, &triplet) < 0)
> +				return NULL;
> +
> +			offset_xor = triplet.offset;
> +
> +			if (nth_bitmap_object_oid(bitmap_git, &xor_oid, triplet.commit_pos) < 0) {
> +				free(xor_items);
> +				error(_("corrupt bitmap lookup table: commit index %u out of range"),
> +					triplet.commit_pos);

Same here.

> +				return NULL;
> +			}
> +
> +			hash_pos = kh_get_oid_map(bitmap_git->bitmaps, xor_oid);
> +
> +			/*
> +			 * If desired bitmap is already stored, we don't need
> +			 * to iterate further. Because we know that bitmaps
> +			 * that are needed to be parsed to parse this bitmap
> +			 * has already been stored. So, assign this stored bitmap
> +			 * to the xor_bitmap.
> +			 */
> +			if (hash_pos < kh_end(bitmap_git->bitmaps) &&
> +			    (xor_bitmap = kh_value(bitmap_git->bitmaps, hash_pos)))
> +				break;
> +
> +			ALLOC_GROW(xor_items, xor_items_nr + 1, xor_items_alloc);
> +			xor_items[xor_items_nr++] = (struct bitmap_lookup_table_xor_item) {.oid = xor_oid,
> +											   .offset = offset_xor};

This style of initialization is somewhat uncommon for Git's codebase. It
might be a little more natural to write something like:

    xor_items[xor_items_nr].oid = xor_oid;
    xor_items[xor_items_nr].offset = offset_xor;
    xor_items_nr++;

But the struct-copying for `xor_oid` is definitely uncommon for us. We
should use the `oidcpy()` helper there instead. Or better yet, pass a
pointer to `&xor_items[xor_items_nr].oid` as the second argument to
`nth_bitmap_object_oid()` to avoid the copy altogether.

> +			xor_row = triplet.xor_row;
> +		}
> +
> +		while (xor_items_nr) {
> +			xor_item = xor_items[xor_items_nr - 1];
> +			offset_xor = xor_item.offset;
> +
> +			bitmap_git->map_pos = offset_xor;
> +			if (bitmap_git->map_size - bitmap_git->map_pos < 6) {

Should we extract `6` out to a named constant?

Thanks,
Taylor


* Re: [PATCH v3 5/6] bitmap-lookup-table: add performance tests for lookup table
  2022-07-04  8:46     ` [PATCH v3 5/6] bitmap-lookup-table: add performance tests for lookup table Abhradeep Chakraborty via GitGitGadget
@ 2022-07-15  2:53       ` Taylor Blau
  2022-07-15 18:23         ` Abhradeep Chakraborty
  0 siblings, 1 reply; 162+ messages in thread
From: Taylor Blau @ 2022-07-15  2:53 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget
  Cc: git, Kaartic Sivaram, Derrick Stolee, Abhradeep Chakraborty

On Mon, Jul 04, 2022 at 08:46:15AM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
>
> Add performance tests to verify the performance of lookup table with
> `pack.writeReverseIndex` enabled. This is to check the performance
> when the above configuration is set.
>
> Lookup table makes Git run faster in most of the cases. Below is the
> result of `t/perf/p5310-pack-bitmaps.sh`.`perf/p5326-multi-pack-bitmaps.sh`
> gives similar result. The repository used in the test is linux kernel.
>
> Test                                                      this tree
> ---------------------------------------------------------------------------
> 5310.4: repack to disk (lookup=false)                   296.55(256.53+14.52)

Having "lookup=false" in this test definitely helps visually
differentiate which tests have a bitmap with and without the lookup
table.

I think we should take a slightly different approach for these
performance tests. I think the first change to the t/perf tests in this
series should only enable `pack.writeReverseIndex`. That patch would be
a good place to highlight the benefit of enabling the on-disk reverse
index by showing a before and after of running p5310 before and after
that commit.

Then the patch after that should look like this one, which runs the
suite with and without the lookup table. That should give us a sense of:

  - bitmaps without a lookup table or reverse index
  - bitmaps without a lookup table, but with a reverse index
  - bitmaps with a reverse index and a lookup table

...which I think are the most interesting combinations (I wouldn't
expect many or any users to have lookup tables enabled without reverse
indexes).

I think that would allow us to drop the last patch in this version of
the series. But I'm definitely open to other testing strategies for the
performance tests (including this one!) if you have different thoughts
about what the best way to go about this is.

Thanks,
Taylor


* Re: [PATCH v3 1/6] Documentation/technical: describe bitmap lookup table extension
  2022-07-14 23:15             ` Taylor Blau
@ 2022-07-15 10:36               ` Philip Oakley
  0 siblings, 0 replies; 162+ messages in thread
From: Philip Oakley @ 2022-07-15 10:36 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Abhradeep Chakraborty, Git, Kaartic Sivaraam, Derrick Stolee,
	Junio C Hamano

On 15/07/2022 00:15, Taylor Blau wrote:
> On Sun, Jul 10, 2022 at 04:01:11PM +0100, Philip Oakley wrote:
>>>> Not sure if this is new in this extension, but should there be a link or
>>>> two to the basics of XOR compression and some of the bitmap look up
>>>> techniques?
>>>>
>>>> It's not always obvious if these techniques are 'heuristic' and only
>>>> have partial commit data, or they have all the commits listed, Nor
>>>> how/why they work. My point is more about giving new readers a hand-up
>>>> in their understanding, rather than simple implementation details for
>>>> those who already know what is going on. For example, are there any
>>>> external articles that you found helpful in getting started that could
>>>> be referenced somewhere in the docs?
>>> As this series is only about adding a lookup-table extension (and not
>>> about bitmap itself), I am not sure whether it's good to include those
>>> things in this series.
>> Thanks for the clarification. I must have slightly misread some of the
>> discussions and falsely thought it was the XOR compression (which is a
>> technique I wasn't really aware of), that was being provided by the
>> extension - Where would it be best for me to look up the background to
>> your "extension" project?
> Yeah, Abhradeep is right that the XOR compression isn't new, we already
> serialize bitmaps with optional XOR offsets. The gist is that we give an
> offset of some previous bitmap that is used to compress the current one
> by XORing the bits in the current bitmap with the previous one. These
> XOR-compressed bitmaps are often sparse, so they compress well and
> reduce the overall size of the .bitmap.

I was thinking of a short paragraph that covers the broader 'why'
aspects, rather than the what/how. For me, XOR is a 'new' compression
method that (IIUC) takes advantage of certain features of the way the
data is arranged, such that the XOR has lots of leading zeros, leading
to the compression mentioned.

I think it's that we sort on oid name, so that despite the oid being
long, we have (typically) sufficient oids that the leading XOR bits of
adjacent pairs allows effective compression. But I could have guessed
wildly wrong.

I'd been looking at
https://www.timescale.com/blog/time-series-compression-algorithms-explained/
which gave an overview for sorted floats.
I'd not had time to review the paper "Gorilla: A Fast, Scalable,
In-Memory Time Series Database"
http://www.vldb.org/pvldb/vol8/p1816-teller.pdf
>
> A slightly more detailed overview can be found in
> Documentation/technical/bitmap-format.txt under the bullet point reading
> "1-byte XOR-offset".
>
A separate point is the linkage (or not) between the older reachability
bitmaps and the commit graph, which sound like independent options
and features, yet appear rather interrelated.

Thanks

Philip




* Re: [PATCH v3 2/6] pack-bitmap-write.c: write lookup table extension
  2022-07-15  2:22       ` Taylor Blau
@ 2022-07-15 15:58         ` Abhradeep Chakraborty
  2022-07-15 22:15           ` Taylor Blau
  0 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-07-15 15:58 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Kaartic Sivaram,
	Derrick Stolee

On Fri, Jul 15, 2022 at 7:52 AM Taylor Blau <me@ttaylorr.com> wrote:
>
> By the end of this loop, we have an array `commit_positions` which maps
> the ith selected commit to its lexical position among all objects in the
> bitmap. IOW, `commit_positions[i] = j` means the `i`th selected commit
> can be found at index `j` among all objects in the pack/MIDX in their
> lexical order.

Right.

> > +     /*
> > +      * At the end of this sort table[j] = i means that the i'th
> > +      * bitmap corresponds to j'th bitmapped commit in lex order of
> > +      * OIDs.
> > +      */
> > +     QSORT_S(table, writer.selected_nr, table_cmp, commit_positions);
>
> And then we sort table by treating its values as indexes into
> `commit_positions`. Here's where I'm not sure that I follow what's going
> on. You say above that `table[j] = i`, where `i` corresponds to the
> order of selected commits, and `j` is in lexical order.

Correct.

> If that's the case, then I'd expect that printing `index[table[j]]` for
> increasing `j` would output OIDs in increasing lexical order. But that
> doesn't quite seem to be the case. From a debugger session that has a
> breakpoint after computing and sorting table, along with building
> `table_inv`:
>
>     (gdb) p oid_to_hex(&index[table[0]]->oid)
>     $17 = 0x555555983ea0 <hexbuffer> "0006763074748d43b539c1c8e8882c08034ab178"
>     (gdb) p oid_to_hex(&index[table[1]]->oid)
>     $18 = 0x555555983ee1 <hexbuffer+65> "001ce83dd43f03dcfc67f29d38922e4a9682aab0"
>     (gdb) p oid_to_hex(&index[table[2]]->oid)
>     $19 = 0x555555983f22 <hexbuffer+130> "002db882ece2ab6a240e495a169c6e06422289c8"
>     (gdb) p oid_to_hex(&index[table[3]]->oid)
>     $20 = 0x555555983f63 <hexbuffer+195> "0007a5feb040e1ff704f3ad636619ddca3e7382b"
>
> that doesn't look like the OIDs are increasing in lexical order.
>
> I'm not quite sure if I'm even looking at the right thing, or if this is
> to be expected, or if the comment isn't quite accurate. If you could
> help clarify what's going on here, that would be great.

I think you're not looking at the right thing. You should look at
`writer.selected[table[i]].commit->object.oid` instead. I think the
order of `index[]` is not the same as the pack index (or midx).

I am saying this because if we use the `pos` variable (that we get
from `commit_bitmap_writer_pos(&writer.selected[table[i]].commit->object.oid,
index, index_nr)`) in `fprintf(stderr, "commit hex: %s\n",
oid_to_hex(&index[pos]->oid));`, you'll see that `&index[pos]->oid` and
`&writer.selected[table[i]].commit->object.oid` are not the same. So, if
you do:

  int spos = commit_bitmap_writer_pos(&index[pos]->oid, index, index_nr);

you'll see `spos` is not equal to `pos`.

> > +     /* table_inv helps us discover that relationship (i'th bitmap
> > +      * to j'th commit by j = table_inv[i])
> > +      */
> > +     for (i = 0; i < writer.selected_nr; i++)
> > +             table_inv[table[i]] = i;
>
> This part makes sense, as does the rest of the implementation.
>
> Thanks,
> Taylor


* Re: [PATCH v3 4/6] pack-bitmap: prepare to read lookup table extension
  2022-07-15  2:46       ` Taylor Blau
@ 2022-07-15 16:38         ` Abhradeep Chakraborty
  2022-07-15 22:20           ` Taylor Blau
  0 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-07-15 16:38 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Kaartic Sivaram,
	Derrick Stolee

On Fri, Jul 15, 2022 at 8:16 AM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Mon, Jul 04, 2022 at 08:46:14AM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
> > +/*
> > + * Searches for a matching triplet. `va` is a pointer
> > + * to the wanted commit position value. `vb` points to
> > + * a triplet in lookup table. The first 4 bytes of each
> > + * triplet (pointed by `vb`) are compared with `*va`.
> > + */
> > +static int triplet_cmp(const void *va, const void *vb)
> > +{
> > +
> > +     uint32_t a = *(uint32_t *)va;
>
> The comment you added is definitely helpful, but I still think that this
> line is a little magical. `*va` isn't really a pointer to a `uint32_t`,
> but a pointer to the start of a triplet, which just *happens* to have a
> 4-byte integer at the beginning of it.

Are you sure about this? As far as I know, the first parameter of such
comparing functions is always a pointer to the given key that we need
to search for and the second parameter points to each element of an
array.

I think "`va` is a pointer to the wanted commit position value" is not
that descriptive. Maybe "`va` is a pointer to the given key" is
better. What do you think?
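
For reference, the C standard specifies that `bsearch()` calls the
comparison function with the key as the first argument and an array
element as the second. A small standalone sketch of that contract is
below. The 16-byte record width is an assumption based on the reads in
`bsearch_triplet()`, and `record_cmp()`, `find_record()` and
`load_be32()` are illustrative names, not code from the series:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout: 4-byte position, 8-byte offset, 4-byte xor row. */
#define RECORD_WIDTH 16

static uint32_t load_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * `key` points to the caller's host-order uint32_t; `elem` points to
 * the start of a raw record whose first 4 bytes are big-endian.
 */
static int record_cmp(const void *key, const void *elem)
{
	uint32_t want = *(const uint32_t *)key;
	uint32_t have = load_be32((const unsigned char *)elem);

	return (want > have) - (want < have);
}

/* Return the record whose leading position equals `pos`, or NULL. */
static const unsigned char *find_record(const unsigned char *table,
					size_t nr, uint32_t pos)
{
	assert(table);
	return bsearch(&pos, table, nr, RECORD_WIDTH, record_cmp);
}
```

So only the second argument ever points into the table, and the two
arguments do not need to have the same type or byte order.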

> > + * `bsearch_triplet` function searches for the raw triplet having
> > + * commit position same as `commit_pos` and fills `triplet`
> > + * object from the raw triplet. Returns 1 on success and 0
> > + * on failure.
> > + */
> > +static int bsearch_triplet(uint32_t *commit_pos,
> > +                        struct bitmap_index *bitmap_git,
> > +                        struct bitmap_lookup_table_triplet *triplet)
> > +{
> > +     unsigned char *p = bsearch(commit_pos, bitmap_git->table_lookup, bitmap_git->entry_count,
> > +                                BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH, triplet_cmp);
> > +
> > +     if (!p)
> > +             return 0;
> > +     triplet->commit_pos = get_be32(p);
> > +     p += sizeof(uint32_t);
> > +     triplet->offset = get_be64(p);
> > +     p += sizeof(uint64_t);
> > +     triplet->xor_row = get_be32(p);
> > +     return 1;
> > +}
>
> This implementation jumped out as being quite similar to
> `lookup_table_get_triplet()`. Ultimately they both end up filling a
> triplet struct based on some position `p` within the bitmap. The main
> difference being that in `lookup_table_get_triplet()`, `p` comes from a
> numeric position which indexes into the table, while in
> `bsearch_triplet()` the position `p` is given to us by a call to
> `bsearch()`.
>
> I wonder if it would be worth extracting the common part of: given a
> pointer `p` and a triplet struct, read the triplet beginning at `p` into
> the struct.
>
> `lookup_table_get_triplet()` could compute `p` and then return the
> result of calling the new auxiliary function with that `p`. Similarly
> for `bsearch_triplet()`, it would call that auxiliary function with the
> pointer it got from calling `bsearch()`, or return `0` if no match was
> found.
>
> It's a minor point, but I think it would help us clean up the
> implementation a little bit.

Sure! That would be a great idea!

> > +             ALLOC_ARRAY(xor_items, xor_items_alloc);
>
> This ALLOC_ARRAY() looks great to me. I wonder if we could amortize the
> cost of allocating in this (somewhat) hot function by treating the
> `xor_items` array as a reusable static buffer where we reset
> xor_items_nr to 0 when entering this function.
>
> > +             while (xor_row != 0xffffffff) {
> > +                     struct object_id xor_oid;
> > +
> > +                     if (xor_items_nr + 1 >= bitmap_git->entry_count) {
> > +                             free(xor_items);
> > +                             error(_("corrupt bitmap lookup table: xor chain exceed entry count"));
>
> I think we can probably `die()` here, we're pretty much out of luck in
> this case.
> ...
> > +                             error(_("corrupt bitmap lookup table: commit index %u out of range"),
> > +                                     triplet.commit_pos);
>
> Same here.

I didn't use `die()` here because I thought returning NULL would be a
better idea. In that case, Git can still do its job by using the
traditional approach - traversing between objects.
`load_bitmap_entries_v1` also returns NULL if an error occurs. What do
you think?

> > +                     ALLOC_GROW(xor_items, xor_items_nr + 1, xor_items_alloc);
> > +                     xor_items[xor_items_nr++] = (struct bitmap_lookup_table_xor_item) {.oid = xor_oid,
> > +                                                                                        .offset = offset_xor};
>
> This style of initialization is somewhat uncommon for Git's codebase. It
> might be a little more natural to write something like:
>
>     xor_items[xor_items_nr].oid = xor_oid;
>     xor_items[xor_items_nr].offset = offset_xor;
>     xor_items_nr++;
>
> But the struct-copying for `xor_oid` is definitely uncommon for us. We
> should use the `oidcpy()` helper there instead. Or better yet, pass a
> pointer to `&xor_items[xor_items_nr].oid` as the second argument to
> `nth_bitmap_object_oid()` to avoid the copy altogether.

Ok, got it.

> > +                     bitmap_git->map_pos = offset_xor;
> > +                     if (bitmap_git->map_size - bitmap_git->map_pos < 6) {
>
> Should we extract `6` out to a named constant?

Ok, sure!

Thanks :)

>
> Thanks,
> Taylor


* Re: [PATCH v3 5/6] bitmap-lookup-table: add performance tests for lookup table
  2022-07-15  2:53       ` Taylor Blau
@ 2022-07-15 18:23         ` Abhradeep Chakraborty
  0 siblings, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-07-15 18:23 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Kaartic Sivaram,
	Derrick Stolee

On Fri, Jul 15, 2022 at 8:23 AM Taylor Blau <me@ttaylorr.com> wrote:
>
> Having "lookup=false" in this test definitely helps visually
> differentiate which tests have a bitmap with and without the lookup
> table.
>
> I think we should take a slightly different approach for these
> performance tests. I think the first change to the t/perf tests in this
> series should only enable `pack.writeReverseIndex`. That patch would be
> a good place to highlight the benefit of enabling the on-disk reverse
> index by showing a before and after of running p5310 before and after
> that commit.
>
> Then the patch after that should look like this one, which runs the
> suite with and without the lookup table. That should give us a sense of:
>
>   - bitmaps without a lookup table or reverse index
>   - bitmaps without a lookup table, but with a reverse index
>   - bitmaps with a reverse index and a lookup table
>
> ...which I think are the most interesting combinations (I wouldn't
> expect many or any users to have lookup tables enabled without reverse
> indexes).
>
> I think that would allow us to drop the last patch in this version of
> the series. But I'm definitely open to other testing strategies for the
> performance tests (including this one!) if you have different thoughts
> about what the best way to go about this is.

Got it. Thanks!

> Thanks,
> Taylor


* Re: [PATCH v3 1/6] Documentation/technical: describe bitmap lookup table extension
  2022-07-10 15:01           ` Philip Oakley
  2022-07-14 23:15             ` Taylor Blau
@ 2022-07-15 18:48             ` Abhradeep Chakraborty
  1 sibling, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-07-15 18:48 UTC (permalink / raw)
  To: Philip Oakley
  Cc: Git, Kaartic Sivaraam, Taylor Blau, Derrick Stolee, Junio C Hamano

On Sun, Jul 10, 2022 at 8:31 PM Philip Oakley <philipoakley@iee.email> wrote:
>
> On 09/07/2022 08:53, Abhradeep Chakraborty wrote:
> > Hello Philip,
> >
> > Philip Oakley <philipoakley@iee.email> wrote:
> >
> >> Not sure if this is new in this extension, but should there be a link or
> >> two to the basics of XOR compression and some of the bitmap look up
> >> techniques?
> >>
> >> It's not always obvious if these techniques are 'heuristic' and only
> >> have partial commit data, or they have all the commits listed, Nor
> >> how/why they work. My point is more about giving new readers a hand-up
> >> in their understanding, rather than simple implementation details for
> >> those who already know what is going on. For example, are there any
> >> external articles that you found helpful in getting started that could
> >> be referenced somewhere in the docs?
> > As this series is only about adding a lookup-table extension (and not
> > about bitmap itself), I am not sure whether it's good to include those
> > things in this series.
>
> Thanks for the clarification. I must have slightly misread some of the
> discussions and falsely thought it was the XOR compression (which is a
> technique I wasn't really aware of), that was being provided by the
> extension - Where would it be best for me to look up the background to
> your "extension" project?

Sorry that I missed this message. I got the information related to
this project from the GSoC project ideas page[1]. Additionally, you can
see the comments in [2].

[1] https://git.github.io/SoC-2022-Ideas/
[2] https://lore.kernel.org/git/YNovuzAsaEb2uIaa@nand.local/

> > (As the name
> > suggests, this file should only contain format details of bitmaps).
> > That file would provide the answers of "Why bitmaps", "how they are
> > stored",  "How they are fetched", "how they work with pack-objects,
> > git-fetch, midx etc.", "Detailed explanation of each bitmap extension"
> > , and lastly "what are the future works" (if any).
>
> One thing I've realised on reflection is that I'm unclear how the
> 'reachability bitmaps' and the 'commit-graph file' techniques relate to
> each other (and to the ODB DAG), and what features they pick out within
> their heuristic, explained at just enough level to allow folks to
> appreciate what the options that select them will do for their use case.

I am not familiar with 'commit-graph file', so I can't tell you about
that. But for bitmaps, you can look at the introductory patches[3].
After that, if you wish, you can also inspect the code related to
bitmaps.

[3] https://github.com/gitster/git/commit/e127310

> > What do you think?
>
> I'd be happy to collate contributions, suggestions and thoughts.
>
> Trying to create these good introductory descriptions can be really
> difficult, as you can only step into the same river once (the 'reading
> for the first time problem' of not being able to un-hear the
> explanations of others when reading a 2nd draft...)

I agree ;-)

Thanks :)


* Re: [PATCH v3 2/6] pack-bitmap-write.c: write lookup table extension
  2022-07-15 15:58         ` Abhradeep Chakraborty
@ 2022-07-15 22:15           ` Taylor Blau
  2022-07-16 11:50             ` Abhradeep Chakraborty
  0 siblings, 1 reply; 162+ messages in thread
From: Taylor Blau @ 2022-07-15 22:15 UTC (permalink / raw)
  To: Abhradeep Chakraborty
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Kaartic Sivaram,
	Derrick Stolee

On Fri, Jul 15, 2022 at 09:28:25PM +0530, Abhradeep Chakraborty wrote:
> > If that's the case, then I'd expect that printing `index[table[j]]` for
> > increasing `j` would output OIDs in increasing lexical order. But that
> > doesn't quite seem to be the case. From a debugger session that has a
> > breakpoint after computing and sorting table, along with building
> > `table_inv`:
> >
> >     (gdb) p oid_to_hex(&index[table[0]]->oid)
> >     $17 = 0x555555983ea0 <hexbuffer> "0006763074748d43b539c1c8e8882c08034ab178"
> >     (gdb) p oid_to_hex(&index[table[1]]->oid)
> >     $18 = 0x555555983ee1 <hexbuffer+65> "001ce83dd43f03dcfc67f29d38922e4a9682aab0"
> >     (gdb) p oid_to_hex(&index[table[2]]->oid)
> >     $19 = 0x555555983f22 <hexbuffer+130> "002db882ece2ab6a240e495a169c6e06422289c8"
> >     (gdb) p oid_to_hex(&index[table[3]]->oid)
> >     $20 = 0x555555983f63 <hexbuffer+195> "0007a5feb040e1ff704f3ad636619ddca3e7382b"
> >
> > that doesn't look like the OIDs are increasing in lexical order.
> >
> > I'm not quite sure if I'm even looking at the right thing, or if this is
> > to be expected, or if the comment isn't quite accurate. If you could
> > help clarify what's going on here, that would be great.
>
> I think you're not looking at the right thing. You should look at
> `writer.selected[table[i]].commit->object.oid` instead. I think the
> order of `index[]` is not the same as the pack index (or midx).
>
> I am saying this because if we use the `pos` variable (that we get
> from `commit_bitmap_writer_pos(&writer.selected[table[i]].commit->object.oid,
> index, index_nr)`) in `fprintf(stderr, "commit hex: %s\n",
> &index[pos]->oid);`, you'll see that `&index[pos]->oid` and
> `&writer.selected[table[i]].commit->object.oid` are not the same. So, if
> you do -
>
>   int spos = commit_bitmap_writer_pos(&index[pos]->oid, index, index_nr);
>
> you'll see `spos` is not equal to `pos`.

`index` there comes from the list of objects that `pack-objects` or the
MIDX told us about, and it's sorted in lexical order (via
`write_pack_file()` -> `stage_tmp_packfiles()` -> `write_idx_file()`).

So I think this implementation is indexing the commits by the order they
appear in the `writer.selected` array, *not* by the order they appear
in the index.

For what it's worth, I think the latter ordering makes more sense to use
to refer to individual objects. But we should be consistent with our
choice here and what's in the documentation. And right now I think we're
not, since the documentation change in the first patch says we write the
`commit_pos` field in order of the index:

    * {empty}
    commit_pos (4 byte integer, network byte order): ::
    It stores the object position of a commit (in the midx or pack
    index).

Thanks,
Taylor


* Re: [PATCH v3 4/6] pack-bitmap: prepare to read lookup table extension
  2022-07-15 16:38         ` Abhradeep Chakraborty
@ 2022-07-15 22:20           ` Taylor Blau
  2022-07-18  9:06             ` Martin Ågren
  0 siblings, 1 reply; 162+ messages in thread
From: Taylor Blau @ 2022-07-15 22:20 UTC (permalink / raw)
  To: Abhradeep Chakraborty
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Kaartic Sivaram,
	Derrick Stolee

On Fri, Jul 15, 2022 at 10:08:17PM +0530, Abhradeep Chakraborty wrote:
> On Fri, Jul 15, 2022 at 8:16 AM Taylor Blau <me@ttaylorr.com> wrote:
> >
> > On Mon, Jul 04, 2022 at 08:46:14AM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
> > > +/*
> > > + * Searches for a matching triplet. `va` is a pointer
> > > + * to the wanted commit position value. `vb` points to
> > > + * a triplet in lookup table. The first 4 bytes of each
> > > + * triplet (pointed by `vb`) are compared with `*va`.
> > > + */
> > > +static int triplet_cmp(const void *va, const void *vb)
> > > +{
> > > +
> > > +     uint32_t a = *(uint32_t *)va;
> >
> > The comment you added is definitely helpful, but I still think that this
> > line is a little magical. `*va` isn't really a pointer to a `uint32_t`,
> > but a pointer to the start of a triplet, which just *happens* to have a
> > 4-byte integer at the beginning of it.
>
> Are you sure about this? As far as I know, the first parameter of such
> comparing functions is always a pointer to the given key that we need
> to search for and the second parameter points to each element of an
> array.
>
> I think "`va` is a pointer to the wanted commit position value" is not
> that descriptive. Maybe "`va` is a pointer to the given key" is
> better. What do you think?

Yes, the first argument to the comparison function used in bsearch() is
a pointer to some element in the array. I just meant that that array is
the bitmap_git->table_lookup region, so each element isn't actually a
uint32_t array, but the whole thing is an array of (uint32_t, uint64_t,
uint32_t) triplets.

What you wrote here is fine, and I don't even think that the comment
needs updating. If you did want to clarify, I think you could say
something along the lines of what you wrote above ("`va` is a pointer to
an array element") and add something along the lines of "where the array
is the lookup table region of the .bitmap".

> > > +             ALLOC_ARRAY(xor_items, xor_items_alloc);
> >
> > This ALLOC_ARRAY() looks great to me. I wonder if we could amortize the
> > cost of allocating in this (somewhat) hot function by treating the
> > `xor_items` array as a reusable static buffer where we reset
> > xor_items_nr to 0 when entering this function.
> >
> > > +             while (xor_row != 0xffffffff) {
> > > +                     struct object_id xor_oid;
> > > +
> > > +                     if (xor_items_nr + 1 >= bitmap_git->entry_count) {
> > > +                             free(xor_items);
> > > +                             error(_("corrupt bitmap lookup table: xor chain exceed entry count"));
> >
> > I think we can probably `die()` here, we're pretty much out of luck in
> > this case.
> > ...
> > > +                             error(_("corrupt bitmap lookup table: commit index %u out of range"),
> > > +                                     triplet.commit_pos);
> >
> > Same here.
>
> I didn't use `die()` here because I thought returning NULL would be a
> better idea. In that case, Git can still do its job by using the
> traditional approach - traversing between objects.
> `load_bitmap_entries_v1` also returns NULL if an error occurs. What do
> you think?

Ah, I wasn't aware that our callers are graceful enough to handle this
like that. Yes, if we can fallback gracefully, we should, so I think
just error()-ing here (and above) is the right choice. Thanks for saying
so.
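
To make the "report and fall back" behaviour concrete, here is a minimal
sketch of walking an xor chain iteratively. It is not the code from the
patch: `collect_xor_chain()`, `NO_XOR_ROW` and the flat `xor_rows` array
are names assumed for illustration, and only the `0xffffffff` sentinel
and the entry-count bound come from the hunk quoted above. On a corrupt
chain it returns -1 rather than dying, so a caller is free to fall back
to a regular object traversal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sentinel meaning "this bitmap is not XORed with another one". */
#define NO_XOR_ROW 0xffffffff

/*
 * Hypothetical helper: xor_rows[i] holds the row whose bitmap must be
 * XORed with row i's bitmap, or NO_XOR_ROW. The chain starting at `row`
 * is stored in `chain`, which must have room for `nr` entries.
 *
 * Returns the chain length, or -1 if the table looks corrupt (a row is
 * out of range, or the chain is longer than the table, i.e. a cycle).
 */
static int collect_xor_chain(const uint32_t *xor_rows, uint32_t nr,
			     uint32_t row, uint32_t *chain)
{
	uint32_t len = 0;

	assert(xor_rows && chain);

	while (row != NO_XOR_ROW) {
		if (row >= nr || len >= nr)
			return -1; /* corrupt: let the caller fall back */
		chain[len++] = row;
		row = xor_rows[row];
	}
	return (int)len;
}
```

A caller would resolve the collected rows in reverse order, since the
last entry of `chain` is the only one that does not depend on another
bitmap.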

Thanks,
Taylor


* Re: [PATCH v3 2/6] pack-bitmap-write.c: write lookup table extension
  2022-07-15 22:15           ` Taylor Blau
@ 2022-07-16 11:50             ` Abhradeep Chakraborty
  2022-07-26  0:34               ` Taylor Blau
  0 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-07-16 11:50 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Kaartic Sivaram,
	Derrick Stolee

On Sat, Jul 16, 2022 at 3:45 AM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Fri, Jul 15, 2022 at 09:28:25PM +0530, Abhradeep Chakraborty wrote:
> > > If that's the case, then I'd expect that printing `index[table[j]]` for
> > > increasing `j` would output OIDs in increasing lexical order. But that
> > > doesn't quite seem to be the case. From a debugger session that has a
> > > breakpoint after computing and sorting table, along with building
> > > `table_inv`:
> > >
> > >     (gdb) p oid_to_hex(&index[table[0]]->oid)
> > >     $17 = 0x555555983ea0 <hexbuffer> "0006763074748d43b539c1c8e8882c08034ab178"
> > >     (gdb) p oid_to_hex(&index[table[1]]->oid)
> > >     $18 = 0x555555983ee1 <hexbuffer+65> "001ce83dd43f03dcfc67f29d38922e4a9682aab0"
> > >     (gdb) p oid_to_hex(&index[table[2]]->oid)
> > >     $19 = 0x555555983f22 <hexbuffer+130> "002db882ece2ab6a240e495a169c6e06422289c8"
> > >     (gdb) p oid_to_hex(&index[table[3]]->oid)
> > >     $20 = 0x555555983f63 <hexbuffer+195> "0007a5feb040e1ff704f3ad636619ddca3e7382b"
> > >
> > > that doesn't look like the OIDs are increasing in lexical order.
> > >
> > > I'm not quite sure if I'm even looking at the right thing, or if this is
> > > to be expected, or if the comment isn't quite accurate. If you could
> > > help clarify what's going on here, that would be great.
> >
> > I think you're not looking at the right thing. You should look at
> > `writer.selected[table[i]].commit->object.oid` instead. I think the
> > order of `index[]` is not the same as the pack index (or midx).
> >
> > I am saying this because if we use the `pos` variable (that we get
> > from `commit_bitmap_writer_pos(&writer.selected[table[i]].commit->object.oid,
> > index, index_nr)`) in `fprintf(stderr, "commit hex: %s\n",
> > &index[pos]->oid);`, you'll see that `&index[pos]->oid` and
> > `&writer.selected[table[i]].commit->object.oid` are not the same. So, if
> > you do -
> >
> >   int spos = commit_bitmap_writer_pos(&index[pos]->oid, index, index_nr);
> >
> > you'll see `spos` is not equal to `pos`.
>
> `index` there comes from the list of objects that `pack-objects` or the
> MIDX told us about, and it's sorted in lexical order (via
> `write_pack_file()` -> `stage_tmp_packfiles()` -> `write_idx_file()`).

This was a bit strange to me because all the tests were passing. But
now I have found the reason why your results were not in lexical order.
You were doing `oid_to_hex(&index[table[i]]->oid)`, which is not what you
intended to do. Let me explain it with a simple workflow -

Suppose 12 commits are selected for bitmaps and are sorted by their
date. I will now use their index numbers to denote those commits
(i.e. `0` denotes the most recent commit, `1` denotes the second
commit in this order and so on).
So, before that quick sort, `table` = {0, 1, 2, 3, 4, ..., 11}. Now
suppose the `11`th commit is lexically smallest among all the selected
commits, the `5`th commit is the second smallest and so on. So,
after that quick sort, the `table` array contains the following - {11,
5, 9, 4, 0, 3, ...}.

So, when you do `&index[table[0]]->oid`, it becomes `&index[11]->oid`.
Similarly, `&index[table[1]]->oid` becomes `&index[5]->oid` and so on.
That's why you're not getting the oids in lexical order -
`&index[11]->oid` gives the 11th oid in the pack-index and
`&index[5]->oid` gives the 5th oid in the pack-index.

So, the right thing would be to do
`&index[commit_positions[table[0]]]->oid`,
`&index[commit_positions[table[1]]]->oid` ...

Here `&index[commit_positions[table[0]]]->oid` becomes
`&index[commit_positions[11]]->oid` =>
`&index[pos_of_11_commit_with_respect_to_pack_index]->oid` which
ultimately prints the oid of the 11th commit (among the selected bitmap
commits, IN THE SELECTED BITMAP COMMIT ORDER).

I think the comment I added is not that good. The following might be better -

    At the end of this sort table[j] = i means that the i'th
    bitmap corresponds to j'th bitmapped commit (among the selected commits)
    in lex order of OIDs.
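
For readers following along, the sort described above can be sketched in
plain C as below. This is an illustration under assumptions, not the
patch itself: the patch uses git's `QSORT_S()` with `commit_positions`
passed as context, whereas this sketch uses standard `qsort()` and a
file-scope pointer (`sort_positions`, a made-up name) to stay portable.
`build_tables()` and `table_cmp_sketch()` are likewise hypothetical, and
filling `table_inv` as the inverse permutation is my reading of its
name. Sorting by pack-index position is the same as sorting by OID,
because the pack index itself is in lexical OID order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the context argument that QSORT_S() passes. */
static const uint32_t *sort_positions;

/*
 * Both arguments point into `table`, so each one is an index into the
 * selected-commits array; compare the pack-index positions they name.
 */
static int table_cmp_sketch(const void *va, const void *vb)
{
	uint32_t a = sort_positions[*(const uint32_t *)va];
	uint32_t b = sort_positions[*(const uint32_t *)vb];

	return (a > b) - (a < b);
}

/*
 * After this call, table[j] = i means that the i'th selected commit is
 * the j'th one in lexical OID order, and table_inv[i] = j is the inverse
 * mapping (assumed meaning of `table_inv`).
 */
static void build_tables(const uint32_t *commit_positions, uint32_t nr,
			 uint32_t *table, uint32_t *table_inv)
{
	uint32_t i;

	assert(commit_positions && table && table_inv);

	for (i = 0; i < nr; i++)
		table[i] = i;

	sort_positions = commit_positions;
	qsort(table, nr, sizeof(*table), table_cmp_sketch);

	for (i = 0; i < nr; i++)
		table_inv[table[i]] = i;
}
```

With this layout, the OID of the j'th entry is found through
`index[commit_positions[table[j]]]`, never through `index[table[j]]`
directly.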

> Thanks,
> Taylor


* Re: [PATCH v3 2/6] pack-bitmap-write.c: write lookup table extension
  2022-07-04  8:46     ` [PATCH v3 2/6] pack-bitmap-write.c: write " Abhradeep Chakraborty via GitGitGadget
  2022-07-14 23:26       ` Taylor Blau
  2022-07-15  2:22       ` Taylor Blau
@ 2022-07-18  8:59       ` Martin Ågren
  2 siblings, 0 replies; 162+ messages in thread
From: Martin Ågren @ 2022-07-18  8:59 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget
  Cc: Git Mailing List, Taylor Blau, Kaartic Sivaram, Derrick Stolee,
	Abhradeep Chakraborty

On Mon, 4 Jul 2022 at 10:48, Abhradeep Chakraborty via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> +static int table_cmp(const void *_va, const void *_vb, void *_data)
> +{
> +       uint32_t *commit_positions = _data;
> +       uint32_t a = commit_positions[*(uint32_t *)_va];
> +       uint32_t b = commit_positions[*(uint32_t *)_vb];

This casting and dereferencing are ok because ...

> +static void write_lookup_table(struct hashfile *f,
> +                              struct pack_idx_entry **index,
> +                              uint32_t index_nr,
> +                              off_t *offsets)
> +{
> +       uint32_t i;
> +       uint32_t *table, *table_inv, *commit_positions;
> +
> +       ALLOC_ARRAY(table, writer.selected_nr);
> +       ALLOC_ARRAY(table_inv, writer.selected_nr);
> +       ALLOC_ARRAY(commit_positions, writer.selected_nr);

... `table` is where `_va` and `_vb` will be pointing into.

> +       QSORT_S(table, writer.selected_nr, table_cmp, commit_positions);

I started looking at this casting because of something similar in
"pack-bitmap: prepare to read lookup table extension". I'm pointing out
this instance just to say that it looks ok to me.


Martin


* Re: [PATCH v3 4/6] pack-bitmap: prepare to read lookup table extension
  2022-07-15 22:20           ` Taylor Blau
@ 2022-07-18  9:06             ` Martin Ågren
  2022-07-18 19:25               ` Abhradeep Chakraborty
  2022-07-26  0:45               ` Taylor Blau
  0 siblings, 2 replies; 162+ messages in thread
From: Martin Ågren @ 2022-07-18  9:06 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Abhradeep Chakraborty, Abhradeep Chakraborty via GitGitGadget,
	git, Kaartic Sivaram, Derrick Stolee

Hi Abhradeep and Taylor,

I very much enjoy following from a distance Abhradeep's work on this
series and all the reviewing and mentoring. I don't grasp anywhere near
all the details, but I've looked into this a bit:

On Sat, 16 Jul 2022 at 00:37, Taylor Blau <me@ttaylorr.com> wrote:
>
> On Fri, Jul 15, 2022 at 10:08:17PM +0530, Abhradeep Chakraborty wrote:
> > On Fri, Jul 15, 2022 at 8:16 AM Taylor Blau <me@ttaylorr.com> wrote:
> > >
> > > On Mon, Jul 04, 2022 at 08:46:14AM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
> > > > +/*
> > > > + * Searches for a matching triplet. `va` is a pointer
> > > > + * to the wanted commit position value. `vb` points to
> > > > + * a triplet in lookup table. The first 4 bytes of each
> > > > + * triplet (pointed by `vb`) are compared with `*va`.
> > > > + */
> > > > +static int triplet_cmp(const void *va, const void *vb)
> > > > +{
> > > > +
> > > > +     uint32_t a = *(uint32_t *)va;
> > >
> > > The comment you added is definitely helpful, but I still think that this
> > > line is a little magical. `*va` isn't really a pointer to a `uint32_t`,
> > > but a pointer to the start of a triplet, which just *happens* to have a
> > > 4-byte integer at the beginning of it.

Yeah, this all looks quite magical with the casting, and with the
asymmetric handling of `va` and `vb`.

> > Are you sure about this? As far as I know, the first parameter of such
> > comparing functions is always a pointer to the given key that we need
> > to search for and the second parameter points to each element of an
> > array.

Yes, that matches my understanding and the man-page for bsearch(3):

  "The compar routine is expected to have two arguments which point to
  the key object and to an array member, in that order, [...]"

I think it would help to make this something like

  static int triplet_cmp(const void *key, const void *array_item)

to really highlight this asymmetric nature of this function, or to make
clear how the values flow through our call-chain through something like

  static int triplet_cmp(const void *commit_pos, const void *table_entry)

Because we really do rely on this promise of bsearch(3) -- if we would
instantiate a 'dummy' triplet carrying the key, we wouldn't need to (but
we would instead need to have our `cmp` function constantly re-read the
same value, including doing the byteswap).

Would it make sense to let the `const void *key` directly carry the
32-bit value and hope that `sizeof(key) >= sizeof(uint32_t)`? That's
probably too magical, "just" to save on dereferencing.

One thing that could perhaps make things clearer is if
`bsearch_triplet()` did take the position directly, rather than as a
pointer:

-static int bsearch_triplet(uint32_t *commit_pos,
+static int bsearch_triplet(uint32_t commit_pos,
                           struct bitmap_index *bitmap_git,
                           struct bitmap_lookup_table_triplet *triplet)
 {
-       unsigned char *p = bsearch(commit_pos,
bitmap_git->table_lookup, bitmap_git->entry_count,
+       unsigned char *p = bsearch(&commit_pos,
bitmap_git->table_lookup, bitmap_git->entry_count,
                                   BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH,
triplet_cmp);


Also, maybe s/bsearch_triplet/&_by_pos/ could clarify the intent of this
function?

> > I think "`va` is a pointer to the wanted commit position value" is not
> > that descriptive. Maybe "`va` is a pointer to the given key" is
> > better. What do you think?
>
> Yes, the first argument to the comparison function used in bsearch() is

s/first/second/

> a pointer to some element in the array. I just meant that that array is
> the bitmap_git->table_lookup region, so each element isn't actually a
> uint32_t array, but the whole thing is an array of (uint32_t, uint64_t,
> uint32_t) triplets.
>
> What you wrote here is fine, and I don't even think that the comment
> needs updating. If you did want to clarify, I think you could say
> something along the lines of what you wrote above ("`va` is a pointer to
> an array element") and add something along the lines of "where the array
> is the lookup table region of the .bitmap".

I mentioned a few ideas for clarifying things above. I do think it would
be a good idea to differentiate the names of `va` and `vb` to make the
fundamental asymmetry between them clearer. The rest of my comments are
really just musings.

I originally started looking at this because I wanted to see why the
casting to a `uint32_t *` and dereferencing it was safe. The reason is,
we're always handling the same pointer to a `uint32_t` on the stack, so
alignment is guaranteed.
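
To illustrate the asymmetry, here is a small self-contained sketch of
such a search. It is an assumption-laden example rather than the code in
the series: `find_triplet_by_pos()`, `triplet_cmp_sketch()`,
`read_be32()` and `TRIPLET_WIDTH` are names invented here, and the table
is treated as a raw byte buffer of 16-byte entries whose first 4 bytes
hold the commit position in network byte order. The key is a properly
aligned `uint32_t` on the stack, while the array member is read byte by
byte, so no alignment is assumed for the table.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TRIPLET_WIDTH 16 /* 4 (commit_pos) + 8 (offset) + 4 (xor_row) */

/* Read a big-endian (network byte order) 32-bit value byte by byte. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * bsearch(3) passes the key first and an array member second, so the
 * two arguments have different types behind the `void *`.
 */
static int triplet_cmp_sketch(const void *commit_pos, const void *table_entry)
{
	uint32_t a = *(const uint32_t *)commit_pos;
	uint32_t b = read_be32(table_entry);

	return (a > b) - (a < b);
}

/* Returns a pointer to the matching 16-byte entry, or NULL if absent. */
static const unsigned char *find_triplet_by_pos(uint32_t commit_pos,
						const unsigned char *table,
						size_t nr)
{
	assert(table);
	return bsearch(&commit_pos, table, nr, TRIPLET_WIDTH,
		       triplet_cmp_sketch);
}
```

Taking `commit_pos` by value and passing `&commit_pos` to bsearch() keeps
the pointer handling in one place, which is the point of the signature
change suggested above.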


Martin


* Re: [PATCH v3 4/6] pack-bitmap: prepare to read lookup table extension
  2022-07-18  9:06             ` Martin Ågren
@ 2022-07-18 19:25               ` Abhradeep Chakraborty
  2022-07-18 23:26                 ` Martin Ågren
  2022-07-26  0:45               ` Taylor Blau
  1 sibling, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-07-18 19:25 UTC (permalink / raw)
  To: Martin Ågren
  Cc: Taylor Blau, Abhradeep Chakraborty via GitGitGadget, git,
	Kaartic Sivaram, Derrick Stolee

On Mon, Jul 18, 2022 at 2:37 PM Martin Ågren <martin.agren@gmail.com> wrote:
>
> Hi Abhradeep and Taylor,
>
> I very much enjoy following from a distance Abhradeep's work on this
> series and all the reviewing and mentoring. I don't grasp anywhere near
> all the details, but I've looked into this a bit:

Thanks!

>   "The compar routine is expected to have two arguments which point to
>   the key object and to an array member, in that order, [...]"
>
> I think it would help to make this something like
>
>   static int triplet_cmp(const void *key, const void *array_item)
>
> to really highlight this asymmetric nature of this function, or to make
> clear how the values flow through our call-chain through something like
>
>   static int triplet_cmp(const void *commit_pos, const void *table_entry)

Nice. Will update it.

> Would it make sense to let the `const void *key` directly carry the
> 32-bit value and hope that `sizeof(key) >= sizeof(uint32_t)`? That's
> probably too magical, "just" to save on dereferencing.

I do not have any particular opinion here. I will do whatever you think is best.

> One thing that could perhaps make things clearer is if
> `bsearch_triplet()` did take the position directly, rather than as a
> pointer:
>
> -static int bsearch_triplet(uint32_t *commit_pos,
> +static int bsearch_triplet(uint32_t commit_pos,
>                            struct bitmap_index *bitmap_git,
>                            struct bitmap_lookup_table_triplet *triplet)
>  {
> -       unsigned char *p = bsearch(commit_pos,
> bitmap_git->table_lookup, bitmap_git->entry_count,
> +       unsigned char *p = bsearch(&commit_pos,
> bitmap_git->table_lookup, bitmap_git->entry_count,
>                                    BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH,
> triplet_cmp);
>
>
> Also, maybe s/bsearch_triplet/&_by_pos/ could clarify the intent of this
> function?

Ok, sure!

Thanks :)


* Re: [PATCH v3 4/6] pack-bitmap: prepare to read lookup table extension
  2022-07-18 19:25               ` Abhradeep Chakraborty
@ 2022-07-18 23:26                 ` Martin Ågren
  0 siblings, 0 replies; 162+ messages in thread
From: Martin Ågren @ 2022-07-18 23:26 UTC (permalink / raw)
  To: Abhradeep Chakraborty
  Cc: Taylor Blau, Abhradeep Chakraborty via GitGitGadget, git,
	Kaartic Sivaram, Derrick Stolee

On Mon, 18 Jul 2022 at 21:26, Abhradeep Chakraborty
<chakrabortyabhradeep79@gmail.com> wrote:
>
> On Mon, Jul 18, 2022 at 2:37 PM Martin Ågren <martin.agren@gmail.com> wrote:
> >
> > Would it make sense to let the `const void *key` directly carry the
> > 32-bit value and hope that `sizeof(key) >= sizeof(uint32_t)`? That's
> > probably too magical, "just" to save on dereferencing.
>
> I do not have any particular opinion here. I will do whatever you think is best.

To be honest, I think it would be better not to do that. I floated it as
a random idea, but it's somewhere in the vicinity of undefined behavior,
and in any case, it might be a bit too tricky. If we're doing a byteswap
anyway (on virtually all platforms) and doing a bunch of comparisons,
trying to save on a dereference doesn't seem worth the increased "huh"
factor.

Martin


* [PATCH v4 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format
  2022-07-04  8:46   ` [PATCH v3 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
                       ` (7 preceding siblings ...)
  2022-07-06 19:21     ` Junio C Hamano
@ 2022-07-20 14:05     ` Abhradeep Chakraborty via GitGitGadget
  2022-07-20 14:05       ` [PATCH v4 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
                         ` (6 more replies)
  8 siblings, 7 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-07-20 14:05 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren, Abhradeep Chakraborty

When parsing the .bitmap file, git loads all the bitmaps one by one even if
some of the bitmaps are not necessary. We can remove this overhead by
loading only the necessary bitmaps. A lookup table extension can solve this
issue.

Changes since v3:

 * The common code from both lookup_table_get_triplet() and
   bsearch_triplet_by_pos() is moved to the
   lookup_table_get_triplet_by_pointer() function.
 * Parameter names of the triplet_cmp function are changed (as suggested by
   Martin).
 * The xor_items array now works as a reusable static buffer.
 * I moved the code that fills the commit_positions array (in
   pack-bitmap-write.c) to the bitmap_writer_finish function, because
   otherwise we had to compute the commit positions twice - once in
   write_selected_commits_v1 and again in the write_lookup_table function.
   Hope this is acceptable :)
 * Changes in performance tests (as suggested by Taylor).

Changes since v2:

 * Log messages related issues are fixed.
 * pack.writeBitmapLookupTable is now by default disabled.
 * Documentations are improved.
 * xor_row is used instead of xor_pos in triplets.
 * In pack-bitmap-write.c, off_t * is used for offsets array (Instead of
   uint64_t *).
 * struct bitmap_lookup_table_triplet is introduced and functions like
   triplet_get_offset() and triplet_get_xor_pos() are removed.
 * table_size is getting subtracted from index_end irrespective of the value
   of GIT_TEST_READ_COMMIT_TABLE.
 * xor stack filling loop will stop iterating if a xor bitmap is already
   stored/parsed.
 * The stack will now store bitmap_lookup_table_xor_item items instead of
   plain xor_row values.
 * bitmap related test files are reformatted to allow repeating of tests
   with bitmap extension enabled.
 * comments are added.

Changes since v1:

This is the second version which addressed all (I think) the reviews. Please
notify me if some reviews are not addressed :)

 * The table size is decreased and the format has also changed. It now
   contains nr_entries triplets of size 4+8+4 bytes. Each triplet contains
   the following things - (1) 4 byte commit position (in the pack-index or
   midx) (2) 8 byte offset and (3) 4 byte xor triplet (i.e. with whose
   bitmap the current triplet's bitmap has to xor) position.
 * Performance tests are split into two commits. The first contains the
   actual performance tests and the second enables pack.writeReverseIndex
   (as suggested by Taylor).
 * st_*() functions are used.
 * commit order is changed according to Derrick's suggestion.
 * Iterative approach is used instead of recursive approach to parse xor
   bitmaps. (As suggested by Derrick).
 * Some minor bug fixes of previous version.
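
As a rough illustration of the 4+8+4 byte triplet layout described
above, the sketch below decodes one entry from a raw buffer. The names
`triplet_sketch`, `get_be()` and `decode_triplet()` are made up for this
example (the series has its own struct bitmap_lookup_table_triplet, whose
exact fields are not shown here), and network byte order is assumed for
all three fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one 16-byte lookup table entry. */
struct triplet_sketch {
	uint32_t commit_pos; /* position in the pack index or midx */
	uint64_t offset;     /* where the commit's bitmap starts */
	uint32_t xor_row;    /* row to XOR with, 0xffffffff if none */
};

/* Read an n-byte big-endian unsigned integer (n <= 8). */
static uint64_t get_be(const unsigned char *p, size_t n)
{
	uint64_t v = 0;
	size_t i;

	assert(p && n <= 8);
	for (i = 0; i < n; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Decode the 16 bytes at `p` into `out`. */
static void decode_triplet(const unsigned char *p, struct triplet_sketch *out)
{
	out->commit_pos = (uint32_t)get_be(p, 4);
	out->offset = get_be(p + 4, 8);
	out->xor_row = (uint32_t)get_be(p + 12, 4);
}
```

Reading the fields byte by byte avoids depending on the alignment of the
mapped .bitmap data or on the host's endianness.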

Initial version:

The proposed table has:

 * a list of nr_entries object ids. These objects are commits that have
   bitmaps. Ids are stored in lexicographic order (for better searching).
 * a list of <offset, xor-offset> pairs (4-byte integers, network-byte
   order). The i'th pair denotes the offset and xor-offset (respectively) of
   the bitmap of the i'th commit in the previous list. These two pieces of
   information are necessary because only in this way can bitmaps be found
   without parsing all the bitmaps.
 * a 4-byte integer for table specific flags (none exists currently).

Whenever git wants to parse the bitmap for a specific commit, it will first
refer to the table and will look for the offset and xor-offset for that
commit. Git will then try to parse the bitmap located at the offset
position. The xor-offset can be used to find the xor-bitmap for the
bitmap (if any).

Abhradeep Chakraborty (6):
  Documentation/technical: describe bitmap lookup table extension
  pack-bitmap-write.c: write lookup table extension
  pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  pack-bitmap: prepare to read lookup table extension
  p5310-pack-bitmaps.sh: enable `pack.writeReverseIndex`
  bitmap-lookup-table: add performance tests for lookup table

 Documentation/config/pack.txt             |   7 +
 Documentation/technical/bitmap-format.txt |  39 ++
 builtin/multi-pack-index.c                |   7 +
 builtin/pack-objects.c                    |   8 +
 midx.c                                    |   3 +
 midx.h                                    |   1 +
 pack-bitmap-write.c                       | 112 ++-
 pack-bitmap.c                             | 278 +++++++-
 pack-bitmap.h                             |  14 +-
 t/perf/p5310-pack-bitmaps.sh              |  68 +-
 t/perf/p5326-multi-pack-bitmaps.sh        |  95 +--
 t/t5310-pack-bitmaps.sh                   | 786 ++++++++++++----------
 t/t5311-pack-bitmaps-shallow.sh           |  53 +-
 t/t5326-multi-pack-bitmaps.sh             | 421 +++++++-----
 t/t5327-multi-pack-bitmaps-rev.sh         |   9 +
 15 files changed, 1244 insertions(+), 657 deletions(-)


base-commit: 39c15e485575089eb77c769f6da02f98a55905e0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1266%2FAbhra303%2Fbitmap-commit-table-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1266/Abhra303/bitmap-commit-table-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1266

Range-diff vs v3:

 1:  f72bf11e6ef = 1:  f72bf11e6ef Documentation/technical: describe bitmap lookup table extension
 2:  5e9b985e39b ! 2:  04244fadf5c pack-bitmap-write.c: write lookup table extension
     @@ Commit message
      
       ## pack-bitmap-write.c ##
      @@ pack-bitmap-write.c: static const struct object_id *oid_access(size_t pos, const void *table)
     - 	return &index[pos]->oid;
     - }
       
     -+static int commit_bitmap_writer_pos(struct object_id *oid,
     -+				    struct pack_idx_entry **index,
     -+				    uint32_t index_nr)
     -+{
     -+	return oid_pos(oid, index, index_nr, oid_access);
     -+}
     -+
       static void write_selected_commits_v1(struct hashfile *f,
       				      struct pack_idx_entry **index,
      -				      uint32_t index_nr)
      +				      uint32_t index_nr,
     -+				      off_t *offsets)
     ++				      off_t *offsets,
     ++				      uint32_t *commit_positions)
       {
       	int i;
       
     -@@ pack-bitmap-write.c: static void write_selected_commits_v1(struct hashfile *f,
     + 	for (i = 0; i < writer.selected_nr; ++i) {
       		struct bitmapped_commit *stored = &writer.selected[i];
       
     - 		int commit_pos =
     +-		int commit_pos =
      -			oid_pos(&stored->commit->object.oid, index, index_nr, oid_access);
     -+			commit_bitmap_writer_pos(&stored->commit->object.oid, index, index_nr);
     - 
     - 		if (commit_pos < 0)
     - 			BUG("trying to write commit not in index");
     - 
      +		if (offsets)
      +			offsets[i] = hashfile_total(f);
     -+
     - 		hashwrite_be32(f, commit_pos);
     + 
     +-		if (commit_pos < 0)
     +-			BUG("trying to write commit not in index");
     +-
     +-		hashwrite_be32(f, commit_pos);
     ++		hashwrite_be32(f, commit_positions[i]);
       		hashwrite_u8(f, stored->xor_offset);
       		hashwrite_u8(f, stored->flags);
     + 
      @@ pack-bitmap-write.c: static void write_selected_commits_v1(struct hashfile *f,
       	}
       }
     @@ pack-bitmap-write.c: static void write_selected_commits_v1(struct hashfile *f,
      +static void write_lookup_table(struct hashfile *f,
      +			       struct pack_idx_entry **index,
      +			       uint32_t index_nr,
     -+			       off_t *offsets)
     ++			       off_t *offsets,
     ++			       uint32_t *commit_positions)
      +{
      +	uint32_t i;
     -+	uint32_t *table, *table_inv, *commit_positions;
     ++	uint32_t *table, *table_inv;
      +
      +	ALLOC_ARRAY(table, writer.selected_nr);
      +	ALLOC_ARRAY(table_inv, writer.selected_nr);
     -+	ALLOC_ARRAY(commit_positions, writer.selected_nr);
     -+
     -+	/* store the index positions of the commits */
     -+	for (i = 0; i < writer.selected_nr; i++) {
     -+		int pos = commit_bitmap_writer_pos(&writer.selected[i].commit->object.oid,
     -+						   index, index_nr);
     -+		if (pos < 0)
     -+			BUG(_("trying to write commit not in index"));
     -+
     -+		commit_positions[i] = pos;
     -+	}
      +
      +	for (i = 0; i < writer.selected_nr; i++)
      +		table[i] = i;
      +
      +	/*
      +	 * At the end of this sort table[j] = i means that the i'th
     -+	 * bitmap corresponds to j'th bitmapped commit in lex order of
     -+	 * OIDs.
     ++	 * bitmap corresponds to j'th bitmapped commit (among the selected
     ++	 * commits) in lex order of OIDs.
      +	 */
      +	QSORT_S(table, writer.selected_nr, table_cmp, commit_positions);
      +
     @@ pack-bitmap-write.c: static void write_selected_commits_v1(struct hashfile *f,
      +
      +	free(table);
      +	free(table_inv);
     -+	free(commit_positions);
      +}
      +
       static void write_hash_cache(struct hashfile *f,
     @@ pack-bitmap-write.c: void bitmap_writer_finish(struct pack_idx_entry **index,
      +	off_t *offsets = NULL;
       	struct strbuf tmp_file = STRBUF_INIT;
       	struct hashfile *f;
     ++	uint32_t *commit_positions = NULL;
     + 
     + 	struct bitmap_disk_header header;
       
      @@ pack-bitmap-write.c: void bitmap_writer_finish(struct pack_idx_entry **index,
       	dump_bitmap(f, writer.trees);
     @@ pack-bitmap-write.c: void bitmap_writer_finish(struct pack_idx_entry **index,
       	dump_bitmap(f, writer.tags);
      -	write_selected_commits_v1(f, index, index_nr);
      +
     ++	ALLOC_ARRAY(commit_positions, writer.selected_nr);
     ++	for (uint32_t i = 0; i < writer.selected_nr; ++i) {
     ++		struct bitmapped_commit *stored = &writer.selected[i];
     ++		int commit_pos = oid_pos(&stored->commit->object.oid, index, index_nr, oid_access);
     ++
     ++		if (commit_pos < 0)
     ++			BUG(_("trying to write commit not in index"));
     ++
     ++		commit_positions[i] = commit_pos;
     ++	}
     ++
      +	if (options & BITMAP_OPT_LOOKUP_TABLE)
      +		CALLOC_ARRAY(offsets, index_nr);
      +
     -+	write_selected_commits_v1(f, index, index_nr, offsets);
     ++	write_selected_commits_v1(f, index, index_nr, offsets, commit_positions);
      +
      +	if (options & BITMAP_OPT_LOOKUP_TABLE)
     -+		write_lookup_table(f, index, index_nr, offsets);
     ++		write_lookup_table(f, index, index_nr, offsets, commit_positions);
       
       	if (options & BITMAP_OPT_HASH_CACHE)
       		write_hash_cache(f, index, index_nr);
     @@ pack-bitmap-write.c: void bitmap_writer_finish(struct pack_idx_entry **index,
       
       	strbuf_release(&tmp_file);
      +	free(offsets);
     ++	free(commit_positions);
       }
      
       ## pack-bitmap.h ##
 3:  3dc40cc7f73 = 3:  8bd7639e4b9 pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
 4:  e64362621d2 ! 4:  afc8c660ac1 pack-bitmap: prepare to read lookup table extension
     @@ pack-bitmap.c: struct include_data {
      +};
      +
      +/*
     ++ * Given a `triplet` struct pointer and pointer `p`, this
     ++ * function reads the triplet beginning at `p` into the struct.
     ++ * Note that this function assumes that there is enough memory
     ++ * left for filling the `triplet` struct from `p`.
     ++ */
     ++static int lookup_table_get_triplet_by_pointer(struct bitmap_lookup_table_triplet *triplet,
     ++					       const unsigned char *p)
     ++{
     ++	if (!triplet)
     ++		return -1;
     ++
     ++	triplet->commit_pos = get_be32(p);
     ++	p += sizeof(uint32_t);
     ++	triplet->offset = get_be64(p);
     ++	p += sizeof(uint64_t);
     ++	triplet->xor_row = get_be32(p);
     ++	return 0;
     ++}
     ++
     ++/*
      + * This function gets the raw triplet from `row`'th row in the
      + * lookup table and fills that data to the `triplet`.
      + */
     @@ pack-bitmap.c: struct include_data {
      +
      +	p = bitmap_git->table_lookup + st_mult(pos, BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH);
      +
     -+	triplet->commit_pos = get_be32(p);
     -+	p += sizeof(uint32_t);
     -+	triplet->offset = get_be64(p);
     -+	p += sizeof(uint64_t);
     -+	triplet->xor_row = get_be32(p);
     -+	return 0;
     ++	return lookup_table_get_triplet_by_pointer(triplet, p);
      +}
      +
      +/*
     -+ * Searches for a matching triplet. `va` is a pointer
     -+ * to the wanted commit position value. `vb` points to
     ++ * Searches for a matching triplet. `commit_pos` is a pointer
     ++ * to the wanted commit position value. `table_entry` points to
      + * a triplet in lookup table. The first 4 bytes of each
     -+ * triplet (pointed by `vb`) are compared with `*va`.
     ++ * triplet (pointed by `table_entry`) are compared with `*commit_pos`.
      + */
     -+static int triplet_cmp(const void *va, const void *vb)
     ++static int triplet_cmp(const void *commit_pos, const void *table_entry)
      +{
      +
     -+	uint32_t a = *(uint32_t *)va;
     -+	uint32_t b = get_be32(vb);
     ++	uint32_t a = *(uint32_t *)commit_pos;
     ++	uint32_t b = get_be32(table_entry);
      +	if (a > b)
      +		return 1;
      +	else if (a < b)
     @@ pack-bitmap.c: struct include_data {
      +}
      +
      +/*
     -+ * `bsearch_triplet` function searches for the raw triplet having
     -+ * commit position same as `commit_pos` and fills `triplet`
     -+ * object from the raw triplet. Returns 1 on success and 0
     -+ * on failure.
     ++ * `bsearch_triplet_by_pos` function searches for the raw triplet
     ++ * having commit position same as `commit_pos` and fills `triplet`
     ++ * object from the raw triplet. Returns 0 on success and -1 on
     ++ * failure.
      + */
     -+static int bsearch_triplet(uint32_t *commit_pos,
     -+			   struct bitmap_index *bitmap_git,
     -+			   struct bitmap_lookup_table_triplet *triplet)
     ++static int bsearch_triplet_by_pos(uint32_t commit_pos,
     ++				  struct bitmap_index *bitmap_git,
     ++				  struct bitmap_lookup_table_triplet *triplet)
      +{
     -+	unsigned char *p = bsearch(commit_pos, bitmap_git->table_lookup, bitmap_git->entry_count,
     ++	unsigned char *p = bsearch(&commit_pos, bitmap_git->table_lookup, bitmap_git->entry_count,
      +				   BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH, triplet_cmp);
      +
      +	if (!p)
     -+		return 0;
     -+	triplet->commit_pos = get_be32(p);
     -+	p += sizeof(uint32_t);
     -+	triplet->offset = get_be64(p);
     -+	p += sizeof(uint64_t);
     -+	triplet->xor_row = get_be32(p);
     -+	return 1;
     ++		return -1;
     ++
     ++	return lookup_table_get_triplet_by_pointer(triplet, p);
      +}
      +
      +static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_git,
     @@ pack-bitmap.c: struct include_data {
      +{
      +	uint32_t commit_pos, xor_row;
      +	uint64_t offset;
     -+	int flags;
     ++	int flags, found;
      +	struct bitmap_lookup_table_triplet triplet;
      +	struct object_id *oid = &commit->object.oid;
      +	struct ewah_bitmap *bitmap;
      +	struct stored_bitmap *xor_bitmap = NULL;
     ++	const int bitmap_header_size = 6;
     ++	static struct bitmap_lookup_table_xor_item *xor_items = NULL;
     ++	static size_t xor_items_nr = 0, xor_items_alloc = 0;
     ++	static int is_corrupt = 0;
     ++
     ++	if (is_corrupt)
     ++		return NULL;
      +
     -+	int found = bsearch_pos(bitmap_git, oid, &commit_pos);
     ++	found = bsearch_pos(bitmap_git, oid, &commit_pos);
      +
      +	if (!found)
      +		return NULL;
      +
     -+	if (!bsearch_triplet(&commit_pos, bitmap_git, &triplet))
     ++	if (bsearch_triplet_by_pos(commit_pos, bitmap_git, &triplet) < 0)
      +		return NULL;
      +
     ++	xor_items_nr = 0;
      +	offset = triplet.offset;
      +	xor_row = triplet.xor_row;
      +
     @@ pack-bitmap.c: struct include_data {
      +		int xor_flags;
      +		khiter_t hash_pos;
      +		uint64_t offset_xor;
     -+		struct bitmap_lookup_table_xor_item *xor_items;
     -+		struct bitmap_lookup_table_xor_item xor_item;
     -+		size_t xor_items_nr = 0, xor_items_alloc = 64;
     ++		struct bitmap_lookup_table_xor_item *xor_item;
      +
     -+		ALLOC_ARRAY(xor_items, xor_items_alloc);
      +		while (xor_row != 0xffffffff) {
     -+			struct object_id xor_oid;
     ++			ALLOC_GROW(xor_items, xor_items_nr + 1, xor_items_alloc);
      +
      +			if (xor_items_nr + 1 >= bitmap_git->entry_count) {
     -+				free(xor_items);
      +				error(_("corrupt bitmap lookup table: xor chain exceed entry count"));
     -+				return NULL;
     ++				goto corrupt;
      +			}
      +
      +			if (lookup_table_get_triplet(bitmap_git, xor_row, &triplet) < 0)
     -+				return NULL;
     ++				goto corrupt;
      +
     -+			offset_xor = triplet.offset;
     ++			xor_item = &xor_items[xor_items_nr];
     ++			xor_item->offset = triplet.offset;
      +
     -+			if (nth_bitmap_object_oid(bitmap_git, &xor_oid, triplet.commit_pos) < 0) {
     -+				free(xor_items);
     ++			if (nth_bitmap_object_oid(bitmap_git, &xor_item->oid, triplet.commit_pos) < 0) {
      +				error(_("corrupt bitmap lookup table: commit index %u out of range"),
      +					triplet.commit_pos);
     -+				return NULL;
     ++				goto corrupt;
      +			}
      +
     -+			hash_pos = kh_get_oid_map(bitmap_git->bitmaps, xor_oid);
     ++			hash_pos = kh_get_oid_map(bitmap_git->bitmaps, xor_item->oid);
      +
      +			/*
      +			 * If desired bitmap is already stored, we don't need
     @@ pack-bitmap.c: struct include_data {
      +			if (hash_pos < kh_end(bitmap_git->bitmaps) &&
      +			    (xor_bitmap = kh_value(bitmap_git->bitmaps, hash_pos)))
      +				break;
     -+
     -+			ALLOC_GROW(xor_items, xor_items_nr + 1, xor_items_alloc);
     -+			xor_items[xor_items_nr++] = (struct bitmap_lookup_table_xor_item) {.oid = xor_oid,
     -+											   .offset = offset_xor};
     ++			xor_items_nr++;
      +			xor_row = triplet.xor_row;
      +		}
      +
      +		while (xor_items_nr) {
     -+			xor_item = xor_items[xor_items_nr - 1];
     -+			offset_xor = xor_item.offset;
     ++			xor_item = &xor_items[xor_items_nr - 1];
     ++			offset_xor = xor_item->offset;
      +
      +			bitmap_git->map_pos = offset_xor;
     -+			if (bitmap_git->map_size - bitmap_git->map_pos < 6) {
     ++			if (bitmap_git->map_size - bitmap_git->map_pos < bitmap_header_size) {
      +				error(_("corrupt ewah bitmap: truncated header for bitmap of commit \"%s\""),
     -+					oid_to_hex(&xor_item.oid));
     -+				free(xor_items);
     -+				return NULL;
     ++					oid_to_hex(&xor_item->oid));
     ++				goto corrupt;
      +			}
      +
      +			bitmap_git->map_pos = bitmap_git->map_pos + sizeof(uint32_t) + sizeof(uint8_t);
      +			xor_flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
      +			bitmap = read_bitmap_1(bitmap_git);
      +
     -+			if (!bitmap) {
     -+				free(xor_items);
     -+				return NULL;
     -+			}
     ++			if (!bitmap)
     ++				goto corrupt;
      +
     -+			xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_item.oid, xor_bitmap, xor_flags);
     ++			xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_item->oid, xor_bitmap, xor_flags);
      +			xor_items_nr--;
      +		}
     -+
     -+		free(xor_items);
      +	}
      +
      +	bitmap_git->map_pos = offset;
     -+	if (bitmap_git->map_size - bitmap_git->map_pos < 6) {
     ++	if (bitmap_git->map_size - bitmap_git->map_pos < bitmap_header_size) {
      +		error(_("corrupt ewah bitmap: truncated header for bitmap of commit \"%s\""),
      +			oid_to_hex(oid));
     -+		return NULL;
     ++		goto corrupt;
      +	}
      +
      +	bitmap_git->map_pos = bitmap_git->map_pos + sizeof(uint32_t) + sizeof(uint8_t);
     @@ pack-bitmap.c: struct include_data {
      +	bitmap = read_bitmap_1(bitmap_git);
      +
      +	if (!bitmap)
     -+		return NULL;
     ++		goto corrupt;
      +
      +	return store_bitmap(bitmap_git, bitmap, oid, xor_bitmap, flags);
     ++
     ++corrupt:
     ++	free(xor_items);
     ++	is_corrupt = 1;
     ++	return NULL;
      +}
      +
       struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
 6:  4f9f1049485 ! 5:  fc69489e395 p5310-pack-bitmaps.sh: remove pack.writeReverseIndex
     @@ Metadata
      Author: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
      
       ## Commit message ##
     -    p5310-pack-bitmaps.sh: remove pack.writeReverseIndex
     +    p5310-pack-bitmaps.sh: enable `pack.writeReverseIndex`
      
     -    The previous change enables the `pack.writereverseindex` to see
     -    the effect of writing reverse index in the performance test.
     +    Enable `pack.writeReverseIndex` before running pack-bitmap related
     +    performance tests.
      
     -    Remove the `pack.writeReverseIndex` configuration.
     +    The performance difference with `pack.writeReverseIndex` enabled and
     +    with disabled are given below -
      
     -    Below is the result of performance test. Output format is in
     -    seconds.
     +    With `pack.writeReverseIndex`
     +    -------------------------------
     +
     +    Test                                                 this tree
     +    -------------------------------------------------------------------------
     +    5310.3: repack to disk                                 296.55(256.53+14.52)
     +    5310.4: simulated clone                                15.64(8.88+1.39)
     +    5310.5: simulated fetch                                1.65(2.75+0.20)
     +    5310.6: pack to file (bitmap)                          48.71(30.20+7.58)
     +    5310.7: rev-list (commits)                             0.61(0.41+0.08)
     +    5310.8: rev-list (objects)                             4.38(4.26+0.09)
     +    5310.9: rev-list with tag negated via --not            0.07(0.02+0.04)
     +             --all (objects)
     +    5310.10: rev-list with negative tag (objects)          0.05(0.01+0.03)
     +    5310.11: rev-list count with blob:none                 0.08(0.03+0.04)
     +    5310.12: rev-list count with blob:limit=1k             7.29(6.92+0.30)
     +    5310.13: rev-list count with tree:0                    0.08(0.03+0.04)
     +    5310.14: simulated partial clone                       9.45(8.12+0.41)
     +    5310.16: clone (partial bitmap)                        17.02(10.61+2.67)
     +    5310.17: pack to file (partial bitmap)                 51.91(28.57+7.48)
     +    5310.18: rev-list with tree filter (partial bitmap)    1.00(0.22+0.24)
     +
     +    Without `pack.writeReverseIndex`:
     +    -----------------------------
      
          Test                                                  this tree
          ------------------------------------------------------------------------
     -    5310.4: repack to disk (lookup=false)               293.80(251.30+14.30)
     -    5310.5: simulated clone                             12.50(5.15+1.36)
     -    5310.6: simulated fetch                             1.83(2.90+0.23)
     -    5310.7: pack to file (bitmap)                       39.70(20.25+7.14)
     -    5310.8: rev-list (commits)                          1.00(0.60+0.13)
     -    5310.9: rev-list (objects)                          4.11(4.00+0.10)
     -    5310.10: rev-list with tag negated via --not        0.07(0.02+0.05)
     +    5310.3: repack to disk                              293.80(251.30+14.30)
     +    5310.4: simulated clone                             12.50(5.15+1.36)
     +    5310.5: simulated fetch                             1.83(2.90+0.23)
     +    5310.6: pack to file (bitmap)                       39.70(20.25+7.14)
     +    5310.7: rev-list (commits)                          1.00(0.60+0.13)
     +    5310.8: rev-list (objects)                          4.11(4.00+0.10)
     +    5310.9: rev-list with tag negated via --not         0.07(0.02+0.05)
                   --all (objects)
     -    5310.11: rev-list with negative tag (objects)       0.23(0.16+0.06)
     -    5310.12: rev-list count with blob:none              0.27(0.18+0.08)
     -    5310.13: rev-list count with blob:limit=1k          6.41(5.98+0.41)
     -    5310.14: rev-list count with tree:0                 0.26(0.18+0.07)
     -    5310.15: simulated partial clone                    4.34(3.29+0.37)
     -    5310.19: repack to disk (lookup=true)               250.93(171.97+20.78)
     -    5310.20: simulated clone                            10.80(5.14+1.06)
     -    5310.21: simulated fetch                            0.71(0.79+0.16)
     -    5310.22: pack to file (bitmap)                      39.49(20.19+6.98)
     -    5310.23: rev-list (commits)                         0.81(0.48+0.09)
     -    5310.24: rev-list (objects)                         3.48(3.38+0.09)
     -    5310.25: rev-list with tag negated via --not        0.04(0.00+0.03)
     -             --all (objects)
     -    5310.26: rev-list with negative tag (objects)       0.22(0.16+0.05)
     -    5310.27: rev-list count with blob:none              0.22(0.16+0.05)
     -    5310.28: rev-list count with blob:limit=1k          6.21(5.76+0.29)
     -    5310.29: rev-list count with tree:0                 0.23(0.16+0.06)
     -    5310.30: simulated partial clone                    4.53(3.14+0.39)
     -
     -    Tests 4-15 are without the use of lookup table. The rests are
     -    repeatation of the previous tests but using lookup table.
     +    5310.10: rev-list with negative tag (objects)       0.23(0.16+0.06)
     +    5310.11: rev-list count with blob:none              0.27(0.18+0.08)
     +    5310.12: rev-list count with blob:limit=1k          6.41(5.98+0.41)
     +    5310.13: rev-list count with tree:0                 0.26(0.18+0.07)
     +    5310.14: simulated partial clone                    4.34(3.29+0.37)
     +    5310.16: clone (partial bitmap)                     21.48(15.12+2.42)
     +    5310.17: pack to file (partial bitmap)              47.35(37.80+4.84)
     +    5310.18: rev-list with tree filter (partial bitmap) 0.73(0.07+0.21)
      
     -    Mentored-by: Taylor Blau <me@ttaylorr.com>
     -    Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
          Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
      
       ## t/perf/p5310-pack-bitmaps.sh ##
     @@ t/perf/p5310-pack-bitmaps.sh: test_perf_large_repo
       # We intentionally use the deprecated pack.writebitmaps
       # config so that we can test against older versions of git.
       test_expect_success 'setup bitmap config' '
     --	git config pack.writebitmaps true &&
     --	git config pack.writeReverseIndex true
     -+	git config pack.writebitmaps true
     +-	git config pack.writebitmaps true
     ++	git config pack.writebitmaps true &&
     ++	git config pack.writeReverseIndex true
       '
       
     - test_bitmap () {
     + # we need to create the tag up front such that it is covered by the repack and
 5:  a155c1e2eba ! 6:  52f7d8359ee bitmap-lookup-table: add performance tests for lookup table
     @@ Commit message
          5310.13: rev-list count with blob:limit=1k              7.29(6.92+0.30)
          5310.14: rev-list count with tree:0                     0.08(0.03+0.04)
          5310.15: simulated partial clone                        9.45(8.12+0.41)
     -    5310.19: repack to disk (lookup=true)                   255.92(188.13+20.47)
     -    5310.20: simulated clone                                13.78(8.84+1.09)
     -    5310.21: simulated fetch                                0.52(0.63+0.14)
     -    5310.22: pack to file (bitmap)                          44.34(28.94+6.84)
     -    5310.23: rev-list (commits)                             0.48(0.31+0.06)
     -    5310.24: rev-list (objects)                             4.02(3.93+0.07)
     -    5310.25: rev-list with tag negated via --not            0.04(0.00+0.03)
     +    5310.17: clone (partial bitmap)                         21.00(15.04+2.39)
     +    5310.18: pack to file (partial bitmap)                  47.98(38.13+5.23)
     +    5310.19: rev-list with tree filter (partial bitmap)     0.70(0.07+0.20)
     +    5310.22: repack to disk (lookup=true)                   255.92(188.13+20.47)
     +    5310.23: simulated clone                                13.78(8.84+1.09)
     +    5310.24: simulated fetch                                0.52(0.63+0.14)
     +    5310.25: pack to file (bitmap)                          44.34(28.94+6.84)
     +    5310.26: rev-list (commits)                             0.48(0.31+0.06)
     +    5310.27: rev-list (objects)                             4.02(3.93+0.07)
     +    5310.28: rev-list with tag negated via --not            0.04(0.00+0.03)
                   --all (objects)
     -    5310.26: rev-list with negative tag (objects)           0.04(0.00+0.03)
     -    5310.27: rev-list count with blob:none                  0.04(0.01+0.03)
     -    5310.28: rev-list count with blob:limit=1k              6.48(6.23+0.22)
     -    5310.29: rev-list count with tree:0                     0.04(0.01+0.03)
     -    5310.30: simulated partial clone                        8.30(7.21+0.36)
     +    5310.29: rev-list with negative tag (objects)           0.04(0.00+0.03)
     +    5310.30: rev-list count with blob:none                  0.04(0.01+0.03)
     +    5310.31: rev-list count with blob:limit=1k              6.48(6.23+0.22)
     +    5310.32: rev-list count with tree:0                     0.04(0.01+0.03)
     +    5310.33: simulated partial clone                        8.30(7.21+0.36)
     +    5310.35: clone (partial bitmap)                         20.34(15.00+2.41)
     +    5310.36: pack to file (partial bitmap)                  46.45(38.05+5.20)
     +    5310.37: rev-list with tree filter (partial bitmap)     0.61(0.06+0.20)
      
          Test 4-15 are tested without using lookup table. Same tests are
          repeated in 16-30 (using lookup table).
     @@ Commit message
          Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
      
       ## t/perf/p5310-pack-bitmaps.sh ##
     -@@ t/perf/p5310-pack-bitmaps.sh: test_perf_large_repo
     - # We intentionally use the deprecated pack.writebitmaps
     - # config so that we can test against older versions of git.
     - test_expect_success 'setup bitmap config' '
     --	git config pack.writebitmaps true
     -+	git config pack.writebitmaps true &&
     -+	git config pack.writeReverseIndex true
     +@@ t/perf/p5310-pack-bitmaps.sh: test_expect_success 'setup bitmap config' '
     + 	git config pack.writeReverseIndex true
       '
       
      -# we need to create the tag up front such that it is covered by the repack and
     @@ t/perf/p5310-pack-bitmaps.sh: test_perf_large_repo
      +		# had happened
      +		git update-ref HEAD $orig_tip
      +	'
     ++
     ++	test_partial_bitmap
      +}
       
      -test_partial_bitmap
     @@ t/perf/p5326-multi-pack-bitmaps.sh: test_description='Tests performance using mi
      +		# had happened
      +		git update-ref HEAD $orig_tip
      +	'
     ++
     ++	test_partial_bitmap
      +}
      +
      +test_bitmap false

-- 
gitgitgadget


* [PATCH v4 1/6] Documentation/technical: describe bitmap lookup table extension
  2022-07-20 14:05     ` [PATCH v4 " Abhradeep Chakraborty via GitGitGadget
@ 2022-07-20 14:05       ` Abhradeep Chakraborty via GitGitGadget
  2022-07-20 14:05       ` [PATCH v4 2/6] pack-bitmap-write.c: write " Abhradeep Chakraborty via GitGitGadget
                         ` (5 subsequent siblings)
  6 siblings, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-07-20 14:05 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren, Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

When reading a bitmap file, Git loads every bitmap one by one even if
not all of them are required. A "bitmap lookup table" extension to the
bitmap format can reduce this overhead. The table stores a list of
bitmapped commit positions (in the midx or pack), along with each
bitmap's offset and xor offset. This way Git can load only the
necessary bitmaps without loading the preceding ones.

Older versions of Git ignore the lookup table extension and don't
throw any kind of warning or error while parsing the bitmap file.

Add some information for the new "bitmap lookup table" extension in the
bitmap-format documentation.

Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Co-Authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
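For readers of this thread, the triplet layout documented by this patch
can be sketched in standalone C. This is only an illustration under
stated assumptions, not code from the series: `read_be32()`,
`read_be64()`, `decode_triplet()` and `find_triplet()` are hypothetical
helpers (Git itself has `get_be32()` and `get_be64()`), and the table is
assumed to be a contiguous buffer of 16-byte triplets sorted by
`commit_pos`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TRIPLET_WIDTH 16 /* 4 (commit_pos) + 8 (offset) + 4 (xor_row) */

struct triplet {
	uint32_t commit_pos;
	uint64_t offset;
	uint32_t xor_row;
};

/* Hypothetical helper: read a 4-byte network-byte-order integer. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: read an 8-byte network-byte-order integer. */
static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/* Decode one raw 16-byte triplet starting at `p`. */
static void decode_triplet(const unsigned char *p, struct triplet *out)
{
	assert(p && out);
	out->commit_pos = read_be32(p);
	out->offset = read_be64(p + 4);
	out->xor_row = read_be32(p + 12);
}

/* Compare the wanted position with the first 4 bytes of a raw triplet. */
static int cmp_pos(const void *key, const void *entry)
{
	uint32_t a = *(const uint32_t *)key;
	uint32_t b = read_be32(entry);

	return (a > b) - (a < b);
}

/* Returns 0 and fills `out` on success, -1 if `commit_pos` is absent. */
static int find_triplet(const unsigned char *table, size_t nr,
			uint32_t commit_pos, struct triplet *out)
{
	const unsigned char *p = bsearch(&commit_pos, table, nr,
					 TRIPLET_WIDTH, cmp_pos);

	if (!p)
		return -1;
	decode_triplet(p, out);
	return 0;
}
```

Because the triplets are sorted by `commit_pos`, a reader can binary
search the raw table without decoding every entry first; an `xor_row`
of `0xffffffff` in the result means the bitmap has no xor base.
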
 Documentation/technical/bitmap-format.txt | 39 +++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index 04b3ec21785..c30dc177643 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -67,6 +67,17 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
 			pack/MIDX. The format and meaning of the name-hash is
 			described below.
 
+			** {empty}
+			BITMAP_OPT_LOOKUP_TABLE (0x10): :::
+			If present, the end of the bitmap file contains a table
+			containing a list of `N` <commit_pos, offset, xor_row>
+			triplets. The format and meaning of the table is described
+			below.
++
+NOTE: Unlike the xor_offset used to compress an individual bitmap,
+`xor_row` stores an *absolute* index into the lookup table, not a location
+relative to the current entry.
+
 		4-byte entry count (network byte order)
 
 			The total count of entries (bitmapped commits) in this bitmap index.
@@ -205,3 +216,31 @@ Note that this hashing scheme is tied to the BITMAP_OPT_HASH_CACHE flag.
 If implementations want to choose a different hashing scheme, they are
 free to do so, but MUST allocate a new header flag (because comparing
 hashes made under two different schemes would be pointless).
+
+Commit lookup table
+-------------------
+
+If the BITMAP_OPT_LOOKUP_TABLE flag is set, the last `N * (4 + 8 + 4)`
+bytes (preceding the name-hash cache and trailing hash) of the `.bitmap`
+file contain a lookup table specifying the information needed to get
+the desired bitmap from the entries without parsing previous unnecessary
+bitmaps.
+
+For a `.bitmap` containing `nr_entries` reachability bitmaps, the table
+contains a list of `nr_entries` <commit_pos, offset, xor_row> triplets
+(sorted in ascending order of `commit_pos`). The content of the i'th
+triplet is -
+
+	* {empty}
+	commit_pos (4 byte integer, network byte order): ::
+	It stores the object position of a commit (in the midx or pack
+	index).
+
+	* {empty}
+	offset (8 byte integer, network byte order): ::
+	The offset from which that commit's bitmap can be read.
+
+	* {empty}
+	xor_row (4 byte integer, network byte order): ::
+	The position of the triplet whose bitmap is used to compress
+	this one, or `0xffffffff` if no such bitmap exists.
-- 
gitgitgadget



* [PATCH v4 2/6] pack-bitmap-write.c: write lookup table extension
  2022-07-20 14:05     ` [PATCH v4 " Abhradeep Chakraborty via GitGitGadget
  2022-07-20 14:05       ` [PATCH v4 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
@ 2022-07-20 14:05       ` Abhradeep Chakraborty via GitGitGadget
  2022-07-20 14:05       ` [PATCH v4 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests Abhradeep Chakraborty via GitGitGadget
                         ` (4 subsequent siblings)
  6 siblings, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-07-20 14:05 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren, Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

The bitmap lookup table extension was documented by an earlier
change, but Git does not yet know how to write that extension.

Teach Git to write the bitmap lookup table extension. The table contains
a list of `N` `<commit_pos, offset, xor_row>` triplets, sorted in
ascending order of commit position. The meaning of each field in the
i'th triplet is given below:

  - commit_pos stores commit position (in the pack-index or midx).
    It is a 4 byte network byte order unsigned integer.

  - offset is the position (in the bitmap file) from which that
    commit's bitmap can be read.

  - xor_row is the position of the triplet in the lookup table
    whose bitmap is used to compress this bitmap, or `0xffffffff`
    if no such bitmap exists.

Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
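The conversion from a relative `xor_offset` (in bitmap write order) to
an absolute `xor_row` (in lookup table order) can be sketched in
standalone C. This is a simplified illustration, not the code in this
patch: `compute_xor_rows()`, `cmp_by_commit_pos()` and `NO_XOR_ROW` are
hypothetical names, and plain `qsort()` with a file-scope key pointer
stands in for the `QSORT_S()` call used by the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NO_XOR_ROW 0xffffffffu

/* qsort() takes no context argument, so the sketch keeps the keys here. */
static const uint32_t *sort_keys;

static int cmp_by_commit_pos(const void *va, const void *vb)
{
	uint32_t a = sort_keys[*(const uint32_t *)va];
	uint32_t b = sort_keys[*(const uint32_t *)vb];

	return (a > b) - (a < b);
}

/*
 * commit_positions[i] and xor_offsets[i] describe the i'th bitmap in
 * write order. On return, rows_out[j] is the write-order index of the
 * bitmap stored in table row j, and xor_rows_out[j] is that row's
 * xor_row (or NO_XOR_ROW if the bitmap has no xor base).
 */
static void compute_xor_rows(const uint32_t *commit_positions,
			     const uint8_t *xor_offsets, uint32_t nr,
			     uint32_t *rows_out, uint32_t *xor_rows_out)
{
	uint32_t i;
	uint32_t *inv = malloc(nr * sizeof(*inv));

	assert(inv || !nr);

	/* Sort write-order indices by commit position. */
	for (i = 0; i < nr; i++)
		rows_out[i] = i;
	sort_keys = commit_positions;
	qsort(rows_out, nr, sizeof(*rows_out), cmp_by_commit_pos);

	/* Inverse permutation: write-order index to table row. */
	for (i = 0; i < nr; i++)
		inv[rows_out[i]] = i;

	for (i = 0; i < nr; i++) {
		uint32_t k = rows_out[i];

		if (xor_offsets[k])
			/* The xor base is bitmap k - xor_offset in write order. */
			xor_rows_out[i] = inv[k - xor_offsets[k]];
		else
			xor_rows_out[i] = NO_XOR_ROW;
	}

	free(inv);
}
```

For example, three bitmaps written in the order of commit positions 40,
10, 30 with xor offsets 0, 1, 2 end up in rows 2, 0, 1 respectively.
Both xor-compressed bitmaps are based on the first written bitmap, so
both of their rows record `xor_row` 2.
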
 pack-bitmap-write.c | 112 ++++++++++++++++++++++++++++++++++++++++----
 pack-bitmap.h       |   5 +-
 2 files changed, 107 insertions(+), 10 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index c43375bd344..9843790cb60 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -650,20 +650,19 @@ static const struct object_id *oid_access(size_t pos, const void *table)
 
 static void write_selected_commits_v1(struct hashfile *f,
 				      struct pack_idx_entry **index,
-				      uint32_t index_nr)
+				      uint32_t index_nr,
+				      off_t *offsets,
+				      uint32_t *commit_positions)
 {
 	int i;
 
 	for (i = 0; i < writer.selected_nr; ++i) {
 		struct bitmapped_commit *stored = &writer.selected[i];
 
-		int commit_pos =
-			oid_pos(&stored->commit->object.oid, index, index_nr, oid_access);
+		if (offsets)
+			offsets[i] = hashfile_total(f);
 
-		if (commit_pos < 0)
-			BUG("trying to write commit not in index");
-
-		hashwrite_be32(f, commit_pos);
+		hashwrite_be32(f, commit_positions[i]);
 		hashwrite_u8(f, stored->xor_offset);
 		hashwrite_u8(f, stored->flags);
 
@@ -671,6 +670,81 @@ static void write_selected_commits_v1(struct hashfile *f,
 	}
 }
 
+static int table_cmp(const void *_va, const void *_vb, void *_data)
+{
+	uint32_t *commit_positions = _data;
+	uint32_t a = commit_positions[*(uint32_t *)_va];
+	uint32_t b = commit_positions[*(uint32_t *)_vb];
+
+	if (a > b)
+		return 1;
+	else if (a < b)
+		return -1;
+
+	return 0;
+}
+
+static void write_lookup_table(struct hashfile *f,
+			       struct pack_idx_entry **index,
+			       uint32_t index_nr,
+			       off_t *offsets,
+			       uint32_t *commit_positions)
+{
+	uint32_t i;
+	uint32_t *table, *table_inv;
+
+	ALLOC_ARRAY(table, writer.selected_nr);
+	ALLOC_ARRAY(table_inv, writer.selected_nr);
+
+	for (i = 0; i < writer.selected_nr; i++)
+		table[i] = i;
+
+	/*
+	 * At the end of this sort table[j] = i means that the i'th
+	 * bitmap corresponds to j'th bitmapped commit (among the selected
+	 * commits) in lex order of OIDs.
+	 */
+	QSORT_S(table, writer.selected_nr, table_cmp, commit_positions);
+
+	/* table_inv helps us discover that relationship (i'th bitmap
+	 * to j'th commit by j = table_inv[i])
+	 */
+	for (i = 0; i < writer.selected_nr; i++)
+		table_inv[table[i]] = i;
+
+	trace2_region_enter("pack-bitmap-write", "writing_lookup_table", the_repository);
+	for (i = 0; i < writer.selected_nr; i++) {
+		struct bitmapped_commit *selected = &writer.selected[table[i]];
+		uint32_t xor_offset = selected->xor_offset;
+		uint32_t xor_row;
+
+		if (xor_offset) {
+			/*
+			 * xor_index is the index (among the bitmap entries, in
+			 * write order) of the bitmap that this one is XORed
+			 * against. The lookup table is ordered differently, so
+			 * the index must be translated into a lookup table row.
+			 *
+			 * If "k = table[i] - xor_offset" then the xor base is the k'th
+			 * bitmap. `table_inv[k]` gives us the position of that bitmap
+			 * in the lookup table.
+			 */
+			uint32_t xor_index = table[i] - xor_offset;
+			xor_row = table_inv[xor_index];
+		} else {
+			xor_row = 0xffffffff;
+		}
+
+		hashwrite_be32(f, commit_positions[table[i]]);
+		hashwrite_be64(f, (uint64_t)offsets[table[i]]);
+		hashwrite_be32(f, xor_row);
+	}
+	trace2_region_leave("pack-bitmap-write", "writing_lookup_table", the_repository);
+
+	free(table);
+	free(table_inv);
+}
+
 static void write_hash_cache(struct hashfile *f,
 			     struct pack_idx_entry **index,
 			     uint32_t index_nr)
@@ -695,8 +769,10 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 {
 	static uint16_t default_version = 1;
 	static uint16_t flags = BITMAP_OPT_FULL_DAG;
+	off_t *offsets = NULL;
 	struct strbuf tmp_file = STRBUF_INIT;
 	struct hashfile *f;
+	uint32_t *commit_positions = NULL;
 
 	struct bitmap_disk_header header;
 
@@ -715,7 +791,25 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 	dump_bitmap(f, writer.trees);
 	dump_bitmap(f, writer.blobs);
 	dump_bitmap(f, writer.tags);
-	write_selected_commits_v1(f, index, index_nr);
+
+	ALLOC_ARRAY(commit_positions, writer.selected_nr);
+	for (uint32_t i = 0; i < writer.selected_nr; ++i) {
+		struct bitmapped_commit *stored = &writer.selected[i];
+		int commit_pos = oid_pos(&stored->commit->object.oid, index, index_nr, oid_access);
+
+		if (commit_pos < 0)
+			BUG("trying to write commit not in index");
+
+		commit_positions[i] = commit_pos;
+	}
+
+	if (options & BITMAP_OPT_LOOKUP_TABLE)
+		CALLOC_ARRAY(offsets, index_nr);
+
+	write_selected_commits_v1(f, index, index_nr, offsets, commit_positions);
+
+	if (options & BITMAP_OPT_LOOKUP_TABLE)
+		write_lookup_table(f, index, index_nr, offsets, commit_positions);
 
 	if (options & BITMAP_OPT_HASH_CACHE)
 		write_hash_cache(f, index, index_nr);
@@ -730,4 +824,6 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 		die_errno("unable to rename temporary bitmap file to '%s'", filename);
 
 	strbuf_release(&tmp_file);
+	free(offsets);
+	free(commit_positions);
 }
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 3d3ddd77345..67a9d0fc303 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -24,8 +24,9 @@ struct bitmap_disk_header {
 #define NEEDS_BITMAP (1u<<22)
 
 enum pack_bitmap_opts {
-	BITMAP_OPT_FULL_DAG = 1,
-	BITMAP_OPT_HASH_CACHE = 4,
+	BITMAP_OPT_FULL_DAG = 0x1,
+	BITMAP_OPT_HASH_CACHE = 0x4,
+	BITMAP_OPT_LOOKUP_TABLE = 0x10,
 };
 
 enum pack_bitmap_flags {
-- 
gitgitgadget



* [PATCH v4 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-07-20 14:05     ` [PATCH v4 " Abhradeep Chakraborty via GitGitGadget
  2022-07-20 14:05       ` [PATCH v4 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
  2022-07-20 14:05       ` [PATCH v4 2/6] pack-bitmap-write.c: write " Abhradeep Chakraborty via GitGitGadget
@ 2022-07-20 14:05       ` Abhradeep Chakraborty via GitGitGadget
  2022-07-20 14:05       ` [PATCH v4 4/6] pack-bitmap: prepare to read lookup table extension Abhradeep Chakraborty via GitGitGadget
                         ` (3 subsequent siblings)
  6 siblings, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-07-20 14:05 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren, Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

Teach Git a config option, 'pack.writeBitmapLookupTable', that lets
users enable or disable the bitmap lookup table extension. It defaults
to false.

Also add tests to verify that the lookup table is written.
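
To make the effect of the option concrete, the following is a small
sketch of how a boolean config value can set or clear one bit in the
bitmap write options. The enum values are the ones introduced by this
series; `apply_bool_option()`, `parse_bool()` and `handle_config()` are
hypothetical helpers, and `parse_bool()` is a deliberately simplified
stand-in for git_config_bool() that only understands "true" (or a
missing value) as true.

```c
#include <assert.h>
#include <string.h>

enum pack_bitmap_opts {
	BITMAP_OPT_FULL_DAG = 0x1,
	BITMAP_OPT_HASH_CACHE = 0x4,
	BITMAP_OPT_LOOKUP_TABLE = 0x10,
};

/* Set the bit when enabled, clear it otherwise; other bits are kept. */
static unsigned apply_bool_option(unsigned options, unsigned bit, int enabled)
{
	return enabled ? (options | bit) : (options & ~bit);
}

/* Simplified assumption: a missing value or "true" means true. */
static int parse_bool(const char *value)
{
	return !value || !strcmp(value, "true");
}

/*
 * Keys reach config callbacks with the section and variable names
 * lowercased, hence the all-lowercase comparisons.
 */
static unsigned handle_config(unsigned options, const char *key,
			      const char *value)
{
	assert(key);
	if (!strcmp(key, "pack.writebitmaplookuptable"))
		return apply_bool_option(options, BITMAP_OPT_LOOKUP_TABLE,
					 parse_bool(value));
	if (!strcmp(key, "pack.writebitmaphashcache"))
		return apply_bool_option(options, BITMAP_OPT_HASH_CACHE,
					 parse_bool(value));
	return options; /* unrelated keys leave the options untouched */
}
```

Because each option owns a distinct bit, turning the lookup table on or
off never disturbs the hash cache or full-DAG flags.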

Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 Documentation/config/pack.txt     |   7 +
 builtin/multi-pack-index.c        |   7 +
 builtin/pack-objects.c            |   8 +
 midx.c                            |   3 +
 midx.h                            |   1 +
 t/t5310-pack-bitmaps.sh           | 792 ++++++++++++++++--------------
 t/t5311-pack-bitmaps-shallow.sh   |  53 +-
 t/t5326-multi-pack-bitmaps.sh     | 421 +++++++++-------
 t/t5327-multi-pack-bitmaps-rev.sh |   9 +
 9 files changed, 720 insertions(+), 581 deletions(-)

diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index ad7f73a1ead..b955ca572ec 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -164,6 +164,13 @@ When writing a multi-pack reachability bitmap, no new namehashes are
 computed; instead, any namehashes stored in an existing bitmap are
 permuted into their appropriate location when writing a new bitmap.
 
+pack.writeBitmapLookupTable::
+	When true, Git will include a "lookup table" section in the
+	bitmap index (if one is written). This table is used to defer
+	loading individual bitmaps as late as possible. This can be
+	beneficial in repositories that have relatively large bitmap
+	indexes. Defaults to false.
+
 pack.writeReverseIndex::
 	When true, git will write a corresponding .rev file (see:
 	link:../technical/pack-format.html[Documentation/technical/pack-format.txt])
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 5edbb7fe86e..55402b46f41 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -87,6 +87,13 @@ static int git_multi_pack_index_write_config(const char *var, const char *value,
 			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
 	}
 
+	if (!strcmp(var, "pack.writebitmaplookuptable")) {
+		if (git_config_bool(var, value))
+			opts.flags |= MIDX_WRITE_BITMAP_LOOKUP_TABLE;
+		else
+			opts.flags &= ~MIDX_WRITE_BITMAP_LOOKUP_TABLE;
+	}
+
 	/*
 	 * We should never make a fall-back call to 'git_default_config', since
 	 * this was already called in 'cmd_multi_pack_index()'.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 39e28cfcafc..46e26774963 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3148,6 +3148,14 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 		else
 			write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE;
 	}
+
+	if (!strcmp(k, "pack.writebitmaplookuptable")) {
+		if (git_config_bool(k, v))
+			write_bitmap_options |= BITMAP_OPT_LOOKUP_TABLE;
+		else
+			write_bitmap_options &= ~BITMAP_OPT_LOOKUP_TABLE;
+	}
+
 	if (!strcmp(k, "pack.usebitmaps")) {
 		use_bitmap_index_default = git_config_bool(k, v);
 		return 0;
diff --git a/midx.c b/midx.c
index 5f0dd386b02..9c26d04bfde 100644
--- a/midx.c
+++ b/midx.c
@@ -1072,6 +1072,9 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
 	if (flags & MIDX_WRITE_BITMAP_HASH_CACHE)
 		options |= BITMAP_OPT_HASH_CACHE;
 
+	if (flags & MIDX_WRITE_BITMAP_LOOKUP_TABLE)
+		options |= BITMAP_OPT_LOOKUP_TABLE;
+
 	prepare_midx_packing_data(&pdata, ctx);
 
 	commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx);
diff --git a/midx.h b/midx.h
index 22e8e53288e..5578cd7b835 100644
--- a/midx.h
+++ b/midx.h
@@ -47,6 +47,7 @@ struct multi_pack_index {
 #define MIDX_WRITE_REV_INDEX (1 << 1)
 #define MIDX_WRITE_BITMAP (1 << 2)
 #define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3)
+#define MIDX_WRITE_BITMAP_LOOKUP_TABLE (1 << 4)
 
 const unsigned char *get_midx_checksum(struct multi_pack_index *m);
 void get_midx_filename(struct strbuf *out, const char *object_dir);
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index f775fc1ce69..c0607172827 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -26,22 +26,413 @@ has_any () {
 	grep -Ff "$1" "$2"
 }
 
-setup_bitmap_history
-
-test_expect_success 'setup writing bitmaps during repack' '
-	git config repack.writeBitmaps true
-'
-
-test_expect_success 'full repack creates bitmaps' '
-	GIT_TRACE2_EVENT="$(pwd)/trace" \
+test_bitmap_cases () {
+	writeLookupTable=false
+	for i in "$@"
+	do
+		case "$i" in
+		"pack.writeBitmapLookupTable") writeLookupTable=true;;
+		esac
+	done
+
+	test_expect_success 'setup test repository' '
+		rm -fr * .git &&
+		git init &&
+		git config pack.writeBitmapLookupTable '"$writeLookupTable"'
+	'
+	setup_bitmap_history
+
+	test_expect_success 'setup writing bitmaps during repack' '
+		git config repack.writeBitmaps true
+	'
+
+	test_expect_success 'full repack creates bitmaps' '
+		GIT_TRACE2_EVENT="$(pwd)/trace" \
+			git repack -ad &&
+		ls .git/objects/pack/ | grep bitmap >output &&
+		test_line_count = 1 output &&
+		grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace &&
+		grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace
+	'
+
+	basic_bitmap_tests
+
+	test_expect_success 'pack-objects respects --local (non-local loose)' '
+		git init --bare alt.git &&
+		echo $(pwd)/alt.git/objects >.git/objects/info/alternates &&
+		echo content1 >file1 &&
+		# non-local loose object which is not present in bitmapped pack
+		altblob=$(GIT_DIR=alt.git git hash-object -w file1) &&
+		# non-local loose object which is also present in bitmapped pack
+		git cat-file blob $blob | GIT_DIR=alt.git git hash-object -w --stdin &&
+		git add file1 &&
+		test_tick &&
+		git commit -m commit_file1 &&
+		echo HEAD | git pack-objects --local --stdout --revs >1.pack &&
+		git index-pack 1.pack &&
+		list_packed_objects 1.idx >1.objects &&
+		printf "%s\n" "$altblob" "$blob" >nonlocal-loose &&
+		! has_any nonlocal-loose 1.objects
+	'
+
+	test_expect_success 'pack-objects respects --honor-pack-keep (local non-bitmapped pack)' '
+		echo content2 >file2 &&
+		blob2=$(git hash-object -w file2) &&
+		git add file2 &&
+		test_tick &&
+		git commit -m commit_file2 &&
+		printf "%s\n" "$blob2" "$bitmaptip" >keepobjects &&
+		pack2=$(git pack-objects pack2 <keepobjects) &&
+		mv pack2-$pack2.* .git/objects/pack/ &&
+		>.git/objects/pack/pack2-$pack2.keep &&
+		rm $(objpath $blob2) &&
+		echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >2a.pack &&
+		git index-pack 2a.pack &&
+		list_packed_objects 2a.idx >2a.objects &&
+		! has_any keepobjects 2a.objects
+	'
+
+	test_expect_success 'pack-objects respects --local (non-local pack)' '
+		mv .git/objects/pack/pack2-$pack2.* alt.git/objects/pack/ &&
+		echo HEAD | git pack-objects --local --stdout --revs >2b.pack &&
+		git index-pack 2b.pack &&
+		list_packed_objects 2b.idx >2b.objects &&
+		! has_any keepobjects 2b.objects
+	'
+
+	test_expect_success 'pack-objects respects --honor-pack-keep (local bitmapped pack)' '
+		ls .git/objects/pack/ | grep bitmap >output &&
+		test_line_count = 1 output &&
+		packbitmap=$(basename $(cat output) .bitmap) &&
+		list_packed_objects .git/objects/pack/$packbitmap.idx >packbitmap.objects &&
+		test_when_finished "rm -f .git/objects/pack/$packbitmap.keep" &&
+		>.git/objects/pack/$packbitmap.keep &&
+		echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >3a.pack &&
+		git index-pack 3a.pack &&
+		list_packed_objects 3a.idx >3a.objects &&
+		! has_any packbitmap.objects 3a.objects
+	'
+
+	test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' '
+		mv .git/objects/pack/$packbitmap.* alt.git/objects/pack/ &&
+		rm -f .git/objects/pack/multi-pack-index &&
+		test_when_finished "mv alt.git/objects/pack/$packbitmap.* .git/objects/pack/" &&
+		echo HEAD | git pack-objects --local --stdout --revs >3b.pack &&
+		git index-pack 3b.pack &&
+		list_packed_objects 3b.idx >3b.objects &&
+		! has_any packbitmap.objects 3b.objects
+	'
+
+	test_expect_success 'pack-objects to file can use bitmap' '
+		# make sure we still have 1 bitmap index from previous tests
+		ls .git/objects/pack/ | grep bitmap >output &&
+		test_line_count = 1 output &&
+		# verify equivalent packs are generated with/without using bitmap index
+		packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) &&
+		packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) &&
+		list_packed_objects packa-$packasha1.idx >packa.objects &&
+		list_packed_objects packb-$packbsha1.idx >packb.objects &&
+		test_cmp packa.objects packb.objects
+	'
+
+	test_expect_success 'full repack, reusing previous bitmaps' '
 		git repack -ad &&
-	ls .git/objects/pack/ | grep bitmap >output &&
-	test_line_count = 1 output &&
-	grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace &&
-	grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace
-'
+		ls .git/objects/pack/ | grep bitmap >output &&
+		test_line_count = 1 output
+	'
+
+	test_expect_success 'fetch (full bitmap)' '
+		git --git-dir=clone.git fetch origin second:second &&
+		git rev-parse HEAD >expect &&
+		git --git-dir=clone.git rev-parse HEAD >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success 'create objects for missing-HAVE tests' '
+		blob=$(echo "missing have" | git hash-object -w --stdin) &&
+		tree=$(printf "100644 blob $blob\tfile\n" | git mktree) &&
+		parent=$(echo parent | git commit-tree $tree) &&
+		commit=$(echo commit | git commit-tree $tree -p $parent) &&
+		cat >revs <<-EOF
+		HEAD
+		^HEAD^
+		^$commit
+		EOF
+	'
+
+	test_expect_success 'pack-objects respects --incremental' '
+		cat >revs2 <<-EOF &&
+		HEAD
+		$commit
+		EOF
+		git pack-objects --incremental --stdout --revs <revs2 >4.pack &&
+		git index-pack 4.pack &&
+		list_packed_objects 4.idx >4.objects &&
+		test_line_count = 4 4.objects &&
+		git rev-list --objects $commit >revlist &&
+		cut -d" " -f1 revlist |sort >objects &&
+		test_cmp 4.objects objects
+	'
+
+	test_expect_success 'pack with missing blob' '
+		rm $(objpath $blob) &&
+		git pack-objects --stdout --revs <revs >/dev/null
+	'
+
+	test_expect_success 'pack with missing tree' '
+		rm $(objpath $tree) &&
+		git pack-objects --stdout --revs <revs >/dev/null
+	'
+
+	test_expect_success 'pack with missing parent' '
+		rm $(objpath $parent) &&
+		git pack-objects --stdout --revs <revs >/dev/null
+	'
+
+	test_expect_success JGIT,SHA1 'we can read jgit bitmaps' '
+		git clone --bare . compat-jgit.git &&
+		(
+			cd compat-jgit.git &&
+			rm -f objects/pack/*.bitmap &&
+			jgit gc &&
+			git rev-list --test-bitmap HEAD
+		)
+	'
+
+	test_expect_success JGIT,SHA1 'jgit can read our bitmaps' '
+		git clone --bare . compat-us.git &&
+		(
+			cd compat-us.git &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
+			git repack -adb &&
+			# jgit gc will barf if it does not like our bitmaps
+			jgit gc
+		)
+	'
+
+	test_expect_success 'splitting packs does not generate bogus bitmaps' '
+		test-tool genrandom foo $((1024 * 1024)) >rand &&
+		git add rand &&
+		git commit -m "commit with big file" &&
+		git -c pack.packSizeLimit=500k repack -adb &&
+		git init --bare no-bitmaps.git &&
+		git -C no-bitmaps.git fetch .. HEAD
+	'
+
+	test_expect_success 'set up reusable pack' '
+		rm -f .git/objects/pack/*.keep &&
+		git repack -adb &&
+		reusable_pack () {
+			git for-each-ref --format="%(objectname)" |
+			git pack-objects --delta-base-offset --revs --stdout "$@"
+		}
+	'
+
+	test_expect_success 'pack reuse respects --honor-pack-keep' '
+		test_when_finished "rm -f .git/objects/pack/*.keep" &&
+		for i in .git/objects/pack/*.pack
+		do
+			>${i%.pack}.keep || return 1
+		done &&
+		reusable_pack --honor-pack-keep >empty.pack &&
+		git index-pack empty.pack &&
+		git show-index <empty.idx >actual &&
+		test_must_be_empty actual
+	'
+
+	test_expect_success 'pack reuse respects --local' '
+		mv .git/objects/pack/* alt.git/objects/pack/ &&
+		test_when_finished "mv alt.git/objects/pack/* .git/objects/pack/" &&
+		reusable_pack --local >empty.pack &&
+		git index-pack empty.pack &&
+		git show-index <empty.idx >actual &&
+		test_must_be_empty actual
+	'
+
+	test_expect_success 'pack reuse respects --incremental' '
+		reusable_pack --incremental >empty.pack &&
+		git index-pack empty.pack &&
+		git show-index <empty.idx >actual &&
+		test_must_be_empty actual
+	'
+
+	test_expect_success 'truncated bitmap fails gracefully (ewah)' '
+		test_config pack.writebitmaphashcache false &&
+		git repack -ad &&
+		git rev-list --use-bitmap-index --count --all >expect &&
+		bitmap=$(ls .git/objects/pack/*.bitmap) &&
+		test_when_finished "rm -f $bitmap" &&
+		test_copy_bytes 256 <$bitmap >$bitmap.tmp &&
+		mv -f $bitmap.tmp $bitmap &&
+		git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
+		test_cmp expect actual &&
+		test_i18ngrep corrupt.ewah.bitmap stderr
+	'
+
+	test_expect_success 'truncated bitmap fails gracefully (cache)' '
+		git repack -ad &&
+		git rev-list --use-bitmap-index --count --all >expect &&
+		bitmap=$(ls .git/objects/pack/*.bitmap) &&
+		test_when_finished "rm -f $bitmap" &&
+		test_copy_bytes 512 <$bitmap >$bitmap.tmp &&
+		mv -f $bitmap.tmp $bitmap &&
+		git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
+		test_cmp expect actual &&
+		test_i18ngrep corrupted.bitmap.index stderr
+	'
+
+	# Create a state of history with these properties:
+	#
+	#  - refs that allow a client to fetch some new history, while sharing some old
+	#    history with the server; we use branches delta-reuse-old and
+	#    delta-reuse-new here
+	#
+	#  - the new history contains an object that is stored on the server as a delta
+	#    against a base that is in the old history
+	#
+	#  - the base object is not immediately reachable from the tip of the old
+	#    history; finding it would involve digging down through history we know the
+	#    other side has
+	#
+	# This should result in a state where fetching from old->new would not
+	# traditionally reuse the on-disk delta (because we'd have to dig to realize
+	# that the client has it), but we will do so if bitmaps can tell us cheaply
+	# that the other side has it.
+	test_expect_success 'set up thin delta-reuse parent' '
+		# This first commit contains the buried base object.
+		test-tool genrandom delta 16384 >file &&
+		git add file &&
+		git commit -m "delta base" &&
+		base=$(git rev-parse --verify HEAD:file) &&
+
+		# These intermediate commits bury the base back in history.
+		# This becomes the "old" state.
+		for i in 1 2 3 4 5
+		do
+			echo $i >file &&
+			git commit -am "intermediate $i" || return 1
+		done &&
+		git branch delta-reuse-old &&
+
+		# And now our new history has a delta against the buried base. Note
+		# that this must be smaller than the original file, since pack-objects
+		# prefers to create deltas from smaller objects to larger.
+		test-tool genrandom delta 16300 >file &&
+		git commit -am "delta result" &&
+		delta=$(git rev-parse --verify HEAD:file) &&
+		git branch delta-reuse-new &&
+
+		# Repack with bitmaps and double check that we have the expected delta
+		# relationship.
+		git repack -adb &&
+		have_delta $delta $base
+	'
+
+	# Now we can sanity-check the non-bitmap behavior (that the server is not able
+	# to reuse the delta). This isn't strictly something we care about, so this
+	# test could be scrapped in the future. But it makes sure that the next test is
+	# actually triggering the feature we want.
+	#
+	# Note that our tools for working with on-the-wire "thin" packs are limited. So
+	# we actually perform the fetch, retain the resulting pack, and inspect the
+	# result.
+	test_expect_success 'fetch without bitmaps ignores delta against old base' '
+		test_config pack.usebitmaps false &&
+		test_when_finished "rm -rf client.git" &&
+		git init --bare client.git &&
+		(
+			cd client.git &&
+			git config transfer.unpackLimit 1 &&
+			git fetch .. delta-reuse-old:delta-reuse-old &&
+			git fetch .. delta-reuse-new:delta-reuse-new &&
+			have_delta $delta $ZERO_OID
+		)
+	'
+
+	# And do the same for the bitmap case, where we do expect to find the delta.
+	test_expect_success 'fetch with bitmaps can reuse old base' '
+		test_config pack.usebitmaps true &&
+		test_when_finished "rm -rf client.git" &&
+		git init --bare client.git &&
+		(
+			cd client.git &&
+			git config transfer.unpackLimit 1 &&
+			git fetch .. delta-reuse-old:delta-reuse-old &&
+			git fetch .. delta-reuse-new:delta-reuse-new &&
+			have_delta $delta $base
+		)
+	'
+
+	test_expect_success 'pack.preferBitmapTips' '
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
+
+			# create enough commits that not all receive bitmap
+			# coverage even if they are all at the tip of some reference.
+			test_commit_bulk --message="%s" 103 &&
+
+			git rev-list HEAD >commits.raw &&
+			sort <commits.raw >commits &&
+
+			git log --format="create refs/tags/%s %H" HEAD >refs &&
+			git update-ref --stdin <refs &&
+
+			git repack -adb &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+
+			# remember which commits did not receive bitmaps
+			comm -13 bitmaps commits >before &&
+			test_file_not_empty before &&
+
+			# mark the commits which did not receive bitmaps as preferred,
+			# and generate the bitmap again
+			perl -pe "s{^}{create refs/tags/include/$. }" <before |
+				git update-ref --stdin &&
+			git -c pack.preferBitmapTips=refs/tags/include repack -adb &&
+
+			# finally, check that the commit(s) without bitmap coverage
+			# are not the same ones as before
+			test-tool bitmap list-commits | sort >bitmaps &&
+			comm -13 bitmaps commits >after &&
+
+			! test_cmp before after
+		)
+	'
+
+	test_expect_success 'complains about multiple pack bitmaps' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
+
+			test_commit base &&
+
+			git repack -adb &&
+			bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
+			mv "$bitmap" "$bitmap.bak" &&
+
+			test_commit other &&
+			git repack -ab &&
+
+			mv "$bitmap.bak" "$bitmap" &&
+
+			find .git/objects/pack -type f -name "*.pack" >packs &&
+			find .git/objects/pack -type f -name "*.bitmap" >bitmaps &&
+			test_line_count = 2 packs &&
+			test_line_count = 2 bitmaps &&
+
+			git rev-list --use-bitmap-index HEAD 2>err &&
+			grep "ignoring extra bitmap file" err
+		)
+	'
+}
 
-basic_bitmap_tests
+test_bitmap_cases
 
 test_expect_success 'incremental repack fails when bitmaps are requested' '
 	test_commit more-1 &&
@@ -54,375 +445,12 @@ test_expect_success 'incremental repack can disable bitmaps' '
 	git repack -d --no-write-bitmap-index
 '
 
-test_expect_success 'pack-objects respects --local (non-local loose)' '
-	git init --bare alt.git &&
-	echo $(pwd)/alt.git/objects >.git/objects/info/alternates &&
-	echo content1 >file1 &&
-	# non-local loose object which is not present in bitmapped pack
-	altblob=$(GIT_DIR=alt.git git hash-object -w file1) &&
-	# non-local loose object which is also present in bitmapped pack
-	git cat-file blob $blob | GIT_DIR=alt.git git hash-object -w --stdin &&
-	git add file1 &&
-	test_tick &&
-	git commit -m commit_file1 &&
-	echo HEAD | git pack-objects --local --stdout --revs >1.pack &&
-	git index-pack 1.pack &&
-	list_packed_objects 1.idx >1.objects &&
-	printf "%s\n" "$altblob" "$blob" >nonlocal-loose &&
-	! has_any nonlocal-loose 1.objects
-'
-
-test_expect_success 'pack-objects respects --honor-pack-keep (local non-bitmapped pack)' '
-	echo content2 >file2 &&
-	blob2=$(git hash-object -w file2) &&
-	git add file2 &&
-	test_tick &&
-	git commit -m commit_file2 &&
-	printf "%s\n" "$blob2" "$bitmaptip" >keepobjects &&
-	pack2=$(git pack-objects pack2 <keepobjects) &&
-	mv pack2-$pack2.* .git/objects/pack/ &&
-	>.git/objects/pack/pack2-$pack2.keep &&
-	rm $(objpath $blob2) &&
-	echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >2a.pack &&
-	git index-pack 2a.pack &&
-	list_packed_objects 2a.idx >2a.objects &&
-	! has_any keepobjects 2a.objects
-'
-
-test_expect_success 'pack-objects respects --local (non-local pack)' '
-	mv .git/objects/pack/pack2-$pack2.* alt.git/objects/pack/ &&
-	echo HEAD | git pack-objects --local --stdout --revs >2b.pack &&
-	git index-pack 2b.pack &&
-	list_packed_objects 2b.idx >2b.objects &&
-	! has_any keepobjects 2b.objects
-'
-
-test_expect_success 'pack-objects respects --honor-pack-keep (local bitmapped pack)' '
-	ls .git/objects/pack/ | grep bitmap >output &&
-	test_line_count = 1 output &&
-	packbitmap=$(basename $(cat output) .bitmap) &&
-	list_packed_objects .git/objects/pack/$packbitmap.idx >packbitmap.objects &&
-	test_when_finished "rm -f .git/objects/pack/$packbitmap.keep" &&
-	>.git/objects/pack/$packbitmap.keep &&
-	echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >3a.pack &&
-	git index-pack 3a.pack &&
-	list_packed_objects 3a.idx >3a.objects &&
-	! has_any packbitmap.objects 3a.objects
-'
-
-test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' '
-	mv .git/objects/pack/$packbitmap.* alt.git/objects/pack/ &&
-	rm -f .git/objects/pack/multi-pack-index &&
-	test_when_finished "mv alt.git/objects/pack/$packbitmap.* .git/objects/pack/" &&
-	echo HEAD | git pack-objects --local --stdout --revs >3b.pack &&
-	git index-pack 3b.pack &&
-	list_packed_objects 3b.idx >3b.objects &&
-	! has_any packbitmap.objects 3b.objects
-'
-
-test_expect_success 'pack-objects to file can use bitmap' '
-	# make sure we still have 1 bitmap index from previous tests
-	ls .git/objects/pack/ | grep bitmap >output &&
-	test_line_count = 1 output &&
-	# verify equivalent packs are generated with/without using bitmap index
-	packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) &&
-	packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) &&
-	list_packed_objects packa-$packasha1.idx >packa.objects &&
-	list_packed_objects packb-$packbsha1.idx >packb.objects &&
-	test_cmp packa.objects packb.objects
-'
-
-test_expect_success 'full repack, reusing previous bitmaps' '
-	git repack -ad &&
-	ls .git/objects/pack/ | grep bitmap >output &&
-	test_line_count = 1 output
-'
-
-test_expect_success 'fetch (full bitmap)' '
-	git --git-dir=clone.git fetch origin second:second &&
-	git rev-parse HEAD >expect &&
-	git --git-dir=clone.git rev-parse HEAD >actual &&
-	test_cmp expect actual
-'
-
-test_expect_success 'create objects for missing-HAVE tests' '
-	blob=$(echo "missing have" | git hash-object -w --stdin) &&
-	tree=$(printf "100644 blob $blob\tfile\n" | git mktree) &&
-	parent=$(echo parent | git commit-tree $tree) &&
-	commit=$(echo commit | git commit-tree $tree -p $parent) &&
-	cat >revs <<-EOF
-	HEAD
-	^HEAD^
-	^$commit
-	EOF
-'
-
-test_expect_success 'pack-objects respects --incremental' '
-	cat >revs2 <<-EOF &&
-	HEAD
-	$commit
-	EOF
-	git pack-objects --incremental --stdout --revs <revs2 >4.pack &&
-	git index-pack 4.pack &&
-	list_packed_objects 4.idx >4.objects &&
-	test_line_count = 4 4.objects &&
-	git rev-list --objects $commit >revlist &&
-	cut -d" " -f1 revlist |sort >objects &&
-	test_cmp 4.objects objects
-'
-
-test_expect_success 'pack with missing blob' '
-	rm $(objpath $blob) &&
-	git pack-objects --stdout --revs <revs >/dev/null
-'
+test_bitmap_cases "pack.writeBitmapLookupTable"
 
-test_expect_success 'pack with missing tree' '
-	rm $(objpath $tree) &&
-	git pack-objects --stdout --revs <revs >/dev/null
-'
-
-test_expect_success 'pack with missing parent' '
-	rm $(objpath $parent) &&
-	git pack-objects --stdout --revs <revs >/dev/null
-'
-
-test_expect_success JGIT,SHA1 'we can read jgit bitmaps' '
-	git clone --bare . compat-jgit.git &&
-	(
-		cd compat-jgit.git &&
-		rm -f objects/pack/*.bitmap &&
-		jgit gc &&
-		git rev-list --test-bitmap HEAD
-	)
-'
-
-test_expect_success JGIT,SHA1 'jgit can read our bitmaps' '
-	git clone --bare . compat-us.git &&
-	(
-		cd compat-us.git &&
-		git repack -adb &&
-		# jgit gc will barf if it does not like our bitmaps
-		jgit gc
-	)
-'
-
-test_expect_success 'splitting packs does not generate bogus bitmaps' '
-	test-tool genrandom foo $((1024 * 1024)) >rand &&
-	git add rand &&
-	git commit -m "commit with big file" &&
-	git -c pack.packSizeLimit=500k repack -adb &&
-	git init --bare no-bitmaps.git &&
-	git -C no-bitmaps.git fetch .. HEAD
-'
-
-test_expect_success 'set up reusable pack' '
-	rm -f .git/objects/pack/*.keep &&
-	git repack -adb &&
-	reusable_pack () {
-		git for-each-ref --format="%(objectname)" |
-		git pack-objects --delta-base-offset --revs --stdout "$@"
-	}
-'
-
-test_expect_success 'pack reuse respects --honor-pack-keep' '
-	test_when_finished "rm -f .git/objects/pack/*.keep" &&
-	for i in .git/objects/pack/*.pack
-	do
-		>${i%.pack}.keep || return 1
-	done &&
-	reusable_pack --honor-pack-keep >empty.pack &&
-	git index-pack empty.pack &&
-	git show-index <empty.idx >actual &&
-	test_must_be_empty actual
-'
-
-test_expect_success 'pack reuse respects --local' '
-	mv .git/objects/pack/* alt.git/objects/pack/ &&
-	test_when_finished "mv alt.git/objects/pack/* .git/objects/pack/" &&
-	reusable_pack --local >empty.pack &&
-	git index-pack empty.pack &&
-	git show-index <empty.idx >actual &&
-	test_must_be_empty actual
-'
-
-test_expect_success 'pack reuse respects --incremental' '
-	reusable_pack --incremental >empty.pack &&
-	git index-pack empty.pack &&
-	git show-index <empty.idx >actual &&
-	test_must_be_empty actual
-'
-
-test_expect_success 'truncated bitmap fails gracefully (ewah)' '
-	test_config pack.writebitmaphashcache false &&
-	git repack -ad &&
-	git rev-list --use-bitmap-index --count --all >expect &&
-	bitmap=$(ls .git/objects/pack/*.bitmap) &&
-	test_when_finished "rm -f $bitmap" &&
-	test_copy_bytes 256 <$bitmap >$bitmap.tmp &&
-	mv -f $bitmap.tmp $bitmap &&
-	git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
-	test_cmp expect actual &&
-	test_i18ngrep corrupt.ewah.bitmap stderr
-'
-
-test_expect_success 'truncated bitmap fails gracefully (cache)' '
-	git repack -ad &&
-	git rev-list --use-bitmap-index --count --all >expect &&
-	bitmap=$(ls .git/objects/pack/*.bitmap) &&
-	test_when_finished "rm -f $bitmap" &&
-	test_copy_bytes 512 <$bitmap >$bitmap.tmp &&
-	mv -f $bitmap.tmp $bitmap &&
-	git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
-	test_cmp expect actual &&
-	test_i18ngrep corrupted.bitmap.index stderr
-'
-
-# Create a state of history with these properties:
-#
-#  - refs that allow a client to fetch some new history, while sharing some old
-#    history with the server; we use branches delta-reuse-old and
-#    delta-reuse-new here
-#
-#  - the new history contains an object that is stored on the server as a delta
-#    against a base that is in the old history
-#
-#  - the base object is not immediately reachable from the tip of the old
-#    history; finding it would involve digging down through history we know the
-#    other side has
-#
-# This should result in a state where fetching from old->new would not
-# traditionally reuse the on-disk delta (because we'd have to dig to realize
-# that the client has it), but we will do so if bitmaps can tell us cheaply
-# that the other side has it.
-test_expect_success 'set up thin delta-reuse parent' '
-	# This first commit contains the buried base object.
-	test-tool genrandom delta 16384 >file &&
-	git add file &&
-	git commit -m "delta base" &&
-	base=$(git rev-parse --verify HEAD:file) &&
-
-	# These intermediate commits bury the base back in history.
-	# This becomes the "old" state.
-	for i in 1 2 3 4 5
-	do
-		echo $i >file &&
-		git commit -am "intermediate $i" || return 1
-	done &&
-	git branch delta-reuse-old &&
-
-	# And now our new history has a delta against the buried base. Note
-	# that this must be smaller than the original file, since pack-objects
-	# prefers to create deltas from smaller objects to larger.
-	test-tool genrandom delta 16300 >file &&
-	git commit -am "delta result" &&
-	delta=$(git rev-parse --verify HEAD:file) &&
-	git branch delta-reuse-new &&
-
-	# Repack with bitmaps and double check that we have the expected delta
-	# relationship.
-	git repack -adb &&
-	have_delta $delta $base
-'
-
-# Now we can sanity-check the non-bitmap behavior (that the server is not able
-# to reuse the delta). This isn't strictly something we care about, so this
-# test could be scrapped in the future. But it makes sure that the next test is
-# actually triggering the feature we want.
-#
-# Note that our tools for working with on-the-wire "thin" packs are limited. So
-# we actually perform the fetch, retain the resulting pack, and inspect the
-# result.
-test_expect_success 'fetch without bitmaps ignores delta against old base' '
-	test_config pack.usebitmaps false &&
-	test_when_finished "rm -rf client.git" &&
-	git init --bare client.git &&
-	(
-		cd client.git &&
-		git config transfer.unpackLimit 1 &&
-		git fetch .. delta-reuse-old:delta-reuse-old &&
-		git fetch .. delta-reuse-new:delta-reuse-new &&
-		have_delta $delta $ZERO_OID
-	)
-'
-
-# And do the same for the bitmap case, where we do expect to find the delta.
-test_expect_success 'fetch with bitmaps can reuse old base' '
-	test_config pack.usebitmaps true &&
-	test_when_finished "rm -rf client.git" &&
-	git init --bare client.git &&
-	(
-		cd client.git &&
-		git config transfer.unpackLimit 1 &&
-		git fetch .. delta-reuse-old:delta-reuse-old &&
-		git fetch .. delta-reuse-new:delta-reuse-new &&
-		have_delta $delta $base
-	)
-'
-
-test_expect_success 'pack.preferBitmapTips' '
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
-
-		# create enough commits that not all are receive bitmap
-		# coverage even if they are all at the tip of some reference.
-		test_commit_bulk --message="%s" 103 &&
-
-		git rev-list HEAD >commits.raw &&
-		sort <commits.raw >commits &&
-
-		git log --format="create refs/tags/%s %H" HEAD >refs &&
-		git update-ref --stdin <refs &&
-
-		git repack -adb &&
-		test-tool bitmap list-commits | sort >bitmaps &&
-
-		# remember which commits did not receive bitmaps
-		comm -13 bitmaps commits >before &&
-		test_file_not_empty before &&
-
-		# mark the commits which did not receive bitmaps as preferred,
-		# and generate the bitmap again
-		perl -pe "s{^}{create refs/tags/include/$. }" <before |
-			git update-ref --stdin &&
-		git -c pack.preferBitmapTips=refs/tags/include repack -adb &&
-
-		# finally, check that the commit(s) without bitmap coverage
-		# are not the same ones as before
-		test-tool bitmap list-commits | sort >bitmaps &&
-		comm -13 bitmaps commits >after &&
-
-		! test_cmp before after
-	)
-'
-
-test_expect_success 'complains about multiple pack bitmaps' '
-	rm -fr repo &&
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
-
-		test_commit base &&
-
-		git repack -adb &&
-		bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
-		mv "$bitmap" "$bitmap.bak" &&
-
-		test_commit other &&
-		git repack -ab &&
-
-		mv "$bitmap.bak" "$bitmap" &&
-
-		find .git/objects/pack -type f -name "*.pack" >packs &&
-		find .git/objects/pack -type f -name "*.bitmap" >bitmaps &&
-		test_line_count = 2 packs &&
-		test_line_count = 2 bitmaps &&
-
-		git rev-list --use-bitmap-index HEAD 2>err &&
-		grep "ignoring extra bitmap file" err
-	)
+test_expect_success 'verify writing bitmap lookup table when enabled' '
+	GIT_TRACE2_EVENT="$(pwd)/trace2" \
+		git repack -ad &&
+	grep "\"label\":\"writing_lookup_table\"" trace2
 '
 
 test_done
diff --git a/t/t5311-pack-bitmaps-shallow.sh b/t/t5311-pack-bitmaps-shallow.sh
index 872a95df338..f74c6a2da47 100755
--- a/t/t5311-pack-bitmaps-shallow.sh
+++ b/t/t5311-pack-bitmaps-shallow.sh
@@ -17,23 +17,40 @@ test_description='check bitmap operation with shallow repositories'
 # the tree for A. But in a shallow one, we've grafted away
 # A, and fetching A to B requires that the other side send
 # us the tree for file=1.
-test_expect_success 'setup shallow repo' '
-	echo 1 >file &&
-	git add file &&
-	git commit -m orig &&
-	echo 2 >file &&
-	git commit -a -m update &&
-	git clone --no-local --bare --depth=1 . shallow.git &&
-	echo 1 >file &&
-	git commit -a -m repeat
-'
-
-test_expect_success 'turn on bitmaps in the parent' '
-	git repack -adb
-'
-
-test_expect_success 'shallow fetch from bitmapped repo' '
-	(cd shallow.git && git fetch)
-'
+test_shallow_bitmaps () {
+	writeLookupTable=false
+
+	for i in "$@"
+	do
+		case $i in
+		"pack.writeBitmapLookupTable") writeLookupTable=true;;
+		esac
+	done
+
+	test_expect_success 'setup shallow repo' '
+		rm -rf * .git &&
+		git init &&
+		git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
+		echo 1 >file &&
+		git add file &&
+		git commit -m orig &&
+		echo 2 >file &&
+		git commit -a -m update &&
+		git clone --no-local --bare --depth=1 . shallow.git &&
+		echo 1 >file &&
+		git commit -a -m repeat
+	'
+
+	test_expect_success 'turn on bitmaps in the parent' '
+		git repack -adb
+	'
+
+	test_expect_success 'shallow fetch from bitmapped repo' '
+		(cd shallow.git && git fetch)
+	'
+}
+
+test_shallow_bitmaps
+
 
 test_done
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 4fe57414c13..3b206adcee6 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -15,17 +15,24 @@ GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
 sane_unset GIT_TEST_MIDX_WRITE_REV
 sane_unset GIT_TEST_MIDX_READ_RIDX
 
-midx_bitmap_core
-
 bitmap_reuse_tests() {
 	from=$1
 	to=$2
+	writeLookupTable=false
+
+	for i in "$@"
+	do
+		case $i in
+		"pack.writeBitmapLookupTable") writeLookupTable=true;;
+		esac
+	done
 
 	test_expect_success "setup pack reuse tests ($from -> $to)" '
 		rm -fr repo &&
 		git init repo &&
 		(
 			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 			test_commit_bulk 16 &&
 			git tag old-tip &&
 
@@ -43,6 +50,7 @@ bitmap_reuse_tests() {
 	test_expect_success "build bitmap from existing ($from -> $to)" '
 		(
 			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 			test_commit_bulk --id=further 16 &&
 			git tag new-tip &&
 
@@ -59,6 +67,7 @@ bitmap_reuse_tests() {
 	test_expect_success "verify resulting bitmaps ($from -> $to)" '
 		(
 			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 			git for-each-ref &&
 			git rev-list --test-bitmap refs/tags/old-tip &&
 			git rev-list --test-bitmap refs/tags/new-tip
@@ -66,244 +75,294 @@ bitmap_reuse_tests() {
 	'
 }
 
-bitmap_reuse_tests 'pack' 'MIDX'
-bitmap_reuse_tests 'MIDX' 'pack'
-bitmap_reuse_tests 'MIDX' 'MIDX'
+test_midx_bitmap_cases () {
+	writeLookupTable=false
+	writeBitmapLookupTable=
+
+	for i in "$@"
+	do
+		case $i in
+		"pack.writeBitmapLookupTable")
+			writeLookupTable=true
+			writeBitmapLookupTable="$i"
+			;;
+		esac
+	done
+
+	test_expect_success 'setup test_repository' '
+		rm -rf * .git &&
+		git init &&
+		git config pack.writeBitmapLookupTable '"$writeLookupTable"'
+	'
 
-test_expect_success 'missing object closure fails gracefully' '
-	rm -fr repo &&
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
+	midx_bitmap_core
 
-		test_commit loose &&
-		test_commit packed &&
+	bitmap_reuse_tests 'pack' 'MIDX' "$writeBitmapLookupTable"
+	bitmap_reuse_tests 'MIDX' 'pack' "$writeBitmapLookupTable"
+	bitmap_reuse_tests 'MIDX' 'MIDX' "$writeBitmapLookupTable"
 
-		# Do not pass "--revs"; we want a pack without the "loose"
-		# commit.
-		git pack-objects $objdir/pack/pack <<-EOF &&
-		$(git rev-parse packed)
-		EOF
+	test_expect_success 'missing object closure fails gracefully' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-		test_must_fail git multi-pack-index write --bitmap 2>err &&
-		grep "doesn.t have full closure" err &&
-		test_path_is_missing $midx
-	)
-'
+			test_commit loose &&
+			test_commit packed &&
 
-midx_bitmap_partial_tests
+			# Do not pass "--revs"; we want a pack without the "loose"
+			# commit.
+			git pack-objects $objdir/pack/pack <<-EOF &&
+			$(git rev-parse packed)
+			EOF
 
-test_expect_success 'removing a MIDX clears stale bitmaps' '
-	rm -fr repo &&
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
-		test_commit base &&
-		git repack &&
-		git multi-pack-index write --bitmap &&
+			test_must_fail git multi-pack-index write --bitmap 2>err &&
+			grep "doesn.t have full closure" err &&
+			test_path_is_missing $midx
+		)
+	'
 
-		# Write a MIDX and bitmap; remove the MIDX but leave the bitmap.
-		stale_bitmap=$midx-$(midx_checksum $objdir).bitmap &&
-		rm $midx &&
+	midx_bitmap_partial_tests
 
-		# Then write a new MIDX.
-		test_commit new &&
-		git repack &&
-		git multi-pack-index write --bitmap &&
+	test_expect_success 'removing a MIDX clears stale bitmaps' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
+			test_commit base &&
+			git repack &&
+			git multi-pack-index write --bitmap &&
+
+			# Write a MIDX and bitmap; remove the MIDX but leave the bitmap.
+			stale_bitmap=$midx-$(midx_checksum $objdir).bitmap &&
+			rm $midx &&
+
+			# Then write a new MIDX.
+			test_commit new &&
+			git repack &&
+			git multi-pack-index write --bitmap &&
+
+			test_path_is_file $midx &&
+			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+			test_path_is_missing $stale_bitmap
+		)
+	'
 
-		test_path_is_file $midx &&
-		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
-		test_path_is_missing $stale_bitmap
-	)
-'
+	test_expect_success 'pack.preferBitmapTips' '
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-test_expect_success 'pack.preferBitmapTips' '
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
+			test_commit_bulk --message="%s" 103 &&
 
-		test_commit_bulk --message="%s" 103 &&
+			git log --format="%H" >commits.raw &&
+			sort <commits.raw >commits &&
 
-		git log --format="%H" >commits.raw &&
-		sort <commits.raw >commits &&
+			git log --format="create refs/tags/%s %H" HEAD >refs &&
+			git update-ref --stdin <refs &&
 
-		git log --format="create refs/tags/%s %H" HEAD >refs &&
-		git update-ref --stdin <refs &&
+			git multi-pack-index write --bitmap &&
+			test_path_is_file $midx &&
+			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
 
-		git multi-pack-index write --bitmap &&
-		test_path_is_file $midx &&
-		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+			comm -13 bitmaps commits >before &&
+			test_line_count = 1 before &&
 
-		test-tool bitmap list-commits | sort >bitmaps &&
-		comm -13 bitmaps commits >before &&
-		test_line_count = 1 before &&
+			perl -ne "printf(\"create refs/tags/include/%d \", $.); print" \
+				<before | git update-ref --stdin &&
 
-		perl -ne "printf(\"create refs/tags/include/%d \", $.); print" \
-			<before | git update-ref --stdin &&
+			rm -fr $midx-$(midx_checksum $objdir).bitmap &&
+			rm -fr $midx &&
 
-		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
-		rm -fr $midx &&
+			git -c pack.preferBitmapTips=refs/tags/include \
+				multi-pack-index write --bitmap &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+			comm -13 bitmaps commits >after &&
 
-		git -c pack.preferBitmapTips=refs/tags/include \
-			multi-pack-index write --bitmap &&
-		test-tool bitmap list-commits | sort >bitmaps &&
-		comm -13 bitmaps commits >after &&
+			! test_cmp before after
+		)
+	'
 
-		! test_cmp before after
-	)
-'
+	test_expect_success 'writing a bitmap with --refs-snapshot' '
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-test_expect_success 'writing a bitmap with --refs-snapshot' '
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
+			test_commit one &&
+			test_commit two &&
 
-		test_commit one &&
-		test_commit two &&
+			git rev-parse one >snapshot &&
 
-		git rev-parse one >snapshot &&
+			git repack -ad &&
 
-		git repack -ad &&
+			# First, write a MIDX which see both refs/tags/one and
+			# refs/tags/two (causing both of those commits to receive
+			# bitmaps).
+			git multi-pack-index write --bitmap &&
 
-		# First, write a MIDX which see both refs/tags/one and
-		# refs/tags/two (causing both of those commits to receive
-		# bitmaps).
-		git multi-pack-index write --bitmap &&
+			test_path_is_file $midx &&
+			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
 
-		test_path_is_file $midx &&
-		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+			grep "$(git rev-parse one)" bitmaps &&
+			grep "$(git rev-parse two)" bitmaps &&
 
-		test-tool bitmap list-commits | sort >bitmaps &&
-		grep "$(git rev-parse one)" bitmaps &&
-		grep "$(git rev-parse two)" bitmaps &&
+			rm -fr $midx-$(midx_checksum $objdir).bitmap &&
+			rm -fr $midx &&
 
-		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
-		rm -fr $midx &&
+			# Then again, but with a refs snapshot which only sees
+			# refs/tags/one.
+			git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
 
-		# Then again, but with a refs snapshot which only sees
-		# refs/tags/one.
-		git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
+			test_path_is_file $midx &&
+			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
 
-		test_path_is_file $midx &&
-		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+			grep "$(git rev-parse one)" bitmaps &&
+			! grep "$(git rev-parse two)" bitmaps
+		)
+	'
 
-		test-tool bitmap list-commits | sort >bitmaps &&
-		grep "$(git rev-parse one)" bitmaps &&
-		! grep "$(git rev-parse two)" bitmaps
-	)
-'
+	test_expect_success 'write a bitmap with --refs-snapshot (preferred tips)' '
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-test_expect_success 'write a bitmap with --refs-snapshot (preferred tips)' '
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
+			test_commit_bulk --message="%s" 103 &&
 
-		test_commit_bulk --message="%s" 103 &&
+			git log --format="%H" >commits.raw &&
+			sort <commits.raw >commits &&
 
-		git log --format="%H" >commits.raw &&
-		sort <commits.raw >commits &&
+			git log --format="create refs/tags/%s %H" HEAD >refs &&
+			git update-ref --stdin <refs &&
 
-		git log --format="create refs/tags/%s %H" HEAD >refs &&
-		git update-ref --stdin <refs &&
+			git multi-pack-index write --bitmap &&
+			test_path_is_file $midx &&
+			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
 
-		git multi-pack-index write --bitmap &&
-		test_path_is_file $midx &&
-		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+			comm -13 bitmaps commits >before &&
+			test_line_count = 1 before &&
 
-		test-tool bitmap list-commits | sort >bitmaps &&
-		comm -13 bitmaps commits >before &&
-		test_line_count = 1 before &&
+			(
+				grep -vf before commits.raw &&
+				# mark missing commits as preferred
+				sed "s/^/+/" before
+			) >snapshot &&
 
+			rm -fr $midx-$(midx_checksum $objdir).bitmap &&
+			rm -fr $midx &&
+
+			git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+			comm -13 bitmaps commits >after &&
+
+			! test_cmp before after
+		)
+	'
+
+	test_expect_success 'hash-cache values are propagated from pack bitmaps' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
 		(
-			grep -vf before commits.raw &&
-			# mark missing commits as preferred
-			sed "s/^/+/" before
-		) >snapshot &&
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
-		rm -fr $midx &&
+			test_commit base &&
+			test_commit base2 &&
+			git repack -adb &&
 
-		git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
-		test-tool bitmap list-commits | sort >bitmaps &&
-		comm -13 bitmaps commits >after &&
+			test-tool bitmap dump-hashes >pack.raw &&
+			test_file_not_empty pack.raw &&
+			sort pack.raw >pack.hashes &&
 
-		! test_cmp before after
-	)
-'
+			test_commit new &&
+			git repack &&
+			git multi-pack-index write --bitmap &&
 
-test_expect_success 'hash-cache values are propagated from pack bitmaps' '
-	rm -fr repo &&
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
+			test-tool bitmap dump-hashes >midx.raw &&
+			sort midx.raw >midx.hashes &&
 
-		test_commit base &&
-		test_commit base2 &&
-		git repack -adb &&
+			# ensure that every namehash in the pack bitmap can be found in
+			# the midx bitmap (i.e., that there are no oid-namehash pairs
+			# unique to the pack bitmap).
+			comm -23 pack.hashes midx.hashes >dropped.hashes &&
+			test_must_be_empty dropped.hashes
+		)
+	'
 
-		test-tool bitmap dump-hashes >pack.raw &&
-		test_file_not_empty pack.raw &&
-		sort pack.raw >pack.hashes &&
+	test_expect_success 'no .bitmap is written without any objects' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-		test_commit new &&
-		git repack &&
-		git multi-pack-index write --bitmap &&
+			empty="$(git pack-objects $objdir/pack/pack </dev/null)" &&
+			cat >packs <<-EOF &&
+			pack-$empty.idx
+			EOF
 
-		test-tool bitmap dump-hashes >midx.raw &&
-		sort midx.raw >midx.hashes &&
+			git multi-pack-index write --bitmap --stdin-packs \
+				<packs 2>err &&
 
-		# ensure that every namehash in the pack bitmap can be found in
-		# the midx bitmap (i.e., that there are no oid-namehash pairs
-		# unique to the pack bitmap).
-		comm -23 pack.hashes midx.hashes >dropped.hashes &&
-		test_must_be_empty dropped.hashes
-	)
-'
+			grep "bitmap without any objects" err &&
 
-test_expect_success 'no .bitmap is written without any objects' '
-	rm -fr repo &&
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
+			test_path_is_file $midx &&
+			test_path_is_missing $midx-$(midx_checksum $objdir).bitmap
+		)
+	'
+
+	test_expect_success 'graceful fallback when missing reverse index' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-		empty="$(git pack-objects $objdir/pack/pack </dev/null)" &&
-		cat >packs <<-EOF &&
-		pack-$empty.idx
-		EOF
+			test_commit base &&
 
-		git multi-pack-index write --bitmap --stdin-packs \
-			<packs 2>err &&
+			# write a pack and MIDX bitmap containing base
+			git repack -adb &&
+			git multi-pack-index write --bitmap &&
 
-		grep "bitmap without any objects" err &&
+			GIT_TEST_MIDX_READ_RIDX=0 \
+				git rev-list --use-bitmap-index HEAD 2>err &&
+			! grep "ignoring extra bitmap file" err
+		)
+	'
+}
 
-		test_path_is_file $midx &&
-		test_path_is_missing $midx-$(midx_checksum $objdir).bitmap
-	)
-'
+test_midx_bitmap_cases
+
+test_midx_bitmap_cases "pack.writeBitmapLookupTable"
 
-test_expect_success 'graceful fallback when missing reverse index' '
+test_expect_success 'multi-pack-index write writes lookup table if enabled' '
 	rm -fr repo &&
 	git init repo &&
 	test_when_finished "rm -fr repo" &&
 	(
 		cd repo &&
-
 		test_commit base &&
-
-		# write a pack and MIDX bitmap containing base
-		git repack -adb &&
-		git multi-pack-index write --bitmap &&
-
-		GIT_TEST_MIDX_READ_RIDX=0 \
-			git rev-list --use-bitmap-index HEAD 2>err &&
-		! grep "ignoring extra bitmap file" err
+		git config pack.writeBitmapLookupTable true &&
+		git repack -ad &&
+		GIT_TRACE2_EVENT="$(pwd)/trace" \
+			git multi-pack-index write --bitmap &&
+		grep "\"label\":\"writing_lookup_table\"" trace
 	)
 '
 
diff --git a/t/t5327-multi-pack-bitmaps-rev.sh b/t/t5327-multi-pack-bitmaps-rev.sh
index d30ba632c87..d01c61c0c7e 100755
--- a/t/t5327-multi-pack-bitmaps-rev.sh
+++ b/t/t5327-multi-pack-bitmaps-rev.sh
@@ -20,4 +20,13 @@ export GIT_TEST_MIDX_READ_RIDX
 midx_bitmap_core rev
 midx_bitmap_partial_tests rev
 
+test_expect_success 'reinitialize the repository with lookup table enabled' '
+	rm -fr * .git &&
+	git init &&
+	git config pack.writeBitmapLookupTable true
+'
+
+midx_bitmap_core rev
+midx_bitmap_partial_tests rev
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 162+ messages in thread

* [PATCH v4 4/6] pack-bitmap: prepare to read lookup table extension
  2022-07-20 14:05     ` [PATCH v4 " Abhradeep Chakraborty via GitGitGadget
                         ` (2 preceding siblings ...)
  2022-07-20 14:05       ` [PATCH v4 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests Abhradeep Chakraborty via GitGitGadget
@ 2022-07-20 14:05       ` Abhradeep Chakraborty via GitGitGadget
  2022-07-20 14:05       ` [PATCH v4 5/6] p5310-pack-bitmaps.sh: enable `pack.writeReverseIndex` Abhradeep Chakraborty via GitGitGadget
                         ` (2 subsequent siblings)
  6 siblings, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-07-20 14:05 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren, Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

An earlier change taught Git to write the bitmap lookup table, but Git
does not yet know how to parse it.

Teach Git to parse an existing bitmap lookup table. Older versions of
Git are not affected by it: those versions ignore the lookup table.

Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 pack-bitmap.c           | 278 ++++++++++++++++++++++++++++++++++++++--
 pack-bitmap.h           |   9 ++
 t/t5310-pack-bitmaps.sh |  22 ++++
 3 files changed, 299 insertions(+), 10 deletions(-)
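
For readers of this patch, here is a minimal standalone sketch of the row
lookup that the new parsing code performs. It is an illustration under
stated assumptions, not the code in pack-bitmap.c: the 16-byte row width is
inferred from the fields the patch reads (a 4-byte commit position, an
8-byte offset, a 4-byte xor row, all big-endian), and the names
TRIPLET_WIDTH, NO_XOR_ROW, encode_triplet() and find_triplet() are
hypothetical. The rows are assumed to be sorted by commit position, which
the binary search in the patch relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed row width: 4-byte commit_pos + 8-byte offset + 4-byte xor_row. */
#define TRIPLET_WIDTH 16
/* Sentinel the patch compares against to mean "no xor bitmap". */
#define NO_XOR_ROW 0xffffffffu

struct triplet {
	uint32_t commit_pos;
	uint64_t offset;
	uint32_t xor_row;
};

uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

void write_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

/* Hypothetical helper: serialize one row, only used to build sample data. */
void encode_triplet(unsigned char *row, uint32_t commit_pos,
		    uint64_t offset, uint32_t xor_row)
{
	write_be32(row, commit_pos);
	write_be32(row + 4, (uint32_t)(offset >> 32));
	write_be32(row + 8, (uint32_t)offset);
	write_be32(row + 12, xor_row);
}

/* bsearch() comparator: key is a host-order uint32_t, row is a raw entry. */
int row_cmp(const void *key, const void *row)
{
	uint32_t a = *(const uint32_t *)key;
	uint32_t b = read_be32(row);

	return (a > b) - (a < b);
}

/* Returns 0 and fills `out` on success, -1 if no row has `commit_pos`. */
int find_triplet(const unsigned char *table, size_t nr,
		 uint32_t commit_pos, struct triplet *out)
{
	const unsigned char *p;

	assert(table && out);
	p = bsearch(&commit_pos, table, nr, TRIPLET_WIDTH, row_cmp);
	if (!p)
		return -1;

	out->commit_pos = read_be32(p);
	out->offset = read_be64(p + 4);
	out->xor_row = read_be32(p + 12);
	return 0;
}
```

A caller would look up the row for a commit position and, while xor_row is
not NO_XOR_ROW, read the row at that index to walk the xor chain, in the
same way lazy_bitmap_for_commit() does in the diff below.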

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 36134222d7a..7c66d4379f5 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -82,6 +82,12 @@ struct bitmap_index {
 	/* The checksum of the packfile or MIDX; points into map. */
 	const unsigned char *checksum;
 
+	/*
+	 * If not NULL, this points into the commit table extension
+	 * (within the memory mapped region `map`).
+	 */
+	unsigned char *table_lookup;
+
 	/*
 	 * Extended index.
 	 *
@@ -185,6 +191,16 @@ static int load_bitmap_header(struct bitmap_index *index)
 			index->hashes = (void *)(index_end - cache_size);
 			index_end -= cache_size;
 		}
+
+		if (flags & BITMAP_OPT_LOOKUP_TABLE) {
+			size_t table_size = st_mult(ntohl(header->entry_count),
+						    BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH);
+			if (table_size > index_end - index->map - header_size)
+				return error(_("corrupted bitmap index file (too short to fit lookup table)"));
+			if (git_env_bool("GIT_TEST_READ_COMMIT_TABLE", 1))
+				index->table_lookup = (void *)(index_end - table_size);
+			index_end -= table_size;
+		}
 	}
 
 	index->entry_count = ntohl(header->entry_count);
@@ -211,11 +227,13 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
 
 	hash_pos = kh_put_oid_map(index->bitmaps, stored->oid, &ret);
 
-	/* a 0 return code means the insertion succeeded with no changes,
-	 * because the SHA1 already existed on the map. this is bad, there
-	 * shouldn't be duplicated commits in the index */
+	/*
+	 * A 0 return code means the insertion succeeded with no changes,
+	 * because the SHA1 already existed on the map. This is bad, there
+	 * shouldn't be duplicated commits in the index.
+	 */
 	if (ret == 0) {
-		error("Duplicate entry in bitmap index: %s", oid_to_hex(oid));
+		error(_("duplicate entry in bitmap index: %s"), oid_to_hex(oid));
 		return NULL;
 	}
 
@@ -470,7 +488,7 @@ static int load_bitmap(struct bitmap_index *bitmap_git)
 		!(bitmap_git->tags = read_bitmap_1(bitmap_git)))
 		goto failed;
 
-	if (load_bitmap_entries_v1(bitmap_git) < 0)
+	if (!bitmap_git->table_lookup && load_bitmap_entries_v1(bitmap_git) < 0)
 		goto failed;
 
 	return 0;
@@ -557,13 +575,241 @@ struct include_data {
 	struct bitmap *seen;
 };
 
+struct bitmap_lookup_table_triplet {
+	uint32_t commit_pos;
+	uint64_t offset;
+	uint32_t xor_row;
+};
+
+struct bitmap_lookup_table_xor_item {
+	struct object_id oid;
+	uint64_t offset;
+};
+
+/*
+ * Given a `triplet` struct pointer and pointer `p`, this
+ * function reads the triplet beginning at `p` into the struct.
+ * Note that this function assumes that there is enough memory
+ * left for filling the `triplet` struct from `p`.
+ */
+static int lookup_table_get_triplet_by_pointer(struct bitmap_lookup_table_triplet *triplet,
+					       const unsigned char *p)
+{
+	if (!triplet)
+		return -1;
+
+	triplet->commit_pos = get_be32(p);
+	p += sizeof(uint32_t);
+	triplet->offset = get_be64(p);
+	p += sizeof(uint64_t);
+	triplet->xor_row = get_be32(p);
+	return 0;
+}
+
+/*
+ * This function gets the raw triplet from the `pos`'th row in the
+ * lookup table and fills `triplet` with that data.
+ */
+static int lookup_table_get_triplet(struct bitmap_index *bitmap_git,
+				    uint32_t pos,
+				    struct bitmap_lookup_table_triplet *triplet)
+{
+	unsigned char *p = NULL;
+	if (pos >= bitmap_git->entry_count)
+		return error(_("corrupt bitmap lookup table: triplet position out of index"));
+
+	p = bitmap_git->table_lookup + st_mult(pos, BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH);
+
+	return lookup_table_get_triplet_by_pointer(triplet, p);
+}
+
+/*
+ * Searches for a matching triplet. `commit_pos` is a pointer
+ * to the wanted commit position value. `table_entry` points to
+ * a triplet in the lookup table. The first 4 bytes of each
+ * triplet (pointed to by `table_entry`) are compared with `*commit_pos`.
+ */
+static int triplet_cmp(const void *commit_pos, const void *table_entry)
+{
+
+	uint32_t a = *(uint32_t *)commit_pos;
+	uint32_t b = get_be32(table_entry);
+	if (a > b)
+		return 1;
+	else if (a < b)
+		return -1;
+
+	return 0;
+}
+
+static int bsearch_pos(struct bitmap_index *bitmap_git,
+		       struct object_id *oid,
+		       uint32_t *result)
+{
+	int found;
+
+	if (bitmap_is_midx(bitmap_git))
+		found = bsearch_midx(oid, bitmap_git->midx, result);
+	else
+		found = bsearch_pack(oid, bitmap_git->pack, result);
+
+	return found;
+}
+
+/*
+ * `bsearch_triplet_by_pos` function searches for the raw triplet
+ * having commit position same as `commit_pos` and fills `triplet`
+ * object from the raw triplet. Returns 0 on success and -1 on
+ * failure.
+ */
+static int bsearch_triplet_by_pos(uint32_t commit_pos,
+				  struct bitmap_index *bitmap_git,
+				  struct bitmap_lookup_table_triplet *triplet)
+{
+	unsigned char *p = bsearch(&commit_pos, bitmap_git->table_lookup, bitmap_git->entry_count,
+				   BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH, triplet_cmp);
+
+	if (!p)
+		return -1;
+
+	return lookup_table_get_triplet_by_pointer(triplet, p);
+}
+
+static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_git,
+					  struct commit *commit)
+{
+	uint32_t commit_pos, xor_row;
+	uint64_t offset;
+	int flags, found;
+	struct bitmap_lookup_table_triplet triplet;
+	struct object_id *oid = &commit->object.oid;
+	struct ewah_bitmap *bitmap;
+	struct stored_bitmap *xor_bitmap = NULL;
+	const int bitmap_header_size = 6;
+	static struct bitmap_lookup_table_xor_item *xor_items = NULL;
+	static size_t xor_items_nr = 0, xor_items_alloc = 0;
+	static int is_corrupt = 0;
+
+	if (is_corrupt)
+		return NULL;
+
+	found = bsearch_pos(bitmap_git, oid, &commit_pos);
+
+	if (!found)
+		return NULL;
+
+	if (bsearch_triplet_by_pos(commit_pos, bitmap_git, &triplet) < 0)
+		return NULL;
+
+	xor_items_nr = 0;
+	offset = triplet.offset;
+	xor_row = triplet.xor_row;
+
+	if (xor_row != 0xffffffff) {
+		int xor_flags;
+		khiter_t hash_pos;
+		uint64_t offset_xor;
+		struct bitmap_lookup_table_xor_item *xor_item;
+
+		while (xor_row != 0xffffffff) {
+			ALLOC_GROW(xor_items, xor_items_nr + 1, xor_items_alloc);
+
+			if (xor_items_nr + 1 >= bitmap_git->entry_count) {
+				error(_("corrupt bitmap lookup table: xor chain exceeds entry count"));
+				goto corrupt;
+			}
+
+			if (lookup_table_get_triplet(bitmap_git, xor_row, &triplet) < 0)
+				goto corrupt;
+
+			xor_item = &xor_items[xor_items_nr];
+			xor_item->offset = triplet.offset;
+
+			if (nth_bitmap_object_oid(bitmap_git, &xor_item->oid, triplet.commit_pos) < 0) {
+				error(_("corrupt bitmap lookup table: commit index %u out of range"),
+					triplet.commit_pos);
+				goto corrupt;
+			}
+
+			hash_pos = kh_get_oid_map(bitmap_git->bitmaps, xor_item->oid);
+
+			/*
+			 * If the desired bitmap is already stored, we don't need
+			 * to iterate further, because we know that the bitmaps
+			 * that must be parsed in order to parse this bitmap
+			 * have already been stored. So, assign this stored bitmap
+			 * to the xor_bitmap.
+			 */
+			if (hash_pos < kh_end(bitmap_git->bitmaps) &&
+			    (xor_bitmap = kh_value(bitmap_git->bitmaps, hash_pos)))
+				break;
+			xor_items_nr++;
+			xor_row = triplet.xor_row;
+		}
+
+		while (xor_items_nr) {
+			xor_item = &xor_items[xor_items_nr - 1];
+			offset_xor = xor_item->offset;
+
+			bitmap_git->map_pos = offset_xor;
+			if (bitmap_git->map_size - bitmap_git->map_pos < bitmap_header_size) {
+				error(_("corrupt ewah bitmap: truncated header for bitmap of commit \"%s\""),
+					oid_to_hex(&xor_item->oid));
+				goto corrupt;
+			}
+
+			bitmap_git->map_pos = bitmap_git->map_pos + sizeof(uint32_t) + sizeof(uint8_t);
+			xor_flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
+			bitmap = read_bitmap_1(bitmap_git);
+
+			if (!bitmap)
+				goto corrupt;
+
+			xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_item->oid, xor_bitmap, xor_flags);
+			xor_items_nr--;
+		}
+	}
+
+	bitmap_git->map_pos = offset;
+	if (bitmap_git->map_size - bitmap_git->map_pos < bitmap_header_size) {
+		error(_("corrupt ewah bitmap: truncated header for bitmap of commit \"%s\""),
+			oid_to_hex(oid));
+		goto corrupt;
+	}
+
+	bitmap_git->map_pos = bitmap_git->map_pos + sizeof(uint32_t) + sizeof(uint8_t);
+	flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
+	bitmap = read_bitmap_1(bitmap_git);
+
+	if (!bitmap)
+		goto corrupt;
+
+	return store_bitmap(bitmap_git, bitmap, oid, xor_bitmap, flags);
+
+corrupt:
+	free(xor_items);
+	is_corrupt = 1;
+	return NULL;
+}
+
 struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
 				      struct commit *commit)
 {
 	khiter_t hash_pos = kh_get_oid_map(bitmap_git->bitmaps,
 					   commit->object.oid);
-	if (hash_pos >= kh_end(bitmap_git->bitmaps))
-		return NULL;
+	if (hash_pos >= kh_end(bitmap_git->bitmaps)) {
+		struct stored_bitmap *bitmap = NULL;
+		if (!bitmap_git->table_lookup)
+			return NULL;
+
+		trace2_region_enter("pack-bitmap", "reading_lookup_table", the_repository);
+		/* NEEDSWORK: cache misses aren't recorded */
+		bitmap = lazy_bitmap_for_commit(bitmap_git, commit);
+		trace2_region_leave("pack-bitmap", "reading_lookup_table", the_repository);
+		if (!bitmap)
+			return NULL;
+		return lookup_stored_bitmap(bitmap);
+	}
 	return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos));
 }
 
@@ -1699,8 +1945,10 @@ void test_bitmap_walk(struct rev_info *revs)
 	if (revs->pending.nr != 1)
 		die("you must specify exactly one commit to test");
 
-	fprintf(stderr, "Bitmap v%d test (%d entries loaded)\n",
-		bitmap_git->version, bitmap_git->entry_count);
+	fprintf(stderr, "Bitmap v%d test (%d entries%s)\n",
+		bitmap_git->version,
+		bitmap_git->entry_count,
+		bitmap_git->table_lookup ? "" : " loaded");
 
 	root = revs->pending.objects[0].item;
 	bm = bitmap_for_commit(bitmap_git, (struct commit *)root);
@@ -1753,13 +2001,23 @@ void test_bitmap_walk(struct rev_info *revs)
 
 int test_bitmap_commits(struct repository *r)
 {
-	struct bitmap_index *bitmap_git = prepare_bitmap_git(r);
 	struct object_id oid;
 	MAYBE_UNUSED void *value;
+	struct bitmap_index *bitmap_git = prepare_bitmap_git(r);
+
+	/*
+	 * As this function is only used to print bitmap selected
+	 * commits, we don't have to read the commit table.
+	 */
 
 	if (!bitmap_git)
 		die("failed to load bitmap indexes");
 
+	if (bitmap_git->table_lookup) {
+		if (load_bitmap_entries_v1(bitmap_git) < 0)
+			die(_("failed to load bitmap indexes"));
+	}
+
 	kh_foreach(bitmap_git->bitmaps, oid, value, {
 		printf("%s\n", oid_to_hex(&oid));
 	});
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 67a9d0fc303..9278f71ac91 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -23,6 +23,15 @@ struct bitmap_disk_header {
 
 #define NEEDS_BITMAP (1u<<22)
 
+/*
+ * The width in bytes of a single triplet in the lookup table
+ * extension:
+ *     (commit_pos, offset, xor_row)
+ *
+ * whose fields are 32-, 64-, 32- bits wide, respectively.
+ */
+#define BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH (16)
+
 enum pack_bitmap_opts {
 	BITMAP_OPT_FULL_DAG = 0x1,
 	BITMAP_OPT_HASH_CACHE = 0x4,
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index c0607172827..7e50f8e7653 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -258,6 +258,7 @@ test_bitmap_cases () {
 
 	test_expect_success 'truncated bitmap fails gracefully (ewah)' '
 		test_config pack.writebitmaphashcache false &&
+		test_config pack.writebitmaplookuptable false &&
 		git repack -ad &&
 		git rev-list --use-bitmap-index --count --all >expect &&
 		bitmap=$(ls .git/objects/pack/*.bitmap) &&
@@ -270,6 +271,7 @@ test_bitmap_cases () {
 	'
 
 	test_expect_success 'truncated bitmap fails gracefully (cache)' '
+		git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 		git repack -ad &&
 		git rev-list --use-bitmap-index --count --all >expect &&
 		bitmap=$(ls .git/objects/pack/*.bitmap) &&
@@ -453,4 +455,24 @@ test_expect_success 'verify writing bitmap lookup table when enabled' '
 	grep "\"label\":\"writing_lookup_table\"" trace2
 '
 
+test_expect_success 'lookup table is actually used to traverse objects' '
+	git repack -adb &&
+	GIT_TRACE2_EVENT="$(pwd)/trace3" \
+		git rev-list --use-bitmap-index --count --all &&
+	grep "\"label\":\"reading_lookup_table\"" trace3
+'
+
+test_expect_success 'truncated bitmap fails gracefully (lookup table)' '
+	test_config pack.writebitmaphashcache false &&
+	git repack -adb &&
+	git rev-list --use-bitmap-index --count --all >expect &&
+	bitmap=$(ls .git/objects/pack/*.bitmap) &&
+	test_when_finished "rm -f $bitmap" &&
+	test_copy_bytes 512 <$bitmap >$bitmap.tmp &&
+	mv -f $bitmap.tmp $bitmap &&
+	git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
+	test_cmp expect actual &&
+	test_i18ngrep corrupted.bitmap.index stderr
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 162+ messages in thread

* [PATCH v4 5/6] p5310-pack-bitmaps.sh: enable `pack.writeReverseIndex`
  2022-07-20 14:05     ` [PATCH v4 " Abhradeep Chakraborty via GitGitGadget
                         ` (3 preceding siblings ...)
  2022-07-20 14:05       ` [PATCH v4 4/6] pack-bitmap: prepare to read lookup table extension Abhradeep Chakraborty via GitGitGadget
@ 2022-07-20 14:05       ` Abhradeep Chakraborty via GitGitGadget
  2022-07-20 14:05       ` [PATCH v4 6/6] bitmap-lookup-table: add performance tests for lookup table Abhradeep Chakraborty via GitGitGadget
  2022-07-20 18:38       ` [PATCH v5 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
  6 siblings, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-07-20 14:05 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren, Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

Enable `pack.writeReverseIndex` before running pack-bitmap related
performance tests.

The performance results with `pack.writeReverseIndex` enabled and
disabled are given below.

With `pack.writeReverseIndex`:
-------------------------------

Test                                                 this tree
-------------------------------------------------------------------------
5310.3: repack to disk                                 296.55(256.53+14.52)
5310.4: simulated clone                                15.64(8.88+1.39)
5310.5: simulated fetch                                1.65(2.75+0.20)
5310.6: pack to file (bitmap)                          48.71(30.20+7.58)
5310.7: rev-list (commits)                             0.61(0.41+0.08)
5310.8: rev-list (objects)                             4.38(4.26+0.09)
5310.9: rev-list with tag negated via --not            0.07(0.02+0.04)
         --all (objects)
5310.10: rev-list with negative tag (objects)          0.05(0.01+0.03)
5310.11: rev-list count with blob:none                 0.08(0.03+0.04)
5310.12: rev-list count with blob:limit=1k             7.29(6.92+0.30)
5310.13: rev-list count with tree:0                    0.08(0.03+0.04)
5310.14: simulated partial clone                       9.45(8.12+0.41)
5310.16: clone (partial bitmap)                        17.02(10.61+2.67)
5310.17: pack to file (partial bitmap)                 51.91(28.57+7.48)
5310.18: rev-list with tree filter (partial bitmap)    1.00(0.22+0.24)

Without `pack.writeReverseIndex`:
-----------------------------

Test                                                  this tree
------------------------------------------------------------------------
5310.3: repack to disk                              293.80(251.30+14.30)
5310.4: simulated clone                             12.50(5.15+1.36)
5310.5: simulated fetch                             1.83(2.90+0.23)
5310.6: pack to file (bitmap)                       39.70(20.25+7.14)
5310.7: rev-list (commits)                          1.00(0.60+0.13)
5310.8: rev-list (objects)                          4.11(4.00+0.10)
5310.9: rev-list with tag negated via --not         0.07(0.02+0.05)
         --all (objects)
5310.10: rev-list with negative tag (objects)       0.23(0.16+0.06)
5310.11: rev-list count with blob:none              0.27(0.18+0.08)
5310.12: rev-list count with blob:limit=1k          6.41(5.98+0.41)
5310.13: rev-list count with tree:0                 0.26(0.18+0.07)
5310.14: simulated partial clone                    4.34(3.29+0.37)
5310.16: clone (partial bitmap)                     21.48(15.12+2.42)
5310.17: pack to file (partial bitmap)              47.35(37.80+4.84)
5310.18: rev-list with tree filter (partial bitmap) 0.73(0.07+0.21)

Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 t/perf/p5310-pack-bitmaps.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index 7ad4f237bc3..6e8abcd5b21 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -13,7 +13,8 @@ test_perf_large_repo
 # We intentionally use the deprecated pack.writebitmaps
 # config so that we can test against older versions of git.
 test_expect_success 'setup bitmap config' '
-	git config pack.writebitmaps true
+	git config pack.writebitmaps true &&
+	git config pack.writeReverseIndex true
 '
 
 # we need to create the tag up front such that it is covered by the repack and
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 162+ messages in thread

* [PATCH v4 6/6] bitmap-lookup-table: add performance tests for lookup table
  2022-07-20 14:05     ` [PATCH v4 " Abhradeep Chakraborty via GitGitGadget
                         ` (4 preceding siblings ...)
  2022-07-20 14:05       ` [PATCH v4 5/6] p5310-pack-bitmaps.sh: enable `pack.writeReverseIndex` Abhradeep Chakraborty via GitGitGadget
@ 2022-07-20 14:05       ` Abhradeep Chakraborty via GitGitGadget
  2022-07-20 18:38       ` [PATCH v5 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
  6 siblings, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-07-20 14:05 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren, Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

Add performance tests to measure the performance of the lookup table
with `pack.writeReverseIndex` enabled.

The lookup table makes Git run faster in most cases. Below are the
results of `t/perf/p5310-pack-bitmaps.sh`;
`t/perf/p5326-multi-pack-bitmaps.sh` gives similar results. The
repository used in the test is the Linux kernel.

Test                                                      this tree
---------------------------------------------------------------------------
5310.4: repack to disk (lookup=false)                   296.55(256.53+14.52)
5310.5: simulated clone                                 15.64(8.88+1.39)
5310.6: simulated fetch                                 1.65(2.75+0.20)
5310.7: pack to file (bitmap)                           48.71(30.20+7.58)
5310.8: rev-list (commits)                              0.61(0.41+0.08)
5310.9: rev-list (objects)                              4.38(4.26+0.09)
5310.10: rev-list with tag negated via --not            0.07(0.02+0.04)
         --all (objects)
5310.11: rev-list with negative tag (objects)           0.05(0.01+0.03)
5310.12: rev-list count with blob:none                  0.08(0.03+0.04)
5310.13: rev-list count with blob:limit=1k              7.29(6.92+0.30)
5310.14: rev-list count with tree:0                     0.08(0.03+0.04)
5310.15: simulated partial clone                        9.45(8.12+0.41)
5310.17: clone (partial bitmap)                         21.00(15.04+2.39)
5310.18: pack to file (partial bitmap)                  47.98(38.13+5.23)
5310.19: rev-list with tree filter (partial bitmap)     0.70(0.07+0.20)
5310.22: repack to disk (lookup=true)                   255.92(188.13+20.47)
5310.23: simulated clone                                13.78(8.84+1.09)
5310.24: simulated fetch                                0.52(0.63+0.14)
5310.25: pack to file (bitmap)                          44.34(28.94+6.84)
5310.26: rev-list (commits)                             0.48(0.31+0.06)
5310.27: rev-list (objects)                             4.02(3.93+0.07)
5310.28: rev-list with tag negated via --not            0.04(0.00+0.03)
         --all (objects)
5310.29: rev-list with negative tag (objects)           0.04(0.00+0.03)
5310.30: rev-list count with blob:none                  0.04(0.01+0.03)
5310.31: rev-list count with blob:limit=1k              6.48(6.23+0.22)
5310.32: rev-list count with tree:0                     0.04(0.01+0.03)
5310.33: simulated partial clone                        8.30(7.21+0.36)
5310.35: clone (partial bitmap)                         20.34(15.00+2.41)
5310.36: pack to file (partial bitmap)                  46.45(38.05+5.20)
5310.37: rev-list with tree filter (partial bitmap)     0.61(0.06+0.20)

Tests 4-19 run without the lookup table. The same tests are
repeated as 22-37 with the lookup table enabled.

Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 t/perf/p5310-pack-bitmaps.sh       | 65 +++++++++++---------
 t/perf/p5326-multi-pack-bitmaps.sh | 95 +++++++++++++++++-------------
 2 files changed, 91 insertions(+), 69 deletions(-)

diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index 6e8abcd5b21..adc753b6177 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -17,39 +17,50 @@ test_expect_success 'setup bitmap config' '
 	git config pack.writeReverseIndex true
 '
 
-# we need to create the tag up front such that it is covered by the repack and
-# thus by generated bitmaps.
-test_expect_success 'create tags' '
-	git tag --message="tag pointing to HEAD" perf-tag HEAD
-'
+test_bitmap () {
+	local enabled="$1"
 
-test_perf 'repack to disk' '
-	git repack -ad
-'
+	# we need to create the tag up front such that it is covered by the repack and
+	# thus by generated bitmaps.
+	test_expect_success 'create tags' '
+		git tag --message="tag pointing to HEAD" perf-tag HEAD
+	'
 
-test_full_bitmap
+	test_expect_success "use lookup table: $enabled" '
+		git config pack.writeBitmapLookupTable '"$enabled"'
+	'
 
-test_expect_success 'create partial bitmap state' '
-	# pick a commit to represent the repo tip in the past
-	cutoff=$(git rev-list HEAD~100 -1) &&
-	orig_tip=$(git rev-parse HEAD) &&
+	test_perf "repack to disk (lookup=$enabled)" '
+		git repack -ad
+	'
 
-	# now kill off all of the refs and pretend we had
-	# just the one tip
-	rm -rf .git/logs .git/refs/* .git/packed-refs &&
-	git update-ref HEAD $cutoff &&
+	test_full_bitmap
 
-	# and then repack, which will leave us with a nice
-	# big bitmap pack of the "old" history, and all of
-	# the new history will be loose, as if it had been pushed
-	# up incrementally and exploded via unpack-objects
-	git repack -Ad &&
+	test_expect_success "create partial bitmap state (lookup=$enabled)" '
+		# pick a commit to represent the repo tip in the past
+		cutoff=$(git rev-list HEAD~100 -1) &&
+		orig_tip=$(git rev-parse HEAD) &&
 
-	# and now restore our original tip, as if the pushes
-	# had happened
-	git update-ref HEAD $orig_tip
-'
+		# now kill off all of the refs and pretend we had
+		# just the one tip
+		rm -rf .git/logs .git/refs/* .git/packed-refs &&
+		git update-ref HEAD $cutoff &&
+
+		# and then repack, which will leave us with a nice
+		# big bitmap pack of the "old" history, and all of
+		# the new history will be loose, as if it had been pushed
+		# up incrementally and exploded via unpack-objects
+		git repack -Ad &&
+
+		# and now restore our original tip, as if the pushes
+		# had happened
+		git update-ref HEAD $orig_tip
+	'
+
+	test_partial_bitmap
+}
 
-test_partial_bitmap
+test_bitmap false
+test_bitmap true
 
 test_done
diff --git a/t/perf/p5326-multi-pack-bitmaps.sh b/t/perf/p5326-multi-pack-bitmaps.sh
index f2fa228f16a..1f4c7103529 100755
--- a/t/perf/p5326-multi-pack-bitmaps.sh
+++ b/t/perf/p5326-multi-pack-bitmaps.sh
@@ -6,47 +6,58 @@ test_description='Tests performance using midx bitmaps'
 
 test_perf_large_repo
 
-# we need to create the tag up front such that it is covered by the repack and
-# thus by generated bitmaps.
-test_expect_success 'create tags' '
-	git tag --message="tag pointing to HEAD" perf-tag HEAD
-'
-
-test_expect_success 'start with bitmapped pack' '
-	git repack -adb
-'
-
-test_perf 'setup multi-pack index' '
-	git multi-pack-index write --bitmap
-'
-
-test_expect_success 'drop pack bitmap' '
-	rm -f .git/objects/pack/pack-*.bitmap
-'
-
-test_full_bitmap
-
-test_expect_success 'create partial bitmap state' '
-	# pick a commit to represent the repo tip in the past
-	cutoff=$(git rev-list HEAD~100 -1) &&
-	orig_tip=$(git rev-parse HEAD) &&
-
-	# now pretend we have just one tip
-	rm -rf .git/logs .git/refs/* .git/packed-refs &&
-	git update-ref HEAD $cutoff &&
-
-	# and then repack, which will leave us with a nice
-	# big bitmap pack of the "old" history, and all of
-	# the new history will be loose, as if it had been pushed
-	# up incrementally and exploded via unpack-objects
-	git repack -Ad &&
-	git multi-pack-index write --bitmap &&
-
-	# and now restore our original tip, as if the pushes
-	# had happened
-	git update-ref HEAD $orig_tip
-'
-
-test_partial_bitmap
+test_bitmap () {
+	local enabled="$1"
+
+	# we need to create the tag up front such that it is covered by the repack and
+	# thus by generated bitmaps.
+	test_expect_success 'create tags' '
+		git tag --message="tag pointing to HEAD" perf-tag HEAD
+	'
+
+	test_expect_success "use lookup table: $enabled" '
+		git config pack.writeBitmapLookupTable '"$enabled"'
+	'
+
+	test_expect_success "start with bitmapped pack (lookup=$enabled)" '
+		git repack -adb
+	'
+
+	test_perf "setup multi-pack index (lookup=$enabled)" '
+		git multi-pack-index write --bitmap
+	'
+
+	test_expect_success "drop pack bitmap (lookup=$enabled)" '
+		rm -f .git/objects/pack/pack-*.bitmap
+	'
+
+	test_full_bitmap
+
+	test_expect_success "create partial bitmap state (lookup=$enabled)" '
+		# pick a commit to represent the repo tip in the past
+		cutoff=$(git rev-list HEAD~100 -1) &&
+		orig_tip=$(git rev-parse HEAD) &&
+
+		# now pretend we have just one tip
+		rm -rf .git/logs .git/refs/* .git/packed-refs &&
+		git update-ref HEAD $cutoff &&
+
+		# and then repack, which will leave us with a nice
+		# big bitmap pack of the "old" history, and all of
+		# the new history will be loose, as if it had been pushed
+		# up incrementally and exploded via unpack-objects
+		git repack -Ad &&
+		git multi-pack-index write --bitmap &&
+
+		# and now restore our original tip, as if the pushes
+		# had happened
+		git update-ref HEAD $orig_tip
+	'
+
+	test_partial_bitmap
+}
+
+test_bitmap false
+test_bitmap true
 
 test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 162+ messages in thread

* [PATCH v5 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format
  2022-07-20 14:05     ` [PATCH v4 " Abhradeep Chakraborty via GitGitGadget
                         ` (5 preceding siblings ...)
  2022-07-20 14:05       ` [PATCH v4 6/6] bitmap-lookup-table: add performance tests for lookup table Abhradeep Chakraborty via GitGitGadget
@ 2022-07-20 18:38       ` Abhradeep Chakraborty via GitGitGadget
  2022-07-20 18:38         ` [PATCH v5 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
                           ` (6 more replies)
  6 siblings, 7 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-07-20 18:38 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren, Abhradeep Chakraborty

When parsing the .bitmap file, Git loads all the bitmaps one by one even if
some of the bitmaps are not necessary. We can remove this overhead by
loading only the necessary bitmaps. A lookup table extension can solve this
issue.

Changes since v4:

 * A test was failing in CI for linux-sha256 in the previous version.
   It is fixed now.

Changes since v3:

 * The common code from both lookup_table_get_triplet() and
   bsearch_triplet_by_pos() is moved to the
   lookup_table_get_triplet_by_pointer() function.
 * The parameter names of the triplet_cmp() function are changed (as
   suggested by Martin).
 * The xor_items array now works as a reusable static buffer.
 * I moved the code that fills the commit_positions array (in
   pack-bitmap-write.c) to the bitmap_writer_finish() function, because
   otherwise we had to iterate over the commit positions twice: once in
   write_selected_commits_v1() and again in write_lookup_table(). Hope
   this is acceptable :)
 * Changes in performance tests (as suggested by Taylor).

Changes since v2:

 * Issues with log messages are fixed.
 * pack.writeBitmapLookupTable is now disabled by default.
 * Documentation is improved.
 * xor_row is used instead of xor_pos in triplets.
 * In pack-bitmap-write.c, off_t * is used for the offsets array (instead of
   uint64_t *).
 * struct bitmap_lookup_table_triplet is introduced, and functions like
   triplet_get_offset() and triplet_get_xor_pos() are removed.
 * table_size is now subtracted from index_end irrespective of the value
   of GIT_TEST_READ_COMMIT_TABLE.
 * The loop that fills the xor stack stops iterating once it reaches a xor
   bitmap that is already stored/parsed.
 * The stack now stores bitmap_lookup_table_xor_item items instead of plain
   xor_row values.
 * Bitmap-related test files are restructured so that tests can be repeated
   with the bitmap lookup table extension enabled.
 * Comments are added.
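
The xor stack described in the list above can be sketched roughly as
follows. This is only an illustration under assumptions: `struct
xor_row_info`, `xor_parse_order()` and the `cached` flag are hypothetical
names and are not taken from the patch, which works on the mmap'ed table and
a hash map of stored bitmaps. The sketch only shows the ordering idea:
follow xor_row links until the sentinel or an already-parsed bitmap is
reached, then parse in reverse.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sentinel xor_row: the bitmap is not XOR-compressed against another. */
#define NO_XOR_ROW 0xffffffffu

/* Hypothetical in-memory view of one lookup table row. */
struct xor_row_info {
	uint32_t xor_row; /* row of the XOR base, or NO_XOR_ROW */
	int cached;       /* non-zero if this bitmap is already parsed */
};

/*
 * Fill `order` with the rows that must be parsed to obtain row `start`,
 * ordered so that every XOR base comes before the row that needs it.
 * Returns the number of rows written, or -1 if the chain leaves the
 * table or needs more than `cap` rows (which also catches cycles).
 */
int xor_parse_order(const struct xor_row_info *rows, uint32_t nr,
		    uint32_t start, uint32_t *order, size_t cap)
{
	size_t n = 0, i;
	uint32_t cur = start;

	assert(rows && order);

	/* Push rows while walking towards the root of the XOR chain. */
	while (cur != NO_XOR_ROW) {
		if (cur >= nr || n == cap)
			return -1;
		if (rows[cur].cached)
			break; /* everything below is already parsed */
		order[n++] = cur;
		cur = rows[cur].xor_row;
	}

	/* Reverse, so that popping the stack becomes a forward walk. */
	for (i = 0; i < n / 2; i++) {
		uint32_t tmp = order[i];
		order[i] = order[n - 1 - i];
		order[n - 1 - i] = tmp;
	}
	return (int)n;
}
```

Bounding the walk by `cap` is a design choice of this sketch, not something
the series is known to do; with `cap` equal to the number of rows, a corrupt
table whose xor_row links form a cycle is rejected instead of looping forever.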

Changes since v1:

This is the second version, which addresses all (I think) of the review
comments. Please let me know if any review comments were not addressed :)

 * The table size is decreased and the format has changed. It now
   contains nr_entries triplets of size 4+8+4 bytes. Each triplet contains:
   (1) a 4-byte commit position (in the pack-index or midx), (2) an 8-byte
   offset, and (3) a 4-byte xor triplet position (i.e. the position of the
   triplet whose bitmap the current triplet's bitmap has to be xor'ed with).
 * Performance tests are split into two commits. The first contains the
   actual performance tests and the second enables pack.writeReverseIndex
   (as suggested by Taylor).
 * st_*() functions are used.
 * The commit order is changed according to Derrick's suggestion.
 * An iterative approach is used instead of a recursive approach to parse
   xor bitmaps (as suggested by Derrick).
 * Some minor bugs in the previous version are fixed.
Initial version:

The proposed table has:

 * a list of nr_entries object ids. These objects are commits that have
   bitmaps. Ids are stored in lexicographic order (for better searching).
 * a list of <offset, xor-offset> pairs (4-byte integers, network-byte
   order). The i'th pair denotes the offset and xor-offset (respectively) of
   the bitmap of the i'th commit in the previous list. These two pieces of
   information are necessary because only in this way can bitmaps be found
   without parsing all the bitmaps.
 * a 4-byte integer for table-specific flags (none exist currently).

Whenever Git wants to parse the bitmap for a specific commit, it first
refers to the table and looks up the offset and xor-offset for that
commit. Git then tries to parse the bitmap located at the offset
position. The xor-offset can be used to find the xor-bitmap for the
bitmap (if any).

Abhradeep Chakraborty (6):
  Documentation/technical: describe bitmap lookup table extension
  pack-bitmap-write.c: write lookup table extension
  pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  pack-bitmap: prepare to read lookup table extension
  p5310-pack-bitmaps.sh: enable `pack.writeReverseIndex`
  bitmap-lookup-table: add performance tests for lookup table

 Documentation/config/pack.txt             |   7 +
 Documentation/technical/bitmap-format.txt |  39 ++
 builtin/multi-pack-index.c                |   7 +
 builtin/pack-objects.c                    |   8 +
 midx.c                                    |   3 +
 midx.h                                    |   1 +
 pack-bitmap-write.c                       | 112 ++-
 pack-bitmap.c                             | 275 +++++++-
 pack-bitmap.h                             |  14 +-
 t/perf/p5310-pack-bitmaps.sh              |  68 +-
 t/perf/p5326-multi-pack-bitmaps.sh        |  95 +--
 t/t5310-pack-bitmaps.sh                   | 786 ++++++++++++----------
 t/t5311-pack-bitmaps-shallow.sh           |  53 +-
 t/t5326-multi-pack-bitmaps.sh             | 421 +++++++-----
 t/t5327-multi-pack-bitmaps-rev.sh         |  24 +-
 15 files changed, 1254 insertions(+), 659 deletions(-)


base-commit: 39c15e485575089eb77c769f6da02f98a55905e0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1266%2FAbhra303%2Fbitmap-commit-table-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1266/Abhra303/bitmap-commit-table-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/1266

Range-diff vs v4:

 1:  f72bf11e6ef ! 1:  33aca8f3dc8 Documentation/technical: describe bitmap lookup table extension
     @@ Commit message
          even if all the bitmaps are not required. A "bitmap lookup table"
          extension to the bitmap format can reduce the overhead of loading
          bitmaps which stores a list of bitmapped commit id pos (in the midx
     -    or pack, along with their offset and xor offset. This way git can
     +    or pack, along with their offset and xor offset. This way Git can
          load only the necessary bitmaps without loading the previous bitmaps.
      
          Older versions of Git ignore the lookup table extension and don't
 2:  04244fadf5c = 2:  a913e6a2cb3 pack-bitmap-write.c: write lookup table extension
 3:  8bd7639e4b9 ! 3:  59b465e5a78 pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
     @@ t/t5311-pack-bitmaps-shallow.sh: test_description='check bitmap operation with s
      +}
      +
      +test_shallow_bitmaps
     -+
     ++test_shallow_bitmaps "pack.writeBitmapLookupTable"
       
       test_done
      
     @@ t/t5326-multi-pack-bitmaps.sh: bitmap_reuse_tests() {
       
      
       ## t/t5327-multi-pack-bitmaps-rev.sh ##
     -@@ t/t5327-multi-pack-bitmaps-rev.sh: export GIT_TEST_MIDX_READ_RIDX
     - midx_bitmap_core rev
     - midx_bitmap_partial_tests rev
     +@@ t/t5327-multi-pack-bitmaps-rev.sh: GIT_TEST_MIDX_READ_RIDX=0
     + export GIT_TEST_MIDX_WRITE_REV
     + export GIT_TEST_MIDX_READ_RIDX
     + 
     +-midx_bitmap_core rev
     +-midx_bitmap_partial_tests rev
     ++test_midx_bitmap_rev () {
     ++     writeLookupTable=false
     ++
     ++ 	for i in "$@"
     ++ 	do
     ++ 		case $i in
     ++ 		"pack.writeBitmapLookupTable") writeLookupTable=true;;
     ++ 		esac
     ++ 	done
     ++
     ++     test_expect_success 'setup bitmap config' '
     ++         rm -rf * .git &&
     ++         git init &&
     ++         git config pack.writeBitmapLookupTable '"$writeLookupTable"'
     ++     '
     ++
     ++     midx_bitmap_core rev
     ++     midx_bitmap_partial_tests rev
     ++ }
     ++
     ++ test_midx_bitmap_rev
     ++ test_midx_bitmap_rev "pack.writeBitmapLookupTable"
       
     -+test_expect_success 'reinitialize the repository with lookup table enabled' '
     -+    rm -fr * .git &&
     -+    git init &&
     -+    git config pack.writeBitmapLookupTable true
     -+'
     -+
     -+midx_bitmap_core rev
     -+midx_bitmap_partial_tests rev
     -+
       test_done
 4:  afc8c660ac1 ! 4:  6918f0860ad pack-bitmap: prepare to read lookup table extension
     @@ pack-bitmap.c: struct include_data {
      +	if (xor_row != 0xffffffff) {
      +		int xor_flags;
      +		khiter_t hash_pos;
     -+		uint64_t offset_xor;
      +		struct bitmap_lookup_table_xor_item *xor_item;
      +
      +		while (xor_row != 0xffffffff) {
     @@ pack-bitmap.c: struct include_data {
      +
      +		while (xor_items_nr) {
      +			xor_item = &xor_items[xor_items_nr - 1];
     -+			offset_xor = xor_item->offset;
     -+
     -+			bitmap_git->map_pos = offset_xor;
     ++			bitmap_git->map_pos = xor_item->offset;
      +			if (bitmap_git->map_size - bitmap_git->map_pos < bitmap_header_size) {
      +				error(_("corrupt ewah bitmap: truncated header for bitmap of commit \"%s\""),
      +					oid_to_hex(&xor_item->oid));
 5:  fc69489e395 = 5:  e7ef420f321 p5310-pack-bitmaps.sh: enable `pack.writeReverseIndex`
 6:  52f7d8359ee = 6:  6628001241d bitmap-lookup-table: add performance tests for lookup table

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 162+ messages in thread

* [PATCH v5 1/6] Documentation/technical: describe bitmap lookup table extension
  2022-07-20 18:38       ` [PATCH v5 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
@ 2022-07-20 18:38         ` Abhradeep Chakraborty via GitGitGadget
  2022-07-20 18:38         ` [PATCH v5 2/6] pack-bitmap-write.c: write " Abhradeep Chakraborty via GitGitGadget
                           ` (5 subsequent siblings)
  6 siblings, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-07-20 18:38 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren, Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

When reading a bitmap file, Git loads every bitmap one by one,
even if not all of the bitmaps are required. A "bitmap lookup table"
extension to the bitmap format can reduce this overhead: it stores a
list of the positions of bitmapped commits (in the midx or pack),
along with their offset and xor offset. This way Git can load only the
necessary bitmaps without loading the previous bitmaps.
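
Because the triplets are sorted by commit position, a reader can find the
row for a commit with a binary search over the raw table. The following is
a minimal sketch under assumptions: `find_row_by_commit_pos()`, `read_be32()`
and `TRIPLET_WIDTH` are hypothetical names, and the table is taken to be a
plain byte buffer of `nr` 16-byte rows whose first four bytes hold the
commit position in network byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Width of one on-disk triplet: 4 + 8 + 4 bytes. */
#define TRIPLET_WIDTH 16

/* Read a 32-bit integer stored in network (big-endian) byte order. */
uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Return the row whose commit position equals `want`, or -1 if the
 * commit has no bitmap. Rows must be sorted by ascending commit position.
 */
int64_t find_row_by_commit_pos(const unsigned char *table, uint32_t nr,
			       uint32_t want)
{
	uint32_t lo = 0, hi = nr;

	assert(table != NULL || nr == 0);

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		uint32_t pos = read_be32(table + (size_t)mid * TRIPLET_WIDTH);

		if (pos == want)
			return (int64_t)mid;
		if (pos < want)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -1;
}
```

The row index returned here is the same kind of value that xor_row stores,
so a lookup by commit position and a follow-up of the xor chain can share
one decoding routine.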

Older versions of Git ignore the lookup table extension and don't
throw any kind of warning or error while parsing the bitmap file.

Add some information for the new "bitmap lookup table" extension in the
bitmap-format documentation.

Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Co-Authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 Documentation/technical/bitmap-format.txt | 39 +++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index 04b3ec21785..c30dc177643 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -67,6 +67,17 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
 			pack/MIDX. The format and meaning of the name-hash is
 			described below.
 
+			** {empty}
+			BITMAP_OPT_LOOKUP_TABLE (0x10): :::
+			If present, the end of the bitmap file contains a table
+			containing a list of `N` <commit_pos, offset, xor_row>
+			triplets. The format and meaning of the table is described
+			below.
++
+NOTE: Unlike the xor_offset used to compress an individual bitmap,
+`xor_row` stores an *absolute* index into the lookup table, not a location
+relative to the current entry.
+
 		4-byte entry count (network byte order)
 
 			The total count of entries (bitmapped commits) in this bitmap index.
@@ -205,3 +216,31 @@ Note that this hashing scheme is tied to the BITMAP_OPT_HASH_CACHE flag.
 If implementations want to choose a different hashing scheme, they are
 free to do so, but MUST allocate a new header flag (because comparing
 hashes made under two different schemes would be pointless).
+
+Commit lookup table
+-------------------
+
+If the BITMAP_OPT_LOOKUP_TABLE flag is set, the last `N * (4 + 8 + 4)`
+bytes (preceding the name-hash cache and trailing hash) of the `.bitmap`
+file contain a lookup table specifying the information needed to get
+the desired bitmap from the entries without parsing previous unnecessary
+bitmaps.
+
+For a `.bitmap` containing `nr_entries` reachability bitmaps, the table
+contains a list of `nr_entries` <commit_pos, offset, xor_row> triplets
+(sorted in ascending order of `commit_pos`). The content of the i'th
+triplet is -
+
+	* {empty}
+	commit_pos (4 byte integer, network byte order): ::
+	It stores the object position of a commit (in the midx or pack
+	index).
+
+	* {empty}
+	offset (8 byte integer, network byte order): ::
+	The offset from which that commit's bitmap can be read.
+
+	* {empty}
+	xor_row (4 byte integer, network byte order): ::
+	The position of the triplet whose bitmap is used to compress
+	this one, or `0xffffffff` if no such bitmap exists.
-- 
gitgitgadget



* [PATCH v5 2/6] pack-bitmap-write.c: write lookup table extension
  2022-07-20 18:38       ` [PATCH v5 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
  2022-07-20 18:38         ` [PATCH v5 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
@ 2022-07-20 18:38         ` Abhradeep Chakraborty via GitGitGadget
  2022-07-26  0:52           ` Taylor Blau
  2022-07-20 18:38         ` [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests Abhradeep Chakraborty via GitGitGadget
                           ` (4 subsequent siblings)
  6 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-07-20 18:38 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren, Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

The bitmap lookup table extension was documented by an earlier
change, but Git does not yet know how to write that extension.

Teach Git to write the bitmap lookup table extension. The table contains
a list of `N` <commit_pos, offset, xor_row> triplets. These
triplets are sorted by commit position (ascending order).
The meaning of each field in the i'th triplet is given below:

  - commit_pos stores commit position (in the pack-index or midx).
    It is a 4 byte network byte order unsigned integer.

  - offset is the position (in the bitmap file) from which that
    commit's bitmap can be read.

  - xor_row is the position of the triplet in the lookup table
    whose bitmap is used to compress this bitmap, or `0xffffffff`
    if no such bitmap exists.
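
To illustrate how a relative xor_offset (counted in write order) can be
turned into an absolute xor_row (a row in the sorted table), here is a
small standalone sketch of the permutation / inverse-permutation idea.
It is not the code from this patch: `compute_xor_rows`, `NO_XOR_ROW`
and the file-scope `sort_keys` pointer are made up for the example, and
plain qsort() stands in for Git's QSORT_S.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NO_XOR_ROW 0xffffffffu

/* qsort() takes no context argument, so the sketch keeps the keys here. */
static const uint32_t *sort_keys;

static int cmp_by_commit_pos(const void *va, const void *vb)
{
	uint32_t a = sort_keys[*(const uint32_t *)va];
	uint32_t b = sort_keys[*(const uint32_t *)vb];

	return (a > b) - (a < b);
}

/*
 * commit_pos[i] and xor_offset[i] describe the i'th bitmap in write order;
 * xor_offset[i] == 0 means "not XOR-compressed". On return, rows_out[r]
 * holds the xor_row for row r of the table sorted by commit_pos.
 */
static void compute_xor_rows(const uint32_t *commit_pos,
			     const uint8_t *xor_offset,
			     uint32_t nr, uint32_t *rows_out)
{
	uint32_t i, *table, *table_inv;

	if (!nr)
		return;
	table = malloc(nr * sizeof(*table));
	table_inv = malloc(nr * sizeof(*table_inv));
	assert(table && table_inv);

	/* table[row] = write-order index of the bitmap stored in that row */
	for (i = 0; i < nr; i++)
		table[i] = i;
	sort_keys = commit_pos;
	qsort(table, nr, sizeof(*table), cmp_by_commit_pos);

	/* table_inv[write-order index] = row in the sorted table */
	for (i = 0; i < nr; i++)
		table_inv[table[i]] = i;

	for (i = 0; i < nr; i++) {
		uint32_t k = table[i];

		assert(xor_offset[k] <= k);
		rows_out[i] = xor_offset[k] ? table_inv[k - xor_offset[k]]
					    : NO_XOR_ROW;
	}

	free(table);
	free(table_inv);
}
```

For example, with commit positions {30, 10, 20} and xor offsets
{0, 1, 2} in write order, the sorted rows hold bitmaps 1, 2 and 0.
Both compressed bitmaps use bitmap 0 as their base, which sits in
row 2, so the resulting xor_row values are {2, 2, 0xffffffff}.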

Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 pack-bitmap-write.c | 112 ++++++++++++++++++++++++++++++++++++++++----
 pack-bitmap.h       |   5 +-
 2 files changed, 107 insertions(+), 10 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index c43375bd344..9843790cb60 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -650,20 +650,19 @@ static const struct object_id *oid_access(size_t pos, const void *table)
 
 static void write_selected_commits_v1(struct hashfile *f,
 				      struct pack_idx_entry **index,
-				      uint32_t index_nr)
+				      uint32_t index_nr,
+				      off_t *offsets,
+				      uint32_t *commit_positions)
 {
 	int i;
 
 	for (i = 0; i < writer.selected_nr; ++i) {
 		struct bitmapped_commit *stored = &writer.selected[i];
 
-		int commit_pos =
-			oid_pos(&stored->commit->object.oid, index, index_nr, oid_access);
+		if (offsets)
+			offsets[i] = hashfile_total(f);
 
-		if (commit_pos < 0)
-			BUG("trying to write commit not in index");
-
-		hashwrite_be32(f, commit_pos);
+		hashwrite_be32(f, commit_positions[i]);
 		hashwrite_u8(f, stored->xor_offset);
 		hashwrite_u8(f, stored->flags);
 
@@ -671,6 +670,81 @@ static void write_selected_commits_v1(struct hashfile *f,
 	}
 }
 
+static int table_cmp(const void *_va, const void *_vb, void *_data)
+{
+	uint32_t *commit_positions = _data;
+	uint32_t a = commit_positions[*(uint32_t *)_va];
+	uint32_t b = commit_positions[*(uint32_t *)_vb];
+
+	if (a > b)
+		return 1;
+	else if (a < b)
+		return -1;
+
+	return 0;
+}
+
+static void write_lookup_table(struct hashfile *f,
+			       struct pack_idx_entry **index,
+			       uint32_t index_nr,
+			       off_t *offsets,
+			       uint32_t *commit_positions)
+{
+	uint32_t i;
+	uint32_t *table, *table_inv;
+
+	ALLOC_ARRAY(table, writer.selected_nr);
+	ALLOC_ARRAY(table_inv, writer.selected_nr);
+
+	for (i = 0; i < writer.selected_nr; i++)
+		table[i] = i;
+
+	/*
+	 * At the end of this sort table[j] = i means that the i'th
+	 * bitmap corresponds to j'th bitmapped commit (among the selected
+	 * commits) in lex order of OIDs.
+	 */
+	QSORT_S(table, writer.selected_nr, table_cmp, commit_positions);
+
+	/* table_inv helps us discover that relationship (i'th bitmap
+	 * to j'th commit by j = table_inv[i])
+	 */
+	for (i = 0; i < writer.selected_nr; i++)
+		table_inv[table[i]] = i;
+
+	trace2_region_enter("pack-bitmap-write", "writing_lookup_table", the_repository);
+	for (i = 0; i < writer.selected_nr; i++) {
+		struct bitmapped_commit *selected = &writer.selected[table[i]];
+		uint32_t xor_offset = selected->xor_offset;
+		uint32_t xor_row;
+
+		if (xor_offset) {
+			/*
+			 * xor_index stores the index (in the bitmap entries)
+			 * of the corresponding xor bitmap. But we need to convert
+			 * this index into lookup table's index. So, table_inv[xor_index]
+			 * gives us the index position w.r.t. the lookup table.
+			 *
+			 * If "k = table[i] - xor_offset" then the xor base is the k'th
+			 * bitmap. `table_inv[k]` gives us the position of that bitmap
+			 * in the lookup table.
+			 */
+			uint32_t xor_index = table[i] - xor_offset;
+			xor_row = table_inv[xor_index];
+		} else {
+			xor_row = 0xffffffff;
+		}
+
+		hashwrite_be32(f, commit_positions[table[i]]);
+		hashwrite_be64(f, (uint64_t)offsets[table[i]]);
+		hashwrite_be32(f, xor_row);
+	}
+	trace2_region_leave("pack-bitmap-write", "writing_lookup_table", the_repository);
+
+	free(table);
+	free(table_inv);
+}
+
 static void write_hash_cache(struct hashfile *f,
 			     struct pack_idx_entry **index,
 			     uint32_t index_nr)
@@ -695,8 +769,10 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 {
 	static uint16_t default_version = 1;
 	static uint16_t flags = BITMAP_OPT_FULL_DAG;
+	off_t *offsets = NULL;
 	struct strbuf tmp_file = STRBUF_INIT;
 	struct hashfile *f;
+	uint32_t *commit_positions = NULL;
 
 	struct bitmap_disk_header header;
 
@@ -715,7 +791,25 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 	dump_bitmap(f, writer.trees);
 	dump_bitmap(f, writer.blobs);
 	dump_bitmap(f, writer.tags);
-	write_selected_commits_v1(f, index, index_nr);
+
+	ALLOC_ARRAY(commit_positions, writer.selected_nr);
+	for (uint32_t i = 0; i < writer.selected_nr; ++i) {
+		struct bitmapped_commit *stored = &writer.selected[i];
+		int commit_pos = oid_pos(&stored->commit->object.oid, index, index_nr, oid_access);
+
+		if (commit_pos < 0)
+			BUG(_("trying to write commit not in index"));
+
+		commit_positions[i] = commit_pos;
+	}
+
+	if (options & BITMAP_OPT_LOOKUP_TABLE)
+		CALLOC_ARRAY(offsets, index_nr);
+
+	write_selected_commits_v1(f, index, index_nr, offsets, commit_positions);
+
+	if (options & BITMAP_OPT_LOOKUP_TABLE)
+		write_lookup_table(f, index, index_nr, offsets, commit_positions);
 
 	if (options & BITMAP_OPT_HASH_CACHE)
 		write_hash_cache(f, index, index_nr);
@@ -730,4 +824,6 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 		die_errno("unable to rename temporary bitmap file to '%s'", filename);
 
 	strbuf_release(&tmp_file);
+	free(offsets);
+	free(commit_positions);
 }
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 3d3ddd77345..67a9d0fc303 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -24,8 +24,9 @@ struct bitmap_disk_header {
 #define NEEDS_BITMAP (1u<<22)
 
 enum pack_bitmap_opts {
-	BITMAP_OPT_FULL_DAG = 1,
-	BITMAP_OPT_HASH_CACHE = 4,
+	BITMAP_OPT_FULL_DAG = 0x1,
+	BITMAP_OPT_HASH_CACHE = 0x4,
+	BITMAP_OPT_LOOKUP_TABLE = 0x10,
 };
 
 enum pack_bitmap_flags {
-- 
gitgitgadget



* [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-07-20 18:38       ` [PATCH v5 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
  2022-07-20 18:38         ` [PATCH v5 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
  2022-07-20 18:38         ` [PATCH v5 2/6] pack-bitmap-write.c: write " Abhradeep Chakraborty via GitGitGadget
@ 2022-07-20 18:38         ` Abhradeep Chakraborty via GitGitGadget
  2022-07-28 19:22           ` Johannes Schindelin
  2022-07-20 18:38         ` [PATCH v5 4/6] pack-bitmap: prepare to read lookup table extension Abhradeep Chakraborty via GitGitGadget
                           ` (3 subsequent siblings)
  6 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-07-20 18:38 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren, Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

Teach Git to provide a way for users to enable/disable the bitmap lookup
table extension by providing a config option named
'pack.writeBitmapLookupTable'. The default is false.

Also add tests to verify writing of the lookup table.

Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Co-Authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 Documentation/config/pack.txt     |   7 +
 builtin/multi-pack-index.c        |   7 +
 builtin/pack-objects.c            |   8 +
 midx.c                            |   3 +
 midx.h                            |   1 +
 t/t5310-pack-bitmaps.sh           | 792 ++++++++++++++++--------------
 t/t5311-pack-bitmaps-shallow.sh   |  53 +-
 t/t5326-multi-pack-bitmaps.sh     | 421 +++++++++-------
 t/t5327-multi-pack-bitmaps-rev.sh |  24 +-
 9 files changed, 733 insertions(+), 583 deletions(-)

diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index ad7f73a1ead..b955ca572ec 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -164,6 +164,13 @@ When writing a multi-pack reachability bitmap, no new namehashes are
 computed; instead, any namehashes stored in an existing bitmap are
 permuted into their appropriate location when writing a new bitmap.
 
+pack.writeBitmapLookupTable::
+	When true, Git will include a "lookup table" section in the
+	bitmap index (if one is written). This table is used to defer
+	loading individual bitmaps as late as possible. This can be
+	beneficial in repositories that have relatively large bitmap
+	indexes. Defaults to false.
+
 pack.writeReverseIndex::
 	When true, git will write a corresponding .rev file (see:
 	link:../technical/pack-format.html[Documentation/technical/pack-format.txt])
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 5edbb7fe86e..55402b46f41 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -87,6 +87,13 @@ static int git_multi_pack_index_write_config(const char *var, const char *value,
 			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
 	}
 
+	if (!strcmp(var, "pack.writebitmaplookuptable")) {
+		if (git_config_bool(var, value))
+			opts.flags |= MIDX_WRITE_BITMAP_LOOKUP_TABLE;
+		else
+			opts.flags &= ~MIDX_WRITE_BITMAP_LOOKUP_TABLE;
+	}
+
 	/*
 	 * We should never make a fall-back call to 'git_default_config', since
 	 * this was already called in 'cmd_multi_pack_index()'.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 39e28cfcafc..46e26774963 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3148,6 +3148,14 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 		else
 			write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE;
 	}
+
+	if (!strcmp(k, "pack.writebitmaplookuptable")) {
+		if (git_config_bool(k, v))
+			write_bitmap_options |= BITMAP_OPT_LOOKUP_TABLE;
+		else
+			write_bitmap_options &= ~BITMAP_OPT_LOOKUP_TABLE;
+	}
+
 	if (!strcmp(k, "pack.usebitmaps")) {
 		use_bitmap_index_default = git_config_bool(k, v);
 		return 0;
diff --git a/midx.c b/midx.c
index 5f0dd386b02..9c26d04bfde 100644
--- a/midx.c
+++ b/midx.c
@@ -1072,6 +1072,9 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
 	if (flags & MIDX_WRITE_BITMAP_HASH_CACHE)
 		options |= BITMAP_OPT_HASH_CACHE;
 
+	if (flags & MIDX_WRITE_BITMAP_LOOKUP_TABLE)
+		options |= BITMAP_OPT_LOOKUP_TABLE;
+
 	prepare_midx_packing_data(&pdata, ctx);
 
 	commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx);
diff --git a/midx.h b/midx.h
index 22e8e53288e..5578cd7b835 100644
--- a/midx.h
+++ b/midx.h
@@ -47,6 +47,7 @@ struct multi_pack_index {
 #define MIDX_WRITE_REV_INDEX (1 << 1)
 #define MIDX_WRITE_BITMAP (1 << 2)
 #define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3)
+#define MIDX_WRITE_BITMAP_LOOKUP_TABLE (1 << 4)
 
 const unsigned char *get_midx_checksum(struct multi_pack_index *m);
 void get_midx_filename(struct strbuf *out, const char *object_dir);
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index f775fc1ce69..c0607172827 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -26,22 +26,413 @@ has_any () {
 	grep -Ff "$1" "$2"
 }
 
-setup_bitmap_history
-
-test_expect_success 'setup writing bitmaps during repack' '
-	git config repack.writeBitmaps true
-'
-
-test_expect_success 'full repack creates bitmaps' '
-	GIT_TRACE2_EVENT="$(pwd)/trace" \
+test_bitmap_cases () {
+	writeLookupTable=false
+	for i in "$@"
+	do
+		case "$i" in
+		"pack.writeBitmapLookupTable") writeLookupTable=true;;
+		esac
+	done
+
+	test_expect_success 'setup test repository' '
+		rm -fr * .git &&
+		git init &&
+		git config pack.writeBitmapLookupTable '"$writeLookupTable"'
+	'
+	setup_bitmap_history
+
+	test_expect_success 'setup writing bitmaps during repack' '
+		git config repack.writeBitmaps true
+	'
+
+	test_expect_success 'full repack creates bitmaps' '
+		GIT_TRACE2_EVENT="$(pwd)/trace" \
+			git repack -ad &&
+		ls .git/objects/pack/ | grep bitmap >output &&
+		test_line_count = 1 output &&
+		grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace &&
+		grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace
+	'
+
+	basic_bitmap_tests
+
+	test_expect_success 'pack-objects respects --local (non-local loose)' '
+		git init --bare alt.git &&
+		echo $(pwd)/alt.git/objects >.git/objects/info/alternates &&
+		echo content1 >file1 &&
+		# non-local loose object which is not present in bitmapped pack
+		altblob=$(GIT_DIR=alt.git git hash-object -w file1) &&
+		# non-local loose object which is also present in bitmapped pack
+		git cat-file blob $blob | GIT_DIR=alt.git git hash-object -w --stdin &&
+		git add file1 &&
+		test_tick &&
+		git commit -m commit_file1 &&
+		echo HEAD | git pack-objects --local --stdout --revs >1.pack &&
+		git index-pack 1.pack &&
+		list_packed_objects 1.idx >1.objects &&
+		printf "%s\n" "$altblob" "$blob" >nonlocal-loose &&
+		! has_any nonlocal-loose 1.objects
+	'
+
+	test_expect_success 'pack-objects respects --honor-pack-keep (local non-bitmapped pack)' '
+		echo content2 >file2 &&
+		blob2=$(git hash-object -w file2) &&
+		git add file2 &&
+		test_tick &&
+		git commit -m commit_file2 &&
+		printf "%s\n" "$blob2" "$bitmaptip" >keepobjects &&
+		pack2=$(git pack-objects pack2 <keepobjects) &&
+		mv pack2-$pack2.* .git/objects/pack/ &&
+		>.git/objects/pack/pack2-$pack2.keep &&
+		rm $(objpath $blob2) &&
+		echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >2a.pack &&
+		git index-pack 2a.pack &&
+		list_packed_objects 2a.idx >2a.objects &&
+		! has_any keepobjects 2a.objects
+	'
+
+	test_expect_success 'pack-objects respects --local (non-local pack)' '
+		mv .git/objects/pack/pack2-$pack2.* alt.git/objects/pack/ &&
+		echo HEAD | git pack-objects --local --stdout --revs >2b.pack &&
+		git index-pack 2b.pack &&
+		list_packed_objects 2b.idx >2b.objects &&
+		! has_any keepobjects 2b.objects
+	'
+
+	test_expect_success 'pack-objects respects --honor-pack-keep (local bitmapped pack)' '
+		ls .git/objects/pack/ | grep bitmap >output &&
+		test_line_count = 1 output &&
+		packbitmap=$(basename $(cat output) .bitmap) &&
+		list_packed_objects .git/objects/pack/$packbitmap.idx >packbitmap.objects &&
+		test_when_finished "rm -f .git/objects/pack/$packbitmap.keep" &&
+		>.git/objects/pack/$packbitmap.keep &&
+		echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >3a.pack &&
+		git index-pack 3a.pack &&
+		list_packed_objects 3a.idx >3a.objects &&
+		! has_any packbitmap.objects 3a.objects
+	'
+
+	test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' '
+		mv .git/objects/pack/$packbitmap.* alt.git/objects/pack/ &&
+		rm -f .git/objects/pack/multi-pack-index &&
+		test_when_finished "mv alt.git/objects/pack/$packbitmap.* .git/objects/pack/" &&
+		echo HEAD | git pack-objects --local --stdout --revs >3b.pack &&
+		git index-pack 3b.pack &&
+		list_packed_objects 3b.idx >3b.objects &&
+		! has_any packbitmap.objects 3b.objects
+	'
+
+	test_expect_success 'pack-objects to file can use bitmap' '
+		# make sure we still have 1 bitmap index from previous tests
+		ls .git/objects/pack/ | grep bitmap >output &&
+		test_line_count = 1 output &&
+		# verify equivalent packs are generated with/without using bitmap index
+		packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) &&
+		packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) &&
+		list_packed_objects packa-$packasha1.idx >packa.objects &&
+		list_packed_objects packb-$packbsha1.idx >packb.objects &&
+		test_cmp packa.objects packb.objects
+	'
+
+	test_expect_success 'full repack, reusing previous bitmaps' '
 		git repack -ad &&
-	ls .git/objects/pack/ | grep bitmap >output &&
-	test_line_count = 1 output &&
-	grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace &&
-	grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace
-'
+		ls .git/objects/pack/ | grep bitmap >output &&
+		test_line_count = 1 output
+	'
+
+	test_expect_success 'fetch (full bitmap)' '
+		git --git-dir=clone.git fetch origin second:second &&
+		git rev-parse HEAD >expect &&
+		git --git-dir=clone.git rev-parse HEAD >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success 'create objects for missing-HAVE tests' '
+		blob=$(echo "missing have" | git hash-object -w --stdin) &&
+		tree=$(printf "100644 blob $blob\tfile\n" | git mktree) &&
+		parent=$(echo parent | git commit-tree $tree) &&
+		commit=$(echo commit | git commit-tree $tree -p $parent) &&
+		cat >revs <<-EOF
+		HEAD
+		^HEAD^
+		^$commit
+		EOF
+	'
+
+	test_expect_success 'pack-objects respects --incremental' '
+		cat >revs2 <<-EOF &&
+		HEAD
+		$commit
+		EOF
+		git pack-objects --incremental --stdout --revs <revs2 >4.pack &&
+		git index-pack 4.pack &&
+		list_packed_objects 4.idx >4.objects &&
+		test_line_count = 4 4.objects &&
+		git rev-list --objects $commit >revlist &&
+		cut -d" " -f1 revlist |sort >objects &&
+		test_cmp 4.objects objects
+	'
+
+	test_expect_success 'pack with missing blob' '
+		rm $(objpath $blob) &&
+		git pack-objects --stdout --revs <revs >/dev/null
+	'
+
+	test_expect_success 'pack with missing tree' '
+		rm $(objpath $tree) &&
+		git pack-objects --stdout --revs <revs >/dev/null
+	'
+
+	test_expect_success 'pack with missing parent' '
+		rm $(objpath $parent) &&
+		git pack-objects --stdout --revs <revs >/dev/null
+	'
+
+	test_expect_success JGIT,SHA1 'we can read jgit bitmaps' '
+		git clone --bare . compat-jgit.git &&
+		(
+			cd compat-jgit.git &&
+			rm -f objects/pack/*.bitmap &&
+			jgit gc &&
+			git rev-list --test-bitmap HEAD
+		)
+	'
+
+	test_expect_success JGIT,SHA1 'jgit can read our bitmaps' '
+		git clone --bare . compat-us.git &&
+		(
+			cd compat-us.git &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
+			git repack -adb &&
+			# jgit gc will barf if it does not like our bitmaps
+			jgit gc
+		)
+	'
+
+	test_expect_success 'splitting packs does not generate bogus bitmaps' '
+		test-tool genrandom foo $((1024 * 1024)) >rand &&
+		git add rand &&
+		git commit -m "commit with big file" &&
+		git -c pack.packSizeLimit=500k repack -adb &&
+		git init --bare no-bitmaps.git &&
+		git -C no-bitmaps.git fetch .. HEAD
+	'
+
+	test_expect_success 'set up reusable pack' '
+		rm -f .git/objects/pack/*.keep &&
+		git repack -adb &&
+		reusable_pack () {
+			git for-each-ref --format="%(objectname)" |
+			git pack-objects --delta-base-offset --revs --stdout "$@"
+		}
+	'
+
+	test_expect_success 'pack reuse respects --honor-pack-keep' '
+		test_when_finished "rm -f .git/objects/pack/*.keep" &&
+		for i in .git/objects/pack/*.pack
+		do
+			>${i%.pack}.keep || return 1
+		done &&
+		reusable_pack --honor-pack-keep >empty.pack &&
+		git index-pack empty.pack &&
+		git show-index <empty.idx >actual &&
+		test_must_be_empty actual
+	'
+
+	test_expect_success 'pack reuse respects --local' '
+		mv .git/objects/pack/* alt.git/objects/pack/ &&
+		test_when_finished "mv alt.git/objects/pack/* .git/objects/pack/" &&
+		reusable_pack --local >empty.pack &&
+		git index-pack empty.pack &&
+		git show-index <empty.idx >actual &&
+		test_must_be_empty actual
+	'
+
+	test_expect_success 'pack reuse respects --incremental' '
+		reusable_pack --incremental >empty.pack &&
+		git index-pack empty.pack &&
+		git show-index <empty.idx >actual &&
+		test_must_be_empty actual
+	'
+
+	test_expect_success 'truncated bitmap fails gracefully (ewah)' '
+		test_config pack.writebitmaphashcache false &&
+		git repack -ad &&
+		git rev-list --use-bitmap-index --count --all >expect &&
+		bitmap=$(ls .git/objects/pack/*.bitmap) &&
+		test_when_finished "rm -f $bitmap" &&
+		test_copy_bytes 256 <$bitmap >$bitmap.tmp &&
+		mv -f $bitmap.tmp $bitmap &&
+		git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
+		test_cmp expect actual &&
+		test_i18ngrep corrupt.ewah.bitmap stderr
+	'
+
+	test_expect_success 'truncated bitmap fails gracefully (cache)' '
+		git repack -ad &&
+		git rev-list --use-bitmap-index --count --all >expect &&
+		bitmap=$(ls .git/objects/pack/*.bitmap) &&
+		test_when_finished "rm -f $bitmap" &&
+		test_copy_bytes 512 <$bitmap >$bitmap.tmp &&
+		mv -f $bitmap.tmp $bitmap &&
+		git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
+		test_cmp expect actual &&
+		test_i18ngrep corrupted.bitmap.index stderr
+	'
+
+	# Create a state of history with these properties:
+	#
+	#  - refs that allow a client to fetch some new history, while sharing some old
+	#    history with the server; we use branches delta-reuse-old and
+	#    delta-reuse-new here
+	#
+	#  - the new history contains an object that is stored on the server as a delta
+	#    against a base that is in the old history
+	#
+	#  - the base object is not immediately reachable from the tip of the old
+	#    history; finding it would involve digging down through history we know the
+	#    other side has
+	#
+	# This should result in a state where fetching from old->new would not
+	# traditionally reuse the on-disk delta (because we'd have to dig to realize
+	# that the client has it), but we will do so if bitmaps can tell us cheaply
+	# that the other side has it.
+	test_expect_success 'set up thin delta-reuse parent' '
+		# This first commit contains the buried base object.
+		test-tool genrandom delta 16384 >file &&
+		git add file &&
+		git commit -m "delta base" &&
+		base=$(git rev-parse --verify HEAD:file) &&
+
+		# These intermediate commits bury the base back in history.
+		# This becomes the "old" state.
+		for i in 1 2 3 4 5
+		do
+			echo $i >file &&
+			git commit -am "intermediate $i" || return 1
+		done &&
+		git branch delta-reuse-old &&
+
+		# And now our new history has a delta against the buried base. Note
+		# that this must be smaller than the original file, since pack-objects
+		# prefers to create deltas from smaller objects to larger.
+		test-tool genrandom delta 16300 >file &&
+		git commit -am "delta result" &&
+		delta=$(git rev-parse --verify HEAD:file) &&
+		git branch delta-reuse-new &&
+
+		# Repack with bitmaps and double check that we have the expected delta
+		# relationship.
+		git repack -adb &&
+		have_delta $delta $base
+	'
+
+	# Now we can sanity-check the non-bitmap behavior (that the server is not able
+	# to reuse the delta). This isn't strictly something we care about, so this
+	# test could be scrapped in the future. But it makes sure that the next test is
+	# actually triggering the feature we want.
+	#
+	# Note that our tools for working with on-the-wire "thin" packs are limited. So
+	# we actually perform the fetch, retain the resulting pack, and inspect the
+	# result.
+	test_expect_success 'fetch without bitmaps ignores delta against old base' '
+		test_config pack.usebitmaps false &&
+		test_when_finished "rm -rf client.git" &&
+		git init --bare client.git &&
+		(
+			cd client.git &&
+			git config transfer.unpackLimit 1 &&
+			git fetch .. delta-reuse-old:delta-reuse-old &&
+			git fetch .. delta-reuse-new:delta-reuse-new &&
+			have_delta $delta $ZERO_OID
+		)
+	'
+
+	# And do the same for the bitmap case, where we do expect to find the delta.
+	test_expect_success 'fetch with bitmaps can reuse old base' '
+		test_config pack.usebitmaps true &&
+		test_when_finished "rm -rf client.git" &&
+		git init --bare client.git &&
+		(
+			cd client.git &&
+			git config transfer.unpackLimit 1 &&
+			git fetch .. delta-reuse-old:delta-reuse-old &&
+			git fetch .. delta-reuse-new:delta-reuse-new &&
+			have_delta $delta $base
+		)
+	'
+
+	test_expect_success 'pack.preferBitmapTips' '
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
+
+			# create enough commits that not all are receive bitmap
+			# coverage even if they are all at the tip of some reference.
+			test_commit_bulk --message="%s" 103 &&
+
+			git rev-list HEAD >commits.raw &&
+			sort <commits.raw >commits &&
+
+			git log --format="create refs/tags/%s %H" HEAD >refs &&
+			git update-ref --stdin <refs &&
+
+			git repack -adb &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+
+			# remember which commits did not receive bitmaps
+			comm -13 bitmaps commits >before &&
+			test_file_not_empty before &&
+
+			# mark the commits which did not receive bitmaps as preferred,
+			# and generate the bitmap again
+			perl -pe "s{^}{create refs/tags/include/$. }" <before |
+				git update-ref --stdin &&
+			git -c pack.preferBitmapTips=refs/tags/include repack -adb &&
+
+			# finally, check that the commit(s) without bitmap coverage
+			# are not the same ones as before
+			test-tool bitmap list-commits | sort >bitmaps &&
+			comm -13 bitmaps commits >after &&
+
+			! test_cmp before after
+		)
+	'
+
+	test_expect_success 'complains about multiple pack bitmaps' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
+
+			test_commit base &&
+
+			git repack -adb &&
+			bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
+			mv "$bitmap" "$bitmap.bak" &&
+
+			test_commit other &&
+			git repack -ab &&
+
+			mv "$bitmap.bak" "$bitmap" &&
+
+			find .git/objects/pack -type f -name "*.pack" >packs &&
+			find .git/objects/pack -type f -name "*.bitmap" >bitmaps &&
+			test_line_count = 2 packs &&
+			test_line_count = 2 bitmaps &&
+
+			git rev-list --use-bitmap-index HEAD 2>err &&
+			grep "ignoring extra bitmap file" err
+		)
+	'
+}
 
-basic_bitmap_tests
+test_bitmap_cases
 
 test_expect_success 'incremental repack fails when bitmaps are requested' '
 	test_commit more-1 &&
@@ -54,375 +445,12 @@ test_expect_success 'incremental repack can disable bitmaps' '
 	git repack -d --no-write-bitmap-index
 '
 
-test_expect_success 'pack-objects respects --local (non-local loose)' '
-	git init --bare alt.git &&
-	echo $(pwd)/alt.git/objects >.git/objects/info/alternates &&
-	echo content1 >file1 &&
-	# non-local loose object which is not present in bitmapped pack
-	altblob=$(GIT_DIR=alt.git git hash-object -w file1) &&
-	# non-local loose object which is also present in bitmapped pack
-	git cat-file blob $blob | GIT_DIR=alt.git git hash-object -w --stdin &&
-	git add file1 &&
-	test_tick &&
-	git commit -m commit_file1 &&
-	echo HEAD | git pack-objects --local --stdout --revs >1.pack &&
-	git index-pack 1.pack &&
-	list_packed_objects 1.idx >1.objects &&
-	printf "%s\n" "$altblob" "$blob" >nonlocal-loose &&
-	! has_any nonlocal-loose 1.objects
-'
-
-test_expect_success 'pack-objects respects --honor-pack-keep (local non-bitmapped pack)' '
-	echo content2 >file2 &&
-	blob2=$(git hash-object -w file2) &&
-	git add file2 &&
-	test_tick &&
-	git commit -m commit_file2 &&
-	printf "%s\n" "$blob2" "$bitmaptip" >keepobjects &&
-	pack2=$(git pack-objects pack2 <keepobjects) &&
-	mv pack2-$pack2.* .git/objects/pack/ &&
-	>.git/objects/pack/pack2-$pack2.keep &&
-	rm $(objpath $blob2) &&
-	echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >2a.pack &&
-	git index-pack 2a.pack &&
-	list_packed_objects 2a.idx >2a.objects &&
-	! has_any keepobjects 2a.objects
-'
-
-test_expect_success 'pack-objects respects --local (non-local pack)' '
-	mv .git/objects/pack/pack2-$pack2.* alt.git/objects/pack/ &&
-	echo HEAD | git pack-objects --local --stdout --revs >2b.pack &&
-	git index-pack 2b.pack &&
-	list_packed_objects 2b.idx >2b.objects &&
-	! has_any keepobjects 2b.objects
-'
-
-test_expect_success 'pack-objects respects --honor-pack-keep (local bitmapped pack)' '
-	ls .git/objects/pack/ | grep bitmap >output &&
-	test_line_count = 1 output &&
-	packbitmap=$(basename $(cat output) .bitmap) &&
-	list_packed_objects .git/objects/pack/$packbitmap.idx >packbitmap.objects &&
-	test_when_finished "rm -f .git/objects/pack/$packbitmap.keep" &&
-	>.git/objects/pack/$packbitmap.keep &&
-	echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >3a.pack &&
-	git index-pack 3a.pack &&
-	list_packed_objects 3a.idx >3a.objects &&
-	! has_any packbitmap.objects 3a.objects
-'
-
-test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' '
-	mv .git/objects/pack/$packbitmap.* alt.git/objects/pack/ &&
-	rm -f .git/objects/pack/multi-pack-index &&
-	test_when_finished "mv alt.git/objects/pack/$packbitmap.* .git/objects/pack/" &&
-	echo HEAD | git pack-objects --local --stdout --revs >3b.pack &&
-	git index-pack 3b.pack &&
-	list_packed_objects 3b.idx >3b.objects &&
-	! has_any packbitmap.objects 3b.objects
-'
-
-test_expect_success 'pack-objects to file can use bitmap' '
-	# make sure we still have 1 bitmap index from previous tests
-	ls .git/objects/pack/ | grep bitmap >output &&
-	test_line_count = 1 output &&
-	# verify equivalent packs are generated with/without using bitmap index
-	packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) &&
-	packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) &&
-	list_packed_objects packa-$packasha1.idx >packa.objects &&
-	list_packed_objects packb-$packbsha1.idx >packb.objects &&
-	test_cmp packa.objects packb.objects
-'
-
-test_expect_success 'full repack, reusing previous bitmaps' '
-	git repack -ad &&
-	ls .git/objects/pack/ | grep bitmap >output &&
-	test_line_count = 1 output
-'
-
-test_expect_success 'fetch (full bitmap)' '
-	git --git-dir=clone.git fetch origin second:second &&
-	git rev-parse HEAD >expect &&
-	git --git-dir=clone.git rev-parse HEAD >actual &&
-	test_cmp expect actual
-'
-
-test_expect_success 'create objects for missing-HAVE tests' '
-	blob=$(echo "missing have" | git hash-object -w --stdin) &&
-	tree=$(printf "100644 blob $blob\tfile\n" | git mktree) &&
-	parent=$(echo parent | git commit-tree $tree) &&
-	commit=$(echo commit | git commit-tree $tree -p $parent) &&
-	cat >revs <<-EOF
-	HEAD
-	^HEAD^
-	^$commit
-	EOF
-'
-
-test_expect_success 'pack-objects respects --incremental' '
-	cat >revs2 <<-EOF &&
-	HEAD
-	$commit
-	EOF
-	git pack-objects --incremental --stdout --revs <revs2 >4.pack &&
-	git index-pack 4.pack &&
-	list_packed_objects 4.idx >4.objects &&
-	test_line_count = 4 4.objects &&
-	git rev-list --objects $commit >revlist &&
-	cut -d" " -f1 revlist |sort >objects &&
-	test_cmp 4.objects objects
-'
-
-test_expect_success 'pack with missing blob' '
-	rm $(objpath $blob) &&
-	git pack-objects --stdout --revs <revs >/dev/null
-'
+test_bitmap_cases "pack.writeBitmapLookupTable"
 
-test_expect_success 'pack with missing tree' '
-	rm $(objpath $tree) &&
-	git pack-objects --stdout --revs <revs >/dev/null
-'
-
-test_expect_success 'pack with missing parent' '
-	rm $(objpath $parent) &&
-	git pack-objects --stdout --revs <revs >/dev/null
-'
-
-test_expect_success JGIT,SHA1 'we can read jgit bitmaps' '
-	git clone --bare . compat-jgit.git &&
-	(
-		cd compat-jgit.git &&
-		rm -f objects/pack/*.bitmap &&
-		jgit gc &&
-		git rev-list --test-bitmap HEAD
-	)
-'
-
-test_expect_success JGIT,SHA1 'jgit can read our bitmaps' '
-	git clone --bare . compat-us.git &&
-	(
-		cd compat-us.git &&
-		git repack -adb &&
-		# jgit gc will barf if it does not like our bitmaps
-		jgit gc
-	)
-'
-
-test_expect_success 'splitting packs does not generate bogus bitmaps' '
-	test-tool genrandom foo $((1024 * 1024)) >rand &&
-	git add rand &&
-	git commit -m "commit with big file" &&
-	git -c pack.packSizeLimit=500k repack -adb &&
-	git init --bare no-bitmaps.git &&
-	git -C no-bitmaps.git fetch .. HEAD
-'
-
-test_expect_success 'set up reusable pack' '
-	rm -f .git/objects/pack/*.keep &&
-	git repack -adb &&
-	reusable_pack () {
-		git for-each-ref --format="%(objectname)" |
-		git pack-objects --delta-base-offset --revs --stdout "$@"
-	}
-'
-
-test_expect_success 'pack reuse respects --honor-pack-keep' '
-	test_when_finished "rm -f .git/objects/pack/*.keep" &&
-	for i in .git/objects/pack/*.pack
-	do
-		>${i%.pack}.keep || return 1
-	done &&
-	reusable_pack --honor-pack-keep >empty.pack &&
-	git index-pack empty.pack &&
-	git show-index <empty.idx >actual &&
-	test_must_be_empty actual
-'
-
-test_expect_success 'pack reuse respects --local' '
-	mv .git/objects/pack/* alt.git/objects/pack/ &&
-	test_when_finished "mv alt.git/objects/pack/* .git/objects/pack/" &&
-	reusable_pack --local >empty.pack &&
-	git index-pack empty.pack &&
-	git show-index <empty.idx >actual &&
-	test_must_be_empty actual
-'
-
-test_expect_success 'pack reuse respects --incremental' '
-	reusable_pack --incremental >empty.pack &&
-	git index-pack empty.pack &&
-	git show-index <empty.idx >actual &&
-	test_must_be_empty actual
-'
-
-test_expect_success 'truncated bitmap fails gracefully (ewah)' '
-	test_config pack.writebitmaphashcache false &&
-	git repack -ad &&
-	git rev-list --use-bitmap-index --count --all >expect &&
-	bitmap=$(ls .git/objects/pack/*.bitmap) &&
-	test_when_finished "rm -f $bitmap" &&
-	test_copy_bytes 256 <$bitmap >$bitmap.tmp &&
-	mv -f $bitmap.tmp $bitmap &&
-	git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
-	test_cmp expect actual &&
-	test_i18ngrep corrupt.ewah.bitmap stderr
-'
-
-test_expect_success 'truncated bitmap fails gracefully (cache)' '
-	git repack -ad &&
-	git rev-list --use-bitmap-index --count --all >expect &&
-	bitmap=$(ls .git/objects/pack/*.bitmap) &&
-	test_when_finished "rm -f $bitmap" &&
-	test_copy_bytes 512 <$bitmap >$bitmap.tmp &&
-	mv -f $bitmap.tmp $bitmap &&
-	git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
-	test_cmp expect actual &&
-	test_i18ngrep corrupted.bitmap.index stderr
-'
-
-# Create a state of history with these properties:
-#
-#  - refs that allow a client to fetch some new history, while sharing some old
-#    history with the server; we use branches delta-reuse-old and
-#    delta-reuse-new here
-#
-#  - the new history contains an object that is stored on the server as a delta
-#    against a base that is in the old history
-#
-#  - the base object is not immediately reachable from the tip of the old
-#    history; finding it would involve digging down through history we know the
-#    other side has
-#
-# This should result in a state where fetching from old->new would not
-# traditionally reuse the on-disk delta (because we'd have to dig to realize
-# that the client has it), but we will do so if bitmaps can tell us cheaply
-# that the other side has it.
-test_expect_success 'set up thin delta-reuse parent' '
-	# This first commit contains the buried base object.
-	test-tool genrandom delta 16384 >file &&
-	git add file &&
-	git commit -m "delta base" &&
-	base=$(git rev-parse --verify HEAD:file) &&
-
-	# These intermediate commits bury the base back in history.
-	# This becomes the "old" state.
-	for i in 1 2 3 4 5
-	do
-		echo $i >file &&
-		git commit -am "intermediate $i" || return 1
-	done &&
-	git branch delta-reuse-old &&
-
-	# And now our new history has a delta against the buried base. Note
-	# that this must be smaller than the original file, since pack-objects
-	# prefers to create deltas from smaller objects to larger.
-	test-tool genrandom delta 16300 >file &&
-	git commit -am "delta result" &&
-	delta=$(git rev-parse --verify HEAD:file) &&
-	git branch delta-reuse-new &&
-
-	# Repack with bitmaps and double check that we have the expected delta
-	# relationship.
-	git repack -adb &&
-	have_delta $delta $base
-'
-
-# Now we can sanity-check the non-bitmap behavior (that the server is not able
-# to reuse the delta). This isn't strictly something we care about, so this
-# test could be scrapped in the future. But it makes sure that the next test is
-# actually triggering the feature we want.
-#
-# Note that our tools for working with on-the-wire "thin" packs are limited. So
-# we actually perform the fetch, retain the resulting pack, and inspect the
-# result.
-test_expect_success 'fetch without bitmaps ignores delta against old base' '
-	test_config pack.usebitmaps false &&
-	test_when_finished "rm -rf client.git" &&
-	git init --bare client.git &&
-	(
-		cd client.git &&
-		git config transfer.unpackLimit 1 &&
-		git fetch .. delta-reuse-old:delta-reuse-old &&
-		git fetch .. delta-reuse-new:delta-reuse-new &&
-		have_delta $delta $ZERO_OID
-	)
-'
-
-# And do the same for the bitmap case, where we do expect to find the delta.
-test_expect_success 'fetch with bitmaps can reuse old base' '
-	test_config pack.usebitmaps true &&
-	test_when_finished "rm -rf client.git" &&
-	git init --bare client.git &&
-	(
-		cd client.git &&
-		git config transfer.unpackLimit 1 &&
-		git fetch .. delta-reuse-old:delta-reuse-old &&
-		git fetch .. delta-reuse-new:delta-reuse-new &&
-		have_delta $delta $base
-	)
-'
-
-test_expect_success 'pack.preferBitmapTips' '
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
-
-		# create enough commits that not all are receive bitmap
-		# coverage even if they are all at the tip of some reference.
-		test_commit_bulk --message="%s" 103 &&
-
-		git rev-list HEAD >commits.raw &&
-		sort <commits.raw >commits &&
-
-		git log --format="create refs/tags/%s %H" HEAD >refs &&
-		git update-ref --stdin <refs &&
-
-		git repack -adb &&
-		test-tool bitmap list-commits | sort >bitmaps &&
-
-		# remember which commits did not receive bitmaps
-		comm -13 bitmaps commits >before &&
-		test_file_not_empty before &&
-
-		# mark the commits which did not receive bitmaps as preferred,
-		# and generate the bitmap again
-		perl -pe "s{^}{create refs/tags/include/$. }" <before |
-			git update-ref --stdin &&
-		git -c pack.preferBitmapTips=refs/tags/include repack -adb &&
-
-		# finally, check that the commit(s) without bitmap coverage
-		# are not the same ones as before
-		test-tool bitmap list-commits | sort >bitmaps &&
-		comm -13 bitmaps commits >after &&
-
-		! test_cmp before after
-	)
-'
-
-test_expect_success 'complains about multiple pack bitmaps' '
-	rm -fr repo &&
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
-
-		test_commit base &&
-
-		git repack -adb &&
-		bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
-		mv "$bitmap" "$bitmap.bak" &&
-
-		test_commit other &&
-		git repack -ab &&
-
-		mv "$bitmap.bak" "$bitmap" &&
-
-		find .git/objects/pack -type f -name "*.pack" >packs &&
-		find .git/objects/pack -type f -name "*.bitmap" >bitmaps &&
-		test_line_count = 2 packs &&
-		test_line_count = 2 bitmaps &&
-
-		git rev-list --use-bitmap-index HEAD 2>err &&
-		grep "ignoring extra bitmap file" err
-	)
+test_expect_success 'verify writing bitmap lookup table when enabled' '
+	GIT_TRACE2_EVENT="$(pwd)/trace2" \
+		git repack -ad &&
+	grep "\"label\":\"writing_lookup_table\"" trace2
 '
 
 test_done
diff --git a/t/t5311-pack-bitmaps-shallow.sh b/t/t5311-pack-bitmaps-shallow.sh
index 872a95df338..9dae60f73e3 100755
--- a/t/t5311-pack-bitmaps-shallow.sh
+++ b/t/t5311-pack-bitmaps-shallow.sh
@@ -17,23 +17,40 @@ test_description='check bitmap operation with shallow repositories'
 # the tree for A. But in a shallow one, we've grafted away
 # A, and fetching A to B requires that the other side send
 # us the tree for file=1.
-test_expect_success 'setup shallow repo' '
-	echo 1 >file &&
-	git add file &&
-	git commit -m orig &&
-	echo 2 >file &&
-	git commit -a -m update &&
-	git clone --no-local --bare --depth=1 . shallow.git &&
-	echo 1 >file &&
-	git commit -a -m repeat
-'
-
-test_expect_success 'turn on bitmaps in the parent' '
-	git repack -adb
-'
-
-test_expect_success 'shallow fetch from bitmapped repo' '
-	(cd shallow.git && git fetch)
-'
+test_shallow_bitmaps () {
+	writeLookupTable=false
+
+	for i in "$@"
+	do
+		case $i in
+		"pack.writeBitmapLookupTable") writeLookupTable=true;;
+		esac
+	done
+
+	test_expect_success 'setup shallow repo' '
+		rm -rf * .git &&
+		git init &&
+		git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
+		echo 1 >file &&
+		git add file &&
+		git commit -m orig &&
+		echo 2 >file &&
+		git commit -a -m update &&
+		git clone --no-local --bare --depth=1 . shallow.git &&
+		echo 1 >file &&
+		git commit -a -m repeat
+	'
+
+	test_expect_success 'turn on bitmaps in the parent' '
+		git repack -adb
+	'
+
+	test_expect_success 'shallow fetch from bitmapped repo' '
+		(cd shallow.git && git fetch)
+	'
+}
+
+test_shallow_bitmaps
+test_shallow_bitmaps "pack.writeBitmapLookupTable"
 
 test_done
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 4fe57414c13..3b206adcee6 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -15,17 +15,24 @@ GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
 sane_unset GIT_TEST_MIDX_WRITE_REV
 sane_unset GIT_TEST_MIDX_READ_RIDX
 
-midx_bitmap_core
-
 bitmap_reuse_tests() {
 	from=$1
 	to=$2
+	writeLookupTable=false
+
+	for i in "$@"
+	do
+		case $i in
+		"pack.writeBitmapLookupTable") writeLookupTable=true;;
+		esac
+	done
 
 	test_expect_success "setup pack reuse tests ($from -> $to)" '
 		rm -fr repo &&
 		git init repo &&
 		(
 			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 			test_commit_bulk 16 &&
 			git tag old-tip &&
 
@@ -43,6 +50,7 @@ bitmap_reuse_tests() {
 	test_expect_success "build bitmap from existing ($from -> $to)" '
 		(
 			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 			test_commit_bulk --id=further 16 &&
 			git tag new-tip &&
 
@@ -59,6 +67,7 @@ bitmap_reuse_tests() {
 	test_expect_success "verify resulting bitmaps ($from -> $to)" '
 		(
 			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 			git for-each-ref &&
 			git rev-list --test-bitmap refs/tags/old-tip &&
 			git rev-list --test-bitmap refs/tags/new-tip
@@ -66,244 +75,294 @@ bitmap_reuse_tests() {
 	'
 }
 
-bitmap_reuse_tests 'pack' 'MIDX'
-bitmap_reuse_tests 'MIDX' 'pack'
-bitmap_reuse_tests 'MIDX' 'MIDX'
+test_midx_bitmap_cases () {
+	writeLookupTable=false
+	writeBitmapLookupTable=
+
+	for i in "$@"
+	do
+		case $i in
+		"pack.writeBitmapLookupTable")
+			writeLookupTable=true
+			writeBitmapLookupTable="$i"
+			;;
+		esac
+	done
+
+	test_expect_success 'setup test_repository' '
+		rm -rf * .git &&
+		git init &&
+		git config pack.writeBitmapLookupTable '"$writeLookupTable"'
+	'
 
-test_expect_success 'missing object closure fails gracefully' '
-	rm -fr repo &&
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
+	midx_bitmap_core
 
-		test_commit loose &&
-		test_commit packed &&
+	bitmap_reuse_tests 'pack' 'MIDX' "$writeBitmapLookupTable"
+	bitmap_reuse_tests 'MIDX' 'pack' "$writeBitmapLookupTable"
+	bitmap_reuse_tests 'MIDX' 'MIDX' "$writeBitmapLookupTable"
 
-		# Do not pass "--revs"; we want a pack without the "loose"
-		# commit.
-		git pack-objects $objdir/pack/pack <<-EOF &&
-		$(git rev-parse packed)
-		EOF
+	test_expect_success 'missing object closure fails gracefully' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-		test_must_fail git multi-pack-index write --bitmap 2>err &&
-		grep "doesn.t have full closure" err &&
-		test_path_is_missing $midx
-	)
-'
+			test_commit loose &&
+			test_commit packed &&
 
-midx_bitmap_partial_tests
+			# Do not pass "--revs"; we want a pack without the "loose"
+			# commit.
+			git pack-objects $objdir/pack/pack <<-EOF &&
+			$(git rev-parse packed)
+			EOF
 
-test_expect_success 'removing a MIDX clears stale bitmaps' '
-	rm -fr repo &&
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
-		test_commit base &&
-		git repack &&
-		git multi-pack-index write --bitmap &&
+			test_must_fail git multi-pack-index write --bitmap 2>err &&
+			grep "doesn.t have full closure" err &&
+			test_path_is_missing $midx
+		)
+	'
 
-		# Write a MIDX and bitmap; remove the MIDX but leave the bitmap.
-		stale_bitmap=$midx-$(midx_checksum $objdir).bitmap &&
-		rm $midx &&
+	midx_bitmap_partial_tests
 
-		# Then write a new MIDX.
-		test_commit new &&
-		git repack &&
-		git multi-pack-index write --bitmap &&
+	test_expect_success 'removing a MIDX clears stale bitmaps' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
+			test_commit base &&
+			git repack &&
+			git multi-pack-index write --bitmap &&
+
+			# Write a MIDX and bitmap; remove the MIDX but leave the bitmap.
+			stale_bitmap=$midx-$(midx_checksum $objdir).bitmap &&
+			rm $midx &&
+
+			# Then write a new MIDX.
+			test_commit new &&
+			git repack &&
+			git multi-pack-index write --bitmap &&
+
+			test_path_is_file $midx &&
+			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+			test_path_is_missing $stale_bitmap
+		)
+	'
 
-		test_path_is_file $midx &&
-		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
-		test_path_is_missing $stale_bitmap
-	)
-'
+	test_expect_success 'pack.preferBitmapTips' '
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-test_expect_success 'pack.preferBitmapTips' '
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
+			test_commit_bulk --message="%s" 103 &&
 
-		test_commit_bulk --message="%s" 103 &&
+			git log --format="%H" >commits.raw &&
+			sort <commits.raw >commits &&
 
-		git log --format="%H" >commits.raw &&
-		sort <commits.raw >commits &&
+			git log --format="create refs/tags/%s %H" HEAD >refs &&
+			git update-ref --stdin <refs &&
 
-		git log --format="create refs/tags/%s %H" HEAD >refs &&
-		git update-ref --stdin <refs &&
+			git multi-pack-index write --bitmap &&
+			test_path_is_file $midx &&
+			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
 
-		git multi-pack-index write --bitmap &&
-		test_path_is_file $midx &&
-		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+			comm -13 bitmaps commits >before &&
+			test_line_count = 1 before &&
 
-		test-tool bitmap list-commits | sort >bitmaps &&
-		comm -13 bitmaps commits >before &&
-		test_line_count = 1 before &&
+			perl -ne "printf(\"create refs/tags/include/%d \", $.); print" \
+				<before | git update-ref --stdin &&
 
-		perl -ne "printf(\"create refs/tags/include/%d \", $.); print" \
-			<before | git update-ref --stdin &&
+			rm -fr $midx-$(midx_checksum $objdir).bitmap &&
+			rm -fr $midx &&
 
-		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
-		rm -fr $midx &&
+			git -c pack.preferBitmapTips=refs/tags/include \
+				multi-pack-index write --bitmap &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+			comm -13 bitmaps commits >after &&
 
-		git -c pack.preferBitmapTips=refs/tags/include \
-			multi-pack-index write --bitmap &&
-		test-tool bitmap list-commits | sort >bitmaps &&
-		comm -13 bitmaps commits >after &&
+			! test_cmp before after
+		)
+	'
 
-		! test_cmp before after
-	)
-'
+	test_expect_success 'writing a bitmap with --refs-snapshot' '
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-test_expect_success 'writing a bitmap with --refs-snapshot' '
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
+			test_commit one &&
+			test_commit two &&
 
-		test_commit one &&
-		test_commit two &&
+			git rev-parse one >snapshot &&
 
-		git rev-parse one >snapshot &&
+			git repack -ad &&
 
-		git repack -ad &&
+			# First, write a MIDX which see both refs/tags/one and
+			# refs/tags/two (causing both of those commits to receive
+			# bitmaps).
+			git multi-pack-index write --bitmap &&
 
-		# First, write a MIDX which see both refs/tags/one and
-		# refs/tags/two (causing both of those commits to receive
-		# bitmaps).
-		git multi-pack-index write --bitmap &&
+			test_path_is_file $midx &&
+			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
 
-		test_path_is_file $midx &&
-		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+			grep "$(git rev-parse one)" bitmaps &&
+			grep "$(git rev-parse two)" bitmaps &&
 
-		test-tool bitmap list-commits | sort >bitmaps &&
-		grep "$(git rev-parse one)" bitmaps &&
-		grep "$(git rev-parse two)" bitmaps &&
+			rm -fr $midx-$(midx_checksum $objdir).bitmap &&
+			rm -fr $midx &&
 
-		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
-		rm -fr $midx &&
+			# Then again, but with a refs snapshot which only sees
+			# refs/tags/one.
+			git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
 
-		# Then again, but with a refs snapshot which only sees
-		# refs/tags/one.
-		git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
+			test_path_is_file $midx &&
+			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
 
-		test_path_is_file $midx &&
-		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+			grep "$(git rev-parse one)" bitmaps &&
+			! grep "$(git rev-parse two)" bitmaps
+		)
+	'
 
-		test-tool bitmap list-commits | sort >bitmaps &&
-		grep "$(git rev-parse one)" bitmaps &&
-		! grep "$(git rev-parse two)" bitmaps
-	)
-'
+	test_expect_success 'write a bitmap with --refs-snapshot (preferred tips)' '
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-test_expect_success 'write a bitmap with --refs-snapshot (preferred tips)' '
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
+			test_commit_bulk --message="%s" 103 &&
 
-		test_commit_bulk --message="%s" 103 &&
+			git log --format="%H" >commits.raw &&
+			sort <commits.raw >commits &&
 
-		git log --format="%H" >commits.raw &&
-		sort <commits.raw >commits &&
+			git log --format="create refs/tags/%s %H" HEAD >refs &&
+			git update-ref --stdin <refs &&
 
-		git log --format="create refs/tags/%s %H" HEAD >refs &&
-		git update-ref --stdin <refs &&
+			git multi-pack-index write --bitmap &&
+			test_path_is_file $midx &&
+			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
 
-		git multi-pack-index write --bitmap &&
-		test_path_is_file $midx &&
-		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+			comm -13 bitmaps commits >before &&
+			test_line_count = 1 before &&
 
-		test-tool bitmap list-commits | sort >bitmaps &&
-		comm -13 bitmaps commits >before &&
-		test_line_count = 1 before &&
+			(
+				grep -vf before commits.raw &&
+				# mark missing commits as preferred
+				sed "s/^/+/" before
+			) >snapshot &&
 
+			rm -fr $midx-$(midx_checksum $objdir).bitmap &&
+			rm -fr $midx &&
+
+			git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+			comm -13 bitmaps commits >after &&
+
+			! test_cmp before after
+		)
+	'
+
+	test_expect_success 'hash-cache values are propagated from pack bitmaps' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
 		(
-			grep -vf before commits.raw &&
-			# mark missing commits as preferred
-			sed "s/^/+/" before
-		) >snapshot &&
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
-		rm -fr $midx &&
+			test_commit base &&
+			test_commit base2 &&
+			git repack -adb &&
 
-		git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
-		test-tool bitmap list-commits | sort >bitmaps &&
-		comm -13 bitmaps commits >after &&
+			test-tool bitmap dump-hashes >pack.raw &&
+			test_file_not_empty pack.raw &&
+			sort pack.raw >pack.hashes &&
 
-		! test_cmp before after
-	)
-'
+			test_commit new &&
+			git repack &&
+			git multi-pack-index write --bitmap &&
 
-test_expect_success 'hash-cache values are propagated from pack bitmaps' '
-	rm -fr repo &&
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
+			test-tool bitmap dump-hashes >midx.raw &&
+			sort midx.raw >midx.hashes &&
 
-		test_commit base &&
-		test_commit base2 &&
-		git repack -adb &&
+			# ensure that every namehash in the pack bitmap can be found in
+			# the midx bitmap (i.e., that there are no oid-namehash pairs
+			# unique to the pack bitmap).
+			comm -23 pack.hashes midx.hashes >dropped.hashes &&
+			test_must_be_empty dropped.hashes
+		)
+	'
 
-		test-tool bitmap dump-hashes >pack.raw &&
-		test_file_not_empty pack.raw &&
-		sort pack.raw >pack.hashes &&
+	test_expect_success 'no .bitmap is written without any objects' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-		test_commit new &&
-		git repack &&
-		git multi-pack-index write --bitmap &&
+			empty="$(git pack-objects $objdir/pack/pack </dev/null)" &&
+			cat >packs <<-EOF &&
+			pack-$empty.idx
+			EOF
 
-		test-tool bitmap dump-hashes >midx.raw &&
-		sort midx.raw >midx.hashes &&
+			git multi-pack-index write --bitmap --stdin-packs \
+				<packs 2>err &&
 
-		# ensure that every namehash in the pack bitmap can be found in
-		# the midx bitmap (i.e., that there are no oid-namehash pairs
-		# unique to the pack bitmap).
-		comm -23 pack.hashes midx.hashes >dropped.hashes &&
-		test_must_be_empty dropped.hashes
-	)
-'
+			grep "bitmap without any objects" err &&
 
-test_expect_success 'no .bitmap is written without any objects' '
-	rm -fr repo &&
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
+			test_path_is_file $midx &&
+			test_path_is_missing $midx-$(midx_checksum $objdir).bitmap
+		)
+	'
+
+	test_expect_success 'graceful fallback when missing reverse index' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-		empty="$(git pack-objects $objdir/pack/pack </dev/null)" &&
-		cat >packs <<-EOF &&
-		pack-$empty.idx
-		EOF
+			test_commit base &&
 
-		git multi-pack-index write --bitmap --stdin-packs \
-			<packs 2>err &&
+			# write a pack and MIDX bitmap containing base
+			git repack -adb &&
+			git multi-pack-index write --bitmap &&
 
-		grep "bitmap without any objects" err &&
+			GIT_TEST_MIDX_READ_RIDX=0 \
+				git rev-list --use-bitmap-index HEAD 2>err &&
+			! grep "ignoring extra bitmap file" err
+		)
+	'
+}
 
-		test_path_is_file $midx &&
-		test_path_is_missing $midx-$(midx_checksum $objdir).bitmap
-	)
-'
+test_midx_bitmap_cases
+
+test_midx_bitmap_cases "pack.writeBitmapLookupTable"
 
-test_expect_success 'graceful fallback when missing reverse index' '
+test_expect_success 'multi-pack-index write writes lookup table if enabled' '
 	rm -fr repo &&
 	git init repo &&
 	test_when_finished "rm -fr repo" &&
 	(
 		cd repo &&
-
 		test_commit base &&
-
-		# write a pack and MIDX bitmap containing base
-		git repack -adb &&
-		git multi-pack-index write --bitmap &&
-
-		GIT_TEST_MIDX_READ_RIDX=0 \
-			git rev-list --use-bitmap-index HEAD 2>err &&
-		! grep "ignoring extra bitmap file" err
+		git config pack.writeBitmapLookupTable true &&
+		git repack -ad &&
+		GIT_TRACE2_EVENT="$(pwd)/trace" \
+			git multi-pack-index write --bitmap &&
+		grep "\"label\":\"writing_lookup_table\"" trace
 	)
 '
 
diff --git a/t/t5327-multi-pack-bitmaps-rev.sh b/t/t5327-multi-pack-bitmaps-rev.sh
index d30ba632c87..5ed16a820d1 100755
--- a/t/t5327-multi-pack-bitmaps-rev.sh
+++ b/t/t5327-multi-pack-bitmaps-rev.sh
@@ -17,7 +17,27 @@ GIT_TEST_MIDX_READ_RIDX=0
 export GIT_TEST_MIDX_WRITE_REV
 export GIT_TEST_MIDX_READ_RIDX
 
-midx_bitmap_core rev
-midx_bitmap_partial_tests rev
+test_midx_bitmap_rev () {
+	writeLookupTable=false
+
+	for i in "$@"
+	do
+		case $i in
+		"pack.writeBitmapLookupTable") writeLookupTable=true;;
+		esac
+	done
+
+	test_expect_success 'setup bitmap config' '
+		rm -rf * .git &&
+		git init &&
+		git config pack.writeBitmapLookupTable '"$writeLookupTable"'
+	'
+
+	midx_bitmap_core rev
+	midx_bitmap_partial_tests rev
+}
+
+test_midx_bitmap_rev
+test_midx_bitmap_rev "pack.writeBitmapLookupTable"
 
 test_done
-- 
gitgitgadget



* [PATCH v5 4/6] pack-bitmap: prepare to read lookup table extension
  2022-07-20 18:38       ` [PATCH v5 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
                           ` (2 preceding siblings ...)
  2022-07-20 18:38         ` [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests Abhradeep Chakraborty via GitGitGadget
@ 2022-07-20 18:38         ` Abhradeep Chakraborty via GitGitGadget
  2022-07-26  1:13           ` Taylor Blau
  2022-07-20 18:38         ` [PATCH v5 5/6] p5310-pack-bitmaps.sh: enable `pack.writeReverseIndex` Abhradeep Chakraborty via GitGitGadget
                           ` (2 subsequent siblings)
  6 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-07-20 18:38 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren, Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

An earlier change taught Git to write the bitmap lookup table, but
Git does not yet know how to parse it.

Teach Git to parse the bitmap lookup table when one is present. Older
versions of Git are not affected by it: those versions ignore the
lookup table.

Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
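For readers following along, the lookup this patch adds can be sketched
outside of Git roughly as below. This is a standalone illustration, not
the code in pack-bitmap.c: the names `struct triplet`, `read_be32`,
`read_be64`, `parse_triplet`, `find_triplet` and `TRIPLET_WIDTH` are made
up for this note, and the 16-byte row width is an assumption derived from
the 4 + 8 + 4 byte fields that lookup_table_get_triplet_by_pointer()
reads. It only shows the idea: rows are fixed-width, big-endian, sorted
by commit position, so a row can be found with bsearch() without
decoding every bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed row width: 4-byte commit_pos + 8-byte offset + 4-byte xor_row. */
#define TRIPLET_WIDTH 16

/* Hypothetical in-memory form of one lookup table row. */
struct triplet {
	uint32_t commit_pos;
	uint64_t offset;
	uint32_t xor_row;
};

/* Read a 32-bit big-endian (network byte order) value. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read a 64-bit big-endian value as two 32-bit halves. */
static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/* Decode the raw row at `p`; the caller guarantees 16 readable bytes. */
static void parse_triplet(struct triplet *out, const unsigned char *p)
{
	assert(out && p);
	out->commit_pos = read_be32(p);
	out->offset = read_be64(p + 4);
	out->xor_row = read_be32(p + 12);
}

/* bsearch() comparator: key is a uint32_t, entry is a raw row. */
static int cmp_commit_pos(const void *key, const void *entry)
{
	uint32_t a = *(const uint32_t *)key;
	uint32_t b = read_be32(entry);

	return (a > b) - (a < b);
}

/* Returns 0 and fills `out` on success, -1 if `commit_pos` has no row. */
static int find_triplet(const unsigned char *table, size_t nr,
			uint32_t commit_pos, struct triplet *out)
{
	const unsigned char *p = bsearch(&commit_pos, table, nr,
					 TRIPLET_WIDTH, cmp_commit_pos);

	if (!p)
		return -1;
	parse_triplet(out, p);
	return 0;
}
```

A caller would pass a pointer to the first row and the number of rows,
then use the returned offset to locate the bitmap for that commit; a
return of -1 simply means the commit has no bitmap of its own.
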
 pack-bitmap.c           | 275 ++++++++++++++++++++++++++++++++++++++--
 pack-bitmap.h           |   9 ++
 t/t5310-pack-bitmaps.sh |  22 ++++
 3 files changed, 296 insertions(+), 10 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 36134222d7a..c7336397717 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -82,6 +82,12 @@ struct bitmap_index {
 	/* The checksum of the packfile or MIDX; points into map. */
 	const unsigned char *checksum;
 
+	/*
+	 * If not NULL, this points into the commit table extension
+	 * (within the memory mapped region `map`).
+	 */
+	unsigned char *table_lookup;
+
 	/*
 	 * Extended index.
 	 *
@@ -185,6 +191,16 @@ static int load_bitmap_header(struct bitmap_index *index)
 			index->hashes = (void *)(index_end - cache_size);
 			index_end -= cache_size;
 		}
+
+		if (flags & BITMAP_OPT_LOOKUP_TABLE) {
+			size_t table_size = st_mult(ntohl(header->entry_count),
+						    BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH);
+			if (table_size > index_end - index->map - header_size)
+				return error(_("corrupted bitmap index file (too short to fit lookup table)"));
+			if (git_env_bool("GIT_TEST_READ_COMMIT_TABLE", 1))
+				index->table_lookup = (void *)(index_end - table_size);
+			index_end -= table_size;
+		}
 	}
 
 	index->entry_count = ntohl(header->entry_count);
@@ -211,11 +227,13 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
 
 	hash_pos = kh_put_oid_map(index->bitmaps, stored->oid, &ret);
 
-	/* a 0 return code means the insertion succeeded with no changes,
-	 * because the SHA1 already existed on the map. this is bad, there
-	 * shouldn't be duplicated commits in the index */
+	/*
+	 * A 0 return code means the insertion succeeded with no changes,
+	 * because the SHA1 already existed on the map. This is bad, there
+	 * shouldn't be duplicated commits in the index.
+	 */
 	if (ret == 0) {
-		error("Duplicate entry in bitmap index: %s", oid_to_hex(oid));
+		error(_("duplicate entry in bitmap index: %s"), oid_to_hex(oid));
 		return NULL;
 	}
 
@@ -470,7 +488,7 @@ static int load_bitmap(struct bitmap_index *bitmap_git)
 		!(bitmap_git->tags = read_bitmap_1(bitmap_git)))
 		goto failed;
 
-	if (load_bitmap_entries_v1(bitmap_git) < 0)
+	if (!bitmap_git->table_lookup && load_bitmap_entries_v1(bitmap_git) < 0)
 		goto failed;
 
 	return 0;
@@ -557,13 +575,238 @@ struct include_data {
 	struct bitmap *seen;
 };
 
+struct bitmap_lookup_table_triplet {
+	uint32_t commit_pos;
+	uint64_t offset;
+	uint32_t xor_row;
+};
+
+struct bitmap_lookup_table_xor_item {
+	struct object_id oid;
+	uint64_t offset;
+};
+
+/*
+ * Given a `triplet` struct pointer and pointer `p`, this
+ * function reads the triplet beginning at `p` into the struct.
+ * Note that this function assumes that there is enough memory
+ * left for filling the `triplet` struct from `p`.
+ */
+static int lookup_table_get_triplet_by_pointer(struct bitmap_lookup_table_triplet *triplet,
+					       const unsigned char *p)
+{
+	if (!triplet)
+		return -1;
+
+	triplet->commit_pos = get_be32(p);
+	p += sizeof(uint32_t);
+	triplet->offset = get_be64(p);
+	p += sizeof(uint64_t);
+	triplet->xor_row = get_be32(p);
+	return 0;
+}
+
+/*
+ * This function gets the raw triplet from the `pos`'th row in the
+ * lookup table and fills `triplet` with that data.
+ */
+static int lookup_table_get_triplet(struct bitmap_index *bitmap_git,
+				    uint32_t pos,
+				    struct bitmap_lookup_table_triplet *triplet)
+{
+	unsigned char *p = NULL;
+	if (pos >= bitmap_git->entry_count)
+		return error(_("corrupt bitmap lookup table: triplet position out of index"));
+
+	p = bitmap_git->table_lookup + st_mult(pos, BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH);
+
+	return lookup_table_get_triplet_by_pointer(triplet, p);
+}
+
+/*
+ * Searches for a matching triplet. `commit_pos` is a pointer
+ * to the wanted commit position value. `table_entry` points to
+ * a triplet in lookup table. The first 4 bytes of each
+ * triplet (pointed by `table_entry`) are compared with `*commit_pos`.
+ */
+static int triplet_cmp(const void *commit_pos, const void *table_entry)
+{
+
+	uint32_t a = *(uint32_t *)commit_pos;
+	uint32_t b = get_be32(table_entry);
+	if (a > b)
+		return 1;
+	else if (a < b)
+		return -1;
+
+	return 0;
+}
+
+static uint32_t bsearch_pos(struct bitmap_index *bitmap_git,
+			    struct object_id *oid,
+			    uint32_t *result)
+{
+	int found;
+
+	if (bitmap_is_midx(bitmap_git))
+		found = bsearch_midx(oid, bitmap_git->midx, result);
+	else
+		found = bsearch_pack(oid, bitmap_git->pack, result);
+
+	return found;
+}
+
+/*
+ * `bsearch_triplet_by_pos` function searches for the raw triplet
+ * having commit position same as `commit_pos` and fills `triplet`
+ * object from the raw triplet. Returns 0 on success and -1 on
+ * failure.
+ */
+static int bsearch_triplet_by_pos(uint32_t commit_pos,
+				  struct bitmap_index *bitmap_git,
+				  struct bitmap_lookup_table_triplet *triplet)
+{
+	unsigned char *p = bsearch(&commit_pos, bitmap_git->table_lookup, bitmap_git->entry_count,
+				   BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH, triplet_cmp);
+
+	if (!p)
+		return -1;
+
+	return lookup_table_get_triplet_by_pointer(triplet, p);
+}
+
+static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_git,
+					  struct commit *commit)
+{
+	uint32_t commit_pos, xor_row;
+	uint64_t offset;
+	int flags, found;
+	struct bitmap_lookup_table_triplet triplet;
+	struct object_id *oid = &commit->object.oid;
+	struct ewah_bitmap *bitmap;
+	struct stored_bitmap *xor_bitmap = NULL;
+	const int bitmap_header_size = 6;
+	static struct bitmap_lookup_table_xor_item *xor_items = NULL;
+	static size_t xor_items_nr = 0, xor_items_alloc = 0;
+	static int is_corrupt = 0;
+
+	if (is_corrupt)
+		return NULL;
+
+	found = bsearch_pos(bitmap_git, oid, &commit_pos);
+
+	if (!found)
+		return NULL;
+
+	if (bsearch_triplet_by_pos(commit_pos, bitmap_git, &triplet) < 0)
+		return NULL;
+
+	xor_items_nr = 0;
+	offset = triplet.offset;
+	xor_row = triplet.xor_row;
+
+	if (xor_row != 0xffffffff) {
+		int xor_flags;
+		khiter_t hash_pos;
+		struct bitmap_lookup_table_xor_item *xor_item;
+
+		while (xor_row != 0xffffffff) {
+			ALLOC_GROW(xor_items, xor_items_nr + 1, xor_items_alloc);
+
+			if (xor_items_nr + 1 >= bitmap_git->entry_count) {
+				error(_("corrupt bitmap lookup table: xor chain exceed entry count"));
+				goto corrupt;
+			}
+
+			if (lookup_table_get_triplet(bitmap_git, xor_row, &triplet) < 0)
+				goto corrupt;
+
+			xor_item = &xor_items[xor_items_nr];
+			xor_item->offset = triplet.offset;
+
+			if (nth_bitmap_object_oid(bitmap_git, &xor_item->oid, triplet.commit_pos) < 0) {
+				error(_("corrupt bitmap lookup table: commit index %u out of range"),
+					triplet.commit_pos);
+				goto corrupt;
+			}
+
+			hash_pos = kh_get_oid_map(bitmap_git->bitmaps, xor_item->oid);
+
+			/*
+			 * If desired bitmap is already stored, we don't need
+			 * to iterate further. Because we know that bitmaps
+			 * that are needed to be parsed to parse this bitmap
+			 * has already been stored. So, assign this stored bitmap
+			 * to the xor_bitmap.
+			 */
+			if (hash_pos < kh_end(bitmap_git->bitmaps) &&
+			    (xor_bitmap = kh_value(bitmap_git->bitmaps, hash_pos)))
+				break;
+			xor_items_nr++;
+			xor_row = triplet.xor_row;
+		}
+
+		while (xor_items_nr) {
+			xor_item = &xor_items[xor_items_nr - 1];
+			bitmap_git->map_pos = xor_item->offset;
+			if (bitmap_git->map_size - bitmap_git->map_pos < bitmap_header_size) {
+				error(_("corrupt ewah bitmap: truncated header for bitmap of commit \"%s\""),
+					oid_to_hex(&xor_item->oid));
+				goto corrupt;
+			}
+
+			bitmap_git->map_pos = bitmap_git->map_pos + sizeof(uint32_t) + sizeof(uint8_t);
+			xor_flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
+			bitmap = read_bitmap_1(bitmap_git);
+
+			if (!bitmap)
+				goto corrupt;
+
+			xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_item->oid, xor_bitmap, xor_flags);
+			xor_items_nr--;
+		}
+	}
+
+	bitmap_git->map_pos = offset;
+	if (bitmap_git->map_size - bitmap_git->map_pos < bitmap_header_size) {
+		error(_("corrupt ewah bitmap: truncated header for bitmap of commit \"%s\""),
+			oid_to_hex(oid));
+		goto corrupt;
+	}
+
+	bitmap_git->map_pos = bitmap_git->map_pos + sizeof(uint32_t) + sizeof(uint8_t);
+	flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
+	bitmap = read_bitmap_1(bitmap_git);
+
+	if (!bitmap)
+		goto corrupt;
+
+	return store_bitmap(bitmap_git, bitmap, oid, xor_bitmap, flags);
+
+corrupt:
+	free(xor_items);
+	is_corrupt = 1;
+	return NULL;
+}
+
 struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
 				      struct commit *commit)
 {
 	khiter_t hash_pos = kh_get_oid_map(bitmap_git->bitmaps,
 					   commit->object.oid);
-	if (hash_pos >= kh_end(bitmap_git->bitmaps))
-		return NULL;
+	if (hash_pos >= kh_end(bitmap_git->bitmaps)) {
+		struct stored_bitmap *bitmap = NULL;
+		if (!bitmap_git->table_lookup)
+			return NULL;
+
+		trace2_region_enter("pack-bitmap", "reading_lookup_table", the_repository);
+		/* NEEDSWORK: cache misses aren't recorded */
+		bitmap = lazy_bitmap_for_commit(bitmap_git, commit);
+		trace2_region_leave("pack-bitmap", "reading_lookup_table", the_repository);
+		if (!bitmap)
+			return NULL;
+		return lookup_stored_bitmap(bitmap);
+	}
 	return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos));
 }
 
@@ -1699,8 +1942,10 @@ void test_bitmap_walk(struct rev_info *revs)
 	if (revs->pending.nr != 1)
 		die("you must specify exactly one commit to test");
 
-	fprintf(stderr, "Bitmap v%d test (%d entries loaded)\n",
-		bitmap_git->version, bitmap_git->entry_count);
+	fprintf(stderr, "Bitmap v%d test (%d entries%s)\n",
+		bitmap_git->version,
+		bitmap_git->entry_count,
+		bitmap_git->table_lookup ? "" : " loaded");
 
 	root = revs->pending.objects[0].item;
 	bm = bitmap_for_commit(bitmap_git, (struct commit *)root);
@@ -1753,13 +1998,23 @@ void test_bitmap_walk(struct rev_info *revs)
 
 int test_bitmap_commits(struct repository *r)
 {
-	struct bitmap_index *bitmap_git = prepare_bitmap_git(r);
 	struct object_id oid;
 	MAYBE_UNUSED void *value;
+	struct bitmap_index *bitmap_git = prepare_bitmap_git(r);
+
+	/*
+	 * As this function is only used to print bitmap selected
+	 * commits, we don't have to read the commit table.
+	 */
 
 	if (!bitmap_git)
 		die("failed to load bitmap indexes");
 
+	if (bitmap_git->table_lookup) {
+		if (load_bitmap_entries_v1(bitmap_git) < 0)
+			die(_("failed to load bitmap indexes"));
+	}
+
 	kh_foreach(bitmap_git->bitmaps, oid, value, {
 		printf("%s\n", oid_to_hex(&oid));
 	});
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 67a9d0fc303..9278f71ac91 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -23,6 +23,15 @@ struct bitmap_disk_header {
 
 #define NEEDS_BITMAP (1u<<22)
 
+/*
+ * The width in bytes of a single triplet in the lookup table
+ * extension:
+ *     (commit_pos, offset, xor_row)
+ *
+ * whose fields are 32, 64, and 32 bits wide, respectively.
+ */
+#define BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH (16)
+
 enum pack_bitmap_opts {
 	BITMAP_OPT_FULL_DAG = 0x1,
 	BITMAP_OPT_HASH_CACHE = 0x4,
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index c0607172827..7e50f8e7653 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -258,6 +258,7 @@ test_bitmap_cases () {
 
 	test_expect_success 'truncated bitmap fails gracefully (ewah)' '
 		test_config pack.writebitmaphashcache false &&
+		test_config pack.writebitmaplookuptable false &&
 		git repack -ad &&
 		git rev-list --use-bitmap-index --count --all >expect &&
 		bitmap=$(ls .git/objects/pack/*.bitmap) &&
@@ -270,6 +271,7 @@ test_bitmap_cases () {
 	'
 
 	test_expect_success 'truncated bitmap fails gracefully (cache)' '
+		git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 		git repack -ad &&
 		git rev-list --use-bitmap-index --count --all >expect &&
 		bitmap=$(ls .git/objects/pack/*.bitmap) &&
@@ -453,4 +455,24 @@ test_expect_success 'verify writing bitmap lookup table when enabled' '
 	grep "\"label\":\"writing_lookup_table\"" trace2
 '
 
+test_expect_success 'lookup table is actually used to traverse objects' '
+	git repack -adb &&
+	GIT_TRACE2_EVENT="$(pwd)/trace3" \
+		git rev-list --use-bitmap-index --count --all &&
+	grep "\"label\":\"reading_lookup_table\"" trace3
+'
+
+test_expect_success 'truncated bitmap fails gracefully (lookup table)' '
+	test_config pack.writebitmaphashcache false &&
+	git repack -adb &&
+	git rev-list --use-bitmap-index --count --all >expect &&
+	bitmap=$(ls .git/objects/pack/*.bitmap) &&
+	test_when_finished "rm -f $bitmap" &&
+	test_copy_bytes 512 <$bitmap >$bitmap.tmp &&
+	mv -f $bitmap.tmp $bitmap &&
+	git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
+	test_cmp expect actual &&
+	test_i18ngrep corrupted.bitmap.index stderr
+'
+
 test_done
-- 
gitgitgadget



* [PATCH v5 5/6] p5310-pack-bitmaps.sh: enable `pack.writeReverseIndex`
  2022-07-20 18:38       ` [PATCH v5 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
                           ` (3 preceding siblings ...)
  2022-07-20 18:38         ` [PATCH v5 4/6] pack-bitmap: prepare to read lookup table extension Abhradeep Chakraborty via GitGitGadget
@ 2022-07-20 18:38         ` Abhradeep Chakraborty via GitGitGadget
  2022-07-26  1:18           ` Taylor Blau
  2022-07-20 18:38         ` [PATCH v5 6/6] bitmap-lookup-table: add performance tests for lookup table Abhradeep Chakraborty via GitGitGadget
  2022-08-14 16:55         ` [PATCH v6 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
  6 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-07-20 18:38 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren, Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

Enable `pack.writeReverseIndex` before running pack-bitmap related
performance tests.

The performance with `pack.writeReverseIndex` enabled and disabled is
given below:

With `pack.writeReverseIndex`
-------------------------------

Test                                                 this tree
-------------------------------------------------------------------------
5310.3: repack to disk                                 296.55(256.53+14.52)
5310.4: simulated clone                                15.64(8.88+1.39)
5310.5: simulated fetch                                1.65(2.75+0.20)
5310.6: pack to file (bitmap)                          48.71(30.20+7.58)
5310.7: rev-list (commits)                             0.61(0.41+0.08)
5310.8: rev-list (objects)                             4.38(4.26+0.09)
5310.9: rev-list with tag negated via --not            0.07(0.02+0.04)
         --all (objects)
5310.10: rev-list with negative tag (objects)          0.05(0.01+0.03)
5310.11: rev-list count with blob:none                 0.08(0.03+0.04)
5310.12: rev-list count with blob:limit=1k             7.29(6.92+0.30)
5310.13: rev-list count with tree:0                    0.08(0.03+0.04)
5310.14: simulated partial clone                       9.45(8.12+0.41)
5310.16: clone (partial bitmap)                        17.02(10.61+2.67)
5310.17: pack to file (partial bitmap)                 51.91(28.57+7.48)
5310.18: rev-list with tree filter (partial bitmap)    1.00(0.22+0.24)

Without `pack.writeReverseIndex`:
-----------------------------

Test                                                  this tree
------------------------------------------------------------------------
5310.3: repack to disk                              293.80(251.30+14.30)
5310.4: simulated clone                             12.50(5.15+1.36)
5310.5: simulated fetch                             1.83(2.90+0.23)
5310.6: pack to file (bitmap)                       39.70(20.25+7.14)
5310.7: rev-list (commits)                          1.00(0.60+0.13)
5310.8: rev-list (objects)                          4.11(4.00+0.10)
5310.9: rev-list with tag negated via --not         0.07(0.02+0.05)
         --all (objects)
5310.10: rev-list with negative tag (objects)       0.23(0.16+0.06)
5310.11: rev-list count with blob:none              0.27(0.18+0.08)
5310.12: rev-list count with blob:limit=1k          6.41(5.98+0.41)
5310.13: rev-list count with tree:0                 0.26(0.18+0.07)
5310.14: simulated partial clone                    4.34(3.29+0.37)
5310.16: clone (partial bitmap)                     21.48(15.12+2.42)
5310.17: pack to file (partial bitmap)              47.35(37.80+4.84)
5310.18: rev-list with tree filter (partial bitmap) 0.73(0.07+0.21)

Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 t/perf/p5310-pack-bitmaps.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index 7ad4f237bc3..6e8abcd5b21 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -13,7 +13,8 @@ test_perf_large_repo
 # We intentionally use the deprecated pack.writebitmaps
 # config so that we can test against older versions of git.
 test_expect_success 'setup bitmap config' '
-	git config pack.writebitmaps true
+	git config pack.writebitmaps true &&
+	git config pack.writeReverseIndex true
 '
 
 # we need to create the tag up front such that it is covered by the repack and
-- 
gitgitgadget



* [PATCH v5 6/6] bitmap-lookup-table: add performance tests for lookup table
  2022-07-20 18:38       ` [PATCH v5 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
                           ` (4 preceding siblings ...)
  2022-07-20 18:38         ` [PATCH v5 5/6] p5310-pack-bitmaps.sh: enable `pack.writeReverseIndex` Abhradeep Chakraborty via GitGitGadget
@ 2022-07-20 18:38         ` Abhradeep Chakraborty via GitGitGadget
  2022-08-14 16:55         ` [PATCH v6 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
  6 siblings, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-07-20 18:38 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren, Abhradeep Chakraborty, Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

Add performance tests that measure the lookup table with
`pack.writeReverseIndex` enabled.

The lookup table makes Git run faster in most cases. Below are the
results of `t/perf/p5310-pack-bitmaps.sh`;
`t/perf/p5326-multi-pack-bitmaps.sh` gives similar results. The
repository used in the test is the Linux kernel.

Test                                                      this tree
---------------------------------------------------------------------------
5310.4: repack to disk (lookup=false)                   296.55(256.53+14.52)
5310.5: simulated clone                                 15.64(8.88+1.39)
5310.6: simulated fetch                                 1.65(2.75+0.20)
5310.7: pack to file (bitmap)                           48.71(30.20+7.58)
5310.8: rev-list (commits)                              0.61(0.41+0.08)
5310.9: rev-list (objects)                              4.38(4.26+0.09)
5310.10: rev-list with tag negated via --not            0.07(0.02+0.04)
         --all (objects)
5310.11: rev-list with negative tag (objects)           0.05(0.01+0.03)
5310.12: rev-list count with blob:none                  0.08(0.03+0.04)
5310.13: rev-list count with blob:limit=1k              7.29(6.92+0.30)
5310.14: rev-list count with tree:0                     0.08(0.03+0.04)
5310.15: simulated partial clone                        9.45(8.12+0.41)
5310.17: clone (partial bitmap)                         21.00(15.04+2.39)
5310.18: pack to file (partial bitmap)                  47.98(38.13+5.23)
5310.19: rev-list with tree filter (partial bitmap)     0.70(0.07+0.20)
5310.22: repack to disk (lookup=true)                   255.92(188.13+20.47)
5310.23: simulated clone                                13.78(8.84+1.09)
5310.24: simulated fetch                                0.52(0.63+0.14)
5310.25: pack to file (bitmap)                          44.34(28.94+6.84)
5310.26: rev-list (commits)                             0.48(0.31+0.06)
5310.27: rev-list (objects)                             4.02(3.93+0.07)
5310.28: rev-list with tag negated via --not            0.04(0.00+0.03)
         --all (objects)
5310.29: rev-list with negative tag (objects)           0.04(0.00+0.03)
5310.30: rev-list count with blob:none                  0.04(0.01+0.03)
5310.31: rev-list count with blob:limit=1k              6.48(6.23+0.22)
5310.32: rev-list count with tree:0                     0.04(0.01+0.03)
5310.33: simulated partial clone                        8.30(7.21+0.36)
5310.35: clone (partial bitmap)                         20.34(15.00+2.41)
5310.36: pack to file (partial bitmap)                  46.45(38.05+5.20)
5310.37: rev-list with tree filter (partial bitmap)     0.61(0.06+0.20)

Tests 4-19 run without the lookup table. The same tests are repeated
as 22-37 with the lookup table enabled.

Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 t/perf/p5310-pack-bitmaps.sh       | 65 +++++++++++---------
 t/perf/p5326-multi-pack-bitmaps.sh | 95 +++++++++++++++++-------------
 2 files changed, 91 insertions(+), 69 deletions(-)

diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index 6e8abcd5b21..adc753b6177 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -17,39 +17,50 @@ test_expect_success 'setup bitmap config' '
 	git config pack.writeReverseIndex true
 '
 
-# we need to create the tag up front such that it is covered by the repack and
-# thus by generated bitmaps.
-test_expect_success 'create tags' '
-	git tag --message="tag pointing to HEAD" perf-tag HEAD
-'
+test_bitmap () {
+	local enabled="$1"
 
-test_perf 'repack to disk' '
-	git repack -ad
-'
+	# we need to create the tag up front such that it is covered by the repack and
+	# thus by generated bitmaps.
+	test_expect_success 'create tags' '
+		git tag --message="tag pointing to HEAD" perf-tag HEAD
+	'
 
-test_full_bitmap
+	test_expect_success "use lookup table: $enabled" '
+		git config pack.writeBitmapLookupTable '"$enabled"'
+	'
 
-test_expect_success 'create partial bitmap state' '
-	# pick a commit to represent the repo tip in the past
-	cutoff=$(git rev-list HEAD~100 -1) &&
-	orig_tip=$(git rev-parse HEAD) &&
+	test_perf "repack to disk (lookup=$enabled)" '
+		git repack -ad
+	'
 
-	# now kill off all of the refs and pretend we had
-	# just the one tip
-	rm -rf .git/logs .git/refs/* .git/packed-refs &&
-	git update-ref HEAD $cutoff &&
+	test_full_bitmap
 
-	# and then repack, which will leave us with a nice
-	# big bitmap pack of the "old" history, and all of
-	# the new history will be loose, as if it had been pushed
-	# up incrementally and exploded via unpack-objects
-	git repack -Ad &&
+	test_expect_success "create partial bitmap state (lookup=$enabled)" '
+		# pick a commit to represent the repo tip in the past
+		cutoff=$(git rev-list HEAD~100 -1) &&
+		orig_tip=$(git rev-parse HEAD) &&
 
-	# and now restore our original tip, as if the pushes
-	# had happened
-	git update-ref HEAD $orig_tip
-'
+		# now kill off all of the refs and pretend we had
+		# just the one tip
+		rm -rf .git/logs .git/refs/* .git/packed-refs &&
+		git update-ref HEAD $cutoff &&
+
+		# and then repack, which will leave us with a nice
+		# big bitmap pack of the "old" history, and all of
+		# the new history will be loose, as if it had been pushed
+		# up incrementally and exploded via unpack-objects
+		git repack -Ad &&
+
+		# and now restore our original tip, as if the pushes
+		# had happened
+		git update-ref HEAD $orig_tip
+	'
+
+	test_partial_bitmap
+}
 
-test_partial_bitmap
+test_bitmap false
+test_bitmap true
 
 test_done
diff --git a/t/perf/p5326-multi-pack-bitmaps.sh b/t/perf/p5326-multi-pack-bitmaps.sh
index f2fa228f16a..1f4c7103529 100755
--- a/t/perf/p5326-multi-pack-bitmaps.sh
+++ b/t/perf/p5326-multi-pack-bitmaps.sh
@@ -6,47 +6,58 @@ test_description='Tests performance using midx bitmaps'
 
 test_perf_large_repo
 
-# we need to create the tag up front such that it is covered by the repack and
-# thus by generated bitmaps.
-test_expect_success 'create tags' '
-	git tag --message="tag pointing to HEAD" perf-tag HEAD
-'
-
-test_expect_success 'start with bitmapped pack' '
-	git repack -adb
-'
-
-test_perf 'setup multi-pack index' '
-	git multi-pack-index write --bitmap
-'
-
-test_expect_success 'drop pack bitmap' '
-	rm -f .git/objects/pack/pack-*.bitmap
-'
-
-test_full_bitmap
-
-test_expect_success 'create partial bitmap state' '
-	# pick a commit to represent the repo tip in the past
-	cutoff=$(git rev-list HEAD~100 -1) &&
-	orig_tip=$(git rev-parse HEAD) &&
-
-	# now pretend we have just one tip
-	rm -rf .git/logs .git/refs/* .git/packed-refs &&
-	git update-ref HEAD $cutoff &&
-
-	# and then repack, which will leave us with a nice
-	# big bitmap pack of the "old" history, and all of
-	# the new history will be loose, as if it had been pushed
-	# up incrementally and exploded via unpack-objects
-	git repack -Ad &&
-	git multi-pack-index write --bitmap &&
-
-	# and now restore our original tip, as if the pushes
-	# had happened
-	git update-ref HEAD $orig_tip
-'
-
-test_partial_bitmap
+test_bitmap () {
+	local enabled="$1"
+
+	# we need to create the tag up front such that it is covered by the repack and
+	# thus by generated bitmaps.
+	test_expect_success 'create tags' '
+		git tag --message="tag pointing to HEAD" perf-tag HEAD
+	'
+
+	test_expect_success "use lookup table: $enabled" '
+		git config pack.writeBitmapLookupTable '"$enabled"'
+	'
+
+	test_expect_success "start with bitmapped pack (lookup=$enabled)" '
+		git repack -adb
+	'
+
+	test_perf "setup multi-pack index (lookup=$enabled)" '
+		git multi-pack-index write --bitmap
+	'
+
+	test_expect_success "drop pack bitmap (lookup=$enabled)" '
+		rm -f .git/objects/pack/pack-*.bitmap
+	'
+
+	test_full_bitmap
+
+	test_expect_success "create partial bitmap state (lookup=$enabled)" '
+		# pick a commit to represent the repo tip in the past
+		cutoff=$(git rev-list HEAD~100 -1) &&
+		orig_tip=$(git rev-parse HEAD) &&
+
+		# now pretend we have just one tip
+		rm -rf .git/logs .git/refs/* .git/packed-refs &&
+		git update-ref HEAD $cutoff &&
+
+		# and then repack, which will leave us with a nice
+		# big bitmap pack of the "old" history, and all of
+		# the new history will be loose, as if it had been pushed
+		# up incrementally and exploded via unpack-objects
+		git repack -Ad &&
+		git multi-pack-index write --bitmap &&
+
+		# and now restore our original tip, as if the pushes
+		# had happened
+		git update-ref HEAD $orig_tip
+	'
+
+	test_partial_bitmap
+}
+
+test_bitmap false
+test_bitmap true
 
 test_done
-- 
gitgitgadget


* Re: [PATCH v3 2/6] pack-bitmap-write.c: write lookup table extension
  2022-07-16 11:50             ` Abhradeep Chakraborty
@ 2022-07-26  0:34               ` Taylor Blau
  0 siblings, 0 replies; 162+ messages in thread
From: Taylor Blau @ 2022-07-26  0:34 UTC (permalink / raw)
  To: Abhradeep Chakraborty
  Cc: Taylor Blau, Abhradeep Chakraborty via GitGitGadget, git,
	Kaartic Sivaram, Derrick Stolee

On Sat, Jul 16, 2022 at 05:20:57PM +0530, Abhradeep Chakraborty wrote:
> I think the comment I added is not that good. The following might be better -
>
>     At the end of this sort table[j] = i means that the i'th
>     bitmap corresponds to j'th bitmapped commit (among the selected commits)
>     in lex order of OIDs.

Makes sense, I think that version of the comment is more helpful. I
appreciate your attention to detail on getting these things right!
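
For readers following along, here is a minimal standalone sketch of the
relationship that comment describes. It is not the code from the patch:
plain qsort() with a file-scope key pointer stands in for QSORT_S, and
the names build_tables, by_commit_pos and sort_keys are made up for
illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the context pointer QSORT_S would pass. */
static const uint32_t *sort_keys;

/* Order two bitmap indices by the commit position each refers to. */
static int by_commit_pos(const void *va, const void *vb)
{
	uint32_t a = sort_keys[*(const uint32_t *)va];
	uint32_t b = sort_keys[*(const uint32_t *)vb];

	return (a > b) - (a < b);
}

/*
 * Fill table[] so that table[j] is the index of the bitmap whose
 * commit comes j'th in sorted order, and table_inv[] with the
 * inverse mapping (table_inv[i] is the sorted row of bitmap i).
 */
static void build_tables(const uint32_t *commit_positions, uint32_t nr,
			 uint32_t *table, uint32_t *table_inv)
{
	uint32_t i;

	for (i = 0; i < nr; i++)
		table[i] = i;

	sort_keys = commit_positions;
	qsort(table, nr, sizeof(*table), by_commit_pos);

	for (i = 0; i < nr; i++)
		table_inv[table[i]] = i;
}
```

With made-up commit positions { 40, 10, 30, 20 }, table becomes
{ 1, 3, 2, 0 } and table_inv becomes { 3, 0, 2, 1 }, so bitmap 0 (commit
position 40) lands in the last row.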

Thanks,
Taylor


* Re: [PATCH v3 4/6] pack-bitmap: prepare to read lookup table extension
  2022-07-18  9:06             ` Martin Ågren
  2022-07-18 19:25               ` Abhradeep Chakraborty
@ 2022-07-26  0:45               ` Taylor Blau
  1 sibling, 0 replies; 162+ messages in thread
From: Taylor Blau @ 2022-07-26  0:45 UTC (permalink / raw)
  To: Martin Ågren
  Cc: Taylor Blau, Abhradeep Chakraborty,
	Abhradeep Chakraborty via GitGitGadget, git, Kaartic Sivaram,
	Derrick Stolee

Hi Martin,

> > > > The comment you added is definitely helpful, but I still think that this
> > > > line is a little magical. `*va` isn't really a pointer to a `uint32_t`,
> > > > but a pointer to the start of a triplet, which just *happens* to have a
> > > > 4-byte integer at the beginning of it.
>
> Yeah, this all looks quite magical with the casting, and with the
> asymmetric handling of `va` and `vb`.

Yeah, this was my main point (which I didn't intend to create as much of
a digression with as I appear to have!).

The handling here is all correct, but what I was saying was that even
though we're treating `*vb` as a pointer to a `uint32_t`, reading vb[1]
is bogus, since there isn't another 32-bit value there.

So I was saying that you *could* initialize a triplet struct, assign its
fields appropriately, and then compare `*va` to `triplet->foo`. But I
think setting up a struct to only bother reading the first field is
probably wasteful, hence my suggestion for a clarifying comment.

> > > Are you sure about this? As far as I know, the first parameter of such
> > > comparing functions is always a pointer to the given key that we need
> > > to search for and the second parameter points to each element of an
> > > array.
>
> Yes, that matches my understanding and the man-page for bsearch(3):
>
>   "The compar routine is expected to have two arguments which point to
>   the key object and to an array member, in that order, [...]"
>
> I think it would help to make this something like
>
>   static int triplet_cmp(const void *key, const void *array_item)
>
> to really highlight this asymmetric nature of this function, or to make
> clear how the values flow through our call-chain through something like
>
>   static int triplet_cmp(const void *commit_pos, const void *table_entry)

Yeah, that makes sense to me. I'm not too attached to either name, both
seem OK to me.
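
To make the asymmetry concrete, here is a small standalone sketch of a
key-versus-element comparator used with bsearch(3). It assumes the
16-byte triplet layout from this series; read_be32, find_triplet and
TRIPLET_WIDTH are illustrative names, with read_be32 standing in for
get_be32().

```c
#include <stdint.h>
#include <stdlib.h>

#define TRIPLET_WIDTH 16 /* 4-byte commit_pos, 8-byte offset, 4-byte xor_row */

/* Read a 32-bit big-endian value; stands in for Git's get_be32(). */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * bsearch() passes the key first and an array member second, so
 * the two arguments are allowed to have different types: a host
 * order uint32_t on the left, a raw on-disk triplet on the right.
 */
static int triplet_cmp(const void *commit_pos, const void *table_entry)
{
	uint32_t a = *(const uint32_t *)commit_pos;
	uint32_t b = read_be32(table_entry);

	return (a > b) - (a < b);
}

/* Return the raw triplet for commit_pos, or NULL if there is none. */
static const unsigned char *find_triplet(const unsigned char *table,
					 size_t nr, uint32_t commit_pos)
{
	return bsearch(&commit_pos, table, nr, TRIPLET_WIDTH, triplet_cmp);
}
```

Only the first four bytes of each element are ever read by the
comparator, which is why no triplet struct needs to be filled in during
the search.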

Thanks,
Taylor


* Re: [PATCH v5 2/6] pack-bitmap-write.c: write lookup table extension
  2022-07-20 18:38         ` [PATCH v5 2/6] pack-bitmap-write.c: write " Abhradeep Chakraborty via GitGitGadget
@ 2022-07-26  0:52           ` Taylor Blau
  2022-07-26 18:22             ` Abhradeep Chakraborty
  0 siblings, 1 reply; 162+ messages in thread
From: Taylor Blau @ 2022-07-26  0:52 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget
  Cc: git, Taylor Blau, Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren, Abhradeep Chakraborty

On Wed, Jul 20, 2022 at 06:38:20PM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
>
> The bitmap lookup table extension was documented by an earlier
> change, but Git does not yet know how to write that extension.
>
> Teach Git to write the bitmap lookup table extension. The table
> contains a list of `N` `<commit_pos, offset, xor_row>` triplets. These
> triplets are sorted according to their commit pos (ascending order).
> The meaning of each field in the i'th triplet is given below:
>
>   - commit_pos stores commit position (in the pack-index or midx).
>     It is a 4 byte network byte order unsigned integer.
>
>   - offset is the position (in the bitmap file) from which that
>     commit's bitmap can be read.
>
>   - xor_row is the position of the triplet in the lookup table
>     whose bitmap is used to compress this bitmap, or `0xffffffff`
>     if no such bitmap exists.
>
> Mentored-by: Taylor Blau <me@ttaylorr.com>
> Co-mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
> Co-authored-by: Taylor Blau <me@ttaylorr.com>
> Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> ---
>  pack-bitmap-write.c | 112 ++++++++++++++++++++++++++++++++++++++++----
>  pack-bitmap.h       |   5 +-
>  2 files changed, 107 insertions(+), 10 deletions(-)
>
> diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
> index c43375bd344..9843790cb60 100644
> --- a/pack-bitmap-write.c
> +++ b/pack-bitmap-write.c
> @@ -650,20 +650,19 @@ static const struct object_id *oid_access(size_t pos, const void *table)
>
>  static void write_selected_commits_v1(struct hashfile *f,
>  				      struct pack_idx_entry **index,
> -				      uint32_t index_nr)
> +				      uint32_t index_nr,
> +				      off_t *offsets,
> +				      uint32_t *commit_positions)
>  {
>  	int i;
>
>  	for (i = 0; i < writer.selected_nr; ++i) {
>  		struct bitmapped_commit *stored = &writer.selected[i];
>
> -		int commit_pos =
> -			oid_pos(&stored->commit->object.oid, index, index_nr, oid_access);
> +		if (offsets)
> +			offsets[i] = hashfile_total(f);
>
> -		if (commit_pos < 0)
> -			BUG("trying to write commit not in index");
> -
> -		hashwrite_be32(f, commit_pos);
> +		hashwrite_be32(f, commit_positions[i]);

I wonder if it would make this patch a little more readable to construct
and use the commit_positions array as a single preparatory step before
this commit.

What do you think?

> +static void write_lookup_table(struct hashfile *f,
> +			       struct pack_idx_entry **index,
> +			       uint32_t index_nr,
> +			       off_t *offsets,
> +			       uint32_t *commit_positions)
> +{
> +	uint32_t i;
> +	uint32_t *table, *table_inv;
> +
> +	ALLOC_ARRAY(table, writer.selected_nr);
> +	ALLOC_ARRAY(table_inv, writer.selected_nr);
> +
> +	for (i = 0; i < writer.selected_nr; i++)
> +		table[i] = i;
> +
> +	/*
> +	 * At the end of this sort table[j] = i means that the i'th
> +	 * bitmap corresponds to j'th bitmapped commit (among the selected
> +	 * commits) in lex order of OIDs.
> +	 */
> +	QSORT_S(table, writer.selected_nr, table_cmp, commit_positions);
> +
> +	/* table_inv helps us discover that relationship (i'th bitmap
> +	 * to j'th commit by j = table_inv[i])
> +	 */
> +	for (i = 0; i < writer.selected_nr; i++)
> +		table_inv[table[i]] = i;
> +
> +	trace2_region_enter("pack-bitmap-write", "writing_lookup_table", the_repository);
> +	for (i = 0; i < writer.selected_nr; i++) {
> +		struct bitmapped_commit *selected = &writer.selected[table[i]];
> +		uint32_t xor_offset = selected->xor_offset;
> +		uint32_t xor_row;
> +
> +		if (xor_offset) {
> +			/*
> +			 * xor_index stores the index (in the bitmap entries)
> +			 * of the corresponding xor bitmap. But we need to convert
> +			 * this index into lookup table's index. So, table_inv[xor_index]
> +			 * gives us the index position w.r.t. the lookup table.
> +			 *
> +			 * If "k = table[i] - xor_offset" then the xor base is the k'th
> +			 * bitmap. `table_inv[k]` gives us the position of that bitmap
> +			 * in the lookup table.
> +			 */
> +			uint32_t xor_index = table[i] - xor_offset;
> +			xor_row = table_inv[xor_index];
> +		} else {
> +			xor_row = 0xffffffff;
> +		}
> +
> +		hashwrite_be32(f, commit_positions[table[i]]);
> +		hashwrite_be64(f, (uint64_t)offsets[table[i]]);
> +		hashwrite_be32(f, xor_row);
> +	}
> +	trace2_region_leave("pack-bitmap-write", "writing_lookup_table", the_repository);
> +
> +	free(table);
> +	free(table_inv);
> +}
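
For readers following the table/table_inv bookkeeping above, here is a
minimal standalone sketch of the same idea. It is not the patch's code:
build_tables(), xor_row_for() and the file-scope sort_keys pointer are
hypothetical names, and plain qsort() with a static key pointer stands in
for QSORT_S() with a context argument.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for commit_positions[]; set before sorting. */
static const uint32_t *sort_keys;

/* Order bitmap indices by the commit position each one refers to. */
static int cmp_by_key(const void *va, const void *vb)
{
	uint32_t a = sort_keys[*(const uint32_t *)va];
	uint32_t b = sort_keys[*(const uint32_t *)vb];
	return (a > b) - (a < b);
}

/*
 * Fill table[] so that table[j] is the bitmap index stored in row j,
 * and table_inv[] so that table_inv[i] is the row of the i'th bitmap.
 */
static void build_tables(const uint32_t *keys, uint32_t nr,
			 uint32_t *table, uint32_t *table_inv)
{
	uint32_t i;

	assert(keys && table && table_inv);
	for (i = 0; i < nr; i++)
		table[i] = i;

	sort_keys = keys;
	qsort(table, nr, sizeof(*table), cmp_by_key);

	/* invert the permutation */
	for (i = 0; i < nr; i++)
		table_inv[table[i]] = i;
}

/* Row of the xor base for row `row`, or 0xffffffff if there is none. */
static uint32_t xor_row_for(const uint32_t *table, const uint32_t *table_inv,
			    uint32_t row, uint32_t xor_offset)
{
	if (!xor_offset)
		return 0xffffffff;
	return table_inv[table[row] - xor_offset];
}
```

With keys {30, 10, 20}, the sort yields table = {1, 2, 0} and the
inversion yields table_inv = {2, 0, 1}; row 1 with an xor offset of 1
then maps back to row 0.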


> @@ -715,7 +791,25 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
>  	dump_bitmap(f, writer.trees);
>  	dump_bitmap(f, writer.blobs);
>  	dump_bitmap(f, writer.tags);
> -	write_selected_commits_v1(f, index, index_nr);
> +
> +	ALLOC_ARRAY(commit_positions, writer.selected_nr);
> +	for (uint32_t i = 0; i < writer.selected_nr; ++i) {

Nit; we don't typically write for-loop expressions with variable
declarations inside of them. Make sure to declare i outside of the loop,
and then this becomes:

    for (i = 0; i < writer.selected_nr; i++)

(also, we typically use the postfix ++ operator, that is "i++" instead
of "++i" unless there is a reason to prefer the latter over the former).

Thanks,
Taylor

* Re: [PATCH v5 4/6] pack-bitmap: prepare to read lookup table extension
  2022-07-20 18:38         ` [PATCH v5 4/6] pack-bitmap: prepare to read lookup table extension Abhradeep Chakraborty via GitGitGadget
@ 2022-07-26  1:13           ` Taylor Blau
  2022-07-26 18:56             ` Abhradeep Chakraborty
  2022-07-26 19:36             ` Eric Sunshine
  0 siblings, 2 replies; 162+ messages in thread
From: Taylor Blau @ 2022-07-26  1:13 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget
  Cc: git, Taylor Blau, Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren, Abhradeep Chakraborty

On Wed, Jul 20, 2022 at 06:38:22PM +0000, Abhradeep Chakraborty via GitGitGadget wrote:

> @@ -557,13 +575,238 @@ struct include_data {
>  	struct bitmap *seen;
>  };
>
> +struct bitmap_lookup_table_triplet {
> +	uint32_t commit_pos;
> +	uint64_t offset;
> +	uint32_t xor_row;
> +};
> +
> +struct bitmap_lookup_table_xor_item {
> +	struct object_id oid;
> +	uint64_t offset;
> +};
> +
> +/*
> + * Given a `triplet` struct pointer and pointer `p`, this
> + * function reads the triplet beginning at `p` into the struct.
> + * Note that this function assumes that there is enough memory
> + * left for filling the `triplet` struct from `p`.
> + */
> +static int lookup_table_get_triplet_by_pointer(struct bitmap_lookup_table_triplet *triplet,
> +					       const unsigned char *p)
> +{
> +	if (!triplet)
> +		return -1;
> +
> +	triplet->commit_pos = get_be32(p);
> +	p += sizeof(uint32_t);
> +	triplet->offset = get_be64(p);
> +	p += sizeof(uint64_t);
> +	triplet->xor_row = get_be32(p);
> +	return 0;

Just noticing this now, but I wonder if we could avoid incrementing `p`
here and instead write something like:

    triplet->commit_pos = get_be32(p);
    triplet->offset = get_be64(p + sizeof(uint32_t));
    triplet->xor_row = get_be32(p + sizeof(uint64_t) + sizeof(uint32_t));

I don't have a strong feeling about it, though, it just seems to read a
little more directly to me and avoid modifying a variable that is only
going to live as long as the function executes (p).
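
A minimal sketch of the two styles side by side, assuming the 4 + 8 + 4
byte row layout that the patch reads. The names read_be32(), read_be64(),
decode_fixed() and decode_cursor() are made up for this illustration;
the readers are portable stand-ins for get_be32() and get_be64().

```c
#include <assert.h>
#include <stdint.h>

struct triplet {
	uint32_t commit_pos;
	uint64_t offset;
	uint32_t xor_row;
};

/* Portable big-endian readers (hypothetical stand-ins). */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/* Fixed-offset style: `p` is never modified. */
static void decode_fixed(struct triplet *t, const unsigned char *p)
{
	assert(t && p);
	t->commit_pos = read_be32(p);
	t->offset = read_be64(p + sizeof(uint32_t));
	/* xor_row is a 32-bit field, so it needs the 32-bit reader */
	t->xor_row = read_be32(p + sizeof(uint32_t) + sizeof(uint64_t));
}

/* Cursor style, as in the patch: advance `p` after each field. */
static void decode_cursor(struct triplet *t, const unsigned char *p)
{
	assert(t && p);
	t->commit_pos = read_be32(p);
	p += sizeof(uint32_t);
	t->offset = read_be64(p);
	p += sizeof(uint64_t);
	t->xor_row = read_be32(p);
}
```

Both functions produce the same struct for the same 16 input bytes. The
difference is only in where the running offset lives: in the pointer, or
in the arithmetic at each call site.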

> +/*
> + * This function gets the raw triplet from `row`'th row in the
> + * lookup table and fills that data to the `triplet`.
> + */
> +static int lookup_table_get_triplet(struct bitmap_index *bitmap_git,
> +				    uint32_t pos,
> +				    struct bitmap_lookup_table_triplet *triplet)
> +{
> +	unsigned char *p = NULL;
> +	if (pos >= bitmap_git->entry_count)
> +		return error(_("corrupt bitmap lookup table: triplet position out of index"));
> +
> +	p = bitmap_git->table_lookup + st_mult(pos, BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH);
> +
> +	return lookup_table_get_triplet_by_pointer(triplet, p);
> +}

Very nice. This cleans things up nicely by being able to call
lookup_table_get_triplet_by_pointer().

Since these are static functions, it doesn't really matter whether or
not they are prefixed with 'bitmap_', since they won't be visible
outside of pack-bitmap.c's compilation unit. But it may be nice to
prefix them with 'bitmap_' just to make it extra clear that these are
internal functions meant to be used within the bitmap machinery only.

> +static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_git,
> +					  struct commit *commit)
> +{
> +	uint32_t commit_pos, xor_row;
> +	uint64_t offset;
> +	int flags, found;
> +	struct bitmap_lookup_table_triplet triplet;
> +	struct object_id *oid = &commit->object.oid;
> +	struct ewah_bitmap *bitmap;
> +	struct stored_bitmap *xor_bitmap = NULL;
> +	const int bitmap_header_size = 6;
> +	static struct bitmap_lookup_table_xor_item *xor_items = NULL;
> +	static size_t xor_items_nr = 0, xor_items_alloc = 0;

I had to double check, but xor_items_alloc inherits the same storage
class as xor_items_nr, so they are both static size_t's.

> +	static int is_corrupt = 0;
> +
> +	if (is_corrupt)
> +		return NULL;

What is the purpose of this conditional? We don't modify `is_corrupt`
before reading it here, so this should be dead code, unless I'm missing
something.

> +	found = bsearch_pos(bitmap_git, oid, &commit_pos);
> +
> +	if (!found)
> +		return NULL;

FWIW, we could eliminate this variable from a particularly long list of
stack variables above by just writing:

    if (!bsearch_pos(bitmap_git, oid, &commit_pos))
      return NULL;

and I think that would be OK.

> +	if (bsearch_triplet_by_pos(commit_pos, bitmap_git, &triplet) < 0)
> +		return NULL;
> +
> +	xor_items_nr = 0;

This initialization *is* necessary, since the xor_items_nr and
xor_items_alloc variables are statically allocated.

> +	offset = triplet.offset;
> +	xor_row = triplet.xor_row;
> +
> +	if (xor_row != 0xffffffff) {

Is this outer conditional needed? I don't think it is. If xor_row is
0xffffffff, then the while loop below won't be entered, and
xor_items_nr will be zero, meaning that the second while loop will also
be skipped.

So I think we can just as easily get rid of this outermost if-statement
and de-dent the main part of this function's body.
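
To illustrate the point, here is a simplified model of the two loops
with no outer check. It is only a sketch: rows are plain integers,
next_row[] is a hypothetical array holding each row's xor_row, and
collect_xor_chain() and replay_chain() are invented names, not
functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define NO_XOR_ROW 0xffffffff

/*
 * Follow next_row[] links starting at `xor_row` and record each row
 * visited in chain[]. Returns the chain length, or -1 if a row is out
 * of range or the chain is longer than the table (which implies a
 * cycle). A start value of NO_XOR_ROW never enters the loop.
 */
static int collect_xor_chain(const uint32_t *next_row, uint32_t nr,
			     uint32_t xor_row, uint32_t *chain)
{
	uint32_t len = 0;

	assert(next_row && chain);
	while (xor_row != NO_XOR_ROW) {
		if (len >= nr || xor_row >= nr)
			return -1;
		chain[len++] = xor_row;
		xor_row = next_row[xor_row];
	}
	return (int)len;
}

/*
 * Replay the chain from its far end back toward the requested bitmap.
 * A zero-length chain skips the loop, so no outer check is needed.
 */
static uint32_t replay_chain(const uint32_t *chain, int len, uint32_t *order)
{
	uint32_t n = 0;

	assert(len >= 0);
	while (len)
		order[n++] = chain[--len];
	return n;
}
```

When the starting row is NO_XOR_ROW, collect_xor_chain() returns 0 and
replay_chain() does nothing, which is the behavior the outer
if-statement was guarding.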

> +		int xor_flags;
> +		khiter_t hash_pos;
> +		struct bitmap_lookup_table_xor_item *xor_item;
> +
> +		while (xor_row != 0xffffffff) {
> +			ALLOC_GROW(xor_items, xor_items_nr + 1, xor_items_alloc);
> +
> +			if (xor_items_nr + 1 >= bitmap_git->entry_count) {
> +				error(_("corrupt bitmap lookup table: xor chain exceed entry count"));
> +				goto corrupt;
> +			}
> +
> +			if (lookup_table_get_triplet(bitmap_git, xor_row, &triplet) < 0)
> +				goto corrupt;
> +
> +			xor_item = &xor_items[xor_items_nr];
> +			xor_item->offset = triplet.offset;
> +
> +			if (nth_bitmap_object_oid(bitmap_git, &xor_item->oid, triplet.commit_pos) < 0) {
> +				error(_("corrupt bitmap lookup table: commit index %u out of range"),
> +					triplet.commit_pos);
> +				goto corrupt;
> +			}
> +
> +			hash_pos = kh_get_oid_map(bitmap_git->bitmaps, xor_item->oid);
> +
> +			/*
> +			 * If desired bitmap is already stored, we don't need
> +			 * to iterate further. Because we know that bitmaps
> +			 * that are needed to be parsed to parse this bitmap
> +			 * has already been stored. So, assign this stored bitmap
> +			 * to the xor_bitmap.
> +			 */
> +			if (hash_pos < kh_end(bitmap_git->bitmaps) &&
> +			    (xor_bitmap = kh_value(bitmap_git->bitmaps, hash_pos)))
> +				break;
> +			xor_items_nr++;
> +			xor_row = triplet.xor_row;
> +		}
> +
> +		while (xor_items_nr) {
> +			xor_item = &xor_items[xor_items_nr - 1];
> +			bitmap_git->map_pos = xor_item->offset;
> +			if (bitmap_git->map_size - bitmap_git->map_pos < bitmap_header_size) {
> +				error(_("corrupt ewah bitmap: truncated header for bitmap of commit \"%s\""),
> +					oid_to_hex(&xor_item->oid));
> +				goto corrupt;
> +			}
> +
> +			bitmap_git->map_pos = bitmap_git->map_pos + sizeof(uint32_t) + sizeof(uint8_t);

Could we write:

    bitmap_git->map_pos += sizeof(uint32_t) + sizeof(uint8_t)

or similar? Also, a clarifying comment would help explain why we are
advancing, what we're skipping past, and why it's OK.
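
As a rough sketch of what such a comment could describe: each bitmap
entry starts with a 6-byte header (a 4-byte object position, a 1-byte
xor offset and a 1-byte flags field), and the lookup table row has
already identified the commit and its xor base, so only the flags still
need to be read. The helper below is hypothetical (read_entry_flags()
and ENTRY_HEADER_SIZE are not names from the patch).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRY_HEADER_SIZE 6 /* 4 (position) + 1 (xor offset) + 1 (flags) */

/*
 * Read the flags byte of the entry starting at *pos and leave *pos at
 * the first byte after the header. Returns -1 if the header would run
 * past the end of the map.
 */
static int read_entry_flags(const unsigned char *map, size_t map_size,
			    size_t *pos, uint8_t *flags)
{
	assert(map && pos && flags);
	if (*pos > map_size || map_size - *pos < ENTRY_HEADER_SIZE)
		return -1;

	/*
	 * Skip the object position and xor offset: the lookup table
	 * row already told us which commit this is and where its xor
	 * base lives.
	 */
	*pos += sizeof(uint32_t) + sizeof(uint8_t);
	*flags = map[(*pos)++];
	return 0;
}
```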

> +			xor_flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
> +			bitmap = read_bitmap_1(bitmap_git);
> +
> +			if (!bitmap)
> +				goto corrupt;
> +
> +			xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_item->oid, xor_bitmap, xor_flags);
> +			xor_items_nr--;
> +		}
> +	}
> +
> +	bitmap_git->map_pos = offset;
> +	if (bitmap_git->map_size - bitmap_git->map_pos < bitmap_header_size) {
> +		error(_("corrupt ewah bitmap: truncated header for bitmap of commit \"%s\""),
> +			oid_to_hex(oid));
> +		goto corrupt;
> +	}
> +
> +	bitmap_git->map_pos = bitmap_git->map_pos + sizeof(uint32_t) + sizeof(uint8_t);
> +	flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
> +	bitmap = read_bitmap_1(bitmap_git);
> +
> +	if (!bitmap)
> +		goto corrupt;
> +
> +	return store_bitmap(bitmap_git, bitmap, oid, xor_bitmap, flags);
> +
> +corrupt:
> +	free(xor_items);
> +	is_corrupt = 1;
> +	return NULL;
> +}

Thanks,
Taylor

* Re: [PATCH v5 5/6] p5310-pack-bitmaps.sh: enable `pack.writeReverseIndex`
  2022-07-20 18:38         ` [PATCH v5 5/6] p5310-pack-bitmaps.sh: enable `pack.writeReverseIndex` Abhradeep Chakraborty via GitGitGadget
@ 2022-07-26  1:18           ` Taylor Blau
  2022-07-26  7:15             ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 162+ messages in thread
From: Taylor Blau @ 2022-07-26  1:18 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget
  Cc: git, Taylor Blau, Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren, Abhradeep Chakraborty

On Wed, Jul 20, 2022 at 06:38:23PM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
>
> Enable `pack.writeReverseIndex` before running pack-bitmap related
> performance tests.
>
> The performance differences with `pack.writeReverseIndex` enabled and
> disabled are given below -

Thanks; this order of changes in the t/perf suite makes sense to me. One
note: for this sort of change, where we're comparing all of the tests in
a single t/perf file against themselves before and after some change, it
is helpful to run (in t/perf)

  ./run HEAD . p5310-pack-bitmaps.sh

which compares HEAD to what's in the current tree. You'll get the
results side-by-side, which makes them a little easier to scan. You can
also aggregate results together from multiple runs with the
t/perf/aggregate.perl script.

One gotcha (that has often bitten me in the past) is that when running
the perf suite with `.` as your build target, it uses whatever git
binary is sitting in your tree. So make sure that it is both (a)
up-to-date, i.e., that it is the result of compiling what's currently in
your tree, and (b) that it is compiled with the same settings as what
you built HEAD with.

I have often scratched my head at why the result of running some perf
suite on '.' seems much slower than it should be, only to realize that
the "git" binary sitting in my tree was built with -O0 or something.

Thanks,
Taylor

* Re: [PATCH v5 5/6] p5310-pack-bitmaps.sh: enable `pack.writeReverseIndex`
  2022-07-26  1:18           ` Taylor Blau
@ 2022-07-26  7:15             ` Ævar Arnfjörð Bjarmason
  2022-07-26 13:32               ` Derrick Stolee
  0 siblings, 1 reply; 162+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-26  7:15 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Kaartic Sivaram,
	Derrick Stolee, Philip Oakley, Martin Ågren,
	Abhradeep Chakraborty


On Mon, Jul 25 2022, Taylor Blau wrote:

> On Wed, Jul 20, 2022 at 06:38:23PM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
>> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
>>
>> Enable `pack.writeReverseIndex` before running pack-bitmap related
>> performance tests.
>>
>> The performance differences with `pack.writeReverseIndex` enabled and
>> disabled are given below -
>
> Thanks; this order of changes in the t/perf suite makes sense to me. One
> note: for this sort of change, where we're comparing all of the tests in
> a single t/perf file against themselves before and after some change, it
> is helpful to run (in t/perf)
>
>   ./run HEAD . p5310-pack-bitmaps.sh
>
> which compares HEAD to what's in the current tree. You'll get the
> results side-by-side, which makes them a little easier to scan. You can
> also aggregate results together from multiple runs with the
> t/perf/aggregate.perl script.
>
> One gotcha (that has often bitten me in the past) is that when running
> the perf suite with `.` as your build target, it uses whatever git
> binary is sitting in your tree. So make sure that it is both (a)
> up-to-date, i.e., that it is the result of compiling what's currently in
> your tree, and (b) that it is compiled with the same settings as what
> you built HEAD with.
>
> I have often scratched my head at why the result of running some perf
> suite on '.' seems much slower than it should be, only to realize that
> the "git" binary sitting in my tree was built with -O0 or something.

Rather than comparing HEAD to your current tree it's generally better
to do something like:

	GIT_PERF_MAKE_OPTS='-j3' ./run HEAD~ HEAD [...]

* Re: [PATCH v5 5/6] p5310-pack-bitmaps.sh: enable `pack.writeReverseIndex`
  2022-07-26  7:15             ` Ævar Arnfjörð Bjarmason
@ 2022-07-26 13:32               ` Derrick Stolee
  2022-07-26 13:54                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 162+ messages in thread
From: Derrick Stolee @ 2022-07-26 13:32 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Taylor Blau
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Kaartic Sivaram,
	Philip Oakley, Martin Ågren, Abhradeep Chakraborty

On 7/26/2022 3:15 AM, Ævar Arnfjörð Bjarmason wrote:
> Rather than comparing HEAD to your current tree it's generally better
> to do something like:
> 
> 	GIT_PERF_MAKE_OPTS='-j3' ./run HEAD~ HEAD [...]

Using the 'run' script fixes the perf test in the worktree and tests
different versions of the 'git' executable.

That doesn't work when the change is in the performance test itself.

Thanks,
-Stolee

* Re: [PATCH v5 5/6] p5310-pack-bitmaps.sh: enable `pack.writeReverseIndex`
  2022-07-26 13:32               ` Derrick Stolee
@ 2022-07-26 13:54                 ` Ævar Arnfjörð Bjarmason
  2022-07-26 18:17                   ` Abhradeep Chakraborty
  0 siblings, 1 reply; 162+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-26 13:54 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Taylor Blau, Abhradeep Chakraborty via GitGitGadget, git,
	Kaartic Sivaram, Philip Oakley, Martin Ågren,
	Abhradeep Chakraborty


On Tue, Jul 26 2022, Derrick Stolee wrote:

> On 7/26/2022 3:15 AM, Ævar Arnfjörð Bjarmason wrote:
>> Rather than comparing HEAD to your current tree it's generally better
>> to do something like:
>> 
>> 	GIT_PERF_MAKE_OPTS='-j3' ./run HEAD~ HEAD [...]
>
> Using the 'run' script fixes the perf test in the worktree and tests
> different versions of the 'git' executable.
>
> That doesn't work when the change is in the performance test itself.

Thanks, I'm clearly wrong about that. I didn't look enough at the
context.

But then we're losing the perf test coverage for the case where we don't
have the *.rev files. Isn't it better to run both with & without *.rev,
perhaps by splitting up the test file? We could make it a function in
perf/lib-bitmap.sh that we call both with & without the wanted *.rev
repack config.

I suspect that's also subtly broken, in that t/perf assumes that it can
re-use the repo for a given <rev>, but this is modifying that repo, so
if you run e.g. test Y after this test X, that Y will unexpectedly get a
repack'd repo ...

But we could just start the test with a git clone . "$TEST_NAME" or
whatever, then repack that with whatever options we want...


* Re: [PATCH v5 5/6] p5310-pack-bitmaps.sh: enable `pack.writeReverseIndex`
  2022-07-26 13:54                 ` Ævar Arnfjörð Bjarmason
@ 2022-07-26 18:17                   ` Abhradeep Chakraborty
  0 siblings, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-07-26 18:17 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee, Taylor Blau,
	Abhradeep Chakraborty via GitGitGadget, git, Kaartic Sivaram,
	Philip Oakley, Martin Ågren

On Tue, Jul 26, 2022 at 7:26 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> But then we're losing the perf test coverage for the case where we don't
> have the *.rev files. Isn't it better to run both with & without *.rev,
> perhaps by splitting up the test file? We could make it a function in
> perf/lib-bitmap.sh that we call both with & without the wanted *.rev
> repack config.

Ok.

> I suspect that's also subtly broken, in that t/perf assumes that it can
> re-use the repo for a given <rev>, but this is modifying that repo, so
> if you run e.g. test Y after this test X, that Y will unexpectedly get a
> repack'd repo ...

Thanks Ævar! This is the problem that I told Taylor about off-list. Will
update it.

> But we could just start the test with a git clone . "$TEST_NAME" or
> whatever, then repack that with whatever options we want...

Thanks :)

* Re: [PATCH v5 2/6] pack-bitmap-write.c: write lookup table extension
  2022-07-26  0:52           ` Taylor Blau
@ 2022-07-26 18:22             ` Abhradeep Chakraborty
  0 siblings, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-07-26 18:22 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Kaartic Sivaram,
	Derrick Stolee, Philip Oakley, Martin Ågren

On Tue, Jul 26, 2022 at 6:22 AM Taylor Blau <me@ttaylorr.com> wrote:
> I wonder if it would make this patch a little more readable to construct
> and use the commit_positions array as a single preparatory step before
> this commit.
>
> What do you think?

Yeah, sure! I have no problem with that.
> > +
> > +     ALLOC_ARRAY(commit_positions, writer.selected_nr);
> > +     for (uint32_t i = 0; i < writer.selected_nr; ++i) {
>
> Nit; we don't typically write for-loop expressions with variable
> declarations inside of them. Make sure to declare i outside of the loop,
> and then this becomes:
>
>     for (i = 0; i < writer.selected_nr; i++)
>
> (also, we typically use the postfix ++ operator, that is "i++" instead
> of "++i" unless there is a reason to prefer the latter over the former).

Got it. Thanks :)

* Re: [PATCH v5 4/6] pack-bitmap: prepare to read lookup table extension
  2022-07-26  1:13           ` Taylor Blau
@ 2022-07-26 18:56             ` Abhradeep Chakraborty
  2022-07-26 19:36             ` Eric Sunshine
  1 sibling, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-07-26 18:56 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Kaartic Sivaram,
	Derrick Stolee, Philip Oakley, Martin Ågren

On Tue, Jul 26, 2022 at 6:43 AM Taylor Blau <me@ttaylorr.com> wrote:
> Just noticing this now, but I wonder if we could avoid incrementing `p`
> here and instead write something like:
>
>     triplet->commit_pos = get_be32(p);
>     triplet->offset = get_be64(p + sizeof(uint32_t));
>     triplet->xor_row = get_be32(p + sizeof(uint64_t) + sizeof(uint32_t));
>
> I don't have a strong feeling about it, though, it just seems to read a
> little more directly to me and avoid modifying a variable that is only
> going to live as long as the function executes (p).

Ok, will update.

> > +/*
> > + * This function gets the raw triplet from `row`'th row in the
> > + * lookup table and fills that data to the `triplet`.
> > + */
> > +static int lookup_table_get_triplet(struct bitmap_index *bitmap_git,
> > +                                 uint32_t pos,
> > +                                 struct bitmap_lookup_table_triplet *triplet)
> > +{
> > +     unsigned char *p = NULL;
> > +     if (pos >= bitmap_git->entry_count)
> > +             return error(_("corrupt bitmap lookup table: triplet position out of index"));
> > +
> > +     p = bitmap_git->table_lookup + st_mult(pos, BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH);
> > +
> > +     return lookup_table_get_triplet_by_pointer(triplet, p);
> > +}
>
> Very nice. This cleans things up nicely by being able to call
> lookup_table_get_triplet_by_pointer().
>
> Since these are static functions, it doesn't really matter whether or
> not they are prefixed with 'bitmap_', since they won't be visible
> outside of pack-bitmap.c's compilation unit. But it may be nice to
> prefix them with 'bitmap_' just to make it extra clear that these are
> internal functions meant to be used within the bitmap machinery only.

Yeah, sure!

> > +     static int is_corrupt = 0;
> > +
> > +     if (is_corrupt)
> > +             return NULL;
>
> What is the purpose of this conditional? We don't modify `is_corrupt`
> before reading it here, so this should be dead code, unless I'm missing
> something.

My intention behind this code was -
Initially `is_corrupt` is 0, so the above code will not execute for
the first `lazy_bitmap...()` call. Now, for some reason, if we get to
know that the `.bitmap` file is corrupted, the function will `goto
corrupt` and `is_corrupt` will be set to 1.
As `is_corrupt` is a static variable, its value will be preserved. So,
when we call the `lazy_bitmap...()` function for the second time (or
third time etc.; i.e. `bitmap_for_commit` under a for loop), we
instantly know that `.bitmap` file is corrupt (by seeing `is_corrupt`)
and we will not do all the computations any more.
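
A small standalone sketch of that latch, separate from the real
function. The names lookup_once_corrupt(), expensive_calls and the
simulate_corruption parameter are hypothetical and exist only for the
demonstration.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how often the expensive path runs; only for the demonstration. */
static int expensive_calls;

/*
 * Returns `value` on success, or NULL on failure. After the first
 * failure, every later call returns NULL without doing the expensive
 * work, because `is_corrupt` keeps its value between calls.
 */
static const int *lookup_once_corrupt(const int *value, int simulate_corruption)
{
	static int is_corrupt = 0;

	assert(value);
	if (is_corrupt)
		return NULL;

	expensive_calls++;
	if (simulate_corruption)
		goto corrupt;
	return value;

corrupt:
	is_corrupt = 1;
	return NULL;
}
```

One trade-off of a function-level static is that the flag is shared by
every caller in the process rather than tied to one particular bitmap
index.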

>
> > +     offset = triplet.offset;
> > +     xor_row = triplet.xor_row;
> > +
> > +     if (xor_row != 0xffffffff) {
>
> Is this outer conditional needed? I don't think it is. If xor_row is
> 0xffffffff, then the while loop below won't be entered, and
> xor_items_nr will be zero, meaning that the second while loop will also
> be skipped.

Yes, you're right - it is not needed. But it guarantees that all the
code inside its braces will run only if there is an `xor offset`, keeping
the allocation of the `xor_items` array as lazy as possible.

Should I remove it?

> > +                     bitmap_git->map_pos = bitmap_git->map_pos + sizeof(uint32_t) + sizeof(uint8_t);
>
> Could we write:
>
>     bitmap_git->map_pos += sizeof(uint32_t) + sizeof(uint8_t)

Sure!

Thanks :)

* Re: [PATCH v5 4/6] pack-bitmap: prepare to read lookup table extension
  2022-07-26  1:13           ` Taylor Blau
  2022-07-26 18:56             ` Abhradeep Chakraborty
@ 2022-07-26 19:36             ` Eric Sunshine
  1 sibling, 0 replies; 162+ messages in thread
From: Eric Sunshine @ 2022-07-26 19:36 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Abhradeep Chakraborty via GitGitGadget, Git List,
	Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren, Abhradeep Chakraborty

On Mon, Jul 25, 2022 at 9:21 PM Taylor Blau <me@ttaylorr.com> wrote:
> On Wed, Jul 20, 2022 at 06:38:22PM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
> > +     triplet->commit_pos = get_be32(p);
> > +     p += sizeof(uint32_t);
> > +     triplet->offset = get_be64(p);
> > +     p += sizeof(uint64_t);
> > +     triplet->xor_row = get_be32(p);
> > +     return 0;
>
> Just noticing this now, but I wonder if we could avoid incrementing `p`
> here and instead write something like:
>
>     triplet->commit_pos = get_be32(p);
>     triplet->offset = get_be64(p + sizeof(uint32_t));
>     triplet->xor_row = get_be32(p + sizeof(uint64_t) + sizeof(uint32_t));
>
> I don't have a strong feeling about it, though, it just seems to read a
> little more directly to me and avoid modifying a variable that is only
> going to live as long as the function executes (p).

While it may not matter much in this tiny function, the code in the
patch sets a better precedent by conforming to a pattern which is more
maintainable in situations involving more pieces of data which need to
be decoded. It's also easier to reason about than the suggested
replacement since you don't have to spend extra cycles double-checking
if it's adding the correct number of and correctly-sized offsets at
each get_be*() invocation. IMHO, the way the patch already handles
this seems preferable.

* Re: [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-07-20 18:38         ` [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests Abhradeep Chakraborty via GitGitGadget
@ 2022-07-28 19:22           ` Johannes Schindelin
  2022-08-02 12:40             ` Abhradeep Chakraborty
  0 siblings, 1 reply; 162+ messages in thread
From: Johannes Schindelin @ 2022-07-28 19:22 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget
  Cc: git, Taylor Blau, Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren, Abhradeep Chakraborty, Abhradeep Chakraborty

Hi Abhradeep,

On Wed, 20 Jul 2022, Abhradeep Chakraborty via GitGitGadget wrote:

> From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
>
> Teach Git to provide a way for users to enable/disable bitmap lookup
> table extension by providing a config option named 'writeBitmapLookupTable'.
> Default is false.
>
> Also add a test to verify writing of the lookup table.
>
> Mentored-by: Taylor Blau <me@ttaylorr.com>
> Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
> Co-Authored-by: Taylor Blau <me@ttaylorr.com>
> Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
> ---
>  Documentation/config/pack.txt     |   7 +
>  builtin/multi-pack-index.c        |   7 +
>  builtin/pack-objects.c            |   8 +
>  midx.c                            |   3 +
>  midx.h                            |   1 +
>  t/t5310-pack-bitmaps.sh           | 792 ++++++++++++++++--------------
>  t/t5311-pack-bitmaps-shallow.sh   |  53 +-
>  t/t5326-multi-pack-bitmaps.sh     | 421 +++++++++-------

That's quite a large change, and unfortunately I pinpointed a flake to
this patch when running with GIT_TEST_DEFAULT_HASH=sha256. The symptom is
this:

-- snip --
[...]
+ diff -u expect.normalized actual.normalized
+ rm -f expect.normalized actual.normalized
ok 317 - enumerate --objects (full bitmap, other)

expecting success of 5326.318 'bitmap --objects handles non-commit objects (full bitmap, other)':
                git rev-list --objects --use-bitmap-index $branch tagged-blob >actual &&
                grep $blob actual

+ git rev-list --objects --use-bitmap-index other tagged-blob
+ grep bff4ed5e839bd73e821f78b45a7fa34208aa85596535ec8e9ac5eab477ca6f81 actual
bff4ed5e839bd73e821f78b45a7fa34208aa85596535ec8e9ac5eab477ca6f81
ok 318 - bitmap --objects handles non-commit objects (full bitmap, other)

expecting success of 5326.319 'clone from bitmapped repository':
                rm -fr clone.git &&
                git clone --no-local --bare . clone.git &&
                git rev-parse HEAD >expect &&
                git --git-dir=clone.git rev-parse HEAD >actual &&
                test_cmp expect actual

+ rm -fr clone.git
+ git clone --no-local --bare . clone.git
Cloning into bare repository 'clone.git'...
remote: Enumerating objects: 756, done.
remote: Counting objects: 100% (754/754), done.
remote: Compressing objects: 100% (281/281), done.
remote: Total 756 (delta 245), reused 740 (delta 234), pack-reused 2
Receiving objects: 100% (756/756), 77.50 KiB | 8.61 MiB/s, done.
fatal: REF_DELTA at offset 221 already resolved (duplicate base 4d332072f161629ffe4652ecd3ce377ef88447bec73f05ab0f3515f98bd061cf?)
fatal: fetch-pack: invalid index-pack output
error: last command exited with $?=128
not ok 319 - clone from bitmapped repository
#
#                       rm -fr clone.git &&
#                       git clone --no-local --bare . clone.git &&
#                       git rev-parse HEAD >expect &&
#                       git --git-dir=clone.git rev-parse HEAD >actual &&
#                       test_cmp expect actual
#
1..319
-- snap --

On a hunch, I ran this through valgrind (took a while) but it did not
point out the problem.

Again, this is only with SHA-256 (and somewhat flaky), it passes every
time with SHA-1. Maybe you can reproduce on your side with that
information?

Sadly, this patch is way too large for me to do a drive-by debugging
session, so I will have to leave it to you to investigate further.

Ciao,
Dscho

>  t/t5327-multi-pack-bitmaps-rev.sh |  24 +-
>  9 files changed, 733 insertions(+), 583 deletions(-)
>
> diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
> index ad7f73a1ead..b955ca572ec 100644
> --- a/Documentation/config/pack.txt
> +++ b/Documentation/config/pack.txt
> @@ -164,6 +164,13 @@ When writing a multi-pack reachability bitmap, no new namehashes are
>  computed; instead, any namehashes stored in an existing bitmap are
>  permuted into their appropriate location when writing a new bitmap.
>
> +pack.writeBitmapLookupTable::
> +	When true, Git will include a "lookup table" section in the
> +	bitmap index (if one is written). This table is used to defer
> +	loading individual bitmaps as late as possible. This can be
> +	beneficial in repositories that have relatively large bitmap
> +	indexes. Defaults to false.
> +
>  pack.writeReverseIndex::
>  	When true, git will write a corresponding .rev file (see:
>  	link:../technical/pack-format.html[Documentation/technical/pack-format.txt])
> diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
> index 5edbb7fe86e..55402b46f41 100644
> --- a/builtin/multi-pack-index.c
> +++ b/builtin/multi-pack-index.c
> @@ -87,6 +87,13 @@ static int git_multi_pack_index_write_config(const char *var, const char *value,
>  			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
>  	}
>
> +	if (!strcmp(var, "pack.writebitmaplookuptable")) {
> +		if (git_config_bool(var, value))
> +			opts.flags |= MIDX_WRITE_BITMAP_LOOKUP_TABLE;
> +		else
> +			opts.flags &= ~MIDX_WRITE_BITMAP_LOOKUP_TABLE;
> +	}
> +
>  	/*
>  	 * We should never make a fall-back call to 'git_default_config', since
>  	 * this was already called in 'cmd_multi_pack_index()'.
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 39e28cfcafc..46e26774963 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3148,6 +3148,14 @@ static int git_pack_config(const char *k, const char *v, void *cb)
>  		else
>  			write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE;
>  	}
> +
> +	if (!strcmp(k, "pack.writebitmaplookuptable")) {
> +		if (git_config_bool(k, v))
> +			write_bitmap_options |= BITMAP_OPT_LOOKUP_TABLE;
> +		else
> +			write_bitmap_options &= ~BITMAP_OPT_LOOKUP_TABLE;
> +	}
> +
>  	if (!strcmp(k, "pack.usebitmaps")) {
>  		use_bitmap_index_default = git_config_bool(k, v);
>  		return 0;
> diff --git a/midx.c b/midx.c
> index 5f0dd386b02..9c26d04bfde 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -1072,6 +1072,9 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
>  	if (flags & MIDX_WRITE_BITMAP_HASH_CACHE)
>  		options |= BITMAP_OPT_HASH_CACHE;
>
> +	if (flags & MIDX_WRITE_BITMAP_LOOKUP_TABLE)
> +		options |= BITMAP_OPT_LOOKUP_TABLE;
> +
>  	prepare_midx_packing_data(&pdata, ctx);
>
>  	commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx);
> diff --git a/midx.h b/midx.h
> index 22e8e53288e..5578cd7b835 100644
> --- a/midx.h
> +++ b/midx.h
> @@ -47,6 +47,7 @@ struct multi_pack_index {
>  #define MIDX_WRITE_REV_INDEX (1 << 1)
>  #define MIDX_WRITE_BITMAP (1 << 2)
>  #define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3)
> +#define MIDX_WRITE_BITMAP_LOOKUP_TABLE (1 << 4)
>
>  const unsigned char *get_midx_checksum(struct multi_pack_index *m);
>  void get_midx_filename(struct strbuf *out, const char *object_dir);
> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
> index f775fc1ce69..c0607172827 100755
> --- a/t/t5310-pack-bitmaps.sh
> +++ b/t/t5310-pack-bitmaps.sh
> @@ -26,22 +26,413 @@ has_any () {
>  	grep -Ff "$1" "$2"
>  }
>
> -setup_bitmap_history
> -
> -test_expect_success 'setup writing bitmaps during repack' '
> -	git config repack.writeBitmaps true
> -'
> -
> -test_expect_success 'full repack creates bitmaps' '
> -	GIT_TRACE2_EVENT="$(pwd)/trace" \
> +test_bitmap_cases () {
> +	writeLookupTable=false
> +	for i in "$@"
> +	do
> +		case "$i" in
> +		"pack.writeBitmapLookupTable") writeLookupTable=true;;
> +		esac
> +	done
> +
> +	test_expect_success 'setup test repository' '
> +		rm -fr * .git &&
> +		git init &&
> +		git config pack.writeBitmapLookupTable '"$writeLookupTable"'
> +	'
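
A side note on the quoting in the body above, since it is easy to misread: the single-quoted test body is closed, "$writeLookupTable" is expanded inside double quotes, and the single quote is reopened. The value is therefore baked into the body when the helper runs, not when the test body is later evaluated. A minimal sketch of that behaviour, where show_body is a hypothetical stand-in for test_expect_success that only prints the body it receives:

```shell
# Hypothetical stand-in for test_expect_success (assumption, not part of
# the test library): print the body argument instead of running it.
show_body () {
	printf '%s\n' "$2"
}

writeLookupTable=true

# The '"$writeLookupTable"' sequence leaves the single-quoted string,
# expands the variable, and re-enters the single-quoted string.
show_body 'setup test repository' '
	git init &&
	git config pack.writeBitmapLookupTable '"$writeLookupTable"'
'
```

The printed body ends with "git config pack.writeBitmapLookupTable true", which is the text the test framework would evaluate.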
> +	setup_bitmap_history
> +
> +	test_expect_success 'setup writing bitmaps during repack' '
> +		git config repack.writeBitmaps true
> +	'
> +
> +	test_expect_success 'full repack creates bitmaps' '
> +		GIT_TRACE2_EVENT="$(pwd)/trace" \
> +			git repack -ad &&
> +		ls .git/objects/pack/ | grep bitmap >output &&
> +		test_line_count = 1 output &&
> +		grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace &&
> +		grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace
> +	'
> +
> +	basic_bitmap_tests
> +
> +	test_expect_success 'pack-objects respects --local (non-local loose)' '
> +		git init --bare alt.git &&
> +		echo $(pwd)/alt.git/objects >.git/objects/info/alternates &&
> +		echo content1 >file1 &&
> +		# non-local loose object which is not present in bitmapped pack
> +		altblob=$(GIT_DIR=alt.git git hash-object -w file1) &&
> +		# non-local loose object which is also present in bitmapped pack
> +		git cat-file blob $blob | GIT_DIR=alt.git git hash-object -w --stdin &&
> +		git add file1 &&
> +		test_tick &&
> +		git commit -m commit_file1 &&
> +		echo HEAD | git pack-objects --local --stdout --revs >1.pack &&
> +		git index-pack 1.pack &&
> +		list_packed_objects 1.idx >1.objects &&
> +		printf "%s\n" "$altblob" "$blob" >nonlocal-loose &&
> +		! has_any nonlocal-loose 1.objects
> +	'
> +
> +	test_expect_success 'pack-objects respects --honor-pack-keep (local non-bitmapped pack)' '
> +		echo content2 >file2 &&
> +		blob2=$(git hash-object -w file2) &&
> +		git add file2 &&
> +		test_tick &&
> +		git commit -m commit_file2 &&
> +		printf "%s\n" "$blob2" "$bitmaptip" >keepobjects &&
> +		pack2=$(git pack-objects pack2 <keepobjects) &&
> +		mv pack2-$pack2.* .git/objects/pack/ &&
> +		>.git/objects/pack/pack2-$pack2.keep &&
> +		rm $(objpath $blob2) &&
> +		echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >2a.pack &&
> +		git index-pack 2a.pack &&
> +		list_packed_objects 2a.idx >2a.objects &&
> +		! has_any keepobjects 2a.objects
> +	'
> +
> +	test_expect_success 'pack-objects respects --local (non-local pack)' '
> +		mv .git/objects/pack/pack2-$pack2.* alt.git/objects/pack/ &&
> +		echo HEAD | git pack-objects --local --stdout --revs >2b.pack &&
> +		git index-pack 2b.pack &&
> +		list_packed_objects 2b.idx >2b.objects &&
> +		! has_any keepobjects 2b.objects
> +	'
> +
> +	test_expect_success 'pack-objects respects --honor-pack-keep (local bitmapped pack)' '
> +		ls .git/objects/pack/ | grep bitmap >output &&
> +		test_line_count = 1 output &&
> +		packbitmap=$(basename $(cat output) .bitmap) &&
> +		list_packed_objects .git/objects/pack/$packbitmap.idx >packbitmap.objects &&
> +		test_when_finished "rm -f .git/objects/pack/$packbitmap.keep" &&
> +		>.git/objects/pack/$packbitmap.keep &&
> +		echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >3a.pack &&
> +		git index-pack 3a.pack &&
> +		list_packed_objects 3a.idx >3a.objects &&
> +		! has_any packbitmap.objects 3a.objects
> +	'
> +
> +	test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' '
> +		mv .git/objects/pack/$packbitmap.* alt.git/objects/pack/ &&
> +		rm -f .git/objects/pack/multi-pack-index &&
> +		test_when_finished "mv alt.git/objects/pack/$packbitmap.* .git/objects/pack/" &&
> +		echo HEAD | git pack-objects --local --stdout --revs >3b.pack &&
> +		git index-pack 3b.pack &&
> +		list_packed_objects 3b.idx >3b.objects &&
> +		! has_any packbitmap.objects 3b.objects
> +	'
> +
> +	test_expect_success 'pack-objects to file can use bitmap' '
> +		# make sure we still have 1 bitmap index from previous tests
> +		ls .git/objects/pack/ | grep bitmap >output &&
> +		test_line_count = 1 output &&
> +		# verify equivalent packs are generated with/without using bitmap index
> +		packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) &&
> +		packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) &&
> +		list_packed_objects packa-$packasha1.idx >packa.objects &&
> +		list_packed_objects packb-$packbsha1.idx >packb.objects &&
> +		test_cmp packa.objects packb.objects
> +	'
> +
> +	test_expect_success 'full repack, reusing previous bitmaps' '
>  		git repack -ad &&
> -	ls .git/objects/pack/ | grep bitmap >output &&
> -	test_line_count = 1 output &&
> -	grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace &&
> -	grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace
> -'
> +		ls .git/objects/pack/ | grep bitmap >output &&
> +		test_line_count = 1 output
> +	'
> +
> +	test_expect_success 'fetch (full bitmap)' '
> +		git --git-dir=clone.git fetch origin second:second &&
> +		git rev-parse HEAD >expect &&
> +		git --git-dir=clone.git rev-parse HEAD >actual &&
> +		test_cmp expect actual
> +	'
> +
> +	test_expect_success 'create objects for missing-HAVE tests' '
> +		blob=$(echo "missing have" | git hash-object -w --stdin) &&
> +		tree=$(printf "100644 blob $blob\tfile\n" | git mktree) &&
> +		parent=$(echo parent | git commit-tree $tree) &&
> +		commit=$(echo commit | git commit-tree $tree -p $parent) &&
> +		cat >revs <<-EOF
> +		HEAD
> +		^HEAD^
> +		^$commit
> +		EOF
> +	'
> +
> +	test_expect_success 'pack-objects respects --incremental' '
> +		cat >revs2 <<-EOF &&
> +		HEAD
> +		$commit
> +		EOF
> +		git pack-objects --incremental --stdout --revs <revs2 >4.pack &&
> +		git index-pack 4.pack &&
> +		list_packed_objects 4.idx >4.objects &&
> +		test_line_count = 4 4.objects &&
> +		git rev-list --objects $commit >revlist &&
> +		cut -d" " -f1 revlist |sort >objects &&
> +		test_cmp 4.objects objects
> +	'
> +
> +	test_expect_success 'pack with missing blob' '
> +		rm $(objpath $blob) &&
> +		git pack-objects --stdout --revs <revs >/dev/null
> +	'
> +
> +	test_expect_success 'pack with missing tree' '
> +		rm $(objpath $tree) &&
> +		git pack-objects --stdout --revs <revs >/dev/null
> +	'
> +
> +	test_expect_success 'pack with missing parent' '
> +		rm $(objpath $parent) &&
> +		git pack-objects --stdout --revs <revs >/dev/null
> +	'
> +
> +	test_expect_success JGIT,SHA1 'we can read jgit bitmaps' '
> +		git clone --bare . compat-jgit.git &&
> +		(
> +			cd compat-jgit.git &&
> +			rm -f objects/pack/*.bitmap &&
> +			jgit gc &&
> +			git rev-list --test-bitmap HEAD
> +		)
> +	'
> +
> +	test_expect_success JGIT,SHA1 'jgit can read our bitmaps' '
> +		git clone --bare . compat-us.git &&
> +		(
> +			cd compat-us.git &&
> +			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
> +			git repack -adb &&
> +			# jgit gc will barf if it does not like our bitmaps
> +			jgit gc
> +		)
> +	'
> +
> +	test_expect_success 'splitting packs does not generate bogus bitmaps' '
> +		test-tool genrandom foo $((1024 * 1024)) >rand &&
> +		git add rand &&
> +		git commit -m "commit with big file" &&
> +		git -c pack.packSizeLimit=500k repack -adb &&
> +		git init --bare no-bitmaps.git &&
> +		git -C no-bitmaps.git fetch .. HEAD
> +	'
> +
> +	test_expect_success 'set up reusable pack' '
> +		rm -f .git/objects/pack/*.keep &&
> +		git repack -adb &&
> +		reusable_pack () {
> +			git for-each-ref --format="%(objectname)" |
> +			git pack-objects --delta-base-offset --revs --stdout "$@"
> +		}
> +	'
> +
> +	test_expect_success 'pack reuse respects --honor-pack-keep' '
> +		test_when_finished "rm -f .git/objects/pack/*.keep" &&
> +		for i in .git/objects/pack/*.pack
> +		do
> +			>${i%.pack}.keep || return 1
> +		done &&
> +		reusable_pack --honor-pack-keep >empty.pack &&
> +		git index-pack empty.pack &&
> +		git show-index <empty.idx >actual &&
> +		test_must_be_empty actual
> +	'
> +
> +	test_expect_success 'pack reuse respects --local' '
> +		mv .git/objects/pack/* alt.git/objects/pack/ &&
> +		test_when_finished "mv alt.git/objects/pack/* .git/objects/pack/" &&
> +		reusable_pack --local >empty.pack &&
> +		git index-pack empty.pack &&
> +		git show-index <empty.idx >actual &&
> +		test_must_be_empty actual
> +	'
> +
> +	test_expect_success 'pack reuse respects --incremental' '
> +		reusable_pack --incremental >empty.pack &&
> +		git index-pack empty.pack &&
> +		git show-index <empty.idx >actual &&
> +		test_must_be_empty actual
> +	'
> +
> +	test_expect_success 'truncated bitmap fails gracefully (ewah)' '
> +		test_config pack.writebitmaphashcache false &&
> +		git repack -ad &&
> +		git rev-list --use-bitmap-index --count --all >expect &&
> +		bitmap=$(ls .git/objects/pack/*.bitmap) &&
> +		test_when_finished "rm -f $bitmap" &&
> +		test_copy_bytes 256 <$bitmap >$bitmap.tmp &&
> +		mv -f $bitmap.tmp $bitmap &&
> +		git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
> +		test_cmp expect actual &&
> +		test_i18ngrep corrupt.ewah.bitmap stderr
> +	'
> +
> +	test_expect_success 'truncated bitmap fails gracefully (cache)' '
> +		git repack -ad &&
> +		git rev-list --use-bitmap-index --count --all >expect &&
> +		bitmap=$(ls .git/objects/pack/*.bitmap) &&
> +		test_when_finished "rm -f $bitmap" &&
> +		test_copy_bytes 512 <$bitmap >$bitmap.tmp &&
> +		mv -f $bitmap.tmp $bitmap &&
> +		git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
> +		test_cmp expect actual &&
> +		test_i18ngrep corrupted.bitmap.index stderr
> +	'
> +
> +	# Create a state of history with these properties:
> +	#
> +	#  - refs that allow a client to fetch some new history, while sharing some old
> +	#    history with the server; we use branches delta-reuse-old and
> +	#    delta-reuse-new here
> +	#
> +	#  - the new history contains an object that is stored on the server as a delta
> +	#    against a base that is in the old history
> +	#
> +	#  - the base object is not immediately reachable from the tip of the old
> +	#    history; finding it would involve digging down through history we know the
> +	#    other side has
> +	#
> +	# This should result in a state where fetching from old->new would not
> +	# traditionally reuse the on-disk delta (because we'd have to dig to realize
> +	# that the client has it), but we will do so if bitmaps can tell us cheaply
> +	# that the other side has it.
> +	test_expect_success 'set up thin delta-reuse parent' '
> +		# This first commit contains the buried base object.
> +		test-tool genrandom delta 16384 >file &&
> +		git add file &&
> +		git commit -m "delta base" &&
> +		base=$(git rev-parse --verify HEAD:file) &&
> +
> +		# These intermediate commits bury the base back in history.
> +		# This becomes the "old" state.
> +		for i in 1 2 3 4 5
> +		do
> +			echo $i >file &&
> +			git commit -am "intermediate $i" || return 1
> +		done &&
> +		git branch delta-reuse-old &&
> +
> +		# And now our new history has a delta against the buried base. Note
> +		# that this must be smaller than the original file, since pack-objects
> +		# prefers to create deltas from smaller objects to larger.
> +		test-tool genrandom delta 16300 >file &&
> +		git commit -am "delta result" &&
> +		delta=$(git rev-parse --verify HEAD:file) &&
> +		git branch delta-reuse-new &&
> +
> +		# Repack with bitmaps and double check that we have the expected delta
> +		# relationship.
> +		git repack -adb &&
> +		have_delta $delta $base
> +	'
> +
> +	# Now we can sanity-check the non-bitmap behavior (that the server is not able
> +	# to reuse the delta). This isn't strictly something we care about, so this
> +	# test could be scrapped in the future. But it makes sure that the next test is
> +	# actually triggering the feature we want.
> +	#
> +	# Note that our tools for working with on-the-wire "thin" packs are limited. So
> +	# we actually perform the fetch, retain the resulting pack, and inspect the
> +	# result.
> +	test_expect_success 'fetch without bitmaps ignores delta against old base' '
> +		test_config pack.usebitmaps false &&
> +		test_when_finished "rm -rf client.git" &&
> +		git init --bare client.git &&
> +		(
> +			cd client.git &&
> +			git config transfer.unpackLimit 1 &&
> +			git fetch .. delta-reuse-old:delta-reuse-old &&
> +			git fetch .. delta-reuse-new:delta-reuse-new &&
> +			have_delta $delta $ZERO_OID
> +		)
> +	'
> +
> +	# And do the same for the bitmap case, where we do expect to find the delta.
> +	test_expect_success 'fetch with bitmaps can reuse old base' '
> +		test_config pack.usebitmaps true &&
> +		test_when_finished "rm -rf client.git" &&
> +		git init --bare client.git &&
> +		(
> +			cd client.git &&
> +			git config transfer.unpackLimit 1 &&
> +			git fetch .. delta-reuse-old:delta-reuse-old &&
> +			git fetch .. delta-reuse-new:delta-reuse-new &&
> +			have_delta $delta $base
> +		)
> +	'
> +
> +	test_expect_success 'pack.preferBitmapTips' '
> +		git init repo &&
> +		test_when_finished "rm -fr repo" &&
> +		(
> +			cd repo &&
> +			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
> +
> +			# create enough commits that not all receive bitmap
> +			# coverage even if they are all at the tip of some reference.
> +			test_commit_bulk --message="%s" 103 &&
> +
> +			git rev-list HEAD >commits.raw &&
> +			sort <commits.raw >commits &&
> +
> +			git log --format="create refs/tags/%s %H" HEAD >refs &&
> +			git update-ref --stdin <refs &&
> +
> +			git repack -adb &&
> +			test-tool bitmap list-commits | sort >bitmaps &&
> +
> +			# remember which commits did not receive bitmaps
> +			comm -13 bitmaps commits >before &&
> +			test_file_not_empty before &&
> +
> +			# mark the commits which did not receive bitmaps as preferred,
> +			# and generate the bitmap again
> +			perl -pe "s{^}{create refs/tags/include/$. }" <before |
> +				git update-ref --stdin &&
> +			git -c pack.preferBitmapTips=refs/tags/include repack -adb &&
> +
> +			# finally, check that the commit(s) without bitmap coverage
> +			# are not the same ones as before
> +			test-tool bitmap list-commits | sort >bitmaps &&
> +			comm -13 bitmaps commits >after &&
> +
> +			! test_cmp before after
> +		)
> +	'
> +
> +	test_expect_success 'complains about multiple pack bitmaps' '
> +		rm -fr repo &&
> +		git init repo &&
> +		test_when_finished "rm -fr repo" &&
> +		(
> +			cd repo &&
> +			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
> +
> +			test_commit base &&
> +
> +			git repack -adb &&
> +			bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
> +			mv "$bitmap" "$bitmap.bak" &&
> +
> +			test_commit other &&
> +			git repack -ab &&
> +
> +			mv "$bitmap.bak" "$bitmap" &&
> +
> +			find .git/objects/pack -type f -name "*.pack" >packs &&
> +			find .git/objects/pack -type f -name "*.bitmap" >bitmaps &&
> +			test_line_count = 2 packs &&
> +			test_line_count = 2 bitmaps &&
> +
> +			git rev-list --use-bitmap-index HEAD 2>err &&
> +			grep "ignoring extra bitmap file" err
> +		)
> +	'
> +}
>
> -basic_bitmap_tests
> +test_bitmap_cases
>
>  test_expect_success 'incremental repack fails when bitmaps are requested' '
>  	test_commit more-1 &&
> @@ -54,375 +445,12 @@ test_expect_success 'incremental repack can disable bitmaps' '
>  	git repack -d --no-write-bitmap-index
>  '
>
> -test_expect_success 'pack-objects respects --local (non-local loose)' '
> -	git init --bare alt.git &&
> -	echo $(pwd)/alt.git/objects >.git/objects/info/alternates &&
> -	echo content1 >file1 &&
> -	# non-local loose object which is not present in bitmapped pack
> -	altblob=$(GIT_DIR=alt.git git hash-object -w file1) &&
> -	# non-local loose object which is also present in bitmapped pack
> -	git cat-file blob $blob | GIT_DIR=alt.git git hash-object -w --stdin &&
> -	git add file1 &&
> -	test_tick &&
> -	git commit -m commit_file1 &&
> -	echo HEAD | git pack-objects --local --stdout --revs >1.pack &&
> -	git index-pack 1.pack &&
> -	list_packed_objects 1.idx >1.objects &&
> -	printf "%s\n" "$altblob" "$blob" >nonlocal-loose &&
> -	! has_any nonlocal-loose 1.objects
> -'
> -
> -test_expect_success 'pack-objects respects --honor-pack-keep (local non-bitmapped pack)' '
> -	echo content2 >file2 &&
> -	blob2=$(git hash-object -w file2) &&
> -	git add file2 &&
> -	test_tick &&
> -	git commit -m commit_file2 &&
> -	printf "%s\n" "$blob2" "$bitmaptip" >keepobjects &&
> -	pack2=$(git pack-objects pack2 <keepobjects) &&
> -	mv pack2-$pack2.* .git/objects/pack/ &&
> -	>.git/objects/pack/pack2-$pack2.keep &&
> -	rm $(objpath $blob2) &&
> -	echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >2a.pack &&
> -	git index-pack 2a.pack &&
> -	list_packed_objects 2a.idx >2a.objects &&
> -	! has_any keepobjects 2a.objects
> -'
> -
> -test_expect_success 'pack-objects respects --local (non-local pack)' '
> -	mv .git/objects/pack/pack2-$pack2.* alt.git/objects/pack/ &&
> -	echo HEAD | git pack-objects --local --stdout --revs >2b.pack &&
> -	git index-pack 2b.pack &&
> -	list_packed_objects 2b.idx >2b.objects &&
> -	! has_any keepobjects 2b.objects
> -'
> -
> -test_expect_success 'pack-objects respects --honor-pack-keep (local bitmapped pack)' '
> -	ls .git/objects/pack/ | grep bitmap >output &&
> -	test_line_count = 1 output &&
> -	packbitmap=$(basename $(cat output) .bitmap) &&
> -	list_packed_objects .git/objects/pack/$packbitmap.idx >packbitmap.objects &&
> -	test_when_finished "rm -f .git/objects/pack/$packbitmap.keep" &&
> -	>.git/objects/pack/$packbitmap.keep &&
> -	echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >3a.pack &&
> -	git index-pack 3a.pack &&
> -	list_packed_objects 3a.idx >3a.objects &&
> -	! has_any packbitmap.objects 3a.objects
> -'
> -
> -test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' '
> -	mv .git/objects/pack/$packbitmap.* alt.git/objects/pack/ &&
> -	rm -f .git/objects/pack/multi-pack-index &&
> -	test_when_finished "mv alt.git/objects/pack/$packbitmap.* .git/objects/pack/" &&
> -	echo HEAD | git pack-objects --local --stdout --revs >3b.pack &&
> -	git index-pack 3b.pack &&
> -	list_packed_objects 3b.idx >3b.objects &&
> -	! has_any packbitmap.objects 3b.objects
> -'
> -
> -test_expect_success 'pack-objects to file can use bitmap' '
> -	# make sure we still have 1 bitmap index from previous tests
> -	ls .git/objects/pack/ | grep bitmap >output &&
> -	test_line_count = 1 output &&
> -	# verify equivalent packs are generated with/without using bitmap index
> -	packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) &&
> -	packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) &&
> -	list_packed_objects packa-$packasha1.idx >packa.objects &&
> -	list_packed_objects packb-$packbsha1.idx >packb.objects &&
> -	test_cmp packa.objects packb.objects
> -'
> -
> -test_expect_success 'full repack, reusing previous bitmaps' '
> -	git repack -ad &&
> -	ls .git/objects/pack/ | grep bitmap >output &&
> -	test_line_count = 1 output
> -'
> -
> -test_expect_success 'fetch (full bitmap)' '
> -	git --git-dir=clone.git fetch origin second:second &&
> -	git rev-parse HEAD >expect &&
> -	git --git-dir=clone.git rev-parse HEAD >actual &&
> -	test_cmp expect actual
> -'
> -
> -test_expect_success 'create objects for missing-HAVE tests' '
> -	blob=$(echo "missing have" | git hash-object -w --stdin) &&
> -	tree=$(printf "100644 blob $blob\tfile\n" | git mktree) &&
> -	parent=$(echo parent | git commit-tree $tree) &&
> -	commit=$(echo commit | git commit-tree $tree -p $parent) &&
> -	cat >revs <<-EOF
> -	HEAD
> -	^HEAD^
> -	^$commit
> -	EOF
> -'
> -
> -test_expect_success 'pack-objects respects --incremental' '
> -	cat >revs2 <<-EOF &&
> -	HEAD
> -	$commit
> -	EOF
> -	git pack-objects --incremental --stdout --revs <revs2 >4.pack &&
> -	git index-pack 4.pack &&
> -	list_packed_objects 4.idx >4.objects &&
> -	test_line_count = 4 4.objects &&
> -	git rev-list --objects $commit >revlist &&
> -	cut -d" " -f1 revlist |sort >objects &&
> -	test_cmp 4.objects objects
> -'
> -
> -test_expect_success 'pack with missing blob' '
> -	rm $(objpath $blob) &&
> -	git pack-objects --stdout --revs <revs >/dev/null
> -'
> +test_bitmap_cases "pack.writeBitmapLookupTable"
>
> -test_expect_success 'pack with missing tree' '
> -	rm $(objpath $tree) &&
> -	git pack-objects --stdout --revs <revs >/dev/null
> -'
> -
> -test_expect_success 'pack with missing parent' '
> -	rm $(objpath $parent) &&
> -	git pack-objects --stdout --revs <revs >/dev/null
> -'
> -
> -test_expect_success JGIT,SHA1 'we can read jgit bitmaps' '
> -	git clone --bare . compat-jgit.git &&
> -	(
> -		cd compat-jgit.git &&
> -		rm -f objects/pack/*.bitmap &&
> -		jgit gc &&
> -		git rev-list --test-bitmap HEAD
> -	)
> -'
> -
> -test_expect_success JGIT,SHA1 'jgit can read our bitmaps' '
> -	git clone --bare . compat-us.git &&
> -	(
> -		cd compat-us.git &&
> -		git repack -adb &&
> -		# jgit gc will barf if it does not like our bitmaps
> -		jgit gc
> -	)
> -'
> -
> -test_expect_success 'splitting packs does not generate bogus bitmaps' '
> -	test-tool genrandom foo $((1024 * 1024)) >rand &&
> -	git add rand &&
> -	git commit -m "commit with big file" &&
> -	git -c pack.packSizeLimit=500k repack -adb &&
> -	git init --bare no-bitmaps.git &&
> -	git -C no-bitmaps.git fetch .. HEAD
> -'
> -
> -test_expect_success 'set up reusable pack' '
> -	rm -f .git/objects/pack/*.keep &&
> -	git repack -adb &&
> -	reusable_pack () {
> -		git for-each-ref --format="%(objectname)" |
> -		git pack-objects --delta-base-offset --revs --stdout "$@"
> -	}
> -'
> -
> -test_expect_success 'pack reuse respects --honor-pack-keep' '
> -	test_when_finished "rm -f .git/objects/pack/*.keep" &&
> -	for i in .git/objects/pack/*.pack
> -	do
> -		>${i%.pack}.keep || return 1
> -	done &&
> -	reusable_pack --honor-pack-keep >empty.pack &&
> -	git index-pack empty.pack &&
> -	git show-index <empty.idx >actual &&
> -	test_must_be_empty actual
> -'
> -
> -test_expect_success 'pack reuse respects --local' '
> -	mv .git/objects/pack/* alt.git/objects/pack/ &&
> -	test_when_finished "mv alt.git/objects/pack/* .git/objects/pack/" &&
> -	reusable_pack --local >empty.pack &&
> -	git index-pack empty.pack &&
> -	git show-index <empty.idx >actual &&
> -	test_must_be_empty actual
> -'
> -
> -test_expect_success 'pack reuse respects --incremental' '
> -	reusable_pack --incremental >empty.pack &&
> -	git index-pack empty.pack &&
> -	git show-index <empty.idx >actual &&
> -	test_must_be_empty actual
> -'
> -
> -test_expect_success 'truncated bitmap fails gracefully (ewah)' '
> -	test_config pack.writebitmaphashcache false &&
> -	git repack -ad &&
> -	git rev-list --use-bitmap-index --count --all >expect &&
> -	bitmap=$(ls .git/objects/pack/*.bitmap) &&
> -	test_when_finished "rm -f $bitmap" &&
> -	test_copy_bytes 256 <$bitmap >$bitmap.tmp &&
> -	mv -f $bitmap.tmp $bitmap &&
> -	git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
> -	test_cmp expect actual &&
> -	test_i18ngrep corrupt.ewah.bitmap stderr
> -'
> -
> -test_expect_success 'truncated bitmap fails gracefully (cache)' '
> -	git repack -ad &&
> -	git rev-list --use-bitmap-index --count --all >expect &&
> -	bitmap=$(ls .git/objects/pack/*.bitmap) &&
> -	test_when_finished "rm -f $bitmap" &&
> -	test_copy_bytes 512 <$bitmap >$bitmap.tmp &&
> -	mv -f $bitmap.tmp $bitmap &&
> -	git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
> -	test_cmp expect actual &&
> -	test_i18ngrep corrupted.bitmap.index stderr
> -'
> -
> -# Create a state of history with these properties:
> -#
> -#  - refs that allow a client to fetch some new history, while sharing some old
> -#    history with the server; we use branches delta-reuse-old and
> -#    delta-reuse-new here
> -#
> -#  - the new history contains an object that is stored on the server as a delta
> -#    against a base that is in the old history
> -#
> -#  - the base object is not immediately reachable from the tip of the old
> -#    history; finding it would involve digging down through history we know the
> -#    other side has
> -#
> -# This should result in a state where fetching from old->new would not
> -# traditionally reuse the on-disk delta (because we'd have to dig to realize
> -# that the client has it), but we will do so if bitmaps can tell us cheaply
> -# that the other side has it.
> -test_expect_success 'set up thin delta-reuse parent' '
> -	# This first commit contains the buried base object.
> -	test-tool genrandom delta 16384 >file &&
> -	git add file &&
> -	git commit -m "delta base" &&
> -	base=$(git rev-parse --verify HEAD:file) &&
> -
> -	# These intermediate commits bury the base back in history.
> -	# This becomes the "old" state.
> -	for i in 1 2 3 4 5
> -	do
> -		echo $i >file &&
> -		git commit -am "intermediate $i" || return 1
> -	done &&
> -	git branch delta-reuse-old &&
> -
> -	# And now our new history has a delta against the buried base. Note
> -	# that this must be smaller than the original file, since pack-objects
> -	# prefers to create deltas from smaller objects to larger.
> -	test-tool genrandom delta 16300 >file &&
> -	git commit -am "delta result" &&
> -	delta=$(git rev-parse --verify HEAD:file) &&
> -	git branch delta-reuse-new &&
> -
> -	# Repack with bitmaps and double check that we have the expected delta
> -	# relationship.
> -	git repack -adb &&
> -	have_delta $delta $base
> -'
> -
> -# Now we can sanity-check the non-bitmap behavior (that the server is not able
> -# to reuse the delta). This isn't strictly something we care about, so this
> -# test could be scrapped in the future. But it makes sure that the next test is
> -# actually triggering the feature we want.
> -#
> -# Note that our tools for working with on-the-wire "thin" packs are limited. So
> -# we actually perform the fetch, retain the resulting pack, and inspect the
> -# result.
> -test_expect_success 'fetch without bitmaps ignores delta against old base' '
> -	test_config pack.usebitmaps false &&
> -	test_when_finished "rm -rf client.git" &&
> -	git init --bare client.git &&
> -	(
> -		cd client.git &&
> -		git config transfer.unpackLimit 1 &&
> -		git fetch .. delta-reuse-old:delta-reuse-old &&
> -		git fetch .. delta-reuse-new:delta-reuse-new &&
> -		have_delta $delta $ZERO_OID
> -	)
> -'
> -
> -# And do the same for the bitmap case, where we do expect to find the delta.
> -test_expect_success 'fetch with bitmaps can reuse old base' '
> -	test_config pack.usebitmaps true &&
> -	test_when_finished "rm -rf client.git" &&
> -	git init --bare client.git &&
> -	(
> -		cd client.git &&
> -		git config transfer.unpackLimit 1 &&
> -		git fetch .. delta-reuse-old:delta-reuse-old &&
> -		git fetch .. delta-reuse-new:delta-reuse-new &&
> -		have_delta $delta $base
> -	)
> -'
> -
> -test_expect_success 'pack.preferBitmapTips' '
> -	git init repo &&
> -	test_when_finished "rm -fr repo" &&
> -	(
> -		cd repo &&
> -
> -		# create enough commits that not all are receive bitmap
> -		# coverage even if they are all at the tip of some reference.
> -		test_commit_bulk --message="%s" 103 &&
> -
> -		git rev-list HEAD >commits.raw &&
> -		sort <commits.raw >commits &&
> -
> -		git log --format="create refs/tags/%s %H" HEAD >refs &&
> -		git update-ref --stdin <refs &&
> -
> -		git repack -adb &&
> -		test-tool bitmap list-commits | sort >bitmaps &&
> -
> -		# remember which commits did not receive bitmaps
> -		comm -13 bitmaps commits >before &&
> -		test_file_not_empty before &&
> -
> -		# mark the commits which did not receive bitmaps as preferred,
> -		# and generate the bitmap again
> -		perl -pe "s{^}{create refs/tags/include/$. }" <before |
> -			git update-ref --stdin &&
> -		git -c pack.preferBitmapTips=refs/tags/include repack -adb &&
> -
> -		# finally, check that the commit(s) without bitmap coverage
> -		# are not the same ones as before
> -		test-tool bitmap list-commits | sort >bitmaps &&
> -		comm -13 bitmaps commits >after &&
> -
> -		! test_cmp before after
> -	)
> -'
> -
> -test_expect_success 'complains about multiple pack bitmaps' '
> -	rm -fr repo &&
> -	git init repo &&
> -	test_when_finished "rm -fr repo" &&
> -	(
> -		cd repo &&
> -
> -		test_commit base &&
> -
> -		git repack -adb &&
> -		bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
> -		mv "$bitmap" "$bitmap.bak" &&
> -
> -		test_commit other &&
> -		git repack -ab &&
> -
> -		mv "$bitmap.bak" "$bitmap" &&
> -
> -		find .git/objects/pack -type f -name "*.pack" >packs &&
> -		find .git/objects/pack -type f -name "*.bitmap" >bitmaps &&
> -		test_line_count = 2 packs &&
> -		test_line_count = 2 bitmaps &&
> -
> -		git rev-list --use-bitmap-index HEAD 2>err &&
> -		grep "ignoring extra bitmap file" err
> -	)
> +test_expect_success 'verify writing bitmap lookup table when enabled' '
> +	GIT_TRACE2_EVENT="$(pwd)/trace2" \
> +		git repack -ad &&
> +	grep "\"label\":\"writing_lookup_table\"" trace2
>  '
>
>  test_done
> diff --git a/t/t5311-pack-bitmaps-shallow.sh b/t/t5311-pack-bitmaps-shallow.sh
> index 872a95df338..9dae60f73e3 100755
> --- a/t/t5311-pack-bitmaps-shallow.sh
> +++ b/t/t5311-pack-bitmaps-shallow.sh
> @@ -17,23 +17,40 @@ test_description='check bitmap operation with shallow repositories'
>  # the tree for A. But in a shallow one, we've grafted away
>  # A, and fetching A to B requires that the other side send
>  # us the tree for file=1.
> -test_expect_success 'setup shallow repo' '
> -	echo 1 >file &&
> -	git add file &&
> -	git commit -m orig &&
> -	echo 2 >file &&
> -	git commit -a -m update &&
> -	git clone --no-local --bare --depth=1 . shallow.git &&
> -	echo 1 >file &&
> -	git commit -a -m repeat
> -'
> -
> -test_expect_success 'turn on bitmaps in the parent' '
> -	git repack -adb
> -'
> -
> -test_expect_success 'shallow fetch from bitmapped repo' '
> -	(cd shallow.git && git fetch)
> -'
> +test_shallow_bitmaps () {
> +	writeLookupTable=false
> +
> +	for i in "$@"
> +	do
> +		case $i in
> +		"pack.writeBitmapLookupTable") writeLookupTable=true;;
> +		esac
> +	done
> +
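
For readers following along, the option scanning done by these wrapper functions can be tried in isolation. This is only a sketch under my own assumptions: parse_lookup_flag is a hypothetical helper, not something in the series, and it prints the result instead of setting a variable for later tests.

```shell
# Hypothetical helper (assumption, not in the patch): print "true" if any
# argument is exactly pack.writeBitmapLookupTable, otherwise "false".
parse_lookup_flag () {
	result=false
	for arg in "$@"
	do
		case "$arg" in
		"pack.writeBitmapLookupTable") result=true;;
		esac
	done
	echo "$result"
}

parse_lookup_flag                                  # no arguments
parse_lookup_flag "pack.writeBitmapLookupTable"    # option given
```

Iterating over "$@" visits each argument as its own word, so unrelated arguments and empty strings fall through the case statement without changing the default.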
> +	test_expect_success 'setup shallow repo' '
> +		rm -rf * .git &&
> +		git init &&
> +		git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
> +		echo 1 >file &&
> +		git add file &&
> +		git commit -m orig &&
> +		echo 2 >file &&
> +		git commit -a -m update &&
> +		git clone --no-local --bare --depth=1 . shallow.git &&
> +		echo 1 >file &&
> +		git commit -a -m repeat
> +	'
> +
> +	test_expect_success 'turn on bitmaps in the parent' '
> +		git repack -adb
> +	'
> +
> +	test_expect_success 'shallow fetch from bitmapped repo' '
> +		(cd shallow.git && git fetch)
> +	'
> +}
> +
> +test_shallow_bitmaps
> +test_shallow_bitmaps "pack.writeBitmapLookupTable"
>
>  test_done
> diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
> index 4fe57414c13..3b206adcee6 100755
> --- a/t/t5326-multi-pack-bitmaps.sh
> +++ b/t/t5326-multi-pack-bitmaps.sh
> @@ -15,17 +15,24 @@ GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
>  sane_unset GIT_TEST_MIDX_WRITE_REV
>  sane_unset GIT_TEST_MIDX_READ_RIDX
>
> -midx_bitmap_core
> -
>  bitmap_reuse_tests() {
>  	from=$1
>  	to=$2
> +	writeLookupTable=false
> +
> +	for i in "$@"
> +	do
> +		case $i in
> +		"pack.writeBitmapLookupTable") writeLookupTable=true;;
> +		esac
> +	done
>
>  	test_expect_success "setup pack reuse tests ($from -> $to)" '
>  		rm -fr repo &&
>  		git init repo &&
>  		(
>  			cd repo &&
> +			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
>  			test_commit_bulk 16 &&
>  			git tag old-tip &&
>
> @@ -43,6 +50,7 @@ bitmap_reuse_tests() {
>  	test_expect_success "build bitmap from existing ($from -> $to)" '
>  		(
>  			cd repo &&
> +			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
>  			test_commit_bulk --id=further 16 &&
>  			git tag new-tip &&
>
> @@ -59,6 +67,7 @@ bitmap_reuse_tests() {
>  	test_expect_success "verify resulting bitmaps ($from -> $to)" '
>  		(
>  			cd repo &&
> +			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
>  			git for-each-ref &&
>  			git rev-list --test-bitmap refs/tags/old-tip &&
>  			git rev-list --test-bitmap refs/tags/new-tip
> @@ -66,244 +75,294 @@ bitmap_reuse_tests() {
>  	'
>  }
>
> -bitmap_reuse_tests 'pack' 'MIDX'
> -bitmap_reuse_tests 'MIDX' 'pack'
> -bitmap_reuse_tests 'MIDX' 'MIDX'
> +test_midx_bitmap_cases () {
> +	writeLookupTable=false
> +	writeBitmapLookupTable=
> +
> +	for i in "$@"
> +	do
> +		case $i in
> +		"pack.writeBitmapLookupTable")
> +			writeLookupTable=true
> +			writeBitmapLookupTable="$i"
> +			;;
> +		esac
> +	done
> +
> +	test_expect_success 'setup test_repository' '
> +		rm -rf * .git &&
> +		git init &&
> +		git config pack.writeBitmapLookupTable '"$writeLookupTable"'
> +	'
>
> -test_expect_success 'missing object closure fails gracefully' '
> -	rm -fr repo &&
> -	git init repo &&
> -	test_when_finished "rm -fr repo" &&
> -	(
> -		cd repo &&
> +	midx_bitmap_core
>
> -		test_commit loose &&
> -		test_commit packed &&
> +	bitmap_reuse_tests 'pack' 'MIDX' "$writeBitmapLookupTable"
> +	bitmap_reuse_tests 'MIDX' 'pack' "$writeBitmapLookupTable"
> +	bitmap_reuse_tests 'MIDX' 'MIDX' "$writeBitmapLookupTable"
>
> -		# Do not pass "--revs"; we want a pack without the "loose"
> -		# commit.
> -		git pack-objects $objdir/pack/pack <<-EOF &&
> -		$(git rev-parse packed)
> -		EOF
> +	test_expect_success 'missing object closure fails gracefully' '
> +		rm -fr repo &&
> +		git init repo &&
> +		test_when_finished "rm -fr repo" &&
> +		(
> +			cd repo &&
> +			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
>
> -		test_must_fail git multi-pack-index write --bitmap 2>err &&
> -		grep "doesn.t have full closure" err &&
> -		test_path_is_missing $midx
> -	)
> -'
> +			test_commit loose &&
> +			test_commit packed &&
>
> -midx_bitmap_partial_tests
> +			# Do not pass "--revs"; we want a pack without the "loose"
> +			# commit.
> +			git pack-objects $objdir/pack/pack <<-EOF &&
> +			$(git rev-parse packed)
> +			EOF
>
> -test_expect_success 'removing a MIDX clears stale bitmaps' '
> -	rm -fr repo &&
> -	git init repo &&
> -	test_when_finished "rm -fr repo" &&
> -	(
> -		cd repo &&
> -		test_commit base &&
> -		git repack &&
> -		git multi-pack-index write --bitmap &&
> +			test_must_fail git multi-pack-index write --bitmap 2>err &&
> +			grep "doesn.t have full closure" err &&
> +			test_path_is_missing $midx
> +		)
> +	'
>
> -		# Write a MIDX and bitmap; remove the MIDX but leave the bitmap.
> -		stale_bitmap=$midx-$(midx_checksum $objdir).bitmap &&
> -		rm $midx &&
> +	midx_bitmap_partial_tests
>
> -		# Then write a new MIDX.
> -		test_commit new &&
> -		git repack &&
> -		git multi-pack-index write --bitmap &&
> +	test_expect_success 'removing a MIDX clears stale bitmaps' '
> +		rm -fr repo &&
> +		git init repo &&
> +		test_when_finished "rm -fr repo" &&
> +		(
> +			cd repo &&
> +			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
> +			test_commit base &&
> +			git repack &&
> +			git multi-pack-index write --bitmap &&
> +
> +			# Write a MIDX and bitmap; remove the MIDX but leave the bitmap.
> +			stale_bitmap=$midx-$(midx_checksum $objdir).bitmap &&
> +			rm $midx &&
> +
> +			# Then write a new MIDX.
> +			test_commit new &&
> +			git repack &&
> +			git multi-pack-index write --bitmap &&
> +
> +			test_path_is_file $midx &&
> +			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
> +			test_path_is_missing $stale_bitmap
> +		)
> +	'
>
> -		test_path_is_file $midx &&
> -		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
> -		test_path_is_missing $stale_bitmap
> -	)
> -'
> +	test_expect_success 'pack.preferBitmapTips' '
> +		git init repo &&
> +		test_when_finished "rm -fr repo" &&
> +		(
> +			cd repo &&
> +			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
>
> -test_expect_success 'pack.preferBitmapTips' '
> -	git init repo &&
> -	test_when_finished "rm -fr repo" &&
> -	(
> -		cd repo &&
> +			test_commit_bulk --message="%s" 103 &&
>
> -		test_commit_bulk --message="%s" 103 &&
> +			git log --format="%H" >commits.raw &&
> +			sort <commits.raw >commits &&
>
> -		git log --format="%H" >commits.raw &&
> -		sort <commits.raw >commits &&
> +			git log --format="create refs/tags/%s %H" HEAD >refs &&
> +			git update-ref --stdin <refs &&
>
> -		git log --format="create refs/tags/%s %H" HEAD >refs &&
> -		git update-ref --stdin <refs &&
> +			git multi-pack-index write --bitmap &&
> +			test_path_is_file $midx &&
> +			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
>
> -		git multi-pack-index write --bitmap &&
> -		test_path_is_file $midx &&
> -		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
> +			test-tool bitmap list-commits | sort >bitmaps &&
> +			comm -13 bitmaps commits >before &&
> +			test_line_count = 1 before &&
>
> -		test-tool bitmap list-commits | sort >bitmaps &&
> -		comm -13 bitmaps commits >before &&
> -		test_line_count = 1 before &&
> +			perl -ne "printf(\"create refs/tags/include/%d \", $.); print" \
> +				<before | git update-ref --stdin &&
>
> -		perl -ne "printf(\"create refs/tags/include/%d \", $.); print" \
> -			<before | git update-ref --stdin &&
> +			rm -fr $midx-$(midx_checksum $objdir).bitmap &&
> +			rm -fr $midx &&
>
> -		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
> -		rm -fr $midx &&
> +			git -c pack.preferBitmapTips=refs/tags/include \
> +				multi-pack-index write --bitmap &&
> +			test-tool bitmap list-commits | sort >bitmaps &&
> +			comm -13 bitmaps commits >after &&
>
> -		git -c pack.preferBitmapTips=refs/tags/include \
> -			multi-pack-index write --bitmap &&
> -		test-tool bitmap list-commits | sort >bitmaps &&
> -		comm -13 bitmaps commits >after &&
> +			! test_cmp before after
> +		)
> +	'
>
> -		! test_cmp before after
> -	)
> -'
> +	test_expect_success 'writing a bitmap with --refs-snapshot' '
> +		git init repo &&
> +		test_when_finished "rm -fr repo" &&
> +		(
> +			cd repo &&
> +			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
>
> -test_expect_success 'writing a bitmap with --refs-snapshot' '
> -	git init repo &&
> -	test_when_finished "rm -fr repo" &&
> -	(
> -		cd repo &&
> +			test_commit one &&
> +			test_commit two &&
>
> -		test_commit one &&
> -		test_commit two &&
> +			git rev-parse one >snapshot &&
>
> -		git rev-parse one >snapshot &&
> +			git repack -ad &&
>
> -		git repack -ad &&
> +			# First, write a MIDX which see both refs/tags/one and
> +			# refs/tags/two (causing both of those commits to receive
> +			# bitmaps).
> +			git multi-pack-index write --bitmap &&
>
> -		# First, write a MIDX which see both refs/tags/one and
> -		# refs/tags/two (causing both of those commits to receive
> -		# bitmaps).
> -		git multi-pack-index write --bitmap &&
> +			test_path_is_file $midx &&
> +			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
>
> -		test_path_is_file $midx &&
> -		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
> +			test-tool bitmap list-commits | sort >bitmaps &&
> +			grep "$(git rev-parse one)" bitmaps &&
> +			grep "$(git rev-parse two)" bitmaps &&
>
> -		test-tool bitmap list-commits | sort >bitmaps &&
> -		grep "$(git rev-parse one)" bitmaps &&
> -		grep "$(git rev-parse two)" bitmaps &&
> +			rm -fr $midx-$(midx_checksum $objdir).bitmap &&
> +			rm -fr $midx &&
>
> -		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
> -		rm -fr $midx &&
> +			# Then again, but with a refs snapshot which only sees
> +			# refs/tags/one.
> +			git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
>
> -		# Then again, but with a refs snapshot which only sees
> -		# refs/tags/one.
> -		git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
> +			test_path_is_file $midx &&
> +			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
>
> -		test_path_is_file $midx &&
> -		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
> +			test-tool bitmap list-commits | sort >bitmaps &&
> +			grep "$(git rev-parse one)" bitmaps &&
> +			! grep "$(git rev-parse two)" bitmaps
> +		)
> +	'
>
> -		test-tool bitmap list-commits | sort >bitmaps &&
> -		grep "$(git rev-parse one)" bitmaps &&
> -		! grep "$(git rev-parse two)" bitmaps
> -	)
> -'
> +	test_expect_success 'write a bitmap with --refs-snapshot (preferred tips)' '
> +		git init repo &&
> +		test_when_finished "rm -fr repo" &&
> +		(
> +			cd repo &&
> +			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
>
> -test_expect_success 'write a bitmap with --refs-snapshot (preferred tips)' '
> -	git init repo &&
> -	test_when_finished "rm -fr repo" &&
> -	(
> -		cd repo &&
> +			test_commit_bulk --message="%s" 103 &&
>
> -		test_commit_bulk --message="%s" 103 &&
> +			git log --format="%H" >commits.raw &&
> +			sort <commits.raw >commits &&
>
> -		git log --format="%H" >commits.raw &&
> -		sort <commits.raw >commits &&
> +			git log --format="create refs/tags/%s %H" HEAD >refs &&
> +			git update-ref --stdin <refs &&
>
> -		git log --format="create refs/tags/%s %H" HEAD >refs &&
> -		git update-ref --stdin <refs &&
> +			git multi-pack-index write --bitmap &&
> +			test_path_is_file $midx &&
> +			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
>
> -		git multi-pack-index write --bitmap &&
> -		test_path_is_file $midx &&
> -		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
> +			test-tool bitmap list-commits | sort >bitmaps &&
> +			comm -13 bitmaps commits >before &&
> +			test_line_count = 1 before &&
>
> -		test-tool bitmap list-commits | sort >bitmaps &&
> -		comm -13 bitmaps commits >before &&
> -		test_line_count = 1 before &&
> +			(
> +				grep -vf before commits.raw &&
> +				# mark missing commits as preferred
> +				sed "s/^/+/" before
> +			) >snapshot &&
>
> +			rm -fr $midx-$(midx_checksum $objdir).bitmap &&
> +			rm -fr $midx &&
> +
> +			git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
> +			test-tool bitmap list-commits | sort >bitmaps &&
> +			comm -13 bitmaps commits >after &&
> +
> +			! test_cmp before after
> +		)
> +	'
> +
> +	test_expect_success 'hash-cache values are propagated from pack bitmaps' '
> +		rm -fr repo &&
> +		git init repo &&
> +		test_when_finished "rm -fr repo" &&
>  		(
> -			grep -vf before commits.raw &&
> -			# mark missing commits as preferred
> -			sed "s/^/+/" before
> -		) >snapshot &&
> +			cd repo &&
> +			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
>
> -		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
> -		rm -fr $midx &&
> +			test_commit base &&
> +			test_commit base2 &&
> +			git repack -adb &&
>
> -		git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
> -		test-tool bitmap list-commits | sort >bitmaps &&
> -		comm -13 bitmaps commits >after &&
> +			test-tool bitmap dump-hashes >pack.raw &&
> +			test_file_not_empty pack.raw &&
> +			sort pack.raw >pack.hashes &&
>
> -		! test_cmp before after
> -	)
> -'
> +			test_commit new &&
> +			git repack &&
> +			git multi-pack-index write --bitmap &&
>
> -test_expect_success 'hash-cache values are propagated from pack bitmaps' '
> -	rm -fr repo &&
> -	git init repo &&
> -	test_when_finished "rm -fr repo" &&
> -	(
> -		cd repo &&
> +			test-tool bitmap dump-hashes >midx.raw &&
> +			sort midx.raw >midx.hashes &&
>
> -		test_commit base &&
> -		test_commit base2 &&
> -		git repack -adb &&
> +			# ensure that every namehash in the pack bitmap can be found in
> +			# the midx bitmap (i.e., that there are no oid-namehash pairs
> +			# unique to the pack bitmap).
> +			comm -23 pack.hashes midx.hashes >dropped.hashes &&
> +			test_must_be_empty dropped.hashes
> +		)
> +	'
>
> -		test-tool bitmap dump-hashes >pack.raw &&
> -		test_file_not_empty pack.raw &&
> -		sort pack.raw >pack.hashes &&
> +	test_expect_success 'no .bitmap is written without any objects' '
> +		rm -fr repo &&
> +		git init repo &&
> +		test_when_finished "rm -fr repo" &&
> +		(
> +			cd repo &&
> +			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
>
> -		test_commit new &&
> -		git repack &&
> -		git multi-pack-index write --bitmap &&
> +			empty="$(git pack-objects $objdir/pack/pack </dev/null)" &&
> +			cat >packs <<-EOF &&
> +			pack-$empty.idx
> +			EOF
>
> -		test-tool bitmap dump-hashes >midx.raw &&
> -		sort midx.raw >midx.hashes &&
> +			git multi-pack-index write --bitmap --stdin-packs \
> +				<packs 2>err &&
>
> -		# ensure that every namehash in the pack bitmap can be found in
> -		# the midx bitmap (i.e., that there are no oid-namehash pairs
> -		# unique to the pack bitmap).
> -		comm -23 pack.hashes midx.hashes >dropped.hashes &&
> -		test_must_be_empty dropped.hashes
> -	)
> -'
> +			grep "bitmap without any objects" err &&
>
> -test_expect_success 'no .bitmap is written without any objects' '
> -	rm -fr repo &&
> -	git init repo &&
> -	test_when_finished "rm -fr repo" &&
> -	(
> -		cd repo &&
> +			test_path_is_file $midx &&
> +			test_path_is_missing $midx-$(midx_checksum $objdir).bitmap
> +		)
> +	'
> +
> +	test_expect_success 'graceful fallback when missing reverse index' '
> +		rm -fr repo &&
> +		git init repo &&
> +		test_when_finished "rm -fr repo" &&
> +		(
> +			cd repo &&
> +			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
>
> -		empty="$(git pack-objects $objdir/pack/pack </dev/null)" &&
> -		cat >packs <<-EOF &&
> -		pack-$empty.idx
> -		EOF
> +			test_commit base &&
>
> -		git multi-pack-index write --bitmap --stdin-packs \
> -			<packs 2>err &&
> +			# write a pack and MIDX bitmap containing base
> +			git repack -adb &&
> +			git multi-pack-index write --bitmap &&
>
> -		grep "bitmap without any objects" err &&
> +			GIT_TEST_MIDX_READ_RIDX=0 \
> +				git rev-list --use-bitmap-index HEAD 2>err &&
> +			! grep "ignoring extra bitmap file" err
> +		)
> +	'
> +}
>
> -		test_path_is_file $midx &&
> -		test_path_is_missing $midx-$(midx_checksum $objdir).bitmap
> -	)
> -'
> +test_midx_bitmap_cases
> +
> +test_midx_bitmap_cases "pack.writeBitmapLookupTable"
>
> -test_expect_success 'graceful fallback when missing reverse index' '
> +test_expect_success 'multi-pack-index write writes lookup table if enabled' '
>  	rm -fr repo &&
>  	git init repo &&
>  	test_when_finished "rm -fr repo" &&
>  	(
>  		cd repo &&
> -
>  		test_commit base &&
> -
> -		# write a pack and MIDX bitmap containing base
> -		git repack -adb &&
> -		git multi-pack-index write --bitmap &&
> -
> -		GIT_TEST_MIDX_READ_RIDX=0 \
> -			git rev-list --use-bitmap-index HEAD 2>err &&
> -		! grep "ignoring extra bitmap file" err
> +		git config pack.writeBitmapLookupTable true &&
> +		git repack -ad &&
> +		GIT_TRACE2_EVENT="$(pwd)/trace" \
> +			git multi-pack-index write --bitmap &&
> +		grep "\"label\":\"writing_lookup_table\"" trace
>  	)
>  '
>
> diff --git a/t/t5327-multi-pack-bitmaps-rev.sh b/t/t5327-multi-pack-bitmaps-rev.sh
> index d30ba632c87..5ed16a820d1 100755
> --- a/t/t5327-multi-pack-bitmaps-rev.sh
> +++ b/t/t5327-multi-pack-bitmaps-rev.sh
> @@ -17,7 +17,27 @@ GIT_TEST_MIDX_READ_RIDX=0
>  export GIT_TEST_MIDX_WRITE_REV
>  export GIT_TEST_MIDX_READ_RIDX
>
> -midx_bitmap_core rev
> -midx_bitmap_partial_tests rev
> +test_midx_bitmap_rev () {
> +	writeLookupTable=false
> +
> +	for i in "$@"
> +	do
> +		case $i in
> +		"pack.writeBitmapLookupTable") writeLookupTable=true;;
> +		esac
> +	done
> +
> +	test_expect_success 'setup bitmap config' '
> +		rm -rf * .git &&
> +		git init &&
> +		git config pack.writeBitmapLookupTable '"$writeLookupTable"'
> +	'
> +
> +	midx_bitmap_core rev
> +	midx_bitmap_partial_tests rev
> +}
> +
> +test_midx_bitmap_rev
> +test_midx_bitmap_rev "pack.writeBitmapLookupTable"
>
>  test_done
> --
> gitgitgadget
>
>
>


* Re: [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-07-28 19:22           ` Johannes Schindelin
@ 2022-08-02 12:40             ` Abhradeep Chakraborty
  2022-08-02 15:35               ` Johannes Schindelin
  0 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-08-02 12:40 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Taylor Blau,
	Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren

On Fri, Jul 29, 2022 at 12:52 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> That's quite a large change, and unfortunately I pinpointed a flake to
> this patch when running with GIT_TEST_DEFAULT_HASH=sha256. The symptom is
> this:

Hi Dscho, sorry for the long delay in responding. I was busy for 3-4
days moving hostel rooms, so I couldn't work properly during that
time.

> -- snip --
> [...]
> + diff -u expect.normalized actual.normalized
> + rm -f expect.normalized actual.normalized
> ok 317 - enumerate --objects (full bitmap, other)
>
> expecting success of 5326.318 'bitmap --objects handles non-commit objects (full bitmap, other)':
>                 git rev-list --objects --use-bitmap-index $branch tagged-blob >actual &&
>                 grep $blob actual
>
> + git rev-list --objects --use-bitmap-index other tagged-blob
> + grep bff4ed5e839bd73e821f78b45a7fa34208aa85596535ec8e9ac5eab477ca6f81 actual
> bff4ed5e839bd73e821f78b45a7fa34208aa85596535ec8e9ac5eab477ca6f81
> ok 318 - bitmap --objects handles non-commit objects (full bitmap, other)
>
> expecting success of 5326.319 'clone from bitmapped repository':
>                 rm -fr clone.git &&
>                 git clone --no-local --bare . clone.git &&
>                 git rev-parse HEAD >expect &&
>                 git --git-dir=clone.git rev-parse HEAD >actual &&
>                 test_cmp expect actual
>
> + rm -fr clone.git
> + git clone --no-local --bare . clone.git
> Cloning into bare repository 'clone.git'...
> remote: Enumerating objects: 756, done.
> remote: Counting objects: 100% (754/754), done.
> remote: Compressing objects: 100% (281/281), done.
> remote: Total 756 (delta 245), reused 740 (delta 234), pack-reused 2
> Receiving objects: 100% (756/756), 77.50 KiB | 8.61 MiB/s, done.
> fatal: REF_DELTA at offset 221 already resolved (duplicate base 4d332072f161629ffe4652ecd3ce377ef88447bec73f05ab0f3515f98bd061cf?)
> fatal: fetch-pack: invalid index-pack output
> error: last command exited with $?=128
> not ok 319 - clone from bitmapped repository
> #
> #                       rm -fr clone.git &&
> #                       git clone --no-local --bare . clone.git &&
> #                       git rev-parse HEAD >expect &&
> #                       git --git-dir=clone.git rev-parse HEAD >actual &&
> #                       test_cmp expect actual
> #
> 1..319
> -- snap --
>
> On a hunch, I ran this through valgrind (took a while) but it did not
> point out the problem.
>
> Again, this is only with SHA-256 (and somewhat flaky), it passes every
> time with SHA-1. Maybe you can reproduce on your side with that
> information?

Yeah, I can reproduce it on my side. But I am sure it is not related
to the lookup table implementation code, because when I swap the order
of the `test_midx_bitmap_cases "pack.writeBitmapLookupTable"` and
`test_midx_bitmap_cases` calls (in t5326-multi-pack-bitmaps.sh), the
error is generated in the `test_midx_bitmap_cases` call instead. In
other words, the error is always generated in the second call.

For now, my understanding is that there is something fishy in the
test script. I have not been able to figure out the problem yet, but
let me investigate further.

If anyone has an idea about what the culprit could be, I would be
very happy to hear it.

Thanks :)


* Re: [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-08-02 12:40             ` Abhradeep Chakraborty
@ 2022-08-02 15:35               ` Johannes Schindelin
  2022-08-02 17:44                 ` Abhradeep Chakraborty
  0 siblings, 1 reply; 162+ messages in thread
From: Johannes Schindelin @ 2022-08-02 15:35 UTC (permalink / raw)
  To: Abhradeep Chakraborty
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Taylor Blau,
	Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren

Hi Abhradeep,

On Tue, 2 Aug 2022, Abhradeep Chakraborty wrote:

> On Fri, Jul 29, 2022 at 12:52 AM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> > That's quite a large change, and unfortunately I pinpointed a flake to
> > this patch when running with GIT_TEST_DEFAULT_HASH=sha256. The symptom is
> > this:
>
> Hi Dscho, sorry for the long delay in responding. I was busy for 3-4
> days moving hostel rooms, so I couldn't work properly during that
> time.
>
> > -- snip --
> > [...]
> > + diff -u expect.normalized actual.normalized
> > + rm -f expect.normalized actual.normalized
> > ok 317 - enumerate --objects (full bitmap, other)
> >
> > expecting success of 5326.318 'bitmap --objects handles non-commit objects (full bitmap, other)':
> >                 git rev-list --objects --use-bitmap-index $branch tagged-blob >actual &&
> >                 grep $blob actual
> >
> > + git rev-list --objects --use-bitmap-index other tagged-blob
> > + grep bff4ed5e839bd73e821f78b45a7fa34208aa85596535ec8e9ac5eab477ca6f81 actual
> > bff4ed5e839bd73e821f78b45a7fa34208aa85596535ec8e9ac5eab477ca6f81
> > ok 318 - bitmap --objects handles non-commit objects (full bitmap, other)
> >
> > expecting success of 5326.319 'clone from bitmapped repository':
> >                 rm -fr clone.git &&
> >                 git clone --no-local --bare . clone.git &&
> >                 git rev-parse HEAD >expect &&
> >                 git --git-dir=clone.git rev-parse HEAD >actual &&
> >                 test_cmp expect actual
> >
> > + rm -fr clone.git
> > + git clone --no-local --bare . clone.git
> > Cloning into bare repository 'clone.git'...
> > remote: Enumerating objects: 756, done.
> > remote: Counting objects: 100% (754/754), done.
> > remote: Compressing objects: 100% (281/281), done.
> > remote: Total 756 (delta 245), reused 740 (delta 234), pack-reused 2
> > Receiving objects: 100% (756/756), 77.50 KiB | 8.61 MiB/s, done.
> > fatal: REF_DELTA at offset 221 already resolved (duplicate base 4d332072f161629ffe4652ecd3ce377ef88447bec73f05ab0f3515f98bd061cf?)
> > fatal: fetch-pack: invalid index-pack output
> > error: last command exited with $?=128
> > not ok 319 - clone from bitmapped repository
> > #
> > #                       rm -fr clone.git &&
> > #                       git clone --no-local --bare . clone.git &&
> > #                       git rev-parse HEAD >expect &&
> > #                       git --git-dir=clone.git rev-parse HEAD >actual &&
> > #                       test_cmp expect actual
> > #
> > 1..319
> > -- snap --
> >
> > On a hunch, I ran this through valgrind (took a while) but it did not
> > point out the problem.
> >
> > Again, this is only with SHA-256 (and somewhat flaky), it passes every
> > time with SHA-1. Maybe you can reproduce on your side with that
> > information?
>
> Yeah, I can reproduce it on my side.

Good.

> But I am sure it is not related to the lookup table implementation code,
> because when I swap the order of the `test_midx_bitmap_cases
> "pack.writeBitmapLookupTable"` and `test_midx_bitmap_cases` calls (in
> t5326-multi-pack-bitmaps.sh), the error is generated in the
> `test_midx_bitmap_cases` call instead. In other words, the error is
> always generated in the second call.

Indeed, it probably has something to do with the test tick (which gives
rise to the author/committer date of the commits that are generated, and
hence with the SHA order of said commits).

With this patch:

-- snip --
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 3b206adcee6..a340f005b89 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -347,7 +347,11 @@ test_midx_bitmap_cases () {
 	'
 }

-test_midx_bitmap_cases
+# test_midx_bitmap_cases
+
+GIT_COMMITTER_DATE='1112928553 -0700'
+GIT_AUTHOR_DATE='1112928553 -0700'
+test_tick='1112928553'

 test_midx_bitmap_cases "pack.writeBitmapLookupTable"

-- snap --

I can reproduce it quicker via

	GIT_TEST_DEFAULT_HASH=sha256 sh t5326-*.sh --run=1,71,91,92,93,124,145

Without setting those variables, I cannot skip the first
`test_midx_bitmap_cases` invocation _and_ reproduce the failure.

For shiggles, I now also ran this command-line after deleting the
`"pack.writeBitmapLookupTable"` argument, and it fails in the exact same
way. So you're correct: this has nothing to do with the
`writeBitmapLookupTable` code, it's just a failure that is triggered by
those patches.
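
To illustrate why the dates matter: the author and committer dates are
part of the commit object, so they feed into its ID. The sketch below is
not taken from t5326; `make_commit` and the identity values are made up
for illustration. It creates the same one-file commit in three scratch
repositories and varies only the timestamp.

```shell
# Hypothetical helper (not part of the test suite): create a single commit
# with a fixed identity and fixed dates, then print its object ID.
make_commit () {
	# $1: directory for a fresh repository
	# $2: Unix timestamp used for both the author and committer date
	git init -q "$1" &&
	(
		cd "$1" &&
		echo 1 >file &&
		git add file &&
		GIT_AUTHOR_NAME=A GIT_AUTHOR_EMAIL=a@example.com \
		GIT_COMMITTER_NAME=A GIT_COMMITTER_EMAIL=a@example.com \
		GIT_AUTHOR_DATE="$2 -0700" GIT_COMMITTER_DATE="$2 -0700" \
			git commit -q -m orig &&
		git rev-parse HEAD
	)
}

tmp=$(mktemp -d)
same_1=$(make_commit "$tmp/a" 1112928553)
same_2=$(make_commit "$tmp/b" 1112928553)
later=$(make_commit "$tmp/c" 1112928613)

test "$same_1" = "$same_2" && echo "same dates: same commit ID"
test "$same_1" != "$later" && echo "different date: different commit ID"
rm -rf "$tmp"
```

This is why resetting `test_tick` and the two date variables is enough to
recreate the object IDs that the second `test_midx_bitmap_cases` call
would normally see.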

> For now, my understanding says that there is something fishy in the
> test script.

I do not actually think so. I believe that this just points out a bug in
the MIDX bitmap code.

> I am still not able to figure out the problem here. But let me further
> investigate.
>
> If anyone has some idea about what could be the culprit, I will be
> very happy to know.

So I noticed that the test will pass every 4th to 5th time over here,
which suggests that a race condition is the culprit.
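
A rough way to put a number on that kind of flakiness is to run the
command repeatedly and count the failures. The helper below is only a
sketch; `count_failures` is a made-up name, not something from the test
suite.

```shell
# Hypothetical helper: run a command repeatedly and print how many of the
# runs exited with a non-zero status.
count_failures () {
	# $1: number of runs; remaining arguments: the command to run
	runs=$1
	shift
	failed=0
	i=0
	while test "$i" -lt "$runs"
	do
		# Discard the command's output; only the exit status matters.
		"$@" >/dev/null 2>&1 || failed=$((failed + 1))
		i=$((i + 1))
	done
	echo "$failed"
}
```

From the `t/` directory it could be used as, for example,
`count_failures 20 env GIT_TEST_DEFAULT_HASH=sha256 sh t5326-multi-pack-bitmaps.sh --run=1,71,91,92,93,124,145`.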

I dug a bit deeper and reduced the reproducer even further, by running
this command with a trash directory just after above test script
invocation failed:

	bin-wrappers/git -C t/trash\ directory.t5326-multi-pack-bitmaps/ \
		-c pack.threads=1 pack-objects --revs --thin --stdout \
		--progress --delta-base-offset </tmp/a5 |
	bin-wrappers/git -C t/trash\ directory.t5326-multi-pack-bitmaps/ \
		-c pack.threads=1 index-pack --stdin -v --fix-thin \
		'--keep=fetch-pack 12345 on labtop' \
		--check-self-contained-and-connected

where `/tmp/a5` contains these lines:

-- snip --
0ae5a358dcea86d81c0903aaec1e21857688cdb36c7fd89b04bd293fb2cceaa6
67df8a01ac84cf5f028855c48384eac3336bb02a52603bac285c4b31d66b3ab5
098a57f7753320c8a37cf0cb84526a9e50439d9f70fb673c91436a5283a7efe8
--not
-- snap --

This allowed me to instrument the code with _many_ debug printf statements
(I actually use `error("%s:%d: ...", __FILE__, __LINE__, ...)` calls) to
dive deeper into the weeds.

One relatively obvious difference I can see is that when the code reaches
builtin/pack-objects.c:1198, in the passing case after writing the reused
pack we're at offset 900 in the written pack file, but in the failing case
we're at offset 269.

Another difference I first saw was that the mtime of
`.git/objects/pack/multi-pack-index` was identical to the mtime of
`.git/objects/pack/multi-pack-index-2ec3c30357d2fff78db9b36cc749b393087e989bffdd278771d6f62089406061.bitmap`
in the failing case, while the mtimes of the corresponding files were
different in the passing case.

But in another failing run, the mtimes were also non-identical. Meaning:
the race cannot be caused by identical or non-identical timestamps there.
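
For anyone who wants to repeat the mtime comparison, a small sketch
follows. `same_mtime` is a hypothetical helper, and it relies on the
`-nt` test operator, which common shells support but POSIX does not
require. The two scratch files merely stand in for the MIDX and its
`.bitmap` file.

```shell
# Hypothetical helper: succeed when neither file is newer than the other,
# i.e. when both have the same modification time.
same_mtime () {
	! test "$1" -nt "$2" && ! test "$2" -nt "$1"
}

tmp=$(mktemp -d)
: >"$tmp/midx"
: >"$tmp/bitmap"

touch -t 202201010000 "$tmp/midx"
touch -r "$tmp/midx" "$tmp/bitmap"     # copy the mtime from midx
same_mtime "$tmp/midx" "$tmp/bitmap" && echo identical

touch -t 202201010001 "$tmp/bitmap"    # one minute later
same_mtime "$tmp/midx" "$tmp/bitmap" || echo different
```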

One consistent difference, however, was the SHA-256 in that `.bitmap` file
name: In the failing case it was always
2ec3c30357d2fff78db9b36cc749b393087e989bffdd278771d6f62089406061, while in
the succeeding case it was always
0c275657a915eeff1f2a1c17e5ded43cc3b232b0e178923e44fc15c1970516fb.

My suspicion is that this `.bitmap` file is written out in an earlier test
case, and is already incorrect at that stage. Maybe it should have been
updated, but isn't, and the result is an incorrectly-reused partial pack
file.
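
One way to test that suspicion would be to record, after each test case,
which checksum is embedded in the bitmap file name. The helper below is
a sketch under that assumption; `midx_bitmap_checksums` is a made-up
name.

```shell
# Hypothetical helper: print the checksum embedded in the name of each
# MIDX bitmap file found in the given pack directory.
midx_bitmap_checksums () {
	# $1: pack directory, e.g. .git/objects/pack
	for f in "$1"/multi-pack-index-*.bitmap
	do
		test -e "$f" || continue    # the glob did not match anything
		name=${f##*/}               # strip the directory part
		name=${name#multi-pack-index-}
		echo "${name%.bitmap}"
	done
}
```

t5326 already builds the expected file name as
`$midx-$(midx_checksum $objdir).bitmap`, so comparing that checksum with
the output of `midx_bitmap_checksums $objdir/pack` after each step would
show at which point a stale bitmap is left behind.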

I also noticed that deleting the `multi-pack-index-*.bitmap` file in the
failing case will "fix" the `pack-objects | index-pack` command I showed
above.

Hopefully this will help you dig in further because even if the bug is not
in your code, it needs to be fixed. And I suspect that it is a bug in the
code we already have in the main branch, so that fix is really, really
needed, now.

Since you are very familiar with the details of bitmaps now, I would like
to encourage you to work on some kind of validator/inspector, e.g.
something along the lines of a `test-tool midx-bitmap dump` (and later
`... verify`) that would help future you (and future me) investigate
similar breakages. Ideally, that tool will not only parse the `.bitmap`
file but immediately print out everything in a human-readable form.

The reason I suggest this: I got a bit tired of staring at the output of
`hexdump -C` and comparing it to the documentation in
https://git-scm.com/docs/pack-format, so I had to stop after looking too
long at one broken pack file (i.e. the output of the `pack-objects`
command I showed above, where already the first entry seems to have an
infinite delta chain that pretends that
4d332072f161629ffe4652ecd3ce377ef88447bec73f05ab0f3515f98bd061cf has
itself as delta base) before I even could analyze the MIDX bitmap files.

The proposed tool would make analyzing MIDX bitmaps substantially more
fun, and would also help stave off future breakages if it was taught some
`verify` mode that would essentially automate what right now has to be
done manually: to verify that the MIDX bitmap file contents are sound and
consistent with the contents of the pack files.

Obviously, this `verify` command should be called in strategic places of
t5326.

Thanks,
Dscho


* Re: [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-08-02 15:35               ` Johannes Schindelin
@ 2022-08-02 17:44                 ` Abhradeep Chakraborty
  2022-08-08 13:06                   ` Johannes Schindelin
  0 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-08-02 17:44 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Taylor Blau,
	Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren

On Tue, Aug 2, 2022 at 9:05 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> > But I am sure it is not related to the lookup table implementation code,
> > because when I swap the order of the `test_midx_bitmap_cases
> > "pack.writeBitmapLookupTable"` and `test_midx_bitmap_cases` calls (in
> > t5326-multi-pack-bitmaps.sh), the error is generated in the
> > `test_midx_bitmap_cases` call instead. In other words, the error is
> > always generated in the second call.
>
> Indeed, it probably has something to do with the test tick (which gives
> rise to the author/committer date of the commits that are generated, and
> hence with the SHA order of said commits).
>
> With this patch:
>
> -- snip --
> diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
> index 3b206adcee6..a340f005b89 100755
> --- a/t/t5326-multi-pack-bitmaps.sh
> +++ b/t/t5326-multi-pack-bitmaps.sh
> @@ -347,7 +347,11 @@ test_midx_bitmap_cases () {
>         '
>  }
>
> -test_midx_bitmap_cases
> +# test_midx_bitmap_cases
> +
> +GIT_COMMITTER_DATE='1112928553 -0700'
> +GIT_AUTHOR_DATE='1112928553 -0700'
> +test_tick='1112928553'
>
>  test_midx_bitmap_cases "pack.writeBitmapLookupTable"
>
> -- snap --
>
> I can reproduce it quicker via
>
>         GIT_TEST_DEFAULT_HASH=sha256 sh t5326-*.sh --run=1,71,91,92,93,124,145
>
> Without setting those variables, I cannot skip the first
> `test_midx_bitmap_cases` invocation _and_ reproduce the failure.

Yeah, this works for me also.

> > I am still not able to figure out the problem here. But let me further
> > investigate.
> >
> > If anyone has some idea about what could be the culprit, I will be
> > very happy to know.
>
> So I noticed that the test will pass every 4th to 5th time over here,
> which means that it is a racy condition that is the culprit.

I also encountered the same and it blew my mind at first (because it
is the first race condition that I faced in my life) :)

> I dug a bit deeper and reduced the reproducer even further, by running
> this command with a trash directory just after above test script
> invocation failed:
>
>         bin-wrappers/git -C t/trash\ directory.t5326-multi-pack-bitmaps/ \
>                 -c pack.threads=1 pack-objects --revs --thin --stdout \
>                 --progress --delta-base-offset </tmp/a5 |
>         bin-wrappers/git -C t/trash\ directory.t5326-multi-pack-bitmaps/ \
>                 -c pack.threads=1 index-pack --stdin -v --fix-thin \
>                 '--keep=fetch-pack 12345 on labtop' \
>                 --check-self-contained-and-connected
>
> where `/tmp/a5` contains these lines:
>
> -- snip --
> 0ae5a358dcea86d81c0903aaec1e21857688cdb36c7fd89b04bd293fb2cceaa6
> 67df8a01ac84cf5f028855c48384eac3336bb02a52603bac285c4b31d66b3ab5
> 098a57f7753320c8a37cf0cb84526a9e50439d9f70fb673c91436a5283a7efe8
> --not
> -- snap --
>
> This allowed me to instrument the code with _many_ debug printf statements
> (I actually use `error("%s:%d: ...", __FILE__, __LINE__, ...)` calls) to
> dive deeper into the weeds.
>
> One relatively obvious difference I can see is that when the code reaches
> builtin/pack-objects.c:1198, in the passing case after writing the reused
> pack we're at offset 900 in the written pack file, but in the failing case
> we're at offset 269.
>
> Another difference I first saw was that the mtime of
> `.git/objects/pack/multi-pack-index` was identical to the mtime of
> `.git/objects/pack/multi-pack-index-2ec3c30357d2fff78db9b36cc749b393087e989bffdd278771d6f62089406061.bitmap`
> in the failing case, while the mtimes of the corresponding files were
> different in the passing case.
>
> But in another failing run, the mtimes were also non-identical. Meaning:
> the race cannot be caused by identical or non-identical timestamps there.
>
> One consistent difference, however, was the SHA-256 in that `.bitmap` file
> name: In the failing case it was always
> 2ec3c30357d2fff78db9b36cc749b393087e989bffdd278771d6f62089406061, while in
> the succeeding case it was always
> 0c275657a915eeff1f2a1c17e5ded43cc3b232b0e178923e44fc15c1970516fb.
>
> My suspicion is that this `.bitmap` file is written out in an earlier test
> case, and is already incorrect at that stage. Maybe it should have been
> updated, but isn't, and the result is an incorrectly-reused partial pack
> file.

I agree with you.

> I also noticed that deleting the `multi-pack-index-*.bitmap` file in the
> failing case will "fix" the `pack-objects | index-pack` command I showed
> above.
>
> Hopefully this will help you dig in further because even if the bug is not
> in your code, it needs to be fixed. And I suspect that it is a bug in the
> code we already have in the main branch, so that fix is really, really
> needed, now.

Yeah, definitely! Thanks for all the information! It will truly help
me to identify the problem.

> Since you are very familiar with the details of bitmaps now, I would like
> to encourage you to work on some kind of validator/inspector, e.g.
> something along the lines of a `test-tool midx-bitmap dump` (and later
> `... verify`) that would help future you (and future me) investigate
> similar breakages. Ideally, that tool will not only parse the `.bitmap`
> file but immediately print out everything in a human-readable form.
>
> The reason I suggest this: I got a bit tired of staring at the output of
> `hexdump -C` and comparing it to the documentation in
> https://git-scm.com/docs/pack-format, so I had to stop after looking too
> long at one broken pack file (i.e. the output of the `pack-objects`
> command I showed above, where already the first entry seems to have an
> infinite delta chain that pretends that
> 4d332072f161629ffe4652ecd3ce377ef88447bec73f05ab0f3515f98bd061cf has
> itself as delta base) before I even could analyze the MIDX bitmap files.
>
> The proposed tool would make analyzing MIDX bitmaps substantially more
> fun, and would also help stave off future breakages if it was taught some
> `verify` mode that would essentially automate what right now has to be
> done manually: to verify that the MIDX bitmap file contents are sound and
> consistent with the contents of the pack files.
>
> Obviously, this `verify` command should be called in strategic places of
> t5326.

Ok, sure!

Thanks :)

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-08-02 17:44                 ` Abhradeep Chakraborty
@ 2022-08-08 13:06                   ` Johannes Schindelin
  2022-08-08 13:58                     ` Abhradeep Chakraborty
  0 siblings, 1 reply; 162+ messages in thread
From: Johannes Schindelin @ 2022-08-08 13:06 UTC (permalink / raw)
  To: Abhradeep Chakraborty
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Taylor Blau,
	Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren

Hi Abhradeep,

On Tue, 2 Aug 2022, Abhradeep Chakraborty wrote:

> On Tue, Aug 2, 2022 at 9:05 PM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
>
> > Since you are very familiar with the details of bitmaps now, I would
> > like to encourage you to work on some kind of validator/inspector,
> > e.g. something along the lines of a `test-tool midx-bitmap dump` (and
> > later `... verify`) that would help future you (and future me)
> > investigate similar breakages. Ideally, that tool will not only parse
> > the `.bitmap` file but immediately print out everything in a
> > human-readable form.

Have you made progress on this? I am interested mostly because I am trying
very hard to maintain passing CI runs of Git for Windows' `shears/seen`
branch (which essentially tries to rebase all of Git for Windows' patches
on top of `seen`), and this failure is consistently causing said CI runs
to fail for a while already.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-08-08 13:06                   ` Johannes Schindelin
@ 2022-08-08 13:58                     ` Abhradeep Chakraborty
  2022-08-09  9:03                       ` Johannes Schindelin
  0 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-08-08 13:58 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Taylor Blau,
	Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren

On Mon, Aug 8, 2022 at 6:36 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Abhradeep,
>
> On Tue, 2 Aug 2022, Abhradeep Chakraborty wrote:
>
> > On Tue, Aug 2, 2022 at 9:05 PM Johannes Schindelin
> > <Johannes.Schindelin@gmx.de> wrote:
> >
> > > Since you are very familiar with the details of bitmaps now, I would
> > > like to encourage you to work on some kind of validator/inspector,
> > > e.g. something along the lines of a `test-tool midx-bitmap dump` (and
> > > later `... verify`) that would help future you (and future me)
> > > investigate similar breakages. Ideally, that tool will not only parse
> > > the `.bitmap` file but immediately print out everything in a
> > > human-readable form.
>
> Have you made progress on this? I am interested mostly because I am trying
> very hard to maintain passing CI runs of Git for Windows' `shears/seen`
> branch (which essentially tries to rebase all of Git for Windows' patches
> on top of `seen`), and this failure is consistently causing said CI runs
> to fail for a while already.

Hey Dscho, I am trying hard to solve the issue but unfortunately I
haven't found the key yet.
I investigated the bitmap code-base and used debug lines but didn't
find a way to fix it. Sorry for that :|
I am still trying it.

Hope I will be able to share the good news soon. Thanks :)
> Ciao,
> Dscho

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-08-08 13:58                     ` Abhradeep Chakraborty
@ 2022-08-09  9:03                       ` Johannes Schindelin
  2022-08-09 12:03                         ` Abhradeep Chakraborty
  0 siblings, 1 reply; 162+ messages in thread
From: Johannes Schindelin @ 2022-08-09  9:03 UTC (permalink / raw)
  To: Abhradeep Chakraborty
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Taylor Blau,
	Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren

Hi Abhradeep,

On Mon, 8 Aug 2022, Abhradeep Chakraborty wrote:

> On Mon, Aug 8, 2022 at 6:36 PM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > On Tue, 2 Aug 2022, Abhradeep Chakraborty wrote:
> >
> > > On Tue, Aug 2, 2022 at 9:05 PM Johannes Schindelin
> > > <Johannes.Schindelin@gmx.de> wrote:
> > >
> > > > Since you are very familiar with the details of bitmaps now, I would
> > > > like to encourage you to work on some kind of validator/inspector,
> > > > e.g. something along the lines of a `test-tool midx-bitmap dump` (and
> > > > later `... verify`) that would help future you (and future me)
> > > > investigate similar breakages. Ideally, that tool will not only parse
> > > > the `.bitmap` file but immediately print out everything in a
> > > > human-readable form.
> >
> > Have you made progress on this? I am interested mostly because I am trying
> > very hard to maintain passing CI runs of Git for Windows' `shears/seen`
> > branch (which essentially tries to rebase all of Git for Windows' patches
> > on top of `seen`), and this failure is consistently causing said CI runs
> > to fail for a while already.
>
> Hey Dscho, I am trying hard to solve the issue but unfortunately I
> haven't found the key yet.

The tool I proposed could potentially help, in particular with
distributing the burden of the investigation on more shoulders than just
yours.

> I investigated the bitmap code-base and used debug lines but didn't
> find a way to fix it.

Have you investigated whether the `.bitmap` file was produced for the
latest set of pack files? It should be relatively quick to investigate
that, and if it turns out not to be the case, the fix should be quick,
too.

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-08-09  9:03                       ` Johannes Schindelin
@ 2022-08-09 12:03                         ` Abhradeep Chakraborty
  2022-08-09 12:07                           ` Abhradeep Chakraborty
  2022-08-10  9:09                           ` Johannes Schindelin
  0 siblings, 2 replies; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-08-09 12:03 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Taylor Blau,
	Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren

On Tue, Aug 9, 2022 at 2:33 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Abhradeep,
>
> On Mon, 8 Aug 2022, Abhradeep Chakraborty wrote:
>
> > On Mon, Aug 8, 2022 at 6:36 PM Johannes Schindelin
> > <Johannes.Schindelin@gmx.de> wrote:
> > >
> > > On Tue, 2 Aug 2022, Abhradeep Chakraborty wrote:
> > >
> > > > On Tue, Aug 2, 2022 at 9:05 PM Johannes Schindelin
> > > > <Johannes.Schindelin@gmx.de> wrote:
> > > >
> > > > > Since you are very familiar with the details of bitmaps now, I would
> > > > > like to encourage you to work on some kind of validator/inspector,
> > > > > e.g. something along the lines of a `test-tool midx-bitmap dump` (and
> > > > > later `... verify`) that would help future you (and future me)
> > > > > investigate similar breakages. Ideally, that tool will not only parse
> > > > > the `.bitmap` file but immediately print out everything in a
> > > > > human-readable form.
> > >
> > > Have you made progress on this? I am interested mostly because I am trying
> > > very hard to maintain passing CI runs of Git for Windows' `shears/seen`
> > > branch (which essentially tries to rebase all of Git for Windows' patches
> > > on top of `seen`), and this failure is consistently causing said CI runs
> > > to fail for a while already.
> >
> > Hey Dscho, I am trying hard to solve the issue but unfortunately I
> > haven't found the key yet.
>
> The tool I proposed could potentially help, in particular with
> distributing the burden of the investigation on more shoulders than just
> yours.

Yeah, it should. I thought that I would do that after fixing the bug.
Now I think I was wrong.

> > I investigated the bitmap code-base and used debug lines but didn't
> > find a way to fix it.
>
> Have you investigated whether the `.bitmap` file was produced for the
> latest set of pack files? It should be relatively quick to investigate
> that, and if it turns out not to be the case, the fix should be quick,
> too.

Frankly speaking, I suspect that the generated multi-pack-index file is
faulty. The first reason is the `.bitmap` filename. As you said before
(and as I noticed here), the `.bitmap` filenames in the failing case and
in the passing case are different. As far as I know, the hash value in
the filename depends on the content of its respective midx file. So, if
the midx contents were the same in both cases, the `.bitmap` filename
should not differ.

I compared the two multi-pack-index files (i.e. passing case and
failing case) using `cmp ./trash\
directory.t5326-multi-pack-bitmaps/.git/objects/pack/multi-pack-index
../tmp/trash\ directory.t5326-multi-pack-bitmaps/.git/objects/pack/multi-pack-index`
and found that they differ:

    differ: char 3124, line 10

I also checked whether the `packing_data->in_pack_by_idx` contained
all the packs. For this I wrote a debug error message in
`prepare_in_pack_by_idx()[1]` function and found that `packing_data`
is using the latest packs.

I noticed in the 'setup partial bitmaps' test case that if we comment
out the line `git repack &&`, it runs successfully.

    test_expect_success 'setup partial bitmaps' '
        test_commit packed &&
        # git repack &&
        test_commit loose &&
        git multi-pack-index write --bitmap 2>err &&
        ...
    '

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-08-09 12:03                         ` Abhradeep Chakraborty
@ 2022-08-09 12:07                           ` Abhradeep Chakraborty
  2022-08-10  9:09                           ` Johannes Schindelin
  1 sibling, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-08-09 12:07 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Taylor Blau,
	Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren

On Tue, Aug 9, 2022 at 5:33 PM Abhradeep Chakraborty
<chakrabortyabhradeep79@gmail.com> wrote:
> I also checked whether the `packing_data->in_pack_by_idx` contained
> all the packs. For this I wrote a debug error message in
> `prepare_in_pack_by_idx()[1]` function and found that `packing_data`
> is using the latest packs.

Looks like I forgot to specify the link -
https://github.com/git/git/blob/c50926e1f48891e2671e1830dbcd2912a4563450/pack-objects.c#L87

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-08-09 12:03                         ` Abhradeep Chakraborty
  2022-08-09 12:07                           ` Abhradeep Chakraborty
@ 2022-08-10  9:09                           ` Johannes Schindelin
  2022-08-10  9:20                             ` Johannes Schindelin
  2022-08-16 18:47                             ` Taylor Blau
  1 sibling, 2 replies; 162+ messages in thread
From: Johannes Schindelin @ 2022-08-10  9:09 UTC (permalink / raw)
  To: Abhradeep Chakraborty
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Taylor Blau,
	Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren

Hi Abhradeep,

On Tue, 9 Aug 2022, Abhradeep Chakraborty wrote:

>  I noticed in the 'setup partial bitmaps' test case that if we comment
> out the line `git repack &&` , it runs successfully.
>
>     test_expect_success 'setup partial bitmaps' '
>         test_commit packed &&
>         # git repack &&
>         test_commit loose &&
>         git multi-pack-index write --bitmap 2>err &&
>         ...
>     '

That's interesting. Are the `.bitmap` and `.midx` files updated as part of
that `repack`?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-08-10  9:09                           ` Johannes Schindelin
@ 2022-08-10  9:20                             ` Johannes Schindelin
  2022-08-10 10:04                               ` Abhradeep Chakraborty
  2022-08-13 11:05                               ` Abhradeep Chakraborty
  2022-08-16 18:47                             ` Taylor Blau
  1 sibling, 2 replies; 162+ messages in thread
From: Johannes Schindelin @ 2022-08-10  9:20 UTC (permalink / raw)
  To: Abhradeep Chakraborty
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Taylor Blau,
	Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren

Hi Abhradeep,

On Wed, 10 Aug 2022, Johannes Schindelin wrote:

> On Tue, 9 Aug 2022, Abhradeep Chakraborty wrote:
>
> >  I noticed in the 'setup partial bitmaps' test case that if we comment
> > out the line `git repack &&` , it runs successfully.
> >
> >     test_expect_success 'setup partial bitmaps' '
> >         test_commit packed &&
> >         # git repack &&
> >         test_commit loose &&
> >         git multi-pack-index write --bitmap 2>err &&
> >         ...
> >     '
>
> That's interesting. Are the `.bitmap` and `.midx` files updated as part of
> that `repack`?

I instrumented this, and saw that the `multi-pack-index` and
`multi-pack-index*.bitmap` files were unchanged by the `git repack`
invocation.

Re-generating the MIDX bitmap forcefully after the repack seems to fix
things over here:

-- snip --
diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index a95537e759b..564124bda27 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -438,7 +438,10 @@ midx_bitmap_partial_tests () {

 	test_expect_success 'setup partial bitmaps' '
 		test_commit packed &&
+ls -l .git/objects/pack/ &&
 		git repack &&
+git multi-pack-index write --bitmap &&
+ls -l .git/objects/pack/ &&
 		test_commit loose &&
 		git multi-pack-index write --bitmap 2>err &&
 		test_path_is_file $midx &&
-- snap --

This suggests to me that the `multi-pack-index write --bitmap 2>err` call
in this hunk might reuse a stale MIDX bitmap, and that _that_  might be
the root cause of this breakage.

What do you think?

Ciao,
Dscho

^ permalink raw reply related	[flat|nested] 162+ messages in thread

* Re: [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-08-10  9:20                             ` Johannes Schindelin
@ 2022-08-10 10:04                               ` Abhradeep Chakraborty
  2022-08-10 17:51                                 ` Derrick Stolee
  2022-08-13 11:05                               ` Abhradeep Chakraborty
  1 sibling, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-08-10 10:04 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Taylor Blau,
	Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren

On Wed, Aug 10, 2022 at 2:50 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Abhradeep,
> I instrumented this, and saw that the `multi-pack-index` and
> `multi-pack-index*.bitmap` files were unchanged by the `git repack`
> invocation.

Yeah, those two files remain unchanged here.

> Re-generating the MIDX bitmap forcefully after the repack seems to fix
> things over here:
>
> -- snip --
> diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
> index a95537e759b..564124bda27 100644
> --- a/t/lib-bitmap.sh
> +++ b/t/lib-bitmap.sh
> @@ -438,7 +438,10 @@ midx_bitmap_partial_tests () {
>
>         test_expect_success 'setup partial bitmaps' '
>                 test_commit packed &&
> +ls -l .git/objects/pack/ &&
>                 git repack &&
> +git multi-pack-index write --bitmap &&
> +ls -l .git/objects/pack/ &&
>                 test_commit loose &&
>                 git multi-pack-index write --bitmap 2>err &&
>                 test_path_is_file $midx &&
> -- snap --
>
> This suggests to me that the `multi-pack-index write --bitmap 2>err` call
> in this hunk might reuse a stale MIDX bitmap, and that _that_  might be
> the root cause of this breakage.

Yeah, the `multi-pack-index write --bitmap 2>err` call is causing the
problem, more specifically the `multi-pack-index write` part. In my
previous comment (if you received it), I shared a screenshot showing
that the multi-pack-index files differ between the two cases. The
point at which they start to differ belongs to the `RIDX` chunk.

So I added some debug lines to the `midx_pack_order()` function[1] and
found that the objects are sorted differently in the two cases (i.e.
passing case and failing case). In the passing case, the RIDX chunk
contents look like this:

pack_order = [ 1, 36, 11, 6, 18, 3, 19, 12, 5, 31, 27, 23, 29, 8, 38,
22, 9, 15, 14, 24, 37, 28, 7, 39, 10, 34, 26, 4, 30, 33, 2, 35, 17,
32, 0, 21, 16, 25, 13, 40, 20,]

And in the failing case, this is -

pack_order = [ 12, 18, 3, 19, 1, 36, 11, 6, 5, 31, 27, 23, 29, 8, 38,
22, 9, 15, 14, 24, 37, 28, 7, 39, 10, 34, 26, 4, 30, 33, 2, 35, 17,
32, 0, 21, 16, 25, 13, 40, 20,]

I went further and realized that this is due to the line[2] -

    if (!e->preferred)
        data[i].pack |= (1U << 31);

I.e. 4-5 `pack_midx_entry` objects have different `preferred` values
in the two cases. For example,
"46193a971f5045cb3ca6022957541f9ccddfbfe78591d8506e2d952f8113059b"
(with pack order 12) is `preferred` in the failing case (that's why it
is in the first position), while the same object is not `preferred` in
the passing case.
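
A minimal sketch of why that bit matters, assuming a simplified sort
key: `struct sort_key`, `cmp_key()` and `order_objects()` are
hypothetical names for illustration, not the real midx.c code. Setting
bit 31 on the pack id of every non-preferred entry makes those entries
compare greater than any preferred entry, so preferred objects come
first after sorting:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the per-object sort key. */
struct sort_key {
	uint32_t pack; /* pack id; bit 31 is set for non-preferred entries */
	uint32_t pos;  /* original position of the object, used as tie-breaker */
};

static int cmp_key(const void *va, const void *vb)
{
	const struct sort_key *a = va, *b = vb;

	if (a->pack != b->pack)
		return a->pack < b->pack ? -1 : 1;
	if (a->pos != b->pos)
		return a->pos < b->pos ? -1 : 1;
	return 0;
}

/* Build the keys and sort them so that preferred entries come first. */
void order_objects(struct sort_key *keys, const uint32_t *pack_ids,
		   const int *preferred, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		keys[i].pack = pack_ids[i];
		keys[i].pos = (uint32_t)i;
		if (!preferred[i])
			keys[i].pack |= (1U << 31); /* push behind preferred ones */
	}
	qsort(keys, n, sizeof(*keys), cmp_key);
}
```

With this key, flipping the `preferred` flag of a handful of objects is
enough to move them to the front or the back of the order, which
matches the shape of the difference between the two `pack_order`
arrays above.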

It may be because of reusing a stale midx bitmap (as you said), but I
am not sure. To double-check, I compared all the other packfiles, idx
files and the pack `.bitmap` file (which you can see using the ls
command) of the failing and passing cases and found that they are
the same.

Thanks :)

[1] https://github.com/git/git/blob/c50926e1f48891e2671e1830dbcd2912a4563450/midx.c#L861
[2] https://github.com/git/git/blob/c50926e1f48891e2671e1830dbcd2912a4563450/midx.c#L872

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-08-10 10:04                               ` Abhradeep Chakraborty
@ 2022-08-10 17:51                                 ` Derrick Stolee
  2022-08-12 18:51                                   ` Abhradeep Chakraborty
  0 siblings, 1 reply; 162+ messages in thread
From: Derrick Stolee @ 2022-08-10 17:51 UTC (permalink / raw)
  To: Abhradeep Chakraborty, Johannes Schindelin
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Taylor Blau,
	Kaartic Sivaram, Philip Oakley, Martin Ågren

On 8/10/2022 6:04 AM, Abhradeep Chakraborty wrote:
> On Wed, Aug 10, 2022 at 2:50 PM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
>>
>> Hi Abhradeep,
>> I instrumented this, and saw that the `multi-pack-index` and
>> `multi-pack-index*.bitmap` files were unchanged by the `git repack`
>> invocation.
> 
> Yeah, those two files remain unchanged here.
> 
>> Re-generating the MIDX bitmap forcefully after the repack seems to fix
>> things over here:
>>
>> -- snip --
>> diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
>> index a95537e759b..564124bda27 100644
>> --- a/t/lib-bitmap.sh
>> +++ b/t/lib-bitmap.sh
>> @@ -438,7 +438,10 @@ midx_bitmap_partial_tests () {
>>
>>         test_expect_success 'setup partial bitmaps' '
>>                 test_commit packed &&
>> +ls -l .git/objects/pack/ &&
>>                 git repack &&
>> +git multi-pack-index write --bitmap &&
>> +ls -l .git/objects/pack/ &&
>>                 test_commit loose &&
>>                 git multi-pack-index write --bitmap 2>err &&
>>                 test_path_is_file $midx &&
>> -- snap --
>>
>> This suggests to me that the `multi-pack-index write --bitmap 2>err` call
>> in this hunk might reuse a stale MIDX bitmap, and that _that_  might be
>> the root cause of this breakage.
> 
> Yeah, the `multi-pack-index write --bitmap 2>err` is creating the
> problem. More specifically the `multi-pack-index write` part. As you
> can see in my previous  comment (if you get the comment), I shared a
> screenshot there which pointed out that the multi-pack-index files in
> both cases are different. The portion from which it started to differ
> belongs to the `RIDX` chunk.
> 
> So, I used some debug lines in `midx_pack_order()` function[1] and
> found that the objects are sorted differently in those cases (i.e.
> passing case and failing case). For passing case, the RIDX chunk
> contents are like below -
> 
> pack_order = [ 1, 36, 11, 6, 18, 3, 19, 12, 5, 31, 27, 23, 29, 8, 38,
> 22, 9, 15, 14, 24, 37, 28, 7, 39, 10, 34, 26, 4, 30, 33, 2, 35, 17,
> 32, 0, 21, 16, 25, 13, 40, 20,]
> 
> And in the failing case, this is -
> 
> pack_order = [ 12, 18, 3, 19, 1, 36, 11, 6, 5, 31, 27, 23, 29, 8, 38,
> 22, 9, 15, 14, 24, 37, 28, 7, 39, 10, 34, 26, 4, 30, 33, 2, 35, 17,
> 32, 0, 21, 16, 25, 13, 40, 20,]
> 
> I went further and realized that this is due to the line[2] -
> 
>     if (!e->preferred)
>         data[i].pack |= (1U << 31);
> 
> I.e. 4- 5 `pack_midx_entry` objects have different `preferred` values
> in those cases. For example,
> "46193a971f5045cb3ca6022957541f9ccddfbfe78591d8506e2d952f8113059b"
> (with pack order 12) is `preferred` in failing case (that's why it is
> in the first position) and the same is `not preferred` in the passing
> case.
> 
> It may be because of reusing a stale midx bitmap (as you said). But I
> am not sure. Just to ensure myself, I compared all the other
> packfiles, idx files and a pack `.bitmap` file (which you can see
> using ls command) of failing and passing cases and found that they are
> the same.

You are right that this choice of a 'preferred' pack is part of the
root cause for this flake. This choice is not deterministic if the
mtime of some pack-files are within the same second.
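
A toy illustration of that non-determinism, under the assumption (not
spelled out in this thread) that the preferred pack defaults to the one
with the oldest mtime. `struct toy_pack` and `pick_oldest()` are
hypothetical names, not git code. When two packs share the same
second-granularity mtime, the winner depends only on the order in which
the packs happen to be visited:

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for a pack file and its modification time. */
struct toy_pack {
	const char *name;
	time_t mtime; /* second granularity */
};

/*
 * Return the pack with the smallest mtime. On a tie, the first pack
 * encountered wins, so the result depends on iteration order.
 */
const struct toy_pack *pick_oldest(const struct toy_pack *packs, size_t n)
{
	const struct toy_pack *oldest = NULL;
	size_t i;

	for (i = 0; i < n; i++)
		if (!oldest || packs[i].mtime < oldest->mtime)
			oldest = &packs[i];
	return oldest;
}
```

Calling `pick_oldest()` on the same three packs listed in two different
orders returns a different pack whenever the two oldest ones share an
mtime, which is the kind of tie that extra `test_tick` calls or
distinct timestamps would avoid.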

I can make the flake go away with this change:

diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index a95537e759b0..30347285f10f 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -438,7 +438,9 @@ midx_bitmap_partial_tests () {
 
 	test_expect_success 'setup partial bitmaps' '
 		test_commit packed &&
+		test_tick &&
 		git repack &&
+		test_tick &&
 		test_commit loose &&
 		git multi-pack-index write --bitmap 2>err &&
 		test_path_is_file $midx &&


However, that doesn't help us actually find out what the problem is
in our case.

I've tried exploring other considerations, resulting in this diff:

diff --git a/midx.c b/midx.c
index 9c26d04bfded..3b9094d55ae5 100644
--- a/midx.c
+++ b/midx.c
@@ -921,8 +921,9 @@ static void prepare_midx_packing_data(struct packing_data *pdata,
 		struct pack_midx_entry *from = &ctx->entries[ctx->pack_order[i]];
 		struct object_entry *to = packlist_alloc(pdata, &from->oid);
 
+		/* Why does removing the permutation here not change the outcome? */
 		oe_set_in_pack(pdata, to,
-			       ctx->info[ctx->pack_perm[from->pack_int_id]].p);
+			       ctx->info[from->pack_int_id].p);
 	}
 }
 
This method is setting up some important information, supposedly, and
in the failing case I see that the ctx->pack_perm performs a 5-cycle
( 0->1, 1->2, 2->3, 3->4, 4->0 ) but this removal does not affect _any
existing test cases_!

Turns out that the packfile sent here goes through this very trivial
path in oe_set_in_pack() every time we are writing a multi-pack-index:

static inline void oe_set_in_pack(struct packing_data *pack,
				  struct object_entry *e,
				  struct packed_git *p)
{
	if (pack->in_pack_by_idx) {
		if (p->index) {
			e->in_pack_idx = p->index;
			return;
		}
		/*
		 * We're accessing packs by index, but this pack doesn't have
		 * an index (e.g., because it was added since we created the
		 * in_pack_by_idx array). Bail to oe_map_new_pack(), which
		 * will convert us to using the full in_pack array, and then
		 * fall through to our in_pack handling.
		 */
		oe_map_new_pack(pack);
	}
	pack->in_pack[e - pack->objects] = p;
}

By debugging, I discovered we are hitting the case that calls
oe_map_new_pack(pack). The documentation for that method provides
the following (**emphasis mine**):

/*
 * A new pack appears after prepare_in_pack_by_idx() has been
 * run. **This is likely a race.**
 *
 * We could map this new pack to in_pack_by_idx[] array, but then we
 * have to deal with full array anyway. And since it's hard to test
 * this fall back code, just stay simple and fall back to using
 * in_pack[] array.
 */
void oe_map_new_pack(struct packing_data *pack)

The issue being that prepare_packing_data() uses get_all_packs() to
get the list of pack-files, but that list is stale for some reason.
Adding a reprepare_packed_git() in advance of that call also removes
the flake (with always passing):

diff --git a/midx.c b/midx.c
index 9c26d04bfded..48db91d2728a 100644
--- a/midx.c
+++ b/midx.c
@@ -915,6 +915,7 @@ static void prepare_midx_packing_data(struct packing_data *pdata,
 	uint32_t i;
 
 	memset(pdata, 0, sizeof(struct packing_data));
+	reprepare_packed_git(the_repository);
 	prepare_packing_data(the_repository, pdata);
 
 	for (i = 0; i < ctx->entries_nr; i++) {

But this still appears like it is just a band-aid over a trickier
underlying issue.

Hopefully my rambling helps push you in a helpful direction to find
a more complete fix.

Thanks,
-Stolee

^ permalink raw reply related	[flat|nested] 162+ messages in thread

* Re: [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-08-10 17:51                                 ` Derrick Stolee
@ 2022-08-12 18:51                                   ` Abhradeep Chakraborty
  2022-08-12 19:22                                     ` Derrick Stolee
  0 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-08-12 18:51 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Johannes Schindelin, Abhradeep Chakraborty via GitGitGadget, git,
	Taylor Blau, Kaartic Sivaram, Philip Oakley, Martin Ågren

Hello,

I think I have found the problem. Derrick was right that the `mtime`
part is not the culprit. I tried to understand the whole midx workflow
and some questions came to mind. I don't know whether those are
features or bugs (because I do not have much experience with the
multi-pack-index code).

I am writing a brief description for the context of the issue and the
questions I have.

Let us start from the `write_midx_internal()` function. As
`packs_to_include` is null in our case, we can use the old midx to
write a new midx file. The line `ctx.m =
lookup_multi_pack_index(the_repository, object_dir)`[1] does this. It
also loads packs that do not belong to any multi-pack-index, and it
sets `the_repository->objects->packed_git_initialized` to 1. If we
look at our test case ('setup partial bitmaps'), the last `.pack` file
(generated by `git repack &&`) does not belong to any midx. So that
pack will be loaded in this step.

[1] https://github.com/git/git/blob/5502f77b6944eda8e26813d8f542cffe7d110aea/midx.c#L1169

Next let us move to the `if (ctx.m)`[2] block. As we will be writing a
bitmap, `if (flags & MIDX_WRITE_REV_INDEX)` is true. Thus all packs
related to the old midx are loaded and `ctx.info[ctx.nr].p` stores the
pointers of these packs.

[2] https://github.com/git/git/blob/5502f77b6944eda8e26813d8f542cffe7d110aea/midx.c#L1182

After that we come to the `for_each_file_in_pack_dir(object_dir,
add_pack_to_midx, &ctx);` line[3]. The `add_pack_to_midx`[4] function
adds packs (that are not in the old midx) to `ctx.info`. Now I have a
question here: why are we using the `add_packed_git()`[5] function
when we have already loaded those packs in the
`lookup_multi_pack_index` step (i.e. the 1st step)? The packs returned
by `add_packed_git()` are not added to `r->objects->packed_git`. This
question is related to our current issue.

I.e. instead of this -

    ctx->info[ctx->nr].p = add_packed_git(full_path,
                                          full_path_len, 0);

Why not this (or similar) -

    for (cp = the_repository->objects->packed_git; cp; cp = cp->next)
        if (!cmp_idx_or_pack_name(cp->pack_name, full_path))
            ctx->info[ctx->nr].p = cp;

[3] https://github.com/git/git/blob/5502f77b6944eda8e26813d8f542cffe7d110aea/midx.c#L1221
[4] https://github.com/git/git/blob/5502f77b6944eda8e26813d8f542cffe7d110aea/midx.c#L462
[5] https://github.com/git/git/blob/5502f77b6944eda8e26813d8f542cffe7d110aea/midx.c#L492

The `write_midx_bitmap()` function is where the bitmap-related code
starts. Let us jump directly into the `prepare_packed_git()` function
(called by `get_all_packs()`[6]). As I said previously,
`r->objects->packed_git_initialized` is already set, so this
function becomes a no-op. That means it does not load the newly
written midx (by calling the `prepare_multi_pack_index_one`
function) and uses the old midx to write the bitmap (though we still
have the new packs, and maybe they can be used with the old midx to
generate the bitmap?). Here comes my second question: is this the
desired behavior, or should we use the new midx to write the bitmaps?
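
As an illustration of the no-op behavior described above, here is a
minimal, self-contained sketch. It is not code from git.git: `struct
store`, `prepare()` and `reprepare()` are hypothetical stand-ins for the
object store, `prepare_packed_git()` and `reprepare_packed_git()`, and
the sketch assumes that the "reprepare" variant simply clears the flag
and prepares again.

```c
#include <assert.h>

/* Hypothetical stand-in for the object store state discussed above. */
struct store {
	int initialized;   /* models packed_git_initialized */
	int loaded_midx;   /* which midx "version" the store has loaded */
};

/* Models prepare_packed_git(): does nothing once the flag is set. */
static void prepare(struct store *s, int midx_on_disk)
{
	assert(s);
	if (s->initialized)
		return; /* already prepared: a newer midx is NOT picked up */
	s->loaded_midx = midx_on_disk;
	s->initialized = 1;
}

/* Models reprepare_packed_git() (assumed): clear the flag, then prepare. */
static void reprepare(struct store *s, int midx_on_disk)
{
	assert(s);
	s->initialized = 0;
	prepare(s, midx_on_disk);
}
```

With this model, a second `prepare()` call keeps whatever was loaded
first, and only `reprepare()` observes the newer state.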

One important point to note is that `get_all_packs()` returns
`r->objects->packed_git` which now stores pointers of all the packs
and only these packfiles have their `->index` set.

[6] https://github.com/git/git/blob/5502f77b6944eda8e26813d8f542cffe7d110aea/packfile.c#L1043

Now let us move to the last function, `oe_set_in_pack()` (called by
`prepare_midx_packing_data()`). Note that we are passing
`ctx->info[ctx->pack_perm[from->pack_int_id]].p` along with other
parameters. As I said in an earlier paragraph (the one containing my
first question), `ctx->info` has some packs (i.e. newer packs that are
not related to the old midx) that are not installed in
`r->objects->packed_git`. In other words, we have two instances of
the same pack file: one in the `r->objects->packed_git` list and
another in `ctx->info[id].p`. As the `prepare_in_pack_by_idx` function
only sets `->index` for `r->objects->packed_git` packs, these packs
(i.e. `ctx.info[id].p`) do not have their `p->index` set and thus end
up calling the `oe_map_new_pack` function.
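
The "two instances of the same pack" situation can be modeled with a
small sketch. Everything here is hypothetical and simplified: `struct
pack` stands in for `struct packed_git`, `assign_indexes()` for the
effect of `prepare_in_pack_by_idx`, and the convention that index 0
means "not assigned" is an assumption of this sketch, not something
stated in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct packed_git. */
struct pack {
	const char *name;
	unsigned index;      /* 0 means "no index assigned" in this sketch */
	struct pack *next;
};

/* Models prepare_in_pack_by_idx(): numbers only packs that are on the list. */
static void assign_indexes(struct pack *list)
{
	unsigned nr = 1;
	struct pack *p;

	for (p = list; p; p = p->next)
		p->index = nr++;
}

/*
 * Models the alternative from the first question: instead of creating a
 * second instance, reuse the instance that is already on the list.
 */
static struct pack *find_listed_pack(struct pack *list, const char *name)
{
	struct pack *p;

	assert(name);
	for (p = list; p; p = p->next)
		if (!strcmp(p->name, name))
			return p;
	return NULL;
}
```

A duplicate instance created outside the list keeps index 0 after
`assign_indexes()` runs, while `find_listed_pack()` returns the
instance that did get an index.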

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-08-12 18:51                                   ` Abhradeep Chakraborty
@ 2022-08-12 19:22                                     ` Derrick Stolee
  2022-08-13 10:59                                       ` Abhradeep Chakraborty
  0 siblings, 1 reply; 162+ messages in thread
From: Derrick Stolee @ 2022-08-12 19:22 UTC (permalink / raw)
  To: Abhradeep Chakraborty
  Cc: Johannes Schindelin, Abhradeep Chakraborty via GitGitGadget, git,
	Taylor Blau, Kaartic Sivaram, Philip Oakley, Martin Ågren

On 8/12/2022 2:51 PM, Abhradeep Chakraborty wrote:

> I think I have found the problem. Derrick was right that `mtime` part
> is not the culprit. I tried to understand the whole midx workflow and
> some questions were raised in my mind. I don't know whether those are
> features or bugs (because I do not have much experience in
> multi-pack-index code).
> 
> I am writing a brief description for the context of the issue and the
> questions I have.

Thanks for the detailed writeup.

> Let us start from the `write_midx_internal()` function. As
> `packs_to_include` is null in our case, We can use the old midx to
> write a new midx file. The line `ctx.m =
> lookup_multi_pack_index(the_repository, object_dir)`[1]  does this. It
> also loads packs that do not belong to any multi-pack-indexes. It also
> sets `the_repository->objects->packed_git_initialized` to 1.  If we
> look at our test case (`setup partial bitmap`) the last `.pack` file
> (generated by `git repack &&` ) does not belong to any midx. So, that
> pack will be loaded in this step.
> 
> [1] https://github.com/git/git/blob/5502f77b6944eda8e26813d8f542cffe7d110aea/midx.c#L1169
> 
> Next let us move to the `if (ctx.m)`[2] block. As we will be writing a
> bitmap, `if (flags & MIDX_WRITE_REV_INDEX)` is true. Thus all packs
> related to the old midx are loaded and `ctx.info[ctx.nr].p` stores the
> pointers of these packs.
> 
> [2] https://github.com/git/git/blob/5502f77b6944eda8e26813d8f542cffe7d110aea/midx.c#L1182
> 
> After that we come to the `for_each_file_in_pack_dir(object_dir,
> add_pack_to_midx, &ctx);` line[3] . The `add_pack_to_midx`[4] function
> adds packs (that are not in the old midx) to `ctx.info`. Now I have a
> question here - Why are we using  the `add_packed_git()`[5] function
> provided we already loaded those packs in the
> `lookup_multi_pack_index` step (i.e. 1st step)? These packs are not
> added in `r->objects->packed_git`. This question is related to our
> current issue.
> 
> I.e. instead of this -
> 
>     ctx->info[ctx->nr].p = add_packed_git(full_path,
>                                           full_path_len, 0);
> 
> Why not this (or similar) -
> 
>     for (cp = the_repository->objects->packed_git; cp; cp = cp->next)
>         if (!cmp_idx_or_pack_name(cp->pack_name, full_path))
>             ctx->info[ctx->nr].p = cp;
> 
> [3] https://github.com/git/git/blob/5502f77b6944eda8e26813d8f542cffe7d110aea/midx.c#L1221
> [4] https://github.com/git/git/blob/5502f77b6944eda8e26813d8f542cffe7d110aea/midx.c#L462
> [5] https://github.com/git/git/blob/5502f77b6944eda8e26813d8f542cffe7d110aea/midx.c#L492
> 
>  `write_midx_bitmap()` function is where bitmap related code starts.
> let us directly jump into the `prepare_packed_git()` function (called
> by `get_all_packs()`[6]). As I said previously,
> `r->objects->packed_git_initialized` is already enabled so this
> function becomes a no-op function. Which means it does not load the
> newly written midx (by calling `prepare_multi_pack_index_one`
> function) and uses old midx to write the bitmap (though we still have
> new packs and they can be used with the old midx to generate the
> bitmap, maybe?) . Here comes my second question - Is this the desired
> case? or should we use the new midx to write the bitmaps?

The confusing part of all this is that the bitmaps are being written
while the "new" midx is written only to "multi-pack-index.lock" and
has not been renamed to "multi-pack-index". If we renamed first, then
the old .bitmap file would not match the new midx and all Git commands
would act as if there was no .bitmap file.
 
> One important point to note is that `get_all_packs()` returns
> `r->objects->packed_git` which now stores pointers of all the packs
> and only these packfiles have their `->index` set.
> 
> [6] https://github.com/git/git/blob/5502f77b6944eda8e26813d8f542cffe7d110aea/packfile.c#L1043
> 
> Now let us move to the last function - `oe_set_in_pack()` (called by
> `prepare_midx_packing_data()`). Note that, we are passing
> `ctx->info[ctx->pack_perm[from->pack_int_id]].p` along with other
> parameters. As I have said in an earlier para (containing my first
> question), `ctx->info` has some packs (i.e. newer packs that are not
> related to the old midx) that are not installed in
> `r->objects->packed_git` . In other words, we have two instances of
> the same pack file - one in `r->objects->packed_git` list and another
> in `ctx->info[id].p`. As `prepare_in_pack_by_idx` function only sets
> `->index` for `r->objects->packed_git` packs, these packs (i.e.
> `ctx.info[id].p`) do not have their p->index set and thus end up
> calling the `oe_map_new_pack` function.

So really, the problem is that we are handling the r->objects->packed_git
list instead of an array of packs that are under the control of the new
midx. This assumption is baked deep in the pack-objects flow, so it
would be hard to separate this idea.

Perhaps doing the reprepare_packed_git() to regenerate the list would be
sufficient as a band-aid for now, but we would want to later do the big
dig of focusing the pack_data struct to a specific list of pack-files
(by default the set from get_all_packs(), but for midx bitmaps we can
supply a specific set of packs).

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-08-12 19:22                                     ` Derrick Stolee
@ 2022-08-13 10:59                                       ` Abhradeep Chakraborty
  2022-08-16 21:57                                         ` Taylor Blau
  0 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-08-13 10:59 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Johannes Schindelin, Abhradeep Chakraborty via GitGitGadget, git,
	Taylor Blau, Kaartic Sivaram, Philip Oakley, Martin Ågren

On Sat, Aug 13, 2022 at 12:52 AM Derrick Stolee
<derrickstolee@github.com> wrote:
>
> So really, the problem is that we are handling the r->objects->packed_git
> list instead of an array of packs that are under the control of the new
> midx. This assumption is baked deep in the pack-objects flow, so it
> would be hard to separate this idea.
>
> Perhaps doing the reprepare_packed_git() to regenerate the list would be
> sufficient as a band-aid for now, but we would want to later do the big
> dig of focusing the pack_data struct to a specific list of pack-files
> (by default the set from get_all_packs(), but for midx bitmaps we can
> supply a specific set of packs).

`reprepare_packed_git()` cannot prevent it, because this function
updates the `r->objects->packed_git` list (i.e. it reloads packs that
are not in the old midx) and, as I said before, we set `->index`
only for `r->objects->packed_git` packs, not for `ctx.info[id].p`. So
it will call the `oe_map_new_pack()` function either way. I have
tested it.

One thing that really worries me: what if the failure is not related
to calling `oe_map_new_pack()`? I did all my work assuming that this
function is the culprit, but I don't know if it is.

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-08-10  9:20                             ` Johannes Schindelin
  2022-08-10 10:04                               ` Abhradeep Chakraborty
@ 2022-08-13 11:05                               ` Abhradeep Chakraborty
  1 sibling, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-08-13 11:05 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Taylor Blau,
	Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren

On Wed, Aug 10, 2022 at 2:50 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Abhradeep,
>
> On Wed, 10 Aug 2022, Johannes Schindelin wrote:
>
> > On Tue, 9 Aug 2022, Abhradeep Chakraborty wrote:
> >
> > >  I noticed in the 'setup partial bitmaps' test case that if we comment
> > > out the line `git repack &&` , it runs successfully.
> > >
> > >     test_expect_success 'setup partial bitmaps' '
> > >         test_commit packed &&
> > >         # git repack &&
> > >         test_commit loose &&
> > >         git multi-pack-index write --bitmap 2>err &&
> > >         ...
> > >     '
> >
> > That's interesting. Are the `.bitmap` and `.midx` files updated as part of
> > that `repack`?
>
> I instrumented this, and saw that the `multi-pack-index` and
> `multi-pack-index*.bitmap` files were unchanged by the `git repack`
> invocation.
>
> Re-generating the MIDX bitmap forcefully after the repack seems to fix
> things over here:
>
> -- snip --
> diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
> index a95537e759b..564124bda27 100644
> --- a/t/lib-bitmap.sh
> +++ b/t/lib-bitmap.sh
> @@ -438,7 +438,10 @@ midx_bitmap_partial_tests () {
>
>         test_expect_success 'setup partial bitmaps' '
>                 test_commit packed &&
> +ls -l .git/objects/pack/ &&
>                 git repack &&
> +git multi-pack-index write --bitmap &&
> +ls -l .git/objects/pack/ &&
>                 test_commit loose &&
>                 git multi-pack-index write --bitmap 2>err &&
>                 test_path_is_file $midx &&
> -- snap --
>
> This suggests to me that the `multi-pack-index write --bitmap 2>err` call
> in this hunk might reuse a stale MIDX bitmap, and that _that_  might be
> the root cause of this breakage.
>
> What do you think?

Hi Dscho,
I tried your change to see if that is the case, but it doesn't affect
the result (at least on my laptop).

^ permalink raw reply	[flat|nested] 162+ messages in thread

* [PATCH v6 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format
  2022-07-20 18:38       ` [PATCH v5 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
                           ` (5 preceding siblings ...)
  2022-07-20 18:38         ` [PATCH v5 6/6] bitmap-lookup-table: add performance tests for lookup table Abhradeep Chakraborty via GitGitGadget
@ 2022-08-14 16:55         ` Abhradeep Chakraborty via GitGitGadget
  2022-08-14 16:55           ` [PATCH v6 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
                             ` (7 more replies)
  6 siblings, 8 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-08-14 16:55 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Junio C Hamano, Derrick Stolee,
	Philip Oakley, Martin Ågren,
	Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Johannes Schindelin, Abhradeep Chakraborty

When parsing the .bitmap file, git loads all the bitmaps one by one even if
some of the bitmaps are not necessary. We can remove this overhead by
loading only the necessary bitmaps. A look up table extension can solve this
issue.

Changes since v5:

As the failure in the test case is not due to this code, I think it makes no
sense to delay the patch further.

 * The performance test changes were not accurate because the second
   test_bitmap_cases call was using the repo built for the previous call.
   This version fixes that.
 * Taylor suggested some minor changes. Those are addressed in this version.

Changes since v4:

 * There was a CI failing test for linux-sha256 in the previous version.
   Fixed now.

Changes since v3:

 * The common code from both lookup_table_get_triplet() and
   bsearch_triplet_by_pos() is moved to the
   lookup_table_get_triplet_by_pointer() function.
 * Parameter names of the triplet_cmp function are changed (as suggested
   by Martin).
 * The xor_items array now works as a reusable static buffer.
 * I moved the code that fills the commit_positions array (in
   pack-bitmap-write.c) to the bitmap_writer_finish function, because
   otherwise we would have to iterate twice for commit positions: once
   in write_selected_commits_v1 and again in the write_lookup_table
   function. Hope this is acceptable :)
 * Changes in performance tests (as suggested by Taylor).

Changes since v2:

 * Issues with log messages are fixed.
 * pack.writeBitmapLookupTable is now disabled by default.
 * Documentation is improved.
 * xor_row is used instead of xor_pos in triplets.
 * In pack-bitmap-write.c, off_t * is used for the offsets array (instead
   of uint64_t *).
 * struct bitmap_lookup_table_triplet is introduced and functions like
   triplet_get_offset() and triplet_get_xor_pos() are removed.
 * table_size is now subtracted from index_end irrespective of the value
   of GIT_TEST_READ_COMMIT_TABLE.
 * The xor stack filling loop will stop iterating if a xor bitmap is
   already stored/parsed.
 * The stack will now store bitmap_lookup_table_xor_item items instead of
   plain xor_row values.
 * Bitmap-related test files are reformatted to allow repeating the tests
   with the bitmap extension enabled.
 * Comments are added.

Changes since v1:

This is the second version, which addresses all (I think) of the review
comments. Please let me know if any are not addressed :)

 * The table size is decreased and the format has also changed. It now
   contains nr_entries triplets of size 4+8+4 bytes. Each triplet contains
   the following: (1) a 4-byte commit position (in the pack-index or
   midx), (2) an 8-byte offset, and (3) a 4-byte xor triplet position
   (i.e. the position of the triplet whose bitmap the current triplet's
   bitmap has to be xor'ed with).
 * Performance tests are split into two commits. The first contains the
   actual performance tests and the second enables pack.writeReverseIndex
   (as suggested by Taylor).
 * st_*() functions are used.
 * Commit order is changed according to Derrick's suggestion.
 * An iterative approach is used instead of a recursive approach to parse
   xor bitmaps (as suggested by Derrick).
 * Some minor bug fixes to the previous version.
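
For readers who want a concrete picture of the 4+8+4-byte triplet and
of the iterative xor-chain walk mentioned above, here is a minimal
sketch. It is not the code from this series: the names `struct triplet`,
`read_triplet()` and `collect_xor_chain()` are hypothetical, the fields
are assumed to be stored in network byte order like the rest of the
bitmap format, and 0xffffffff is assumed to mark "no xor row" (the value
the reading code in this series compares against).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRIPLET_WIDTH 16          /* 4 + 8 + 4 bytes per row */
#define NO_XOR_ROW 0xffffffffu    /* assumed "no xor bitmap" marker */

struct triplet {
	uint32_t commit_pos;
	uint64_t offset;
	uint32_t xor_row;
};

static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/* Decode row `row` of the table; returns -1 if the row is out of range. */
static int read_triplet(const unsigned char *table, uint32_t nr_entries,
			uint32_t row, struct triplet *out)
{
	const unsigned char *p;

	assert(table && out);
	if (row >= nr_entries)
		return -1;
	p = table + (size_t)row * TRIPLET_WIDTH;
	out->commit_pos = read_be32(p);
	out->offset = read_be64(p + 4);
	out->xor_row = read_be32(p + 12);
	return 0;
}

/*
 * Iteratively follow xor_row links starting at `row`, recording each
 * visited row in `chain`. Returns the chain length, or -1 if the chain
 * is longer than the table (a cycle) or does not fit in `chain`.
 */
static int collect_xor_chain(const unsigned char *table, uint32_t nr_entries,
			     uint32_t row, uint32_t *chain, size_t chain_cap)
{
	size_t n = 0;
	struct triplet t;

	while (row != NO_XOR_ROW) {
		if (n >= nr_entries || n >= chain_cap)
			return -1;
		if (read_triplet(table, nr_entries, row, &t) < 0)
			return -1;
		chain[n++] = row;
		row = t.xor_row;
	}
	return (int)n;
}
```

A reader would then load the bitmaps for the collected rows in reverse
order, so that each bitmap's xor base is available before it is needed.
Bounding the loop by the entry count is one simple way to reject a
corrupt table whose xor links form a cycle.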

Initial version:

The proposed table has:

 * a list of nr_entries object ids. These objects are commits that have
   bitmaps. Ids are stored in lexicographic order (for better searching).
 * a list of <offset, xor-offset> pairs (4-byte integers, network-byte
   order). The i'th pair denotes the offset and xor-offset (respectively)
   of the bitmap of the i'th commit in the previous list. These two pieces
   of information are necessary because only in this way can bitmaps be
   found without parsing all the bitmaps.
 * a 4-byte integer for table-specific flags (none exist currently).

Whenever git wants to parse the bitmap for a specific commit, it will first
refer to the table and look up the offset and xor-offset for that
commit. Git will then try to parse the bitmap located at the offset
position. The xor-offset can be used to find the xor-bitmap for the
bitmap (if any).

Abhradeep Chakraborty (6):
  Documentation/technical: describe bitmap lookup table extension
  bitmap: move `get commit positions` code to `bitmap_writer_finish`
  pack-bitmap-write.c: write lookup table extension
  pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  pack-bitmap: prepare to read lookup table extension
  bitmap-lookup-table: add performance tests for lookup table

 Documentation/config/pack.txt             |   7 +
 Documentation/technical/bitmap-format.txt |  39 ++
 builtin/multi-pack-index.c                |   7 +
 builtin/pack-objects.c                    |   8 +
 midx.c                                    |   3 +
 midx.h                                    |   1 +
 pack-bitmap-write.c                       | 114 +++-
 pack-bitmap.c                             | 290 +++++++-
 pack-bitmap.h                             |  14 +-
 t/perf/lib-bitmap.sh                      |  31 +
 t/perf/p5310-pack-bitmaps.sh              |  78 +--
 t/perf/p5311-pack-bitmaps-fetch.sh        |  74 +-
 t/perf/p5312-pack-bitmaps-revs.sh         |  35 +
 t/perf/p5326-multi-pack-bitmaps.sh        | 103 +--
 t/t5310-pack-bitmaps.sh                   | 786 ++++++++++++----------
 t/t5311-pack-bitmaps-shallow.sh           |  53 +-
 t/t5326-multi-pack-bitmaps.sh             | 421 +++++++-----
 t/t5327-multi-pack-bitmaps-rev.sh         |  24 +-
 18 files changed, 1378 insertions(+), 710 deletions(-)
 create mode 100755 t/perf/p5312-pack-bitmaps-revs.sh


base-commit: afa70145a25e81faa685dc0b465e52b45d2444bd
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1266%2FAbhra303%2Fbitmap-commit-table-v6
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1266/Abhra303/bitmap-commit-table-v6
Pull-Request: https://github.com/gitgitgadget/git/pull/1266

Range-diff vs v5:

 1:  33aca8f3dc8 ! 1:  67b71be8c85 Documentation/technical: describe bitmap lookup table extension
     @@ Commit message
      
       ## Documentation/technical/bitmap-format.txt ##
      @@ Documentation/technical/bitmap-format.txt: MIDXs, both the bit-cache and rev-cache extensions are required.
     - 			pack/MIDX. The format and meaning of the name-hash is
     - 			described below.
     + 	    pack/MIDX. The format and meaning of the name-hash is
     + 	    described below.
       
     -+			** {empty}
     -+			BITMAP_OPT_LOOKUP_TABLE (0x10): :::
     -+			If present, the end of the bitmap file contains a table
     -+			containing a list of `N` <commit_pos, offset, xor_row>
     -+			triplets. The format and meaning of the table is described
     -+			below.
     ++		** {empty}
     ++		BITMAP_OPT_LOOKUP_TABLE (0x10): :::
     ++		If present, the end of the bitmap file contains a table
     ++		containing a list of `N` <commit_pos, offset, xor_row>
     ++		triplets. The format and meaning of the table is described
     ++		below.
      ++
      +NOTE: Unlike the xor_offset used to compress an individual bitmap,
      +`xor_row` stores an *absolute* index into the lookup table, not a location
      +relative to the current entry.
      +
     - 		4-byte entry count (network byte order)
     + 	4-byte entry count (network byte order): ::
     + 	    The total count of entries (bitmapped commits) in this bitmap index.
       
     - 			The total count of entries (bitmapped commits) in this bitmap index.
      @@ Documentation/technical/bitmap-format.txt: Note that this hashing scheme is tied to the BITMAP_OPT_HASH_CACHE flag.
       If implementations want to choose a different hashing scheme, they are
       free to do so, but MUST allocate a new header flag (because comparing
 -:  ----------- > 2:  92ca58fbeeb bitmap: move `get commit positions` code to `bitmap_writer_finish`
 2:  a913e6a2cb3 ! 3:  090becaabe0 pack-bitmap-write.c: write lookup table extension
     @@ Commit message
      
       ## pack-bitmap-write.c ##
      @@ pack-bitmap-write.c: static const struct object_id *oid_access(size_t pos, const void *table)
     - 
       static void write_selected_commits_v1(struct hashfile *f,
       				      struct pack_idx_entry **index,
     --				      uint32_t index_nr)
     -+				      uint32_t index_nr,
     -+				      off_t *offsets,
     -+				      uint32_t *commit_positions)
     + 				      uint32_t index_nr,
     +-				      uint32_t *commit_positions)
     ++				      uint32_t *commit_positions,
     ++				      off_t *offsets)
       {
       	int i;
       
       	for (i = 0; i < writer.selected_nr; ++i) {
       		struct bitmapped_commit *stored = &writer.selected[i];
       
     --		int commit_pos =
     --			oid_pos(&stored->commit->object.oid, index, index_nr, oid_access);
      +		if (offsets)
      +			offsets[i] = hashfile_total(f);
     - 
     --		if (commit_pos < 0)
     --			BUG("trying to write commit not in index");
     --
     --		hashwrite_be32(f, commit_pos);
     -+		hashwrite_be32(f, commit_positions[i]);
     ++
     + 		hashwrite_be32(f, commit_positions[i]);
       		hashwrite_u8(f, stored->xor_offset);
       		hashwrite_u8(f, stored->flags);
     - 
      @@ pack-bitmap-write.c: static void write_selected_commits_v1(struct hashfile *f,
       	}
       }
     @@ pack-bitmap-write.c: static void write_selected_commits_v1(struct hashfile *f,
      +static void write_lookup_table(struct hashfile *f,
      +			       struct pack_idx_entry **index,
      +			       uint32_t index_nr,
     -+			       off_t *offsets,
     -+			       uint32_t *commit_positions)
     ++			       uint32_t *commit_positions,
     ++			       off_t *offsets)
      +{
      +	uint32_t i;
      +	uint32_t *table, *table_inv;
     @@ pack-bitmap-write.c: static void write_selected_commits_v1(struct hashfile *f,
       			     struct pack_idx_entry **index,
       			     uint32_t index_nr)
      @@ pack-bitmap-write.c: void bitmap_writer_finish(struct pack_idx_entry **index,
     - {
     - 	static uint16_t default_version = 1;
     - 	static uint16_t flags = BITMAP_OPT_FULL_DAG;
     -+	off_t *offsets = NULL;
       	struct strbuf tmp_file = STRBUF_INIT;
       	struct hashfile *f;
     -+	uint32_t *commit_positions = NULL;
     + 	uint32_t *commit_positions = NULL;
     ++	off_t *offsets = NULL;
     + 	uint32_t i;
       
       	struct bitmap_disk_header header;
     - 
      @@ pack-bitmap-write.c: void bitmap_writer_finish(struct pack_idx_entry **index,
     - 	dump_bitmap(f, writer.trees);
       	dump_bitmap(f, writer.blobs);
       	dump_bitmap(f, writer.tags);
     --	write_selected_commits_v1(f, index, index_nr);
     -+
     -+	ALLOC_ARRAY(commit_positions, writer.selected_nr);
     -+	for (uint32_t i = 0; i < writer.selected_nr; ++i) {
     -+		struct bitmapped_commit *stored = &writer.selected[i];
     -+		int commit_pos = oid_pos(&stored->commit->object.oid, index, index_nr, oid_access);
     -+
     -+		if (commit_pos < 0)
     -+			BUG(_("trying to write commit not in index"));
     -+
     -+		commit_positions[i] = commit_pos;
     -+	}
     -+
     + 
      +	if (options & BITMAP_OPT_LOOKUP_TABLE)
      +		CALLOC_ARRAY(offsets, index_nr);
      +
     -+	write_selected_commits_v1(f, index, index_nr, offsets, commit_positions);
     + 	ALLOC_ARRAY(commit_positions, writer.selected_nr);
     + 
     + 	for (i = 0; i < writer.selected_nr; i++) {
     +@@ pack-bitmap-write.c: void bitmap_writer_finish(struct pack_idx_entry **index,
     + 		commit_positions[i] = commit_pos;
     + 	}
     + 
     +-	write_selected_commits_v1(f, index, index_nr, commit_positions);
     ++	write_selected_commits_v1(f, index, index_nr, commit_positions, offsets);
      +
      +	if (options & BITMAP_OPT_LOOKUP_TABLE)
     -+		write_lookup_table(f, index, index_nr, offsets, commit_positions);
     ++		write_lookup_table(f, index, index_nr, commit_positions, offsets);
       
       	if (options & BITMAP_OPT_HASH_CACHE)
       		write_hash_cache(f, index, index_nr);
      @@ pack-bitmap-write.c: void bitmap_writer_finish(struct pack_idx_entry **index,
     - 		die_errno("unable to rename temporary bitmap file to '%s'", filename);
       
       	strbuf_release(&tmp_file);
     + 	free(commit_positions);
      +	free(offsets);
     -+	free(commit_positions);
       }
      
       ## pack-bitmap.h ##
 3:  59b465e5a78 ! 4:  b2b7c5c1703 pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
     @@ builtin/pack-objects.c: static int git_pack_config(const char *k, const char *v,
       		return 0;
      
       ## midx.c ##
     -@@ midx.c: static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
     +@@ midx.c: static int write_midx_bitmap(const char *midx_name,
       	if (flags & MIDX_WRITE_BITMAP_HASH_CACHE)
       		options |= BITMAP_OPT_HASH_CACHE;
       
      +	if (flags & MIDX_WRITE_BITMAP_LOOKUP_TABLE)
      +		options |= BITMAP_OPT_LOOKUP_TABLE;
      +
     - 	prepare_midx_packing_data(&pdata, ctx);
     - 
     - 	commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx);
     + 	/*
     + 	 * Build the MIDX-order index based on pdata.objects (which is already
     + 	 * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of
      
       ## midx.h ##
      @@ midx.h: struct multi_pack_index {
     @@ t/t5327-multi-pack-bitmaps-rev.sh: GIT_TEST_MIDX_READ_RIDX=0
      -midx_bitmap_core rev
      -midx_bitmap_partial_tests rev
      +test_midx_bitmap_rev () {
     -+     writeLookupTable=false
     -+
     -+ 	for i in "$@"
     -+ 	do
     -+ 		case $i in
     -+ 		"pack.writeBitmapLookupTable") writeLookupTable=true;;
     -+ 		esac
     -+ 	done
     -+
     -+     test_expect_success 'setup bitmap config' '
     -+         rm -rf * .git &&
     -+         git init &&
     -+         git config pack.writeBitmapLookupTable '"$writeLookupTable"'
     -+     '
     -+
     -+     midx_bitmap_core rev
     -+     midx_bitmap_partial_tests rev
     -+ }
     -+
     -+ test_midx_bitmap_rev
     -+ test_midx_bitmap_rev "pack.writeBitmapLookupTable"
     ++	writeLookupTable=false
     ++
     ++	for i in "$@"
     ++	do
     ++		case $i in
     ++		"pack.writeBitmapLookupTable") writeLookupTable=true;;
     ++		esac
     ++	done
     ++
     ++	test_expect_success 'setup bitmap config' '
     ++		rm -rf * .git &&
     ++		git init &&
     ++		git config pack.writeBitmapLookupTable '"$writeLookupTable"'
     ++	'
     ++
     ++	midx_bitmap_core rev
     ++	midx_bitmap_partial_tests rev
     ++}
     ++
     ++test_midx_bitmap_rev
     ++test_midx_bitmap_rev "pack.writeBitmapLookupTable"
       
       test_done
 4:  6918f0860ad ! 5:  79842ca590c pack-bitmap: prepare to read lookup table extension
     @@ pack-bitmap.c: static struct stored_bitmap *store_bitmap(struct bitmap_index *in
      +	 * shouldn't be duplicated commits in the index.
      +	 */
       	if (ret == 0) {
     --		error("Duplicate entry in bitmap index: %s", oid_to_hex(oid));
     -+		error(_("duplicate entry in bitmap index: %s"), oid_to_hex(oid));
     + 		error(_("duplicate entry in bitmap index: '%s'"), oid_to_hex(oid));
       		return NULL;
     - 	}
     - 
      @@ pack-bitmap.c: static int load_bitmap(struct bitmap_index *bitmap_git)
       		!(bitmap_git->tags = read_bitmap_1(bitmap_git)))
       		goto failed;
     @@ pack-bitmap.c: struct include_data {
      + * Note that this function assumes that there is enough memory
      + * left for filling the `triplet` struct from `p`.
      + */
     -+static int lookup_table_get_triplet_by_pointer(struct bitmap_lookup_table_triplet *triplet,
     -+					       const unsigned char *p)
     ++static int bitmap_lookup_table_get_triplet_by_pointer(struct bitmap_lookup_table_triplet *triplet,
     ++						      const unsigned char *p)
      +{
      +	if (!triplet)
      +		return -1;
     @@ pack-bitmap.c: struct include_data {
      + * This function gets the raw triplet from `row`'th row in the
      + * lookup table and fills that data to the `triplet`.
      + */
     -+static int lookup_table_get_triplet(struct bitmap_index *bitmap_git,
     -+				    uint32_t pos,
     -+				    struct bitmap_lookup_table_triplet *triplet)
     ++static int bitmap_lookup_table_get_triplet(struct bitmap_index *bitmap_git,
     ++					   uint32_t pos,
     ++					   struct bitmap_lookup_table_triplet *triplet)
      +{
      +	unsigned char *p = NULL;
      +	if (pos >= bitmap_git->entry_count)
     @@ pack-bitmap.c: struct include_data {
      +
      +	p = bitmap_git->table_lookup + st_mult(pos, BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH);
      +
     -+	return lookup_table_get_triplet_by_pointer(triplet, p);
     ++	return bitmap_lookup_table_get_triplet_by_pointer(triplet, p);
      +}
      +
      +/*
     @@ pack-bitmap.c: struct include_data {
      +	return 0;
      +}
      +
     -+static uint32_t bsearch_pos(struct bitmap_index *bitmap_git,
     ++static uint32_t bitmap_bsearch_pos(struct bitmap_index *bitmap_git,
      +			    struct object_id *oid,
      +			    uint32_t *result)
      +{
     @@ pack-bitmap.c: struct include_data {
      + * object from the raw triplet. Returns 1 on success and 0 on
      + * failure.
      + */
     -+static int bsearch_triplet_by_pos(uint32_t commit_pos,
     ++static int bitmap_bsearch_triplet_by_pos(uint32_t commit_pos,
      +				  struct bitmap_index *bitmap_git,
      +				  struct bitmap_lookup_table_triplet *triplet)
      +{
     @@ pack-bitmap.c: struct include_data {
      +	if (!p)
      +		return -1;
      +
     -+	return lookup_table_get_triplet_by_pointer(triplet, p);
     ++	return bitmap_lookup_table_get_triplet_by_pointer(triplet, p);
      +}
      +
      +static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_git,
     -+					  struct commit *commit)
     ++						    struct commit *commit)
      +{
      +	uint32_t commit_pos, xor_row;
      +	uint64_t offset;
     -+	int flags, found;
     ++	int flags;
      +	struct bitmap_lookup_table_triplet triplet;
      +	struct object_id *oid = &commit->object.oid;
      +	struct ewah_bitmap *bitmap;
     @@ pack-bitmap.c: struct include_data {
      +	static struct bitmap_lookup_table_xor_item *xor_items = NULL;
      +	static size_t xor_items_nr = 0, xor_items_alloc = 0;
      +	static int is_corrupt = 0;
     ++	int xor_flags;
     ++	khiter_t hash_pos;
     ++	struct bitmap_lookup_table_xor_item *xor_item;
      +
      +	if (is_corrupt)
      +		return NULL;
      +
     -+	found = bsearch_pos(bitmap_git, oid, &commit_pos);
     -+
     -+	if (!found)
     ++	if (!bitmap_bsearch_pos(bitmap_git, oid, &commit_pos))
      +		return NULL;
      +
     -+	if (bsearch_triplet_by_pos(commit_pos, bitmap_git, &triplet) < 0)
     ++	if (bitmap_bsearch_triplet_by_pos(commit_pos, bitmap_git, &triplet) < 0)
      +		return NULL;
      +
      +	xor_items_nr = 0;
      +	offset = triplet.offset;
      +	xor_row = triplet.xor_row;
      +
     -+	if (xor_row != 0xffffffff) {
     -+		int xor_flags;
     -+		khiter_t hash_pos;
     -+		struct bitmap_lookup_table_xor_item *xor_item;
     -+
     -+		while (xor_row != 0xffffffff) {
     -+			ALLOC_GROW(xor_items, xor_items_nr + 1, xor_items_alloc);
     -+
     -+			if (xor_items_nr + 1 >= bitmap_git->entry_count) {
     -+				error(_("corrupt bitmap lookup table: xor chain exceed entry count"));
     -+				goto corrupt;
     -+			}
     -+
     -+			if (lookup_table_get_triplet(bitmap_git, xor_row, &triplet) < 0)
     -+				goto corrupt;
     -+
     -+			xor_item = &xor_items[xor_items_nr];
     -+			xor_item->offset = triplet.offset;
     -+
     -+			if (nth_bitmap_object_oid(bitmap_git, &xor_item->oid, triplet.commit_pos) < 0) {
     -+				error(_("corrupt bitmap lookup table: commit index %u out of range"),
     -+					triplet.commit_pos);
     -+				goto corrupt;
     -+			}
     -+
     -+			hash_pos = kh_get_oid_map(bitmap_git->bitmaps, xor_item->oid);
     -+
     -+			/*
     -+			 * If desired bitmap is already stored, we don't need
     -+			 * to iterate further. Because we know that bitmaps
     -+			 * that are needed to be parsed to parse this bitmap
     -+			 * has already been stored. So, assign this stored bitmap
     -+			 * to the xor_bitmap.
     -+			 */
     -+			if (hash_pos < kh_end(bitmap_git->bitmaps) &&
     -+			    (xor_bitmap = kh_value(bitmap_git->bitmaps, hash_pos)))
     -+				break;
     -+			xor_items_nr++;
     -+			xor_row = triplet.xor_row;
     ++	while (xor_row != 0xffffffff) {
     ++		ALLOC_GROW(xor_items, xor_items_nr + 1, xor_items_alloc);
     ++
     ++		if (xor_items_nr + 1 >= bitmap_git->entry_count) {
     ++			error(_("corrupt bitmap lookup table: xor chain exceed entry count"));
     ++			goto corrupt;
      +		}
      +
     -+		while (xor_items_nr) {
     -+			xor_item = &xor_items[xor_items_nr - 1];
     -+			bitmap_git->map_pos = xor_item->offset;
     -+			if (bitmap_git->map_size - bitmap_git->map_pos < bitmap_header_size) {
     -+				error(_("corrupt ewah bitmap: truncated header for bitmap of commit \"%s\""),
     -+					oid_to_hex(&xor_item->oid));
     -+				goto corrupt;
     -+			}
     ++		if (bitmap_lookup_table_get_triplet(bitmap_git, xor_row, &triplet) < 0)
     ++			goto corrupt;
     ++
     ++		xor_item = &xor_items[xor_items_nr];
     ++		xor_item->offset = triplet.offset;
      +
     -+			bitmap_git->map_pos = bitmap_git->map_pos + sizeof(uint32_t) + sizeof(uint8_t);
     -+			xor_flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
     -+			bitmap = read_bitmap_1(bitmap_git);
     ++		if (nth_bitmap_object_oid(bitmap_git, &xor_item->oid, triplet.commit_pos) < 0) {
     ++			error(_("corrupt bitmap lookup table: commit index %u out of range"),
     ++				triplet.commit_pos);
     ++			goto corrupt;
     ++		}
      +
     -+			if (!bitmap)
     -+				goto corrupt;
     ++		hash_pos = kh_get_oid_map(bitmap_git->bitmaps, xor_item->oid);
     ++
     ++		/*
     ++		 * If desired bitmap is already stored, we don't need
     ++		 * to iterate further. Because we know that bitmaps
     ++		 * that are needed to be parsed to parse this bitmap
     ++		 * has already been stored. So, assign this stored bitmap
     ++		 * to the xor_bitmap.
     ++		 */
     ++		if (hash_pos < kh_end(bitmap_git->bitmaps) &&
     ++			(xor_bitmap = kh_value(bitmap_git->bitmaps, hash_pos)))
     ++			break;
     ++		xor_items_nr++;
     ++		xor_row = triplet.xor_row;
     ++	}
      +
     -+			xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_item->oid, xor_bitmap, xor_flags);
     -+			xor_items_nr--;
     ++	while (xor_items_nr) {
     ++		xor_item = &xor_items[xor_items_nr - 1];
     ++		bitmap_git->map_pos = xor_item->offset;
     ++		if (bitmap_git->map_size - bitmap_git->map_pos < bitmap_header_size) {
     ++			error(_("corrupt ewah bitmap: truncated header for bitmap of commit \"%s\""),
     ++				oid_to_hex(&xor_item->oid));
     ++			goto corrupt;
      +		}
     ++
     ++		bitmap_git->map_pos += sizeof(uint32_t) + sizeof(uint8_t);
     ++		xor_flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
     ++		bitmap = read_bitmap_1(bitmap_git);
     ++
     ++		if (!bitmap)
     ++			goto corrupt;
     ++
     ++		xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_item->oid, xor_bitmap, xor_flags);
     ++		xor_items_nr--;
      +	}
      +
      +	bitmap_git->map_pos = offset;
     @@ pack-bitmap.c: struct include_data {
      +		goto corrupt;
      +	}
      +
     -+	bitmap_git->map_pos = bitmap_git->map_pos + sizeof(uint32_t) + sizeof(uint8_t);
     ++	/*
     ++	 * Don't bother reading the commit's index position or its xor
     ++	 * offset:
     ++	 *
     ++	 *   - The commit's index position is irrelevant to us, since
     ++	 *     load_bitmap_entries_v1 only uses it to learn the object
     ++	 *     id which is used to compute the hashmap's key. We already
     ++	 *     have an object id, so no need to look it up again.
     ++	 *
     ++	 *   - The xor_offset is unusable for us, since it specifies how
     ++	 *     many entries previous to ours we should look at. This
     ++	 *     makes sense when reading the bitmaps sequentially (as in
     ++	 *     load_bitmap_entries_v1()), since we can keep track of
     ++	 *     each bitmap as we read them.
     ++	 *
     ++	 *     But it can't work for us, since the bitmaps don't have a
     ++	 *     fixed size. So we learn the position of the xor'd bitmap
     ++	 *     from the commit table (and resolve it to a bitmap in the
     ++	 *     above if-statement).
     ++	 *
     ++	 * Instead, we can skip ahead and immediately read the flags and
     ++	 * ewah bitmap.
     ++	 */
     ++	bitmap_git->map_pos += sizeof(uint32_t) + sizeof(uint8_t);
      +	flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
      +	bitmap = read_bitmap_1(bitmap_git);
      +
     @@ pack-bitmap.c: struct include_data {
       
      @@ pack-bitmap.c: void test_bitmap_walk(struct rev_info *revs)
       	if (revs->pending.nr != 1)
     - 		die("you must specify exactly one commit to test");
     + 		die(_("you must specify exactly one commit to test"));
       
     --	fprintf(stderr, "Bitmap v%d test (%d entries loaded)\n",
     +-	fprintf_ln(stderr, "Bitmap v%d test (%d entries loaded)",
      -		bitmap_git->version, bitmap_git->entry_count);
     -+	fprintf(stderr, "Bitmap v%d test (%d entries%s)",
     ++	fprintf_ln(stderr, "Bitmap v%d test (%d entries%s)",
      +		bitmap_git->version,
      +		bitmap_git->entry_count,
      +		bitmap_git->table_lookup ? "" : " loaded");
     @@ pack-bitmap.c: void test_bitmap_walk(struct rev_info *revs)
       	struct object_id oid;
       	MAYBE_UNUSED void *value;
      +	struct bitmap_index *bitmap_git = prepare_bitmap_git(r);
     -+
     + 
     + 	if (!bitmap_git)
     + 		die(_("failed to load bitmap indexes"));
     + 
      +	/*
      +	 * As this function is only used to print bitmap selected
      +	 * commits, we don't have to read the commit table.
      +	 */
     - 
     - 	if (!bitmap_git)
     - 		die("failed to load bitmap indexes");
     - 
      +	if (bitmap_git->table_lookup) {
      +		if (load_bitmap_entries_v1(bitmap_git) < 0)
      +			die(_("failed to load bitmap indexes"));
      +	}
      +
       	kh_foreach(bitmap_git->bitmaps, oid, value, {
     - 		printf("%s\n", oid_to_hex(&oid));
     + 		printf_ln("%s", oid_to_hex(&oid));
       	});
      
       ## pack-bitmap.h ##
 5:  e7ef420f321 < -:  ----------- p5310-pack-bitmaps.sh: enable `pack.writeReverseIndex`
 6:  6628001241d ! 6:  b460516b306 bitmap-lookup-table: add performance tests for lookup table
     @@ Metadata
       ## Commit message ##
          bitmap-lookup-table: add performance tests for lookup table
      
     -    Add performance tests to verify the performance of lookup table with
     -    `pack.writeReverseIndex` enabled. This is to check the performance
     -    when the above configuration is set.
     +    Add performance tests to verify the performance of lookup table.
     +    `p5310-pack-bitmaps.sh` contains tests with and without lookup table.
     +    `p5312-pack-bitmaps-revs.sh` contains the same tests with and without
     +    lookup table but with `pack.writeReverseIndex` enabled.
      
          Lookup table makes Git run faster in most of the cases. Below is the
          result of `t/perf/p5310-pack-bitmaps.sh`. `perf/p5326-multi-pack-bitmaps.sh`
          gives a similar result. The repository used in the test is the Linux kernel.
      
     -    Test                                                      this tree
     -    ---------------------------------------------------------------------------
     -    5310.4: repack to disk (lookup=false)                   296.55(256.53+14.52)
     -    5310.5: simulated clone                                 15.64(8.88+1.39)
     -    5310.6: simulated fetch                                 1.65(2.75+0.20)
     -    5310.7: pack to file (bitmap)                           48.71(30.20+7.58)
     -    5310.8: rev-list (commits)                              0.61(0.41+0.08)
     -    5310.9: rev-list (objects)                              4.38(4.26+0.09)
     -    5310.10: rev-list with tag negated via --not            0.07(0.02+0.04)
     +    Test                                                    this tree
     +    -----------------------------------------------------------------------
     +    5310.4: enable lookup table: false                    0.01(0.00+0.00)
     +    5310.5: repack to disk                                320.89(230.20+23.45)
     +    5310.6: simulated clone                               14.04(5.78+1.79)
     +    5310.7: simulated fetch                               1.95(3.05+0.20)
     +    5310.8: pack to file (bitmap)                         44.73(20.55+7.45)
     +    5310.9: rev-list (commits)                            0.78(0.46+0.10)
     +    5310.10: rev-list (objects)                           4.07(3.97+0.08)
     +    5310.11: rev-list with tag negated via --not          0.06(0.02+0.03)
                   --all (objects)
     -    5310.11: rev-list with negative tag (objects)           0.05(0.01+0.03)
     -    5310.12: rev-list count with blob:none                  0.08(0.03+0.04)
     -    5310.13: rev-list count with blob:limit=1k              7.29(6.92+0.30)
     -    5310.14: rev-list count with tree:0                     0.08(0.03+0.04)
     -    5310.15: simulated partial clone                        9.45(8.12+0.41)
     -    5310.17: clone (partial bitmap)                         21.00(15.04+2.39)
     -    5310.18: pack to file (partial bitmap)                  47.98(38.13+5.23)
     -    5310.19: rev-list with tree filter (partial bitmap)     0.70(0.07+0.20)
     -    5310.22: repack to disk (lookup=true)                   255.92(188.13+20.47)
     -    5310.23: simulated clone                                13.78(8.84+1.09)
     -    5310.24: simulated fetch                                0.52(0.63+0.14)
     -    5310.25: pack to file (bitmap)                          44.34(28.94+6.84)
     -    5310.26: rev-list (commits)                             0.48(0.31+0.06)
     -    5310.27: rev-list (objects)                             4.02(3.93+0.07)
     -    5310.28: rev-list with tag negated via --not            0.04(0.00+0.03)
     +    5310.12: rev-list with negative tag (objects)         0.21(0.15+0.05)
     +    5310.13: rev-list count with blob:none                0.24(0.17+0.06)
     +    5310.14: rev-list count with blob:limit=1k            7.07(5.92+0.48)
     +    5310.15: rev-list count with tree:0                   0.25(0.17+0.07)
     +    5310.16: simulated partial clone                      5.67(3.28+0.64)
     +    5310.18: clone (partial bitmap)                       16.05(8.34+1.86)
     +    5310.19: pack to file (partial bitmap)                59.76(27.22+7.43)
     +    5310.20: rev-list with tree filter (partial bitmap)   0.90(0.18+0.16)
     +    5310.24: enable lookup table: true                    0.01(0.00+0.00)
     +    5310.25: repack to disk                               319.73(229.30+23.01)
     +    5310.26: simulated clone                              13.69(5.72+1.78)
     +    5310.27: simulated fetch                              1.84(3.02+0.16)
     +    5310.28: pack to file (bitmap)                        45.63(20.67+7.50)
     +    5310.29: rev-list (commits)                           0.56(0.39+0.8)
     +    5310.30: rev-list (objects)                           3.77(3.74+0.08)
     +    5310.31: rev-list with tag negated via --not          0.05(0.02+0.03)
                   --all (objects)
     -    5310.29: rev-list with negative tag (objects)           0.04(0.00+0.03)
     -    5310.30: rev-list count with blob:none                  0.04(0.01+0.03)
     -    5310.31: rev-list count with blob:limit=1k              6.48(6.23+0.22)
     -    5310.32: rev-list count with tree:0                     0.04(0.01+0.03)
     -    5310.33: simulated partial clone                        8.30(7.21+0.36)
     -    5310.35: clone (partial bitmap)                         20.34(15.00+2.41)
     -    5310.36: pack to file (partial bitmap)                  46.45(38.05+5.20)
     -    5310.37: rev-list with tree filter (partial bitmap)     0.61(0.06+0.20)
     +    5310.32: rev-list with negative tag (objects)         0.21(0.15+0.05)
     +    5310.33: rev-list count with blob:none                0.23(0.17+0.05)
     +    5310.34: rev-list count with blob:limit=1k            6.65(5.72+0.40)
     +    5310.35: rev-list count with tree:0                   0.23(0.16+0.06)
     +    5310.36: simulated partial clone                      5.57(3.26+0.59)
     +    5310.38: clone (partial bitmap)                       15.89(8.39+1.84)
     +    5310.39: pack to file (partial bitmap)                58.32(27.55+7.47)
     +    5310.40: rev-list with tree filter (partial bitmap)   0.73(0.18+0.15)
      
          Tests 4-20 are run without the lookup table. The same tests are
          repeated in 24-40 (using the lookup table).
     @@ Commit message
          Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
          Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
      
     + ## t/perf/lib-bitmap.sh ##
     +@@ t/perf/lib-bitmap.sh: test_partial_bitmap () {
     + 			--filter=tree:0 >/dev/null
     + 	'
     + }
     ++
     ++test_pack_bitmap () {
     ++	test_perf "repack to disk" '
     ++		git repack -ad
     ++	'
     ++
     ++	test_full_bitmap
     ++
     ++	test_expect_success "create partial bitmap state" '
     ++		# pick a commit to represent the repo tip in the past
     ++		cutoff=$(git rev-list HEAD~100 -1) &&
     ++		orig_tip=$(git rev-parse HEAD) &&
     ++
     ++		# now kill off all of the refs and pretend we had
     ++		# just the one tip
     ++		rm -rf .git/logs .git/refs/* .git/packed-refs &&
     ++		git update-ref HEAD $cutoff &&
     ++
     ++		# and then repack, which will leave us with a nice
     ++		# big bitmap pack of the "old" history, and all of
     ++		# the new history will be loose, as if it had been pushed
     ++		# up incrementally and exploded via unpack-objects
     ++		git repack -Ad &&
     ++
     ++		# and now restore our original tip, as if the pushes
     ++		# had happened
     ++		git update-ref HEAD $orig_tip
     ++	'
     ++
     ++	test_partial_bitmap
     ++}
     +
       ## t/perf/p5310-pack-bitmaps.sh ##
     -@@ t/perf/p5310-pack-bitmaps.sh: test_expect_success 'setup bitmap config' '
     - 	git config pack.writeReverseIndex true
     - '
     +@@ t/perf/p5310-pack-bitmaps.sh: test_description='Tests pack performance using bitmaps'
     + . ./perf-lib.sh
     + . "${TEST_DIRECTORY}/perf/lib-bitmap.sh"
       
     +-test_perf_large_repo
     +-
     +-# note that we do everything through config,
     +-# since we want to be able to compare bitmap-aware
     +-# git versus non-bitmap git
     +-#
     +-# We intentionally use the deprecated pack.writebitmaps
     +-# config so that we can test against older versions of git.
     +-test_expect_success 'setup bitmap config' '
     +-	git config pack.writebitmaps true
     +-'
     +-
      -# we need to create the tag up front such that it is covered by the repack and
      -# thus by generated bitmaps.
      -test_expect_success 'create tags' '
      -	git tag --message="tag pointing to HEAD" perf-tag HEAD
      -'
     -+test_bitmap () {
     -+	local enabled="$1"
     - 
     +-
      -test_perf 'repack to disk' '
      -	git repack -ad
      -'
     -+	# we need to create the tag up front such that it is covered by the repack and
     -+	# thus by generated bitmaps.
     -+	test_expect_success 'create tags' '
     -+		git tag --message="tag pointing to HEAD" perf-tag HEAD
     -+	'
     - 
     +-
      -test_full_bitmap
     -+	test_expect_success "use lookup table: $enabled" '
     -+		git config pack.writeBitmapLookupTable '"$enabled"'
     -+	'
     - 
     +-
      -test_expect_success 'create partial bitmap state' '
      -	# pick a commit to represent the repo tip in the past
      -	cutoff=$(git rev-list HEAD~100 -1) &&
      -	orig_tip=$(git rev-parse HEAD) &&
     -+	test_perf "repack to disk (lookup=$enabled)" '
     -+		git repack -ad
     -+	'
     - 
     +-
      -	# now kill off all of the refs and pretend we had
      -	# just the one tip
      -	rm -rf .git/logs .git/refs/* .git/packed-refs &&
      -	git update-ref HEAD $cutoff &&
     -+	test_full_bitmap
     - 
     +-
      -	# and then repack, which will leave us with a nice
      -	# big bitmap pack of the "old" history, and all of
      -	# the new history will be loose, as if it had been pushed
      -	# up incrementally and exploded via unpack-objects
      -	git repack -Ad &&
     -+	test_expect_success "create partial bitmap state (lookup=$enabled)" '
     -+		# pick a commit to represent the repo tip in the past
     -+		cutoff=$(git rev-list HEAD~100 -1) &&
     -+		orig_tip=$(git rev-parse HEAD) &&
     - 
     +-
      -	# and now restore our original tip, as if the pushes
      -	# had happened
      -	git update-ref HEAD $orig_tip
      -'
     -+		# now kill off all of the refs and pretend we had
     -+		# just the one tip
     -+		rm -rf .git/logs .git/refs/* .git/packed-refs &&
     -+		git update-ref HEAD $cutoff &&
     +-
     +-test_partial_bitmap
     ++test_lookup_pack_bitmap () {
     ++	test_expect_success 'start the test from scratch' '
     ++		rm -rf * .git
     ++	'
      +
     -+		# and then repack, which will leave us with a nice
     -+		# big bitmap pack of the "old" history, and all of
     -+		# the new history will be loose, as if it had been pushed
     -+		# up incrementally and exploded via unpack-objects
     -+		git repack -Ad &&
     ++	test_perf_large_repo
      +
     -+		# and now restore our original tip, as if the pushes
     -+		# had happened
     -+		git update-ref HEAD $orig_tip
     ++	# note that we do everything through config,
     ++	# since we want to be able to compare bitmap-aware
     ++	# git versus non-bitmap git
     ++	#
     ++	# We intentionally use the deprecated pack.writebitmaps
     ++	# config so that we can test against older versions of git.
     ++	test_expect_success 'setup bitmap config' '
     ++		git config pack.writebitmaps true
      +	'
      +
     -+	test_partial_bitmap
     ++	# we need to create the tag up front such that it is covered by the repack and
     ++	# thus by generated bitmaps.
     ++	test_expect_success 'create tags' '
     ++		git tag --message="tag pointing to HEAD" perf-tag HEAD
     ++	'
     ++
     ++	test_perf "enable lookup table: $1" '
     ++		git config pack.writeBitmapLookupTable '"$1"'
     ++	'
     ++
     ++	test_pack_bitmap
      +}
     ++
     ++test_lookup_pack_bitmap false
     ++test_lookup_pack_bitmap true
       
     --test_partial_bitmap
     -+test_bitmap false
     -+test_bitmap true
     + test_done
     +
     + ## t/perf/p5311-pack-bitmaps-fetch.sh ##
     +@@
     + test_description='performance of fetches from bitmapped packs'
     + . ./perf-lib.sh
     + 
     +-test_perf_default_repo
     +-
     +-test_expect_success 'create bitmapped server repo' '
     +-	git config pack.writebitmaps true &&
     +-	git repack -ad
     +-'
     +-
     +-# simulate a fetch from a repository that last fetched N days ago, for
     +-# various values of N. We do so by following the first-parent chain,
     +-# and assume the first entry in the chain that is N days older than the current
     +-# HEAD is where the HEAD would have been then.
     +-for days in 1 2 4 8 16 32 64 128; do
     +-	title=$(printf '%10s' "($days days)")
     +-	test_expect_success "setup revs from $days days ago" '
     +-		now=$(git log -1 --format=%ct HEAD) &&
     +-		then=$(($now - ($days * 86400))) &&
     +-		tip=$(git rev-list -1 --first-parent --until=$then HEAD) &&
     +-		{
     +-			echo HEAD &&
     +-			echo ^$tip
     +-		} >revs
     ++test_fetch_bitmaps () {
     ++	test_expect_success 'setup test directory' '
     ++		rm -fr * .git
     + 	'
     + 
     +-	test_perf "server $title" '
     +-		git pack-objects --stdout --revs \
     +-				 --thin --delta-base-offset \
     +-				 <revs >tmp.pack
     +-	'
     ++	test_perf_default_repo
     + 
     +-	test_size "size   $title" '
     +-		wc -c <tmp.pack
     ++	test_expect_success 'create bitmapped server repo' '
     ++		git config pack.writebitmaps true &&
     ++		git config pack.writeBitmapLookupTable '"$1"' &&
     ++		git repack -ad
     + 	'
     + 
     +-	test_perf "client $title" '
     +-		git index-pack --stdin --fix-thin <tmp.pack
     +-	'
     +-done
     ++	# simulate a fetch from a repository that last fetched N days ago, for
     ++	# various values of N. We do so by following the first-parent chain,
     ++	# and assume the first entry in the chain that is N days older than the current
     ++	# HEAD is where the HEAD would have been then.
     ++	for days in 1 2 4 8 16 32 64 128; do
     ++		title=$(printf '%10s' "($days days)")
     ++		test_expect_success "setup revs from $days days ago" '
     ++			now=$(git log -1 --format=%ct HEAD) &&
     ++			then=$(($now - ($days * 86400))) &&
     ++			tip=$(git rev-list -1 --first-parent --until=$then HEAD) &&
     ++			{
     ++				echo HEAD &&
     ++				echo ^$tip
     ++			} >revs
     ++		'
     ++
     ++		test_perf "server $title (lookup=$1)" '
     ++			git pack-objects --stdout --revs \
     ++					--thin --delta-base-offset \
     ++					<revs >tmp.pack
     ++		'
     ++
     ++		test_size "size   $title" '
     ++			wc -c <tmp.pack
     ++		'
     ++
     ++		test_perf "client $title (lookup=$1)" '
     ++			git index-pack --stdin --fix-thin <tmp.pack
     ++		'
     ++	done
     ++}
     ++
     ++test_fetch_bitmaps true
     ++test_fetch_bitmaps false
       
       test_done
      
     + ## t/perf/p5312-pack-bitmaps-revs.sh (new) ##
     +@@
     ++#!/bin/sh
     ++
     ++test_description='Tests pack performance using bitmaps (rev index enabled)'
     ++. ./perf-lib.sh
     ++. "${TEST_DIRECTORY}/perf/lib-bitmap.sh"
     ++
     ++test_lookup_pack_bitmap () {
     ++	test_expect_success 'start the test from scratch' '
     ++		rm -rf * .git
     ++	'
     ++
     ++	test_perf_large_repo
     ++
     ++	test_expect_success 'setup bitmap config' '
     ++		git config pack.writebitmaps true &&
     ++		git config pack.writeReverseIndex true
     ++	'
     ++
     ++	# we need to create the tag up front such that it is covered by the repack and
     ++	# thus by generated bitmaps.
     ++	test_expect_success 'create tags' '
     ++		git tag --message="tag pointing to HEAD" perf-tag HEAD
     ++	'
     ++
     ++	test_perf "enable lookup table: $1" '
     ++		git config pack.writeBitmapLookupTable '"$1"'
     ++	'
     ++
     ++	test_pack_bitmap
     ++}
     ++
     ++test_lookup_pack_bitmap false
     ++test_lookup_pack_bitmap true
     ++
     ++test_done
     +
       ## t/perf/p5326-multi-pack-bitmaps.sh ##
      @@ t/perf/p5326-multi-pack-bitmaps.sh: test_description='Tests performance using midx bitmaps'
     + . ./perf-lib.sh
     + . "${TEST_DIRECTORY}/perf/lib-bitmap.sh"
       
     - test_perf_large_repo
     - 
     +-test_perf_large_repo
     +-
      -# we need to create the tag up front such that it is covered by the repack and
      -# thus by generated bitmaps.
      -test_expect_success 'create tags' '
     @@ t/perf/p5326-multi-pack-bitmaps.sh: test_description='Tests performance using mi
      +test_bitmap () {
      +	local enabled="$1"
      +
     ++	test_expect_success "remove existing repo (lookup=$enabled)" '
     ++		rm -fr * .git
     ++	'
     ++
     ++	test_perf_large_repo
     ++
      +	# we need to create the tag up front such that it is covered by the repack and
      +	# thus by generated bitmaps.
      +	test_expect_success 'create tags' '

-- 
gitgitgadget
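
For readers following the range-diff: the documented lookup table is
sorted by `commit_pos`, so a row can be located by binary search over
fixed-width rows. Below is a minimal standalone sketch of that idea. It
is not the code from the patch; `find_row`, `be32` and `ROW_WIDTH` are
hypothetical names, and the 16-byte row layout (4-byte commit_pos,
8-byte offset, 4-byte xor_row, network byte order) is taken from the
format description in patch 1/6.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: 4-byte commit_pos + 8-byte offset + 4-byte xor_row. */
#define ROW_WIDTH 16

/* Read a 4-byte big-endian (network byte order) integer. */
static uint32_t be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Binary-search a table of `nr` rows, sorted by ascending commit_pos.
 * Returns a pointer to the matching row, or NULL if `want` is absent.
 */
static const unsigned char *find_row(const unsigned char *table,
				     uint32_t nr, uint32_t want)
{
	uint32_t lo = 0, hi = nr;

	assert(table);
	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		const unsigned char *row = table + (size_t)mid * ROW_WIDTH;
		uint32_t pos = be32(row);

		if (pos == want)
			return row;
		if (pos < want)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}
```

A miss returns NULL, which a caller would treat as "this commit has no
bitmap" rather than as corruption.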


* [PATCH v6 1/6] Documentation/technical: describe bitmap lookup table extension
  2022-08-14 16:55         ` [PATCH v6 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
@ 2022-08-14 16:55           ` Abhradeep Chakraborty via GitGitGadget
  2022-08-14 16:55           ` [PATCH v6 2/6] bitmap: move `get commit positions` code to `bitmap_writer_finish` Abhradeep Chakraborty via GitGitGadget
                             ` (6 subsequent siblings)
  7 siblings, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-08-14 16:55 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Junio C Hamano, Derrick Stolee,
	Philip Oakley, Martin Ågren,
	Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Johannes Schindelin, Abhradeep Chakraborty,
	Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

When reading a bitmap file, Git loads every bitmap one by one, even if
not all of them are required. A "bitmap lookup table" extension to the
bitmap format can reduce this overhead: it stores a list of bitmapped
commit positions (in the midx or pack index), along with each bitmap's
offset and xor offset. This way Git can load only the necessary bitmaps
without loading the ones that precede them.
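
As an illustration only, here is a rough sketch of how one table row
could be decoded, assuming the layout documented in this patch (4-byte
commit_pos, 8-byte offset, 4-byte xor_row, all in network byte order).
The names `struct triplet`, `decode_triplet`, `read_be32`, `read_be64`
and `TRIPLET_WIDTH` are hypothetical and are not taken from the
implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Width of one row: 4-byte commit_pos + 8-byte offset + 4-byte xor_row. */
#define TRIPLET_WIDTH 16

/* Hypothetical in-memory form of one lookup-table row. */
struct triplet {
	uint32_t commit_pos;
	uint64_t offset;
	uint32_t xor_row;
};

/* Read big-endian (network byte order) integers from raw bytes. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/*
 * Decode row `row` of a table holding `nr_entries` rows.
 * Returns 0 on success, -1 if `row` is out of range.
 */
static int decode_triplet(const unsigned char *table, uint32_t nr_entries,
			  uint32_t row, struct triplet *out)
{
	const unsigned char *p;

	assert(table && out);
	if (row >= nr_entries)
		return -1;

	p = table + (size_t)row * TRIPLET_WIDTH;
	out->commit_pos = read_be32(p);
	out->offset = read_be64(p + 4);
	out->xor_row = read_be32(p + 12);
	return 0;
}
```

Because every row has the same width, row `i` starts at byte
`i * TRIPLET_WIDTH` of the table, so no earlier row has to be parsed
to reach it.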

Older versions of Git ignore the lookup table extension and don't
throw any kind of warning or error while parsing the bitmap file.

Add some information for the new "bitmap lookup table" extension in the
bitmap-format documentation.

Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Co-Authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 Documentation/technical/bitmap-format.txt | 39 +++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index a85f58f5153..c2e652b71a7 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -72,6 +72,17 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
 	    pack/MIDX. The format and meaning of the name-hash is
 	    described below.
 
+		** {empty}
+		BITMAP_OPT_LOOKUP_TABLE (0x10): :::
+		If present, the end of the bitmap file contains a table
+		containing a list of `N` <commit_pos, offset, xor_row>
+		triplets. The format and meaning of the table is described
+		below.
++
+NOTE: Unlike the xor_offset used to compress an individual bitmap,
+`xor_row` stores an *absolute* index into the lookup table, not a location
+relative to the current entry.
+
 	4-byte entry count (network byte order): ::
 	    The total count of entries (bitmapped commits) in this bitmap index.
 
@@ -216,3 +227,31 @@ Note that this hashing scheme is tied to the BITMAP_OPT_HASH_CACHE flag.
 If implementations want to choose a different hashing scheme, they are
 free to do so, but MUST allocate a new header flag (because comparing
 hashes made under two different schemes would be pointless).
+
+Commit lookup table
+-------------------
+
+If the BITMAP_OPT_LOOKUP_TABLE flag is set, the last `N * (4 + 8 + 4)`
+bytes (preceding the name-hash cache and trailing hash) of the `.bitmap`
+file contains a lookup table specifying the information needed to get
+the desired bitmap from the entries without parsing previous unnecessary
+bitmaps.
+
+For a `.bitmap` containing `nr_entries` reachability bitmaps, the table
+contains a list of `nr_entries` <commit_pos, offset, xor_row> triplets
+(sorted in the ascending order of `commit_pos`). The content of i'th
+triplet is -
+
+	* {empty}
+	commit_pos (4 byte integer, network byte order): ::
+	It stores the object position of a commit (in the midx or pack
+	index).
+
+	* {empty}
+	offset (8 byte integer, network byte order): ::
+	The offset from which that commit's bitmap can be read.
+
+	* {empty}
+	xor_row (4 byte integer, network byte order): ::
+	The position of the triplet whose bitmap is used to compress
+	this one, or `0xffffffff` if no such bitmap exists.
-- 
gitgitgadget
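
To make the `xor_row` semantics concrete, here is a small sketch of
following the chain of absolute row indexes until the `0xffffffff`
terminator is reached. It assumes the rows have already been decoded
into memory; `struct row`, `xor_chain` and `NO_XOR_ROW` are hypothetical
names used only for illustration. The length check mirrors the idea that
a chain longer than the table can only come from a corrupt (cyclic)
table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sentinel documented for "no xor bitmap". */
#define NO_XOR_ROW 0xffffffffu

/* Hypothetical decoded row: only the fields needed to walk the chain. */
struct row {
	uint64_t offset;
	uint32_t xor_row;
};

/*
 * Collect into `chain` (capacity: `nr` entries) the rows that the bitmap
 * at row `start` depends on, nearest dependency first.
 * Returns the chain length, or -1 if the table looks corrupt
 * (row index out of range, or a chain too long to be acyclic).
 */
static long xor_chain(const struct row *rows, uint32_t nr,
		      uint32_t start, uint32_t *chain)
{
	uint32_t len = 0;
	uint32_t cur;

	assert(rows && chain);
	if (start >= nr)
		return -1;

	cur = rows[start].xor_row;
	while (cur != NO_XOR_ROW) {
		/* At most nr - 1 other rows can precede `start` in a chain. */
		if (cur >= nr || len + 1 >= nr)
			return -1;
		chain[len++] = cur;
		cur = rows[cur].xor_row;
	}
	return (long)len;
}
```

A reader would then load the bitmaps in reverse order (last chain entry
first), since each bitmap is XORed against the one its `xor_row` points
to.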



* [PATCH v6 2/6] bitmap: move `get commit positions` code to `bitmap_writer_finish`
  2022-08-14 16:55         ` [PATCH v6 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
  2022-08-14 16:55           ` [PATCH v6 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
@ 2022-08-14 16:55           ` Abhradeep Chakraborty via GitGitGadget
  2022-08-14 16:55           ` [PATCH v6 3/6] pack-bitmap-write.c: write lookup table extension Abhradeep Chakraborty via GitGitGadget
                             ` (5 subsequent siblings)
  7 siblings, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-08-14 16:55 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Junio C Hamano, Derrick Stolee,
	Philip Oakley, Martin Ågren,
	Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Johannes Schindelin, Abhradeep Chakraborty,
	Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

The `write_selected_commits_v1` function takes care of writing commit
positions along with their corresponding bitmaps to disk. This is
fine because the `search commit position of a given commit` algorithm
is needed only once here. But later changes in the `lookup table
extension series` need the same commit positions, which means we would
have to run the above-mentioned algorithm one more time.

Move the `search commit position of a given commit` algorithm to
`bitmap_writer_finish()` and use the `commit_positions` array
to get commit positions of their corresponding bitmaps.
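
As a rough illustration of the idea (not the code in this patch), the
sketch below resolves each selected commit's position once and keeps
the results in an array that later writers can share. `struct demo_oid`,
`demo_oid_pos()` and `demo_compute_positions()` are hypothetical
stand-ins for `struct object_id`, `oid_pos()` and the loop added to
`bitmap_writer_finish()`; the 4-byte hash is an assumption made only to
keep the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, shortened object ID (real Git uses struct object_id). */
struct demo_oid {
	unsigned char hash[4];
};

/* Binary search in a table sorted by hash; returns the index or -1. */
static int demo_oid_pos(const struct demo_oid *needle,
			const struct demo_oid *sorted, size_t nr)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(needle->hash, sorted[mid].hash,
				 sizeof(needle->hash));

		if (!cmp)
			return (int)mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -1;
}

/*
 * Look up every selected commit exactly once. The caller owns the
 * returned array and can hand it to any number of writers. Returns
 * NULL if a commit is missing from the index or allocation fails.
 */
static uint32_t *demo_compute_positions(const struct demo_oid *selected,
					size_t selected_nr,
					const struct demo_oid *index,
					size_t index_nr)
{
	uint32_t *positions;
	size_t i;

	assert(selected && index);
	positions = malloc((selected_nr ? selected_nr : 1) * sizeof(*positions));
	if (!positions)
		return NULL;

	for (i = 0; i < selected_nr; i++) {
		int pos = demo_oid_pos(&selected[i], index, index_nr);

		if (pos < 0) {
			free(positions);
			return NULL;
		}
		positions[i] = (uint32_t)pos;
	}
	return positions;
}
```

With the positions cached like this, a second consumer costs one array
read per commit instead of another binary search.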

Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 pack-bitmap-write.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 4fcfaed428f..9b1be59f6d3 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -650,20 +650,15 @@ static const struct object_id *oid_access(size_t pos, const void *table)
 
 static void write_selected_commits_v1(struct hashfile *f,
 				      struct pack_idx_entry **index,
-				      uint32_t index_nr)
+				      uint32_t index_nr,
+				      uint32_t *commit_positions)
 {
 	int i;
 
 	for (i = 0; i < writer.selected_nr; ++i) {
 		struct bitmapped_commit *stored = &writer.selected[i];
 
-		int commit_pos =
-			oid_pos(&stored->commit->object.oid, index, index_nr, oid_access);
-
-		if (commit_pos < 0)
-			BUG("trying to write commit not in index");
-
-		hashwrite_be32(f, commit_pos);
+		hashwrite_be32(f, commit_positions[i]);
 		hashwrite_u8(f, stored->xor_offset);
 		hashwrite_u8(f, stored->flags);
 
@@ -697,6 +692,8 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 	static uint16_t flags = BITMAP_OPT_FULL_DAG;
 	struct strbuf tmp_file = STRBUF_INIT;
 	struct hashfile *f;
+	uint32_t *commit_positions = NULL;
+	uint32_t i;
 
 	struct bitmap_disk_header header;
 
@@ -715,7 +712,20 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 	dump_bitmap(f, writer.trees);
 	dump_bitmap(f, writer.blobs);
 	dump_bitmap(f, writer.tags);
-	write_selected_commits_v1(f, index, index_nr);
+
+	ALLOC_ARRAY(commit_positions, writer.selected_nr);
+
+	for (i = 0; i < writer.selected_nr; i++) {
+		struct bitmapped_commit *stored = &writer.selected[i];
+		int commit_pos = oid_pos(&stored->commit->object.oid, index, index_nr, oid_access);
+
+		if (commit_pos < 0)
+			BUG("trying to write commit not in index");
+
+		commit_positions[i] = commit_pos;
+	}
+
+	write_selected_commits_v1(f, index, index_nr, commit_positions);
 
 	if (options & BITMAP_OPT_HASH_CACHE)
 		write_hash_cache(f, index, index_nr);
@@ -730,4 +740,5 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 		die_errno("unable to rename temporary bitmap file to '%s'", filename);
 
 	strbuf_release(&tmp_file);
+	free(commit_positions);
 }
-- 
gitgitgadget



* [PATCH v6 3/6] pack-bitmap-write.c: write lookup table extension
  2022-08-14 16:55         ` [PATCH v6 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
  2022-08-14 16:55           ` [PATCH v6 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
  2022-08-14 16:55           ` [PATCH v6 2/6] bitmap: move `get commit positions` code to `bitmap_writer_finish` Abhradeep Chakraborty via GitGitGadget
@ 2022-08-14 16:55           ` Abhradeep Chakraborty via GitGitGadget
  2022-08-14 16:55           ` [PATCH v6 4/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests Abhradeep Chakraborty via GitGitGadget
                             ` (4 subsequent siblings)
  7 siblings, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-08-14 16:55 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Junio C Hamano, Derrick Stolee,
	Philip Oakley, Martin Ågren,
	Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Johannes Schindelin, Abhradeep Chakraborty,
	Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

The bitmap lookup table extension was documented by an earlier
change, but Git does not yet know how to write that extension.

Teach Git to write the bitmap lookup table extension. The table
contains a list of `N` `<commit_pos, offset, xor_row>` triplets,
sorted by commit position in ascending order. The meaning of each
field in the i'th triplet is given below:

  - commit_pos stores commit position (in the pack-index or midx).
    It is a 4 byte network byte order unsigned integer.

  - offset is the position (in the bitmap file) from which that
    commit's bitmap can be read.

  - xor_row is the position of the triplet in the lookup table
    whose bitmap is used to compress this bitmap, or `0xffffffff`
    if no such bitmap exists.
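
The layout above can be sketched in isolation as follows. This is only
an illustration under stated assumptions, not the patch's
implementation: it writes into a caller-supplied buffer instead of a
hashfile, it sorts <commit_pos, write index> pairs with plain qsort()
rather than sorting an index array with QSORT_S, and every `demo_`
name is hypothetical. It does follow the description above: rows are
ordered by commit position, each row is 4 + 8 + 4 bytes in network
byte order, and `xor_row` is the row of the bitmap that sits
`xor_offset` entries earlier in write order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_NO_XOR 0xffffffffu
#define DEMO_TRIPLET_SIZE 16 /* 4 + 8 + 4 bytes per row */

struct demo_row {
	uint32_t commit_pos; /* sort key */
	uint32_t write_idx;  /* index of the bitmap in write order */
};

static int demo_row_cmp(const void *va, const void *vb)
{
	const struct demo_row *a = va, *b = vb;

	return (a->commit_pos > b->commit_pos) - (a->commit_pos < b->commit_pos);
}

/* Store the low `bytes` bytes of `v` in big-endian order. */
static unsigned char *demo_put_be(unsigned char *p, uint64_t v, int bytes)
{
	int i;

	for (i = bytes - 1; i >= 0; i--)
		*p++ = (unsigned char)(v >> (8 * i));
	return p;
}

/*
 * Fill `out` (nr * DEMO_TRIPLET_SIZE bytes) with the lookup table.
 * All three input arrays are indexed by write order.
 * Returns 0 on success, -1 if allocation fails.
 */
static int demo_write_lookup_table(const uint32_t *commit_positions,
				   const uint64_t *offsets,
				   const uint8_t *xor_offsets,
				   uint32_t nr, unsigned char *out)
{
	struct demo_row *rows = malloc((nr ? nr : 1) * sizeof(*rows));
	uint32_t *row_of = malloc((nr ? nr : 1) * sizeof(*row_of));
	uint32_t i;

	if (!rows || !row_of) {
		free(rows);
		free(row_of);
		return -1;
	}

	for (i = 0; i < nr; i++) {
		rows[i].commit_pos = commit_positions[i];
		rows[i].write_idx = i;
	}
	qsort(rows, nr, sizeof(*rows), demo_row_cmp);

	/* Inverse mapping: write-order index -> row in the sorted table. */
	for (i = 0; i < nr; i++)
		row_of[rows[i].write_idx] = i;

	for (i = 0; i < nr; i++) {
		uint32_t k = rows[i].write_idx;
		uint32_t xor_row = DEMO_NO_XOR;

		if (xor_offsets[k]) {
			/* The XOR base is always an earlier bitmap. */
			assert(xor_offsets[k] <= k);
			xor_row = row_of[k - xor_offsets[k]];
		}

		out = demo_put_be(out, rows[i].commit_pos, 4);
		out = demo_put_be(out, offsets[k], 8);
		out = demo_put_be(out, xor_row, 4);
	}

	free(rows);
	free(row_of);
	return 0;
}
```

For example, three bitmaps written in the order of commit positions
30, 10, 20 produce rows for 10, 20, 30; a bitmap compressed against
the first one written gets `xor_row` 2, because position 30 sorts
last.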

Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 pack-bitmap-write.c | 91 ++++++++++++++++++++++++++++++++++++++++++++-
 pack-bitmap.h       |  5 ++-
 2 files changed, 92 insertions(+), 4 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 9b1be59f6d3..2cfc92f2871 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -651,13 +651,17 @@ static const struct object_id *oid_access(size_t pos, const void *table)
 static void write_selected_commits_v1(struct hashfile *f,
 				      struct pack_idx_entry **index,
 				      uint32_t index_nr,
-				      uint32_t *commit_positions)
+				      uint32_t *commit_positions,
+				      off_t *offsets)
 {
 	int i;
 
 	for (i = 0; i < writer.selected_nr; ++i) {
 		struct bitmapped_commit *stored = &writer.selected[i];
 
+		if (offsets)
+			offsets[i] = hashfile_total(f);
+
 		hashwrite_be32(f, commit_positions[i]);
 		hashwrite_u8(f, stored->xor_offset);
 		hashwrite_u8(f, stored->flags);
@@ -666,6 +670,81 @@ static void write_selected_commits_v1(struct hashfile *f,
 	}
 }
 
+static int table_cmp(const void *_va, const void *_vb, void *_data)
+{
+	uint32_t *commit_positions = _data;
+	uint32_t a = commit_positions[*(uint32_t *)_va];
+	uint32_t b = commit_positions[*(uint32_t *)_vb];
+
+	if (a > b)
+		return 1;
+	else if (a < b)
+		return -1;
+
+	return 0;
+}
+
+static void write_lookup_table(struct hashfile *f,
+			       struct pack_idx_entry **index,
+			       uint32_t index_nr,
+			       uint32_t *commit_positions,
+			       off_t *offsets)
+{
+	uint32_t i;
+	uint32_t *table, *table_inv;
+
+	ALLOC_ARRAY(table, writer.selected_nr);
+	ALLOC_ARRAY(table_inv, writer.selected_nr);
+
+	for (i = 0; i < writer.selected_nr; i++)
+		table[i] = i;
+
+	/*
+	 * At the end of this sort table[j] = i means that the i'th
+	 * bitmap corresponds to j'th bitmapped commit (among the selected
+	 * commits) in lex order of OIDs.
+	 */
+	QSORT_S(table, writer.selected_nr, table_cmp, commit_positions);
+
+	/* table_inv helps us discover that relationship (i'th bitmap
+	 * to j'th commit by j = table_inv[i])
+	 */
+	for (i = 0; i < writer.selected_nr; i++)
+		table_inv[table[i]] = i;
+
+	trace2_region_enter("pack-bitmap-write", "writing_lookup_table", the_repository);
+	for (i = 0; i < writer.selected_nr; i++) {
+		struct bitmapped_commit *selected = &writer.selected[table[i]];
+		uint32_t xor_offset = selected->xor_offset;
+		uint32_t xor_row;
+
+		if (xor_offset) {
+			/*
+			 * xor_index stores the index (in the bitmap entries)
+			 * of the corresponding xor bitmap. But we need to convert
+			 * this index into lookup table's index. So, table_inv[xor_index]
+			 * gives us the index position w.r.t. the lookup table.
+			 *
+			 * If "k = table[i] - xor_offset" then the xor base is the k'th
+			 * bitmap. `table_inv[k]` gives us the position of that bitmap
+			 * in the lookup table.
+			 */
+			uint32_t xor_index = table[i] - xor_offset;
+			xor_row = table_inv[xor_index];
+		} else {
+			xor_row = 0xffffffff;
+		}
+
+		hashwrite_be32(f, commit_positions[table[i]]);
+		hashwrite_be64(f, (uint64_t)offsets[table[i]]);
+		hashwrite_be32(f, xor_row);
+	}
+	trace2_region_leave("pack-bitmap-write", "writing_lookup_table", the_repository);
+
+	free(table);
+	free(table_inv);
+}
+
 static void write_hash_cache(struct hashfile *f,
 			     struct pack_idx_entry **index,
 			     uint32_t index_nr)
@@ -693,6 +772,7 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 	struct strbuf tmp_file = STRBUF_INIT;
 	struct hashfile *f;
 	uint32_t *commit_positions = NULL;
+	off_t *offsets = NULL;
 	uint32_t i;
 
 	struct bitmap_disk_header header;
@@ -713,6 +793,9 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 	dump_bitmap(f, writer.blobs);
 	dump_bitmap(f, writer.tags);
 
+	if (options & BITMAP_OPT_LOOKUP_TABLE)
+		CALLOC_ARRAY(offsets, index_nr);
+
 	ALLOC_ARRAY(commit_positions, writer.selected_nr);
 
 	for (i = 0; i < writer.selected_nr; i++) {
@@ -725,7 +808,10 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 		commit_positions[i] = commit_pos;
 	}
 
-	write_selected_commits_v1(f, index, index_nr, commit_positions);
+	write_selected_commits_v1(f, index, index_nr, commit_positions, offsets);
+
+	if (options & BITMAP_OPT_LOOKUP_TABLE)
+		write_lookup_table(f, index, index_nr, commit_positions, offsets);
 
 	if (options & BITMAP_OPT_HASH_CACHE)
 		write_hash_cache(f, index, index_nr);
@@ -741,4 +827,5 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 
 	strbuf_release(&tmp_file);
 	free(commit_positions);
+	free(offsets);
 }
diff --git a/pack-bitmap.h b/pack-bitmap.h
index f3a57ca065f..cb065a263cb 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -24,8 +24,9 @@ struct bitmap_disk_header {
 #define NEEDS_BITMAP (1u<<22)
 
 enum pack_bitmap_opts {
-	BITMAP_OPT_FULL_DAG = 1,
-	BITMAP_OPT_HASH_CACHE = 4,
+	BITMAP_OPT_FULL_DAG = 0x1,
+	BITMAP_OPT_HASH_CACHE = 0x4,
+	BITMAP_OPT_LOOKUP_TABLE = 0x10,
 };
 
 enum pack_bitmap_flags {
-- 
gitgitgadget



* [PATCH v6 4/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-08-14 16:55         ` [PATCH v6 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
                             ` (2 preceding siblings ...)
  2022-08-14 16:55           ` [PATCH v6 3/6] pack-bitmap-write.c: write lookup table extension Abhradeep Chakraborty via GitGitGadget
@ 2022-08-14 16:55           ` Abhradeep Chakraborty via GitGitGadget
  2022-08-14 16:55           ` [PATCH v6 5/6] pack-bitmap: prepare to read lookup table extension Abhradeep Chakraborty via GitGitGadget
                             ` (3 subsequent siblings)
  7 siblings, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-08-14 16:55 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Junio C Hamano, Derrick Stolee,
	Philip Oakley, Martin Ågren,
	Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Johannes Schindelin, Abhradeep Chakraborty,
	Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

Teach Git to let users enable or disable the bitmap lookup table
extension through a new config option, `pack.writeBitmapLookupTable`.
The default is false.

Also add tests to verify the writing of the lookup table.
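
To illustrate how a boolean option like this maps onto a writer flag,
here is a minimal standalone sketch. It is not the handler from this
patch: `demo_apply_config()` is a hypothetical helper, and it accepts
only the literal string "true" as true, whereas Git's own boolean
parsing accepts more spellings. The bit values mirror the
`BITMAP_OPT_*` constants shown in the previous patch, and the key is
compared in lowercase as the handlers in this patch do.

```c
#include <assert.h>
#include <string.h>

#define DEMO_OPT_HASH_CACHE   0x4u
#define DEMO_OPT_LOOKUP_TABLE 0x10u

/*
 * Return `options` updated for one key/value pair. A later "false"
 * clears the bit again, so the last value seen wins. Unrelated keys
 * leave the options untouched.
 */
static unsigned demo_apply_config(unsigned options, const char *key,
				  const char *value)
{
	assert(key && value);

	if (!strcmp(key, "pack.writebitmaplookuptable")) {
		if (!strcmp(value, "true"))
			options |= DEMO_OPT_LOOKUP_TABLE;
		else
			options &= ~DEMO_OPT_LOOKUP_TABLE;
	}
	return options;
}
```

Setting and clearing a single bit keeps the option independent of the
other write flags, such as the hash cache.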

Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 Documentation/config/pack.txt     |   7 +
 builtin/multi-pack-index.c        |   7 +
 builtin/pack-objects.c            |   8 +
 midx.c                            |   3 +
 midx.h                            |   1 +
 t/t5310-pack-bitmaps.sh           | 792 ++++++++++++++++--------------
 t/t5311-pack-bitmaps-shallow.sh   |  53 +-
 t/t5326-multi-pack-bitmaps.sh     | 421 +++++++++-------
 t/t5327-multi-pack-bitmaps-rev.sh |  24 +-
 9 files changed, 733 insertions(+), 583 deletions(-)

diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index ad7f73a1ead..b955ca572ec 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -164,6 +164,13 @@ When writing a multi-pack reachability bitmap, no new namehashes are
 computed; instead, any namehashes stored in an existing bitmap are
 permuted into their appropriate location when writing a new bitmap.
 
+pack.writeBitmapLookupTable::
+	When true, Git will include a "lookup table" section in the
+	bitmap index (if one is written). This table is used to defer
+	loading individual bitmaps as late as possible. This can be
+	beneficial in repositories that have relatively large bitmap
+	indexes. Defaults to false.
+
 pack.writeReverseIndex::
 	When true, git will write a corresponding .rev file (see:
 	link:../technical/pack-format.html[Documentation/technical/pack-format.txt])
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 8f24d59a753..e7cce1d26ee 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -87,6 +87,13 @@ static int git_multi_pack_index_write_config(const char *var, const char *value,
 			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
 	}
 
+	if (!strcmp(var, "pack.writebitmaplookuptable")) {
+		if (git_config_bool(var, value))
+			opts.flags |= MIDX_WRITE_BITMAP_LOOKUP_TABLE;
+		else
+			opts.flags &= ~MIDX_WRITE_BITMAP_LOOKUP_TABLE;
+	}
+
 	/*
 	 * We should never make a fall-back call to 'git_default_config', since
 	 * this was already called in 'cmd_multi_pack_index()'.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 39e28cfcafc..46e26774963 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3148,6 +3148,14 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 		else
 			write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE;
 	}
+
+	if (!strcmp(k, "pack.writebitmaplookuptable")) {
+		if (git_config_bool(k, v))
+			write_bitmap_options |= BITMAP_OPT_LOOKUP_TABLE;
+		else
+			write_bitmap_options &= ~BITMAP_OPT_LOOKUP_TABLE;
+	}
+
 	if (!strcmp(k, "pack.usebitmaps")) {
 		use_bitmap_index_default = git_config_bool(k, v);
 		return 0;
diff --git a/midx.c b/midx.c
index 4e956cacb71..3ff6e91e6ee 100644
--- a/midx.c
+++ b/midx.c
@@ -1070,6 +1070,9 @@ static int write_midx_bitmap(const char *midx_name,
 	if (flags & MIDX_WRITE_BITMAP_HASH_CACHE)
 		options |= BITMAP_OPT_HASH_CACHE;
 
+	if (flags & MIDX_WRITE_BITMAP_LOOKUP_TABLE)
+		options |= BITMAP_OPT_LOOKUP_TABLE;
+
 	/*
 	 * Build the MIDX-order index based on pdata.objects (which is already
 	 * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of
diff --git a/midx.h b/midx.h
index 22e8e53288e..5578cd7b835 100644
--- a/midx.h
+++ b/midx.h
@@ -47,6 +47,7 @@ struct multi_pack_index {
 #define MIDX_WRITE_REV_INDEX (1 << 1)
 #define MIDX_WRITE_BITMAP (1 << 2)
 #define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3)
+#define MIDX_WRITE_BITMAP_LOOKUP_TABLE (1 << 4)
 
 const unsigned char *get_midx_checksum(struct multi_pack_index *m);
 void get_midx_filename(struct strbuf *out, const char *object_dir);
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index f775fc1ce69..c0607172827 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -26,22 +26,413 @@ has_any () {
 	grep -Ff "$1" "$2"
 }
 
-setup_bitmap_history
-
-test_expect_success 'setup writing bitmaps during repack' '
-	git config repack.writeBitmaps true
-'
-
-test_expect_success 'full repack creates bitmaps' '
-	GIT_TRACE2_EVENT="$(pwd)/trace" \
+test_bitmap_cases () {
+	writeLookupTable=false
+	for i in "$@"
+	do
+		case "$i" in
+		"pack.writeBitmapLookupTable") writeLookupTable=true;;
+		esac
+	done
+
+	test_expect_success 'setup test repository' '
+		rm -fr * .git &&
+		git init &&
+		git config pack.writeBitmapLookupTable '"$writeLookupTable"'
+	'
+	setup_bitmap_history
+
+	test_expect_success 'setup writing bitmaps during repack' '
+		git config repack.writeBitmaps true
+	'
+
+	test_expect_success 'full repack creates bitmaps' '
+		GIT_TRACE2_EVENT="$(pwd)/trace" \
+			git repack -ad &&
+		ls .git/objects/pack/ | grep bitmap >output &&
+		test_line_count = 1 output &&
+		grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace &&
+		grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace
+	'
+
+	basic_bitmap_tests
+
+	test_expect_success 'pack-objects respects --local (non-local loose)' '
+		git init --bare alt.git &&
+		echo $(pwd)/alt.git/objects >.git/objects/info/alternates &&
+		echo content1 >file1 &&
+		# non-local loose object which is not present in bitmapped pack
+		altblob=$(GIT_DIR=alt.git git hash-object -w file1) &&
+		# non-local loose object which is also present in bitmapped pack
+		git cat-file blob $blob | GIT_DIR=alt.git git hash-object -w --stdin &&
+		git add file1 &&
+		test_tick &&
+		git commit -m commit_file1 &&
+		echo HEAD | git pack-objects --local --stdout --revs >1.pack &&
+		git index-pack 1.pack &&
+		list_packed_objects 1.idx >1.objects &&
+		printf "%s\n" "$altblob" "$blob" >nonlocal-loose &&
+		! has_any nonlocal-loose 1.objects
+	'
+
+	test_expect_success 'pack-objects respects --honor-pack-keep (local non-bitmapped pack)' '
+		echo content2 >file2 &&
+		blob2=$(git hash-object -w file2) &&
+		git add file2 &&
+		test_tick &&
+		git commit -m commit_file2 &&
+		printf "%s\n" "$blob2" "$bitmaptip" >keepobjects &&
+		pack2=$(git pack-objects pack2 <keepobjects) &&
+		mv pack2-$pack2.* .git/objects/pack/ &&
+		>.git/objects/pack/pack2-$pack2.keep &&
+		rm $(objpath $blob2) &&
+		echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >2a.pack &&
+		git index-pack 2a.pack &&
+		list_packed_objects 2a.idx >2a.objects &&
+		! has_any keepobjects 2a.objects
+	'
+
+	test_expect_success 'pack-objects respects --local (non-local pack)' '
+		mv .git/objects/pack/pack2-$pack2.* alt.git/objects/pack/ &&
+		echo HEAD | git pack-objects --local --stdout --revs >2b.pack &&
+		git index-pack 2b.pack &&
+		list_packed_objects 2b.idx >2b.objects &&
+		! has_any keepobjects 2b.objects
+	'
+
+	test_expect_success 'pack-objects respects --honor-pack-keep (local bitmapped pack)' '
+		ls .git/objects/pack/ | grep bitmap >output &&
+		test_line_count = 1 output &&
+		packbitmap=$(basename $(cat output) .bitmap) &&
+		list_packed_objects .git/objects/pack/$packbitmap.idx >packbitmap.objects &&
+		test_when_finished "rm -f .git/objects/pack/$packbitmap.keep" &&
+		>.git/objects/pack/$packbitmap.keep &&
+		echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >3a.pack &&
+		git index-pack 3a.pack &&
+		list_packed_objects 3a.idx >3a.objects &&
+		! has_any packbitmap.objects 3a.objects
+	'
+
+	test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' '
+		mv .git/objects/pack/$packbitmap.* alt.git/objects/pack/ &&
+		rm -f .git/objects/pack/multi-pack-index &&
+		test_when_finished "mv alt.git/objects/pack/$packbitmap.* .git/objects/pack/" &&
+		echo HEAD | git pack-objects --local --stdout --revs >3b.pack &&
+		git index-pack 3b.pack &&
+		list_packed_objects 3b.idx >3b.objects &&
+		! has_any packbitmap.objects 3b.objects
+	'
+
+	test_expect_success 'pack-objects to file can use bitmap' '
+		# make sure we still have 1 bitmap index from previous tests
+		ls .git/objects/pack/ | grep bitmap >output &&
+		test_line_count = 1 output &&
+		# verify equivalent packs are generated with/without using bitmap index
+		packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) &&
+		packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) &&
+		list_packed_objects packa-$packasha1.idx >packa.objects &&
+		list_packed_objects packb-$packbsha1.idx >packb.objects &&
+		test_cmp packa.objects packb.objects
+	'
+
+	test_expect_success 'full repack, reusing previous bitmaps' '
 		git repack -ad &&
-	ls .git/objects/pack/ | grep bitmap >output &&
-	test_line_count = 1 output &&
-	grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace &&
-	grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace
-'
+		ls .git/objects/pack/ | grep bitmap >output &&
+		test_line_count = 1 output
+	'
+
+	test_expect_success 'fetch (full bitmap)' '
+		git --git-dir=clone.git fetch origin second:second &&
+		git rev-parse HEAD >expect &&
+		git --git-dir=clone.git rev-parse HEAD >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success 'create objects for missing-HAVE tests' '
+		blob=$(echo "missing have" | git hash-object -w --stdin) &&
+		tree=$(printf "100644 blob $blob\tfile\n" | git mktree) &&
+		parent=$(echo parent | git commit-tree $tree) &&
+		commit=$(echo commit | git commit-tree $tree -p $parent) &&
+		cat >revs <<-EOF
+		HEAD
+		^HEAD^
+		^$commit
+		EOF
+	'
+
+	test_expect_success 'pack-objects respects --incremental' '
+		cat >revs2 <<-EOF &&
+		HEAD
+		$commit
+		EOF
+		git pack-objects --incremental --stdout --revs <revs2 >4.pack &&
+		git index-pack 4.pack &&
+		list_packed_objects 4.idx >4.objects &&
+		test_line_count = 4 4.objects &&
+		git rev-list --objects $commit >revlist &&
+		cut -d" " -f1 revlist |sort >objects &&
+		test_cmp 4.objects objects
+	'
+
+	test_expect_success 'pack with missing blob' '
+		rm $(objpath $blob) &&
+		git pack-objects --stdout --revs <revs >/dev/null
+	'
+
+	test_expect_success 'pack with missing tree' '
+		rm $(objpath $tree) &&
+		git pack-objects --stdout --revs <revs >/dev/null
+	'
+
+	test_expect_success 'pack with missing parent' '
+		rm $(objpath $parent) &&
+		git pack-objects --stdout --revs <revs >/dev/null
+	'
+
+	test_expect_success JGIT,SHA1 'we can read jgit bitmaps' '
+		git clone --bare . compat-jgit.git &&
+		(
+			cd compat-jgit.git &&
+			rm -f objects/pack/*.bitmap &&
+			jgit gc &&
+			git rev-list --test-bitmap HEAD
+		)
+	'
+
+	test_expect_success JGIT,SHA1 'jgit can read our bitmaps' '
+		git clone --bare . compat-us.git &&
+		(
+			cd compat-us.git &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
+			git repack -adb &&
+			# jgit gc will barf if it does not like our bitmaps
+			jgit gc
+		)
+	'
+
+	test_expect_success 'splitting packs does not generate bogus bitmaps' '
+		test-tool genrandom foo $((1024 * 1024)) >rand &&
+		git add rand &&
+		git commit -m "commit with big file" &&
+		git -c pack.packSizeLimit=500k repack -adb &&
+		git init --bare no-bitmaps.git &&
+		git -C no-bitmaps.git fetch .. HEAD
+	'
+
+	test_expect_success 'set up reusable pack' '
+		rm -f .git/objects/pack/*.keep &&
+		git repack -adb &&
+		reusable_pack () {
+			git for-each-ref --format="%(objectname)" |
+			git pack-objects --delta-base-offset --revs --stdout "$@"
+		}
+	'
+
+	test_expect_success 'pack reuse respects --honor-pack-keep' '
+		test_when_finished "rm -f .git/objects/pack/*.keep" &&
+		for i in .git/objects/pack/*.pack
+		do
+			>${i%.pack}.keep || return 1
+		done &&
+		reusable_pack --honor-pack-keep >empty.pack &&
+		git index-pack empty.pack &&
+		git show-index <empty.idx >actual &&
+		test_must_be_empty actual
+	'
+
+	test_expect_success 'pack reuse respects --local' '
+		mv .git/objects/pack/* alt.git/objects/pack/ &&
+		test_when_finished "mv alt.git/objects/pack/* .git/objects/pack/" &&
+		reusable_pack --local >empty.pack &&
+		git index-pack empty.pack &&
+		git show-index <empty.idx >actual &&
+		test_must_be_empty actual
+	'
+
+	test_expect_success 'pack reuse respects --incremental' '
+		reusable_pack --incremental >empty.pack &&
+		git index-pack empty.pack &&
+		git show-index <empty.idx >actual &&
+		test_must_be_empty actual
+	'
+
+	test_expect_success 'truncated bitmap fails gracefully (ewah)' '
+		test_config pack.writebitmaphashcache false &&
+		git repack -ad &&
+		git rev-list --use-bitmap-index --count --all >expect &&
+		bitmap=$(ls .git/objects/pack/*.bitmap) &&
+		test_when_finished "rm -f $bitmap" &&
+		test_copy_bytes 256 <$bitmap >$bitmap.tmp &&
+		mv -f $bitmap.tmp $bitmap &&
+		git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
+		test_cmp expect actual &&
+		test_i18ngrep corrupt.ewah.bitmap stderr
+	'
+
+	test_expect_success 'truncated bitmap fails gracefully (cache)' '
+		git repack -ad &&
+		git rev-list --use-bitmap-index --count --all >expect &&
+		bitmap=$(ls .git/objects/pack/*.bitmap) &&
+		test_when_finished "rm -f $bitmap" &&
+		test_copy_bytes 512 <$bitmap >$bitmap.tmp &&
+		mv -f $bitmap.tmp $bitmap &&
+		git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
+		test_cmp expect actual &&
+		test_i18ngrep corrupted.bitmap.index stderr
+	'
+
+	# Create a state of history with these properties:
+	#
+	#  - refs that allow a client to fetch some new history, while sharing some old
+	#    history with the server; we use branches delta-reuse-old and
+	#    delta-reuse-new here
+	#
+	#  - the new history contains an object that is stored on the server as a delta
+	#    against a base that is in the old history
+	#
+	#  - the base object is not immediately reachable from the tip of the old
+	#    history; finding it would involve digging down through history we know the
+	#    other side has
+	#
+	# This should result in a state where fetching from old->new would not
+	# traditionally reuse the on-disk delta (because we'd have to dig to realize
+	# that the client has it), but we will do so if bitmaps can tell us cheaply
+	# that the other side has it.
+	test_expect_success 'set up thin delta-reuse parent' '
+		# This first commit contains the buried base object.
+		test-tool genrandom delta 16384 >file &&
+		git add file &&
+		git commit -m "delta base" &&
+		base=$(git rev-parse --verify HEAD:file) &&
+
+		# These intermediate commits bury the base back in history.
+		# This becomes the "old" state.
+		for i in 1 2 3 4 5
+		do
+			echo $i >file &&
+			git commit -am "intermediate $i" || return 1
+		done &&
+		git branch delta-reuse-old &&
+
+		# And now our new history has a delta against the buried base. Note
+		# that this must be smaller than the original file, since pack-objects
+		# prefers to create deltas from smaller objects to larger.
+		test-tool genrandom delta 16300 >file &&
+		git commit -am "delta result" &&
+		delta=$(git rev-parse --verify HEAD:file) &&
+		git branch delta-reuse-new &&
+
+		# Repack with bitmaps and double check that we have the expected delta
+		# relationship.
+		git repack -adb &&
+		have_delta $delta $base
+	'
+
+	# Now we can sanity-check the non-bitmap behavior (that the server is not able
+	# to reuse the delta). This isn't strictly something we care about, so this
+	# test could be scrapped in the future. But it makes sure that the next test is
+	# actually triggering the feature we want.
+	#
+	# Note that our tools for working with on-the-wire "thin" packs are limited. So
+	# we actually perform the fetch, retain the resulting pack, and inspect the
+	# result.
+	test_expect_success 'fetch without bitmaps ignores delta against old base' '
+		test_config pack.usebitmaps false &&
+		test_when_finished "rm -rf client.git" &&
+		git init --bare client.git &&
+		(
+			cd client.git &&
+			git config transfer.unpackLimit 1 &&
+			git fetch .. delta-reuse-old:delta-reuse-old &&
+			git fetch .. delta-reuse-new:delta-reuse-new &&
+			have_delta $delta $ZERO_OID
+		)
+	'
+
+	# And do the same for the bitmap case, where we do expect to find the delta.
+	test_expect_success 'fetch with bitmaps can reuse old base' '
+		test_config pack.usebitmaps true &&
+		test_when_finished "rm -rf client.git" &&
+		git init --bare client.git &&
+		(
+			cd client.git &&
+			git config transfer.unpackLimit 1 &&
+			git fetch .. delta-reuse-old:delta-reuse-old &&
+			git fetch .. delta-reuse-new:delta-reuse-new &&
+			have_delta $delta $base
+		)
+	'
+
+	test_expect_success 'pack.preferBitmapTips' '
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
+
+			# create enough commits that not all receive bitmap
+			# coverage even if they are all at the tip of some reference.
+			test_commit_bulk --message="%s" 103 &&
+
+			git rev-list HEAD >commits.raw &&
+			sort <commits.raw >commits &&
+
+			git log --format="create refs/tags/%s %H" HEAD >refs &&
+			git update-ref --stdin <refs &&
+
+			git repack -adb &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+
+			# remember which commits did not receive bitmaps
+			comm -13 bitmaps commits >before &&
+			test_file_not_empty before &&
+
+			# mark the commits which did not receive bitmaps as preferred,
+			# and generate the bitmap again
+			perl -pe "s{^}{create refs/tags/include/$. }" <before |
+				git update-ref --stdin &&
+			git -c pack.preferBitmapTips=refs/tags/include repack -adb &&
+
+			# finally, check that the commit(s) without bitmap coverage
+			# are not the same ones as before
+			test-tool bitmap list-commits | sort >bitmaps &&
+			comm -13 bitmaps commits >after &&
+
+			! test_cmp before after
+		)
+	'
+
+	test_expect_success 'complains about multiple pack bitmaps' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
+
+			test_commit base &&
+
+			git repack -adb &&
+			bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
+			mv "$bitmap" "$bitmap.bak" &&
+
+			test_commit other &&
+			git repack -ab &&
+
+			mv "$bitmap.bak" "$bitmap" &&
+
+			find .git/objects/pack -type f -name "*.pack" >packs &&
+			find .git/objects/pack -type f -name "*.bitmap" >bitmaps &&
+			test_line_count = 2 packs &&
+			test_line_count = 2 bitmaps &&
+
+			git rev-list --use-bitmap-index HEAD 2>err &&
+			grep "ignoring extra bitmap file" err
+		)
+	'
+}
 
-basic_bitmap_tests
+test_bitmap_cases
 
 test_expect_success 'incremental repack fails when bitmaps are requested' '
 	test_commit more-1 &&
@@ -54,375 +445,12 @@ test_expect_success 'incremental repack can disable bitmaps' '
 	git repack -d --no-write-bitmap-index
 '
 
-test_expect_success 'pack-objects respects --local (non-local loose)' '
-	git init --bare alt.git &&
-	echo $(pwd)/alt.git/objects >.git/objects/info/alternates &&
-	echo content1 >file1 &&
-	# non-local loose object which is not present in bitmapped pack
-	altblob=$(GIT_DIR=alt.git git hash-object -w file1) &&
-	# non-local loose object which is also present in bitmapped pack
-	git cat-file blob $blob | GIT_DIR=alt.git git hash-object -w --stdin &&
-	git add file1 &&
-	test_tick &&
-	git commit -m commit_file1 &&
-	echo HEAD | git pack-objects --local --stdout --revs >1.pack &&
-	git index-pack 1.pack &&
-	list_packed_objects 1.idx >1.objects &&
-	printf "%s\n" "$altblob" "$blob" >nonlocal-loose &&
-	! has_any nonlocal-loose 1.objects
-'
-
-test_expect_success 'pack-objects respects --honor-pack-keep (local non-bitmapped pack)' '
-	echo content2 >file2 &&
-	blob2=$(git hash-object -w file2) &&
-	git add file2 &&
-	test_tick &&
-	git commit -m commit_file2 &&
-	printf "%s\n" "$blob2" "$bitmaptip" >keepobjects &&
-	pack2=$(git pack-objects pack2 <keepobjects) &&
-	mv pack2-$pack2.* .git/objects/pack/ &&
-	>.git/objects/pack/pack2-$pack2.keep &&
-	rm $(objpath $blob2) &&
-	echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >2a.pack &&
-	git index-pack 2a.pack &&
-	list_packed_objects 2a.idx >2a.objects &&
-	! has_any keepobjects 2a.objects
-'
-
-test_expect_success 'pack-objects respects --local (non-local pack)' '
-	mv .git/objects/pack/pack2-$pack2.* alt.git/objects/pack/ &&
-	echo HEAD | git pack-objects --local --stdout --revs >2b.pack &&
-	git index-pack 2b.pack &&
-	list_packed_objects 2b.idx >2b.objects &&
-	! has_any keepobjects 2b.objects
-'
-
-test_expect_success 'pack-objects respects --honor-pack-keep (local bitmapped pack)' '
-	ls .git/objects/pack/ | grep bitmap >output &&
-	test_line_count = 1 output &&
-	packbitmap=$(basename $(cat output) .bitmap) &&
-	list_packed_objects .git/objects/pack/$packbitmap.idx >packbitmap.objects &&
-	test_when_finished "rm -f .git/objects/pack/$packbitmap.keep" &&
-	>.git/objects/pack/$packbitmap.keep &&
-	echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >3a.pack &&
-	git index-pack 3a.pack &&
-	list_packed_objects 3a.idx >3a.objects &&
-	! has_any packbitmap.objects 3a.objects
-'
-
-test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' '
-	mv .git/objects/pack/$packbitmap.* alt.git/objects/pack/ &&
-	rm -f .git/objects/pack/multi-pack-index &&
-	test_when_finished "mv alt.git/objects/pack/$packbitmap.* .git/objects/pack/" &&
-	echo HEAD | git pack-objects --local --stdout --revs >3b.pack &&
-	git index-pack 3b.pack &&
-	list_packed_objects 3b.idx >3b.objects &&
-	! has_any packbitmap.objects 3b.objects
-'
-
-test_expect_success 'pack-objects to file can use bitmap' '
-	# make sure we still have 1 bitmap index from previous tests
-	ls .git/objects/pack/ | grep bitmap >output &&
-	test_line_count = 1 output &&
-	# verify equivalent packs are generated with/without using bitmap index
-	packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) &&
-	packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) &&
-	list_packed_objects packa-$packasha1.idx >packa.objects &&
-	list_packed_objects packb-$packbsha1.idx >packb.objects &&
-	test_cmp packa.objects packb.objects
-'
-
-test_expect_success 'full repack, reusing previous bitmaps' '
-	git repack -ad &&
-	ls .git/objects/pack/ | grep bitmap >output &&
-	test_line_count = 1 output
-'
-
-test_expect_success 'fetch (full bitmap)' '
-	git --git-dir=clone.git fetch origin second:second &&
-	git rev-parse HEAD >expect &&
-	git --git-dir=clone.git rev-parse HEAD >actual &&
-	test_cmp expect actual
-'
-
-test_expect_success 'create objects for missing-HAVE tests' '
-	blob=$(echo "missing have" | git hash-object -w --stdin) &&
-	tree=$(printf "100644 blob $blob\tfile\n" | git mktree) &&
-	parent=$(echo parent | git commit-tree $tree) &&
-	commit=$(echo commit | git commit-tree $tree -p $parent) &&
-	cat >revs <<-EOF
-	HEAD
-	^HEAD^
-	^$commit
-	EOF
-'
-
-test_expect_success 'pack-objects respects --incremental' '
-	cat >revs2 <<-EOF &&
-	HEAD
-	$commit
-	EOF
-	git pack-objects --incremental --stdout --revs <revs2 >4.pack &&
-	git index-pack 4.pack &&
-	list_packed_objects 4.idx >4.objects &&
-	test_line_count = 4 4.objects &&
-	git rev-list --objects $commit >revlist &&
-	cut -d" " -f1 revlist |sort >objects &&
-	test_cmp 4.objects objects
-'
-
-test_expect_success 'pack with missing blob' '
-	rm $(objpath $blob) &&
-	git pack-objects --stdout --revs <revs >/dev/null
-'
+test_bitmap_cases "pack.writeBitmapLookupTable"
 
-test_expect_success 'pack with missing tree' '
-	rm $(objpath $tree) &&
-	git pack-objects --stdout --revs <revs >/dev/null
-'
-
-test_expect_success 'pack with missing parent' '
-	rm $(objpath $parent) &&
-	git pack-objects --stdout --revs <revs >/dev/null
-'
-
-test_expect_success JGIT,SHA1 'we can read jgit bitmaps' '
-	git clone --bare . compat-jgit.git &&
-	(
-		cd compat-jgit.git &&
-		rm -f objects/pack/*.bitmap &&
-		jgit gc &&
-		git rev-list --test-bitmap HEAD
-	)
-'
-
-test_expect_success JGIT,SHA1 'jgit can read our bitmaps' '
-	git clone --bare . compat-us.git &&
-	(
-		cd compat-us.git &&
-		git repack -adb &&
-		# jgit gc will barf if it does not like our bitmaps
-		jgit gc
-	)
-'
-
-test_expect_success 'splitting packs does not generate bogus bitmaps' '
-	test-tool genrandom foo $((1024 * 1024)) >rand &&
-	git add rand &&
-	git commit -m "commit with big file" &&
-	git -c pack.packSizeLimit=500k repack -adb &&
-	git init --bare no-bitmaps.git &&
-	git -C no-bitmaps.git fetch .. HEAD
-'
-
-test_expect_success 'set up reusable pack' '
-	rm -f .git/objects/pack/*.keep &&
-	git repack -adb &&
-	reusable_pack () {
-		git for-each-ref --format="%(objectname)" |
-		git pack-objects --delta-base-offset --revs --stdout "$@"
-	}
-'
-
-test_expect_success 'pack reuse respects --honor-pack-keep' '
-	test_when_finished "rm -f .git/objects/pack/*.keep" &&
-	for i in .git/objects/pack/*.pack
-	do
-		>${i%.pack}.keep || return 1
-	done &&
-	reusable_pack --honor-pack-keep >empty.pack &&
-	git index-pack empty.pack &&
-	git show-index <empty.idx >actual &&
-	test_must_be_empty actual
-'
-
-test_expect_success 'pack reuse respects --local' '
-	mv .git/objects/pack/* alt.git/objects/pack/ &&
-	test_when_finished "mv alt.git/objects/pack/* .git/objects/pack/" &&
-	reusable_pack --local >empty.pack &&
-	git index-pack empty.pack &&
-	git show-index <empty.idx >actual &&
-	test_must_be_empty actual
-'
-
-test_expect_success 'pack reuse respects --incremental' '
-	reusable_pack --incremental >empty.pack &&
-	git index-pack empty.pack &&
-	git show-index <empty.idx >actual &&
-	test_must_be_empty actual
-'
-
-test_expect_success 'truncated bitmap fails gracefully (ewah)' '
-	test_config pack.writebitmaphashcache false &&
-	git repack -ad &&
-	git rev-list --use-bitmap-index --count --all >expect &&
-	bitmap=$(ls .git/objects/pack/*.bitmap) &&
-	test_when_finished "rm -f $bitmap" &&
-	test_copy_bytes 256 <$bitmap >$bitmap.tmp &&
-	mv -f $bitmap.tmp $bitmap &&
-	git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
-	test_cmp expect actual &&
-	test_i18ngrep corrupt.ewah.bitmap stderr
-'
-
-test_expect_success 'truncated bitmap fails gracefully (cache)' '
-	git repack -ad &&
-	git rev-list --use-bitmap-index --count --all >expect &&
-	bitmap=$(ls .git/objects/pack/*.bitmap) &&
-	test_when_finished "rm -f $bitmap" &&
-	test_copy_bytes 512 <$bitmap >$bitmap.tmp &&
-	mv -f $bitmap.tmp $bitmap &&
-	git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
-	test_cmp expect actual &&
-	test_i18ngrep corrupted.bitmap.index stderr
-'
-
-# Create a state of history with these properties:
-#
-#  - refs that allow a client to fetch some new history, while sharing some old
-#    history with the server; we use branches delta-reuse-old and
-#    delta-reuse-new here
-#
-#  - the new history contains an object that is stored on the server as a delta
-#    against a base that is in the old history
-#
-#  - the base object is not immediately reachable from the tip of the old
-#    history; finding it would involve digging down through history we know the
-#    other side has
-#
-# This should result in a state where fetching from old->new would not
-# traditionally reuse the on-disk delta (because we'd have to dig to realize
-# that the client has it), but we will do so if bitmaps can tell us cheaply
-# that the other side has it.
-test_expect_success 'set up thin delta-reuse parent' '
-	# This first commit contains the buried base object.
-	test-tool genrandom delta 16384 >file &&
-	git add file &&
-	git commit -m "delta base" &&
-	base=$(git rev-parse --verify HEAD:file) &&
-
-	# These intermediate commits bury the base back in history.
-	# This becomes the "old" state.
-	for i in 1 2 3 4 5
-	do
-		echo $i >file &&
-		git commit -am "intermediate $i" || return 1
-	done &&
-	git branch delta-reuse-old &&
-
-	# And now our new history has a delta against the buried base. Note
-	# that this must be smaller than the original file, since pack-objects
-	# prefers to create deltas from smaller objects to larger.
-	test-tool genrandom delta 16300 >file &&
-	git commit -am "delta result" &&
-	delta=$(git rev-parse --verify HEAD:file) &&
-	git branch delta-reuse-new &&
-
-	# Repack with bitmaps and double check that we have the expected delta
-	# relationship.
-	git repack -adb &&
-	have_delta $delta $base
-'
-
-# Now we can sanity-check the non-bitmap behavior (that the server is not able
-# to reuse the delta). This isn't strictly something we care about, so this
-# test could be scrapped in the future. But it makes sure that the next test is
-# actually triggering the feature we want.
-#
-# Note that our tools for working with on-the-wire "thin" packs are limited. So
-# we actually perform the fetch, retain the resulting pack, and inspect the
-# result.
-test_expect_success 'fetch without bitmaps ignores delta against old base' '
-	test_config pack.usebitmaps false &&
-	test_when_finished "rm -rf client.git" &&
-	git init --bare client.git &&
-	(
-		cd client.git &&
-		git config transfer.unpackLimit 1 &&
-		git fetch .. delta-reuse-old:delta-reuse-old &&
-		git fetch .. delta-reuse-new:delta-reuse-new &&
-		have_delta $delta $ZERO_OID
-	)
-'
-
-# And do the same for the bitmap case, where we do expect to find the delta.
-test_expect_success 'fetch with bitmaps can reuse old base' '
-	test_config pack.usebitmaps true &&
-	test_when_finished "rm -rf client.git" &&
-	git init --bare client.git &&
-	(
-		cd client.git &&
-		git config transfer.unpackLimit 1 &&
-		git fetch .. delta-reuse-old:delta-reuse-old &&
-		git fetch .. delta-reuse-new:delta-reuse-new &&
-		have_delta $delta $base
-	)
-'
-
-test_expect_success 'pack.preferBitmapTips' '
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
-
-		# create enough commits that not all are receive bitmap
-		# coverage even if they are all at the tip of some reference.
-		test_commit_bulk --message="%s" 103 &&
-
-		git rev-list HEAD >commits.raw &&
-		sort <commits.raw >commits &&
-
-		git log --format="create refs/tags/%s %H" HEAD >refs &&
-		git update-ref --stdin <refs &&
-
-		git repack -adb &&
-		test-tool bitmap list-commits | sort >bitmaps &&
-
-		# remember which commits did not receive bitmaps
-		comm -13 bitmaps commits >before &&
-		test_file_not_empty before &&
-
-		# mark the commits which did not receive bitmaps as preferred,
-		# and generate the bitmap again
-		perl -pe "s{^}{create refs/tags/include/$. }" <before |
-			git update-ref --stdin &&
-		git -c pack.preferBitmapTips=refs/tags/include repack -adb &&
-
-		# finally, check that the commit(s) without bitmap coverage
-		# are not the same ones as before
-		test-tool bitmap list-commits | sort >bitmaps &&
-		comm -13 bitmaps commits >after &&
-
-		! test_cmp before after
-	)
-'
-
-test_expect_success 'complains about multiple pack bitmaps' '
-	rm -fr repo &&
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
-
-		test_commit base &&
-
-		git repack -adb &&
-		bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
-		mv "$bitmap" "$bitmap.bak" &&
-
-		test_commit other &&
-		git repack -ab &&
-
-		mv "$bitmap.bak" "$bitmap" &&
-
-		find .git/objects/pack -type f -name "*.pack" >packs &&
-		find .git/objects/pack -type f -name "*.bitmap" >bitmaps &&
-		test_line_count = 2 packs &&
-		test_line_count = 2 bitmaps &&
-
-		git rev-list --use-bitmap-index HEAD 2>err &&
-		grep "ignoring extra bitmap file" err
-	)
+test_expect_success 'verify writing bitmap lookup table when enabled' '
+	GIT_TRACE2_EVENT="$(pwd)/trace2" \
+		git repack -ad &&
+	grep "\"label\":\"writing_lookup_table\"" trace2
 '
 
 test_done
diff --git a/t/t5311-pack-bitmaps-shallow.sh b/t/t5311-pack-bitmaps-shallow.sh
index 872a95df338..9dae60f73e3 100755
--- a/t/t5311-pack-bitmaps-shallow.sh
+++ b/t/t5311-pack-bitmaps-shallow.sh
@@ -17,23 +17,40 @@ test_description='check bitmap operation with shallow repositories'
 # the tree for A. But in a shallow one, we've grafted away
 # A, and fetching A to B requires that the other side send
 # us the tree for file=1.
-test_expect_success 'setup shallow repo' '
-	echo 1 >file &&
-	git add file &&
-	git commit -m orig &&
-	echo 2 >file &&
-	git commit -a -m update &&
-	git clone --no-local --bare --depth=1 . shallow.git &&
-	echo 1 >file &&
-	git commit -a -m repeat
-'
-
-test_expect_success 'turn on bitmaps in the parent' '
-	git repack -adb
-'
-
-test_expect_success 'shallow fetch from bitmapped repo' '
-	(cd shallow.git && git fetch)
-'
+test_shallow_bitmaps () {
+	writeLookupTable=false
+
+	for i in "$@"
+	do
+		case $i in
+		"pack.writeBitmapLookupTable") writeLookupTable=true;;
+		esac
+	done
+
+	test_expect_success 'setup shallow repo' '
+		rm -rf * .git &&
+		git init &&
+		git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
+		echo 1 >file &&
+		git add file &&
+		git commit -m orig &&
+		echo 2 >file &&
+		git commit -a -m update &&
+		git clone --no-local --bare --depth=1 . shallow.git &&
+		echo 1 >file &&
+		git commit -a -m repeat
+	'
+
+	test_expect_success 'turn on bitmaps in the parent' '
+		git repack -adb
+	'
+
+	test_expect_success 'shallow fetch from bitmapped repo' '
+		(cd shallow.git && git fetch)
+	'
+}
+
+test_shallow_bitmaps
+test_shallow_bitmaps "pack.writeBitmapLookupTable"
 
 test_done
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 4fe57414c13..3b206adcee6 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -15,17 +15,24 @@ GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
 sane_unset GIT_TEST_MIDX_WRITE_REV
 sane_unset GIT_TEST_MIDX_READ_RIDX
 
-midx_bitmap_core
-
 bitmap_reuse_tests() {
 	from=$1
 	to=$2
+	writeLookupTable=false
+
+	for i in "$@"
+	do
+		case $i in
+		"pack.writeBitmapLookupTable") writeLookupTable=true;;
+		esac
+	done
 
 	test_expect_success "setup pack reuse tests ($from -> $to)" '
 		rm -fr repo &&
 		git init repo &&
 		(
 			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 			test_commit_bulk 16 &&
 			git tag old-tip &&
 
@@ -43,6 +50,7 @@ bitmap_reuse_tests() {
 	test_expect_success "build bitmap from existing ($from -> $to)" '
 		(
 			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 			test_commit_bulk --id=further 16 &&
 			git tag new-tip &&
 
@@ -59,6 +67,7 @@ bitmap_reuse_tests() {
 	test_expect_success "verify resulting bitmaps ($from -> $to)" '
 		(
 			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 			git for-each-ref &&
 			git rev-list --test-bitmap refs/tags/old-tip &&
 			git rev-list --test-bitmap refs/tags/new-tip
@@ -66,244 +75,294 @@ bitmap_reuse_tests() {
 	'
 }
 
-bitmap_reuse_tests 'pack' 'MIDX'
-bitmap_reuse_tests 'MIDX' 'pack'
-bitmap_reuse_tests 'MIDX' 'MIDX'
+test_midx_bitmap_cases () {
+	writeLookupTable=false
+	writeBitmapLookupTable=
+
+	for i in "$@"
+	do
+		case $i in
+		"pack.writeBitmapLookupTable")
+			writeLookupTable=true
+			writeBitmapLookupTable="$i"
+			;;
+		esac
+	done
+
+	test_expect_success 'setup test_repository' '
+		rm -rf * .git &&
+		git init &&
+		git config pack.writeBitmapLookupTable '"$writeLookupTable"'
+	'
 
-test_expect_success 'missing object closure fails gracefully' '
-	rm -fr repo &&
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
+	midx_bitmap_core
 
-		test_commit loose &&
-		test_commit packed &&
+	bitmap_reuse_tests 'pack' 'MIDX' "$writeBitmapLookupTable"
+	bitmap_reuse_tests 'MIDX' 'pack' "$writeBitmapLookupTable"
+	bitmap_reuse_tests 'MIDX' 'MIDX' "$writeBitmapLookupTable"
 
-		# Do not pass "--revs"; we want a pack without the "loose"
-		# commit.
-		git pack-objects $objdir/pack/pack <<-EOF &&
-		$(git rev-parse packed)
-		EOF
+	test_expect_success 'missing object closure fails gracefully' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-		test_must_fail git multi-pack-index write --bitmap 2>err &&
-		grep "doesn.t have full closure" err &&
-		test_path_is_missing $midx
-	)
-'
+			test_commit loose &&
+			test_commit packed &&
 
-midx_bitmap_partial_tests
+			# Do not pass "--revs"; we want a pack without the "loose"
+			# commit.
+			git pack-objects $objdir/pack/pack <<-EOF &&
+			$(git rev-parse packed)
+			EOF
 
-test_expect_success 'removing a MIDX clears stale bitmaps' '
-	rm -fr repo &&
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
-		test_commit base &&
-		git repack &&
-		git multi-pack-index write --bitmap &&
+			test_must_fail git multi-pack-index write --bitmap 2>err &&
+			grep "doesn.t have full closure" err &&
+			test_path_is_missing $midx
+		)
+	'
 
-		# Write a MIDX and bitmap; remove the MIDX but leave the bitmap.
-		stale_bitmap=$midx-$(midx_checksum $objdir).bitmap &&
-		rm $midx &&
+	midx_bitmap_partial_tests
 
-		# Then write a new MIDX.
-		test_commit new &&
-		git repack &&
-		git multi-pack-index write --bitmap &&
+	test_expect_success 'removing a MIDX clears stale bitmaps' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
+			test_commit base &&
+			git repack &&
+			git multi-pack-index write --bitmap &&
+
+			# Write a MIDX and bitmap; remove the MIDX but leave the bitmap.
+			stale_bitmap=$midx-$(midx_checksum $objdir).bitmap &&
+			rm $midx &&
+
+			# Then write a new MIDX.
+			test_commit new &&
+			git repack &&
+			git multi-pack-index write --bitmap &&
+
+			test_path_is_file $midx &&
+			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+			test_path_is_missing $stale_bitmap
+		)
+	'
 
-		test_path_is_file $midx &&
-		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
-		test_path_is_missing $stale_bitmap
-	)
-'
+	test_expect_success 'pack.preferBitmapTips' '
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-test_expect_success 'pack.preferBitmapTips' '
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
+			test_commit_bulk --message="%s" 103 &&
 
-		test_commit_bulk --message="%s" 103 &&
+			git log --format="%H" >commits.raw &&
+			sort <commits.raw >commits &&
 
-		git log --format="%H" >commits.raw &&
-		sort <commits.raw >commits &&
+			git log --format="create refs/tags/%s %H" HEAD >refs &&
+			git update-ref --stdin <refs &&
 
-		git log --format="create refs/tags/%s %H" HEAD >refs &&
-		git update-ref --stdin <refs &&
+			git multi-pack-index write --bitmap &&
+			test_path_is_file $midx &&
+			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
 
-		git multi-pack-index write --bitmap &&
-		test_path_is_file $midx &&
-		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+			comm -13 bitmaps commits >before &&
+			test_line_count = 1 before &&
 
-		test-tool bitmap list-commits | sort >bitmaps &&
-		comm -13 bitmaps commits >before &&
-		test_line_count = 1 before &&
+			perl -ne "printf(\"create refs/tags/include/%d \", $.); print" \
+				<before | git update-ref --stdin &&
 
-		perl -ne "printf(\"create refs/tags/include/%d \", $.); print" \
-			<before | git update-ref --stdin &&
+			rm -fr $midx-$(midx_checksum $objdir).bitmap &&
+			rm -fr $midx &&
 
-		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
-		rm -fr $midx &&
+			git -c pack.preferBitmapTips=refs/tags/include \
+				multi-pack-index write --bitmap &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+			comm -13 bitmaps commits >after &&
 
-		git -c pack.preferBitmapTips=refs/tags/include \
-			multi-pack-index write --bitmap &&
-		test-tool bitmap list-commits | sort >bitmaps &&
-		comm -13 bitmaps commits >after &&
+			! test_cmp before after
+		)
+	'
 
-		! test_cmp before after
-	)
-'
+	test_expect_success 'writing a bitmap with --refs-snapshot' '
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-test_expect_success 'writing a bitmap with --refs-snapshot' '
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
+			test_commit one &&
+			test_commit two &&
 
-		test_commit one &&
-		test_commit two &&
+			git rev-parse one >snapshot &&
 
-		git rev-parse one >snapshot &&
+			git repack -ad &&
 
-		git repack -ad &&
+			# First, write a MIDX which sees both refs/tags/one and
+			# refs/tags/two (causing both of those commits to receive
+			# bitmaps).
+			git multi-pack-index write --bitmap &&
 
-		# First, write a MIDX which see both refs/tags/one and
-		# refs/tags/two (causing both of those commits to receive
-		# bitmaps).
-		git multi-pack-index write --bitmap &&
+			test_path_is_file $midx &&
+			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
 
-		test_path_is_file $midx &&
-		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+			grep "$(git rev-parse one)" bitmaps &&
+			grep "$(git rev-parse two)" bitmaps &&
 
-		test-tool bitmap list-commits | sort >bitmaps &&
-		grep "$(git rev-parse one)" bitmaps &&
-		grep "$(git rev-parse two)" bitmaps &&
+			rm -fr $midx-$(midx_checksum $objdir).bitmap &&
+			rm -fr $midx &&
 
-		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
-		rm -fr $midx &&
+			# Then again, but with a refs snapshot which only sees
+			# refs/tags/one.
+			git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
 
-		# Then again, but with a refs snapshot which only sees
-		# refs/tags/one.
-		git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
+			test_path_is_file $midx &&
+			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
 
-		test_path_is_file $midx &&
-		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+			grep "$(git rev-parse one)" bitmaps &&
+			! grep "$(git rev-parse two)" bitmaps
+		)
+	'
 
-		test-tool bitmap list-commits | sort >bitmaps &&
-		grep "$(git rev-parse one)" bitmaps &&
-		! grep "$(git rev-parse two)" bitmaps
-	)
-'
+	test_expect_success 'write a bitmap with --refs-snapshot (preferred tips)' '
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-test_expect_success 'write a bitmap with --refs-snapshot (preferred tips)' '
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
+			test_commit_bulk --message="%s" 103 &&
 
-		test_commit_bulk --message="%s" 103 &&
+			git log --format="%H" >commits.raw &&
+			sort <commits.raw >commits &&
 
-		git log --format="%H" >commits.raw &&
-		sort <commits.raw >commits &&
+			git log --format="create refs/tags/%s %H" HEAD >refs &&
+			git update-ref --stdin <refs &&
 
-		git log --format="create refs/tags/%s %H" HEAD >refs &&
-		git update-ref --stdin <refs &&
+			git multi-pack-index write --bitmap &&
+			test_path_is_file $midx &&
+			test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
 
-		git multi-pack-index write --bitmap &&
-		test_path_is_file $midx &&
-		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+			comm -13 bitmaps commits >before &&
+			test_line_count = 1 before &&
 
-		test-tool bitmap list-commits | sort >bitmaps &&
-		comm -13 bitmaps commits >before &&
-		test_line_count = 1 before &&
+			(
+				grep -vf before commits.raw &&
+				# mark missing commits as preferred
+				sed "s/^/+/" before
+			) >snapshot &&
 
+			rm -fr $midx-$(midx_checksum $objdir).bitmap &&
+			rm -fr $midx &&
+
+			git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
+			test-tool bitmap list-commits | sort >bitmaps &&
+			comm -13 bitmaps commits >after &&
+
+			! test_cmp before after
+		)
+	'
+
+	test_expect_success 'hash-cache values are propagated from pack bitmaps' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
 		(
-			grep -vf before commits.raw &&
-			# mark missing commits as preferred
-			sed "s/^/+/" before
-		) >snapshot &&
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
-		rm -fr $midx &&
+			test_commit base &&
+			test_commit base2 &&
+			git repack -adb &&
 
-		git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
-		test-tool bitmap list-commits | sort >bitmaps &&
-		comm -13 bitmaps commits >after &&
+			test-tool bitmap dump-hashes >pack.raw &&
+			test_file_not_empty pack.raw &&
+			sort pack.raw >pack.hashes &&
 
-		! test_cmp before after
-	)
-'
+			test_commit new &&
+			git repack &&
+			git multi-pack-index write --bitmap &&
 
-test_expect_success 'hash-cache values are propagated from pack bitmaps' '
-	rm -fr repo &&
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
+			test-tool bitmap dump-hashes >midx.raw &&
+			sort midx.raw >midx.hashes &&
 
-		test_commit base &&
-		test_commit base2 &&
-		git repack -adb &&
+			# ensure that every namehash in the pack bitmap can be found in
+			# the midx bitmap (i.e., that there are no oid-namehash pairs
+			# unique to the pack bitmap).
+			comm -23 pack.hashes midx.hashes >dropped.hashes &&
+			test_must_be_empty dropped.hashes
+		)
+	'
 
-		test-tool bitmap dump-hashes >pack.raw &&
-		test_file_not_empty pack.raw &&
-		sort pack.raw >pack.hashes &&
+	test_expect_success 'no .bitmap is written without any objects' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-		test_commit new &&
-		git repack &&
-		git multi-pack-index write --bitmap &&
+			empty="$(git pack-objects $objdir/pack/pack </dev/null)" &&
+			cat >packs <<-EOF &&
+			pack-$empty.idx
+			EOF
 
-		test-tool bitmap dump-hashes >midx.raw &&
-		sort midx.raw >midx.hashes &&
+			git multi-pack-index write --bitmap --stdin-packs \
+				<packs 2>err &&
 
-		# ensure that every namehash in the pack bitmap can be found in
-		# the midx bitmap (i.e., that there are no oid-namehash pairs
-		# unique to the pack bitmap).
-		comm -23 pack.hashes midx.hashes >dropped.hashes &&
-		test_must_be_empty dropped.hashes
-	)
-'
+			grep "bitmap without any objects" err &&
 
-test_expect_success 'no .bitmap is written without any objects' '
-	rm -fr repo &&
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
+			test_path_is_file $midx &&
+			test_path_is_missing $midx-$(midx_checksum $objdir).bitmap
+		)
+	'
+
+	test_expect_success 'graceful fallback when missing reverse index' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 
-		empty="$(git pack-objects $objdir/pack/pack </dev/null)" &&
-		cat >packs <<-EOF &&
-		pack-$empty.idx
-		EOF
+			test_commit base &&
 
-		git multi-pack-index write --bitmap --stdin-packs \
-			<packs 2>err &&
+			# write a pack and MIDX bitmap containing base
+			git repack -adb &&
+			git multi-pack-index write --bitmap &&
 
-		grep "bitmap without any objects" err &&
+			GIT_TEST_MIDX_READ_RIDX=0 \
+				git rev-list --use-bitmap-index HEAD 2>err &&
+			! grep "ignoring extra bitmap file" err
+		)
+	'
+}
 
-		test_path_is_file $midx &&
-		test_path_is_missing $midx-$(midx_checksum $objdir).bitmap
-	)
-'
+test_midx_bitmap_cases
+
+test_midx_bitmap_cases "pack.writeBitmapLookupTable"
 
-test_expect_success 'graceful fallback when missing reverse index' '
+test_expect_success 'multi-pack-index write writes lookup table if enabled' '
 	rm -fr repo &&
 	git init repo &&
 	test_when_finished "rm -fr repo" &&
 	(
 		cd repo &&
-
 		test_commit base &&
-
-		# write a pack and MIDX bitmap containing base
-		git repack -adb &&
-		git multi-pack-index write --bitmap &&
-
-		GIT_TEST_MIDX_READ_RIDX=0 \
-			git rev-list --use-bitmap-index HEAD 2>err &&
-		! grep "ignoring extra bitmap file" err
+		git config pack.writeBitmapLookupTable true &&
+		git repack -ad &&
+		GIT_TRACE2_EVENT="$(pwd)/trace" \
+			git multi-pack-index write --bitmap &&
+		grep "\"label\":\"writing_lookup_table\"" trace
 	)
 '
 
diff --git a/t/t5327-multi-pack-bitmaps-rev.sh b/t/t5327-multi-pack-bitmaps-rev.sh
index d30ba632c87..e65e311cd73 100755
--- a/t/t5327-multi-pack-bitmaps-rev.sh
+++ b/t/t5327-multi-pack-bitmaps-rev.sh
@@ -17,7 +17,27 @@ GIT_TEST_MIDX_READ_RIDX=0
 export GIT_TEST_MIDX_WRITE_REV
 export GIT_TEST_MIDX_READ_RIDX
 
-midx_bitmap_core rev
-midx_bitmap_partial_tests rev
+test_midx_bitmap_rev () {
+	writeLookupTable=false
+
+	for i in "$@"
+	do
+		case $i in
+		"pack.writeBitmapLookupTable") writeLookupTable=true;;
+		esac
+	done
+
+	test_expect_success 'setup bitmap config' '
+		rm -rf * .git &&
+		git init &&
+		git config pack.writeBitmapLookupTable '"$writeLookupTable"'
+	'
+
+	midx_bitmap_core rev
+	midx_bitmap_partial_tests rev
+}
+
+test_midx_bitmap_rev
+test_midx_bitmap_rev "pack.writeBitmapLookupTable"
 
 test_done
-- 
gitgitgadget


* [PATCH v6 5/6] pack-bitmap: prepare to read lookup table extension
  2022-08-14 16:55         ` [PATCH v6 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
                             ` (3 preceding siblings ...)
  2022-08-14 16:55           ` [PATCH v6 4/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests Abhradeep Chakraborty via GitGitGadget
@ 2022-08-14 16:55           ` Abhradeep Chakraborty via GitGitGadget
  2022-08-14 16:55           ` [PATCH v6 6/6] bitmap-lookup-table: add performance tests for lookup table Abhradeep Chakraborty via GitGitGadget
                             ` (2 subsequent siblings)
  7 siblings, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-08-14 16:55 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Junio C Hamano, Derrick Stolee,
	Philip Oakley, Martin Ågren,
	Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Johannes Schindelin, Abhradeep Chakraborty,
	Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

An earlier change taught Git to write the bitmap lookup table, but
Git does not yet know how to parse it.

Teach Git to parse the bitmap lookup table when one is present. Older
versions of Git are not affected: they ignore the lookup table.

Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
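
Reviewer aid, not part of the patch: a minimal standalone sketch of how
one lookup table row could be decoded and searched, based only on the
reads in bitmap_lookup_table_get_triplet_by_pointer() below. The row
width of 16 bytes is an assumption derived from the field sizes
(4 + 8 + 4); read_be32(), read_be64(), decode_triplet(), triplet_cmp()
and find_triplet() are hypothetical names standing in for Git's helpers,
and the table is assumed to be sorted by commit position so that
bsearch() applies.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed row width: 4-byte commit_pos + 8-byte offset + 4-byte xor_row. */
#define TRIPLET_WIDTH 16

struct triplet {
	uint32_t commit_pos;
	uint64_t offset;
	uint32_t xor_row;
};

/* Portable big-endian readers (stand-ins for get_be32()/get_be64()). */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/* Decode one raw row into the struct, field by field. */
static void decode_triplet(struct triplet *t, const unsigned char *p)
{
	t->commit_pos = read_be32(p);
	t->offset = read_be64(p + 4);
	t->xor_row = read_be32(p + 12);
}

/* bsearch() comparator: key is a uint32_t, entry is a raw row. */
static int triplet_cmp(const void *key, const void *entry)
{
	uint32_t want = *(const uint32_t *)key;
	uint32_t have = read_be32(entry);

	return (want > have) - (want < have);
}

/* Returns 0 and fills `t` if `commit_pos` is in the table, -1 otherwise. */
static int find_triplet(const unsigned char *table, size_t nr,
			uint32_t commit_pos, struct triplet *t)
{
	const unsigned char *p = bsearch(&commit_pos, table, nr,
					 TRIPLET_WIDTH, triplet_cmp);

	if (!p)
		return -1;
	decode_triplet(t, p);
	return 0;
}
```

Only the first four bytes of a row are decoded during the search; the
full triplet is decoded once, after a match is found.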
 pack-bitmap.c           | 290 ++++++++++++++++++++++++++++++++++++++--
 pack-bitmap.h           |   9 ++
 t/t5310-pack-bitmaps.sh |  22 +++
 3 files changed, 312 insertions(+), 9 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index ef580be9e3f..9a208abc1fd 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -83,6 +83,12 @@ struct bitmap_index {
 	/* The checksum of the packfile or MIDX; points into map. */
 	const unsigned char *checksum;
 
+	/*
+	 * If not NULL, this points into the commit table extension
+	 * (within the memory mapped region `map`).
+	 */
+	unsigned char *table_lookup;
+
 	/*
 	 * Extended index.
 	 *
@@ -186,6 +192,16 @@ static int load_bitmap_header(struct bitmap_index *index)
 			index->hashes = (void *)(index_end - cache_size);
 			index_end -= cache_size;
 		}
+
+		if (flags & BITMAP_OPT_LOOKUP_TABLE) {
+			size_t table_size = st_mult(ntohl(header->entry_count),
+						    BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH);
+			if (table_size > index_end - index->map - header_size)
+				return error(_("corrupted bitmap index file (too short to fit lookup table)"));
+			if (git_env_bool("GIT_TEST_READ_COMMIT_TABLE", 1))
+				index->table_lookup = (void *)(index_end - table_size);
+			index_end -= table_size;
+		}
 	}
 
 	index->entry_count = ntohl(header->entry_count);
@@ -212,9 +228,11 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
 
 	hash_pos = kh_put_oid_map(index->bitmaps, stored->oid, &ret);
 
-	/* a 0 return code means the insertion succeeded with no changes,
-	 * because the SHA1 already existed on the map. this is bad, there
-	 * shouldn't be duplicated commits in the index */
+	/*
+	 * A 0 return code means the insertion succeeded with no changes,
+	 * because the SHA1 already existed on the map. This is bad, there
+	 * shouldn't be duplicated commits in the index.
+	 */
 	if (ret == 0) {
 		error(_("duplicate entry in bitmap index: '%s'"), oid_to_hex(oid));
 		return NULL;
@@ -482,7 +500,7 @@ static int load_bitmap(struct bitmap_index *bitmap_git)
 		!(bitmap_git->tags = read_bitmap_1(bitmap_git)))
 		goto failed;
 
-	if (load_bitmap_entries_v1(bitmap_git) < 0)
+	if (!bitmap_git->table_lookup && load_bitmap_entries_v1(bitmap_git) < 0)
 		goto failed;
 
 	return 0;
@@ -570,13 +588,256 @@ struct include_data {
 	struct bitmap *seen;
 };
 
+struct bitmap_lookup_table_triplet {
+	uint32_t commit_pos;
+	uint64_t offset;
+	uint32_t xor_row;
+};
+
+struct bitmap_lookup_table_xor_item {
+	struct object_id oid;
+	uint64_t offset;
+};
+
+/*
+ * Given a `triplet` struct pointer and pointer `p`, this
+ * function reads the triplet beginning at `p` into the struct.
+ * Note that this function assumes that there is enough memory
+ * left for filling the `triplet` struct from `p`.
+ */
+static int bitmap_lookup_table_get_triplet_by_pointer(struct bitmap_lookup_table_triplet *triplet,
+						      const unsigned char *p)
+{
+	if (!triplet)
+		return -1;
+
+	triplet->commit_pos = get_be32(p);
+	p += sizeof(uint32_t);
+	triplet->offset = get_be64(p);
+	p += sizeof(uint64_t);
+	triplet->xor_row = get_be32(p);
+	return 0;
+}
+
+/*
+ * This function gets the raw triplet from the `pos`'th row in the
+ * lookup table and fills that data into the `triplet`.
+ */
+static int bitmap_lookup_table_get_triplet(struct bitmap_index *bitmap_git,
+					   uint32_t pos,
+					   struct bitmap_lookup_table_triplet *triplet)
+{
+	unsigned char *p = NULL;
+	if (pos >= bitmap_git->entry_count)
+		return error(_("corrupt bitmap lookup table: triplet position out of index"));
+
+	p = bitmap_git->table_lookup + st_mult(pos, BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH);
+
+	return bitmap_lookup_table_get_triplet_by_pointer(triplet, p);
+}
+
+/*
+ * Searches for a matching triplet. `commit_pos` is a pointer
+ * to the wanted commit position value. `table_entry` points to
+ * a triplet in lookup table. The first 4 bytes of each
+ * triplet (pointed by `table_entry`) are compared with `*commit_pos`.
+ */
+static int triplet_cmp(const void *commit_pos, const void *table_entry)
+{
+
+	uint32_t a = *(uint32_t *)commit_pos;
+	uint32_t b = get_be32(table_entry);
+	if (a > b)
+		return 1;
+	else if (a < b)
+		return -1;
+
+	return 0;
+}
+
+static uint32_t bitmap_bsearch_pos(struct bitmap_index *bitmap_git,
+			    struct object_id *oid,
+			    uint32_t *result)
+{
+	int found;
+
+	if (bitmap_is_midx(bitmap_git))
+		found = bsearch_midx(oid, bitmap_git->midx, result);
+	else
+		found = bsearch_pack(oid, bitmap_git->pack, result);
+
+	return found;
+}
+
+/*
+ * `bitmap_bsearch_triplet_by_pos` searches for the raw triplet
+ * having commit position same as `commit_pos` and fills `triplet`
+ * object from the raw triplet. Returns 0 on success and -1 on
+ * failure.
+ */
+static int bitmap_bsearch_triplet_by_pos(uint32_t commit_pos,
+				  struct bitmap_index *bitmap_git,
+				  struct bitmap_lookup_table_triplet *triplet)
+{
+	unsigned char *p = bsearch(&commit_pos, bitmap_git->table_lookup, bitmap_git->entry_count,
+				   BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH, triplet_cmp);
+
+	if (!p)
+		return -1;
+
+	return bitmap_lookup_table_get_triplet_by_pointer(triplet, p);
+}
+
+static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_git,
+						    struct commit *commit)
+{
+	uint32_t commit_pos, xor_row;
+	uint64_t offset;
+	int flags;
+	struct bitmap_lookup_table_triplet triplet;
+	struct object_id *oid = &commit->object.oid;
+	struct ewah_bitmap *bitmap;
+	struct stored_bitmap *xor_bitmap = NULL;
+	const int bitmap_header_size = 6;
+	static struct bitmap_lookup_table_xor_item *xor_items = NULL;
+	static size_t xor_items_nr = 0, xor_items_alloc = 0;
+	static int is_corrupt = 0;
+	int xor_flags;
+	khiter_t hash_pos;
+	struct bitmap_lookup_table_xor_item *xor_item;
+
+	if (is_corrupt)
+		return NULL;
+
+	if (!bitmap_bsearch_pos(bitmap_git, oid, &commit_pos))
+		return NULL;
+
+	if (bitmap_bsearch_triplet_by_pos(commit_pos, bitmap_git, &triplet) < 0)
+		return NULL;
+
+	xor_items_nr = 0;
+	offset = triplet.offset;
+	xor_row = triplet.xor_row;
+
+	while (xor_row != 0xffffffff) {
+		ALLOC_GROW(xor_items, xor_items_nr + 1, xor_items_alloc);
+
+		if (xor_items_nr + 1 >= bitmap_git->entry_count) {
+			error(_("corrupt bitmap lookup table: xor chain exceeds entry count"));
+			goto corrupt;
+		}
+
+		if (bitmap_lookup_table_get_triplet(bitmap_git, xor_row, &triplet) < 0)
+			goto corrupt;
+
+		xor_item = &xor_items[xor_items_nr];
+		xor_item->offset = triplet.offset;
+
+		if (nth_bitmap_object_oid(bitmap_git, &xor_item->oid, triplet.commit_pos) < 0) {
+			error(_("corrupt bitmap lookup table: commit index %u out of range"),
+				triplet.commit_pos);
+			goto corrupt;
+		}
+
+		hash_pos = kh_get_oid_map(bitmap_git->bitmaps, xor_item->oid);
+
+		/*
+		 * If desired bitmap is already stored, we don't need
+		 * to iterate further, because we know that the bitmaps
+		 * that must be parsed in order to parse this bitmap
+		 * have already been stored. So, assign this stored bitmap
+		 * to the xor_bitmap.
+		 */
+		if (hash_pos < kh_end(bitmap_git->bitmaps) &&
+			(xor_bitmap = kh_value(bitmap_git->bitmaps, hash_pos)))
+			break;
+		xor_items_nr++;
+		xor_row = triplet.xor_row;
+	}
+
+	while (xor_items_nr) {
+		xor_item = &xor_items[xor_items_nr - 1];
+		bitmap_git->map_pos = xor_item->offset;
+		if (bitmap_git->map_size - bitmap_git->map_pos < bitmap_header_size) {
+			error(_("corrupt ewah bitmap: truncated header for bitmap of commit \"%s\""),
+				oid_to_hex(&xor_item->oid));
+			goto corrupt;
+		}
+
+		bitmap_git->map_pos += sizeof(uint32_t) + sizeof(uint8_t);
+		xor_flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
+		bitmap = read_bitmap_1(bitmap_git);
+
+		if (!bitmap)
+			goto corrupt;
+
+		xor_bitmap = store_bitmap(bitmap_git, bitmap, &xor_item->oid, xor_bitmap, xor_flags);
+		xor_items_nr--;
+	}
+
+	bitmap_git->map_pos = offset;
+	if (bitmap_git->map_size - bitmap_git->map_pos < bitmap_header_size) {
+		error(_("corrupt ewah bitmap: truncated header for bitmap of commit \"%s\""),
+			oid_to_hex(oid));
+		goto corrupt;
+	}
+
+	/*
+	 * Don't bother reading the commit's index position or its xor
+	 * offset:
+	 *
+	 *   - The commit's index position is irrelevant to us, since
+	 *     load_bitmap_entries_v1 only uses it to learn the object
+	 *     id which is used to compute the hashmap's key. We already
+	 *     have an object id, so no need to look it up again.
+	 *
+	 *   - The xor_offset is unusable for us, since it specifies how
+	 *     many entries previous to ours we should look at. This
+	 *     makes sense when reading the bitmaps sequentially (as in
+	 *     load_bitmap_entries_v1()), since we can keep track of
+	 *     each bitmap as we read them.
+	 *
+	 *     But it can't work for us, since the bitmaps don't have a
+	 *     fixed size. So we learn the position of the xor'd bitmap
+	 *     from the commit table (and resolve it to a bitmap in the
+	 *     loops above).
+	 *
+	 * Instead, we can skip ahead and immediately read the flags and
+	 * ewah bitmap.
+	 */
+	bitmap_git->map_pos += sizeof(uint32_t) + sizeof(uint8_t);
+	flags = read_u8(bitmap_git->map, &bitmap_git->map_pos);
+	bitmap = read_bitmap_1(bitmap_git);
+
+	if (!bitmap)
+		goto corrupt;
+
+	return store_bitmap(bitmap_git, bitmap, oid, xor_bitmap, flags);
+
+corrupt:
+	free(xor_items);
+	is_corrupt = 1;
+	return NULL;
+}
+
 struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
 				      struct commit *commit)
 {
 	khiter_t hash_pos = kh_get_oid_map(bitmap_git->bitmaps,
 					   commit->object.oid);
-	if (hash_pos >= kh_end(bitmap_git->bitmaps))
-		return NULL;
+	if (hash_pos >= kh_end(bitmap_git->bitmaps)) {
+		struct stored_bitmap *bitmap = NULL;
+		if (!bitmap_git->table_lookup)
+			return NULL;
+
+		trace2_region_enter("pack-bitmap", "reading_lookup_table", the_repository);
+		/* NEEDSWORK: cache misses aren't recorded */
+		bitmap = lazy_bitmap_for_commit(bitmap_git, commit);
+		trace2_region_leave("pack-bitmap", "reading_lookup_table", the_repository);
+		if (!bitmap)
+			return NULL;
+		return lookup_stored_bitmap(bitmap);
+	}
 	return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos));
 }
 
@@ -1712,8 +1973,10 @@ void test_bitmap_walk(struct rev_info *revs)
 	if (revs->pending.nr != 1)
 		die(_("you must specify exactly one commit to test"));
 
-	fprintf_ln(stderr, "Bitmap v%d test (%d entries loaded)",
-		bitmap_git->version, bitmap_git->entry_count);
+	fprintf_ln(stderr, "Bitmap v%d test (%d entries%s)",
+		bitmap_git->version,
+		bitmap_git->entry_count,
+		bitmap_git->table_lookup ? "" : " loaded");
 
 	root = revs->pending.objects[0].item;
 	bm = bitmap_for_commit(bitmap_git, (struct commit *)root);
@@ -1766,13 +2029,22 @@ void test_bitmap_walk(struct rev_info *revs)
 
 int test_bitmap_commits(struct repository *r)
 {
-	struct bitmap_index *bitmap_git = prepare_bitmap_git(r);
 	struct object_id oid;
 	MAYBE_UNUSED void *value;
+	struct bitmap_index *bitmap_git = prepare_bitmap_git(r);
 
 	if (!bitmap_git)
 		die(_("failed to load bitmap indexes"));
 
+	/*
+	 * As this function is only used to print bitmap selected
+	 * commits, we don't have to read the commit table.
+	 */
+	if (bitmap_git->table_lookup) {
+		if (load_bitmap_entries_v1(bitmap_git) < 0)
+			die(_("failed to load bitmap indexes"));
+	}
+
 	kh_foreach(bitmap_git->bitmaps, oid, value, {
 		printf_ln("%s", oid_to_hex(&oid));
 	});
diff --git a/pack-bitmap.h b/pack-bitmap.h
index cb065a263cb..f0180b5276b 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -23,6 +23,15 @@ struct bitmap_disk_header {
 
 #define NEEDS_BITMAP (1u<<22)
 
+/*
+ * The width in bytes of a single triplet in the lookup table
+ * extension:
+ *     (commit_pos, offset, xor_row)
+ *
+ * whose fields are 32-, 64-, 32- bits wide, respectively.
+ */
+#define BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH (16)
+
 enum pack_bitmap_opts {
 	BITMAP_OPT_FULL_DAG = 0x1,
 	BITMAP_OPT_HASH_CACHE = 0x4,
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index c0607172827..7e50f8e7653 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -258,6 +258,7 @@ test_bitmap_cases () {
 
 	test_expect_success 'truncated bitmap fails gracefully (ewah)' '
 		test_config pack.writebitmaphashcache false &&
+		test_config pack.writebitmaplookuptable false &&
 		git repack -ad &&
 		git rev-list --use-bitmap-index --count --all >expect &&
 		bitmap=$(ls .git/objects/pack/*.bitmap) &&
@@ -270,6 +271,7 @@ test_bitmap_cases () {
 	'
 
 	test_expect_success 'truncated bitmap fails gracefully (cache)' '
+		git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
 		git repack -ad &&
 		git rev-list --use-bitmap-index --count --all >expect &&
 		bitmap=$(ls .git/objects/pack/*.bitmap) &&
@@ -453,4 +455,24 @@ test_expect_success 'verify writing bitmap lookup table when enabled' '
 	grep "\"label\":\"writing_lookup_table\"" trace2
 '
 
+test_expect_success 'lookup table is actually used to traverse objects' '
+	git repack -adb &&
+	GIT_TRACE2_EVENT="$(pwd)/trace3" \
+		git rev-list --use-bitmap-index --count --all &&
+	grep "\"label\":\"reading_lookup_table\"" trace3
+'
+
+test_expect_success 'truncated bitmap fails gracefully (lookup table)' '
+	test_config pack.writebitmaphashcache false &&
+	git repack -adb &&
+	git rev-list --use-bitmap-index --count --all >expect &&
+	bitmap=$(ls .git/objects/pack/*.bitmap) &&
+	test_when_finished "rm -f $bitmap" &&
+	test_copy_bytes 512 <$bitmap >$bitmap.tmp &&
+	mv -f $bitmap.tmp $bitmap &&
+	git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
+	test_cmp expect actual &&
+	test_i18ngrep corrupted.bitmap.index stderr
+'
+
 test_done
-- 
gitgitgadget



* [PATCH v6 6/6] bitmap-lookup-table: add performance tests for lookup table
  2022-08-14 16:55         ` [PATCH v6 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
                             ` (4 preceding siblings ...)
  2022-08-14 16:55           ` [PATCH v6 5/6] pack-bitmap: prepare to read lookup table extension Abhradeep Chakraborty via GitGitGadget
@ 2022-08-14 16:55           ` Abhradeep Chakraborty via GitGitGadget
  2022-08-19 21:21           ` [PATCH v6 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Junio C Hamano
  2022-08-25 22:16           ` Taylor Blau
  7 siblings, 0 replies; 162+ messages in thread
From: Abhradeep Chakraborty via GitGitGadget @ 2022-08-14 16:55 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Kaartic Sivaram, Junio C Hamano, Derrick Stolee,
	Philip Oakley, Martin Ågren,
	Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Johannes Schindelin, Abhradeep Chakraborty,
	Abhradeep Chakraborty

From: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>

Add performance tests to verify the performance of the lookup table.
`p5310-pack-bitmaps.sh` contains tests with and without the lookup table.
`p5312-pack-bitmaps-revs.sh` contains the same tests with and without
the lookup table, but with `pack.writeReverseIndex` enabled.

The lookup table makes Git run faster in most cases. Below are the
results of `t/perf/p5310-pack-bitmaps.sh`;
`t/perf/p5326-multi-pack-bitmaps.sh` gives similar results. The
repository used in the test is the Linux kernel.

Test                                                    this tree
-----------------------------------------------------------------------
5310.4: enable lookup table: false                    0.01(0.00+0.00)
5310.5: repack to disk                                320.89(230.20+23.45)
5310.6: simulated clone                               14.04(5.78+1.79)
5310.7: simulated fetch                               1.95(3.05+0.20)
5310.8: pack to file (bitmap)                         44.73(20.55+7.45)
5310.9: rev-list (commits)                            0.78(0.46+0.10)
5310.10: rev-list (objects)                           4.07(3.97+0.08)
5310.11: rev-list with tag negated via --not          0.06(0.02+0.03)
         --all (objects)
5310.12: rev-list with negative tag (objects)         0.21(0.15+0.05)
5310.13: rev-list count with blob:none                0.24(0.17+0.06)
5310.14: rev-list count with blob:limit=1k            7.07(5.92+0.48)
5310.15: rev-list count with tree:0                   0.25(0.17+0.07)
5310.16: simulated partial clone                      5.67(3.28+0.64)
5310.18: clone (partial bitmap)                       16.05(8.34+1.86)
5310.19: pack to file (partial bitmap)                59.76(27.22+7.43)
5310.20: rev-list with tree filter (partial bitmap)   0.90(0.18+0.16)
5310.24: enable lookup table: true                    0.01(0.00+0.00)
5310.25: repack to disk                               319.73(229.30+23.01)
5310.26: simulated clone                              13.69(5.72+1.78)
5310.27: simulated fetch                              1.84(3.02+0.16)
5310.28: pack to file (bitmap)                        45.63(20.67+7.50)
5310.29: rev-list (commits)                           0.56(0.39+0.8)
5310.30: rev-list (objects)                           3.77(3.74+0.08)
5310.31: rev-list with tag negated via --not          0.05(0.02+0.03)
         --all (objects)
5310.32: rev-list with negative tag (objects)         0.21(0.15+0.05)
5310.33: rev-list count with blob:none                0.23(0.17+0.05)
5310.34: rev-list count with blob:limit=1k            6.65(5.72+0.40)
5310.35: rev-list count with tree:0                   0.23(0.16+0.06)
5310.36: simulated partial clone                      5.57(3.26+0.59)
5310.38: clone (partial bitmap)                       15.89(8.39+1.84)
5310.39: pack to file (partial bitmap)                58.32(27.55+7.47)
5310.40: rev-list with tree filter (partial bitmap)   0.73(0.18+0.15)

Tests 4-20 run without the lookup table. The same tests are
repeated as 24-40 with the lookup table enabled.
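
For readers who want the mechanism behind these numbers in one place:
the reader added in patch 5/6 of this series binary-searches a table of
fixed-width 16-byte triplets (commit_pos, offset, xor_row), stored in
network byte order, instead of parsing every bitmap up front. The
following is only a minimal standalone sketch of that lookup, not the
code from pack-bitmap.c; the names `triplet_sketch`, `read_be32_sketch`,
`read_be64_sketch`, `cmp_commit_pos_sketch` and `find_triplet_sketch`
are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Width of one on-disk triplet: 4 + 8 + 4 bytes, as in the patch. */
#define TRIPLET_WIDTH_SKETCH 16

/* In-memory form of one row (hypothetical name, illustration only). */
struct triplet_sketch {
	uint32_t commit_pos;
	uint64_t offset;
	uint32_t xor_row;
};

/* Decode a big-endian 32-bit value from unaligned bytes. */
uint32_t read_be32_sketch(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode a big-endian 64-bit value from unaligned bytes. */
uint64_t read_be64_sketch(const unsigned char *p)
{
	return ((uint64_t)read_be32_sketch(p) << 32) | read_be32_sketch(p + 4);
}

/* bsearch() callback: compare the key with a row's first four bytes. */
int cmp_commit_pos_sketch(const void *key, const void *row)
{
	uint32_t a = *(const uint32_t *)key;
	uint32_t b = read_be32_sketch(row);
	return (a > b) - (a < b);
}

/*
 * Look up `commit_pos` in a table of `nr` rows sorted by commit_pos.
 * Returns 0 and fills `out` on success, -1 if no row matches.
 */
int find_triplet_sketch(const unsigned char *table, size_t nr,
			uint32_t commit_pos, struct triplet_sketch *out)
{
	const unsigned char *p;

	assert(table && out);
	p = bsearch(&commit_pos, table, nr, TRIPLET_WIDTH_SKETCH,
		    cmp_commit_pos_sketch);
	if (!p)
		return -1;

	out->commit_pos = read_be32_sketch(p);
	out->offset = read_be64_sketch(p + 4);
	out->xor_row = read_be32_sketch(p + 12);
	return 0;
}
```

A caller would pass a pointer to the start of the table and the entry
count; an xor_row of 0xffffffff marks a bitmap with no xor base, as in
the patch. The cost of a lookup is logarithmic in the number of
bitmapped commits, which is the property the tests above exercise.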

Mentored-by: Taylor Blau <me@ttaylorr.com>
Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
---
 t/perf/lib-bitmap.sh               |  31 +++++++++
 t/perf/p5310-pack-bitmaps.sh       |  78 +++++++++-------------
 t/perf/p5311-pack-bitmaps-fetch.sh |  74 ++++++++++++---------
 t/perf/p5312-pack-bitmaps-revs.sh  |  35 ++++++++++
 t/perf/p5326-multi-pack-bitmaps.sh | 103 +++++++++++++++++------------
 5 files changed, 199 insertions(+), 122 deletions(-)
 create mode 100755 t/perf/p5312-pack-bitmaps-revs.sh

diff --git a/t/perf/lib-bitmap.sh b/t/perf/lib-bitmap.sh
index 63d3bc7cece..55a8feb1dc4 100644
--- a/t/perf/lib-bitmap.sh
+++ b/t/perf/lib-bitmap.sh
@@ -67,3 +67,34 @@ test_partial_bitmap () {
 			--filter=tree:0 >/dev/null
 	'
 }
+
+test_pack_bitmap () {
+	test_perf "repack to disk" '
+		git repack -ad
+	'
+
+	test_full_bitmap
+
+	test_expect_success "create partial bitmap state" '
+		# pick a commit to represent the repo tip in the past
+		cutoff=$(git rev-list HEAD~100 -1) &&
+		orig_tip=$(git rev-parse HEAD) &&
+
+		# now kill off all of the refs and pretend we had
+		# just the one tip
+		rm -rf .git/logs .git/refs/* .git/packed-refs &&
+		git update-ref HEAD $cutoff &&
+
+		# and then repack, which will leave us with a nice
+		# big bitmap pack of the "old" history, and all of
+		# the new history will be loose, as if it had been pushed
+		# up incrementally and exploded via unpack-objects
+		git repack -Ad &&
+
+		# and now restore our original tip, as if the pushes
+		# had happened
+		git update-ref HEAD $orig_tip
+	'
+
+	test_partial_bitmap
+}
diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index 7ad4f237bc3..b1399f1007e 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -4,51 +4,37 @@ test_description='Tests pack performance using bitmaps'
 . ./perf-lib.sh
 . "${TEST_DIRECTORY}/perf/lib-bitmap.sh"
 
-test_perf_large_repo
-
-# note that we do everything through config,
-# since we want to be able to compare bitmap-aware
-# git versus non-bitmap git
-#
-# We intentionally use the deprecated pack.writebitmaps
-# config so that we can test against older versions of git.
-test_expect_success 'setup bitmap config' '
-	git config pack.writebitmaps true
-'
-
-# we need to create the tag up front such that it is covered by the repack and
-# thus by generated bitmaps.
-test_expect_success 'create tags' '
-	git tag --message="tag pointing to HEAD" perf-tag HEAD
-'
-
-test_perf 'repack to disk' '
-	git repack -ad
-'
-
-test_full_bitmap
-
-test_expect_success 'create partial bitmap state' '
-	# pick a commit to represent the repo tip in the past
-	cutoff=$(git rev-list HEAD~100 -1) &&
-	orig_tip=$(git rev-parse HEAD) &&
-
-	# now kill off all of the refs and pretend we had
-	# just the one tip
-	rm -rf .git/logs .git/refs/* .git/packed-refs &&
-	git update-ref HEAD $cutoff &&
-
-	# and then repack, which will leave us with a nice
-	# big bitmap pack of the "old" history, and all of
-	# the new history will be loose, as if it had been pushed
-	# up incrementally and exploded via unpack-objects
-	git repack -Ad &&
-
-	# and now restore our original tip, as if the pushes
-	# had happened
-	git update-ref HEAD $orig_tip
-'
-
-test_partial_bitmap
+test_lookup_pack_bitmap () {
+	test_expect_success 'start the test from scratch' '
+		rm -rf * .git
+	'
+
+	test_perf_large_repo
+
+	# note that we do everything through config,
+	# since we want to be able to compare bitmap-aware
+	# git versus non-bitmap git
+	#
+	# We intentionally use the deprecated pack.writebitmaps
+	# config so that we can test against older versions of git.
+	test_expect_success 'setup bitmap config' '
+		git config pack.writebitmaps true
+	'
+
+	# we need to create the tag up front such that it is covered by the repack and
+	# thus by generated bitmaps.
+	test_expect_success 'create tags' '
+		git tag --message="tag pointing to HEAD" perf-tag HEAD
+	'
+
+	test_perf "enable lookup table: $1" '
+		git config pack.writeBitmapLookupTable '"$1"'
+	'
+
+	test_pack_bitmap
+}
+
+test_lookup_pack_bitmap false
+test_lookup_pack_bitmap true
 
 test_done
diff --git a/t/perf/p5311-pack-bitmaps-fetch.sh b/t/perf/p5311-pack-bitmaps-fetch.sh
index 47c3fd7581c..426fab87e32 100755
--- a/t/perf/p5311-pack-bitmaps-fetch.sh
+++ b/t/perf/p5311-pack-bitmaps-fetch.sh
@@ -3,42 +3,52 @@
 test_description='performance of fetches from bitmapped packs'
 . ./perf-lib.sh
 
-test_perf_default_repo
-
-test_expect_success 'create bitmapped server repo' '
-	git config pack.writebitmaps true &&
-	git repack -ad
-'
-
-# simulate a fetch from a repository that last fetched N days ago, for
-# various values of N. We do so by following the first-parent chain,
-# and assume the first entry in the chain that is N days older than the current
-# HEAD is where the HEAD would have been then.
-for days in 1 2 4 8 16 32 64 128; do
-	title=$(printf '%10s' "($days days)")
-	test_expect_success "setup revs from $days days ago" '
-		now=$(git log -1 --format=%ct HEAD) &&
-		then=$(($now - ($days * 86400))) &&
-		tip=$(git rev-list -1 --first-parent --until=$then HEAD) &&
-		{
-			echo HEAD &&
-			echo ^$tip
-		} >revs
+test_fetch_bitmaps () {
+	test_expect_success 'setup test directory' '
+		rm -fr * .git
 	'
 
-	test_perf "server $title" '
-		git pack-objects --stdout --revs \
-				 --thin --delta-base-offset \
-				 <revs >tmp.pack
-	'
+	test_perf_default_repo
 
-	test_size "size   $title" '
-		wc -c <tmp.pack
+	test_expect_success 'create bitmapped server repo' '
+		git config pack.writebitmaps true &&
+		git config pack.writeBitmapLookupTable '"$1"' &&
+		git repack -ad
 	'
 
-	test_perf "client $title" '
-		git index-pack --stdin --fix-thin <tmp.pack
-	'
-done
+	# simulate a fetch from a repository that last fetched N days ago, for
+	# various values of N. We do so by following the first-parent chain,
+	# and assume the first entry in the chain that is N days older than the current
+	# HEAD is where the HEAD would have been then.
+	for days in 1 2 4 8 16 32 64 128; do
+		title=$(printf '%10s' "($days days)")
+		test_expect_success "setup revs from $days days ago" '
+			now=$(git log -1 --format=%ct HEAD) &&
+			then=$(($now - ($days * 86400))) &&
+			tip=$(git rev-list -1 --first-parent --until=$then HEAD) &&
+			{
+				echo HEAD &&
+				echo ^$tip
+			} >revs
+		'
+
+		test_perf "server $title (lookup=$1)" '
+			git pack-objects --stdout --revs \
+					--thin --delta-base-offset \
+					<revs >tmp.pack
+		'
+
+		test_size "size   $title" '
+			wc -c <tmp.pack
+		'
+
+		test_perf "client $title (lookup=$1)" '
+			git index-pack --stdin --fix-thin <tmp.pack
+		'
+	done
+}
+
+test_fetch_bitmaps true
+test_fetch_bitmaps false
 
 test_done
diff --git a/t/perf/p5312-pack-bitmaps-revs.sh b/t/perf/p5312-pack-bitmaps-revs.sh
new file mode 100755
index 00000000000..0684b690af0
--- /dev/null
+++ b/t/perf/p5312-pack-bitmaps-revs.sh
@@ -0,0 +1,35 @@
+#!/bin/sh
+
+test_description='Tests pack performance using bitmaps (rev index enabled)'
+. ./perf-lib.sh
+. "${TEST_DIRECTORY}/perf/lib-bitmap.sh"
+
+test_lookup_pack_bitmap () {
+	test_expect_success 'start the test from scratch' '
+		rm -rf * .git
+	'
+
+	test_perf_large_repo
+
+	test_expect_success 'setup bitmap config' '
+		git config pack.writebitmaps true &&
+		git config pack.writeReverseIndex true
+	'
+
+	# we need to create the tag up front such that it is covered by the repack and
+	# thus by generated bitmaps.
+	test_expect_success 'create tags' '
+		git tag --message="tag pointing to HEAD" perf-tag HEAD
+	'
+
+	test_perf "enable lookup table: $1" '
+		git config pack.writeBitmapLookupTable '"$1"'
+	'
+
+	test_pack_bitmap
+}
+
+test_lookup_pack_bitmap false
+test_lookup_pack_bitmap true
+
+test_done
diff --git a/t/perf/p5326-multi-pack-bitmaps.sh b/t/perf/p5326-multi-pack-bitmaps.sh
index f2fa228f16a..d082e6cacbe 100755
--- a/t/perf/p5326-multi-pack-bitmaps.sh
+++ b/t/perf/p5326-multi-pack-bitmaps.sh
@@ -4,49 +4,64 @@ test_description='Tests performance using midx bitmaps'
 . ./perf-lib.sh
 . "${TEST_DIRECTORY}/perf/lib-bitmap.sh"
 
-test_perf_large_repo
-
-# we need to create the tag up front such that it is covered by the repack and
-# thus by generated bitmaps.
-test_expect_success 'create tags' '
-	git tag --message="tag pointing to HEAD" perf-tag HEAD
-'
-
-test_expect_success 'start with bitmapped pack' '
-	git repack -adb
-'
-
-test_perf 'setup multi-pack index' '
-	git multi-pack-index write --bitmap
-'
-
-test_expect_success 'drop pack bitmap' '
-	rm -f .git/objects/pack/pack-*.bitmap
-'
-
-test_full_bitmap
-
-test_expect_success 'create partial bitmap state' '
-	# pick a commit to represent the repo tip in the past
-	cutoff=$(git rev-list HEAD~100 -1) &&
-	orig_tip=$(git rev-parse HEAD) &&
-
-	# now pretend we have just one tip
-	rm -rf .git/logs .git/refs/* .git/packed-refs &&
-	git update-ref HEAD $cutoff &&
-
-	# and then repack, which will leave us with a nice
-	# big bitmap pack of the "old" history, and all of
-	# the new history will be loose, as if it had been pushed
-	# up incrementally and exploded via unpack-objects
-	git repack -Ad &&
-	git multi-pack-index write --bitmap &&
-
-	# and now restore our original tip, as if the pushes
-	# had happened
-	git update-ref HEAD $orig_tip
-'
-
-test_partial_bitmap
+test_bitmap () {
+	local enabled="$1"
+
+	test_expect_success "remove existing repo (lookup=$enabled)" '
+		rm -fr * .git
+	'
+
+	test_perf_large_repo
+
+	# we need to create the tag up front such that it is covered by the repack and
+	# thus by generated bitmaps.
+	test_expect_success 'create tags' '
+		git tag --message="tag pointing to HEAD" perf-tag HEAD
+	'
+
+	test_expect_success "use lookup table: $enabled" '
+		git config pack.writeBitmapLookupTable '"$enabled"'
+	'
+
+	test_expect_success "start with bitmapped pack (lookup=$enabled)" '
+		git repack -adb
+	'
+
+	test_perf "setup multi-pack index (lookup=$enabled)" '
+		git multi-pack-index write --bitmap
+	'
+
+	test_expect_success "drop pack bitmap (lookup=$enabled)" '
+		rm -f .git/objects/pack/pack-*.bitmap
+	'
+
+	test_full_bitmap
+
+	test_expect_success "create partial bitmap state (lookup=$enabled)" '
+		# pick a commit to represent the repo tip in the past
+		cutoff=$(git rev-list HEAD~100 -1) &&
+		orig_tip=$(git rev-parse HEAD) &&
+
+		# now pretend we have just one tip
+		rm -rf .git/logs .git/refs/* .git/packed-refs &&
+		git update-ref HEAD $cutoff &&
+
+		# and then repack, which will leave us with a nice
+		# big bitmap pack of the "old" history, and all of
+		# the new history will be loose, as if it had been pushed
+		# up incrementally and exploded via unpack-objects
+		git repack -Ad &&
+		git multi-pack-index write --bitmap &&
+
+		# and now restore our original tip, as if the pushes
+		# had happened
+		git update-ref HEAD $orig_tip
+	'
+
+	test_partial_bitmap
+}
+
+test_bitmap false
+test_bitmap true
 
 test_done
-- 
gitgitgadget


* Re: [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-08-10  9:09                           ` Johannes Schindelin
  2022-08-10  9:20                             ` Johannes Schindelin
@ 2022-08-16 18:47                             ` Taylor Blau
  1 sibling, 0 replies; 162+ messages in thread
From: Taylor Blau @ 2022-08-16 18:47 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Abhradeep Chakraborty, Abhradeep Chakraborty via GitGitGadget,
	git, Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren

On Wed, Aug 10, 2022 at 11:09:40AM +0200, Johannes Schindelin wrote:
> Hi Abhradeep,
>
> On Tue, 9 Aug 2022, Abhradeep Chakraborty wrote:
>
> >  I noticed in the 'setup partial bitmaps' test case that if we comment
> > out the line `git repack &&` , it runs successfully.
> >
> >     test_expect_success 'setup partial bitmaps' '
> >         test_commit packed &&
> >         # git repack &&
> >         test_commit loose &&
> >         git multi-pack-index write --bitmap 2>err &&
> >         ...
> >     '
>
> That's interesting. Are the `.bitmap` and `.midx` files updated as part of
> that `repack`?

They aren't. You can cause a MIDX / bitmap to be updated during `git
repack` provided that the flags `--write-midx` and
`--write-bitmap-index` are given to `repack`.

But the point of that `git repack` in this test case specifically is to
ensure that the commit generated on the previous line is included in a
new pack, and that that pack makes its way into the MIDX.

So removing that invocation of `git repack` means that the set of packs
would be unchanged, and the `git multi-pack-index write --bitmap` would
be a noop. That should rule out the theory that the existing MIDX is
broken, since without the `git repack`, we'd be using that MIDX in
subsequent tests (which pass).

Thanks,
Taylor


* Re: [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-08-13 10:59                                       ` Abhradeep Chakraborty
@ 2022-08-16 21:57                                         ` Taylor Blau
  2022-08-17 10:02                                           ` Abhradeep Chakraborty
  0 siblings, 1 reply; 162+ messages in thread
From: Taylor Blau @ 2022-08-16 21:57 UTC (permalink / raw)
  To: Abhradeep Chakraborty
  Cc: Derrick Stolee, Johannes Schindelin,
	Abhradeep Chakraborty via GitGitGadget, git, Kaartic Sivaram,
	Philip Oakley, Martin Ågren

Hi Abhradeep,

On Sat, Aug 13, 2022 at 04:29:32PM +0530, Abhradeep Chakraborty wrote:
> One thing that really worries me is what if the failure is not related
> to calling `oe_map_new_pack()? I did all my work assuming that this
> function is the culprit. But I don't know if it is.

After much consternation, I was able to rule out `oe_map_new_pack()` as
the culprit.

(Your find that we call `add_packed_git()` with arguments corresponding
to pack(s) that we've already loaded is good, and I think that is
definitely something we can and should consider cleaning up. But it
ultimately doesn't affect correctness, just the memory efficiency of the
process).

When I took a close look at the process to generate MIDX bitmaps, I found a
couple of interesting things. The first more trivial fix is that we
incorrectly propagate the "preferred"-ness bit from packs in an existing
MIDX when generating a new one. If the identity of the preferred pack
changes, we should not drag forward those bits on objects already known
(and preferred) by the existing MIDX:

--- >8 ---

diff --git a/midx.c b/midx.c
index 3ff6e91e6e..40e520534c 100644
--- a/midx.c
+++ b/midx.c
@@ -619,6 +619,9 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,

 		if (m) {
 			uint32_t start = 0, end;
+			int orig_preferred_pack = -1;
+			if (0 <= preferred_pack && preferred_pack < m->num_packs)
+				orig_preferred_pack = info[preferred_pack].orig_pack_int_id;

 			if (cur_fanout)
 				start = ntohl(m->chunk_oid_fanout[cur_fanout - 1]);
@@ -629,7 +632,7 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 				nth_midxed_pack_midx_entry(m,
 							   &entries_by_fanout[nr_fanout],
 							   cur_object);
-				if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack)
+				if (nth_midxed_pack_int_id(m, cur_object) == orig_preferred_pack)
 					entries_by_fanout[nr_fanout].preferred = 1;
 				else
 					entries_by_fanout[nr_fanout].preferred = 0;

--- 8< ---
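
The comparison the diff above switches to can be shown in isolation.
This is a hedged sketch, assuming (as the diff suggests) that
`preferred_pack` is an index into the new pack list while the ids stored
in the existing MIDX use the old numbering; `pack_info_sketch`,
`orig_preferred_pack_id_sketch` and `inherits_preferred_bit_sketch` are
hypothetical names, not functions in midx.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of the new pack list (hypothetical, illustration only). */
struct pack_info_sketch {
	uint32_t orig_pack_int_id; /* id this pack had in the existing MIDX */
};

/*
 * Translate the preferred pack's index in the new list into the id it
 * had in the existing MIDX. Returns -1 when there is no usable
 * preferred pack, mirroring the bounds check in the diff.
 */
int orig_preferred_pack_id_sketch(const struct pack_info_sketch *info,
				  int preferred_pack, uint32_t num_packs)
{
	assert(info != NULL);
	if (preferred_pack >= 0 && (uint32_t)preferred_pack < num_packs)
		return (int)info[preferred_pack].orig_pack_int_id;
	return -1;
}

/*
 * An object carried over from the existing MIDX is marked preferred
 * only if its pack id, in the OLD numbering, matches the translated id.
 */
int inherits_preferred_bit_sketch(uint32_t old_pack_int_id, int orig_preferred)
{
	return orig_preferred >= 0 &&
	       old_pack_int_id == (uint32_t)orig_preferred;
}
```

With a new list whose second pack used to be pack 0, an object recorded
under old id 0 keeps its preferred bit and an object recorded under old
id 1 does not, even though the preferred pack's new index is 1.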

But a more interesting problem arose when I took a closer look at the
pseudo-pack order of objects generated according to
`prepare_midx_packing_data()`. With Johannes' fixed $test_tick value, I
was able to see the following in runs that succeeded:

    27bb4ecd3e96cd0b3bc37d92a78cb5cbf34c418afa67f74cc52517ff7df418e1 (12 in pack-63c460f99a5c08f631396b1828c64006170a9d543b064506fd11b504a62acf52.idx)
    c68154d69c19f010afce786c6debe926ae6e7decfb946a4549085a792cf9de7e (202 in pack-63c460f99a5c08f631396b1828c64006170a9d543b064506fd11b504a62acf52.idx)
    a0b85b314ede46aa9f9b5796a284a4cf0b86ebb8fa32f87ae246e21b5378b11c (392 in pack-63c460f99a5c08f631396b1828c64006170a9d543b064506fd11b504a62acf52.idx)
    [...]

and the following in runs that failed:

    46193a971f5045cb3ca6022957541f9ccddfbfe78591d8506e2d952f8113059b (221 in pack-3fc052de674e3d48096af7cc5125675c0ae1082aa798eb9358de357b2655f9ad.idx)
    67df8a01ac84cf5f028855c48384eac3336bb02a52603bac285c4b31d66b3ab5 (12 in pack-2021cdedb33b542b244eacf3d009d1384471a53286b0c1235c91d124355dc818.idx)
    1556b5f0ad7cb0c25a1fc47355fcffc00775e90d94ae8c511e5776b204796ce6 (200 in pack-2021cdedb33b542b244eacf3d009d1384471a53286b0c1235c91d124355dc818.idx)

In the successful case, pack 63c460f99a... is preferred, and its objects
appear in ascending order of their pack offsets. But in the other case,
pack 3fc052de67... is preferred, but its first object starts at offset
221. Huh? That's not right:

    $ git show-index <.git/objects/pack/pack-3fc052de674e3d48096af7cc5125675c0ae1082aa798eb9358de357b2655f9ad.idx
    221 46193a971f5045cb3ca6022957541f9ccddfbfe78591d8506e2d952f8113059b (1f4bd28e)
    12 4d332072f161629ffe4652ecd3ce377ef88447bec73f05ab0f3515f98bd061cf (fadf885b)

Indeed, there is another object there at offset 12. Missing that object
(since it comes from a preferred pack) is an invariant violation (since
all objects from the preferred pack should be selected when multiple
copies are available).
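
To make the expected ordering concrete, here is a minimal sketch of the
pseudo-pack order as described above: objects from the preferred pack
first, then by pack, then by offset within the pack. This is not the code
in midx.c; `struct pseudo_obj`, `pseudo_pack_cmp()` and
`sort_pseudo_pack()` are hypothetical names made up for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified record; not git's actual data structure. */
struct pseudo_obj {
	uint32_t pack_id;   /* which pack the selected copy lives in */
	uint64_t offset;    /* offset of the copy within that pack */
	unsigned preferred; /* 1 if the copy comes from the preferred pack */
};

/* Preferred-pack objects first, then by pack id, then by pack offset. */
static int pseudo_pack_cmp(const void *va, const void *vb)
{
	const struct pseudo_obj *a = va, *b = vb;

	if (a->preferred != b->preferred)
		return a->preferred ? -1 : 1;
	if (a->pack_id != b->pack_id)
		return a->pack_id < b->pack_id ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

void sort_pseudo_pack(struct pseudo_obj *objs, size_t nr)
{
	qsort(objs, nr, sizeof(*objs), pseudo_pack_cmp);
}
```

Under this ordering, a preferred pack that holds objects at offsets 12
and 221 must contribute the offset-12 object first, which is exactly what
the failing run does not show.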

It's missing because the existing MIDX selects that object from a
different pack, and when we get to fanout 0x4d (the one which should
include that object), we skip over seeing its copy in the preferred pack
because that pack already appears in the existing MIDX, though it wasn't
preferred.
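
The invariant itself can be sketched as follows, assuming every copy of
every object is visible as a candidate. Again, this is only an
illustration and not the real implementation: `struct entry`,
`entry_cmp()` and `select_copies()` are hypothetical, and the object ids
are shortened strings. The bug described above corresponds to the
preferred pack's copy never making it into the candidate array in the
first place.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a MIDX entry. */
struct entry {
	char oid[8];        /* shortened object id, NUL-terminated */
	uint32_t pack_id;   /* which pack this copy lives in */
	uint64_t offset;    /* offset of the copy within that pack */
	unsigned preferred; /* 1 if pack_id is the preferred pack */
};

/* Order by object id; among copies of one object, preferred copy first. */
static int entry_cmp(const void *va, const void *vb)
{
	const struct entry *a = va, *b = vb;
	int cmp = strcmp(a->oid, b->oid);

	if (cmp)
		return cmp;
	if (a->preferred != b->preferred)
		return a->preferred ? -1 : 1;
	return (a->pack_id > b->pack_id) - (a->pack_id < b->pack_id);
}

/*
 * Keep exactly one copy of each object, compacted to the front of the
 * array. Returns the number of entries kept.
 */
size_t select_copies(struct entry *e, size_t nr, uint32_t preferred_pack)
{
	size_t i, out = 0;

	assert(e || !nr);

	/* Derive the bit from the *new* preferred pack, not from old state. */
	for (i = 0; i < nr; i++)
		e[i].preferred = (e[i].pack_id == preferred_pack);

	qsort(e, nr, sizeof(*e), entry_cmp);
	for (i = 0; i < nr; i++) {
		if (out && !strcmp(e[out - 1].oid, e[i].oid))
			continue; /* a copy of this object was already kept */
		e[out++] = e[i];
	}
	return out;
}
```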

I think there are a couple of ways to fix this. The easiest thing to do
would be to force the identity of the preferred pack to be the same when
generating a MIDX bitmap *while reusing an existing MIDX*, since that is
the only time this bug can happen.

But that's a little magical for my taste. I think a more reasonable fix
would be to include copies of all objects from the preferred pack
including in the case where that pack was non-preferred in an existing
MIDX and at least one object in that pack was selected from a different
pack in the existing MIDX.
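
Roughly, and only as a sketch of the idea rather than a patch, the second
fix changes the decision of which packs get all of their objects added as
candidates. The helper below is hypothetical (there is no
`scan_pack_directly()` in midx.c); it just spells out the condition.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: should every object of this pack be added as a
 * candidate by reading the pack's own index?
 *
 * Packs that are not yet in the existing MIDX always are. With the
 * proposed fix, the preferred pack is too, even when it already appears
 * in the existing MIDX, because the MIDX may have selected some of its
 * objects from a different pack.
 */
bool scan_pack_directly(bool in_existing_midx, bool is_preferred)
{
	if (!in_existing_midx)
		return true;
	return is_preferred;
}
```

Any extra copies this brings in would then have to lose or win against
the copy carried over from the existing MIDX, with the preferred pack's
copy winning.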

Abhradeep -- let me know if this is something you want to look into. I
think it's a very worthwhile bug to fix, since it is definitely
trigger-able in the wild (notably, only with `git multi-pack-index write
--bitmap` without `--stdin-packs` and only under certain circumstances),
and not just limited to SHA-256 mode.

If you are busy experimenting with CRoaring, that's no problem and I can
fix this up, too. Either way, it would be worth you and others weighing
in on which fix you think is worth pursuing.

Phew.

Thanks,
Taylor


* Re: [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-08-16 21:57                                         ` Taylor Blau
@ 2022-08-17 10:02                                           ` Abhradeep Chakraborty
  2022-08-17 20:38                                             ` Taylor Blau
  0 siblings, 1 reply; 162+ messages in thread
From: Abhradeep Chakraborty @ 2022-08-17 10:02 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Derrick Stolee, Johannes Schindelin,
	Abhradeep Chakraborty via GitGitGadget, git, Kaartic Sivaram,
	Philip Oakley, Martin Ågren

Hello Taylor, thank you very much for finding the reason for this failure.

On Wed, Aug 17, 2022 at 3:28 AM Taylor Blau <me@ttaylorr.com> wrote:
>
> Hi Abhradeep,
>
> When I took a close look at the process to generate MIDX bitmaps, I found a
> couple of interesting things. The first, more trivial fix is that we
> incorrectly propagate the "preferred"-ness bit from packs in an existing
> MIDX when generating a new one. If the identity of the preferred pack
> changes, we should not drag forward those bits on objects already known
> (and preferred) by the existing MIDX:
>
> --- >8 ---
>
> diff --git a/midx.c b/midx.c
> index 3ff6e91e6e..40e520534c 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -619,6 +619,9 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
>
>                 if (m) {
>                         uint32_t start = 0, end;
> +                       int orig_preferred_pack = -1;
> +                       if (0 <= preferred_pack && preferred_pack < m->num_packs)
> +                               orig_preferred_pack = info[preferred_pack].orig_pack_int_id;
>
>                         if (cur_fanout)
>                                 start = ntohl(m->chunk_oid_fanout[cur_fanout - 1]);
> @@ -629,7 +632,7 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
>                                 nth_midxed_pack_midx_entry(m,
>                                                            &entries_by_fanout[nr_fanout],
>                                                            cur_object);
> -                               if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack)
> +                               if (nth_midxed_pack_int_id(m, cur_object) == orig_preferred_pack)
>                                         entries_by_fanout[nr_fanout].preferred = 1;
>                                 else
>                                         entries_by_fanout[nr_fanout].preferred = 0;
>
> --- 8< ---

I am not able to understand this modification.
`info[preferred_pack].orig_pack_int_id` and `preferred_pack` have the
same value, right? I see `ctx.info` getting sorted only after calling
`get_sorted_entries()` function.

> But a more interesting problem arose when I took a closer look at the
> pseudo-pack order of objects generated according to
> `prepare_midx_packing_data()`. With Johannes' fixed $test_tick value, I
> was able to see the following in runs that succeeded:
>
>     27bb4ecd3e96cd0b3bc37d92a78cb5cbf34c418afa67f74cc52517ff7df418e1 (12 in pack-63c460f99a5c08f631396b1828c64006170a9d543b064506fd11b504a62acf52.idx)
>     c68154d69c19f010afce786c6debe926ae6e7decfb946a4549085a792cf9de7e (202 in pack-63c460f99a5c08f631396b1828c64006170a9d543b064506fd11b504a62acf52.idx)
>     a0b85b314ede46aa9f9b5796a284a4cf0b86ebb8fa32f87ae246e21b5378b11c (392 in pack-63c460f99a5c08f631396b1828c64006170a9d543b064506fd11b504a62acf52.idx)
>     [...]
>
> and the following in runs that failed:
>
>     46193a971f5045cb3ca6022957541f9ccddfbfe78591d8506e2d952f8113059b (221 in pack-3fc052de674e3d48096af7cc5125675c0ae1082aa798eb9358de357b2655f9ad.idx)
>     67df8a01ac84cf5f028855c48384eac3336bb02a52603bac285c4b31d66b3ab5 (12 in pack-2021cdedb33b542b244eacf3d009d1384471a53286b0c1235c91d124355dc818.idx)
>     1556b5f0ad7cb0c25a1fc47355fcffc00775e90d94ae8c511e5776b204796ce6 (200 in pack-2021cdedb33b542b244eacf3d009d1384471a53286b0c1235c91d124355dc818.idx)
>
> In the successful case, pack 63c460f99a... is preferred, and its objects
> appear in ascending order of their pack offsets. But in the other case,
> pack 3fc052de67... is preferred, but its first object starts at offset
> 221. Huh? That's not right:
>
>     $ git show-index <.git/objects/pack/pack-3fc052de674e3d48096af7cc5125675c0ae1082aa798eb9358de357b2655f9ad.idx
>     221 46193a971f5045cb3ca6022957541f9ccddfbfe78591d8506e2d952f8113059b (1f4bd28e)
>     12 4d332072f161629ffe4652ecd3ce377ef88447bec73f05ab0f3515f98bd061cf (fadf885b)
>
> Indeed, there is another object there at offset 12. Missing that object
> (since it comes from a preferred pack) is an invariant violation (since
> all objects from the preferred pack should be selected when multiple
> copies are available).
>
> It's missing because the existing MIDX selects that object from a
> different pack, and when we get to fanout 0x4d (the one which should
> include that object), we skip over seeing its copy in the preferred pack
> because that pack already appears in the existing MIDX, though it wasn't
> preferred.

Ahh, now I understand what the problem actually was. Thanks :)

> I think there are a couple of ways to fix this. The easiest thing to do
> would be to force the identity of the preferred pack to be the same when
> generating a MIDX bitmap *while reusing an existing MIDX*, since that is
> the only time this bug can happen.
>
> But that's a little magical for my taste. I think a more reasonable fix
> would be to include copies of all objects from the preferred pack
> including in the case where that pack was non-preferred in an existing
> MIDX and at least one object in that pack was selected from a different
> pack in the existing MIDX.

I think the latter approach makes the most sense to me. It might not be
a good idea to keep the same pack as `preferred` as a better candidate
would be ignored in that case.

> Abhradeep -- let me know if this is something you want to look into. I
> think it's a very worthwhile bug to fix, since it is definitely
> trigger-able in the wild (notably, only with `git multi-pack-index write
> --bitmap` without `--stdin-packs` and only under certain circumstances),
> and not just limited to SHA-256 mode.
>
> If you are busy experimenting with CRoaring, that's no problem and I can
> fix this up, too. Either way, it would be worth you and others weighing
> in on which fix you think is worth pursuing.

I will be happy to fix it, but I can't work on it right now (nor on
CRoaring) because I am currently preparing for my exam. I can continue
my work after that (i.e. from 19 Aug). If you feel it is getting too
late, then you can do this too. I am also thinking of writing a patch
for a bitmap-specific test dump tool (as Johannes proposed previously).

My exam dates are 18 Aug, 31 Aug, 1 Sep, 2 Sep and 3 Sep (I know the
dates are weird). The dates were adjusted on request for the Smart India
Hackathon (24 Aug - 27 Aug).

Thanks :)


* Re: [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-08-17 10:02                                           ` Abhradeep Chakraborty
@ 2022-08-17 20:38                                             ` Taylor Blau
  2022-08-19 21:49                                               ` Taylor Blau
  0 siblings, 1 reply; 162+ messages in thread
From: Taylor Blau @ 2022-08-17 20:38 UTC (permalink / raw)
  To: Abhradeep Chakraborty
  Cc: Derrick Stolee, Johannes Schindelin,
	Abhradeep Chakraborty via GitGitGadget, git, Kaartic Sivaram,
	Philip Oakley, Martin Ågren

On Wed, Aug 17, 2022 at 03:32:31PM +0530, Abhradeep Chakraborty wrote:
> Hello Taylor, thank you very much for finding the reason for this failure.

No problem. I appreciate all of the time and effort that you, Dscho, and
Stolee all put into looking into this (especially while I was out).

My experience with the bitmap code is that it can be somewhat difficult
to work in, because there are both (a) many ways to introduce bugs, and
(b) the effect of a bug can occur far away from the source of the bug.
Those two together make debugging difficult, at least for me.

I think that having a test-tool (like Dscho suggested) to dump some
basic information about a bitmap's structure would be quite helpful in
the future.

> On Wed, Aug 17, 2022 at 3:28 AM Taylor Blau <me@ttaylorr.com> wrote:
> I am not able to understand this modification.
> `info[preferred_pack].orig_pack_int_id` and `preferred_pack` have the
> same value, right? I see `ctx.info` getting sorted only after calling
> `get_sorted_entries()` function.

Yeah, I realized that this is bogus. For one, (as you note) those have
the same value before setting up the pack_perm array. But it also goes
against the grain of what we're trying to do: the point is that the
preferred-ness of objects in an existing MIDX should be discarded when
generating a new pseudo-pack order.

> > I think there are a couple of ways to fix this. The easiest thing to do
> > would be to force the identity of the preferred pack to be the same when
> > generating a MIDX bitmap *while reusing an existing MIDX*, since that is
> > the only time this bug can happen.
> >
> > But that's a little magical for my taste. I think a more reasonable fix
> > would be to include copies of all objects from the preferred pack
> > including in the case where that pack was non-preferred in an existing
> > MIDX and at least one object in that pack was selected from a different
> > pack in the existing MIDX.
>
> I think the latter approach makes the most sense to me. It might not be
> a good idea to keep the same pack as `preferred` as a better candidate
> would be ignored in that case.

Yep, I agree. Users should feel free to change the identity of the
preferred pack when rewriting a MIDX regardless of whether or not they
are using `--stdin-packs`.

> > Abhradeep -- let me know if this is something you want to look into. I
> > think it's a very worthwhile bug to fix, since it is definitely
> > trigger-able in the wild (notably, only with `git multi-pack-index write
> > --bitmap` without `--stdin-packs` and only under certain circumstances),
> > and not just limited to SHA-256 mode.
> >
> > If you are busy experimenting with CRoaring, that's no problem and I can
> > fix this up, too. Either way, it would be worth you and others weighing
> > in on which fix you think is worth pursuing.
>
> I will be happy to fix it, but I can't work on it right now (nor on
> CRoaring) because I am currently preparing for my exam. I can continue
> my work after that (i.e. from 19 Aug). If you feel it is getting too
> late, then you can do this too. I am also thinking of writing a patch
> for a bitmap-specific test dump tool (as Johannes proposed previously).

No problem. I wrote up some patches today myself that implement the
above fix. I haven't polished them up yet, but they are available here:

    https://github.com/ttaylorr/git/compare/master...ttaylorr:git:tb/bitmap-use-existing-preferred

I want to add a more direct reproduction that works in both SHA-1 and
SHA-256 to demonstrate that these patches fix the issue. But in the
meantime, you can use Dscho's reproduction with these patches (based on
the tip of `master`) applied on top and observe that it passes
consistently.

Thanks,
Taylor


* Re: [PATCH v6 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format
  2022-08-14 16:55         ` [PATCH v6 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
                             ` (5 preceding siblings ...)
  2022-08-14 16:55           ` [PATCH v6 6/6] bitmap-lookup-table: add performance tests for lookup table Abhradeep Chakraborty via GitGitGadget
@ 2022-08-19 21:21           ` Junio C Hamano
  2022-08-22 14:42             ` Johannes Schindelin
  2022-08-25 22:16           ` Taylor Blau
  7 siblings, 1 reply; 162+ messages in thread
From: Junio C Hamano @ 2022-08-19 21:21 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget
  Cc: git, Taylor Blau, Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren, Ævar Arnfjörð Bjarmason,
	Eric Sunshine, Johannes Schindelin, Abhradeep Chakraborty

"Abhradeep Chakraborty via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> When parsing the .bitmap file, git loads all the bitmaps one by one even if
> some of the bitmaps are not necessary. We can remove this overhead by
> loading only the necessary bitmaps. A look up table extension can solve this
> issue.
>
> Changes since v5:
>
> As the failure in the test case is not due to this code, I think it makes no
> sense to delay the patch further.
>
>  * The performance test changes were not accurate, as the second
>    test_bitmap_cases call was using the repo built for the previous call.
>    This version fixes that.
>  * Taylor suggested some minor changes. Those are addressed in this version.

The discussion on v5 was quite active, but we haven't seen any
traffic on this round.  Is everybody happy with what we see here?



* Re: [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
  2022-08-17 20:38                                             ` Taylor Blau
@ 2022-08-19 21:49                                               ` Taylor Blau
  0 siblings, 0 replies; 162+ messages in thread
From: Taylor Blau @ 2022-08-19 21:49 UTC (permalink / raw)
  To: Abhradeep Chakraborty
  Cc: Derrick Stolee, Johannes Schindelin,
	Abhradeep Chakraborty via GitGitGadget, git, Kaartic Sivaram,
	Philip Oakley, Martin Ågren

Hi Abhradeep,

On Wed, Aug 17, 2022 at 04:38:14PM -0400, Taylor Blau wrote:
> > > Abhradeep -- let me know if this is something you want to look into. I
> > > think it's a very worthwhile bug to fix, since it is definitely
> > > trigger-able in the wild (notably, only with `git multi-pack-index write
> > > --bitmap` without `--stdin-packs` and only under certain circumstances),
> > > and not just limited to SHA-256 mode.
> > >
> > > If you are busy experimenting with CRoaring, that's no problem and I can
> > > fix this up, too. Either way, it would be worth you and others weighing
> > > in on which fix you think is worth pursuing.
> >
> > I will be happy to fix it, but I can't work on it right now (nor on
> > CRoaring) because I am currently preparing for my exam. I can continue
> > my work after that (i.e. from 19 Aug). If you feel it is getting too
> > late, then you can do this too. I am also thinking of writing a patch
> > for a bitmap-specific test dump tool (as Johannes proposed previously).
>
> No problem. I wrote up some patches today myself that implement the
> above fix. I haven't polished them up yet, but they are available here:
>
>     https://github.com/ttaylorr/git/compare/master...ttaylorr:git:tb/bitmap-use-existing-preferred
>
> I want to add a more direct reproduction that works in both SHA-1 and
> SHA-256 to demonstrate that these patches fix the issue. But in the
> meantime, you can use Dscho's reproduction with these patches (based on
> the tip of `master`) applied on top and observe that it passes
> consistently.

That is now done and I sent the resulting patch series to the list,
which I'd encourage you to review here:

    https://lore.kernel.org/git/cover.1660944574.git.me@ttaylorr.com/T/#t

Phew!

Thanks,
Taylor


* Re: [PATCH v6 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format
  2022-08-19 21:21           ` [PATCH v6 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Junio C Hamano
@ 2022-08-22 14:42             ` Johannes Schindelin
  2022-08-22 14:48               ` Taylor Blau
  0 siblings, 1 reply; 162+ messages in thread
From: Johannes Schindelin @ 2022-08-22 14:42 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Taylor Blau,
	Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren, Ævar Arnfjörð Bjarmason,
	Eric Sunshine, Abhradeep Chakraborty

Hi Junio,

On Fri, 19 Aug 2022, Junio C Hamano wrote:

> "Abhradeep Chakraborty via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > When parsing the .bitmap file, git loads all the bitmaps one by one even if
> > some of the bitmaps are not necessary. We can remove this overhead by
> > loading only the necessary bitmaps. A look up table extension can solve this
> > issue.
> >
> > Changes since v5:
> >
> > As the failure in the test case is not due to this code, I think it makes no
> > sense to delay the patch further.
> >
> >  * The performance test changes were not accurate, as the second
> >    test_bitmap_cases call was using the repo built for the previous call.
> >    This version fixes that.
> >  * Taylor suggested some minor changes. Those are addressed in this version.
>
> The discussion on v5 was quite active, but we haven't seen any
> traffic on this round.  Is everybody happy with what we see here?

The part of the lively discussion in which I participated exclusively
focused on the failed CI runs and trying to get to the bottom of this bug.

Taylor contributed <cover.1660944574.git.me@ttaylorr.com> to address the
bug. While he seems grateful for my help, I am honestly puzzled because I
lack too much knowledge about the code to have been of assistance in any
meaningful way.

My participation in this thread should not be mistaken for a review: I am
woefully unfamiliar with the bitmap design (let alone code) and would
therefore not _dare_ to offer anything that I would claim is a code
review.

Ciao,
Dscho


* Re: [PATCH v6 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format
  2022-08-22 14:42             ` Johannes Schindelin
@ 2022-08-22 14:48               ` Taylor Blau
  0 siblings, 0 replies; 162+ messages in thread
From: Taylor Blau @ 2022-08-22 14:48 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Abhradeep Chakraborty via GitGitGadget, git,
	Taylor Blau, Kaartic Sivaram, Derrick Stolee, Philip Oakley,
	Martin Ågren, Ævar Arnfjörð Bjarmason,
	Eric Sunshine, Abhradeep Chakraborty

On Mon, Aug 22, 2022 at 04:42:49PM +0200, Johannes Schindelin wrote:
> Taylor contributed <cover.1660944574.git.me@ttaylorr.com> to address the
> bug. While he seems grateful for my help, I am honestly puzzled because I
> lack too much knowledge about the code to have been of assistance in any
> meaningful way.

I was grateful ;-). Your contributions were quite helpful, especially
making the bug more easily reproducible (doubly so since that test
*could* have failed on master since its introduction, but didn't).

Pinning down some of the effects of the bug and documenting those were
helpful, too.

> My participation in this thread should not be mistaken for a review: I am
> woefully unfamiliar with the bitmap design (let alone code) and would
> therefore not _dare_ to offer anything that I would claim is a code
> review.

Reviewing Abhradeep's patches is on my list of things to get to,
hopefully today. I had hoped to get to it last week after getting back
from vacation, but was stymied by the aforementioned bug.

Thanks,
Taylor


* Re: [PATCH v6 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format
  2022-08-14 16:55         ` [PATCH v6 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
                             ` (6 preceding siblings ...)
  2022-08-19 21:21           ` [PATCH v6 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Junio C Hamano
@ 2022-08-25 22:16           ` Taylor Blau
  2022-08-26 16:02             ` Junio C Hamano
  7 siblings, 1 reply; 162+ messages in thread
From: Taylor Blau @ 2022-08-25 22:16 UTC (permalink / raw)
  To: Abhradeep Chakraborty via GitGitGadget
  Cc: git, Kaartic Sivaram, Junio C Hamano, Derrick Stolee,
	Philip Oakley, Martin Ågren,
	Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Johannes Schindelin, Abhradeep Chakraborty

Hi Abhradeep,

On Sun, Aug 14, 2022 at 04:55:05PM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
> Changes since v5:
>
> As the failure in the test case is not due to this code, I think it makes no
> sense to delay the patch further.
>
>  * The performance test changes were not accurate, as the second
>    test_bitmap_cases call was using the repo built for the previous call.
>    This version fixes that.
>  * Taylor suggested some minor changes. Those are addressed in this version.

Apologies for my slow reaction time reviewing this series. Between
looking at that preferred pack bug you and Dscho spotted and catching up
after my vacation, it has taken me longer than I wanted to take a
look at this.

I read through v6 carefully and am happy with the current state of
things. I think there are some small incremental clean-ups that we could
do on top, but they need not block this series, especially since the new
code is made opt-in behind a configuration knob.

This series all looks great to me, and the performance numbers that you
achieved at the end are a nice payoff for all of your hard work. Well
done!

    Reviewed-by: Taylor Blau <me@ttaylorr.com>

Thanks,
Taylor


* Re: [PATCH v6 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format
  2022-08-25 22:16           ` Taylor Blau
@ 2022-08-26 16:02             ` Junio C Hamano
  0 siblings, 0 replies; 162+ messages in thread
From: Junio C Hamano @ 2022-08-26 16:02 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Abhradeep Chakraborty via GitGitGadget, git, Kaartic Sivaram,
	Derrick Stolee, Philip Oakley, Martin Ågren,
	Ævar Arnfjörð Bjarmason, Eric Sunshine,
	Johannes Schindelin, Abhradeep Chakraborty

Taylor Blau <me@ttaylorr.com> writes:

> This series all looks great to me, and the performance numbers that you
> achieved at the end are a nice payoff for all of your hard work. Well
> done!
>
>     Reviewed-by: Taylor Blau <me@ttaylorr.com>

Thanks, both.


end of thread, other threads:[~2022-08-26 16:02 UTC | newest]

Thread overview: 162+ messages (download: mbox.gz / follow: Atom feed)
2022-06-20 12:33 [PATCH 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
2022-06-20 12:33 ` [PATCH 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
2022-06-20 16:56   ` Derrick Stolee
2022-06-20 17:09     ` Taylor Blau
2022-06-21  8:31       ` Abhradeep Chakraborty
2022-06-22 16:26         ` Taylor Blau
2022-06-21  8:23     ` Abhradeep Chakraborty
2022-06-20 17:21   ` Taylor Blau
2022-06-21  9:22     ` Abhradeep Chakraborty
2022-06-22 16:29       ` Taylor Blau
2022-06-22 16:45         ` Abhradeep Chakraborty
2022-06-20 20:21   ` Derrick Stolee
2022-06-21 10:08     ` Abhradeep Chakraborty
2022-06-22 16:30       ` Taylor Blau
2022-06-20 12:33 ` [PATCH 2/6] pack-bitmap: prepare to read " Abhradeep Chakraborty via GitGitGadget
2022-06-20 20:49   ` Derrick Stolee
2022-06-21 10:28     ` Abhradeep Chakraborty
2022-06-20 22:06   ` Taylor Blau
2022-06-21 11:52     ` Abhradeep Chakraborty
2022-06-22 16:49       ` Taylor Blau
2022-06-22 17:18         ` Abhradeep Chakraborty
2022-06-22 21:34           ` Taylor Blau
2022-06-20 12:33 ` [PATCH 3/6] pack-bitmap-write.c: write " Abhradeep Chakraborty via GitGitGadget
2022-06-20 22:16   ` Taylor Blau
2022-06-21 12:50     ` Abhradeep Chakraborty
2022-06-22 16:51       ` Taylor Blau
2022-06-20 12:33 ` [PATCH 4/6] builtin/pack-objects.c: learn pack.writeBitmapLookupTable Taylor Blau via GitGitGadget
2022-06-20 22:18   ` Taylor Blau
2022-06-20 12:33 ` [PATCH 5/6] bitmap-commit-table: add tests for the bitmap lookup table Abhradeep Chakraborty via GitGitGadget
2022-06-22 16:54   ` Taylor Blau
2022-06-20 12:33 ` [PATCH 6/6] bitmap-lookup-table: add performance tests Abhradeep Chakraborty via GitGitGadget
2022-06-22 17:14   ` Taylor Blau
2022-06-26 13:10 ` [PATCH v2 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
2022-06-26 13:10   ` [PATCH v2 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
2022-06-27 14:18     ` Derrick Stolee
2022-06-27 15:48       ` Taylor Blau
2022-06-27 16:51       ` Abhradeep Chakraborty
2022-06-26 13:10   ` [PATCH v2 2/6] pack-bitmap-write.c: write " Abhradeep Chakraborty via GitGitGadget
2022-06-27 14:35     ` Derrick Stolee
2022-06-27 16:12       ` Taylor Blau
2022-06-27 17:10       ` Abhradeep Chakraborty
2022-06-27 16:05     ` Taylor Blau
2022-06-27 18:29       ` Abhradeep Chakraborty
2022-06-26 13:10   ` [PATCH v2 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests Abhradeep Chakraborty via GitGitGadget
2022-06-27 14:43     ` Derrick Stolee
2022-06-27 17:42       ` Abhradeep Chakraborty
2022-06-27 17:49         ` Taylor Blau
2022-06-27 17:47     ` Taylor Blau
2022-06-27 18:39       ` Abhradeep Chakraborty
2022-06-29 20:11         ` Taylor Blau
2022-06-26 13:10   ` [PATCH v2 4/6] pack-bitmap: prepare to read lookup table extension Abhradeep Chakraborty via GitGitGadget
2022-06-27 15:12     ` Derrick Stolee
2022-06-27 18:06       ` [PATCH v2 4/6] pack-bitmap: prepare to read lookup table Abhradeep Chakraborty
2022-06-27 18:32         ` Derrick Stolee
2022-06-27 21:49       ` [PATCH v2 4/6] pack-bitmap: prepare to read lookup table extension Taylor Blau
2022-06-28  8:59         ` [PATCH v2 4/6] pack-bitmap: prepare to read lookup table Abhradeep Chakraborty
2022-06-29 20:22           ` Taylor Blau
2022-06-30  6:58             ` [PATCH v2 4/6] pack-bitmap: prepare to read lookup table extension Abhradeep Chakraborty
2022-06-27 21:38     ` Taylor Blau
2022-06-28 19:25       ` Abhradeep Chakraborty
2022-06-29 20:37         ` Taylor Blau
2022-06-29 20:41           ` Taylor Blau
2022-06-30  8:35           ` Abhradeep Chakraborty
2022-06-26 13:10   ` [PATCH v2 5/6] bitmap-lookup-table: add performance tests for lookup table Abhradeep Chakraborty via GitGitGadget
2022-06-27 21:53     ` Taylor Blau
2022-06-28  7:58       ` Abhradeep Chakraborty
2022-06-29 20:40         ` Taylor Blau
2022-06-26 13:10   ` [PATCH v2 6/6] p5310-pack-bitmaps.sh: enable pack.writeReverseIndex for testing Abhradeep Chakraborty via GitGitGadget
2022-06-27 21:50     ` Taylor Blau
2022-06-28  8:01       ` Abhradeep Chakraborty
2022-07-04  8:46   ` [PATCH v3 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
2022-07-04  8:46     ` [PATCH v3 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
2022-07-08 16:38       ` Philip Oakley
2022-07-09  7:53         ` Abhradeep Chakraborty
2022-07-10 15:01           ` Philip Oakley
2022-07-14 23:15             ` Taylor Blau
2022-07-15 10:36               ` Philip Oakley
2022-07-15 18:48             ` Abhradeep Chakraborty
2022-07-04  8:46     ` [PATCH v3 2/6] pack-bitmap-write.c: write " Abhradeep Chakraborty via GitGitGadget
2022-07-14 23:26       ` Taylor Blau
2022-07-15  2:22       ` Taylor Blau
2022-07-15 15:58         ` Abhradeep Chakraborty
2022-07-15 22:15           ` Taylor Blau
2022-07-16 11:50             ` Abhradeep Chakraborty
2022-07-26  0:34               ` Taylor Blau
2022-07-18  8:59       ` Martin Ågren
2022-07-04  8:46     ` [PATCH v3 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests Abhradeep Chakraborty via GitGitGadget
2022-07-04  8:46     ` [PATCH v3 4/6] pack-bitmap: prepare to read lookup table extension Abhradeep Chakraborty via GitGitGadget
2022-07-15  2:46       ` Taylor Blau
2022-07-15 16:38         ` Abhradeep Chakraborty
2022-07-15 22:20           ` Taylor Blau
2022-07-18  9:06             ` Martin Ågren
2022-07-18 19:25               ` Abhradeep Chakraborty
2022-07-18 23:26                 ` Martin Ågren
2022-07-26  0:45               ` Taylor Blau
2022-07-04  8:46     ` [PATCH v3 5/6] bitmap-lookup-table: add performance tests for lookup table Abhradeep Chakraborty via GitGitGadget
2022-07-15  2:53       ` Taylor Blau
2022-07-15 18:23         ` Abhradeep Chakraborty
2022-07-04  8:46     ` [PATCH v3 6/6] p5310-pack-bitmaps.sh: remove pack.writeReverseIndex Abhradeep Chakraborty via GitGitGadget
2022-07-04 16:35     ` [PATCH v3 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty
2022-07-06 19:21     ` Junio C Hamano
2022-07-07  8:48       ` Abhradeep Chakraborty
2022-07-07 18:09         ` Kaartic Sivaraam
2022-07-07 18:42           ` Abhradeep Chakraborty
2022-07-20 14:05     ` [PATCH v4 " Abhradeep Chakraborty via GitGitGadget
2022-07-20 14:05       ` [PATCH v4 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
2022-07-20 14:05       ` [PATCH v4 2/6] pack-bitmap-write.c: write " Abhradeep Chakraborty via GitGitGadget
2022-07-20 14:05       ` [PATCH v4 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests Abhradeep Chakraborty via GitGitGadget
2022-07-20 14:05       ` [PATCH v4 4/6] pack-bitmap: prepare to read lookup table extension Abhradeep Chakraborty via GitGitGadget
2022-07-20 14:05       ` [PATCH v4 5/6] p5310-pack-bitmaps.sh: enable `pack.writeReverseIndex` Abhradeep Chakraborty via GitGitGadget
2022-07-20 14:05       ` [PATCH v4 6/6] bitmap-lookup-table: add performance tests for lookup table Abhradeep Chakraborty via GitGitGadget
2022-07-20 18:38       ` [PATCH v5 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
2022-07-20 18:38         ` [PATCH v5 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
2022-07-20 18:38         ` [PATCH v5 2/6] pack-bitmap-write.c: write " Abhradeep Chakraborty via GitGitGadget
2022-07-26  0:52           ` Taylor Blau
2022-07-26 18:22             ` Abhradeep Chakraborty
2022-07-20 18:38         ` [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests Abhradeep Chakraborty via GitGitGadget
2022-07-28 19:22           ` Johannes Schindelin
2022-08-02 12:40             ` Abhradeep Chakraborty
2022-08-02 15:35               ` Johannes Schindelin
2022-08-02 17:44                 ` Abhradeep Chakraborty
2022-08-08 13:06                   ` Johannes Schindelin
2022-08-08 13:58                     ` Abhradeep Chakraborty
2022-08-09  9:03                       ` Johannes Schindelin
2022-08-09 12:03                         ` Abhradeep Chakraborty
2022-08-09 12:07                           ` Abhradeep Chakraborty
2022-08-10  9:09                           ` Johannes Schindelin
2022-08-10  9:20                             ` Johannes Schindelin
2022-08-10 10:04                               ` Abhradeep Chakraborty
2022-08-10 17:51                                 ` Derrick Stolee
2022-08-12 18:51                                   ` Abhradeep Chakraborty
2022-08-12 19:22                                     ` Derrick Stolee
2022-08-13 10:59                                       ` Abhradeep Chakraborty
2022-08-16 21:57                                         ` Taylor Blau
2022-08-17 10:02                                           ` Abhradeep Chakraborty
2022-08-17 20:38                                             ` Taylor Blau
2022-08-19 21:49                                               ` Taylor Blau
2022-08-13 11:05                               ` Abhradeep Chakraborty
2022-08-16 18:47                             ` Taylor Blau
2022-07-20 18:38         ` [PATCH v5 4/6] pack-bitmap: prepare to read lookup table extension Abhradeep Chakraborty via GitGitGadget
2022-07-26  1:13           ` Taylor Blau
2022-07-26 18:56             ` Abhradeep Chakraborty
2022-07-26 19:36             ` Eric Sunshine
2022-07-20 18:38         ` [PATCH v5 5/6] p5310-pack-bitmaps.sh: enable `pack.writeReverseIndex` Abhradeep Chakraborty via GitGitGadget
2022-07-26  1:18           ` Taylor Blau
2022-07-26  7:15             ` Ævar Arnfjörð Bjarmason
2022-07-26 13:32               ` Derrick Stolee
2022-07-26 13:54                 ` Ævar Arnfjörð Bjarmason
2022-07-26 18:17                   ` Abhradeep Chakraborty
2022-07-20 18:38         ` [PATCH v5 6/6] bitmap-lookup-table: add performance tests for lookup table Abhradeep Chakraborty via GitGitGadget
2022-08-14 16:55         ` [PATCH v6 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
2022-08-14 16:55           ` [PATCH v6 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
2022-08-14 16:55           ` [PATCH v6 2/6] bitmap: move `get commit positions` code to `bitmap_writer_finish` Abhradeep Chakraborty via GitGitGadget
2022-08-14 16:55           ` [PATCH v6 3/6] pack-bitmap-write.c: write lookup table extension Abhradeep Chakraborty via GitGitGadget
2022-08-14 16:55           ` [PATCH v6 4/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests Abhradeep Chakraborty via GitGitGadget
2022-08-14 16:55           ` [PATCH v6 5/6] pack-bitmap: prepare to read lookup table extension Abhradeep Chakraborty via GitGitGadget
2022-08-14 16:55           ` [PATCH v6 6/6] bitmap-lookup-table: add performance tests for lookup table Abhradeep Chakraborty via GitGitGadget
2022-08-19 21:21           ` [PATCH v6 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Junio C Hamano
2022-08-22 14:42             ` Johannes Schindelin
2022-08-22 14:48               ` Taylor Blau
2022-08-25 22:16           ` Taylor Blau
2022-08-26 16:02             ` Junio C Hamano
