* [PATCH 00/22] multi-pack reachability bitmaps
@ 2021-04-09 18:10 Taylor Blau
  2021-04-09 18:10 ` [PATCH 01/22] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps Taylor Blau
                   ` (25 more replies)
  0 siblings, 26 replies; 273+ messages in thread
From: Taylor Blau @ 2021-04-09 18:10 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

This series implements multi-pack reachability bitmaps. It is based on
'master' after merging 'tb/pack-preferred-tips-to-give-bitmap'.

This is an extension of the classic single-pack bitmaps. Instead of
mapping between objects and bit positions according to each object's
pack-relative position, multi-pack bitmaps use each object's position in
a kind of "pseudo pack".

The pseudo pack doesn't refer to a physical packfile, but instead a
conceptual ordering of objects in a multi-pack index. This ordering is
reflected in the MIDX's .rev file, which is used extensively to power
multi-pack bitmaps.
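
As a rough illustration of the "pseudo pack" idea (not code from this
series), the sketch below sorts objects the way the documentation patch
later in the thread describes MIDX order: objects from the preferred
pack first, then by pack, then by offset within the pack. The names
`struct midx_entry`, `midx_entry_cmp()` and `assign_bit_positions()` are
hypothetical, and the numeric `pack_id` is an assumed stand-in for the
lexicographic pack-name order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one object selected by the MIDX. */
struct midx_entry {
	uint32_t pack_id;  /* assumed to already reflect pack-name order */
	int preferred;     /* 1 if the object comes from the preferred pack */
	uint64_t offset;   /* byte offset within that pack */
};

/* Order: preferred pack first, then pack id, then offset within the pack. */
static int midx_entry_cmp(const void *va, const void *vb)
{
	const struct midx_entry *a = va, *b = vb;

	if (a->preferred != b->preferred)
		return b->preferred - a->preferred;
	if (a->pack_id != b->pack_id)
		return a->pack_id < b->pack_id ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* After sorting, index i is the bit position of entries[i]. */
static void assign_bit_positions(struct midx_entry *entries, size_t nr)
{
	assert(entries || !nr);
	if (nr)
		qsort(entries, nr, sizeof(*entries), midx_entry_cmp);
}
```

In this toy model, the sorted array index plays the role of the bit
position.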

This somewhat lengthy series is organized as follows:

  - The first eight patches are cleanup and preparation.

  - The next three patches factor out functions which have different
    implementations based on whether a bitmap is tied to a pack or MIDX.

  - The next two patches implement support for reading and writing
    multi-pack bitmaps.

  - The remaining patches prepare for a new
    GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP mode of running the test
    suite, and add new tests covering the new multi-pack bitmap
    behavior.

You can experiment with the new functionality by running "git
multi-pack-index write --bitmap", which updates the multi-pack index (if
necessary), and writes out a corresponding .bitmap file. Eventually,
support for invoking the above during "git repack" will be introduced,
but this is done in a separate series.

These patches have been extracted from a version which has been running
on every repository on GitHub for the past few weeks.

Thanks in advance for your review (including on all of the many series leading
up to this one).

Jeff King (1):
  t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP

Taylor Blau (21):
  pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps
  pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
  pack-bitmap-write.c: free existing bitmaps
  Documentation: build 'technical/bitmap-format' by default
  Documentation: describe MIDX-based bitmaps
  midx: make a number of functions non-static
  midx: clear auxiliary .rev after replacing the MIDX
  midx: respect 'core.multiPackIndex' when writing
  pack-bitmap.c: introduce 'bitmap_num_objects()'
  pack-bitmap.c: introduce 'nth_bitmap_object_oid()'
  pack-bitmap.c: introduce 'bitmap_is_preferred_refname()'
  pack-bitmap: read multi-pack bitmaps
  pack-bitmap: write multi-pack bitmaps
  t5310: move some tests to lib-bitmap.sh
  t/helper/test-read-midx.c: add --checksum mode
  t5326: test multi-pack bitmap behavior
  t5319: don't write MIDX bitmaps in t5319
  t7700: update to work with MIDX bitmap test knob
  midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  p5310: extract full and partial bitmap tests
  p5326: perf tests for MIDX bitmaps

 Documentation/Makefile                       |   1 +
 Documentation/git-multi-pack-index.txt       |  12 +-
 Documentation/technical/bitmap-format.txt    |  72 ++-
 Documentation/technical/multi-pack-index.txt |  10 +-
 builtin/multi-pack-index.c                   |   2 +
 builtin/pack-objects.c                       |  15 +-
 builtin/repack.c                             |  13 +-
 ci/run-build-and-tests.sh                    |   1 +
 midx.c                                       | 216 ++++++++-
 midx.h                                       |   5 +
 pack-bitmap-write.c                          |  79 +++-
 pack-bitmap.c                                | 463 +++++++++++++++++--
 pack-bitmap.h                                |   8 +-
 packfile.c                                   |   2 +-
 t/README                                     |   4 +
 t/helper/test-read-midx.c                    |  16 +-
 t/lib-bitmap.sh                              | 216 +++++++++
 t/perf/lib-bitmap.sh                         |  69 +++
 t/perf/p5310-pack-bitmaps.sh                 |  65 +--
 t/perf/p5326-multi-pack-bitmaps.sh           |  43 ++
 t/t5310-pack-bitmaps.sh                      | 208 +--------
 t/t5319-multi-pack-index.sh                  |   3 +-
 t/t5326-multi-pack-bitmaps.sh                | 278 +++++++++++
 t/t7700-repack.sh                            |  18 +-
 24 files changed, 1435 insertions(+), 384 deletions(-)
 create mode 100644 t/perf/lib-bitmap.sh
 create mode 100755 t/perf/p5326-multi-pack-bitmaps.sh
 create mode 100755 t/t5326-multi-pack-bitmaps.sh

-- 
2.31.1.163.ga65ce7f831


* [PATCH 01/22] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
@ 2021-04-09 18:10 ` Taylor Blau
  2021-04-09 18:10 ` [PATCH 02/22] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps Taylor Blau
                   ` (24 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-04-09 18:10 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

The special `--test-bitmap` mode of `git rev-list` is used to compare
the result of an object traversal with a bitmap to check its integrity.
This mode does not, however, assert that the types of reachable objects
are stored correctly.

Harden this mode by teaching it to also check that each time an object's
bit is marked, the corresponding bit is set in exactly one of the type
bitmaps (namely, the one matching the object's true type).
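
The "exactly one type bitmap" rule can be sketched in isolation as
follows. This is a simplified model, not the code in the diff below:
each type bitmap is assumed to fit in a single 64-bit word, and
`struct toy_type_bitmaps`, `enum toy_type` and `toy_unique_type()` are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

enum toy_type { TOY_NONE = 0, TOY_COMMIT, TOY_TREE, TOY_BLOB, TOY_TAG };

/* One 64-bit word per type; bit n describes the object at position n. */
struct toy_type_bitmaps {
	uint64_t commits, trees, blobs, tags;
};

/*
 * Return the single type whose bitmap has bit `pos` set, or TOY_NONE if
 * the bit is set in no bitmap or in more than one.
 */
static enum toy_type toy_unique_type(const struct toy_type_bitmaps *tb,
				     unsigned pos)
{
	const uint64_t words[4] = { tb->commits, tb->trees, tb->blobs, tb->tags };
	enum toy_type found = TOY_NONE;
	int i, nr = 0;

	assert(pos < 64);
	for (i = 0; i < 4; i++) {
		if ((words[i] >> pos) & 1) {
			found = (enum toy_type)(i + 1);
			nr++;
		}
	}
	return nr == 1 ? found : TOY_NONE;
}
```

The patch distinguishes the "no bitmap" and "more than one bitmap" cases
with separate error messages and additionally compares the result with
the object's real type. The sketch folds both failure cases into
TOY_NONE for brevity.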

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 3ed15431cd..d45e91db1e 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1263,10 +1263,52 @@ void count_bitmap_commit_list(struct bitmap_index *bitmap_git,
 struct bitmap_test_data {
 	struct bitmap_index *bitmap_git;
 	struct bitmap *base;
+	struct bitmap *commits;
+	struct bitmap *trees;
+	struct bitmap *blobs;
+	struct bitmap *tags;
 	struct progress *prg;
 	size_t seen;
 };
 
+static void test_bitmap_type(struct bitmap_test_data *tdata,
+			     struct object *obj, int pos)
+{
+	enum object_type bitmap_type = OBJ_NONE;
+	int bitmaps_nr = 0;
+
+	if (bitmap_get(tdata->commits, pos)) {
+		bitmap_type = OBJ_COMMIT;
+		bitmaps_nr++;
+	}
+	if (bitmap_get(tdata->trees, pos)) {
+		bitmap_type = OBJ_TREE;
+		bitmaps_nr++;
+	}
+	if (bitmap_get(tdata->blobs, pos)) {
+		bitmap_type = OBJ_BLOB;
+		bitmaps_nr++;
+	}
+	if (bitmap_get(tdata->tags, pos)) {
+		bitmap_type = OBJ_TAG;
+		bitmaps_nr++;
+	}
+
+	if (!bitmap_type)
+		die("object %s not found in type bitmaps",
+		    oid_to_hex(&obj->oid));
+
+	if (bitmaps_nr > 1)
+		die("object %s does not have a unique type",
+		    oid_to_hex(&obj->oid));
+
+	if (bitmap_type != obj->type)
+		die("object %s: real type %s, expected: %s",
+		    oid_to_hex(&obj->oid),
+		    type_name(obj->type),
+		    type_name(bitmap_type));
+}
+
 static void test_show_object(struct object *object, const char *name,
 			     void *data)
 {
@@ -1276,6 +1318,7 @@ static void test_show_object(struct object *object, const char *name,
 	bitmap_pos = bitmap_position(tdata->bitmap_git, &object->oid);
 	if (bitmap_pos < 0)
 		die("Object not in bitmap: %s\n", oid_to_hex(&object->oid));
+	test_bitmap_type(tdata, object, bitmap_pos);
 
 	bitmap_set(tdata->base, bitmap_pos);
 	display_progress(tdata->prg, ++tdata->seen);
@@ -1290,6 +1333,7 @@ static void test_show_commit(struct commit *commit, void *data)
 				     &commit->object.oid);
 	if (bitmap_pos < 0)
 		die("Object not in bitmap: %s\n", oid_to_hex(&commit->object.oid));
+	test_bitmap_type(tdata, &commit->object, bitmap_pos);
 
 	bitmap_set(tdata->base, bitmap_pos);
 	display_progress(tdata->prg, ++tdata->seen);
@@ -1337,6 +1381,10 @@ void test_bitmap_walk(struct rev_info *revs)
 
 	tdata.bitmap_git = bitmap_git;
 	tdata.base = bitmap_new();
+	tdata.commits = ewah_to_bitmap(bitmap_git->commits);
+	tdata.trees = ewah_to_bitmap(bitmap_git->trees);
+	tdata.blobs = ewah_to_bitmap(bitmap_git->blobs);
+	tdata.tags = ewah_to_bitmap(bitmap_git->tags);
 	tdata.prg = start_progress("Verifying bitmap entries", result_popcnt);
 	tdata.seen = 0;
 
-- 
2.31.1.163.ga65ce7f831



* [PATCH 02/22] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
  2021-04-09 18:10 ` [PATCH 01/22] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps Taylor Blau
@ 2021-04-09 18:10 ` Taylor Blau
  2021-04-16  2:46   ` Jonathan Tan
  2021-04-09 18:10 ` [PATCH 03/22] pack-bitmap-write.c: free existing bitmaps Taylor Blau
                   ` (23 subsequent siblings)
  25 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-04-09 18:10 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

The set of objects covered by a bitmap must be closed under
reachability, since it must be the case that there is a valid bit
position assigned for every possible reachable object (otherwise the
bitmaps would be incomplete).

Pack bitmaps are never written from 'git repack' unless repacking
all-into-one, and so we never write non-closed bitmaps.

But multi-pack bitmaps change this, since it isn't known whether the
set of objects in the MIDX is closed under reachability until walking
them. Plumb through a bit that is set when a reachable object isn't
found.

As soon as a reachable object isn't found in the set of objects to
include in the bitmap, bitmap_writer_build() knows that the set is not
closed, and so it now fails gracefully.

(The new conditional in builtin/pack-objects.c:bitmap_writer_build()
guards against other failure modes, but is never triggered here, because
of the all-into-one detail above. This return value will be important to
check from the multi-pack index caller.)
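
To illustrate what "closed under reachability" means, and how a lookup
can report a miss through an out-parameter instead of dying, here is a
minimal self-contained sketch. It is a toy model under stated
assumptions: `struct toy_object`, `toy_find_pos()` and
`toy_check_closed()` are hypothetical names, and a linear search stands
in for packlist_find().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy object: an id plus the ids it points at. */
struct toy_object {
	uint32_t id;
	const uint32_t *refs;
	size_t refs_nr;
};

/* Look up `id` in the set; report success through `found` (may be NULL). */
static size_t toy_find_pos(const struct toy_object *objs, size_t nr,
			   uint32_t id, int *found)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (objs[i].id == id) {
			if (found)
				*found = 1;
			return i;
		}
	}
	if (found)
		*found = 0;
	return 0; /* position is meaningless when *found == 0 */
}

/* Return 0 if every reference resolves inside the set, -1 otherwise. */
static int toy_check_closed(const struct toy_object *objs, size_t nr)
{
	size_t i, j;

	assert(objs || !nr);
	for (i = 0; i < nr; i++) {
		for (j = 0; j < objs[i].refs_nr; j++) {
			int found;

			toy_find_pos(objs, nr, objs[i].refs[j], &found);
			if (!found)
				return -1;
		}
	}
	return 0;
}
```

Returning 0 on success and a negative value on failure matches the
"< 0" test used at the call site in builtin/pack-objects.c.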

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c |  3 +-
 pack-bitmap-write.c    | 76 +++++++++++++++++++++++++++++-------------
 pack-bitmap.h          |  2 +-
 3 files changed, 56 insertions(+), 25 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a1e33d7507..5205dde2e1 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1116,7 +1116,8 @@ static void write_pack_file(void)
 
 				bitmap_writer_show_progress(progress);
 				bitmap_writer_select_commits(indexed_commits, indexed_commits_nr, -1);
-				bitmap_writer_build(&to_pack);
+				if (bitmap_writer_build(&to_pack) < 0)
+					die(_("failed to write bitmap index"));
 				bitmap_writer_finish(written_list, nr_written,
 						     tmpname.buf, write_bitmap_options);
 				write_bitmap_index = 0;
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 88d9e696a5..e829c46649 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -125,15 +125,20 @@ static inline void push_bitmapped_commit(struct commit *commit)
 	writer.selected_nr++;
 }
 
-static uint32_t find_object_pos(const struct object_id *oid)
+static uint32_t find_object_pos(const struct object_id *oid, int *found)
 {
 	struct object_entry *entry = packlist_find(writer.to_pack, oid);
 
 	if (!entry) {
-		die("Failed to write bitmap index. Packfile doesn't have full closure "
+		if (found)
+			*found = 0;
+		warning("Failed to write bitmap index. Packfile doesn't have full closure "
 			"(object %s is missing)", oid_to_hex(oid));
+		return 0;
 	}
 
+	if (found)
+		*found = 1;
 	return oe_in_pack_pos(writer.to_pack, entry);
 }
 
@@ -331,9 +336,10 @@ static void bitmap_builder_clear(struct bitmap_builder *bb)
 	bb->commits_nr = bb->commits_alloc = 0;
 }
 
-static void fill_bitmap_tree(struct bitmap *bitmap,
-			     struct tree *tree)
+static int fill_bitmap_tree(struct bitmap *bitmap,
+			    struct tree *tree)
 {
+	int found;
 	uint32_t pos;
 	struct tree_desc desc;
 	struct name_entry entry;
@@ -342,9 +348,11 @@ static void fill_bitmap_tree(struct bitmap *bitmap,
 	 * If our bit is already set, then there is nothing to do. Both this
 	 * tree and all of its children will be set.
 	 */
-	pos = find_object_pos(&tree->object.oid);
+	pos = find_object_pos(&tree->object.oid, &found);
+	if (!found)
+		return -1;
 	if (bitmap_get(bitmap, pos))
-		return;
+		return 0;
 	bitmap_set(bitmap, pos);
 
 	if (parse_tree(tree) < 0)
@@ -355,11 +363,15 @@ static void fill_bitmap_tree(struct bitmap *bitmap,
 	while (tree_entry(&desc, &entry)) {
 		switch (object_type(entry.mode)) {
 		case OBJ_TREE:
-			fill_bitmap_tree(bitmap,
-					 lookup_tree(the_repository, &entry.oid));
+			if (fill_bitmap_tree(bitmap,
+					     lookup_tree(the_repository, &entry.oid)) < 0)
+				return -1;
 			break;
 		case OBJ_BLOB:
-			bitmap_set(bitmap, find_object_pos(&entry.oid));
+			pos = find_object_pos(&entry.oid, &found);
+			if (!found)
+				return -1;
+			bitmap_set(bitmap, pos);
 			break;
 		default:
 			/* Gitlink, etc; not reachable */
@@ -368,15 +380,18 @@ static void fill_bitmap_tree(struct bitmap *bitmap,
 	}
 
 	free_tree_buffer(tree);
+	return 0;
 }
 
-static void fill_bitmap_commit(struct bb_commit *ent,
-			       struct commit *commit,
-			       struct prio_queue *queue,
-			       struct prio_queue *tree_queue,
-			       struct bitmap_index *old_bitmap,
-			       const uint32_t *mapping)
+static int fill_bitmap_commit(struct bb_commit *ent,
+			      struct commit *commit,
+			      struct prio_queue *queue,
+			      struct prio_queue *tree_queue,
+			      struct bitmap_index *old_bitmap,
+			      const uint32_t *mapping)
 {
+	int found;
+	uint32_t pos;
 	if (!ent->bitmap)
 		ent->bitmap = bitmap_new();
 
@@ -401,11 +416,16 @@ static void fill_bitmap_commit(struct bb_commit *ent,
 		 * Mark ourselves and queue our tree. The commit
 		 * walk ensures we cover all parents.
 		 */
-		bitmap_set(ent->bitmap, find_object_pos(&c->object.oid));
+		pos = find_object_pos(&c->object.oid, &found);
+		if (!found)
+			return -1;
+		bitmap_set(ent->bitmap, pos);
 		prio_queue_put(tree_queue, get_commit_tree(c));
 
 		for (p = c->parents; p; p = p->next) {
-			int pos = find_object_pos(&p->item->object.oid);
+			pos = find_object_pos(&p->item->object.oid, &found);
+			if (!found)
+				return -1;
 			if (!bitmap_get(ent->bitmap, pos)) {
 				bitmap_set(ent->bitmap, pos);
 				prio_queue_put(queue, p->item);
@@ -413,8 +433,12 @@ static void fill_bitmap_commit(struct bb_commit *ent,
 		}
 	}
 
-	while (tree_queue->nr)
-		fill_bitmap_tree(ent->bitmap, prio_queue_get(tree_queue));
+	while (tree_queue->nr) {
+		if (fill_bitmap_tree(ent->bitmap,
+				     prio_queue_get(tree_queue)) < 0)
+			return -1;
+	}
+	return 0;
 }
 
 static void store_selected(struct bb_commit *ent, struct commit *commit)
@@ -432,7 +456,7 @@ static void store_selected(struct bb_commit *ent, struct commit *commit)
 	kh_value(writer.bitmaps, hash_pos) = stored;
 }
 
-void bitmap_writer_build(struct packing_data *to_pack)
+int bitmap_writer_build(struct packing_data *to_pack)
 {
 	struct bitmap_builder bb;
 	size_t i;
@@ -441,6 +465,7 @@ void bitmap_writer_build(struct packing_data *to_pack)
 	struct prio_queue tree_queue = { NULL };
 	struct bitmap_index *old_bitmap;
 	uint32_t *mapping;
+	int closed = 1; /* until proven otherwise */
 
 	writer.bitmaps = kh_init_oid_map();
 	writer.to_pack = to_pack;
@@ -463,8 +488,11 @@ void bitmap_writer_build(struct packing_data *to_pack)
 		struct commit *child;
 		int reused = 0;
 
-		fill_bitmap_commit(ent, commit, &queue, &tree_queue,
-				   old_bitmap, mapping);
+		if (fill_bitmap_commit(ent, commit, &queue, &tree_queue,
+				       old_bitmap, mapping) < 0) {
+			closed = 0;
+			break;
+		}
 
 		if (ent->selected) {
 			store_selected(ent, commit);
@@ -499,7 +527,9 @@ void bitmap_writer_build(struct packing_data *to_pack)
 
 	stop_progress(&writer.progress);
 
-	compute_xor_offsets();
+	if (closed)
+		compute_xor_offsets();
+	return closed ? 0 : -1;
 }
 
 /**
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 78f2b3ff79..988ed3a30d 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -86,7 +86,7 @@ struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
 				      struct commit *commit);
 void bitmap_writer_select_commits(struct commit **indexed_commits,
 		unsigned int indexed_commits_nr, int max_bitmaps);
-void bitmap_writer_build(struct packing_data *to_pack);
+int bitmap_writer_build(struct packing_data *to_pack);
 void bitmap_writer_finish(struct pack_idx_entry **index,
 			  uint32_t index_nr,
 			  const char *filename,
-- 
2.31.1.163.ga65ce7f831



* [PATCH 03/22] pack-bitmap-write.c: free existing bitmaps
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
  2021-04-09 18:10 ` [PATCH 01/22] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps Taylor Blau
  2021-04-09 18:10 ` [PATCH 02/22] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps Taylor Blau
@ 2021-04-09 18:10 ` Taylor Blau
  2021-04-09 18:10 ` [PATCH 04/22] Documentation: build 'technical/bitmap-format' by default Taylor Blau
                   ` (22 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-04-09 18:10 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

When writing a new bitmap, the bitmap writer code attempts to read the
existing bitmap (if one is present). This is done in order to quickly
permute the bits of any bitmaps for commits which appear in the existing
bitmap, and were also selected for the new bitmap.

But since this code was added in 341fa34887 (pack-bitmap-write: use
existing bitmaps, 2020-12-08), the resources associated with opening an
existing bitmap were never released.

It's fine to ignore this, but it's bad hygiene. It will also cause a
problem for the multi-pack-index builtin, which will be responsible not
only for writing bitmaps, but also for expiring any old multi-pack
bitmaps.

If an existing bitmap was reused here, it will also be expired. That
will cause a problem on platforms which require file resources to be
closed before unlinking them, like Windows. Avoid this by ensuring we
close reused bitmaps with free_bitmap_index() before removing them.
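
The close-before-unlink ordering can be shown with plain stdio. This is
only an analogy for free_bitmap_index() followed by expiring the file;
`toy_close_then_remove()` is a hypothetical helper, not a Git function.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Close `*fp` (if it is open) and then delete `path`. Closing first
 * matters on platforms that refuse to delete a file that is still open.
 * Returns 0 on success, -1 if either step fails.
 */
static int toy_close_then_remove(FILE **fp, const char *path)
{
	int ret = 0;

	assert(fp && path);
	if (*fp) {
		if (fclose(*fp) != 0)
			ret = -1;
		*fp = NULL; /* never reuse a closed handle */
	}
	if (remove(path) != 0)
		ret = -1;
	return ret;
}
```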

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index e829c46649..f90e100e3e 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -520,6 +520,7 @@ int bitmap_writer_build(struct packing_data *to_pack)
 	clear_prio_queue(&queue);
 	clear_prio_queue(&tree_queue);
 	bitmap_builder_clear(&bb);
+	free_bitmap_index(old_bitmap);
 	free(mapping);
 
 	trace2_region_leave("pack-bitmap-write", "building_bitmaps_total",
-- 
2.31.1.163.ga65ce7f831



* [PATCH 04/22] Documentation: build 'technical/bitmap-format' by default
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (2 preceding siblings ...)
  2021-04-09 18:10 ` [PATCH 03/22] pack-bitmap-write.c: free existing bitmaps Taylor Blau
@ 2021-04-09 18:10 ` Taylor Blau
  2021-04-09 18:11 ` [PATCH 05/22] Documentation: describe MIDX-based bitmaps Taylor Blau
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-04-09 18:10 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Even though the 'TECH_DOCS' variable was introduced all the way back in
5e00439f0a (Documentation: build html for all files in technical and
howto, 2012-10-23), the 'bitmap-format' document was never added to that
list when it was created.

Prepare for changes to this file by including it in the list of
technical documentation that 'make doc' will build by default.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 874a01d7a8..6d60c8c165 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -83,6 +83,7 @@ SP_ARTICLES += $(API_DOCS)
 TECH_DOCS += MyFirstContribution
 TECH_DOCS += MyFirstObjectWalk
 TECH_DOCS += SubmittingPatches
+TECH_DOCS += technical/bitmap-format
 TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
 TECH_DOCS += technical/index-format
-- 
2.31.1.163.ga65ce7f831



* [PATCH 05/22] Documentation: describe MIDX-based bitmaps
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (3 preceding siblings ...)
  2021-04-09 18:10 ` [PATCH 04/22] Documentation: build 'technical/bitmap-format' by default Taylor Blau
@ 2021-04-09 18:11 ` Taylor Blau
  2021-04-09 18:11 ` [PATCH 06/22] midx: make a number of functions non-static Taylor Blau
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-04-09 18:11 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Update the technical documentation to describe the multi-pack bitmap
format. This patch merely introduces the new format, and describes its
high-level ideas. Git does not yet know how to read or write these
multi-pack variants, and so the subsequent patches will:

  - Introduce code to interpret multi-pack bitmaps, according to this
    document.

  - Then, introduce code to write multi-pack bitmaps from the 'git
    multi-pack-index write' sub-command.

Finally, the implementation will gain tests in subsequent patches (as
opposed to inline with the patch teaching Git how to write multi-pack
bitmaps) to avoid a cyclic dependency.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/technical/bitmap-format.txt    | 72 ++++++++++++++++----
 Documentation/technical/multi-pack-index.txt | 10 +--
 2 files changed, 61 insertions(+), 21 deletions(-)

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index f8c18a0f7a..25221c7ec8 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -1,6 +1,45 @@
 GIT bitmap v1 format
 ====================
 
+== Pack and multi-pack bitmaps
+
+Bitmaps store reachability information about the set of objects in a packfile,
+or a multi-pack index (MIDX). The former is defined obviously, and the latter is
+defined as the union of objects in packs contained in the MIDX.
+
+A bitmap may belong to either one pack, or the repository's multi-pack index (if
+it exists). A repository may have at most one bitmap.
+
+An object is uniquely described by its bit position within a bitmap:
+
+	- If the bitmap belongs to a packfile, the __n__th bit corresponds to
+	the __n__th object in pack order. For a function `offset` which maps
+	objects to their byte offset within a pack, pack order is defined as
+	follows:
+
+		o1 <= o2 <==> offset(o1) <= offset(o2)
+
+	- If the bitmap belongs to a MIDX, the __n__th bit corresponds to the
+	__n__th object in MIDX order. With an additional function `pack` which
+	maps objects to the pack they were selected from by the MIDX, MIDX order
+	is defined as follows:
+
+		o1 <= o2 <==> pack(o1) <= pack(o2) /\ offset(o1) <= offset(o2)
+
+	The ordering between packs is done lexicographically by the pack name,
+	with the exception of the preferred pack, which sorts ahead of all other
+	packs.
+
+The on-disk representation (described below) of a bitmap is the same regardless
+of whether or not that bitmap belongs to a packfile or a MIDX. The only
+difference is the interpretation of the bits, which is described above.
+
+Certain bitmap extensions are supported (see: Appendix B). No extensions are
+required for bitmaps corresponding to packfiles. For bitmaps that correspond to
+MIDXs, both the bit-cache and rev-cache extensions are required.
+
+== On-disk format
+
 	- A header appears at the beginning:
 
 		4-byte signature: {'B', 'I', 'T', 'M'}
@@ -14,17 +53,19 @@ GIT bitmap v1 format
 			The following flags are supported:
 
 			- BITMAP_OPT_FULL_DAG (0x1) REQUIRED
-			This flag must always be present. It implies that the bitmap
-			index has been generated for a packfile with full closure
-			(i.e. where every single object in the packfile can find
-			 its parent links inside the same packfile). This is a
-			requirement for the bitmap index format, also present in JGit,
-			that greatly reduces the complexity of the implementation.
+			This flag must always be present. It implies that the
+			bitmap index has been generated for a packfile or
+			multi-pack index (MIDX) with full closure (i.e. where
+			every single object in the packfile/MIDX can find its
+			parent links inside the same packfile/MIDX). This is a
+			requirement for the bitmap index format, also present in
+			JGit, that greatly reduces the complexity of the
+			implementation.
 
 			- BITMAP_OPT_HASH_CACHE (0x4)
 			If present, the end of the bitmap file contains
 			`N` 32-bit name-hash values, one per object in the
-			pack. The format and meaning of the name-hash is
+			pack/MIDX. The format and meaning of the name-hash is
 			described below.
 
 		4-byte entry count (network byte order)
@@ -33,7 +74,8 @@ GIT bitmap v1 format
 
 		20-byte checksum
 
-			The SHA1 checksum of the pack this bitmap index belongs to.
+			The SHA1 checksum of the pack/MIDX this bitmap index
+			belongs to.
 
 	- 4 EWAH bitmaps that act as type indexes
 
@@ -50,7 +92,7 @@ GIT bitmap v1 format
 			- Tags
 
 		In each bitmap, the `n`th bit is set to true if the `n`th object
-		in the packfile is of that type.
+		in the packfile or multi-pack index is of that type.
 
 		The obvious consequence is that the OR of all 4 bitmaps will result
 		in a full set (all bits set), and the AND of all 4 bitmaps will
@@ -62,8 +104,9 @@ GIT bitmap v1 format
 		Each entry contains the following:
 
 		- 4-byte object position (network byte order)
-			The position **in the index for the packfile** where the
-			bitmap for this commit is found.
+			The position **in the index for the packfile or
+			multi-pack index** where the bitmap for this commit is
+			found.
 
 		- 1-byte XOR-offset
 			The xor offset used to compress this bitmap. For an entry
@@ -146,10 +189,11 @@ Name-hash cache
 ---------------
 
 If the BITMAP_OPT_HASH_CACHE flag is set, the end of the bitmap contains
-a cache of 32-bit values, one per object in the pack. The value at
+a cache of 32-bit values, one per object in the pack/MIDX. The value at
 position `i` is the hash of the pathname at which the `i`th object
-(counting in index order) in the pack can be found.  This can be fed
-into the delta heuristics to compare objects with similar pathnames.
+(counting in index or multi-pack index order) in the pack/MIDX can be found.
+This can be fed into the delta heuristics to compare objects with similar
+pathnames.
 
 The hash algorithm used is:
 
diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
index fb688976c4..1a73c3ee20 100644
--- a/Documentation/technical/multi-pack-index.txt
+++ b/Documentation/technical/multi-pack-index.txt
@@ -71,14 +71,10 @@ Future Work
   still reducing the number of binary searches required for object
   lookups.
 
-- The reachability bitmap is currently paired directly with a single
-  packfile, using the pack-order as the object order to hopefully
-  compress the bitmaps well using run-length encoding. This could be
-  extended to pair a reachability bitmap with a multi-pack-index. If
-  the multi-pack-index is extended to store a "stable object order"
+- If the multi-pack-index is extended to store a "stable object order"
   (a function Order(hash) = integer that is constant for a given hash,
-  even as the multi-pack-index is updated) then a reachability bitmap
-  could point to a multi-pack-index and be updated independently.
+  even as the multi-pack-index is updated) then MIDX bitmaps could be
+  updated independently of the MIDX.
 
 - Packfiles can be marked as "special" using empty files that share
   the initial name but replace ".pack" with ".keep" or ".promisor".
-- 
2.31.1.163.ga65ce7f831



* [PATCH 06/22] midx: make a number of functions non-static
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (4 preceding siblings ...)
  2021-04-09 18:11 ` [PATCH 05/22] Documentation: describe MIDX-based bitmaps Taylor Blau
@ 2021-04-09 18:11 ` Taylor Blau
  2021-04-09 18:11 ` [PATCH 07/22] midx: clear auxiliary .rev after replacing the MIDX Taylor Blau
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-04-09 18:11 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

These functions will be called from outside of midx.c in a subsequent
patch.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 4 ++--
 midx.h | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index 9e86583172..5249802326 100644
--- a/midx.c
+++ b/midx.c
@@ -48,12 +48,12 @@ static uint8_t oid_version(void)
 	}
 }
 
-static const unsigned char *get_midx_checksum(struct multi_pack_index *m)
+const unsigned char *get_midx_checksum(struct multi_pack_index *m)
 {
 	return m->data + m->data_len - the_hash_algo->rawsz;
 }
 
-static char *get_midx_filename(const char *object_dir)
+char *get_midx_filename(const char *object_dir)
 {
 	return xstrfmt("%s/pack/multi-pack-index", object_dir);
 }
diff --git a/midx.h b/midx.h
index 8684cf0fef..1172df1a71 100644
--- a/midx.h
+++ b/midx.h
@@ -42,6 +42,8 @@ struct multi_pack_index {
 #define MIDX_PROGRESS     (1 << 0)
 #define MIDX_WRITE_REV_INDEX (1 << 1)
 
+const unsigned char *get_midx_checksum(struct multi_pack_index *m);
+char *get_midx_filename(const char *object_dir);
 char *get_midx_rev_filename(struct multi_pack_index *m);
 
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
-- 
2.31.1.163.ga65ce7f831



* [PATCH 07/22] midx: clear auxiliary .rev after replacing the MIDX
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (5 preceding siblings ...)
  2021-04-09 18:11 ` [PATCH 06/22] midx: make a number of functions non-static Taylor Blau
@ 2021-04-09 18:11 ` Taylor Blau
  2021-04-09 18:11 ` [PATCH 08/22] midx: respect 'core.multiPackIndex' when writing Taylor Blau
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-04-09 18:11 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

When writing a new multi-pack index, write_midx_internal() attempts to
clean up any auxiliary files (currently just the MIDX's `.rev` file, but
soon to include a `.bitmap`, too) corresponding to the MIDX it's
replacing.

This step should happen after the new MIDX is written into place, since
doing so beforehand means that the old MIDX could be read without its
corresponding .rev file.
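
The ordering argument can be sketched with rename() and remove(). This
is a simplified analogy, assuming POSIX-style rename() semantics (the
destination is replaced in one step). `toy_install_then_clean()` and its
parameters are hypothetical and do not model the lock file handling in
write_midx_internal().

```c
#include <assert.h>
#include <stdio.h>

/*
 * Move `tmp_path` to `final_path`, and only then delete `stale_aux`
 * (an auxiliary file that belonged to the index being replaced).
 * Returns 0 on success, or -1 if the new index could not be installed,
 * in which case the stale auxiliary file is left alone.
 */
static int toy_install_then_clean(const char *tmp_path,
				  const char *final_path,
				  const char *stale_aux)
{
	assert(tmp_path && final_path && stale_aux);

	if (rename(tmp_path, final_path) != 0)
		return -1; /* old index is still in place and needs its .rev */

	/* Best effort: a leftover auxiliary file is harmless once unused. */
	remove(stale_aux);
	return 0;
}
```

Deleting the auxiliary file first would open a window in which a reader
sees the old index without the file it depends on. Deleting it last
avoids that window.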

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 5249802326..a24c36968d 100644
--- a/midx.c
+++ b/midx.c
@@ -1076,10 +1076,11 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 	if (flags & MIDX_WRITE_REV_INDEX)
 		write_midx_reverse_index(midx_name, midx_hash, &ctx);
-	clear_midx_files_ext(the_repository, ".rev", midx_hash);
 
 	commit_lock_file(&lk);
 
+	clear_midx_files_ext(the_repository, ".rev", midx_hash);
+
 cleanup:
 	for (i = 0; i < ctx.nr; i++) {
 		if (ctx.info[i].p) {
-- 
2.31.1.163.ga65ce7f831



* [PATCH 08/22] midx: respect 'core.multiPackIndex' when writing
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (6 preceding siblings ...)
  2021-04-09 18:11 ` [PATCH 07/22] midx: clear auxiliary .rev after replacing the MIDX Taylor Blau
@ 2021-04-09 18:11 ` Taylor Blau
  2021-04-09 18:11 ` [PATCH 09/22] pack-bitmap.c: introduce 'bitmap_num_objects()' Taylor Blau
                   ` (17 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-04-09 18:11 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

When writing a new multi-pack index, write_midx_internal() attempts to
load any existing one to fill in some pieces of information. But it uses
load_multi_pack_index(), which ignores the "core.multiPackIndex"
configuration (the setting that indicates whether or not Git is allowed
to read an existing multi-pack-index).

Replace this with a routine that does respect that setting, to avoid
reading multi-pack-index files when told not to.

This avoids a problem that would arise in subsequent patches due to the
combination of 'git repack' reopening the object store in-process and
the multi-pack index code not checking whether a pack already exists in
the object store when calling add_pack_to_midx().

This would ultimately lead to a cycle being created along the
'packed_git' struct's '->next' pointer. That is obviously bad, and it
also has hard-to-debug downstream effects, like reporting that a bitmap
can't be loaded for a pack because one already exists (for the same
pack).
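
As a rough illustration of how such a cycle can form, the sketch below
uses a toy singly-linked list. The struct and helpers are hypothetical
stand-ins, not Git's 'packed_git' or its install routines; the point is
only that pushing a node that is already on the list makes '->next'
loop back on itself:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a pack list node; only '->next' matters here. */
struct toy_pack {
	const char *name;
	struct toy_pack *next;
};

/* Push 'p' onto the front of the list without any duplicate check. */
static void install_unchecked(struct toy_pack **head, struct toy_pack *p)
{
	assert(p);
	p->next = *head;
	*head = p;
}

/* Push 'p' only if it is not already on the list; returns 1 if added. */
static int install_checked(struct toy_pack **head, struct toy_pack *p)
{
	for (struct toy_pack *cur = *head; cur; cur = cur->next)
		if (cur == p)
			return 0;
	install_unchecked(head, p);
	return 1;
}

/* Floyd's tortoise and hare: returns 1 if '->next' never reaches NULL. */
static int has_cycle(const struct toy_pack *head)
{
	const struct toy_pack *slow = head, *fast = head;

	while (fast && fast->next) {
		slow = slow->next;
		fast = fast->next->next;
		if (slow == fast)
			return 1;
	}
	return 0;
}
```

Installing nodes "a", then "b", then "a" again with
install_unchecked() yields a -> b -> a -> ..., which is the kind of
loop that makes a later list walk revisit the same pack.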

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index a24c36968d..567cdf0fcf 100644
--- a/midx.c
+++ b/midx.c
@@ -908,8 +908,18 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 	if (m)
 		ctx.m = m;
-	else
-		ctx.m = load_multi_pack_index(object_dir, 1);
+	else {
+		struct multi_pack_index *cur;
+
+		prepare_multi_pack_index_one(the_repository, object_dir, 1);
+
+		ctx.m = NULL;
+		for (cur = the_repository->objects->multi_pack_index; cur;
+		     cur = cur->next) {
+			if (!strcmp(object_dir, cur->object_dir))
+				ctx.m = cur;
+		}
+	}
 
 	ctx.nr = 0;
 	ctx.alloc = ctx.m ? ctx.m->num_packs : 16;
-- 
2.31.1.163.ga65ce7f831



* [PATCH 09/22] pack-bitmap.c: introduce 'bitmap_num_objects()'
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (7 preceding siblings ...)
  2021-04-09 18:11 ` [PATCH 08/22] midx: respect 'core.multiPackIndex' when writing Taylor Blau
@ 2021-04-09 18:11 ` Taylor Blau
  2021-04-09 18:11 ` [PATCH 10/22] pack-bitmap.c: introduce 'nth_bitmap_object_oid()' Taylor Blau
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-04-09 18:11 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

A subsequent patch to support reading MIDX bitmaps will be less noisy
after extracting a generic function to return how many objects are
contained in a bitmap.
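
For readers unfamiliar with why this count shows up in so many places:
bit positions below the object count name objects in the pack, and
positions at or above it name entries in the in-memory extended index.
A minimal sketch of that layout, using hypothetical toy types rather
than the real 'struct bitmap_index', might look like:

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for 'struct bitmap_index'; field names are made up. */
struct toy_bitmap_index {
	uint32_t num_objects; /* objects in the pack (or, later, the MIDX) */
	uint32_t ext_count;   /* objects known only to the extended index */
};

static uint32_t toy_num_objects(const struct toy_bitmap_index *index)
{
	return index->num_objects;
}

/* Bit position of the i-th object in the extended index. */
static uint32_t toy_ext_bit(const struct toy_bitmap_index *index, uint32_t i)
{
	assert(i < index->ext_count);
	return toy_num_objects(index) + i;
}

/*
 * Returns 1 and stores the extended-index slot in '*slot' if 'pos' lies
 * past the packed objects; returns 0 if 'pos' names a packed object.
 */
static int toy_bit_is_extended(const struct toy_bitmap_index *index,
			       uint32_t pos, uint32_t *slot)
{
	if (pos < toy_num_objects(index))
		return 0;
	*slot = pos - toy_num_objects(index);
	return 1;
}
```

Routing every such computation through one accessor means only that
accessor has to change once the count can come from a MIDX instead.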

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 37 +++++++++++++++++++++----------------
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index d45e91db1e..a6c616aa3e 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -136,6 +136,11 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
 	return b;
 }
 
+static uint32_t bitmap_num_objects(struct bitmap_index *index)
+{
+	return index->pack->num_objects;
+}
+
 static int load_bitmap_header(struct bitmap_index *index)
 {
 	struct bitmap_disk_header *header = (void *)index->map;
@@ -154,7 +159,7 @@ static int load_bitmap_header(struct bitmap_index *index)
 	/* Parse known bitmap format options */
 	{
 		uint32_t flags = ntohs(header->options);
-		size_t cache_size = st_mult(index->pack->num_objects, sizeof(uint32_t));
+		size_t cache_size = st_mult(bitmap_num_objects(index), sizeof(uint32_t));
 		unsigned char *index_end = index->map + index->map_size - the_hash_algo->rawsz;
 
 		if ((flags & BITMAP_OPT_FULL_DAG) == 0)
@@ -399,7 +404,7 @@ static inline int bitmap_position_extended(struct bitmap_index *bitmap_git,
 
 	if (pos < kh_end(positions)) {
 		int bitmap_pos = kh_value(positions, pos);
-		return bitmap_pos + bitmap_git->pack->num_objects;
+		return bitmap_pos + bitmap_num_objects(bitmap_git);
 	}
 
 	return -1;
@@ -451,7 +456,7 @@ static int ext_index_add_object(struct bitmap_index *bitmap_git,
 		bitmap_pos = kh_value(eindex->positions, hash_pos);
 	}
 
-	return bitmap_pos + bitmap_git->pack->num_objects;
+	return bitmap_pos + bitmap_num_objects(bitmap_git);
 }
 
 struct bitmap_show_data {
@@ -647,7 +652,7 @@ static void show_extended_objects(struct bitmap_index *bitmap_git,
 	for (i = 0; i < eindex->count; ++i) {
 		struct object *obj;
 
-		if (!bitmap_get(objects, bitmap_git->pack->num_objects + i))
+		if (!bitmap_get(objects, bitmap_num_objects(bitmap_git) + i))
 			continue;
 
 		obj = eindex->objects[i];
@@ -808,7 +813,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
 	 * individually.
 	 */
 	for (i = 0; i < eindex->count; i++) {
-		uint32_t pos = i + bitmap_git->pack->num_objects;
+		uint32_t pos = i + bitmap_num_objects(bitmap_git);
 		if (eindex->objects[i]->type == type &&
 		    bitmap_get(to_filter, pos) &&
 		    !bitmap_get(tips, pos))
@@ -835,7 +840,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
 
 	oi.sizep = &size;
 
-	if (pos < pack->num_objects) {
+	if (pos < bitmap_num_objects(bitmap_git)) {
 		off_t ofs = pack_pos_to_offset(pack, pos);
 		if (packed_object_info(the_repository, pack, ofs, &oi) < 0) {
 			struct object_id oid;
@@ -845,7 +850,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
 		}
 	} else {
 		struct eindex *eindex = &bitmap_git->ext_index;
-		struct object *obj = eindex->objects[pos - pack->num_objects];
+		struct object *obj = eindex->objects[pos - bitmap_num_objects(bitmap_git)];
 		if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0)
 			die(_("unable to get size of %s"), oid_to_hex(&obj->oid));
 	}
@@ -887,7 +892,7 @@ static void filter_bitmap_blob_limit(struct bitmap_index *bitmap_git,
 	}
 
 	for (i = 0; i < eindex->count; i++) {
-		uint32_t pos = i + bitmap_git->pack->num_objects;
+		uint32_t pos = i + bitmap_num_objects(bitmap_git);
 		if (eindex->objects[i]->type == OBJ_BLOB &&
 		    bitmap_get(to_filter, pos) &&
 		    !bitmap_get(tips, pos) &&
@@ -1075,8 +1080,8 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 	enum object_type type;
 	unsigned long size;
 
-	if (pos >= bitmap_git->pack->num_objects)
-		return; /* not actually in the pack */
+	if (pos >= bitmap_num_objects(bitmap_git))
+		return; /* not actually in the pack or MIDX */
 
 	offset = header = pack_pos_to_offset(bitmap_git->pack, pos);
 	type = unpack_object_header(bitmap_git->pack, w_curs, &offset, &size);
@@ -1142,6 +1147,7 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 	struct pack_window *w_curs = NULL;
 	size_t i = 0;
 	uint32_t offset;
+	uint32_t objects_nr = bitmap_num_objects(bitmap_git);
 
 	assert(result);
 
@@ -1149,8 +1155,8 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 		i++;
 
 	/* Don't mark objects not in the packfile */
-	if (i > bitmap_git->pack->num_objects / BITS_IN_EWORD)
-		i = bitmap_git->pack->num_objects / BITS_IN_EWORD;
+	if (i > objects_nr / BITS_IN_EWORD)
+		i = objects_nr / BITS_IN_EWORD;
 
 	reuse = bitmap_word_alloc(i);
 	memset(reuse->words, 0xFF, i * sizeof(eword_t));
@@ -1234,7 +1240,7 @@ static uint32_t count_object_type(struct bitmap_index *bitmap_git,
 
 	for (i = 0; i < eindex->count; ++i) {
 		if (eindex->objects[i]->type == type &&
-			bitmap_get(objects, bitmap_git->pack->num_objects + i))
+			bitmap_get(objects, bitmap_num_objects(bitmap_git) + i))
 			count++;
 	}
 
@@ -1455,7 +1461,7 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
 	uint32_t i, num_objects;
 	uint32_t *reposition;
 
-	num_objects = bitmap_git->pack->num_objects;
+	num_objects = bitmap_num_objects(bitmap_git);
 	CALLOC_ARRAY(reposition, num_objects);
 
 	for (i = 0; i < num_objects; ++i) {
@@ -1538,7 +1544,6 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
 static off_t get_disk_usage_for_extended(struct bitmap_index *bitmap_git)
 {
 	struct bitmap *result = bitmap_git->result;
-	struct packed_git *pack = bitmap_git->pack;
 	struct eindex *eindex = &bitmap_git->ext_index;
 	off_t total = 0;
 	struct object_info oi = OBJECT_INFO_INIT;
@@ -1550,7 +1555,7 @@ static off_t get_disk_usage_for_extended(struct bitmap_index *bitmap_git)
 	for (i = 0; i < eindex->count; i++) {
 		struct object *obj = eindex->objects[i];
 
-		if (!bitmap_get(result, pack->num_objects + i))
+		if (!bitmap_get(result, bitmap_num_objects(bitmap_git) + i))
 			continue;
 
 		if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0)
-- 
2.31.1.163.ga65ce7f831



* [PATCH 10/22] pack-bitmap.c: introduce 'nth_bitmap_object_oid()'
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (8 preceding siblings ...)
  2021-04-09 18:11 ` [PATCH 09/22] pack-bitmap.c: introduce 'bitmap_num_objects()' Taylor Blau
@ 2021-04-09 18:11 ` Taylor Blau
  2021-04-09 18:11 ` [PATCH 11/22] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()' Taylor Blau
                   ` (15 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-04-09 18:11 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

A subsequent patch to support reading MIDX bitmaps will be less noisy
after extracting a generic function to fetch the nth OID contained in
the bitmap.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index a6c616aa3e..97ee2d331d 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -223,6 +223,13 @@ static inline uint8_t read_u8(const unsigned char *buffer, size_t *pos)
 
 #define MAX_XOR_OFFSET 160
 
+static void nth_bitmap_object_oid(struct bitmap_index *index,
+				  struct object_id *oid,
+				  uint32_t n)
+{
+	nth_packed_object_id(oid, index->pack, n);
+}
+
 static int load_bitmap_entries_v1(struct bitmap_index *index)
 {
 	uint32_t i;
@@ -242,9 +249,7 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 		xor_offset = read_u8(index->map, &index->map_pos);
 		flags = read_u8(index->map, &index->map_pos);
 
-		if (nth_packed_object_id(&oid, index->pack, commit_idx_pos) < 0)
-			return error("corrupt ewah bitmap: commit index %u out of range",
-				     (unsigned)commit_idx_pos);
+		nth_bitmap_object_oid(index, &oid, commit_idx_pos);
 
 		bitmap = read_bitmap_1(index);
 		if (!bitmap)
@@ -844,8 +849,8 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
 		off_t ofs = pack_pos_to_offset(pack, pos);
 		if (packed_object_info(the_repository, pack, ofs, &oi) < 0) {
 			struct object_id oid;
-			nth_packed_object_id(&oid, pack,
-					     pack_pos_to_index(pack, pos));
+			nth_bitmap_object_oid(bitmap_git, &oid,
+					      pack_pos_to_index(pack, pos));
 			die(_("unable to get size of %s"), oid_to_hex(&oid));
 		}
 	} else {
-- 
2.31.1.163.ga65ce7f831



* [PATCH 11/22] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()'
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (9 preceding siblings ...)
  2021-04-09 18:11 ` [PATCH 10/22] pack-bitmap.c: introduce 'nth_bitmap_object_oid()' Taylor Blau
@ 2021-04-09 18:11 ` Taylor Blau
  2021-04-09 18:11 ` [PATCH 12/22] pack-bitmap: read multi-pack bitmaps Taylor Blau
                   ` (14 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-04-09 18:11 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

In a recent commit, pack-objects learned support for the
'pack.preferBitmapTips' configuration. This patch prepares the
multi-pack bitmap code to respect this configuration, too.

Since the multi-pack bitmap code already does a traversal of all
references (in order to discover the set of reachable commits in the
multi-pack index), it is more efficient to check whether or not any
value of 'pack.preferBitmapTips' is a prefix of each reference during
that traversal rather than do an additional traversal.

Implement a function 'bitmap_is_preferred_refname()' which does just
that. The caller will be added in a subsequent patch.
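
The check itself is a plain prefix match of each configured value
against the refname. A standalone sketch of the same idea follows; the
helper names and the array-based config are hypothetical, and
has_prefix() merely stands in for Git's starts_with():

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if 'str' begins with 'prefix'. */
static int has_prefix(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/* Returns 1 if any of the 'nr' configured tips is a prefix of 'refname'. */
static int toy_is_preferred_refname(const char *refname,
				    const char **tips, size_t nr)
{
	assert(refname);
	for (size_t i = 0; i < nr; i++)
		if (has_prefix(refname, tips[i]))
			return 1;
	return 0;
}
```

With tips of "refs/heads/" and "refs/tags/v", the refname
"refs/heads/main" matches, while "refs/remotes/origin/main" does not.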

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 16 ++++++++++++++++
 pack-bitmap.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 97ee2d331d..be52570b0f 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1594,3 +1594,19 @@ const struct string_list *bitmap_preferred_tips(struct repository *r)
 {
 	return repo_config_get_value_multi(r, "pack.preferbitmaptips");
 }
+
+int bitmap_is_preferred_refname(struct repository *r, const char *refname)
+{
+	const struct string_list *preferred_tips = bitmap_preferred_tips(r);
+	struct string_list_item *item;
+
+	if (!preferred_tips)
+		return 0;
+
+	for_each_string_list_item(item, preferred_tips) {
+		if (starts_with(refname, item->string))
+			return 1;
+	}
+
+	return 0;
+}
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 988ed3a30d..0bf75ff2a7 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -93,5 +93,6 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 			  uint16_t options);
 
 const struct string_list *bitmap_preferred_tips(struct repository *r);
+int bitmap_is_preferred_refname(struct repository *r, const char *refname);
 
 #endif
-- 
2.31.1.163.ga65ce7f831



* [PATCH 12/22] pack-bitmap: read multi-pack bitmaps
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (10 preceding siblings ...)
  2021-04-09 18:11 ` [PATCH 11/22] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()' Taylor Blau
@ 2021-04-09 18:11 ` Taylor Blau
  2021-04-16  2:39   ` Jonathan Tan
  2021-04-09 18:11 ` [PATCH 13/22] pack-bitmap: write " Taylor Blau
                   ` (13 subsequent siblings)
  25 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-04-09 18:11 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

This prepares the code in pack-bitmap to interpret the new multi-pack
bitmaps described in Documentation/technical/bitmap-format.txt, which
mostly involves converting bit positions to accommodate looking them up
in a MIDX.
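
As a sketch of what "converting bit positions" means here: a bit
position is an index into the pseudo-pack order, while MIDX lookups use
the OID-sorted position. The toy code below builds both directions of
that mapping. It assumes the pseudo-pack order is "preferred pack
first, then pack id, then offset"; only the preferred-pack-first part
is stated in this patch, the rest is an assumption, and all names are
hypothetical. The two arrays play roughly the roles of
pack_pos_to_midx() and midx_to_pack_pos():

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct toy_entry {
	uint32_t midx_pos; /* position in the MIDX's OID-sorted order */
	uint32_t pack_id;
	uint64_t offset;
};

/* qsort() has no context argument, so the comparator reads this. */
static uint32_t toy_preferred_pack;

static int pseudo_pack_cmp(const void *va, const void *vb)
{
	const struct toy_entry *a = va, *b = vb;
	int a_pref = a->pack_id == toy_preferred_pack;
	int b_pref = b->pack_id == toy_preferred_pack;

	if (a_pref != b_pref)
		return b_pref - a_pref; /* preferred pack sorts first */
	if (a->pack_id != b->pack_id)
		return a->pack_id < b->pack_id ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/*
 * Sorts 'entries' into pseudo-pack order and fills
 * bit_to_midx[bit] = midx_pos and midx_to_bit[midx_pos] = bit.
 * Every midx_pos must be unique and smaller than 'nr'.
 */
static void build_toy_rev(struct toy_entry *entries, uint32_t nr,
			  uint32_t preferred_pack,
			  uint32_t *bit_to_midx, uint32_t *midx_to_bit)
{
	toy_preferred_pack = preferred_pack;
	qsort(entries, nr, sizeof(*entries), pseudo_pack_cmp);

	for (uint32_t bit = 0; bit < nr; bit++) {
		assert(entries[bit].midx_pos < nr);
		bit_to_midx[bit] = entries[bit].midx_pos;
		midx_to_bit[entries[bit].midx_pos] = bit;
	}
}
```

Because the preferred pack's objects occupy the lowest bit positions, a
reader can treat the first run of bits as belonging to a single pack.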

Note that nothing writes multi-pack bitmaps yet; support for writing
them is implemented in the subsequent commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c |  12 +-
 pack-bitmap-write.c    |   2 +-
 pack-bitmap.c          | 349 +++++++++++++++++++++++++++++++++++++----
 pack-bitmap.h          |   5 +
 packfile.c             |   2 +-
 5 files changed, 338 insertions(+), 32 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 5205dde2e1..a4e4e4ebcc 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -984,7 +984,17 @@ static void write_reused_pack(struct hashfile *f)
 				break;
 
 			offset += ewah_bit_ctz64(word >> offset);
-			write_reused_pack_one(pos + offset, f, &w_curs);
+			if (bitmap_is_midx(bitmap_git)) {
+				off_t pack_offs = bitmap_pack_offset(bitmap_git,
+								     pos + offset);
+				uint32_t pos;
+
+				if (offset_to_pack_pos(reuse_packfile, pack_offs, &pos) < 0)
+					die(_("write_reused_pack: could not locate %"PRIdMAX),
+					    (intmax_t)pack_offs);
+				write_reused_pack_one(pos, f, &w_curs);
+			} else
+				write_reused_pack_one(pos + offset, f, &w_curs);
 			display_progress(progress_state, ++written);
 		}
 	}
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index f90e100e3e..020c1774c8 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -48,7 +48,7 @@ void bitmap_writer_show_progress(int show)
 }
 
 /**
- * Build the initial type index for the packfile
+ * Build the initial type index for the packfile or multi-pack-index
  */
 void bitmap_writer_build_type_index(struct packing_data *to_pack,
 				    struct pack_idx_entry **index,
diff --git a/pack-bitmap.c b/pack-bitmap.c
index be52570b0f..e41fce9675 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -13,6 +13,7 @@
 #include "repository.h"
 #include "object-store.h"
 #include "list-objects-filter-options.h"
+#include "midx.h"
 #include "config.h"
 
 /*
@@ -35,8 +36,15 @@ struct stored_bitmap {
  * the active bitmap index is the largest one.
  */
 struct bitmap_index {
-	/* Packfile to which this bitmap index belongs to */
+	/*
+	 * The pack or multi-pack index (MIDX) that this bitmap index belongs
+	 * to.
+	 *
+	 * Exactly one of these must be non-NULL; this specifies the object
+	 * order used to interpret this bitmap.
+	 */
 	struct packed_git *pack;
+	struct multi_pack_index *midx;
 
 	/*
 	 * Mark the first `reuse_objects` in the packfile as reused:
@@ -71,6 +79,8 @@ struct bitmap_index {
 	/* If not NULL, this is a name-hash cache pointing into map. */
 	uint32_t *hashes;
 
+	const unsigned char *checksum;
+
 	/*
 	 * Extended index.
 	 *
@@ -138,6 +148,8 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
 
 static uint32_t bitmap_num_objects(struct bitmap_index *index)
 {
+	if (index->midx)
+		return index->midx->num_objects;
 	return index->pack->num_objects;
 }
 
@@ -175,6 +187,7 @@ static int load_bitmap_header(struct bitmap_index *index)
 	}
 
 	index->entry_count = ntohl(header->entry_count);
+	index->checksum = header->checksum;
 	index->map_pos += header_size;
 	return 0;
 }
@@ -227,7 +240,10 @@ static void nth_bitmap_object_oid(struct bitmap_index *index,
 				  struct object_id *oid,
 				  uint32_t n)
 {
-	nth_packed_object_id(oid, index->pack, n);
+	if (index->midx)
+		nth_midxed_object_oid(oid, index->midx, n);
+	else
+		nth_packed_object_id(oid, index->pack, n);
 }
 
 static int load_bitmap_entries_v1(struct bitmap_index *index)
@@ -272,7 +288,14 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 	return 0;
 }
 
-static char *pack_bitmap_filename(struct packed_git *p)
+char *midx_bitmap_filename(struct multi_pack_index *midx)
+{
+	return xstrfmt("%s-%s.bitmap",
+		       get_midx_filename(midx->object_dir),
+		       hash_to_hex(get_midx_checksum(midx)));
+}
+
+char *pack_bitmap_filename(struct packed_git *p)
 {
 	size_t len;
 
@@ -281,6 +304,54 @@ static char *pack_bitmap_filename(struct packed_git *p)
 	return xstrfmt("%.*s.bitmap", (int)len, p->pack_name);
 }
 
+static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
+			      struct multi_pack_index *midx)
+{
+	struct stat st;
+	char *idx_name = midx_bitmap_filename(midx);
+	int fd = git_open(idx_name);
+
+	free(idx_name);
+
+	if (fd < 0)
+		return -1;
+
+	if (fstat(fd, &st)) {
+		close(fd);
+		return -1;
+	}
+
+	if (bitmap_git->pack || bitmap_git->midx) {
+		close(fd); /* ignore extra bitmap file; we can only handle one */
+		return -1;
+	}
+
+	bitmap_git->midx = midx;
+	bitmap_git->map_size = xsize_t(st.st_size);
+	bitmap_git->map_pos = 0;
+	bitmap_git->map = xmmap(NULL, bitmap_git->map_size, PROT_READ,
+				MAP_PRIVATE, fd, 0);
+	close(fd);
+
+	if (load_bitmap_header(bitmap_git) < 0)
+		goto cleanup;
+
+	if (!hasheq(get_midx_checksum(bitmap_git->midx), bitmap_git->checksum))
+		goto cleanup;
+
+	if (load_midx_revindex(bitmap_git->midx) < 0) {
+		warning(_("multi-pack bitmap is missing required reverse index"));
+		goto cleanup;
+	}
+	return 0;
+
+cleanup:
+	munmap(bitmap_git->map, bitmap_git->map_size);
+	bitmap_git->map_size = 0;
+	bitmap_git->map = NULL;
+	return -1;
+}
+
 static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git *packfile)
 {
 	int fd;
@@ -302,12 +373,18 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
 		return -1;
 	}
 
-	if (bitmap_git->pack) {
+	if (bitmap_git->pack || bitmap_git->midx) {
+		/* ignore extra bitmap file; we can only handle one */
 		warning("ignoring extra bitmap file: %s", packfile->pack_name);
 		close(fd);
 		return -1;
 	}
 
+	if (!is_pack_valid(packfile)) {
+		close(fd);
+		return -1;
+	}
+
 	bitmap_git->pack = packfile;
 	bitmap_git->map_size = xsize_t(st.st_size);
 	bitmap_git->map = xmmap(NULL, bitmap_git->map_size, PROT_READ, MAP_PRIVATE, fd, 0);
@@ -324,13 +401,36 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
 	return 0;
 }
 
-static int load_pack_bitmap(struct bitmap_index *bitmap_git)
+static int load_reverse_index(struct bitmap_index *bitmap_git)
+{
+	if (bitmap_is_midx(bitmap_git)) {
+		uint32_t i;
+		int ret;
+
+		ret = load_midx_revindex(bitmap_git->midx);
+		if (ret)
+			return ret;
+
+		for (i = 0; i < bitmap_git->midx->num_packs; i++) {
+			if (prepare_midx_pack(the_repository, bitmap_git->midx, i))
+				die(_("load_reverse_index: could not open pack"));
+			ret = load_pack_revindex(bitmap_git->midx->packs[i]);
+			if (ret)
+				return ret;
+		}
+		return 0;
+	}
+	return load_pack_revindex(bitmap_git->pack);
+}
+
+static int load_bitmap(struct bitmap_index *bitmap_git)
 {
 	assert(bitmap_git->map);
 
 	bitmap_git->bitmaps = kh_init_oid_map();
 	bitmap_git->ext_index.positions = kh_init_oid_pos();
-	if (load_pack_revindex(bitmap_git->pack))
+
+	if (load_reverse_index(bitmap_git))
 		goto failed;
 
 	if (!(bitmap_git->commits = read_bitmap_1(bitmap_git)) ||
@@ -374,11 +474,35 @@ static int open_pack_bitmap(struct repository *r,
 	return ret;
 }
 
+static int open_midx_bitmap(struct repository *r,
+			    struct bitmap_index *bitmap_git)
+{
+	struct multi_pack_index *midx;
+
+	assert(!bitmap_git->map);
+
+	for (midx = get_multi_pack_index(r); midx; midx = midx->next) {
+		if (!open_midx_bitmap_1(bitmap_git, midx))
+			return 0;
+	}
+	return -1;
+}
+
+static int open_bitmap(struct repository *r,
+		       struct bitmap_index *bitmap_git)
+{
+	assert(!bitmap_git->map);
+
+	if (!open_midx_bitmap(r, bitmap_git))
+		return 0;
+	return open_pack_bitmap(r, bitmap_git);
+}
+
 struct bitmap_index *prepare_bitmap_git(struct repository *r)
 {
 	struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
 
-	if (!open_pack_bitmap(r, bitmap_git) && !load_pack_bitmap(bitmap_git))
+	if (!open_bitmap(r, bitmap_git) && !load_bitmap(bitmap_git))
 		return bitmap_git;
 
 	free_bitmap_index(bitmap_git);
@@ -428,10 +552,26 @@ static inline int bitmap_position_packfile(struct bitmap_index *bitmap_git,
 	return pos;
 }
 
+static int bitmap_position_midx(struct bitmap_index *bitmap_git,
+				const struct object_id *oid)
+{
+	uint32_t want, got;
+	if (!bsearch_midx(oid, bitmap_git->midx, &want))
+		return -1;
+
+	if (midx_to_pack_pos(bitmap_git->midx, want, &got) < 0)
+		return -1;
+	return got;
+}
+
 static int bitmap_position(struct bitmap_index *bitmap_git,
 			   const struct object_id *oid)
 {
-	int pos = bitmap_position_packfile(bitmap_git, oid);
+	int pos;
+	if (bitmap_is_midx(bitmap_git))
+		pos = bitmap_position_midx(bitmap_git, oid);
+	else
+		pos = bitmap_position_packfile(bitmap_git, oid);
 	return (pos >= 0) ? pos : bitmap_position_extended(bitmap_git, oid);
 }
 
@@ -721,6 +861,7 @@ static void show_objects_for_type(
 			continue;
 
 		for (offset = 0; offset < BITS_IN_EWORD; ++offset) {
+			struct packed_git *pack;
 			struct object_id oid;
 			uint32_t hash = 0, index_pos;
 			off_t ofs;
@@ -730,14 +871,28 @@ static void show_objects_for_type(
 
 			offset += ewah_bit_ctz64(word >> offset);
 
-			index_pos = pack_pos_to_index(bitmap_git->pack, pos + offset);
-			ofs = pack_pos_to_offset(bitmap_git->pack, pos + offset);
-			nth_packed_object_id(&oid, bitmap_git->pack, index_pos);
+			if (bitmap_is_midx(bitmap_git)) {
+				struct multi_pack_index *m = bitmap_git->midx;
+				uint32_t pack_id;
+
+				index_pos = pack_pos_to_midx(m, pos + offset);
+				ofs = nth_midxed_offset(m, index_pos);
+				nth_midxed_object_oid(&oid, m, index_pos);
+
+				pack_id = nth_midxed_pack_int_id(m, index_pos);
+				pack = bitmap_git->midx->packs[pack_id];
+			} else {
+				index_pos = pack_pos_to_index(bitmap_git->pack, pos + offset);
+				ofs = pack_pos_to_offset(bitmap_git->pack, pos + offset);
+				nth_bitmap_object_oid(bitmap_git, &oid, index_pos);
+
+				pack = bitmap_git->pack;
+			}
 
 			if (bitmap_git->hashes)
 				hash = get_be32(bitmap_git->hashes + index_pos);
 
-			show_reach(&oid, object_type, 0, hash, bitmap_git->pack, ofs);
+			show_reach(&oid, object_type, 0, hash, pack, ofs);
 		}
 	}
 }
@@ -749,8 +904,13 @@ static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
 		struct object *object = roots->item;
 		roots = roots->next;
 
-		if (find_pack_entry_one(object->oid.hash, bitmap_git->pack) > 0)
-			return 1;
+		if (bitmap_is_midx(bitmap_git)) {
+			if (bsearch_midx(&object->oid, bitmap_git->midx, NULL))
+				return 1;
+		} else {
+			if (find_pack_entry_one(object->oid.hash, bitmap_git->pack) > 0)
+				return 1;
+		}
 	}
 
 	return 0;
@@ -839,14 +999,26 @@ static void filter_bitmap_blob_none(struct bitmap_index *bitmap_git,
 static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
 				     uint32_t pos)
 {
-	struct packed_git *pack = bitmap_git->pack;
 	unsigned long size;
 	struct object_info oi = OBJECT_INFO_INIT;
 
 	oi.sizep = &size;
 
 	if (pos < bitmap_num_objects(bitmap_git)) {
-		off_t ofs = pack_pos_to_offset(pack, pos);
+		struct packed_git *pack;
+		off_t ofs;
+
+		if (bitmap_is_midx(bitmap_git)) {
+			uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
+			uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
+
+			pack = bitmap_git->midx->packs[pack_id];
+			ofs = nth_midxed_offset(bitmap_git->midx, midx_pos);
+		} else {
+			pack = bitmap_git->pack;
+			ofs = pack_pos_to_offset(pack, pos);
+		}
+
 		if (packed_object_info(the_repository, pack, ofs, &oi) < 0) {
 			struct object_id oid;
 			nth_bitmap_object_oid(bitmap_git, &oid,
@@ -990,7 +1162,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	/* try to open a bitmapped pack, but don't parse it yet
 	 * because we may not need to use it */
 	CALLOC_ARRAY(bitmap_git, 1);
-	if (open_pack_bitmap(revs->repo, bitmap_git) < 0)
+	if (open_bitmap(revs->repo, bitmap_git) < 0)
 		goto cleanup;
 
 	for (i = 0; i < revs->pending.nr; ++i) {
@@ -1034,7 +1206,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	 * from disk. this is the point of no return; after this the rev_list
 	 * becomes invalidated and we must perform the revwalk through bitmaps
 	 */
-	if (load_pack_bitmap(bitmap_git) < 0)
+	if (load_bitmap(bitmap_git) < 0)
 		goto cleanup;
 
 	object_array_clear(&revs->pending);
@@ -1081,15 +1253,29 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 			      struct bitmap *reuse,
 			      struct pack_window **w_curs)
 {
-	off_t offset, header;
+	struct packed_git *pack;
+	off_t offset, delta_obj_offset;
 	enum object_type type;
 	unsigned long size;
 
 	if (pos >= bitmap_num_objects(bitmap_git))
 		return; /* not actually in the pack or MIDX */
 
-	offset = header = pack_pos_to_offset(bitmap_git->pack, pos);
-	type = unpack_object_header(bitmap_git->pack, w_curs, &offset, &size);
+	if (bitmap_is_midx(bitmap_git)) {
+		uint32_t pack_id, midx_pos;
+
+		midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
+		pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
+
+		pack = bitmap_git->midx->packs[pack_id];
+		offset = nth_midxed_offset(bitmap_git->midx, midx_pos);
+	} else {
+		pack = bitmap_git->pack;
+		offset = pack_pos_to_offset(bitmap_git->pack, pos);
+	}
+
+	delta_obj_offset = offset;
+	type = unpack_object_header(pack, w_curs, &offset, &size);
 	if (type < 0)
 		return; /* broken packfile, punt */
 
@@ -1105,11 +1291,11 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 		 * and the normal slow path will complain about it in
 		 * more detail.
 		 */
-		base_offset = get_delta_base(bitmap_git->pack, w_curs,
-					     &offset, type, header);
+		base_offset = get_delta_base(pack, w_curs, &offset, type,
+					     delta_obj_offset);
 		if (!base_offset)
 			return;
-		if (offset_to_pack_pos(bitmap_git->pack, base_offset, &base_pos) < 0)
+		if (offset_to_pack_pos(pack, base_offset, &base_pos) < 0)
 			return;
 
 		/*
@@ -1120,6 +1306,16 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 		 * packs we write fresh, and OFS_DELTA is the default). But
 		 * let's double check to make sure the pack wasn't written with
 		 * odd parameters.
+		 *
+		 * Note that the base does not need to be repositioned, i.e.,
+		 * the MIDX is guaranteed to have selected the copy of "base"
+		 * from the same pack, since this function is only ever called
+		 * on the preferred pack (and all duplicate objects are resolved
+		 * in favor of the preferred pack).
+		 *
+		 * This means that we can reuse base_pos when looking up the bit
+		 * in the reuse bitmap, too, since bits corresponding to the
+		 * preferred pack precede all bits from other packs.
 		 */
 		if (base_pos >= pos)
 			return;
@@ -1142,6 +1338,14 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 	bitmap_set(reuse, pos);
 }
 
+static uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git)
+{
+	struct multi_pack_index *m = bitmap_git->midx;
+	if (!m)
+		BUG("midx_preferred_pack: requires non-empty MIDX");
+	return nth_midxed_pack_int_id(m, pack_pos_to_midx(bitmap_git->midx, 0));
+}
+
 int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 				       struct packed_git **packfile_out,
 				       uint32_t *entries,
@@ -1153,13 +1357,29 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 	size_t i = 0;
 	uint32_t offset;
 	uint32_t objects_nr = bitmap_num_objects(bitmap_git);
+	uint32_t preferred_pack = 0;
 
 	assert(result);
 
+	load_reverse_index(bitmap_git);
+
+	if (bitmap_is_midx(bitmap_git)) {
+		preferred_pack = midx_preferred_pack(bitmap_git);
+		objects_nr = bitmap_git->midx->packs[preferred_pack]->num_objects;
+	} else
+		objects_nr = bitmap_git->pack->num_objects;
+
 	while (i < result->word_alloc && result->words[i] == (eword_t)~0)
 		i++;
 
-	/* Don't mark objects not in the packfile */
+	/*
+	 * Don't mark objects not in the packfile or preferred pack. This bitmap
+	 * marks objects eligible for reuse, but the pack-reuse code only
+	 * understands how to reuse a single pack. Since the preferred pack is
+	 * guaranteed to have all bases for its deltas (in a multi-pack bitmap),
+	 * we use it instead of another pack. In single-pack bitmaps, the choice
+	 * is made for us.
+	 */
 	if (i > objects_nr / BITS_IN_EWORD)
 		i = objects_nr / BITS_IN_EWORD;
 
@@ -1175,6 +1395,14 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 				break;
 
 			offset += ewah_bit_ctz64(word >> offset);
+			if (bitmap_is_midx(bitmap_git)) {
+				/*
+				 * Can't reuse from a non-preferred pack (see
+				 * above).
+				 */
+				if (pos + offset >= objects_nr)
+					continue;
+			}
 			try_partial_reuse(bitmap_git, pos + offset, reuse, &w_curs);
 		}
 	}
@@ -1192,7 +1420,9 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 	 * need to be handled separately.
 	 */
 	bitmap_and_not(result, reuse);
-	*packfile_out = bitmap_git->pack;
+	*packfile_out = bitmap_git->pack ?
+		bitmap_git->pack :
+		bitmap_git->midx->packs[preferred_pack];
 	*reuse_out = reuse;
 	return 0;
 }
@@ -1466,6 +1696,12 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
 	uint32_t i, num_objects;
 	uint32_t *reposition;
 
+	if (!bitmap_is_midx(bitmap_git))
+		load_reverse_index(bitmap_git);
+	else if (load_midx_revindex(bitmap_git->midx) < 0)
+		BUG("rebuild_existing_bitmaps: missing required rev-cache "
+		    "extension");
+
 	num_objects = bitmap_num_objects(bitmap_git);
 	CALLOC_ARRAY(reposition, num_objects);
 
@@ -1473,8 +1709,13 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
 		struct object_id oid;
 		struct object_entry *oe;
 
-		nth_packed_object_id(&oid, bitmap_git->pack,
-				     pack_pos_to_index(bitmap_git->pack, i));
+		if (bitmap_is_midx(bitmap_git))
+			nth_midxed_object_oid(&oid,
+					      bitmap_git->midx,
+					      pack_pos_to_midx(bitmap_git->midx, i));
+		else
+			nth_packed_object_id(&oid, bitmap_git->pack,
+					     pack_pos_to_index(bitmap_git->pack, i));
 		oe = packlist_find(mapping, &oid);
 
 		if (oe)
@@ -1500,6 +1741,19 @@ void free_bitmap_index(struct bitmap_index *b)
 	free(b->ext_index.hashes);
 	bitmap_free(b->result);
 	bitmap_free(b->haves);
+	if (bitmap_is_midx(b)) {
+		/*
+		 * Multi-pack bitmaps need to have resources associated with
+		 * their on-disk reverse indexes unmapped so that stale .rev and
+		 * .bitmap files can be removed.
+		 *
+		 * Unlike pack-based bitmaps, multi-pack bitmaps can be read and
+		 * written in the same 'git multi-pack-index write --bitmap'
+		 * process. Close resources so they can be removed safely on
+		 * platforms like Windows.
+		 */
+		close_midx_revindex(b->midx);
+	}
 	free(b);
 }
 
@@ -1514,7 +1768,7 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
 				     enum object_type object_type)
 {
 	struct bitmap *result = bitmap_git->result;
-	struct packed_git *pack = bitmap_git->pack;
+	struct packed_git *pack;
 	off_t total = 0;
 	struct ewah_iterator it;
 	eword_t filter;
@@ -1538,6 +1792,29 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
 
 			offset += ewah_bit_ctz64(word >> offset);
 			pos = base + offset;
+
+			if (bitmap_is_midx(bitmap_git)) {
+				uint32_t pack_pos;
+				uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
+				uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
+				off_t offset = nth_midxed_offset(bitmap_git->midx, midx_pos);
+
+				pack = bitmap_git->midx->packs[pack_id];
+
+				if (offset_to_pack_pos(pack, offset, &pack_pos) < 0) {
+					struct object_id oid;
+					nth_midxed_object_oid(&oid, bitmap_git->midx, midx_pos);
+
+					die(_("could not find %s in pack #%"PRIu32" at offset %"PRIuMAX),
+					    oid_to_hex(&oid),
+					    pack_id,
+					    (uintmax_t)offset);
+				}
+
+				pos = pack_pos;
+			} else
+				pack = bitmap_git->pack;
+
 			total += pack_pos_to_offset(pack, pos + 1) -
 				 pack_pos_to_offset(pack, pos);
 		}
@@ -1590,6 +1867,20 @@ off_t get_disk_usage_from_bitmap(struct bitmap_index *bitmap_git,
 	return total;
 }
 
+int bitmap_is_midx(struct bitmap_index *bitmap_git)
+{
+	return !!bitmap_git->midx;
+}
+
+off_t bitmap_pack_offset(struct bitmap_index *bitmap_git, uint32_t pos)
+{
+	if (bitmap_is_midx(bitmap_git))
+		return nth_midxed_offset(bitmap_git->midx,
+					 pack_pos_to_midx(bitmap_git->midx, pos));
+	return nth_packed_object_offset(bitmap_git->pack,
+					pack_pos_to_index(bitmap_git->pack, pos));
+}
+
 const struct string_list *bitmap_preferred_tips(struct repository *r)
 {
 	return repo_config_get_value_multi(r, "pack.preferbitmaptips");
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 0bf75ff2a7..0dc6f7a7e4 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -91,6 +91,11 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 			  uint32_t index_nr,
 			  const char *filename,
 			  uint16_t options);
+char *midx_bitmap_filename(struct multi_pack_index *midx);
+char *pack_bitmap_filename(struct packed_git *p);
+
+int bitmap_is_midx(struct bitmap_index *bitmap_git);
+off_t bitmap_pack_offset(struct bitmap_index *bitmap_git, uint32_t pos);
 
 const struct string_list *bitmap_preferred_tips(struct repository *r);
 int bitmap_is_preferred_refname(struct repository *r, const char *refname);
diff --git a/packfile.c b/packfile.c
index 8668345d93..c444e365a3 100644
--- a/packfile.c
+++ b/packfile.c
@@ -863,7 +863,7 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
 	if (!strcmp(file_name, "multi-pack-index"))
 		return;
 	if (starts_with(file_name, "multi-pack-index") &&
-	    ends_with(file_name, ".rev"))
+	    (ends_with(file_name, ".bitmap") || ends_with(file_name, ".rev")))
 		return;
 	if (ends_with(file_name, ".idx") ||
 	    ends_with(file_name, ".rev") ||
-- 
2.31.1.163.ga65ce7f831



* [PATCH 13/22] pack-bitmap: write multi-pack bitmaps
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (11 preceding siblings ...)
  2021-04-09 18:11 ` [PATCH 12/22] pack-bitmap: read multi-pack bitmaps Taylor Blau
@ 2021-04-09 18:11 ` Taylor Blau
  2021-05-04  5:02   ` Jonathan Tan
  2021-04-09 18:11 ` [PATCH 14/22] t5310: move some tests to lib-bitmap.sh Taylor Blau
                   ` (12 subsequent siblings)
  25 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-04-09 18:11 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Write multi-pack bitmaps in the format described by
Documentation/technical/bitmap-format.txt. A bitmap is written only when
'--bitmap' is given.

To write a multi-pack bitmap, this patch attempts to reuse as much of
the existing machinery from pack-objects as possible. Specifically, the
MIDX code prepares a packing_data struct that pretends as if a single
packfile has been generated containing all of the objects contained
within the MIDX.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
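A note on the "single packfile" that the MIDX code pretends to have
written: the sketch below illustrates the object order it is expected
to use, assuming it is the order recorded by the MIDX .rev file
(objects from the preferred pack first, then by pack ID, then by offset
within the pack). 'struct pseudo_entry', 'pseudo_pack_cmp()' and
'pseudo_pack_sort()' are hypothetical names for illustration only; the
real comparison is 'midx_pack_order_cmp()'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one MIDX object; not Git's own struct. */
struct pseudo_entry {
	uint32_t pack_id; /* pack holding the copy the MIDX selected */
	uint64_t offset;  /* byte offset of the object within that pack */
};

/* qsort() takes no context argument, so the preferred pack lives here. */
static uint32_t sketch_preferred_pack;

/* Assumed order: preferred pack first, then pack ID, then offset. */
int pseudo_pack_cmp(const void *va, const void *vb)
{
	const struct pseudo_entry *a = va;
	const struct pseudo_entry *b = vb;
	int a_pref = a->pack_id == sketch_preferred_pack;
	int b_pref = b->pack_id == sketch_preferred_pack;

	if (a_pref != b_pref)
		return b_pref - a_pref;
	if (a->pack_id != b->pack_id)
		return a->pack_id < b->pack_id ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Sort entries into pseudo-pack order; index i is then bit position i. */
void pseudo_pack_sort(struct pseudo_entry *entries, size_t nr,
		      uint32_t preferred_pack)
{
	assert(entries || !nr);
	if (!nr)
		return;
	sketch_preferred_pack = preferred_pack;
	qsort(entries, nr, sizeof(*entries), pseudo_pack_cmp);
}
```

Under that assumption, every object from the preferred pack gets a bit
position lower than any object from another pack, which is what lets
the pack-reuse code treat the low bits as belonging to one pack.
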
 Documentation/git-multi-pack-index.txt |  12 +-
 builtin/multi-pack-index.c             |   2 +
 midx.c                                 | 195 ++++++++++++++++++++++++-
 midx.h                                 |   1 +
 4 files changed, 202 insertions(+), 8 deletions(-)
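
One case in write_midx_internal() is easy to miss: when the set of
packs is unchanged, the existing MIDX is kept and the only open question
is what happens to the bitmap. A minimal sketch of that decision
follows. The enum and 'midx_unchanged_action()' are hypothetical names
used only for illustration; the flag values mirror the MIDX_* bits in
midx.h as of this patch.

```c
#include <assert.h>

/* Mirrors of the MIDX_* flag bits in midx.h (names are illustrative). */
#define SKETCH_MIDX_PROGRESS        (1u << 0)
#define SKETCH_MIDX_WRITE_REV_INDEX (1u << 1)
#define SKETCH_MIDX_WRITE_BITMAP    (1u << 2)
#define SKETCH_MIDX_KNOWN_FLAGS \
	(SKETCH_MIDX_PROGRESS | SKETCH_MIDX_WRITE_REV_INDEX | \
	 SKETCH_MIDX_WRITE_BITMAP)

enum sketch_action {
	SKETCH_KEEP,             /* MIDX and its bitmap are already in place */
	SKETCH_KEEP_DROP_BITMAP, /* keep the MIDX, remove any .bitmap files */
	SKETCH_REWRITE           /* fall through and write MIDX plus bitmap */
};

/* What to do when the pack set matches the MIDX already on disk. */
enum sketch_action midx_unchanged_action(unsigned flags, int bitmap_exists)
{
	int want_bitmap;

	assert(!(flags & ~SKETCH_MIDX_KNOWN_FLAGS));
	want_bitmap = !!(flags & SKETCH_MIDX_WRITE_BITMAP);

	if (!want_bitmap)
		return SKETCH_KEEP_DROP_BITMAP;
	if (bitmap_exists)
		return SKETCH_KEEP;
	return SKETCH_REWRITE;
}
```

In other words, running the write without '--bitmap' on an up-to-date
MIDX removes an existing multi-pack bitmap instead of leaving it alone.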

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index ffd601bc17..ada14deb2c 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -10,7 +10,7 @@ SYNOPSIS
 --------
 [verse]
 'git multi-pack-index' [--object-dir=<dir>] [--[no-]progress]
-	[--preferred-pack=<pack>] <subcommand>
+	[--preferred-pack=<pack>] [--[no-]bitmap] <subcommand>
 
 DESCRIPTION
 -----------
@@ -40,6 +40,9 @@ write::
 		multiple packs contain the same object. If not given,
 		ties are broken in favor of the pack with the lowest
 		mtime.
+
+	--[no-]bitmap::
+		Control whether or not a multi-pack bitmap is written.
 --
 
 verify::
@@ -81,6 +84,13 @@ EXAMPLES
 $ git multi-pack-index write
 -----------------------------------------------
 
+* Write a MIDX file for the packfiles in the current .git folder with a
+corresponding bitmap.
++
+-------------------------------------------------------------
+$ git multi-pack-index write --preferred-pack <pack> --bitmap
+-------------------------------------------------------------
+
 * Write a MIDX file for the packfiles in an alternate object store.
 +
 -----------------------------------------------
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 5d3ea445fd..bf6fa982e3 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -68,6 +68,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
 		OPT_STRING(0, "preferred-pack", &opts.preferred_pack,
 			   N_("preferred-pack"),
 			   N_("pack for reuse when computing a multi-pack bitmap")),
+		OPT_BIT(0, "bitmap", &opts.flags, N_("write multi-pack bitmap"),
+			MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
 		OPT_END(),
 	};
 
diff --git a/midx.c b/midx.c
index 567cdf0fcf..32d7d184c0 100644
--- a/midx.c
+++ b/midx.c
@@ -13,6 +13,10 @@
 #include "repository.h"
 #include "chunk-format.h"
 #include "pack.h"
+#include "pack-bitmap.h"
+#include "refs.h"
+#include "revision.h"
+#include "list-objects.h"
 
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
 #define MIDX_VERSION 1
@@ -885,6 +889,145 @@ static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash,
 static void clear_midx_files_ext(struct repository *r, const char *ext,
 				 unsigned char *keep_hash);
 
+static void prepare_midx_packing_data(struct packing_data *pdata,
+				      struct write_midx_context *ctx)
+{
+	uint32_t i;
+
+	memset(pdata, 0, sizeof(struct packing_data));
+	prepare_packing_data(the_repository, pdata);
+
+	for (i = 0; i < ctx->entries_nr; i++) {
+		struct pack_midx_entry *from = &ctx->entries[ctx->pack_order[i]];
+		struct object_entry *to = packlist_alloc(pdata, &from->oid);
+
+		oe_set_in_pack(pdata, to,
+			       ctx->info[ctx->pack_perm[from->pack_int_id]].p);
+	}
+}
+
+static int add_ref_to_pending(const char *refname,
+			      const struct object_id *oid,
+			      int flag, void *cb_data)
+{
+	struct rev_info *revs = (struct rev_info*)cb_data;
+	struct object *object;
+
+	if ((flag & REF_ISSYMREF) && (flag & REF_ISBROKEN)) {
+		warning("symbolic ref is dangling: %s", refname);
+		return 0;
+	}
+
+	object = parse_object_or_die(oid, refname);
+	if (object->type != OBJ_COMMIT)
+		return 0;
+
+	add_pending_object(revs, object, "");
+	if (bitmap_is_preferred_refname(revs->repo, refname))
+		object->flags |= NEEDS_BITMAP;
+	return 0;
+}
+
+struct bitmap_commit_cb {
+	struct commit **commits;
+	size_t commits_nr, commits_alloc;
+
+	struct write_midx_context *ctx;
+};
+
+static const struct object_id *bitmap_oid_access(size_t index,
+						 const void *_entries)
+{
+	const struct pack_midx_entry *entries = _entries;
+	return &entries[index].oid;
+}
+
+static void bitmap_show_commit(struct commit *commit, void *_data)
+{
+	struct bitmap_commit_cb *data = _data;
+	if (oid_pos(&commit->object.oid, data->ctx->entries,
+		    data->ctx->entries_nr,
+		    bitmap_oid_access) > -1) {
+		ALLOC_GROW(data->commits, data->commits_nr + 1,
+			   data->commits_alloc);
+		data->commits[data->commits_nr++] = commit;
+	}
+}
+
+static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr_p,
+						    struct write_midx_context *ctx)
+{
+	struct rev_info revs;
+	struct bitmap_commit_cb cb;
+
+	memset(&cb, 0, sizeof(struct bitmap_commit_cb));
+	cb.ctx = ctx;
+
+	repo_init_revisions(the_repository, &revs, NULL);
+	for_each_ref(add_ref_to_pending, &revs);
+
+	fetch_if_missing = 0;
+	revs.exclude_promisor_objects = 1;
+
+	if (prepare_revision_walk(&revs))
+		die(_("revision walk setup failed"));
+
+	traverse_commit_list(&revs, bitmap_show_commit, NULL, &cb);
+	if (indexed_commits_nr_p)
+		*indexed_commits_nr_p = cb.commits_nr;
+
+	return cb.commits;
+}
+
+static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
+			     struct write_midx_context *ctx,
+			     unsigned flags)
+{
+	struct packing_data pdata;
+	struct pack_idx_entry **index;
+	struct commit **commits = NULL;
+	uint32_t i, commits_nr;
+	char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash));
+	int ret;
+
+	prepare_midx_packing_data(&pdata, ctx);
+
+	commits = find_commits_for_midx_bitmap(&commits_nr, ctx);
+
+	/*
+	 * Build the MIDX-order index based on pdata.objects (which is already
+	 * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of
+	 * this order).
+	 */
+	ALLOC_ARRAY(index, pdata.nr_objects);
+	for (i = 0; i < pdata.nr_objects; i++)
+		index[i] = (struct pack_idx_entry *)&pdata.objects[i];
+
+	bitmap_writer_show_progress(flags & MIDX_PROGRESS);
+	bitmap_writer_build_type_index(&pdata, index, pdata.nr_objects);
+
+	/*
+	 * bitmap_writer_select_commits expects objects in lex order, and
+	 * pack_order gives us exactly that. Use it directly instead of
+	 * re-sorting the array.
+	 */
+	for (i = 0; i < pdata.nr_objects; i++)
+		index[ctx->pack_order[i]] = (struct pack_idx_entry *)&pdata.objects[i];
+
+	bitmap_writer_select_commits(commits, commits_nr, -1);
+	ret = bitmap_writer_build(&pdata);
+	if (!ret)
+		goto cleanup;
+
+	bitmap_writer_set_checksum(midx_hash);
+	bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, 0);
+
+cleanup:
+	free(index);
+	free(bitmap_name);
+	return ret;
+}
+
 static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
 			       struct string_list *packs_to_drop,
 			       const char *preferred_pack_name,
@@ -930,9 +1073,16 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 		for (i = 0; i < ctx.m->num_packs; i++) {
 			ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
 
+			if (prepare_midx_pack(the_repository, ctx.m, i)) {
+				error(_("could not load pack %s"),
+				      ctx.m->pack_names[i]);
+				result = 1;
+				goto cleanup;
+			}
+
 			ctx.info[ctx.nr].orig_pack_int_id = i;
 			ctx.info[ctx.nr].pack_name = xstrdup(ctx.m->pack_names[i]);
-			ctx.info[ctx.nr].p = NULL;
+			ctx.info[ctx.nr].p = ctx.m->packs[i];
 			ctx.info[ctx.nr].expired = 0;
 			ctx.nr++;
 		}
@@ -947,8 +1097,26 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
 	stop_progress(&ctx.progress);
 
-	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
-		goto cleanup;
+	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop) {
+		struct bitmap_index *bitmap_git;
+		int bitmap_exists;
+		int want_bitmap = flags & MIDX_WRITE_BITMAP;
+
+		bitmap_git = prepare_bitmap_git(the_repository);
+		bitmap_exists = bitmap_git && bitmap_is_midx(bitmap_git);
+		free_bitmap_index(bitmap_git);
+
+		if (bitmap_exists || !want_bitmap) {
+			/*
+			 * The correct MIDX already exists, and so does a
+			 * corresponding bitmap (or one wasn't requested).
+			 */
+			if (!want_bitmap)
+				clear_midx_files_ext(the_repository, ".bitmap",
+						     NULL);
+			goto cleanup;
+		}
+	}
 
 	ctx.preferred_pack_idx = -1;
 	if (preferred_pack_name) {
@@ -1048,9 +1216,6 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR);
 	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
 
-	if (ctx.m)
-		close_midx(ctx.m);
-
 	if (ctx.nr - dropped_packs == 0) {
 		error(_("no pack files to index."));
 		result = 1;
@@ -1081,14 +1246,17 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
 	free_chunkfile(cf);
 
-	if (flags & MIDX_WRITE_REV_INDEX)
+	if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))
 		ctx.pack_order = midx_pack_order(&ctx);
 
 	if (flags & MIDX_WRITE_REV_INDEX)
 		write_midx_reverse_index(midx_name, midx_hash, &ctx);
+	if (flags & MIDX_WRITE_BITMAP)
+		write_midx_bitmap(midx_name, midx_hash, &ctx, flags);
 
 	commit_lock_file(&lk);
 
+	clear_midx_files_ext(the_repository, ".bitmap", midx_hash);
 	clear_midx_files_ext(the_repository, ".rev", midx_hash);
 
 cleanup:
@@ -1096,6 +1264,15 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 		if (ctx.info[i].p) {
 			close_pack(ctx.info[i].p);
 			free(ctx.info[i].p);
+			if (ctx.m) {
+				/*
+				 * Destroy a stale reference to the pack in
+				 * 'ctx.m'.
+				 */
+				uint32_t orig = ctx.info[i].orig_pack_int_id;
+				if (orig < ctx.m->num_packs)
+					ctx.m->packs[orig] = NULL;
+			}
 		}
 		free(ctx.info[i].pack_name);
 	}
@@ -1105,6 +1282,9 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	free(ctx.pack_perm);
 	free(ctx.pack_order);
 	free(midx_name);
+	if (ctx.m)
+		close_midx(ctx.m);
+
 	return result;
 }
 
@@ -1166,6 +1346,7 @@ void clear_midx_file(struct repository *r)
 	if (remove_path(midx))
 		die(_("failed to clear multi-pack-index at %s"), midx);
 
+	clear_midx_files_ext(r, ".bitmap", NULL);
 	clear_midx_files_ext(r, ".rev", NULL);
 
 	free(midx);
diff --git a/midx.h b/midx.h
index 1172df1a71..350f4d0a7b 100644
--- a/midx.h
+++ b/midx.h
@@ -41,6 +41,7 @@ struct multi_pack_index {
 
 #define MIDX_PROGRESS     (1 << 0)
 #define MIDX_WRITE_REV_INDEX (1 << 1)
+#define MIDX_WRITE_BITMAP (1 << 2)
 
 const unsigned char *get_midx_checksum(struct multi_pack_index *m);
 char *get_midx_filename(const char *object_dir);
-- 
2.31.1.163.ga65ce7f831



* [PATCH 14/22] t5310: move some tests to lib-bitmap.sh
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (12 preceding siblings ...)
  2021-04-09 18:11 ` [PATCH 13/22] pack-bitmap: write " Taylor Blau
@ 2021-04-09 18:11 ` Taylor Blau
  2021-04-09 18:11 ` [PATCH 15/22] t/helper/test-read-midx.c: add --checksum mode Taylor Blau
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-04-09 18:11 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

We'll soon be adding a test script that will cover many of the same
bitmap concepts as t5310, but for MIDX bitmaps. Let's pull out as many
of the applicable tests as we can so we don't have to rewrite them.

There should be no functional change to t5310; we still run the same
operations in the same order.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/lib-bitmap.sh         | 212 ++++++++++++++++++++++++++++++++++++++++
 t/t5310-pack-bitmaps.sh | 204 +-------------------------------------
 2 files changed, 216 insertions(+), 200 deletions(-)

diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index fe3f98be24..c655a9bf35 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -1,3 +1,6 @@
+# Helpers for scripts testing bitmap functionality; see t5310 for
+# example usage.
+
 # Compare a file containing rev-list bitmap traversal output to its non-bitmap
 # counterpart. You can't just use test_cmp for this, because the two produce
 # subtly different output:
@@ -24,3 +27,212 @@ test_bitmap_traversal () {
 	test_cmp "$1.normalized" "$2.normalized" &&
 	rm -f "$1.normalized" "$2.normalized"
 }
+
+# To ensure the logic for "maximal commits" is exercised, make
+# the repository a bit more complicated.
+#
+#    other                         second
+#      *                             *
+# (99 commits)                  (99 commits)
+#      *                             *
+#      |\                           /|
+#      | * octo-other  octo-second * |
+#      |/|\_________  ____________/|\|
+#      | \          \/  __________/  |
+#      |  | ________/\ /             |
+#      *  |/          * merge-right  *
+#      | _|__________/ \____________ |
+#      |/ |                         \|
+# (l1) *  * merge-left               * (r1)
+#      | / \________________________ |
+#      |/                           \|
+# (l2) *                             * (r2)
+#       \___________________________ |
+#                                   \|
+#                                    * (base)
+#
+# We only push bits down the first-parent history, which
+# makes some of these commits unimportant!
+#
+# The important part for the maximal commit algorithm is how
+# the bitmasks are extended. Assuming starting bit positions
+# for second (bit 0) and other (bit 1), the bitmasks at the
+# end should be:
+#
+#      second: 1       (maximal, selected)
+#       other: 01      (maximal, selected)
+#      (base): 11 (maximal)
+#
+# This complicated history was important for a previous
+# version of the walk that guarantees never walking a
+# commit multiple times. That goal might be important
+# again, so preserve this complicated case. For now, this
+# test will guarantee that the bitmaps are computed
+# correctly, even with the repeat calculations.
+setup_bitmap_history() {
+	test_expect_success 'setup repo with moderate-sized history' '
+		test_commit_bulk --id=file 10 &&
+		git branch -M second &&
+		git checkout -b other HEAD~5 &&
+		test_commit_bulk --id=side 10 &&
+
+		# add complicated history setup, including merges and
+		# ambiguous merge-bases
+
+		git checkout -b merge-left other~2 &&
+		git merge second~2 -m "merge-left" &&
+
+		git checkout -b merge-right second~1 &&
+		git merge other~1 -m "merge-right" &&
+
+		git checkout -b octo-second second &&
+		git merge merge-left merge-right -m "octopus-second" &&
+
+		git checkout -b octo-other other &&
+		git merge merge-left merge-right -m "octopus-other" &&
+
+		git checkout other &&
+		git merge octo-other -m "pull octopus" &&
+
+		git checkout second &&
+		git merge octo-second -m "pull octopus" &&
+
+		# Remove these branches so they are not selected
+		# as bitmap tips
+		git branch -D merge-left &&
+		git branch -D merge-right &&
+		git branch -D octo-other &&
+		git branch -D octo-second &&
+
+		# add padding to make these merges less interesting
+		# and avoid having them selected for bitmaps
+		test_commit_bulk --id=file 100 &&
+		git checkout other &&
+		test_commit_bulk --id=side 100 &&
+		git checkout second &&
+
+		bitmaptip=$(git rev-parse second) &&
+		blob=$(echo tagged-blob | git hash-object -w --stdin) &&
+		git tag tagged-blob $blob
+	'
+}
+
+rev_list_tests_head () {
+	test_expect_success "counting commits via bitmap ($state, $branch)" '
+		git rev-list --count $branch >expect &&
+		git rev-list --use-bitmap-index --count $branch >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting partial commits via bitmap ($state, $branch)" '
+		git rev-list --count $branch~5..$branch >expect &&
+		git rev-list --use-bitmap-index --count $branch~5..$branch >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting commits with limit ($state, $branch)" '
+		git rev-list --count -n 1 $branch >expect &&
+		git rev-list --use-bitmap-index --count -n 1 $branch >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting non-linear history ($state, $branch)" '
+		git rev-list --count other...second >expect &&
+		git rev-list --use-bitmap-index --count other...second >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting commits with limiting ($state, $branch)" '
+		git rev-list --count $branch -- 1.t >expect &&
+		git rev-list --use-bitmap-index --count $branch -- 1.t >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting objects via bitmap ($state, $branch)" '
+		git rev-list --count --objects $branch >expect &&
+		git rev-list --use-bitmap-index --count --objects $branch >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "enumerate commits ($state, $branch)" '
+		git rev-list --use-bitmap-index $branch >actual &&
+		git rev-list $branch >expect &&
+		test_bitmap_traversal --no-confirm-bitmaps expect actual
+	'
+
+	test_expect_success "enumerate --objects ($state, $branch)" '
+		git rev-list --objects --use-bitmap-index $branch >actual &&
+		git rev-list --objects $branch >expect &&
+		test_bitmap_traversal expect actual
+	'
+
+	test_expect_success "bitmap --objects handles non-commit objects ($state, $branch)" '
+		git rev-list --objects --use-bitmap-index $branch tagged-blob >actual &&
+		grep $blob actual
+	'
+}
+
+rev_list_tests () {
+	state=$1
+
+	for branch in "second" "other"
+	do
+		rev_list_tests_head
+	done
+}
+
+basic_bitmap_tests () {
+	tip="$1"
+	test_expect_success 'rev-list --test-bitmap verifies bitmaps' "
+		git rev-list --test-bitmap "${tip:-HEAD}"
+	"
+
+	rev_list_tests 'full bitmap'
+
+	test_expect_success 'clone from bitmapped repository' '
+		rm -fr clone.git &&
+		git clone --no-local --bare . clone.git &&
+		git rev-parse HEAD >expect &&
+		git --git-dir=clone.git rev-parse HEAD >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success 'partial clone from bitmapped repository' '
+		test_config uploadpack.allowfilter true &&
+		rm -fr partial-clone.git &&
+		git clone --no-local --bare --filter=blob:none . partial-clone.git &&
+		(
+			cd partial-clone.git &&
+			pack=$(echo objects/pack/*.pack) &&
+			git verify-pack -v "$pack" >have &&
+			awk "/blob/ { print \$1 }" <have >blobs &&
+			# we expect this single blob because of the direct ref
+			git rev-parse refs/tags/tagged-blob >expect &&
+			test_cmp expect blobs
+		)
+	'
+
+	test_expect_success 'setup further non-bitmapped commits' '
+		test_commit_bulk --id=further 10
+	'
+
+	rev_list_tests 'partial bitmap'
+
+	test_expect_success 'fetch (partial bitmap)' '
+		git --git-dir=clone.git fetch origin second:second &&
+		git rev-parse HEAD >expect &&
+		git --git-dir=clone.git rev-parse HEAD >actual &&
+		test_cmp expect actual
+	'
+}
+
+# have_delta <obj> <expected_base>
+#
+# Note that because this relies on cat-file, it might find _any_ copy of an
+# object in the repository. The caller is responsible for making sure
+# there's only one (e.g., via "repack -ad", or having just fetched a copy).
+have_delta () {
+	echo $2 >expect &&
+	echo $1 | git cat-file --batch-check="%(deltabase)" >actual &&
+	test_cmp expect actual
+}
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index f53efc8229..4318f84d53 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -25,93 +25,10 @@ has_any () {
 	grep -Ff "$1" "$2"
 }
 
-# To ensure the logic for "maximal commits" is exercised, make
-# the repository a bit more complicated.
-#
-#    other                         second
-#      *                             *
-# (99 commits)                  (99 commits)
-#      *                             *
-#      |\                           /|
-#      | * octo-other  octo-second * |
-#      |/|\_________  ____________/|\|
-#      | \          \/  __________/  |
-#      |  | ________/\ /             |
-#      *  |/          * merge-right  *
-#      | _|__________/ \____________ |
-#      |/ |                         \|
-# (l1) *  * merge-left               * (r1)
-#      | / \________________________ |
-#      |/                           \|
-# (l2) *                             * (r2)
-#       \___________________________ |
-#                                   \|
-#                                    * (base)
-#
-# We only push bits down the first-parent history, which
-# makes some of these commits unimportant!
-#
-# The important part for the maximal commit algorithm is how
-# the bitmasks are extended. Assuming starting bit positions
-# for second (bit 0) and other (bit 1), the bitmasks at the
-# end should be:
-#
-#      second: 1       (maximal, selected)
-#       other: 01      (maximal, selected)
-#      (base): 11 (maximal)
-#
-# This complicated history was important for a previous
-# version of the walk that guarantees never walking a
-# commit multiple times. That goal might be important
-# again, so preserve this complicated case. For now, this
-# test will guarantee that the bitmaps are computed
-# correctly, even with the repeat calculations.
+setup_bitmap_history
 
-test_expect_success 'setup repo with moderate-sized history' '
-	test_commit_bulk --id=file 10 &&
-	git branch -M second &&
-	git checkout -b other HEAD~5 &&
-	test_commit_bulk --id=side 10 &&
-
-	# add complicated history setup, including merges and
-	# ambiguous merge-bases
-
-	git checkout -b merge-left other~2 &&
-	git merge second~2 -m "merge-left" &&
-
-	git checkout -b merge-right second~1 &&
-	git merge other~1 -m "merge-right" &&
-
-	git checkout -b octo-second second &&
-	git merge merge-left merge-right -m "octopus-second" &&
-
-	git checkout -b octo-other other &&
-	git merge merge-left merge-right -m "octopus-other" &&
-
-	git checkout other &&
-	git merge octo-other -m "pull octopus" &&
-
-	git checkout second &&
-	git merge octo-second -m "pull octopus" &&
-
-	# Remove these branches so they are not selected
-	# as bitmap tips
-	git branch -D merge-left &&
-	git branch -D merge-right &&
-	git branch -D octo-other &&
-	git branch -D octo-second &&
-
-	# add padding to make these merges less interesting
-	# and avoid having them selected for bitmaps
-	test_commit_bulk --id=file 100 &&
-	git checkout other &&
-	test_commit_bulk --id=side 100 &&
-	git checkout second &&
-
-	bitmaptip=$(git rev-parse second) &&
-	blob=$(echo tagged-blob | git hash-object -w --stdin) &&
-	git tag tagged-blob $blob &&
-	git config repack.writebitmaps true
+test_expect_success 'setup writing bitmaps during repack' '
+	git config repack.writeBitmaps true
 '
 
 test_expect_success 'full repack creates bitmaps' '
@@ -123,109 +40,7 @@ test_expect_success 'full repack creates bitmaps' '
 	grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace
 '
 
-test_expect_success 'rev-list --test-bitmap verifies bitmaps' '
-	git rev-list --test-bitmap HEAD
-'
-
-rev_list_tests_head () {
-	test_expect_success "counting commits via bitmap ($state, $branch)" '
-		git rev-list --count $branch >expect &&
-		git rev-list --use-bitmap-index --count $branch >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting partial commits via bitmap ($state, $branch)" '
-		git rev-list --count $branch~5..$branch >expect &&
-		git rev-list --use-bitmap-index --count $branch~5..$branch >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting commits with limit ($state, $branch)" '
-		git rev-list --count -n 1 $branch >expect &&
-		git rev-list --use-bitmap-index --count -n 1 $branch >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting non-linear history ($state, $branch)" '
-		git rev-list --count other...second >expect &&
-		git rev-list --use-bitmap-index --count other...second >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting commits with limiting ($state, $branch)" '
-		git rev-list --count $branch -- 1.t >expect &&
-		git rev-list --use-bitmap-index --count $branch -- 1.t >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting objects via bitmap ($state, $branch)" '
-		git rev-list --count --objects $branch >expect &&
-		git rev-list --use-bitmap-index --count --objects $branch >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "enumerate commits ($state, $branch)" '
-		git rev-list --use-bitmap-index $branch >actual &&
-		git rev-list $branch >expect &&
-		test_bitmap_traversal --no-confirm-bitmaps expect actual
-	'
-
-	test_expect_success "enumerate --objects ($state, $branch)" '
-		git rev-list --objects --use-bitmap-index $branch >actual &&
-		git rev-list --objects $branch >expect &&
-		test_bitmap_traversal expect actual
-	'
-
-	test_expect_success "bitmap --objects handles non-commit objects ($state, $branch)" '
-		git rev-list --objects --use-bitmap-index $branch tagged-blob >actual &&
-		grep $blob actual
-	'
-}
-
-rev_list_tests () {
-	state=$1
-
-	for branch in "second" "other"
-	do
-		rev_list_tests_head
-	done
-}
-
-rev_list_tests 'full bitmap'
-
-test_expect_success 'clone from bitmapped repository' '
-	git clone --no-local --bare . clone.git &&
-	git rev-parse HEAD >expect &&
-	git --git-dir=clone.git rev-parse HEAD >actual &&
-	test_cmp expect actual
-'
-
-test_expect_success 'partial clone from bitmapped repository' '
-	test_config uploadpack.allowfilter true &&
-	git clone --no-local --bare --filter=blob:none . partial-clone.git &&
-	(
-		cd partial-clone.git &&
-		pack=$(echo objects/pack/*.pack) &&
-		git verify-pack -v "$pack" >have &&
-		awk "/blob/ { print \$1 }" <have >blobs &&
-		# we expect this single blob because of the direct ref
-		git rev-parse refs/tags/tagged-blob >expect &&
-		test_cmp expect blobs
-	)
-'
-
-test_expect_success 'setup further non-bitmapped commits' '
-	test_commit_bulk --id=further 10
-'
-
-rev_list_tests 'partial bitmap'
-
-test_expect_success 'fetch (partial bitmap)' '
-	git --git-dir=clone.git fetch origin second:second &&
-	git rev-parse HEAD >expect &&
-	git --git-dir=clone.git rev-parse HEAD >actual &&
-	test_cmp expect actual
-'
+basic_bitmap_tests
 
 test_expect_success 'incremental repack fails when bitmaps are requested' '
 	test_commit more-1 &&
@@ -461,17 +276,6 @@ test_expect_success 'truncated bitmap fails gracefully (cache)' '
 	test_i18ngrep corrupted.bitmap.index stderr
 '
 
-# have_delta <obj> <expected_base>
-#
-# Note that because this relies on cat-file, it might find _any_ copy of an
-# object in the repository. The caller is responsible for making sure
-# there's only one (e.g., via "repack -ad", or having just fetched a copy).
-have_delta () {
-	echo $2 >expect &&
-	echo $1 | git cat-file --batch-check="%(deltabase)" >actual &&
-	test_cmp expect actual
-}
-
 # Create a state of history with these properties:
 #
 #  - refs that allow a client to fetch some new history, while sharing some old
-- 
2.31.1.163.ga65ce7f831



* [PATCH 15/22] t/helper/test-read-midx.c: add --checksum mode
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (13 preceding siblings ...)
  2021-04-09 18:11 ` [PATCH 14/22] t5310: move some tests to lib-bitmap.sh Taylor Blau
@ 2021-04-09 18:11 ` Taylor Blau
  2021-04-09 18:12 ` [PATCH 16/22] t5326: test multi-pack bitmap behavior Taylor Blau
                   ` (10 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-04-09 18:11 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Subsequent tests will want to check for the existence of a multi-pack
bitmap which matches the multi-pack-index stored in the pack directory.

The multi-pack bitmap includes the hex checksum of the MIDX it
corresponds to in its filename (for example,
'$packdir/multi-pack-index-<checksum>.bitmap'). As a result, some tests
want a way to learn what '<checksum>' is.

This helper addresses that need by printing the checksum of the
repository's multi-pack-index.
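
As a rough illustration of where '<checksum>' comes from: the
multi-pack-index file ends with a trailing hash of its own contents, and
that hash is what the helper prints in hex. The sketch below reads the
trailer directly instead of going through the helper. It assumes a
20-byte (SHA-1) trailer, and both function names are hypothetical; they
are not part of this patch.

```shell
# Hypothetical helper: print the last N bytes of a file as lowercase hex.
# N defaults to 20 (SHA-1); pass 32 for a SHA-256 repository.
trailing_hash_hex () {
	tail -c "${2:-20}" "$1" | od -An -v -tx1 | tr -d ' \n'
}

# Hypothetical helper: compose the bitmap path for a given pack directory,
# following the '$packdir/multi-pack-index-<checksum>.bitmap' naming.
midx_bitmap_path () {
	printf '%s/multi-pack-index-%s.bitmap\n' \
		"$1" "$(trailing_hash_hex "$1/multi-pack-index")"
}
```

Tests should still prefer the 'midx_checksum' wrapper added below, since
it lets Git parse the file rather than guessing the hash size.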

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/helper/test-read-midx.c | 16 +++++++++++++++-
 t/lib-bitmap.sh           |  4 ++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index 7c2eb11a8e..cb0d27049a 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -60,12 +60,26 @@ static int read_midx_file(const char *object_dir, int show_objects)
 	return 0;
 }
 
+static int read_midx_checksum(const char *object_dir)
+{
+	struct multi_pack_index *m;
+
+	setup_git_directory();
+	m = load_multi_pack_index(object_dir, 1);
+	if (!m)
+		return 1;
+	printf("%s\n", hash_to_hex(get_midx_checksum(m)));
+	return 0;
+}
+
 int cmd__read_midx(int argc, const char **argv)
 {
 	if (!(argc == 2 || argc == 3))
-		usage("read-midx [--show-objects] <object-dir>");
+		usage("read-midx [--show-objects|--checksum] <object-dir>");
 
 	if (!strcmp(argv[1], "--show-objects"))
 		return read_midx_file(argv[2], 1);
+	else if (!strcmp(argv[1], "--checksum"))
+		return read_midx_checksum(argv[2]);
 	return read_midx_file(argv[1], 0);
 }
diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index c655a9bf35..a5ac8b41a7 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -236,3 +236,7 @@ have_delta () {
 	echo $1 | git cat-file --batch-check="%(deltabase)" >actual &&
 	test_cmp expect actual
 }
+
+midx_checksum () {
+	test-tool read-midx --checksum "${1:-.git/objects}"
+}
-- 
2.31.1.163.ga65ce7f831



* [PATCH 16/22] t5326: test multi-pack bitmap behavior
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (14 preceding siblings ...)
  2021-04-09 18:11 ` [PATCH 15/22] t/helper/test-read-midx.c: add --checksum mode Taylor Blau
@ 2021-04-09 18:12 ` Taylor Blau
  2021-05-04 17:51   ` Jonathan Tan
  2021-04-09 18:12 ` [PATCH 17/22] t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP Taylor Blau
                   ` (9 subsequent siblings)
  25 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-04-09 18:12 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

This patch introduces a new test, t5326, which tests the basic
functionality of multi-pack bitmaps.

Some trivial behavior is tested, such as:

  - Whether bitmaps can be generated with more than one pack.
  - Whether clones can be served with all objects in the bitmap.
  - Whether follow-up fetches can be served with some objects outside of
    the server's bitmap.

These use lib-bitmap's tests (which in turn were pulled from t5310), and
we cover cases where the MIDX represents both a single pack and multiple
packs.

In addition, some non-trivial and MIDX-specific behavior is tested, too,
including:

  - Whether multi-pack bitmaps behave correctly with respect to the
    pack-reuse machinery when the base for some object is selected from
    a different pack than the delta.
  - Whether multi-pack bitmaps correctly respect the
    pack.preferBitmapTips configuration.
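
The pack.preferBitmapTips check comes down to a set difference between
two sorted lists: every commit in the repository, and the commits that
received a bitmap. A minimal sketch of that comparison follows; the
function name is hypothetical, and both input files are assumed to be
sorted in the C locale already.

```shell
# Hypothetical helper: list commits that did not receive a bitmap.
#   $1: sorted file of commit IDs that have bitmaps
#   $2: sorted file of all commit IDs
# "comm -13" suppresses lines unique to $1 and lines common to both,
# leaving only the lines unique to $2.
commits_without_bitmap () {
	LC_ALL=C comm -13 "$1" "$2"
}
```

The test runs this comparison before and after setting the preferred
tips and expects the two results to differ.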

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5326-multi-pack-bitmaps.sh | 278 ++++++++++++++++++++++++++++++++++
 1 file changed, 278 insertions(+)
 create mode 100755 t/t5326-multi-pack-bitmaps.sh

diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
new file mode 100755
index 0000000000..51c4f9ad78
--- /dev/null
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -0,0 +1,278 @@
+#!/bin/sh
+
+test_description='exercise basic multi-pack bitmap functionality'
+. ./test-lib.sh
+. "${TEST_DIRECTORY}/lib-bitmap.sh"
+
+# We'll be writing our own midx and bitmaps, so avoid getting confused by the
+# automatic ones.
+GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+
+objdir=.git/objects
+midx=$objdir/pack/multi-pack-index
+
+# midx_pack_source <obj>
+midx_pack_source () {
+	test-tool read-midx --show-objects .git/objects | grep "^$1 " | cut -f2
+}
+
+setup_bitmap_history
+
+test_expect_success 'enable core.multiPackIndex' '
+	git config core.multiPackIndex true
+'
+
+test_expect_success 'create single-pack midx with bitmaps' '
+	git repack -ad &&
+	git multi-pack-index write --bitmap &&
+	test_path_is_file $midx &&
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
+'
+
+basic_bitmap_tests
+
+test_expect_success 'create new additional packs' '
+	for i in $(test_seq 1 16)
+	do
+		test_commit "$i" &&
+		git repack -d
+	done &&
+
+	git checkout -b other2 HEAD~8 &&
+	for i in $(test_seq 1 8)
+	do
+		test_commit "side-$i" &&
+		git repack -d
+	done &&
+	git checkout second
+'
+
+test_expect_success 'create multi-pack midx with bitmaps' '
+	git multi-pack-index write --bitmap &&
+
+	ls $objdir/pack/pack-*.pack >packs &&
+	test_line_count = 26 packs &&
+
+	test_path_is_file $midx &&
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
+'
+
+basic_bitmap_tests
+
+test_expect_success '--no-bitmap is respected when bitmaps exist' '
+	git multi-pack-index write --bitmap &&
+
+	test_commit respect--no-bitmap &&
+	GIT_TEST_MULTI_PACK_INDEX=0 git repack -d &&
+
+	test_path_is_file $midx &&
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+
+	git multi-pack-index write --no-bitmap &&
+
+	test_path_is_file $midx &&
+	test_path_is_missing $midx-$(midx_checksum $objdir).bitmap
+'
+
+test_expect_success 'setup midx with base from later pack' '
+	# Write a and b so that "a" is a delta on top of base "b", since Git
+	# prefers to delete contents out of a base rather than add to a shorter
+	# object.
+	test_seq 1 128 >a &&
+	test_seq 1 130 >b &&
+
+	git add a b &&
+	git commit -m "initial commit" &&
+
+	a=$(git rev-parse HEAD:a) &&
+	b=$(git rev-parse HEAD:b) &&
+
+	# In the first pack, "a" is stored as a delta to "b".
+	p1=$(git pack-objects .git/objects/pack/pack <<-EOF
+	$a
+	$b
+	EOF
+	) &&
+
+	# In the second pack, "a" is missing, and "b" is not a delta nor base to
+	# any other object.
+	p2=$(git pack-objects .git/objects/pack/pack <<-EOF
+	$b
+	$(git rev-parse HEAD)
+	$(git rev-parse HEAD^{tree})
+	EOF
+	) &&
+
+	git prune-packed &&
+	# Use the second pack as the preferred source, so that "b" occurs
+	# earlier in the MIDX object order, rendering "a" unusable for pack
+	# reuse.
+	git multi-pack-index write --bitmap --preferred-pack=pack-$p2.idx &&
+
+	have_delta $a $b &&
+	test $(midx_pack_source $a) != $(midx_pack_source $b)
+'
+
+rev_list_tests 'full bitmap with backwards delta'
+
+test_expect_success 'clone with bitmaps enabled' '
+	git clone --no-local --bare . clone-reverse-delta.git &&
+	test_when_finished "rm -fr clone-reverse-delta.git" &&
+
+	git rev-parse HEAD >expect &&
+	git --git-dir=clone-reverse-delta.git rev-parse HEAD >actual &&
+	test_cmp expect actual
+'
+
+bitmap_reuse_tests() {
+	from=$1
+	to=$2
+
+	test_expect_success "setup pack reuse tests ($from -> $to)" '
+		rm -fr repo &&
+		git init repo &&
+		(
+			cd repo &&
+			test_commit_bulk 16 &&
+			git tag old-tip &&
+
+			git config core.multiPackIndex true &&
+			if test "MIDX" = "$from"
+			then
+				GIT_TEST_MULTI_PACK_INDEX=0 git repack -Ad &&
+				git multi-pack-index write --bitmap
+			else
+				GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb
+			fi
+		)
+	'
+
+	test_expect_success "build bitmap from existing ($from -> $to)" '
+		(
+			cd repo &&
+			test_commit_bulk --id=further 16 &&
+			git tag new-tip &&
+
+			if test "MIDX" = "$to"
+			then
+				GIT_TEST_MULTI_PACK_INDEX=0 git repack -d &&
+				git multi-pack-index write --bitmap
+			else
+				GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb
+			fi
+		)
+	'
+
+	test_expect_success "verify resulting bitmaps ($from -> $to)" '
+		(
+			cd repo &&
+			git for-each-ref &&
+			git rev-list --test-bitmap refs/tags/old-tip &&
+			git rev-list --test-bitmap refs/tags/new-tip
+		)
+	'
+}
+
+bitmap_reuse_tests 'pack' 'MIDX'
+bitmap_reuse_tests 'MIDX' 'pack'
+bitmap_reuse_tests 'MIDX' 'MIDX'
+
+test_expect_success 'missing object closure fails gracefully' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit loose &&
+		test_commit packed &&
+
+		# Do not pass "--revs"; we want a pack without the "loose"
+		# commit.
+		git pack-objects $objdir/pack/pack <<-EOF &&
+		$(git rev-parse packed)
+		EOF
+
+		git multi-pack-index write --bitmap 2>err &&
+		grep "doesn.t have full closure" err &&
+		test_path_is_file $midx &&
+		test_path_is_missing $midx-$(midx_checksum $objdir).bitmap
+	)
+'
+
+test_expect_success 'setup partial bitmaps' '
+	test_commit packed &&
+	git repack &&
+	test_commit loose &&
+	git multi-pack-index write --bitmap 2>err &&
+	test_path_is_file $midx &&
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
+'
+
+basic_bitmap_tests HEAD~
+
+test_expect_success 'removing a MIDX clears stale bitmaps' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+		test_commit base &&
+		git repack &&
+		git multi-pack-index write --bitmap &&
+
+		# Write a MIDX and bitmap; remove the MIDX but leave the bitmap.
+		stale_bitmap=$midx-$(midx_checksum $objdir).bitmap &&
+		rm $midx &&
+
+		# Then write a new MIDX.
+		test_commit new &&
+		git repack &&
+		git multi-pack-index write --bitmap &&
+
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+		test_path_is_missing $stale_bitmap
+	)
+'
+
+test_expect_success 'pack.preferBitmapTips' '
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit_bulk --message="%s" 103 &&
+
+		git log --format="%H" >commits.raw &&
+		sort <commits.raw >commits &&
+
+		git log --format="create refs/tags/%s %H" HEAD >refs &&
+		git update-ref --stdin <refs &&
+
+		git multi-pack-index write --bitmap &&
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+
+		test-tool bitmap list-commits | sort >bitmaps &&
+		comm -13 bitmaps commits >before &&
+		test_line_count = 1 before &&
+
+		perl -ne "printf(\"create refs/tags/include/%d \", $.); print" \
+			<before | git update-ref --stdin &&
+
+		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
+		rm -fr $midx-$(midx_checksum $objdir).rev &&
+		rm -fr $midx &&
+
+		git -c pack.preferBitmapTips=refs/tags/include \
+			multi-pack-index write --bitmap &&
+		test-tool bitmap list-commits | sort >bitmaps &&
+		comm -13 bitmaps commits >after &&
+
+		! test_cmp before after
+	)
+'
+
+test_done
-- 
2.31.1.163.ga65ce7f831



* [PATCH 17/22] t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (15 preceding siblings ...)
  2021-04-09 18:12 ` [PATCH 16/22] t5326: test multi-pack bitmap behavior Taylor Blau
@ 2021-04-09 18:12 ` Taylor Blau
  2021-04-09 18:12 ` [PATCH 18/22] t5319: don't write MIDX bitmaps in t5319 Taylor Blau
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-04-09 18:12 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

From: Jeff King <peff@peff.net>

Generating a MIDX bitmap confuses many of the tests in t5310, which
expect to control whether and how bitmaps are written. Since the
relevant MIDX-bitmap tests here are covered already in t5326, let's just
disable the flag for the whole t5310 script.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5310-pack-bitmaps.sh | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 4318f84d53..673baa5c3c 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -8,6 +8,10 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 . "$TEST_DIRECTORY"/lib-bundle.sh
 . "$TEST_DIRECTORY"/lib-bitmap.sh
 
+# t5310 deals only with single-pack bitmaps, so don't write MIDX bitmaps in
+# their place.
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+
 objpath () {
 	echo ".git/objects/$(echo "$1" | sed -e 's|\(..\)|\1/|')"
 }
-- 
2.31.1.163.ga65ce7f831



* [PATCH 18/22] t5319: don't write MIDX bitmaps in t5319
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (16 preceding siblings ...)
  2021-04-09 18:12 ` [PATCH 17/22] t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP Taylor Blau
@ 2021-04-09 18:12 ` Taylor Blau
  2021-04-09 18:12 ` [PATCH 19/22] t7700: update to work with MIDX bitmap test knob Taylor Blau
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-04-09 18:12 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

This test specifically checks that a pack-based bitmap file is still
respected after a MIDX is generated. Generating a MIDX bitmap would
confuse the test, so override the
'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' variable to make sure we don't
write one.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5319-multi-pack-index.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 5641d158df..69f1c815aa 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -474,7 +474,8 @@ test_expect_success 'repack preserves multi-pack-index when creating packs' '
 compare_results_with_midx "after repack"
 
 test_expect_success 'multi-pack-index and pack-bitmap' '
-	git -c repack.writeBitmaps=true repack -ad &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -c repack.writeBitmaps=true repack -ad &&
 	git multi-pack-index write &&
 	git rev-list --test-bitmap HEAD
 '
-- 
2.31.1.163.ga65ce7f831



* [PATCH 19/22] t7700: update to work with MIDX bitmap test knob
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (17 preceding siblings ...)
  2021-04-09 18:12 ` [PATCH 18/22] t5319: don't write MIDX bitmaps in t5319 Taylor Blau
@ 2021-04-09 18:12 ` Taylor Blau
  2021-04-09 18:12 ` [PATCH 20/22] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-04-09 18:12 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

A number of these tests are focused only on pack-based bitmaps and need
to be updated to disable 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' where
necessary.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t7700-repack.sh | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 25b235c063..98eda3bfeb 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -63,13 +63,14 @@ test_expect_success 'objects in packs marked .keep are not repacked' '
 
 test_expect_success 'writing bitmaps via command-line can duplicate .keep objects' '
 	# build on $oid, $packid, and .keep state from previous
-	git repack -Adbl &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 git repack -Adbl &&
 	test_has_duplicate_object true
 '
 
 test_expect_success 'writing bitmaps via config can duplicate .keep objects' '
 	# build on $oid, $packid, and .keep state from previous
-	git -c repack.writebitmaps=true repack -Adl &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -c repack.writebitmaps=true repack -Adl &&
 	test_has_duplicate_object true
 '
 
@@ -189,7 +190,9 @@ test_expect_success 'repack --keep-pack' '
 
 test_expect_success 'bitmaps are created by default in bare repos' '
 	git clone --bare .git bare.git &&
-	git -C bare.git repack -ad &&
+	rm -f bare.git/objects/pack/*.bitmap &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -C bare.git repack -ad &&
 	bitmap=$(ls bare.git/objects/pack/*.bitmap) &&
 	test_path_is_file "$bitmap"
 '
@@ -200,7 +203,8 @@ test_expect_success 'incremental repack does not complain' '
 '
 
 test_expect_success 'bitmaps can be disabled on bare repos' '
-	git -c repack.writeBitmaps=false -C bare.git repack -ad &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -c repack.writeBitmaps=false -C bare.git repack -ad &&
 	bitmap=$(ls bare.git/objects/pack/*.bitmap || :) &&
 	test -z "$bitmap"
 '
@@ -211,7 +215,8 @@ test_expect_success 'no bitmaps created if .keep files present' '
 	keep=${pack%.pack}.keep &&
 	test_when_finished "rm -f \"\$keep\"" &&
 	>"$keep" &&
-	git -C bare.git repack -ad 2>stderr &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -C bare.git repack -ad 2>stderr &&
 	test_must_be_empty stderr &&
 	find bare.git/objects/pack/ -type f -name "*.bitmap" >actual &&
 	test_must_be_empty actual
@@ -222,7 +227,8 @@ test_expect_success 'auto-bitmaps do not complain if unavailable' '
 	blob=$(test-tool genrandom big $((1024*1024)) |
 	       git -C bare.git hash-object -w --stdin) &&
 	git -C bare.git update-ref refs/tags/big $blob &&
-	git -C bare.git repack -ad 2>stderr &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -C bare.git repack -ad 2>stderr &&
 	test_must_be_empty stderr &&
 	find bare.git/objects/pack -type f -name "*.bitmap" >actual &&
 	test_must_be_empty actual
-- 
2.31.1.163.ga65ce7f831



* [PATCH 20/22] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (18 preceding siblings ...)
  2021-04-09 18:12 ` [PATCH 19/22] t7700: update to work with MIDX bitmap test knob Taylor Blau
@ 2021-04-09 18:12 ` Taylor Blau
  2021-04-09 18:12 ` [PATCH 21/22] p5310: extract full and partial bitmap tests Taylor Blau
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-04-09 18:12 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Introduce a new 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' environment
variable to also write a multi-pack bitmap when
'GIT_TEST_MULTI_PACK_INDEX' is set.
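
The way the two knobs combine at the end of "git repack" can be sketched
in shell as below. This is only an illustration of the decision, not
code from the patch: the function names and output strings are made up,
and 'env_is_true' accepts just a few spellings, which is narrower than
what git_env_bool() really parses.

```shell
# Hypothetical stand-in for git_env_bool(); handles common spellings only.
env_is_true () {
	case "$1" in
	1|true|yes|on) return 0 ;;
	*) return 1 ;;
	esac
}

# Print what would be written after a repack under the two test knobs.
# The bitmap knob has no effect unless the MIDX knob is also set.
repack_midx_plan () {
	if ! env_is_true "${GIT_TEST_MULTI_PACK_INDEX:-0}"
	then
		echo "no midx"
	elif env_is_true "${GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP:-0}"
	then
		echo "midx with bitmap and rev-index"
	else
		echo "midx only"
	fi
}
```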

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c          | 13 ++++++++++---
 ci/run-build-and-tests.sh |  1 +
 midx.h                    |  2 ++
 t/README                  |  4 ++++
 4 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 2847fdfbab..3cb843fb59 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -515,7 +515,10 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		if (!(pack_everything & ALL_INTO_ONE) ||
 		    !is_bare_repository())
 			write_bitmaps = 0;
-	}
+	} else if (write_bitmaps &&
+		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0) &&
+		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0))
+		write_bitmaps = 0;
 	if (pack_kept_objects < 0)
 		pack_kept_objects = write_bitmaps > 0;
 
@@ -720,8 +723,12 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		update_server_info(0);
 	remove_temporary_files();
 
-	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
-		write_midx_file(get_object_directory(), NULL, 0);
+	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0)) {
+		unsigned flags = 0;
+		if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0))
+			flags |= MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX;
+		write_midx_file(get_object_directory(), NULL, flags);
+	}
 
 	string_list_clear(&names, 0);
 	string_list_clear(&rollback, 0);
diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index a66b5e8c75..7c55a5033e 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -22,6 +22,7 @@ linux-gcc)
 	export GIT_TEST_COMMIT_GRAPH=1
 	export GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=1
 	export GIT_TEST_MULTI_PACK_INDEX=1
+	export GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=1
 	export GIT_TEST_ADD_I_USE_BUILTIN=1
 	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master
 	export GIT_TEST_WRITE_REV_INDEX=1
diff --git a/midx.h b/midx.h
index 350f4d0a7b..aa3da557bb 100644
--- a/midx.h
+++ b/midx.h
@@ -8,6 +8,8 @@ struct pack_entry;
 struct repository;
 
 #define GIT_TEST_MULTI_PACK_INDEX "GIT_TEST_MULTI_PACK_INDEX"
+#define GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP \
+	"GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP"
 
 struct multi_pack_index {
 	struct multi_pack_index *next;
diff --git a/t/README b/t/README
index fd9375b146..956731da44 100644
--- a/t/README
+++ b/t/README
@@ -420,6 +420,10 @@ GIT_TEST_MULTI_PACK_INDEX=<boolean>, when true, forces the multi-pack-
 index to be written after every 'git repack' command, and overrides the
 'core.multiPackIndex' setting to true.
 
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=<boolean>, when true, sets the
+'--bitmap' option on all invocations of 'git multi-pack-index write',
+and ignores pack-objects' '--write-bitmap-index'.
+
 GIT_TEST_SIDEBAND_ALL=<boolean>, when true, overrides the
 'uploadpack.allowSidebandAll' setting to true, and when false, forces
 fetch-pack to not request sideband-all (even if the server advertises
-- 
2.31.1.163.ga65ce7f831



* [PATCH 21/22] p5310: extract full and partial bitmap tests
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (19 preceding siblings ...)
  2021-04-09 18:12 ` [PATCH 20/22] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
@ 2021-04-09 18:12 ` Taylor Blau
  2021-04-09 18:12 ` [PATCH 22/22] p5326: perf tests for MIDX bitmaps Taylor Blau
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-04-09 18:12 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

A new p5326 introduced by the next patch will want these same tests,
interjecting its own setup in between. Move them out so that both perf
tests can reuse them.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/perf/lib-bitmap.sh         | 69 ++++++++++++++++++++++++++++++++++++
 t/perf/p5310-pack-bitmaps.sh | 65 ++-------------------------------
 2 files changed, 72 insertions(+), 62 deletions(-)
 create mode 100644 t/perf/lib-bitmap.sh

diff --git a/t/perf/lib-bitmap.sh b/t/perf/lib-bitmap.sh
new file mode 100644
index 0000000000..63d3bc7cec
--- /dev/null
+++ b/t/perf/lib-bitmap.sh
@@ -0,0 +1,69 @@
+# Helper functions for testing bitmap performance; see p5310.
+
+test_full_bitmap () {
+	test_perf 'simulated clone' '
+		git pack-objects --stdout --all </dev/null >/dev/null
+	'
+
+	test_perf 'simulated fetch' '
+		have=$(git rev-list HEAD~100 -1) &&
+		{
+			echo HEAD &&
+			echo ^$have
+		} | git pack-objects --revs --stdout >/dev/null
+	'
+
+	test_perf 'pack to file (bitmap)' '
+		git pack-objects --use-bitmap-index --all pack1b </dev/null >/dev/null
+	'
+
+	test_perf 'rev-list (commits)' '
+		git rev-list --all --use-bitmap-index >/dev/null
+	'
+
+	test_perf 'rev-list (objects)' '
+		git rev-list --all --use-bitmap-index --objects >/dev/null
+	'
+
+	test_perf 'rev-list with tag negated via --not --all (objects)' '
+		git rev-list perf-tag --not --all --use-bitmap-index --objects >/dev/null
+	'
+
+	test_perf 'rev-list with negative tag (objects)' '
+		git rev-list HEAD --not perf-tag --use-bitmap-index --objects >/dev/null
+	'
+
+	test_perf 'rev-list count with blob:none' '
+		git rev-list --use-bitmap-index --count --objects --all \
+			--filter=blob:none >/dev/null
+	'
+
+	test_perf 'rev-list count with blob:limit=1k' '
+		git rev-list --use-bitmap-index --count --objects --all \
+			--filter=blob:limit=1k >/dev/null
+	'
+
+	test_perf 'rev-list count with tree:0' '
+		git rev-list --use-bitmap-index --count --objects --all \
+			--filter=tree:0 >/dev/null
+	'
+
+	test_perf 'simulated partial clone' '
+		git pack-objects --stdout --all --filter=blob:none </dev/null >/dev/null
+	'
+}
+
+test_partial_bitmap () {
+	test_perf 'clone (partial bitmap)' '
+		git pack-objects --stdout --all </dev/null >/dev/null
+	'
+
+	test_perf 'pack to file (partial bitmap)' '
+		git pack-objects --use-bitmap-index --all pack2b </dev/null >/dev/null
+	'
+
+	test_perf 'rev-list with tree filter (partial bitmap)' '
+		git rev-list --use-bitmap-index --count --objects --all \
+			--filter=tree:0 >/dev/null
+	'
+}
diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index 452be01056..7ad4f237bc 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -2,6 +2,7 @@
 
 test_description='Tests pack performance using bitmaps'
 . ./perf-lib.sh
+. "${TEST_DIRECTORY}/perf/lib-bitmap.sh"
 
 test_perf_large_repo
 
@@ -25,56 +26,7 @@ test_perf 'repack to disk' '
 	git repack -ad
 '
 
-test_perf 'simulated clone' '
-	git pack-objects --stdout --all </dev/null >/dev/null
-'
-
-test_perf 'simulated fetch' '
-	have=$(git rev-list HEAD~100 -1) &&
-	{
-		echo HEAD &&
-		echo ^$have
-	} | git pack-objects --revs --stdout >/dev/null
-'
-
-test_perf 'pack to file (bitmap)' '
-	git pack-objects --use-bitmap-index --all pack1b </dev/null >/dev/null
-'
-
-test_perf 'rev-list (commits)' '
-	git rev-list --all --use-bitmap-index >/dev/null
-'
-
-test_perf 'rev-list (objects)' '
-	git rev-list --all --use-bitmap-index --objects >/dev/null
-'
-
-test_perf 'rev-list with tag negated via --not --all (objects)' '
-	git rev-list perf-tag --not --all --use-bitmap-index --objects >/dev/null
-'
-
-test_perf 'rev-list with negative tag (objects)' '
-	git rev-list HEAD --not perf-tag --use-bitmap-index --objects >/dev/null
-'
-
-test_perf 'rev-list count with blob:none' '
-	git rev-list --use-bitmap-index --count --objects --all \
-		--filter=blob:none >/dev/null
-'
-
-test_perf 'rev-list count with blob:limit=1k' '
-	git rev-list --use-bitmap-index --count --objects --all \
-		--filter=blob:limit=1k >/dev/null
-'
-
-test_perf 'rev-list count with tree:0' '
-	git rev-list --use-bitmap-index --count --objects --all \
-		--filter=tree:0 >/dev/null
-'
-
-test_perf 'simulated partial clone' '
-	git pack-objects --stdout --all --filter=blob:none </dev/null >/dev/null
-'
+test_full_bitmap
 
 test_expect_success 'create partial bitmap state' '
 	# pick a commit to represent the repo tip in the past
@@ -97,17 +49,6 @@ test_expect_success 'create partial bitmap state' '
 	git update-ref HEAD $orig_tip
 '
 
-test_perf 'clone (partial bitmap)' '
-	git pack-objects --stdout --all </dev/null >/dev/null
-'
-
-test_perf 'pack to file (partial bitmap)' '
-	git pack-objects --use-bitmap-index --all pack2b </dev/null >/dev/null
-'
-
-test_perf 'rev-list with tree filter (partial bitmap)' '
-	git rev-list --use-bitmap-index --count --objects --all \
-		--filter=tree:0 >/dev/null
-'
+test_partial_bitmap
 
 test_done
-- 
2.31.1.163.ga65ce7f831



* [PATCH 22/22] p5326: perf tests for MIDX bitmaps
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (20 preceding siblings ...)
  2021-04-09 18:12 ` [PATCH 21/22] p5310: extract full and partial bitmap tests Taylor Blau
@ 2021-04-09 18:12 ` Taylor Blau
  2021-05-04 18:00   ` Jonathan Tan
  2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
                   ` (3 subsequent siblings)
  25 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-04-09 18:12 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

These new performance tests demonstrate effectively the same behavior as
p5310, but use a multi-pack bitmap instead of a single-pack one.

Notably, p5326 does not create a MIDX bitmap with multiple packs. This
allows a direct comparison with p5310: any difference between the two
measures just the overhead of using MIDX bitmaps.

Here are the results of p5310 and p5326 together, measured at the same
time and on the same machine (using a Xeon W-2255 CPU):

    Test                                                  HEAD
    ------------------------------------------------------------------------
    5310.2: repack to disk                                96.78(93.39+11.33)
    5310.3: simulated clone                               9.98(9.79+0.19)
    5310.4: simulated fetch                               1.75(4.26+0.19)
    5310.5: pack to file (bitmap)                         28.20(27.87+8.70)
    5310.6: rev-list (commits)                            0.41(0.36+0.05)
    5310.7: rev-list (objects)                            1.61(1.54+0.07)
    5310.8: rev-list count with blob:none                 0.25(0.21+0.04)
    5310.9: rev-list count with blob:limit=1k             2.65(2.54+0.10)
    5310.10: rev-list count with tree:0                   0.23(0.19+0.04)
    5310.11: simulated partial clone                      4.34(4.21+0.12)
    5310.13: clone (partial bitmap)                       11.05(12.21+0.48)
    5310.14: pack to file (partial bitmap)                31.25(34.22+3.70)
    5310.15: rev-list with tree filter (partial bitmap)   0.26(0.22+0.04)

versus the same tests (this time using a multi-pack index):

    Test                                                  HEAD
    ------------------------------------------------------------------------
    5326.2: setup multi-pack index                        78.99(75.29+11.58)
    5326.3: simulated clone                               11.78(11.56+0.22)
    5326.4: simulated fetch                               1.70(4.49+0.13)
    5326.5: pack to file (bitmap)                         28.02(27.72+8.76)
    5326.6: rev-list (commits)                            0.42(0.36+0.06)
    5326.7: rev-list (objects)                            1.65(1.58+0.06)
    5326.8: rev-list count with blob:none                 0.26(0.21+0.05)
    5326.9: rev-list count with blob:limit=1k             2.97(2.86+0.10)
    5326.10: rev-list count with tree:0                   0.25(0.20+0.04)
    5326.11: simulated partial clone                      5.65(5.49+0.16)
    5326.13: clone (partial bitmap)                       12.22(13.43+0.38)
    5326.14: pack to file (partial bitmap)                30.05(31.57+7.25)
    5326.15: rev-list with tree filter (partial bitmap)   0.24(0.20+0.04)

There is slight overhead in "simulated clone", "simulated partial
clone", and "clone (partial bitmap)". Unsurprisingly, that overhead is
due to using the MIDX's reverse index to map between bit positions and
MIDX positions.

This can be reproduced by running "git repack -adb" along with "git
multi-pack-index write --bitmap" in a large-ish repository. Then run:

    $ perf record -o pack.perf git -c core.multiPackIndex=false \
      pack-objects --all --stdout >/dev/null </dev/null
    $ perf record -o midx.perf git -c core.multiPackIndex=true \
      pack-objects --all --stdout >/dev/null </dev/null

and compare the two with "perf diff -c delta -o 1 pack.perf midx.perf".
The most notable results are below (the next largest positive delta is
+0.14%):

    # Event 'cycles'
    #
    # Baseline    Delta  Shared Object       Symbol
    # ........  .......  ..................  ..........................
    #
                 +5.86%  git                 [.] nth_midxed_offset
                 +5.24%  git                 [.] nth_midxed_pack_int_id
         3.45%   +0.97%  git                 [.] offset_to_pack_pos
         3.30%   +0.57%  git                 [.] pack_pos_to_offset
                 +0.30%  git                 [.] pack_pos_to_midx

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/perf/p5326-multi-pack-bitmaps.sh | 43 ++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)
 create mode 100755 t/perf/p5326-multi-pack-bitmaps.sh

diff --git a/t/perf/p5326-multi-pack-bitmaps.sh b/t/perf/p5326-multi-pack-bitmaps.sh
new file mode 100755
index 0000000000..5845109ac7
--- /dev/null
+++ b/t/perf/p5326-multi-pack-bitmaps.sh
@@ -0,0 +1,43 @@
+#!/bin/sh
+
+test_description='Tests performance using midx bitmaps'
+. ./perf-lib.sh
+. "${TEST_DIRECTORY}/perf/lib-bitmap.sh"
+
+test_perf_large_repo
+
+test_expect_success 'enable multi-pack index' '
+	git config core.multiPackIndex true
+'
+
+test_perf 'setup multi-pack index' '
+	git repack -ad &&
+	git multi-pack-index write --bitmap
+'
+
+test_full_bitmap
+
+test_expect_success 'create partial bitmap state' '
+	# pick a commit to represent the repo tip in the past
+	cutoff=$(git rev-list HEAD~100 -1) &&
+	orig_tip=$(git rev-parse HEAD) &&
+
+	# now pretend we have just one tip
+	rm -rf .git/logs .git/refs/* .git/packed-refs &&
+	git update-ref HEAD $cutoff &&
+
+	# and then repack, which will leave us with a nice
+	# big bitmap pack of the "old" history, and all of
+	# the new history will be loose, as if it had been pushed
+	# up incrementally and exploded via unpack-objects
+	git repack -Ad &&
+	git multi-pack-index write --bitmap &&
+
+	# and now restore our original tip, as if the pushes
+	# had happened
+	git update-ref HEAD $orig_tip
+'
+
+test_partial_bitmap
+
+test_done
-- 
2.31.1.163.ga65ce7f831


* Re: [PATCH 12/22] pack-bitmap: read multi-pack bitmaps
  2021-04-09 18:11 ` [PATCH 12/22] pack-bitmap: read multi-pack bitmaps Taylor Blau
@ 2021-04-16  2:39   ` Jonathan Tan
  2021-04-16  3:13     ` Taylor Blau
  0 siblings, 1 reply; 273+ messages in thread
From: Jonathan Tan @ 2021-04-16  2:39 UTC (permalink / raw)
  To: me; +Cc: git, peff, dstolee, gitster, jonathantanmy

I'll review up to this patch for now. Hopefully I'll get to the rest
soon.

> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 5205dde2e1..a4e4e4ebcc 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -984,7 +984,17 @@ static void write_reused_pack(struct hashfile *f)
>  				break;
>  
>  			offset += ewah_bit_ctz64(word >> offset);
> -			write_reused_pack_one(pos + offset, f, &w_curs);
> +			if (bitmap_is_midx(bitmap_git)) {
> +				off_t pack_offs = bitmap_pack_offset(bitmap_git,
> +								     pos + offset);
> +				uint32_t pos;
> +
> +				if (offset_to_pack_pos(reuse_packfile, pack_offs, &pos) < 0)
> +					die(_("write_reused_pack: could not locate %"PRIdMAX),
> +					    (intmax_t)pack_offs);
> +				write_reused_pack_one(pos, f, &w_curs);
> +			} else
> +				write_reused_pack_one(pos + offset, f, &w_curs);
>  			display_progress(progress_state, ++written);
>  		}
>  	}

When bitmaps are used, pos + offset is the pseudo-pack (a virtual
concatenation of all packfiles in the MIDX) position (as in, first
object is 0, second object is 1, and so on), not a position in
a single packfile. From it, we obtain a pack offset, and from it, we
obtain a position in the reused packfile (reuse_packfile). In this way,
the code is equivalent to the non-MIDX case. Looks good.

(There is no need to select a packfile here in the case of MIDX because,
as the code later shows, we always reuse only one packfile - assigned to
reuse_packfile.)
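
To make the first half of that chain concrete, here is a minimal sketch,
assuming a toy in-memory MIDX. The names toy_midx_entry, toy_midx and
toy_bit_to_pack_offset are hypothetical and are not Git's data
structures or API; the sketch only shows how a pseudo-pack (bit)
position can be turned into a pack and a byte offset through a reverse
index.

```c
#include <stdint.h>

/* Toy stand-in for one MIDX entry; NOT Git's on-disk layout. */
struct toy_midx_entry {
	uint32_t pack_id; /* which pack holds the object */
	uint64_t offset;  /* byte offset of the object in that pack */
};

/*
 * Toy MIDX (an assumption for illustration): entries[] is in
 * object-name order, and rev[] maps a pseudo-pack (bit) position to
 * an index into entries[].
 */
struct toy_midx {
	const struct toy_midx_entry *entries;
	const uint32_t *rev;
	uint32_t nr;
};

/*
 * Bit position -> MIDX position -> (pack, offset).
 * Returns 0 on success, -1 if bit_pos is out of range.
 */
static int toy_bit_to_pack_offset(const struct toy_midx *m, uint32_t bit_pos,
				  uint32_t *pack_id, uint64_t *offset)
{
	uint32_t midx_pos;

	if (bit_pos >= m->nr)
		return -1;

	midx_pos = m->rev[bit_pos];
	*pack_id = m->entries[midx_pos].pack_id;
	*offset = m->entries[midx_pos].offset;
	return 0;
}
```

The remaining step (byte offset to position within the reused pack) is
what offset_to_pack_pos() does in the quoted hunk.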

> @@ -35,8 +36,15 @@ struct stored_bitmap {
>   * the active bitmap index is the largest one.
>   */
>  struct bitmap_index {
> -	/* Packfile to which this bitmap index belongs to */
> +	/*
> +	 * The pack or multi-pack index (MIDX) that this bitmap index belongs
> +	 * to.
> +	 *
> +	 * Exactly one of these must be non-NULL; this specifies the object
> +	 * order used to interpret this bitmap.
> +	 */
>  	struct packed_git *pack;
> +	struct multi_pack_index *midx;

Makes sense.

> @@ -71,6 +79,8 @@ struct bitmap_index {
>  	/* If not NULL, this is a name-hash cache pointing into map. */
>  	uint32_t *hashes;
>  
> +	const unsigned char *checksum;
> +
>  	/*
>  	 * Extended index.
>  	 *

I see later that this checksum is used, OK. Maybe comment that this
points into map (just like "hashes", as quoted above).

> @@ -281,6 +304,54 @@ static char *pack_bitmap_filename(struct packed_git *p)
>  	return xstrfmt("%.*s.bitmap", (int)len, p->pack_name);
>  }
>  
> +static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
> +			      struct multi_pack_index *midx)
> +{
> +	struct stat st;
> +	char *idx_name = midx_bitmap_filename(midx);
> +	int fd = git_open(idx_name);
> +
> +	free(idx_name);
> +
> +	if (fd < 0)
> +		return -1;
> +
> +	if (fstat(fd, &st)) {
> +		close(fd);
> +		return -1;
> +	}
> +
> +	if (bitmap_git->pack || bitmap_git->midx) {
> +		/* ignore extra bitmap file; we can only handle one */
> +		return -1;

Here, fd is not closed? Maybe better to have multiple cleanup stages
(one when the mmap has been built, and one when not).
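
As a rough sketch of the staged-cleanup idea, assuming a hypothetical
helper toy_open_bitmap() (not the function in the patch) where the
"already_loaded" flag stands in for "bitmap_git->pack ||
bitmap_git->midx":

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path` and return its descriptor, or -1.
 * Every failure after a successful open() goes through one label, so
 * the descriptor cannot be leaked by an early return.
 */
static int toy_open_bitmap(const char *path, int already_loaded)
{
	struct stat st;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1; /* nothing to clean up yet */

	if (fstat(fd, &st))
		goto cleanup_fd;

	if (already_loaded)
		goto cleanup_fd; /* we can only handle one bitmap */

	return fd; /* success: the caller now owns fd */

cleanup_fd:
	close(fd);
	return -1;
}
```

A second label (for example, one that also unmaps) could be added for
failures that happen after the mmap has been built.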

> @@ -302,12 +373,18 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
>  		return -1;
>  	}
>  
> -	if (bitmap_git->pack) {
> +	if (bitmap_git->pack || bitmap_git->midx) {
> +		/* ignore extra bitmap file; we can only handle one */
>  		warning("ignoring extra bitmap file: %s", packfile->pack_name);
>  		close(fd);
>  		return -1;
>  	}
>  
> +	if (!is_pack_valid(packfile)) {
> +		close(fd);
> +		return -1;
> +	}

Why is this needed now (and presumably, not before)?

> -static int load_pack_bitmap(struct bitmap_index *bitmap_git)
> +static int load_reverse_index(struct bitmap_index *bitmap_git)
> +{
> +	if (bitmap_is_midx(bitmap_git)) {
> +		uint32_t i;
> +		int ret;
> +
> +		ret = load_midx_revindex(bitmap_git->midx);
> +		if (ret)
> +			return ret;
> +
> +		for (i = 0; i < bitmap_git->midx->num_packs; i++) {
> +			if (prepare_midx_pack(the_repository, bitmap_git->midx, i))
> +				die(_("load_reverse_index: could not open pack"));
> +			ret = load_pack_revindex(bitmap_git->midx->packs[i]);

I was thinking about why we still need per-pack revindex, but I think
the answer is that we still need to convert pack offsets (roughly
speaking, 0 to size of packfile in bytes) to pack positions (0 to number
of objects) (and one such conversion is in the quoted section of
builtin/pack-objects.c above), and MIDX does not provide this. OK, makes
sense.
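
For illustration only, the offset-to-position conversion can be thought
of as a binary search over the sorted start offsets of the objects in
one pack. This is a minimal sketch under that assumption;
toy_offset_to_pos() is a hypothetical name, and Git's
offset_to_pack_pos() works on its own revindex structures rather than a
plain array.

```c
#include <stdint.h>

/*
 * Find the position of the object that starts at byte `ofs`, given
 * the start offsets of every object in one pack, sorted ascending.
 * Returns 0 and sets *pos on success, -1 if no object starts at `ofs`.
 */
static int toy_offset_to_pos(const uint64_t *offsets, uint32_t nr,
			     uint64_t ofs, uint32_t *pos)
{
	uint32_t lo = 0, hi = nr;

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;

		if (offsets[mid] == ofs) {
			*pos = mid;
			return 0;
		}
		if (offsets[mid] < ofs)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -1;
}
```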

> +			if (ret)
> +				return ret;
> +		}
> +		return 0;
> +	}
> +	return load_pack_revindex(bitmap_git->pack);
> +}

[snip]

> @@ -428,10 +552,26 @@ static inline int bitmap_position_packfile(struct bitmap_index *bitmap_git,
>  	return pos;
>  }
>  
> +static int bitmap_position_midx(struct bitmap_index *bitmap_git,
> +				const struct object_id *oid)
> +{
> +	uint32_t want, got;
> +	if (!bsearch_midx(oid, bitmap_git->midx, &want))
> +		return -1;
> +
> +	if (midx_to_pack_pos(bitmap_git->midx, want, &got) < 0)
> +		return -1;
> +	return got;
> +}

bsearch_midx() gives us the position in the MIDX (e.g. if we had an
object with the name 00...00, "want" will be 0, and if we had an
object with the name ff...ff, "want" will be the number of objects
minus 1). midx_to_pack_pos() converts that into the position in the
pseudo-pack, which is what we want. OK.
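
A toy model of those two steps, assuming small integers in place of
object names and a linear scan in place of whatever lookup Git really
performs (the quoted hunk does not show how midx_to_pack_pos() is
implemented). All toy_* names are hypothetical:

```c
#include <stdint.h>

/*
 * Toy MIDX (an assumption for illustration): oids[] is sorted
 * ascending, and rev[bit_pos] is an index into oids[].
 */
struct toy_midx {
	const uint32_t *oids;
	const uint32_t *rev;
	uint32_t nr;
};

/* Like the role of bsearch_midx(): 1 if found (sets *midx_pos), else 0. */
static int toy_bsearch(const struct toy_midx *m, uint32_t oid,
		       uint32_t *midx_pos)
{
	uint32_t lo = 0, hi = m->nr;

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;

		if (m->oids[mid] == oid) {
			*midx_pos = mid;
			return 1;
		}
		if (m->oids[mid] < oid)
			lo = mid + 1;
		else
			hi = mid;
	}
	return 0;
}

/* MIDX position -> bit position; a simple linear scan of rev[]. */
static int toy_midx_to_bit_pos(const struct toy_midx *m, uint32_t midx_pos,
			       uint32_t *bit_pos)
{
	uint32_t i;

	for (i = 0; i < m->nr; i++) {
		if (m->rev[i] == midx_pos) {
			*bit_pos = i;
			return 0;
		}
	}
	return -1;
}

/* Same shape as bitmap_position_midx(): bit position, or -1. */
static int toy_bitmap_position(const struct toy_midx *m, uint32_t oid)
{
	uint32_t want, got;

	if (!toy_bsearch(m, oid, &want))
		return -1;
	if (toy_midx_to_bit_pos(m, want, &got) < 0)
		return -1;
	return (int)got;
}
```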

> @@ -730,14 +871,28 @@ static void show_objects_for_type(
>  
>  			offset += ewah_bit_ctz64(word >> offset);
>  
> -			index_pos = pack_pos_to_index(bitmap_git->pack, pos + offset);
> -			ofs = pack_pos_to_offset(bitmap_git->pack, pos + offset);
> -			nth_packed_object_id(&oid, bitmap_git->pack, index_pos);
> +			if (bitmap_is_midx(bitmap_git)) {
> +				struct multi_pack_index *m = bitmap_git->midx;
> +				uint32_t pack_id;
> +
> +				index_pos = pack_pos_to_midx(m, pos + offset);
> +				ofs = nth_midxed_offset(m, index_pos);
> +				nth_midxed_object_oid(&oid, m, index_pos);
> +
> +				pack_id = nth_midxed_pack_int_id(m, index_pos);
> +				pack = bitmap_git->midx->packs[pack_id];

This is similar to the builtin/pack-objects.c case right at the start of
this patch. (bitmap_pack_offset(), used in builtin/pack-objects.c, is
pack_pos_to_midx() and nth_midxed_offset() chained.) OK.

> +			} else {
> +				index_pos = pack_pos_to_index(bitmap_git->pack, pos + offset);
> +				ofs = pack_pos_to_offset(bitmap_git->pack, pos + offset);
> +				nth_bitmap_object_oid(bitmap_git, &oid, index_pos);
> +
> +				pack = bitmap_git->pack;
> +			}
>  
>  			if (bitmap_git->hashes)
>  				hash = get_be32(bitmap_git->hashes + index_pos);
>  
> -			show_reach(&oid, object_type, 0, hash, bitmap_git->pack, ofs);
> +			show_reach(&oid, object_type, 0, hash, pack, ofs);
>  		}
>  	}
>  }
> @@ -749,8 +904,13 @@ static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
>  		struct object *object = roots->item;
>  		roots = roots->next;
>  
> -		if (find_pack_entry_one(object->oid.hash, bitmap_git->pack) > 0)
> -			return 1;
> +		if (bitmap_is_midx(bitmap_git)) {
> +			if (bsearch_midx(&object->oid, bitmap_git->midx, NULL))
> +				return 1;
> +		} else {
> +			if (find_pack_entry_one(object->oid.hash, bitmap_git->pack) > 0)
> +				return 1;
> +		}
>  	}
>  
>  	return 0;

OK - we don't actually care about the position, just that it exists,
which is why we can pass NULL as the last argument to bsearch_midx().

> @@ -839,14 +999,26 @@ static void filter_bitmap_blob_none(struct bitmap_index *bitmap_git,
>  static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
>  				     uint32_t pos)
>  {
> -	struct packed_git *pack = bitmap_git->pack;
>  	unsigned long size;
>  	struct object_info oi = OBJECT_INFO_INIT;
>  
>  	oi.sizep = &size;
>  
>  	if (pos < bitmap_num_objects(bitmap_git)) {
> -		off_t ofs = pack_pos_to_offset(pack, pos);
> +		struct packed_git *pack;
> +		off_t ofs;
> +
> +		if (bitmap_is_midx(bitmap_git)) {
> +			uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
> +			uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
> +
> +			pack = bitmap_git->midx->packs[pack_id];
> +			ofs = nth_midxed_offset(bitmap_git->midx, midx_pos);
> +		} else {
> +			pack = bitmap_git->pack;
> +			ofs = pack_pos_to_offset(pack, pos);
> +		}
> +
>  		if (packed_object_info(the_repository, pack, ofs, &oi) < 0) {
>  			struct object_id oid;
>  			nth_bitmap_object_oid(bitmap_git, &oid,

Makes sense - "pos" is the position in the pseudo-pack. From it we get
the MIDX position, and then we can get the pack ID and pack offset as
usual.

> @@ -1081,15 +1253,29 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
>  			      struct bitmap *reuse,
>  			      struct pack_window **w_curs)
>  {
> -	off_t offset, header;
> +	struct packed_git *pack;
> +	off_t offset, delta_obj_offset;
>  	enum object_type type;
>  	unsigned long size;
>  
>  	if (pos >= bitmap_num_objects(bitmap_git))
>  		return; /* not actually in the pack or MIDX */
>  
> -	offset = header = pack_pos_to_offset(bitmap_git->pack, pos);
> -	type = unpack_object_header(bitmap_git->pack, w_curs, &offset, &size);
> +	if (bitmap_is_midx(bitmap_git)) {
> +		uint32_t pack_id, midx_pos;
> +
> +		midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
> +		pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
> +
> +		pack = bitmap_git->midx->packs[pack_id];
> +		offset = nth_midxed_offset(bitmap_git->midx, midx_pos);

Would it be useful to assert somewhere here that "pack" is the preferred
pack?

Going further, is it reasonable to say that positions 0..n in the
preferred pack (where n is the number of objects in the preferred pack)
match positions 0..n in the pseudo-pack exactly? If yes, maybe we can
simplify things by explaining that we can operate in the MIDX case
exactly (or as similarly as possible) like we operate on a single
packfile because of this, instead of always needing to consider if a
delta base could appear in the MIDX as belonging to another packfile.

> @@ -1538,6 +1792,29 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
>  
>  			offset += ewah_bit_ctz64(word >> offset);
>  			pos = base + offset;
> +
> +			if (bitmap_is_midx(bitmap_git)) {
> +				uint32_t pack_pos;
> +				uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
> +				uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
> +				off_t offset = nth_midxed_offset(bitmap_git->midx, midx_pos);
> +
> +				pack = bitmap_git->midx->packs[pack_id];
> +
> +				if (offset_to_pack_pos(pack, offset, &pack_pos) < 0) {
> +					struct object_id oid;
> +					nth_midxed_object_oid(&oid, bitmap_git->midx, midx_pos);
> +
> +					die(_("could not find %s in pack #%"PRIu32" at offset %"PRIuMAX),
> +					    oid_to_hex(&oid),
> +					    pack_id,
> +					    (uintmax_t)offset);
> +				}
> +
> +				pos = pack_pos;
> +			} else
> +				pack = bitmap_git->pack;
> +
>  			total += pack_pos_to_offset(pack, pos + 1) -
>  				 pack_pos_to_offset(pack, pos);
>  		}

"pos" is assigned to twice in the MIDX case (with different semantics).
I think it's better to do it like in the rest of the patch - use "base +
offset" as the argument to pack_pos_to_midx, and then you wouldn't need
to assign to "pos" twice.

> diff --git a/packfile.c b/packfile.c
> index 8668345d93..c444e365a3 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -863,7 +863,7 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
>  	if (!strcmp(file_name, "multi-pack-index"))
>  		return;
>  	if (starts_with(file_name, "multi-pack-index") &&
> -	    ends_with(file_name, ".rev"))
> +	    (ends_with(file_name, ".bitmap") || ends_with(file_name, ".rev")))
>  		return;
>  	if (ends_with(file_name, ".idx") ||
>  	    ends_with(file_name, ".rev") ||

I guess this will come into play when we start writing MIDX bitmaps?


* Re: [PATCH 02/22] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
  2021-04-09 18:10 ` [PATCH 02/22] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps Taylor Blau
@ 2021-04-16  2:46   ` Jonathan Tan
  0 siblings, 0 replies; 273+ messages in thread
From: Jonathan Tan @ 2021-04-16  2:46 UTC (permalink / raw)
  To: me; +Cc: git, peff, dstolee, gitster, jonathantanmy

> @@ -125,15 +125,20 @@ static inline void push_bitmapped_commit(struct commit *commit)
>  	writer.selected_nr++;
>  }
>  
> -static uint32_t find_object_pos(const struct object_id *oid)
> +static uint32_t find_object_pos(const struct object_id *oid, int *found)

find_object_pos() is only called by fill_bitmap_tree() and
fill_bitmap_commit(). fill_bitmap_tree() is only called by itself and
fill_bitmap_commit(). fill_bitmap_commit() is only called by
bitmap_writer_build(). And bitmap_writer_build() is only called by
write_pack_file(), which has been changed to die when
bitmap_writer_build() fails. So looks like everything is accounted for.


* Re: [PATCH 12/22] pack-bitmap: read multi-pack bitmaps
  2021-04-16  2:39   ` Jonathan Tan
@ 2021-04-16  3:13     ` Taylor Blau
  0 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-04-16  3:13 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: me, git, peff, dstolee, gitster

On Thu, Apr 15, 2021 at 07:39:25PM -0700, Jonathan Tan wrote:
> I'll review up to this patch for now. Hopefully I'll get to the rest
> soon.

Thanks in advance. I always find that you leave insightful comments, so
I appreciate you taking the time to review my patches.

> > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> > index 5205dde2e1..a4e4e4ebcc 100644
> > --- a/builtin/pack-objects.c
> > +++ b/builtin/pack-objects.c
> > @@ -984,7 +984,17 @@ static void write_reused_pack(struct hashfile *f)
> >  				break;
> >
> >  			offset += ewah_bit_ctz64(word >> offset);
> > -			write_reused_pack_one(pos + offset, f, &w_curs);
> > +			if (bitmap_is_midx(bitmap_git)) {
> > +				off_t pack_offs = bitmap_pack_offset(bitmap_git,
> > +								     pos + offset);
> > +				uint32_t pos;
> > +
> > +				if (offset_to_pack_pos(reuse_packfile, pack_offs, &pos) < 0)
> > +					die(_("write_reused_pack: could not locate %"PRIdMAX),
> > +					    (intmax_t)pack_offs);
> > +				write_reused_pack_one(pos, f, &w_curs);
> > +			} else
> > +				write_reused_pack_one(pos + offset, f, &w_curs);
> >  			display_progress(progress_state, ++written);
> >  		}
> >  	}
>
> When bitmaps are used, pos + offset is the pseudo-pack (a virtual
> concatenation of all packfiles in the MIDX) position (as in, first
> object is 0, second object is 1, and so on), not a position in
> a single packfile. From it, we obtain a pack offset, and from it, we
> obtain a position in the reused packfile (reuse_packfile). In this way,
> the code is equivalent to the non-MIDX case. Looks good.
>
> (There is no need to select a packfile here in the case of MIDX because,
> as the code later shows, we always reuse only one packfile - assigned to
> reuse_packfile.)

You're exactly right here on both points.

It's worth noting that the "reuse" you're describing here is only about
reusing sections of the original packfile byte-for-byte (with the
exception of fixing the offsets in any OFS_DELTAs). That's not to be
confused with delta reuse, which is entirely different.

I think that both Peff and I are dubious that the pack-reuse stuff is
kicking in all that much, since there are some heuristics in place about
when it is allowed to take over and when it isn't, but that's a topic
for another thread.

> > @@ -35,8 +36,15 @@ struct stored_bitmap {
> >   * the active bitmap index is the largest one.
> >   */
> >  struct bitmap_index {
> > -	/* Packfile to which this bitmap index belongs to */
> > +	/*
> > +	 * The pack or multi-pack index (MIDX) that this bitmap index belongs
> > +	 * to.
> > +	 *
> > +	 * Exactly one of these must be non-NULL; this specifies the object
> > +	 * order used to interpret this bitmap.
> > +	 */
> >  	struct packed_git *pack;
> > +	struct multi_pack_index *midx;
>
> Makes sense.
>
> > @@ -71,6 +79,8 @@ struct bitmap_index {
> >  	/* If not NULL, this is a name-hash cache pointing into map. */
> >  	uint32_t *hashes;
> >
> > +	const unsigned char *checksum;
> > +
> >  	/*
> >  	 * Extended index.
> >  	 *
>
> I see later that this checksum is used, OK. Maybe comment that this
> points into map (just like "hashes", as quoted above).

Yep, quite fair.

> > +	if (bitmap_git->pack || bitmap_git->midx) {
> > +		/* ignore extra bitmap file; we can only handle one */
> > +		return -1;
>
> Here, fd is not closed? Maybe better to have multiple cleanup stages
> (one when the mmap has been built, and one when not).

Good eyes. That's an oversight, and we should be closing fd there, too.
It looks like we're also missing a warning(), although I am skeptical
that the warning would ever kick in. The pack-based version of this
function is run in a loop over all packs, but the loop doesn't terminate
once a pack bitmap is opened, since we make sure that no *other* packs
have bitmaps, too.

But we don't do the same for multi-pack bitmaps, i.e., once we find a
MIDX that has a bitmap, we terminate immediately. It may be worth
scanning through the list of all MIDXs to make sure that only one has a
bitmap, but to be honest I could go either way on that point, too, since
any MIDX bitmap is worth loading. But the warning doesn't hurt, so I'll
add that, too.

> > +	if (!is_pack_valid(packfile)) {
> > +		close(fd);
> > +		return -1;
> > +	}
>
> Why is this needed now (and presumably, not before)?

It does look like a stray hunk, and it could probably be extracted into
its own patch. I can't recall anything about this particular patch that
makes it necessary, but maybe Peff remembers something I don't.

> > @@ -1081,15 +1253,29 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
> >  			      struct bitmap *reuse,
> >  			      struct pack_window **w_curs)
> >  {
> > -	off_t offset, header;
> > +	struct packed_git *pack;
> > +	off_t offset, delta_obj_offset;
> >  	enum object_type type;
> >  	unsigned long size;
> >
> >  	if (pos >= bitmap_num_objects(bitmap_git))
> >  		return; /* not actually in the pack or MIDX */
> >
> > -	offset = header = pack_pos_to_offset(bitmap_git->pack, pos);
> > -	type = unpack_object_header(bitmap_git->pack, w_curs, &offset, &size);
> > +	if (bitmap_is_midx(bitmap_git)) {
> > +		uint32_t pack_id, midx_pos;
> > +
> > +		midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
> > +		pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
> > +
> > +		pack = bitmap_git->midx->packs[pack_id];
> > +		offset = nth_midxed_offset(bitmap_git->midx, midx_pos);
>
> Would it be useful to assert somewhere here that "pack" is the preferred
> pack?

An assertion like that may hurt this function's cache performance, since
the way we determine the preferred pack is by looking at which pack is
the donor for the 0th object in the MIDX's .rev file. And this function
is rather hot, since it is invoked once per bit. So it may cause us to
hit more page faults than we currently do.

All that said, the assertion may not help much, since we only call
this function on objects from a single pack (the bitmapped pack in the
single-pack case, or the preferred pack in the MIDX case). There's a
comment in reuse_partial_packfile_from_bitmap() to this effect, which
may or may not be good enough ;).

> Going further, is it reasonable to say that positions 0..n in the
> preferred pack (where n is the number of objects in the preferred pack)
> match positions 0..n in the pseudo-pack exactly? If yes, maybe we can
> simplify things by explaining that we can operate in the MIDX case
> exactly (or as similarly as possible) like we operate on a single
> packfile because of this, instead of always needing to consider if a
> delta base could appear in the MIDX as belonging to another packfile.

You're right, and there are two things going on here which allow us to
make that assumption:

  - The preferred pack sorts ahead of all other packs in the MIDX when
    assembling the pseudo-pack order, so bits 0..n (where 'n' is the
    number of objects in the preferred pack) of the pseudo pack are
    designated to the preferred pack.

  - When duplicates of objects exist, the MIDX *always* breaks ties in
    favor of the preferred pack, so it's never the case that a delta'd
    object from the preferred pack will find its base in another pack
    (if it asked the MIDX to locate a copy of the base object).

So we can safely remove the conditional on bitmap_is_midx() in the first
part of this function for exactly the reasons above, which is good. That
probably merits moving the comment beginning with "Note that the base
does not need to be repositioned ..." earlier in this function, to make
clear that we really can treat bits from the preferred pack as if they
don't have anything to do with the MIDX at all.

So long as we determine the preferred pack ahead of time (and not once
per-call), I think that it would be a win.
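
The first of the two points above can be sketched as a sort comparator.
This is only an illustration: the toy_* names are hypothetical, and the
ordering among non-preferred packs (by pack ID, then by offset) is an
assumption made for the sketch, not something stated in this thread.
The tie-breaking of duplicate objects in favor of the preferred pack is
not modeled here.

```c
#include <stdint.h>
#include <stdlib.h>

/* Toy object record: which pack it lives in, and where. */
struct toy_obj {
	uint32_t pack_id;
	uint64_t offset;
};

/* qsort() has no context argument, so the preferred pack is a global. */
static uint32_t toy_preferred_pack;

static int toy_pseudo_pack_cmp(const void *va, const void *vb)
{
	const struct toy_obj *a = va, *b = vb;
	int a_pref = a->pack_id == toy_preferred_pack;
	int b_pref = b->pack_id == toy_preferred_pack;

	/* Objects in the preferred pack sort ahead of everything else. */
	if (a_pref != b_pref)
		return b_pref - a_pref;
	/* Assumption: remaining packs are ordered by pack ID ... */
	if (a->pack_id != b->pack_id)
		return a->pack_id < b->pack_id ? -1 : 1;
	/* ... and objects within one pack by byte offset. */
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

static void toy_sort_pseudo_pack(struct toy_obj *objs, size_t nr,
				 uint32_t preferred)
{
	toy_preferred_pack = preferred;
	qsort(objs, nr, sizeof(*objs), toy_pseudo_pack_cmp);
}
```

With an order like this, the first n entries all belong to the
preferred pack and appear in the same order as in that pack, which is
why bits 0..n can be treated as if they came from a single packfile.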

> > @@ -1538,6 +1792,29 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
> >
> >  			offset += ewah_bit_ctz64(word >> offset);
> >  			pos = base + offset;
> > +
> > +			if (bitmap_is_midx(bitmap_git)) {
> > +				uint32_t pack_pos;
> > +				uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
> > +				uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
> > +				off_t offset = nth_midxed_offset(bitmap_git->midx, midx_pos);
> > +
> > +				pack = bitmap_git->midx->packs[pack_id];
> > +
> > +				if (offset_to_pack_pos(pack, offset, &pack_pos) < 0) {
> > +					struct object_id oid;
> > +					nth_midxed_object_oid(&oid, bitmap_git->midx, midx_pos);
> > +
> > +					die(_("could not find %s in pack #%"PRIu32" at offset %"PRIuMAX),
> > +					    oid_to_hex(&oid),
> > +					    pack_id,
> > +					    (uintmax_t)offset);
> > +				}
> > +
> > +				pos = pack_pos;
> > +			} else
> > +				pack = bitmap_git->pack;
> > +
> >  			total += pack_pos_to_offset(pack, pos + 1) -
> >  				 pack_pos_to_offset(pack, pos);
> >  		}
>
> "pos" is assigned to twice in the MIDX case (with different semantics).
> I think it's better to do it like in the rest of the patch - use "base +
> offset" as the argument to pack_pos_to_midx, and then you wouldn't need
> to assign to "pos" twice.

Good idea, thanks. Skimming again over the patch, this is the only place
that I could find where I double-assign pos like this.

> > diff --git a/packfile.c b/packfile.c
> > index 8668345d93..c444e365a3 100644
> > --- a/packfile.c
> > +++ b/packfile.c
> > @@ -863,7 +863,7 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
> >  	if (!strcmp(file_name, "multi-pack-index"))
> >  		return;
> >  	if (starts_with(file_name, "multi-pack-index") &&
> > -	    ends_with(file_name, ".rev"))
> > +	    (ends_with(file_name, ".bitmap") || ends_with(file_name, ".rev")))
> >  		return;
> >  	if (ends_with(file_name, ".idx") ||
> >  	    ends_with(file_name, ".rev") ||
>
> I guess this will come into play when we start writing MIDX bitmaps?

Yep, that's right. Since this patch is about making sure we can handle
the MIDX bitmap as described in
Documentation/technical/bitmap-format.txt, this is part of that.

Thanks,
Taylor


* Re: [PATCH 13/22] pack-bitmap: write multi-pack bitmaps
  2021-04-09 18:11 ` [PATCH 13/22] pack-bitmap: write " Taylor Blau
@ 2021-05-04  5:02   ` Jonathan Tan
  2021-05-06 20:18     ` Taylor Blau
  0 siblings, 1 reply; 273+ messages in thread
From: Jonathan Tan @ 2021-05-04  5:02 UTC (permalink / raw)
  To: me; +Cc: git, peff, dstolee, gitster, jonathantanmy

> Write multi-pack bitmaps in the format described by
> Documentation/technical/bitmap-format.txt, inferring their presence with
> the presence of '--bitmap'.
> 
> To write a multi-pack bitmap, this patch attempts to reuse as much of
> the existing machinery from pack-objects as possible. Specifically, the
> MIDX code prepares a packing_data struct that pretends as if a single
> packfile has been generated containing all of the objects contained
> within the MIDX.

Sounds good, and makes sense. Conceptually, the MIDX bitmap is the same
as a regular packfile bitmap, just that the order of objects in the
bitmap is defined differently.

> +static void prepare_midx_packing_data(struct packing_data *pdata,
> +				      struct write_midx_context *ctx)
> +{
> +	uint32_t i;
> +
> +	memset(pdata, 0, sizeof(struct packing_data));
> +	prepare_packing_data(the_repository, pdata);
> +
> +	for (i = 0; i < ctx->entries_nr; i++) {
> +		struct pack_midx_entry *from = &ctx->entries[ctx->pack_order[i]];
> +		struct object_entry *to = packlist_alloc(pdata, &from->oid);
> +
> +		oe_set_in_pack(pdata, to,
> +			       ctx->info[ctx->pack_perm[from->pack_int_id]].p);
> +	}
> +}

It is surprising to see this right at the top. Scrolling down, I guess
that there is more information needed than just the packing_data struct.

> +static int add_ref_to_pending(const char *refname,
> +			      const struct object_id *oid,
> +			      int flag, void *cb_data)
> +{
> +	struct rev_info *revs = (struct rev_info*)cb_data;
> +	struct object *object;
> +
> +	if ((flag & REF_ISSYMREF) && (flag & REF_ISBROKEN)) {
> +		warning("symbolic ref is dangling: %s", refname);
> +		return 0;
> +	}
> +
> +	object = parse_object_or_die(oid, refname);
> +	if (object->type != OBJ_COMMIT)
> +		return 0;
> +
> +	add_pending_object(revs, object, "");
> +	if (bitmap_is_preferred_refname(revs->repo, refname))
> +		object->flags |= NEEDS_BITMAP;
> +	return 0;
> +}

Makes sense. We need to flag certain commits as NEEDS_BITMAP because
bitmaps are not made for all commits but only certain ones.

> +struct bitmap_commit_cb {
> +	struct commit **commits;
> +	size_t commits_nr, commits_alloc;
> +
> +	struct write_midx_context *ctx;
> +};
> +
> +static const struct object_id *bitmap_oid_access(size_t index,
> +						 const void *_entries)
> +{
> +	const struct pack_midx_entry *entries = _entries;
> +	return &entries[index].oid;
> +}
> +
> +static void bitmap_show_commit(struct commit *commit, void *_data)
> +{
> +	struct bitmap_commit_cb *data = _data;
> +	if (oid_pos(&commit->object.oid, data->ctx->entries,
> +		    data->ctx->entries_nr,
> +		    bitmap_oid_access) > -1) {
> +		ALLOC_GROW(data->commits, data->commits_nr + 1,
> +			   data->commits_alloc);
> +		data->commits[data->commits_nr++] = commit;
> +	}
> +}
> +
> +static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr_p,
> +						    struct write_midx_context *ctx)
> +{
> +	struct rev_info revs;
> +	struct bitmap_commit_cb cb;
> +
> +	memset(&cb, 0, sizeof(struct bitmap_commit_cb));
> +	cb.ctx = ctx;
> +
> +	repo_init_revisions(the_repository, &revs, NULL);
> +	for_each_ref(add_ref_to_pending, &revs);
> +
> +	fetch_if_missing = 0;
> +	revs.exclude_promisor_objects = 1;

I think that the MIDX bitmap requires all objects be present? If yes, we
should omit these 2 lines.

> +
> +	if (prepare_revision_walk(&revs))
> +		die(_("revision walk setup failed"));
> +
> +	traverse_commit_list(&revs, bitmap_show_commit, NULL, &cb);
> +	if (indexed_commits_nr_p)
> +		*indexed_commits_nr_p = cb.commits_nr;
> +
> +	return cb.commits;
> +}

Hmm...I might be missing something obvious, but this function and its
callbacks seem to be written like this in order to put the returned
commits in a certain order. But later on in write_midx_bitmap(), the
return value of this function is passed to
bitmap_writer_select_commits(), which re-sorts the list anyway?

> +static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
> +			     struct write_midx_context *ctx,
> +			     unsigned flags)
> +{
> +	struct packing_data pdata;
> +	struct pack_idx_entry **index;
> +	struct commit **commits = NULL;
> +	uint32_t i, commits_nr;
> +	char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash));
> +	int ret;
> +
> +	prepare_midx_packing_data(&pdata, ctx);
> +
> +	commits = find_commits_for_midx_bitmap(&commits_nr, ctx);
> +
> +	/*
> +	 * Build the MIDX-order index based on pdata.objects (which is already
> +	 * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of
> +	 * this order).
> +	 */
> +	ALLOC_ARRAY(index, pdata.nr_objects);
> +	for (i = 0; i < pdata.nr_objects; i++)
> +		index[i] = (struct pack_idx_entry *)&pdata.objects[i];
> +
> +	bitmap_writer_show_progress(flags & MIDX_PROGRESS);
> +	bitmap_writer_build_type_index(&pdata, index, pdata.nr_objects);
> +
> +	/*
> +	 * bitmap_writer_select_commits expects objects in lex order, but
> +	 * pack_order gives us exactly that. use it directly instead of
> +	 * re-sorting the array
> +	 */
> +	for (i = 0; i < pdata.nr_objects; i++)
> +		index[ctx->pack_order[i]] = (struct pack_idx_entry *)&pdata.objects[i];
> +
> +	bitmap_writer_select_commits(commits, commits_nr, -1);

The comment above says bitmap_writer_select_commits() expects objects in
lex order, but (1) you're putting "index" in lex order, not "commits",
and (2) the first thing in bitmap_writer_select_commits() is a QSORT.
Did you mean another function?

> +	ret = bitmap_writer_build(&pdata);
> +	if (!ret)
> +		goto cleanup;
> +
> +	bitmap_writer_set_checksum(midx_hash);
> +	bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, 0);

So bitmap_writer_build_type_index() and bitmap_writer_finish() are
called with 2 different orders of commits. Is this expected? If yes,
maybe this is worth a comment.

> +
> +cleanup:
> +	free(index);
> +	free(bitmap_name);
> +	return ret;
> +}

[snip]

> @@ -930,9 +1073,16 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
>  		for (i = 0; i < ctx.m->num_packs; i++) {
>  			ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
>  
> +			if (prepare_midx_pack(the_repository, ctx.m, i)) {
> +				error(_("could not load pack %s"),
> +				      ctx.m->pack_names[i]);
> +				result = 1;
> +				goto cleanup;
> +			}
> +
>  			ctx.info[ctx.nr].orig_pack_int_id = i;
>  			ctx.info[ctx.nr].pack_name = xstrdup(ctx.m->pack_names[i]);
> -			ctx.info[ctx.nr].p = NULL;
> +			ctx.info[ctx.nr].p = ctx.m->packs[i];
>  			ctx.info[ctx.nr].expired = 0;
>  			ctx.nr++;
>  		}

Why is this needed now and not before? From what I see in this function,
nothing seems to happen to this .p pack except that they are closed
later.

> @@ -1096,6 +1264,15 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
>  		if (ctx.info[i].p) {
>  			close_pack(ctx.info[i].p);
>  			free(ctx.info[i].p);
> +			if (ctx.m) {
> +				/*
> +				 * Destroy a stale reference to the pack in
> +				 * 'ctx.m'.
> +				 */
> +				uint32_t orig = ctx.info[i].orig_pack_int_id;
> +				if (orig < ctx.m->num_packs)
> +					ctx.m->packs[orig] = NULL;
> +			}
>  		}
>  		free(ctx.info[i].pack_name);
>  	}

Is this hunk needed? "ctx" is a local variable and will not outlast this
function.

I'll review the rest tomorrow. It seems like I've gotten over the most
difficult patches.


* Re: [PATCH 16/22] t5326: test multi-pack bitmap behavior
  2021-04-09 18:12 ` [PATCH 16/22] t5326: test multi-pack bitmap behavior Taylor Blau
@ 2021-05-04 17:51   ` Jonathan Tan
  0 siblings, 0 replies; 273+ messages in thread
From: Jonathan Tan @ 2021-05-04 17:51 UTC (permalink / raw)
  To: me; +Cc: git, peff, dstolee, gitster, jonathantanmy

> +test_expect_success 'clone with bitmaps enabled' '
> +	git clone --no-local --bare . clone-reverse-delta.git &&
> +	test_when_finished "rm -fr clone-reverse-delta.git" &&
> +
> +	git rev-parse HEAD >expect &&
> +	git --git-dir=clone-reverse-delta.git rev-parse HEAD >actual &&
> +	test_cmp expect actual
> +'

What is this test testing? That bitmaps are used? (I'm not sure how to
verify that though - we seem to have tracing for bitmap writing but not
for reading, for example.)

> +bitmap_reuse_tests() {
> +	from=$1
> +	to=$2
> +
> +	test_expect_success "setup pack reuse tests ($from -> $to)" '
> +		rm -fr repo &&
> +		git init repo &&
> +		(
> +			cd repo &&
> +			test_commit_bulk 16 &&
> +			git tag old-tip &&
> +
> +			git config core.multiPackIndex true &&
> +			if test "MIDX" = "$from"
> +			then
> +				GIT_TEST_MULTI_PACK_INDEX=0 git repack -Ad &&
> +				git multi-pack-index write --bitmap
> +			else
> +				GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb
> +			fi
> +		)
> +	'
> +
> +	test_expect_success "build bitmap from existing ($from -> $to)" '
> +		(
> +			cd repo &&
> +			test_commit_bulk --id=further 16 &&
> +			git tag new-tip &&
> +
> +			if test "MIDX" = "$to"
> +			then
> +				GIT_TEST_MULTI_PACK_INDEX=0 git repack -d &&
> +				git multi-pack-index write --bitmap
> +			else
> +				GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb
> +			fi
> +		)
> +	'
> +
> +	test_expect_success "verify resulting bitmaps ($from -> $to)" '
> +		(
> +			cd repo &&
> +			git for-each-ref &&
> +			git rev-list --test-bitmap refs/tags/old-tip &&
> +			git rev-list --test-bitmap refs/tags/new-tip
> +		)
> +	'
> +}
> +
> +bitmap_reuse_tests 'pack' 'MIDX'
> +bitmap_reuse_tests 'MIDX' 'pack'
> +bitmap_reuse_tests 'MIDX' 'MIDX'

Is it possible to verify that the bitmaps have truly been reused (and
not, say, created from scratch)? (E.g. is there any property of the
resulting bitmap, such as the order of commits, that would show this?)

> +test_expect_success 'pack.preferBitmapTips' '
> +	git init repo &&
> +	test_when_finished "rm -fr repo" &&
> +	(
> +		cd repo &&
> +
> +		test_commit_bulk --message="%s" 103 &&
> +
> +		git log --format="%H" >commits.raw &&
> +		sort <commits.raw >commits &&
> +
> +		git log --format="create refs/tags/%s %H" HEAD >refs &&
> +		git update-ref --stdin <refs &&
> +
> +		git multi-pack-index write --bitmap &&
> +		test_path_is_file $midx &&
> +		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
> +
> +		test-tool bitmap list-commits | sort >bitmaps &&
> +		comm -13 bitmaps commits >before &&
> +		test_line_count = 1 before &&
> +
> +		perl -ne "printf(\"create refs/tags/include/%d \", $.); print" \
> +			<before | git update-ref --stdin &&
> +
> +		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
> +		rm -fr $midx-$(midx_checksum $objdir).rev &&
> +		rm -fr $midx &&
> +
> +		git -c pack.preferBitmapTips=refs/tags/include \
> +			multi-pack-index write --bitmap &&
> +		test-tool bitmap list-commits | sort >bitmaps &&
> +		comm -13 bitmaps commits >after &&
> +
> +		! test_cmp before after
> +	)
> +'

Could we have a more precise comparison of "before" and "after" (besides
the fact that they're different)?

Besides that, all the patches up to this one look good (including patch
14, verified with "--color-moved
--color-moved-ws=allow-indentation-change").


* Re: [PATCH 22/22] p5326: perf tests for MIDX bitmaps
  2021-04-09 18:12 ` [PATCH 22/22] p5326: perf tests for MIDX bitmaps Taylor Blau
@ 2021-05-04 18:00   ` Jonathan Tan
  2021-05-05  0:55     ` Junio C Hamano
  0 siblings, 1 reply; 273+ messages in thread
From: Jonathan Tan @ 2021-05-04 18:00 UTC (permalink / raw)
  To: me; +Cc: git, peff, dstolee, gitster, jonathantanmy

> There is slight overhead in "simulated clone", "simulated partial
> clone", and "clone (partial bitmap)". Unsurprisingly, that overhead is
> due to using the MIDX's reverse index to map between bit positions and
> MIDX positions.

Thanks - it's great to see that accessing a MIDX bitmap doesn't add much
overhead (as compared to accessing a single-pack bitmap of the same
size).

All the remaining patches up to and including this one look good.
Overall, I did have some comments here and there, but I am happy with
the overall design.


* Re: [PATCH 22/22] p5326: perf tests for MIDX bitmaps
  2021-05-04 18:00   ` Jonathan Tan
@ 2021-05-05  0:55     ` Junio C Hamano
  0 siblings, 0 replies; 273+ messages in thread
From: Junio C Hamano @ 2021-05-05  0:55 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: me, git, peff, dstolee

Jonathan Tan <jonathantanmy@google.com> writes:

>> There is slight overhead in "simulated clone", "simulated partial
>> clone", and "clone (partial bitmap)". Unsurprisingly, that overhead is
>> due to using the MIDX's reverse index to map between bit positions and
>> MIDX positions.
>
> Thanks - it's great to see that accessing a MIDX bitmap doesn't add much
> overhead (as compared to accessing a single-pack bitmap of the same
> size).
>
> All the remaining patches up to and including this one look good.
> Overall, I did have some comments here and there, but I am happy with
> the overall design.

Thanks for a review.


* Re: [PATCH 13/22] pack-bitmap: write multi-pack bitmaps
  2021-05-04  5:02   ` Jonathan Tan
@ 2021-05-06 20:18     ` Taylor Blau
  2021-05-06 22:00       ` Jonathan Tan
  0 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-05-06 20:18 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff, dstolee, gitster

On Mon, May 03, 2021 at 10:02:30PM -0700, Jonathan Tan wrote:
> > +static void prepare_midx_packing_data(struct packing_data *pdata,
> > +				      struct write_midx_context *ctx)
> > +{
> > +	uint32_t i;
> > +
> > +	memset(pdata, 0, sizeof(struct packing_data));
> > +	prepare_packing_data(the_repository, pdata);
> > +
> > +	for (i = 0; i < ctx->entries_nr; i++) {
> > +		struct pack_midx_entry *from = &ctx->entries[ctx->pack_order[i]];
> > +		struct object_entry *to = packlist_alloc(pdata, &from->oid);
> > +
> > +		oe_set_in_pack(pdata, to,
> > +			       ctx->info[ctx->pack_perm[from->pack_int_id]].p);
> > +	}
> > +}
>
> It is surprising to see this right at the top. Scrolling down, I guess
> that there is more information needed than just the packing_data struct.

Hmm, which part is surprising to you? This function is setting up the
packing_data structure that I mentioned in the commit message, which
happens in two steps. First, we allocate and call
prepare_packing_data(). And then we call packlist_alloc() for each
object in the MIDX, setting up some information about each object
(like its OID and which physical pack it came from).

But if any of this is unclear, let me know which part and I'd be happy
to add a clarifying comment.
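
The two steps described above can be sketched outside of Git as
follows. This is only an illustration under stated assumptions: the
toy_* types and toy_prepare_packing_data() are hypothetical stand-ins,
not the real packing_data API, and the sketch assumes that
pack_order[i] is the OID-sorted position of the i-th object in
pseudo-pack order and that pack_perm[] maps an entry's pack_int_id to
a slot in the pack array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; none of these are Git's real structs. */
struct toy_pack {
	const char *name;
};

struct toy_midx_entry {
	unsigned oid;		/* toy object ID, sortable as a number */
	uint32_t pack_int_id;	/* pack ID as recorded on the entry */
};

struct toy_object {
	unsigned oid;
	const struct toy_pack *in_pack;
};

/*
 * Fill 'out' with one object per MIDX entry, in pseudo-pack order,
 * recording which pack each object came from.
 */
static void toy_prepare_packing_data(struct toy_object *out,
				     const struct toy_midx_entry *entries,
				     const uint32_t *pack_order,
				     const uint32_t *pack_perm,
				     const struct toy_pack *packs,
				     uint32_t nr_entries, uint32_t nr_packs)
{
	uint32_t i;

	if (!nr_entries)
		return;

	/* Step 1: start from a zeroed object list. */
	memset(out, 0, nr_entries * sizeof(*out));

	/* Step 2: add each object, in pack order, with its source pack. */
	for (i = 0; i < nr_entries; i++) {
		const struct toy_midx_entry *from = &entries[pack_order[i]];

		assert(from->pack_int_id < nr_packs);
		assert(pack_perm[from->pack_int_id] < nr_packs);

		out[i].oid = from->oid;
		out[i].in_pack = &packs[pack_perm[from->pack_int_id]];
	}
}
```

The point of the sketch is only that the resulting list is ordered by
pack position rather than by OID, and that each object carries a
pointer to its pack, which is why the pack pointers have to be loaded
before this step runs.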

> > +static int add_ref_to_pending(const char *refname,
> > +			      const struct object_id *oid,
> > +			      int flag, void *cb_data)
> > +{
> > +	struct rev_info *revs = (struct rev_info*)cb_data;
> > +	struct object *object;
> > +
> > +	if ((flag & REF_ISSYMREF) && (flag & REF_ISBROKEN)) {
> > +		warning("symbolic ref is dangling: %s", refname);
> > +		return 0;
> > +	}
> > +
> > +	object = parse_object_or_die(oid, refname);
> > +	if (object->type != OBJ_COMMIT)
> > +		return 0;
> > +
> > +	add_pending_object(revs, object, "");
> > +	if (bitmap_is_preferred_refname(revs->repo, refname))
> > +		object->flags |= NEEDS_BITMAP;
> > +	return 0;
> > +}
>
> Makes sense. We need to flag certain commits as NEEDS_BITMAP because
> bitmaps are not made for all commits but only certain ones.

Right, and NEEDS_BITMAP is a bit of a misnomer. Its true meaning is
more like BITMAPPING_THIS_WOULD_BE_A_GOOD_IDEA, since it roughly
translates to "bitmap this commit before any others in its window". More
details are in bitmap_writer_select_commits(), but in all honesty I find
the implementation there somewhat confusing.
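
As a rough picture of "bitmap this commit before any others in its
window", here is a deliberately simplified sketch. It is not the logic
in bitmap_writer_select_commits(), which has more rules; the flag bit
TOY_NEEDS_BITMAP, the helper toy_pick_in_window(), and the fallback to
the last position in the window are all assumptions made for the sake
of illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NEEDS_BITMAP (1u << 0) /* hypothetical flag bit for this sketch */

/*
 * Pick one position from the window [start, start + len) of 'flags'.
 * The first flagged entry wins over any unflagged one. With no flagged
 * entry, fall back to the last position in the window (a choice made
 * purely for this sketch).
 */
static size_t toy_pick_in_window(const unsigned *flags, size_t start,
				 size_t len)
{
	size_t i;

	assert(len > 0);

	for (i = start; i < start + len; i++)
		if (flags[i] & TOY_NEEDS_BITMAP)
			return i;
	return start + len - 1;
}
```

With flags { 0, 0, TOY_NEEDS_BITMAP, 0 }, a window covering all four
positions picks index 2, while a window covering only the first two
falls back to index 1.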

> > +static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr_p,
> > +						    struct write_midx_context *ctx)
> > +{
> > +	struct rev_info revs;
> > +	struct bitmap_commit_cb cb;
> > +
> > +	memset(&cb, 0, sizeof(struct bitmap_commit_cb));
> > +	cb.ctx = ctx;
> > +
> > +	repo_init_revisions(the_repository, &revs, NULL);
> > +	for_each_ref(add_ref_to_pending, &revs);
> > +
> > +	fetch_if_missing = 0;
> > +	revs.exclude_promisor_objects = 1;
>
> I think that the MIDX bitmap requires all objects be present? If yes, we
> should omit these 2 lines.

It does require that all objects are present, but if we fetched any
promisor objects at this stage it would be too late. That's because by
the time we're in this function, all of the packs that are to be
included in the MIDX should already exist on disk.

Skipping promisor objects here is intentional, since it only excludes
them from the list of reachable commits that we want to select from when
computing the selection of MIDX'd commits to receive bitmaps.

But, if one of those promisor objects is reachable from another object
that is included in the bitmap, then we will complain later on that we
couldn't find a reachability closure (and fail appropriately).

That said, I'm not sure any of that is obvious from reading this code,
so I'll add a comment to that effect around these lines.

> > +
> > +	if (prepare_revision_walk(&revs))
> > +		die(_("revision walk setup failed"));
> > +
> > +	traverse_commit_list(&revs, bitmap_show_commit, NULL, &cb);
> > +	if (indexed_commits_nr_p)
> > +		*indexed_commits_nr_p = cb.commits_nr;
> > +
> > +	return cb.commits;
> > +}
>
> Hmm...I might be missing something obvious, but this function and its
> callbacks seem to be written like this in order to put the returned
> commits in a certain order. But later on in write_midx_bitmap(), the
> return value of this function is passed to
> bitmap_writer_select_commits(), which resorts the list anyway?

It isn't intentional, but rather just to build up the list in topo
order. In fact, the order we build it up in isn't quite the same as how
the pack bitmap code generates it (it is in true topo order, at least on
GitHub's servers, as a side effect of using delta islands).

The fact that we resort according to date_compare makes me wonder why
changing that seemed to make such a difference for us. The whole
selection code is a mystery to me.

But no, the order shouldn't matter since we QSORT it later. Any code
here that looks like it's putting it in a certain order has much more to
do with convenience than anything else.
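
To make the "order shouldn't matter" point concrete, here is a minimal
sketch, assuming a toy commit type and a newest-first date comparator.
The names toy_commit, toy_date_compare(), and toy_select_order() are
hypothetical, and the newest-first direction is an assumption of the
sketch. Note that qsort() is not stable, so commits with identical
dates could still come out in a different relative order.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a commit; only the date matters here. */
struct toy_commit {
	unsigned long date;
	int id;
};

/* Compare two commit pointers by date, newest first. */
static int toy_date_compare(const void *_a, const void *_b)
{
	const struct toy_commit *a = *(const struct toy_commit *const *)_a;
	const struct toy_commit *b = *(const struct toy_commit *const *)_b;

	if (a->date < b->date)
		return 1;
	if (a->date > b->date)
		return -1;
	return 0;
}

/*
 * Sort an array of commit pointers by date. Whatever order the caller
 * collected the commits in is discarded by this step.
 */
static void toy_select_order(struct toy_commit **commits, size_t nr)
{
	if (!nr)
		return;
	assert(commits);
	qsort(commits, nr, sizeof(*commits), toy_date_compare);
}
```

Feeding the same three commits (with distinct dates) in two different
input orders produces the same array after toy_select_order().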

>
> > +static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
> > +			     struct write_midx_context *ctx,
> > +			     unsigned flags)
> > +{
> > +	struct packing_data pdata;
> > +	struct pack_idx_entry **index;
> > +	struct commit **commits = NULL;
> > +	uint32_t i, commits_nr;
> > +	char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash));
> > +	int ret;
> > +
> > +	prepare_midx_packing_data(&pdata, ctx);
> > +
> > +	commits = find_commits_for_midx_bitmap(&commits_nr, ctx);
> > +
> > +	/*
> > +	 * Build the MIDX-order index based on pdata.objects (which is already
> > +	 * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of
> > +	 * this order).
> > +	 */
> > +	ALLOC_ARRAY(index, pdata.nr_objects);
> > +	for (i = 0; i < pdata.nr_objects; i++)
> > +		index[i] = (struct pack_idx_entry *)&pdata.objects[i];
> > +
> > +	bitmap_writer_show_progress(flags & MIDX_PROGRESS);
> > +	bitmap_writer_build_type_index(&pdata, index, pdata.nr_objects);
> > +
> > +	/*
> > +	 * bitmap_writer_select_commits expects objects in lex order, but
> > +	 * pack_order gives us exactly that. use it directly instead of
> > +	 * re-sorting the array
> > +	 */
> > +	for (i = 0; i < pdata.nr_objects; i++)
> > +		index[ctx->pack_order[i]] = (struct pack_idx_entry *)&pdata.objects[i];
> > +
> > +	bitmap_writer_select_commits(commits, commits_nr, -1);
>
> The comment above says bitmap_writer_select_commits() expects objects in
> lex order, but (1) you're putting "index" in lex order, not "commits",
> and (2) the first thing in bitmap_writer_select_commits() is a QSORT.
> Did you mean another function?

Ack, I definitely meant bitmap_writer_build(). Thanks for catching.

> > +	ret = bitmap_writer_build(&pdata);
> > +	if (!ret)
> > +		goto cleanup;
> > +
> > +	bitmap_writer_set_checksum(midx_hash);
> > +	bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, 0);
>
> So bitmap_writer_build_type_index() and bitmap_writer_finish() are
> called with 2 different orders of commits. Is this expected? If yes,
> maybe this is worth a comment.

Confusingly so, but yes, these two do expect different orders. You can
see the same re-sorting going on much more subtly in
pack-write.c:write_idx_file(), which is called by
builtin/pack-objects.c:finish_tmp_packfile(), which happens between
bitmap_writer_build_type_index() and bitmap_writer_finish().

Definitely worth adding a comment.
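
The two orders can be sketched like this. It is a self-contained
illustration rather than the midx.c code: toy_entry and both helpers
are hypothetical, and the sketch assumes that pack_order[i] is the
position in OID (lex) order of the i-th object in pseudo-pack order,
which is how the quoted hunk reads.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an object entry; 'oid' is a toy sort key. */
struct toy_entry {
	unsigned oid;
};

/* Build an index that simply follows 'objects' (pseudo-pack order). */
static void toy_index_pack_order(const struct toy_entry **idx,
				 const struct toy_entry *objects,
				 uint32_t nr)
{
	uint32_t i;

	for (i = 0; i < nr; i++)
		idx[i] = &objects[i];
}

/*
 * Build an index in lex order without sorting: each object is placed
 * directly at the lex position that pack_order[] records for it.
 */
static void toy_index_lex_order(const struct toy_entry **idx,
				const struct toy_entry *objects,
				const uint32_t *pack_order, uint32_t nr)
{
	uint32_t i;

	for (i = 0; i < nr; i++) {
		assert(pack_order[i] < nr);
		idx[pack_order[i]] = &objects[i];
	}
}
```

Because pack_order[] is a permutation, the second loop fills every
slot exactly once, so no separate sort is needed to go from one order
to the other.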

> > @@ -930,9 +1073,16 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
> >  		for (i = 0; i < ctx.m->num_packs; i++) {
> >  			ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
> >
> > +			if (prepare_midx_pack(the_repository, ctx.m, i)) {
> > +				error(_("could not load pack %s"),
> > +				      ctx.m->pack_names[i]);
> > +				result = 1;
> > +				goto cleanup;
> > +			}
> > +
> >  			ctx.info[ctx.nr].orig_pack_int_id = i;
> >  			ctx.info[ctx.nr].pack_name = xstrdup(ctx.m->pack_names[i]);
> > -			ctx.info[ctx.nr].p = NULL;
> > +			ctx.info[ctx.nr].p = ctx.m->packs[i];
> >  			ctx.info[ctx.nr].expired = 0;
> >  			ctx.nr++;
> >  		}
>
> Why is this needed now and not before? From what I see in this function,
> nothing seems to happen to this .p pack except that they are closed
> later.

These are used by prepare_midx_packing_data().

> > @@ -1096,6 +1264,15 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
> >  		if (ctx.info[i].p) {
> >  			close_pack(ctx.info[i].p);
> >  			free(ctx.info[i].p);
> > +			if (ctx.m) {
> > +				/*
> > +				 * Destroy a stale reference to the pack in
> > +				 * 'ctx.m'.
> > +				 */
> > +				uint32_t orig = ctx.info[i].orig_pack_int_id;
> > +				if (orig < ctx.m->num_packs)
> > +					ctx.m->packs[orig] = NULL;
> > +			}
> >  		}
> >  		free(ctx.info[i].pack_name);
> >  	}
>
> Is this hunk needed? "ctx" is a local variable and will not outlast this
> function.

I can't remember exactly why I added this. I'll play around with it and
either remove it or add a comment why it's necessary before the next
reroll.

> I'll review the rest tomorrow. It seems like I've gotten over the most
> difficult patches.

Thanks, and sorry that this took me a few days to get back to. I
appreciate your review immensely.

Thanks,
Taylor


* Re: [PATCH 13/22] pack-bitmap: write multi-pack bitmaps
  2021-05-06 20:18     ` Taylor Blau
@ 2021-05-06 22:00       ` Jonathan Tan
  0 siblings, 0 replies; 273+ messages in thread
From: Jonathan Tan @ 2021-05-06 22:00 UTC (permalink / raw)
  To: me; +Cc: jonathantanmy, git, peff, dstolee, gitster

> On Mon, May 03, 2021 at 10:02:30PM -0700, Jonathan Tan wrote:
> > > +static void prepare_midx_packing_data(struct packing_data *pdata,
> > > +				      struct write_midx_context *ctx)
> > > +{
> > > +	uint32_t i;
> > > +
> > > +	memset(pdata, 0, sizeof(struct packing_data));
> > > +	prepare_packing_data(the_repository, pdata);
> > > +
> > > +	for (i = 0; i < ctx->entries_nr; i++) {
> > > +		struct pack_midx_entry *from = &ctx->entries[ctx->pack_order[i]];
> > > +		struct object_entry *to = packlist_alloc(pdata, &from->oid);
> > > +
> > > +		oe_set_in_pack(pdata, to,
> > > +			       ctx->info[ctx->pack_perm[from->pack_int_id]].p);
> > > +	}
> > > +}
> >
> > It is surprising to see this right at the top. Scrolling down, I guess
> > that there is more information needed than just the packing_data struct.
> 
> Hmm, which part is surprising to you? This function is setting up the
> packing_data structure that I mentioned in the commit message, which
> happens in two steps. First, we allocate and call
> prepare_packing_data(). And then we call packlist_alloc() for each
> object in the MIDX, setting up some information about each object
> (like its OID and which physical pack it came from).
> 
> But if any of this is unclear, let me know which part and I'd be happy
> to add a clarifying comment.

Ah, I think I was unclear. I was thinking that the commit message led me
to believe that all information needed for creating a bitmap lies in the
packing_data struct, so I would have expected several helper functions
followed by a function that actually writes the packing_data struct.
Maybe the commit message could be rewritten to avoid that confusion, but
it's probably not a big deal.

> > > +static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr_p,
> > > +						    struct write_midx_context *ctx)
> > > +{
> > > +	struct rev_info revs;
> > > +	struct bitmap_commit_cb cb;
> > > +
> > > +	memset(&cb, 0, sizeof(struct bitmap_commit_cb));
> > > +	cb.ctx = ctx;
> > > +
> > > +	repo_init_revisions(the_repository, &revs, NULL);
> > > +	for_each_ref(add_ref_to_pending, &revs);
> > > +
> > > +	fetch_if_missing = 0;
> > > +	revs.exclude_promisor_objects = 1;
> >
> > I think that the MIDX bitmap requires all objects be present? If yes, we
> > should omit these 2 lines.
> 
> It does require that all objects are present, but if we fetched any
> promisor objects at this stage it would be too late. That's because by
> the time we're in this function, all of the packs that are to be
> included in the MIDX should already exist on disk.
> 
> Skipping promisor objects here is intentional, since it only excludes
> them from the list of reachable commits that we want to select from when
> computing the selection of MIDX'd commits to receive bitmaps.
> 
> But, if one of those promisor objects is reachable from another object
> that is included in the bitmap, then we will complain later on that we
> couldn't find a reachability closure (and fail appropriately).
> 
> That said, I'm not sure any of that is obvious from reading this code,
> so I'll add a comment to that effect around these lines.

So you're saying that if we have missing promisor commits as in the
following graph:

   A
  / \
 B   C
 |   |
 .   .
 .   .
 .   .

where B is missing but promised, and only C is NEEDS_BITMAP, then the
MIDX bitmap write will still work? (So the rev walk is intended to walk
through A and C but not B, and because we are only building bitmaps for
C and potentially its ancestors, we only need the objects in C's
transitive closure.) Even if this is true, "exclude_promisor_objects" is
the wrong option here, because it will exclude all commits that came
from a promisor remote (regardless of whether they are present locally).
(That's how "promisor object" is defined in partial-clone.txt.) What we
need would be an option that permits missing links.

And even if we go with that option that permits missing links, it still
remains that we have very little support for missing promisor commits in
Git right now.

It might be better to just assume that MIDX will only be used for full
clones. If you want, you can add a NEEDSWORK explaining the above case.
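
The graph case above can be sketched with a toy object graph. This is
a minimal illustration, not Git's revision machinery: toy_node,
toy_closure_is_complete(), and toy_clear_seen() are hypothetical, and
the sketch models only "is every object reachable from this commit
present locally", not the behavior of exclude_promisor_objects.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PARENTS 2

/* Hypothetical toy object; not Git's struct commit. */
struct toy_node {
	int present;			/* 0 if missing locally */
	int nr_parents;
	int parents[TOY_MAX_PARENTS];	/* indices into the node array */
	int seen;
};

/*
 * Return 1 if every node reachable from 'start' is present, and 0 as
 * soon as a missing node is found. Callers reset 'seen' between walks.
 */
static int toy_closure_is_complete(struct toy_node *nodes, int start)
{
	struct toy_node *n = &nodes[start];
	int i;

	if (!n->present)
		return 0;
	if (n->seen)
		return 1;
	n->seen = 1;

	assert(n->nr_parents <= TOY_MAX_PARENTS);
	for (i = 0; i < n->nr_parents; i++)
		if (!toy_closure_is_complete(nodes, n->parents[i]))
			return 0;
	return 1;
}

/* Reset the 'seen' marks so that another walk can start cleanly. */
static void toy_clear_seen(struct toy_node *nodes, int nr)
{
	int i;

	for (i = 0; i < nr; i++)
		nodes[i].seen = 0;
}
```

With A having parents B and C, and B missing, a walk starting at C
reports a complete closure, while a walk starting at A does not. That
is the distinction an option permitting missing links would have to
respect.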

> > > +
> > > +	if (prepare_revision_walk(&revs))
> > > +		die(_("revision walk setup failed"));
> > > +
> > > +	traverse_commit_list(&revs, bitmap_show_commit, NULL, &cb);
> > > +	if (indexed_commits_nr_p)
> > > +		*indexed_commits_nr_p = cb.commits_nr;
> > > +
> > > +	return cb.commits;
> > > +}
> >
> > Hmm...I might be missing something obvious, but this function and its
> > callbacks seem to be written like this in order to put the returned
> > commits in a certain order. But later on in write_midx_bitmap(), the
> > return value of this function is passed to
> > bitmap_writer_select_commits(), which resorts the list anyway?
> 
> It isn't intentional, but rather just to build up the list in topo
> order. In fact, the order we build it up in isn't quite the same as how
> the pack bitmap code generates it (it is in true topo order, at least on
> GitHub's servers, as a side effect of using delta islands).
> 
> The fact that we resort according to date_compare makes me wonder why
> changing that seemed to make such a difference for us. The whole
> selection code is a mystery to me.
> 
> But no, the order shouldn't matter since we QSORT it later. Any code
> here that looks like it's putting it in a certain order has much more to
> do with convenience than anything else.

If the order doesn't matter, why don't you just copy one-by-one from
data->ctx->entries into data->commits? (Unless data->ctx->entries has
extra commits?)

> > > +	ret = bitmap_writer_build(&pdata);
> > > +	if (!ret)
> > > +		goto cleanup;
> > > +
> > > +	bitmap_writer_set_checksum(midx_hash);
> > > +	bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, 0);
> >
> > So bitmap_writer_build_type_index() and bitmap_writer_finish() are
> > called with 2 different orders of commits. Is this expected? If yes,
> > maybe this is worth a comment.
> 
> Confusingly so, but yes, these two do expect different orders. You can
> see the same re-sorting going on much more subtly in
> pack-write.c:write_idx_file(), which is called by
> builtin/pack-objects.c:finish_tmp_packfile(), which happens between
> bitmap_writer_build_type_index() and bitmap_writer_finish().
> 
> Definitely worth adding a comment.

Ah, I see. Thanks for your explanation.

> > > @@ -930,9 +1073,16 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
> > >  		for (i = 0; i < ctx.m->num_packs; i++) {
> > >  			ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
> > >
> > > +			if (prepare_midx_pack(the_repository, ctx.m, i)) {
> > > +				error(_("could not load pack %s"),
> > > +				      ctx.m->pack_names[i]);
> > > +				result = 1;
> > > +				goto cleanup;
> > > +			}
> > > +
> > >  			ctx.info[ctx.nr].orig_pack_int_id = i;
> > >  			ctx.info[ctx.nr].pack_name = xstrdup(ctx.m->pack_names[i]);
> > > -			ctx.info[ctx.nr].p = NULL;
> > > +			ctx.info[ctx.nr].p = ctx.m->packs[i];
> > >  			ctx.info[ctx.nr].expired = 0;
> > >  			ctx.nr++;
> > >  		}
> >
> > Why is this needed now and not before? From what I see in this function,
> > nothing seems to happen to this .p pack except that they are closed
> > later.
> 
> These are used by prepare_midx_packing_data().

Ah, thanks.

> > > @@ -1096,6 +1264,15 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
> > >  		if (ctx.info[i].p) {
> > >  			close_pack(ctx.info[i].p);
> > >  			free(ctx.info[i].p);
> > > +			if (ctx.m) {
> > > +				/*
> > > +				 * Destroy a stale reference to the pack in
> > > +				 * 'ctx.m'.
> > > +				 */
> > > +				uint32_t orig = ctx.info[i].orig_pack_int_id;
> > > +				if (orig < ctx.m->num_packs)
> > > +					ctx.m->packs[orig] = NULL;
> > > +			}
> > >  		}
> > >  		free(ctx.info[i].pack_name);
> > >  	}
> >
> > Is this hunk needed? "ctx" is a local variable and will not outlast this
> > function.
> 
> I can't remember exactly why I added this. I'll play around with it and
> either remove it or add a comment why it's necessary before the next
> reroll.

OK.

> 
> > I'll review the rest tomorrow. It seems like I've gotten over the most
> > difficult patches.
> 
> Thanks, and sorry that this took me a few days to get back to. I
> appreciate your review immensely.

No worries, and thanks for these patches.


* [PATCH v2 00/24] multi-pack reachability bitmaps
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (21 preceding siblings ...)
  2021-04-09 18:12 ` [PATCH 22/22] p5326: perf tests for MIDX bitmaps Taylor Blau
@ 2021-06-21 22:24 ` Taylor Blau
  2021-06-21 22:24   ` [PATCH v2 01/24] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps Taylor Blau
                     ` (24 more replies)
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
                   ` (2 subsequent siblings)
  25 siblings, 25 replies; 273+ messages in thread
From: Taylor Blau @ 2021-06-21 22:24 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Here is a reroll of my series to implement multi-pack reachability bitmaps. It
is based on 'master', and incorporates a handful of changes from an earlier
round of review from Jonathan Tan, as well as a handful of tweaks useful to us
at GitHub that we've picked up over a few months of running these patches in
production.

I have been quite behind in sending this to the list because a number of
non-work things have kept me busy. But those seem to have settled down, so
here is a second reroll.

Notable changes since last time are summarized here (though a complete
range-diff is below):

  - A preferred pack is inferred when not otherwise specified. This fixes a
    nasty bug dependent on readdir() ordering which can cause bitmap corruption.
    See the new ninth patch for the gory details.
  - A bug which broke CI on 'seen' is fixed where t0410.27 would fail when
    writing a bitmap (as is the case when
    GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=1 is set).
  - Comments are added to unclear portions of the code dealing with promisor
    objects and with the object order expected by the bitmap writing routines.
  - A number of spots dealing with pack reuse were simplified to avoid using the
    MIDX's .rev file where unnecessary (along with a comment explaining why the
    optimization is possible in the first place).

Thanks in advance for your review, and sorry for the wait.

Jeff King (2):
  t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
  t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP

Taylor Blau (22):
  pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps
  pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
  pack-bitmap-write.c: free existing bitmaps
  Documentation: build 'technical/bitmap-format' by default
  Documentation: describe MIDX-based bitmaps
  midx: make a number of functions non-static
  midx: clear auxiliary .rev after replacing the MIDX
  midx: respect 'core.multiPackIndex' when writing
  midx: infer preferred pack when not given one
  pack-bitmap.c: introduce 'bitmap_num_objects()'
  pack-bitmap.c: introduce 'nth_bitmap_object_oid()'
  pack-bitmap.c: introduce 'bitmap_is_preferred_refname()'
  pack-bitmap: read multi-pack bitmaps
  pack-bitmap: write multi-pack bitmaps
  t5310: move some tests to lib-bitmap.sh
  t/helper/test-read-midx.c: add --checksum mode
  t5326: test multi-pack bitmap behavior
  t5319: don't write MIDX bitmaps in t5319
  t7700: update to work with MIDX bitmap test knob
  midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  p5310: extract full and partial bitmap tests
  p5326: perf tests for MIDX bitmaps

 Documentation/Makefile                       |   1 +
 Documentation/git-multi-pack-index.txt       |  12 +-
 Documentation/technical/bitmap-format.txt    |  72 ++-
 Documentation/technical/multi-pack-index.txt |  10 +-
 builtin/multi-pack-index.c                   |   2 +
 builtin/pack-objects.c                       |   8 +-
 builtin/repack.c                             |  13 +-
 ci/run-build-and-tests.sh                    |   1 +
 midx.c                                       | 288 +++++++++++-
 midx.h                                       |   5 +
 pack-bitmap-write.c                          |  79 +++-
 pack-bitmap.c                                | 470 +++++++++++++++++--
 pack-bitmap.h                                |   8 +-
 packfile.c                                   |   2 +-
 t/README                                     |   4 +
 t/helper/test-read-midx.c                    |  16 +-
 t/lib-bitmap.sh                              | 240 ++++++++++
 t/perf/lib-bitmap.sh                         |  69 +++
 t/perf/p5310-pack-bitmaps.sh                 |  65 +--
 t/perf/p5326-multi-pack-bitmaps.sh           |  43 ++
 t/t0410-partial-clone.sh                     |  12 +-
 t/t5310-pack-bitmaps.sh                      | 231 +--------
 t/t5319-multi-pack-index.sh                  |   3 +-
 t/t5326-multi-pack-bitmaps.sh                | 277 +++++++++++
 t/t7700-repack.sh                            |  18 +-
 25 files changed, 1534 insertions(+), 415 deletions(-)
 create mode 100644 t/perf/lib-bitmap.sh
 create mode 100755 t/perf/p5326-multi-pack-bitmaps.sh
 create mode 100755 t/t5326-multi-pack-bitmaps.sh

Range-diff against v1:
 1:  2d1c6ccab5 =  1:  a18baeb0b4 pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps
 2:  d199954ef2 !  2:  3e637d9ec8 pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
    @@ Commit message
         bitmaps would be incomplete).
     
         Pack bitmaps are never written from 'git repack' unless repacking
    -    all-into-one, and so we never write non-closed bitmaps.
    +    all-into-one, and so we never write non-closed bitmaps (except in the
    +    case of partial clones where we aren't guaranteed to have all objects).
     
         But multi-pack bitmaps change this, since it isn't known whether the
         set of objects in the MIDX is closed under reachability until walking
    @@ Commit message
         include in the bitmap, bitmap_writer_build() knows that the set is not
         closed, and so it now fails gracefully.
     
    -    (The new conditional in builtin/pack-objects.c:bitmap_writer_build()
    -    guards against other failure modes, but is never triggered here, because
    -    of the all-into-one detail above. This return value will be important to
    -    check from the multi-pack index caller.)
    +    A test is added in t0410 to trigger a bitmap write without full
    +    reachability closure by removing local copies of some reachable objects
    +    from a promisor remote.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    @@ pack-bitmap-write.c: void bitmap_writer_build(struct packing_data *to_pack)
     -	compute_xor_offsets();
     +	if (closed)
     +		compute_xor_offsets();
    -+	return closed;
    ++	return closed ? 0 : -1;
      }
      
      /**
    @@ pack-bitmap.h: struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap
      void bitmap_writer_finish(struct pack_idx_entry **index,
      			  uint32_t index_nr,
      			  const char *filename,
    +
    + ## t/t0410-partial-clone.sh ##
    +@@ t/t0410-partial-clone.sh: test_expect_success 'gc does not repack promisor objects if there are none' '
    + repack_and_check () {
    + 	rm -rf repo2 &&
    + 	cp -r repo repo2 &&
    +-	git -C repo2 repack $1 -d &&
    ++	if test x"$1" = "x--must-fail"
    ++	then
    ++		shift
    ++		test_must_fail git -C repo2 repack $1 -d
    ++	else
    ++		git -C repo2 repack $1 -d
    ++	fi &&
    + 	git -C repo2 fsck &&
    + 
    + 	git -C repo2 cat-file -e $2 &&
    +@@ t/t0410-partial-clone.sh: test_expect_success 'repack -d does not irreversibly delete promisor objects' '
    + 	printf "$THREE\n" | pack_as_from_promisor &&
    + 	delete_object repo "$ONE" &&
    + 
    ++	repack_and_check --must-fail -ab "$TWO" "$THREE" &&
    + 	repack_and_check -a "$TWO" "$THREE" &&
    + 	repack_and_check -A "$TWO" "$THREE" &&
    + 	repack_and_check -l "$TWO" "$THREE"
 3:  014c18b896 =  3:  490d733d12 pack-bitmap-write.c: free existing bitmaps
 4:  46de889cd2 =  4:  b0bb2e8051 Documentation: build 'technical/bitmap-format' by default
 5:  0d4822a64e =  5:  64a260e0c6 Documentation: describe MIDX-based bitmaps
 6:  c76dfc198e =  6:  b3a12424d7 midx: make a number of functions non-static
 7:  26c3a312f9 =  7:  1448ca0d2b midx: clear auxiliary .rev after replacing the MIDX
 8:  8643174a67 =  8:  dfd1daacc5 midx: respect 'core.multiPackIndex' when writing
 -:  ---------- >  9:  9495f6869d midx: infer preferred pack when not given one
 9:  af507f4b29 = 10:  373aa47528 pack-bitmap.c: introduce 'bitmap_num_objects()'
10:  a6fdf7234a = 11:  ac1f46aa1f pack-bitmap.c: introduce 'nth_bitmap_object_oid()'
11:  a78f83a127 = 12:  c474d2eda5 pack-bitmap.c: introduce 'bitmap_is_preferred_refname()'
12:  d5eeca4f11 ! 13:  7d44ba6299 pack-bitmap: read multi-pack bitmaps
    @@ builtin/pack-objects.c: static void write_reused_pack(struct hashfile *f)
      				break;
      
      			offset += ewah_bit_ctz64(word >> offset);
    --			write_reused_pack_one(pos + offset, f, &w_curs);
    -+			if (bitmap_is_midx(bitmap_git)) {
    -+				off_t pack_offs = bitmap_pack_offset(bitmap_git,
    -+								     pos + offset);
    -+				uint32_t pos;
    -+
    -+				if (offset_to_pack_pos(reuse_packfile, pack_offs, &pos) < 0)
    -+					die(_("write_reused_pack: could not locate %"PRIdMAX),
    -+					    (intmax_t)pack_offs);
    -+				write_reused_pack_one(pos, f, &w_curs);
    -+			} else
    -+				write_reused_pack_one(pos + offset, f, &w_curs);
    ++			/*
    ++			 * Can use bit positions directly, even for MIDX
    ++			 * bitmaps. See comment in try_partial_reuse()
    ++			 * for why.
    ++			 */
    + 			write_reused_pack_one(pos + offset, f, &w_curs);
      			display_progress(progress_state, ++written);
      		}
    - 	}
     
      ## pack-bitmap-write.c ##
     @@ pack-bitmap-write.c: void bitmap_writer_show_progress(int show)
    @@ pack-bitmap.c: struct bitmap_index {
      	/* If not NULL, this is a name-hash cache pointing into map. */
      	uint32_t *hashes;
      
    ++	/* The checksum of the packfile or MIDX; points into map. */
     +	const unsigned char *checksum;
     +
      	/*
    @@ pack-bitmap.c: static char *pack_bitmap_filename(struct packed_git *p)
     +
     +	if (bitmap_git->pack || bitmap_git->midx) {
     +		/* ignore extra bitmap file; we can only handle one */
    ++		warning("ignoring extra bitmap file: %s",
    ++			get_midx_filename(midx->object_dir));
    ++		close(fd);
     +		return -1;
     +	}
     +
    @@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
      		goto cleanup;
      
      	object_array_clear(&revs->pending);
    -@@ pack-bitmap.c: static void try_partial_reuse(struct bitmap_index *bitmap_git,
    +@@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
    + }
    + 
    + static void try_partial_reuse(struct bitmap_index *bitmap_git,
    ++			      struct packed_git *pack,
    + 			      size_t pos,
      			      struct bitmap *reuse,
      			      struct pack_window **w_curs)
      {
     -	off_t offset, header;
    -+	struct packed_git *pack;
     +	off_t offset, delta_obj_offset;
      	enum object_type type;
      	unsigned long size;
      
    - 	if (pos >= bitmap_num_objects(bitmap_git))
    - 		return; /* not actually in the pack or MIDX */
    +-	if (pos >= bitmap_num_objects(bitmap_git))
    +-		return; /* not actually in the pack or MIDX */
    ++	/*
    ++	 * try_partial_reuse() is called either on (a) objects in the
    ++	 * bitmapped pack (in the case of a single-pack bitmap) or (b)
    ++	 * objects in the preferred pack of a multi-pack bitmap.
    ++	 * Importantly, the latter can pretend as if only a single pack
    ++	 * exists because:
    ++	 *
    ++	 *   - The first pack->num_objects bits of a MIDX bitmap are
    ++	 *     reserved for the preferred pack, and
    ++	 *
    ++	 *   - Ties due to duplicate objects are always resolved in
    ++	 *     favor of the preferred pack.
    ++	 *
    ++	 * Therefore we do not need to ever ask the MIDX for its copy of
    ++	 * an object by OID, since it will always select it from the
    ++	 * preferred pack. Likewise, the selected copy of the base
    ++	 * object for any deltas will reside in the same pack.
    ++	 *
    ++	 * This means that we can reuse pos when looking up the bit in
    ++	 * the reuse bitmap, too, since bits corresponding to the
    ++	 * preferred pack precede all bits from other packs.
    ++	 */
      
     -	offset = header = pack_pos_to_offset(bitmap_git->pack, pos);
     -	type = unpack_object_header(bitmap_git->pack, w_curs, &offset, &size);
    -+	if (bitmap_is_midx(bitmap_git)) {
    -+		uint32_t pack_id, midx_pos;
    ++	if (pos >= pack->num_objects)
    ++		return; /* not actually in the pack or MIDX preferred pack */
     +
    -+		midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
    -+		pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
    -+
    -+		pack = bitmap_git->midx->packs[pack_id];
    -+		offset = nth_midxed_offset(bitmap_git->midx, midx_pos);
    -+	} else {
    -+		pack = bitmap_git->pack;
    -+		offset = pack_pos_to_offset(bitmap_git->pack, pos);
    -+	}
    -+
    -+	delta_obj_offset = offset;
    ++	offset = delta_obj_offset = pack_pos_to_offset(pack, pos);
     +	type = unpack_object_header(pack, w_curs, &offset, &size);
      	if (type < 0)
      		return; /* broken packfile, punt */
    @@ pack-bitmap.c: static void try_partial_reuse(struct bitmap_index *bitmap_git,
      			return;
      
      		/*
    -@@ pack-bitmap.c: static void try_partial_reuse(struct bitmap_index *bitmap_git,
    - 		 * packs we write fresh, and OFS_DELTA is the default). But
    - 		 * let's double check to make sure the pack wasn't written with
    - 		 * odd parameters.
    -+		 *
    -+		 * Note that the base does not need to be repositioned, i.e.,
    -+		 * the MIDX is guaranteed to have selected the copy of "base"
    -+		 * from the same pack, since this function is only ever called
    -+		 * on the preferred pack (and all duplicate objects are resolved
    -+		 * in favor of the preferred pack).
    -+		 *
    -+		 * This means that we can reuse base_pos when looking up the bit
    -+		 * in the reuse bitmap, too, since bits corresponding to the
    -+		 * preferred pack precede all bits from other packs.
    - 		 */
    - 		if (base_pos >= pos)
    - 			return;
     @@ pack-bitmap.c: static void try_partial_reuse(struct bitmap_index *bitmap_git,
      	bitmap_set(reuse, pos);
      }
    @@ pack-bitmap.c: static void try_partial_reuse(struct bitmap_index *bitmap_git,
      int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
      				       struct packed_git **packfile_out,
      				       uint32_t *entries,
    -@@ pack-bitmap.c: int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
    + 				       struct bitmap **reuse_out)
    + {
    ++	struct packed_git *pack;
    + 	struct bitmap *result = bitmap_git->result;
    + 	struct bitmap *reuse;
    + 	struct pack_window *w_curs = NULL;
      	size_t i = 0;
      	uint32_t offset;
    - 	uint32_t objects_nr = bitmap_num_objects(bitmap_git);
    -+	uint32_t preferred_pack = 0;
    +-	uint32_t objects_nr = bitmap_num_objects(bitmap_git);
    ++	uint32_t objects_nr;
      
      	assert(result);
      
     +	load_reverse_index(bitmap_git);
     +
    -+	if (bitmap_is_midx(bitmap_git)) {
    -+		preferred_pack = midx_preferred_pack(bitmap_git);
    -+		objects_nr = bitmap_git->midx->packs[preferred_pack]->num_objects;
    -+	} else
    -+		objects_nr = bitmap_git->pack->num_objects;
    ++	if (bitmap_is_midx(bitmap_git))
    ++		pack = bitmap_git->midx->packs[midx_preferred_pack(bitmap_git)];
    ++	else
    ++		pack = bitmap_git->pack;
    ++	objects_nr = pack->num_objects;
     +
      	while (i < result->word_alloc && result->words[i] == (eword_t)~0)
      		i++;
    @@ pack-bitmap.c: int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitma
      				break;
      
      			offset += ewah_bit_ctz64(word >> offset);
    +-			try_partial_reuse(bitmap_git, pos + offset, reuse, &w_curs);
     +			if (bitmap_is_midx(bitmap_git)) {
     +				/*
     +				 * Can't reuse from a non-preferred pack (see
    @@ pack-bitmap.c: int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitma
     +				if (pos + offset >= objects_nr)
     +					continue;
     +			}
    - 			try_partial_reuse(bitmap_git, pos + offset, reuse, &w_curs);
    ++			try_partial_reuse(bitmap_git, pack, pos + offset, reuse, &w_curs);
      		}
      	}
    + 
     @@ pack-bitmap.c: int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
      	 * need to be handled separately.
      	 */
      	bitmap_and_not(result, reuse);
     -	*packfile_out = bitmap_git->pack;
    -+	*packfile_out = bitmap_git->pack ?
    -+		bitmap_git->pack :
    -+		bitmap_git->midx->packs[preferred_pack];
    ++	*packfile_out = pack;
      	*reuse_out = reuse;
      	return 0;
      }
    @@ pack-bitmap.c: static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_
      	struct ewah_iterator it;
      	eword_t filter;
     @@ pack-bitmap.c: static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
    + 				break;
      
      			offset += ewah_bit_ctz64(word >> offset);
    - 			pos = base + offset;
    +-			pos = base + offset;
     +
     +			if (bitmap_is_midx(bitmap_git)) {
     +				uint32_t pack_pos;
    -+				uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
    ++				uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, base + offset);
     +				uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
     +				off_t offset = nth_midxed_offset(bitmap_git->midx, midx_pos);
     +
    @@ pack-bitmap.c: static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_
     +				}
     +
     +				pos = pack_pos;
    -+			} else
    ++			} else {
     +				pack = bitmap_git->pack;
    ++				pos = base + offset;
    ++			}
     +
      			total += pack_pos_to_offset(pack, pos + 1) -
      				 pack_pos_to_offset(pack, pos);
13:  fd320c5ed4 ! 14:  a8cec2463d pack-bitmap: write multi-pack bitmaps
    @@ midx.c: static void write_midx_reverse_index(char *midx_name, unsigned char *mid
     +	repo_init_revisions(the_repository, &revs, NULL);
     +	for_each_ref(add_ref_to_pending, &revs);
     +
    ++	/*
    ++	 * Skipping promisor objects here is intentional, since it only excludes
    ++	 * them from the list of reachable commits that we want to select from
    ++	 * when computing the selection of MIDX'd commits to receive bitmaps.
    ++	 *
    ++	 * Reachability bitmaps do require that their objects be closed under
    ++	 * reachability, but fetching any objects missing from promisors at this
    ++	 * point is too late. But, if one of those objects can be reached from
    ++	 * another object that is included in the bitmap, then we will
    ++	 * complain later that we don't have reachability closure (and fail
    ++	 * appropriately).
    ++	 */
     +	fetch_if_missing = 0;
     +	revs.exclude_promisor_objects = 1;
     +
    ++	/*
    ++	 * Pass selected commits in topo order to match the behavior of
    ++	 * pack-bitmaps when configured with delta islands.
    ++	 */
    ++	revs.topo_order = 1;
    ++	revs.sort_order = REV_SORT_IN_GRAPH_ORDER;
    ++
     +	if (prepare_revision_walk(&revs))
     +		die(_("revision walk setup failed"));
     +
    @@ midx.c: static void write_midx_reverse_index(char *midx_name, unsigned char *mid
     +	bitmap_writer_build_type_index(&pdata, index, pdata.nr_objects);
     +
     +	/*
    -+	 * bitmap_writer_select_commits expects objects in lex order, but
    -+	 * pack_order gives us exactly that. use it directly instead of
    -+	 * re-sorting the array
    ++	 * bitmap_writer_finish expects objects in lex order, but pack_order
    ++	 * gives us exactly that. use it directly instead of re-sorting the
    ++	 * array.
    ++	 *
    ++	 * This changes the order of objects in 'index' between
    ++	 * bitmap_writer_build_type_index and bitmap_writer_finish.
    ++	 *
    ++	 * The same re-ordering takes place in the single-pack bitmap code via
    ++	 * write_idx_file(), which is called by finish_tmp_packfile(), which
    ++	 * happens between bitmap_writer_build_type_index() and
    ++	 * bitmap_writer_finish().
     +	 */
     +	for (i = 0; i < pdata.nr_objects; i++)
     +		index[ctx->pack_order[i]] = (struct pack_idx_entry *)&pdata.objects[i];
     +
     +	bitmap_writer_select_commits(commits, commits_nr, -1);
     +	ret = bitmap_writer_build(&pdata);
    -+	if (!ret)
    ++	if (ret < 0)
     +		goto cleanup;
     +
     +	bitmap_writer_set_checksum(midx_hash);
    @@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack
     +		}
     +	}
      
    - 	ctx.preferred_pack_idx = -1;
      	if (preferred_pack_name) {
    + 		int found = 0;
    +@@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack_index *
    + 		if (!found)
    + 			warning(_("unknown preferred pack: '%s'"),
    + 				preferred_pack_name);
    +-	} else if (ctx.nr && (flags & MIDX_WRITE_REV_INDEX)) {
    ++	} else if (ctx.nr &&
    ++		   (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))) {
    + 		time_t oldest = ctx.info[0].p->mtime;
    + 		ctx.preferred_pack_idx = 0;
    + 
     @@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack_index *
      	hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR);
      	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
    @@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack
      
      	if (flags & MIDX_WRITE_REV_INDEX)
      		write_midx_reverse_index(midx_name, midx_hash, &ctx);
    -+	if (flags & MIDX_WRITE_BITMAP)
    -+		write_midx_bitmap(midx_name, midx_hash, &ctx, flags);
    ++	if (flags & MIDX_WRITE_BITMAP) {
    ++		if (write_midx_bitmap(midx_name, midx_hash, &ctx, flags) < 0) {
    ++			error(_("could not write multi-pack bitmap"));
    ++			result = 1;
    ++			goto cleanup;
    ++		}
    ++	}
      
      	commit_lock_file(&lk);
      
14:  570e3de9ed ! 15:  c63eb637c8 t5310: move some tests to lib-bitmap.sh
    @@ t/lib-bitmap.sh: test_bitmap_traversal () {
     +		git --git-dir=clone.git rev-parse HEAD >actual &&
     +		test_cmp expect actual
     +	'
    ++
    ++	test_expect_success 'enumerating progress counts pack-reused objects' '
    ++		count=$(git rev-list --objects --all --count) &&
    ++		git repack -adb &&
    ++
    ++		# check first with only reused objects; confirm that our
    ++		# progress showed the right number, and also that we did
    ++		# pack-reuse as expected.  Check only the final "done"
    ++		# line of the meter (there may be an arbitrary number of
    ++		# intermediate lines ending with CR).
    ++		GIT_PROGRESS_DELAY=0 \
    ++			git pack-objects --all --stdout --progress \
    ++			</dev/null >/dev/null 2>stderr &&
    ++		grep "Enumerating objects: $count, done" stderr &&
    ++		grep "pack-reused $count" stderr &&
    ++
    ++		# now the same but with one non-reused object
    ++		git commit --allow-empty -m "an extra commit object" &&
    ++		GIT_PROGRESS_DELAY=0 \
    ++			git pack-objects --all --stdout --progress \
    ++			</dev/null >/dev/null 2>stderr &&
    ++		grep "Enumerating objects: $((count+1)), done" stderr &&
    ++		grep "pack-reused $count" stderr
    ++	'
     +}
     +
     +# have_delta <obj> <expected_base>
    @@ t/t5310-pack-bitmaps.sh: test_expect_success 'truncated bitmap fails gracefully
      	test_i18ngrep corrupted.bitmap.index stderr
      '
      
    +-test_expect_success 'enumerating progress counts pack-reused objects' '
    +-	count=$(git rev-list --objects --all --count) &&
    +-	git repack -adb &&
    +-
    +-	# check first with only reused objects; confirm that our progress
    +-	# showed the right number, and also that we did pack-reuse as expected.
    +-	# Check only the final "done" line of the meter (there may be an
    +-	# arbitrary number of intermediate lines ending with CR).
    +-	GIT_PROGRESS_DELAY=0 \
    +-		git pack-objects --all --stdout --progress \
    +-		</dev/null >/dev/null 2>stderr &&
    +-	grep "Enumerating objects: $count, done" stderr &&
    +-	grep "pack-reused $count" stderr &&
    +-
    +-	# now the same but with one non-reused object
    +-	git commit --allow-empty -m "an extra commit object" &&
    +-	GIT_PROGRESS_DELAY=0 \
    +-		git pack-objects --all --stdout --progress \
    +-		</dev/null >/dev/null 2>stderr &&
    +-	grep "Enumerating objects: $((count+1)), done" stderr &&
    +-	grep "pack-reused $count" stderr
    +-'
    +-
     -# have_delta <obj> <expected_base>
     -#
     -# Note that because this relies on cat-file, it might find _any_ copy of an
15:  060ee427be = 16:  bedb7afb37 t/helper/test-read-midx.c: add --checksum mode
16:  ff74181e85 ! 17:  fbfac4ae8e t5326: test multi-pack bitmap behavior
    @@ t/t5326-multi-pack-bitmaps.sh (new)
     +	git multi-pack-index write --bitmap &&
     +
     +	ls $objdir/pack/pack-*.pack >packs &&
    -+	test_line_count = 26 packs &&
    ++	test_line_count = 25 packs &&
     +
     +	test_path_is_file $midx &&
     +	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
    @@ t/t5326-multi-pack-bitmaps.sh (new)
     +		$(git rev-parse packed)
     +		EOF
     +
    -+		git multi-pack-index write --bitmap 2>err &&
    ++		test_must_fail git multi-pack-index write --bitmap 2>err &&
     +		grep "doesn.t have full closure" err &&
    -+		test_path_is_file $midx &&
    -+		test_path_is_missing $midx-$(midx_checksum $objdir).bitmap
    ++		test_path_is_missing $midx
     +	)
     +'
     +
 -:  ---------- > 18:  2a5df1832a t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
17:  8f328bb5bc = 19:  2d24c5b7ad t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
18:  dbd953815b = 20:  4cbfaa0e97 t5319: don't write MIDX bitmaps in t5319
19:  ee952e4300 = 21:  839a7a79eb t7700: update to work with MIDX bitmap test knob
20:  83614f9284 = 22:  00418d5b09 midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
21:  2f3836bf2e = 23:  98fa73a76a p5310: extract full and partial bitmap tests
22:  b75b534446 = 24:  ec0f53b424 p5326: perf tests for MIDX bitmaps
-- 
2.31.1.163.ga65ce7f831
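
The argument in the new try_partial_reuse() comment in the range-diff
above boils down to a range check: since the first pack->num_objects bits
of a MIDX bitmap belong to the preferred pack, a bit position can be used
for verbatim reuse only if it falls below that count. A rough sketch of
that check follows. It assumes the result bitmap is a plain array of
64-bit words (Git itself uses 'struct bitmap'), and every sketch_/ex_
name is hypothetical rather than taken from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the set bits of 'words' whose position lies in the preferred
 * pack's range [0, preferred_nr). Bits at or beyond preferred_nr belong
 * to other packs in the pseudo-pack order and are not counted.
 */
static size_t sketch_count_reusable(const uint64_t *words, size_t nr_words,
				    uint32_t preferred_nr)
{
	size_t count = 0;
	size_t i, bit;

	assert(words != NULL || nr_words == 0);

	for (i = 0; i < nr_words; i++) {
		for (bit = 0; bit < 64; bit++) {
			size_t pos = i * 64 + bit;

			if (pos >= preferred_nr)
				return count; /* past the preferred pack */
			if ((words[i] >> bit) & 1)
				count++;
		}
	}
	return count;
}

/* Example data (hypothetical): bits 0-3 and bit 64 are set. */
static const uint64_t ex_result[] = { 0xF, 0x1 };
```

With the example data, a preferred pack of 3 objects yields 3 reusable
bits, one of 64 objects yields 4, and one of 65 objects yields 5, because
only then does bit 64 fall inside the preferred pack's range.
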


* [PATCH v2 01/24] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps
  2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
@ 2021-06-21 22:24   ` Taylor Blau
  2021-06-24 23:02     ` Ævar Arnfjörð Bjarmason
  2021-07-21  9:45     ` Jeff King
  2021-06-21 22:25   ` [PATCH v2 02/24] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps Taylor Blau
                     ` (23 subsequent siblings)
  24 siblings, 2 replies; 273+ messages in thread
From: Taylor Blau @ 2021-06-21 22:24 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

The special `--test-bitmap` mode of `git rev-list` is used to compare
the result of an object traversal with a bitmap to check its integrity.
This mode does not, however, assert that the types of reachable objects
are stored correctly.

Harden this mode by teaching it to also check that, each time an object's
bit is marked, the corresponding bit is set in exactly one of the type
bitmaps (the one that matches the object's true type).

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
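
For readers skimming the diff below, the invariant being checked can be
sketched in isolation. This is not the code in pack-bitmap.c: it swaps
Git's 'struct bitmap' and 'enum object_type' for plain uint64_t word
arrays and a made-up enum, and every sketch_/SK_/ex_ name is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for Git's object types; not the real enum. */
enum sketch_type { SK_NONE = 0, SK_COMMIT, SK_TREE, SK_BLOB, SK_TAG };

/* Test bit 'pos' in a plain array of 64-bit words. */
static int sketch_bit_get(const uint64_t *words, size_t pos)
{
	assert(words != NULL);
	return (int)((words[pos / 64] >> (pos % 64)) & 1);
}

/*
 * Return the single type whose bitmap has 'pos' set, or SK_NONE if the
 * bit is set in none, or in more than one, of the four type bitmaps.
 */
static enum sketch_type sketch_unique_type(const uint64_t *commits,
					   const uint64_t *trees,
					   const uint64_t *blobs,
					   const uint64_t *tags,
					   size_t pos)
{
	enum sketch_type type = SK_NONE;
	int nr = 0;

	if (sketch_bit_get(commits, pos)) {
		type = SK_COMMIT;
		nr++;
	}
	if (sketch_bit_get(trees, pos)) {
		type = SK_TREE;
		nr++;
	}
	if (sketch_bit_get(blobs, pos)) {
		type = SK_BLOB;
		nr++;
	}
	if (sketch_bit_get(tags, pos)) {
		type = SK_TAG;
		nr++;
	}
	return nr == 1 ? type : SK_NONE;
}

/* Example data (hypothetical): bit 1 is wrongly set in two bitmaps. */
static const uint64_t ex_commits[] = { 0x1 }; /* bit 0 */
static const uint64_t ex_trees[] = { 0x2 };   /* bit 1 */
static const uint64_t ex_blobs[] = { 0x6 };   /* bits 1 and 2 */
static const uint64_t ex_tags[] = { 0x0 };    /* no bits */
```

With the example data, bit 0 resolves to SK_COMMIT and bit 2 to SK_BLOB,
while bit 1 (set in two bitmaps) and bit 3 (set in none) both come back
as SK_NONE. The patch itself distinguishes these failures and calls
die() with a separate message for each, and additionally compares the
bitmap's type against the object's real type.
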
 pack-bitmap.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index d90e1d9d8c..368fa59a42 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1301,10 +1301,52 @@ void count_bitmap_commit_list(struct bitmap_index *bitmap_git,
 struct bitmap_test_data {
 	struct bitmap_index *bitmap_git;
 	struct bitmap *base;
+	struct bitmap *commits;
+	struct bitmap *trees;
+	struct bitmap *blobs;
+	struct bitmap *tags;
 	struct progress *prg;
 	size_t seen;
 };
 
+static void test_bitmap_type(struct bitmap_test_data *tdata,
+			     struct object *obj, int pos)
+{
+	enum object_type bitmap_type = OBJ_NONE;
+	int bitmaps_nr = 0;
+
+	if (bitmap_get(tdata->commits, pos)) {
+		bitmap_type = OBJ_COMMIT;
+		bitmaps_nr++;
+	}
+	if (bitmap_get(tdata->trees, pos)) {
+		bitmap_type = OBJ_TREE;
+		bitmaps_nr++;
+	}
+	if (bitmap_get(tdata->blobs, pos)) {
+		bitmap_type = OBJ_BLOB;
+		bitmaps_nr++;
+	}
+	if (bitmap_get(tdata->tags, pos)) {
+		bitmap_type = OBJ_TAG;
+		bitmaps_nr++;
+	}
+
+	if (!bitmap_type)
+		die("object %s not found in type bitmaps",
+		    oid_to_hex(&obj->oid));
+
+	if (bitmaps_nr > 1)
+		die("object %s does not have a unique type",
+		    oid_to_hex(&obj->oid));
+
+	if (bitmap_type != obj->type)
+		die("object %s: real type %s, expected: %s",
+		    oid_to_hex(&obj->oid),
+		    type_name(obj->type),
+		    type_name(bitmap_type));
+}
+
 static void test_show_object(struct object *object, const char *name,
 			     void *data)
 {
@@ -1314,6 +1356,7 @@ static void test_show_object(struct object *object, const char *name,
 	bitmap_pos = bitmap_position(tdata->bitmap_git, &object->oid);
 	if (bitmap_pos < 0)
 		die("Object not in bitmap: %s\n", oid_to_hex(&object->oid));
+	test_bitmap_type(tdata, object, bitmap_pos);
 
 	bitmap_set(tdata->base, bitmap_pos);
 	display_progress(tdata->prg, ++tdata->seen);
@@ -1328,6 +1371,7 @@ static void test_show_commit(struct commit *commit, void *data)
 				     &commit->object.oid);
 	if (bitmap_pos < 0)
 		die("Object not in bitmap: %s\n", oid_to_hex(&commit->object.oid));
+	test_bitmap_type(tdata, &commit->object, bitmap_pos);
 
 	bitmap_set(tdata->base, bitmap_pos);
 	display_progress(tdata->prg, ++tdata->seen);
@@ -1375,6 +1419,10 @@ void test_bitmap_walk(struct rev_info *revs)
 
 	tdata.bitmap_git = bitmap_git;
 	tdata.base = bitmap_new();
+	tdata.commits = ewah_to_bitmap(bitmap_git->commits);
+	tdata.trees = ewah_to_bitmap(bitmap_git->trees);
+	tdata.blobs = ewah_to_bitmap(bitmap_git->blobs);
+	tdata.tags = ewah_to_bitmap(bitmap_git->tags);
 	tdata.prg = start_progress("Verifying bitmap entries", result_popcnt);
 	tdata.seen = 0;
 
-- 
2.31.1.163.ga65ce7f831



* [PATCH v2 02/24] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
  2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
  2021-06-21 22:24   ` [PATCH v2 01/24] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps Taylor Blau
@ 2021-06-21 22:25   ` Taylor Blau
  2021-06-24 23:23     ` Ævar Arnfjörð Bjarmason
  2021-07-21  9:50     ` Jeff King
  2021-06-21 22:25   ` [PATCH v2 03/24] pack-bitmap-write.c: free existing bitmaps Taylor Blau
                     ` (22 subsequent siblings)
  24 siblings, 2 replies; 273+ messages in thread
From: Taylor Blau @ 2021-06-21 22:25 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

The set of objects covered by a bitmap must be closed under
reachability, since it must be the case that there is a valid bit
position assigned for every possible reachable object (otherwise the
bitmaps would be incomplete).

Pack bitmaps are never written from 'git repack' unless repacking
all-into-one, and so we never write non-closed bitmaps (except in the
case of partial clones where we aren't guaranteed to have all objects).

But multi-pack bitmaps change this, since it isn't known whether the
set of objects in the MIDX is closed under reachability until walking
them. Plumb through a bit that is set when a reachable object isn't
found.

As soon as a reachable object isn't found in the set of objects to
include in the bitmap, bitmap_writer_build() knows that the set is not
closed, and so it now fails gracefully.

A test is added in t0410 to trigger a bitmap write without full
reachability closure by removing local copies of some reachable objects
from a promisor remote.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
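
The error-propagation pattern this patch introduces (report a missing
object through an out-parameter, then unwind with -1 instead of dying)
can be illustrated on its own. The sketch below is a simplification
under stated assumptions: objects are plain uint32_t IDs in a flat array
rather than entries in a 'struct packing_data', and all sketch_/ex_
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Look up 'id' among the 'nr' known objects. On success, set *found to
 * 1 and return the object's position. On failure, set *found to 0; the
 * returned 0 is then meaningless and must not be used.
 */
static uint32_t sketch_find_pos(const uint32_t *ids, size_t nr,
				uint32_t id, int *found)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (ids[i] == id) {
			if (found)
				*found = 1;
			return (uint32_t)i;
		}
	}
	if (found)
		*found = 0;
	return 0;
}

/*
 * Set the bit of every wanted object. Returns 0 if all of them were
 * present (the set is closed), or -1 as soon as one is missing.
 */
static int sketch_fill(uint64_t *bitmap, const uint32_t *ids, size_t nr,
		       const uint32_t *wanted, size_t wanted_nr)
{
	size_t i;

	assert(bitmap != NULL && ids != NULL);

	for (i = 0; i < wanted_nr; i++) {
		int found;
		uint32_t pos = sketch_find_pos(ids, nr, wanted[i], &found);

		if (!found)
			return -1; /* not closed; let the caller decide */
		bitmap[pos / 64] |= (uint64_t)1 << (pos % 64);
	}
	return 0;
}

/* Example data (hypothetical): three known objects, two request lists. */
static const uint32_t ex_ids[] = { 10, 20, 30 };
static const uint32_t ex_closed[] = { 10, 30 }; /* both present */
static const uint32_t ex_open[] = { 20, 99 };   /* 99 is missing */
static uint64_t ex_bitmap[1];
```

Filling from ex_closed returns 0 and sets bits 0 and 2; filling from
ex_open sets bit 1 and then returns -1 when it reaches the missing
object. Position 0 is a valid result, which is presumably why the patch
signals failure through a separate 'found' flag rather than a sentinel
return value.
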
 builtin/pack-objects.c   |  3 +-
 pack-bitmap-write.c      | 76 ++++++++++++++++++++++++++++------------
 pack-bitmap.h            |  2 +-
 t/t0410-partial-clone.sh |  9 ++++-
 4 files changed, 64 insertions(+), 26 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index de00adbb9e..8a523624a1 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1256,7 +1256,8 @@ static void write_pack_file(void)
 
 				bitmap_writer_show_progress(progress);
 				bitmap_writer_select_commits(indexed_commits, indexed_commits_nr, -1);
-				bitmap_writer_build(&to_pack);
+				if (bitmap_writer_build(&to_pack) < 0)
+					die(_("failed to write bitmap index"));
 				bitmap_writer_finish(written_list, nr_written,
 						     tmpname.buf, write_bitmap_options);
 				write_bitmap_index = 0;
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 88d9e696a5..d374f7884b 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -125,15 +125,20 @@ static inline void push_bitmapped_commit(struct commit *commit)
 	writer.selected_nr++;
 }
 
-static uint32_t find_object_pos(const struct object_id *oid)
+static uint32_t find_object_pos(const struct object_id *oid, int *found)
 {
 	struct object_entry *entry = packlist_find(writer.to_pack, oid);
 
 	if (!entry) {
-		die("Failed to write bitmap index. Packfile doesn't have full closure "
+		if (found)
+			*found = 0;
+		warning("Failed to write bitmap index. Packfile doesn't have full closure "
 			"(object %s is missing)", oid_to_hex(oid));
+		return 0;
 	}
 
+	if (found)
+		*found = 1;
 	return oe_in_pack_pos(writer.to_pack, entry);
 }
 
@@ -331,9 +336,10 @@ static void bitmap_builder_clear(struct bitmap_builder *bb)
 	bb->commits_nr = bb->commits_alloc = 0;
 }
 
-static void fill_bitmap_tree(struct bitmap *bitmap,
-			     struct tree *tree)
+static int fill_bitmap_tree(struct bitmap *bitmap,
+			    struct tree *tree)
 {
+	int found;
 	uint32_t pos;
 	struct tree_desc desc;
 	struct name_entry entry;
@@ -342,9 +348,11 @@ static void fill_bitmap_tree(struct bitmap *bitmap,
 	 * If our bit is already set, then there is nothing to do. Both this
 	 * tree and all of its children will be set.
 	 */
-	pos = find_object_pos(&tree->object.oid);
+	pos = find_object_pos(&tree->object.oid, &found);
+	if (!found)
+		return -1;
 	if (bitmap_get(bitmap, pos))
-		return;
+		return 0;
 	bitmap_set(bitmap, pos);
 
 	if (parse_tree(tree) < 0)
@@ -355,11 +363,15 @@ static void fill_bitmap_tree(struct bitmap *bitmap,
 	while (tree_entry(&desc, &entry)) {
 		switch (object_type(entry.mode)) {
 		case OBJ_TREE:
-			fill_bitmap_tree(bitmap,
-					 lookup_tree(the_repository, &entry.oid));
+			if (fill_bitmap_tree(bitmap,
+					     lookup_tree(the_repository, &entry.oid)) < 0)
+				return -1;
 			break;
 		case OBJ_BLOB:
-			bitmap_set(bitmap, find_object_pos(&entry.oid));
+			pos = find_object_pos(&entry.oid, &found);
+			if (!found)
+				return -1;
+			bitmap_set(bitmap, pos);
 			break;
 		default:
 			/* Gitlink, etc; not reachable */
@@ -368,15 +380,18 @@ static void fill_bitmap_tree(struct bitmap *bitmap,
 	}
 
 	free_tree_buffer(tree);
+	return 0;
 }
 
-static void fill_bitmap_commit(struct bb_commit *ent,
-			       struct commit *commit,
-			       struct prio_queue *queue,
-			       struct prio_queue *tree_queue,
-			       struct bitmap_index *old_bitmap,
-			       const uint32_t *mapping)
+static int fill_bitmap_commit(struct bb_commit *ent,
+			      struct commit *commit,
+			      struct prio_queue *queue,
+			      struct prio_queue *tree_queue,
+			      struct bitmap_index *old_bitmap,
+			      const uint32_t *mapping)
 {
+	int found;
+	uint32_t pos;
 	if (!ent->bitmap)
 		ent->bitmap = bitmap_new();
 
@@ -401,11 +416,16 @@ static void fill_bitmap_commit(struct bb_commit *ent,
 		 * Mark ourselves and queue our tree. The commit
 		 * walk ensures we cover all parents.
 		 */
-		bitmap_set(ent->bitmap, find_object_pos(&c->object.oid));
+		pos = find_object_pos(&c->object.oid, &found);
+		if (!found)
+			return -1;
+		bitmap_set(ent->bitmap, pos);
 		prio_queue_put(tree_queue, get_commit_tree(c));
 
 		for (p = c->parents; p; p = p->next) {
-			int pos = find_object_pos(&p->item->object.oid);
+			pos = find_object_pos(&p->item->object.oid, &found);
+			if (!found)
+				return -1;
 			if (!bitmap_get(ent->bitmap, pos)) {
 				bitmap_set(ent->bitmap, pos);
 				prio_queue_put(queue, p->item);
@@ -413,8 +433,12 @@ static void fill_bitmap_commit(struct bb_commit *ent,
 		}
 	}
 
-	while (tree_queue->nr)
-		fill_bitmap_tree(ent->bitmap, prio_queue_get(tree_queue));
+	while (tree_queue->nr) {
+		if (fill_bitmap_tree(ent->bitmap,
+				     prio_queue_get(tree_queue)) < 0)
+			return -1;
+	}
+	return 0;
 }
 
 static void store_selected(struct bb_commit *ent, struct commit *commit)
@@ -432,7 +456,7 @@ static void store_selected(struct bb_commit *ent, struct commit *commit)
 	kh_value(writer.bitmaps, hash_pos) = stored;
 }
 
-void bitmap_writer_build(struct packing_data *to_pack)
+int bitmap_writer_build(struct packing_data *to_pack)
 {
 	struct bitmap_builder bb;
 	size_t i;
@@ -441,6 +465,7 @@ void bitmap_writer_build(struct packing_data *to_pack)
 	struct prio_queue tree_queue = { NULL };
 	struct bitmap_index *old_bitmap;
 	uint32_t *mapping;
+	int closed = 1; /* until proven otherwise */
 
 	writer.bitmaps = kh_init_oid_map();
 	writer.to_pack = to_pack;
@@ -463,8 +488,11 @@ void bitmap_writer_build(struct packing_data *to_pack)
 		struct commit *child;
 		int reused = 0;
 
-		fill_bitmap_commit(ent, commit, &queue, &tree_queue,
-				   old_bitmap, mapping);
+		if (fill_bitmap_commit(ent, commit, &queue, &tree_queue,
+				       old_bitmap, mapping) < 0) {
+			closed = 0;
+			break;
+		}
 
 		if (ent->selected) {
 			store_selected(ent, commit);
@@ -499,7 +527,9 @@ void bitmap_writer_build(struct packing_data *to_pack)
 
 	stop_progress(&writer.progress);
 
-	compute_xor_offsets();
+	if (closed)
+		compute_xor_offsets();
+	return closed ? 0 : -1;
 }
 
 /**
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 99d733eb26..020cd8d868 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -87,7 +87,7 @@ struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
 				      struct commit *commit);
 void bitmap_writer_select_commits(struct commit **indexed_commits,
 		unsigned int indexed_commits_nr, int max_bitmaps);
-void bitmap_writer_build(struct packing_data *to_pack);
+int bitmap_writer_build(struct packing_data *to_pack);
 void bitmap_writer_finish(struct pack_idx_entry **index,
 			  uint32_t index_nr,
 			  const char *filename,
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 584a039b85..1667450917 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -536,7 +536,13 @@ test_expect_success 'gc does not repack promisor objects if there are none' '
 repack_and_check () {
 	rm -rf repo2 &&
 	cp -r repo repo2 &&
-	git -C repo2 repack $1 -d &&
+	if test x"$1" = "x--must-fail"
+	then
+		shift
+		test_must_fail git -C repo2 repack $1 -d
+	else
+		git -C repo2 repack $1 -d
+	fi &&
 	git -C repo2 fsck &&
 
 	git -C repo2 cat-file -e $2 &&
@@ -561,6 +567,7 @@ test_expect_success 'repack -d does not irreversibly delete promisor objects' '
 	printf "$THREE\n" | pack_as_from_promisor &&
 	delete_object repo "$ONE" &&
 
+	repack_and_check --must-fail -ab "$TWO" "$THREE" &&
 	repack_and_check -a "$TWO" "$THREE" &&
 	repack_and_check -A "$TWO" "$THREE" &&
 	repack_and_check -l "$TWO" "$THREE"
-- 
2.31.1.163.ga65ce7f831


* [PATCH v2 03/24] pack-bitmap-write.c: free existing bitmaps
  2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
  2021-06-21 22:24   ` [PATCH v2 01/24] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps Taylor Blau
  2021-06-21 22:25   ` [PATCH v2 02/24] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps Taylor Blau
@ 2021-06-21 22:25   ` Taylor Blau
  2021-07-21  9:54     ` Jeff King
  2021-06-21 22:25   ` [PATCH v2 04/24] Documentation: build 'technical/bitmap-format' by default Taylor Blau
                     ` (21 subsequent siblings)
  24 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-06-21 22:25 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

When writing a new bitmap, the bitmap writer code attempts to read the
existing bitmap (if one is present). This is done in order to quickly
permute the bits of any bitmaps for commits which appear in the existing
bitmap, and were also selected for the new bitmap.

But since this code was added in 341fa34887 (pack-bitmap-write: use
existing bitmaps, 2020-12-08), the resources associated with opening an
existing bitmap were never released.

It's fine to ignore this, but it's bad hygiene. It will also cause a
problem for the multi-pack-index builtin, which will be responsible not
only for writing bitmaps, but also for expiring any old multi-pack
bitmaps.

If an existing bitmap was reused here, it will also be expired. That
will cause a problem on platforms which require file resources to be
closed before unlinking them, like Windows. Avoid this by ensuring we
close reused bitmaps with free_bitmap_index() before removing them.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index d374f7884b..142fd0adb8 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -520,6 +520,7 @@ int bitmap_writer_build(struct packing_data *to_pack)
 	clear_prio_queue(&queue);
 	clear_prio_queue(&tree_queue);
 	bitmap_builder_clear(&bb);
+	free_bitmap_index(old_bitmap);
 	free(mapping);
 
 	trace2_region_leave("pack-bitmap-write", "building_bitmaps_total",
-- 
2.31.1.163.ga65ce7f831


* [PATCH v2 04/24] Documentation: build 'technical/bitmap-format' by default
  2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
                     ` (2 preceding siblings ...)
  2021-06-21 22:25   ` [PATCH v2 03/24] pack-bitmap-write.c: free existing bitmaps Taylor Blau
@ 2021-06-21 22:25   ` Taylor Blau
  2021-06-24 23:35     ` Ævar Arnfjörð Bjarmason
  2021-07-21  9:58     ` Jeff King
  2021-06-21 22:25   ` [PATCH v2 05/24] Documentation: describe MIDX-based bitmaps Taylor Blau
                     ` (20 subsequent siblings)
  24 siblings, 2 replies; 273+ messages in thread
From: Taylor Blau @ 2021-06-21 22:25 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Even though the 'TECH_DOCS' variable was introduced all the way back in
5e00439f0a (Documentation: build html for all files in technical and
howto, 2012-10-23), the 'bitmap-format' document was never added to that
list when it was created.

Prepare for changes to this file by including it in the list of
technical documentation that 'make doc' will build by default.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/Makefile b/Documentation/Makefile
index f5605b7767..7d7b778b28 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -90,6 +90,7 @@ SP_ARTICLES += $(API_DOCS)
 TECH_DOCS += MyFirstContribution
 TECH_DOCS += MyFirstObjectWalk
 TECH_DOCS += SubmittingPatches
+TECH_DOCS += technical/bitmap-format
 TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
 TECH_DOCS += technical/index-format
-- 
2.31.1.163.ga65ce7f831


* [PATCH v2 05/24] Documentation: describe MIDX-based bitmaps
  2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
                     ` (3 preceding siblings ...)
  2021-06-21 22:25   ` [PATCH v2 04/24] Documentation: build 'technical/bitmap-format' by default Taylor Blau
@ 2021-06-21 22:25   ` Taylor Blau
  2021-07-21 10:18     ` Jeff King
  2021-06-21 22:25   ` [PATCH v2 06/24] midx: make a number of functions non-static Taylor Blau
                     ` (19 subsequent siblings)
  24 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-06-21 22:25 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Update the technical documentation to describe the multi-pack bitmap
format. This patch merely introduces the new format, and describes its
high-level ideas. Git does not yet know how to read or write these
multi-pack variants, and so the subsequent patches will:

  - Introduce code to interpret multi-pack bitmaps, according to this
    document.

  - Then, introduce code to write multi-pack bitmaps from the 'git
    multi-pack-index write' sub-command.

Finally, the implementation will gain tests in subsequent patches (as
opposed to inline with the patch teaching Git how to write multi-pack
bitmaps) to avoid a cyclic dependency.
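
As an illustration only, the MIDX order that the document below describes
can be sketched as a sort comparator. This is a sketch under stated
assumptions, not the code from this series: the struct and function names
are hypothetical, and it reads the ordering as lexicographic (preferred
pack first, then pack, then offset within the pack).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record; the field names are illustrative, not Git's. */
struct pseudo_pack_entry {
	uint32_t pack_id;   /* pack the MIDX selected this object from */
	int preferred;      /* non-zero if that pack is the preferred pack */
	uint64_t offset;    /* byte offset of the object within its pack */
};

/* qsort() comparator: preferred pack first, then pack, then offset. */
static int pseudo_pack_order_cmp(const void *va, const void *vb)
{
	const struct pseudo_pack_entry *a = va;
	const struct pseudo_pack_entry *b = vb;
	int a_pref = !!a->preferred;
	int b_pref = !!b->preferred;

	if (a_pref != b_pref)
		return a_pref ? -1 : 1;
	if (a->pack_id != b->pack_id)
		return a->pack_id < b->pack_id ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* After sorting, the entry at index n is the object for bit n. */
static void sort_pseudo_pack_order(struct pseudo_pack_entry *entries,
				   size_t nr)
{
	assert(entries || !nr);
	qsort(entries, nr, sizeof(*entries), pseudo_pack_order_cmp);
}
```

With entries sorted this way, every object from the preferred pack occupies
the lowest bit positions, which is the property the pack-reuse code later
in the series depends on.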

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/technical/bitmap-format.txt    | 72 ++++++++++++++++----
 Documentation/technical/multi-pack-index.txt | 10 +--
 2 files changed, 61 insertions(+), 21 deletions(-)

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index f8c18a0f7a..25221c7ec8 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -1,6 +1,45 @@
 GIT bitmap v1 format
 ====================
 
+== Pack and multi-pack bitmaps
+
+Bitmaps store reachability information about the set of objects in a packfile,
+or a multi-pack index (MIDX). The former is defined obviously, and the latter is
+defined as the union of objects in packs contained in the MIDX.
+
+A bitmap may belong to either one pack, or the repository's multi-pack index (if
+it exists). A repository may have at most one bitmap.
+
+An object is uniquely described by its bit position within a bitmap:
+
+	- If the bitmap belongs to a packfile, the __n__th bit corresponds to
+	the __n__th object in pack order. For a function `offset` which maps
+	objects to their byte offset within a pack, pack order is defined as
+	follows:
+
+		o1 <= o2 <==> offset(o1) <= offset(o2)
+
+	- If the bitmap belongs to a MIDX, the __n__th bit corresponds to the
+	__n__th object in MIDX order. With an additional function `pack` which
+	maps objects to the pack they were selected from by the MIDX, MIDX order
+	is defined as follows:
+
+		o1 <= o2 <==> pack(o1) <= pack(o2) /\ offset(o1) <= offset(o2)
+
+	The ordering between packs is done lexicographically by the pack name,
+	with the exception of the preferred pack, which sorts ahead of all other
+	packs.
+
+The on-disk representation (described below) of a bitmap is the same regardless
+of whether or not that bitmap belongs to a packfile or a MIDX. The only
+difference is the interpretation of the bits, which is described above.
+
+Certain bitmap extensions are supported (see: Appendix B). No extensions are
+required for bitmaps corresponding to packfiles. For bitmaps that correspond to
+MIDXs, both the bit-cache and rev-cache extensions are required.
+
+== On-disk format
+
 	- A header appears at the beginning:
 
 		4-byte signature: {'B', 'I', 'T', 'M'}
@@ -14,17 +53,19 @@ GIT bitmap v1 format
 			The following flags are supported:
 
 			- BITMAP_OPT_FULL_DAG (0x1) REQUIRED
-			This flag must always be present. It implies that the bitmap
-			index has been generated for a packfile with full closure
-			(i.e. where every single object in the packfile can find
-			 its parent links inside the same packfile). This is a
-			requirement for the bitmap index format, also present in JGit,
-			that greatly reduces the complexity of the implementation.
+			This flag must always be present. It implies that the
+			bitmap index has been generated for a packfile or
+			multi-pack index (MIDX) with full closure (i.e. where
+			every single object in the packfile/MIDX can find its
+			parent links inside the same packfile/MIDX). This is a
+			requirement for the bitmap index format, also present in
+			JGit, that greatly reduces the complexity of the
+			implementation.
 
 			- BITMAP_OPT_HASH_CACHE (0x4)
 			If present, the end of the bitmap file contains
 			`N` 32-bit name-hash values, one per object in the
-			pack. The format and meaning of the name-hash is
+			pack/MIDX. The format and meaning of the name-hash is
 			described below.
 
 		4-byte entry count (network byte order)
@@ -33,7 +74,8 @@ GIT bitmap v1 format
 
 		20-byte checksum
 
-			The SHA1 checksum of the pack this bitmap index belongs to.
+			The SHA1 checksum of the pack/MIDX this bitmap index
+			belongs to.
 
 	- 4 EWAH bitmaps that act as type indexes
 
@@ -50,7 +92,7 @@ GIT bitmap v1 format
 			- Tags
 
 		In each bitmap, the `n`th bit is set to true if the `n`th object
-		in the packfile is of that type.
+		in the packfile or multi-pack index is of that type.
 
 		The obvious consequence is that the OR of all 4 bitmaps will result
 		in a full set (all bits set), and the AND of all 4 bitmaps will
@@ -62,8 +104,9 @@ GIT bitmap v1 format
 		Each entry contains the following:
 
 		- 4-byte object position (network byte order)
-			The position **in the index for the packfile** where the
-			bitmap for this commit is found.
+			The position **in the index for the packfile or
+			multi-pack index** where the bitmap for this commit is
+			found.
 
 		- 1-byte XOR-offset
 			The xor offset used to compress this bitmap. For an entry
@@ -146,10 +189,11 @@ Name-hash cache
 ---------------
 
 If the BITMAP_OPT_HASH_CACHE flag is set, the end of the bitmap contains
-a cache of 32-bit values, one per object in the pack. The value at
+a cache of 32-bit values, one per object in the pack/MIDX. The value at
 position `i` is the hash of the pathname at which the `i`th object
-(counting in index order) in the pack can be found.  This can be fed
-into the delta heuristics to compare objects with similar pathnames.
+(counting in index or multi-pack index order) in the pack/MIDX can be found.
+This can be fed into the delta heuristics to compare objects with similar
+pathnames.
 
 The hash algorithm used is:
 
diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
index fb688976c4..1a73c3ee20 100644
--- a/Documentation/technical/multi-pack-index.txt
+++ b/Documentation/technical/multi-pack-index.txt
@@ -71,14 +71,10 @@ Future Work
   still reducing the number of binary searches required for object
   lookups.
 
-- The reachability bitmap is currently paired directly with a single
-  packfile, using the pack-order as the object order to hopefully
-  compress the bitmaps well using run-length encoding. This could be
-  extended to pair a reachability bitmap with a multi-pack-index. If
-  the multi-pack-index is extended to store a "stable object order"
+- If the multi-pack-index is extended to store a "stable object order"
   (a function Order(hash) = integer that is constant for a given hash,
-  even as the multi-pack-index is updated) then a reachability bitmap
-  could point to a multi-pack-index and be updated independently.
+  even as the multi-pack-index is updated) then MIDX bitmaps could be
+  updated independently of the MIDX.
 
 - Packfiles can be marked as "special" using empty files that share
   the initial name but replace ".pack" with ".keep" or ".promisor".
-- 
2.31.1.163.ga65ce7f831


* [PATCH v2 06/24] midx: make a number of functions non-static
  2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
                     ` (4 preceding siblings ...)
  2021-06-21 22:25   ` [PATCH v2 05/24] Documentation: describe MIDX-based bitmaps Taylor Blau
@ 2021-06-21 22:25   ` Taylor Blau
  2021-06-24 23:42     ` Ævar Arnfjörð Bjarmason
  2021-06-21 22:25   ` [PATCH v2 07/24] midx: clear auxiliary .rev after replacing the MIDX Taylor Blau
                     ` (18 subsequent siblings)
  24 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-06-21 22:25 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

These functions will be called from outside of midx.c in a subsequent
patch.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 4 ++--
 midx.h | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index 21d6a05e88..fa23d57a24 100644
--- a/midx.c
+++ b/midx.c
@@ -48,12 +48,12 @@ static uint8_t oid_version(void)
 	}
 }
 
-static const unsigned char *get_midx_checksum(struct multi_pack_index *m)
+const unsigned char *get_midx_checksum(struct multi_pack_index *m)
 {
 	return m->data + m->data_len - the_hash_algo->rawsz;
 }
 
-static char *get_midx_filename(const char *object_dir)
+char *get_midx_filename(const char *object_dir)
 {
 	return xstrfmt("%s/pack/multi-pack-index", object_dir);
 }
diff --git a/midx.h b/midx.h
index 8684cf0fef..1172df1a71 100644
--- a/midx.h
+++ b/midx.h
@@ -42,6 +42,8 @@ struct multi_pack_index {
 #define MIDX_PROGRESS     (1 << 0)
 #define MIDX_WRITE_REV_INDEX (1 << 1)
 
+const unsigned char *get_midx_checksum(struct multi_pack_index *m);
+char *get_midx_filename(const char *object_dir);
 char *get_midx_rev_filename(struct multi_pack_index *m);
 
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
-- 
2.31.1.163.ga65ce7f831


* [PATCH v2 07/24] midx: clear auxiliary .rev after replacing the MIDX
  2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
                     ` (5 preceding siblings ...)
  2021-06-21 22:25   ` [PATCH v2 06/24] midx: make a number of functions non-static Taylor Blau
@ 2021-06-21 22:25   ` Taylor Blau
  2021-07-21 10:19     ` Jeff King
  2021-06-21 22:25   ` [PATCH v2 08/24] midx: respect 'core.multiPackIndex' when writing Taylor Blau
                     ` (17 subsequent siblings)
  24 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-06-21 22:25 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

When writing a new multi-pack index, write_midx_internal() attempts to
clean up any auxiliary files (currently just the MIDX's `.rev` file, but
soon to include a `.bitmap`, too) corresponding to the MIDX it's
replacing.

This step should happen after the new MIDX is written into place, since
doing so beforehand means that the old MIDX could be read without its
corresponding .rev file.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index fa23d57a24..40eb7974ba 100644
--- a/midx.c
+++ b/midx.c
@@ -1076,10 +1076,11 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 	if (flags & MIDX_WRITE_REV_INDEX)
 		write_midx_reverse_index(midx_name, midx_hash, &ctx);
-	clear_midx_files_ext(the_repository, ".rev", midx_hash);
 
 	commit_lock_file(&lk);
 
+	clear_midx_files_ext(the_repository, ".rev", midx_hash);
+
 cleanup:
 	for (i = 0; i < ctx.nr; i++) {
 		if (ctx.info[i].p) {
-- 
2.31.1.163.ga65ce7f831


* [PATCH v2 08/24] midx: respect 'core.multiPackIndex' when writing
  2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
                     ` (6 preceding siblings ...)
  2021-06-21 22:25   ` [PATCH v2 07/24] midx: clear auxiliary .rev after replacing the MIDX Taylor Blau
@ 2021-06-21 22:25   ` Taylor Blau
  2021-06-24 23:43     ` Ævar Arnfjörð Bjarmason
  2021-07-21 10:23     ` Jeff King
  2021-06-21 22:25   ` [PATCH v2 09/24] midx: infer preferred pack when not given one Taylor Blau
                     ` (16 subsequent siblings)
  24 siblings, 2 replies; 273+ messages in thread
From: Taylor Blau @ 2021-06-21 22:25 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

When writing a new multi-pack index, write_midx_internal() attempts to
load any existing one to fill in some pieces of information. But it uses
load_multi_pack_index(), which ignores the "core.multiPackIndex"
configuration that indicates whether or not Git is allowed to read an
existing multi-pack-index.

Replace this with a routine that does respect that setting, to avoid
reading multi-pack-index files when told not to.

This avoids a problem that would arise in subsequent patches due to the
combination of 'git repack' reopening the object store in-process and
the multi-pack index code not checking whether a pack already exists in
the object store when calling add_pack_to_midx().

This would ultimately lead to a cycle being created along the
'packed_git' struct's '->next' pointer. That is obviously bad, but it
has hard-to-debug downstream effects like saying a bitmap can't be
loaded for a pack because one already exists (for the same pack).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index 40eb7974ba..759007d5a8 100644
--- a/midx.c
+++ b/midx.c
@@ -908,8 +908,18 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 	if (m)
 		ctx.m = m;
-	else
-		ctx.m = load_multi_pack_index(object_dir, 1);
+	else {
+		struct multi_pack_index *cur;
+
+		prepare_multi_pack_index_one(the_repository, object_dir, 1);
+
+		ctx.m = NULL;
+		for (cur = the_repository->objects->multi_pack_index; cur;
+		     cur = cur->next) {
+			if (!strcmp(object_dir, cur->object_dir))
+				ctx.m = cur;
+		}
+	}
 
 	ctx.nr = 0;
 	ctx.alloc = ctx.m ? ctx.m->num_packs : 16;
-- 
2.31.1.163.ga65ce7f831


* [PATCH v2 09/24] midx: infer preferred pack when not given one
  2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
                     ` (7 preceding siblings ...)
  2021-06-21 22:25   ` [PATCH v2 08/24] midx: respect 'core.multiPackIndex' when writing Taylor Blau
@ 2021-06-21 22:25   ` Taylor Blau
  2021-07-21 10:34     ` Jeff King
  2021-06-21 22:25   ` [PATCH v2 10/24] pack-bitmap.c: introduce 'bitmap_num_objects()' Taylor Blau
                     ` (15 subsequent siblings)
  24 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-06-21 22:25 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

In 9218c6a40c (midx: allow marking a pack as preferred, 2021-03-30), the
multi-pack index code learned how to select a pack from which all
duplicate objects are selected. That is, if an object appears in multiple
packs, select the copy in the preferred pack before breaking ties
according to the other rules like pack mtime and readdir() order.

Not specifying a preferred pack can cause serious problems with
multi-pack reachability bitmaps, because these bitmaps rely on having at
least one pack from which all duplicates are selected. Not having such a
pack causes problems with the pack reuse code (e.g., assuming that
a base object was sent from that pack via reuse when in fact the base
was selected from a different pack).

So why does failing to mark a pack as preferred cause problems here? The
reason is roughly as follows:

  - Ties are broken (when handling duplicate objects) by sorting
    according to midx_oid_compare(), which sorts objects by OID,
    preferred-ness, pack mtime, and finally pack ID (more on that
    later).

  - The pseudo pack-order (described in
    Documentation/technical/bitmap-format.txt) is computed by
    midx_pack_order(), and sorts by pack ID and pack offset, with
    preferred packs sorting first.

  - But! Pack IDs come from incrementing the pack count in
    add_pack_to_midx(), which is a callback to
    for_each_file_in_pack_dir(), meaning that pack IDs are assigned in
    readdir() order.

When specifying a preferred pack, all of that works fine, because
duplicate objects are correctly resolved in favor of the copy in the
preferred pack, and the preferred pack sorts first in the object order.

"Sorting first" is critical, because the bitmap code relies on finding
out which pack holds the first object in the MIDX's pseudo pack-order to
determine which pack is preferred.

But if we didn't specify a preferred pack, and the pack which comes
first in readdir() order does not also have the lowest timestamp, then
it's possible that that pack (the one that sorts first in pseudo-pack
order, which the bitmap code will treat as the preferred one) did *not*
have all duplicate objects resolved in its favor, resulting in breakage.

The fix is simple: pick a (semi-arbitrary) preferred pack when none was
specified. This forces that pack to have duplicates resolved in its
favor, and (critically) to sort first in pseudo-pack order.
Unfortunately, testing this behavior portably isn't possible, since it
depends on readdir() order which isn't guaranteed by POSIX.
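
To make the fallback concrete, here is a minimal standalone sketch of the
selection rule used in the diff below: when no preferred pack is named,
pick the pack with the oldest mtime, with the earliest index winning ties.
The struct and function names are hypothetical stand-ins, not the ones in
midx.c.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the per-pack info the MIDX writer tracks. */
struct pack_info_sketch {
	const char *name;
	time_t mtime;
};

/*
 * Return the index of the pack with the oldest mtime. On a tie the
 * earliest index wins, because only a strictly older pack replaces
 * the current choice.
 */
static size_t infer_preferred_pack(const struct pack_info_sketch *packs,
				   size_t nr)
{
	size_t i, preferred = 0;

	assert(nr > 0);
	for (i = 1; i < nr; i++) {
		if (packs[i].mtime < packs[preferred].mtime)
			preferred = i;
	}
	return preferred;
}
```

Any deterministic choice would satisfy the bitmap code, as long as the
chosen pack both wins all duplicate-object ties and sorts first in
pseudo-pack order; the oldest pack is simply the choice the patch makes.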

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 39 +++++++++++++++++++++++++++++++++------
 1 file changed, 33 insertions(+), 6 deletions(-)

diff --git a/midx.c b/midx.c
index 759007d5a8..752d36c57f 100644
--- a/midx.c
+++ b/midx.c
@@ -950,15 +950,46 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
 		goto cleanup;
 
-	ctx.preferred_pack_idx = -1;
 	if (preferred_pack_name) {
+		int found = 0;
 		for (i = 0; i < ctx.nr; i++) {
 			if (!cmp_idx_or_pack_name(preferred_pack_name,
 						  ctx.info[i].pack_name)) {
 				ctx.preferred_pack_idx = i;
+				found = 1;
 				break;
 			}
 		}
+
+		if (!found)
+			warning(_("unknown preferred pack: '%s'"),
+				preferred_pack_name);
+	} else if (ctx.nr && (flags & MIDX_WRITE_REV_INDEX)) {
+		time_t oldest = ctx.info[0].p->mtime;
+		ctx.preferred_pack_idx = 0;
+
+		if (packs_to_drop && packs_to_drop->nr)
+			BUG("cannot write a MIDX bitmap during expiration");
+
+		/*
+		 * set a preferred pack when writing a bitmap to ensure that
+		 * the pack from which the first object is selected in pseudo
+		 * pack-order has all of its objects selected from that pack
+		 * (and not another pack containing a duplicate)
+		 */
+		for (i = 1; i < ctx.nr; i++) {
+			time_t mtime = ctx.info[i].p->mtime;
+			if (mtime < oldest) {
+				oldest = mtime;
+				ctx.preferred_pack_idx = i;
+			}
+		}
+	} else {
+		/*
+		 * otherwise don't mark any pack as preferred to avoid
+		 * interfering with expiration logic below
+		 */
+		ctx.preferred_pack_idx = -1;
 	}
 
 	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr,
@@ -1029,11 +1060,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 						      ctx.info, ctx.nr,
 						      sizeof(*ctx.info),
 						      idx_or_pack_name_cmp);
-
-		if (!preferred)
-			warning(_("unknown preferred pack: '%s'"),
-				preferred_pack_name);
-		else {
+		if (preferred) {
 			uint32_t perm = ctx.pack_perm[preferred->orig_pack_int_id];
 			if (perm == PACK_EXPIRED)
 				warning(_("preferred pack '%s' is expired"),
-- 
2.31.1.163.ga65ce7f831


* [PATCH v2 10/24] pack-bitmap.c: introduce 'bitmap_num_objects()'
  2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
                     ` (8 preceding siblings ...)
  2021-06-21 22:25   ` [PATCH v2 09/24] midx: infer preferred pack when not given one Taylor Blau
@ 2021-06-21 22:25   ` Taylor Blau
  2021-07-21 10:35     ` Jeff King
  2021-06-21 22:25   ` [PATCH v2 11/24] pack-bitmap.c: introduce 'nth_bitmap_object_oid()' Taylor Blau
                     ` (14 subsequent siblings)
  24 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-06-21 22:25 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

A subsequent patch to support reading MIDX bitmaps will be less noisy
after extracting a generic function to return how many objects are
contained in a bitmap.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 37 +++++++++++++++++++++----------------
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 368fa59a42..2dc135d34a 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -136,6 +136,11 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
 	return b;
 }
 
+static uint32_t bitmap_num_objects(struct bitmap_index *index)
+{
+	return index->pack->num_objects;
+}
+
 static int load_bitmap_header(struct bitmap_index *index)
 {
 	struct bitmap_disk_header *header = (void *)index->map;
@@ -154,7 +159,7 @@ static int load_bitmap_header(struct bitmap_index *index)
 	/* Parse known bitmap format options */
 	{
 		uint32_t flags = ntohs(header->options);
-		size_t cache_size = st_mult(index->pack->num_objects, sizeof(uint32_t));
+		size_t cache_size = st_mult(bitmap_num_objects(index), sizeof(uint32_t));
 		unsigned char *index_end = index->map + index->map_size - the_hash_algo->rawsz;
 
 		if ((flags & BITMAP_OPT_FULL_DAG) == 0)
@@ -399,7 +404,7 @@ static inline int bitmap_position_extended(struct bitmap_index *bitmap_git,
 
 	if (pos < kh_end(positions)) {
 		int bitmap_pos = kh_value(positions, pos);
-		return bitmap_pos + bitmap_git->pack->num_objects;
+		return bitmap_pos + bitmap_num_objects(bitmap_git);
 	}
 
 	return -1;
@@ -451,7 +456,7 @@ static int ext_index_add_object(struct bitmap_index *bitmap_git,
 		bitmap_pos = kh_value(eindex->positions, hash_pos);
 	}
 
-	return bitmap_pos + bitmap_git->pack->num_objects;
+	return bitmap_pos + bitmap_num_objects(bitmap_git);
 }
 
 struct bitmap_show_data {
@@ -650,7 +655,7 @@ static void show_extended_objects(struct bitmap_index *bitmap_git,
 	for (i = 0; i < eindex->count; ++i) {
 		struct object *obj;
 
-		if (!bitmap_get(objects, bitmap_git->pack->num_objects + i))
+		if (!bitmap_get(objects, bitmap_num_objects(bitmap_git) + i))
 			continue;
 
 		obj = eindex->objects[i];
@@ -808,7 +813,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
 	 * individually.
 	 */
 	for (i = 0; i < eindex->count; i++) {
-		uint32_t pos = i + bitmap_git->pack->num_objects;
+		uint32_t pos = i + bitmap_num_objects(bitmap_git);
 		if (eindex->objects[i]->type == type &&
 		    bitmap_get(to_filter, pos) &&
 		    !bitmap_get(tips, pos))
@@ -835,7 +840,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
 
 	oi.sizep = &size;
 
-	if (pos < pack->num_objects) {
+	if (pos < bitmap_num_objects(bitmap_git)) {
 		off_t ofs = pack_pos_to_offset(pack, pos);
 		if (packed_object_info(the_repository, pack, ofs, &oi) < 0) {
 			struct object_id oid;
@@ -845,7 +850,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
 		}
 	} else {
 		struct eindex *eindex = &bitmap_git->ext_index;
-		struct object *obj = eindex->objects[pos - pack->num_objects];
+		struct object *obj = eindex->objects[pos - bitmap_num_objects(bitmap_git)];
 		if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0)
 			die(_("unable to get size of %s"), oid_to_hex(&obj->oid));
 	}
@@ -887,7 +892,7 @@ static void filter_bitmap_blob_limit(struct bitmap_index *bitmap_git,
 	}
 
 	for (i = 0; i < eindex->count; i++) {
-		uint32_t pos = i + bitmap_git->pack->num_objects;
+		uint32_t pos = i + bitmap_num_objects(bitmap_git);
 		if (eindex->objects[i]->type == OBJ_BLOB &&
 		    bitmap_get(to_filter, pos) &&
 		    !bitmap_get(tips, pos) &&
@@ -1113,8 +1118,8 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 	enum object_type type;
 	unsigned long size;
 
-	if (pos >= bitmap_git->pack->num_objects)
-		return; /* not actually in the pack */
+	if (pos >= bitmap_num_objects(bitmap_git))
+		return; /* not actually in the pack or MIDX */
 
 	offset = header = pack_pos_to_offset(bitmap_git->pack, pos);
 	type = unpack_object_header(bitmap_git->pack, w_curs, &offset, &size);
@@ -1180,6 +1185,7 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 	struct pack_window *w_curs = NULL;
 	size_t i = 0;
 	uint32_t offset;
+	uint32_t objects_nr = bitmap_num_objects(bitmap_git);
 
 	assert(result);
 
@@ -1187,8 +1193,8 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 		i++;
 
 	/* Don't mark objects not in the packfile */
-	if (i > bitmap_git->pack->num_objects / BITS_IN_EWORD)
-		i = bitmap_git->pack->num_objects / BITS_IN_EWORD;
+	if (i > objects_nr / BITS_IN_EWORD)
+		i = objects_nr / BITS_IN_EWORD;
 
 	reuse = bitmap_word_alloc(i);
 	memset(reuse->words, 0xFF, i * sizeof(eword_t));
@@ -1272,7 +1278,7 @@ static uint32_t count_object_type(struct bitmap_index *bitmap_git,
 
 	for (i = 0; i < eindex->count; ++i) {
 		if (eindex->objects[i]->type == type &&
-			bitmap_get(objects, bitmap_git->pack->num_objects + i))
+			bitmap_get(objects, bitmap_num_objects(bitmap_git) + i))
 			count++;
 	}
 
@@ -1493,7 +1499,7 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
 	uint32_t i, num_objects;
 	uint32_t *reposition;
 
-	num_objects = bitmap_git->pack->num_objects;
+	num_objects = bitmap_num_objects(bitmap_git);
 	CALLOC_ARRAY(reposition, num_objects);
 
 	for (i = 0; i < num_objects; ++i) {
@@ -1576,7 +1582,6 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
 static off_t get_disk_usage_for_extended(struct bitmap_index *bitmap_git)
 {
 	struct bitmap *result = bitmap_git->result;
-	struct packed_git *pack = bitmap_git->pack;
 	struct eindex *eindex = &bitmap_git->ext_index;
 	off_t total = 0;
 	struct object_info oi = OBJECT_INFO_INIT;
@@ -1588,7 +1593,7 @@ static off_t get_disk_usage_for_extended(struct bitmap_index *bitmap_git)
 	for (i = 0; i < eindex->count; i++) {
 		struct object *obj = eindex->objects[i];
 
-		if (!bitmap_get(result, pack->num_objects + i))
+		if (!bitmap_get(result, bitmap_num_objects(bitmap_git) + i))
 			continue;
 
 		if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0)
-- 
2.31.1.163.ga65ce7f831



* [PATCH v2 11/24] pack-bitmap.c: introduce 'nth_bitmap_object_oid()'
  2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
                     ` (9 preceding siblings ...)
  2021-06-21 22:25   ` [PATCH v2 10/24] pack-bitmap.c: introduce 'bitmap_num_objects()' Taylor Blau
@ 2021-06-21 22:25   ` Taylor Blau
  2021-06-24 14:59     ` Taylor Blau
  2021-07-21 10:37     ` Jeff King
  2021-06-21 22:25   ` [PATCH v2 12/24] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()' Taylor Blau
                     ` (13 subsequent siblings)
  24 siblings, 2 replies; 273+ messages in thread
From: Taylor Blau @ 2021-06-21 22:25 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

A subsequent patch to support reading MIDX bitmaps will be less noisy
after extracting a generic function to fetch the nth OID contained in
the bitmap.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 2dc135d34a..9757cd0fbb 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -223,6 +223,13 @@ static inline uint8_t read_u8(const unsigned char *buffer, size_t *pos)
 
 #define MAX_XOR_OFFSET 160
 
+static void nth_bitmap_object_oid(struct bitmap_index *index,
+				  struct object_id *oid,
+				  uint32_t n)
+{
+	nth_packed_object_id(oid, index->pack, n);
+}
+
 static int load_bitmap_entries_v1(struct bitmap_index *index)
 {
 	uint32_t i;
@@ -242,9 +249,7 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 		xor_offset = read_u8(index->map, &index->map_pos);
 		flags = read_u8(index->map, &index->map_pos);
 
-		if (nth_packed_object_id(&oid, index->pack, commit_idx_pos) < 0)
-			return error("corrupt ewah bitmap: commit index %u out of range",
-				     (unsigned)commit_idx_pos);
+		nth_bitmap_object_oid(index, &oid, commit_idx_pos);
 
 		bitmap = read_bitmap_1(index);
 		if (!bitmap)
@@ -844,8 +849,8 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
 		off_t ofs = pack_pos_to_offset(pack, pos);
 		if (packed_object_info(the_repository, pack, ofs, &oi) < 0) {
 			struct object_id oid;
-			nth_packed_object_id(&oid, pack,
-					     pack_pos_to_index(pack, pos));
+			nth_bitmap_object_oid(bitmap_git, &oid,
+					      pack_pos_to_index(pack, pos));
 			die(_("unable to get size of %s"), oid_to_hex(&oid));
 		}
 	} else {
-- 
2.31.1.163.ga65ce7f831



* [PATCH v2 12/24] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()'
  2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
                     ` (10 preceding siblings ...)
  2021-06-21 22:25   ` [PATCH v2 11/24] pack-bitmap.c: introduce 'nth_bitmap_object_oid()' Taylor Blau
@ 2021-06-21 22:25   ` Taylor Blau
  2021-07-21 10:39     ` Jeff King
  2021-06-21 22:25   ` [PATCH v2 13/24] pack-bitmap: read multi-pack bitmaps Taylor Blau
                     ` (12 subsequent siblings)
  24 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-06-21 22:25 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

In a recent commit, pack-objects learned support for the
'pack.preferBitmapTips' configuration. This patch prepares the
multi-pack bitmap code to respect this configuration, too.

Since the multi-pack bitmap code already does a traversal of all
references (in order to discover the set of reachable commits in the
multi-pack index), it is more efficient to check during that traversal
whether each reference begins with any value of 'pack.preferBitmapTips'
rather than do an additional traversal.

Implement a function 'bitmap_is_preferred_refname()' which does just
that. The caller will be added in a subsequent patch.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
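Not part of the patch: a minimal standalone sketch of the prefix check
described above, for readers without the Git tree at hand. The names
has_prefix() and is_preferred_refname() are hypothetical stand-ins for
starts_with() and bitmap_is_preferred_refname(), and a plain array of
strings is assumed in place of the string_list returned by the config
machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for Git's starts_with(): does str begin with prefix? */
static int has_prefix(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/*
 * Toy version of the check: return 1 if any configured value is a
 * prefix of refname, 0 otherwise. A NULL list is treated as "nothing
 * configured", mirroring the early return in the patch.
 */
static int is_preferred_refname(const char *const *tips, size_t nr,
				const char *refname)
{
	size_t i;

	assert(refname != NULL);

	if (!tips)
		return 0;
	for (i = 0; i < nr; i++) {
		if (has_prefix(refname, tips[i]))
			return 1;
	}
	return 0;
}
```

With values such as "refs/heads/" and "refs/tags/v", a ref like
"refs/heads/main" would match while "refs/remotes/origin/main" would
not, since the match is anchored at the start of the refname.
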
 pack-bitmap.c | 16 ++++++++++++++++
 pack-bitmap.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 9757cd0fbb..d882bf7ce1 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1632,3 +1632,19 @@ const struct string_list *bitmap_preferred_tips(struct repository *r)
 {
 	return repo_config_get_value_multi(r, "pack.preferbitmaptips");
 }
+
+int bitmap_is_preferred_refname(struct repository *r, const char *refname)
+{
+	const struct string_list *preferred_tips = bitmap_preferred_tips(r);
+	struct string_list_item *item;
+
+	if (!preferred_tips)
+		return 0;
+
+	for_each_string_list_item(item, preferred_tips) {
+		if (starts_with(refname, item->string))
+			return 1;
+	}
+
+	return 0;
+}
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 020cd8d868..52ea10de51 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -94,5 +94,6 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 			  uint16_t options);
 
 const struct string_list *bitmap_preferred_tips(struct repository *r);
+int bitmap_is_preferred_refname(struct repository *r, const char *refname);
 
 #endif
-- 
2.31.1.163.ga65ce7f831



* [PATCH v2 13/24] pack-bitmap: read multi-pack bitmaps
  2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
                     ` (11 preceding siblings ...)
  2021-06-21 22:25   ` [PATCH v2 12/24] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()' Taylor Blau
@ 2021-06-21 22:25   ` Taylor Blau
  2021-07-21 11:32     ` Jeff King
  2021-06-21 22:25   ` [PATCH v2 14/24] pack-bitmap: write " Taylor Blau
                     ` (11 subsequent siblings)
  24 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-06-21 22:25 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

This prepares the code in pack-bitmap to interpret the new multi-pack
bitmaps described in Documentation/technical/bitmap-format.txt, which
mostly involves converting bit positions to accommodate looking them up
in a MIDX.

Note that nothing writes multi-pack bitmaps yet; a writer is added in
the subsequent commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
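Not part of the patch: a toy model of the "converting bit positions"
step mentioned above. It assumes (as the cover letter describes) that
the MIDX's reverse index maps a position in the pseudo-pack order to an
index in the MIDX's OID-sorted object list, from which the containing
pack and offset can be read. All toy_* names and the struct layout are
hypothetical and only loosely mirror pack_pos_to_midx(),
nth_midxed_pack_int_id() and nth_midxed_offset().

```c
#include <assert.h>
#include <stdint.h>

/* Toy MIDX entry: which pack holds the object, and at what offset. */
struct toy_midx_entry {
	uint32_t pack_id;
	uint64_t offset;
};

struct toy_midx {
	const struct toy_midx_entry *entries; /* in OID (lexical) order */
	const uint32_t *rev; /* pseudo-pack position -> index into entries */
	uint32_t num_objects;
};

/* Analogue of pack_pos_to_midx(): translate a bit position. */
static uint32_t toy_pos_to_midx(const struct toy_midx *m, uint32_t pos)
{
	assert(pos < m->num_objects);
	return m->rev[pos];
}

/* Resolve a bit position to the (pack, offset) pair that stores it. */
static struct toy_midx_entry toy_lookup(const struct toy_midx *m,
					uint32_t pos)
{
	return m->entries[toy_pos_to_midx(m, pos)];
}
```

The point of the indirection is that a set bit in a multi-pack bitmap
names a position in the pseudo-pack order, not in any physical pack, so
every reader has to go through the reverse index before it can touch
object data.
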
 builtin/pack-objects.c |   5 +
 pack-bitmap-write.c    |   2 +-
 pack-bitmap.c          | 362 +++++++++++++++++++++++++++++++++++++----
 pack-bitmap.h          |   5 +
 packfile.c             |   2 +-
 5 files changed, 340 insertions(+), 36 deletions(-)
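
Not part of the patch: a small sketch of the clamp that
reuse_partial_packfile_from_bitmap() applies below. Because the bits of
the preferred pack come first in a multi-pack bitmap, whole-word reuse
may only cover words that lie entirely within the first objects_nr
bits. The function name and the 64-bit word width are assumptions made
for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BITS_IN_WORD 64

/*
 * Count how many leading words of `words` are entirely set, but never
 * go past the last full word covered by the first `objects_nr` bits
 * (the single pack, or the preferred pack of a MIDX).
 */
static size_t toy_reusable_words(const uint64_t *words, size_t nr_words,
				 uint32_t objects_nr)
{
	size_t i = 0;

	assert(words || !nr_words);

	while (i < nr_words && words[i] == ~(uint64_t)0)
		i++;
	if (i > objects_nr / TOY_BITS_IN_WORD)
		i = objects_nr / TOY_BITS_IN_WORD;
	return i;
}
```

Bits past the clamped words are not lost; in the patch they are
examined one at a time, and those at or beyond objects_nr are skipped
for reuse.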

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 8a523624a1..e11d3ac2e5 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1124,6 +1124,11 @@ static void write_reused_pack(struct hashfile *f)
 				break;
 
 			offset += ewah_bit_ctz64(word >> offset);
+			/*
+			 * Can use bit positions directly, even for MIDX
+			 * bitmaps. See comment in try_partial_reuse()
+			 * for why.
+			 */
 			write_reused_pack_one(pos + offset, f, &w_curs);
 			display_progress(progress_state, ++written);
 		}
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 142fd0adb8..9c55c1531e 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -48,7 +48,7 @@ void bitmap_writer_show_progress(int show)
 }
 
 /**
- * Build the initial type index for the packfile
+ * Build the initial type index for the packfile or multi-pack-index
  */
 void bitmap_writer_build_type_index(struct packing_data *to_pack,
 				    struct pack_idx_entry **index,
diff --git a/pack-bitmap.c b/pack-bitmap.c
index d882bf7ce1..4110d23ca1 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -13,6 +13,7 @@
 #include "repository.h"
 #include "object-store.h"
 #include "list-objects-filter-options.h"
+#include "midx.h"
 #include "config.h"
 
 /*
@@ -35,8 +36,15 @@ struct stored_bitmap {
  * the active bitmap index is the largest one.
  */
 struct bitmap_index {
-	/* Packfile to which this bitmap index belongs to */
+	/*
+	 * The pack or multi-pack index (MIDX) that this bitmap index belongs
+	 * to.
+	 *
+	 * Exactly one of these must be non-NULL; this specifies the object
+	 * order used to interpret this bitmap.
+	 */
 	struct packed_git *pack;
+	struct multi_pack_index *midx;
 
 	/*
 	 * Mark the first `reuse_objects` in the packfile as reused:
@@ -71,6 +79,9 @@ struct bitmap_index {
 	/* If not NULL, this is a name-hash cache pointing into map. */
 	uint32_t *hashes;
 
+	/* The checksum of the packfile or MIDX; points into map. */
+	const unsigned char *checksum;
+
 	/*
 	 * Extended index.
 	 *
@@ -138,6 +149,8 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
 
 static uint32_t bitmap_num_objects(struct bitmap_index *index)
 {
+	if (index->midx)
+		return index->midx->num_objects;
 	return index->pack->num_objects;
 }
 
@@ -175,6 +188,7 @@ static int load_bitmap_header(struct bitmap_index *index)
 	}
 
 	index->entry_count = ntohl(header->entry_count);
+	index->checksum = header->checksum;
 	index->map_pos += header_size;
 	return 0;
 }
@@ -227,7 +241,10 @@ static void nth_bitmap_object_oid(struct bitmap_index *index,
 				  struct object_id *oid,
 				  uint32_t n)
 {
-	nth_packed_object_id(oid, index->pack, n);
+	if (index->midx)
+		nth_midxed_object_oid(oid, index->midx, n);
+	else
+		nth_packed_object_id(oid, index->pack, n);
 }
 
 static int load_bitmap_entries_v1(struct bitmap_index *index)
@@ -272,7 +289,14 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 	return 0;
 }
 
-static char *pack_bitmap_filename(struct packed_git *p)
+char *midx_bitmap_filename(struct multi_pack_index *midx)
+{
+	return xstrfmt("%s-%s.bitmap",
+		       get_midx_filename(midx->object_dir),
+		       hash_to_hex(get_midx_checksum(midx)));
+}
+
+char *pack_bitmap_filename(struct packed_git *p)
 {
 	size_t len;
 
@@ -281,6 +305,57 @@ static char *pack_bitmap_filename(struct packed_git *p)
 	return xstrfmt("%.*s.bitmap", (int)len, p->pack_name);
 }
 
+static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
+			      struct multi_pack_index *midx)
+{
+	struct stat st;
+	char *idx_name = midx_bitmap_filename(midx);
+	int fd = git_open(idx_name);
+
+	free(idx_name);
+
+	if (fd < 0)
+		return -1;
+
+	if (fstat(fd, &st)) {
+		close(fd);
+		return -1;
+	}
+
+	if (bitmap_git->pack || bitmap_git->midx) {
+		/* ignore extra bitmap file; we can only handle one */
+		warning("ignoring extra bitmap file: %s",
+			get_midx_filename(midx->object_dir));
+		close(fd);
+		return -1;
+	}
+
+	bitmap_git->midx = midx;
+	bitmap_git->map_size = xsize_t(st.st_size);
+	bitmap_git->map_pos = 0;
+	bitmap_git->map = xmmap(NULL, bitmap_git->map_size, PROT_READ,
+				MAP_PRIVATE, fd, 0);
+	close(fd);
+
+	if (load_bitmap_header(bitmap_git) < 0)
+		goto cleanup;
+
+	if (!hasheq(get_midx_checksum(bitmap_git->midx), bitmap_git->checksum))
+		goto cleanup;
+
+	if (load_midx_revindex(bitmap_git->midx) < 0) {
+		warning(_("multi-pack bitmap is missing required reverse index"));
+		goto cleanup;
+	}
+	return 0;
+
+cleanup:
+	munmap(bitmap_git->map, bitmap_git->map_size);
+	bitmap_git->map_size = 0;
+	bitmap_git->map = NULL;
+	return -1;
+}
+
 static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git *packfile)
 {
 	int fd;
@@ -302,12 +377,18 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
 		return -1;
 	}
 
-	if (bitmap_git->pack) {
+	if (bitmap_git->pack || bitmap_git->midx) {
+		/* ignore extra bitmap file; we can only handle one */
 		warning("ignoring extra bitmap file: %s", packfile->pack_name);
 		close(fd);
 		return -1;
 	}
 
+	if (!is_pack_valid(packfile)) {
+		close(fd);
+		return -1;
+	}
+
 	bitmap_git->pack = packfile;
 	bitmap_git->map_size = xsize_t(st.st_size);
 	bitmap_git->map = xmmap(NULL, bitmap_git->map_size, PROT_READ, MAP_PRIVATE, fd, 0);
@@ -324,13 +405,36 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
 	return 0;
 }
 
-static int load_pack_bitmap(struct bitmap_index *bitmap_git)
+static int load_reverse_index(struct bitmap_index *bitmap_git)
+{
+	if (bitmap_is_midx(bitmap_git)) {
+		uint32_t i;
+		int ret;
+
+		ret = load_midx_revindex(bitmap_git->midx);
+		if (ret)
+			return ret;
+
+		for (i = 0; i < bitmap_git->midx->num_packs; i++) {
+			if (prepare_midx_pack(the_repository, bitmap_git->midx, i))
+				die(_("load_reverse_index: could not open pack"));
+			ret = load_pack_revindex(bitmap_git->midx->packs[i]);
+			if (ret)
+				return ret;
+		}
+		return 0;
+	}
+	return load_pack_revindex(bitmap_git->pack);
+}
+
+static int load_bitmap(struct bitmap_index *bitmap_git)
 {
 	assert(bitmap_git->map);
 
 	bitmap_git->bitmaps = kh_init_oid_map();
 	bitmap_git->ext_index.positions = kh_init_oid_pos();
-	if (load_pack_revindex(bitmap_git->pack))
+
+	if (load_reverse_index(bitmap_git))
 		goto failed;
 
 	if (!(bitmap_git->commits = read_bitmap_1(bitmap_git)) ||
@@ -374,11 +478,35 @@ static int open_pack_bitmap(struct repository *r,
 	return ret;
 }
 
+static int open_midx_bitmap(struct repository *r,
+			    struct bitmap_index *bitmap_git)
+{
+	struct multi_pack_index *midx;
+
+	assert(!bitmap_git->map);
+
+	for (midx = get_multi_pack_index(r); midx; midx = midx->next) {
+		if (!open_midx_bitmap_1(bitmap_git, midx))
+			return 0;
+	}
+	return -1;
+}
+
+static int open_bitmap(struct repository *r,
+		       struct bitmap_index *bitmap_git)
+{
+	assert(!bitmap_git->map);
+
+	if (!open_midx_bitmap(r, bitmap_git))
+		return 0;
+	return open_pack_bitmap(r, bitmap_git);
+}
+
 struct bitmap_index *prepare_bitmap_git(struct repository *r)
 {
 	struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
 
-	if (!open_pack_bitmap(r, bitmap_git) && !load_pack_bitmap(bitmap_git))
+	if (!open_bitmap(r, bitmap_git) && !load_bitmap(bitmap_git))
 		return bitmap_git;
 
 	free_bitmap_index(bitmap_git);
@@ -428,10 +556,26 @@ static inline int bitmap_position_packfile(struct bitmap_index *bitmap_git,
 	return pos;
 }
 
+static int bitmap_position_midx(struct bitmap_index *bitmap_git,
+				const struct object_id *oid)
+{
+	uint32_t want, got;
+	if (!bsearch_midx(oid, bitmap_git->midx, &want))
+		return -1;
+
+	if (midx_to_pack_pos(bitmap_git->midx, want, &got) < 0)
+		return -1;
+	return got;
+}
+
 static int bitmap_position(struct bitmap_index *bitmap_git,
 			   const struct object_id *oid)
 {
-	int pos = bitmap_position_packfile(bitmap_git, oid);
+	int pos;
+	if (bitmap_is_midx(bitmap_git))
+		pos = bitmap_position_midx(bitmap_git, oid);
+	else
+		pos = bitmap_position_packfile(bitmap_git, oid);
 	return (pos >= 0) ? pos : bitmap_position_extended(bitmap_git, oid);
 }
 
@@ -724,6 +868,7 @@ static void show_objects_for_type(
 			continue;
 
 		for (offset = 0; offset < BITS_IN_EWORD; ++offset) {
+			struct packed_git *pack;
 			struct object_id oid;
 			uint32_t hash = 0, index_pos;
 			off_t ofs;
@@ -733,14 +878,28 @@ static void show_objects_for_type(
 
 			offset += ewah_bit_ctz64(word >> offset);
 
-			index_pos = pack_pos_to_index(bitmap_git->pack, pos + offset);
-			ofs = pack_pos_to_offset(bitmap_git->pack, pos + offset);
-			nth_packed_object_id(&oid, bitmap_git->pack, index_pos);
+			if (bitmap_is_midx(bitmap_git)) {
+				struct multi_pack_index *m = bitmap_git->midx;
+				uint32_t pack_id;
+
+				index_pos = pack_pos_to_midx(m, pos + offset);
+				ofs = nth_midxed_offset(m, index_pos);
+				nth_midxed_object_oid(&oid, m, index_pos);
+
+				pack_id = nth_midxed_pack_int_id(m, index_pos);
+				pack = bitmap_git->midx->packs[pack_id];
+			} else {
+				index_pos = pack_pos_to_index(bitmap_git->pack, pos + offset);
+				ofs = pack_pos_to_offset(bitmap_git->pack, pos + offset);
+				nth_bitmap_object_oid(bitmap_git, &oid, index_pos);
+
+				pack = bitmap_git->pack;
+			}
 
 			if (bitmap_git->hashes)
 				hash = get_be32(bitmap_git->hashes + index_pos);
 
-			show_reach(&oid, object_type, 0, hash, bitmap_git->pack, ofs);
+			show_reach(&oid, object_type, 0, hash, pack, ofs);
 		}
 	}
 }
@@ -752,8 +911,13 @@ static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
 		struct object *object = roots->item;
 		roots = roots->next;
 
-		if (find_pack_entry_one(object->oid.hash, bitmap_git->pack) > 0)
-			return 1;
+		if (bitmap_is_midx(bitmap_git)) {
+			if (bsearch_midx(&object->oid, bitmap_git->midx, NULL))
+				return 1;
+		} else {
+			if (find_pack_entry_one(object->oid.hash, bitmap_git->pack) > 0)
+				return 1;
+		}
 	}
 
 	return 0;
@@ -839,14 +1003,26 @@ static void filter_bitmap_blob_none(struct bitmap_index *bitmap_git,
 static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
 				     uint32_t pos)
 {
-	struct packed_git *pack = bitmap_git->pack;
 	unsigned long size;
 	struct object_info oi = OBJECT_INFO_INIT;
 
 	oi.sizep = &size;
 
 	if (pos < bitmap_num_objects(bitmap_git)) {
-		off_t ofs = pack_pos_to_offset(pack, pos);
+		struct packed_git *pack;
+		off_t ofs;
+
+		if (bitmap_is_midx(bitmap_git)) {
+			uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
+			uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
+
+			pack = bitmap_git->midx->packs[pack_id];
+			ofs = nth_midxed_offset(bitmap_git->midx, midx_pos);
+		} else {
+			pack = bitmap_git->pack;
+			ofs = pack_pos_to_offset(pack, pos);
+		}
+
 		if (packed_object_info(the_repository, pack, ofs, &oi) < 0) {
 			struct object_id oid;
 			nth_bitmap_object_oid(bitmap_git, &oid,
@@ -1027,7 +1203,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	/* try to open a bitmapped pack, but don't parse it yet
 	 * because we may not need to use it */
 	CALLOC_ARRAY(bitmap_git, 1);
-	if (open_pack_bitmap(revs->repo, bitmap_git) < 0)
+	if (open_bitmap(revs->repo, bitmap_git) < 0)
 		goto cleanup;
 
 	for (i = 0; i < revs->pending.nr; ++i) {
@@ -1071,7 +1247,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	 * from disk. this is the point of no return; after this the rev_list
 	 * becomes invalidated and we must perform the revwalk through bitmaps
 	 */
-	if (load_pack_bitmap(bitmap_git) < 0)
+	if (load_bitmap(bitmap_git) < 0)
 		goto cleanup;
 
 	object_array_clear(&revs->pending);
@@ -1115,19 +1291,43 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 }
 
 static void try_partial_reuse(struct bitmap_index *bitmap_git,
+			      struct packed_git *pack,
 			      size_t pos,
 			      struct bitmap *reuse,
 			      struct pack_window **w_curs)
 {
-	off_t offset, header;
+	off_t offset, delta_obj_offset;
 	enum object_type type;
 	unsigned long size;
 
-	if (pos >= bitmap_num_objects(bitmap_git))
-		return; /* not actually in the pack or MIDX */
+	/*
+	 * try_partial_reuse() is called either on (a) objects in the
+	 * bitmapped pack (in the case of a single-pack bitmap) or (b)
+	 * objects in the preferred pack of a multi-pack bitmap.
+	 * Importantly, the latter can pretend as if only a single pack
+	 * exists because:
+	 *
+	 *   - The first pack->num_objects bits of a MIDX bitmap are
+	 *     reserved for the preferred pack, and
+	 *
+	 *   - Ties due to duplicate objects are always resolved in
+	 *     favor of the preferred pack.
+	 *
+	 * Therefore we do not need to ever ask the MIDX for its copy of
+	 * an object by OID, since it will always select it from the
+	 * preferred pack. Likewise, the selected copy of the base
+	 * object for any deltas will reside in the same pack.
+	 *
+	 * This means that we can reuse pos when looking up the bit in
+	 * the reuse bitmap, too, since bits corresponding to the
+	 * preferred pack precede all bits from other packs.
+	 */
 
-	offset = header = pack_pos_to_offset(bitmap_git->pack, pos);
-	type = unpack_object_header(bitmap_git->pack, w_curs, &offset, &size);
+	if (pos >= pack->num_objects)
+		return; /* not actually in the pack or MIDX preferred pack */
+
+	offset = delta_obj_offset = pack_pos_to_offset(pack, pos);
+	type = unpack_object_header(pack, w_curs, &offset, &size);
 	if (type < 0)
 		return; /* broken packfile, punt */
 
@@ -1143,11 +1343,11 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 		 * and the normal slow path will complain about it in
 		 * more detail.
 		 */
-		base_offset = get_delta_base(bitmap_git->pack, w_curs,
-					     &offset, type, header);
+		base_offset = get_delta_base(pack, w_curs, &offset, type,
+					     delta_obj_offset);
 		if (!base_offset)
 			return;
-		if (offset_to_pack_pos(bitmap_git->pack, base_offset, &base_pos) < 0)
+		if (offset_to_pack_pos(pack, base_offset, &base_pos) < 0)
 			return;
 
 		/*
@@ -1180,24 +1380,48 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 	bitmap_set(reuse, pos);
 }
 
+static uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git)
+{
+	struct multi_pack_index *m = bitmap_git->midx;
+	if (!m)
+		BUG("midx_preferred_pack: requires non-empty MIDX");
+	return nth_midxed_pack_int_id(m, pack_pos_to_midx(bitmap_git->midx, 0));
+}
+
 int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 				       struct packed_git **packfile_out,
 				       uint32_t *entries,
 				       struct bitmap **reuse_out)
 {
+	struct packed_git *pack;
 	struct bitmap *result = bitmap_git->result;
 	struct bitmap *reuse;
 	struct pack_window *w_curs = NULL;
 	size_t i = 0;
 	uint32_t offset;
-	uint32_t objects_nr = bitmap_num_objects(bitmap_git);
+	uint32_t objects_nr;
 
 	assert(result);
 
+	load_reverse_index(bitmap_git);
+
+	if (bitmap_is_midx(bitmap_git))
+		pack = bitmap_git->midx->packs[midx_preferred_pack(bitmap_git)];
+	else
+		pack = bitmap_git->pack;
+	objects_nr = pack->num_objects;
+
 	while (i < result->word_alloc && result->words[i] == (eword_t)~0)
 		i++;
 
-	/* Don't mark objects not in the packfile */
+	/*
+	 * Don't mark objects not in the packfile or preferred pack. This bitmap
+	 * marks objects eligible for reuse, but the pack-reuse code only
+	 * understands how to reuse a single pack. Since the preferred pack is
+	 * guaranteed to have all bases for its deltas (in a multi-pack bitmap),
+	 * we use it instead of another pack. In single-pack bitmaps, the choice
+	 * is made for us.
+	 */
 	if (i > objects_nr / BITS_IN_EWORD)
 		i = objects_nr / BITS_IN_EWORD;
 
@@ -1213,7 +1437,15 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 				break;
 
 			offset += ewah_bit_ctz64(word >> offset);
-			try_partial_reuse(bitmap_git, pos + offset, reuse, &w_curs);
+			if (bitmap_is_midx(bitmap_git)) {
+				/*
+				 * Can't reuse from a non-preferred pack (see
+				 * above).
+				 */
+				if (pos + offset >= objects_nr)
+					continue;
+			}
+			try_partial_reuse(bitmap_git, pack, pos + offset, reuse, &w_curs);
 		}
 	}
 
@@ -1230,7 +1462,7 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 	 * need to be handled separately.
 	 */
 	bitmap_and_not(result, reuse);
-	*packfile_out = bitmap_git->pack;
+	*packfile_out = pack;
 	*reuse_out = reuse;
 	return 0;
 }
@@ -1504,6 +1736,12 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
 	uint32_t i, num_objects;
 	uint32_t *reposition;
 
+	if (!bitmap_is_midx(bitmap_git))
+		load_reverse_index(bitmap_git);
+	else if (load_midx_revindex(bitmap_git->midx) < 0)
+		BUG("rebuild_existing_bitmaps: missing required rev-cache "
+		    "extension");
+
 	num_objects = bitmap_num_objects(bitmap_git);
 	CALLOC_ARRAY(reposition, num_objects);
 
@@ -1511,8 +1749,13 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
 		struct object_id oid;
 		struct object_entry *oe;
 
-		nth_packed_object_id(&oid, bitmap_git->pack,
-				     pack_pos_to_index(bitmap_git->pack, i));
+		if (bitmap_is_midx(bitmap_git))
+			nth_midxed_object_oid(&oid,
+					      bitmap_git->midx,
+					      pack_pos_to_midx(bitmap_git->midx, i));
+		else
+			nth_packed_object_id(&oid, bitmap_git->pack,
+					     pack_pos_to_index(bitmap_git->pack, i));
 		oe = packlist_find(mapping, &oid);
 
 		if (oe)
@@ -1538,6 +1781,19 @@ void free_bitmap_index(struct bitmap_index *b)
 	free(b->ext_index.hashes);
 	bitmap_free(b->result);
 	bitmap_free(b->haves);
+	if (bitmap_is_midx(b)) {
+		/*
+		 * Multi-pack bitmaps need to have resources associated with
+		 * their on-disk reverse indexes unmapped so that stale .rev and
+		 * .bitmap files can be removed.
+		 *
+		 * Unlike pack-based bitmaps, multi-pack bitmaps can be read and
+		 * written in the same 'git multi-pack-index write --bitmap'
+		 * process. Close resources so they can be removed safely on
+		 * platforms like Windows.
+		 */
+		close_midx_revindex(b->midx);
+	}
 	free(b);
 }
 
@@ -1552,7 +1808,7 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
 				     enum object_type object_type)
 {
 	struct bitmap *result = bitmap_git->result;
-	struct packed_git *pack = bitmap_git->pack;
+	struct packed_git *pack;
 	off_t total = 0;
 	struct ewah_iterator it;
 	eword_t filter;
@@ -1575,7 +1831,31 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
 				break;
 
 			offset += ewah_bit_ctz64(word >> offset);
-			pos = base + offset;
+
+			if (bitmap_is_midx(bitmap_git)) {
+				uint32_t pack_pos;
+				uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, base + offset);
+				uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
+				off_t offset = nth_midxed_offset(bitmap_git->midx, midx_pos);
+
+				pack = bitmap_git->midx->packs[pack_id];
+
+				if (offset_to_pack_pos(pack, offset, &pack_pos) < 0) {
+					struct object_id oid;
+					nth_midxed_object_oid(&oid, bitmap_git->midx, midx_pos);
+
+					die(_("could not find %s in pack #%"PRIu32" at offset %"PRIuMAX),
+					    oid_to_hex(&oid),
+					    pack_id,
+					    (uintmax_t)offset);
+				}
+
+				pos = pack_pos;
+			} else {
+				pack = bitmap_git->pack;
+				pos = base + offset;
+			}
+
 			total += pack_pos_to_offset(pack, pos + 1) -
 				 pack_pos_to_offset(pack, pos);
 		}
@@ -1628,6 +1908,20 @@ off_t get_disk_usage_from_bitmap(struct bitmap_index *bitmap_git,
 	return total;
 }
 
+int bitmap_is_midx(struct bitmap_index *bitmap_git)
+{
+	return !!bitmap_git->midx;
+}
+
+off_t bitmap_pack_offset(struct bitmap_index *bitmap_git, uint32_t pos)
+{
+	if (bitmap_is_midx(bitmap_git))
+		return nth_midxed_offset(bitmap_git->midx,
+					 pack_pos_to_midx(bitmap_git->midx, pos));
+	return nth_packed_object_offset(bitmap_git->pack,
+					pack_pos_to_index(bitmap_git->pack, pos));
+}
+
 const struct string_list *bitmap_preferred_tips(struct repository *r)
 {
 	return repo_config_get_value_multi(r, "pack.preferbitmaptips");
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 52ea10de51..30396a7a4a 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -92,6 +92,11 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 			  uint32_t index_nr,
 			  const char *filename,
 			  uint16_t options);
+char *midx_bitmap_filename(struct multi_pack_index *midx);
+char *pack_bitmap_filename(struct packed_git *p);
+
+int bitmap_is_midx(struct bitmap_index *bitmap_git);
+off_t bitmap_pack_offset(struct bitmap_index *bitmap_git, uint32_t pos);
 
 const struct string_list *bitmap_preferred_tips(struct repository *r);
 int bitmap_is_preferred_refname(struct repository *r, const char *refname);
diff --git a/packfile.c b/packfile.c
index 755aa7aec5..e855b93208 100644
--- a/packfile.c
+++ b/packfile.c
@@ -860,7 +860,7 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
 	if (!strcmp(file_name, "multi-pack-index"))
 		return;
 	if (starts_with(file_name, "multi-pack-index") &&
-	    ends_with(file_name, ".rev"))
+	    (ends_with(file_name, ".bitmap") || ends_with(file_name, ".rev")))
 		return;
 	if (ends_with(file_name, ".idx") ||
 	    ends_with(file_name, ".rev") ||
-- 
2.31.1.163.ga65ce7f831



* [PATCH v2 14/24] pack-bitmap: write multi-pack bitmaps
  2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
                     ` (12 preceding siblings ...)
  2021-06-21 22:25   ` [PATCH v2 13/24] pack-bitmap: read multi-pack bitmaps Taylor Blau
@ 2021-06-21 22:25   ` Taylor Blau
  2021-06-24 23:45     ` Ævar Arnfjörð Bjarmason
  2021-07-21 12:09     ` Jeff King
  2021-06-21 22:25   ` [PATCH v2 15/24] t5310: move some tests to lib-bitmap.sh Taylor Blau
                     ` (10 subsequent siblings)
  24 siblings, 2 replies; 273+ messages in thread
From: Taylor Blau @ 2021-06-21 22:25 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Write multi-pack bitmaps in the format described by
Documentation/technical/bitmap-format.txt when '--bitmap' is given to
'git multi-pack-index write'.

To write a multi-pack bitmap, this patch attempts to reuse as much of
the existing machinery from pack-objects as possible. Specifically, the
MIDX code prepares a packing_data struct that pretends that a single
packfile has been generated containing all of the objects in the MIDX.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
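Not part of the patch: one way to picture the "single pretend packfile"
is as the MIDX's objects sorted into pseudo-pack order. The sort key
used here (preferred pack first, then pack id, then offset within the
pack) is an assumption based on the description of the preferred pack
elsewhere in this series, not something this patch itself spells out,
and the toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct toy_entry {
	uint32_t pack_id;
	uint64_t offset;
	int preferred; /* 1 if the object's pack is the preferred pack */
};

/* Order: preferred pack first, then by pack id, then by offset. */
static int toy_cmp(const void *va, const void *vb)
{
	const struct toy_entry *a = va, *b = vb;

	if (a->preferred != b->preferred)
		return b->preferred - a->preferred;
	if (a->pack_id != b->pack_id)
		return a->pack_id < b->pack_id ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Sort entries in place into the assumed pseudo-pack order. */
static void toy_pseudo_pack_order(struct toy_entry *entries, size_t nr)
{
	assert(entries || !nr);
	qsort(entries, nr, sizeof(*entries), toy_cmp);
}
```

Feeding objects to the bitmap writer in such an order is what lets it
treat the MIDX like one packfile: position i in the sorted list plays
the role that pack position i plays for a single-pack bitmap.
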
 Documentation/git-multi-pack-index.txt |  12 +-
 builtin/multi-pack-index.c             |   2 +
 midx.c                                 | 230 ++++++++++++++++++++++++-
 midx.h                                 |   1 +
 4 files changed, 236 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index ffd601bc17..ada14deb2c 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -10,7 +10,7 @@ SYNOPSIS
 --------
 [verse]
 'git multi-pack-index' [--object-dir=<dir>] [--[no-]progress]
-	[--preferred-pack=<pack>] <subcommand>
+	[--preferred-pack=<pack>] [--[no-]bitmap] <subcommand>
 
 DESCRIPTION
 -----------
@@ -40,6 +40,9 @@ write::
 		multiple packs contain the same object. If not given,
 		ties are broken in favor of the pack with the lowest
 		mtime.
+
+	--[no-]bitmap::
+		Control whether or not a multi-pack bitmap is written.
 --
 
 verify::
@@ -81,6 +84,13 @@ EXAMPLES
 $ git multi-pack-index write
 -----------------------------------------------
 
+* Write a MIDX file for the packfiles in the current .git folder with a
+corresponding bitmap.
++
+-------------------------------------------------------------
+$ git multi-pack-index write --preferred-pack <pack> --bitmap
+-------------------------------------------------------------
+
 * Write a MIDX file for the packfiles in an alternate object store.
 +
 -----------------------------------------------
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 5d3ea445fd..bf6fa982e3 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -68,6 +68,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
 		OPT_STRING(0, "preferred-pack", &opts.preferred_pack,
 			   N_("preferred-pack"),
 			   N_("pack for reuse when computing a multi-pack bitmap")),
+		OPT_BIT(0, "bitmap", &opts.flags, N_("write multi-pack bitmap"),
+			MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
 		OPT_END(),
 	};
 
diff --git a/midx.c b/midx.c
index 752d36c57f..a58cca707b 100644
--- a/midx.c
+++ b/midx.c
@@ -13,6 +13,10 @@
 #include "repository.h"
 #include "chunk-format.h"
 #include "pack.h"
+#include "pack-bitmap.h"
+#include "refs.h"
+#include "revision.h"
+#include "list-objects.h"
 
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
 #define MIDX_VERSION 1
@@ -885,6 +889,172 @@ static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash,
 static void clear_midx_files_ext(struct repository *r, const char *ext,
 				 unsigned char *keep_hash);
 
+static void prepare_midx_packing_data(struct packing_data *pdata,
+				      struct write_midx_context *ctx)
+{
+	uint32_t i;
+
+	memset(pdata, 0, sizeof(struct packing_data));
+	prepare_packing_data(the_repository, pdata);
+
+	for (i = 0; i < ctx->entries_nr; i++) {
+		struct pack_midx_entry *from = &ctx->entries[ctx->pack_order[i]];
+		struct object_entry *to = packlist_alloc(pdata, &from->oid);
+
+		oe_set_in_pack(pdata, to,
+			       ctx->info[ctx->pack_perm[from->pack_int_id]].p);
+	}
+}
+
+static int add_ref_to_pending(const char *refname,
+			      const struct object_id *oid,
+			      int flag, void *cb_data)
+{
+	struct rev_info *revs = (struct rev_info*)cb_data;
+	struct object *object;
+
+	if ((flag & REF_ISSYMREF) && (flag & REF_ISBROKEN)) {
+		warning("symbolic ref is dangling: %s", refname);
+		return 0;
+	}
+
+	object = parse_object_or_die(oid, refname);
+	if (object->type != OBJ_COMMIT)
+		return 0;
+
+	add_pending_object(revs, object, "");
+	if (bitmap_is_preferred_refname(revs->repo, refname))
+		object->flags |= NEEDS_BITMAP;
+	return 0;
+}
+
+struct bitmap_commit_cb {
+	struct commit **commits;
+	size_t commits_nr, commits_alloc;
+
+	struct write_midx_context *ctx;
+};
+
+static const struct object_id *bitmap_oid_access(size_t index,
+						 const void *_entries)
+{
+	const struct pack_midx_entry *entries = _entries;
+	return &entries[index].oid;
+}
+
+static void bitmap_show_commit(struct commit *commit, void *_data)
+{
+	struct bitmap_commit_cb *data = _data;
+	if (oid_pos(&commit->object.oid, data->ctx->entries,
+		    data->ctx->entries_nr,
+		    bitmap_oid_access) > -1) {
+		ALLOC_GROW(data->commits, data->commits_nr + 1,
+			   data->commits_alloc);
+		data->commits[data->commits_nr++] = commit;
+	}
+}
+
+static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr_p,
+						    struct write_midx_context *ctx)
+{
+	struct rev_info revs;
+	struct bitmap_commit_cb cb;
+
+	memset(&cb, 0, sizeof(struct bitmap_commit_cb));
+	cb.ctx = ctx;
+
+	repo_init_revisions(the_repository, &revs, NULL);
+	for_each_ref(add_ref_to_pending, &revs);
+
+	/*
+	 * Skipping promisor objects here is intentional, since it only excludes
+	 * them from the list of reachable commits that we want to select from
+	 * when computing the selection of MIDX'd commits to receive bitmaps.
+	 *
+	 * Reachability bitmaps do require that their objects be closed under
+	 * reachability, but fetching any objects missing from promisors at this
+	 * point is too late. But, if one of those objects can be reached from
+	 * another object that is included in the bitmap, then we will
+	 * complain later that we don't have reachability closure (and fail
+	 * appropriately).
+	 */
+	fetch_if_missing = 0;
+	revs.exclude_promisor_objects = 1;
+
+	/*
+	 * Pass selected commits in topo order to match the behavior of
+	 * pack-bitmaps when configured with delta islands.
+	 */
+	revs.topo_order = 1;
+	revs.sort_order = REV_SORT_IN_GRAPH_ORDER;
+
+	if (prepare_revision_walk(&revs))
+		die(_("revision walk setup failed"));
+
+	traverse_commit_list(&revs, bitmap_show_commit, NULL, &cb);
+	if (indexed_commits_nr_p)
+		*indexed_commits_nr_p = cb.commits_nr;
+
+	return cb.commits;
+}
+
+static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
+			     struct write_midx_context *ctx,
+			     unsigned flags)
+{
+	struct packing_data pdata;
+	struct pack_idx_entry **index;
+	struct commit **commits = NULL;
+	uint32_t i, commits_nr;
+	char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash));
+	int ret;
+
+	prepare_midx_packing_data(&pdata, ctx);
+
+	commits = find_commits_for_midx_bitmap(&commits_nr, ctx);
+
+	/*
+	 * Build the MIDX-order index based on pdata.objects (which is already
+	 * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of
+	 * this order).
+	 */
+	ALLOC_ARRAY(index, pdata.nr_objects);
+	for (i = 0; i < pdata.nr_objects; i++)
+		index[i] = (struct pack_idx_entry *)&pdata.objects[i];
+
+	bitmap_writer_show_progress(flags & MIDX_PROGRESS);
+	bitmap_writer_build_type_index(&pdata, index, pdata.nr_objects);
+
+	/*
+	 * bitmap_writer_finish expects objects in lex order, and pack_order
+	 * gives us exactly that. Use it directly instead of re-sorting the
+	 * array.
+	 *
+	 * This changes the order of objects in 'index' between
+	 * bitmap_writer_build_type_index and bitmap_writer_finish.
+	 *
+	 * The same re-ordering takes place in the single-pack bitmap code via
+	 * write_idx_file(), which is called by finish_tmp_packfile(), which
+	 * happens between bitmap_writer_build_type_index() and
+	 * bitmap_writer_finish().
+	 */
+	for (i = 0; i < pdata.nr_objects; i++)
+		index[ctx->pack_order[i]] = (struct pack_idx_entry *)&pdata.objects[i];
+
+	bitmap_writer_select_commits(commits, commits_nr, -1);
+	ret = bitmap_writer_build(&pdata);
+	if (ret < 0)
+		goto cleanup;
+
+	bitmap_writer_set_checksum(midx_hash);
+	bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, 0);
+
+cleanup:
+	free(index);
+	free(bitmap_name);
+	return ret;
+}
+
 static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
 			       struct string_list *packs_to_drop,
 			       const char *preferred_pack_name,
@@ -930,9 +1100,16 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 		for (i = 0; i < ctx.m->num_packs; i++) {
 			ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
 
+			if (prepare_midx_pack(the_repository, ctx.m, i)) {
+				error(_("could not load pack %s"),
+				      ctx.m->pack_names[i]);
+				result = 1;
+				goto cleanup;
+			}
+
 			ctx.info[ctx.nr].orig_pack_int_id = i;
 			ctx.info[ctx.nr].pack_name = xstrdup(ctx.m->pack_names[i]);
-			ctx.info[ctx.nr].p = NULL;
+			ctx.info[ctx.nr].p = ctx.m->packs[i];
 			ctx.info[ctx.nr].expired = 0;
 			ctx.nr++;
 		}
@@ -947,8 +1124,26 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
 	stop_progress(&ctx.progress);
 
-	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
-		goto cleanup;
+	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop) {
+		struct bitmap_index *bitmap_git;
+		int bitmap_exists;
+		int want_bitmap = flags & MIDX_WRITE_BITMAP;
+
+		bitmap_git = prepare_bitmap_git(the_repository);
+		bitmap_exists = bitmap_git && bitmap_is_midx(bitmap_git);
+		free_bitmap_index(bitmap_git);
+
+		if (bitmap_exists || !want_bitmap) {
+			/*
+			 * The correct MIDX already exists, and so does a
+			 * corresponding bitmap (or one wasn't requested).
+			 */
+			if (!want_bitmap)
+				clear_midx_files_ext(the_repository, ".bitmap",
+						     NULL);
+			goto cleanup;
+		}
+	}
 
 	if (preferred_pack_name) {
 		int found = 0;
@@ -964,7 +1159,8 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 		if (!found)
 			warning(_("unknown preferred pack: '%s'"),
 				preferred_pack_name);
-	} else if (ctx.nr && (flags & MIDX_WRITE_REV_INDEX)) {
+	} else if (ctx.nr &&
+		   (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))) {
 		time_t oldest = ctx.info[0].p->mtime;
 		ctx.preferred_pack_idx = 0;
 
@@ -1075,9 +1271,6 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR);
 	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
 
-	if (ctx.m)
-		close_midx(ctx.m);
-
 	if (ctx.nr - dropped_packs == 0) {
 		error(_("no pack files to index."));
 		result = 1;
@@ -1108,14 +1301,22 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
 	free_chunkfile(cf);
 
-	if (flags & MIDX_WRITE_REV_INDEX)
+	if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))
 		ctx.pack_order = midx_pack_order(&ctx);
 
 	if (flags & MIDX_WRITE_REV_INDEX)
 		write_midx_reverse_index(midx_name, midx_hash, &ctx);
+	if (flags & MIDX_WRITE_BITMAP) {
+		if (write_midx_bitmap(midx_name, midx_hash, &ctx, flags) < 0) {
+			error(_("could not write multi-pack bitmap"));
+			result = 1;
+			goto cleanup;
+		}
+	}
 
 	commit_lock_file(&lk);
 
+	clear_midx_files_ext(the_repository, ".bitmap", midx_hash);
 	clear_midx_files_ext(the_repository, ".rev", midx_hash);
 
 cleanup:
@@ -1123,6 +1324,15 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 		if (ctx.info[i].p) {
 			close_pack(ctx.info[i].p);
 			free(ctx.info[i].p);
+			if (ctx.m) {
+				/*
+				 * Destroy a stale reference to the pack in
+				 * 'ctx.m'.
+				 */
+				uint32_t orig = ctx.info[i].orig_pack_int_id;
+				if (orig < ctx.m->num_packs)
+					ctx.m->packs[orig] = NULL;
+			}
 		}
 		free(ctx.info[i].pack_name);
 	}
@@ -1132,6 +1342,9 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	free(ctx.pack_perm);
 	free(ctx.pack_order);
 	free(midx_name);
+	if (ctx.m)
+		close_midx(ctx.m);
+
 	return result;
 }
 
@@ -1193,6 +1406,7 @@ void clear_midx_file(struct repository *r)
 	if (remove_path(midx))
 		die(_("failed to clear multi-pack-index at %s"), midx);
 
+	clear_midx_files_ext(r, ".bitmap", NULL);
 	clear_midx_files_ext(r, ".rev", NULL);
 
 	free(midx);
diff --git a/midx.h b/midx.h
index 1172df1a71..350f4d0a7b 100644
--- a/midx.h
+++ b/midx.h
@@ -41,6 +41,7 @@ struct multi_pack_index {
 
 #define MIDX_PROGRESS     (1 << 0)
 #define MIDX_WRITE_REV_INDEX (1 << 1)
+#define MIDX_WRITE_BITMAP (1 << 2)
 
 const unsigned char *get_midx_checksum(struct multi_pack_index *m);
 char *get_midx_filename(const char *object_dir);
-- 
2.31.1.163.ga65ce7f831



* [PATCH v2 15/24] t5310: move some tests to lib-bitmap.sh
  2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
                     ` (13 preceding siblings ...)
  2021-06-21 22:25   ` [PATCH v2 14/24] pack-bitmap: write " Taylor Blau
@ 2021-06-21 22:25   ` Taylor Blau
  2021-06-21 22:25   ` [PATCH v2 16/24] t/helper/test-read-midx.c: add --checksum mode Taylor Blau
                     ` (9 subsequent siblings)
  24 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-06-21 22:25 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

We'll soon be adding a test script that will cover many of the same
bitmap concepts as t5310, but for MIDX bitmaps. Let's pull out as many
of the applicable tests as we can so we don't have to rewrite them.

There should be no functional change to t5310; we still run the same
operations in the same order.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/lib-bitmap.sh         | 236 ++++++++++++++++++++++++++++++++++++++++
 t/t5310-pack-bitmaps.sh | 227 +-------------------------------------
 2 files changed, 240 insertions(+), 223 deletions(-)

diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index fe3f98be24..ecb5d0e05d 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -1,3 +1,6 @@
+# Helpers for scripts testing bitmap functionality; see t5310 for
+# example usage.
+
 # Compare a file containing rev-list bitmap traversal output to its non-bitmap
 # counterpart. You can't just use test_cmp for this, because the two produce
 # subtly different output:
@@ -24,3 +27,236 @@ test_bitmap_traversal () {
 	test_cmp "$1.normalized" "$2.normalized" &&
 	rm -f "$1.normalized" "$2.normalized"
 }
+
+# To ensure the logic for "maximal commits" is exercised, make
+# the repository a bit more complicated.
+#
+#    other                         second
+#      *                             *
+# (99 commits)                  (99 commits)
+#      *                             *
+#      |\                           /|
+#      | * octo-other  octo-second * |
+#      |/|\_________  ____________/|\|
+#      | \          \/  __________/  |
+#      |  | ________/\ /             |
+#      *  |/          * merge-right  *
+#      | _|__________/ \____________ |
+#      |/ |                         \|
+# (l1) *  * merge-left               * (r1)
+#      | / \________________________ |
+#      |/                           \|
+# (l2) *                             * (r2)
+#       \___________________________ |
+#                                   \|
+#                                    * (base)
+#
+# We only push bits down the first-parent history, which
+# makes some of these commits unimportant!
+#
+# The important part for the maximal commit algorithm is how
+# the bitmasks are extended. Assuming starting bit positions
+# for second (bit 0) and other (bit 1), the bitmasks at the
+# end should be:
+#
+#      second: 1       (maximal, selected)
+#       other: 01      (maximal, selected)
+#      (base): 11 (maximal)
+#
+# This complicated history was important for a previous
+# version of the walk that guarantees never walking a
+# commit multiple times. That goal might be important
+# again, so preserve this complicated case. For now, this
+# test will guarantee that the bitmaps are computed
+# correctly, even with the repeat calculations.
+setup_bitmap_history() {
+	test_expect_success 'setup repo with moderate-sized history' '
+		test_commit_bulk --id=file 10 &&
+		git branch -M second &&
+		git checkout -b other HEAD~5 &&
+		test_commit_bulk --id=side 10 &&
+
+		# add complicated history setup, including merges and
+		# ambiguous merge-bases
+
+		git checkout -b merge-left other~2 &&
+		git merge second~2 -m "merge-left" &&
+
+		git checkout -b merge-right second~1 &&
+		git merge other~1 -m "merge-right" &&
+
+		git checkout -b octo-second second &&
+		git merge merge-left merge-right -m "octopus-second" &&
+
+		git checkout -b octo-other other &&
+		git merge merge-left merge-right -m "octopus-other" &&
+
+		git checkout other &&
+		git merge octo-other -m "pull octopus" &&
+
+		git checkout second &&
+		git merge octo-second -m "pull octopus" &&
+
+		# Remove these branches so they are not selected
+		# as bitmap tips
+		git branch -D merge-left &&
+		git branch -D merge-right &&
+		git branch -D octo-other &&
+		git branch -D octo-second &&
+
+		# add padding to make these merges less interesting
+		# and avoid having them selected for bitmaps
+		test_commit_bulk --id=file 100 &&
+		git checkout other &&
+		test_commit_bulk --id=side 100 &&
+		git checkout second &&
+
+		bitmaptip=$(git rev-parse second) &&
+		blob=$(echo tagged-blob | git hash-object -w --stdin) &&
+		git tag tagged-blob $blob
+	'
+}
+
+rev_list_tests_head () {
+	test_expect_success "counting commits via bitmap ($state, $branch)" '
+		git rev-list --count $branch >expect &&
+		git rev-list --use-bitmap-index --count $branch >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting partial commits via bitmap ($state, $branch)" '
+		git rev-list --count $branch~5..$branch >expect &&
+		git rev-list --use-bitmap-index --count $branch~5..$branch >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting commits with limit ($state, $branch)" '
+		git rev-list --count -n 1 $branch >expect &&
+		git rev-list --use-bitmap-index --count -n 1 $branch >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting non-linear history ($state, $branch)" '
+		git rev-list --count other...second >expect &&
+		git rev-list --use-bitmap-index --count other...second >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting commits with limiting ($state, $branch)" '
+		git rev-list --count $branch -- 1.t >expect &&
+		git rev-list --use-bitmap-index --count $branch -- 1.t >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting objects via bitmap ($state, $branch)" '
+		git rev-list --count --objects $branch >expect &&
+		git rev-list --use-bitmap-index --count --objects $branch >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "enumerate commits ($state, $branch)" '
+		git rev-list --use-bitmap-index $branch >actual &&
+		git rev-list $branch >expect &&
+		test_bitmap_traversal --no-confirm-bitmaps expect actual
+	'
+
+	test_expect_success "enumerate --objects ($state, $branch)" '
+		git rev-list --objects --use-bitmap-index $branch >actual &&
+		git rev-list --objects $branch >expect &&
+		test_bitmap_traversal expect actual
+	'
+
+	test_expect_success "bitmap --objects handles non-commit objects ($state, $branch)" '
+		git rev-list --objects --use-bitmap-index $branch tagged-blob >actual &&
+		grep $blob actual
+	'
+}
+
+rev_list_tests () {
+	state=$1
+
+	for branch in "second" "other"
+	do
+		rev_list_tests_head
+	done
+}
+
+basic_bitmap_tests () {
+	tip="$1"
+	test_expect_success 'rev-list --test-bitmap verifies bitmaps' "
+		git rev-list --test-bitmap "${tip:-HEAD}"
+	"
+
+	rev_list_tests 'full bitmap'
+
+	test_expect_success 'clone from bitmapped repository' '
+		rm -fr clone.git &&
+		git clone --no-local --bare . clone.git &&
+		git rev-parse HEAD >expect &&
+		git --git-dir=clone.git rev-parse HEAD >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success 'partial clone from bitmapped repository' '
+		test_config uploadpack.allowfilter true &&
+		rm -fr partial-clone.git &&
+		git clone --no-local --bare --filter=blob:none . partial-clone.git &&
+		(
+			cd partial-clone.git &&
+			pack=$(echo objects/pack/*.pack) &&
+			git verify-pack -v "$pack" >have &&
+			awk "/blob/ { print \$1 }" <have >blobs &&
+			# we expect this single blob because of the direct ref
+			git rev-parse refs/tags/tagged-blob >expect &&
+			test_cmp expect blobs
+		)
+	'
+
+	test_expect_success 'setup further non-bitmapped commits' '
+		test_commit_bulk --id=further 10
+	'
+
+	rev_list_tests 'partial bitmap'
+
+	test_expect_success 'fetch (partial bitmap)' '
+		git --git-dir=clone.git fetch origin second:second &&
+		git rev-parse HEAD >expect &&
+		git --git-dir=clone.git rev-parse HEAD >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success 'enumerating progress counts pack-reused objects' '
+		count=$(git rev-list --objects --all --count) &&
+		git repack -adb &&
+
+		# check first with only reused objects; confirm that our
+		# progress showed the right number, and also that we did
+		# pack-reuse as expected.  Check only the final "done"
+		# line of the meter (there may be an arbitrary number of
+		# intermediate lines ending with CR).
+		GIT_PROGRESS_DELAY=0 \
+			git pack-objects --all --stdout --progress \
+			</dev/null >/dev/null 2>stderr &&
+		grep "Enumerating objects: $count, done" stderr &&
+		grep "pack-reused $count" stderr &&
+
+		# now the same but with one non-reused object
+		git commit --allow-empty -m "an extra commit object" &&
+		GIT_PROGRESS_DELAY=0 \
+			git pack-objects --all --stdout --progress \
+			</dev/null >/dev/null 2>stderr &&
+		grep "Enumerating objects: $((count+1)), done" stderr &&
+		grep "pack-reused $count" stderr
+	'
+}
+
+# have_delta <obj> <expected_base>
+#
+# Note that because this relies on cat-file, it might find _any_ copy of an
+# object in the repository. The caller is responsible for making sure
+# there's only one (e.g., via "repack -ad", or having just fetched a copy).
+have_delta () {
+	echo $2 >expect &&
+	echo $1 | git cat-file --batch-check="%(deltabase)" >actual &&
+	test_cmp expect actual
+}
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index b02838750e..4318f84d53 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -25,93 +25,10 @@ has_any () {
 	grep -Ff "$1" "$2"
 }
 
-# To ensure the logic for "maximal commits" is exercised, make
-# the repository a bit more complicated.
-#
-#    other                         second
-#      *                             *
-# (99 commits)                  (99 commits)
-#      *                             *
-#      |\                           /|
-#      | * octo-other  octo-second * |
-#      |/|\_________  ____________/|\|
-#      | \          \/  __________/  |
-#      |  | ________/\ /             |
-#      *  |/          * merge-right  *
-#      | _|__________/ \____________ |
-#      |/ |                         \|
-# (l1) *  * merge-left               * (r1)
-#      | / \________________________ |
-#      |/                           \|
-# (l2) *                             * (r2)
-#       \___________________________ |
-#                                   \|
-#                                    * (base)
-#
-# We only push bits down the first-parent history, which
-# makes some of these commits unimportant!
-#
-# The important part for the maximal commit algorithm is how
-# the bitmasks are extended. Assuming starting bit positions
-# for second (bit 0) and other (bit 1), the bitmasks at the
-# end should be:
-#
-#      second: 1       (maximal, selected)
-#       other: 01      (maximal, selected)
-#      (base): 11 (maximal)
-#
-# This complicated history was important for a previous
-# version of the walk that guarantees never walking a
-# commit multiple times. That goal might be important
-# again, so preserve this complicated case. For now, this
-# test will guarantee that the bitmaps are computed
-# correctly, even with the repeat calculations.
+setup_bitmap_history
 
-test_expect_success 'setup repo with moderate-sized history' '
-	test_commit_bulk --id=file 10 &&
-	git branch -M second &&
-	git checkout -b other HEAD~5 &&
-	test_commit_bulk --id=side 10 &&
-
-	# add complicated history setup, including merges and
-	# ambiguous merge-bases
-
-	git checkout -b merge-left other~2 &&
-	git merge second~2 -m "merge-left" &&
-
-	git checkout -b merge-right second~1 &&
-	git merge other~1 -m "merge-right" &&
-
-	git checkout -b octo-second second &&
-	git merge merge-left merge-right -m "octopus-second" &&
-
-	git checkout -b octo-other other &&
-	git merge merge-left merge-right -m "octopus-other" &&
-
-	git checkout other &&
-	git merge octo-other -m "pull octopus" &&
-
-	git checkout second &&
-	git merge octo-second -m "pull octopus" &&
-
-	# Remove these branches so they are not selected
-	# as bitmap tips
-	git branch -D merge-left &&
-	git branch -D merge-right &&
-	git branch -D octo-other &&
-	git branch -D octo-second &&
-
-	# add padding to make these merges less interesting
-	# and avoid having them selected for bitmaps
-	test_commit_bulk --id=file 100 &&
-	git checkout other &&
-	test_commit_bulk --id=side 100 &&
-	git checkout second &&
-
-	bitmaptip=$(git rev-parse second) &&
-	blob=$(echo tagged-blob | git hash-object -w --stdin) &&
-	git tag tagged-blob $blob &&
-	git config repack.writebitmaps true
+test_expect_success 'setup writing bitmaps during repack' '
+	git config repack.writeBitmaps true
 '
 
 test_expect_success 'full repack creates bitmaps' '
@@ -123,109 +40,7 @@ test_expect_success 'full repack creates bitmaps' '
 	grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace
 '
 
-test_expect_success 'rev-list --test-bitmap verifies bitmaps' '
-	git rev-list --test-bitmap HEAD
-'
-
-rev_list_tests_head () {
-	test_expect_success "counting commits via bitmap ($state, $branch)" '
-		git rev-list --count $branch >expect &&
-		git rev-list --use-bitmap-index --count $branch >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting partial commits via bitmap ($state, $branch)" '
-		git rev-list --count $branch~5..$branch >expect &&
-		git rev-list --use-bitmap-index --count $branch~5..$branch >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting commits with limit ($state, $branch)" '
-		git rev-list --count -n 1 $branch >expect &&
-		git rev-list --use-bitmap-index --count -n 1 $branch >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting non-linear history ($state, $branch)" '
-		git rev-list --count other...second >expect &&
-		git rev-list --use-bitmap-index --count other...second >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting commits with limiting ($state, $branch)" '
-		git rev-list --count $branch -- 1.t >expect &&
-		git rev-list --use-bitmap-index --count $branch -- 1.t >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting objects via bitmap ($state, $branch)" '
-		git rev-list --count --objects $branch >expect &&
-		git rev-list --use-bitmap-index --count --objects $branch >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "enumerate commits ($state, $branch)" '
-		git rev-list --use-bitmap-index $branch >actual &&
-		git rev-list $branch >expect &&
-		test_bitmap_traversal --no-confirm-bitmaps expect actual
-	'
-
-	test_expect_success "enumerate --objects ($state, $branch)" '
-		git rev-list --objects --use-bitmap-index $branch >actual &&
-		git rev-list --objects $branch >expect &&
-		test_bitmap_traversal expect actual
-	'
-
-	test_expect_success "bitmap --objects handles non-commit objects ($state, $branch)" '
-		git rev-list --objects --use-bitmap-index $branch tagged-blob >actual &&
-		grep $blob actual
-	'
-}
-
-rev_list_tests () {
-	state=$1
-
-	for branch in "second" "other"
-	do
-		rev_list_tests_head
-	done
-}
-
-rev_list_tests 'full bitmap'
-
-test_expect_success 'clone from bitmapped repository' '
-	git clone --no-local --bare . clone.git &&
-	git rev-parse HEAD >expect &&
-	git --git-dir=clone.git rev-parse HEAD >actual &&
-	test_cmp expect actual
-'
-
-test_expect_success 'partial clone from bitmapped repository' '
-	test_config uploadpack.allowfilter true &&
-	git clone --no-local --bare --filter=blob:none . partial-clone.git &&
-	(
-		cd partial-clone.git &&
-		pack=$(echo objects/pack/*.pack) &&
-		git verify-pack -v "$pack" >have &&
-		awk "/blob/ { print \$1 }" <have >blobs &&
-		# we expect this single blob because of the direct ref
-		git rev-parse refs/tags/tagged-blob >expect &&
-		test_cmp expect blobs
-	)
-'
-
-test_expect_success 'setup further non-bitmapped commits' '
-	test_commit_bulk --id=further 10
-'
-
-rev_list_tests 'partial bitmap'
-
-test_expect_success 'fetch (partial bitmap)' '
-	git --git-dir=clone.git fetch origin second:second &&
-	git rev-parse HEAD >expect &&
-	git --git-dir=clone.git rev-parse HEAD >actual &&
-	test_cmp expect actual
-'
+basic_bitmap_tests
 
 test_expect_success 'incremental repack fails when bitmaps are requested' '
 	test_commit more-1 &&
@@ -461,40 +276,6 @@ test_expect_success 'truncated bitmap fails gracefully (cache)' '
 	test_i18ngrep corrupted.bitmap.index stderr
 '
 
-test_expect_success 'enumerating progress counts pack-reused objects' '
-	count=$(git rev-list --objects --all --count) &&
-	git repack -adb &&
-
-	# check first with only reused objects; confirm that our progress
-	# showed the right number, and also that we did pack-reuse as expected.
-	# Check only the final "done" line of the meter (there may be an
-	# arbitrary number of intermediate lines ending with CR).
-	GIT_PROGRESS_DELAY=0 \
-		git pack-objects --all --stdout --progress \
-		</dev/null >/dev/null 2>stderr &&
-	grep "Enumerating objects: $count, done" stderr &&
-	grep "pack-reused $count" stderr &&
-
-	# now the same but with one non-reused object
-	git commit --allow-empty -m "an extra commit object" &&
-	GIT_PROGRESS_DELAY=0 \
-		git pack-objects --all --stdout --progress \
-		</dev/null >/dev/null 2>stderr &&
-	grep "Enumerating objects: $((count+1)), done" stderr &&
-	grep "pack-reused $count" stderr
-'
-
-# have_delta <obj> <expected_base>
-#
-# Note that because this relies on cat-file, it might find _any_ copy of an
-# object in the repository. The caller is responsible for making sure
-# there's only one (e.g., via "repack -ad", or having just fetched a copy).
-have_delta () {
-	echo $2 >expect &&
-	echo $1 | git cat-file --batch-check="%(deltabase)" >actual &&
-	test_cmp expect actual
-}
-
 # Create a state of history with these properties:
 #
 #  - refs that allow a client to fetch some new history, while sharing some old
-- 
2.31.1.163.ga65ce7f831



* [PATCH v2 16/24] t/helper/test-read-midx.c: add --checksum mode
  2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
                     ` (14 preceding siblings ...)
  2021-06-21 22:25   ` [PATCH v2 15/24] t5310: move some tests to lib-bitmap.sh Taylor Blau
@ 2021-06-21 22:25   ` Taylor Blau
  2021-06-21 22:25   ` [PATCH v2 17/24] t5326: test multi-pack bitmap behavior Taylor Blau
                     ` (8 subsequent siblings)
  24 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-06-21 22:25 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Subsequent tests will want to check for the existence of a multi-pack
bitmap which matches the multi-pack-index stored in the pack directory.

The multi-pack bitmap includes the hex checksum of the MIDX it
corresponds to in its filename (for example,
'$packdir/multi-pack-index-<checksum>.bitmap'). As a result, some tests
want a way to learn what '<checksum>' is.

This helper addresses that need by printing the checksum of the
repository's multi-pack-index.
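
To make the naming scheme concrete, here is a small sketch that builds
the bitmap's filename from a raw checksum. It assumes a 20-byte (SHA-1
sized) checksum, and the sketch_* names are hypothetical; git itself
formats the name with xstrfmt() and hash_to_hex() rather than with
anything like this helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_RAWSZ 20 /* assumption: SHA-1 sized checksum */

/*
 * Write "<midx_path>-<hex checksum>.bitmap" into 'out'.
 * Returns 0 on success, or -1 if 'out' is too small.
 */
static int sketch_bitmap_name(char *out, size_t outsz, const char *midx_path,
			      const unsigned char *checksum)
{
	static const char hex[] = "0123456789abcdef";
	char hexsum[2 * SKETCH_RAWSZ + 1];
	size_t needed, i;

	assert(out && midx_path && checksum);

	/* Hex-encode the checksum, two characters per byte. */
	for (i = 0; i < SKETCH_RAWSZ; i++) {
		hexsum[2 * i] = hex[checksum[i] >> 4];
		hexsum[2 * i + 1] = hex[checksum[i] & 0xf];
	}
	hexsum[2 * SKETCH_RAWSZ] = '\0';

	/* path + '-' + hex + ".bitmap" + NUL */
	needed = strlen(midx_path) + 1 + strlen(hexsum) +
		 strlen(".bitmap") + 1;
	if (needed > outsz)
		return -1;

	snprintf(out, outsz, "%s-%s.bitmap", midx_path, hexsum);
	return 0;
}
```

A test that knows '<checksum>' can then check for the file directly,
which is what printing the checksum from the helper makes possible.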

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/helper/test-read-midx.c | 16 +++++++++++++++-
 t/lib-bitmap.sh           |  4 ++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index 7c2eb11a8e..cb0d27049a 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -60,12 +60,26 @@ static int read_midx_file(const char *object_dir, int show_objects)
 	return 0;
 }
 
+static int read_midx_checksum(const char *object_dir)
+{
+	struct multi_pack_index *m;
+
+	setup_git_directory();
+	m = load_multi_pack_index(object_dir, 1);
+	if (!m)
+		return 1;
+	printf("%s\n", hash_to_hex(get_midx_checksum(m)));
+	return 0;
+}
+
 int cmd__read_midx(int argc, const char **argv)
 {
 	if (!(argc == 2 || argc == 3))
-		usage("read-midx [--show-objects] <object-dir>");
+		usage("read-midx [--show-objects|--checksum] <object-dir>");
 
 	if (!strcmp(argv[1], "--show-objects"))
 		return read_midx_file(argv[2], 1);
+	else if (!strcmp(argv[1], "--checksum"))
+		return read_midx_checksum(argv[2]);
 	return read_midx_file(argv[1], 0);
 }
diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index ecb5d0e05d..09cd036f4d 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -260,3 +260,7 @@ have_delta () {
 	echo $1 | git cat-file --batch-check="%(deltabase)" >actual &&
 	test_cmp expect actual
 }
+
+midx_checksum () {
+	test-tool read-midx --checksum "${1:-.git/objects}"
+}
-- 
2.31.1.163.ga65ce7f831



* [PATCH v2 17/24] t5326: test multi-pack bitmap behavior
  2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
                     ` (15 preceding siblings ...)
  2021-06-21 22:25   ` [PATCH v2 16/24] t/helper/test-read-midx.c: add --checksum mode Taylor Blau
@ 2021-06-21 22:25   ` Taylor Blau
  2021-06-21 22:25   ` [PATCH v2 18/24] t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP Taylor Blau
                     ` (7 subsequent siblings)
  24 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-06-21 22:25 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

This patch introduces a new test, t5326, which tests the basic
functionality of multi-pack bitmaps.

Some trivial behavior is tested, such as:

  - Whether bitmaps can be generated with more than one pack.
  - Whether clones can be served with all objects in the bitmap.
  - Whether follow-up fetches can be served with some objects outside of
    the server's bitmap.

These use lib-bitmap's tests (which in turn were pulled from t5310), and
we cover cases where the MIDX represents both a single pack and multiple
packs.

In addition, some non-trivial and MIDX-specific behavior is tested, too,
including:

  - Whether multi-pack bitmaps behave correctly with respect to the
    pack-reuse machinery when the base for some object is selected from
    a different pack than the delta.
  - Whether multi-pack bitmaps correctly respect the
    pack.preferBitmapTips configuration.
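
As an aside, the pack.preferBitmapTips test in this patch finds the
commits that did not receive a bitmap by comparing two sorted lists with
comm(1). A minimal standalone sketch of that idiom follows; the object
names are placeholders rather than real commit IDs.

```shell
# Stand-in data: every commit, and the subset that has a bitmap.
printf '%s\n' aaa bbb ccc | sort >commits
printf '%s\n' aaa ccc | sort >bitmaps

# -1 drops lines found only in "bitmaps", -3 drops lines common to both,
# leaving the lines found only in "commits" (commits without a bitmap).
comm -13 bitmaps commits >missing
cat missing
```

Both inputs must be sorted the same way, which is why the test pipes
"git log" and "test-tool bitmap list-commits" through sort first.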

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5326-multi-pack-bitmaps.sh | 277 ++++++++++++++++++++++++++++++++++
 1 file changed, 277 insertions(+)
 create mode 100755 t/t5326-multi-pack-bitmaps.sh

diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
new file mode 100755
index 0000000000..c1b7d633e2
--- /dev/null
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -0,0 +1,277 @@
+#!/bin/sh
+
+test_description='exercise basic multi-pack bitmap functionality'
+. ./test-lib.sh
+. "${TEST_DIRECTORY}/lib-bitmap.sh"
+
+# We'll be writing our own midx and bitmaps, so avoid getting confused by the
+# automatic ones.
+GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+
+objdir=.git/objects
+midx=$objdir/pack/multi-pack-index
+
+# midx_pack_source <obj>
+midx_pack_source () {
+	test-tool read-midx --show-objects .git/objects | grep "^$1 " | cut -f2
+}
+
+setup_bitmap_history
+
+test_expect_success 'enable core.multiPackIndex' '
+	git config core.multiPackIndex true
+'
+
+test_expect_success 'create single-pack midx with bitmaps' '
+	git repack -ad &&
+	git multi-pack-index write --bitmap &&
+	test_path_is_file $midx &&
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
+'
+
+basic_bitmap_tests
+
+test_expect_success 'create new additional packs' '
+	for i in $(test_seq 1 16)
+	do
+		test_commit "$i" &&
+		git repack -d
+	done &&
+
+	git checkout -b other2 HEAD~8 &&
+	for i in $(test_seq 1 8)
+	do
+		test_commit "side-$i" &&
+		git repack -d
+	done &&
+	git checkout second
+'
+
+test_expect_success 'create multi-pack midx with bitmaps' '
+	git multi-pack-index write --bitmap &&
+
+	ls $objdir/pack/pack-*.pack >packs &&
+	test_line_count = 25 packs &&
+
+	test_path_is_file $midx &&
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
+'
+
+basic_bitmap_tests
+
+test_expect_success '--no-bitmap is respected when bitmaps exist' '
+	git multi-pack-index write --bitmap &&
+
+	test_commit respect--no-bitmap &&
+	GIT_TEST_MULTI_PACK_INDEX=0 git repack -d &&
+
+	test_path_is_file $midx &&
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+
+	git multi-pack-index write --no-bitmap &&
+
+	test_path_is_file $midx &&
+	test_path_is_missing $midx-$(midx_checksum $objdir).bitmap
+'
+
+test_expect_success 'setup midx with base from later pack' '
+	# Write a and b so that "a" is a delta on top of base "b", since Git
+	# prefers to delete contents out of a base rather than add to a shorter
+	# object.
+	test_seq 1 128 >a &&
+	test_seq 1 130 >b &&
+
+	git add a b &&
+	git commit -m "initial commit" &&
+
+	a=$(git rev-parse HEAD:a) &&
+	b=$(git rev-parse HEAD:b) &&
+
+	# In the first pack, "a" is stored as a delta to "b".
+	p1=$(git pack-objects .git/objects/pack/pack <<-EOF
+	$a
+	$b
+	EOF
+	) &&
+
+	# In the second pack, "a" is missing, and "b" is not a delta nor base to
+	# any other object.
+	p2=$(git pack-objects .git/objects/pack/pack <<-EOF
+	$b
+	$(git rev-parse HEAD)
+	$(git rev-parse HEAD^{tree})
+	EOF
+	) &&
+
+	git prune-packed &&
+	# Use the second pack as the preferred source, so that "b" occurs
+	# earlier in the MIDX object order, rendering "a" unusable for pack
+	# reuse.
+	git multi-pack-index write --bitmap --preferred-pack=pack-$p2.idx &&
+
+	have_delta $a $b &&
+	test $(midx_pack_source $a) != $(midx_pack_source $b)
+'
+
+rev_list_tests 'full bitmap with backwards delta'
+
+test_expect_success 'clone with bitmaps enabled' '
+	git clone --no-local --bare . clone-reverse-delta.git &&
+	test_when_finished "rm -fr clone-reverse-delta.git" &&
+
+	git rev-parse HEAD >expect &&
+	git --git-dir=clone-reverse-delta.git rev-parse HEAD >actual &&
+	test_cmp expect actual
+'
+
+bitmap_reuse_tests() {
+	from=$1
+	to=$2
+
+	test_expect_success "setup pack reuse tests ($from -> $to)" '
+		rm -fr repo &&
+		git init repo &&
+		(
+			cd repo &&
+			test_commit_bulk 16 &&
+			git tag old-tip &&
+
+			git config core.multiPackIndex true &&
+			if test "MIDX" = "$from"
+			then
+				GIT_TEST_MULTI_PACK_INDEX=0 git repack -Ad &&
+				git multi-pack-index write --bitmap
+			else
+				GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb
+			fi
+		)
+	'
+
+	test_expect_success "build bitmap from existing ($from -> $to)" '
+		(
+			cd repo &&
+			test_commit_bulk --id=further 16 &&
+			git tag new-tip &&
+
+			if test "MIDX" = "$to"
+			then
+				GIT_TEST_MULTI_PACK_INDEX=0 git repack -d &&
+				git multi-pack-index write --bitmap
+			else
+				GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb
+			fi
+		)
+	'
+
+	test_expect_success "verify resulting bitmaps ($from -> $to)" '
+		(
+			cd repo &&
+			git for-each-ref &&
+			git rev-list --test-bitmap refs/tags/old-tip &&
+			git rev-list --test-bitmap refs/tags/new-tip
+		)
+	'
+}
+
+bitmap_reuse_tests 'pack' 'MIDX'
+bitmap_reuse_tests 'MIDX' 'pack'
+bitmap_reuse_tests 'MIDX' 'MIDX'
+
+test_expect_success 'missing object closure fails gracefully' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit loose &&
+		test_commit packed &&
+
+		# Do not pass "--revs"; we want a pack without the "loose"
+		# commit.
+		git pack-objects $objdir/pack/pack <<-EOF &&
+		$(git rev-parse packed)
+		EOF
+
+		test_must_fail git multi-pack-index write --bitmap 2>err &&
+		grep "doesn.t have full closure" err &&
+		test_path_is_missing $midx
+	)
+'
+
+test_expect_success 'setup partial bitmaps' '
+	test_commit packed &&
+	git repack &&
+	test_commit loose &&
+	git multi-pack-index write --bitmap 2>err &&
+	test_path_is_file $midx &&
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
+'
+
+basic_bitmap_tests HEAD~
+
+test_expect_success 'removing a MIDX clears stale bitmaps' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+		test_commit base &&
+		git repack &&
+		git multi-pack-index write --bitmap &&
+
+		# Write a MIDX and bitmap; remove the MIDX but leave the bitmap.
+		stale_bitmap=$midx-$(midx_checksum $objdir).bitmap &&
+		rm $midx &&
+
+		# Then write a new MIDX.
+		test_commit new &&
+		git repack &&
+		git multi-pack-index write --bitmap &&
+
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+		test_path_is_missing $stale_bitmap
+	)
+'
+
+test_expect_success 'pack.preferBitmapTips' '
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit_bulk --message="%s" 103 &&
+
+		git log --format="%H" >commits.raw &&
+		sort <commits.raw >commits &&
+
+		git log --format="create refs/tags/%s %H" HEAD >refs &&
+		git update-ref --stdin <refs &&
+
+		git multi-pack-index write --bitmap &&
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+
+		test-tool bitmap list-commits | sort >bitmaps &&
+		comm -13 bitmaps commits >before &&
+		test_line_count = 1 before &&
+
+		perl -ne "printf(\"create refs/tags/include/%d \", $.); print" \
+			<before | git update-ref --stdin &&
+
+		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
+		rm -fr $midx-$(midx_checksum $objdir).rev &&
+		rm -fr $midx &&
+
+		git -c pack.preferBitmapTips=refs/tags/include \
+			multi-pack-index write --bitmap &&
+		test-tool bitmap list-commits | sort >bitmaps &&
+		comm -13 bitmaps commits >after &&
+
+		! test_cmp before after
+	)
+'
+
+test_done
-- 
2.31.1.163.ga65ce7f831



* [PATCH v2 18/24] t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
  2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
                     ` (16 preceding siblings ...)
  2021-06-21 22:25   ` [PATCH v2 17/24] t5326: test multi-pack bitmap behavior Taylor Blau
@ 2021-06-21 22:25   ` Taylor Blau
  2021-06-21 22:25   ` [PATCH v2 19/24] t5310: " Taylor Blau
                     ` (6 subsequent siblings)
  24 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-06-21 22:25 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

From: Jeff King <peff@peff.net>

Generating a MIDX bitmap causes tests which repack in a partial clone to
fail because they are missing objects. Missing objects are an expected
part of the tests in t0410, so disable this knob altogether. Graceful
degradation when writing a bitmap with missing objects is tested in
t5326.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t0410-partial-clone.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 1667450917..4fd8e83da1 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -4,6 +4,9 @@ test_description='partial clone'
 
 . ./test-lib.sh
 
+# missing promisor objects cause repacks which write bitmaps to fail
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+
 delete_object () {
 	rm $1/.git/objects/$(echo $2 | sed -e 's|^..|&/|')
 }
-- 
2.31.1.163.ga65ce7f831



* [PATCH v2 19/24] t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
  2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
                     ` (17 preceding siblings ...)
  2021-06-21 22:25   ` [PATCH v2 18/24] t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP Taylor Blau
@ 2021-06-21 22:25   ` Taylor Blau
  2021-06-21 22:25   ` [PATCH v2 20/24] t5319: don't write MIDX bitmaps in t5319 Taylor Blau
                     ` (5 subsequent siblings)
  24 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-06-21 22:25 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

From: Jeff King <peff@peff.net>

Generating a MIDX bitmap confuses many of the tests in t5310, which
expect to control whether and how bitmaps are written. Since the
relevant MIDX-bitmap tests here are covered already in t5326, let's just
disable the flag for the whole t5310 script.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5310-pack-bitmaps.sh | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 4318f84d53..673baa5c3c 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -8,6 +8,10 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 . "$TEST_DIRECTORY"/lib-bundle.sh
 . "$TEST_DIRECTORY"/lib-bitmap.sh
 
+# t5310 deals only with single-pack bitmaps, so don't write MIDX bitmaps in
+# their place.
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+
 objpath () {
 	echo ".git/objects/$(echo "$1" | sed -e 's|\(..\)|\1/|')"
 }
-- 
2.31.1.163.ga65ce7f831



* [PATCH v2 20/24] t5319: don't write MIDX bitmaps in t5319
  2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
                     ` (18 preceding siblings ...)
  2021-06-21 22:25   ` [PATCH v2 19/24] t5310: " Taylor Blau
@ 2021-06-21 22:25   ` Taylor Blau
  2021-06-21 22:25   ` [PATCH v2 21/24] t7700: update to work with MIDX bitmap test knob Taylor Blau
                     ` (4 subsequent siblings)
  24 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-06-21 22:25 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

This test specifically checks that a pack-based bitmap file is still
respected once a MIDX has been generated. Generating a MIDX bitmap would
confuse the test, so override the
'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' variable to make sure we don't
write one.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5319-multi-pack-index.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 5641d158df..69f1c815aa 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -474,7 +474,8 @@ test_expect_success 'repack preserves multi-pack-index when creating packs' '
 compare_results_with_midx "after repack"
 
 test_expect_success 'multi-pack-index and pack-bitmap' '
-	git -c repack.writeBitmaps=true repack -ad &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -c repack.writeBitmaps=true repack -ad &&
 	git multi-pack-index write &&
 	git rev-list --test-bitmap HEAD
 '
-- 
2.31.1.163.ga65ce7f831



* [PATCH v2 21/24] t7700: update to work with MIDX bitmap test knob
  2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
                     ` (19 preceding siblings ...)
  2021-06-21 22:25   ` [PATCH v2 20/24] t5319: don't write MIDX bitmaps in t5319 Taylor Blau
@ 2021-06-21 22:25   ` Taylor Blau
  2021-06-21 22:25   ` [PATCH v2 22/24] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
                     ` (3 subsequent siblings)
  24 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-06-21 22:25 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

A number of these tests are focused only on pack-based bitmaps and need
to be updated to disable 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' where
necessary.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t7700-repack.sh | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 25b235c063..98eda3bfeb 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -63,13 +63,14 @@ test_expect_success 'objects in packs marked .keep are not repacked' '
 
 test_expect_success 'writing bitmaps via command-line can duplicate .keep objects' '
 	# build on $oid, $packid, and .keep state from previous
-	git repack -Adbl &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 git repack -Adbl &&
 	test_has_duplicate_object true
 '
 
 test_expect_success 'writing bitmaps via config can duplicate .keep objects' '
 	# build on $oid, $packid, and .keep state from previous
-	git -c repack.writebitmaps=true repack -Adl &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -c repack.writebitmaps=true repack -Adl &&
 	test_has_duplicate_object true
 '
 
@@ -189,7 +190,9 @@ test_expect_success 'repack --keep-pack' '
 
 test_expect_success 'bitmaps are created by default in bare repos' '
 	git clone --bare .git bare.git &&
-	git -C bare.git repack -ad &&
+	rm -f bare.git/objects/pack/*.bitmap &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -C bare.git repack -ad &&
 	bitmap=$(ls bare.git/objects/pack/*.bitmap) &&
 	test_path_is_file "$bitmap"
 '
@@ -200,7 +203,8 @@ test_expect_success 'incremental repack does not complain' '
 '
 
 test_expect_success 'bitmaps can be disabled on bare repos' '
-	git -c repack.writeBitmaps=false -C bare.git repack -ad &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -c repack.writeBitmaps=false -C bare.git repack -ad &&
 	bitmap=$(ls bare.git/objects/pack/*.bitmap || :) &&
 	test -z "$bitmap"
 '
@@ -211,7 +215,8 @@ test_expect_success 'no bitmaps created if .keep files present' '
 	keep=${pack%.pack}.keep &&
 	test_when_finished "rm -f \"\$keep\"" &&
 	>"$keep" &&
-	git -C bare.git repack -ad 2>stderr &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -C bare.git repack -ad 2>stderr &&
 	test_must_be_empty stderr &&
 	find bare.git/objects/pack/ -type f -name "*.bitmap" >actual &&
 	test_must_be_empty actual
@@ -222,7 +227,8 @@ test_expect_success 'auto-bitmaps do not complain if unavailable' '
 	blob=$(test-tool genrandom big $((1024*1024)) |
 	       git -C bare.git hash-object -w --stdin) &&
 	git -C bare.git update-ref refs/tags/big $blob &&
-	git -C bare.git repack -ad 2>stderr &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -C bare.git repack -ad 2>stderr &&
 	test_must_be_empty stderr &&
 	find bare.git/objects/pack -type f -name "*.bitmap" >actual &&
 	test_must_be_empty actual
-- 
2.31.1.163.ga65ce7f831



* [PATCH v2 22/24] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
                     ` (20 preceding siblings ...)
  2021-06-21 22:25   ` [PATCH v2 21/24] t7700: update to work with MIDX bitmap test knob Taylor Blau
@ 2021-06-21 22:25   ` Taylor Blau
  2021-06-25  0:03     ` Ævar Arnfjörð Bjarmason
  2021-06-21 22:25   ` [PATCH v2 23/24] p5310: extract full and partial bitmap tests Taylor Blau
                     ` (2 subsequent siblings)
  24 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-06-21 22:25 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Introduce a new 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' environment
variable to also write a multi-pack bitmap when
'GIT_TEST_MULTI_PACK_INDEX' is set.
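
To make the interaction between the two variables easier to follow, here
is a rough shell model of what "git repack" decides after this patch. It
is a sketch under assumptions, not the implementation: "is_true" is a
simplified stand-in for git_env_bool(), "repack_bitmap_plan" is a
hypothetical name, and only the case where pack bitmaps were explicitly
requested or refused (argument 1 or 0) is modeled.

```shell
# Simplified stand-in for git_env_bool(); unset or unknown means false.
is_true () {
	case "$1" in
	1|true|yes|on) return 0 ;;
	*) return 1 ;;
	esac
}

# $1: 1 if a pack bitmap was explicitly requested, 0 otherwise.
repack_bitmap_plan () {
	pack_bitmap=$1
	midx=none
	if is_true "${GIT_TEST_MULTI_PACK_INDEX:-}"
	then
		# A MIDX is written after the repack.
		midx=plain
		if is_true "${GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP:-}"
		then
			# The MIDX bitmap replaces the pack bitmap.
			midx=bitmap
			pack_bitmap=0
		fi
	fi
	echo "pack_bitmap=$pack_bitmap midx=$midx"
}
```

Note that the second variable has no effect on its own; both must be
true before the pack bitmap is dropped in favor of a MIDX bitmap.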

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c          | 13 ++++++++++---
 ci/run-build-and-tests.sh |  1 +
 midx.h                    |  2 ++
 t/README                  |  4 ++++
 4 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 5f9bc74adc..77f6f03057 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -515,7 +515,10 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		if (!(pack_everything & ALL_INTO_ONE) ||
 		    !is_bare_repository())
 			write_bitmaps = 0;
-	}
+	} else if (write_bitmaps &&
+		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0) &&
+		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0))
+		write_bitmaps = 0;
 	if (pack_kept_objects < 0)
 		pack_kept_objects = write_bitmaps > 0;
 
@@ -725,8 +728,12 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		update_server_info(0);
 	remove_temporary_files();
 
-	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
-		write_midx_file(get_object_directory(), NULL, 0);
+	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0)) {
+		unsigned flags = 0;
+		if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0))
+			flags |= MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX;
+		write_midx_file(get_object_directory(), NULL, flags);
+	}
 
 	string_list_clear(&names, 0);
 	string_list_clear(&rollback, 0);
diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index 3ce81ffee9..7ee9ba9325 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -23,6 +23,7 @@ linux-gcc)
 	export GIT_TEST_COMMIT_GRAPH=1
 	export GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=1
 	export GIT_TEST_MULTI_PACK_INDEX=1
+	export GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=1
 	export GIT_TEST_ADD_I_USE_BUILTIN=1
 	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master
 	export GIT_TEST_WRITE_REV_INDEX=1
diff --git a/midx.h b/midx.h
index 350f4d0a7b..aa3da557bb 100644
--- a/midx.h
+++ b/midx.h
@@ -8,6 +8,8 @@ struct pack_entry;
 struct repository;
 
 #define GIT_TEST_MULTI_PACK_INDEX "GIT_TEST_MULTI_PACK_INDEX"
+#define GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP \
+	"GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP"
 
 struct multi_pack_index {
 	struct multi_pack_index *next;
diff --git a/t/README b/t/README
index 1a2072b2c8..1311b8e17a 100644
--- a/t/README
+++ b/t/README
@@ -425,6 +425,10 @@ GIT_TEST_MULTI_PACK_INDEX=<boolean>, when true, forces the multi-pack-
 index to be written after every 'git repack' command, and overrides the
 'core.multiPackIndex' setting to true.
 
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=<boolean>, when true, sets the
+'--bitmap' option on all invocations of 'git multi-pack-index write',
+and ignores pack-objects' '--write-bitmap-index'.
+
 GIT_TEST_SIDEBAND_ALL=<boolean>, when true, overrides the
 'uploadpack.allowSidebandAll' setting to true, and when false, forces
 fetch-pack to not request sideband-all (even if the server advertises
-- 
2.31.1.163.ga65ce7f831



* [PATCH v2 23/24] p5310: extract full and partial bitmap tests
  2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
                     ` (21 preceding siblings ...)
  2021-06-21 22:25   ` [PATCH v2 22/24] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
@ 2021-06-21 22:25   ` Taylor Blau
  2021-06-21 22:26   ` [PATCH v2 24/24] p5326: perf tests for MIDX bitmaps Taylor Blau
  2021-06-25  9:06   ` [PATCH v2 00/24] multi-pack reachability bitmaps Ævar Arnfjörð Bjarmason
  24 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-06-21 22:25 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

A new p5326 introduced by the next patch will want these same tests,
interjecting its own setup in between. Move them out so that both perf
tests can reuse them.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/perf/lib-bitmap.sh         | 69 ++++++++++++++++++++++++++++++++++++
 t/perf/p5310-pack-bitmaps.sh | 65 ++-------------------------------
 2 files changed, 72 insertions(+), 62 deletions(-)
 create mode 100644 t/perf/lib-bitmap.sh

diff --git a/t/perf/lib-bitmap.sh b/t/perf/lib-bitmap.sh
new file mode 100644
index 0000000000..63d3bc7cec
--- /dev/null
+++ b/t/perf/lib-bitmap.sh
@@ -0,0 +1,69 @@
+# Helper functions for testing bitmap performance; see p5310.
+
+test_full_bitmap () {
+	test_perf 'simulated clone' '
+		git pack-objects --stdout --all </dev/null >/dev/null
+	'
+
+	test_perf 'simulated fetch' '
+		have=$(git rev-list HEAD~100 -1) &&
+		{
+			echo HEAD &&
+			echo ^$have
+		} | git pack-objects --revs --stdout >/dev/null
+	'
+
+	test_perf 'pack to file (bitmap)' '
+		git pack-objects --use-bitmap-index --all pack1b </dev/null >/dev/null
+	'
+
+	test_perf 'rev-list (commits)' '
+		git rev-list --all --use-bitmap-index >/dev/null
+	'
+
+	test_perf 'rev-list (objects)' '
+		git rev-list --all --use-bitmap-index --objects >/dev/null
+	'
+
+	test_perf 'rev-list with tag negated via --not --all (objects)' '
+		git rev-list perf-tag --not --all --use-bitmap-index --objects >/dev/null
+	'
+
+	test_perf 'rev-list with negative tag (objects)' '
+		git rev-list HEAD --not perf-tag --use-bitmap-index --objects >/dev/null
+	'
+
+	test_perf 'rev-list count with blob:none' '
+		git rev-list --use-bitmap-index --count --objects --all \
+			--filter=blob:none >/dev/null
+	'
+
+	test_perf 'rev-list count with blob:limit=1k' '
+		git rev-list --use-bitmap-index --count --objects --all \
+			--filter=blob:limit=1k >/dev/null
+	'
+
+	test_perf 'rev-list count with tree:0' '
+		git rev-list --use-bitmap-index --count --objects --all \
+			--filter=tree:0 >/dev/null
+	'
+
+	test_perf 'simulated partial clone' '
+		git pack-objects --stdout --all --filter=blob:none </dev/null >/dev/null
+	'
+}
+
+test_partial_bitmap () {
+	test_perf 'clone (partial bitmap)' '
+		git pack-objects --stdout --all </dev/null >/dev/null
+	'
+
+	test_perf 'pack to file (partial bitmap)' '
+		git pack-objects --use-bitmap-index --all pack2b </dev/null >/dev/null
+	'
+
+	test_perf 'rev-list with tree filter (partial bitmap)' '
+		git rev-list --use-bitmap-index --count --objects --all \
+			--filter=tree:0 >/dev/null
+	'
+}
diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index 452be01056..7ad4f237bc 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -2,6 +2,7 @@
 
 test_description='Tests pack performance using bitmaps'
 . ./perf-lib.sh
+. "${TEST_DIRECTORY}/perf/lib-bitmap.sh"
 
 test_perf_large_repo
 
@@ -25,56 +26,7 @@ test_perf 'repack to disk' '
 	git repack -ad
 '
 
-test_perf 'simulated clone' '
-	git pack-objects --stdout --all </dev/null >/dev/null
-'
-
-test_perf 'simulated fetch' '
-	have=$(git rev-list HEAD~100 -1) &&
-	{
-		echo HEAD &&
-		echo ^$have
-	} | git pack-objects --revs --stdout >/dev/null
-'
-
-test_perf 'pack to file (bitmap)' '
-	git pack-objects --use-bitmap-index --all pack1b </dev/null >/dev/null
-'
-
-test_perf 'rev-list (commits)' '
-	git rev-list --all --use-bitmap-index >/dev/null
-'
-
-test_perf 'rev-list (objects)' '
-	git rev-list --all --use-bitmap-index --objects >/dev/null
-'
-
-test_perf 'rev-list with tag negated via --not --all (objects)' '
-	git rev-list perf-tag --not --all --use-bitmap-index --objects >/dev/null
-'
-
-test_perf 'rev-list with negative tag (objects)' '
-	git rev-list HEAD --not perf-tag --use-bitmap-index --objects >/dev/null
-'
-
-test_perf 'rev-list count with blob:none' '
-	git rev-list --use-bitmap-index --count --objects --all \
-		--filter=blob:none >/dev/null
-'
-
-test_perf 'rev-list count with blob:limit=1k' '
-	git rev-list --use-bitmap-index --count --objects --all \
-		--filter=blob:limit=1k >/dev/null
-'
-
-test_perf 'rev-list count with tree:0' '
-	git rev-list --use-bitmap-index --count --objects --all \
-		--filter=tree:0 >/dev/null
-'
-
-test_perf 'simulated partial clone' '
-	git pack-objects --stdout --all --filter=blob:none </dev/null >/dev/null
-'
+test_full_bitmap
 
 test_expect_success 'create partial bitmap state' '
 	# pick a commit to represent the repo tip in the past
@@ -97,17 +49,6 @@ test_expect_success 'create partial bitmap state' '
 	git update-ref HEAD $orig_tip
 '
 
-test_perf 'clone (partial bitmap)' '
-	git pack-objects --stdout --all </dev/null >/dev/null
-'
-
-test_perf 'pack to file (partial bitmap)' '
-	git pack-objects --use-bitmap-index --all pack2b </dev/null >/dev/null
-'
-
-test_perf 'rev-list with tree filter (partial bitmap)' '
-	git rev-list --use-bitmap-index --count --objects --all \
-		--filter=tree:0 >/dev/null
-'
+test_partial_bitmap
 
 test_done
-- 
2.31.1.163.ga65ce7f831



* [PATCH v2 24/24] p5326: perf tests for MIDX bitmaps
  2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
                     ` (22 preceding siblings ...)
  2021-06-21 22:25   ` [PATCH v2 23/24] p5310: extract full and partial bitmap tests Taylor Blau
@ 2021-06-21 22:26   ` Taylor Blau
  2021-06-25  9:06   ` [PATCH v2 00/24] multi-pack reachability bitmaps Ævar Arnfjörð Bjarmason
  24 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-06-21 22:26 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

These new performance tests demonstrate effectively the same behavior as
p5310, but use a multi-pack bitmap instead of a single-pack one.

Notably, p5326 does not create a MIDX bitmap with multiple packs. This
allows a direct comparison between it and p5310: any difference between
the two measures just the overhead of using MIDX bitmaps.

Here are the results of p5310 and p5326 together, measured at the same
time and on the same machine (using a Xeon W-2255 CPU):

    Test                                                  HEAD
    ------------------------------------------------------------------------
    5310.2: repack to disk                                96.78(93.39+11.33)
    5310.3: simulated clone                               9.98(9.79+0.19)
    5310.4: simulated fetch                               1.75(4.26+0.19)
    5310.5: pack to file (bitmap)                         28.20(27.87+8.70)
    5310.6: rev-list (commits)                            0.41(0.36+0.05)
    5310.7: rev-list (objects)                            1.61(1.54+0.07)
    5310.8: rev-list count with blob:none                 0.25(0.21+0.04)
    5310.9: rev-list count with blob:limit=1k             2.65(2.54+0.10)
    5310.10: rev-list count with tree:0                   0.23(0.19+0.04)
    5310.11: simulated partial clone                      4.34(4.21+0.12)
    5310.13: clone (partial bitmap)                       11.05(12.21+0.48)
    5310.14: pack to file (partial bitmap)                31.25(34.22+3.70)
    5310.15: rev-list with tree filter (partial bitmap)   0.26(0.22+0.04)

versus the same tests (this time using a multi-pack index):

    Test                                                  HEAD
    ------------------------------------------------------------------------
    5326.2: setup multi-pack index                        78.99(75.29+11.58)
    5326.3: simulated clone                               11.78(11.56+0.22)
    5326.4: simulated fetch                               1.70(4.49+0.13)
    5326.5: pack to file (bitmap)                         28.02(27.72+8.76)
    5326.6: rev-list (commits)                            0.42(0.36+0.06)
    5326.7: rev-list (objects)                            1.65(1.58+0.06)
    5326.8: rev-list count with blob:none                 0.26(0.21+0.05)
    5326.9: rev-list count with blob:limit=1k             2.97(2.86+0.10)
    5326.10: rev-list count with tree:0                   0.25(0.20+0.04)
    5326.11: simulated partial clone                      5.65(5.49+0.16)
    5326.13: clone (partial bitmap)                       12.22(13.43+0.38)
    5326.14: pack to file (partial bitmap)                30.05(31.57+7.25)
    5326.15: rev-list with tree filter (partial bitmap)   0.24(0.20+0.04)

There is slight overhead in "simulated clone", "simulated partial
clone", and "clone (partial bitmap)". Unsurprisingly, that overhead is
due to using the MIDX's reverse index to map between bit positions and
MIDX positions.

This can be reproduced by running "git repack -adb" along with "git
multi-pack-index write --bitmap" in a large-ish repository. Then run:

    $ perf record -o pack.perf git -c core.multiPackIndex=false \
      pack-objects --all --stdout >/dev/null </dev/null
    $ perf record -o midx.perf git -c core.multiPackIndex=true \
      pack-objects --all --stdout >/dev/null </dev/null

and compare the two with "perf diff -c delta -o 1 pack.perf midx.perf".
The most notable results are below (the next largest positive delta is
+0.14%):

    # Event 'cycles'
    #
    # Baseline    Delta  Shared Object       Symbol
    # ........  .......  ..................  ..........................
    #
                 +5.86%  git                 [.] nth_midxed_offset
                 +5.24%  git                 [.] nth_midxed_pack_int_id
         3.45%   +0.97%  git                 [.] offset_to_pack_pos
         3.30%   +0.57%  git                 [.] pack_pos_to_offset
                 +0.30%  git                 [.] pack_pos_to_midx

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/perf/p5326-multi-pack-bitmaps.sh | 43 ++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)
 create mode 100755 t/perf/p5326-multi-pack-bitmaps.sh

diff --git a/t/perf/p5326-multi-pack-bitmaps.sh b/t/perf/p5326-multi-pack-bitmaps.sh
new file mode 100755
index 0000000000..5845109ac7
--- /dev/null
+++ b/t/perf/p5326-multi-pack-bitmaps.sh
@@ -0,0 +1,43 @@
+#!/bin/sh
+
+test_description='Tests performance using midx bitmaps'
+. ./perf-lib.sh
+. "${TEST_DIRECTORY}/perf/lib-bitmap.sh"
+
+test_perf_large_repo
+
+test_expect_success 'enable multi-pack index' '
+	git config core.multiPackIndex true
+'
+
+test_perf 'setup multi-pack index' '
+	git repack -ad &&
+	git multi-pack-index write --bitmap
+'
+
+test_full_bitmap
+
+test_expect_success 'create partial bitmap state' '
+	# pick a commit to represent the repo tip in the past
+	cutoff=$(git rev-list HEAD~100 -1) &&
+	orig_tip=$(git rev-parse HEAD) &&
+
+	# now pretend we have just one tip
+	rm -rf .git/logs .git/refs/* .git/packed-refs &&
+	git update-ref HEAD $cutoff &&
+
+	# and then repack, which will leave us with a nice
+	# big bitmap pack of the "old" history, and all of
+	# the new history will be loose, as if it had been pushed
+	# up incrementally and exploded via unpack-objects
+	git repack -Ad &&
+	git multi-pack-index write --bitmap &&
+
+	# and now restore our original tip, as if the pushes
+	# had happened
+	git update-ref HEAD $orig_tip
+'
+
+test_partial_bitmap
+
+test_done
-- 
2.31.1.163.ga65ce7f831


* Re: [PATCH v2 11/24] pack-bitmap.c: introduce 'nth_bitmap_object_oid()'
  2021-06-21 22:25   ` [PATCH v2 11/24] pack-bitmap.c: introduce 'nth_bitmap_object_oid()' Taylor Blau
@ 2021-06-24 14:59     ` Taylor Blau
  2021-07-21 10:37     ` Jeff King
  1 sibling, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-06-24 14:59 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

On Mon, Jun 21, 2021 at 06:25:26PM -0400, Taylor Blau wrote:
>  static int load_bitmap_entries_v1(struct bitmap_index *index)
>  {
>  	uint32_t i;
> @@ -242,9 +249,7 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
>  		xor_offset = read_u8(index->map, &index->map_pos);
>  		flags = read_u8(index->map, &index->map_pos);
>
> -		if (nth_packed_object_id(&oid, index->pack, commit_idx_pos) < 0)
> -			return error("corrupt ewah bitmap: commit index %u out of range",
> -				     (unsigned)commit_idx_pos);
> +		nth_bitmap_object_oid(index, &oid, commit_idx_pos);

Oops. I was reading code in this area and noticed that this patch drops
the check introduced by c6b0c3910c (pack-bitmap.c: check reads more
aggressively when loading, 2020-12-08).

I fixed it up locally by restoring the check (but on the new function
nth_bitmap_object_oid() instead), and will send it in a reroll.

Thanks,
Taylor


* Re: [PATCH v2 01/24] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps
  2021-06-21 22:24   ` [PATCH v2 01/24] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps Taylor Blau
@ 2021-06-24 23:02     ` Ævar Arnfjörð Bjarmason
  2021-07-14 17:24       ` Taylor Blau
  2021-07-21  9:45     ` Jeff King
  1 sibling, 1 reply; 273+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 23:02 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee, gitster, jonathantanmy


On Mon, Jun 21 2021, Taylor Blau wrote:

> +	enum object_type bitmap_type = OBJ_NONE;
> +	int bitmaps_nr = 0;
> +
> +	if (bitmap_get(tdata->commits, pos)) {
> +		bitmap_type = OBJ_COMMIT;
> +		bitmaps_nr++;
> +	}
> +	if (bitmap_get(tdata->trees, pos)) {
> +		bitmap_type = OBJ_TREE;
> +		bitmaps_nr++;
> +	}
> +	if (bitmap_get(tdata->blobs, pos)) {
> +		bitmap_type = OBJ_BLOB;
> +		bitmaps_nr++;
> +	}
> +	if (bitmap_get(tdata->tags, pos)) {
> +		bitmap_type = OBJ_TAG;
> +		bitmaps_nr++;
> +	}

This made me wonder if this could be better with something like the
HAS_MULTI_BITS() macro, but that trick probably can't be applied here in
any shape or form :)

> +
> +	if (!bitmap_type)
> +		die("object %s not found in type bitmaps",
> +		    oid_to_hex(&obj->oid));

It feels a bit magical to use an enum and then assume to know the enum's
values, I know we do "type < 0" all over the place, but I'd think "if
(bitmap_type == OBJ_NONE)" would be better here....

> +
> +	if (bitmaps_nr > 1)
> +		die("object %s does not have a unique type",
> +		    oid_to_hex(&obj->oid));

Or just check the bitmaps_nr instead:

    if (!bitmaps_nr)
        die("found none");
    else if (bitmaps_nr > 1)
        ...;

Just bikeshedding...
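
For what it's worth, a rough standalone sketch of the counting variant could look like the following. The "toy_" names are hypothetical and the four membership flags stand in for the bitmap_get() calls in the patch; this only illustrates the shape of the check, it is not meant as a drop-in replacement.

```c
#include <assert.h>
#include <stddef.h>

/* Numeric values follow the four basic object types; names are made up. */
enum toy_type {
	TOY_NONE = 0,
	TOY_COMMIT = 1,
	TOY_TREE = 2,
	TOY_BLOB = 3,
	TOY_TAG = 4
};

/*
 * Record the (last) type whose bitmap contains the object in *type and
 * return how many type bitmaps contained it. The caller treats 0 as
 * "not found" and anything above 1 as "type is not unique".
 */
static int toy_classify(int in_commits, int in_trees, int in_blobs,
			int in_tags, enum toy_type *type)
{
	int bitmaps_nr = 0;

	assert(type != NULL);
	*type = TOY_NONE;

	if (in_commits) {
		*type = TOY_COMMIT;
		bitmaps_nr++;
	}
	if (in_trees) {
		*type = TOY_TREE;
		bitmaps_nr++;
	}
	if (in_blobs) {
		*type = TOY_BLOB;
		bitmaps_nr++;
	}
	if (in_tags) {
		*type = TOY_TAG;
		bitmaps_nr++;
	}
	return bitmaps_nr;
}
```

With this shape the caller never has to compare the enum against zero; both error cases are decided from the count alone.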

> +
> +	if (bitmap_type != obj->type)
> +		die("object %s: real type %s, expected: %s",
> +		    oid_to_hex(&obj->oid),
> +		    type_name(obj->type),
> +		    type_name(bitmap_type));

To argue against myself (sort of) about that "== OBJ_NONE" above, if
we're not assuming that then it's sort of weird not to also assume that
type_name(type) won't return a NULL in the case of OBJ_NONE, which it
does (but this code guards against).


* Re: [PATCH v2 02/24] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
  2021-06-21 22:25   ` [PATCH v2 02/24] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps Taylor Blau
@ 2021-06-24 23:23     ` Ævar Arnfjörð Bjarmason
  2021-07-14 17:32       ` Taylor Blau
  2021-07-21  9:50     ` Jeff King
  1 sibling, 1 reply; 273+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 23:23 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee, gitster, jonathantanmy


On Mon, Jun 21 2021, Taylor Blau wrote:

> -static uint32_t find_object_pos(const struct object_id *oid)
> +static uint32_t find_object_pos(const struct object_id *oid, int *found)
>  {
>  	struct object_entry *entry = packlist_find(writer.to_pack, oid);
>  
>  	if (!entry) {
> -		die("Failed to write bitmap index. Packfile doesn't have full closure "
> +		if (found)
> +			*found = 0;
> +		warning("Failed to write bitmap index. Packfile doesn't have full closure "
>  			"(object %s is missing)", oid_to_hex(oid));
> +		return 0;
>  	}
>  
> +	if (found)
> +		*found = 1;
>  	return oe_in_pack_pos(writer.to_pack, entry);
>  }
>  
> @@ -331,9 +336,10 @@ static void bitmap_builder_clear(struct bitmap_builder *bb)
>  	bb->commits_nr = bb->commits_alloc = 0;
>  }
>  
> -static void fill_bitmap_tree(struct bitmap *bitmap,
> -			     struct tree *tree)
> +static int fill_bitmap_tree(struct bitmap *bitmap,
> +			    struct tree *tree)
>  {
> +	int found;
>  	uint32_t pos;
>  	struct tree_desc desc;
>  	struct name_entry entry;
> @@ -342,9 +348,11 @@ static void fill_bitmap_tree(struct bitmap *bitmap,
>  	 * If our bit is already set, then there is nothing to do. Both this
>  	 * tree and all of its children will be set.
>  	 */
> -	pos = find_object_pos(&tree->object.oid);
> +	pos = find_object_pos(&tree->object.oid, &found);
> +	if (!found)
> +		return -1;

So, a function that returns an unsigned 32 bit int won't (presumably)
have enough space for an "is bad", but before it died so it didn't
matter.

Now it warns, so it needs a "is bad", so we add another "int" to pass
that information around.

So we're already paying for that extra space (on some platforms "int"
is already 64 bits wide, and although uint32_t itself is exactly 32
bits, it is usually passed around in a wider register anyway).

Wouldn't it be more idiomatic to just have find_object_pos() return
int64_t now? If it's -1 it's an error, otherwise the "pos" is cast to
uint32_t:
	
	diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
	index 88d9e696a54..d71fa6f607a 100644
	--- a/pack-bitmap-write.c
	+++ b/pack-bitmap-write.c
	@@ -125,14 +125,12 @@ static inline void push_bitmapped_commit(struct commit *commit)
	 	writer.selected_nr++;
	 }
	 
	-static uint32_t find_object_pos(const struct object_id *oid)
	+static int64_t find_object_pos(const struct object_id *oid)
	 {
	 	struct object_entry *entry = packlist_find(writer.to_pack, oid);
	 
	-	if (!entry) {
	-		die("Failed to write bitmap index. Packfile doesn't have full closure "
	-			"(object %s is missing)", oid_to_hex(oid));
	-	}
	+	if (!entry)
	+		return -1;
	 
	 	return oe_in_pack_pos(writer.to_pack, entry);
	 }
	@@ -334,7 +332,7 @@ static void bitmap_builder_clear(struct bitmap_builder *bb)
	 static void fill_bitmap_tree(struct bitmap *bitmap,
	 			     struct tree *tree)
	 {
	-	uint32_t pos;
	+	int64_t pos;
	 	struct tree_desc desc;
	 	struct name_entry entry;
	 
	@@ -343,6 +341,9 @@ static void fill_bitmap_tree(struct bitmap *bitmap,
	 	 * tree and all of its children will be set.
	 	 */
	 	pos = find_object_pos(&tree->object.oid);
	+	if (pos < 0)
	+		die("unhappy: %s", oid_to_hex(&tree->object.oid));
	+
	 	if (bitmap_get(bitmap, pos))
	 		return;
	 	bitmap_set(bitmap, pos);

I mean, you don't want the die() part of that, but to me the rest looks
better.

> [...]
> diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
> index 584a039b85..1667450917 100755
> --- a/t/t0410-partial-clone.sh
> +++ b/t/t0410-partial-clone.sh
> @@ -536,7 +536,13 @@ test_expect_success 'gc does not repack promisor objects if there are none' '
>  repack_and_check () {
>  	rm -rf repo2 &&
>  	cp -r repo repo2 &&
> -	git -C repo2 repack $1 -d &&
> +	if test x"$1" = "x--must-fail"
> +	then
> +		shift
> +		test_must_fail git -C repo2 repack $1 -d
> +	else
> +		git -C repo2 repack $1 -d
> +	fi &&
>  	git -C repo2 fsck &&

This sent me down the rabbit hole of
https://lore.kernel.org/git/60c67bdf5b77e_f569220858@natae.notmuch/
again.


* Re: [PATCH v2 04/24] Documentation: build 'technical/bitmap-format' by default
  2021-06-21 22:25   ` [PATCH v2 04/24] Documentation: build 'technical/bitmap-format' by default Taylor Blau
@ 2021-06-24 23:35     ` Ævar Arnfjörð Bjarmason
  2021-07-14 17:41       ` Taylor Blau
  2021-07-21  9:58     ` Jeff King
  1 sibling, 1 reply; 273+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 23:35 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee, gitster, jonathantanmy


On Mon, Jun 21 2021, Taylor Blau wrote:

> Even though the 'TECH_DOCS' variable was introduced all the way back in
> 5e00439f0a (Documentation: build html for all files in technical and
> howto, 2012-10-23), the 'bitmap-format' document was never added to that
> list when it was created.
>
> Prepare for changes to this file by including it in the list of
> technical documentation that 'make doc' will build by default.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  Documentation/Makefile | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/Documentation/Makefile b/Documentation/Makefile
> index f5605b7767..7d7b778b28 100644
> --- a/Documentation/Makefile
> +++ b/Documentation/Makefile
> @@ -90,6 +90,7 @@ SP_ARTICLES += $(API_DOCS)
>  TECH_DOCS += MyFirstContribution
>  TECH_DOCS += MyFirstObjectWalk
>  TECH_DOCS += SubmittingPatches
> +TECH_DOCS += technical/bitmap-format
>  TECH_DOCS += technical/hash-function-transition
>  TECH_DOCS += technical/http-protocol
>  TECH_DOCS += technical/index-format

Mostly as an aside: I've got a local series queued up to move all of
these "format" docs to e.g. gitformat-bitmap(5), i.e. to make them
first-class manpages, so other pages can link to them. Right now we
mostly don't, but when our manpages do they link to the generated HTML,
which e.g. I don't have installed by default.

So since you're linking to it: does anyone prefer this state of
affairs, and isn't it mainly useful for built docs such as
https://git-scm.com/docs/git-multi-pack-index?

But there's still no link to bitmap-format from any other manual page
(though maybe one is added later in this series), whereas there is one
for e.g. technical/pack-format.html.


* Re: [PATCH v2 06/24] midx: make a number of functions non-static
  2021-06-21 22:25   ` [PATCH v2 06/24] midx: make a number of functions non-static Taylor Blau
@ 2021-06-24 23:42     ` Ævar Arnfjörð Bjarmason
  2021-07-14 23:01       ` Taylor Blau
  0 siblings, 1 reply; 273+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 23:42 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee, gitster, jonathantanmy


On Mon, Jun 21 2021, Taylor Blau wrote:

> These functions will be called from outside of midx.c in a subsequent
> patch.

So "a number" is "two" and "a subsequent patch" appears to be 13/24. I
think this would be clearer just squashed into whatever needs it, or at
least if it comes right before the new use in the series.


* Re: [PATCH v2 08/24] midx: respect 'core.multiPackIndex' when writing
  2021-06-21 22:25   ` [PATCH v2 08/24] midx: respect 'core.multiPackIndex' when writing Taylor Blau
@ 2021-06-24 23:43     ` Ævar Arnfjörð Bjarmason
  2021-07-21 10:23     ` Jeff King
  1 sibling, 0 replies; 273+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 23:43 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee, gitster, jonathantanmy


On Mon, Jun 21 2021, Taylor Blau wrote:

> When writing a new multi-pack index, write_midx_internal() attempts to
> load any existing one to fill in some pieces of information. But it uses
> load_multi_pack_index(), which ignores the configuration
> "core.multiPackIndex", which indicates whether or not Git is allowed to
> read an existing multi-pack-index.
>
> Replace this with a routine that does respect that setting, to avoid
> reading multi-pack-index files when told not to.
>
> This avoids a problem that would arise in subsequent patches due to the
> combination of 'git repack' reopening the object store in-process and
> the multi-pack index code not checking whether a pack already exists in
> the object store when calling add_pack_to_midx().
>
> This would ultimately lead to a cycle being created along the
> 'packed_git' struct's '->next' pointer. That is obviously bad, but it
> has hard-to-debug downstream effects like saying a bitmap can't be
> loaded for a pack because one already exists (for the same pack).
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  midx.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/midx.c b/midx.c
> index 40eb7974ba..759007d5a8 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -908,8 +908,18 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
>  
>  	if (m)
>  		ctx.m = m;
> -	else
> -		ctx.m = load_multi_pack_index(object_dir, 1);
> +	else {

Style nit: leaves the initial "if" braceless now that the "else" gained
braces.


* Re: [PATCH v2 14/24] pack-bitmap: write multi-pack bitmaps
  2021-06-21 22:25   ` [PATCH v2 14/24] pack-bitmap: write " Taylor Blau
@ 2021-06-24 23:45     ` Ævar Arnfjörð Bjarmason
  2021-07-15 14:33       ` Taylor Blau
  2021-07-21 12:09     ` Jeff King
  1 sibling, 1 reply; 273+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 23:45 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee, gitster, jonathantanmy


On Mon, Jun 21 2021, Taylor Blau wrote:

> Write multi-pack bitmaps in the format described by
> Documentation/technical/bitmap-format.txt, inferring their presence with
> the absence of '--bitmap'.
>
> To write a multi-pack bitmap, this patch attempts to reuse as much of
> the existing machinery from pack-objects as possible. Specifically, the
> MIDX code prepares a packing_data struct that pretends as if a single
> packfile has been generated containing all of the objects contained
> within the MIDX.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  Documentation/git-multi-pack-index.txt |  12 +-
>  builtin/multi-pack-index.c             |   2 +
>  midx.c                                 | 230 ++++++++++++++++++++++++-
>  midx.h                                 |   1 +
>  4 files changed, 236 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
> index ffd601bc17..ada14deb2c 100644
> --- a/Documentation/git-multi-pack-index.txt
> +++ b/Documentation/git-multi-pack-index.txt
> @@ -10,7 +10,7 @@ SYNOPSIS
>  --------
>  [verse]
>  'git multi-pack-index' [--object-dir=<dir>] [--[no-]progress]
> -	[--preferred-pack=<pack>] <subcommand>
> +	[--preferred-pack=<pack>] [--[no-]bitmap] <subcommand>
>  
>  DESCRIPTION
>  -----------
> @@ -40,6 +40,9 @@ write::
>  		multiple packs contain the same object. If not given,
>  		ties are broken in favor of the pack with the lowest
>  		mtime.
> +
> +	--[no-]bitmap::
> +		Control whether or not a multi-pack bitmap is written.
>  --
>  
>  verify::
> @@ -81,6 +84,13 @@ EXAMPLES
>  $ git multi-pack-index write
>  -----------------------------------------------
>  
> +* Write a MIDX file for the packfiles in the current .git folder with a
> +corresponding bitmap.
> ++
> +-------------------------------------------------------------
> +$ git multi-pack-index write --preferred-pack <pack> --bitmap
> +-------------------------------------------------------------
> +

I wondered if this was a <pack> positional argument, but it's just the
argument for --preferred-pack, even though the synopsis uses the "="
style for it. Even if parse-options.c is loose about it, let's use one
or the other in examples consistently.

>  * Write a MIDX file for the packfiles in an alternate object store.
>  +
>  -----------------------------------------------
> diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
> index 5d3ea445fd..bf6fa982e3 100644
> --- a/builtin/multi-pack-index.c
> +++ b/builtin/multi-pack-index.c
> @@ -68,6 +68,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
>  		OPT_STRING(0, "preferred-pack", &opts.preferred_pack,
>  			   N_("preferred-pack"),
>  			   N_("pack for reuse when computing a multi-pack bitmap")),
> +		OPT_BIT(0, "bitmap", &opts.flags, N_("write multi-pack bitmap"),
> +			MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
>  		OPT_END(),
>  	};
>  
> diff --git a/midx.c b/midx.c
> index 752d36c57f..a58cca707b 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -13,6 +13,10 @@
>  #include "repository.h"
>  #include "chunk-format.h"
>  #include "pack.h"
> +#include "pack-bitmap.h"
> +#include "refs.h"
> +#include "revision.h"
> +#include "list-objects.h"
>  
>  #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
>  #define MIDX_VERSION 1
> @@ -885,6 +889,172 @@ static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash,
>  static void clear_midx_files_ext(struct repository *r, const char *ext,
>  				 unsigned char *keep_hash);
>  
> +static void prepare_midx_packing_data(struct packing_data *pdata,
> +				      struct write_midx_context *ctx)
> +{
> +	uint32_t i;
> +
> +	memset(pdata, 0, sizeof(struct packing_data));

We initialize this on the stack in write_midx_bitmap(), so shouldn't we
just do this there:

    struct packing_data pdata = {0};

Instead of:

    struct packing_data pdata;

And then doing this memset() here?
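
A small standalone sketch of the two spellings, using a made-up "toy_packing_data" struct rather than the real one, in case it helps to see them side by side. As far as the named members go they end up in the same state; the initializer just keeps the zeroing next to the declaration.

```c
#include <string.h>

/* Hypothetical stand-in; the real struct packing_data has other fields. */
struct toy_packing_data {
	unsigned int nr_objects;
	void *objects;
	int flags;
};

/* Zero every member at the point of declaration. */
static struct toy_packing_data toy_init_with_initializer(void)
{
	struct toy_packing_data pdata = {0};

	return pdata;
}

/*
 * Declare first, then zero all bytes. On the platforms Git commonly
 * runs on, an all-zero-bytes pointer is a NULL pointer, so the members
 * read back the same as with the initializer above.
 */
static struct toy_packing_data toy_init_with_memset(void)
{
	struct toy_packing_data pdata;

	memset(&pdata, 0, sizeof(pdata));
	return pdata;
}
```

One difference worth keeping in mind: memset() also clears padding bytes, while "= {0}" only promises zeroed members, which matters only if the struct is later compared or hashed bytewise.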

> +	prepare_packing_data(the_repository, pdata);
> +
> +	for (i = 0; i < ctx->entries_nr; i++) {
> +		struct pack_midx_entry *from = &ctx->entries[ctx->pack_order[i]];
> +		struct object_entry *to = packlist_alloc(pdata, &from->oid);
> +
> +		oe_set_in_pack(pdata, to,
> +			       ctx->info[ctx->pack_perm[from->pack_int_id]].p);
> +	}
> +}
> +
> +static int add_ref_to_pending(const char *refname,
> +			      const struct object_id *oid,
> +			      int flag, void *cb_data)
> +{
> +	struct rev_info *revs = (struct rev_info*)cb_data;
> +	struct object *object;
> +
> +	if ((flag & REF_ISSYMREF) && (flag & REF_ISBROKEN)) {

Just since I'd mentioned HAS_MULTI_BITS() offhand on another patch of
yours, it's for cases like this, so:

-    if ((flag & REF_ISSYMREF) && (flag & REF_ISBROKEN)) {
+    if (HAS_MULTI_BITS(flag & (REF_ISSYMREF|REF_ISBROKEN))) {

Saves you 3 bytes of code:) Anyway, you don't need to use it, it's just
an interesting macro... :)
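
To convince yourself the two spellings agree, here is a tiny standalone sketch. The macro body is the one I remember from git-compat-util.h; the "TOY_" flag values are only illustrative single-bit constants, not a claim about what refs.h defines.

```c
/* Non-zero if more than one bit is set in "i". */
#define HAS_MULTI_BITS(i) ((i) & ((i) - 1))

/* Illustrative single-bit flags (assumed values, for this sketch only). */
#define TOY_REF_ISSYMREF 0x01u
#define TOY_REF_ISBROKEN 0x04u

/* The spelling used in the patch: test each flag separately. */
static int toy_both_flags_explicit(unsigned int flag)
{
	return (flag & TOY_REF_ISSYMREF) && (flag & TOY_REF_ISBROKEN);
}

/* Mask down to the two flags, then ask whether more than one bit is left. */
static int toy_both_flags_multi_bits(unsigned int flag)
{
	return HAS_MULTI_BITS(flag & (TOY_REF_ISSYMREF | TOY_REF_ISBROKEN)) != 0;
}
```

The trick only works because each flag occupies exactly one bit, so "more than one bit left after masking" is the same as "both flags set".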

> +{
> +	struct rev_info revs;
> +	struct bitmap_commit_cb cb;
> +
> +	memset(&cb, 0, sizeof(struct bitmap_commit_cb));

Another case of s/memset/"= {0}"/g ?

> +	cb.ctx = ctx;
> +
> +	repo_init_revisions(the_repository, &revs, NULL);
> +	for_each_ref(add_ref_to_pending, &revs);
> +
> +	/*
> +	 * Skipping promisor objects here is intentional, since it only excludes
> +	 * them from the list of reachable commits that we want to select from
> +	 * when computing the selection of MIDX'd commits to receive bitmaps.
> +	 *
> +	 * Reachability bitmaps do require that their objects be closed under
> +	 * reachability, but fetching any objects missing from promisors at this
> +	 * point is too late. But, if one of those objects can be reached from
> +	 * an another object that is included in the bitmap, then we will
> +	 * complain later that we don't have reachability closure (and fail
> +	 * appropriately).
> +	 */
> +	fetch_if_missing = 0;
> +	revs.exclude_promisor_objects = 1;
> +
> +	/*
> +	 * Pass selected commits in topo order to match the behavior of
> +	 * pack-bitmaps when configured with delta islands.
> +	 */
> +	revs.topo_order = 1;
> +	revs.sort_order = REV_SORT_IN_GRAPH_ORDER;
> +
> +	if (prepare_revision_walk(&revs))
> +		die(_("revision walk setup failed"));
> +
> +	traverse_commit_list(&revs, bitmap_show_commit, NULL, &cb);
> +	if (indexed_commits_nr_p)
> +		*indexed_commits_nr_p = cb.commits_nr;
> +
> +	return cb.commits;
> +}
> +
> +static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
> +			     struct write_midx_context *ctx,
> +			     unsigned flags)
> +{
> +	struct packing_data pdata;
> +	struct pack_idx_entry **index;
> +	struct commit **commits = NULL;
> +	uint32_t i, commits_nr;
> +	char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash));
> +	int ret;
> +
> +	prepare_midx_packing_data(&pdata, ctx);
> +
> +	commits = find_commits_for_midx_bitmap(&commits_nr, ctx);
> +
> +	/*
> +	 * Build the MIDX-order index based on pdata.objects (which is already
> +	 * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of
> +	 * this order).
> +	 */
> +	ALLOC_ARRAY(index, pdata.nr_objects);
> +	for (i = 0; i < pdata.nr_objects; i++)
> +		index[i] = (struct pack_idx_entry *)&pdata.objects[i];
> +
> +	bitmap_writer_show_progress(flags & MIDX_PROGRESS);
> +	bitmap_writer_build_type_index(&pdata, index, pdata.nr_objects);
> +
> +	/*
> +	 * bitmap_writer_finish expects objects in lex order, but pack_order
> +	 * gives us exactly that. use it directly instead of re-sorting the
> +	 * array.
> +	 *
> +	 * This changes the order of objects in 'index' between
> +	 * bitmap_writer_build_type_index and bitmap_writer_finish.
> +	 *
> +	 * The same re-ordering takes place in the single-pack bitmap code via
> +	 * write_idx_file(), which is called by finish_tmp_packfile(), which
> +	 * happens between bitmap_writer_build_type_index() and
> +	 * bitmap_writer_finish().
> +	 */
> +	for (i = 0; i < pdata.nr_objects; i++)
> +		index[ctx->pack_order[i]] = (struct pack_idx_entry *)&pdata.objects[i];
> +
> +	bitmap_writer_select_commits(commits, commits_nr, -1);
> +	ret = bitmap_writer_build(&pdata);
> +	if (ret < 0)
> +		goto cleanup;
> +
> +	bitmap_writer_set_checksum(midx_hash);
> +	bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, 0);
> +
> +cleanup:
> +	free(index);
> +	free(bitmap_name);
> +	return ret;
> +}
> +
>  static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
>  			       struct string_list *packs_to_drop,
>  			       const char *preferred_pack_name,
> @@ -930,9 +1100,16 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
>  		for (i = 0; i < ctx.m->num_packs; i++) {
>  			ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
>  
> +			if (prepare_midx_pack(the_repository, ctx.m, i)) {
> +				error(_("could not load pack %s"),
> +				      ctx.m->pack_names[i]);

Isn't the prepare_midx_pack() tasked with populating that pack_names[i]
that you can't load (the strbuf_addf() it does), but it can also exit
before that, do we get an empty string here then? Maybe I'm misreading
it (I haven't run this, just skimmed the code).

> @@ -1132,6 +1342,9 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
>  	free(ctx.pack_perm);
>  	free(ctx.pack_order);
>  	free(midx_name);
> +	if (ctx.m)
> +		close_midx(ctx.m);
> +

I see Stolee made close_midx() just return silently when passed NULL in
1dcd9f2043a (midx: close multi-pack-index on repack, 2018-10-12), but
grepping for its uses, it seems calls to it are similarly guarded by
"if"s.

Just a nit: it's weird to have a free-like function that isn't invoked
like free(). Perhaps (and maybe better as an unrelated cleanup) we
should either drop the conditionals, or make it BUG() if it's called
with NULL, but at least we should pick one :)
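
The "invoked like free()" convention I have in mind is roughly the following. This is a self-contained sketch with a made-up "toy_midx" type and helpers, not the real close_midx().

```c
#include <stdlib.h>

/* Hypothetical resource; the real multi_pack_index is more involved. */
struct toy_midx {
	unsigned char *data;
};

/* Allocate a toy handle with a zeroed buffer, or return NULL on failure. */
static struct toy_midx *toy_open_midx(size_t len)
{
	struct toy_midx *m = calloc(1, sizeof(*m));

	if (!m)
		return NULL;
	m->data = calloc(len ? len : 1, 1);
	if (!m->data) {
		free(m);
		return NULL;
	}
	return m;
}

/* Like free(): passing NULL is a no-op, so callers need no "if" guard. */
static void toy_close_midx(struct toy_midx *m)
{
	if (!m)
		return;
	free(m->data);
	free(m);
}
```

With the NULL check inside the function, a call site can simply say "toy_close_midx(ctx_m);" on every exit path, whether or not anything was ever opened.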


* Re: [PATCH v2 22/24] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  2021-06-21 22:25   ` [PATCH v2 22/24] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
@ 2021-06-25  0:03     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 273+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-25  0:03 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee, gitster, jonathantanmy


On Mon, Jun 21 2021, Taylor Blau wrote:

> Introduce a new 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' environment
> variable to also write a multi-pack bitmap when
> 'GIT_TEST_MULTI_PACK_INDEX' is set.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  builtin/repack.c          | 13 ++++++++++---
>  ci/run-build-and-tests.sh |  1 +
>  midx.h                    |  2 ++
>  t/README                  |  4 ++++
>  4 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 5f9bc74adc..77f6f03057 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -515,7 +515,10 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  		if (!(pack_everything & ALL_INTO_ONE) ||
>  		    !is_bare_repository())
>  			write_bitmaps = 0;
> -	}
> +	} else if (write_bitmaps &&
> +		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0) &&
> +		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0))
> +		write_bitmaps = 0;

Style nit: more if/else/elseif some with braces, some not...


* Re: [PATCH v2 00/24] multi-pack reachability bitmaps
  2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
                     ` (23 preceding siblings ...)
  2021-06-21 22:26   ` [PATCH v2 24/24] p5326: perf tests for MIDX bitmaps Taylor Blau
@ 2021-06-25  9:06   ` Ævar Arnfjörð Bjarmason
  2021-07-15 14:36     ` Taylor Blau
  24 siblings, 1 reply; 273+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-25  9:06 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee, gitster, jonathantanmy


On Mon, Jun 21 2021, Taylor Blau wrote:

> Thanks in advance for your review, and sorry for the wait.

Thanks for working on this, exciting feature!

Just a note on my comments on this. I left them after some light reading
and would describe them as some combination of "musings", "shallow",
"nit-y" and "bikesheddy".

I.e. I did not have time (or, I feel, the familiarity) to give this
series the sort of review it actually deserves as far as the actual
important bits go, i.e. nits aside, whether this feature works and
behaves as desired. Sorry, but hopefully at least some of the comments
were somewhat useful anyway.


* Re: [PATCH v2 01/24] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps
  2021-06-24 23:02     ` Ævar Arnfjörð Bjarmason
@ 2021-07-14 17:24       ` Taylor Blau
  0 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-14 17:24 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, peff, dstolee, gitster, jonathantanmy

On Fri, Jun 25, 2021 at 01:02:56AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Mon, Jun 21 2021, Taylor Blau wrote:
>
> > +	enum object_type bitmap_type = OBJ_NONE;
> > +	int bitmaps_nr = 0;
> > +
> > +	if (bitmap_get(tdata->commits, pos)) {
> > +		bitmap_type = OBJ_COMMIT;
> > +		bitmaps_nr++;
> > +	}
> > +	if (bitmap_get(tdata->trees, pos)) {
> > +		bitmap_type = OBJ_TREE;
> > +		bitmaps_nr++;
> > +	}
> > +	if (bitmap_get(tdata->blobs, pos)) {
> > +		bitmap_type = OBJ_BLOB;
> > +		bitmaps_nr++;
> > +	}
> > +	if (bitmap_get(tdata->tags, pos)) {
> > +		bitmap_type = OBJ_TAG;
> > +		bitmaps_nr++;
> > +	}
>
> This made me wonder if this could be better with something like the
> HAS_MULTI_BITS() macro, but that trick probably can't be applied here in
> any shape or form :)

Right; since we're looking at the same bit position in each of the
type-level bitmaps, we can't just OR them together, since all of the
bits are in the same place.

And really, the object_type enum doesn't have values that tell us the
type of an object by looking at just a single bit. So
HAS_MULTI_BITS(OBJ_BLOB) would return "true", since OBJ_BLOB is 3.
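
A quick standalone sketch of that second point, assuming the macro body is the one from git-compat-util.h as I remember it and using a made-up "toy_" enum that mirrors the numeric values of the four basic object types:

```c
/* Non-zero if more than one bit is set in "i". */
#define HAS_MULTI_BITS(i) ((i) & ((i) - 1))

/* Mirrors the numeric values of the basic object types (names made up). */
enum toy_object_type {
	TOY_OBJ_NONE = 0,
	TOY_OBJ_COMMIT = 1,
	TOY_OBJ_TREE = 2,
	TOY_OBJ_BLOB = 3,
	TOY_OBJ_TAG = 4
};

/* Returns 1 if the type's numeric value has more than one bit set. */
static int toy_type_has_multi_bits(enum toy_object_type type)
{
	return HAS_MULTI_BITS((unsigned int)type) != 0;
}
```

Since 3 is binary 11, a blob "has multiple bits" all by itself, while commit (1), tree (2) and tag (4) do not. The enum values are sequential identifiers rather than a bit mask, so the macro says nothing useful about them.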

> > +
> > +	if (bitmap_type != obj->type)
> > +		die("object %s: real type %s, expected: %s",
> > +		    oid_to_hex(&obj->oid),
> > +		    type_name(obj->type),
> > +		    type_name(bitmap_type));
>
> To argue against myself (sort of) about that "== OBJ_NONE" above, if
> we're not assuming that then it's sort of weird not to also assume that
> type_name(type) won't return a NULL in the case of OBJ_NONE, which it
> does (but this code guards against).

I tend to agree. To restate what you're saying: by the time we get to
the type_name(bitmap_type) we know that bitmap_type is non-zero, so we
assume it's OK to call type_name() on it.

Of course, the object_type_strings does handle the zero argument, so
this is probably a little academic, but good to think through
nonetheless.
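
For the sake of thinking it through, here is a minimal sketch of the table-lookup pattern being discussed. The "toy_" names are hypothetical and this is only modeled on the behavior described above (a NULL name for type zero), not copied from object.c.

```c
#include <stddef.h>
#include <string.h>

/* Index 0 (no type) deliberately has no name. */
static const char *toy_type_strings[] = {
	NULL,     /* 0: none */
	"commit", /* 1 */
	"tree",   /* 2 */
	"blob",   /* 3 */
	"tag",    /* 4 */
};

/* Returns the name for a type, or NULL for 0 and out-of-range values. */
static const char *toy_type_name(unsigned int type)
{
	if (type >= sizeof(toy_type_strings) / sizeof(toy_type_strings[0]))
		return NULL;
	return toy_type_strings[type];
}

/* Callers that print the name have to cope with NULL themselves. */
static const char *toy_type_name_or_unknown(unsigned int type)
{
	const char *name = toy_type_name(type);

	return name ? name : "(unknown)";
}
```

Passing a NULL pointer to a "%s" format is undefined behavior, which is why it matters that the zero case is ruled out before the final die() is reached.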

Thanks,
Taylor


* Re: [PATCH v2 02/24] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
  2021-06-24 23:23     ` Ævar Arnfjörð Bjarmason
@ 2021-07-14 17:32       ` Taylor Blau
  2021-07-14 18:44         ` Ævar Arnfjörð Bjarmason
  2021-07-21  9:53         ` Jeff King
  0 siblings, 2 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-14 17:32 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, peff, dstolee, gitster, jonathantanmy

On Fri, Jun 25, 2021 at 01:23:40AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Mon, Jun 21 2021, Taylor Blau wrote:
>
> > -static uint32_t find_object_pos(const struct object_id *oid)
> > +static uint32_t find_object_pos(const struct object_id *oid, int *found)
> >  {
> >  	struct object_entry *entry = packlist_find(writer.to_pack, oid);
> >
> >  	if (!entry) {
> > -		die("Failed to write bitmap index. Packfile doesn't have full closure "
> > +		if (found)
> > +			*found = 0;
> > +		warning("Failed to write bitmap index. Packfile doesn't have full closure "
> >  			"(object %s is missing)", oid_to_hex(oid));
> > +		return 0;
> >  	}
> >
> > +	if (found)
> > +		*found = 1;
> >  	return oe_in_pack_pos(writer.to_pack, entry);
> >  }
>
> So, a function that returns an unsigned 32 bit int won't (presumably)
> have enough space for an "is bad", but before it died so it didn't
> matter.
>
> Now it warns, so it needs a "is bad", so we add another "int" to pass
> that information around.

Right. You could imagine using the most-significant bit to indicate
"bad" (which in this case is "I couldn't find this object that I'm
supposed to be able to reach"), but of course it cuts our maximum number
of objects in a bitmap in half.
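
To spell out the trade-off, a sketch of what reserving the top bit would look like. The "TOY_" names are hypothetical; nothing like this exists in the series.

```c
#include <stdint.h>

/* Reserve the most-significant bit of the position as a "missing" flag. */
#define TOY_POS_MISSING_BIT (UINT32_C(1) << 31)

/* Largest position that can still be represented once the bit is taken. */
#define TOY_POS_MAX (TOY_POS_MISSING_BIT - 1)

/* Value returned when the object could not be found. */
static uint32_t toy_encode_missing(void)
{
	return TOY_POS_MISSING_BIT;
}

/* Non-zero if the returned value carries the "missing" flag. */
static int toy_is_missing(uint32_t value)
{
	return (value & TOY_POS_MISSING_BIT) != 0;
}
```

Here TOY_POS_MAX works out to UINT32_MAX / 2, which is the "cut in half" cost mentioned above.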

> So we're already paying for that extra space (on some platforms "int"
> is already 64 bits wide, and although uint32_t itself is exactly 32
> bits, it is usually passed around in a wider register anyway).
>
> Wouldn't it be more idiomatic to just have find_object_pos() return
> int64_t now? If it's -1 it's an error, otherwise the "pos" is cast to
> uint32_t:

I'm not sure. It does save the extra argument, which is arguably more
convenient for callers, but the cost for doing so is a cast from a
signed integer type to an unsigned one (and a narrower destination type,
at that).

That seems easier to get wrong to me than passing a pointer to a pure
"int" and keeping the return type a uint32_t. So, I'm probably more
content to leave it as-is rather than change it.
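
To make the trade-off concrete, here is a minimal sketch of the two
shapes side by side. It is not the code from the patch: the lookup
table and every "_sketch" name are made up for illustration, and
assert() only stands in for whatever check a real caller would do
before narrowing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the packing list: known_ids[i] sits at position i. */
static const uint32_t known_ids[] = { 10, 20, 30 };

/* Shape used by the patch: position in the return value, status in *found. */
static uint32_t find_pos_sketch(uint32_t id, int *found)
{
	uint32_t i;

	for (i = 0; i < sizeof(known_ids) / sizeof(known_ids[0]); i++) {
		if (known_ids[i] == id) {
			if (found)
				*found = 1;
			return i;
		}
	}
	if (found)
		*found = 0;
	return 0; /* meaningless unless *found was set to 1 */
}

/* Shape suggested in review: -1 on error, otherwise the position. */
static int64_t find_pos_wide_sketch(uint32_t id)
{
	int found;
	uint32_t pos = find_pos_sketch(id, &found);

	return found ? (int64_t)pos : -1;
}

/* Every caller of the wide variant has to narrow again, after checking. */
static uint32_t narrow_pos_sketch(int64_t wide)
{
	assert(wide >= 0 && wide <= UINT32_MAX);
	return (uint32_t)wide;
}
```

With the out-parameter, the position never changes type; with the wide
return, each caller needs something like narrow_pos_sketch() before it
can use the value as a bit position.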

I don't feel too strongly about it, though, so if you do I'd be happy to
hear more.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 04/24] Documentation: build 'technical/bitmap-format' by default
  2021-06-24 23:35     ` Ævar Arnfjörð Bjarmason
@ 2021-07-14 17:41       ` Taylor Blau
  2021-07-14 22:58         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-07-14 17:41 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, peff, dstolee, gitster, jonathantanmy

On Fri, Jun 25, 2021 at 01:35:48AM +0200, Ævar Arnfjörð Bjarmason wrote:
> As a mostly aside I've got a local series queued up to move all of these
> "format" docs to e.g. gitformat-bitmap(5), i.e. to make them first-class
> manpages, so other pages can link to them. Right now we mostly don't,
> but when our manpages do they link to the generated HTML, which e.g. I
> don't have installed by default.

It would be nice to be able to look it up with "man 5 gitformat-bitmap".
I actually don't have strong feelings about this particular patch
getting picked up or not, since it doesn't add the actual format changes
to the file itself.

This does pick up the bitmap-format document in "make -C Documentation
html", which is nice(r than "make -C Documentation
technical/bitmap-format.html" IMHO).

> But there's still no link to bitmap-format from any other manual page
> (though maybe one comes later in this series), whereas there is one for
> e.g. technical/pack-format.html.

No, I didn't add any links pointed at bitmap-format.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 02/24] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
  2021-07-14 17:32       ` Taylor Blau
@ 2021-07-14 18:44         ` Ævar Arnfjörð Bjarmason
  2021-07-21  9:53         ` Jeff King
  1 sibling, 0 replies; 273+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-14 18:44 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee, gitster, jonathantanmy


On Wed, Jul 14 2021, Taylor Blau wrote:

> On Fri, Jun 25, 2021 at 01:23:40AM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Mon, Jun 21 2021, Taylor Blau wrote:
>>
>> > -static uint32_t find_object_pos(const struct object_id *oid)
>> > +static uint32_t find_object_pos(const struct object_id *oid, int *found)
>> >  {
>> >  	struct object_entry *entry = packlist_find(writer.to_pack, oid);
>> >
>> >  	if (!entry) {
>> > -		die("Failed to write bitmap index. Packfile doesn't have full closure "
>> > +		if (found)
>> > +			*found = 0;
>> > +		warning("Failed to write bitmap index. Packfile doesn't have full closure "
>> >  			"(object %s is missing)", oid_to_hex(oid));
>> > +		return 0;
>> >  	}
>> >
>> > +	if (found)
>> > +		*found = 1;
>> >  	return oe_in_pack_pos(writer.to_pack, entry);
>> >  }
>>
>> So, a function that returns an unsigned 32 bit int won't (presumably)
>> have enough space for an "is bad", but before it died so it didn't
>> matter.
>>
>> Now it warns, so it needs a "is bad", so we add another "int" to pass
>> that information around.
>
> Right. You could imagine using the most-significant bit to indicate
> "bad" (which in this case is "I couldn't find this object that I'm
> supposed to be able to reach"), but of course it cuts our maximum number
> of objects in a bitmap in half.
>
>> So if we're already paying for that extra space (which, on some
>> platforms would already be a 64 bit int, and on some so would the
>> uint32_t, it's just "at least 32 bits").
>>
>> Wouldn't it be more idiomatic to just have find_object_pos() return
>> int64_t now, if it's -1 it's an error, otherwise the "pos" is cast to
>> uint32_t:
>
> I'm not sure. It does save the extra argument, which is arguably more
> convenient for callers, but the cost for doing so is a cast from a
> signed integer type to an unsigned one (and a narrower destination type,
> at that).
>
> That seems easier to get wrong to me than passing a pointer to a pure
> "int" and keeping the return type a uint32_t. So, I'm probably more
> content to leave it as-is rather than change it.
>
> I don't feel too strongly about it, though, so if you do I'd be happy to
> hear more.

I don't really care, it just looked a bit weird at first, and I wondered
why it couldn't return -1.

Aside from this case, do you mean that such a cast would be too expensive
in general, or do you fear going past the 32 bits? I assumed that there
would be checks here for that already (and if not, we'd have wrap-around
now...).

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 04/24] Documentation: build 'technical/bitmap-format' by default
  2021-07-14 17:41       ` Taylor Blau
@ 2021-07-14 22:58         ` Ævar Arnfjörð Bjarmason
  2021-07-21 10:04           ` Jeff King
  0 siblings, 1 reply; 273+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-14 22:58 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee, gitster, jonathantanmy


On Wed, Jul 14 2021, Taylor Blau wrote:

> On Fri, Jun 25, 2021 at 01:35:48AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> As a mostly aside I've got a local series queued up to move all of these
>> "format" docs to e.g. gitformat-bitmap(5), i.e. to make them first-class
>> manpages, so other pages can link to them. Right now we mostly don't,
>> but when our manpages do they link to the generated HTML, which e.g. I
>> don't have installed by default.
>
> It would be nice to be able to look it up with "man 5 gitformat-bitmap".
> I actually don't have strong feelings about this particular patch
> getting picked up or not, since it doesn't add the actual format changes
> to the file itself.
>
> This does pick up the bitmap-format document in "make -C Documentation
> html", which is nice(r than "make -C Documentation
> technical/bitmap-format.html" IMHO).

Oh yes, I'm not saying don't add the target. Just a musing on how we
ended up with such a large set of things in "Documentation/technical/*"
as opposed to just man pages.

I guess if there's good reasons for it they'll come out if/when I submit
that series...

>> But there's still no link to bitmap-format from any other manual page
>> (though maybe one comes later in this series), whereas there is one for
>> e.g. technical/pack-format.html.
>
> No, I didn't add any links pointed at bitmap-format.

I see https://git-scm.com/docs/bitmap-format has somehow managed to get
indexed by Google, perhaps through some magic :)

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 06/24] midx: make a number of functions non-static
  2021-06-24 23:42     ` Ævar Arnfjörð Bjarmason
@ 2021-07-14 23:01       ` Taylor Blau
  0 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-14 23:01 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, peff, dstolee, gitster, jonathantanmy

On Fri, Jun 25, 2021 at 01:42:06AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Mon, Jun 21 2021, Taylor Blau wrote:
>
> > These functions will be called from outside of midx.c in a subsequent
> > patch.
>
> So "a number" is "two" and "a subsequent patch" appears to be 13/24. I
> think this would be clearer just squashed into whatever needs it, or at
> least if it comes right before the new use in the series.

Good suggestion, thanks. This was probably written at a time when the
number of functions I needed was larger (or perhaps when this and
patches 10-12 were all jumbled together).

In any case, I dropped this patch and squashed its contents into 13/24
where it is used.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 14/24] pack-bitmap: write multi-pack bitmaps
  2021-06-24 23:45     ` Ævar Arnfjörð Bjarmason
@ 2021-07-15 14:33       ` Taylor Blau
  0 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-15 14:33 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, peff, dstolee, gitster, jonathantanmy

On Fri, Jun 25, 2021 at 01:45:46AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Mon, Jun 21 2021, Taylor Blau wrote:
> > +* Write a MIDX file for the packfiles in the current .git folder with a
> > +corresponding bitmap.
> > ++
> > +-------------------------------------------------------------
> > +$ git multi-pack-index write --preferred-pack <pack> --bitmap
> > +-------------------------------------------------------------
> > +
>
> I wondered if this was a <pack> positional argument, but it's just the
> argument for --preferred-pack, even though the synopsis uses the "="
> style for it. Even if parse-options.c is loose about it, let's use one
> or the other in examples consistently.

The example below (for writing a MIDX in an alternate object store)
doesn't include the '=', but probably would be clearer if it did. I
think it's a good suggestion, though, so I'll fix up my addition here.

> > +	memset(pdata, 0, sizeof(struct packing_data));
>
> We initialize this on the stack in write_midx_bitmap(), shouldn't we
> just there do:
>
>     struct packing_data pdata = {0}
>
> Instead of:
>
>     struct packing_data pdata;
>
> And then doing this memset() here?

I could go either way. Part of me prefers the memset() since it lets
callers of prepare_midx_packing_data() pass in anything they want,
including a pointer to uninitialized memory. Of course, there is only
one such caller, so it probably doesn't really matter.

And the other caller of prepare_packing_data() which is in
builtin/pack-objects.c operates on a pointer to a statically allocated
variable, so its bytes are already zero'd.

I don't feel strongly about it, though, so I'd just as soon err on the
side of flexibility rather than change the declaration.
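
For reference, a minimal sketch of the two zeroing styles under
discussion. The struct is a made-up stand-in for 'struct packing_data'
and both function names are hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for 'struct packing_data'. */
struct pdata_sketch {
	unsigned int nr;
	void *objects;
};

/* Callee zeroes: callers may hand in uninitialized memory. */
static void prepare_sketch(struct pdata_sketch *p)
{
	assert(p);
	memset(p, 0, sizeof(*p));
}

/* Caller zeroes at declaration: fits a struct set up and used in one function. */
static unsigned int count_sketch(void)
{
	struct pdata_sketch p = { 0 };

	return p.nr;
}
```

The first form keeps the guarantee inside the callee, which is the
flexibility argued for above; the second is the aggregate-style
initialization and needs no separate call.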

> > +{
> > +	struct rev_info revs;
> > +	struct bitmap_commit_cb cb;
> > +
> > +	memset(&cb, 0, sizeof(struct bitmap_commit_cb));
>
> Another case of s/memset/"= {0}"/g ?

Ah, in this case I'd prefer the aggregate-style initialization, since
we're zero-ing it out in the same function.

> >  static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
> >  			       struct string_list *packs_to_drop,
> >  			       const char *preferred_pack_name,
> > @@ -930,9 +1100,16 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
> >  		for (i = 0; i < ctx.m->num_packs; i++) {
> >  			ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
> >
> > +			if (prepare_midx_pack(the_repository, ctx.m, i)) {
> > +				error(_("could not load pack %s"),
> > +				      ctx.m->pack_names[i]);
>
> Isn't the prepare_midx_pack() tasked with populating that pack_names[i]
> that you can't load (the strbuf_addf() it does), but it can also exit
> before that, do we get an empty string here then? Maybe I'm misreading
> it (I haven't run this, just skimmed the code).

Nice catch, we can't rely on ctx.m->pack_names[i] being safe to read
(and at the same time know that we're going to get a non-empty string).

Since prepare_midx_pack() can fail because the pack itself couldn't be
loaded, I think the easiest thing to do here is to just opaquely say
"could not load pack" without adding any pack name that we may or may
not have.

> > @@ -1132,6 +1342,9 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
> >  	free(ctx.pack_perm);
> >  	free(ctx.pack_order);
> >  	free(midx_name);
> > +	if (ctx.m)
> > +		close_midx(ctx.m);
> > +
>
> I see Stolee made close_midx() just return silently if !ctx.m in
> 1dcd9f2043a (midx: close multi-pack-index on repack, 2018-10-12), but
> grepping the uses of it it seems calls to it are similarly guarded by
> "if"'s.
>
> Just a nit, weird to have a free-like function not invoked like
> free. Perhaps (and maybe better for an unrelated cleanup) to either drop
> the conditionals, or make it BUG() if it's called with NULL, but at
> least we should pick one :)

I agree with the direction of 1dcd9f2043a, so I'm happy to just drop the
conditional and call close_midx() with an argument that may or may not
be NULL.
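
As a minimal sketch of the two conventions (hypothetical names and a
made-up struct, not the real close_midx(); assert() stands in for
BUG()):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for 'struct multi_pack_index'. */
struct midx_sketch {
	int *data;
};

/* free()-like: a NULL argument is a silent no-op, so callers need no guard. */
static void close_midx_tolerant(struct midx_sketch *m)
{
	if (!m)
		return;
	free(m->data);
	free(m);
}

/* Strict alternative: passing NULL is treated as a programming error. */
static void close_midx_strict(struct midx_sketch *m)
{
	assert(m);
	free(m->data);
	free(m);
}
```

With the tolerant form the "if (ctx.m)" guard at the call site becomes
redundant, which is why dropping it is safe.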

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 00/24] multi-pack reachability bitmaps
  2021-06-25  9:06   ` [PATCH v2 00/24] multi-pack reachability bitmaps Ævar Arnfjörð Bjarmason
@ 2021-07-15 14:36     ` Taylor Blau
  2021-07-21 12:12       ` Jeff King
  0 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-07-15 14:36 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, peff, dstolee, gitster, jonathantanmy

On Fri, Jun 25, 2021 at 11:06:21AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Mon, Jun 21 2021, Taylor Blau wrote:
>
> > Thanks in advance for your review, and sorry for the wait.
>
> Thanks for working on this, exciting feature!
>
> Just a note on my comments on this. I left them after some light reading
> and would describe them as some combination of "musings", "shallow",
> "nit-y" and "bikesheddy".

Thanks for your review. It did help me catch some important issues, like
referring to ctx.m->pack_names[i] when a read at "i" may have been
invalid.

> I.e. I did not have time (or I feel, the familiarity) to give this
> series the sort of review it actually deserves as far as the actual
> important bits go, i.e. nits aside whether this feature works and
> behaves as desired. Sorry, but hopefully at least some of comments were
> somewhat useful anyway.

They were useful indeed. I'll sit on the changes locally since most of
them are pretty benign and I think subsequent review could be done on
top of v2 without seeing my local changes.

I know that reviewing this is on Peff's list of things to do, but there
are competing priorities (and we have all-company meetings this week, so
I would not be surprised to see a lack of movement until at least next
week).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 01/24] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps
  2021-06-21 22:24   ` [PATCH v2 01/24] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps Taylor Blau
  2021-06-24 23:02     ` Ævar Arnfjörð Bjarmason
@ 2021-07-21  9:45     ` Jeff King
  2021-07-21 17:15       ` Taylor Blau
  1 sibling, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-07-21  9:45 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Mon, Jun 21, 2021 at 06:24:59PM -0400, Taylor Blau wrote:

> The special `--test-bitmap` mode of `git rev-list` is used to compare
> the result of an object traversal with a bitmap to check its integrity.
> This mode does not, however, assert that the types of reachable objects
> are stored correctly.
> 
> Harden this mode by teaching it to also check that each time an object's
> bit is marked, the corresponding bit should be set in exactly one of the
> type bitmaps (whose type matches the object's true type).

Yep, makes sense, and the patch looks good.

> +{
> +	enum object_type bitmap_type = OBJ_NONE;
> [...]
> +
> +	if (!bitmap_type)
> +		die("object %s not found in type bitmaps",
> +		    oid_to_hex(&obj->oid));

I think the suggestion to do:

  if (bitmap_type == OBJ_NONE)

is reasonable here, as it assumes less about the enum. I do think
OBJ_BAD and OBJ_NONE were chosen with this kind of numeric comparison
in mind, but there is no reason to rely on that in places we don't need
to.

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 02/24] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
  2021-06-21 22:25   ` [PATCH v2 02/24] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps Taylor Blau
  2021-06-24 23:23     ` Ævar Arnfjörð Bjarmason
@ 2021-07-21  9:50     ` Jeff King
  2021-07-21 17:20       ` Taylor Blau
  1 sibling, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-07-21  9:50 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Mon, Jun 21, 2021 at 06:25:01PM -0400, Taylor Blau wrote:

> The set of objects covered by a bitmap must be closed under
> reachability, since it must be the case that there is a valid bit
> position assigned for every possible reachable object (otherwise the
> bitmaps would be incomplete).
> 
> Pack bitmaps are never written from 'git repack' unless repacking
> all-into-one, and so we never write non-closed bitmaps (except in the
> case of partial clones where we aren't guaranteed to have all objects).
> 
> But multi-pack bitmaps change this, since it isn't known whether the
> set of objects in the MIDX is closed under reachability until walking
> them. Plumb through a bit that is set when a reachable object isn't
> found.
> 
> As soon as a reachable object isn't found in the set of objects to
> include in the bitmap, bitmap_writer_build() knows that the set is not
> closed, and so it now fails gracefully.

Leaving aside your intended use here, I think it's nice to get rid of a
deep-buried die() like this in general.

The amount of error-plumbing you had to do is a little unpleasant, but I
think is unavoidable. The only non-obvious part was this hunk:

> @@ -463,8 +488,11 @@ void bitmap_writer_build(struct packing_data *to_pack)
>  		struct commit *child;
>  		int reused = 0;
>  
> -		fill_bitmap_commit(ent, commit, &queue, &tree_queue,
> -				   old_bitmap, mapping);
> +		if (fill_bitmap_commit(ent, commit, &queue, &tree_queue,
> +				       old_bitmap, mapping) < 0) {
> +			closed = 0;
> +			break;
> +		}
>  
>  		if (ent->selected) {
>  			store_selected(ent, commit);

This is the right thing to do because we still want to free memory, stop
progress, etc. I gave a look over what will run after breaking out of
the loop, and compute_xor_offsets(), which you already handled, is the
only thing we'd want to avoid running. Good.
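
The control flow boils down to something like the sketch below. It is
a simplified illustration with hypothetical names, not the real
bitmap_writer_build(): a negative step stands in for
fill_bitmap_commit() failing, and setting *finalized stands in for
compute_xor_offsets().

```c
#include <assert.h>
#include <stdlib.h>

/* Returns 0 if every step succeeded, -1 otherwise; cleanup runs either way. */
static int build_sketch(const int *steps, size_t nr, int *finalized)
{
	int *scratch = calloc(nr ? nr : 1, sizeof(*scratch));
	int closed = 1;
	size_t i;

	assert(steps && finalized);
	if (!scratch)
		return -1;

	for (i = 0; i < nr; i++) {
		if (steps[i] < 0) {
			closed = 0;
			break; /* leave the loop, but still fall through to cleanup */
		}
		scratch[i] = steps[i];
	}

	free(scratch); /* shared cleanup for the success and failure paths */

	if (closed)
		*finalized = 1; /* post-processing is skipped on failure */
	return closed ? 0 : -1;
}
```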

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 02/24] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
  2021-07-14 17:32       ` Taylor Blau
  2021-07-14 18:44         ` Ævar Arnfjörð Bjarmason
@ 2021-07-21  9:53         ` Jeff King
  1 sibling, 0 replies; 273+ messages in thread
From: Jeff King @ 2021-07-21  9:53 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Ævar Arnfjörð Bjarmason, git, dstolee, gitster,
	jonathantanmy

On Wed, Jul 14, 2021 at 01:32:21PM -0400, Taylor Blau wrote:

> > So if we're already paying for that extra space (which, on some
> > platforms would already be a 64 bit int, and on some so would the
> > uint32_t, it's just "at least 32 bits").
> >
> > Wouldn't it be more idiomatic to just have find_object_pos() return
> > int64_t now, if it's -1 it's an error, otherwise the "pos" is cast to
> > uint32_t:
> 
> I'm not sure. It does save the extra argument, which is arguably more
> convenient for callers, but the cost for doing so is a cast from a
> signed integer type to an unsigned one (and a narrower destination type,
> at that).
> 
> That seems easier to get wrong to me than passing a pointer to a pure
> "int" and keeping the return type a uint32_t. So, I'm probably more
> content to leave it as-is rather than change it.

I agree that the separate "found" value makes things more obvious.
Casting to a smaller size means explaining why that is OK, whereas it is
quite clear that things are correct with two separate variables. And
having an int64_t makes one wonder whether a value like 2^35 is possible
(it isn't; this is a uint32_t because that is the limit of objects in
the pack format).

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 03/24] pack-bitmap-write.c: free existing bitmaps
  2021-06-21 22:25   ` [PATCH v2 03/24] pack-bitmap-write.c: free existing bitmaps Taylor Blau
@ 2021-07-21  9:54     ` Jeff King
  0 siblings, 0 replies; 273+ messages in thread
From: Jeff King @ 2021-07-21  9:54 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Mon, Jun 21, 2021 at 06:25:04PM -0400, Taylor Blau wrote:

> When writing a new bitmap, the bitmap writer code attempts to read the
> existing bitmap (if one is present). This is done in order to quickly
> permute the bits of any bitmaps for commits which appear in the existing
> bitmap, and were also selected for the new bitmap.
> 
> But since this code was added in 341fa34887 (pack-bitmap-write: use
> existing bitmaps, 2020-12-08), the resources associated with opening an
> existing bitmap were never released.
> 
> It's fine to ignore this, but it's bad hygiene. It will also cause a
> problem for the multi-pack-index builtin, which will be responsible not
> only for writing bitmaps, but also for expiring any old multi-pack
> bitmaps.
> 
> If an existing bitmap was reused here, it will also be expired. That
> will cause a problem on platforms which require file resources to be
> closed before unlinking them, like Windows. Avoid this by ensuring we
> close reused bitmaps with free_bitmap_index() before removing them.

I agree with all of that. But just "it's a memory leak" would have
contented me, too. :)

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 04/24] Documentation: build 'technical/bitmap-format' by default
  2021-06-21 22:25   ` [PATCH v2 04/24] Documentation: build 'technical/bitmap-format' by default Taylor Blau
  2021-06-24 23:35     ` Ævar Arnfjörð Bjarmason
@ 2021-07-21  9:58     ` Jeff King
  2021-07-21 10:08       ` Jeff King
  1 sibling, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-07-21  9:58 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Mon, Jun 21, 2021 at 06:25:07PM -0400, Taylor Blau wrote:

> Even though the 'TECH_DOCS' variable was introduced all the way back in
> 5e00439f0a (Documentation: build html for all files in technical and
> howto, 2012-10-23), the 'bitmap-format' document was never added to that
> list when it was created.
> 
> Prepare for changes to this file by including it in the list of
> technical documentation that 'make doc' will build by default.

OK. I don't care that much about being able to format this as html, but
I agree it's good to be consistent with the other stuff in technical/.

The big question is whether it looks OK rendered by asciidoc, and the
answer seems to be "yes" (from a cursory look I gave it).

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 04/24] Documentation: build 'technical/bitmap-format' by default
  2021-07-14 22:58         ` Ævar Arnfjörð Bjarmason
@ 2021-07-21 10:04           ` Jeff King
  2021-07-21 10:10             ` Jeff King
  0 siblings, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-07-21 10:04 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Taylor Blau, git, dstolee, gitster, jonathantanmy

On Thu, Jul 15, 2021 at 12:58:00AM +0200, Ævar Arnfjörð Bjarmason wrote:

> >> But there's still no link to bitmap-format from any other manual page
> >> (though maybe one comes later in this series), whereas there is one for
> >> e.g. technical/pack-format.html.
> >
> > No, I didn't add any links pointed at bitmap-format.
> 
> I see https://git-scm.com/docs/bitmap-format has somehow managed to get
> indexed by Google, perhaps through some magic :)

Presumably somebody somewhere mentioned it (if not, the list archive now
links to it ;) ).

The git-scm.com site doesn't use our Documentation/Makefile, so it has
been building the bitmap-format page for a while. It doesn't look
correct, though (it is missing everything before the "on-disk format"
section).

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 04/24] Documentation: build 'technical/bitmap-format' by default
  2021-07-21  9:58     ` Jeff King
@ 2021-07-21 10:08       ` Jeff King
  2021-07-21 17:23         ` Taylor Blau
  0 siblings, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-07-21 10:08 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Wed, Jul 21, 2021 at 05:58:41AM -0400, Jeff King wrote:

> On Mon, Jun 21, 2021 at 06:25:07PM -0400, Taylor Blau wrote:
> 
> > Even though the 'TECH_DOCS' variable was introduced all the way back in
> > 5e00439f0a (Documentation: build html for all files in technical and
> > howto, 2012-10-23), the 'bitmap-format' document was never added to that
> > list when it was created.
> > 
> > Prepare for changes to this file by including it in the list of
> > technical documentation that 'make doc' will build by default.
> 
> OK. I don't care that much about being able to format this as html, but
> I agree it's good to be consistent with the other stuff in technical/.
> 
> The big question is whether it looks OK rendered by asciidoc, and the
> answer seems to be "yes" (from a cursory look I gave it).

Actually, I take it back. After looking more carefully, it renders quite
poorly. There's a lot of structural indentation that ends up being
confused as code blocks.

I don't know if it's better to have a poorly-formatted HTML file, or
none at all. :)

Personally, I would just read the source. And I have a slight concern
that if we start "cleaning it up" to render as asciidoc, the source
might end up a lot less readable (though I'd reserve judgement until
actually seeing it).

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 04/24] Documentation: build 'technical/bitmap-format' by default
  2021-07-21 10:04           ` Jeff King
@ 2021-07-21 10:10             ` Jeff King
  0 siblings, 0 replies; 273+ messages in thread
From: Jeff King @ 2021-07-21 10:10 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Taylor Blau, git, dstolee, gitster, jonathantanmy

On Wed, Jul 21, 2021 at 06:04:15AM -0400, Jeff King wrote:

> The git-scm.com site doesn't use our Documentation/Makefile, so it has
> been building the bitmap-format page for a while. It doesn't look
> correct, though (it is missing everything before the "on-disk format"
> section).

Doh, nevermind. That nice header section does not exist before Taylor's
series. ;)

The git-scm.com page does seem to throw away the "GIT bitmap v1 format"
title entirely. And it suffers the same rendering problems I saw when
building with asciidoc locally.

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 05/24] Documentation: describe MIDX-based bitmaps
  2021-06-21 22:25   ` [PATCH v2 05/24] Documentation: describe MIDX-based bitmaps Taylor Blau
@ 2021-07-21 10:18     ` Jeff King
  2021-07-21 17:53       ` Taylor Blau
  0 siblings, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-07-21 10:18 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Mon, Jun 21, 2021 at 06:25:10PM -0400, Taylor Blau wrote:

> +An object is uniquely described by its bit position within a bitmap:
> +
> +	- If the bitmap belongs to a packfile, the __n__th bit corresponds to
> +	the __n__th object in pack order. For a function `offset` which maps
> +	objects to their byte offset within a pack, pack order is defined as
> +	follows:
> +
> +		o1 <= o2 <==> offset(o1) <= offset(o2)
> +
> +	- If the bitmap belongs to a MIDX, the __n__th bit corresponds to the
> +	__n__th object in MIDX order. With an additional function `pack` which
> +	maps objects to the pack they were selected from by the MIDX, MIDX order
> +	is defined as follows:
> +
> +		o1 <= o2 <==> pack(o1) <= pack(o2) /\ offset(o1) <= offset(o2)
> +
> +	The ordering between packs is done lexicographically by the pack name,
> +	with the exception of the preferred pack, which sorts ahead of all other
> +	packs.

This doesn't render well as asciidoc (the final paragraph is taken as
more of the code block). But that is a problem through the whole file. I
think we should ignore it for now, and worry about asciidoc-ifying the
whole thing later, if we choose to.

> +	The ordering between packs is done lexicographically by the pack name,
> +	with the exception of the preferred pack, which sorts ahead of all other
> +	packs.

Hmm, I'm not sure if this "lexicographically" part is true. Really we're
building on the midx .rev format here. And that says "defined by the
MIDX's pack list" (though I can't offhand remember if that is
lexicographic, or if it is in the reverse-mtime order).

At any rate, should we just be referencing the rev documentation?
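
For what it's worth, one way to read the MIDX order is as a
three-level sort key: preferred pack first, then pack, then offset
within the pack. A minimal sketch, assuming packs are compared by a
numeric ID (how that ID relates to the pack name is exactly the open
question above); the struct and all names are made up for
illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical view of one object as selected by the MIDX. */
struct midx_obj_sketch {
	uint32_t pack_id;
	uint64_t offset;
};

/* File-scope because qsort() has no way to pass context to the comparator. */
static uint32_t preferred_pack_sketch;

static int midx_order_cmp_sketch(const void *va, const void *vb)
{
	const struct midx_obj_sketch *a = va, *b = vb;
	int a_pref = a->pack_id == preferred_pack_sketch;
	int b_pref = b->pack_id == preferred_pack_sketch;

	if (a_pref != b_pref)
		return b_pref - a_pref; /* the preferred pack sorts first */
	if (a->pack_id != b->pack_id)
		return a->pack_id < b->pack_id ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Sort objects into the pseudo-pack order; bit n then maps to objs[n]. */
static void sort_midx_order_sketch(struct midx_obj_sketch *objs, size_t nr,
				   uint32_t preferred)
{
	assert(objs || !nr);
	preferred_pack_sketch = preferred;
	qsort(objs, nr, sizeof(*objs), midx_order_cmp_sketch);
}
```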

> [...]

The rest of the changes to the document seemed quite sensible to me.

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 07/24] midx: clear auxiliary .rev after replacing the MIDX
  2021-06-21 22:25   ` [PATCH v2 07/24] midx: clear auxiliary .rev after replacing the MIDX Taylor Blau
@ 2021-07-21 10:19     ` Jeff King
  0 siblings, 0 replies; 273+ messages in thread
From: Jeff King @ 2021-07-21 10:19 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Mon, Jun 21, 2021 at 06:25:15PM -0400, Taylor Blau wrote:

> When writing a new multi-pack index, write_midx_internal() attempts to
> clean up any auxiliary files (currently just the MIDX's `.rev` file, but
> soon to include a `.bitmap`, too) corresponding to the MIDX it's
> replacing.
> 
> This step should happen after the new MIDX is written into place, since
> doing so beforehand means that the old MIDX could be read without its
> corresponding .rev file.

Good catch. The patch looks obviously correct.
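
The ordering amounts to the sketch below. It uses plain stdio calls
and hypothetical file names rather than Git's lockfile machinery, so
take it as an illustration of the sequence only:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Move the new file into place first; only then remove the auxiliary
 * file that belonged to the one being replaced. If the rename fails,
 * the old file and its auxiliary file are both left untouched.
 */
static int replace_then_clear_sketch(const char *tmp, const char *final,
				     const char *stale_aux)
{
	assert(tmp && final && stale_aux);
	if (rename(tmp, final) != 0)
		return -1;
	remove(stale_aux); /* best effort; the new file is already in place */
	return 0;
}
```

Doing the remove() before the rename() would leave a window in which a
reader sees the old file without its auxiliary data.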

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 08/24] midx: respect 'core.multiPackIndex' when writing
  2021-06-21 22:25   ` [PATCH v2 08/24] midx: respect 'core.multiPackIndex' when writing Taylor Blau
  2021-06-24 23:43     ` Ævar Arnfjörð Bjarmason
@ 2021-07-21 10:23     ` Jeff King
  2021-07-21 19:22       ` Taylor Blau
  1 sibling, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-07-21 10:23 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Mon, Jun 21, 2021 at 06:25:18PM -0400, Taylor Blau wrote:

> When writing a new multi-pack index, write_midx_internal() attempts to
> load any existing one to fill in some pieces of information. But it uses
> load_multi_pack_index(), which ignores the configuration
> "core.multiPackIndex", which indicates whether or not Git is allowed to
> read an existing multi-pack-index.
> 
> Replace this with a routine that does respect that setting, to avoid
> reading multi-pack-index files when told not to.
> 
> This avoids a problem that would arise in subsequent patches due to the
> combination of 'git repack' reopening the object store in-process and
> the multi-pack index code not checking whether a pack already exists in
> the object store when calling add_pack_to_midx().
> 
> This would ultimately lead to a cycle being created along the
> 'packed_git' struct's '->next' pointer. That is obviously bad, but it
> has hard-to-debug downstream effects like saying a bitmap can't be
> loaded for a pack because one already exists (for the same pack).

I'm not sure I completely understand the bug that this causes.

But another question: does this impact how

  git -c core.multipackindex=false multi-pack-index write

behaves? I.e., do we still write, but just avoid reading the existing
midx? That itself seems like a more sensible behavior (e.g., trying to
recover from a broken midx state).

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 09/24] midx: infer preferred pack when not given one
  2021-06-21 22:25   ` [PATCH v2 09/24] midx: infer preferred pack when not given one Taylor Blau
@ 2021-07-21 10:34     ` Jeff King
  2021-07-21 20:16       ` Taylor Blau
  0 siblings, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-07-21 10:34 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Mon, Jun 21, 2021 at 06:25:21PM -0400, Taylor Blau wrote:

> In 9218c6a40c (midx: allow marking a pack as preferred, 2021-03-30), the
> multi-pack index code learned how to select a pack which all duplicate
> objects are selected from. That is, if an object appears in multiple
> packs, select the copy in the preferred pack before breaking ties
> according to the other rules like pack mtime and readdir() order.
> 
> Not specifying a preferred pack can cause serious problems with
> multi-pack reachability bitmaps, because these bitmaps rely on having at
> least one pack from which all duplicates are selected. Not having such a
> pack causes problems with the pack reuse code (e.g., like assuming that
> a base object was sent from that pack via reuse when in fact the base
> was selected from a different pack).

It might be helpful to use a more descriptive name for "pack reuse code"
here, since it's kind of vague for people who have not been actively
working on bitmaps.

I don't have a short name for that chunk of code, but maybe:

  ...causes problems with the code in pack-objects to reuse packs
  verbatim (e.g., that code assumes that a delta object in a chunk of
  pack sent verbatim will have its base object sent from the same pack).

> So why does not marking a pack preferred cause problems here? The reason
> is roughly as follows:
> 
>   - Ties are broken (when handling duplicate objects) by sorting
>     according to midx_oid_compare(), which sorts objects by OID,
>     preferred-ness, pack mtime, and finally pack ID (more on that
>     later).
> 
>   - The pseudo pack-order (described in
>     Documentation/technical/bitmap-format.txt) is computed by
>     midx_pack_order(), and sorts by pack ID and pack offset, with
>     preferred packs sorting first.

I think the .rev description in pack-format.txt may be a better
reference here.

>   - But! Pack IDs come from incrementing the pack count in
>     add_pack_to_midx(), which is a callback to
>     for_each_file_in_pack_dir(), meaning that pack IDs are assigned in
>     readdir() order.
> 
> When specifying a preferred pack, all of that works fine, because
> duplicate objects are correctly resolved in favor of the copy in the
> preferred pack, and the preferred pack sorts first in the object order.
> 
> "Sorting first" is critical, because the bitmap code relies on finding
> out which pack holds the first object in the MIDX's pseudo pack-order to
> determine which pack is preferred.
> 
> But if we didn't specify a preferred pack, and the pack which comes
> first in readdir() order does not also have the lowest timestamp, then
> it's possible that that pack (the one that sorts first in pseudo-pack
> order, which the bitmap code will treat as the preferred one) did *not*
> have all duplicate objects resolved in its favor, resulting in breakage.
> 
> The fix is simple: pick a (semi-arbitrary) preferred pack when none was
> specified. This forces that pack to have duplicates resolved in its
> favor, and (critically) to sort first in pseudo-pack order.
> Unfortunately, testing this behavior portably isn't possible, since it
> depends on readdir() order which isn't guaranteed by POSIX.

This explanation is rather confusing, but I'm not sure if we can do much
better. I followed all of it, because I was there when we found the bug
that this is fixing. And of course that happened _after_ we implemented
midx bitmaps and in particular adapted the verbatim reuse stuff in
pack-objects to make use of it.

I see why you'd want to float the fix up before then, so we don't ever
have the broken state. But it's hard to understand what bug this is
fixing, because the bug does not even exist yet at this point in
the series!

I dunno. Like I said, I was able to follow it, so maybe it is
sufficient. I'm just not sure others would be able to.
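
For readers who were not there, a toy model may make the failure mode
easier to see. This is only a sketch, not the code in midx.c: the struct
and the toy_* names are made up for illustration, and the "newer mtime
wins" direction of the tie-break is an assumption. It resolves duplicates
by OID, preferred-ness, mtime, and finally pack ID, as the quoted commit
message describes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one copy of an object in one pack. */
struct toy_entry {
	char oid[8];       /* toy object name */
	uint32_t pack_id;  /* assigned in readdir() order */
	long mtime;        /* mtime of the containing pack */
	int preferred;     /* 1 if the copy lives in the preferred pack */
};

/* Order copies: OID, then preferred first, then mtime, then pack ID. */
static int toy_oid_cmp(const struct toy_entry *a, const struct toy_entry *b)
{
	int cmp = strcmp(a->oid, b->oid);
	if (cmp)
		return cmp;
	if (a->preferred != b->preferred)
		return b->preferred - a->preferred;
	if (a->mtime != b->mtime)
		return a->mtime > b->mtime ? -1 : 1; /* assumed: newer wins */
	return (a->pack_id > b->pack_id) - (a->pack_id < b->pack_id);
}

/* Return the pack ID of the copy that wins duplicate resolution. */
static uint32_t toy_winner_pack(const struct toy_entry *e, size_t n,
				const char *oid)
{
	const struct toy_entry *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(e[i].oid, oid))
			continue;
		if (!best || toy_oid_cmp(&e[i], best) < 0)
			best = &e[i];
	}
	assert(best); /* caller must ask for an object that exists */
	return best->pack_id;
}
```

With one object duplicated in pack 0 and a newer pack 1, and no preferred
pack, the winner is the copy in pack 1, even though pack 0 sorts first in
the pseudo-pack order. Marking pack 0 as preferred makes the two agree.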

> +
> +		if (!found)
> +			warning(_("unknown preferred pack: '%s'"),
> +				preferred_pack_name);
> +	} else if (ctx.nr && (flags & MIDX_WRITE_REV_INDEX)) {
> +		time_t oldest = ctx.info[0].p->mtime;
> +		ctx.preferred_pack_idx = 0;
> +
> +		if (packs_to_drop && packs_to_drop->nr)
> +			BUG("cannot write a MIDX bitmap during expiration");

Likewise, this BUG() feels somewhat out-of-place. At this point in the
series, we don't have bitmaps yet. :)

I can live with that, though. And I don't want to make a lot of work by
trying to re-order this patch within the series. Mostly I want to make
sure that if somebody stumbles on this commit via git-log or in a
bisection, that they can make some sense of what it's trying to do.

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 10/24] pack-bitmap.c: introduce 'bitmap_num_objects()'
  2021-06-21 22:25   ` [PATCH v2 10/24] pack-bitmap.c: introduce 'bitmap_num_objects()' Taylor Blau
@ 2021-07-21 10:35     ` Jeff King
  0 siblings, 0 replies; 273+ messages in thread
From: Jeff King @ 2021-07-21 10:35 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Mon, Jun 21, 2021 at 06:25:23PM -0400, Taylor Blau wrote:

> A subsequent patch to support reading MIDX bitmaps will be less noisy
> after extracting a generic function to return how many objects are
> contained in a bitmap.

Thanks for this. These kinds of preparatory patches make reading the
actual tricky changes so much easier.

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 11/24] pack-bitmap.c: introduce 'nth_bitmap_object_oid()'
  2021-06-21 22:25   ` [PATCH v2 11/24] pack-bitmap.c: introduce 'nth_bitmap_object_oid()' Taylor Blau
  2021-06-24 14:59     ` Taylor Blau
@ 2021-07-21 10:37     ` Jeff King
  2021-07-21 10:38       ` Jeff King
  1 sibling, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-07-21 10:37 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Mon, Jun 21, 2021 at 06:25:26PM -0400, Taylor Blau wrote:

> A subsequent patch to support reading MIDX bitmaps will be less noisy
> after extracting a generic function to fetch the nth OID contained in
> the bitmap.

Makes sense, but...

> +static void nth_bitmap_object_oid(struct bitmap_index *index,
> +				  struct object_id *oid,
> +				  uint32_t n)
> +{
> +	nth_packed_object_id(oid, index->pack, n);
> +}
> +
>  static int load_bitmap_entries_v1(struct bitmap_index *index)
>  {
>  	uint32_t i;
> @@ -242,9 +249,7 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
>  		xor_offset = read_u8(index->map, &index->map_pos);
>  		flags = read_u8(index->map, &index->map_pos);
>  
> -		if (nth_packed_object_id(&oid, index->pack, commit_idx_pos) < 0)
> -			return error("corrupt ewah bitmap: commit index %u out of range",
> -				     (unsigned)commit_idx_pos);
> +		nth_bitmap_object_oid(index, &oid, commit_idx_pos);

What happened to our error check here?

Should nth_bitmap_object_oid() be returning the value from
nth_packed_object_id()?

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 11/24] pack-bitmap.c: introduce 'nth_bitmap_object_oid()'
  2021-07-21 10:37     ` Jeff King
@ 2021-07-21 10:38       ` Jeff King
  0 siblings, 0 replies; 273+ messages in thread
From: Jeff King @ 2021-07-21 10:38 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Wed, Jul 21, 2021 at 06:37:41AM -0400, Jeff King wrote:

> > @@ -242,9 +249,7 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
> >  		xor_offset = read_u8(index->map, &index->map_pos);
> >  		flags = read_u8(index->map, &index->map_pos);
> >  
> > -		if (nth_packed_object_id(&oid, index->pack, commit_idx_pos) < 0)
> > -			return error("corrupt ewah bitmap: commit index %u out of range",
> > -				     (unsigned)commit_idx_pos);
> > +		nth_bitmap_object_oid(index, &oid, commit_idx_pos);
> 
> What happened to our error check here?
> 
> Should nth_bitmap_object_oid() be returning the value from
> nth_packed_object_id()?

Ah, sorry, I just saw your followup message. I'll look for the fix in
the re-roll.

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 12/24] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()'
  2021-06-21 22:25   ` [PATCH v2 12/24] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()' Taylor Blau
@ 2021-07-21 10:39     ` Jeff King
  2021-07-21 20:18       ` Taylor Blau
  0 siblings, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-07-21 10:39 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Mon, Jun 21, 2021 at 06:25:29PM -0400, Taylor Blau wrote:

> In a recent commit, pack-objects learned support for the
> 'pack.preferBitmapTips' configuration. This patch prepares the
> multi-pack bitmap code to respect this configuration, too.
> 
> Since the multi-pack bitmap code already does a traversal of all
> references (in order to discover the set of reachable commits in the
> multi-pack index), it is more efficient to check whether or not each
> reference is a suffix of any value of 'pack.preferBitmapTips' rather
> than do an additional traversal.
> 
> Implement a function 'bitmap_is_preferred_refname()' which does just
> that. The caller will be added in a subsequent patch.

I suspect there was some patch reordering here. We don't have any
multi-pack bitmap code yet. :)

Probably this needs to say something like "in preparation for adding
multi-pack bitmap code..." or similar?

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 13/24] pack-bitmap: read multi-pack bitmaps
  2021-06-21 22:25   ` [PATCH v2 13/24] pack-bitmap: read multi-pack bitmaps Taylor Blau
@ 2021-07-21 11:32     ` Jeff King
  2021-07-21 23:01       ` Taylor Blau
  0 siblings, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-07-21 11:32 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Mon, Jun 21, 2021 at 06:25:31PM -0400, Taylor Blau wrote:

> This prepares the code in pack-bitmap to interpret the new multi-pack
> bitmaps described in Documentation/technical/bitmap-format.txt, which
> mostly involves converting bit positions to accommodate looking them up
> in a MIDX.
> 
> Note that there are currently no writers who write multi-pack bitmaps,
> and that this will be implemented in the subsequent commit.

There's quite a lot going on in this one, of course, but most of it
looks right. A few hunks did puzzle me:

> @@ -302,12 +377,18 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
>  		return -1;
>  	}
>  
> -	if (bitmap_git->pack) {
> +	if (bitmap_git->pack || bitmap_git->midx) {
> +		/* ignore extra bitmap file; we can only handle one */
>  		warning("ignoring extra bitmap file: %s", packfile->pack_name);
>  		close(fd);
>  		return -1;
>  	}
>  
> +	if (!is_pack_valid(packfile)) {
> +		close(fd);
> +		return -1;
> +	}
> +

What's this extra is_pack_valid() doing? I wouldn't expect many changes
at all to this non-midx code path (aside from the "did we already load a
midx bitmap" in the earlier part of the hunk, which makes sense).

> -static int load_pack_bitmap(struct bitmap_index *bitmap_git)
> +static int load_reverse_index(struct bitmap_index *bitmap_git)
> +{
> +	if (bitmap_is_midx(bitmap_git)) {
> +		uint32_t i;
> +		int ret;
> +
> +		ret = load_midx_revindex(bitmap_git->midx);
> +		if (ret)
> +			return ret;
> +
> +		for (i = 0; i < bitmap_git->midx->num_packs; i++) {
> +			if (prepare_midx_pack(the_repository, bitmap_git->midx, i))
> +				die(_("load_reverse_index: could not open pack"));
> +			ret = load_pack_revindex(bitmap_git->midx->packs[i]);
> +			if (ret)
> +				return ret;
> +		}
> +		return 0;
> +	}
> +	return load_pack_revindex(bitmap_git->pack);
> +}

OK, this new function is used in load_bitmap(), which is used for both
pack and midx bitmaps. So if we have a midx bitmap, we'll
unconditionally load the revindex here. But:

  - why do we then load individual pack revindexes? I can believe it may
    be necessary to meet the assumptions of some other part of the code,
    but it would be nice to have a comment giving us some clue.

  - in open_midx_bitmap_1(), we also unconditionally load the midx
    reverse index. I think that will always happen before us here (we
    cannot load_bitmap() a bitmap that has not been opened). So is this
    load_midx_revindex() call always a noop?

> +static int open_bitmap(struct repository *r,
> +		       struct bitmap_index *bitmap_git)
> +{
> +	assert(!bitmap_git->map);
> +
> +	if (!open_midx_bitmap(r, bitmap_git))
> +		return 0;
> +	return open_pack_bitmap(r, bitmap_git);
> +}

We always prefer a midx bitmap over a pack one. That makes sense, since
that means we can leave old pack bitmaps in place when generating midx
bitmaps, if we choose to.

>  static int bitmap_position(struct bitmap_index *bitmap_git,
>  			   const struct object_id *oid)
>  {
> -	int pos = bitmap_position_packfile(bitmap_git, oid);
> +	int pos;
> +	if (bitmap_is_midx(bitmap_git))
> +		pos = bitmap_position_midx(bitmap_git, oid);
> +	else
> +		pos = bitmap_position_packfile(bitmap_git, oid);
>  	return (pos >= 0) ? pos : bitmap_position_extended(bitmap_git, oid);
>  }

Makes sense. Not new in your patch, but this "int" return is fudging the
same 32-bit space we were talking about elsewhere (i.e., "pos" really
could be 2^32, or even more due to extended objects).

In practice I think even 2^31 objects is pretty out-of-reach, but it may
be worth changing the return type (and the callers), or even just
catching the overflow with an assertion.
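
As a rough illustration of the "catch the overflow" option, here is a
minimal sketch. The helper name is hypothetical, and it assumes that
extended objects are numbered after the packed ones (position =
num_objects + index into the extended list). In-tree code would more
likely call BUG() than return a status:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the bit position of an extended object
 * and report whether it still fits in a non-negative "int".
 * Returns 1 and stores the position in *out on success, 0 on overflow.
 */
static int toy_extended_pos(uint32_t num_objects, uint32_t ext_index,
			    int *out)
{
	/* Do the addition in 64 bits so it cannot wrap. */
	uint64_t pos = (uint64_t)num_objects + ext_index;

	if (pos > INT_MAX)
		return 0;
	*out = (int)pos;
	return 1;
}
```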

> @@ -752,8 +911,13 @@ static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
>  		struct object *object = roots->item;
>  		roots = roots->next;
>  
> -		if (find_pack_entry_one(object->oid.hash, bitmap_git->pack) > 0)
> -			return 1;
> +		if (bitmap_is_midx(bitmap_git)) {
> +			if (bsearch_midx(&object->oid, bitmap_git->midx, NULL))
> +				return 1;
> +		} else {
> +			if (find_pack_entry_one(object->oid.hash, bitmap_git->pack) > 0)
> +				return 1;
> +		}
>  	}

Makes sense. TBH, I am not sure this in_bitmapped_pack() function is all
that useful. It is used only as part of a heuristic to avoid bitmaps
when we don't have coverage of any "have" commits. But I'm not sure that
heuristic is actually useful.

Anyway, we should definitely not get into ripping it out here. This
series is complicated enough. :) Just a note for possible future work.

>  	if (pos < bitmap_num_objects(bitmap_git)) {
> -		off_t ofs = pack_pos_to_offset(pack, pos);
> +		struct packed_git *pack;
> +		off_t ofs;
> +
> +		if (bitmap_is_midx(bitmap_git)) {
> +			uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
> +			uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
> +
> +			pack = bitmap_git->midx->packs[pack_id];
> +			ofs = nth_midxed_offset(bitmap_git->midx, midx_pos);
> +		} else {
> +			pack = bitmap_git->pack;
> +			ofs = pack_pos_to_offset(pack, pos);
> +		}
> +

All of the hunks like this make perfect sense. The big problem would be
if we _missed_ a place that needed conversion to handle midx. But the
nice thing is that it would segfault quickly in such an instance. So
there I'm mostly relying on test coverage, plus our experience running
with this code at scale.

>  static void try_partial_reuse(struct bitmap_index *bitmap_git,
> +			      struct packed_git *pack,
>  			      size_t pos,
>  			      struct bitmap *reuse,
>  			      struct pack_window **w_curs)
>  {
> -	off_t offset, header;
> +	off_t offset, delta_obj_offset;

I'm OK with all of this in one big patch. But I suspect you _could_
just put:

  if (bitmap_git->midx)
	return; /* partial reuse not implemented for midx yet */

to start with, and then actually implement it later. I call out this
code in particular just because it's got a lot of subtleties (the
"reuse" bits are much more intimate with the assumptions of packs and
bitmaps than most other code).

I'm not sure if it's worth the trouble at this point or not.

>  	enum object_type type;
>  	unsigned long size;
>  
> -	if (pos >= bitmap_num_objects(bitmap_git))
> -		return; /* not actually in the pack or MIDX */
> +	/*
> +	 * try_partial_reuse() is called either on (a) objects in the
> +	 * bitmapped pack (in the case of a single-pack bitmap) or (b)
> +	 * objects in the preferred pack of a multi-pack bitmap.
> +	 * Importantly, the latter can pretend as if only a single pack
> +	 * exists because:
> +	 *
> +	 *   - The first pack->num_objects bits of a MIDX bitmap are
> +	 *     reserved for the preferred pack, and
> +	 *
> +	 *   - Ties due to duplicate objects are always resolved in
> +	 *     favor of the preferred pack.
> +	 *
> +	 * Therefore we do not need to ever ask the MIDX for its copy of
> +	 * an object by OID, since it will always select it from the
> +	 * preferred pack. Likewise, the selected copy of the base
> +	 * object for any deltas will reside in the same pack.
> +	 *
> +	 * This means that we can reuse pos when looking up the bit in
> +	 * the reuse bitmap, too, since bits corresponding to the
> +	 * preferred pack precede all bits from other packs.
> +	 */
>  
> -	offset = header = pack_pos_to_offset(bitmap_git->pack, pos);
> -	type = unpack_object_header(bitmap_git->pack, w_curs, &offset, &size);
> +	if (pos >= pack->num_objects)
> +		return; /* not actually in the pack or MIDX preferred pack */

It feels weird to go from bitmap_num_objects() back to
pack->num_objects. But I agree it's the right thing for the "pretend as
if only a single pack exists" reasons given above.

> +static uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git)
> +{
> +	struct multi_pack_index *m = bitmap_git->midx;
> +	if (!m)
> +		BUG("midx_preferred_pack: requires non-empty MIDX");
> +	return nth_midxed_pack_int_id(m, pack_pos_to_midx(bitmap_git->midx, 0));
> +}

This part is really subtle. We infer the preferred pack by looking at
the pack of the 0th bit position. In general that works, since that's
part of the definition of the preferred pack.

Could this ever be fooled if we had a preferred pack with 0 objects in
it? I don't know why we would have such a thing, but just trying to
think of cases where our assumptions might not hold (and what bad things
could happen).
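
To make the question concrete, here is a toy model of the inference. It
is a sketch under stated assumptions, not the real code: the function
name is made up, and it assumes the pseudo-pack order lists the preferred
pack's objects first and then the remaining packs by ascending pack ID.
Whether the writer can ever produce an empty preferred pack is not
something this model answers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: counts[i] is the number of objects selected from
 * pack i. Return the pack that owns bit position 0, which is what the
 * reader would treat as the preferred pack, or -1 if there are no
 * objects at all.
 */
static int toy_inferred_preferred(const uint32_t *counts, size_t nr_packs,
				  size_t preferred)
{
	size_t i;

	assert(preferred < nr_packs);

	/* The preferred pack sorts first, if it has any objects. */
	if (counts[preferred])
		return (int)preferred;

	/* Otherwise bit 0 belongs to the lowest-numbered non-empty pack. */
	for (i = 0; i < nr_packs; i++)
		if (counts[i])
			return (int)i;
	return -1;
}
```

In this model, an empty preferred pack means the reader infers a
different pack than the writer intended.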

> +	if (bitmap_is_midx(bitmap_git))
> +		pack = bitmap_git->midx->packs[midx_preferred_pack(bitmap_git)];
> +	else
> +		pack = bitmap_git->pack;
> +	objects_nr = pack->num_objects;
> +
>  	while (i < result->word_alloc && result->words[i] == (eword_t)~0)
>  		i++;
>  
> -	/* Don't mark objects not in the packfile */
> +	/*
> +	 * Don't mark objects not in the packfile or preferred pack. This bitmap
> +	 * marks objects eligible for reuse, but the pack-reuse code only
> +	 * understands how to reuse a single pack. Since the preferred pack is
> +	 * guaranteed to have all bases for its deltas (in a multi-pack bitmap),
> +	 * we use it instead of another pack. In single-pack bitmaps, the choice
> +	 * is made for us.
> +	 */
>  	if (i > objects_nr / BITS_IN_EWORD)
>  		i = objects_nr / BITS_IN_EWORD;

OK, so this clamps our "quick" contiguous set of bits to the number of
objects in the preferred pack. Makes sense. And then we hit the
object-by-object loop below...

> @@ -1213,7 +1437,15 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
>  				break;
>  
>  			offset += ewah_bit_ctz64(word >> offset);
> -			try_partial_reuse(bitmap_git, pos + offset, reuse, &w_curs);
> +			if (bitmap_is_midx(bitmap_git)) {
> +				/*
> +				 * Can't reuse from a non-preferred pack (see
> +				 * above).
> +				 */
> +				if (pos + offset >= objects_nr)
> +					continue;
> +			}
> +			try_partial_reuse(bitmap_git, pack, pos + offset, reuse, &w_curs);

...and this likewise makes sure we never go past that first pack. Good.

I think this "continue" could actually be a "break", as the loop is
iterating over "offset" (and "pos + offset" always gets larger). In
fact, it could break out of the outer loop as well (which is
incrementing "pos"). It's probably a pretty small efficiency in
practice, though.
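
A standalone sketch of the early exit, for illustration only: the names
are hypothetical and toy_ctz64() stands in for ewah_bit_ctz64(). The
"visited" counter exists just to show how many set bits the loop looks at
before stopping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BITS_IN_WORD 64

/* Count trailing zero bits; the caller guarantees w != 0. */
static int toy_ctz64(uint64_t w)
{
	int n = 0;

	assert(w);
	while (!(w & 1)) {
		w >>= 1;
		n++;
	}
	return n;
}

/*
 * Count set bits whose position is below "limit" (the number of objects
 * in the preferred pack). Positions only grow, so the first set bit at
 * or past the limit ends both loops.
 */
static size_t toy_count_reusable(const uint64_t *words, size_t nr_words,
				 size_t limit, size_t *visited)
{
	size_t i, pos = 0, count = 0;

	for (i = 0; i < nr_words; i++, pos += TOY_BITS_IN_WORD) {
		uint64_t word = words[i];
		size_t offset;

		for (offset = 0; offset < TOY_BITS_IN_WORD; offset++) {
			if ((word >> offset) == 0)
				break;
			offset += toy_ctz64(word >> offset);
			(*visited)++;
			if (pos + offset >= limit)
				return count; /* nothing later can qualify */
			count++;
		}
	}
	return count;
}
```

With a "continue" instead, every remaining set bit in every remaining
word would still be visited and rejected one by one.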

> @@ -1511,8 +1749,13 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
>  		struct object_id oid;
>  		struct object_entry *oe;
>  
> -		nth_packed_object_id(&oid, bitmap_git->pack,
> -				     pack_pos_to_index(bitmap_git->pack, i));
> +		if (bitmap_is_midx(bitmap_git))
> +			nth_midxed_object_oid(&oid,
> +					      bitmap_git->midx,
> +					      pack_pos_to_midx(bitmap_git->midx, i));
> +		else
> +			nth_packed_object_id(&oid, bitmap_git->pack,
> +					     pack_pos_to_index(bitmap_git->pack, i));
>  		oe = packlist_find(mapping, &oid);

Could this be using nth_bitmap_object_oid()? I guess not, because we are
feeding from pack_pos_to_*. I'm not sure if another helper function is
worth it (pack_pos_to_bitmap_index() or something?).

> @@ -1575,7 +1831,31 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
>  				break;
>  
>  			offset += ewah_bit_ctz64(word >> offset);
> -			pos = base + offset;
> +
> +			if (bitmap_is_midx(bitmap_git)) {
> +				uint32_t pack_pos;
> +				uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, base + offset);
> +				uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
> +				off_t offset = nth_midxed_offset(bitmap_git->midx, midx_pos);
> +
> +				pack = bitmap_git->midx->packs[pack_id];
> +
> +				if (offset_to_pack_pos(pack, offset, &pack_pos) < 0) {
> +					struct object_id oid;
> +					nth_midxed_object_oid(&oid, bitmap_git->midx, midx_pos);
> +
> +					die(_("could not find %s in pack #%"PRIu32" at offset %"PRIuMAX),
> +					    oid_to_hex(&oid),
> +					    pack_id,
> +					    (uintmax_t)offset);
> +				}
> +
> +				pos = pack_pos;
> +			} else {
> +				pack = bitmap_git->pack;
> +				pos = base + offset;
> +			}
> +
>  			total += pack_pos_to_offset(pack, pos + 1) -
>  				 pack_pos_to_offset(pack, pos);
>  		}

In the midx case, we have to go from midx-bitmap-pos to midx-index-pos,
to then get the pack/ofs combo, which then gives us a real "pos" in the
pack. I don't think there's a faster way to do it (and this is still
much faster than looking up objects in the pack only to check their
revindex).

But then with the result, we compare the offset of "pos" and "pos + 1".
We need to know "pos" to find "pos + 1". But in the midx case, don't we
already have the offset of "pos"? It is "offset" in the bitmap_is_midx()
conditional, which shadows the completely unrelated "offset" in the
outer loop.

We could reuse it, saving ourselves an extra round-trip of pack_pos to
index_pos to offset. It would just mean stuffing the "total +=" line
into the two sides of the conditional.
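
Roughly, the idea looks like this. It is a toy model with made-up names,
and it assumes that asking for the offset at position nr_objects yields
the end of the object data, so that the last object's size can be
computed the same way as any other:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pack model: object offsets sorted by pack position. */
struct toy_pack {
	const uint64_t *offsets;
	uint32_t nr_objects;
	uint64_t end_of_objects; /* offset just past the last object */
};

static uint64_t toy_pos_to_offset(const struct toy_pack *p, uint32_t pos)
{
	assert(pos <= p->nr_objects);
	return pos < p->nr_objects ? p->offsets[pos] : p->end_of_objects;
}

/* Pack case: only "pos" is known, so both offsets are looked up. */
static uint64_t toy_size_two_lookups(const struct toy_pack *p, uint32_t pos)
{
	return toy_pos_to_offset(p, pos + 1) - toy_pos_to_offset(p, pos);
}

/* MIDX case: the object's own offset is already in hand. */
static uint64_t toy_size_known_offset(const struct toy_pack *p, uint32_t pos,
				      uint64_t known_offset)
{
	return toy_pos_to_offset(p, pos + 1) - known_offset;
}
```

Both functions return the same on-disk size; the second one just skips
the lookup whose answer the caller already has.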

> +off_t bitmap_pack_offset(struct bitmap_index *bitmap_git, uint32_t pos)
> +{
> +	if (bitmap_is_midx(bitmap_git))
> +		return nth_midxed_offset(bitmap_git->midx,
> +					 pack_pos_to_midx(bitmap_git->midx, pos));
> +	return nth_packed_object_offset(bitmap_git->pack,
> +					pack_pos_to_index(bitmap_git->pack, pos));
> +}

Does anybody call this function? I don't see any users by the end of the
series.

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 14/24] pack-bitmap: write multi-pack bitmaps
  2021-06-21 22:25   ` [PATCH v2 14/24] pack-bitmap: write " Taylor Blau
  2021-06-24 23:45     ` Ævar Arnfjörð Bjarmason
@ 2021-07-21 12:09     ` Jeff King
  2021-07-26 18:12       ` Taylor Blau
  1 sibling, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-07-21 12:09 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Mon, Jun 21, 2021 at 06:25:34PM -0400, Taylor Blau wrote:

> +static int add_ref_to_pending(const char *refname,
> +			      const struct object_id *oid,
> +			      int flag, void *cb_data)
> +{
> +	struct rev_info *revs = (struct rev_info*)cb_data;
> +	struct object *object;
> +
> +	if ((flag & REF_ISSYMREF) && (flag & REF_ISBROKEN)) {
> +		warning("symbolic ref is dangling: %s", refname);
> +		return 0;
> +	}
> +
> +	object = parse_object_or_die(oid, refname);
> +	if (object->type != OBJ_COMMIT)
> +		return 0;
> +
> +	add_pending_object(revs, object, "");
> +	if (bitmap_is_preferred_refname(revs->repo, refname))
> +		object->flags |= NEEDS_BITMAP;
> +	return 0;
> +}

OK, so we'll look at each ref to get the set of commits that we want to
traverse to put into the bitmap. Which is roughly the same as what the
pack bitmap does. We only generate bitmaps for all-into-one repacks, so
it is traversing all of the reachable objects. It is a little different
in that the pack version is probably hitting reflogs, but IMHO we are
better off to ignore reflogs for the purposes of bitmaps (I would
suggest to do so in the pack-bitmap case, too, except that it is
combined with the "what to pack" traversal there, and by the time we see
each commit we don't know how we got there).

> +struct bitmap_commit_cb {
> +	struct commit **commits;
> +	size_t commits_nr, commits_alloc;
> +
> +	struct write_midx_context *ctx;
> +};
> +
> +static const struct object_id *bitmap_oid_access(size_t index,
> +						 const void *_entries)
> +{
> +	const struct pack_midx_entry *entries = _entries;
> +	return &entries[index].oid;
> +}
> +
> +static void bitmap_show_commit(struct commit *commit, void *_data)
> +{
> +	struct bitmap_commit_cb *data = _data;
> +	if (oid_pos(&commit->object.oid, data->ctx->entries,
> +		    data->ctx->entries_nr,
> +		    bitmap_oid_access) > -1) {

This "> -1" struck me as a little bit funny. Perhaps ">= 0" would be a
more obvious way of saying "we found it"?

> +	/*
> +	 * Skipping promisor objects here is intentional, since it only excludes
> +	 * them from the list of reachable commits that we want to select from
> +	 * when computing the selection of MIDX'd commits to receive bitmaps.
> +	 *
> +	 * Reachability bitmaps do require that their objects be closed under
> +	 * reachability, but fetching any objects missing from promisors at this
> +	 * point is too late. But, if one of those objects can be reached from
> +	 * another object that is included in the bitmap, then we will
> +	 * complain later that we don't have reachability closure (and fail
> +	 * appropriately).
> +	 */
> +	fetch_if_missing = 0;
> +	revs.exclude_promisor_objects = 1;

Makes sense.

> +	/*
> +	 * Pass selected commits in topo order to match the behavior of
> +	 * pack-bitmaps when configured with delta islands.
> +	 */
> +	revs.topo_order = 1;
> +	revs.sort_order = REV_SORT_IN_GRAPH_ORDER;

Hmm. Why do we want to match this side effect of delta islands here?

The only impact this has is on the order of commits we feed for bitmap
selection (and during the actual generation phase, it may impact
visitation order).

Now I'm of the opinion that topo order is probably the best thing for
bitmap generation (since the bitmaps themselves are connected to the
graph structure). But if it is the best thing, shouldn't we perhaps be
turning on topo-order for single-pack bitmaps, too?

And if it isn't the best thing, then why would we want it here?

> +	if (prepare_revision_walk(&revs))
> +		die(_("revision walk setup failed"));

We call init_revisions(), and then go straight to
prepare_revision_walk() with no call to setup_revisions() between. It
doesn't seem to be clearly documented, but I think you're supposed to,
as it finalizes some bits like diff_setup_done().

I suspect it works OK in practice, and I did find a few other spots that
do not call it (e.g., builtin/am.c:write_commit_patch). But most spots
do at least an empty setup_revisions(0, NULL, &rev, NULL).

> +	/*
> +	 * Build the MIDX-order index based on pdata.objects (which is already
> +	 * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of
> +	 * this order).
> +	 */
> +	ALLOC_ARRAY(index, pdata.nr_objects);
> +	for (i = 0; i < pdata.nr_objects; i++)
> +		index[i] = (struct pack_idx_entry *)&pdata.objects[i];

This cast is correct because the pack_idx_entry is at the start of each
object_entry. But maybe:

  index[i] = &pdata.objects[i].idx;

would be less scary looking?
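
A small sketch of why the two spellings are equivalent. The toy structs
below are simplified stand-ins, not the real definitions; the only
property they share with the real ones, per the remark above, is that the
idx entry is the first member:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical versions of the two structs. */
struct toy_idx_entry {
	unsigned char oid[20];
	uint32_t crc32;
	uint64_t offset;
};

struct toy_object_entry {
	struct toy_idx_entry idx; /* must stay the first member */
	unsigned long size;
};

/* What the patch does: rely on the first-member layout via a cast. */
static struct toy_idx_entry *toy_entry_by_cast(struct toy_object_entry *e)
{
	return (struct toy_idx_entry *)e;
}

/* The suggested spelling: name the member, no cast needed. */
static struct toy_idx_entry *toy_entry_by_member(struct toy_object_entry *e)
{
	return &e->idx;
}
```

The member form keeps working (or fails to compile in an obvious way) if
somebody later reorders the struct, whereas the cast would silently point
at the wrong data.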

> +	/*
> +	 * bitmap_writer_finish expects objects in lex order, but pack_order
> +	 * gives us exactly that. use it directly instead of re-sorting the
> +	 * array.
> +	 *
> +	 * This changes the order of objects in 'index' between
> +	 * bitmap_writer_build_type_index and bitmap_writer_finish.
> +	 *
> +	 * The same re-ordering takes place in the single-pack bitmap code via
> +	 * write_idx_file(), which is called by finish_tmp_packfile(), which
> +	 * happens between bitmap_writer_build_type_index() and
> +	 * bitmap_writer_finish().
> +	 */
> +	for (i = 0; i < pdata.nr_objects; i++)
> +		index[ctx->pack_order[i]] = (struct pack_idx_entry *)&pdata.objects[i];

Ditto here.

> +	bitmap_writer_select_commits(commits, commits_nr, -1);

Not related to your patch, but I had to refresh my memory on what this
"-1" was for. It's "max_bitmaps", and is ignored if it's negative. But
the only callers pass "-1"! So we could get rid of it entirely.

It probably makes sense to leave that cleanup out of this
already-complicated series. But maybe worth doing later on top.

> @@ -930,9 +1100,16 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
>  		for (i = 0; i < ctx.m->num_packs; i++) {
>  			ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
>  
> +			if (prepare_midx_pack(the_repository, ctx.m, i)) {
> +				error(_("could not load pack %s"),
> +				      ctx.m->pack_names[i]);
> +				result = 1;
> +				goto cleanup;
> +			}

It might be worth a comment here. I can easily believe that there is
some later part of the bitmap generation code that assumes the packs are
loaded. But somebody reading this is not likely to understand why it's
here.

Should this be done conditionally only if we're writing a bitmap? (That
might also make it obvious why we are doing it).

> @@ -947,8 +1124,26 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
>  	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
>  	stop_progress(&ctx.progress);
>  
> -	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
> -		goto cleanup;
> +	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop) {
> +		struct bitmap_index *bitmap_git;
> +		int bitmap_exists;
> +		int want_bitmap = flags & MIDX_WRITE_BITMAP;
> +
> +		bitmap_git = prepare_bitmap_git(the_repository);
> +		bitmap_exists = bitmap_git && bitmap_is_midx(bitmap_git);
> +		free_bitmap_index(bitmap_git);
> +
> +		if (bitmap_exists || !want_bitmap) {
> +			/*
> +			 * The correct MIDX already exists, and so does a
> +			 * corresponding bitmap (or one wasn't requested).
> +			 */
> +			if (!want_bitmap)
> +				clear_midx_files_ext(the_repository, ".bitmap",
> +						     NULL);
> +			goto cleanup;
> +		}
> +	}

So this makes "git multi-pack-index write --bitmap" actually write
a bitmap, even if the midx itself didn't need updating? Sounds good.
Likewise, we'll delete a bitmap if one exists but we were not requested
to write one. Makes sense.

I do think nice-to-have bits like this could have come in a separate
patch with their own explanation and tests. It may not be worth trying
to extract it at this point, though.

> @@ -1075,9 +1271,6 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
>  	hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR);
>  	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
>  
> -	if (ctx.m)
> -		close_midx(ctx.m);
> -
>  	if (ctx.nr - dropped_packs == 0) {
>  		error(_("no pack files to index."));
>  		result = 1;

I'm not sure what this hunk is doing. We do pick up the close_midx()
call at the end of the function, amidst the other cleanup.

I expect the answer is something like "we need it open when we generate
the bitmaps". But it makes me wonder if we could hit any cases where we
try to overwrite it while it's still open, which would cause problems on
Windows.

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 00/24] multi-pack reachability bitmaps
  2021-07-15 14:36     ` Taylor Blau
@ 2021-07-21 12:12       ` Jeff King
  0 siblings, 0 replies; 273+ messages in thread
From: Jeff King @ 2021-07-21 12:12 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Ævar Arnfjörð Bjarmason, git, dstolee, gitster,
	jonathantanmy

On Thu, Jul 15, 2021 at 10:36:53AM -0400, Taylor Blau wrote:

> I know that reviewing this is on Peff's list of things to do, but there
> are competing priorities (and we have all-company meetings this week, so
> I would not be surprised to see a lack of movement until at least next
> week).

I made it through a very careful read of all of the actual code changes,
through patch 14. After that, it looks like it's all tests, but my brain
is now effectively fried. I had some comments, but nothing
earth-shattering.

I'll pick up on 15-24 later, though I think between my comments and
Ævar's there's some light changes to be made. So don't hesitate to post
a new version in the meantime, which can save us a round-trip.

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 01/24] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps
  2021-07-21  9:45     ` Jeff King
@ 2021-07-21 17:15       ` Taylor Blau
  0 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-21 17:15 UTC (permalink / raw)
  To: Jeff King; +Cc: git, dstolee, gitster, jonathantanmy

On Wed, Jul 21, 2021 at 05:45:12AM -0400, Jeff King wrote:
> > +{
> > +	enum object_type bitmap_type = OBJ_NONE;
> > [...]
> > +
> > +	if (!bitmap_type)
> > +		die("object %s not found in type bitmaps",
> > +		    oid_to_hex(&obj->oid));
>
> I think the suggestion to do:
>
>   if (bitmap_type == OBJ_NONE)
>
> is reasonable here, as it assumes less about the enum. I do think
> OBJ_BAD and OBJ_NONE were chosen with these kinds of numeric comparisons
> in mind, but there is no reason to rely on them in places we don't need
> to.

I had to double check your suggestion, because my first question was
"what if bitmap_type is OBJ_BAD?" We can call type_name() on OBJ_BAD, but
it will return NULL, and we use the return value in a format string
unconditionally.

So that would be a problem, but it's impossible for this to ever be
OBJ_BAD, because we only set it based on the type bitmaps; so it's
either a commit/tree/blob/tag, or none (but not bad).
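
A small sketch of that reasoning follows. The enum values are my
recollection of git's object_type and should be treated as an assumption,
and both helpers are simplified stand-ins (hence the _sketch suffix), not
the real type_name() or the code in test_bitmap_walk():

```c
#include <stddef.h>

/* Assumed to mirror the relevant part of git's enum object_type. */
enum object_type {
	OBJ_BAD = -1,
	OBJ_NONE = 0,
	OBJ_COMMIT = 1,
	OBJ_TREE = 2,
	OBJ_BLOB = 3,
	OBJ_TAG = 4
};

/* Stand-in for type_name(): NULL for anything that has no name. */
static const char *type_name_sketch(enum object_type type)
{
	switch (type) {
	case OBJ_COMMIT: return "commit";
	case OBJ_TREE:   return "tree";
	case OBJ_BLOB:   return "blob";
	case OBJ_TAG:    return "tag";
	default:         return NULL;
	}
}

/*
 * Derive a type purely from which type bitmap has the object's bit set.
 * The result is one of the four real types or OBJ_NONE, never OBJ_BAD.
 */
static enum object_type type_from_bitmaps_sketch(int in_commits, int in_trees,
						 int in_blobs, int in_tags)
{
	if (in_commits)
		return OBJ_COMMIT;
	if (in_trees)
		return OBJ_TREE;
	if (in_blobs)
		return OBJ_BLOB;
	if (in_tags)
		return OBJ_TAG;
	return OBJ_NONE;
}
```

With these values "!bitmap_type" and "bitmap_type == OBJ_NONE" agree, but
only the latter stays correct if the enum's numbering ever changes.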

I took your suggestion, thanks.

> -Peff

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 02/24] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
  2021-07-21  9:50     ` Jeff King
@ 2021-07-21 17:20       ` Taylor Blau
  2021-07-23  7:37         ` Jeff King
  0 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-07-21 17:20 UTC (permalink / raw)
  To: Jeff King; +Cc: git, dstolee, gitster, jonathantanmy

On Wed, Jul 21, 2021 at 05:50:43AM -0400, Jeff King wrote:
> The amount of error-plumbing you had to do is a little unpleasant, but I
> think is unavoidable. The only non-obvious part was this hunk:

Agreed, at least on the amount of plumbing required to get this to work
;).

> > @@ -463,8 +488,11 @@ void bitmap_writer_build(struct packing_data *to_pack)
> >  		struct commit *child;
> >  		int reused = 0;
> >
> > -		fill_bitmap_commit(ent, commit, &queue, &tree_queue,
> > -				   old_bitmap, mapping);
> > +		if (fill_bitmap_commit(ent, commit, &queue, &tree_queue,
> > +				       old_bitmap, mapping) < 0) {
> > +			closed = 0;
> > +			break;
> > +		}
> >
> >  		if (ent->selected) {
> >  			store_selected(ent, commit);
>
> This is the right thing to do because we still want to free memory, stop
> progress, etc. I gave a look over what will run after breaking out of
> the loop, and compute_xor_offsets(), which you already handled, is the
> only thing we'd want to avoid running. Good.

Right. The key is that we return "closed ? 0 : -1" (of course, being
careful to invert "closed", where "1" means OK, into a suitable return
value for bitmap_writer_build, where "0" means OK and a negative number
means "error").

While I'm thinking about that inversion, we *could* call this variable
"open" and set it to "0" until proven otherwise. Then the conditional
becomes "if (!open)", but the return value is still "return open ? -1 :
0" (since I assume we'd want to use 0/1 values for "open" instead of -1,
meaning we'd have to do some translation).

Anyway, this is definitely an annoying detail that doesn't really
matter (and just rambling on my part) ;).
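
For anyone skimming, the "closed ? 0 : -1" translation discussed above can
be sketched as follows. fill_one_sketch() is a hypothetical stand-in for
fill_bitmap_commit(), and the loop is far simpler than the real one in
bitmap_writer_build():

```c
/* Stand-in for fill_bitmap_commit(): negative when an object is missing. */
static int fill_one_sketch(int object_present)
{
	return object_present ? 0 : -1;
}

/*
 * Walk the commits; stop at the first failure but still fall through to
 * the shared cleanup. Returns 0 on success and -1 if the set of objects
 * turned out not to be closed.
 */
static int build_sketch(const int *object_present, int nr)
{
	int closed = 1; /* 1 means "OK so far" */
	int i;

	for (i = 0; i < nr; i++) {
		if (fill_one_sketch(object_present[i]) < 0) {
			closed = 0;
			break;
		}
	}

	/* Freeing memory, stopping progress, etc. would happen here. */

	/* Only the final step (computing XOR offsets) depends on "closed". */
	return closed ? 0 : -1;
}
```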

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 04/24] Documentation: build 'technical/bitmap-format' by default
  2021-07-21 10:08       ` Jeff King
@ 2021-07-21 17:23         ` Taylor Blau
  2021-07-23  7:39           ` Jeff King
  0 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-07-21 17:23 UTC (permalink / raw)
  To: Jeff King; +Cc: git, dstolee, gitster, jonathantanmy

On Wed, Jul 21, 2021 at 06:08:18AM -0400, Jeff King wrote:
> On Wed, Jul 21, 2021 at 05:58:41AM -0400, Jeff King wrote:
>
> > On Mon, Jun 21, 2021 at 06:25:07PM -0400, Taylor Blau wrote:
> >
> > > Even though the 'TECH_DOCS' variable was introduced all the way back in
> > > 5e00439f0a (Documentation: build html for all files in technical and
> > > howto, 2012-10-23), the 'bitmap-format' document was never added to that
> > > list when it was created.
> > >
> > > Prepare for changes to this file by including it in the list of
> > > technical documentation that 'make doc' will build by default.
> >
> > OK. I don't care that much about being able to format this as html, but
> > I agree it's good to be consistent with the other stuff in technical/.
> >
> > The big question is whether it looks OK rendered by asciidoc, and the
> > answer seems to be "yes" (from a cursory look I gave it).
>
> Actually, I take it back. After looking more carefully, it renders quite
> poorly. There's a lot of structural indentation that ends up being
> confused as code blocks.
>
> I don't know if it's better to have a poorly-formatted HTML file, or
> none at all. :)
>
> Personally, I would just read the source. And I have a slight concern
> that if we start "cleaning it up" to render as asciidoc, the source
> might end up a lot less readable (though I'd reserve judgement until
> actually seeing it).

Yeah, the actual source is pretty readable (and it's what I had been
looking at, although it is sometimes convenient to have a version I can
read in my web browser). But it's definitely not good Asciidoc.

I briefly considered cleaning it up, but decided against it. Usually I
would opt to clean it up, but this series is already so large that I
figured it would make a negative impact on the reviewer experience to
read a clean-up patch here.

I wouldn't be opposed to coming back to it in the future, once the dust
settles. I guess we can consider this #leftoverbits until then.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 05/24] Documentation: describe MIDX-based bitmaps
  2021-07-21 10:18     ` Jeff King
@ 2021-07-21 17:53       ` Taylor Blau
  2021-07-23  7:45         ` Jeff King
  0 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-07-21 17:53 UTC (permalink / raw)
  To: Jeff King; +Cc: git, dstolee, gitster, jonathantanmy

On Wed, Jul 21, 2021 at 06:18:46AM -0400, Jeff King wrote:
> On Mon, Jun 21, 2021 at 06:25:10PM -0400, Taylor Blau wrote:
>
> > +An object is uniquely described by its bit position within a bitmap:
> > +
> > +	- If the bitmap belongs to a packfile, the __n__th bit corresponds to
> > +	the __n__th object in pack order. For a function `offset` which maps
> > +	objects to their byte offset within a pack, pack order is defined as
> > +	follows:
> > +
> > +		o1 <= o2 <==> offset(o1) <= offset(o2)
> > +
> > +	- If the bitmap belongs to a MIDX, the __n__th bit corresponds to the
> > +	__n__th object in MIDX order. With an additional function `pack` which
> > +	maps objects to the pack they were selected from by the MIDX, MIDX order
> > +	is defined as follows:
> > +
> > +		o1 <= o2 <==> pack(o1) <= pack(o2) /\ offset(o1) <= offset(o2)
> > +
> > +	The ordering between packs is done lexicographically by the pack name,
> > +	with the exception of the preferred pack, which sorts ahead of all other
> > +	packs.
>
> This doesn't render well as asciidoc (the final paragraph is taken as
> more of the code block). But that is a problem through the whole file. I
> think we should ignore it for now, and worry about asciidoc-ifying the
> whole thing later, if we choose to.

Agreed; let's ignore it for now.

> > +	The ordering between packs is done lexicographically by the pack name,
> > +	with the exception of the preferred pack, which sorts ahead of all other
> > +	packs.
>
> Hmm, I'm not sure if this "lexicographically" part is true. Really we're
> building on the midx .rev format here. And that says "defined by the
> MIDX's pack list" (though I can't offhand remember if that is
> lexicographic, or if it is in the reverse-mtime order).
>
> At any rate, should we just be referencing the rev documentation?

The packs are listed in lex order in the MIDX, but that is so we can
binary search that list to determine whether a pack is included in the
MIDX or not.

I had to check, but we do use the lex order to resolve duplicate
objects, too. See (at the tip of this branch):

    QSORT(ctx.info, ctx.nr, pack_info_compare);

from within midx.c:write_midx_internal(). Here, ctx.info contains the
list of packs, and pack_info_compare is a thin wrapper around
strcmp()-ing the pack_name values of two packed_git structures.
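
A minimal standalone sketch of that sort, using plain qsort() in place of
git's QSORT() macro. The struct and function names carry a _sketch suffix
because they are illustrative stand-ins, not the definitions in midx.c:

```c
#include <stdlib.h>
#include <string.h>

/* Reduced stand-in for the per-pack info kept while writing a MIDX. */
struct pack_info_sketch {
	const char *pack_name;
};

/* Order two packs by name, byte-wise, as strcmp() does. */
static int pack_info_compare_sketch(const void *va, const void *vb)
{
	const struct pack_info_sketch *a = va;
	const struct pack_info_sketch *b = vb;

	return strcmp(a->pack_name, b->pack_name);
}

/* Sort the pack list so that it can later be binary searched by name. */
static void sort_packs_sketch(struct pack_info_sketch *info, size_t nr)
{
	qsort(info, nr, sizeof(*info), pack_info_compare_sketch);
}
```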

Arguably, you'd get better EWAH compression of the bits between packs
if we sorted packs in reverse order according to their mtime. But I
suspect that it doesn't matter much in practice, since the number of
objects vastly outpaces the number of packs (but I haven't measured to
be certain, so take that with a grain of salt).

In any case, I think that you're right that adding too much detail hurts
us here, so we should really be mentioning the MIDX's .rev-file
documentation (unfortunately, we can't linkgit it, so mentioning it by
name will have to suffice). I plan to reroll with something like this on
top:

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index 25221c7ec8..04b3ec2178 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -26,9 +26,8 @@ An object is uniquely described by its bit position within a bitmap:

 		o1 <= o2 <==> pack(o1) <= pack(o2) /\ offset(o1) <= offset(o2)

-	The ordering between packs is done lexicographically by the pack name,
-	with the exception of the preferred pack, which sorts ahead of all other
-	packs.
+	The ordering between packs is done according to the MIDX's .rev file.
+	Notably, the preferred pack sorts ahead of all other packs.

 The on-disk representation (described below) of a bitmap is the same regardless
 of whether or not that bitmap belongs to a packfile or a MIDX. The only

Thanks,
Taylor

^ permalink raw reply related	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 08/24] midx: respect 'core.multiPackIndex' when writing
  2021-07-21 10:23     ` Jeff King
@ 2021-07-21 19:22       ` Taylor Blau
  2021-07-23  8:29         ` Jeff King
  0 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-07-21 19:22 UTC (permalink / raw)
  To: Jeff King; +Cc: git, dstolee, gitster, jonathantanmy

On Wed, Jul 21, 2021 at 06:23:23AM -0400, Jeff King wrote:
> On Mon, Jun 21, 2021 at 06:25:18PM -0400, Taylor Blau wrote:
>
> > When writing a new multi-pack index, write_midx_internal() attempts to
> > load any existing one to fill in some pieces of information. But it uses
> > load_multi_pack_index(), which ignores the configuration
> > "core.multiPackIndex", which indicates whether or not Git is allowed to
> > read an existing multi-pack-index.
> >
> > Replace this with a routine that does respect that setting, to avoid
> > reading multi-pack-index files when told not to.
> >
> > This avoids a problem that would arise in subsequent patches due to the
> > combination of 'git repack' reopening the object store in-process and
> > the multi-pack index code not checking whether a pack already exists in
> > the object store when calling add_pack_to_midx().
> >
> > This would ultimately lead to a cycle being created along the
> > 'packed_git' struct's '->next' pointer. That is obviously bad, but it
> > has hard-to-debug downstream effects like saying a bitmap can't be
> > loaded for a pack because one already exists (for the same pack).
>
> I'm not sure I completely understand the bug that this causes.

Off-hand, I can't quite remember either. But it is important; I do have
a distinct memory of dropping this patch and then watching a 'git repack
--write-midx' (that option will be introduced in a later series) fail
horribly.

If I remember correctly, the bug has to do with loading a MIDX twice in
the same process. When we call add_packed_git() from within
prepare_midx_pack(), we load the pack without caring whether or not it's
already loaded. So loading a MIDX twice in the same process will fail.

So really I think that this is papering over that bug: we're just
removing one of the times that we happen to load a MIDX during the
writing phase.

What I do remember is that this bug was a huge pain to figure out ;).
I'm happy to look further if you aren't satisfied with my vague
explanation here (and I wouldn't blame you).

> But another question: does this impact how
>
>   git -c core.multipackindex=false multi-pack-index write
>
> behaves? I.e., do we still write, but just avoid reading the existing
> midx? That itself seems like a more sensible behavior (e.g., trying to
> recover from a broken midx state).

Yes. Before this patch, that invocation would still load and use any
existing MIDX to write a new one. Now we don't, because (unlike
load_multi_pack_index()) prepare_multi_pack_index_one() does check
core.multiPackIndex before returning anything.

> -Peff

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 09/24] midx: infer preferred pack when not given one
  2021-07-21 10:34     ` Jeff King
@ 2021-07-21 20:16       ` Taylor Blau
  2021-07-23  8:50         ` Jeff King
  0 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-07-21 20:16 UTC (permalink / raw)
  To: Jeff King; +Cc: git, dstolee, gitster, jonathantanmy

On Wed, Jul 21, 2021 at 06:34:25AM -0400, Jeff King wrote:
> On Mon, Jun 21, 2021 at 06:25:21PM -0400, Taylor Blau wrote:
>
> > In 9218c6a40c (midx: allow marking a pack as preferred, 2021-03-30), the
> > multi-pack index code learned how to select a pack which all duplicate
> > objects are selected from. That is, if an object appears in multiple
> > packs, select the copy in the preferred pack before breaking ties
> > according to the other rules like pack mtime and readdir() order.
> >
> > Not specifying a preferred pack can cause serious problems with
> > multi-pack reachability bitmaps, because these bitmaps rely on having at
> > least one pack from which all duplicates are selected. Not having such a
> > pack causes problems with the pack reuse code (e.g., like assuming that
> > a base object was sent from that pack via reuse when in fact the base
> > was selected from a different pack).
>
> It might be helpful to use a more descriptive name for "pack reuse code"
> here, since it's kind of vague for people who have not been actively
> working on bitmaps.
>
> I don't have a short name for that chunk of code, but maybe:
>
>   ...causes problems with the code in pack-objects to reuse packs
>   verbatim (e.g., that code assumes that a delta object in a chunk of
>   pack sent verbatim will have its base object sent from the same pack).

Thanks; I like what you wrote here.

> >   - The pseudo pack-order (described in
> >     Documentation/technical/bitmap-format.txt) is computed by
> >     midx_pack_order(), and sorts by pack ID and pack offset, with
> >     preferred packs sorting first.
>
> I think the .rev description in pack-format.txt may be a better
> reference here.

Ditto, I changed that, too.
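
For reference, the ordering in that bullet (preferred pack first, then
pack ID, then offset) can be sketched as a comparator. The struct and its
"preferred" flag are hypothetical; this is only an illustration of the
ordering, not how midx_pack_order() actually represents its sort keys:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-object record used only for this illustration. */
struct midx_entry_sketch {
	int preferred;    /* non-zero if selected from the preferred pack */
	uint32_t pack_id; /* ID of the pack the object was selected from */
	uint64_t offset;  /* byte offset of the object within that pack */
};

/* Preferred pack first, then ascending pack ID, then ascending offset. */
static int midx_order_cmp_sketch(const void *va, const void *vb)
{
	const struct midx_entry_sketch *a = va;
	const struct midx_entry_sketch *b = vb;

	if (!a->preferred != !b->preferred)
		return a->preferred ? -1 : 1;
	if (a->pack_id != b->pack_id)
		return a->pack_id < b->pack_id ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Arrange the entries in the pseudo-pack order described above. */
static void sort_midx_order_sketch(struct midx_entry_sketch *e, size_t nr)
{
	qsort(e, nr, sizeof(*e), midx_order_cmp_sketch);
}
```

Note that the preferred pack sorts first here even when its pack ID is
larger than the others.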

> >   - But! Pack IDs come from incrementing the pack count in
> >     add_pack_to_midx(), which is a callback to
> >     for_each_file_in_pack_dir(), meaning that pack IDs are assigned in
> >     readdir() order.
> >
> > [ ... ]
>
> This explanation is rather confusing, but I'm not sure if we can do much
> better. I followed all of it, because I was there when we found the bug
> that this is fixing. And of course that happened _after_ we implemented
> midx bitmaps and in particular adapted the verbatim reuse stuff in
> pack-objects to make use of it.
>
> I see why you'd want to float the fix up before then, so we don't ever
> have the broken state. But it's hard to understand what bug this is
> fixing, because the bug does not even exist yet at this point in
> the series!
>
> I dunno. Like I said, I was able to follow it, so maybe it is
> sufficient. I'm just not sure others would be able to.

I think that others will follow it, too. But I agree that it is
confusing, since we're fixing a bug that doesn't yet exist. In reality,
I wrote this patch after sending v1, and then reordered its position to
come before the implementation of MIDX bitmaps for that reason.

So in one sense, I prefer it this way because we don't ever introduce
the bug.  But in another sense, it is very jarring to read about an
interaction that has no basis in the code (yet).

I think that the best thing we could do without adding any significant
reordering would be to just call out the situation we're in. I added
this onto the end of the commit message which I think makes things a
little clearer:

    (Note that multi-pack reachability bitmaps have yet to be
    implemented; so in that sense this patch is fixing a bug which does
    not yet exist.  But by having this patch beforehand, we can prevent
    the bug from ever materializing.)

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 12/24] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()'
  2021-07-21 10:39     ` Jeff King
@ 2021-07-21 20:18       ` Taylor Blau
  0 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-21 20:18 UTC (permalink / raw)
  To: Jeff King; +Cc: git, dstolee, gitster, jonathantanmy

On Wed, Jul 21, 2021 at 06:39:39AM -0400, Jeff King wrote:
> On Mon, Jun 21, 2021 at 06:25:29PM -0400, Taylor Blau wrote:
> > [...]
> >
> > Implement a function 'bitmap_is_preferred_refname()' which does just
> > that. The caller will be added in a subsequent patch.
>
> I suspect there was some patch reordering here. We don't have any
> multi-pack bitmap code yet. :)
>
> Probably this needs to say something like "in preparation for adding
> multi-pack bitmap code..." or similar?

Oops. Good catch!

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 13/24] pack-bitmap: read multi-pack bitmaps
  2021-07-21 11:32     ` Jeff King
@ 2021-07-21 23:01       ` Taylor Blau
  2021-07-23  9:40         ` Jeff King
  2021-07-23 10:00         ` Jeff King
  0 siblings, 2 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-21 23:01 UTC (permalink / raw)
  To: Jeff King; +Cc: git, dstolee, gitster, jonathantanmy

On Wed, Jul 21, 2021 at 07:32:49AM -0400, Jeff King wrote:
> On Mon, Jun 21, 2021 at 06:25:31PM -0400, Taylor Blau wrote:
> > +	if (!is_pack_valid(packfile)) {
> > +		close(fd);
> > +		return -1;
> > +	}
> > +
>
> What's this extra is_pack_valid() doing? I wouldn't expect many changes
> at all to this non-midx code path (aside from the "did we already load a
> midx bitmap" in the earlier part of the hunk, which makes sense).

That looks like a mistake to me. I did a little digging and tried to
remember if it could have ever been useful, but I think that it's just a
stray change that has no value. Removed.

> > -static int load_pack_bitmap(struct bitmap_index *bitmap_git)
> > +static int load_reverse_index(struct bitmap_index *bitmap_git)
> > +{
> > +	if (bitmap_is_midx(bitmap_git)) {
> > +		uint32_t i;
> > +		int ret;
> > +
> > +		ret = load_midx_revindex(bitmap_git->midx);
> > +		if (ret)
> > +			return ret;
> > +
> > +		for (i = 0; i < bitmap_git->midx->num_packs; i++) {
> > +			if (prepare_midx_pack(the_repository, bitmap_git->midx, i))
> > +				die(_("load_reverse_index: could not open pack"));
> > +			ret = load_pack_revindex(bitmap_git->midx->packs[i]);
> > +			if (ret)
> > +				return ret;
> > +		}
> > +		return 0;
> > +	}
> > +	return load_pack_revindex(bitmap_git->pack);
> > +}
>
> OK, this new function is used in load_bitmap(), which is used for both
> pack and midx bitmaps. So if we have a midx bitmap, we'll
> unconditionally load the revindex here. But:
>
>   - why do we then load individual pack revindexes? I can believe it may
>     be necessary to meet the assumptions of some other part of the code,
>     but it would be nice to have a comment giving us some clue.

Good suggestion. We will need to reference the reverse index belonging
to individual packs in a few locations in pack-objects (e.g.,
write_reuse_object() calls offset_to_pack_pos() and
pack_pos_to_offset(), both with arbitrary packs, not just the preferred
one).

I left the comment vague; something along the lines of "lots of routines
in pack-objects will need these structures to be ready to use".

I think there's room for improvement there, since, e.g., `git
rev-list --count --objects --use-bitmap-index` doesn't need to load the
reverse indexes. But that's already the case with classic bitmaps, too,
which eagerly call load_pack_revindex().

>   - in open_midx_bitmap_1(), we also unconditionally load the midx
>     reverse index. I think that will always happen before us here (we
>     cannot load_bitmap() a bitmap that has not been opened). So is this
>     load_midx_revindex() call always a noop?

Great catch. I removed the call to load_midx_revindex(), and replaced it
with a comment explaining why we don't need to call it (because we
already did).

> >  static int bitmap_position(struct bitmap_index *bitmap_git,
> >  			   const struct object_id *oid)
> >  {
> > -	int pos = bitmap_position_packfile(bitmap_git, oid);
> > +	int pos;
> > +	if (bitmap_is_midx(bitmap_git))
> > +		pos = bitmap_position_midx(bitmap_git, oid);
> > +	else
> > +		pos = bitmap_position_packfile(bitmap_git, oid);
> >  	return (pos >= 0) ? pos : bitmap_position_extended(bitmap_git, oid);
> >  }
>
> Makes sense. Not new in your patch, but this "int" return is fudging the
> same 32-bit space we were talking about elsewhere (i.e., "pos" really
> could be 2^32, or even more due to extended objects).

:-). It bothers me to no end, too, because of all of the recent effort
to improve the reverse-index APIs to avoid exactly this issue. But I
tend to agree that the concern is more theoretical than anything,
because we're only using the MSB, so the remaining 2^31 possible objects
still seems pretty generous.

> In practice I think even 2^31 objects is pretty out-of-reach, but it may
> be worth changing the return type (and the callers), or even just
> catching the overflow with an assertion.

Possibly, but keep in mind that the former is basically the same
refactor as we did with the "tell me whether this object was found via
this extra pointer". But bitmap_position() has a lot more callers than
that, so the plumbing required would be a little more prevalent.

So I'd be content to just punt on it for now, if you'd be OK with it.
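
For the record, the assertion-style check suggested above could look
roughly like the sketch below. pos_to_int_sketch() is a hypothetical
helper, not something that exists in pack-bitmap.c, and a real version
would more likely call BUG() or die() than return an error code:

```c
#include <limits.h>
#include <stdint.h>

/*
 * Narrow a wide bit position to the "int" that bitmap_position() returns.
 * Returns 0 and stores the value on success, or -1 if the position would
 * not fit in a non-negative int.
 */
static int pos_to_int_sketch(uint64_t pos, int *out)
{
	if (pos > (uint64_t)INT_MAX)
		return -1;
	*out = (int)pos;
	return 0;
}
```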

> >  	if (pos < bitmap_num_objects(bitmap_git)) {
> > -		off_t ofs = pack_pos_to_offset(pack, pos);
> > +		struct packed_git *pack;
> > +		off_t ofs;
> > +
> > +		if (bitmap_is_midx(bitmap_git)) {
> > +			uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
> > +			uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
> > +
> > +			pack = bitmap_git->midx->packs[pack_id];
> > +			ofs = nth_midxed_offset(bitmap_git->midx, midx_pos);
> > +		} else {
> > +			pack = bitmap_git->pack;
> > +			ofs = pack_pos_to_offset(pack, pos);
> > +		}
> > +
>
> All of the hunks like this make perfect sense. The big problem would be
> if we _missed_ a place that needed conversion to handle midx. But the
> nice thing is that it would segfault quickly in such an instance. So
> there I'm mostly relying on test coverage, plus our experience running
> with this code at scale.

Yeah; I'm definitely happy to rely on our experience running this and
related patches at GitHub for several months to give us confidence that
we didn't miss anything here.

> >  static void try_partial_reuse(struct bitmap_index *bitmap_git,
> > +			      struct packed_git *pack,
> >  			      size_t pos,
> >  			      struct bitmap *reuse,
> >  			      struct pack_window **w_curs)
> >  {
> > -	off_t offset, header;
> > +	off_t offset, delta_obj_offset;
>
> I'm OK with all of this in one big patch. But I suspect you _could_
> just put:
>
>   if (bitmap_git->midx)
> 	return; /* partial reuse not implemented for midx yet */
>
> to start with, and then actually implement it later. I call out this
> code in particular just because it's got a lot of subtleties (the
> "reuse" bits are much more intimate with the assumptions of packs and
> bitmaps than most other code).
>
> I'm not sure if it's worth the trouble at this point or not.

Yeah, I'd definitely err on the side of not splitting this up now,
especially since you've already gone through the whole patch and
reviewed it. (Of course, if your response was "this patch is way too
big, please split it up so I can more easily review it", that would be a
different story).

But I appreciate the advice, since I have felt that a lot of these
format-level changes require a 3-patch arc where:

  - The first patch describes the new format in Documentation/technical.
  - The second patch implements support for reading files that are
    written in the new format.
  - And finally, the third patch implements support for writing such
    files.

...and it's usually the second of those three patches that is the most
complicated one by far. So this is a good way to split that patch up
into many pieces.

Of course, that only works if you delay adding tests until after support
is added for all parts of the new format, but that's more-or-less what
I did here.

> > +static uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git)
> > +{
> > +	struct multi_pack_index *m = bitmap_git->midx;
> > +	if (!m)
> > +		BUG("midx_preferred_pack: requires non-empty MIDX");
> > +	return nth_midxed_pack_int_id(m, pack_pos_to_midx(bitmap_git->midx, 0));
> > +}
>
> This part is really subtle. We infer the preferred pack by looking at
> the pack of the 0th bit position. In general that works, since that's
> part of the definition of the preferred pack.
>
> Could this ever be fooled if we had a preferred pack with 0 objects in
> it? I don't know why we would have such a thing, but just trying to
> think of cases where our assumptions might not hold (and what bad things
> could happen).

An empty preferred pack would cause a problem, yes. The solution is
two-fold (and incorporated into the reroll that I plan on sending
shortly):

  - When the user specifies --preferred-pack, the MIDX code must make
    sure that the given pack is non-empty. That's a new patch, and
    basically adds a new conditional (to check the pack itself) and a
    test (to make sure that we catch the case we are trying to prevent).

  - When the user doesn't specify --preferred-pack (and instead asks us
    to infer one for them) we want to select not just the oldest pack,
    but the oldest *non-empty* pack. That is folded into the "midx:
    infer preferred pack when not given one" patch.

In that patch, I made a note, but I think that it's subtle enough to
merit sharing here again. In the loop over all packs, the conditional
for swapping out the oldest pack for the current one was something like:

    if (p->mtime < oldest->mtime)
      oldest = p;

but now we want it to be:

    if (!oldest->num_objects || p->mtime < oldest->mtime)
      oldest = p;

to reject packs that have no objects. And we want to be extra careful in
the case where the only pack fed to the MIDX writer was empty. But we
don't have to do anything there, since there are no objects to write
anyway, so any "preferred_idx" would be fine.
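
Put together as a self-contained loop, that looks roughly like the sketch
below. struct pack_sketch and infer_preferred_sketch() are stand-ins for
illustration (the real code works on packed_git and the write context),
and the condition is copied as-is from the snippet above:

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Reduced stand-in for the fields of packed_git that matter here. */
struct pack_sketch {
	uint32_t num_objects;
	time_t mtime;
};

/*
 * Return the index of the inferred preferred pack. The current candidate
 * is replaced when it is empty or when the pack being examined is older.
 * With nr <= 1 the answer is always index 0.
 */
static size_t infer_preferred_sketch(const struct pack_sketch *packs, size_t nr)
{
	size_t preferred = 0;
	size_t i;

	for (i = 1; i < nr; i++) {
		const struct pack_sketch *oldest = &packs[preferred];
		const struct pack_sketch *p = &packs[i];

		if (!oldest->num_objects || p->mtime < oldest->mtime)
			preferred = i;
	}
	return preferred;
}
```

One thing the sketch makes visible: the condition checks whether the
current candidate is empty, not whether "p" is, so it leans on the
candidate check to move past an empty pack on a later iteration.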

> > +	if (bitmap_is_midx(bitmap_git))
> > +		pack = bitmap_git->midx->packs[midx_preferred_pack(bitmap_git)];
> > +	else
> > +		pack = bitmap_git->pack;
> > +	objects_nr = pack->num_objects;
> > +
> >  	while (i < result->word_alloc && result->words[i] == (eword_t)~0)
> >  		i++;
> >
> > -	/* Don't mark objects not in the packfile */
> > +	/*
> > +	 * Don't mark objects not in the packfile or preferred pack. This bitmap
> > +	 * marks objects eligible for reuse, but the pack-reuse code only
> > +	 * understands how to reuse a single pack. Since the preferred pack is
> > +	 * guaranteed to have all bases for its deltas (in a multi-pack bitmap),
> > +	 * we use it instead of another pack. In single-pack bitmaps, the choice
> > +	 * is made for us.
> > +	 */
> >  	if (i > objects_nr / BITS_IN_EWORD)
> >  		i = objects_nr / BITS_IN_EWORD;
>
> OK, so this clamps our "quick" contiguous set of bits to the number of
> objects in the preferred pack. Makes sense. And then we hit the
> object-by-object loop below...
>
> > @@ -1213,7 +1437,15 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
> >  				break;
> >
> >  			offset += ewah_bit_ctz64(word >> offset);
> > -			try_partial_reuse(bitmap_git, pos + offset, reuse, &w_curs);
> > +			if (bitmap_is_midx(bitmap_git)) {
> > +				/*
> > +				 * Can't reuse from a non-preferred pack (see
> > +				 * above).
> > +				 */
> > +				if (pos + offset >= objects_nr)
> > +					continue;
> > +			}
> > +			try_partial_reuse(bitmap_git, pack, pos + offset, reuse, &w_curs);
>
> ...and this likewise makes sure we never go past that first pack. Good.
>
> I think this "continue" could actually be a "break", as the loop is
> iterating over "offset" (and "pos + offset" always gets larger). In
> fact, it could break out of the outer loop as well (which is
> incrementing "pos"). It's probably a pretty small efficiency in
> practice, though.

Yeah; you're right. And we'll save up to BITS_IN_EWORD cycles of this
loop. (I wonder if smart-enough compilers will realize the same
optimization that you did and turn that `continue` into a `break`
automatically, but that's neither here nor there).
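
Here is a toy version of the loop with the early exit, just to convince
ourselves that leaving both loops is safe. Everything in it is
hypothetical: it counts bits instead of calling try_partial_reuse(), uses
a plain uint64_t array instead of "struct bitmap", and has its own naive
trailing-zero counter in place of ewah_bit_ctz64():

```c
#include <stddef.h>
#include <stdint.h>

#define BITS_IN_WORD_SKETCH 64

/* Count trailing zero bits; the caller must pass a non-zero value. */
static unsigned ctz64_sketch(uint64_t x)
{
	unsigned n = 0;

	while (!(x & 1)) {
		x >>= 1;
		n++;
	}
	return n;
}

/*
 * Count the set bits whose position is below objects_nr. Bit positions
 * only ever increase as both loops advance, so the first position at or
 * past objects_nr means there is nothing left to find.
 */
static size_t count_reusable_sketch(const uint64_t *words, size_t nr_words,
				    size_t objects_nr)
{
	size_t i, count = 0;

	for (i = 0; i < nr_words; i++) {
		uint64_t word = words[i];
		size_t pos = i * BITS_IN_WORD_SKETCH;
		unsigned offset;

		for (offset = 0; offset < BITS_IN_WORD_SKETCH; offset++) {
			if (!(word >> offset))
				break; /* no set bits left in this word */

			offset += ctz64_sketch(word >> offset);
			if (pos + offset >= objects_nr)
				return count; /* leaves the outer loop too */
			count++;
		}
	}
	return count;
}
```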

> > @@ -1511,8 +1749,13 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
> >  		struct object_id oid;
> >  		struct object_entry *oe;
> >
> > -		nth_packed_object_id(&oid, bitmap_git->pack,
> > -				     pack_pos_to_index(bitmap_git->pack, i));
> > +		if (bitmap_is_midx(bitmap_git))
> > +			nth_midxed_object_oid(&oid,
> > +					      bitmap_git->midx,
> > +					      pack_pos_to_midx(bitmap_git->midx, i));
> > +		else
> > +			nth_packed_object_id(&oid, bitmap_git->pack,
> > +					     pack_pos_to_index(bitmap_git->pack, i));
> >  		oe = packlist_find(mapping, &oid);
>
> Could this be using nth_bitmap_object_oid()? I guess not, because we are
> feeding from pack_pos_to_*. I'm not sure if another helper function is
> worth it (pack_pos_to_bitmap_index() or something?).

You're right that we can't call nth_bitmap_object_oid here directly,
sadly. But I think your suggestion for pack_pos_to_bitmap_index() (or
similar) would only benefit this caller, since most places that dispatch
conditionally to either pack_pos_to_{midx,index} want to pass the result
to a different function depending on which branch they took.

Definitely possible that I missed another case that would help, but that
was what I came up with after just a quick glance.

> > @@ -1575,7 +1831,31 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
> >  				break;
> >
> >  			offset += ewah_bit_ctz64(word >> offset);
> > -			pos = base + offset;
> > +
> > +			if (bitmap_is_midx(bitmap_git)) {
> > +				uint32_t pack_pos;
> > +				uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, base + offset);
> > +				uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
> > +				off_t offset = nth_midxed_offset(bitmap_git->midx, midx_pos);
> > +
> > +				pack = bitmap_git->midx->packs[pack_id];
> > +
> > +				if (offset_to_pack_pos(pack, offset, &pack_pos) < 0) {
> > +					struct object_id oid;
> > +					nth_midxed_object_oid(&oid, bitmap_git->midx, midx_pos);
> > +
> > +					die(_("could not find %s in pack #%"PRIu32" at offset %"PRIuMAX),
> > +					    oid_to_hex(&oid),
> > +					    pack_id,
> > +					    (uintmax_t)offset);
> > +				}
> > +
> > +				pos = pack_pos;
> > +			} else {
> > +				pack = bitmap_git->pack;
> > +				pos = base + offset;
> > +			}
> > +
> >  			total += pack_pos_to_offset(pack, pos + 1) -
> >  				 pack_pos_to_offset(pack, pos);
> >  		}
>
> In the midx case, we have to go from midx-bitmap-pos to midx-index-pos,
> to then get the pack/ofs combo, which then gives us a real "pos" in the
> pack. I don't think there's a faster way to do it (and this is still
> much faster than looking up objects in the pack only to check their
> revindex).
>
> But then with the result, we compare the offset of "pos" and "pos + 1".
> We need to know "pos" to find "pos + 1". But in the midx case, don't we
> already have the offset of "pos" (it is "offset" in the bitmap_is_midx()
> conditional, which is shadowing the completely unrelated "offset" in the
> outer loop).
>
> We could reuse it, saving ourselves an extra round-trip of pack_pos to
> index_pos to offset. It would just mean stuffing the "total +=" line
> into the two sides of the conditional.

Yep; agreed. And it allows us to clean up a few little other things, so
I squashed it in. Thanks for the suggestion!

> > +off_t bitmap_pack_offset(struct bitmap_index *bitmap_git, uint32_t pos)
> > +{
> > +	if (bitmap_is_midx(bitmap_git))
> > +		return nth_midxed_offset(bitmap_git->midx,
> > +					 pack_pos_to_midx(bitmap_git->midx, pos));
> > +	return nth_packed_object_offset(bitmap_git->pack,
> > +					pack_pos_to_index(bitmap_git->pack, pos));
> > +}
>
> Does anybody call this function? I don't see any users by the end of the
> series.

Nope, great catch. I looked at callers of nth_midxed_offset and
nth_packed_object_offset to see whether any of them could use this
function, but none of the spots I looked at would be helped by it, so I
just removed it.

Phew! This was quite the email to respond to, but I suppose that's my
fault for writing such a monstrous patch. Thank you for taking the time
to read through it all so carefully. I think you got the short end of
the stick between writing this email and responding to it, so thank you
:-).

Thanks,
Taylor

* Re: [PATCH v2 02/24] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
  2021-07-21 17:20       ` Taylor Blau
@ 2021-07-23  7:37         ` Jeff King
  2021-07-26 18:48           ` Taylor Blau
  0 siblings, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-07-23  7:37 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Wed, Jul 21, 2021 at 01:20:47PM -0400, Taylor Blau wrote:

> > > @@ -463,8 +488,11 @@ void bitmap_writer_build(struct packing_data *to_pack)
> > >  		struct commit *child;
> > >  		int reused = 0;
> > >
> > > -		fill_bitmap_commit(ent, commit, &queue, &tree_queue,
> > > -				   old_bitmap, mapping);
> > > +		if (fill_bitmap_commit(ent, commit, &queue, &tree_queue,
> > > +				       old_bitmap, mapping) < 0) {
> > > +			closed = 0;
> > > +			break;
> > > +		}
> > >
> > >  		if (ent->selected) {
> > >  			store_selected(ent, commit);
> >
> > This is the right thing to do because we still want to free memory, stop
> > progress, etc. I gave a look over what will run after breaking out of
> > the loop, and compute_xor_offsets(), which you already handled, is the
> > only thing we'd want to avoid running. Good.
> 
> Right. The key is that we return "closed ? 0 : -1" (being careful to
> invert "closed", where "1" means OK, into a suitable return value for
> bitmap_writer_build, where "0" means OK and a negative number means
> "error").
> 
> While I'm thinking about that inversion, we *could* call this variable
> "open" and set it to "0" until proven otherwise. Then the conditional
> becomes "if (!open)", but the return value is still "return open ? -1 :
> 0" (since I assume we'd want to use 0/1 values for "open" instead of -1,
> meaning we'd have to do some translation).

I thought about suggesting that it be called "err" or "ret" or
something. And then we do not have to care that fill_bitmap_commit()
only returns an error in the non-closed state. We are simply propagating
its error-return back up the stack.

And then you can just write:

  ret = fill_bitmap_commit(...);
  if (ret < 0)
	break;

  ...
  return ret;

without an extra conversion. I don't care that much either way, though
(but if you like it and are re-rolling anyway... :) ).

-Peff

* Re: [PATCH v2 04/24] Documentation: build 'technical/bitmap-format' by default
  2021-07-21 17:23         ` Taylor Blau
@ 2021-07-23  7:39           ` Jeff King
  2021-07-26 18:49             ` Taylor Blau
  0 siblings, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-07-23  7:39 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Wed, Jul 21, 2021 at 01:23:34PM -0400, Taylor Blau wrote:

> > I don't know if it's better to have a poorly-formatted HTML file, or
> > none at all. :)
> >
> > Personally, I would just read the source. And I have a slight concern
> > that if we start "cleaning it up" to render as asciidoc, the source
> > might end up a lot less readable (though I'd reserve judgement until
> > actually seeing it).
> 
> Yeah, the actual source is pretty readable (and it's what I had been
> looking at, although it is sometimes convenient to have a version I can
> read in my web browser). But it's definitely not good Asciidoc.
> 
> I briefly considered cleaning it up, but decided against it. Usually I
> would opt to clean it up, but this series is already so large that I
> figured it would make a negative impact on the reviewer experience to
> read a clean-up patch here.
> 
> I wouldn't be opposed to coming back to it in the future, once the dust
> settles. I guess we can consider this #leftoverbits until then.

Yeah, I definitely don't want to see that cleanup as a dependency for
this series. It's already long enough as it is. Coming back to it later
is just fine with me.

The question here is: should we continue to omit it from the html build,
since it does not render well (i.e., should we simply drop this patch)?

-Peff

* Re: [PATCH v2 05/24] Documentation: describe MIDX-based bitmaps
  2021-07-21 17:53       ` Taylor Blau
@ 2021-07-23  7:45         ` Jeff King
  0 siblings, 0 replies; 273+ messages in thread
From: Jeff King @ 2021-07-23  7:45 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Wed, Jul 21, 2021 at 01:53:40PM -0400, Taylor Blau wrote:

> > > +	The ordering between packs is done lexicographically by the pack name,
> > > +	with the exception of the preferred pack, which sorts ahead of all other
> > > +	packs.
> >
> > Hmm, I'm not sure if this "lexicographically" part is true. Really we're
> > building on the midx .rev format here. And that says "defined by the
> > MIDX's pack list" (though I can't offhand remember if that is
> > lexicographic, or if it is in the reverse-mtime order).
> >
> > At any rate, should we just be referencing the rev documentation?
> 
> The packs are listed in lex order in the MIDX, but that is so we can
> binary search that list to determine whether a pack is included in the
> MIDX or not.
> 
> I had to check, but we do use the lex order to resolve duplicate
> objects, too. See (at the tip of this branch):
> 
>     QSORT(ctx.info, ctx.nr, pack_info_compare);
> 
> from within midx.c:write_midx_internal(). Here, ctx.info contains the
> list of packs, and pack_info_compare is a thin wrapper around
> strcmp()-ing the pack_name values of two packed_git structures.

Ah, OK, thanks for checking.
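
To make the ordering concrete, here is a minimal standalone sketch (not
the code in midx.c). The struct and function names are made up for
illustration; the only thing taken from the description above is that
packs are compared by strcmp() on their names, which is also what makes
the sorted list binary-searchable:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-pack info that the MIDX writer sorts. */
struct pack_entry {
	const char *pack_name;
};

/* Compare two entries by name only, like a thin strcmp() wrapper. */
static int pack_entry_compare(const void *va, const void *vb)
{
	const struct pack_entry *a = va;
	const struct pack_entry *b = vb;
	return strcmp(a->pack_name, b->pack_name);
}

/* Sort the pack list lexicographically so it can be binary searched. */
void sort_packs(struct pack_entry *packs, size_t nr)
{
	qsort(packs, nr, sizeof(*packs), pack_entry_compare);
}

/* Return the index of "name" in a sorted list, or -1 if it is absent. */
int find_pack(const struct pack_entry *packs, size_t nr, const char *name)
{
	struct pack_entry key = { name };
	const struct pack_entry *hit = bsearch(&key, packs, nr, sizeof(*packs),
					       pack_entry_compare);
	return hit ? (int)(hit - packs) : -1;
}
```

Note that this models only the lexicographic pack list; putting the
preferred pack first is a property of the .rev ordering and is not
shown here.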

> Arguably, you'd get better EWAH compression of the bits between packs
> if we sorted packs in reverse order according to their mtime. But I
> suspect that it doesn't matter much in practice, since the number of
> objects vastly outpaces the number of packs (but I haven't measured to
> be certain, so take that with a grain of salt).

Agreed, especially when the intended use is with geometric repacking to
keep reasonable-sized packs.

Either way, I think heuristics to optimize the pack ordering can easily
come on top later. Let's keep this series focused on the fundamentals of
having midx bitmaps at all.

> In any case, I think that you're right that adding too much detail hurts
> us here, so we should really be mentioning the MIDX's .rev-file
> documentation (unfortunately, we can't linkgit it, so mentioning it by
> name will have to suffice). I plan to reroll with something like this on
> top:
> 
> diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
> index 25221c7ec8..04b3ec2178 100644
> --- a/Documentation/technical/bitmap-format.txt
> +++ b/Documentation/technical/bitmap-format.txt
> @@ -26,9 +26,8 @@ An object is uniquely described by its bit position within a bitmap:
> 
>  		o1 <= o2 <==> pack(o1) <= pack(o2) /\ offset(o1) <= offset(o2)
> 
> -	The ordering between packs is done lexicographically by the pack name,
> -	with the exception of the preferred pack, which sorts ahead of all other
> -	packs.
> +	The ordering between packs is done according to the MIDX's .rev file.
> +	Notably, the preferred pack sorts ahead of all other packs.
> 
>  The on-disk representation (described below) of a bitmap is the same regardless
>  of whether or not that bitmap belongs to a packfile or a MIDX. The only

Thanks, that looks much better. We can't linkgit, but we only build HTML
for these. So just a link to pack-format.html would work, as they'd
generally be found side-by-side in the filesystem. But since this
doesn't even really render as asciidoc, I'm not sure I care either way.
(Obviously we could also mention pack-format.txt by name, but it's
probably already obvious-ish to a human that this is where you'd find
information on the pack .rev format).

-Peff

* Re: [PATCH v2 08/24] midx: respect 'core.multiPackIndex' when writing
  2021-07-21 19:22       ` Taylor Blau
@ 2021-07-23  8:29         ` Jeff King
  2021-07-26 18:59           ` Taylor Blau
  0 siblings, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-07-23  8:29 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Wed, Jul 21, 2021 at 03:22:34PM -0400, Taylor Blau wrote:

> > > This avoids a problem that would arise in subsequent patches due to the
> > > combination of 'git repack' reopening the object store in-process and
> > > the multi-pack index code not checking whether a pack already exists in
> > > the object store when calling add_pack_to_midx().
> > >
> > > This would ultimately lead to a cycle being created along the
> > > 'packed_git' struct's '->next' pointer. That is obviously bad, but it
> > > has hard-to-debug downstream effects like saying a bitmap can't be
> > > loaded for a pack because one already exists (for the same pack).
> >
> > I'm not sure I completely understand the bug that this causes.
> 
> Off-hand, I can't quite remember either. But it is important; I do have
> a distinct memory of dropping this patch and then watching a 'git repack
> --write-midx' (that option will be introduced in a later series) fail
> horribly.
> 
> If I remember correctly, the bug has to do with loading a MIDX twice in
> the same process. When we call add_packed_git() from within
> prepare_midx_pack(), we load the pack without caring whether or not it's
> already loaded. So loading a MIDX twice in the same process will fail.
> 
> So really I think that this is papering over that bug: we're just
> removing one of the times that we happened to load a MIDX from during
> the writing phase.

Hmm, after staring at this for a bit, I've unconfused and re-confused
myself several times.

Here are some interesting bits:

  - calling load_multi_pack_index() directly creates a new midx object.
    None of its m->packs[] array will be filled in. Nor is it reachable
    as r->objects->multi_pack_index.

  - in using that midx, we end up calling prepare_midx_pack() for
    various packs, which creates a new packed_git struct and adds it to
    r->objects->packed_git (via install_packed_git()).

So that's a bit weird already, because we have packed_git structs in
r->objects that came from a midx that isn't r->objects->multi_pack_index.
And then if we later call prepare_multi_pack_index(), for example as
part of a pack reprepare, then we'd end up with duplicates.

Whereas normally, when a direct load_multi_pack_index() was not called,
our only midx would be r->objects->multi_pack_index, and so we'd avoid
re-loading it.

That seems wrong and wasteful, but I don't see how it results in a
circular linked list. And it seems like it would already be the case for
this write path, independent of your series. Either way, the solution is
probably for prepare_midx_pack() to check for duplicates (which we can
do pretty cheaply these days due to the hashmap; see prepare_pack).
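
Just to illustrate what I mean by checking for duplicates, here is a toy
sketch. It is not the real packed_git code: "struct pack_node",
find_installed(), and install_once() are hypothetical names, and it uses
a linear scan of the list where the real thing would presumably consult
the hashmap:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily trimmed stand-in for a pack list node. */
struct pack_node {
	const char *name; /* borrowed pointer, not copied */
	struct pack_node *next;
};

/* Return the node already installed under "name", or NULL. */
struct pack_node *find_installed(struct pack_node *head, const char *name)
{
	for (; head; head = head->next)
		if (!strcmp(head->name, name))
			return head;
	return NULL;
}

/*
 * Install a pack by name unless it is already on the list. Returns the
 * existing or newly added node, or NULL if allocation fails.
 */
struct pack_node *install_once(struct pack_node **head, const char *name)
{
	struct pack_node *p = find_installed(*head, name);
	if (p)
		return p; /* already known: do not add a duplicate */

	p = malloc(sizeof(*p));
	if (!p)
		return NULL;
	p->name = name;
	p->next = *head;
	*head = p;
	return p;
}
```

Loading the same pack twice then hands back the existing entry instead
of growing the list.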

But I'm worried there is something else going on. Your commit message
mentions add_pack_to_midx(). That's something we call as part of
write_midx_internal(), and it does create other packed_git structs. But
it never calls install_packed_git() on them; they just live in the
write_midx_context. So I'm not sure how they'd interfere with things.

And then there's one final oddity. Your patch assigns to ctx.m from
r->objects->multi_pack_index. But later in write_midx_internal(), we
call close_midx(). In the original, it's in the middle of the function,
but one of your patches puts it at the end of the function. But that
means we are closing r->objects->multi_pack_index.

Looking at close_midx(), it does not actually zero the struct. So we'd
still have r->objects->multi_pack_index->data pointing to memory which
has been unmapped. That seems like an accident waiting to happen. I
guess it doesn't usually cause problems because we'd typically write a
midx near the end of the process, and then not look up other objects?

So I'm concerned this is introducing a subtle bug that will bite us
later. And we should figure out what the actual thing it's fixing is, so
we can understand if there is a better way to fix it (e.g., by removing
duplicates in prepare_midx_pack(), or if it is some interaction with the
writing code).

I guess a good thing to try would be dropping this patch and seeing if
the tests break. ;)

-Peff

* Re: [PATCH v2 09/24] midx: infer preferred pack when not given one
  2021-07-21 20:16       ` Taylor Blau
@ 2021-07-23  8:50         ` Jeff King
  2021-07-26 19:44           ` Taylor Blau
  0 siblings, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-07-23  8:50 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Wed, Jul 21, 2021 at 04:16:07PM -0400, Taylor Blau wrote:

> > I dunno. Like I said, I was able to follow it, so maybe it is
> > sufficient. I'm just not sure others would be able to.
> 
> I think that others will follow it, too. But I agree that it is
> confusing, since we're fixing a bug that doesn't yet exist. In reality,
> I wrote this patch after sending v1, and then reordered its position to
> come before the implementation of MIDX bitmaps for that reason.
> 
> So in one sense, I prefer it this way because we don't ever introduce
> the bug.  But in another sense, it is very jarring to read about an
> interaction that has no basis in the code (yet).
> 
> I think that the best thing we could do without adding any significant
> reordering would be to just call out the situation we're in. I added
> this onto the end of the commit message which I think makes things a
> little clearer:
> 
>     (Note that multi-pack reachability bitmaps have yet to be
>     implemented; so in that sense this patch is fixing a bug which does
>     not yet exist.  But by having this patch beforehand, we can prevent
>     the bug from ever materializing.)

I do like fixing it up front. Here's my attempt at rewriting the commit
message. I tried to omit details about pack order and refer to the
revindex code instead, and to add more explanation of how this relates
to the pack-reuse code.

Something like:

  In 9218c6a40c (midx: allow marking a pack as preferred, 2021-03-30),
  the multi-pack index code learned how to select a pack which all
  duplicate objects are selected from. That is, if an object appears in
  multiple packs, select the copy in the preferred pack before using one
  from any other pack.

  Later in that same series, 38ff7cabb6 (pack-revindex: write multi-pack
  reverse indexes, 2021-03-30) learned to put the preferred pack at the
  start of the pack order when generating a midx ".rev" file. So far,
  that .rev ordering has not mattered. But it will be very important
  once we start using the .rev ordering for midx bitmaps.

  There is code in pack-objects to reuse pack bytes verbatim when
  bitmaps tell us a significant portion of the beginning of the pack
  should be in the output. This code relies on the pack mentioned by the
  0th bit also being the pack that is preferred for duplicates (because
  we'd want to make sure both bases and deltas come from the same pack).
  For a pack .bitmap, this is trivially correct. For a midx bitmap, it
  is only true when some pack gets both duplicate-priority and is placed
  at the front of the .rev file. I.e., there must be _some_ preferred
  pack.

  So if the user did not specify a preferred pack, we pick one
  arbitrarily.

  There's no test here for a few reasons:

    - the midx bitmap feature does not yet exist; this is preemptively
      fixing a problem before introducing buggy code

    - whether things go wrong with the current rules depends on things
      like readdir() order, since that is used for some midx pack
      ordering. So any test might happen to succeed or fail based on
      factors outside of our control.

Thoughts?

-Peff

* Re: [PATCH v2 13/24] pack-bitmap: read multi-pack bitmaps
  2021-07-21 23:01       ` Taylor Blau
@ 2021-07-23  9:40         ` Jeff King
  2021-07-23 10:00         ` Jeff King
  1 sibling, 0 replies; 273+ messages in thread
From: Jeff King @ 2021-07-23  9:40 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Wed, Jul 21, 2021 at 07:01:16PM -0400, Taylor Blau wrote:

> On Wed, Jul 21, 2021 at 07:32:49AM -0400, Jeff King wrote:
> > On Mon, Jun 21, 2021 at 06:25:31PM -0400, Taylor Blau wrote:
> > > +	if (!is_pack_valid(packfile)) {
> > > +		close(fd);
> > > +		return -1;
> > > +	}
> > > +
> >
> > What's this extra is_pack_valid() doing? I wouldn't expect many changes
> > at all to this non-midx code path (aside from the "did we already load a
> > midx bitmap" in the earlier part of the hunk, which makes sense).
> 
> That looks like a mistake to me. I did a little digging and tried to
> remember if it could have ever been useful, but I think that it's just a
> stray change that has no value. Removed.

This turned out to be quite interesting. It _is_ a mistake to include it
in this series. But it turns out to be quite valuable on its own. :)

I just cleaned it up and sent it as its own separate patch:

  https://lore.kernel.org/git/YPqL%2FpZt6hNYN4hB@coredump.intra.peff.net/

So it's a happy accident that your series called attention to it. :)

-Peff

* Re: [PATCH v2 13/24] pack-bitmap: read multi-pack bitmaps
  2021-07-21 23:01       ` Taylor Blau
  2021-07-23  9:40         ` Jeff King
@ 2021-07-23 10:00         ` Jeff King
  2021-07-26 20:36           ` Taylor Blau
  1 sibling, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-07-23 10:00 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Wed, Jul 21, 2021 at 07:01:16PM -0400, Taylor Blau wrote:

> > OK, this new function is used in load_bitmap(), which is used for both
> > pack and midx bitmaps. So if we have a midx bitmap, we'll
> > unconditionally load the revindex here. But:
> >
> >   - why do we then load individual pack revindexes? I can believe it may
> >     be necessary to meet the assumptions of some other part of the code,
> >     but it would be nice to have a comment giving us some clue.
> 
> Good suggestion. We will need to reference the reverse index belonging
> to individual packs in a few locations in pack-objects (e.g.,
> write_reuse_object() calls offset_to_pack_pos() and
> pack_pos_to_offset(), both with arbitrary packs, not just the preferred
> one).
> 
> I left the comment vague; something along the lines of "lots of routines
> in pack-objects will need these structures to be ready to use".

Makes sense. I think we _could_ be lazy-loading them, but IIRC only some
of the revindex functions are happy to lazy-load. It's definitely fine
to punt on that for now with a comment.

> I think there's room for improvement there, since, e.g., `git
> rev-list --count --objects --use-bitmap-index` doesn't need to load the
> reverse indexes. But that's already the case with classic bitmaps, too,
> which eagerly call load_pack_revindex().

Right. I think our solution there was to make loading the revindex
really cheap (open+mmap, rather than the in-core generation). I'm
definitely happy to call that fast enough for now, and if somebody wants
to benchmark and micro-optimize cases where we can avoid loading them,
we can do that later.

> > In practice I think even 2^31 objects is pretty out-of-reach, but it may
> > be worth changing the return type (and the callers), or even just
> > catching the overflow with an assertion.
> 
> Possibly, but keep in mind that the former is basically the same
> refactor as we did with the "tell me whether this object was found via
> this extra pointer". But bitmap_position() has a lot more callers than
> that, so the plumbing required would be a little more prevalent.
> 
> So I'd be content to just punt on it for now, if you'd be OK with it.

Yeah, I think it's fine to leave it out of this series. It's not new,
and we can revisit it later.

> > Could this ever be fooled if we had a preferred pack with 0 objects in
> > it? I don't know why we would have such a thing, but just trying to
> > think of cases where our assumptions might not hold (and what bad things
> > could happen).
> 
> An empty preferred pack would cause a problem, yes. The solution is
> two-fold (and incorporated into the reroll that I plan on sending
> shortly):
> 
>   - When the user specifies --preferred-pack, the MIDX code must make
>     sure that the given pack is non-empty. That's a new patch, and
>     basically adds a new conditional (to check the pack itself) and a
>     test (to make sure that we catch the case we are trying to prevent).
> 
>   - When the user doesn't specify --preferred-pack (and instead asks us
>     to infer one for them) we want to select not just the oldest pack,
>     but the oldest *non-empty* pack. That is folded into the "midx:
>     infer preferred pack when not given one" patch.

Oh good, I said something useful. ;) The fix you outlined sounds
sensible.
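
For the inferred case, I'd imagine the selection looking roughly like
the standalone sketch below. Everything in it is an assumption made for
illustration: "struct pack_summary", its fields, and
infer_preferred_pack() are hypothetical names, not what midx.c uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical summary of a pack; field names are illustrative only. */
struct pack_summary {
	uint32_t num_objects;
	time_t mtime;
};

/*
 * Return the index of the oldest pack that contains at least one
 * object, or -1 if every pack is empty (or there are no packs).
 */
int infer_preferred_pack(const struct pack_summary *packs, size_t nr)
{
	int best = -1;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (!packs[i].num_objects)
			continue; /* an empty pack can never be preferred */
		if (best < 0 || packs[i].mtime < packs[best].mtime)
			best = (int)i;
	}
	return best;
}
```

The -1 return leaves it to the caller to decide what to do when there is
no usable pack at all.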

> > > +			if (bitmap_is_midx(bitmap_git)) {
> > > +				/*
> > > +				 * Can't reuse from a non-preferred pack (see
> > > +				 * above).
> > > +				 */
> > > +				if (pos + offset >= objects_nr)
> > > +					continue;
> > > +			}
> > > +			try_partial_reuse(bitmap_git, pack, pos + offset, reuse, &w_curs);
> >
> > ...and this likewise makes sure we never go past that first pack. Good.
> >
> > I think this "continue" could actually be a "break", as the loop is
> > iterating over "offset" (and "pos + offset" always gets larger). In
> > fact, it could break out of the outer loop as well (which is
> > incrementing "pos"). It's probably a pretty small efficiency in
> > practice, though.
> 
> Yeah; you're right. And we'll save up to BITS_IN_EWORD cycles of this
> loop. (I wonder if smart-enough compilers will realize the same
> optimization that you did and turn that `continue` into a `break`
> automatically, but that's neither here nor there).

If you break all the way out, then it saves iterating over all of those
other words that are not in the first pack, too. I.e., if your bitmap
has 10 million bits (for a 10-million object clone), but your first pack
only has a million objects in it, we'll call try_partial_reuse() 9
million extra times.

Fortunately, each call is super cheap, because the first thing it does
is check if the requested bit is past the end of the pack. Which kind of
makes me wonder if we could simplify this further by just letting
try_partial_reuse() tell us when there's no point going further:

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 02948e8c78..b84b55c4f3 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1308,7 +1308,11 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	return NULL;
 }
 
-static void try_partial_reuse(struct bitmap_index *bitmap_git,
+/*
+ * -1 means "stop trying further objects"; 0 means we may or may not have
+ * reused, but you can keep feeding bits.
+ */
+static int try_partial_reuse(struct bitmap_index *bitmap_git,
 			      struct packed_git *pack,
 			      size_t pos,
 			      struct bitmap *reuse,
@@ -1342,12 +1346,12 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 	 */
 
 	if (pos >= pack->num_objects)
-		return; /* not actually in the pack or MIDX preferred pack */
+		return -1; /* not actually in the pack or MIDX preferred pack */
 
 	offset = delta_obj_offset = pack_pos_to_offset(pack, pos);
 	type = unpack_object_header(pack, w_curs, &offset, &size);
 	if (type < 0)
-		return; /* broken packfile, punt */
+		return -1; /* broken packfile, punt */
 
 	if (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA) {
 		off_t base_offset;
@@ -1364,9 +1368,9 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 		base_offset = get_delta_base(pack, w_curs, &offset, type,
 					     delta_obj_offset);
 		if (!base_offset)
-			return;
+			return 0;
 		if (offset_to_pack_pos(pack, base_offset, &base_pos) < 0)
-			return;
+			return 0;
 
 		/*
 		 * We assume delta dependencies always point backwards. This
@@ -1378,7 +1382,7 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 		 * odd parameters.
 		 */
 		if (base_pos >= pos)
-			return;
+			return 0;
 
 		/*
 		 * And finally, if we're not sending the base as part of our
@@ -1389,13 +1393,14 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 		 * object_entry code path handle it.
 		 */
 		if (!bitmap_get(reuse, base_pos))
-			return;
+			return 0;
 	}
 
 	/*
 	 * If we got here, then the object is OK to reuse. Mark it.
 	 */
 	bitmap_set(reuse, pos);
+	return 0;
 }
 
 static uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git)
@@ -1449,22 +1454,20 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 	for (; i < result->word_alloc; ++i) {
 		eword_t word = result->words[i];
 		size_t pos = (i * BITS_IN_EWORD);
+		int ret = 0;
 
 		for (offset = 0; offset < BITS_IN_EWORD; ++offset) {
 			if ((word >> offset) == 0)
 				break;
 
 			offset += ewah_bit_ctz64(word >> offset);
-			if (bitmap_is_midx(bitmap_git)) {
-				/*
-				 * Can't reuse from a non-preferred pack (see
-				 * above).
-				 */
-				if (pos + offset >= objects_nr)
-					continue;
-			}
-			try_partial_reuse(bitmap_git, pack, pos + offset, reuse, &w_curs);
+			ret = try_partial_reuse(bitmap_git, pack, pos + offset,
+						reuse, &w_curs);
+			if (ret < 0)
+				break;
 		}
+		if (ret < 0)
+			break;
 	}
 
 	unuse_pack(&w_curs);

The double-ret check is kind of ugly, though I suspect compilers
optimize it pretty well. The alternative is a "goto" to a label just
past the loop (also ugly, but easily explained with a comment).
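
For reference, the "goto" variant might be shaped like the standalone
sketch below. It is only a toy model of the loop: count_bits_below(),
visit_bit(), and ctz64() are hypothetical names (visit_bit() plays the
role of a try_partial_reuse() that returns -1 once the position is out
of range), and plain uint64_t words stand in for the bitmap.

```c
#include <stddef.h>
#include <stdint.h>

#define BITS_IN_WORD 64

/* Portable count-trailing-zeros; the caller guarantees word != 0. */
static unsigned ctz64(uint64_t word)
{
	unsigned n = 0;
	while (!(word & 1)) {
		word >>= 1;
		n++;
	}
	return n;
}

/*
 * Hypothetical per-bit callback: returns -1 once "pos" is at or past
 * "limit" (meaning "stop feeding me bits"), and 0 otherwise.
 */
static int visit_bit(size_t pos, size_t limit, size_t *visited)
{
	if (pos >= limit)
		return -1;
	(*visited)++;
	return 0;
}

/* Count the set bits below "limit", leaving both loops at the first miss. */
size_t count_bits_below(const uint64_t *words, size_t nr_words, size_t limit)
{
	size_t visited = 0;
	size_t i;

	for (i = 0; i < nr_words; i++) {
		uint64_t word = words[i];
		size_t pos = i * BITS_IN_WORD;
		unsigned offset;

		for (offset = 0; offset < BITS_IN_WORD; offset++) {
			if ((word >> offset) == 0)
				break;

			offset += ctz64(word >> offset);
			if (visit_bit(pos + offset, limit, &visited) < 0)
				goto done; /* positions only grow from here */
		}
	}
done:
	return visited;
}
```

That avoids carrying a "ret" variable across the two loops, at the cost
of the label.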

> > > @@ -1511,8 +1749,13 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
> > >  		struct object_id oid;
> > >  		struct object_entry *oe;
> > >
> > > -		nth_packed_object_id(&oid, bitmap_git->pack,
> > > -				     pack_pos_to_index(bitmap_git->pack, i));
> > > +		if (bitmap_is_midx(bitmap_git))
> > > +			nth_midxed_object_oid(&oid,
> > > +					      bitmap_git->midx,
> > > +					      pack_pos_to_midx(bitmap_git->midx, i));
> > > +		else
> > > +			nth_packed_object_id(&oid, bitmap_git->pack,
> > > +					     pack_pos_to_index(bitmap_git->pack, i));
> > >  		oe = packlist_find(mapping, &oid);
> >
> > Could this be using nth_bitmap_object_oid()? I guess not, because we are
> > feeding from pack_pos_to_*. I'm not sure if another helper function is
> > worth it (pack_pos_to_bitmap_index() or something?).
> 
> You're right that we can't call nth_bitmap_object_oid here directly,
> sadly. But I think your suggestion for pack_pos_to_bitmap_index() (or
> similar) would only benefit this caller, since most places that dispatch
> conditionally to either pack_pos_to_{midx,index} want to pass the result
> to a different function depending on which branch they took.
> 
> Definitely possible that I missed another case that would help, but that
> was what I came up with after just a quick glance.

Yeah, looking around, I don't see another opportunity. So the benefits
are pretty minimal. We could do:

  index_pos = bitmap_is_midx(bitmap_git) ?
              pack_pos_to_midx(bitmap_git->midx, i) :
	      pack_pos_to_index(bitmap_git->pack, i);
  nth_bitmap_object_oid(&oid, bitmap_git, index_pos);

but that is not buying much. I'm content to leave it.

-Peff

* Re: [PATCH v2 14/24] pack-bitmap: write multi-pack bitmaps
  2021-07-21 12:09     ` Jeff King
@ 2021-07-26 18:12       ` Taylor Blau
  2021-07-26 18:23         ` Taylor Blau
  2021-07-27 17:11         ` Jeff King
  0 siblings, 2 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-26 18:12 UTC (permalink / raw)
  To: Jeff King; +Cc: git, dstolee, gitster, jonathantanmy

On Wed, Jul 21, 2021 at 08:09:19AM -0400, Jeff King wrote:
> On Mon, Jun 21, 2021 at 06:25:34PM -0400, Taylor Blau wrote:
>
> > +static int add_ref_to_pending(const char *refname,
> > +			      const struct object_id *oid,
> > +			      int flag, void *cb_data)
> > +{
> > +	struct rev_info *revs = (struct rev_info*)cb_data;
> > +	struct object *object;
> > +
> > +	if ((flag & REF_ISSYMREF) && (flag & REF_ISBROKEN)) {
> > +		warning("symbolic ref is dangling: %s", refname);
> > +		return 0;
> > +	}
> > +
> > +	object = parse_object_or_die(oid, refname);
> > +	if (object->type != OBJ_COMMIT)
> > +		return 0;
> > +
> > +	add_pending_object(revs, object, "");
> > +	if (bitmap_is_preferred_refname(revs->repo, refname))
> > +		object->flags |= NEEDS_BITMAP;
> > +	return 0;
> > +}
>
> OK, so we'll look at each ref to get the set of commits that we want to
> traverse to put into the bitmap. Which is roughly the same as what the
> pack bitmap does. We only generate bitmaps for all-into-one repacks, so
> it is traversing all of the reachable objects. It is a little different
> in that the pack version is probably hitting reflogs, but IMHO we are
> better off to ignore reflogs for the purposes of bitmaps (I would
> suggest to do so in the pack-bitmap case, too, except that it is
> combined with the "what to pack" traversal there, and by the time we see
> each commit we don't know how we got there).

Right. And we might end up ignoring a lot of these commits, too: the
for-each-ref is just a starting point to enumerate everything, but we
only care about parts of the object graph that are contained in a pack
which is included in the MIDX we are writing (hence the bare "return"
you're commenting around below).

> > +static void bitmap_show_commit(struct commit *commit, void *_data)
> > +{
> > +	struct bitmap_commit_cb *data = _data;
> > +	if (oid_pos(&commit->object.oid, data->ctx->entries,
> > +		    data->ctx->entries_nr,
> > +		    bitmap_oid_access) > -1) {
>
> This "> -1" struck me as a little bit funny. Perhaps ">= 0" would be a
> more obvious way of saying "we found it"?

Sure. (I looked for other uses of oid_pos() to see what is more
common, but there really are vanishingly few uses.) Easier to read might
even be:

    int pos = oid_pos(...);
    if (pos < 0)
      return;
    ALLOC_GROW(...);

which is what I ended up going for.

> > +	/*
> > +	 * Pass selected commits in topo order to match the behavior of
> > +	 * pack-bitmaps when configured with delta islands.
> > +	 */
> > +	revs.topo_order = 1;
> > +	revs.sort_order = REV_SORT_IN_GRAPH_ORDER;
>
> Hmm. Why do we want to match this side effect of delta islands here?
>
> The only impact this has is on the order of commits we feed for bitmap
> selection (and during the actual generation phase, it may impact
> visitation order).
>
> Now I'm of the opinion that topo order is probably the best thing for
> bitmap generation (since the bitmaps themselves are connected to the
> graph structure). But if it is the best thing, shouldn't we perhaps be
> turning on topo-order for single-pack bitmaps, too?
>
> And if it isn't the best thing, then why would we want it here?

Heh, you were the one that suggested I bring this over to MIDX-based
bitmaps in the first place ;).

This comes from an investigation into why bitmap coverage had worsened
for some repositories using MIDX bitmaps at GitHub. The real reason was
resolved and unrelated to this, but trying to match the behavior of MIDX
bitmaps to our existing pack bitmap setup (which uses delta-islands) was
one strategy we tried while debugging.

I actually suspect that it doesn't really matter what order we feed this
list to bitmap_writer_select_commits() in, because the first thing that
it does is QSORT() the incoming list of commits in date order.

But it does mirror the behavior of our previous bitmap generation
settings, which has been running for years.

So... we could probably drop this hunk? I'd probably rather err on the
safe side and leave this alone since it matches a system that we know to
work well in practice.

> > +	if (prepare_revision_walk(&revs))
> > +		die(_("revision walk setup failed"));
>
> We call init_revisions(), and then go straight to
> prepare_revision_walk() with no call to setup_revisions() between. It
> doesn't seem to be clearly documented, but I think you're supposed to,
> as it finalizes some bits like diff_setup_done().
>
> I suspect it works OK in practice, and I did find a few other spots that
> do not call it (e.g., builtin/am.c:write_commit_patch). But most spots
> do at least an empty setup_revisions(0, NULL, &rev, NULL).

Sure, thanks.

> > +	/*
> > +	 * Build the MIDX-order index based on pdata.objects (which is already
> > +	 * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of
> > +	 * this order).
> > +	 */
> > +	ALLOC_ARRAY(index, pdata.nr_objects);
> > +	for (i = 0; i < pdata.nr_objects; i++)
> > +		index[i] = (struct pack_idx_entry *)&pdata.objects[i];
>
> This cast is correct because the pack_idx_entry is at the start of each
> object_entry. But maybe:
>
>   index[i] = &pdata.objects[i].idx;
>
> would be less scary looking?

Definitely, and thanks (for this spot and the other one you mentioned).

> > +	bitmap_writer_select_commits(commits, commits_nr, -1);
>
> Not related to your patch, but I had to refresh my memory on what this
> "-1" was for. It's "max_bitmaps", and is ignored if it's negative. But
> the only callers pass "-1"! So we could get rid of it entirely.
>
> It probably makes sense to leave that cleanup out of this
> already-complicated series. But maybe worth doing later on top.

Yeah, seems like an easy topic that somebody interested in
#leftoverbits could pick up. Once this lands, I'll be happy to take care
of it myself, too.

> > @@ -930,9 +1100,16 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
> >  		for (i = 0; i < ctx.m->num_packs; i++) {
> >  			ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
> >
> > +			if (prepare_midx_pack(the_repository, ctx.m, i)) {
> > +				error(_("could not load pack %s"),
> > +				      ctx.m->pack_names[i]);
> > +				result = 1;
> > +				goto cleanup;
> > +			}
>
> It might be worth a comment here. I can easily believe that there is
> some later part of the bitmap generation code that assumes the packs are
> loaded. But somebody reading this is not likely to understand why it's
> here.
>
> Should this be done conditionally only if we're writing a bitmap? (That
> might also make it obvious why we are doing it).

Ah. Actually, I don't think this was necessary before, but we *do* need
it now because we want to compare the packs' mtimes for inferring a
preferred pack when one wasn't given. And we also need to open the pack
indexes, too, because we care about the object counts (to make sure that
we don't infer a preferred pack which has no objects).

Luckily, any new packs will be loaded (and likewise have their indexes
open, too), via the add_pack_to_midx() callback that we pass as an
argument to for_each_file_in_pack_dir().

But we could do something like this instead:

--- 8< ---

diff --git a/midx.c b/midx.c
index 8426e1a0b1..a70a6bca81 100644
--- a/midx.c
+++ b/midx.c
@@ -1111,16 +1111,29 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 		for (i = 0; i < ctx.m->num_packs; i++) {
 			ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);

-			if (prepare_midx_pack(the_repository, ctx.m, i)) {
-				error(_("could not load pack"));
-				result = 1;
-				goto cleanup;
-			}
-
 			ctx.info[ctx.nr].orig_pack_int_id = i;
 			ctx.info[ctx.nr].pack_name = xstrdup(ctx.m->pack_names[i]);
-			ctx.info[ctx.nr].p = ctx.m->packs[i];
+			ctx.info[ctx.nr].p = NULL;
 			ctx.info[ctx.nr].expired = 0;
+
+			if (flags & MIDX_WRITE_REV_INDEX) {
+				/*
+				 * If generating a reverse index, need to have
+				 * packed_git's loaded to compare their
+				 * mtimes and object count.
+				 */
+				if (prepare_midx_pack(the_repository, ctx.m, i)) {
+					error(_("could not load pack"));
+					result = 1;
+					goto cleanup;
+				}
+
+				if (open_pack_index(ctx.m->packs[i]))
+					die(_("could not open index for %s"),
+					    ctx.m->packs[i]->pack_name);
+				ctx.info[ctx.nr].p = ctx.m->packs[i];
+			}
+
 			ctx.nr++;
 		}
 	}

--- >8 ---

> > @@ -1075,9 +1271,6 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
> >  	hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR);
> >  	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
> >
> > -	if (ctx.m)
> > -		close_midx(ctx.m);
> > -
> >  	if (ctx.nr - dropped_packs == 0) {
> >  		error(_("no pack files to index."));
> >  		result = 1;
>
> I'm not sure what this hunk is doing. We do pick up the close_midx()
> call at the end of the function, amidst the other cleanup.
>
> I expect the answer is something like "we need it open when we generate
> the bitmaps". But it makes me wonder if we could hit any cases where we
> try to overwrite it while it's still open, which would cause problems on
> Windows.

The reason is kind of annoying. If we're building a MIDX bitmap
in-process (e.g., `git multi-pack-index write --bitmap`) then we'll call
prepare_packed_git() to build our pseudo-packing list which we pass to
the bitmap generation machinery.

But prepare_packed_git() calls prepare_packed_git_one() ->
for_each_file_in_pack_dir() with the prepare_pack() callback -> which
then wants to see if the MIDX we have open already knows about a given
pack so we avoid opening it twice.

But even though the MIDX would have gone away by this point (with the
previous close_midx() call that is removed above), we still hold onto
a pointer to it via the object_store's `multi_pack_index` pointer. And
then all the way down in packfile.c:prepare_pack() we try to pass a
now-defunct pointer as the first argument to midx_contains_pack(), and
crash.

And clearing out that `multi_pack_index` pointer is tricky, because the
MIDX would have to compare the odb's `object_dir` with its own (which is
brittle in its own right), but also would have to see if that object
store is pointing at *it*, and not some other MIDX.

So we do have to keep it open there. Which makes me wonder how this
could possibly work on Windows, because holding the MIDX open will make
the commit_lock_file() definitely fail. But it seems OK in the
Windows-based CI runs?

Puzzled.

Thanks,
Taylor

* Re: [PATCH v2 14/24] pack-bitmap: write multi-pack bitmaps
  2021-07-26 18:12       ` Taylor Blau
@ 2021-07-26 18:23         ` Taylor Blau
  2021-07-27 17:11         ` Jeff King
  1 sibling, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-26 18:23 UTC (permalink / raw)
  To: Jeff King; +Cc: git, dstolee, gitster, jonathantanmy

On Mon, Jul 26, 2021 at 02:12:30PM -0400, Taylor Blau wrote:
> So we do have to keep it open there. Which makes me wonder how this
> could possibly work on Windows, because holding the MIDX open will make
> the commit_lock_file() definitely fail. But it seems OK in the
> Windows-based CI runs?
>
> Puzzled.

The below should do the trick; it'll keep the MIDX open just long enough
to generate a bitmap (if one was requested), but will close any
handle(s) on an existing MIDX right before we move the temporary file
into place.

It has the added benefit of making the hunk that destroys stale
references to packs unnecessary.

Watching the Actions run here to see how this runs on Windows:

    https://github.com/ttaylorr/git/actions/runs/1068457013

Below is the patch.

--- >8 ---

commit c7b7ce0ebc793e311072929772a2d352600f3d54
Author: Taylor Blau <me@ttaylorr.com>
Date:   Mon Jul 26 14:17:27 2021 -0400

    fixup! pack-bitmap: write multi-pack bitmaps

diff --git a/midx.c b/midx.c
index 76c94a0df2..297627f992 100644
--- a/midx.c
+++ b/midx.c
@@ -1358,6 +1358,8 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 		}
 	}

+	close_midx(ctx.m);
+
 	commit_lock_file(&lk);

 	clear_midx_files_ext(the_repository, ".bitmap", midx_hash);
@@ -1368,15 +1370,6 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 		if (ctx.info[i].p) {
 			close_pack(ctx.info[i].p);
 			free(ctx.info[i].p);
-			if (ctx.m) {
-				/*
-				 * Destroy a stale reference to the pack in
-				 * 'ctx.m'.
-				 */
-				uint32_t orig = ctx.info[i].orig_pack_int_id;
-				if (orig < ctx.m->num_packs)
-					ctx.m->packs[orig] = NULL;
-			}
 		}
 		free(ctx.info[i].pack_name);
 	}
@@ -1386,7 +1379,6 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	free(ctx.pack_perm);
 	free(ctx.pack_order);
 	free(midx_name);
-	close_midx(ctx.m);

 	return result;
 }

* Re: [PATCH v2 02/24] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
  2021-07-23  7:37         ` Jeff King
@ 2021-07-26 18:48           ` Taylor Blau
  2021-07-27 17:11             ` Jeff King
  0 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-07-26 18:48 UTC (permalink / raw)
  To: Jeff King; +Cc: git, dstolee, gitster, jonathantanmy

On Fri, Jul 23, 2021 at 03:37:52AM -0400, Jeff King wrote:
> I thought about suggesting that it be called "err" or "ret" or
> something. And then we do not have to care that fill_bitmap_commit()
> only returns an error in the non-closed state. We are simply propagating
> its error-return back up the stack.

Hmm. For whatever the inconvenience costs us, I do like that the variable
can be named specifically like "open" or "closed" as opposed to the more
generic "err" or "ret".

So I'll probably keep it as is unless you feel strongly (which I suspect
you do not).

Thanks,
Taylor

* Re: [PATCH v2 04/24] Documentation: build 'technical/bitmap-format' by default
  2021-07-23  7:39           ` Jeff King
@ 2021-07-26 18:49             ` Taylor Blau
  0 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-26 18:49 UTC (permalink / raw)
  To: Jeff King; +Cc: git, dstolee, gitster, jonathantanmy

On Fri, Jul 23, 2021 at 03:39:25AM -0400, Jeff King wrote:
> The question here is: should we continue to omit it from the html build,
> since it does not render well (i.e., should we simply drop this patch).

I think that's a nice way of putting it. Since the HTML rendering is
terrible, let's just drop this patch and leave cleaning it up as
#leftoverbits.

Thanks,
Taylor

* Re: [PATCH v2 08/24] midx: respect 'core.multiPackIndex' when writing
  2021-07-23  8:29         ` Jeff King
@ 2021-07-26 18:59           ` Taylor Blau
  2021-07-26 22:14             ` Taylor Blau
  2021-07-27 17:17             ` Jeff King
  0 siblings, 2 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-26 18:59 UTC (permalink / raw)
  To: Jeff King; +Cc: git, dstolee, gitster, jonathantanmy

On Fri, Jul 23, 2021 at 04:29:37AM -0400, Jeff King wrote:
> On Wed, Jul 21, 2021 at 03:22:34PM -0400, Taylor Blau wrote:
>
> > > > This avoids a problem that would arise in subsequent patches due to the
> > > > combination of 'git repack' reopening the object store in-process and
> > > > the multi-pack index code not checking whether a pack already exists in
> > > > the object store when calling add_pack_to_midx().
> > > >
> > > > This would ultimately lead to a cycle being created along the
> > > > 'packed_git' struct's '->next' pointer. That is obviously bad, but it
> > > > has hard-to-debug downstream effects like saying a bitmap can't be
> > > > loaded for a pack because one already exists (for the same pack).
> > >
> > > I'm not sure I completely understand the bug that this causes.
> >
> > Off-hand, I can't quite remember either. But it is important; I do have
> > a distinct memory of dropping this patch and then watching a 'git repack
> > --write-midx' (that option will be introduced in a later series) fail
> > horribly.
> >
> > If I remember correctly, the bug has to do with loading a MIDX twice in
> > the same process. When we call add_packed_git() from within
> > prepare_midx_pack(), we load the pack without caring whether or not it's
> > already loaded. So loading a MIDX twice in the same process will fail.
> >
> > So really I think that this is papering over that bug: we're just
> > removing one of the times that we happened to load a MIDX from during
> > the writing phase.
>
> Hmm, after staring at this for a bit, I've unconfused and re-confused
> myself several times.
>
> Here are some interesting bits:
>
>   - calling load_multi_pack_index() directly creates a new midx object.
>     None of its m->packs[] array will be filled in. Nor is it reachable
>     as r->objects->multi_pack_index.
>
>   - in using that midx, we end up calling prepare_midx_pack() for
>     various packs, which creates a new packed_git struct and adds it to
>     r->objects->packed_git (via install_packed_git()).
>
> So that's a bit weird already, because we have packed_git structs in
> r->objects that came from a midx that isn't r->objects->multi_pack_index.
> And then if we later call prepare_multi_pack_index(), for example as
> part of a pack reprepare, then we'd end up with duplicates.

Ah, this jogged my memory: this is a relic from when we generated MIDX
bitmaps in-process with the rest of the `repack` code. And when we did
that, we did have to call `reprepare_packed_git()` after writing the new
packs but before moving them into place.

So that's where the `reprepare_packed_git()` came from, but we don't
have any of that code anymore, since we now generate MIDX bitmaps by
invoking:

    git multi-pack-index write --bitmap --stdin-packs --refs-snapshot

as a sub-process of `git repack`, so there is no need for any reprepare,
which is what was triggering this bug.

To be sure, I reverted this patch out of GitHub's fork, and reran the
tests both in normal mode (just `make test`) and then once more with the
`GIT_TEST_MULTI_PACK_INDEX{,_WRITE_BITMAP}` environment variables set.
Unsurprisingly, it passed both times.

I'm happy to keep digging further, but I think that I'm 99% satisfied
here. Digging further involves resurrecting a much older version of this
series (and others adjacent to it), and there are probably other bugs
lurking that would be annoying to tease out.

In any case, let's drop this patch from the series. It's disappointing
that we can't run:

    git -c core.multiPackIndex= multi-pack-index write

anymore, but I guess that's no worse than the state we were in before
this patch, so I'm content to let it live on.

Thanks,
Taylor

* Re: [PATCH v2 09/24] midx: infer preferred pack when not given one
  2021-07-23  8:50         ` Jeff King
@ 2021-07-26 19:44           ` Taylor Blau
  0 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-26 19:44 UTC (permalink / raw)
  To: Jeff King; +Cc: git, dstolee, gitster, jonathantanmy

On Fri, Jul 23, 2021 at 04:50:31AM -0400, Jeff King wrote:
> On Wed, Jul 21, 2021 at 04:16:07PM -0400, Taylor Blau wrote:
>
> > > I dunno. Like I said, I was able to follow it, so maybe it is
> > > sufficient. I'm just not sure others would be able to.
> >
> > I think that others will follow it, too. But I agree that it is
> > confusing, since we're fixing a bug that doesn't yet exist. In reality,
> > I wrote this patch after sending v1, and then reordered its position to
> > come before the implementation of MIDX bitmaps for that reason.
> >
> > So in one sense, I prefer it this way because we don't ever introduce
> > the bug.  But in another sense, it is very jarring to read about an
> > interaction that has no basis in the code (yet).
> >
> > I think that the best thing we could do without adding any significant
> > reordering would be to just call out the situation we're in. I added
> > this onto the end of the commit message which I think makes things a
> > little clearer:
> >
> >     (Note that multi-pack reachability bitmaps have yet to be
> >     implemented; so in that sense this patch is fixing a bug which does
> >     not yet exist.  But by having this patch beforehand, we can prevent
> >     the bug from ever materializing.)
>
> I do like fixing it up front. Here's my attempt at rewriting the commit
> message. I tried to omit details about pack order, and instead refer to
> the revindex code, and instead add more explanation of how this relates
> to the pack-reuse code.
>
> Something like:
>
> [...]
>
> Thoughts?

I like it, although reading it fresh I found the sentence beginning with
"So if the user did not specify a preferred pack" to be a little
confusing. To connect it back to the previous paragraph, I added:

  ... in order to avoid a situation where no pack is marked as preferred
  (breaking our assumption about the pack representing the object at the
  0th bit).

and that read out much clearer (to me at least).

Thanks,
Taylor

* Re: [PATCH v2 13/24] pack-bitmap: read multi-pack bitmaps
  2021-07-23 10:00         ` Jeff King
@ 2021-07-26 20:36           ` Taylor Blau
  0 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-26 20:36 UTC (permalink / raw)
  To: Jeff King; +Cc: git, dstolee, gitster, jonathantanmy

On Fri, Jul 23, 2021 at 06:00:47AM -0400, Jeff King wrote:
> On Wed, Jul 21, 2021 at 07:01:16PM -0400, Taylor Blau wrote:
> > > > +			if (bitmap_is_midx(bitmap_git)) {
> > > > +				/*
> > > > +				 * Can't reuse from a non-preferred pack (see
> > > > +				 * above).
> > > > +				 */
> > > > +				if (pos + offset >= objects_nr)
> > > > +					continue;
> > > > +			}
> > > > +			try_partial_reuse(bitmap_git, pack, pos + offset, reuse, &w_curs);
> > >
> > > ...and this likewise makes sure we never go past that first pack. Good.
> > >
> > > I think this "continue" could actually be a "break", as the loop is
> > > iterating over "offset" (and "pos + offset" always gets larger). In
> > > fact, it could break out of the outer loop as well (which is
> > > incrementing "pos"). It's probably a pretty small efficiency in
> > > practice, though.
> >
> > Yeah; you're right. And we'll save up to BITS_IN_EWORD cycles of this
> > loop. (I wonder if smart-enough compilers will realize the same
> > optimization that you did and turn that `continue` into a `break`
> > automatically, but that's neither here nor there).
>
> If you break all the way out, then it saves iterating over all of those
> other words that are not in the first pack, too. I.e., if your bitmap
> has 10 million bits (for a 10-million object clone), but your first pack
> only has a million objects in it, we'll call try_partial_reuse() 9
> million extra times.
>
> Fortunately, each call is super cheap, because the first thing it does
> is check if the requested bit is past the end of the pack. Which kind of
> makes me wonder if we could simplify this further by just letting
> try_partial_reuse() tell us when there's no point going further:
>
> [snip suggested diff]

All looks pretty good to me. I think that a goto is a little easier to
read than two identical "if (ret < 0)" checks. And having a comment
makes it clearer to me than the double if statements. So I'm content to
do that instead.

Thanks,
Taylor

* Re: [PATCH v2 08/24] midx: respect 'core.multiPackIndex' when writing
  2021-07-26 18:59           ` Taylor Blau
@ 2021-07-26 22:14             ` Taylor Blau
  2021-07-27 17:29               ` Jeff King
  2021-07-27 17:17             ` Jeff King
  1 sibling, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-07-26 22:14 UTC (permalink / raw)
  To: Jeff King; +Cc: git, dstolee, gitster, jonathantanmy

On Mon, Jul 26, 2021 at 02:59:02PM -0400, Taylor Blau wrote:
> On Fri, Jul 23, 2021 at 04:29:37AM -0400, Jeff King wrote:
> > On Wed, Jul 21, 2021 at 03:22:34PM -0400, Taylor Blau wrote:
> >
> > > > > This avoids a problem that would arise in subsequent patches due to the
> > > > > combination of 'git repack' reopening the object store in-process and
> > > > > the multi-pack index code not checking whether a pack already exists in
> > > > > the object store when calling add_pack_to_midx().
> > > > >
> > > > > This would ultimately lead to a cycle being created along the
> > > > > 'packed_git' struct's '->next' pointer. That is obviously bad, but it
> > > > > has hard-to-debug downstream effects like saying a bitmap can't be
> > > > > loaded for a pack because one already exists (for the same pack).
> > > >
> > > > I'm not sure I completely understand the bug that this causes.
> > >
> > > Off-hand, I can't quite remember either. But it is important; I do have
> > > a distinct memory of dropping this patch and then watching a 'git repack
> > > --write-midx' (that option will be introduced in a later series) fail
> > > horribly.
> > >
> > > If I remember correctly, the bug has to do with loading a MIDX twice in
> > > the same process. When we call add_packed_git() from within
> > > prepare_midx_pack(), we load the pack without caring whether or not it's
> > > already loaded. So loading a MIDX twice in the same process will fail.
> > >
> > > So really I think that this is papering over that bug: we're just
> > > removing one of the times that we happened to load a MIDX from during
> > > the writing phase.
> >
> > Hmm, after staring at this for a bit, I've unconfused and re-confused
> > myself several times.
> >
> > Here are some interesting bits:
> >
> >   - calling load_multi_pack_index() directly creates a new midx object.
> >     None of its m->packs[] array will be filled in. Nor is it reachable
> >     as r->objects->multi_pack_index.
> >
> >   - in using that midx, we end up calling prepare_midx_pack() for
> >     various packs, which creates a new packed_git struct and adds it to
> >     r->objects->packed_git (via install_packed_git()).
> >
> > So that's a bit weird already, because we have packed_git structs in
> > r->objects that came from a midx that isn't r->objects->multi_pack_index.
> > And then if we later call prepare_multi_pack_index(), for example as
> > part of a pack reprepare, then we'd end up with duplicates.
>
> Ah, this jogged my memory: this is a relic from when we generated MIDX
> bitmaps in-process with the rest of the `repack` code. And when we did
> that, we did have to call `reprepare_packed_git()` after writing the new
> packs but before moving them into place.

Actually, I take that back. You were right from the start: the way the
code is written we *can* end up calling both:

  - load_multi_pack_index, from write_midx_internal(), which sets up a
    MIDX, but does not update r->objects->multi_pack_index to point at
    it.

  - ...and prepare_multi_pack_index_one (via prepare_bitmap_git ->
    open_bitmap -> open_midx_bitmap -> get_multi_pack_index ->
    prepare_packed_git) which *also* creates a new MIDX, *and*
    updates the_repository->objects->multi_pack_index to point at it.

(The latter codepath is from the check in write_midx_internal() to see
if we already have a MIDX bitmap when the MIDX we are trying to write
already exists on disk.)

So in this scenario, we have two copies of the same MIDX open, and the
repository's single pack is opened in one of the MIDXs, but not both.
One copy of the pack is pointed at via r->objects->packed_git. Then when
we fall back to open_pack_bitmap(), we call get_all_packs(), which calls
prepare_midx_pack(), which installs the second MIDX's copy of the same
pack into the r->objects->packed_git, and we have a cycle.

I think there are a few ways to fix this bug. The most obvious is to
make install_packed_git() check for the existence of the pack in the
hashmap of installed packs before (re-)installing it. But that could be
quadratic if the hashmap has too many collisions (and the lookup tends
towards being linear in the number of keys rather than constant).

But I think that a more straightforward way would be to open the MIDX we
use when generating the MIDX with prepare_multi_pack_index_one() instead
of load_multi_pack_index() so that the resulting MIDX is pointed at by
r->objects->multi_pack_index. That would prevent the latter call from
deep within the callstack of prepare_bitmap_git() from opening another
copy and then (mistakenly) re-installing the same pack twice.

Thoughts?

Thanks,
Taylor

* Re: [PATCH v2 14/24] pack-bitmap: write multi-pack bitmaps
  2021-07-26 18:12       ` Taylor Blau
  2021-07-26 18:23         ` Taylor Blau
@ 2021-07-27 17:11         ` Jeff King
  2021-07-27 20:33           ` Taylor Blau
  1 sibling, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-07-27 17:11 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Mon, Jul 26, 2021 at 02:12:30PM -0400, Taylor Blau wrote:

> > This "> -1" struck me as a little bit funny. Perhaps ">= 0" would be a
> > more obvious way of saying "we found it"?
> 
> Sure. (I looked for other uses of oid_pos() to see what is more
> common, but there really are vanishingly few uses.) Easier to read might
> even be:
> 
>     int pos = oid_pos(...);
>     if (pos < 0)
>       return;
>     ALLOC_GROW(...);
> 
> which is what I ended up going for.

Sure, that's better still.

> [topo-sorting commits fed to bitmap writer]
>
> This comes from an investigation into why bitmap coverage had worsened
> for some repositories using MIDX bitmaps at GitHub. The real reason was
> resolved and unrelated to this, but trying to match the behavior of MIDX
> bitmaps to our existing pack bitmap setup (which uses delta-islands) was
> one strategy we tried while debugging.
>
> I actually suspect that it doesn't really matter what order we feed this
> list to bitmap_writer_select_commits() in, because the first thing that
> it does is QSORT() the incoming list of commits in date order.

Hmm, yes, I agree that it shouldn't matter for that reason (though
arguably topo order would still be better than a strict date order, it
does nothing now).

I remember looking into reasons why a single-pack bitmap versus a
midx-of-a-single-pack bitmap might not be identical, but the interesting
things turned out to be elsewhere. Did this actually change anything at
all? If so, then perhaps the "it doesn't matter" is not as true as we
are thinking. I could believe that it has a tiny impact when breaking
ties between identical committer dates, though.

> But it does mirror the behavior of our previous bitmap generation
> settings, which has been running for years.
> 
> So... we could probably drop this hunk? I'd probably rather err on the
> safe side and leave this alone since it matches a system that we know to
> work well in practice.

I'd rather drop it, if we think it's doing nothing. While I do value
history in production as a sign of stability, upstream review is a good
time to make sure we understand all of the "why", and to clean things up
(e.g., another example is the questionable close_midx() stuff discussed
elsewhere).

And if we do suspect it is doing something, then IMHO the right thing is
probably still to drop it, and to introduce the feature identically to
both the midx and pack bitmap generation code paths. But that should be
a separate topic (and may actually involve fixing the QSORT to put
things in topo order rather than just date order).

> > > @@ -930,9 +1100,16 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
> > >  		for (i = 0; i < ctx.m->num_packs; i++) {
> > >  			ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
> > >
> > > +			if (prepare_midx_pack(the_repository, ctx.m, i)) {
> > > +				error(_("could not load pack %s"),
> > > +				      ctx.m->pack_names[i]);
> > > +				result = 1;
> > > +				goto cleanup;
> > > +			}
> >
> > It might be worth a comment here. I can easily believe that there is
> > some later part of the bitmap generation code that assumes the packs are
> > loaded. But somebody reading this is not likely to understand why it's
> > here.
> >
> > Should this be done conditionally only if we're writing a bitmap? (That
> > might also make it obvious why we are doing it).
> 
> Ah. Actually, I don't think this was necessary before, but we *do* need
> it now because we want to compare the pack mtime's for inferring a
> preferred pack when one wasn't given. And we also need to open the pack
> indexes, too, because we care about the object counts (to make sure that
> we don't infer a preferred pack which has no objects).
> 
> Luckily, any new packs will be loaded (and likewise have their indexes
> open, too), via the add_pack_to_midx() callback that we pass as an
> argument to for_each_file_in_pack_dir().

Hmm, OK. Your second paragraph makes it sound like we _don't_ need to do
this. But the key is "new packs". In add_pack_to_midx() we skip any
packs that are already in the existing midx, assuming they've already
been added. And we probably must do that, otherwise we end up with
duplicate structs that are not actually shared by ctx->m.

> But we could do something like this instead:
> 
> --- 8< ---
> 
> diff --git a/midx.c b/midx.c
> index 8426e1a0b1..a70a6bca81 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -1111,16 +1111,29 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
>  		for (i = 0; i < ctx.m->num_packs; i++) {
>  			ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
> 
> -			if (prepare_midx_pack(the_repository, ctx.m, i)) {
> -				error(_("could not load pack"));
> -				result = 1;
> -				goto cleanup;
> -			}
> -
>  			ctx.info[ctx.nr].orig_pack_int_id = i;
>  			ctx.info[ctx.nr].pack_name = xstrdup(ctx.m->pack_names[i]);
> -			ctx.info[ctx.nr].p = ctx.m->packs[i];
> +			ctx.info[ctx.nr].p = NULL;
>  			ctx.info[ctx.nr].expired = 0;
> +
> +			if (flags & MIDX_WRITE_REV_INDEX) {
> +				/*
> +				 * If generating a reverse index, need to have
> +				 * packed_git's loaded to compare their
> +				 * mtimes and object count.
> +				 */
> +				if (prepare_midx_pack(the_repository, ctx.m, i)) {
> +					error(_("could not load pack"));
> +					result = 1;
> +					goto cleanup;
> +				}
> +
> +				if (open_pack_index(ctx.m->packs[i]))
> +					die(_("could not open index for %s"),
> +					    ctx.m->packs[i]->pack_name);
> +				ctx.info[ctx.nr].p = ctx.m->packs[i];
> +			}
> +

Yeah, that was what I was suggesting: make it conditional on bitmaps (well, a
.rev index, which is more precise), and put in comments. :)

It's interesting that your earlier iteration didn't call
open_pack_index(). Is it necessary, or not? From your description, it
seems like it should be. But maybe some later step lazy-loads it? Even
if so, I can see how prepare_midx_pack() would still be required
(because we want to make sure we are using the same struct).

> [closing the midx]
> 
> The reason is kind of annoying. If we're building a MIDX bitmap
> in-process (e.g., `git multi-pack-index write --bitmap`) then we'll call
> prepare_packed_git() to build our pseudo-packing list which we pass to
> the bitmap generation machinery.
> 
> But prepare_packed_git() calls prepare_packed_git_one() ->
> for_each_file_in_pack_dir() with the prepare_pack() callback -> which
> then wants to see if the MIDX we have open already knows about a given
> pack so we avoid opening it twice.
> 
> But even though the MIDX would have gone away by this point (with the
> previous close_midx() call that is removed above), we still hold onto
> a pointer to it via the object_store's `multi_pack_index` pointer. And
> then all the way down in packfile.c:prepare_pack() we try to pass a
> now-defunct pointer as the first argument to midx_contains_pack(), and
> crash.
> 
> And clearing out that `multi_pack_index` pointer is tricky, because the
> MIDX would have to compare the odb's `object_dir` with its own (which is
> brittle in its own right), but also would have to see if that object
> store is pointing at *it*, and not some other MIDX.
> 
> So we do have to keep it open there. Which makes me wonder how this
> could possibly work on Windows, because holding the MIDX open will make
> the commit_lock_file() definitely fail. But it seems OK in the
> Windows-based CI runs?

Forgetting Windows for a moment, it seems like the same "the pointer is
bogus and we crash" is now just pushed down to the end of the function,
rather than the middle. I.e., it is safe to close the midx we got from
load_multi_pack_index(), but not one we got from
prepare_multi_pack_index_one(). So it's hard to reason about this at all
until that problem from patch 08 is resolved.
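
To make the ownership problem concrete, here is a toy sketch. The
toy_store and toy_midx types and the toy_*() helpers are hypothetical
stand-ins, not git's real structs or functions: toy_load() mimics a
caller-owned midx like the one from load_multi_pack_index(), and
toy_prepare() mimics one that the object store also points at, like
the one from prepare_multi_pack_index_one(). Passing the store to the
close function is an assumption made only to keep the example small.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; git's real structs carry far more state. */
struct toy_midx {
	int fd_open;
};

struct toy_store {
	struct toy_midx *midx; /* plays the role of r->objects->multi_pack_index */
};

/* Caller owns the result; the store never learns about it. */
static struct toy_midx *toy_load(void)
{
	struct toy_midx *m = calloc(1, sizeof(*m));
	assert(m);
	m->fd_open = 1;
	return m;
}

/* The store keeps a pointer to the result and reuses it on later calls. */
static struct toy_midx *toy_prepare(struct toy_store *s)
{
	if (!s->midx)
		s->midx = toy_load();
	return s->midx;
}

/*
 * Free a midx. If the store points at it, clear that pointer first;
 * otherwise a later reader of s->midx would dereference freed memory.
 */
static void toy_close(struct toy_store *s, struct toy_midx *m)
{
	if (!m)
		return;
	if (s && s->midx == m)
		s->midx = NULL;
	free(m);
}
```

Closing the toy_load() result leaves the store untouched, while
closing the toy_prepare() result is only safe because toy_close()
resets the store's pointer.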

-Peff

* Re: [PATCH v2 02/24] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
  2021-07-26 18:48           ` Taylor Blau
@ 2021-07-27 17:11             ` Jeff King
  0 siblings, 0 replies; 273+ messages in thread
From: Jeff King @ 2021-07-27 17:11 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Mon, Jul 26, 2021 at 02:48:19PM -0400, Taylor Blau wrote:

> On Fri, Jul 23, 2021 at 03:37:52AM -0400, Jeff King wrote:
> > I thought about suggesting that it be called "err" or "ret" or
> > something. And then we do not have to care that fill_bitmap_commit()
> > only returns an error in the non-closed state. We are simply propagating
> > its error-return back up the stack.
> 
> Hmm. For whatever the inconvenience costs us, I do like that the variable
> can be named specifically like "open" or "closed" as opposed to the more
> generic "err" or "ret".
> 
> So I'll probably keep it as is unless you feel strongly (which I suspect
> you do not).

I don't.

-Peff

* Re: [PATCH v2 08/24] midx: respect 'core.multiPackIndex' when writing
  2021-07-26 18:59           ` Taylor Blau
  2021-07-26 22:14             ` Taylor Blau
@ 2021-07-27 17:17             ` Jeff King
  1 sibling, 0 replies; 273+ messages in thread
From: Jeff King @ 2021-07-27 17:17 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Mon, Jul 26, 2021 at 02:59:02PM -0400, Taylor Blau wrote:

> > Hmm, after staring at this for a bit, I've unconfused and re-confused
> > myself several times.
> >
> > Here are some interesting bits:
> >
> >   - calling load_multi_pack_index() directly creates a new midx object.
> >     None of its m->packs[] array will be filled in. Nor is it reachable
> >     as r->objects->multi_pack_index.
> >
> >   - in using that midx, we end up calling prepare_midx_pack() for
> >     various packs, which creates a new packed_git struct and adds it to
> >     r->objects->packed_git (via install_packed_git()).
> >
> > So that's a bit weird already, because we have packed_git structs in
> > r->objects that came from a midx that isn't r->objects->multi_pack_index.
> > And then if we later call prepare_multi_pack_index(), for example as
> > part of a pack reprepare, then we'd end up with duplicates.
> 
> Ah, this jogged my memory: this is a relic from when we generated MIDX
> bitmaps in-process with the rest of the `repack` code. And when we did
> that, we did have to call `reprepare_packed_git()` after writing the new
> packs but before moving them into place.
> 
> So that's where the `reprepare_packed_git()` came from, but we don't
> have any of that code anymore, since we now generate MIDX bitmaps by
> invoking:
> 
>     git multi-pack-index write --bitmap --stdin-packs --refs-snapshot
> 
> as a sub-process of `git repack`; no need for any reprepare which is
> what was triggering this bug.

OK, that makes sense, especially given the "close_midx() leaves the
pointer bogus" stuff discussed elsewhere.

> To be sure, I reverted this patch out of GitHub's fork, and reran the
> tests both in normal mode (just `make test`) and then once more with the
> `GIT_TEST_MULTI_PACK_INDEX{,_WRITE_BITMAP}` environment variables set.
> Unsurprisingly, it passed both times.
> 
> I'm happy to keep digging further, but I think that I'm 99% satisfied
> here. Digging further involves resurrecting a much older version of this
> series (and others adjacent to it), and there are probably other bugs
> lurking that would be annoying to tease out.
> 
> In any case, let's drop this patch from the series. It's disappointing
> that we can't run:
> 
>     git -c core.multiPackIndex= multi-pack-index write
> 
> anymore, but I guess that's no worse than the state we were in before
> this patch, so I'm content to let it live on.

Great. If we can drop it, I think that is the best path forward. I think
that may simplify things for the writing patch, too, then. It should not
matter if we move close_midx() anymore, because we will not be closing
the main r->objects->multi_pack_index struct.

I do suspect we could be skipping the load _and_ close of the midx
entirely in write_midx_internal(), and just using whatever the caller
has passed in (and arguably just having most callers pass in the regular
midx struct if they want us to reuse parts of it). That might be a
cleanup we can leave for later, but it might be necessary to touch these
bits anyway (if there is still some kind of close_midx() ordering gotcha
in the function).

-Peff

* Re: [PATCH v2 08/24] midx: respect 'core.multiPackIndex' when writing
  2021-07-26 22:14             ` Taylor Blau
@ 2021-07-27 17:29               ` Jeff King
  2021-07-27 17:36                 ` Taylor Blau
  0 siblings, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-07-27 17:29 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Mon, Jul 26, 2021 at 06:14:09PM -0400, Taylor Blau wrote:

> > Ah, this jogged my memory: this is a relic from when we generated MIDX
> > bitmaps in-process with the rest of the `repack` code. And when we did
> > that, we did have to call `reprepare_packed_git()` after writing the new
> > packs but before moving them into place.
> 
> Actually, I take that back. You were right from the start: the way the
> code is written we *can* end up calling both:
> 
>   - load_multi_pack_index, from write_midx_internal(), which sets up a
>     MIDX, but does not update r->objects->multi_pack_index to point at
>     it.
> 
>   - ...and prepare_multi_pack_index_one (via prepare_bitmap_git ->
>     open_bitmap -> open_midx_bitmap -> get_multi_pack_index ->
>     prepare_packed_git) which *also* creates a new MIDX, *and*
>     updates the_repository->objects->multi_pack_index to point at it.
> 
> (The latter codepath is from the check in write_midx_internal() to see
> if we already have a MIDX bitmap when the MIDX we are trying to write
> already exists on disk.)
> 
> So in this scenario, we have two copies of the same MIDX open, and the
> repository's single pack is opened in one of the MIDXs, but not both.
> One copy of the pack is pointed at via r->objects->packed_git. Then when
> we fall back to open_pack_bitmap(), we call get_all_packs(), which calls
> prepare_midx_pack(), which installs the second MIDX's copy of the same
> pack into the r->objects->packed_git, and we have a cycle.

Right, I understand how that ends up with duplicate structs for each
pack. But how do we get a cycle out of that?

> I think there are a few ways to fix this bug. The most obvious is to
> make install_packed_git() check for the existence of the pack in the
> hashmap of installed packs before (re-)installing it. But that could be
> quadratic if the hashmap has too many collisions (and the lookup tends
> towards being linear in the number of keys rather than constant).

I think it may be worth doing that anyway. You can assume the hashmap
will behave reasonably.
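
A minimal sketch of that check, using a toy chained hash table rather
than git's hashmap API. Every name here (toy_pack, toy_store,
toy_install() and so on) is hypothetical, and keying on the pack name
is an assumption made for illustration:

```c
#include <assert.h>
#include <string.h>

#define TOY_BUCKETS 64

struct toy_pack {
	const char *name;
	struct toy_pack *next_in_bucket; /* hash chain */
	struct toy_pack *next;           /* list order, like packed_git->next */
};

struct toy_store {
	struct toy_pack *buckets[TOY_BUCKETS];
	struct toy_pack *packs; /* head of the installed list */
};

/* FNV-1a over the name, reduced to a bucket index. */
static unsigned toy_hash(const char *s)
{
	unsigned h = 2166136261u;
	while (*s) {
		h ^= (unsigned char)*s++;
		h *= 16777619u;
	}
	return h % TOY_BUCKETS;
}

static struct toy_pack *toy_find(struct toy_store *st, const char *name)
{
	struct toy_pack *p;
	for (p = st->buckets[toy_hash(name)]; p; p = p->next_in_bucket)
		if (!strcmp(p->name, name))
			return p;
	return NULL;
}

/* Returns 1 if the pack was installed, 0 if one by that name already was. */
static int toy_install(struct toy_store *st, struct toy_pack *p)
{
	unsigned b;

	assert(p && p->name);
	if (toy_find(st, p->name))
		return 0;
	b = toy_hash(p->name);
	p->next_in_bucket = st->buckets[b];
	st->buckets[b] = p;
	p->next = st->packs;
	st->packs = p;
	return 1;
}
```

With a reasonable hash the lookup is one short bucket walk per
install, so installing N packs stays close to linear overall.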

But it would mean that the "multi_pack_index" flag in packed_git does
not specify _which_ midx is pointing to it. At the very least, it would
need to become a ref-count (so when one midx goes away, it does not lose
its "I am part of a midx" flag).  And possibly it would need to actually
know the complete list of midx structs it's associated with (I haven't
looked at all of the uses of that flag).
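
Roughly, the ref-count variant would look like the sketch below. The
toy_pack struct and its helpers are hypothetical and only illustrate
the counting; nothing here reflects how packed_git is actually laid
out.

```c
#include <assert.h>

/* Hypothetical: a counter in place of a one-bit "in a midx" flag. */
struct toy_pack {
	unsigned midx_refs; /* number of open midx structs using this pack */
};

/* Called when a midx starts pointing at the pack. */
static void toy_pack_ref(struct toy_pack *p)
{
	p->midx_refs++;
}

/* Called when a midx that pointed at the pack is closed. */
static void toy_pack_unref(struct toy_pack *p)
{
	assert(p->midx_refs > 0);
	p->midx_refs--;
}

/* The old flag's question: is any open midx covering this pack? */
static int toy_pack_in_midx(const struct toy_pack *p)
{
	return p->midx_refs > 0;
}
```

The count answers "is any midx using this pack", but still not "which
one", which is the part that would need a real list.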

That makes things sufficiently tricky that I would prefer not to
untangle it as part of this series.

> But I think that a more straightforward way would be to open the MIDX we
> use when generating the MIDX with prepare_multi_pack_index_one() instead
> of load_multi_pack_index() so that the resulting MIDX is pointed at by
> r->objects->multi_pack_index. That would prevent the latter call from
> deep within the callstack of prepare_bitmap_git() from opening another
> copy and then (mistakenly) re-installing the same pack twice.

But now the internal midx writing code can never call close_midx() on
that, because it does not own it to close. Can we simply drop the
close_midx() call there?

This would all make much more sense to me if write_midx_internal()
simply took a conceptually read-only midx as a parameter, and the caller
passed in the appropriate one (probably even using
prepare_multi_pack_index_one() to get it).

-Peff

* Re: [PATCH v2 08/24] midx: respect 'core.multiPackIndex' when writing
  2021-07-27 17:29               ` Jeff King
@ 2021-07-27 17:36                 ` Taylor Blau
  2021-07-27 17:42                   ` Jeff King
  0 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 17:36 UTC (permalink / raw)
  To: Jeff King; +Cc: git, dstolee, gitster, jonathantanmy

On Tue, Jul 27, 2021 at 01:29:49PM -0400, Jeff King wrote:
> On Mon, Jul 26, 2021 at 06:14:09PM -0400, Taylor Blau wrote:
> > So in this scenario, we have two copies of the same MIDX open, and the
> > repository's single pack is opened in one of the MIDXs, but not both.
> > One copy of the pack is pointed at via r->objects->packed_git. Then when
> > we fall back to open_pack_bitmap(), we call get_all_packs(), which calls
> > prepare_midx_pack(), which installs the second MIDX's copy of the same
> > pack into the r->objects->packed_git, and we have a cycle.
>
> Right, I understand how that ends up with duplicate structs for each
> pack. But how do we get a cycle out of that?

Sorry, it isn't a true cycle where p->next == p.

> But now the internal midx writing code can never call close_midx() on
> that, because it does not own it to close. Can we simply drop the
> close_midx() call there?
>
> This would all make much more sense to me if write_midx_internal()
> simply took a conceptually read-only midx as a parameter, and the caller
> passed in the appropriate one (probably even using
> prepare_multi_pack_index_one() to get it).

No, we can't drop the close_midx() call there because we must close the
MIDX file on Windows before moving a new one into place. My feeling is
we should always be working on the r->objects->multi_pack_index pointer,
and calling close_object_store() there instead of close_midx().

Does that seem like a reasonable approach to you?

Thanks,
Taylor

* Re: [PATCH v2 08/24] midx: respect 'core.multiPackIndex' when writing
  2021-07-27 17:36                 ` Taylor Blau
@ 2021-07-27 17:42                   ` Jeff King
  2021-07-27 17:47                     ` Taylor Blau
  0 siblings, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-07-27 17:42 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Tue, Jul 27, 2021 at 01:36:34PM -0400, Taylor Blau wrote:

> > But now the internal midx writing code can never call close_midx() on
> > that, because it does not own it to close. Can we simply drop the
> > close_midx() call there?
> >
> > This would all make much more sense to me if write_midx_internal()
> > simply took a conceptually read-only midx as a parameter, and the caller
> > passed in the appropriate one (probably even using
> > prepare_multi_pack_index_one() to get it).
> 
> No, we can't drop the close_midx() call there because we must close the
> MIDX file on Windows before moving a new one into place. My feeling is
> we should always be working on the r->objects->multi_pack_index pointer,
> and calling close_object_store() there instead of close_midx().
> 
> Does that seem like a reasonable approach to you?

Yes, though I'd have said that it is the responsibility of the caller
(who knows we are operating with r->objects->multi_pack_index) to do
that closing. But maybe it's not possible if the rename-into-place
happens at too low a level.

BTW, yet another weirdness: close_object_store() will call close_midx()
on the outermost midx struct, ignoring o->multi_pack_index->next
entirely. So that's a leak, but also means we may not be closing the
midx we're interested in (since write_midx_internal() takes an
object-dir parameter, and we could be pointing to some other
object-dir's midx).
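
For illustration, closing the whole chain instead of just its head
could look something like this. It is a toy sketch with a
hypothetical toy_midx struct that models only the ->next link; the
order in which the entries are freed is not meant to be significant.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in that models only the ->next link. */
struct toy_midx {
	struct toy_midx *next; /* midx belonging to the next object dir */
};

/*
 * Free every midx reachable from *head and leave *head NULL.
 * Returns how many structs were freed.
 */
static int toy_close_chain(struct toy_midx **head)
{
	int n = 0;

	assert(head);
	while (*head) {
		struct toy_midx *m = *head;
		*head = m->next;
		free(m);
		n++;
	}
	return n;
}
```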

-Peff

* Re: [PATCH v2 08/24] midx: respect 'core.multiPackIndex' when writing
  2021-07-27 17:42                   ` Jeff King
@ 2021-07-27 17:47                     ` Taylor Blau
  2021-07-27 17:55                       ` Jeff King
  0 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 17:47 UTC (permalink / raw)
  To: Jeff King; +Cc: git, dstolee, gitster, jonathantanmy

On Tue, Jul 27, 2021 at 01:42:35PM -0400, Jeff King wrote:
> On Tue, Jul 27, 2021 at 01:36:34PM -0400, Taylor Blau wrote:
>
> > > But now the internal midx writing code can never call close_midx() on
> > > that, because it does not own it to close. Can we simply drop the
> > > close_midx() call there?
> > >
> > > This would all make much more sense to me if write_midx_internal()
> > > simply took a conceptually read-only midx as a parameter, and the caller
> > > passed in the appropriate one (probably even using
> > > prepare_multi_pack_index_one() to get it).
> >
> > No, we can't drop the close_midx() call there because we must close the
> > MIDX file on Windows before moving a new one into place. My feeling is
> > we should always be working on the r->objects->multi_pack_index pointer,
> > and calling close_object_store() there instead of close_midx().
> >
> > Does that seem like a reasonable approach to you?
>
> Yes, though I'd have said that it is the responsibility of the caller
> (who knows we are operating with r->objects->multi_pack_index) to do
> that closing. But maybe it's not possible if the rename-into-place
> happens at too low a level.

Right; write_midx_internal() needs to access the MIDX right up until the
point that we write the new one into place, so the only place to close
it is in write_midx_internal().

> BTW, yet another weirdness: close_object_store() will call close_midx()
> on the outermost midx struct, ignoring o->multi_pack_index->next
> entirely. So that's a leak, but also means we may not be closing the
> midx we're interested in (since write_midx_internal() takes an
> object-dir parameter, and we could be pointing to some other
> object-dir's midx).

Yuck, this is a mess. I'm tempted to say that we should be closing the
MIDX that we're operating on inside of write_midx_internal() so we can
write, but then declaring the whole object store to be bunk and calling
close_object_store() before leaving the function. Of course, one of
those steps should be closing the inner-most MIDX before closing the
next one and so on.

> -Peff

Thanks,
Taylor

* Re: [PATCH v2 08/24] midx: respect 'core.multiPackIndex' when writing
  2021-07-27 17:47                     ` Taylor Blau
@ 2021-07-27 17:55                       ` Jeff King
  2021-07-27 20:05                         ` Taylor Blau
  0 siblings, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-07-27 17:55 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Tue, Jul 27, 2021 at 01:47:40PM -0400, Taylor Blau wrote:

> > BTW, yet another weirdness: close_object_store() will call close_midx()
> > on the outermost midx struct, ignoring o->multi_pack_index->next
> > entirely. So that's a leak, but also means we may not be closing the
> > midx we're interested in (since write_midx_internal() takes an
> > object-dir parameter, and we could be pointing to some other
> > object-dir's midx).
> 
> Yuck, this is a mess. I'm tempted to say that we should be closing the
> MIDX that we're operating on inside of write_midx_internal() so we can
> write, but then declaring the whole object store to be bunk and calling
> close_object_store() before leaving the function. Of course, one of
> those steps should be closing the inner-most MIDX before closing the
> next one and so on.

That gets even weirder when you look at other callers of
write_midx_internal(). For instance, expire_midx_packs() is calling
load_multi_pack_index() directly, and then passing it to
write_midx_internal().

So closing the whole object store there is likewise weird.

I actually think having write_midx_internal() open up a new midx is
reasonable-ish. It's just that:

  - it's weird when it stuffs duplicate packs into the
    r->objects->packed_git list. But AFAICT that's not actually hurting
    anything?

  - we do need to make sure that the midx is closed (not just our copy,
    but any other open copies that happen to be in the same process) in
    order for things to work on Windows.

So I guess because of the second point, the internal midx write probably
needs to be calling close_object_store(). But because other callers use
load_multi_pack_index(), it probably needs to be closing the one that is
passed in, too! But of course not double-closing it if it did come from
the regular object store. One easy way to avoid that is to just
open a separate one.

I have some spicy takes on how midx's should have been designed, but I
think it's probably not productive to rant about it at this point. ;)

-Peff

* Re: [PATCH v2 08/24] midx: respect 'core.multiPackIndex' when writing
  2021-07-27 17:55                       ` Jeff King
@ 2021-07-27 20:05                         ` Taylor Blau
  2021-07-28 17:46                           ` Jeff King
  0 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 20:05 UTC (permalink / raw)
  To: Jeff King; +Cc: git, dstolee, gitster, jonathantanmy

On Tue, Jul 27, 2021 at 01:55:52PM -0400, Jeff King wrote:
> On Tue, Jul 27, 2021 at 01:47:40PM -0400, Taylor Blau wrote:
>
> > > BTW, yet another weirdness: close_object_store() will call close_midx()
> > > on the outermost midx struct, ignoring o->multi_pack_index->next
> > > entirely. So that's a leak, but also means we may not be closing the
> > > midx we're interested in (since write_midx_internal() takes an
> > > object-dir parameter, and we could be pointing to some other
> > > object-dir's midx).
> >
> > Yuck, this is a mess. I'm tempted to say that we should be closing the
> > MIDX that we're operating on inside of write_midx_internal() so we can
> > write, but then declaring the whole object store to be bunk and calling
> > close_object_store() before leaving the function. Of course, one of
> > those steps should be closing the inner-most MIDX before closing the
> > next one and so on.
>
> That gets even weirder when you look at other callers of
> write_midx_internal(). For instance, expire_midx_packs() is calling
> load_multi_pack_index() directly, and then passing it to
> write_midx_internal().
>
> So closing the whole object store there is likewise weird.
>
> I actually think having write_midx_internal() open up a new midx is
> reasonable-ish. It's just that:
>
>   - it's weird when it stuffs duplicate packs into the
>     r->objects->packed_git list. But AFAICT that's not actually hurting
>     anything?

It is hurting us when we try to write a MIDX bitmap, because we try to
see if one already exists. And to do that, we call prepare_bitmap_git(),
which tries to call open_pack_bitmap_1 on *each* pack in the packed_git
list. Critically, prepare_bitmap_git() errors out if it is called with a
bitmap_git that has a non-NULL `->pack` pointer.

So they aren't a cycle in the sense that `p->next == p`, but it does
cause problems for us nonetheless.

I stepped away from my computer for an hour or so and thought about
this, and I think that the solution is two-fold:

  - We should be more careful about freeing up the ->next pointers of a
    MIDX, and releasing the memory we allocated to hold each MIDX struct
    in the first place.

  - We should always be operating on the repository's
    r->objects->multi_pack_index, or any other MIDX that can be reached
    via walking the `->next` pointers. If we do that consistently, then
    we'll only have at most one instance of a MIDX struct corresponding
    to each MIDX file on disk.

In the reroll that I'll send shortly, those are:

  - https://github.com/ttaylorr/git/commit/61a617715f3827401522c7b08b50bb6866f2a4e9
  - https://github.com/ttaylorr/git/commit/fd15ecf47c57ce4ff0d31621c2c9f61ff7a74939

respectively. It resolves my issue locally which was that I was
previously unable to run:

    GIT_TEST_MULTI_PACK_INDEX=1 \
    GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=1 \
      t7700-repack.sh

without t7700.13 failing on me (it previously complained about a
"duplicate" .bitmap file, which is a side-effect of placing duplicates
in the packed_git list, not a true duplicate .bitmap on disk).
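
As a rough sketch of the second point above (at most one MIDX struct
per MIDX file), the lookup-before-load idea is something like the
following. All of the names are hypothetical, and comparing object
directories with a plain strcmp() is an assumption made to keep the
toy short; it is not a claim about how the real patches compare
paths.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_midx {
	const char *object_dir; /* borrowed; the caller keeps it alive */
	struct toy_midx *next;
};

struct toy_store {
	struct toy_midx *midx; /* head of the chain */
	int loads;             /* how many times we "opened a file" */
};

/* Walk the ->next chain looking for an already-open midx. */
static struct toy_midx *toy_lookup(struct toy_store *s, const char *object_dir)
{
	struct toy_midx *m;
	for (m = s->midx; m; m = m->next)
		if (!strcmp(m->object_dir, object_dir))
			return m;
	return NULL;
}

/* Reuse the open midx for object_dir, or load one and append it. */
static struct toy_midx *toy_get_midx(struct toy_store *s, const char *object_dir)
{
	struct toy_midx *m = toy_lookup(s, object_dir);
	struct toy_midx **tail;

	if (m)
		return m;

	m = calloc(1, sizeof(*m));
	assert(m);
	m->object_dir = object_dir;
	s->loads++;

	for (tail = &s->midx; *tail; tail = &(*tail)->next)
		; /* find the end of the chain */
	*tail = m;
	return m;
}
```

Asking twice for the same object directory hands back the same
struct, so there is never a second copy whose packs could be
installed again.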

I'm waiting on a CI run [1], but I'm relatively happy with the result. I
think we could do something similar to the MIDX code like we did to the
commit-graph code in [2], but I'm reasonably happy with where we are
now.

Thanks,
Taylor

[1]: https://github.com/ttaylorr/git/actions/runs/1072513087
[2]: https://lore.kernel.org/git/cover.1580424766.git.me@ttaylorr.com/

* Re: [PATCH v2 14/24] pack-bitmap: write multi-pack bitmaps
  2021-07-27 17:11         ` Jeff King
@ 2021-07-27 20:33           ` Taylor Blau
  2021-07-28 17:52             ` Jeff King
  0 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 20:33 UTC (permalink / raw)
  To: Jeff King; +Cc: git, dstolee, gitster, jonathantanmy

On Tue, Jul 27, 2021 at 01:11:25PM -0400, Jeff King wrote:
> On Mon, Jul 26, 2021 at 02:12:30PM -0400, Taylor Blau wrote:
> > But it does mirror the behavior of our previous bitmap generation
> > settings, which has been running for years.
> >
> > So... we could probably drop this hunk? I'd probably rather err on the
> > safe side and leave this alone since it matches a system that we know to
> > work well in practice.
>
> I'd rather drop it, if we think it's doing nothing. While I do value
> history in production as a sign of stability, upstream review is a good
> time to make sure we understand all of the "why", and to clean things up
> (e.g., another example is the questionable close_midx() stuff discussed
> elsewhere).

OK, I think that's a very reasonable way of thinking about it, so I'd
rather just get rid of it (not to mention that I really doubt it's doing
much of anything in the first place).

> > Luckily, any new packs will be loaded (and likewise have their indexes
> > open, too), via the add_pack_to_midx() callback that we pass as an
> > argument to for_each_file_in_pack_dir().
>
> Hmm, OK. Your second paragraph makes it sound like we _don't_ need to do
> this. But the key is "new packs". In add_pack_to_midx() we skip any
> packs that are already in the existing midx, assuming they've already
> been added. And we probably must do that, otherwise we end up with
> duplicate structs that are not actually shared by ctx->m.

Exactly.

> It's interesting that your earlier iteration didn't call
> open_pack_index(). Is it necessary, or not? From your description, it
> seems like it should be. But maybe some later step lazy-loads it? Even
> if so, I can see how prepare_midx_pack() would still be required
> (because we want to make sure we are using the same struct).

It's only necessary now (at least for determining a preferred pack if
the caller didn't specify one with `--preferred-pack`) because we care
about reading the `num_objects` field, which the index must be loaded
for.
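
In sketch form, the selection is roughly the following. It is a toy
with a hypothetical toy_pack struct, and it assumes that the oldest
mtime wins among the non-empty packs; the mtime comparison and the
non-empty requirement are the parts discussed above, the direction of
the comparison is the assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the few packed_git fields that matter here. */
struct toy_pack {
	const char *name;
	time_t mtime;
	unsigned num_objects; /* only known once the pack index is open */
};

/*
 * Return the index of the oldest pack that has at least one object,
 * or -1 if every pack is empty (or there are no packs at all).
 */
static int toy_infer_preferred(const struct toy_pack *packs, size_t nr)
{
	int best = -1;
	size_t i;

	assert(packs || !nr);
	for (i = 0; i < nr; i++) {
		if (!packs[i].num_objects)
			continue; /* never prefer an empty pack */
		if (best < 0 || packs[i].mtime < packs[best].mtime)
			best = (int)i;
	}
	return best;
}
```

Without the num_objects check the empty pack could win on mtime
alone, which is why the index has to be open before we look.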

Thanks,
Taylor

* [PATCH v3 00/25] multi-pack reachability bitmaps
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (22 preceding siblings ...)
  2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
@ 2021-07-27 21:19 ` Taylor Blau
  2021-07-27 21:19   ` [PATCH v3 01/25] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps Taylor Blau
                     ` (25 more replies)
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
  25 siblings, 26 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 21:19 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Here is another reroll of my series to implement multi-pack reachability
bitmaps, based on reviews from Ævar and Peff.

Notable changes since last time are summarized here (though a complete
range-diff is below as well):

  - Preventing multiple copies of the same MIDX from being opened, see the new
    patches 8-9 for details.
  - Ensuring preferred packs are non-empty.
  - Simplifying a handful of routines to read from MIDX bitmaps.
  - General code clean-up, removing a few stray hunks, some commit message
    tweaking.

This reroll also dropped three patches present in v2, namely:

  - A patch to build Documentation/technical/bitmap-format.txt (the document is
    poorly formatted and the generated HTML isn't readable).
  - A patch to make some MIDX-related functions non-static (the required
    functions are instead exposed in the patches that first make use of them).
  - A patch to respect `core.multiPackIndex` when writing MIDXs (see patches 8-9
    for the replacement).

Thanks in advance for your review. I think Peff still wanted to read through
patches 16-25, but that the first 15 or so should be in pretty good shape by
now.

Jeff King (2):
  t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
  t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP

Taylor Blau (23):
  pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps
  pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
  pack-bitmap-write.c: free existing bitmaps
  Documentation: describe MIDX-based bitmaps
  midx: clear auxiliary .rev after replacing the MIDX
  midx: reject empty `--preferred-pack`'s
  midx: infer preferred pack when not given one
  midx: close linked MIDXs, avoid leaking memory
  midx: avoid opening multiple MIDXs when writing
  pack-bitmap.c: introduce 'bitmap_num_objects()'
  pack-bitmap.c: introduce 'nth_bitmap_object_oid()'
  pack-bitmap.c: introduce 'bitmap_is_preferred_refname()'
  pack-bitmap.c: avoid redundant calls to try_partial_reuse
  pack-bitmap: read multi-pack bitmaps
  pack-bitmap: write multi-pack bitmaps
  t5310: move some tests to lib-bitmap.sh
  t/helper/test-read-midx.c: add --checksum mode
  t5326: test multi-pack bitmap behavior
  t5319: don't write MIDX bitmaps in t5319
  t7700: update to work with MIDX bitmap test knob
  midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  p5310: extract full and partial bitmap tests
  p5326: perf tests for MIDX bitmaps

 Documentation/git-multi-pack-index.txt       |  18 +-
 Documentation/technical/bitmap-format.txt    |  71 ++-
 Documentation/technical/multi-pack-index.txt |  10 +-
 builtin/multi-pack-index.c                   |   2 +
 builtin/pack-objects.c                       |   8 +-
 builtin/repack.c                             |  12 +-
 ci/run-build-and-tests.sh                    |   1 +
 midx.c                                       | 321 +++++++++++-
 midx.h                                       |   5 +
 pack-bitmap-write.c                          |  79 ++-
 pack-bitmap.c                                | 499 ++++++++++++++++---
 pack-bitmap.h                                |   9 +-
 packfile.c                                   |   2 +-
 t/README                                     |   4 +
 t/helper/test-read-midx.c                    |  16 +-
 t/lib-bitmap.sh                              | 240 +++++++++
 t/perf/lib-bitmap.sh                         |  69 +++
 t/perf/p5310-pack-bitmaps.sh                 |  65 +--
 t/perf/p5326-multi-pack-bitmaps.sh           |  43 ++
 t/t0410-partial-clone.sh                     |  12 +-
 t/t5310-pack-bitmaps.sh                      | 231 +--------
 t/t5319-multi-pack-index.sh                  |  20 +-
 t/t5326-multi-pack-bitmaps.sh                | 277 ++++++++++
 t/t7700-repack.sh                            |  18 +-
 24 files changed, 1596 insertions(+), 436 deletions(-)
 create mode 100644 t/perf/lib-bitmap.sh
 create mode 100755 t/perf/p5326-multi-pack-bitmaps.sh
 create mode 100755 t/t5326-multi-pack-bitmaps.sh

Range-diff against v2:
 1:  a18baeb0b4 !  1:  fa4cbed48e pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps
    @@ pack-bitmap.c: void count_bitmap_commit_list(struct bitmap_index *bitmap_git,
     +		bitmaps_nr++;
     +	}
     +
    -+	if (!bitmap_type)
    ++	if (bitmap_type == OBJ_NONE)
     +		die("object %s not found in type bitmaps",
     +		    oid_to_hex(&obj->oid));
     +
 2:  3e637d9ec8 =  2:  2b15c1fc5c pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
 3:  490d733d12 =  3:  2ad513a230 pack-bitmap-write.c: free existing bitmaps
 4:  b0bb2e8051 <  -:  ---------- Documentation: build 'technical/bitmap-format' by default
 5:  64a260e0c6 !  4:  8da5de7c24 Documentation: describe MIDX-based bitmaps
    @@ Documentation/technical/bitmap-format.txt
     +
     +		o1 <= o2 <==> pack(o1) <= pack(o2) /\ offset(o1) <= offset(o2)
     +
    -+	The ordering between packs is done lexicographically by the pack name,
    -+	with the exception of the preferred pack, which sorts ahead of all other
    -+	packs.
    ++	The ordering between packs is done according to the MIDX's .rev file.
    ++	Notably, the preferred pack sorts ahead of all other packs.
     +
     +The on-disk representation (described below) of a bitmap is the same regardless
     +of whether or not that bitmap belongs to a packfile or a MIDX. The only
 6:  b3a12424d7 <  -:  ---------- midx: make a number of functions non-static
 7:  1448ca0d2b =  5:  49297f57ed midx: clear auxiliary .rev after replacing the MIDX
 8:  dfd1daacc5 <  -:  ---------- midx: respect 'core.multiPackIndex' when writing
 -:  ---------- >  6:  c5513f2a75 midx: reject empty `--preferred-pack`'s
 9:  9495f6869d !  7:  53ef0a6d67 midx: infer preferred pack when not given one
    @@ Commit message
         Not specifying a preferred pack can cause serious problems with
         multi-pack reachability bitmaps, because these bitmaps rely on having at
         least one pack from which all duplicates are selected. Not having such a
    -    pack causes problems with the pack reuse code (e.g., like assuming that
    -    a base object was sent from that pack via reuse when in fact the base
    -    was selected from a different pack).
    +    pack causes problems with the code in pack-objects to reuse packs
    +    verbatim (e.g., that code assumes that a delta object in a chunk of pack
    +    sent verbatim will have its base object sent from the same pack).
     
         So why does not marking a pack preferred cause problems here? The reason
         is roughly as follows:
    @@ Commit message
             later).
     
           - The pseudo pack-order (described in
    -        Documentation/technical/bitmap-format.txt) is computed by
    +        Documentation/technical/pack-format.txt under the section
    +        "multi-pack-index reverse indexes") is computed by
             midx_pack_order(), and sorts by pack ID and pack offset, with
             preferred packs sorting first.
     
    @@ Commit message
         order, which the bitmap code will treat as the preferred one) did *not*
         have all duplicate objects resolved in its favor, resulting in breakage.
     
    -    The fix is simple: pick a (semi-arbitrary) preferred pack when none was
    -    specified. This forces that pack to have duplicates resolved in its
    -    favor, and (critically) to sort first in pseudo-pack order.
    -    Unfortunately, testing this behavior portably isn't possible, since it
    -    depends on readdir() order which isn't guaranteed by POSIX.
    +    The fix is simple: pick a (semi-arbitrary, non-empty) preferred pack
    +    when none was specified. This forces that pack to have duplicates
    +    resolved in its favor, and (critically) to sort first in pseudo-pack
    +    order.  Unfortunately, testing this behavior portably isn't possible,
    +    since it depends on readdir() order which isn't guaranteed by POSIX.
    +
    +    (Note that multi-pack reachability bitmaps have yet to be implemented;
    +    so in that sense this patch is fixing a bug which does not yet exist.
    +    But by having this patch beforehand, we can prevent the bug from ever
    +    materializing.)
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    @@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack
     +			warning(_("unknown preferred pack: '%s'"),
     +				preferred_pack_name);
     +	} else if (ctx.nr && (flags & MIDX_WRITE_REV_INDEX)) {
    -+		time_t oldest = ctx.info[0].p->mtime;
    ++		struct packed_git *oldest = ctx.info[ctx.preferred_pack_idx].p;
     +		ctx.preferred_pack_idx = 0;
     +
     +		if (packs_to_drop && packs_to_drop->nr)
    @@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack
     +		 * (and not another pack containing a duplicate)
     +		 */
     +		for (i = 1; i < ctx.nr; i++) {
    -+			time_t mtime = ctx.info[i].p->mtime;
    -+			if (mtime < oldest) {
    -+				oldest = mtime;
    ++			struct packed_git *p = ctx.info[i].p;
    ++
    ++			if (!oldest->num_objects || p->mtime < oldest->mtime) {
    ++				oldest = p;
     +				ctx.preferred_pack_idx = i;
     +			}
     +		}
    ++
    ++		if (!oldest->num_objects) {
    ++			/*
    ++			 * If all packs are empty; unset the preferred index.
    ++			 * This is acceptable since there will be no duplicate
    ++			 * objects to resolve, so the preferred value doesn't
    ++			 * matter.
    ++			 */
    ++			ctx.preferred_pack_idx = -1;
    ++		}
     +	} else {
     +		/*
     +		 * otherwise don't mark any pack as preferred to avoid
    @@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack
     +		ctx.preferred_pack_idx = -1;
      	}
      
    - 	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr,
    + 	if (ctx.preferred_pack_idx > -1) {
     @@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack_index *
      						      ctx.info, ctx.nr,
      						      sizeof(*ctx.info),
 -:  ---------- >  8:  114773d9cd midx: close linked MIDXs, avoid leaking memory
 -:  ---------- >  9:  40cff5beb5 midx: avoid opening multiple MIDXs when writing
10:  373aa47528 = 10:  ca7f726abf pack-bitmap.c: introduce 'bitmap_num_objects()'
11:  ac1f46aa1f ! 11:  67e6897a34 pack-bitmap.c: introduce 'nth_bitmap_object_oid()'
    @@ pack-bitmap.c: static inline uint8_t read_u8(const unsigned char *buffer, size_t
      
      #define MAX_XOR_OFFSET 160
      
    -+static void nth_bitmap_object_oid(struct bitmap_index *index,
    -+				  struct object_id *oid,
    -+				  uint32_t n)
    ++static int nth_bitmap_object_oid(struct bitmap_index *index,
    ++				 struct object_id *oid,
    ++				 uint32_t n)
     +{
    -+	nth_packed_object_id(oid, index->pack, n);
    ++	return nth_packed_object_id(oid, index->pack, n);
     +}
     +
      static int load_bitmap_entries_v1(struct bitmap_index *index)
    @@ pack-bitmap.c: static int load_bitmap_entries_v1(struct bitmap_index *index)
      		flags = read_u8(index->map, &index->map_pos);
      
     -		if (nth_packed_object_id(&oid, index->pack, commit_idx_pos) < 0)
    --			return error("corrupt ewah bitmap: commit index %u out of range",
    --				     (unsigned)commit_idx_pos);
    -+		nth_bitmap_object_oid(index, &oid, commit_idx_pos);
    ++		if (nth_bitmap_object_oid(index, &oid, commit_idx_pos) < 0)
    + 			return error("corrupt ewah bitmap: commit index %u out of range",
    + 				     (unsigned)commit_idx_pos);
      
    - 		bitmap = read_bitmap_1(index);
    - 		if (!bitmap)
     @@ pack-bitmap.c: static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
      		off_t ofs = pack_pos_to_offset(pack, pos);
      		if (packed_object_info(the_repository, pack, ofs, &oi) < 0) {
12:  c474d2eda5 ! 12:  743a1a138e pack-bitmap.c: introduce 'bitmap_is_preferred_refname()'
    @@ Commit message
         'pack.preferBitmapTips' configuration. This patch prepares the
         multi-pack bitmap code to respect this configuration, too.
     
    -    Since the multi-pack bitmap code already does a traversal of all
    -    references (in order to discover the set of reachable commits in the
    -    multi-pack index), it is more efficient to check whether or not each
    -    reference is a suffix of any value of 'pack.preferBitmapTips' rather
    -    than do an additional traversal.
    +    The yet-to-be implemented code will find that it is more efficient to
    +    check whether each reference contains a prefix found in the configured
    +    set of values rather than doing an additional traversal.
     
    -    Implement a function 'bitmap_is_preferred_refname()' which does just
    -    that. The caller will be added in a subsequent patch.
    +    Implement a function 'bitmap_is_preferred_refname()' which will perform
    +    that check. Its caller will be added in a subsequent patch.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
 -:  ---------- > 13:  a3b641b3e6 pack-bitmap.c: avoid redundant calls to try_partial_reuse
13:  7d44ba6299 ! 14:  141ff83275 pack-bitmap: read multi-pack bitmaps
    @@ Commit message
         in a MIDX.
     
         Note that there are currently no writers who write multi-pack bitmaps,
    -    and that this will be implemented in the subsequent commit.
    +    and that this will be implemented in the subsequent commit. Note also
    +    that get_midx_checksum() and get_midx_filename() are made non-static so
    +    they can be called from pack-bitmap.c.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    @@ builtin/pack-objects.c: static void write_reused_pack(struct hashfile *f)
      			display_progress(progress_state, ++written);
      		}
     
    + ## midx.c ##
    +@@ midx.c: static uint8_t oid_version(void)
    + 	}
    + }
    + 
    +-static const unsigned char *get_midx_checksum(struct multi_pack_index *m)
    ++const unsigned char *get_midx_checksum(struct multi_pack_index *m)
    + {
    + 	return m->data + m->data_len - the_hash_algo->rawsz;
    + }
    + 
    +-static char *get_midx_filename(const char *object_dir)
    ++char *get_midx_filename(const char *object_dir)
    + {
    + 	return xstrfmt("%s/pack/multi-pack-index", object_dir);
    + }
    +
    + ## midx.h ##
    +@@ midx.h: struct multi_pack_index {
    + #define MIDX_PROGRESS     (1 << 0)
    + #define MIDX_WRITE_REV_INDEX (1 << 1)
    + 
    ++const unsigned char *get_midx_checksum(struct multi_pack_index *m);
    ++char *get_midx_filename(const char *object_dir);
    + char *get_midx_rev_filename(struct multi_pack_index *m);
    + 
    + struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
    +
      ## pack-bitmap-write.c ##
     @@ pack-bitmap-write.c: void bitmap_writer_show_progress(int show)
      }
    @@ pack-bitmap.c: static int load_bitmap_header(struct bitmap_index *index)
      	index->map_pos += header_size;
      	return 0;
      }
    -@@ pack-bitmap.c: static void nth_bitmap_object_oid(struct bitmap_index *index,
    - 				  struct object_id *oid,
    - 				  uint32_t n)
    +@@ pack-bitmap.c: static int nth_bitmap_object_oid(struct bitmap_index *index,
    + 				 struct object_id *oid,
    + 				 uint32_t n)
      {
    --	nth_packed_object_id(oid, index->pack, n);
     +	if (index->midx)
    -+		nth_midxed_object_oid(oid, index->midx, n);
    -+	else
    -+		nth_packed_object_id(oid, index->pack, n);
    ++		return nth_midxed_object_oid(oid, index->midx, n) ? 0 : -1;
    + 	return nth_packed_object_id(oid, index->pack, n);
      }
      
    - static int load_bitmap_entries_v1(struct bitmap_index *index)
     @@ pack-bitmap.c: static int load_bitmap_entries_v1(struct bitmap_index *index)
      	return 0;
      }
    @@ pack-bitmap.c: static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, st
      		warning("ignoring extra bitmap file: %s", packfile->pack_name);
      		close(fd);
      		return -1;
    - 	}
    - 
    -+	if (!is_pack_valid(packfile)) {
    -+		close(fd);
    -+		return -1;
    -+	}
    -+
    - 	bitmap_git->pack = packfile;
    - 	bitmap_git->map_size = xsize_t(st.st_size);
    - 	bitmap_git->map = xmmap(NULL, bitmap_git->map_size, PROT_READ, MAP_PRIVATE, fd, 0);
     @@ pack-bitmap.c: static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
      	return 0;
      }
    @@ pack-bitmap.c: static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, st
     +		uint32_t i;
     +		int ret;
     +
    -+		ret = load_midx_revindex(bitmap_git->midx);
    -+		if (ret)
    -+			return ret;
    -+
    ++		/*
    ++		 * The multi-pack-index's .rev file is already loaded via
    ++		 * open_pack_bitmap_1().
    ++		 *
    ++		 * But we still need to open the individual pack .rev files,
    ++		 * since we will need to make use of them in pack-objects.
    ++		 */
     +		for (i = 0; i < bitmap_git->midx->num_packs; i++) {
     +			if (prepare_midx_pack(the_repository, bitmap_git->midx, i))
     +				die(_("load_reverse_index: could not open pack"));
    @@ pack-bitmap.c: static int open_pack_bitmap(struct repository *r,
      
     -	if (!open_pack_bitmap(r, bitmap_git) && !load_pack_bitmap(bitmap_git))
     +	if (!open_bitmap(r, bitmap_git) && !load_bitmap(bitmap_git))
    ++		return bitmap_git;
    ++
    ++	free_bitmap_index(bitmap_git);
    ++	return NULL;
    ++}
    ++
    ++struct bitmap_index *prepare_midx_bitmap_git(struct repository *r,
    ++					     struct multi_pack_index *midx)
    ++{
    ++	struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
    ++
    ++	if (!open_midx_bitmap_1(bitmap_git, midx) && !load_bitmap(bitmap_git))
      		return bitmap_git;
      
      	free_bitmap_index(bitmap_git);
    @@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
      
      	object_array_clear(&revs->pending);
     @@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
    - }
    - 
    - static void try_partial_reuse(struct bitmap_index *bitmap_git,
    -+			      struct packed_git *pack,
    - 			      size_t pos,
    - 			      struct bitmap *reuse,
    - 			      struct pack_window **w_curs)
    +  * reused, but you can keep feeding bits.
    +  */
    + static int try_partial_reuse(struct bitmap_index *bitmap_git,
    ++			     struct packed_git *pack,
    + 			     size_t pos,
    + 			     struct bitmap *reuse,
    + 			     struct pack_window **w_curs)
      {
     -	off_t offset, header;
     +	off_t offset, delta_obj_offset;
    @@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
      	unsigned long size;
      
     -	if (pos >= bitmap_num_objects(bitmap_git))
    --		return; /* not actually in the pack or MIDX */
    +-		return -1; /* not actually in the pack or MIDX */
     +	/*
     +	 * try_partial_reuse() is called either on (a) objects in the
     +	 * bitmapped pack (in the case of a single-pack bitmap) or (b)
    @@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
     -	offset = header = pack_pos_to_offset(bitmap_git->pack, pos);
     -	type = unpack_object_header(bitmap_git->pack, w_curs, &offset, &size);
     +	if (pos >= pack->num_objects)
    -+		return; /* not actually in the pack or MIDX preferred pack */
    ++		return -1; /* not actually in the pack or MIDX preferred pack */
     +
     +	offset = delta_obj_offset = pack_pos_to_offset(pack, pos);
     +	type = unpack_object_header(pack, w_curs, &offset, &size);
      	if (type < 0)
    - 		return; /* broken packfile, punt */
    + 		return -1; /* broken packfile, punt */
      
    -@@ pack-bitmap.c: static void try_partial_reuse(struct bitmap_index *bitmap_git,
    +@@ pack-bitmap.c: static int try_partial_reuse(struct bitmap_index *bitmap_git,
      		 * and the normal slow path will complain about it in
      		 * more detail.
      		 */
    @@ pack-bitmap.c: static void try_partial_reuse(struct bitmap_index *bitmap_git,
     +		base_offset = get_delta_base(pack, w_curs, &offset, type,
     +					     delta_obj_offset);
      		if (!base_offset)
    - 			return;
    + 			return 0;
     -		if (offset_to_pack_pos(bitmap_git->pack, base_offset, &base_pos) < 0)
     +		if (offset_to_pack_pos(pack, base_offset, &base_pos) < 0)
    - 			return;
    + 			return 0;
      
      		/*
    -@@ pack-bitmap.c: static void try_partial_reuse(struct bitmap_index *bitmap_git,
    - 	bitmap_set(reuse, pos);
    +@@ pack-bitmap.c: static int try_partial_reuse(struct bitmap_index *bitmap_git,
    + 	return 0;
      }
      
     +static uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git)
    @@ pack-bitmap.c: int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitma
      				break;
      
      			offset += ewah_bit_ctz64(word >> offset);
    --			try_partial_reuse(bitmap_git, pos + offset, reuse, &w_curs);
    -+			if (bitmap_is_midx(bitmap_git)) {
    -+				/*
    -+				 * Can't reuse from a non-preferred pack (see
    -+				 * above).
    -+				 */
    -+				if (pos + offset >= objects_nr)
    -+					continue;
    -+			}
    -+			try_partial_reuse(bitmap_git, pack, pos + offset, reuse, &w_curs);
    - 		}
    - 	}
    - 
    +-			if (try_partial_reuse(bitmap_git, pos + offset, reuse,
    +-					      &w_curs) < 0) {
    ++			if (try_partial_reuse(bitmap_git, pack, pos + offset,
    ++					      reuse, &w_curs) < 0) {
    + 				/*
    + 				 * try_partial_reuse indicated we couldn't reuse
    + 				 * any bits, so there is no point in trying more
     @@ pack-bitmap.c: int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
      	 * need to be handled separately.
      	 */
    @@ pack-bitmap.c: static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_
      {
      	struct bitmap *result = bitmap_git->result;
     -	struct packed_git *pack = bitmap_git->pack;
    -+	struct packed_git *pack;
      	off_t total = 0;
      	struct ewah_iterator it;
      	eword_t filter;
     @@ pack-bitmap.c: static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
    + 			continue;
    + 
    + 		for (offset = 0; offset < BITS_IN_EWORD; offset++) {
    +-			size_t pos;
    +-
    + 			if ((word >> offset) == 0)
      				break;
      
      			offset += ewah_bit_ctz64(word >> offset);
     -			pos = base + offset;
    +-			total += pack_pos_to_offset(pack, pos + 1) -
    +-				 pack_pos_to_offset(pack, pos);
     +
     +			if (bitmap_is_midx(bitmap_git)) {
     +				uint32_t pack_pos;
     +				uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, base + offset);
    -+				uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
     +				off_t offset = nth_midxed_offset(bitmap_git->midx, midx_pos);
     +
    -+				pack = bitmap_git->midx->packs[pack_id];
    ++				uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
    ++				struct packed_git *pack = bitmap_git->midx->packs[pack_id];
     +
     +				if (offset_to_pack_pos(pack, offset, &pack_pos) < 0) {
     +					struct object_id oid;
     +					nth_midxed_object_oid(&oid, bitmap_git->midx, midx_pos);
     +
    -+					die(_("could not find %s in pack #%"PRIu32" at offset %"PRIuMAX),
    ++					die(_("could not find %s in pack %s at offset %"PRIuMAX),
     +					    oid_to_hex(&oid),
    -+					    pack_id,
    ++					    pack->pack_name,
     +					    (uintmax_t)offset);
     +				}
     +
    -+				pos = pack_pos;
    ++				total += pack_pos_to_offset(pack, pack_pos + 1) - offset;
     +			} else {
    -+				pack = bitmap_git->pack;
    -+				pos = base + offset;
    ++				size_t pos = base + offset;
    ++				total += pack_pos_to_offset(bitmap_git->pack, pos + 1) -
    ++					 pack_pos_to_offset(bitmap_git->pack, pos);
     +			}
    -+
    - 			total += pack_pos_to_offset(pack, pos + 1) -
    - 				 pack_pos_to_offset(pack, pos);
      		}
    + 	}
    + 
     @@ pack-bitmap.c: off_t get_disk_usage_from_bitmap(struct bitmap_index *bitmap_git,
      	return total;
      }
    @@ pack-bitmap.c: off_t get_disk_usage_from_bitmap(struct bitmap_index *bitmap_git,
     +{
     +	return !!bitmap_git->midx;
     +}
    -+
    -+off_t bitmap_pack_offset(struct bitmap_index *bitmap_git, uint32_t pos)
    -+{
    -+	if (bitmap_is_midx(bitmap_git))
    -+		return nth_midxed_offset(bitmap_git->midx,
    -+					 pack_pos_to_midx(bitmap_git->midx, pos));
    -+	return nth_packed_object_offset(bitmap_git->pack,
    -+					pack_pos_to_index(bitmap_git->pack, pos));
    -+}
     +
      const struct string_list *bitmap_preferred_tips(struct repository *r)
      {
      	return repo_config_get_value_multi(r, "pack.preferbitmaptips");
     
      ## pack-bitmap.h ##
    +@@ pack-bitmap.h: typedef int (*show_reachable_fn)(
    + struct bitmap_index;
    + 
    + struct bitmap_index *prepare_bitmap_git(struct repository *r);
    ++struct bitmap_index *prepare_midx_bitmap_git(struct repository *r,
    ++					     struct multi_pack_index *midx);
    + void count_bitmap_commit_list(struct bitmap_index *, uint32_t *commits,
    + 			      uint32_t *trees, uint32_t *blobs, uint32_t *tags);
    + void traverse_bitmap_commit_list(struct bitmap_index *,
     @@ pack-bitmap.h: void bitmap_writer_finish(struct pack_idx_entry **index,
      			  uint32_t index_nr,
      			  const char *filename,
    @@ pack-bitmap.h: void bitmap_writer_finish(struct pack_idx_entry **index,
     +char *pack_bitmap_filename(struct packed_git *p);
     +
     +int bitmap_is_midx(struct bitmap_index *bitmap_git);
    -+off_t bitmap_pack_offset(struct bitmap_index *bitmap_git, uint32_t pos);
      
      const struct string_list *bitmap_preferred_tips(struct repository *r);
      int bitmap_is_preferred_refname(struct repository *r, const char *refname);
14:  a8cec2463d ! 15:  54600b5814 pack-bitmap: write multi-pack bitmaps
    @@ Documentation/git-multi-pack-index.txt: SYNOPSIS
      DESCRIPTION
      -----------
     @@ Documentation/git-multi-pack-index.txt: write::
    - 		multiple packs contain the same object. If not given,
    - 		ties are broken in favor of the pack with the lowest
    - 		mtime.
    + 		multiple packs contain the same object. `<pack>` must
    + 		contain at least one object. If not given, ties are
    + 		broken in favor of the pack with the lowest mtime.
     +
     +	--[no-]bitmap::
     +		Control whether or not a multi-pack bitmap is written.
    @@ Documentation/git-multi-pack-index.txt: EXAMPLES
     +corresponding bitmap.
     ++
     +-------------------------------------------------------------
    -+$ git multi-pack-index write --preferred-pack <pack> --bitmap
    ++$ git multi-pack-index write --preferred-pack=<pack> --bitmap
     +-------------------------------------------------------------
     +
      * Write a MIDX file for the packfiles in an alternate object store.
    @@ midx.c
      
      #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
      #define MIDX_VERSION 1
    -@@ midx.c: static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash,
    - static void clear_midx_files_ext(struct repository *r, const char *ext,
    - 				 unsigned char *keep_hash);
    +@@ midx.c: static int midx_checksum_valid(struct multi_pack_index *m)
    + 	return hashfile_checksum_valid(m->data, m->data_len);
    + }
      
     +static void prepare_midx_packing_data(struct packing_data *pdata,
     +				      struct write_midx_context *ctx)
    @@ midx.c: static void write_midx_reverse_index(char *midx_name, unsigned char *mid
     +static void bitmap_show_commit(struct commit *commit, void *_data)
     +{
     +	struct bitmap_commit_cb *data = _data;
    -+	if (oid_pos(&commit->object.oid, data->ctx->entries,
    -+		    data->ctx->entries_nr,
    -+		    bitmap_oid_access) > -1) {
    -+		ALLOC_GROW(data->commits, data->commits_nr + 1,
    -+			   data->commits_alloc);
    -+		data->commits[data->commits_nr++] = commit;
    -+	}
    ++	int pos = oid_pos(&commit->object.oid, data->ctx->entries,
    ++			  data->ctx->entries_nr,
    ++			  bitmap_oid_access);
    ++	if (pos < 0)
    ++		return;
    ++
    ++	ALLOC_GROW(data->commits, data->commits_nr + 1, data->commits_alloc);
    ++	data->commits[data->commits_nr++] = commit;
     +}
     +
     +static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr_p,
     +						    struct write_midx_context *ctx)
     +{
     +	struct rev_info revs;
    -+	struct bitmap_commit_cb cb;
    ++	struct bitmap_commit_cb cb = {0};
     +
    -+	memset(&cb, 0, sizeof(struct bitmap_commit_cb));
     +	cb.ctx = ctx;
     +
     +	repo_init_revisions(the_repository, &revs, NULL);
    ++	setup_revisions(0, NULL, &revs, NULL);
     +	for_each_ref(add_ref_to_pending, &revs);
     +
     +	/*
    @@ midx.c: static void write_midx_reverse_index(char *midx_name, unsigned char *mid
     +	fetch_if_missing = 0;
     +	revs.exclude_promisor_objects = 1;
     +
    -+	/*
    -+	 * Pass selected commits in topo order to match the behavior of
    -+	 * pack-bitmaps when configured with delta islands.
    -+	 */
    -+	revs.topo_order = 1;
    -+	revs.sort_order = REV_SORT_IN_GRAPH_ORDER;
    -+
     +	if (prepare_revision_walk(&revs))
     +		die(_("revision walk setup failed"));
     +
    @@ midx.c: static void write_midx_reverse_index(char *midx_name, unsigned char *mid
     +	 */
     +	ALLOC_ARRAY(index, pdata.nr_objects);
     +	for (i = 0; i < pdata.nr_objects; i++)
    -+		index[i] = (struct pack_idx_entry *)&pdata.objects[i];
    ++		index[i] = &pdata.objects[i].idx;
     +
     +	bitmap_writer_show_progress(flags & MIDX_PROGRESS);
     +	bitmap_writer_build_type_index(&pdata, index, pdata.nr_objects);
    @@ midx.c: static void write_midx_reverse_index(char *midx_name, unsigned char *mid
     +	 * bitmap_writer_finish().
     +	 */
     +	for (i = 0; i < pdata.nr_objects; i++)
    -+		index[ctx->pack_order[i]] = (struct pack_idx_entry *)&pdata.objects[i];
    ++		index[ctx->pack_order[i]] = &pdata.objects[i].idx;
     +
     +	bitmap_writer_select_commits(commits, commits_nr, -1);
     +	ret = bitmap_writer_build(&pdata);
    @@ midx.c: static void write_midx_reverse_index(char *midx_name, unsigned char *mid
     +	return ret;
     +}
     +
    - static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
    + static int write_midx_internal(const char *object_dir,
      			       struct string_list *packs_to_drop,
      			       const char *preferred_pack_name,
    -@@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack_index *
    - 		for (i = 0; i < ctx.m->num_packs; i++) {
    - 			ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
    +@@ midx.c: static int write_midx_internal(const char *object_dir,
      
    -+			if (prepare_midx_pack(the_repository, ctx.m, i)) {
    -+				error(_("could not load pack %s"),
    -+				      ctx.m->pack_names[i]);
    -+				result = 1;
    -+				goto cleanup;
    -+			}
    -+
      			ctx.info[ctx.nr].orig_pack_int_id = i;
      			ctx.info[ctx.nr].pack_name = xstrdup(ctx.m->pack_names[i]);
     -			ctx.info[ctx.nr].p = NULL;
     +			ctx.info[ctx.nr].p = ctx.m->packs[i];
      			ctx.info[ctx.nr].expired = 0;
    - 			ctx.nr++;
    - 		}
    -@@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack_index *
    + 
    + 			if (flags & MIDX_WRITE_REV_INDEX) {
    +@@ midx.c: static int write_midx_internal(const char *object_dir,
      	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
      	stop_progress(&ctx.progress);
      
    @@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack
     +		int bitmap_exists;
     +		int want_bitmap = flags & MIDX_WRITE_BITMAP;
     +
    -+		bitmap_git = prepare_bitmap_git(the_repository);
    ++		bitmap_git = prepare_midx_bitmap_git(the_repository, ctx.m);
     +		bitmap_exists = bitmap_git && bitmap_is_midx(bitmap_git);
     +		free_bitmap_index(bitmap_git);
     +
    @@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack
      
      	if (preferred_pack_name) {
      		int found = 0;
    -@@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack_index *
    +@@ midx.c: static int write_midx_internal(const char *object_dir,
      		if (!found)
      			warning(_("unknown preferred pack: '%s'"),
      				preferred_pack_name);
     -	} else if (ctx.nr && (flags & MIDX_WRITE_REV_INDEX)) {
     +	} else if (ctx.nr &&
     +		   (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))) {
    - 		time_t oldest = ctx.info[0].p->mtime;
    + 		struct packed_git *oldest = ctx.info[ctx.preferred_pack_idx].p;
      		ctx.preferred_pack_idx = 0;
      
    -@@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack_index *
    +@@ midx.c: static int write_midx_internal(const char *object_dir,
      	hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR);
      	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
      
    @@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack
      	if (ctx.nr - dropped_packs == 0) {
      		error(_("no pack files to index."));
      		result = 1;
    -@@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack_index *
    +@@ midx.c: static int write_midx_internal(const char *object_dir,
      	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
      	free_chunkfile(cf);
      
    @@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack
     +			goto cleanup;
     +		}
     +	}
    ++
    ++	close_midx(ctx.m);
      
      	commit_lock_file(&lk);
      
    @@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack
      	clear_midx_files_ext(the_repository, ".rev", midx_hash);
      
      cleanup:
    -@@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack_index *
    - 		if (ctx.info[i].p) {
    - 			close_pack(ctx.info[i].p);
    - 			free(ctx.info[i].p);
    -+			if (ctx.m) {
    -+				/*
    -+				 * Destroy a stale reference to the pack in
    -+				 * 'ctx.m'.
    -+				 */
    -+				uint32_t orig = ctx.info[i].orig_pack_int_id;
    -+				if (orig < ctx.m->num_packs)
    -+					ctx.m->packs[orig] = NULL;
    -+			}
    - 		}
    - 		free(ctx.info[i].pack_name);
    - 	}
    -@@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack_index *
    +@@ midx.c: static int write_midx_internal(const char *object_dir,
      	free(ctx.pack_perm);
      	free(ctx.pack_order);
      	free(midx_name);
    -+	if (ctx.m)
    -+		close_midx(ctx.m);
     +
      	return result;
      }
15:  c63eb637c8 = 16:  168b7b0976 t5310: move some tests to lib-bitmap.sh
16:  bedb7afb37 = 17:  60ec8b3466 t/helper/test-read-midx.c: add --checksum mode
17:  fbfac4ae8e = 18:  3258ccfc1c t5326: test multi-pack bitmap behavior
18:  2a5df1832a = 19:  47c7e6bb9b t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
19:  2d24c5b7ad = 20:  6a708858b1 t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
20:  4cbfaa0e97 = 21:  1eaa744b24 t5319: don't write MIDX bitmaps in t5319
21:  839a7a79eb = 22:  a4a899e31f t7700: update to work with MIDX bitmap test knob
22:  00418d5b09 ! 23:  50865e52a3 midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
    @@ builtin/repack.c: int cmd_repack(int argc, const char **argv, const char *prefix
      		if (!(pack_everything & ALL_INTO_ONE) ||
      		    !is_bare_repository())
      			write_bitmaps = 0;
    --	}
     +	} else if (write_bitmaps &&
     +		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0) &&
    -+		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0))
    ++		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0)) {
     +		write_bitmaps = 0;
    + 	}
      	if (pack_kept_objects < 0)
      		pack_kept_objects = write_bitmaps > 0;
    - 
     @@ builtin/repack.c: int cmd_repack(int argc, const char **argv, const char *prefix)
      		update_server_info(0);
      	remove_temporary_files();
23:  98fa73a76a = 24:  0f1fd6e7d4 p5310: extract full and partial bitmap tests
24:  ec0f53b424 = 25:  82e8133bf4 p5326: perf tests for MIDX bitmaps
-- 
2.31.1.163.ga65ce7f831


* [PATCH v3 01/25] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
@ 2021-07-27 21:19   ` Taylor Blau
  2021-07-27 21:19   ` [PATCH v3 02/25] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps Taylor Blau
                     ` (24 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 21:19 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

The special `--test-bitmap` mode of `git rev-list` is used to compare
the result of an object traversal with a bitmap to check its integrity.
This mode does not, however, assert that the types of reachable objects
are stored correctly.

Harden this mode by teaching it to also check that each time an object's
bit is marked, the corresponding bit is set in exactly one of the type
bitmaps (whose type matches the object's true type).

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
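Note for readers: the "exactly one type bitmap" rule described above can
be sketched in isolation. This is a minimal illustration, assuming each
type bitmap fits in a single 64-bit word; the toy_* names are
hypothetical and are not part of pack-bitmap.c.

```c
#include <stdint.h>

enum toy_type { TOY_NONE, TOY_COMMIT, TOY_TREE, TOY_BLOB, TOY_TAG };

/* One 64-bit word per type; bit n describes the object at position n. */
struct toy_type_bitmaps {
	uint64_t commits, trees, blobs, tags;
};

static int toy_bit(uint64_t word, unsigned pos)
{
	return (int)((word >> pos) & 1);
}

/*
 * Return the type recorded for 'pos' (which must be < 64), or TOY_NONE
 * if the position is set in no type bitmap or in more than one.
 */
static enum toy_type toy_type_at(const struct toy_type_bitmaps *t,
				 unsigned pos)
{
	enum toy_type found = TOY_NONE;
	int nr = 0;

	if (toy_bit(t->commits, pos)) {
		found = TOY_COMMIT;
		nr++;
	}
	if (toy_bit(t->trees, pos)) {
		found = TOY_TREE;
		nr++;
	}
	if (toy_bit(t->blobs, pos)) {
		found = TOY_BLOB;
		nr++;
	}
	if (toy_bit(t->tags, pos)) {
		found = TOY_TAG;
		nr++;
	}

	return nr == 1 ? found : TOY_NONE;
}
```

A caller would compare the returned type against the object's real type
and report a mismatch, which is roughly the shape of the check added in
the diff below.
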
 pack-bitmap.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index bfc10148f5..a73960a55d 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1319,10 +1319,52 @@ void count_bitmap_commit_list(struct bitmap_index *bitmap_git,
 struct bitmap_test_data {
 	struct bitmap_index *bitmap_git;
 	struct bitmap *base;
+	struct bitmap *commits;
+	struct bitmap *trees;
+	struct bitmap *blobs;
+	struct bitmap *tags;
 	struct progress *prg;
 	size_t seen;
 };
 
+static void test_bitmap_type(struct bitmap_test_data *tdata,
+			     struct object *obj, int pos)
+{
+	enum object_type bitmap_type = OBJ_NONE;
+	int bitmaps_nr = 0;
+
+	if (bitmap_get(tdata->commits, pos)) {
+		bitmap_type = OBJ_COMMIT;
+		bitmaps_nr++;
+	}
+	if (bitmap_get(tdata->trees, pos)) {
+		bitmap_type = OBJ_TREE;
+		bitmaps_nr++;
+	}
+	if (bitmap_get(tdata->blobs, pos)) {
+		bitmap_type = OBJ_BLOB;
+		bitmaps_nr++;
+	}
+	if (bitmap_get(tdata->tags, pos)) {
+		bitmap_type = OBJ_TAG;
+		bitmaps_nr++;
+	}
+
+	if (bitmap_type == OBJ_NONE)
+		die("object %s not found in type bitmaps",
+		    oid_to_hex(&obj->oid));
+
+	if (bitmaps_nr > 1)
+		die("object %s does not have a unique type",
+		    oid_to_hex(&obj->oid));
+
+	if (bitmap_type != obj->type)
+		die("object %s: real type %s, expected: %s",
+		    oid_to_hex(&obj->oid),
+		    type_name(obj->type),
+		    type_name(bitmap_type));
+}
+
 static void test_show_object(struct object *object, const char *name,
 			     void *data)
 {
@@ -1332,6 +1374,7 @@ static void test_show_object(struct object *object, const char *name,
 	bitmap_pos = bitmap_position(tdata->bitmap_git, &object->oid);
 	if (bitmap_pos < 0)
 		die("Object not in bitmap: %s\n", oid_to_hex(&object->oid));
+	test_bitmap_type(tdata, object, bitmap_pos);
 
 	bitmap_set(tdata->base, bitmap_pos);
 	display_progress(tdata->prg, ++tdata->seen);
@@ -1346,6 +1389,7 @@ static void test_show_commit(struct commit *commit, void *data)
 				     &commit->object.oid);
 	if (bitmap_pos < 0)
 		die("Object not in bitmap: %s\n", oid_to_hex(&commit->object.oid));
+	test_bitmap_type(tdata, &commit->object, bitmap_pos);
 
 	bitmap_set(tdata->base, bitmap_pos);
 	display_progress(tdata->prg, ++tdata->seen);
@@ -1393,6 +1437,10 @@ void test_bitmap_walk(struct rev_info *revs)
 
 	tdata.bitmap_git = bitmap_git;
 	tdata.base = bitmap_new();
+	tdata.commits = ewah_to_bitmap(bitmap_git->commits);
+	tdata.trees = ewah_to_bitmap(bitmap_git->trees);
+	tdata.blobs = ewah_to_bitmap(bitmap_git->blobs);
+	tdata.tags = ewah_to_bitmap(bitmap_git->tags);
 	tdata.prg = start_progress("Verifying bitmap entries", result_popcnt);
 	tdata.seen = 0;
 
-- 
2.31.1.163.ga65ce7f831



* [PATCH v3 02/25] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
  2021-07-27 21:19   ` [PATCH v3 01/25] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps Taylor Blau
@ 2021-07-27 21:19   ` Taylor Blau
  2021-07-27 21:19   ` [PATCH v3 03/25] pack-bitmap-write.c: free existing bitmaps Taylor Blau
                     ` (23 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 21:19 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

The set of objects covered by a bitmap must be closed under
reachability, since it must be the case that there is a valid bit
position assigned for every possible reachable object (otherwise the
bitmaps would be incomplete).

Pack bitmaps are never written from 'git repack' unless repacking
all-into-one, and so we never write non-closed bitmaps (except in the
case of partial clones where we aren't guaranteed to have all objects).

But multi-pack bitmaps change this, since it isn't known whether the
set of objects in the MIDX is closed under reachability until walking
them. Plumb through a bit that is set when a reachable object isn't
found.

As soon as a reachable object isn't found in the set of objects to
include in the bitmap, bitmap_writer_build() knows that the set is not
closed, and so it now fails gracefully.

A test is added in t0410 to trigger a bitmap write without full
reachability closure by removing local copies of some reachable objects
from a promisor remote.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
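Note for readers: the idea of "failing gracefully when the set is not
closed" can be sketched on a toy object graph. This is only an
illustration under simplifying assumptions (a fixed-size child array, a
byte per bit); the toy_* names are hypothetical and do not appear in
pack-bitmap-write.c.

```c
#define TOY_MAX_CHILDREN 2

struct toy_obj {
	int in_set;                     /* is the object in the pack/MIDX? */
	int children[TOY_MAX_CHILDREN]; /* indexes into the table, or -1 */
};

/*
 * Mark 'id' and everything reachable from it in 'bits'. Returns 0 on
 * success, or -1 as soon as a reachable object is missing from the set,
 * i.e. when the set is not closed under reachability.
 */
static int toy_fill(const struct toy_obj *objs, int id, unsigned char *bits)
{
	int i;

	if (!objs[id].in_set)
		return -1; /* no bit position exists for this object */
	if (bits[id])
		return 0; /* already marked, along with its children */
	bits[id] = 1;

	for (i = 0; i < TOY_MAX_CHILDREN; i++) {
		int child = objs[id].children[i];

		if (child < 0)
			continue;
		if (toy_fill(objs, child, bits) < 0)
			return -1; /* propagate the failure upwards */
	}
	return 0;
}
```

The design choice mirrored here is that the error travels back up as a
return value instead of terminating the process, so the outermost caller
decides how to react.
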
 builtin/pack-objects.c   |  3 +-
 pack-bitmap-write.c      | 76 ++++++++++++++++++++++++++++------------
 pack-bitmap.h            |  2 +-
 t/t0410-partial-clone.sh |  9 ++++-
 4 files changed, 64 insertions(+), 26 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index de00adbb9e..8a523624a1 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1256,7 +1256,8 @@ static void write_pack_file(void)
 
 				bitmap_writer_show_progress(progress);
 				bitmap_writer_select_commits(indexed_commits, indexed_commits_nr, -1);
-				bitmap_writer_build(&to_pack);
+				if (bitmap_writer_build(&to_pack) < 0)
+					die(_("failed to write bitmap index"));
 				bitmap_writer_finish(written_list, nr_written,
 						     tmpname.buf, write_bitmap_options);
 				write_bitmap_index = 0;
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 88d9e696a5..d374f7884b 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -125,15 +125,20 @@ static inline void push_bitmapped_commit(struct commit *commit)
 	writer.selected_nr++;
 }
 
-static uint32_t find_object_pos(const struct object_id *oid)
+static uint32_t find_object_pos(const struct object_id *oid, int *found)
 {
 	struct object_entry *entry = packlist_find(writer.to_pack, oid);
 
 	if (!entry) {
-		die("Failed to write bitmap index. Packfile doesn't have full closure "
+		if (found)
+			*found = 0;
+		warning("Failed to write bitmap index. Packfile doesn't have full closure "
 			"(object %s is missing)", oid_to_hex(oid));
+		return 0;
 	}
 
+	if (found)
+		*found = 1;
 	return oe_in_pack_pos(writer.to_pack, entry);
 }
 
@@ -331,9 +336,10 @@ static void bitmap_builder_clear(struct bitmap_builder *bb)
 	bb->commits_nr = bb->commits_alloc = 0;
 }
 
-static void fill_bitmap_tree(struct bitmap *bitmap,
-			     struct tree *tree)
+static int fill_bitmap_tree(struct bitmap *bitmap,
+			    struct tree *tree)
 {
+	int found;
 	uint32_t pos;
 	struct tree_desc desc;
 	struct name_entry entry;
@@ -342,9 +348,11 @@ static void fill_bitmap_tree(struct bitmap *bitmap,
 	 * If our bit is already set, then there is nothing to do. Both this
 	 * tree and all of its children will be set.
 	 */
-	pos = find_object_pos(&tree->object.oid);
+	pos = find_object_pos(&tree->object.oid, &found);
+	if (!found)
+		return -1;
 	if (bitmap_get(bitmap, pos))
-		return;
+		return 0;
 	bitmap_set(bitmap, pos);
 
 	if (parse_tree(tree) < 0)
@@ -355,11 +363,15 @@ static void fill_bitmap_tree(struct bitmap *bitmap,
 	while (tree_entry(&desc, &entry)) {
 		switch (object_type(entry.mode)) {
 		case OBJ_TREE:
-			fill_bitmap_tree(bitmap,
-					 lookup_tree(the_repository, &entry.oid));
+			if (fill_bitmap_tree(bitmap,
+					     lookup_tree(the_repository, &entry.oid)) < 0)
+				return -1;
 			break;
 		case OBJ_BLOB:
-			bitmap_set(bitmap, find_object_pos(&entry.oid));
+			pos = find_object_pos(&entry.oid, &found);
+			if (!found)
+				return -1;
+			bitmap_set(bitmap, pos);
 			break;
 		default:
 			/* Gitlink, etc; not reachable */
@@ -368,15 +380,18 @@ static void fill_bitmap_tree(struct bitmap *bitmap,
 	}
 
 	free_tree_buffer(tree);
+	return 0;
 }
 
-static void fill_bitmap_commit(struct bb_commit *ent,
-			       struct commit *commit,
-			       struct prio_queue *queue,
-			       struct prio_queue *tree_queue,
-			       struct bitmap_index *old_bitmap,
-			       const uint32_t *mapping)
+static int fill_bitmap_commit(struct bb_commit *ent,
+			      struct commit *commit,
+			      struct prio_queue *queue,
+			      struct prio_queue *tree_queue,
+			      struct bitmap_index *old_bitmap,
+			      const uint32_t *mapping)
 {
+	int found;
+	uint32_t pos;
 	if (!ent->bitmap)
 		ent->bitmap = bitmap_new();
 
@@ -401,11 +416,16 @@ static void fill_bitmap_commit(struct bb_commit *ent,
 		 * Mark ourselves and queue our tree. The commit
 		 * walk ensures we cover all parents.
 		 */
-		bitmap_set(ent->bitmap, find_object_pos(&c->object.oid));
+		pos = find_object_pos(&c->object.oid, &found);
+		if (!found)
+			return -1;
+		bitmap_set(ent->bitmap, pos);
 		prio_queue_put(tree_queue, get_commit_tree(c));
 
 		for (p = c->parents; p; p = p->next) {
-			int pos = find_object_pos(&p->item->object.oid);
+			pos = find_object_pos(&p->item->object.oid, &found);
+			if (!found)
+				return -1;
 			if (!bitmap_get(ent->bitmap, pos)) {
 				bitmap_set(ent->bitmap, pos);
 				prio_queue_put(queue, p->item);
@@ -413,8 +433,12 @@ static void fill_bitmap_commit(struct bb_commit *ent,
 		}
 	}
 
-	while (tree_queue->nr)
-		fill_bitmap_tree(ent->bitmap, prio_queue_get(tree_queue));
+	while (tree_queue->nr) {
+		if (fill_bitmap_tree(ent->bitmap,
+				     prio_queue_get(tree_queue)) < 0)
+			return -1;
+	}
+	return 0;
 }
 
 static void store_selected(struct bb_commit *ent, struct commit *commit)
@@ -432,7 +456,7 @@ static void store_selected(struct bb_commit *ent, struct commit *commit)
 	kh_value(writer.bitmaps, hash_pos) = stored;
 }
 
-void bitmap_writer_build(struct packing_data *to_pack)
+int bitmap_writer_build(struct packing_data *to_pack)
 {
 	struct bitmap_builder bb;
 	size_t i;
@@ -441,6 +465,7 @@ void bitmap_writer_build(struct packing_data *to_pack)
 	struct prio_queue tree_queue = { NULL };
 	struct bitmap_index *old_bitmap;
 	uint32_t *mapping;
+	int closed = 1; /* until proven otherwise */
 
 	writer.bitmaps = kh_init_oid_map();
 	writer.to_pack = to_pack;
@@ -463,8 +488,11 @@ void bitmap_writer_build(struct packing_data *to_pack)
 		struct commit *child;
 		int reused = 0;
 
-		fill_bitmap_commit(ent, commit, &queue, &tree_queue,
-				   old_bitmap, mapping);
+		if (fill_bitmap_commit(ent, commit, &queue, &tree_queue,
+				       old_bitmap, mapping) < 0) {
+			closed = 0;
+			break;
+		}
 
 		if (ent->selected) {
 			store_selected(ent, commit);
@@ -499,7 +527,9 @@ void bitmap_writer_build(struct packing_data *to_pack)
 
 	stop_progress(&writer.progress);
 
-	compute_xor_offsets();
+	if (closed)
+		compute_xor_offsets();
+	return closed ? 0 : -1;
 }
 
 /**
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 99d733eb26..020cd8d868 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -87,7 +87,7 @@ struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
 				      struct commit *commit);
 void bitmap_writer_select_commits(struct commit **indexed_commits,
 		unsigned int indexed_commits_nr, int max_bitmaps);
-void bitmap_writer_build(struct packing_data *to_pack);
+int bitmap_writer_build(struct packing_data *to_pack);
 void bitmap_writer_finish(struct pack_idx_entry **index,
 			  uint32_t index_nr,
 			  const char *filename,
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index a211a66c67..bbcc51ee8e 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -536,7 +536,13 @@ test_expect_success 'gc does not repack promisor objects if there are none' '
 repack_and_check () {
 	rm -rf repo2 &&
 	cp -r repo repo2 &&
-	git -C repo2 repack $1 -d &&
+	if test x"$1" = "x--must-fail"
+	then
+		shift
+		test_must_fail git -C repo2 repack $1 -d
+	else
+		git -C repo2 repack $1 -d
+	fi &&
 	git -C repo2 fsck &&
 
 	git -C repo2 cat-file -e $2 &&
@@ -561,6 +567,7 @@ test_expect_success 'repack -d does not irreversibly delete promisor objects' '
 	printf "$THREE\n" | pack_as_from_promisor &&
 	delete_object repo "$ONE" &&
 
+	repack_and_check --must-fail -ab "$TWO" "$THREE" &&
 	repack_and_check -a "$TWO" "$THREE" &&
 	repack_and_check -A "$TWO" "$THREE" &&
 	repack_and_check -l "$TWO" "$THREE"
-- 
2.31.1.163.ga65ce7f831



* [PATCH v3 03/25] pack-bitmap-write.c: free existing bitmaps
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
  2021-07-27 21:19   ` [PATCH v3 01/25] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps Taylor Blau
  2021-07-27 21:19   ` [PATCH v3 02/25] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps Taylor Blau
@ 2021-07-27 21:19   ` Taylor Blau
  2021-07-27 21:19   ` [PATCH v3 04/25] Documentation: describe MIDX-based bitmaps Taylor Blau
                     ` (22 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 21:19 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

When writing a new bitmap, the bitmap writer code attempts to read the
existing bitmap (if one is present). This is done in order to quickly
permute the bits of any bitmaps for commits which appear in the existing
bitmap, and were also selected for the new bitmap.

But since this code was added in 341fa34887 (pack-bitmap-write: use
existing bitmaps, 2020-12-08), the resources associated with opening an
existing bitmap were never released.

It's fine to ignore this, but it's bad hygiene. It will also cause a
problem for the multi-pack-index builtin, which will be responsible not
only for writing bitmaps, but also for expiring any old multi-pack
bitmaps.

If an existing bitmap was reused here, it will also be expired. That
will cause a problem on platforms which require file resources to be
closed before unlinking them, like Windows. Avoid this by ensuring we
close reused bitmaps with free_bitmap_index() before removing them.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index d374f7884b..142fd0adb8 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -520,6 +520,7 @@ int bitmap_writer_build(struct packing_data *to_pack)
 	clear_prio_queue(&queue);
 	clear_prio_queue(&tree_queue);
 	bitmap_builder_clear(&bb);
+	free_bitmap_index(old_bitmap);
 	free(mapping);
 
 	trace2_region_leave("pack-bitmap-write", "building_bitmaps_total",
-- 
2.31.1.163.ga65ce7f831



* [PATCH v3 04/25] Documentation: describe MIDX-based bitmaps
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
                     ` (2 preceding siblings ...)
  2021-07-27 21:19   ` [PATCH v3 03/25] pack-bitmap-write.c: free existing bitmaps Taylor Blau
@ 2021-07-27 21:19   ` Taylor Blau
  2021-07-27 21:19   ` [PATCH v3 05/25] midx: clear auxiliary .rev after replacing the MIDX Taylor Blau
                     ` (21 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 21:19 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Update the technical documentation to describe the multi-pack bitmap
format. This patch merely introduces the new format, and describes its
high-level ideas. Git does not yet know how to read or write these
multi-pack variants, and so the subsequent patches will:

  - Introduce code to interpret multi-pack bitmaps, according to this
    document.

  - Then, introduce code to write multi-pack bitmaps from the 'git
    multi-pack-index write' sub-command.

Finally, the implementation will gain tests in subsequent patches (as
opposed to inline with the patch teaching Git how to write multi-pack
bitmaps) to avoid a cyclic dependency.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
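Note for readers: the MIDX object order that the new documentation
describes (preferred pack first, then by pack, then by offset within a
pack) can be sketched as a comparator. This is a minimal illustration,
not the code that writes the .rev file; 'struct toy_entry' and the
toy_* functions are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

struct toy_entry {
	int preferred;    /* 1 if selected from the preferred pack */
	uint32_t pack_id; /* which pack the MIDX selected the object from */
	uint64_t offset;  /* byte offset of the object within that pack */
};

/* Preferred pack first, then ascending pack ID, then ascending offset. */
static int toy_midx_order_cmp(const void *va, const void *vb)
{
	const struct toy_entry *a = va;
	const struct toy_entry *b = vb;

	if (a->preferred != b->preferred)
		return a->preferred ? -1 : 1;
	if (a->pack_id != b->pack_id)
		return a->pack_id < b->pack_id ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* After sorting, entry n corresponds to bit n of a MIDX bitmap. */
static void toy_sort_midx_order(struct toy_entry *entries, size_t nr)
{
	qsort(entries, nr, sizeof(*entries), toy_midx_order_cmp);
}
```

With this ordering, the first entry always comes from the preferred pack
whenever that pack contributes at least one object.
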
 Documentation/technical/bitmap-format.txt    | 71 ++++++++++++++++----
 Documentation/technical/multi-pack-index.txt | 10 +--
 2 files changed, 60 insertions(+), 21 deletions(-)

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index f8c18a0f7a..04b3ec2178 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -1,6 +1,44 @@
 GIT bitmap v1 format
 ====================
 
+== Pack and multi-pack bitmaps
+
+Bitmaps store reachability information about the set of objects in a packfile,
+or a multi-pack index (MIDX). The former is defined obviously, and the latter is
+defined as the union of objects in packs contained in the MIDX.
+
+A bitmap may belong to either one pack, or the repository's multi-pack index (if
+it exists). A repository may have at most one bitmap.
+
+An object is uniquely described by its bit position within a bitmap:
+
+	- If the bitmap belongs to a packfile, the __n__th bit corresponds to
+	the __n__th object in pack order. For a function `offset` which maps
+	objects to their byte offset within a pack, pack order is defined as
+	follows:
+
+		o1 <= o2 <==> offset(o1) <= offset(o2)
+
+	- If the bitmap belongs to a MIDX, the __n__th bit corresponds to the
+	__n__th object in MIDX order. With an additional function `pack` which
+	maps objects to the pack they were selected from by the MIDX, MIDX order
+	is defined as follows:
+
+		o1 <= o2 <==> pack(o1) <= pack(o2) /\ offset(o1) <= offset(o2)
+
+	The ordering between packs is done according to the MIDX's .rev file.
+	Notably, the preferred pack sorts ahead of all other packs.
+
+The on-disk representation (described below) of a bitmap is the same regardless
+of whether or not that bitmap belongs to a packfile or a MIDX. The only
+difference is the interpretation of the bits, which is described above.
+
+Certain bitmap extensions are supported (see: Appendix B). No extensions are
+required for bitmaps corresponding to packfiles. For bitmaps that correspond to
+MIDXs, both the bit-cache and rev-cache extensions are required.
+
+== On-disk format
+
 	- A header appears at the beginning:
 
 		4-byte signature: {'B', 'I', 'T', 'M'}
@@ -14,17 +52,19 @@ GIT bitmap v1 format
 			The following flags are supported:
 
 			- BITMAP_OPT_FULL_DAG (0x1) REQUIRED
-			This flag must always be present. It implies that the bitmap
-			index has been generated for a packfile with full closure
-			(i.e. where every single object in the packfile can find
-			 its parent links inside the same packfile). This is a
-			requirement for the bitmap index format, also present in JGit,
-			that greatly reduces the complexity of the implementation.
+			This flag must always be present. It implies that the
+			bitmap index has been generated for a packfile or
+			multi-pack index (MIDX) with full closure (i.e. where
+			every single object in the packfile/MIDX can find its
+			parent links inside the same packfile/MIDX). This is a
+			requirement for the bitmap index format, also present in
+			JGit, that greatly reduces the complexity of the
+			implementation.
 
 			- BITMAP_OPT_HASH_CACHE (0x4)
 			If present, the end of the bitmap file contains
 			`N` 32-bit name-hash values, one per object in the
-			pack. The format and meaning of the name-hash is
+			pack/MIDX. The format and meaning of the name-hash is
 			described below.
 
 		4-byte entry count (network byte order)
@@ -33,7 +73,8 @@ GIT bitmap v1 format
 
 		20-byte checksum
 
-			The SHA1 checksum of the pack this bitmap index belongs to.
+			The SHA1 checksum of the pack/MIDX this bitmap index
+			belongs to.
 
 	- 4 EWAH bitmaps that act as type indexes
 
@@ -50,7 +91,7 @@ GIT bitmap v1 format
 			- Tags
 
 		In each bitmap, the `n`th bit is set to true if the `n`th object
-		in the packfile is of that type.
+		in the packfile or multi-pack index is of that type.
 
 		The obvious consequence is that the OR of all 4 bitmaps will result
 		in a full set (all bits set), and the AND of all 4 bitmaps will
@@ -62,8 +103,9 @@ GIT bitmap v1 format
 		Each entry contains the following:
 
 		- 4-byte object position (network byte order)
-			The position **in the index for the packfile** where the
-			bitmap for this commit is found.
+			The position **in the index for the packfile or
+			multi-pack index** where the bitmap for this commit is
+			found.
 
 		- 1-byte XOR-offset
 			The xor offset used to compress this bitmap. For an entry
@@ -146,10 +188,11 @@ Name-hash cache
 ---------------
 
 If the BITMAP_OPT_HASH_CACHE flag is set, the end of the bitmap contains
-a cache of 32-bit values, one per object in the pack. The value at
+a cache of 32-bit values, one per object in the pack/MIDX. The value at
 position `i` is the hash of the pathname at which the `i`th object
-(counting in index order) in the pack can be found.  This can be fed
-into the delta heuristics to compare objects with similar pathnames.
+(counting in index or multi-pack index order) in the pack/MIDX can be found.
+This can be fed into the delta heuristics to compare objects with similar
+pathnames.
 
 The hash algorithm used is:
 
diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
index fb688976c4..1a73c3ee20 100644
--- a/Documentation/technical/multi-pack-index.txt
+++ b/Documentation/technical/multi-pack-index.txt
@@ -71,14 +71,10 @@ Future Work
   still reducing the number of binary searches required for object
   lookups.
 
-- The reachability bitmap is currently paired directly with a single
-  packfile, using the pack-order as the object order to hopefully
-  compress the bitmaps well using run-length encoding. This could be
-  extended to pair a reachability bitmap with a multi-pack-index. If
-  the multi-pack-index is extended to store a "stable object order"
+- If the multi-pack-index is extended to store a "stable object order"
   (a function Order(hash) = integer that is constant for a given hash,
-  even as the multi-pack-index is updated) then a reachability bitmap
-  could point to a multi-pack-index and be updated independently.
+  even as the multi-pack-index is updated) then MIDX bitmaps could be
+  updated independently of the MIDX.
 
 - Packfiles can be marked as "special" using empty files that share
   the initial name but replace ".pack" with ".keep" or ".promisor".
-- 
2.31.1.163.ga65ce7f831



* [PATCH v3 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
                     ` (3 preceding siblings ...)
  2021-07-27 21:19   ` [PATCH v3 04/25] Documentation: describe MIDX-based bitmaps Taylor Blau
@ 2021-07-27 21:19   ` Taylor Blau
  2021-07-27 21:19   ` [PATCH v3 06/25] midx: reject empty `--preferred-pack`'s Taylor Blau
                     ` (20 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 21:19 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

When writing a new multi-pack index, write_midx_internal() attempts to
clean up any auxiliary files (currently just the MIDX's `.rev` file, but
soon to include a `.bitmap`, too) corresponding to the MIDX it's
replacing.

This step should happen after the new MIDX is written into place, since
doing so beforehand means that the old MIDX could be read without its
corresponding .rev file.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 9a35b0255d..3193426d24 100644
--- a/midx.c
+++ b/midx.c
@@ -1086,10 +1086,11 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 	if (flags & MIDX_WRITE_REV_INDEX)
 		write_midx_reverse_index(midx_name, midx_hash, &ctx);
-	clear_midx_files_ext(the_repository, ".rev", midx_hash);
 
 	commit_lock_file(&lk);
 
+	clear_midx_files_ext(the_repository, ".rev", midx_hash);
+
 cleanup:
 	for (i = 0; i < ctx.nr; i++) {
 		if (ctx.info[i].p) {
-- 
2.31.1.163.ga65ce7f831



* [PATCH v3 06/25] midx: reject empty `--preferred-pack`'s
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
                     ` (4 preceding siblings ...)
  2021-07-27 21:19   ` [PATCH v3 05/25] midx: clear auxiliary .rev after replacing the MIDX Taylor Blau
@ 2021-07-27 21:19   ` Taylor Blau
  2021-07-27 21:19   ` [PATCH v3 07/25] midx: infer preferred pack when not given one Taylor Blau
                     ` (19 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 21:19 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

The soon-to-be-implemented multi-pack bitmap treats the object in the
first bit position specially by assuming that all objects in the pack it
was selected from are also represented from that pack in the MIDX. In
other words, the pack from which the first object was selected must also
have all of its other objects selected from that same pack in the MIDX
in case of any duplicates.

But this assumption relies on the fact that there is at least one object
in that pack to begin with; otherwise the object in the first bit
position isn't from a preferred pack, in which case we can no longer
assume that all objects in that pack were also selected from the same
pack.

Guard this assumption by checking the number of objects in the given
preferred pack, and failing if the given pack is empty.

To make sure we can safely perform this check, open any packs which are
contained in an existing MIDX via prepare_midx_pack(). The same is done
for new packs via the add_pack_to_midx() callback, but packs picked up
from a previous MIDX will not yet have these opened.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
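Note for readers: the guard described above amounts to a small
validation step before objects are sorted. A minimal sketch, assuming a
simplified pack record; 'struct toy_pack' and toy_check_preferred() are
hypothetical names, not the ones used in midx.c.

```c
#include <stdint.h>
#include <stdio.h>

struct toy_pack {
	const char *name;
	uint32_t num_objects;
};

/*
 * Return 0 if 'preferred_idx' is usable (or negative, meaning no
 * preferred pack was requested), and -1 if it names an empty pack.
 */
static int toy_check_preferred(const struct toy_pack *packs,
			       int preferred_idx)
{
	if (preferred_idx < 0)
		return 0;
	if (!packs[preferred_idx].num_objects) {
		fprintf(stderr,
			"cannot select preferred pack %s with no objects\n",
			packs[preferred_idx].name);
		return -1;
	}
	return 0;
}
```

The check only works if the object count is known, which is why the
patch below opens the pack index for packs carried over from an existing
MIDX.
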
 Documentation/git-multi-pack-index.txt |  6 +++---
 midx.c                                 | 29 ++++++++++++++++++++++++++
 t/t5319-multi-pack-index.sh            | 17 +++++++++++++++
 3 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index ffd601bc17..c9b063d31e 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -37,9 +37,9 @@ write::
 --
 	--preferred-pack=<pack>::
 		Optionally specify the tie-breaking pack used when
-		multiple packs contain the same object. If not given,
-		ties are broken in favor of the pack with the lowest
-		mtime.
+		multiple packs contain the same object. `<pack>` must
+		contain at least one object. If not given, ties are
+		broken in favor of the pack with the lowest mtime.
 --
 
 verify::
diff --git a/midx.c b/midx.c
index 3193426d24..092dbf45b6 100644
--- a/midx.c
+++ b/midx.c
@@ -934,6 +934,25 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 			ctx.info[ctx.nr].pack_name = xstrdup(ctx.m->pack_names[i]);
 			ctx.info[ctx.nr].p = NULL;
 			ctx.info[ctx.nr].expired = 0;
+
+			if (flags & MIDX_WRITE_REV_INDEX) {
+				/*
+				 * If generating a reverse index, need to have
+				 * packed_git's loaded to compare their
+				 * mtimes and object count.
+				 */
+				if (prepare_midx_pack(the_repository, ctx.m, i)) {
+					error(_("could not load pack"));
+					result = 1;
+					goto cleanup;
+				}
+
+				if (open_pack_index(ctx.m->packs[i]))
+					die(_("could not open index for %s"),
+					    ctx.m->packs[i]->pack_name);
+				ctx.info[ctx.nr].p = ctx.m->packs[i];
+			}
+
 			ctx.nr++;
 		}
 	}
@@ -961,6 +980,16 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 		}
 	}
 
+	if (ctx.preferred_pack_idx > -1) {
+		struct packed_git *preferred = ctx.info[ctx.preferred_pack_idx].p;
+		if (!preferred->num_objects) {
+			error(_("cannot select preferred pack %s with no objects"),
+			      preferred->pack_name);
+			result = 1;
+			goto cleanup;
+		}
+	}
+
 	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr,
 					 ctx.preferred_pack_idx);
 
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 7609f1ea64..1f0a2ae852 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -277,6 +277,23 @@ test_expect_success 'midx picks objects from preferred pack' '
 	)
 '
 
+test_expect_success 'preferred packs must be non-empty' '
+	test_when_finished rm -rf preferred.git &&
+	git init preferred.git &&
+	(
+		cd preferred.git &&
+
+		test_commit base &&
+		git repack -ad &&
+
+		empty="$(git pack-objects $objdir/pack/pack </dev/null)" &&
+
+		test_must_fail git multi-pack-index write \
+			--preferred-pack=pack-$empty.pack 2>err &&
+		grep "with no objects" err
+	)
+'
+
 test_expect_success 'verify multi-pack-index success' '
 	git multi-pack-index verify --object-dir=$objdir
 '
-- 
2.31.1.163.ga65ce7f831



* [PATCH v3 07/25] midx: infer preferred pack when not given one
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
                     ` (5 preceding siblings ...)
  2021-07-27 21:19   ` [PATCH v3 06/25] midx: reject empty `--preferred-pack`'s Taylor Blau
@ 2021-07-27 21:19   ` Taylor Blau
  2021-07-27 21:19   ` [PATCH v3 08/25] midx: close linked MIDXs, avoid leaking memory Taylor Blau
                     ` (18 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 21:19 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

In 9218c6a40c (midx: allow marking a pack as preferred, 2021-03-30), the
multi-pack index code learned how to select a pack which all duplicate
objects are selected from. That is, if an object appears in multiple
packs, select the copy in the preferred pack before breaking ties
according to the other rules like pack mtime and readdir() order.

Not specifying a preferred pack can cause serious problems with
multi-pack reachability bitmaps, because these bitmaps rely on having at
least one pack from which all duplicates are selected. Not having such a
pack causes problems with the code in pack-objects to reuse packs
verbatim (e.g., that code assumes that a delta object in a chunk of pack
sent verbatim will have its base object sent from the same pack).

So why does not marking a pack preferred cause problems here? The reason
is roughly as follows:

  - Ties are broken (when handling duplicate objects) by sorting
    according to midx_oid_compare(), which sorts objects by OID,
    preferred-ness, pack mtime, and finally pack ID (more on that
    later).

  - The pseudo pack-order (described in
    Documentation/technical/pack-format.txt under the section
    "multi-pack-index reverse indexes") is computed by
    midx_pack_order(), and sorts by pack ID and pack offset, with
    preferred packs sorting first.

  - But! Pack IDs come from incrementing the pack count in
    add_pack_to_midx(), which is a callback to
    for_each_file_in_pack_dir(), meaning that pack IDs are assigned in
    readdir() order.

When specifying a preferred pack, all of that works fine, because
duplicate objects are correctly resolved in favor of the copy in the
preferred pack, and the preferred pack sorts first in the object order.

"Sorting first" is critical, because the bitmap code relies on finding
out which pack holds the first object in the MIDX's pseudo pack-order to
determine which pack is preferred.
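
As a rough illustration of the pseudo pack-order described above, the
following is a minimal, self-contained sketch. It is not the code from
midx.c: `struct pseudo_entry`, `pseudo_pack_cmp()` and
`first_pack_in_pseudo_order()` are hypothetical names, and the only
rule it encodes is the one stated in the list above (preferred pack
first, then pack ID, then pack offset).

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical entry: one object selected into the MIDX. */
struct pseudo_entry {
	uint32_t pack_id;  /* assigned in readdir() order */
	uint64_t offset;   /* byte offset within that pack */
	int preferred;     /* 1 if the object comes from the preferred pack */
};

/* Preferred pack first, then pack ID, then offset within the pack. */
static int pseudo_pack_cmp(const void *va, const void *vb)
{
	const struct pseudo_entry *a = va, *b = vb;

	if (a->preferred != b->preferred)
		return b->preferred - a->preferred;
	if (a->pack_id != b->pack_id)
		return a->pack_id < b->pack_id ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/*
 * Sort the entries into pseudo pack-order and report which pack holds
 * the first object. The caller must pass at least one entry.
 */
static uint32_t first_pack_in_pseudo_order(struct pseudo_entry *e, size_t nr)
{
	qsort(e, nr, sizeof(*e), pseudo_pack_cmp);
	return e[0].pack_id;
}
```

With a preferred pack marked, the first object always comes from that
pack. With none marked, the first object simply comes from whichever
pack has the lowest ID, that is, whichever pack readdir() returned
first.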

But if we didn't specify a preferred pack, and the pack which comes
first in readdir() order does not also have the lowest timestamp, then
it's possible that that pack (the one that sorts first in pseudo-pack
order, which the bitmap code will treat as the preferred one) did *not*
have all duplicate objects resolved in its favor, resulting in breakage.

The fix is simple: pick a (semi-arbitrary, non-empty) preferred pack
when none was specified. This forces that pack to have duplicates
resolved in its favor, and (critically) to sort first in pseudo-pack
order.  Unfortunately, testing this behavior portably isn't possible,
since it depends on readdir() order which isn't guaranteed by POSIX.
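
One way to picture "pick a non-empty preferred pack" is the sketch
below. It is a simplified stand-alone illustration, not the loop from
this patch: `struct pack_info` and `infer_preferred_pack()` are
hypothetical, and choosing the oldest non-empty pack is just one
possible policy for making the semi-arbitrary choice.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-pack data the writer tracks. */
struct pack_info {
	uint32_t num_objects;
	long mtime;
};

/*
 * Return the index of the oldest non-empty pack, or -1 if every pack
 * is empty (in which case there are no duplicates to resolve).
 */
static int infer_preferred_pack(const struct pack_info *packs, size_t nr)
{
	int best = -1;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (!packs[i].num_objects)
			continue; /* an empty pack can never be preferred */
		if (best < 0 || packs[i].mtime < packs[best].mtime)
			best = (int)i;
	}
	return best;
}
```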

(Note that multi-pack reachability bitmaps have yet to be implemented;
so in that sense this patch is fixing a bug which does not yet exist.
But by having this patch beforehand, we can prevent the bug from ever
materializing.)

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 50 ++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 44 insertions(+), 6 deletions(-)

diff --git a/midx.c b/midx.c
index 092dbf45b6..8956492b9c 100644
--- a/midx.c
+++ b/midx.c
@@ -969,15 +969,57 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
 		goto cleanup;
 
-	ctx.preferred_pack_idx = -1;
 	if (preferred_pack_name) {
+		int found = 0;
 		for (i = 0; i < ctx.nr; i++) {
 			if (!cmp_idx_or_pack_name(preferred_pack_name,
 						  ctx.info[i].pack_name)) {
 				ctx.preferred_pack_idx = i;
+				found = 1;
 				break;
 			}
 		}
+
+		if (!found)
+			warning(_("unknown preferred pack: '%s'"),
+				preferred_pack_name);
+	} else if (ctx.nr && (flags & MIDX_WRITE_REV_INDEX)) {
+		struct packed_git *oldest = ctx.info[ctx.preferred_pack_idx].p;
+		ctx.preferred_pack_idx = 0;
+
+		if (packs_to_drop && packs_to_drop->nr)
+			BUG("cannot write a MIDX bitmap during expiration");
+
+		/*
+		 * set a preferred pack when writing a bitmap to ensure that
+		 * the pack from which the first object is selected in pseudo
+		 * pack-order has all of its objects selected from that pack
+		 * (and not another pack containing a duplicate)
+		 */
+		for (i = 1; i < ctx.nr; i++) {
+			struct packed_git *p = ctx.info[i].p;
+
+			if (!oldest->num_objects || p->mtime < oldest->mtime) {
+				oldest = p;
+				ctx.preferred_pack_idx = i;
+			}
+		}
+
+		if (!oldest->num_objects) {
+			/*
+			 * If all packs are empty; unset the preferred index.
+			 * This is acceptable since there will be no duplicate
+			 * objects to resolve, so the preferred value doesn't
+			 * matter.
+			 */
+			ctx.preferred_pack_idx = -1;
+		}
+	} else {
+		/*
+		 * otherwise don't mark any pack as preferred to avoid
+		 * interfering with expiration logic below
+		 */
+		ctx.preferred_pack_idx = -1;
 	}
 
 	if (ctx.preferred_pack_idx > -1) {
@@ -1058,11 +1100,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 						      ctx.info, ctx.nr,
 						      sizeof(*ctx.info),
 						      idx_or_pack_name_cmp);
-
-		if (!preferred)
-			warning(_("unknown preferred pack: '%s'"),
-				preferred_pack_name);
-		else {
+		if (preferred) {
 			uint32_t perm = ctx.pack_perm[preferred->orig_pack_int_id];
 			if (perm == PACK_EXPIRED)
 				warning(_("preferred pack '%s' is expired"),
-- 
2.31.1.163.ga65ce7f831


* [PATCH v3 08/25] midx: close linked MIDXs, avoid leaking memory
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
                     ` (6 preceding siblings ...)
  2021-07-27 21:19   ` [PATCH v3 07/25] midx: infer preferred pack when not given one Taylor Blau
@ 2021-07-27 21:19   ` Taylor Blau
  2021-07-27 21:19   ` [PATCH v3 09/25] midx: avoid opening multiple MIDXs when writing Taylor Blau
                     ` (17 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 21:19 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

When a repository has at least one alternate, the MIDX belonging to each
alternate is accessed through the `next` pointer on the main object
store's copy of the MIDX. close_midx() didn't bother to close any
of the linked MIDXs. It likewise didn't free the memory pointed to by
`m`, leaving uninitialized bytes in the heap with live pointers to
them.

Clean this up by closing linked MIDXs, and freeing up the memory pointed
to by each of them. When callers call close_midx(), then they can
discard the entire linked list of MIDXs and set their pointer to the
head of that list to NULL.
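
The caller-side pattern can be sketched as follows. This is an
illustration on a toy list rather than on `struct multi_pack_index`:
`struct midx_node`, `push_node()`, `close_chain()` and the
`nodes_closed` counter are all hypothetical.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a MIDX with a linked alternate. */
struct midx_node {
	struct midx_node *next;
	char *pack_names; /* stand-in for resources the node owns */
};

static int nodes_closed; /* demonstration counter only */

/* Prepend a freshly allocated node to the list starting at `next`. */
static struct midx_node *push_node(struct midx_node *next)
{
	struct midx_node *m = calloc(1, sizeof(*m));

	if (!m)
		abort();
	m->next = next;
	m->pack_names = malloc(16);
	return m;
}

/* Close the rest of the list first, then release this node itself. */
static void close_chain(struct midx_node *m)
{
	if (!m)
		return;
	close_chain(m->next);
	free(m->pack_names);
	free(m);
	nodes_closed++;
}
```

A caller would then write `close_chain(head); head = NULL;` so that no
pointer to the freed list survives the call.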

This isn't strictly required for the upcoming patches, but it makes it
much more difficult (though still possible, e.g., by calling
`close_midx(m->next)` which leaves `m->next` pointing at uninitialized
bytes) to have pointers to uninitialized memory.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/midx.c b/midx.c
index 8956492b9c..18e1949613 100644
--- a/midx.c
+++ b/midx.c
@@ -195,6 +195,8 @@ void close_midx(struct multi_pack_index *m)
 	if (!m)
 		return;
 
+	close_midx(m->next);
+
 	munmap((unsigned char *)m->data, m->data_len);
 
 	for (i = 0; i < m->num_packs; i++) {
@@ -203,6 +205,7 @@ void close_midx(struct multi_pack_index *m)
 	}
 	FREE_AND_NULL(m->packs);
 	FREE_AND_NULL(m->pack_names);
+	free(m);
 }
 
 int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id)
-- 
2.31.1.163.ga65ce7f831


* [PATCH v3 09/25] midx: avoid opening multiple MIDXs when writing
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
                     ` (7 preceding siblings ...)
  2021-07-27 21:19   ` [PATCH v3 08/25] midx: close linked MIDXs, avoid leaking memory Taylor Blau
@ 2021-07-27 21:19   ` Taylor Blau
  2021-07-29 19:30     ` Taylor Blau
  2021-08-12 20:15     ` Jeff King
  2021-07-27 21:19   ` [PATCH v3 10/25] pack-bitmap.c: introduce 'bitmap_num_objects()' Taylor Blau
                     ` (16 subsequent siblings)
  25 siblings, 2 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 21:19 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Opening multiple instances of the same MIDX can lead to problems like two
separate packed_git structures which represent the same pack being added
to the repository's object store.

The above scenario can happen because prepare_midx_pack() checks if
`m->packs[pack_int_id]` is NULL in order to determine if a pack has been
opened and installed in the repository before. But a caller can
construct two copies of the same MIDX by calling get_multi_pack_index()
and load_multi_pack_index() since the former manipulates the
object store directly but the latter is a lower-level routine which
allocates a new MIDX for each call.

So if prepare_midx_pack() is called on multiple MIDXs with the same
pack_int_id, then that pack will be installed twice in the object
store's packed_git pointer.

This can lead to problems in, e.g., the pack-bitmap code, which does
something like the following (in pack-bitmap.c:open_pack_bitmap()):

    struct bitmap_index *bitmap_git = ...;
    for (p = get_all_packs(r); p; p = p->next) {
      if (open_pack_bitmap_1(bitmap_git, p) == 0)
        ret = 0;
    }

which is a problem if two copies of the same pack exist in the
packed_git list because pack-bitmap.c:open_pack_bitmap_1() contains a
conditional like the following:

    if (bitmap_git->pack || bitmap_git->midx) {
      /* ignore extra bitmap file; we can only handle one */
      warning("ignoring extra bitmap file: %s", packfile->pack_name);
      close(fd);
      return -1;
    }

Avoid this scenario by not letting write_midx_internal() open a MIDX
that isn't also pointed at by the object store. So long as this is the
case, other routines should prefer to open MIDXs with
get_multi_pack_index() or reprepare_packed_git() instead of creating
instances on their own. Because get_multi_pack_index() returns
`r->object_store->multi_pack_index` if it is non-NULL, we'll only have
one instance of a MIDX open at one time, avoiding these problems.
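
The difference between the two ways of obtaining a MIDX can be shown
with a toy model. Everything here is an assumption made for
illustration: `struct midx_handle`, `struct repo`, `load_handle()` and
`get_handle()` are hypothetical, and the lazy loading in
`get_handle()` only approximates how the real getter behaves.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for a MIDX and the object store owning it. */
struct midx_handle {
	int packs_installed;
};

struct repo {
	struct midx_handle *midx;
};

/* Low-level loader: allocates a fresh instance on every call. */
static struct midx_handle *load_handle(void)
{
	return calloc(1, sizeof(struct midx_handle));
}

/* Cached getter: hands back the instance the repo already holds. */
static struct midx_handle *get_handle(struct repo *r)
{
	if (!r->midx)
		r->midx = load_handle();
	return r->midx;
}
```

Two calls to `get_handle()` yield the same pointer, so per-pack state
such as "already installed" is shared. A direct `load_handle()` call
yields a second, independent copy, which is the situation this patch
avoids.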

To encourage this, drop the `struct multi_pack_index *` parameter from
`write_midx_internal()`, and rely instead on the `object_dir` to find
(or initialize) the correct MIDX instance.

Likewise, replace the call to `close_midx()` with
`close_object_store()`, since we're about to replace the MIDX with a new
one and should invalidate the object store's memory of any MIDX that
might have existed beforehand.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/midx.c b/midx.c
index 18e1949613..d67d7f383d 100644
--- a/midx.c
+++ b/midx.c
@@ -893,7 +893,7 @@ static int midx_checksum_valid(struct multi_pack_index *m)
 	return hashfile_checksum_valid(m->data, m->data_len);
 }
 
-static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
+static int write_midx_internal(const char *object_dir,
 			       struct string_list *packs_to_drop,
 			       const char *preferred_pack_name,
 			       unsigned flags)
@@ -904,6 +904,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	struct hashfile *f = NULL;
 	struct lock_file lk;
 	struct write_midx_context ctx = { 0 };
+	struct multi_pack_index *cur;
 	int pack_name_concat_len = 0;
 	int dropped_packs = 0;
 	int result = 0;
@@ -914,10 +915,14 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 		die_errno(_("unable to create leading directories of %s"),
 			  midx_name);
 
-	if (m)
-		ctx.m = m;
-	else
-		ctx.m = load_multi_pack_index(object_dir, 1);
+	for (cur = get_multi_pack_index(the_repository); cur; cur = cur->next) {
+		if (!strcmp(object_dir, cur->object_dir)) {
+			ctx.m = cur;
+			break;
+		}
+	}
+	if (!ctx.m)
+		ctx.m = get_local_multi_pack_index(the_repository);
 
 	if (ctx.m && !midx_checksum_valid(ctx.m)) {
 		warning(_("ignoring existing multi-pack-index; checksum mismatch"));
@@ -1182,8 +1187,7 @@ int write_midx_file(const char *object_dir,
 		    const char *preferred_pack_name,
 		    unsigned flags)
 {
-	return write_midx_internal(object_dir, NULL, NULL, preferred_pack_name,
-				   flags);
+	return write_midx_internal(object_dir, NULL, preferred_pack_name, flags);
 }
 
 struct clear_midx_data {
@@ -1460,8 +1464,10 @@ int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla
 
 	free(count);
 
-	if (packs_to_drop.nr)
-		result = write_midx_internal(object_dir, m, &packs_to_drop, NULL, flags);
+	if (packs_to_drop.nr) {
+		result = write_midx_internal(object_dir, &packs_to_drop, NULL, flags);
+		m = NULL;
+	}
 
 	string_list_clear(&packs_to_drop, 0);
 	return result;
@@ -1650,7 +1656,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 		goto cleanup;
 	}
 
-	result = write_midx_internal(object_dir, m, NULL, NULL, flags);
+	result = write_midx_internal(object_dir, NULL, NULL, flags);
 	m = NULL;
 
 cleanup:
-- 
2.31.1.163.ga65ce7f831


* [PATCH v3 10/25] pack-bitmap.c: introduce 'bitmap_num_objects()'
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
                     ` (8 preceding siblings ...)
  2021-07-27 21:19   ` [PATCH v3 09/25] midx: avoid opening multiple MIDXs when writing Taylor Blau
@ 2021-07-27 21:19   ` Taylor Blau
  2021-07-27 21:19   ` [PATCH v3 11/25] pack-bitmap.c: introduce 'nth_bitmap_object_oid()' Taylor Blau
                     ` (15 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 21:19 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

A subsequent patch to support reading MIDX bitmaps will be less noisy
after extracting a generic function to return how many objects are
contained in a bitmap.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 37 +++++++++++++++++++++----------------
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index a73960a55d..54dc4f7915 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -136,6 +136,11 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
 	return b;
 }
 
+static uint32_t bitmap_num_objects(struct bitmap_index *index)
+{
+	return index->pack->num_objects;
+}
+
 static int load_bitmap_header(struct bitmap_index *index)
 {
 	struct bitmap_disk_header *header = (void *)index->map;
@@ -154,7 +159,7 @@ static int load_bitmap_header(struct bitmap_index *index)
 	/* Parse known bitmap format options */
 	{
 		uint32_t flags = ntohs(header->options);
-		size_t cache_size = st_mult(index->pack->num_objects, sizeof(uint32_t));
+		size_t cache_size = st_mult(bitmap_num_objects(index), sizeof(uint32_t));
 		unsigned char *index_end = index->map + index->map_size - the_hash_algo->rawsz;
 
 		if ((flags & BITMAP_OPT_FULL_DAG) == 0)
@@ -399,7 +404,7 @@ static inline int bitmap_position_extended(struct bitmap_index *bitmap_git,
 
 	if (pos < kh_end(positions)) {
 		int bitmap_pos = kh_value(positions, pos);
-		return bitmap_pos + bitmap_git->pack->num_objects;
+		return bitmap_pos + bitmap_num_objects(bitmap_git);
 	}
 
 	return -1;
@@ -451,7 +456,7 @@ static int ext_index_add_object(struct bitmap_index *bitmap_git,
 		bitmap_pos = kh_value(eindex->positions, hash_pos);
 	}
 
-	return bitmap_pos + bitmap_git->pack->num_objects;
+	return bitmap_pos + bitmap_num_objects(bitmap_git);
 }
 
 struct bitmap_show_data {
@@ -668,7 +673,7 @@ static void show_extended_objects(struct bitmap_index *bitmap_git,
 	for (i = 0; i < eindex->count; ++i) {
 		struct object *obj;
 
-		if (!bitmap_get(objects, bitmap_git->pack->num_objects + i))
+		if (!bitmap_get(objects, bitmap_num_objects(bitmap_git) + i))
 			continue;
 
 		obj = eindex->objects[i];
@@ -826,7 +831,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
 	 * individually.
 	 */
 	for (i = 0; i < eindex->count; i++) {
-		uint32_t pos = i + bitmap_git->pack->num_objects;
+		uint32_t pos = i + bitmap_num_objects(bitmap_git);
 		if (eindex->objects[i]->type == type &&
 		    bitmap_get(to_filter, pos) &&
 		    !bitmap_get(tips, pos))
@@ -853,7 +858,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
 
 	oi.sizep = &size;
 
-	if (pos < pack->num_objects) {
+	if (pos < bitmap_num_objects(bitmap_git)) {
 		off_t ofs = pack_pos_to_offset(pack, pos);
 		if (packed_object_info(the_repository, pack, ofs, &oi) < 0) {
 			struct object_id oid;
@@ -863,7 +868,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
 		}
 	} else {
 		struct eindex *eindex = &bitmap_git->ext_index;
-		struct object *obj = eindex->objects[pos - pack->num_objects];
+		struct object *obj = eindex->objects[pos - bitmap_num_objects(bitmap_git)];
 		if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0)
 			die(_("unable to get size of %s"), oid_to_hex(&obj->oid));
 	}
@@ -905,7 +910,7 @@ static void filter_bitmap_blob_limit(struct bitmap_index *bitmap_git,
 	}
 
 	for (i = 0; i < eindex->count; i++) {
-		uint32_t pos = i + bitmap_git->pack->num_objects;
+		uint32_t pos = i + bitmap_num_objects(bitmap_git);
 		if (eindex->objects[i]->type == OBJ_BLOB &&
 		    bitmap_get(to_filter, pos) &&
 		    !bitmap_get(tips, pos) &&
@@ -1131,8 +1136,8 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 	enum object_type type;
 	unsigned long size;
 
-	if (pos >= bitmap_git->pack->num_objects)
-		return; /* not actually in the pack */
+	if (pos >= bitmap_num_objects(bitmap_git))
+		return; /* not actually in the pack or MIDX */
 
 	offset = header = pack_pos_to_offset(bitmap_git->pack, pos);
 	type = unpack_object_header(bitmap_git->pack, w_curs, &offset, &size);
@@ -1198,6 +1203,7 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 	struct pack_window *w_curs = NULL;
 	size_t i = 0;
 	uint32_t offset;
+	uint32_t objects_nr = bitmap_num_objects(bitmap_git);
 
 	assert(result);
 
@@ -1205,8 +1211,8 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 		i++;
 
 	/* Don't mark objects not in the packfile */
-	if (i > bitmap_git->pack->num_objects / BITS_IN_EWORD)
-		i = bitmap_git->pack->num_objects / BITS_IN_EWORD;
+	if (i > objects_nr / BITS_IN_EWORD)
+		i = objects_nr / BITS_IN_EWORD;
 
 	reuse = bitmap_word_alloc(i);
 	memset(reuse->words, 0xFF, i * sizeof(eword_t));
@@ -1290,7 +1296,7 @@ static uint32_t count_object_type(struct bitmap_index *bitmap_git,
 
 	for (i = 0; i < eindex->count; ++i) {
 		if (eindex->objects[i]->type == type &&
-			bitmap_get(objects, bitmap_git->pack->num_objects + i))
+			bitmap_get(objects, bitmap_num_objects(bitmap_git) + i))
 			count++;
 	}
 
@@ -1511,7 +1517,7 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
 	uint32_t i, num_objects;
 	uint32_t *reposition;
 
-	num_objects = bitmap_git->pack->num_objects;
+	num_objects = bitmap_num_objects(bitmap_git);
 	CALLOC_ARRAY(reposition, num_objects);
 
 	for (i = 0; i < num_objects; ++i) {
@@ -1594,7 +1600,6 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
 static off_t get_disk_usage_for_extended(struct bitmap_index *bitmap_git)
 {
 	struct bitmap *result = bitmap_git->result;
-	struct packed_git *pack = bitmap_git->pack;
 	struct eindex *eindex = &bitmap_git->ext_index;
 	off_t total = 0;
 	struct object_info oi = OBJECT_INFO_INIT;
@@ -1606,7 +1611,7 @@ static off_t get_disk_usage_for_extended(struct bitmap_index *bitmap_git)
 	for (i = 0; i < eindex->count; i++) {
 		struct object *obj = eindex->objects[i];
 
-		if (!bitmap_get(result, pack->num_objects + i))
+		if (!bitmap_get(result, bitmap_num_objects(bitmap_git) + i))
 			continue;
 
 		if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0)
-- 
2.31.1.163.ga65ce7f831


* [PATCH v3 11/25] pack-bitmap.c: introduce 'nth_bitmap_object_oid()'
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
                     ` (9 preceding siblings ...)
  2021-07-27 21:19   ` [PATCH v3 10/25] pack-bitmap.c: introduce 'bitmap_num_objects()' Taylor Blau
@ 2021-07-27 21:19   ` Taylor Blau
  2021-07-27 21:19   ` [PATCH v3 12/25] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()' Taylor Blau
                     ` (14 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 21:19 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

A subsequent patch to support reading MIDX bitmaps will be less noisy
after extracting a generic function to fetch the nth OID contained in
the bitmap.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 54dc4f7915..9d0dd1cde7 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -223,6 +223,13 @@ static inline uint8_t read_u8(const unsigned char *buffer, size_t *pos)
 
 #define MAX_XOR_OFFSET 160
 
+static int nth_bitmap_object_oid(struct bitmap_index *index,
+				 struct object_id *oid,
+				 uint32_t n)
+{
+	return nth_packed_object_id(oid, index->pack, n);
+}
+
 static int load_bitmap_entries_v1(struct bitmap_index *index)
 {
 	uint32_t i;
@@ -242,7 +249,7 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 		xor_offset = read_u8(index->map, &index->map_pos);
 		flags = read_u8(index->map, &index->map_pos);
 
-		if (nth_packed_object_id(&oid, index->pack, commit_idx_pos) < 0)
+		if (nth_bitmap_object_oid(index, &oid, commit_idx_pos) < 0)
 			return error("corrupt ewah bitmap: commit index %u out of range",
 				     (unsigned)commit_idx_pos);
 
@@ -862,8 +869,8 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
 		off_t ofs = pack_pos_to_offset(pack, pos);
 		if (packed_object_info(the_repository, pack, ofs, &oi) < 0) {
 			struct object_id oid;
-			nth_packed_object_id(&oid, pack,
-					     pack_pos_to_index(pack, pos));
+			nth_bitmap_object_oid(bitmap_git, &oid,
+					      pack_pos_to_index(pack, pos));
 			die(_("unable to get size of %s"), oid_to_hex(&oid));
 		}
 	} else {
-- 
2.31.1.163.ga65ce7f831


* [PATCH v3 12/25] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()'
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
                     ` (10 preceding siblings ...)
  2021-07-27 21:19   ` [PATCH v3 11/25] pack-bitmap.c: introduce 'nth_bitmap_object_oid()' Taylor Blau
@ 2021-07-27 21:19   ` Taylor Blau
  2021-07-27 21:19   ` [PATCH v3 13/25] pack-bitmap.c: avoid redundant calls to try_partial_reuse Taylor Blau
                     ` (13 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 21:19 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

In a recent commit, pack-objects learned support for the
'pack.preferBitmapTips' configuration. This patch prepares the
multi-pack bitmap code to respect this configuration, too.

The yet-to-be-implemented code will find that it is more efficient to
check whether each reference contains a prefix found in the configured
set of values rather than doing an additional traversal.

Implement a function 'bitmap_is_preferred_refname()' which will perform
that check. Its caller will be added in a subsequent patch.
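
For readers who want to try the prefix rule outside of git, here is a
stand-alone sketch. It uses a plain array instead of git's
`string_list`; `has_prefix()` and `is_preferred_refname()` are
hypothetical helpers, and the prefixes in the usage note are examples
only.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if `str` begins with `prefix`, 0 otherwise. */
static int has_prefix(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/*
 * Return 1 if `refname` starts with any of the `nr` configured
 * prefixes, 0 otherwise (including when nothing is configured).
 */
static int is_preferred_refname(const char *refname,
				const char *const *prefixes, size_t nr)
{
	size_t i;

	if (!prefixes)
		return 0;
	for (i = 0; i < nr; i++) {
		if (has_prefix(refname, prefixes[i]))
			return 1;
	}
	return 0;
}
```

With the prefixes "refs/heads/" and "refs/tags/v", the name
"refs/heads/main" matches and "refs/notes/commits" does not.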

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 16 ++++++++++++++++
 pack-bitmap.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 9d0dd1cde7..6b12c96e32 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1652,3 +1652,19 @@ const struct string_list *bitmap_preferred_tips(struct repository *r)
 {
 	return repo_config_get_value_multi(r, "pack.preferbitmaptips");
 }
+
+int bitmap_is_preferred_refname(struct repository *r, const char *refname)
+{
+	const struct string_list *preferred_tips = bitmap_preferred_tips(r);
+	struct string_list_item *item;
+
+	if (!preferred_tips)
+		return 0;
+
+	for_each_string_list_item(item, preferred_tips) {
+		if (starts_with(refname, item->string))
+			return 1;
+	}
+
+	return 0;
+}
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 020cd8d868..52ea10de51 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -94,5 +94,6 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 			  uint16_t options);
 
 const struct string_list *bitmap_preferred_tips(struct repository *r);
+int bitmap_is_preferred_refname(struct repository *r, const char *refname);
 
 #endif
-- 
2.31.1.163.ga65ce7f831


* [PATCH v3 13/25] pack-bitmap.c: avoid redundant calls to try_partial_reuse
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
                     ` (11 preceding siblings ...)
  2021-07-27 21:19   ` [PATCH v3 12/25] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()' Taylor Blau
@ 2021-07-27 21:19   ` Taylor Blau
  2021-07-27 21:19   ` [PATCH v3 14/25] pack-bitmap: read multi-pack bitmaps Taylor Blau
                     ` (12 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 21:19 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

try_partial_reuse() is used to mark any bits in the beginning of a
bitmap whose objects can be reused verbatim from the pack they came
from.

Currently this function returns void, and signals nothing to the caller
when bits could not be reused. But multi-pack bitmaps would benefit from
having such a signal, because they may try to pass objects which are in
bounds, but from a pack other than the preferred one.

Any extra calls are no-ops because of a conditional in
reuse_partial_packfile_from_bitmap(), but those loop iterations can be
avoided by letting try_partial_reuse() indicate when it can't accept any
more bits for reuse, and then listening to that signal.
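
The control flow can be sketched on a toy bitmap. This is not the
pack-bitmap code: `try_reuse()`, `mark_reusable()`, the `calls`
counter and the "every third position is not reusable" rule are
invented for the example. Only the return convention (-1 to stop, 0 to
keep going) follows the description above.

```c
#include <stddef.h>
#include <stdint.h>

static int calls; /* how often try_reuse() ran; demonstration only */

/* -1: stop feeding positions; 0: keep going (reused or not). */
static int try_reuse(size_t pos, size_t num_objects, unsigned char *reuse)
{
	calls++;
	if (pos >= num_objects)
		return -1; /* out of bounds, so later positions are too */
	if (pos % 3 == 0)
		return 0;  /* made-up rule: skip this one, later ones may fit */
	reuse[pos] = 1;
	return 0;
}

/* Walk the set bits in `words`; return how many positions were marked. */
static size_t mark_reusable(const uint64_t *words, size_t nr_words,
			    size_t num_objects, unsigned char *reuse)
{
	size_t i, marked = 0;
	unsigned bit;

	for (i = 0; i < nr_words; i++) {
		for (bit = 0; bit < 64; bit++) {
			if (!((words[i] >> bit) & 1))
				continue;
			if (try_reuse(i * 64 + bit, num_objects, reuse) < 0)
				goto done; /* leave both loops at once */
		}
	}
done:
	for (i = 0; i < num_objects; i++)
		marked += reuse[i];
	return marked;
}
```

Without the early exit, every remaining set bit would still trigger a
call that immediately returns without doing anything.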

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 40 +++++++++++++++++++++++++++++-----------
 1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 6b12c96e32..1442f0c8f2 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1134,22 +1134,26 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	return NULL;
 }
 
-static void try_partial_reuse(struct bitmap_index *bitmap_git,
-			      size_t pos,
-			      struct bitmap *reuse,
-			      struct pack_window **w_curs)
+/*
+ * -1 means "stop trying further objects"; 0 means we may or may not have
+ * reused, but you can keep feeding bits.
+ */
+static int try_partial_reuse(struct bitmap_index *bitmap_git,
+			     size_t pos,
+			     struct bitmap *reuse,
+			     struct pack_window **w_curs)
 {
 	off_t offset, header;
 	enum object_type type;
 	unsigned long size;
 
 	if (pos >= bitmap_num_objects(bitmap_git))
-		return; /* not actually in the pack or MIDX */
+		return -1; /* not actually in the pack or MIDX */
 
 	offset = header = pack_pos_to_offset(bitmap_git->pack, pos);
 	type = unpack_object_header(bitmap_git->pack, w_curs, &offset, &size);
 	if (type < 0)
-		return; /* broken packfile, punt */
+		return -1; /* broken packfile, punt */
 
 	if (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA) {
 		off_t base_offset;
@@ -1166,9 +1170,9 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 		base_offset = get_delta_base(bitmap_git->pack, w_curs,
 					     &offset, type, header);
 		if (!base_offset)
-			return;
+			return 0;
 		if (offset_to_pack_pos(bitmap_git->pack, base_offset, &base_pos) < 0)
-			return;
+			return 0;
 
 		/*
 		 * We assume delta dependencies always point backwards. This
@@ -1180,7 +1184,7 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 		 * odd parameters.
 		 */
 		if (base_pos >= pos)
-			return;
+			return 0;
 
 		/*
 		 * And finally, if we're not sending the base as part of our
@@ -1191,13 +1195,14 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 		 * object_entry code path handle it.
 		 */
 		if (!bitmap_get(reuse, base_pos))
-			return;
+			return 0;
 	}
 
 	/*
 	 * If we got here, then the object is OK to reuse. Mark it.
 	 */
 	bitmap_set(reuse, pos);
+	return 0;
 }
 
 int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
@@ -1233,10 +1238,23 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 				break;
 
 			offset += ewah_bit_ctz64(word >> offset);
-			try_partial_reuse(bitmap_git, pos + offset, reuse, &w_curs);
+			if (try_partial_reuse(bitmap_git, pos + offset, reuse,
+					      &w_curs) < 0) {
+				/*
+				 * try_partial_reuse indicated we couldn't reuse
+				 * any bits, so there is no point in trying more
+				 * bits in the current word, or any other words
+				 * in result.
+				 *
+				 * Jump out of both loops to avoid future
+				 * unnecessary calls to try_partial_reuse.
+				 */
+				goto done;
+			}
 		}
 	}
 
+done:
 	unuse_pack(&w_curs);
 
 	*entries = bitmap_popcount(reuse);
-- 
2.31.1.163.ga65ce7f831


* [PATCH v3 14/25] pack-bitmap: read multi-pack bitmaps
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
                     ` (12 preceding siblings ...)
  2021-07-27 21:19   ` [PATCH v3 13/25] pack-bitmap.c: avoid redundant calls to try_partial_reuse Taylor Blau
@ 2021-07-27 21:19   ` Taylor Blau
  2021-07-27 21:20   ` [PATCH v3 15/25] pack-bitmap: write " Taylor Blau
                     ` (11 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 21:19 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

This prepares the code in pack-bitmap to interpret the new multi-pack
bitmaps described in Documentation/technical/bitmap-format.txt, which
mostly involves converting bit positions to accommodate looking them up
in a MIDX.
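
Conceptually, the conversion looks something like the sketch below. It
assumes that the MIDX's .rev data can be viewed as a table mapping a
pseudo-pack position (a bit position) to an index in the MIDX's
OID-sorted object list. `struct toy_midx` and `oid_at_bit()` are
hypothetical and use short strings in place of real object IDs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified view of a MIDX and its .rev data. */
struct toy_midx {
	uint32_t num_objects;
	const uint32_t *rev;     /* rev[bit_pos] = index into oids[] */
	const char *const *oids; /* object names in lexical order */
};

/* Map a bit position to an object name, or NULL if out of range. */
static const char *oid_at_bit(const struct toy_midx *m, uint32_t bit_pos)
{
	if (bit_pos >= m->num_objects)
		return NULL;
	return m->oids[m->rev[bit_pos]];
}
```

For single-pack bitmaps the same role is played by the pack's own
reverse index. The MIDX variant only changes which table is consulted.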

Note that there are currently no writers who write multi-pack bitmaps,
and that this will be implemented in the subsequent commit. Note also
that get_midx_checksum() and get_midx_filename() are made non-static so
they can be called from pack-bitmap.c.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c |   5 +
 midx.c                 |   4 +-
 midx.h                 |   2 +
 pack-bitmap-write.c    |   2 +-
 pack-bitmap.c          | 357 ++++++++++++++++++++++++++++++++++++-----
 pack-bitmap.h          |   6 +
 packfile.c             |   2 +-
 7 files changed, 336 insertions(+), 42 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 8a523624a1..e11d3ac2e5 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1124,6 +1124,11 @@ static void write_reused_pack(struct hashfile *f)
 				break;
 
 			offset += ewah_bit_ctz64(word >> offset);
+			/*
+			 * Can use bit positions directly, even for MIDX
+			 * bitmaps. See comment in try_partial_reuse()
+			 * for why.
+			 */
 			write_reused_pack_one(pos + offset, f, &w_curs);
 			display_progress(progress_state, ++written);
 		}
diff --git a/midx.c b/midx.c
index d67d7f383d..db21727c62 100644
--- a/midx.c
+++ b/midx.c
@@ -48,12 +48,12 @@ static uint8_t oid_version(void)
 	}
 }
 
-static const unsigned char *get_midx_checksum(struct multi_pack_index *m)
+const unsigned char *get_midx_checksum(struct multi_pack_index *m)
 {
 	return m->data + m->data_len - the_hash_algo->rawsz;
 }
 
-static char *get_midx_filename(const char *object_dir)
+char *get_midx_filename(const char *object_dir)
 {
 	return xstrfmt("%s/pack/multi-pack-index", object_dir);
 }
diff --git a/midx.h b/midx.h
index 8684cf0fef..1172df1a71 100644
--- a/midx.h
+++ b/midx.h
@@ -42,6 +42,8 @@ struct multi_pack_index {
 #define MIDX_PROGRESS     (1 << 0)
 #define MIDX_WRITE_REV_INDEX (1 << 1)
 
+const unsigned char *get_midx_checksum(struct multi_pack_index *m);
+char *get_midx_filename(const char *object_dir);
 char *get_midx_rev_filename(struct multi_pack_index *m);
 
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 142fd0adb8..9c55c1531e 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -48,7 +48,7 @@ void bitmap_writer_show_progress(int show)
 }
 
 /**
- * Build the initial type index for the packfile
+ * Build the initial type index for the packfile or multi-pack-index
  */
 void bitmap_writer_build_type_index(struct packing_data *to_pack,
 				    struct pack_idx_entry **index,
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 1442f0c8f2..f599646e19 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -13,6 +13,7 @@
 #include "repository.h"
 #include "object-store.h"
 #include "list-objects-filter-options.h"
+#include "midx.h"
 #include "config.h"
 
 /*
@@ -35,8 +36,15 @@ struct stored_bitmap {
  * the active bitmap index is the largest one.
  */
 struct bitmap_index {
-	/* Packfile to which this bitmap index belongs to */
+	/*
+	 * The pack or multi-pack index (MIDX) that this bitmap index belongs
+	 * to.
+	 *
+	 * Exactly one of these must be non-NULL; this specifies the object
+	 * order used to interpret this bitmap.
+	 */
 	struct packed_git *pack;
+	struct multi_pack_index *midx;
 
 	/*
 	 * Mark the first `reuse_objects` in the packfile as reused:
@@ -71,6 +79,9 @@ struct bitmap_index {
 	/* If not NULL, this is a name-hash cache pointing into map. */
 	uint32_t *hashes;
 
+	/* The checksum of the packfile or MIDX; points into map. */
+	const unsigned char *checksum;
+
 	/*
 	 * Extended index.
 	 *
@@ -138,6 +149,8 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
 
 static uint32_t bitmap_num_objects(struct bitmap_index *index)
 {
+	if (index->midx)
+		return index->midx->num_objects;
 	return index->pack->num_objects;
 }
 
@@ -175,6 +188,7 @@ static int load_bitmap_header(struct bitmap_index *index)
 	}
 
 	index->entry_count = ntohl(header->entry_count);
+	index->checksum = header->checksum;
 	index->map_pos += header_size;
 	return 0;
 }
@@ -227,6 +241,8 @@ static int nth_bitmap_object_oid(struct bitmap_index *index,
 				 struct object_id *oid,
 				 uint32_t n)
 {
+	if (index->midx)
+		return nth_midxed_object_oid(oid, index->midx, n) ? 0 : -1;
 	return nth_packed_object_id(oid, index->pack, n);
 }
 
@@ -274,7 +290,14 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 	return 0;
 }
 
-static char *pack_bitmap_filename(struct packed_git *p)
+char *midx_bitmap_filename(struct multi_pack_index *midx)
+{
+	return xstrfmt("%s-%s.bitmap",
+		       get_midx_filename(midx->object_dir),
+		       hash_to_hex(get_midx_checksum(midx)));
+}
+
+char *pack_bitmap_filename(struct packed_git *p)
 {
 	size_t len;
 
@@ -283,6 +306,57 @@ static char *pack_bitmap_filename(struct packed_git *p)
 	return xstrfmt("%.*s.bitmap", (int)len, p->pack_name);
 }
 
+static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
+			      struct multi_pack_index *midx)
+{
+	struct stat st;
+	char *idx_name = midx_bitmap_filename(midx);
+	int fd = git_open(idx_name);
+
+	free(idx_name);
+
+	if (fd < 0)
+		return -1;
+
+	if (fstat(fd, &st)) {
+		close(fd);
+		return -1;
+	}
+
+	if (bitmap_git->pack || bitmap_git->midx) {
+		/* ignore extra bitmap file; we can only handle one */
+		warning("ignoring extra bitmap file: %s",
+			get_midx_filename(midx->object_dir));
+		close(fd);
+		return -1;
+	}
+
+	bitmap_git->midx = midx;
+	bitmap_git->map_size = xsize_t(st.st_size);
+	bitmap_git->map_pos = 0;
+	bitmap_git->map = xmmap(NULL, bitmap_git->map_size, PROT_READ,
+				MAP_PRIVATE, fd, 0);
+	close(fd);
+
+	if (load_bitmap_header(bitmap_git) < 0)
+		goto cleanup;
+
+	if (!hasheq(get_midx_checksum(bitmap_git->midx), bitmap_git->checksum))
+		goto cleanup;
+
+	if (load_midx_revindex(bitmap_git->midx) < 0) {
+		warning(_("multi-pack bitmap is missing required reverse index"));
+		goto cleanup;
+	}
+	return 0;
+
+cleanup:
+	munmap(bitmap_git->map, bitmap_git->map_size);
+	bitmap_git->map_size = 0;
+	bitmap_git->map = NULL;
+	return -1;
+}
+
 static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git *packfile)
 {
 	int fd;
@@ -304,7 +378,8 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
 		return -1;
 	}
 
-	if (bitmap_git->pack) {
+	if (bitmap_git->pack || bitmap_git->midx) {
+		/* ignore extra bitmap file; we can only handle one */
 		warning("ignoring extra bitmap file: %s", packfile->pack_name);
 		close(fd);
 		return -1;
@@ -326,13 +401,39 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
 	return 0;
 }
 
-static int load_pack_bitmap(struct bitmap_index *bitmap_git)
+static int load_reverse_index(struct bitmap_index *bitmap_git)
+{
+	if (bitmap_is_midx(bitmap_git)) {
+		uint32_t i;
+		int ret;
+
+		/*
+		 * The multi-pack-index's .rev file is already loaded via
+		 * open_midx_bitmap_1().
+		 *
+		 * But we still need to open the individual pack .rev files,
+		 * since we will need to make use of them in pack-objects.
+		 */
+		for (i = 0; i < bitmap_git->midx->num_packs; i++) {
+			if (prepare_midx_pack(the_repository, bitmap_git->midx, i))
+				die(_("load_reverse_index: could not open pack"));
+			ret = load_pack_revindex(bitmap_git->midx->packs[i]);
+			if (ret)
+				return ret;
+		}
+		return 0;
+	}
+	return load_pack_revindex(bitmap_git->pack);
+}
+
+static int load_bitmap(struct bitmap_index *bitmap_git)
 {
 	assert(bitmap_git->map);
 
 	bitmap_git->bitmaps = kh_init_oid_map();
 	bitmap_git->ext_index.positions = kh_init_oid_pos();
-	if (load_pack_revindex(bitmap_git->pack))
+
+	if (load_reverse_index(bitmap_git))
 		goto failed;
 
 	if (!(bitmap_git->commits = read_bitmap_1(bitmap_git)) ||
@@ -376,11 +477,47 @@ static int open_pack_bitmap(struct repository *r,
 	return ret;
 }
 
+static int open_midx_bitmap(struct repository *r,
+			    struct bitmap_index *bitmap_git)
+{
+	struct multi_pack_index *midx;
+
+	assert(!bitmap_git->map);
+
+	for (midx = get_multi_pack_index(r); midx; midx = midx->next) {
+		if (!open_midx_bitmap_1(bitmap_git, midx))
+			return 0;
+	}
+	return -1;
+}
+
+static int open_bitmap(struct repository *r,
+		       struct bitmap_index *bitmap_git)
+{
+	assert(!bitmap_git->map);
+
+	if (!open_midx_bitmap(r, bitmap_git))
+		return 0;
+	return open_pack_bitmap(r, bitmap_git);
+}
+
 struct bitmap_index *prepare_bitmap_git(struct repository *r)
 {
 	struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
 
-	if (!open_pack_bitmap(r, bitmap_git) && !load_pack_bitmap(bitmap_git))
+	if (!open_bitmap(r, bitmap_git) && !load_bitmap(bitmap_git))
+		return bitmap_git;
+
+	free_bitmap_index(bitmap_git);
+	return NULL;
+}
+
+struct bitmap_index *prepare_midx_bitmap_git(struct repository *r,
+					     struct multi_pack_index *midx)
+{
+	struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
+
+	if (!open_midx_bitmap_1(bitmap_git, midx) && !load_bitmap(bitmap_git))
 		return bitmap_git;
 
 	free_bitmap_index(bitmap_git);
@@ -430,10 +567,26 @@ static inline int bitmap_position_packfile(struct bitmap_index *bitmap_git,
 	return pos;
 }
 
+static int bitmap_position_midx(struct bitmap_index *bitmap_git,
+				const struct object_id *oid)
+{
+	uint32_t want, got;
+	if (!bsearch_midx(oid, bitmap_git->midx, &want))
+		return -1;
+
+	if (midx_to_pack_pos(bitmap_git->midx, want, &got) < 0)
+		return -1;
+	return got;
+}
+
 static int bitmap_position(struct bitmap_index *bitmap_git,
 			   const struct object_id *oid)
 {
-	int pos = bitmap_position_packfile(bitmap_git, oid);
+	int pos;
+	if (bitmap_is_midx(bitmap_git))
+		pos = bitmap_position_midx(bitmap_git, oid);
+	else
+		pos = bitmap_position_packfile(bitmap_git, oid);
 	return (pos >= 0) ? pos : bitmap_position_extended(bitmap_git, oid);
 }
 
@@ -744,6 +897,7 @@ static void show_objects_for_type(
 			continue;
 
 		for (offset = 0; offset < BITS_IN_EWORD; ++offset) {
+			struct packed_git *pack;
 			struct object_id oid;
 			uint32_t hash = 0, index_pos;
 			off_t ofs;
@@ -753,14 +907,28 @@ static void show_objects_for_type(
 
 			offset += ewah_bit_ctz64(word >> offset);
 
-			index_pos = pack_pos_to_index(bitmap_git->pack, pos + offset);
-			ofs = pack_pos_to_offset(bitmap_git->pack, pos + offset);
-			nth_packed_object_id(&oid, bitmap_git->pack, index_pos);
+			if (bitmap_is_midx(bitmap_git)) {
+				struct multi_pack_index *m = bitmap_git->midx;
+				uint32_t pack_id;
+
+				index_pos = pack_pos_to_midx(m, pos + offset);
+				ofs = nth_midxed_offset(m, index_pos);
+				nth_midxed_object_oid(&oid, m, index_pos);
+
+				pack_id = nth_midxed_pack_int_id(m, index_pos);
+				pack = bitmap_git->midx->packs[pack_id];
+			} else {
+				index_pos = pack_pos_to_index(bitmap_git->pack, pos + offset);
+				ofs = pack_pos_to_offset(bitmap_git->pack, pos + offset);
+				nth_bitmap_object_oid(bitmap_git, &oid, index_pos);
+
+				pack = bitmap_git->pack;
+			}
 
 			if (bitmap_git->hashes)
 				hash = get_be32(bitmap_git->hashes + index_pos);
 
-			show_reach(&oid, object_type, 0, hash, bitmap_git->pack, ofs);
+			show_reach(&oid, object_type, 0, hash, pack, ofs);
 		}
 	}
 }
@@ -772,8 +940,13 @@ static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
 		struct object *object = roots->item;
 		roots = roots->next;
 
-		if (find_pack_entry_one(object->oid.hash, bitmap_git->pack) > 0)
-			return 1;
+		if (bitmap_is_midx(bitmap_git)) {
+			if (bsearch_midx(&object->oid, bitmap_git->midx, NULL))
+				return 1;
+		} else {
+			if (find_pack_entry_one(object->oid.hash, bitmap_git->pack) > 0)
+				return 1;
+		}
 	}
 
 	return 0;
@@ -859,14 +1032,26 @@ static void filter_bitmap_blob_none(struct bitmap_index *bitmap_git,
 static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
 				     uint32_t pos)
 {
-	struct packed_git *pack = bitmap_git->pack;
 	unsigned long size;
 	struct object_info oi = OBJECT_INFO_INIT;
 
 	oi.sizep = &size;
 
 	if (pos < bitmap_num_objects(bitmap_git)) {
-		off_t ofs = pack_pos_to_offset(pack, pos);
+		struct packed_git *pack;
+		off_t ofs;
+
+		if (bitmap_is_midx(bitmap_git)) {
+			uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
+			uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
+
+			pack = bitmap_git->midx->packs[pack_id];
+			ofs = nth_midxed_offset(bitmap_git->midx, midx_pos);
+		} else {
+			pack = bitmap_git->pack;
+			ofs = pack_pos_to_offset(pack, pos);
+		}
+
 		if (packed_object_info(the_repository, pack, ofs, &oi) < 0) {
 			struct object_id oid;
 			nth_bitmap_object_oid(bitmap_git, &oid,
@@ -1047,7 +1232,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	/* try to open a bitmapped pack, but don't parse it yet
 	 * because we may not need to use it */
 	CALLOC_ARRAY(bitmap_git, 1);
-	if (open_pack_bitmap(revs->repo, bitmap_git) < 0)
+	if (open_bitmap(revs->repo, bitmap_git) < 0)
 		goto cleanup;
 
 	for (i = 0; i < revs->pending.nr; ++i) {
@@ -1091,7 +1276,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	 * from disk. this is the point of no return; after this the rev_list
 	 * becomes invalidated and we must perform the revwalk through bitmaps
 	 */
-	if (load_pack_bitmap(bitmap_git) < 0)
+	if (load_bitmap(bitmap_git) < 0)
 		goto cleanup;
 
 	object_array_clear(&revs->pending);
@@ -1139,19 +1324,43 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
  * reused, but you can keep feeding bits.
  */
 static int try_partial_reuse(struct bitmap_index *bitmap_git,
+			     struct packed_git *pack,
 			     size_t pos,
 			     struct bitmap *reuse,
 			     struct pack_window **w_curs)
 {
-	off_t offset, header;
+	off_t offset, delta_obj_offset;
 	enum object_type type;
 	unsigned long size;
 
-	if (pos >= bitmap_num_objects(bitmap_git))
-		return -1; /* not actually in the pack or MIDX */
+	/*
+	 * try_partial_reuse() is called either on (a) objects in the
+	 * bitmapped pack (in the case of a single-pack bitmap) or (b)
+	 * objects in the preferred pack of a multi-pack bitmap.
+	 * Importantly, the latter can pretend as if only a single pack
+	 * exists because:
+	 *
+	 *   - The first pack->num_objects bits of a MIDX bitmap are
+	 *     reserved for the preferred pack, and
+	 *
+	 *   - Ties due to duplicate objects are always resolved in
+	 *     favor of the preferred pack.
+	 *
+	 * Therefore we do not need to ever ask the MIDX for its copy of
+	 * an object by OID, since it will always select it from the
+	 * preferred pack. Likewise, the selected copy of the base
+	 * object for any deltas will reside in the same pack.
+	 *
+	 * This means that we can reuse pos when looking up the bit in
+	 * the reuse bitmap, too, since bits corresponding to the
+	 * preferred pack precede all bits from other packs.
+	 */
 
-	offset = header = pack_pos_to_offset(bitmap_git->pack, pos);
-	type = unpack_object_header(bitmap_git->pack, w_curs, &offset, &size);
+	if (pos >= pack->num_objects)
+		return -1; /* not actually in the pack or MIDX preferred pack */
+
+	offset = delta_obj_offset = pack_pos_to_offset(pack, pos);
+	type = unpack_object_header(pack, w_curs, &offset, &size);
 	if (type < 0)
 		return -1; /* broken packfile, punt */
 
@@ -1167,11 +1376,11 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
 		 * and the normal slow path will complain about it in
 		 * more detail.
 		 */
-		base_offset = get_delta_base(bitmap_git->pack, w_curs,
-					     &offset, type, header);
+		base_offset = get_delta_base(pack, w_curs, &offset, type,
+					     delta_obj_offset);
 		if (!base_offset)
 			return 0;
-		if (offset_to_pack_pos(bitmap_git->pack, base_offset, &base_pos) < 0)
+		if (offset_to_pack_pos(pack, base_offset, &base_pos) < 0)
 			return 0;
 
 		/*
@@ -1205,24 +1414,48 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
 	return 0;
 }
 
+static uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git)
+{
+	struct multi_pack_index *m = bitmap_git->midx;
+	if (!m)
+		BUG("midx_preferred_pack: requires non-empty MIDX");
+	return nth_midxed_pack_int_id(m, pack_pos_to_midx(bitmap_git->midx, 0));
+}
+
 int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 				       struct packed_git **packfile_out,
 				       uint32_t *entries,
 				       struct bitmap **reuse_out)
 {
+	struct packed_git *pack;
 	struct bitmap *result = bitmap_git->result;
 	struct bitmap *reuse;
 	struct pack_window *w_curs = NULL;
 	size_t i = 0;
 	uint32_t offset;
-	uint32_t objects_nr = bitmap_num_objects(bitmap_git);
+	uint32_t objects_nr;
 
 	assert(result);
 
+	load_reverse_index(bitmap_git);
+
+	if (bitmap_is_midx(bitmap_git))
+		pack = bitmap_git->midx->packs[midx_preferred_pack(bitmap_git)];
+	else
+		pack = bitmap_git->pack;
+	objects_nr = pack->num_objects;
+
 	while (i < result->word_alloc && result->words[i] == (eword_t)~0)
 		i++;
 
-	/* Don't mark objects not in the packfile */
+	/*
+	 * Don't mark objects not in the packfile or preferred pack. This bitmap
+	 * marks objects eligible for reuse, but the pack-reuse code only
+	 * understands how to reuse a single pack. Since the preferred pack is
+	 * guaranteed to have all bases for its deltas (in a multi-pack bitmap),
+	 * we use it instead of another pack. In single-pack bitmaps, the choice
+	 * is made for us.
+	 */
 	if (i > objects_nr / BITS_IN_EWORD)
 		i = objects_nr / BITS_IN_EWORD;
 
@@ -1238,8 +1471,8 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 				break;
 
 			offset += ewah_bit_ctz64(word >> offset);
-			if (try_partial_reuse(bitmap_git, pos + offset, reuse,
-					      &w_curs) < 0) {
+			if (try_partial_reuse(bitmap_git, pack, pos + offset,
+					      reuse, &w_curs) < 0) {
 				/*
 				 * try_partial_reuse indicated we couldn't reuse
 				 * any bits, so there is no point in trying more
@@ -1268,7 +1501,7 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 	 * need to be handled separately.
 	 */
 	bitmap_and_not(result, reuse);
-	*packfile_out = bitmap_git->pack;
+	*packfile_out = pack;
 	*reuse_out = reuse;
 	return 0;
 }
@@ -1542,6 +1775,12 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
 	uint32_t i, num_objects;
 	uint32_t *reposition;
 
+	if (!bitmap_is_midx(bitmap_git))
+		load_reverse_index(bitmap_git);
+	else if (load_midx_revindex(bitmap_git->midx) < 0)
+		BUG("rebuild_existing_bitmaps: missing required rev-cache "
+		    "extension");
+
 	num_objects = bitmap_num_objects(bitmap_git);
 	CALLOC_ARRAY(reposition, num_objects);
 
@@ -1549,8 +1788,13 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
 		struct object_id oid;
 		struct object_entry *oe;
 
-		nth_packed_object_id(&oid, bitmap_git->pack,
-				     pack_pos_to_index(bitmap_git->pack, i));
+		if (bitmap_is_midx(bitmap_git))
+			nth_midxed_object_oid(&oid,
+					      bitmap_git->midx,
+					      pack_pos_to_midx(bitmap_git->midx, i));
+		else
+			nth_packed_object_id(&oid, bitmap_git->pack,
+					     pack_pos_to_index(bitmap_git->pack, i));
 		oe = packlist_find(mapping, &oid);
 
 		if (oe)
@@ -1576,6 +1820,19 @@ void free_bitmap_index(struct bitmap_index *b)
 	free(b->ext_index.hashes);
 	bitmap_free(b->result);
 	bitmap_free(b->haves);
+	if (bitmap_is_midx(b)) {
+		/*
+		 * Multi-pack bitmaps need to have resources associated with
+		 * their on-disk reverse indexes unmapped so that stale .rev and
+		 * .bitmap files can be removed.
+		 *
+		 * Unlike pack-based bitmaps, multi-pack bitmaps can be read and
+		 * written in the same 'git multi-pack-index write --bitmap'
+		 * process. Close resources so they can be removed safely on
+		 * platforms like Windows.
+		 */
+		close_midx_revindex(b->midx);
+	}
 	free(b);
 }
 
@@ -1590,7 +1847,6 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
 				     enum object_type object_type)
 {
 	struct bitmap *result = bitmap_git->result;
-	struct packed_git *pack = bitmap_git->pack;
 	off_t total = 0;
 	struct ewah_iterator it;
 	eword_t filter;
@@ -1607,15 +1863,35 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
 			continue;
 
 		for (offset = 0; offset < BITS_IN_EWORD; offset++) {
-			size_t pos;
-
 			if ((word >> offset) == 0)
 				break;
 
 			offset += ewah_bit_ctz64(word >> offset);
-			pos = base + offset;
-			total += pack_pos_to_offset(pack, pos + 1) -
-				 pack_pos_to_offset(pack, pos);
+
+			if (bitmap_is_midx(bitmap_git)) {
+				uint32_t pack_pos;
+				uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, base + offset);
+				off_t offset = nth_midxed_offset(bitmap_git->midx, midx_pos);
+
+				uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
+				struct packed_git *pack = bitmap_git->midx->packs[pack_id];
+
+				if (offset_to_pack_pos(pack, offset, &pack_pos) < 0) {
+					struct object_id oid;
+					nth_midxed_object_oid(&oid, bitmap_git->midx, midx_pos);
+
+					die(_("could not find %s in pack %s at offset %"PRIuMAX),
+					    oid_to_hex(&oid),
+					    pack->pack_name,
+					    (uintmax_t)offset);
+				}
+
+				total += pack_pos_to_offset(pack, pack_pos + 1) - offset;
+			} else {
+				size_t pos = base + offset;
+				total += pack_pos_to_offset(bitmap_git->pack, pos + 1) -
+					 pack_pos_to_offset(bitmap_git->pack, pos);
+			}
 		}
 	}
 
@@ -1666,6 +1942,11 @@ off_t get_disk_usage_from_bitmap(struct bitmap_index *bitmap_git,
 	return total;
 }
 
+int bitmap_is_midx(struct bitmap_index *bitmap_git)
+{
+	return !!bitmap_git->midx;
+}
+
 const struct string_list *bitmap_preferred_tips(struct repository *r)
 {
 	return repo_config_get_value_multi(r, "pack.preferbitmaptips");
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 52ea10de51..81664f933f 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -44,6 +44,8 @@ typedef int (*show_reachable_fn)(
 struct bitmap_index;
 
 struct bitmap_index *prepare_bitmap_git(struct repository *r);
+struct bitmap_index *prepare_midx_bitmap_git(struct repository *r,
+					     struct multi_pack_index *midx);
 void count_bitmap_commit_list(struct bitmap_index *, uint32_t *commits,
 			      uint32_t *trees, uint32_t *blobs, uint32_t *tags);
 void traverse_bitmap_commit_list(struct bitmap_index *,
@@ -92,6 +94,10 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 			  uint32_t index_nr,
 			  const char *filename,
 			  uint16_t options);
+char *midx_bitmap_filename(struct multi_pack_index *midx);
+char *pack_bitmap_filename(struct packed_git *p);
+
+int bitmap_is_midx(struct bitmap_index *bitmap_git);
 
 const struct string_list *bitmap_preferred_tips(struct repository *r);
 int bitmap_is_preferred_refname(struct repository *r, const char *refname);
diff --git a/packfile.c b/packfile.c
index 9ef6d98292..371f5488cf 100644
--- a/packfile.c
+++ b/packfile.c
@@ -860,7 +860,7 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
 	if (!strcmp(file_name, "multi-pack-index"))
 		return;
 	if (starts_with(file_name, "multi-pack-index") &&
-	    ends_with(file_name, ".rev"))
+	    (ends_with(file_name, ".bitmap") || ends_with(file_name, ".rev")))
 		return;
 	if (ends_with(file_name, ".idx") ||
 	    ends_with(file_name, ".rev") ||
-- 
2.31.1.163.ga65ce7f831
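
The comments in try_partial_reuse() and
reuse_partial_packfile_from_bitmap() above rely on the preferred pack
occupying the first pack->num_objects bit positions of a multi-pack
bitmap. As a rough illustration of the "whole words" part of that logic,
here is a self-contained sketch. It assumes 64-bit bitmap words (the
SKETCH_BITS_IN_WORD constant is an assumption standing in for
BITS_IN_EWORD), and sketch_reusable_prefix_words() is a hypothetical name,
not a function from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BITS_IN_WORD 64 /* assumed width of one bitmap word */

/*
 * Return how many leading words of the result bitmap can be reused
 * wholesale: every bit in them must be set, and they must lie entirely
 * within the first `preferred_nr` bit positions, which are the ones
 * that belong to the (preferred) pack.
 */
static size_t sketch_reusable_prefix_words(const uint64_t *words,
					   size_t word_alloc,
					   uint32_t preferred_nr)
{
	size_t i = 0;

	assert(words || !word_alloc);

	/* Count leading words in which every bit is set. */
	while (i < word_alloc && words[i] == ~(uint64_t)0)
		i++;

	/* Never go past the bits that belong to the preferred pack. */
	if (i > preferred_nr / SKETCH_BITS_IN_WORD)
		i = preferred_nr / SKETCH_BITS_IN_WORD;

	return i;
}
```

Bits past this prefix are not discarded; in the patch they are examined
one at a time by try_partial_reuse(), which gives up once a position is
no longer inside the chosen pack.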



* [PATCH v3 15/25] pack-bitmap: write multi-pack bitmaps
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
                     ` (13 preceding siblings ...)
  2021-07-27 21:19   ` [PATCH v3 14/25] pack-bitmap: read multi-pack bitmaps Taylor Blau
@ 2021-07-27 21:20   ` Taylor Blau
  2021-07-27 21:20   ` [PATCH v3 16/25] t5310: move some tests to lib-bitmap.sh Taylor Blau
                     ` (10 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 21:20 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Write multi-pack bitmaps in the format described by
Documentation/technical/bitmap-format.txt. A bitmap is written when
'--bitmap' is given; without it, any existing multi-pack bitmap is
removed.

To write a multi-pack bitmap, this patch attempts to reuse as much of
the existing machinery from pack-objects as possible. Specifically, the
MIDX code prepares a packing_data struct that pretends as if a single
packfile has been generated containing all of the objects contained
within the MIDX.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.txt |  12 +-
 builtin/multi-pack-index.c             |   2 +
 midx.c                                 | 208 ++++++++++++++++++++++++-
 midx.h                                 |   1 +
 4 files changed, 214 insertions(+), 9 deletions(-)
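
As an aside on the two orderings used in write_midx_bitmap() below: the
packing_data objects are laid out in pseudo-pack (MIDX bitmap) order,
while bitmap_writer_finish() wants its index in lexical order, and
ctx->pack_order maps between the two. The toy sketch below mirrors the two
loops under the assumption that pack_order[i] is the lexical position of
the object at pseudo-pack position i. The struct and function names are
hypothetical, and a string stands in for an object ID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy object record; "name" stands in for an object ID. */
struct sketch_obj {
	const char *name;
};

/*
 * Fill objects[] in pseudo-pack order from a lexically sorted entries[]
 * array, as prepare_midx_packing_data() does with ctx->entries.
 */
static void sketch_fill_pack_order(struct sketch_obj *objects,
				   const struct sketch_obj *entries,
				   const uint32_t *pack_order, uint32_t nr)
{
	uint32_t i;

	assert(objects && entries && pack_order);
	for (i = 0; i < nr; i++)
		objects[i] = entries[pack_order[i]];
}

/*
 * Scatter pointers to objects[] (pseudo-pack order) into index[] so
 * that index[] is in lexical order again, with no re-sorting.
 */
static void sketch_index_in_lex_order(struct sketch_obj **index,
				      struct sketch_obj *objects,
				      const uint32_t *pack_order, uint32_t nr)
{
	uint32_t i;

	assert(index && objects && pack_order);
	for (i = 0; i < nr; i++)
		index[pack_order[i]] = &objects[i];
}
```

Since pack_order is a permutation, the scatter in the second function
fills every slot of index[] exactly once, which is why the patch can
reuse it instead of sorting.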

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index c9b063d31e..ed52459a9d 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -10,7 +10,7 @@ SYNOPSIS
 --------
 [verse]
 'git multi-pack-index' [--object-dir=<dir>] [--[no-]progress]
-	[--preferred-pack=<pack>] <subcommand>
+	[--preferred-pack=<pack>] [--[no-]bitmap] <subcommand>
 
 DESCRIPTION
 -----------
@@ -40,6 +40,9 @@ write::
 		multiple packs contain the same object. `<pack>` must
 		contain at least one object. If not given, ties are
 		broken in favor of the pack with the lowest mtime.
+
+	--[no-]bitmap::
+		Control whether or not a multi-pack bitmap is written.
 --
 
 verify::
@@ -81,6 +84,13 @@ EXAMPLES
 $ git multi-pack-index write
 -----------------------------------------------
 
+* Write a MIDX file for the packfiles in the current .git folder with a
+corresponding bitmap.
++
+-------------------------------------------------------------
+$ git multi-pack-index write --preferred-pack=<pack> --bitmap
+-------------------------------------------------------------
+
 * Write a MIDX file for the packfiles in an alternate object store.
 +
 -----------------------------------------------
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 5d3ea445fd..bf6fa982e3 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -68,6 +68,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
 		OPT_STRING(0, "preferred-pack", &opts.preferred_pack,
 			   N_("preferred-pack"),
 			   N_("pack for reuse when computing a multi-pack bitmap")),
+		OPT_BIT(0, "bitmap", &opts.flags, N_("write multi-pack bitmap"),
+			MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
 		OPT_END(),
 	};
 
diff --git a/midx.c b/midx.c
index db21727c62..ce6cc62c20 100644
--- a/midx.c
+++ b/midx.c
@@ -13,6 +13,10 @@
 #include "repository.h"
 #include "chunk-format.h"
 #include "pack.h"
+#include "pack-bitmap.h"
+#include "refs.h"
+#include "revision.h"
+#include "list-objects.h"
 
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
 #define MIDX_VERSION 1
@@ -893,6 +897,166 @@ static int midx_checksum_valid(struct multi_pack_index *m)
 	return hashfile_checksum_valid(m->data, m->data_len);
 }
 
+static void prepare_midx_packing_data(struct packing_data *pdata,
+				      struct write_midx_context *ctx)
+{
+	uint32_t i;
+
+	memset(pdata, 0, sizeof(struct packing_data));
+	prepare_packing_data(the_repository, pdata);
+
+	for (i = 0; i < ctx->entries_nr; i++) {
+		struct pack_midx_entry *from = &ctx->entries[ctx->pack_order[i]];
+		struct object_entry *to = packlist_alloc(pdata, &from->oid);
+
+		oe_set_in_pack(pdata, to,
+			       ctx->info[ctx->pack_perm[from->pack_int_id]].p);
+	}
+}
+
+static int add_ref_to_pending(const char *refname,
+			      const struct object_id *oid,
+			      int flag, void *cb_data)
+{
+	struct rev_info *revs = (struct rev_info*)cb_data;
+	struct object *object;
+
+	if ((flag & REF_ISSYMREF) && (flag & REF_ISBROKEN)) {
+		warning("symbolic ref is dangling: %s", refname);
+		return 0;
+	}
+
+	object = parse_object_or_die(oid, refname);
+	if (object->type != OBJ_COMMIT)
+		return 0;
+
+	add_pending_object(revs, object, "");
+	if (bitmap_is_preferred_refname(revs->repo, refname))
+		object->flags |= NEEDS_BITMAP;
+	return 0;
+}
+
+struct bitmap_commit_cb {
+	struct commit **commits;
+	size_t commits_nr, commits_alloc;
+
+	struct write_midx_context *ctx;
+};
+
+static const struct object_id *bitmap_oid_access(size_t index,
+						 const void *_entries)
+{
+	const struct pack_midx_entry *entries = _entries;
+	return &entries[index].oid;
+}
+
+static void bitmap_show_commit(struct commit *commit, void *_data)
+{
+	struct bitmap_commit_cb *data = _data;
+	int pos = oid_pos(&commit->object.oid, data->ctx->entries,
+			  data->ctx->entries_nr,
+			  bitmap_oid_access);
+	if (pos < 0)
+		return;
+
+	ALLOC_GROW(data->commits, data->commits_nr + 1, data->commits_alloc);
+	data->commits[data->commits_nr++] = commit;
+}
+
+static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr_p,
+						    struct write_midx_context *ctx)
+{
+	struct rev_info revs;
+	struct bitmap_commit_cb cb = {0};
+
+	cb.ctx = ctx;
+
+	repo_init_revisions(the_repository, &revs, NULL);
+	setup_revisions(0, NULL, &revs, NULL);
+	for_each_ref(add_ref_to_pending, &revs);
+
+	/*
+	 * Skipping promisor objects here is intentional, since it only excludes
+	 * them from the list of reachable commits that we want to select from
+	 * when computing the selection of MIDX'd commits to receive bitmaps.
+	 *
+	 * Reachability bitmaps do require that their objects be closed under
+	 * reachability, but fetching any objects missing from promisors at this
+	 * point is too late. But, if one of those objects can be reached from
+	 * another object that is included in the bitmap, then we will
+	 * complain later that we don't have reachability closure (and fail
+	 * appropriately).
+	 */
+	fetch_if_missing = 0;
+	revs.exclude_promisor_objects = 1;
+
+	if (prepare_revision_walk(&revs))
+		die(_("revision walk setup failed"));
+
+	traverse_commit_list(&revs, bitmap_show_commit, NULL, &cb);
+	if (indexed_commits_nr_p)
+		*indexed_commits_nr_p = cb.commits_nr;
+
+	return cb.commits;
+}
+
+static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
+			     struct write_midx_context *ctx,
+			     unsigned flags)
+{
+	struct packing_data pdata;
+	struct pack_idx_entry **index;
+	struct commit **commits = NULL;
+	uint32_t i, commits_nr;
+	char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash));
+	int ret;
+
+	prepare_midx_packing_data(&pdata, ctx);
+
+	commits = find_commits_for_midx_bitmap(&commits_nr, ctx);
+
+	/*
+	 * Build the MIDX-order index based on pdata.objects (which is already
+	 * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of
+	 * this order).
+	 */
+	ALLOC_ARRAY(index, pdata.nr_objects);
+	for (i = 0; i < pdata.nr_objects; i++)
+		index[i] = &pdata.objects[i].idx;
+
+	bitmap_writer_show_progress(flags & MIDX_PROGRESS);
+	bitmap_writer_build_type_index(&pdata, index, pdata.nr_objects);
+
+	/*
+	 * bitmap_writer_finish expects objects in lex order, but pack_order
+	 * gives us exactly that. Use it directly instead of re-sorting the
+	 * array.
+	 *
+	 * This changes the order of objects in 'index' between
+	 * bitmap_writer_build_type_index and bitmap_writer_finish.
+	 *
+	 * The same re-ordering takes place in the single-pack bitmap code via
+	 * write_idx_file(), which is called by finish_tmp_packfile(), which
+	 * happens between bitmap_writer_build_type_index() and
+	 * bitmap_writer_finish().
+	 */
+	for (i = 0; i < pdata.nr_objects; i++)
+		index[ctx->pack_order[i]] = &pdata.objects[i].idx;
+
+	bitmap_writer_select_commits(commits, commits_nr, -1);
+	ret = bitmap_writer_build(&pdata);
+	if (ret < 0)
+		goto cleanup;
+
+	bitmap_writer_set_checksum(midx_hash);
+	bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, 0);
+
+cleanup:
+	free(index);
+	free(bitmap_name);
+	return ret;
+}
+
 static int write_midx_internal(const char *object_dir,
 			       struct string_list *packs_to_drop,
 			       const char *preferred_pack_name,
@@ -940,7 +1104,7 @@ static int write_midx_internal(const char *object_dir,
 
 			ctx.info[ctx.nr].orig_pack_int_id = i;
 			ctx.info[ctx.nr].pack_name = xstrdup(ctx.m->pack_names[i]);
-			ctx.info[ctx.nr].p = NULL;
+			ctx.info[ctx.nr].p = ctx.m->packs[i];
 			ctx.info[ctx.nr].expired = 0;
 
 			if (flags & MIDX_WRITE_REV_INDEX) {
@@ -974,8 +1138,26 @@ static int write_midx_internal(const char *object_dir,
 	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
 	stop_progress(&ctx.progress);
 
-	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
-		goto cleanup;
+	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop) {
+		struct bitmap_index *bitmap_git;
+		int bitmap_exists;
+		int want_bitmap = flags & MIDX_WRITE_BITMAP;
+
+		bitmap_git = prepare_midx_bitmap_git(the_repository, ctx.m);
+		bitmap_exists = bitmap_git && bitmap_is_midx(bitmap_git);
+		free_bitmap_index(bitmap_git);
+
+		if (bitmap_exists || !want_bitmap) {
+			/*
+			 * The correct MIDX already exists, and so does a
+			 * corresponding bitmap (or one wasn't requested).
+			 */
+			if (!want_bitmap)
+				clear_midx_files_ext(the_repository, ".bitmap",
+						     NULL);
+			goto cleanup;
+		}
+	}
 
 	if (preferred_pack_name) {
 		int found = 0;
@@ -991,7 +1173,8 @@ static int write_midx_internal(const char *object_dir,
 		if (!found)
 			warning(_("unknown preferred pack: '%s'"),
 				preferred_pack_name);
-	} else if (ctx.nr && (flags & MIDX_WRITE_REV_INDEX)) {
+	} else if (ctx.nr &&
+		   (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))) {
 		struct packed_git *oldest = ctx.info[ctx.preferred_pack_idx].p;
 		ctx.preferred_pack_idx = 0;
 
@@ -1123,9 +1306,6 @@ static int write_midx_internal(const char *object_dir,
 	hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR);
 	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
 
-	if (ctx.m)
-		close_midx(ctx.m);
-
 	if (ctx.nr - dropped_packs == 0) {
 		error(_("no pack files to index."));
 		result = 1;
@@ -1156,14 +1336,24 @@ static int write_midx_internal(const char *object_dir,
 	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
 	free_chunkfile(cf);
 
-	if (flags & MIDX_WRITE_REV_INDEX)
+	if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))
 		ctx.pack_order = midx_pack_order(&ctx);
 
 	if (flags & MIDX_WRITE_REV_INDEX)
 		write_midx_reverse_index(midx_name, midx_hash, &ctx);
+	if (flags & MIDX_WRITE_BITMAP) {
+		if (write_midx_bitmap(midx_name, midx_hash, &ctx, flags) < 0) {
+			error(_("could not write multi-pack bitmap"));
+			result = 1;
+			goto cleanup;
+		}
+	}
+
+	close_midx(ctx.m);
 
 	commit_lock_file(&lk);
 
+	clear_midx_files_ext(the_repository, ".bitmap", midx_hash);
 	clear_midx_files_ext(the_repository, ".rev", midx_hash);
 
 cleanup:
@@ -1180,6 +1370,7 @@ static int write_midx_internal(const char *object_dir,
 	free(ctx.pack_perm);
 	free(ctx.pack_order);
 	free(midx_name);
+
 	return result;
 }
 
@@ -1240,6 +1431,7 @@ void clear_midx_file(struct repository *r)
 	if (remove_path(midx))
 		die(_("failed to clear multi-pack-index at %s"), midx);
 
+	clear_midx_files_ext(r, ".bitmap", NULL);
 	clear_midx_files_ext(r, ".rev", NULL);
 
 	free(midx);
diff --git a/midx.h b/midx.h
index 1172df1a71..350f4d0a7b 100644
--- a/midx.h
+++ b/midx.h
@@ -41,6 +41,7 @@ struct multi_pack_index {
 
 #define MIDX_PROGRESS     (1 << 0)
 #define MIDX_WRITE_REV_INDEX (1 << 1)
+#define MIDX_WRITE_BITMAP (1 << 2)
 
 const unsigned char *get_midx_checksum(struct multi_pack_index *m);
 char *get_midx_filename(const char *object_dir);
-- 
2.31.1.163.ga65ce7f831


^ permalink raw reply related	[flat|nested] 273+ messages in thread

* [PATCH v3 16/25] t5310: move some tests to lib-bitmap.sh
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
                     ` (14 preceding siblings ...)
  2021-07-27 21:20   ` [PATCH v3 15/25] pack-bitmap: write " Taylor Blau
@ 2021-07-27 21:20   ` Taylor Blau
  2021-08-12 20:25     ` Jeff King
  2021-07-27 21:20   ` [PATCH v3 17/25] t/helper/test-read-midx.c: add --checksum mode Taylor Blau
                     ` (9 subsequent siblings)
  25 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 21:20 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

We'll soon be adding a test script that will cover many of the same
bitmap concepts as t5310, but for MIDX bitmaps. Let's pull out as many
of the applicable tests as we can so we don't have to rewrite them.

There should be no functional change to t5310; we still run the same
operations in the same order.
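
As a rough illustration of why the moved helpers can be shared between
scripts: the test titles are double-quoted, so "$state" and "$branch"
are expanded when each test is registered, and one helper yields a
distinct title per call. The sketch below is not part of the patch; it
replaces test-lib.sh's test_expect_success with a stub that only prints
the title, and the test body is a placeholder.

```shell
# Stub standing in for test-lib.sh's test_expect_success (assumption:
# for this sketch we only care about the title, so the body is ignored).
test_expect_success () {
	printf 'ok - %s\n' "$1"
}

# Same shape as the helper in lib-bitmap.sh, with a placeholder body.
rev_list_tests_head () {
	test_expect_success "counting commits via bitmap ($state, $branch)" '
		: the real test body would go here
	'
}

rev_list_tests () {
	state=$1

	for branch in second other
	do
		rev_list_tests_head
	done
}

# One call per bitmap state, as basic_bitmap_tests does.
rev_list_tests 'full bitmap'
rev_list_tests 'partial bitmap'
```

Each call registers one test per branch, so the two calls above print
four titles, each tagged with its state and branch.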

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/lib-bitmap.sh         | 236 ++++++++++++++++++++++++++++++++++++++++
 t/t5310-pack-bitmaps.sh | 227 +-------------------------------------
 2 files changed, 240 insertions(+), 223 deletions(-)

diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index fe3f98be24..ecb5d0e05d 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -1,3 +1,6 @@
+# Helpers for scripts testing bitmap functionality; see t5310 for
+# example usage.
+
 # Compare a file containing rev-list bitmap traversal output to its non-bitmap
 # counterpart. You can't just use test_cmp for this, because the two produce
 # subtly different output:
@@ -24,3 +27,236 @@ test_bitmap_traversal () {
 	test_cmp "$1.normalized" "$2.normalized" &&
 	rm -f "$1.normalized" "$2.normalized"
 }
+
+# To ensure the logic for "maximal commits" is exercised, make
+# the repository a bit more complicated.
+#
+#    other                         second
+#      *                             *
+# (99 commits)                  (99 commits)
+#      *                             *
+#      |\                           /|
+#      | * octo-other  octo-second * |
+#      |/|\_________  ____________/|\|
+#      | \          \/  __________/  |
+#      |  | ________/\ /             |
+#      *  |/          * merge-right  *
+#      | _|__________/ \____________ |
+#      |/ |                         \|
+# (l1) *  * merge-left               * (r1)
+#      | / \________________________ |
+#      |/                           \|
+# (l2) *                             * (r2)
+#       \___________________________ |
+#                                   \|
+#                                    * (base)
+#
+# We only push bits down the first-parent history, which
+# makes some of these commits unimportant!
+#
+# The important part for the maximal commit algorithm is how
+# the bitmasks are extended. Assuming starting bit positions
+# for second (bit 0) and other (bit 1), the bitmasks at the
+# end should be:
+#
+#      second: 1       (maximal, selected)
+#       other: 01      (maximal, selected)
+#      (base): 11 (maximal)
+#
+# This complicated history was important for a previous
+# version of the walk that guarantees never walking a
+# commit multiple times. That goal might be important
+# again, so preserve this complicated case. For now, this
+# test will guarantee that the bitmaps are computed
+# correctly, even with the repeat calculations.
+setup_bitmap_history() {
+	test_expect_success 'setup repo with moderate-sized history' '
+		test_commit_bulk --id=file 10 &&
+		git branch -M second &&
+		git checkout -b other HEAD~5 &&
+		test_commit_bulk --id=side 10 &&
+
+		# add complicated history setup, including merges and
+		# ambiguous merge-bases
+
+		git checkout -b merge-left other~2 &&
+		git merge second~2 -m "merge-left" &&
+
+		git checkout -b merge-right second~1 &&
+		git merge other~1 -m "merge-right" &&
+
+		git checkout -b octo-second second &&
+		git merge merge-left merge-right -m "octopus-second" &&
+
+		git checkout -b octo-other other &&
+		git merge merge-left merge-right -m "octopus-other" &&
+
+		git checkout other &&
+		git merge octo-other -m "pull octopus" &&
+
+		git checkout second &&
+		git merge octo-second -m "pull octopus" &&
+
+		# Remove these branches so they are not selected
+		# as bitmap tips
+		git branch -D merge-left &&
+		git branch -D merge-right &&
+		git branch -D octo-other &&
+		git branch -D octo-second &&
+
+		# add padding to make these merges less interesting
+		# and avoid having them selected for bitmaps
+		test_commit_bulk --id=file 100 &&
+		git checkout other &&
+		test_commit_bulk --id=side 100 &&
+		git checkout second &&
+
+		bitmaptip=$(git rev-parse second) &&
+		blob=$(echo tagged-blob | git hash-object -w --stdin) &&
+		git tag tagged-blob $blob
+	'
+}
+
+rev_list_tests_head () {
+	test_expect_success "counting commits via bitmap ($state, $branch)" '
+		git rev-list --count $branch >expect &&
+		git rev-list --use-bitmap-index --count $branch >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting partial commits via bitmap ($state, $branch)" '
+		git rev-list --count $branch~5..$branch >expect &&
+		git rev-list --use-bitmap-index --count $branch~5..$branch >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting commits with limit ($state, $branch)" '
+		git rev-list --count -n 1 $branch >expect &&
+		git rev-list --use-bitmap-index --count -n 1 $branch >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting non-linear history ($state, $branch)" '
+		git rev-list --count other...second >expect &&
+		git rev-list --use-bitmap-index --count other...second >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting commits with limiting ($state, $branch)" '
+		git rev-list --count $branch -- 1.t >expect &&
+		git rev-list --use-bitmap-index --count $branch -- 1.t >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting objects via bitmap ($state, $branch)" '
+		git rev-list --count --objects $branch >expect &&
+		git rev-list --use-bitmap-index --count --objects $branch >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "enumerate commits ($state, $branch)" '
+		git rev-list --use-bitmap-index $branch >actual &&
+		git rev-list $branch >expect &&
+		test_bitmap_traversal --no-confirm-bitmaps expect actual
+	'
+
+	test_expect_success "enumerate --objects ($state, $branch)" '
+		git rev-list --objects --use-bitmap-index $branch >actual &&
+		git rev-list --objects $branch >expect &&
+		test_bitmap_traversal expect actual
+	'
+
+	test_expect_success "bitmap --objects handles non-commit objects ($state, $branch)" '
+		git rev-list --objects --use-bitmap-index $branch tagged-blob >actual &&
+		grep $blob actual
+	'
+}
+
+rev_list_tests () {
+	state=$1
+
+	for branch in "second" "other"
+	do
+		rev_list_tests_head
+	done
+}
+
+basic_bitmap_tests () {
+	tip="$1"
+	test_expect_success 'rev-list --test-bitmap verifies bitmaps' "
+		git rev-list --test-bitmap \"${tip:-HEAD}\"
+	"
+
+	rev_list_tests 'full bitmap'
+
+	test_expect_success 'clone from bitmapped repository' '
+		rm -fr clone.git &&
+		git clone --no-local --bare . clone.git &&
+		git rev-parse HEAD >expect &&
+		git --git-dir=clone.git rev-parse HEAD >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success 'partial clone from bitmapped repository' '
+		test_config uploadpack.allowfilter true &&
+		rm -fr partial-clone.git &&
+		git clone --no-local --bare --filter=blob:none . partial-clone.git &&
+		(
+			cd partial-clone.git &&
+			pack=$(echo objects/pack/*.pack) &&
+			git verify-pack -v "$pack" >have &&
+			awk "/blob/ { print \$1 }" <have >blobs &&
+			# we expect this single blob because of the direct ref
+			git rev-parse refs/tags/tagged-blob >expect &&
+			test_cmp expect blobs
+		)
+	'
+
+	test_expect_success 'setup further non-bitmapped commits' '
+		test_commit_bulk --id=further 10
+	'
+
+	rev_list_tests 'partial bitmap'
+
+	test_expect_success 'fetch (partial bitmap)' '
+		git --git-dir=clone.git fetch origin second:second &&
+		git rev-parse HEAD >expect &&
+		git --git-dir=clone.git rev-parse HEAD >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success 'enumerating progress counts pack-reused objects' '
+		count=$(git rev-list --objects --all --count) &&
+		git repack -adb &&
+
+		# check first with only reused objects; confirm that our
+		# progress showed the right number, and also that we did
+		# pack-reuse as expected.  Check only the final "done"
+		# line of the meter (there may be an arbitrary number of
+		# intermediate lines ending with CR).
+		GIT_PROGRESS_DELAY=0 \
+			git pack-objects --all --stdout --progress \
+			</dev/null >/dev/null 2>stderr &&
+		grep "Enumerating objects: $count, done" stderr &&
+		grep "pack-reused $count" stderr &&
+
+		# now the same but with one non-reused object
+		git commit --allow-empty -m "an extra commit object" &&
+		GIT_PROGRESS_DELAY=0 \
+			git pack-objects --all --stdout --progress \
+			</dev/null >/dev/null 2>stderr &&
+		grep "Enumerating objects: $((count+1)), done" stderr &&
+		grep "pack-reused $count" stderr
+	'
+}
+
+# have_delta <obj> <expected_base>
+#
+# Note that because this relies on cat-file, it might find _any_ copy of an
+# object in the repository. The caller is responsible for making sure
+# there's only one (e.g., via "repack -ad", or having just fetched a copy).
+have_delta () {
+	echo $2 >expect &&
+	echo $1 | git cat-file --batch-check="%(deltabase)" >actual &&
+	test_cmp expect actual
+}
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index b02838750e..4318f84d53 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -25,93 +25,10 @@ has_any () {
 	grep -Ff "$1" "$2"
 }
 
-# To ensure the logic for "maximal commits" is exercised, make
-# the repository a bit more complicated.
-#
-#    other                         second
-#      *                             *
-# (99 commits)                  (99 commits)
-#      *                             *
-#      |\                           /|
-#      | * octo-other  octo-second * |
-#      |/|\_________  ____________/|\|
-#      | \          \/  __________/  |
-#      |  | ________/\ /             |
-#      *  |/          * merge-right  *
-#      | _|__________/ \____________ |
-#      |/ |                         \|
-# (l1) *  * merge-left               * (r1)
-#      | / \________________________ |
-#      |/                           \|
-# (l2) *                             * (r2)
-#       \___________________________ |
-#                                   \|
-#                                    * (base)
-#
-# We only push bits down the first-parent history, which
-# makes some of these commits unimportant!
-#
-# The important part for the maximal commit algorithm is how
-# the bitmasks are extended. Assuming starting bit positions
-# for second (bit 0) and other (bit 1), the bitmasks at the
-# end should be:
-#
-#      second: 1       (maximal, selected)
-#       other: 01      (maximal, selected)
-#      (base): 11 (maximal)
-#
-# This complicated history was important for a previous
-# version of the walk that guarantees never walking a
-# commit multiple times. That goal might be important
-# again, so preserve this complicated case. For now, this
-# test will guarantee that the bitmaps are computed
-# correctly, even with the repeat calculations.
+setup_bitmap_history
 
-test_expect_success 'setup repo with moderate-sized history' '
-	test_commit_bulk --id=file 10 &&
-	git branch -M second &&
-	git checkout -b other HEAD~5 &&
-	test_commit_bulk --id=side 10 &&
-
-	# add complicated history setup, including merges and
-	# ambiguous merge-bases
-
-	git checkout -b merge-left other~2 &&
-	git merge second~2 -m "merge-left" &&
-
-	git checkout -b merge-right second~1 &&
-	git merge other~1 -m "merge-right" &&
-
-	git checkout -b octo-second second &&
-	git merge merge-left merge-right -m "octopus-second" &&
-
-	git checkout -b octo-other other &&
-	git merge merge-left merge-right -m "octopus-other" &&
-
-	git checkout other &&
-	git merge octo-other -m "pull octopus" &&
-
-	git checkout second &&
-	git merge octo-second -m "pull octopus" &&
-
-	# Remove these branches so they are not selected
-	# as bitmap tips
-	git branch -D merge-left &&
-	git branch -D merge-right &&
-	git branch -D octo-other &&
-	git branch -D octo-second &&
-
-	# add padding to make these merges less interesting
-	# and avoid having them selected for bitmaps
-	test_commit_bulk --id=file 100 &&
-	git checkout other &&
-	test_commit_bulk --id=side 100 &&
-	git checkout second &&
-
-	bitmaptip=$(git rev-parse second) &&
-	blob=$(echo tagged-blob | git hash-object -w --stdin) &&
-	git tag tagged-blob $blob &&
-	git config repack.writebitmaps true
+test_expect_success 'setup writing bitmaps during repack' '
+	git config repack.writeBitmaps true
 '
 
 test_expect_success 'full repack creates bitmaps' '
@@ -123,109 +40,7 @@ test_expect_success 'full repack creates bitmaps' '
 	grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace
 '
 
-test_expect_success 'rev-list --test-bitmap verifies bitmaps' '
-	git rev-list --test-bitmap HEAD
-'
-
-rev_list_tests_head () {
-	test_expect_success "counting commits via bitmap ($state, $branch)" '
-		git rev-list --count $branch >expect &&
-		git rev-list --use-bitmap-index --count $branch >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting partial commits via bitmap ($state, $branch)" '
-		git rev-list --count $branch~5..$branch >expect &&
-		git rev-list --use-bitmap-index --count $branch~5..$branch >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting commits with limit ($state, $branch)" '
-		git rev-list --count -n 1 $branch >expect &&
-		git rev-list --use-bitmap-index --count -n 1 $branch >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting non-linear history ($state, $branch)" '
-		git rev-list --count other...second >expect &&
-		git rev-list --use-bitmap-index --count other...second >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting commits with limiting ($state, $branch)" '
-		git rev-list --count $branch -- 1.t >expect &&
-		git rev-list --use-bitmap-index --count $branch -- 1.t >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting objects via bitmap ($state, $branch)" '
-		git rev-list --count --objects $branch >expect &&
-		git rev-list --use-bitmap-index --count --objects $branch >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "enumerate commits ($state, $branch)" '
-		git rev-list --use-bitmap-index $branch >actual &&
-		git rev-list $branch >expect &&
-		test_bitmap_traversal --no-confirm-bitmaps expect actual
-	'
-
-	test_expect_success "enumerate --objects ($state, $branch)" '
-		git rev-list --objects --use-bitmap-index $branch >actual &&
-		git rev-list --objects $branch >expect &&
-		test_bitmap_traversal expect actual
-	'
-
-	test_expect_success "bitmap --objects handles non-commit objects ($state, $branch)" '
-		git rev-list --objects --use-bitmap-index $branch tagged-blob >actual &&
-		grep $blob actual
-	'
-}
-
-rev_list_tests () {
-	state=$1
-
-	for branch in "second" "other"
-	do
-		rev_list_tests_head
-	done
-}
-
-rev_list_tests 'full bitmap'
-
-test_expect_success 'clone from bitmapped repository' '
-	git clone --no-local --bare . clone.git &&
-	git rev-parse HEAD >expect &&
-	git --git-dir=clone.git rev-parse HEAD >actual &&
-	test_cmp expect actual
-'
-
-test_expect_success 'partial clone from bitmapped repository' '
-	test_config uploadpack.allowfilter true &&
-	git clone --no-local --bare --filter=blob:none . partial-clone.git &&
-	(
-		cd partial-clone.git &&
-		pack=$(echo objects/pack/*.pack) &&
-		git verify-pack -v "$pack" >have &&
-		awk "/blob/ { print \$1 }" <have >blobs &&
-		# we expect this single blob because of the direct ref
-		git rev-parse refs/tags/tagged-blob >expect &&
-		test_cmp expect blobs
-	)
-'
-
-test_expect_success 'setup further non-bitmapped commits' '
-	test_commit_bulk --id=further 10
-'
-
-rev_list_tests 'partial bitmap'
-
-test_expect_success 'fetch (partial bitmap)' '
-	git --git-dir=clone.git fetch origin second:second &&
-	git rev-parse HEAD >expect &&
-	git --git-dir=clone.git rev-parse HEAD >actual &&
-	test_cmp expect actual
-'
+basic_bitmap_tests
 
 test_expect_success 'incremental repack fails when bitmaps are requested' '
 	test_commit more-1 &&
@@ -461,40 +276,6 @@ test_expect_success 'truncated bitmap fails gracefully (cache)' '
 	test_i18ngrep corrupted.bitmap.index stderr
 '
 
-test_expect_success 'enumerating progress counts pack-reused objects' '
-	count=$(git rev-list --objects --all --count) &&
-	git repack -adb &&
-
-	# check first with only reused objects; confirm that our progress
-	# showed the right number, and also that we did pack-reuse as expected.
-	# Check only the final "done" line of the meter (there may be an
-	# arbitrary number of intermediate lines ending with CR).
-	GIT_PROGRESS_DELAY=0 \
-		git pack-objects --all --stdout --progress \
-		</dev/null >/dev/null 2>stderr &&
-	grep "Enumerating objects: $count, done" stderr &&
-	grep "pack-reused $count" stderr &&
-
-	# now the same but with one non-reused object
-	git commit --allow-empty -m "an extra commit object" &&
-	GIT_PROGRESS_DELAY=0 \
-		git pack-objects --all --stdout --progress \
-		</dev/null >/dev/null 2>stderr &&
-	grep "Enumerating objects: $((count+1)), done" stderr &&
-	grep "pack-reused $count" stderr
-'
-
-# have_delta <obj> <expected_base>
-#
-# Note that because this relies on cat-file, it might find _any_ copy of an
-# object in the repository. The caller is responsible for making sure
-# there's only one (e.g., via "repack -ad", or having just fetched a copy).
-have_delta () {
-	echo $2 >expect &&
-	echo $1 | git cat-file --batch-check="%(deltabase)" >actual &&
-	test_cmp expect actual
-}
-
 # Create a state of history with these properties:
 #
 #  - refs that allow a client to fetch some new history, while sharing some old
-- 
2.31.1.163.ga65ce7f831



* [PATCH v3 17/25] t/helper/test-read-midx.c: add --checksum mode
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
                     ` (15 preceding siblings ...)
  2021-07-27 21:20   ` [PATCH v3 16/25] t5310: move some tests to lib-bitmap.sh Taylor Blau
@ 2021-07-27 21:20   ` Taylor Blau
  2021-08-12 20:31     ` Jeff King
  2021-07-27 21:20   ` [PATCH v3 18/25] t5326: test multi-pack bitmap behavior Taylor Blau
                     ` (8 subsequent siblings)
  25 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 21:20 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Subsequent tests will want to check for the existence of a multi-pack
bitmap which matches the multi-pack-index stored in the pack directory.

The multi-pack bitmap includes the hex checksum of the MIDX it
corresponds to in its filename (for example,
'$packdir/multi-pack-index-<checksum>.bitmap'). As a result, some tests
want a way to learn what '<checksum>' is.

This helper addresses that need by printing the checksum of the
repository's multi-pack-index.
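
To make the naming scheme concrete, here is a minimal sketch of how a
test could turn a checksum into the expected bitmap path. The function
name "midx_bitmap_path" and the checksum value are made up for
illustration; in a real test the checksum would come from the helper
added by this patch.

```shell
# Hypothetical helper (not part of this patch): print the path of the
# multi-pack bitmap for object directory $1 and hex MIDX checksum $2.
midx_bitmap_path () {
	printf '%s/pack/multi-pack-index-%s.bitmap\n' "$1" "$2"
}

# The checksum is a placeholder, not a real MIDX checksum.
midx_bitmap_path .git/objects 0123456789abcdef0123456789abcdef01234567
```

A test would then pass the resulting path to test_path_is_file or
test_path_is_missing.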

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/helper/test-read-midx.c | 16 +++++++++++++++-
 t/lib-bitmap.sh           |  4 ++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index 7c2eb11a8e..cb0d27049a 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -60,12 +60,26 @@ static int read_midx_file(const char *object_dir, int show_objects)
 	return 0;
 }
 
+static int read_midx_checksum(const char *object_dir)
+{
+	struct multi_pack_index *m;
+
+	setup_git_directory();
+	m = load_multi_pack_index(object_dir, 1);
+	if (!m)
+		return 1;
+	printf("%s\n", hash_to_hex(get_midx_checksum(m)));
+	return 0;
+}
+
 int cmd__read_midx(int argc, const char **argv)
 {
 	if (!(argc == 2 || argc == 3))
-		usage("read-midx [--show-objects] <object-dir>");
+		usage("read-midx [--show-objects|--checksum] <object-dir>");
 
 	if (!strcmp(argv[1], "--show-objects"))
 		return read_midx_file(argv[2], 1);
+	else if (!strcmp(argv[1], "--checksum"))
+		return read_midx_checksum(argv[2]);
 	return read_midx_file(argv[1], 0);
 }
diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index ecb5d0e05d..09cd036f4d 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -260,3 +260,7 @@ have_delta () {
 	echo $1 | git cat-file --batch-check="%(deltabase)" >actual &&
 	test_cmp expect actual
 }
+
+midx_checksum () {
+	test-tool read-midx --checksum "${1:-.git/objects}"
+}
-- 
2.31.1.163.ga65ce7f831



* [PATCH v3 18/25] t5326: test multi-pack bitmap behavior
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
                     ` (16 preceding siblings ...)
  2021-07-27 21:20   ` [PATCH v3 17/25] t/helper/test-read-midx.c: add --checksum mode Taylor Blau
@ 2021-07-27 21:20   ` Taylor Blau
  2021-08-12 21:02     ` Jeff King
  2021-07-27 21:20   ` [PATCH v3 19/25] t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP Taylor Blau
                     ` (7 subsequent siblings)
  25 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 21:20 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

This patch introduces a new test, t5326, which tests the basic
functionality of multi-pack bitmaps.

Some trivial behavior is tested, such as:

  - Whether bitmaps can be generated with more than one pack.
  - Whether clones can be served with all objects in the bitmap.
  - Whether follow-up fetches can be served with some objects outside of
    the server's bitmap.

These use lib-bitmap's tests (which in turn were pulled from t5310), and
we cover cases where the MIDX represents both a single pack and multiple
packs.

In addition, some non-trivial and MIDX-specific behavior is tested, too,
including:

  - Whether multi-pack bitmaps behave correctly with respect to the
    pack-reuse machinery when the base for some object is selected from
    a different pack than the delta.
  - Whether multi-pack bitmaps correctly respect the
    pack.preferBitmapTips configuration.
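
The pack.preferBitmapTips test compares two sorted lists with "comm -13"
to find commits that did not receive a bitmap. A small sketch of that
comparison follows, using made-up identifiers in place of commit IDs;
the function name "commits_without_bitmap" is hypothetical.

```shell
# Hypothetical wrapper: $1 lists commits that have a bitmap, $2 lists
# all commits; both files must be sorted. "-1" drops lines unique to
# $1 and "-3" drops lines common to both, leaving lines only in $2.
commits_without_bitmap () {
	comm -13 "$1" "$2"
}

dir=$(mktemp -d) &&
printf '%s\n' aaa bbb ccc ddd | sort >"$dir/commits" &&
printf '%s\n' aaa ccc ddd | sort >"$dir/bitmaps" &&

# "bbb" is the only commit without a bitmap in this made-up data.
commits_without_bitmap "$dir/bitmaps" "$dir/commits" &&
rm -fr "$dir"
```

Both inputs have to be sorted the same way, which is why the test pipes
its lists through sort before comparing them.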

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5326-multi-pack-bitmaps.sh | 277 ++++++++++++++++++++++++++++++++++
 1 file changed, 277 insertions(+)
 create mode 100755 t/t5326-multi-pack-bitmaps.sh

diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
new file mode 100755
index 0000000000..c1b7d633e2
--- /dev/null
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -0,0 +1,277 @@
+#!/bin/sh
+
+test_description='exercise basic multi-pack bitmap functionality'
+. ./test-lib.sh
+. "${TEST_DIRECTORY}/lib-bitmap.sh"
+
+# We'll be writing our own midx and bitmaps, so avoid getting confused by the
+# automatic ones.
+GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+
+objdir=.git/objects
+midx=$objdir/pack/multi-pack-index
+
+# midx_pack_source <obj>
+midx_pack_source () {
+	test-tool read-midx --show-objects .git/objects | grep "^$1 " | cut -f2
+}
+
+setup_bitmap_history
+
+test_expect_success 'enable core.multiPackIndex' '
+	git config core.multiPackIndex true
+'
+
+test_expect_success 'create single-pack midx with bitmaps' '
+	git repack -ad &&
+	git multi-pack-index write --bitmap &&
+	test_path_is_file $midx &&
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
+'
+
+basic_bitmap_tests
+
+test_expect_success 'create new additional packs' '
+	for i in $(test_seq 1 16)
+	do
+		test_commit "$i" &&
+		git repack -d
+	done &&
+
+	git checkout -b other2 HEAD~8 &&
+	for i in $(test_seq 1 8)
+	do
+		test_commit "side-$i" &&
+		git repack -d
+	done &&
+	git checkout second
+'
+
+test_expect_success 'create multi-pack midx with bitmaps' '
+	git multi-pack-index write --bitmap &&
+
+	ls $objdir/pack/pack-*.pack >packs &&
+	test_line_count = 25 packs &&
+
+	test_path_is_file $midx &&
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
+'
+
+basic_bitmap_tests
+
+test_expect_success '--no-bitmap is respected when bitmaps exist' '
+	git multi-pack-index write --bitmap &&
+
+	test_commit respect--no-bitmap &&
+	GIT_TEST_MULTI_PACK_INDEX=0 git repack -d &&
+
+	test_path_is_file $midx &&
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+
+	git multi-pack-index write --no-bitmap &&
+
+	test_path_is_file $midx &&
+	test_path_is_missing $midx-$(midx_checksum $objdir).bitmap
+'
+
+test_expect_success 'setup midx with base from later pack' '
+	# Write a and b so that "a" is a delta on top of base "b", since Git
+	# prefers to delete contents out of a base rather than add to a shorter
+	# object.
+	test_seq 1 128 >a &&
+	test_seq 1 130 >b &&
+
+	git add a b &&
+	git commit -m "initial commit" &&
+
+	a=$(git rev-parse HEAD:a) &&
+	b=$(git rev-parse HEAD:b) &&
+
+	# In the first pack, "a" is stored as a delta to "b".
+	p1=$(git pack-objects .git/objects/pack/pack <<-EOF
+	$a
+	$b
+	EOF
+	) &&
+
+	# In the second pack, "a" is missing, and "b" is neither a delta nor a
+	# base of any other object.
+	p2=$(git pack-objects .git/objects/pack/pack <<-EOF
+	$b
+	$(git rev-parse HEAD)
+	$(git rev-parse HEAD^{tree})
+	EOF
+	) &&
+
+	git prune-packed &&
+	# Use the second pack as the preferred source, so that "b" occurs
+	# earlier in the MIDX object order, rendering "a" unusable for pack
+	# reuse.
+	git multi-pack-index write --bitmap --preferred-pack=pack-$p2.idx &&
+
+	have_delta $a $b &&
+	test $(midx_pack_source $a) != $(midx_pack_source $b)
+'
+
+rev_list_tests 'full bitmap with backwards delta'
+
+test_expect_success 'clone with bitmaps enabled' '
+	git clone --no-local --bare . clone-reverse-delta.git &&
+	test_when_finished "rm -fr clone-reverse-delta.git" &&
+
+	git rev-parse HEAD >expect &&
+	git --git-dir=clone-reverse-delta.git rev-parse HEAD >actual &&
+	test_cmp expect actual
+'
+
+bitmap_reuse_tests() {
+	from=$1
+	to=$2
+
+	test_expect_success "setup pack reuse tests ($from -> $to)" '
+		rm -fr repo &&
+		git init repo &&
+		(
+			cd repo &&
+			test_commit_bulk 16 &&
+			git tag old-tip &&
+
+			git config core.multiPackIndex true &&
+			if test "MIDX" = "$from"
+			then
+				GIT_TEST_MULTI_PACK_INDEX=0 git repack -Ad &&
+				git multi-pack-index write --bitmap
+			else
+				GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb
+			fi
+		)
+	'
+
+	test_expect_success "build bitmap from existing ($from -> $to)" '
+		(
+			cd repo &&
+			test_commit_bulk --id=further 16 &&
+			git tag new-tip &&
+
+			if test "MIDX" = "$to"
+			then
+				GIT_TEST_MULTI_PACK_INDEX=0 git repack -d &&
+				git multi-pack-index write --bitmap
+			else
+				GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb
+			fi
+		)
+	'
+
+	test_expect_success "verify resulting bitmaps ($from -> $to)" '
+		(
+			cd repo &&
+			git for-each-ref &&
+			git rev-list --test-bitmap refs/tags/old-tip &&
+			git rev-list --test-bitmap refs/tags/new-tip
+		)
+	'
+}
+
+bitmap_reuse_tests 'pack' 'MIDX'
+bitmap_reuse_tests 'MIDX' 'pack'
+bitmap_reuse_tests 'MIDX' 'MIDX'
+
+test_expect_success 'missing object closure fails gracefully' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit loose &&
+		test_commit packed &&
+
+		# Do not pass "--revs"; we want a pack without the "loose"
+		# commit.
+		git pack-objects $objdir/pack/pack <<-EOF &&
+		$(git rev-parse packed)
+		EOF
+
+		test_must_fail git multi-pack-index write --bitmap 2>err &&
+		grep "doesn.t have full closure" err &&
+		test_path_is_missing $midx
+	)
+'
+
+test_expect_success 'setup partial bitmaps' '
+	test_commit packed &&
+	git repack &&
+	test_commit loose &&
+	git multi-pack-index write --bitmap 2>err &&
+	test_path_is_file $midx &&
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
+'
+
+basic_bitmap_tests HEAD~
+
+test_expect_success 'removing a MIDX clears stale bitmaps' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+		test_commit base &&
+		git repack &&
+		git multi-pack-index write --bitmap &&
+
+		# Write a MIDX and bitmap; remove the MIDX but leave the bitmap.
+		stale_bitmap=$midx-$(midx_checksum $objdir).bitmap &&
+		rm $midx &&
+
+		# Then write a new MIDX.
+		test_commit new &&
+		git repack &&
+		git multi-pack-index write --bitmap &&
+
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+		test_path_is_missing $stale_bitmap
+	)
+'
+
+test_expect_success 'pack.preferBitmapTips' '
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit_bulk --message="%s" 103 &&
+
+		git log --format="%H" >commits.raw &&
+		sort <commits.raw >commits &&
+
+		git log --format="create refs/tags/%s %H" HEAD >refs &&
+		git update-ref --stdin <refs &&
+
+		git multi-pack-index write --bitmap &&
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+
+		test-tool bitmap list-commits | sort >bitmaps &&
+		comm -13 bitmaps commits >before &&
+		test_line_count = 1 before &&
+
+		perl -ne "printf(\"create refs/tags/include/%d \", $.); print" \
+			<before | git update-ref --stdin &&
+
+		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
+		rm -fr $midx-$(midx_checksum $objdir).rev &&
+		rm -fr $midx &&
+
+		git -c pack.preferBitmapTips=refs/tags/include \
+			multi-pack-index write --bitmap &&
+		test-tool bitmap list-commits | sort >bitmaps &&
+		comm -13 bitmaps commits >after &&
+
+		! test_cmp before after
+	)
+'
+
+test_done
-- 
2.31.1.163.ga65ce7f831



* [PATCH v3 19/25] t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
                     ` (17 preceding siblings ...)
  2021-07-27 21:20   ` [PATCH v3 18/25] t5326: test multi-pack bitmap behavior Taylor Blau
@ 2021-07-27 21:20   ` Taylor Blau
  2021-07-27 21:20   ` [PATCH v3 20/25] t5310: " Taylor Blau
                     ` (6 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 21:20 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

From: Jeff King <peff@peff.net>

Generating a MIDX bitmap causes tests which repack in a partial clone to
fail because they are missing objects. Missing objects are an expected
part of the tests in t0410, so disable this knob altogether. Graceful
degradation when writing a bitmap with missing objects is tested in
t5326.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t0410-partial-clone.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index bbcc51ee8e..bba679685f 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -4,6 +4,9 @@ test_description='partial clone'
 
 . ./test-lib.sh
 
+# missing promisor objects cause repacks which write bitmaps to fail
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+
 delete_object () {
 	rm $1/.git/objects/$(echo $2 | sed -e 's|^..|&/|')
 }
-- 
2.31.1.163.ga65ce7f831



* [PATCH v3 20/25] t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
                     ` (18 preceding siblings ...)
  2021-07-27 21:20   ` [PATCH v3 19/25] t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP Taylor Blau
@ 2021-07-27 21:20   ` Taylor Blau
  2021-07-27 21:20   ` [PATCH v3 21/25] t5319: don't write MIDX bitmaps in t5319 Taylor Blau
                     ` (5 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 21:20 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

From: Jeff King <peff@peff.net>

Generating a MIDX bitmap confuses many of the tests in t5310, which
expect to control whether and how bitmaps are written. Since the
relevant MIDX-bitmap tests here are covered already in t5326, let's just
disable the flag for the whole t5310 script.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5310-pack-bitmaps.sh | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 4318f84d53..673baa5c3c 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -8,6 +8,10 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 . "$TEST_DIRECTORY"/lib-bundle.sh
 . "$TEST_DIRECTORY"/lib-bitmap.sh
 
+# t5310 deals only with single-pack bitmaps, so don't write MIDX bitmaps in
+# their place.
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+
 objpath () {
 	echo ".git/objects/$(echo "$1" | sed -e 's|\(..\)|\1/|')"
 }
-- 
2.31.1.163.ga65ce7f831



* [PATCH v3 21/25] t5319: don't write MIDX bitmaps in t5319
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
                     ` (19 preceding siblings ...)
  2021-07-27 21:20   ` [PATCH v3 20/25] t5310: " Taylor Blau
@ 2021-07-27 21:20   ` Taylor Blau
  2021-07-27 21:20   ` [PATCH v3 22/25] t7700: update to work with MIDX bitmap test knob Taylor Blau
                     ` (4 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 21:20 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

This test specifically checks that a pack-based bitmap file is still
respected after generating a MIDX. Generating a MIDX bitmap would
confuse the test. Let's override the
'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' variable to make sure we don't
do so.
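
As a rough illustration (not part of the patch), the one-shot
"VAR=value command" form changes the variable only for that single
command and leaves the rest of the script untouched. The "sh -c" child
below is just a stand-in for "git repack", and the 'show_knob' variable
is a hypothetical name:

```shell
#!/bin/sh
# Pretend the test run was started with the knob enabled.
GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=1
export GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP

# Stand-in for "git repack": a child process that reports the value of
# the knob as it sees it.
show_knob='echo "$GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP"'

# One-shot override: only this child process sees 0.
overridden=$(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 sh -c "$show_knob")

# Commands that run afterwards still see the original value.
later=$(sh -c "$show_knob")

echo "overridden=$overridden later=$later"
```

This is why a per-command override is enough here, whereas a script that
never wants MIDX bitmaps can simply assign the variable once at the top.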

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5319-multi-pack-index.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 1f0a2ae852..7b685957c6 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -504,7 +504,8 @@ test_expect_success 'repack preserves multi-pack-index when creating packs' '
 compare_results_with_midx "after repack"
 
 test_expect_success 'multi-pack-index and pack-bitmap' '
-	git -c repack.writeBitmaps=true repack -ad &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -c repack.writeBitmaps=true repack -ad &&
 	git multi-pack-index write &&
 	git rev-list --test-bitmap HEAD
 '
-- 
2.31.1.163.ga65ce7f831



* [PATCH v3 22/25] t7700: update to work with MIDX bitmap test knob
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
                     ` (20 preceding siblings ...)
  2021-07-27 21:20   ` [PATCH v3 21/25] t5319: don't write MIDX bitmaps in t5319 Taylor Blau
@ 2021-07-27 21:20   ` Taylor Blau
  2021-07-27 21:20   ` [PATCH v3 23/25] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
                     ` (3 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 21:20 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

A number of these tests are focused only on pack-based bitmaps and need
to be updated to disable 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' where
necessary.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t7700-repack.sh | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 25b235c063..98eda3bfeb 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -63,13 +63,14 @@ test_expect_success 'objects in packs marked .keep are not repacked' '
 
 test_expect_success 'writing bitmaps via command-line can duplicate .keep objects' '
 	# build on $oid, $packid, and .keep state from previous
-	git repack -Adbl &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 git repack -Adbl &&
 	test_has_duplicate_object true
 '
 
 test_expect_success 'writing bitmaps via config can duplicate .keep objects' '
 	# build on $oid, $packid, and .keep state from previous
-	git -c repack.writebitmaps=true repack -Adl &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -c repack.writebitmaps=true repack -Adl &&
 	test_has_duplicate_object true
 '
 
@@ -189,7 +190,9 @@ test_expect_success 'repack --keep-pack' '
 
 test_expect_success 'bitmaps are created by default in bare repos' '
 	git clone --bare .git bare.git &&
-	git -C bare.git repack -ad &&
+	rm -f bare.git/objects/pack/*.bitmap &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -C bare.git repack -ad &&
 	bitmap=$(ls bare.git/objects/pack/*.bitmap) &&
 	test_path_is_file "$bitmap"
 '
@@ -200,7 +203,8 @@ test_expect_success 'incremental repack does not complain' '
 '
 
 test_expect_success 'bitmaps can be disabled on bare repos' '
-	git -c repack.writeBitmaps=false -C bare.git repack -ad &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -c repack.writeBitmaps=false -C bare.git repack -ad &&
 	bitmap=$(ls bare.git/objects/pack/*.bitmap || :) &&
 	test -z "$bitmap"
 '
@@ -211,7 +215,8 @@ test_expect_success 'no bitmaps created if .keep files present' '
 	keep=${pack%.pack}.keep &&
 	test_when_finished "rm -f \"\$keep\"" &&
 	>"$keep" &&
-	git -C bare.git repack -ad 2>stderr &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -C bare.git repack -ad 2>stderr &&
 	test_must_be_empty stderr &&
 	find bare.git/objects/pack/ -type f -name "*.bitmap" >actual &&
 	test_must_be_empty actual
@@ -222,7 +227,8 @@ test_expect_success 'auto-bitmaps do not complain if unavailable' '
 	blob=$(test-tool genrandom big $((1024*1024)) |
 	       git -C bare.git hash-object -w --stdin) &&
 	git -C bare.git update-ref refs/tags/big $blob &&
-	git -C bare.git repack -ad 2>stderr &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -C bare.git repack -ad 2>stderr &&
 	test_must_be_empty stderr &&
 	find bare.git/objects/pack -type f -name "*.bitmap" >actual &&
 	test_must_be_empty actual
-- 
2.31.1.163.ga65ce7f831



* [PATCH v3 23/25] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
                     ` (21 preceding siblings ...)
  2021-07-27 21:20   ` [PATCH v3 22/25] t7700: update to work with MIDX bitmap test knob Taylor Blau
@ 2021-07-27 21:20   ` Taylor Blau
  2021-08-12 21:09     ` Jeff King
  2021-07-27 21:20   ` [PATCH v3 24/25] p5310: extract full and partial bitmap tests Taylor Blau
                     ` (2 subsequent siblings)
  25 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 21:20 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Introduce a new 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' environment
variable to also write a multi-pack bitmap when
'GIT_TEST_MULTI_PACK_INDEX' is set.
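
A minimal sketch of how one might run a single command with both knobs
enabled. 'run_with_midx_bitmaps' is a hypothetical helper, not something
this patch adds. Both variables are set together because, in
builtin/repack.c as patched here, the bitmap knob is only consulted when
'GIT_TEST_MULTI_PACK_INDEX' is also true:

```shell
#!/bin/sh
# Hypothetical helper: run the given command with both test knobs on.
# The bitmap knob has no effect on its own, so always pair it with
# GIT_TEST_MULTI_PACK_INDEX.
run_with_midx_bitmaps () {
	env GIT_TEST_MULTI_PACK_INDEX=1 \
	    GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=1 \
	    "$@"
}
```

From within git's t/ directory this could be used as, for example,
"run_with_midx_bitmaps ./t7700-repack.sh".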

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c          | 12 ++++++++++--
 ci/run-build-and-tests.sh |  1 +
 midx.h                    |  2 ++
 t/README                  |  4 ++++
 4 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 5f9bc74adc..82ab668272 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -515,6 +515,10 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		if (!(pack_everything & ALL_INTO_ONE) ||
 		    !is_bare_repository())
 			write_bitmaps = 0;
+	} else if (write_bitmaps &&
+		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0) &&
+		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0)) {
+		write_bitmaps = 0;
 	}
 	if (pack_kept_objects < 0)
 		pack_kept_objects = write_bitmaps > 0;
@@ -725,8 +729,12 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		update_server_info(0);
 	remove_temporary_files();
 
-	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
-		write_midx_file(get_object_directory(), NULL, 0);
+	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0)) {
+		unsigned flags = 0;
+		if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0))
+			flags |= MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX;
+		write_midx_file(get_object_directory(), NULL, flags);
+	}
 
 	string_list_clear(&names, 0);
 	string_list_clear(&rollback, 0);
diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index 3ce81ffee9..7ee9ba9325 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -23,6 +23,7 @@ linux-gcc)
 	export GIT_TEST_COMMIT_GRAPH=1
 	export GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=1
 	export GIT_TEST_MULTI_PACK_INDEX=1
+	export GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=1
 	export GIT_TEST_ADD_I_USE_BUILTIN=1
 	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master
 	export GIT_TEST_WRITE_REV_INDEX=1
diff --git a/midx.h b/midx.h
index 350f4d0a7b..aa3da557bb 100644
--- a/midx.h
+++ b/midx.h
@@ -8,6 +8,8 @@ struct pack_entry;
 struct repository;
 
 #define GIT_TEST_MULTI_PACK_INDEX "GIT_TEST_MULTI_PACK_INDEX"
+#define GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP \
+	"GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP"
 
 struct multi_pack_index {
 	struct multi_pack_index *next;
diff --git a/t/README b/t/README
index 9e70122302..12014aa988 100644
--- a/t/README
+++ b/t/README
@@ -425,6 +425,10 @@ GIT_TEST_MULTI_PACK_INDEX=<boolean>, when true, forces the multi-pack-
 index to be written after every 'git repack' command, and overrides the
 'core.multiPackIndex' setting to true.
 
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=<boolean>, when true, sets the
+'--bitmap' option on all invocations of 'git multi-pack-index write',
+and ignores pack-objects' '--write-bitmap-index'.
+
 GIT_TEST_SIDEBAND_ALL=<boolean>, when true, overrides the
 'uploadpack.allowSidebandAll' setting to true, and when false, forces
 fetch-pack to not request sideband-all (even if the server advertises
-- 
2.31.1.163.ga65ce7f831



* [PATCH v3 24/25] p5310: extract full and partial bitmap tests
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
                     ` (22 preceding siblings ...)
  2021-07-27 21:20   ` [PATCH v3 23/25] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
@ 2021-07-27 21:20   ` Taylor Blau
  2021-07-27 21:20   ` [PATCH v3 25/25] p5326: perf tests for MIDX bitmaps Taylor Blau
  2021-08-12 21:21   ` [PATCH v3 00/25] multi-pack reachability bitmaps Jeff King
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 21:20 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

A new p5326 introduced by the next patch will want these same tests,
interjecting its own setup in between. Move them out so that both perf
tests can reuse them.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/perf/lib-bitmap.sh         | 69 ++++++++++++++++++++++++++++++++++++
 t/perf/p5310-pack-bitmaps.sh | 65 ++-------------------------------
 2 files changed, 72 insertions(+), 62 deletions(-)
 create mode 100644 t/perf/lib-bitmap.sh

diff --git a/t/perf/lib-bitmap.sh b/t/perf/lib-bitmap.sh
new file mode 100644
index 0000000000..63d3bc7cec
--- /dev/null
+++ b/t/perf/lib-bitmap.sh
@@ -0,0 +1,69 @@
+# Helper functions for testing bitmap performance; see p5310.
+
+test_full_bitmap () {
+	test_perf 'simulated clone' '
+		git pack-objects --stdout --all </dev/null >/dev/null
+	'
+
+	test_perf 'simulated fetch' '
+		have=$(git rev-list HEAD~100 -1) &&
+		{
+			echo HEAD &&
+			echo ^$have
+		} | git pack-objects --revs --stdout >/dev/null
+	'
+
+	test_perf 'pack to file (bitmap)' '
+		git pack-objects --use-bitmap-index --all pack1b </dev/null >/dev/null
+	'
+
+	test_perf 'rev-list (commits)' '
+		git rev-list --all --use-bitmap-index >/dev/null
+	'
+
+	test_perf 'rev-list (objects)' '
+		git rev-list --all --use-bitmap-index --objects >/dev/null
+	'
+
+	test_perf 'rev-list with tag negated via --not --all (objects)' '
+		git rev-list perf-tag --not --all --use-bitmap-index --objects >/dev/null
+	'
+
+	test_perf 'rev-list with negative tag (objects)' '
+		git rev-list HEAD --not perf-tag --use-bitmap-index --objects >/dev/null
+	'
+
+	test_perf 'rev-list count with blob:none' '
+		git rev-list --use-bitmap-index --count --objects --all \
+			--filter=blob:none >/dev/null
+	'
+
+	test_perf 'rev-list count with blob:limit=1k' '
+		git rev-list --use-bitmap-index --count --objects --all \
+			--filter=blob:limit=1k >/dev/null
+	'
+
+	test_perf 'rev-list count with tree:0' '
+		git rev-list --use-bitmap-index --count --objects --all \
+			--filter=tree:0 >/dev/null
+	'
+
+	test_perf 'simulated partial clone' '
+		git pack-objects --stdout --all --filter=blob:none </dev/null >/dev/null
+	'
+}
+
+test_partial_bitmap () {
+	test_perf 'clone (partial bitmap)' '
+		git pack-objects --stdout --all </dev/null >/dev/null
+	'
+
+	test_perf 'pack to file (partial bitmap)' '
+		git pack-objects --use-bitmap-index --all pack2b </dev/null >/dev/null
+	'
+
+	test_perf 'rev-list with tree filter (partial bitmap)' '
+		git rev-list --use-bitmap-index --count --objects --all \
+			--filter=tree:0 >/dev/null
+	'
+}
diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index 452be01056..7ad4f237bc 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -2,6 +2,7 @@
 
 test_description='Tests pack performance using bitmaps'
 . ./perf-lib.sh
+. "${TEST_DIRECTORY}/perf/lib-bitmap.sh"
 
 test_perf_large_repo
 
@@ -25,56 +26,7 @@ test_perf 'repack to disk' '
 	git repack -ad
 '
 
-test_perf 'simulated clone' '
-	git pack-objects --stdout --all </dev/null >/dev/null
-'
-
-test_perf 'simulated fetch' '
-	have=$(git rev-list HEAD~100 -1) &&
-	{
-		echo HEAD &&
-		echo ^$have
-	} | git pack-objects --revs --stdout >/dev/null
-'
-
-test_perf 'pack to file (bitmap)' '
-	git pack-objects --use-bitmap-index --all pack1b </dev/null >/dev/null
-'
-
-test_perf 'rev-list (commits)' '
-	git rev-list --all --use-bitmap-index >/dev/null
-'
-
-test_perf 'rev-list (objects)' '
-	git rev-list --all --use-bitmap-index --objects >/dev/null
-'
-
-test_perf 'rev-list with tag negated via --not --all (objects)' '
-	git rev-list perf-tag --not --all --use-bitmap-index --objects >/dev/null
-'
-
-test_perf 'rev-list with negative tag (objects)' '
-	git rev-list HEAD --not perf-tag --use-bitmap-index --objects >/dev/null
-'
-
-test_perf 'rev-list count with blob:none' '
-	git rev-list --use-bitmap-index --count --objects --all \
-		--filter=blob:none >/dev/null
-'
-
-test_perf 'rev-list count with blob:limit=1k' '
-	git rev-list --use-bitmap-index --count --objects --all \
-		--filter=blob:limit=1k >/dev/null
-'
-
-test_perf 'rev-list count with tree:0' '
-	git rev-list --use-bitmap-index --count --objects --all \
-		--filter=tree:0 >/dev/null
-'
-
-test_perf 'simulated partial clone' '
-	git pack-objects --stdout --all --filter=blob:none </dev/null >/dev/null
-'
+test_full_bitmap
 
 test_expect_success 'create partial bitmap state' '
 	# pick a commit to represent the repo tip in the past
@@ -97,17 +49,6 @@ test_expect_success 'create partial bitmap state' '
 	git update-ref HEAD $orig_tip
 '
 
-test_perf 'clone (partial bitmap)' '
-	git pack-objects --stdout --all </dev/null >/dev/null
-'
-
-test_perf 'pack to file (partial bitmap)' '
-	git pack-objects --use-bitmap-index --all pack2b </dev/null >/dev/null
-'
-
-test_perf 'rev-list with tree filter (partial bitmap)' '
-	git rev-list --use-bitmap-index --count --objects --all \
-		--filter=tree:0 >/dev/null
-'
+test_partial_bitmap
 
 test_done
-- 
2.31.1.163.ga65ce7f831



* [PATCH v3 25/25] p5326: perf tests for MIDX bitmaps
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
                     ` (23 preceding siblings ...)
  2021-07-27 21:20   ` [PATCH v3 24/25] p5310: extract full and partial bitmap tests Taylor Blau
@ 2021-07-27 21:20   ` Taylor Blau
  2021-08-12 21:18     ` Jeff King
  2021-08-12 21:21   ` [PATCH v3 00/25] multi-pack reachability bitmaps Jeff King
  25 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-07-27 21:20 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

These new performance tests demonstrate effectively the same behavior as
p5310, but use a multi-pack bitmap instead of a single-pack one.

Notably, p5326 does not create a MIDX bitmap with multiple packs. This
is so we can compare it directly against p5310. Any difference between
the two measures just the overhead of using MIDX bitmaps.

Here are the results of p5310 and p5326 together, measured at the same
time and on the same machine (using a Xeon W-2255 CPU):

    Test                                                  HEAD
    ------------------------------------------------------------------------
    5310.2: repack to disk                                96.78(93.39+11.33)
    5310.3: simulated clone                               9.98(9.79+0.19)
    5310.4: simulated fetch                               1.75(4.26+0.19)
    5310.5: pack to file (bitmap)                         28.20(27.87+8.70)
    5310.6: rev-list (commits)                            0.41(0.36+0.05)
    5310.7: rev-list (objects)                            1.61(1.54+0.07)
    5310.8: rev-list count with blob:none                 0.25(0.21+0.04)
    5310.9: rev-list count with blob:limit=1k             2.65(2.54+0.10)
    5310.10: rev-list count with tree:0                   0.23(0.19+0.04)
    5310.11: simulated partial clone                      4.34(4.21+0.12)
    5310.13: clone (partial bitmap)                       11.05(12.21+0.48)
    5310.14: pack to file (partial bitmap)                31.25(34.22+3.70)
    5310.15: rev-list with tree filter (partial bitmap)   0.26(0.22+0.04)

versus the same tests (this time using a multi-pack index):

    Test                                                  HEAD
    ------------------------------------------------------------------------
    5326.2: setup multi-pack index                        78.99(75.29+11.58)
    5326.3: simulated clone                               11.78(11.56+0.22)
    5326.4: simulated fetch                               1.70(4.49+0.13)
    5326.5: pack to file (bitmap)                         28.02(27.72+8.76)
    5326.6: rev-list (commits)                            0.42(0.36+0.06)
    5326.7: rev-list (objects)                            1.65(1.58+0.06)
    5326.8: rev-list count with blob:none                 0.26(0.21+0.05)
    5326.9: rev-list count with blob:limit=1k             2.97(2.86+0.10)
    5326.10: rev-list count with tree:0                   0.25(0.20+0.04)
    5326.11: simulated partial clone                      5.65(5.49+0.16)
    5326.13: clone (partial bitmap)                       12.22(13.43+0.38)
    5326.14: pack to file (partial bitmap)                30.05(31.57+7.25)
    5326.15: rev-list with tree filter (partial bitmap)   0.24(0.20+0.04)

There is slight overhead in "simulated clone", "simulated partial
clone", and "clone (partial bitmap)". Unsurprisingly, that overhead is
due to using the MIDX's reverse index to map between bit positions and
MIDX positions.

This can be reproduced by running "git repack -adb" along with "git
multi-pack-index write --bitmap" in a large-ish repository. Then run:

    $ perf record -o pack.perf git -c core.multiPackIndex=false \
      pack-objects --all --stdout >/dev/null </dev/null
    $ perf record -o midx.perf git -c core.multiPackIndex=true \
      pack-objects --all --stdout >/dev/null </dev/null

and compare the two with "perf diff -c delta -o 1 pack.perf midx.perf".
The most notable results are below (the next largest positive delta is
+0.14%):

    # Event 'cycles'
    #
    # Baseline    Delta  Shared Object       Symbol
    # ........  .......  ..................  ..........................
    #
                 +5.86%  git                 [.] nth_midxed_offset
                 +5.24%  git                 [.] nth_midxed_pack_int_id
         3.45%   +0.97%  git                 [.] offset_to_pack_pos
         3.30%   +0.57%  git                 [.] pack_pos_to_offset
                 +0.30%  git                 [.] pack_pos_to_midx
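
To put the overhead seen in the two timing tables above into numbers,
here is a small sketch that computes the relative slowdown of each
affected test. It assumes the first number in each cell is elapsed
seconds; 'overhead_pct' is a hypothetical helper written for this
illustration only:

```shell
#!/bin/sh
# Relative slowdown of the MIDX run over the single-pack run, in percent.
# Arguments: <p5310 seconds> <p5326 seconds>.
overhead_pct () {
	awk -v pack="$1" -v midx="$2" \
		'BEGIN { printf "%.1f\n", (midx - pack) / pack * 100 }'
}

# Timings copied from the p5310 and p5326 tables above.
overhead_pct 9.98 11.78    # simulated clone
overhead_pct 4.34 5.65     # simulated partial clone
overhead_pct 11.05 12.22   # clone (partial bitmap)
```

With the figures above this prints 18.0, 30.2 and 10.6, one per line.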

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/perf/p5326-multi-pack-bitmaps.sh | 43 ++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)
 create mode 100755 t/perf/p5326-multi-pack-bitmaps.sh

diff --git a/t/perf/p5326-multi-pack-bitmaps.sh b/t/perf/p5326-multi-pack-bitmaps.sh
new file mode 100755
index 0000000000..5845109ac7
--- /dev/null
+++ b/t/perf/p5326-multi-pack-bitmaps.sh
@@ -0,0 +1,43 @@
+#!/bin/sh
+
+test_description='Tests performance using midx bitmaps'
+. ./perf-lib.sh
+. "${TEST_DIRECTORY}/perf/lib-bitmap.sh"
+
+test_perf_large_repo
+
+test_expect_success 'enable multi-pack index' '
+	git config core.multiPackIndex true
+'
+
+test_perf 'setup multi-pack index' '
+	git repack -ad &&
+	git multi-pack-index write --bitmap
+'
+
+test_full_bitmap
+
+test_expect_success 'create partial bitmap state' '
+	# pick a commit to represent the repo tip in the past
+	cutoff=$(git rev-list HEAD~100 -1) &&
+	orig_tip=$(git rev-parse HEAD) &&
+
+	# now pretend we have just one tip
+	rm -rf .git/logs .git/refs/* .git/packed-refs &&
+	git update-ref HEAD $cutoff &&
+
+	# and then repack, which will leave us with a nice
+	# big bitmap pack of the "old" history, and all of
+	# the new history will be loose, as if it had been pushed
+	# up incrementally and exploded via unpack-objects
+	git repack -Ad &&
+	git multi-pack-index write --bitmap &&
+
+	# and now restore our original tip, as if the pushes
+	# had happened
+	git update-ref HEAD $orig_tip
+'
+
+test_partial_bitmap
+
+test_done
-- 
2.31.1.163.ga65ce7f831


* Re: [PATCH v2 08/24] midx: respect 'core.multiPackIndex' when writing
  2021-07-27 20:05                         ` Taylor Blau
@ 2021-07-28 17:46                           ` Jeff King
  2021-07-29 19:44                             ` Taylor Blau
  0 siblings, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-07-28 17:46 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Tue, Jul 27, 2021 at 04:05:39PM -0400, Taylor Blau wrote:

> > I actually think having write_midx_internal() open up a new midx is
> > reasonable-ish. It's just that:
> >
> >   - it's weird when it stuffs duplicate packs into the
> >     r->objects->packed_git list. But AFAICT that's not actually hurting
> >     anything?
> 
> It is hurting us when we try to write a MIDX bitmap, because we try to
> see if one already exists. And to do that, we call prepare_bitmap_git(),
> which tries to call open_pack_bitmap_1 on *each* pack in the packed_git
> list. Critically, prepare_bitmap_git() errors out if it is called with a
> bitmap_git that has a non-NULL `->pack` pointer.

It doesn't error out. It does produce a warning(), though, if it ignores
a bitmap (and that warning is doubly confusing because it is ignoring
bitmap X because it has already loaded and will use that exact same X!).

This causes t7700.13 to fail because it is being picky about stderr
being empty.

So the overall behavior is correct, but I agree it's sufficiently ugly
that we should make sure it doesn't happen.

  Side note: IMHO the "check all packs to see if there are any other
  bitmaps to warn about" behavior is kind of pointless, and we should
  consider just returning as soon as we have one. This is already
  somewhat the case after your midx-bitmap patches, as we will not even
  bother to look for a pack bitmap after finding a midx bitmap. That is
  a good thing, because it means you can keep pack bitmaps around for
  flexibility. But let's leave any changes to the pack-only behavior out
  of this series for simplicity.

> I stepped away from my computer for an hour or so and thought about
> this, and I think that the solution is two-fold:
> 
>   - We should be more careful about freeing up the ->next pointers of a
>     MIDX, and releasing the memory we allocated to hold each MIDX struct
>     in the first place.

Yeah. This is a bug already before your series. I suspect nobody noticed
because it's very rare for us to call close_midx() at all, and it only
matters if there's an alternate odb with a midx. (The common call to
close_midx() is in these write paths, but it is always using a single
midx file).

>   - We should always be operating on the repository's
>     r->objects->multi_pack_index, or any other MIDX that can be reached
>     via walking the `->next` pointers. If we do that consistently, then
>     we'll only have at most one instance of a MIDX struct corresponding
>     to each MIDX file on disk.

Certainly that makes sense to me in terms of the Windows "must close the
current midx before writing" behavior. We have to realize that we're
operating in the current repo.

But we do allow an "--object-dir" option to "multi-pack-index write",
and I don't see any other code explicitly requiring that it be part of
the current repository. What I'm wondering is whether this would be
breaking:

  cd $REPO/..
  git multi-pack-index --object-dir $REPO/.git/objects write

or:

  cd /some/other/repo
  git multi-pack-index --object-dir $REPO/.git/objects write

The latter does seem to work, but the former segfaults (usually -- if
there's already a midx it is OK).

If it is broken now, this may be a good time to explicitly forbid it.
It does seem to make the --object-dir mostly pointless, though it would
still work for operating on a midx in one of your alternates. I'm not
sure I understand the original point of that option, and if the current
behavior is sufficient.

If it turns out that we can't forbid writing midx's besides the ones in
r->objects, it may be sufficient to just make any assumptions
conditional. I.e., _if_ it's one of the ones mentioned by r->objects,
then close it, but otherwise leave it open. But if we can get away with
restricting ourselves as you described, I think the result will be much
simpler, and we should prefer that.

-Peff


* Re: [PATCH v2 14/24] pack-bitmap: write multi-pack bitmaps
  2021-07-27 20:33           ` Taylor Blau
@ 2021-07-28 17:52             ` Jeff King
  2021-07-29 19:33               ` Taylor Blau
  0 siblings, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-07-28 17:52 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Tue, Jul 27, 2021 at 04:33:01PM -0400, Taylor Blau wrote:

> > It's interesting that your earlier iteration didn't call
> > open_pack_index(). Is it necessary, or not? From your description, it
> > seems like it should be. But maybe some later step lazy-loads it? Even
> > if so, I can see how prepare_midx_pack() would still be required
> > (because we want to make sure we are using the same struct).
> 
> It's only necessary now (at least for determining a preferred pack if
> the caller didn't specify one with `--preferred-pack`) because we care
> about reading the `num_objects` field, which the index must be loaded
> for.

I guess I'm a little confused about "now" in your sentence. I understand
that it's not necessary before your series to have loaded all of the
index files ahead of time. But didn't we need to do so in v2 of your
series, which has the preferred-pack logic?

If so, then was the v2 version buggy, since it only called
prepare_midx_pack() and not open_pack_index()? And then v3 is fixing
that? Or is something else opening the pack index for us?

-Peff


* Re: [PATCH v3 09/25] midx: avoid opening multiple MIDXs when writing
  2021-07-27 21:19   ` [PATCH v3 09/25] midx: avoid opening multiple MIDXs when writing Taylor Blau
@ 2021-07-29 19:30     ` Taylor Blau
  2021-08-12 20:15     ` Jeff King
  1 sibling, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-07-29 19:30 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

On Tue, Jul 27, 2021 at 05:19:46PM -0400, Taylor Blau wrote:
> @@ -914,10 +915,14 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
>  		die_errno(_("unable to create leading directories of %s"),
>  			  midx_name);
>
> -	if (m)
> -		ctx.m = m;
> -	else
> -		ctx.m = load_multi_pack_index(object_dir, 1);
> +	for (cur = get_multi_pack_index(the_repository); cur; cur = cur->next) {
> +		if (!strcmp(object_dir, cur->object_dir)) {
> +			ctx.m = cur;
> +			break;
> +		}
> +	}
> +	if (!ctx.m)
> +		ctx.m = get_local_multi_pack_index(the_repository);

Oops, the `if (!ctx.m)` part of this diff is just plain wrong.

I think I had it in my mind that some callers don't pass object_dir,
and so we should fall back to the local MIDX in that case. And so I
probably meant to write `if (!object_dir && !ctx.m)` instead.

But, all of the callers *do* pass the result of get_object_directory(),
so we don't need to do anything of the sort.

On a related note, though, a side-effect of this change is that it is
no longer possible to do

    git multi-pack-index write --object-dir=/not/an/alternate.git/objects

since get_local_multi_pack_index() will only populate the MIDXs in
alternate object stores. We never enforced that `--object-dir` must
point to an alternate, but the documentation uses `<alt>` to describe
the argument to this flag, and accepting arbitrary non-alternate paths
seems like a footgun to me.

So I'm OK with "breaking" that behavior, as long as nobody complains
loudly. Obviously it makes the fix easier to write, but I'd argue that
the behavior we're losing is worth getting rid of anyway.

Thanks,
Taylor


* Re: [PATCH v2 14/24] pack-bitmap: write multi-pack bitmaps
  2021-07-28 17:52             ` Jeff King
@ 2021-07-29 19:33               ` Taylor Blau
  2021-08-12 20:00                 ` Jeff King
  0 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-07-29 19:33 UTC (permalink / raw)
  To: Jeff King; +Cc: git, dstolee, gitster, jonathantanmy

On Wed, Jul 28, 2021 at 01:52:37PM -0400, Jeff King wrote:
> On Tue, Jul 27, 2021 at 04:33:01PM -0400, Taylor Blau wrote:
>
> > > It's interesting that your earlier iteration didn't call
> > > open_pack_index(). Is it necessary, or not? From your description, it
> > > seems like it should be. But maybe some later step lazy-loads it? Even
> > > if so, I can see how prepare_midx_pack() would still be required
> > > (because we want to make sure we are using the same struct).
> >
> > It's only necessary now (at least for determining a preferred pack if
> > the caller didn't specify one with `--preferred-pack`) because we care
> > about reading the `num_objects` field, which the index must be loaded
> > for.
>
> I guess I'm a little confused about "now" in your sentence. I understand
> that it's not necessary before your series to have loaded all of the
> index files ahead of time. But didn't we need to do so in v2 of your
> series, which has the preferred-pack logic?
>
> If so, then was the v2 version buggy, since it only called
> prepare_midx_pack() and not open_pack_index()? And then v3 is fixing
> that? Or is something else opening the pack index for us?

In earlier versions of this series, I don't think we needed to have the
indexes loaded by this point, since (before v3) we didn't care about
ignoring the empty packs when finding a default preferred-pack.

But now we do, and so we need to call open_pack_index() ourselves.
Confusingly, we only need to do that on packs that *are* included in the
MIDX, since prepare_midx_pack() doesn't do it for us, but
add_pack_to_midx() does.
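
As a rough illustration of why the index has to be opened first, here is
a toy model in which `num_objects` is only meaningful after the index is
loaded. Everything here is hypothetical (`struct fake_pack`,
`fake_open_pack_index()`, `pick_default_preferred()`), and the selection
rule (first non-empty pack) is an assumption chosen for brevity, not the
rule the series implements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a pack whose object count is loaded lazily. */
struct fake_pack {
	unsigned on_disk_objects; /* what the .idx file would report */
	unsigned num_objects;     /* only valid once index_loaded is set */
	int index_loaded;
};

/* Stand-in for opening the pack index: fills in num_objects. */
static int fake_open_pack_index(struct fake_pack *p)
{
	if (!p->index_loaded) {
		p->num_objects = p->on_disk_objects;
		p->index_loaded = 1;
	}
	return 0;
}

/*
 * Return the position of the first non-empty pack, or -1 if every pack
 * is empty. Without the fake_open_pack_index() call, num_objects would
 * still be zero for every pack and nothing could be selected.
 */
static int pick_default_preferred(struct fake_pack *packs, size_t nr)
{
	size_t i;

	assert(packs || !nr);

	for (i = 0; i < nr; i++) {
		if (fake_open_pack_index(&packs[i]))
			continue;
		if (packs[i].num_objects)
			return (int)i;
	}
	return -1;
}
```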

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 08/24] midx: respect 'core.multiPackIndex' when writing
  2021-07-28 17:46                           ` Jeff King
@ 2021-07-29 19:44                             ` Taylor Blau
  2021-08-12 19:59                               ` Jeff King
  0 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-07-29 19:44 UTC (permalink / raw)
  To: Jeff King; +Cc: git, dstolee, gitster, jonathantanmy

On Wed, Jul 28, 2021 at 01:46:21PM -0400, Jeff King wrote:
> On Tue, Jul 27, 2021 at 04:05:39PM -0400, Taylor Blau wrote:
>
> > > I actually think having write_midx_internal() open up a new midx is
> > > reasonable-ish. It's just that:
> > >
> > >   - it's weird when it stuffs duplicate packs into the
> > >     r->objects->packed_git list. But AFAICT that's not actually hurting
> > >     anything?
> >
> > It is hurting us when we try to write a MIDX bitmap, because we try to
> > see if one already exists. And to do that, we call prepare_bitmap_git(),
> > which tries to call open_pack_bitmap_1 on *each* pack in the packed_git
> > list. Critically, prepare_bitmap_git() errors out if it is called with a
> > bitmap_git that has a non-NULL `->pack` pointer.
>
> It doesn't error out. It does produce a warning(), though, if it ignores
> a bitmap (and that warning is doubly confusing because it is ignoring
> bitmap X because it has already loaded and will use that exact same X!).
>
> This causes t7700.13 to fail because it is being picky about stderr
> being empty.

Right, sorry for suggesting that the error was more severe than it
actually is.

> So the overall behavior is correct, but I agree it's sufficiently ugly
> that we should make sure it doesn't happen.

100% agreed. I think the most unfortunate thing is the state of
r->objects->packed_git, since it's utterly bizarre to have the same pack
opened twice and have both of those copies in the list. That is
definitely worth preventing.

>   Side note: IMHO the "check all packs to see if there are any other
>   bitmaps to warn about" behavior is kind of pointless, and we should
>   consider just returning as soon as we have one. This is already
>   somewhat the case after your midx-bitmap patches, as we will not even
>   bother to look for a pack bitmap after finding a midx bitmap. That is
>   a good thing, because it means you can keep pack bitmaps around for
>   flexibility. But let's leave any changes to the pack-only behavior out
>   of this series for simplicity.

I agree. I'd be in favor of something like

diff --git a/pack-bitmap.c b/pack-bitmap.c
index f599646e19..5450ffb04c 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -378,13 +378,6 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
 		return -1;
 	}

-	if (bitmap_git->pack || bitmap_git->midx) {
-		/* ignore extra bitmap file; we can only handle one */
-		warning("ignoring extra bitmap file: %s", packfile->pack_name);
-		close(fd);
-		return -1;
-	}
-
 	bitmap_git->pack = packfile;
 	bitmap_git->map_size = xsize_t(st.st_size);
 	bitmap_git->map = xmmap(NULL, bitmap_git->map_size, PROT_READ, MAP_PRIVATE, fd, 0);
@@ -465,16 +458,15 @@ static int open_pack_bitmap(struct repository *r,
 			    struct bitmap_index *bitmap_git)
 {
 	struct packed_git *p;
-	int ret = -1;

 	assert(!bitmap_git->map);

 	for (p = get_all_packs(r); p; p = p->next) {
-		if (open_pack_bitmap_1(bitmap_git, p) == 0)
-			ret = 0;
+		if (!open_pack_bitmap_1(bitmap_git, p))
+			return 0;
 	}

-	return ret;
+	return -1;
 }

 static int open_midx_bitmap(struct repository *r,

...but agree that we should wait until after the dust has settled on
this already-complex series.

> >   - We should always be operating on the repository's
> >     r->objects->multi_pack_index, or any other MIDX that can be reached
> >     via walking the `->next` pointers. If we do that consistently, then
> >     we'll only have at most one instance of a MIDX struct corresponding
> >     to each MIDX file on disk.
>
> Certainly that makes sense to me in terms of the Windows "must close the
> current midx before writing" behavior. We have to realize that we're
> operating in the current repo.
>
> But we do allow an "--object-dir" option to "multi-pack-index write",
> and I don't see any other code explicitly requiring that it be part of
> the current repository. What I'm wondering is whether this would be
> breaking:
>
>   cd $REPO/..
>   git multi-pack-index --object-dir $REPO/.git/objects write
>
> or:
>
>   cd /some/other/repo
>   git multi-pack-index --object-dir $REPO/.git/objects write
>
> The latter does seem to work, but the former segfaults (usually -- if
> there's already a midx it is OK).

The former should work, but doesn't, because (as you pointed out to me
in our regular weekly discussion off-list) the "multi-pack-index"
entry in git.c's commands array has the RUN_SETUP_GENTLY option, and
probably should have RUN_SETUP so that we complain with die() instead of
BUG.

And the latter will continue to work, but only if, in your scenario,
$REPO is an alternate of /some/other/repo.

I wrote a little bit more in [1] about this behavior, but the upshot is
that we used to technically support passing *any* directory to
`--object-dir`, including directories that didn't belong to an
alternated repository.

And that will cease to work after the patch that [1] is in response to
is applied. But for the reasons that I explain there, I think that is an
acceptable outcome, because the behavior is kind of bizarre to begin
with.

[1]: https://lore.kernel.org/git/YQMB32fvSiH9julg@nand.local/

Thanks,
Taylor

^ permalink raw reply related	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 08/24] midx: respect 'core.multiPackIndex' when writing
  2021-07-29 19:44                             ` Taylor Blau
@ 2021-08-12 19:59                               ` Jeff King
  0 siblings, 0 replies; 273+ messages in thread
From: Jeff King @ 2021-08-12 19:59 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Thu, Jul 29, 2021 at 03:44:34PM -0400, Taylor Blau wrote:

> >   Side note: IMHO the "check all packs to see if there are any other
> >   bitmaps to warn about" behavior is kind of pointless, and we should
> >   consider just returning as soon as we have one. This is already
> >   somewhat the case after your midx-bitmap patches, as we will not even
> >   bother to look for a pack bitmap after finding a midx bitmap. That is
> >   a good thing, because it means you can keep pack bitmaps around for
> >   flexibility. But let's leave any changes to the pack-only behavior out
> >   of this series for simplicity.
> 
> I agree. I'd be in favor of something like
> [...patch...]

Yep, that looks good. I'd be quite happy if you sent that once the dust
is settled.

> > But we do allow an "--object-dir" option to "multi-pack-index write",
> > and I don't see any other code explicitly requiring that it be part of
> > the current repository. What I'm wondering is whether this would be
> > breaking:
> >
> >   cd $REPO/..
> >   git multi-pack-index --object-dir $REPO/.git/objects write
> >
> > or:
> >
> >   cd /some/other/repo
> >   git multi-pack-index --object-dir $REPO/.git/objects write
> >
> > The latter does seem to work, but the former segfaults (usually -- if
> > there's already a midx it is OK).
> 
> > The former should work, but doesn't, because (as you pointed out to me
> > in our regular weekly discussion off-list) the "multi-pack-index"
> > entry in git.c's commands array has the RUN_SETUP_GENTLY option, and
> > probably should have RUN_SETUP so that we complain with die() instead of
> > BUG.
> > 
> > And the latter will continue to work, but only if, in your scenario,
> > $REPO is an alternate of /some/other/repo.
> 
> I wrote a little bit more in [1] about this behavior, but the upshot is
> that we used to technically support passing *any* directory to
> `--object-dir`, including directories that didn't belong to an
> alternated repository.
> 
> > And that will cease to work after the patch that [1] is in response to
> > is applied. But for the reasons that I explain there, I think that is an
> > acceptable outcome, because the behavior is kind of bizarre to begin
> > with.

Yeah, I think I am comfortable with the change at this point. The only
case that will be broken is one that is quite ridiculous, and I am
surprised worked in the first place. Thanks for talking it through.

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v2 14/24] pack-bitmap: write multi-pack bitmaps
  2021-07-29 19:33               ` Taylor Blau
@ 2021-08-12 20:00                 ` Jeff King
  0 siblings, 0 replies; 273+ messages in thread
From: Jeff King @ 2021-08-12 20:00 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Thu, Jul 29, 2021 at 03:33:18PM -0400, Taylor Blau wrote:

> > > It's only necessary now (at least for determining a preferred pack if
> > > the caller didn't specify one with `--preferred-pack`) because we care
> > > about reading the `num_objects` field, which the index must be loaded
> > > for.
> >
> > I guess I'm a little confused about "now" in your sentence. I understand
> > that it's not necessary before your series to have loaded all of the
> > index files ahead of time. But didn't we need to do so in v2 of your
> > series, which has the preferred-pack logic?
> >
> > If so, then was the v2 version buggy, since it only called
> > prepare_midx_pack() and not open_pack_index()? And then v3 is fixing
> > that? Or is something else opening the pack index for us?
> 
> In earlier versions of this series, I don't think we needed to have the
> indexes loaded by this point, since (before v3) we didn't care about
> ignoring the empty packs when finding a default preferred-pack.
> 
> But now we do, and so we need to call open_pack_index() ourselves.
> Confusingly, we only need to do that on packs that *are* included in the
> MIDX, since prepare_midx_pack() doesn't do it for us, but
> add_pack_to_midx() does.

Ah, that was the part I was missing: the default preferred-pack stuff is
only in v3. That makes sense.

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v3 09/25] midx: avoid opening multiple MIDXs when writing
  2021-07-27 21:19   ` [PATCH v3 09/25] midx: avoid opening multiple MIDXs when writing Taylor Blau
  2021-07-29 19:30     ` Taylor Blau
@ 2021-08-12 20:15     ` Jeff King
  2021-08-12 20:22       ` Jeff King
  1 sibling, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-08-12 20:15 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Tue, Jul 27, 2021 at 05:19:46PM -0400, Taylor Blau wrote:

> Opening multiple instances of the same MIDX can lead to problems like two
> separate packed_git structures which represent the same pack being added
> to the repository's object store.
> [...]

Thanks, I think this approach fixes all of the potential problems from
our earlier discussion. You already noted the "!ctx->m" thing in a
follow-up. But also...

> Likewise, replace the call to `close_midx()` with
> `close_object_store()`, since we're about to replace the MIDX with a new
> one and should invalidate the object store's memory of any MIDX that
> might have existed beforehand.

Yes, I agree we need to do this, but I don't see the change in the
patch. Did something get lost in the rebasing/squashing process?

I think we'd need something like this:

diff --git a/midx.c b/midx.c
index 6dfafe7a8c..bfb6afea2e 100644
--- a/midx.c
+++ b/midx.c
@@ -1123,8 +1123,7 @@ static int write_midx_internal(const char *object_dir,
 	hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR);
 	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
 
-	if (ctx.m)
-		close_midx(ctx.m);
+	close_object_store(the_repository->objects);
 
 	if (ctx.nr - dropped_packs == 0) {
 		error(_("no pack files to index."));

though I'm not sure:

 - if this should be unconditional or dependent on ctx.m (I think the
   latter, because if we are renaming over any open midx, we would have
   filled in ctx.m earlier).

 - if this should go below the "no pack files to index" check (i.e., is
   there any point in closing if we know we will not write?). In fact,
   its purpose might be more obvious right before finalize_hashfile(),
   but I am OK either way on that.

-Peff

^ permalink raw reply related	[flat|nested] 273+ messages in thread

* Re: [PATCH v3 09/25] midx: avoid opening multiple MIDXs when writing
  2021-08-12 20:15     ` Jeff King
@ 2021-08-12 20:22       ` Jeff King
  2021-08-12 21:20         ` Taylor Blau
  0 siblings, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-08-12 20:22 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Thu, Aug 12, 2021 at 04:15:32PM -0400, Jeff King wrote:

> I think we'd need something like this:
> 
> diff --git a/midx.c b/midx.c
> index 6dfafe7a8c..bfb6afea2e 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -1123,8 +1123,7 @@ static int write_midx_internal(const char *object_dir,
>  	hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR);
>  	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
>  
> -	if (ctx.m)
> -		close_midx(ctx.m);
> +	close_object_store(the_repository->objects);
>  
>  	if (ctx.nr - dropped_packs == 0) {
>  		error(_("no pack files to index."));
> 
> though I'm not sure:
> 
>  - if this should be unconditional or dependent on ctx.m (I think the
>    latter, because if we are renaming over any open midx, we would have
>    filled in ctx.m earlier).
> 
>  - if this should go below the "no pack files to index" check (i.e., is
>    there any point in closing if we know we will not write?). In fact,
>    its purpose might be more obvious right before finalize_hashfile(),
>    but I am OK either way on that.

Ah, this close_midx() actually gets moved and made unconditional later
in the series.  But it still needs to be close_object_store() instead.

Also, my mention of finalize_hashfile() is wrong. It's
commit_lock_file() that does the actual rename, and indeed, that's where
you moved it to in the end, which is good.

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v3 16/25] t5310: move some tests to lib-bitmap.sh
  2021-07-27 21:20   ` [PATCH v3 16/25] t5310: move some tests to lib-bitmap.sh Taylor Blau
@ 2021-08-12 20:25     ` Jeff King
  0 siblings, 0 replies; 273+ messages in thread
From: Jeff King @ 2021-08-12 20:25 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Tue, Jul 27, 2021 at 05:20:04PM -0400, Taylor Blau wrote:

> diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
> index fe3f98be24..ecb5d0e05d 100644
> --- a/t/lib-bitmap.sh
> +++ b/t/lib-bitmap.sh
> @@ -1,3 +1,6 @@
> +# Helpers for scripts testing bitamp functionality; see t5310 for
> +# example usage.

Bitamp. :) Not worth a re-roll on its own, but I think we'll want one
more round to fix the close_object_store() stuff earlier in the series.

The rest of the patch looks good.

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v3 17/25] t/helper/test-read-midx.c: add --checksum mode
  2021-07-27 21:20   ` [PATCH v3 17/25] t/helper/test-read-midx.c: add --checksum mode Taylor Blau
@ 2021-08-12 20:31     ` Jeff King
  2021-08-12 21:31       ` Taylor Blau
  0 siblings, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-08-12 20:31 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Tue, Jul 27, 2021 at 05:20:07PM -0400, Taylor Blau wrote:

> Subsequent tests will want to check for the existence of a multi-pack
> bitmap which matches the multi-pack-index stored in the pack directory.
> 
> The multi-pack bitmap includes the hex checksum of the MIDX it
> corresponds to in its filename (for example,
> '$packdir/multi-pack-index-<checksum>.bitmap'). As a result, some tests
> want a way to learn what '<checksum>' is.
> 
> This helper addresses that need by printing the checksum of the
> repository's multi-pack-index.

Makes sense. It might be nice to have a generic tool for pulling hashes
out of checksum files. Perhaps even a tool that is shipped with Git for
operating on such files (for in-the-field debugging and diagnosis). But
that can definitely be separate from this series (if ever).

>  t/helper/test-read-midx.c | 16 +++++++++++++++-
>  t/lib-bitmap.sh           |  4 ++++
>  2 files changed, 19 insertions(+), 1 deletion(-)

The patch itself looks fine to me. One curiosity:

> diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
> index ecb5d0e05d..09cd036f4d 100644
> --- a/t/lib-bitmap.sh
> +++ b/t/lib-bitmap.sh
> @@ -260,3 +260,7 @@ have_delta () {
>  	echo $1 | git cat-file --batch-check="%(deltabase)" >actual &&
>  	test_cmp expect actual
>  }
> +
> +midx_checksum () {
> +	test-tool read-midx --checksum "${1:-.git/objects}"
> +}

This default ".git/objects" will only _usually_ be the right thing. :)
If the actual C code accepted a missing object-dir, it could use the
correct object directory discovered by setup_git_directory().

Probably not a big deal either way, though.

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v3 18/25] t5326: test multi-pack bitmap behavior
  2021-07-27 21:20   ` [PATCH v3 18/25] t5326: test multi-pack bitmap behavior Taylor Blau
@ 2021-08-12 21:02     ` Jeff King
  2021-08-12 21:07       ` Jeff King
  2021-08-12 22:38       ` Taylor Blau
  0 siblings, 2 replies; 273+ messages in thread
From: Jeff King @ 2021-08-12 21:02 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Tue, Jul 27, 2021 at 05:20:10PM -0400, Taylor Blau wrote:

> diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
> new file mode 100755
> index 0000000000..c1b7d633e2
> --- /dev/null
> +++ b/t/t5326-multi-pack-bitmaps.sh
> @@ -0,0 +1,277 @@
> +#!/bin/sh
> +
> +test_description='exercise basic multi-pack bitmap functionality'
> +. ./test-lib.sh
> +. "${TEST_DIRECTORY}/lib-bitmap.sh"
> +
> +# We'll be writing our own midx and bitmaps, so avoid getting confused by the
> +# automatic ones.
> +GIT_TEST_MULTI_PACK_INDEX=0
> +GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0

This latter variable doesn't do anything at this point in the series.
Probably not a big deal (it is simply a noop until then), but if it's
not hard, it may make sense to bump the "respect ... WRITE_BITMAP" patch
earlier in the series.

> +test_expect_success 'create single-pack midx with bitmaps' '
> +	git repack -ad &&
> +	git multi-pack-index write --bitmap &&
> +	test_path_is_file $midx &&
> +	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
> +'
> +
> +basic_bitmap_tests

We can't use a midx bitmap without a .rev file. The basic_bitmap_tests
function covers that, but I wonder if we should also check:

  test_path_is_file $midx-$(midx_checksum $objdir).rev

in that first test.

> +test_expect_success 'create new additional packs' '
> +	for i in $(test_seq 1 16)
> +	do
> +		test_commit "$i" &&
> +		git repack -d
> +	done &&

This loop needs an "|| return 1" inside to catch &&-chain problems (not
that we expect "repack -d" to fail, but just on principle).

> +	git checkout -b other2 HEAD~8 &&
> +	for i in $(test_seq 1 8)
> +	do
> +		test_commit "side-$i" &&
> +		git repack -d
> +	done &&

Ditto here.

> +test_expect_success 'create multi-pack midx with bitmaps' '
> +	git multi-pack-index write --bitmap &&
> +
> +	ls $objdir/pack/pack-*.pack >packs &&
> +	test_line_count = 25 packs &&
> +
> +	test_path_is_file $midx &&
> +	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
> +'

Possible spot for checking the .rev file again (though really, it is
belt-and-suspenders at this point).

> +basic_bitmap_tests

I love how the earlier refactoring made it easy to test the single- and
multi-pack cases thoroughly.

> +test_expect_success '--no-bitmap is respected when bitmaps exist' '
> +	git multi-pack-index write --bitmap &&
> +
> +	test_commit respect--no-bitmap &&
> +	GIT_TEST_MULTI_PACK_INDEX=0 git repack -d &&

Do we need to set this env variable? We've already set it to 0 at the
top of the script.

> +	test_path_is_file $midx &&
> +	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
> +
> +	git multi-pack-index write --no-bitmap &&
> +
> +	test_path_is_file $midx &&
> +	test_path_is_missing $midx-$(midx_checksum $objdir).bitmap
> +'

OK, so we expect "--no-bitmap" to drop the bitmap (just like it does for
a regular pack bitmap). Makes sense. We probably should check:

  test_path_is_missing $midx-$(midx_checksum $objdir).rev

here, too (unlike the other spots, it isn't redundant; we could leave a
stale file around and likely nobody would notice).

> +test_expect_success 'setup midx with base from later pack' '
> +	# Write a and b so that "a" is a delta on top of base "b", since Git
> +	# prefers to delete contents out of a base rather than add to a shorter
> +	# object.
> +	test_seq 1 128 >a &&
> +	test_seq 1 130 >b &&
> +
> +	git add a b &&
> +	git commit -m "initial commit" &&
> +
> +	a=$(git rev-parse HEAD:a) &&
> +	b=$(git rev-parse HEAD:b) &&
> +
> +	# In the first pack, "a" is stored as a delta to "b".
> +	p1=$(git pack-objects .git/objects/pack/pack <<-EOF
> +	$a
> +	$b
> +	EOF
> +	) &&

This is brittle with respect to Git's delta heuristics, of course, but I
don't think there's a better way to do it with pack-objects. And this is
not the first test to make similar assumptions. I think you can
construct a known set of deltas using lib-pack.sh. It may get a bit
complicated. As an alternative, maybe it makes sense to confirm that the
deltas are set up as expected? You can do it with cat-file
--batch-check.

> +test_expect_success 'removing a MIDX clears stale bitmaps' '
> +	rm -fr repo &&
> +	git init repo &&
> +	test_when_finished "rm -fr repo" &&
> +	(
> +		cd repo &&
> +		test_commit base &&
> +		git repack &&
> +		git multi-pack-index write --bitmap &&
> +
> +		# Write a MIDX and bitmap; remove the MIDX but leave the bitmap.
> +		stale_bitmap=$midx-$(midx_checksum $objdir).bitmap &&
> +		rm $midx &&
> +
> +		# Then write a new MIDX.
> +		test_commit new &&
> +		git repack &&
> +		git multi-pack-index write --bitmap &&
> +
> +		test_path_is_file $midx &&
> +		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
> +		test_path_is_missing $stale_bitmap
> +	)

Another spot where we might want to check that the stale .rev file has
gone away (and optionally that the new one was written; I haven't noted
all of those, though).

> +test_expect_success 'pack.preferBitmapTips' '
> +	git init repo &&
> +	test_when_finished "rm -fr repo" &&
> +	(
> +		cd repo &&
> +
> +		test_commit_bulk --message="%s" 103 &&
> +
> +		git log --format="%H" >commits.raw &&
> +		sort <commits.raw >commits &&
> +
> +		git log --format="create refs/tags/%s %H" HEAD >refs &&
> +		git update-ref --stdin <refs &&
> +
> +		git multi-pack-index write --bitmap &&
> +		test_path_is_file $midx &&
> +		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
> +
> +		test-tool bitmap list-commits | sort >bitmaps &&
> +		comm -13 bitmaps commits >before &&
> +		test_line_count = 1 before &&
> +
> +		perl -ne "printf(\"create refs/tags/include/%d \", $.); print" \
> +			<before | git update-ref --stdin &&
> +
> +		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
> +		rm -fr $midx-$(midx_checksum $objdir).rev &&
> +		rm -fr $midx &&
> +
> +		git -c pack.preferBitmapTips=refs/tags/include \
> +			multi-pack-index write --bitmap &&
> +		test-tool bitmap list-commits | sort >bitmaps &&
> +		comm -13 bitmaps commits >after &&
> +
> +		! test_cmp before after
> +	)
> +'

OK, so we are not depending on any _specific_ commits to get bitmapped,
but just confirming that we have some impact. That may be the best we
can do given that we are subject to the bitmap code's heuristics (and
anyway, this is exactly what the pack version does).

Any other parts of the patch that I didn't quote looked very good to me.
I'm happy to have such a thorough set of tests.

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v3 18/25] t5326: test multi-pack bitmap behavior
  2021-08-12 21:02     ` Jeff King
@ 2021-08-12 21:07       ` Jeff King
  2021-08-12 22:38       ` Taylor Blau
  1 sibling, 0 replies; 273+ messages in thread
From: Jeff King @ 2021-08-12 21:07 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Thu, Aug 12, 2021 at 05:02:26PM -0400, Jeff King wrote:

> > +# We'll be writing our own midx and bitmaps, so avoid getting confused by the
> > +# automatic ones.
> > +GIT_TEST_MULTI_PACK_INDEX=0
> > +GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
> 
> This latter variable doesn't do anything at this point in the series.
> Probably not a big deal (it is simply a noop until then), but if it's
> not hard, it may make sense to bump the "respect ... WRITE_BITMAP" patch
> earlier in the series.

Reading the other patches, I guess your ordering was to "fix" each of the
tests preemptively, and then add the knob at the end. That's OK by me.
For an alternate test-mode like this, I usually wouldn't worry about
bisectability, but it doesn't hurt. Somebody reading the commits later
won't have any trouble finding the definition of the WRITE_BITMAP
variable added in the subsequent patch.

> > +test_expect_success '--no-bitmap is respected when bitmaps exist' '
> > +	git multi-pack-index write --bitmap &&
> > +
> > +	test_commit respect--no-bitmap &&
> > +	GIT_TEST_MULTI_PACK_INDEX=0 git repack -d &&
> 
> Do we need to set this env variable? We've already set it to 0 at the
> top of the script.

By the way, there were a few more of these later in the script that
could be cleaned up, too.

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v3 23/25] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  2021-07-27 21:20   ` [PATCH v3 23/25] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
@ 2021-08-12 21:09     ` Jeff King
  0 siblings, 0 replies; 273+ messages in thread
From: Jeff King @ 2021-08-12 21:09 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Tue, Jul 27, 2021 at 05:20:23PM -0400, Taylor Blau wrote:

> diff --git a/builtin/repack.c b/builtin/repack.c
> index 5f9bc74adc..82ab668272 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -515,6 +515,10 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  		if (!(pack_everything & ALL_INTO_ONE) ||
>  		    !is_bare_repository())
>  			write_bitmaps = 0;
> +	} else if (write_bitmaps &&
> +		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0) &&
> +		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0)) {
> +		write_bitmaps = 0;
>  	}

This hunk confused me for a minute, since we are turning write_bitmaps
_off_ if we see a positive "write midx bitmap". But I guess the point is
to turn off the pack bitmap, and then in the later hunk:

> @@ -725,8 +729,12 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  		update_server_info(0);
>  	remove_temporary_files();
>  
> -	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
> -		write_midx_file(get_object_directory(), NULL, 0);
> +	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0)) {
> +		unsigned flags = 0;
> +		if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0))
> +			flags |= MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX;
> +		write_midx_file(get_object_directory(), NULL, flags);
> +	}

...we'd turn on the midx one. Makes sense.
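
The interaction of the two hunks can be modeled as a small pure
function. This is only a sketch under stated assumptions: it takes
`write_bitmaps` as already resolved to 0 or 1 (so the earlier branch of
the first hunk is not modeled), the two test knobs are passed in as
plain booleans instead of being read from the environment, and
`struct repack_plan`, `plan_repack()` and the `FAKE_*` flag values are
hypothetical.

```c
#include <assert.h>

/* Hypothetical flag values; the real MIDX_WRITE_* values may differ. */
#define FAKE_MIDX_WRITE_REV_INDEX (1u << 0)
#define FAKE_MIDX_WRITE_BITMAP    (1u << 1)

struct repack_plan {
	int write_pack_bitmap; /* does the pack itself get a bitmap? */
	int write_midx;        /* is a MIDX written after repacking? */
	unsigned midx_flags;   /* flags passed to the MIDX writer */
};

static struct repack_plan plan_repack(int write_bitmaps,
				      int test_midx,
				      int test_midx_bitmap)
{
	struct repack_plan plan = { write_bitmaps, 0, 0 };

	assert(write_bitmaps == 0 || write_bitmaps == 1);

	/* First hunk: both knobs set, so the pack bitmap is turned off. */
	if (write_bitmaps && test_midx && test_midx_bitmap)
		plan.write_pack_bitmap = 0;

	/* Second hunk: the MIDX is written, optionally with a bitmap. */
	if (test_midx) {
		plan.write_midx = 1;
		if (test_midx_bitmap)
			plan.midx_flags |= FAKE_MIDX_WRITE_BITMAP |
					   FAKE_MIDX_WRITE_REV_INDEX;
	}
	return plan;
}
```

In this model, setting both knobs moves the bitmap from the pack to the
MIDX, while setting only the MIDX knob leaves the pack bitmap alone and
writes a plain MIDX.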

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v3 25/25] p5326: perf tests for MIDX bitmaps
  2021-07-27 21:20   ` [PATCH v3 25/25] p5326: perf tests for MIDX bitmaps Taylor Blau
@ 2021-08-12 21:18     ` Jeff King
  0 siblings, 0 replies; 273+ messages in thread
From: Jeff King @ 2021-08-12 21:18 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Tue, Jul 27, 2021 at 05:20:28PM -0400, Taylor Blau wrote:

> These new performance tests demonstrate effectively the same behavior as
> p5310, but use a multi-pack bitmap instead of a single-pack one.
> 
> Notably, p5326 does not create a MIDX bitmap with multiple packs. This
> is so we can measure a direct comparison between it and p5310. Any
> difference between the two is measuring just the overhead of using MIDX
> bitmaps.
> 
> Here are the results of p5310 and p5326 together, measured at the same
> time and on the same machine (using a Xeon W-2255 CPU):

Neat. I think having separate perf regression tests for regular and midx
bitmaps will be useful, but being able to compare the pack and midx
versions is a cherry on top.

There was one funny number:

>     5310.2: repack to disk                                96.78(93.39+11.33)
>     5326.2: setup multi-pack index                        78.99(75.29+11.58)

In p5310, that step is repacking and writing bitmaps. With the midx,
it's repacking, then writing a midx with bitmaps. I'd expect the latter
to be strictly slower than the former, but here it's faster.

Running the code locally, I got similar results (with p5310 just a tiny
bit faster). So it may have just been noise or some other timing issue.

  As an aside, I think that test is a little bit bogus due to
  GIT_PERF_REPEAT_COUNT; the first trial will generate bitmaps from
  scratch, and then subsequent runs will reuse partial results. It
  probably should "rm -f .git/objects/pack/*.bitmap" within the test. We can
  deal with that separately, though.

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v3 09/25] midx: avoid opening multiple MIDXs when writing
  2021-08-12 20:22       ` Jeff King
@ 2021-08-12 21:20         ` Taylor Blau
  0 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-12 21:20 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, dstolee, gitster, jonathantanmy

On Thu, Aug 12, 2021 at 04:22:29PM -0400, Jeff King wrote:
> On Thu, Aug 12, 2021 at 04:15:32PM -0400, Jeff King wrote:
>
> > I think we'd need something like this:
> >
> > diff --git a/midx.c b/midx.c
> > index 6dfafe7a8c..bfb6afea2e 100644
> > --- a/midx.c
> > +++ b/midx.c
> > @@ -1123,8 +1123,7 @@ static int write_midx_internal(const char *object_dir,
> >  	hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR);
> >  	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
> >
> > -	if (ctx.m)
> > -		close_midx(ctx.m);
> > +	close_object_store(the_repository->objects);
> >
> >  	if (ctx.nr - dropped_packs == 0) {
> >  		error(_("no pack files to index."));
> >
> > though I'm not sure:
> >
> >  - if this should be unconditional or dependent on ctx.m (I think the
> >    latter, because if we are renaming over any open midx, we would have
> >    filled in ctx.m earlier).
> >
> >  - if this should go below the "no pack files to index" check (i.e., is
> >    there any point in closing if we know we will not write?). In fact,
> >    its purpose might be more obvious right before finalize_hashfile(),
> >    but I am OK either way on that.
>
> Ah, this close_midx() actually gets moved and made unconditional later
> in the series.  But it still needs to be close_object_store() instead.

Exactly; this first patch should read:

    if (ctx.m)
      close_object_store(the_repository->objects);

and then in the later patch (15/25) we drop the conditional and move our
call down until after the MIDX bitmap is written, but before we call
commit_lock_file().

Thanks,
Taylor


* Re: [PATCH v3 00/25] multi-pack reachability bitmaps
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
                     ` (24 preceding siblings ...)
  2021-07-27 21:20   ` [PATCH v3 25/25] p5326: perf tests for MIDX bitmaps Taylor Blau
@ 2021-08-12 21:21   ` Jeff King
  2021-08-12 22:41     ` Taylor Blau
  25 siblings, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-08-12 21:21 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Tue, Jul 27, 2021 at 05:19:21PM -0400, Taylor Blau wrote:

> Thanks in advance for your review. I think Peff still wanted to read through
> patches 16-25, but that the first 15 or so should be in pretty good shape by
> now.

I think this is looking pretty close. There's the close_midx() thing
discussed in patch 9 that I think we need to deal with. In the tests, I
found some little nits. Nothing serious, but some of it at least is
worth fixing.

So I think with one more fairly trivial re-roll, we can think about
merging this to 'next'.

Thanks for your patience with my slow reviews, and for all your work on
this. It's really quite a complicated topic. :)

-Peff


* Re: [PATCH v3 17/25] t/helper/test-read-midx.c: add --checksum mode
  2021-08-12 20:31     ` Jeff King
@ 2021-08-12 21:31       ` Taylor Blau
  0 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-12 21:31 UTC (permalink / raw)
  To: Jeff King; +Cc: git, dstolee, gitster, jonathantanmy

On Thu, Aug 12, 2021 at 04:31:18PM -0400, Jeff King wrote:
> On Tue, Jul 27, 2021 at 05:20:07PM -0400, Taylor Blau wrote:
>
> > Subsequent tests will want to check for the existence of a multi-pack
> > bitmap which matches the multi-pack-index stored in the pack directory.
> >
> > The multi-pack bitmap includes the hex checksum of the MIDX it
> > corresponds to in its filename (for example,
> > '$packdir/multi-pack-index-<checksum>.bitmap'). As a result, some tests
> > want a way to learn what '<checksum>' is.
> >
> > This helper addresses that need by printing the checksum of the
> > repository's multi-pack-index.
>
> Makes sense. It might be nice to have a generic tool for pulling hashes
> out of checksum files. Perhaps even a tool that is shipped with Git for
> operating on such files (for in-the-field debugging and diagnosis). But
> that can definitely be separate from this series (if ever).

Yeah. That would definitely be in the spirit of "we should have more
test-tool-like helpers exposed via user-facing plumbing". I agree that
it would be nice, but it's a topic for a later date ;).

> >  t/helper/test-read-midx.c | 16 +++++++++++++++-
> >  t/lib-bitmap.sh           |  4 ++++
> >  2 files changed, 19 insertions(+), 1 deletion(-)
>
> The patch itself looks fine to me. One curiosity:
>
> > diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
> > index ecb5d0e05d..09cd036f4d 100644
> > --- a/t/lib-bitmap.sh
> > +++ b/t/lib-bitmap.sh
> > @@ -260,3 +260,7 @@ have_delta () {
> >  	echo $1 | git cat-file --batch-check="%(deltabase)" >actual &&
> >  	test_cmp expect actual
> >  }
> > +
> > +midx_checksum () {
> > +	test-tool read-midx --checksum "${1:-.git/objects}"
> > +}
>
> This default ".git/objects" will only _usually_ be the right thing. :)
> If the actual C code accepted a missing object-dir, it could use the
> correct object directory discovered by setup_git_directory().
>
> Probably not a big deal either way, though.

Yeah. We could just sidestep the whole thing by not having the
`.git/objects` default, since all callers of midx_checksum pass an
argument, so that fallback is dead code anyway. Thanks for noting, I'll
remove it for the next round.

Thanks,
Taylor


* Re: [PATCH v3 18/25] t5326: test multi-pack bitmap behavior
  2021-08-12 21:02     ` Jeff King
  2021-08-12 21:07       ` Jeff King
@ 2021-08-12 22:38       ` Taylor Blau
  2021-08-12 23:23         ` Jeff King
  1 sibling, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-08-12 22:38 UTC (permalink / raw)
  To: Jeff King; +Cc: git, dstolee, gitster, jonathantanmy

On Thu, Aug 12, 2021 at 05:02:26PM -0400, Jeff King wrote:
> On Tue, Jul 27, 2021 at 05:20:10PM -0400, Taylor Blau wrote:
>
> > diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
> > new file mode 100755
> > index 0000000000..c1b7d633e2
> > --- /dev/null
> > +++ b/t/t5326-multi-pack-bitmaps.sh
> > @@ -0,0 +1,277 @@
> > +#!/bin/sh
> > +
> > +test_description='exercise basic multi-pack bitmap functionality'
> > +. ./test-lib.sh
> > +. "${TEST_DIRECTORY}/lib-bitmap.sh"
> > +
> > +# We'll be writing our own midx and bitmaps, so avoid getting confused by the
> > +# automatic ones.
> > +GIT_TEST_MULTI_PACK_INDEX=0
> > +GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
>
> This latter variable doesn't do anything at this point in the series.
> Probably not a big deal (it is simply a noop until then), but if it's
> not hard, it may make sense to bump the "respect ... WRITE_BITMAP" patch
> earlier in the series.

If my memory serves me correctly, I think the very first version of this
patch didn't have a GIT_TEST_MULTI_PACK_INDEX{,_WRITE_BITMAP}=0 at the
top, and so individual invocations needed to set it in their own
environment. Presumably at some point I added this, but forgot to clean
up the redundant ones. I removed the ones you mentioned in your
response, and a few others.

> > +test_expect_success 'create single-pack midx with bitmaps' '
> > +	git repack -ad &&
> > +	git multi-pack-index write --bitmap &&
> > +	test_path_is_file $midx &&
> > +	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
> > +'
> > +
> > +basic_bitmap_tests
>
> We can't use a midx bitmap without a .rev file. The basic_bitmap_tests
> function covers that, but I wonder if we should also check:
>
>   test_path_is_file $midx-$(midx_checksum $objdir).rev
>
> in that first test.

Good idea. These tests probably preceded the invention of .rev files, so
a lot of them needed updating. I made sure to add them where
appropriate.
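
For anyone following along, the naming that these checks rely on can be
sketched roughly as below. This is only an illustration: midx_aux_paths
is a hypothetical helper (it is not part of lib-bitmap.sh), and the
checksum is a placeholder rather than real output from
"test-tool read-midx --checksum".

```shell
# Hypothetical helper (not in lib-bitmap.sh): given a pack directory and
# a MIDX checksum, print the names of the two auxiliary files that the
# tests expect to find next to the multi-pack-index.
midx_aux_paths () {
	packdir=$1
	checksum=$2
	printf '%s\n' \
		"$packdir/multi-pack-index-$checksum.bitmap" \
		"$packdir/multi-pack-index-$checksum.rev"
}

# Placeholder checksum, for illustration only.
midx_aux_paths .git/objects/pack 0123abcd
```

Both names share the same checksum, which is why the tests can derive
the .rev path from the same midx_checksum call as the .bitmap path.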

> > +test_expect_success 'create new additional packs' '
> > +	for i in $(test_seq 1 16)
> > +	do
> > +		test_commit "$i" &&
> > +		git repack -d
> > +	done &&
>
> This loop needs an "|| return 1" inside to catch &&-chain problems (not
> that we expect "repack -d" to fail, but just on principle).

Nice catch, thanks.
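
To spell out why the loop needs it, here is a minimal sketch outside of
the test harness. The names step, loop_unchecked and loop_checked are
made up for illustration; step stands in for a command like
"git repack -d" that happens to fail on one iteration.

```shell
# Hypothetical stand-in for a command that fails only on its second call.
step () {
	test "$1" != 2
}

# Without "|| return 1": the loop's exit status is that of the last
# iteration, so the failure on iteration 2 is silently lost.
loop_unchecked () {
	for i in 1 2 3
	do
		step "$i"
	done &&
	echo "unchecked: continued past the loop"
}

# With "|| return 1": the first failing iteration aborts the function,
# so the rest of the &&-chain never runs.
loop_checked () {
	for i in 1 2 3
	do
		step "$i" || return 1
	done &&
	echo "checked: continued past the loop"
}

if loop_unchecked
then
	echo "unchecked: reported success"
fi
if ! loop_checked
then
	echo "checked: reported failure"
fi
```

The "return" works in our tests because the body of
test_expect_success is evaluated inside a function.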

> I love how the earlier refactoring made it easy to test the single- and
> multi-pack cases thoroughly.

Likewise :-).

> > +test_expect_success 'setup midx with base from later pack' '
> > +	# Write a and b so that "a" is a delta on top of base "b", since Git
> > +	# prefers to delete contents out of a base rather than add to a shorter
> > +	# object.
> > +	test_seq 1 128 >a &&
> > +	test_seq 1 130 >b &&
> > +
> > +	git add a b &&
> > +	git commit -m "initial commit" &&
> > +
> > +	a=$(git rev-parse HEAD:a) &&
> > +	b=$(git rev-parse HEAD:b) &&
> > +
> > +	# In the first pack, "a" is stored as a delta to "b".
> > +	p1=$(git pack-objects .git/objects/pack/pack <<-EOF
> > +	$a
> > +	$b
> > +	EOF
> > +	) &&
>
> This is brittle with respect to Git's delta heuristics, of course, but I
> don't think there's a better way to do it with pack-objects. And this is
> not the first test to make similar assumptions. I think you can
> construct a known set of deltas using lib-pack.sh. It may get a bit
> complicated. As an alternative, maybe it makes sense to confirm that the
> deltas are set up as expected? You can do it with cat-file
> --batch-check.

Yeah, I definitely agree that this test is brittle. But it would fail if
our assumptions about what gets delta'd with what changes, because we do
check that 'a' is a delta on top of 'b' (see the call to have_delta
towards the end of this test). That have_delta helper does use
`--batch-check=%(deltabase)`, which is (I think) the cat-file invocation
you're mentioning.

Thanks,
Taylor


* Re: [PATCH v3 00/25] multi-pack reachability bitmaps
  2021-08-12 21:21   ` [PATCH v3 00/25] multi-pack reachability bitmaps Jeff King
@ 2021-08-12 22:41     ` Taylor Blau
  0 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-12 22:41 UTC (permalink / raw)
  To: Jeff King; +Cc: git, dstolee, gitster, jonathantanmy

On Thu, Aug 12, 2021 at 05:21:44PM -0400, Jeff King wrote:
> On Tue, Jul 27, 2021 at 05:19:21PM -0400, Taylor Blau wrote:
>
> > Thanks in advance for your review. I think Peff still wanted to read through
> > patches 16-25, but that the first 15 or so should be in pretty good shape by
> > now.
>
> I think this is looking pretty close. There's the close_midx() thing
> discussed in patch 9 that I think we need to deal with. In the tests, I
> found some little nits. Nothing serious, but some of it at least is
> worth fixing.

Thanks; I think I fixed everything up that you mentioned. The big deals
were about close_object_store() versus close_midx(), and the changes to
t5326 in patch 18/25. I think the remaining comments on patches 19-25
were thinking aloud instead of recommending any changes.

> So I think with one more fairly trivial re-roll, we can think about
> merging this to 'next'.

Great. I have all of that prepared locally, but I'll wait until after
Monday to send it since I don't want to dilute the conversation away
from release hardening (I certainly don't mind your review, I just
figure folks would appreciate me *not* sending 25 new messages to their
inboxes ;-)).

> Thanks for your patience with my slow reviews, and for all your work on
> this. It's really quite a complicated topic. :)

Thanks for reviewing. I tried my best to break this series up from all
of the ones that it depends on, but there was only so much I could do to
isolate the complexity. I'm glad that it had a thorough set of eyes on
it.

I'll look forward to sending a hopefully-final reroll shortly after the
release so we can move on to a few more cosmetic topics on top to
integrate MIDX bitmaps more closely with `repack` and so on.

Thanks,
Taylor


* Re: [PATCH v3 18/25] t5326: test multi-pack bitmap behavior
  2021-08-12 22:38       ` Taylor Blau
@ 2021-08-12 23:23         ` Jeff King
  0 siblings, 0 replies; 273+ messages in thread
From: Jeff King @ 2021-08-12 23:23 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Thu, Aug 12, 2021 at 06:38:20PM -0400, Taylor Blau wrote:

> > This is brittle with respect to Git's delta heuristics, of course, but I
> > don't think there's a better way to do it with pack-objects. And this is
> > not the first test to make similar assumptions. I think you can
> > construct a known set of deltas using lib-pack.sh. It may get a bit
> > complicated. As an alternative, maybe it makes sense to confirm that the
> > deltas are set up as expected? You can do it with cat-file
> > --batch-check.
> 
> Yeah, I definitely agree that this test is brittle. But it would fail if
> our assumptions about what gets delta'd with what changes, because we do
> check that 'a' is a delta on top of 'b' (see the call to have_delta
> towards the end of this test). That have_delta helper does use
> `--batch-check=%(deltabase)`, which is (I think) the cat-file invocation
> you're mentioning.

Doh, I totally missed that. I was expecting to verify it earlier in the
test as a pre-condition, but it works just fine where it is. So yeah,
you are already doing the thing I was suggesting.

-Peff


* [PATCH v4 00/25] multi-pack reachability bitmaps
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (23 preceding siblings ...)
  2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
@ 2021-08-24 16:15 ` Taylor Blau
  2021-08-24 16:15   ` [PATCH v4 01/25] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps Taylor Blau
                     ` (25 more replies)
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
  25 siblings, 26 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 16:15 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Here is what I anticipate to be a final reroll of my series to implement
multi-pack reachability bitmaps, based on review feedback from Peff.

Most of the changes since last time are cosmetic test clean-ups. The previous
reroll of this series incorporated feedback from a discussion[1] surrounding the
`multi-pack-index` builtin's `--object-dir` argument. This reroll fixes a bug
discussed here[2] where we should have been calling close_object_store() but
weren't; the remainder of that bug has already been dealt with.

Thanks everybody for dealing with multiple versions of this quite lengthy and
complicated series. Hopefully we are done in this round and can move on to
integrating this with `git repack`, which will complete the MIDX bitmaps topic.

[1]: https://lore.kernel.org/git/YQMFIljXl7sAAA%2FL@nand.local/
[2]: https://lore.kernel.org/git/YRWBZJDCVyUOhk2F@coredump.intra.peff.net/

Jeff King (2):
  t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
  t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP

Taylor Blau (23):
  pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps
  pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
  pack-bitmap-write.c: free existing bitmaps
  Documentation: describe MIDX-based bitmaps
  midx: clear auxiliary .rev after replacing the MIDX
  midx: reject empty `--preferred-pack`'s
  midx: infer preferred pack when not given one
  midx: close linked MIDXs, avoid leaking memory
  midx: avoid opening multiple MIDXs when writing
  pack-bitmap.c: introduce 'bitmap_num_objects()'
  pack-bitmap.c: introduce 'nth_bitmap_object_oid()'
  pack-bitmap.c: introduce 'bitmap_is_preferred_refname()'
  pack-bitmap.c: avoid redundant calls to try_partial_reuse
  pack-bitmap: read multi-pack bitmaps
  pack-bitmap: write multi-pack bitmaps
  t5310: move some tests to lib-bitmap.sh
  t/helper/test-read-midx.c: add --checksum mode
  t5326: test multi-pack bitmap behavior
  t5319: don't write MIDX bitmaps in t5319
  t7700: update to work with MIDX bitmap test knob
  midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  p5310: extract full and partial bitmap tests
  p5326: perf tests for MIDX bitmaps

 Documentation/git-multi-pack-index.txt       |  18 +-
 Documentation/technical/bitmap-format.txt    |  71 ++-
 Documentation/technical/multi-pack-index.txt |  10 +-
 builtin/multi-pack-index.c                   |   2 +
 builtin/pack-objects.c                       |   8 +-
 builtin/repack.c                             |  12 +-
 ci/run-build-and-tests.sh                    |   1 +
 midx.c                                       | 319 +++++++++++-
 midx.h                                       |   5 +
 pack-bitmap-write.c                          |  79 ++-
 pack-bitmap.c                                | 499 ++++++++++++++++---
 pack-bitmap.h                                |   9 +-
 packfile.c                                   |   2 +-
 t/README                                     |   4 +
 t/helper/test-read-midx.c                    |  16 +-
 t/lib-bitmap.sh                              | 240 +++++++++
 t/perf/lib-bitmap.sh                         |  69 +++
 t/perf/p5310-pack-bitmaps.sh                 |  65 +--
 t/perf/p5326-multi-pack-bitmaps.sh           |  43 ++
 t/t0410-partial-clone.sh                     |  12 +-
 t/t5310-pack-bitmaps.sh                      | 231 +--------
 t/t5319-multi-pack-index.sh                  |  20 +-
 t/t5326-multi-pack-bitmaps.sh                | 286 +++++++++++
 t/t7700-repack.sh                            |  18 +-
 24 files changed, 1603 insertions(+), 436 deletions(-)
 create mode 100644 t/perf/lib-bitmap.sh
 create mode 100755 t/perf/p5326-multi-pack-bitmaps.sh
 create mode 100755 t/t5326-multi-pack-bitmaps.sh

Range-diff against v3:
 1:  fa4cbed48e =  1:  92dc0bbc0d pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps
 2:  2b15c1fc5c =  2:  979276bc74 pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
 3:  2ad513a230 =  3:  8f00493955 pack-bitmap-write.c: free existing bitmaps
 4:  8da5de7c24 =  4:  bc7db926d8 Documentation: describe MIDX-based bitmaps
 5:  49297f57ed =  5:  771741844b midx: clear auxiliary .rev after replacing the MIDX
 6:  c5513f2a75 =  6:  dab5dbf228 midx: reject empty `--preferred-pack`'s
 7:  53ef0a6d67 =  7:  31f4517de0 midx: infer preferred pack when not given one
 8:  114773d9cd =  8:  aa3bd96d9b midx: close linked MIDXs, avoid leaking memory
 9:  40cff5beb5 !  9:  c9fea31fa8 midx: avoid opening multiple MIDXs when writing
    @@ Commit message
         one and should invalidate the object store's memory of any MIDX that
         might have existed beforehand.
     
    +    Note that this now forbids passing object directories that don't belong
    +    to alternate repositories over `--object-dir`, since before we would
    +    have happily opened a MIDX in any directory, but now restrict ourselves
    +    to only those reachable by `r->objects->multi_pack_index` (and alternate
    +    MIDXs that we can see by walking the `next` pointer).
    +
    +    As far as I can tell, supporting arbitrary directories with
    +    `--object-dir` was a historical accident, since even the documentation
    +    says `<alt>` when referring to the value passed to this option.
    +
    +    A future patch could clean this up and provide a warning() when a
    +    non-alternate directory was given, since we'll still write a new MIDX
    +    there, we just won't reuse any MIDX that might happen to already exist
    +    in that directory.
    +
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## midx.c ##
    @@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack
     +			break;
     +		}
     +	}
    -+	if (!ctx.m)
    -+		ctx.m = get_local_multi_pack_index(the_repository);
      
      	if (ctx.m && !midx_checksum_valid(ctx.m)) {
      		warning(_("ignoring existing multi-pack-index; checksum mismatch"));
    +@@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack_index *
    + 	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
    + 
    + 	if (ctx.m)
    +-		close_midx(ctx.m);
    ++		close_object_store(the_repository->objects);
    + 
    + 	if (ctx.nr - dropped_packs == 0) {
    + 		error(_("no pack files to index."));
     @@ midx.c: int write_midx_file(const char *object_dir,
      		    const char *preferred_pack_name,
      		    unsigned flags)
10:  ca7f726abf ! 10:  ee72fb7e38 pack-bitmap.c: introduce 'bitmap_num_objects()'
    @@ pack-bitmap.c: static void show_extended_objects(struct bitmap_index *bitmap_git
      
      		obj = eindex->objects[i];
     @@ pack-bitmap.c: static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
    - 	 * individually.
    + 	 * them individually.
      	 */
      	for (i = 0; i < eindex->count; i++) {
     -		uint32_t pos = i + bitmap_git->pack->num_objects;
11:  67e6897a34 = 11:  ede0bf1ce1 pack-bitmap.c: introduce 'nth_bitmap_object_oid()'
12:  743a1a138e = 12:  df6844def0 pack-bitmap.c: introduce 'bitmap_is_preferred_refname()'
13:  a3b641b3e6 = 13:  4e06f051a7 pack-bitmap.c: avoid redundant calls to try_partial_reuse
14:  141ff83275 = 14:  a0d73eb3d3 pack-bitmap: read multi-pack bitmaps
15:  54600b5814 ! 15:  9d83ad77ab pack-bitmap: write multi-pack bitmaps
    @@ midx.c: static int write_midx_internal(const char *object_dir,
      	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
      
     -	if (ctx.m)
    --		close_midx(ctx.m);
    +-		close_object_store(the_repository->objects);
     -
      	if (ctx.nr - dropped_packs == 0) {
      		error(_("no pack files to index."));
    @@ midx.c: static int write_midx_internal(const char *object_dir,
     +		}
     +	}
     +
    -+	close_midx(ctx.m);
    ++	close_object_store(the_repository->objects);
      
      	commit_lock_file(&lk);
      
16:  168b7b0976 ! 16:  a92af89884 t5310: move some tests to lib-bitmap.sh
    @@ Commit message
     
      ## t/lib-bitmap.sh ##
     @@
    -+# Helpers for scripts testing bitamp functionality; see t5310 for
    ++# Helpers for scripts testing bitmap functionality; see t5310 for
     +# example usage.
     +
      # Compare a file containing rev-list bitmap traversal output to its non-bitmap
17:  60ec8b3466 ! 17:  d47aa4a919 t/helper/test-read-midx.c: add --checksum mode
    @@ t/lib-bitmap.sh: have_delta () {
      }
     +
     +midx_checksum () {
    -+	test-tool read-midx --checksum "${1:-.git/objects}"
    ++	test-tool read-midx --checksum "$1"
     +}
18:  3258ccfc1c ! 18:  9d9d9f28a6 t5326: test multi-pack bitmap behavior
    @@ t/t5326-multi-pack-bitmaps.sh (new)
     +	git repack -ad &&
     +	git multi-pack-index write --bitmap &&
     +	test_path_is_file $midx &&
    -+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
    ++	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
    ++	test_path_is_file $midx-$(midx_checksum $objdir).rev
     +'
     +
     +basic_bitmap_tests
    @@ t/t5326-multi-pack-bitmaps.sh (new)
     +	for i in $(test_seq 1 16)
     +	do
     +		test_commit "$i" &&
    -+		git repack -d
    ++		git repack -d || return 1
     +	done &&
     +
     +	git checkout -b other2 HEAD~8 &&
     +	for i in $(test_seq 1 8)
     +	do
     +		test_commit "side-$i" &&
    -+		git repack -d
    ++		git repack -d || return 1
     +	done &&
     +	git checkout second
     +'
    @@ t/t5326-multi-pack-bitmaps.sh (new)
     +	test_line_count = 25 packs &&
     +
     +	test_path_is_file $midx &&
    -+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
    ++	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
    ++	test_path_is_file $midx-$(midx_checksum $objdir).rev
     +'
     +
     +basic_bitmap_tests
    @@ t/t5326-multi-pack-bitmaps.sh (new)
     +	git multi-pack-index write --bitmap &&
     +
     +	test_commit respect--no-bitmap &&
    -+	GIT_TEST_MULTI_PACK_INDEX=0 git repack -d &&
    ++	git repack -d &&
     +
     +	test_path_is_file $midx &&
     +	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
    ++	test_path_is_file $midx-$(midx_checksum $objdir).rev &&
     +
     +	git multi-pack-index write --no-bitmap &&
     +
     +	test_path_is_file $midx &&
    -+	test_path_is_missing $midx-$(midx_checksum $objdir).bitmap
    ++	test_path_is_missing $midx-$(midx_checksum $objdir).bitmap &&
    ++	test_path_is_missing $midx-$(midx_checksum $objdir).rev
     +'
     +
     +test_expect_success 'setup midx with base from later pack' '
    @@ t/t5326-multi-pack-bitmaps.sh (new)
     +			git config core.multiPackIndex true &&
     +			if test "MIDX" = "$from"
     +			then
    -+				GIT_TEST_MULTI_PACK_INDEX=0 git repack -Ad &&
    ++				git repack -Ad &&
     +				git multi-pack-index write --bitmap
     +			else
    -+				GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb
    ++				git repack -Adb
     +			fi
     +		)
     +	'
    @@ t/t5326-multi-pack-bitmaps.sh (new)
     +
     +			if test "MIDX" = "$to"
     +			then
    -+				GIT_TEST_MULTI_PACK_INDEX=0 git repack -d &&
    ++				git repack -d &&
     +				git multi-pack-index write --bitmap
     +			else
    -+				GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb
    ++				git repack -Adb
     +			fi
     +		)
     +	'
    @@ t/t5326-multi-pack-bitmaps.sh (new)
     +	test_commit loose &&
     +	git multi-pack-index write --bitmap 2>err &&
     +	test_path_is_file $midx &&
    -+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
    ++	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
    ++	test_path_is_file $midx-$(midx_checksum $objdir).rev
     +'
     +
     +basic_bitmap_tests HEAD~
    @@ t/t5326-multi-pack-bitmaps.sh (new)
     +
     +		# Write a MIDX and bitmap; remove the MIDX but leave the bitmap.
     +		stale_bitmap=$midx-$(midx_checksum $objdir).bitmap &&
    ++		stale_rev=$midx-$(midx_checksum $objdir).rev &&
     +		rm $midx &&
     +
     +		# Then write a new MIDX.
    @@ t/t5326-multi-pack-bitmaps.sh (new)
     +
     +		test_path_is_file $midx &&
     +		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
    -+		test_path_is_missing $stale_bitmap
    ++		test_path_is_file $midx-$(midx_checksum $objdir).rev &&
    ++		test_path_is_missing $stale_bitmap &&
    ++		test_path_is_missing $stale_rev
     +	)
     +'
     +
    @@ t/t5326-multi-pack-bitmaps.sh (new)
     +		git multi-pack-index write --bitmap &&
     +		test_path_is_file $midx &&
     +		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
    ++		test_path_is_file $midx-$(midx_checksum $objdir).rev &&
     +
     +		test-tool bitmap list-commits | sort >bitmaps &&
     +		comm -13 bitmaps commits >before &&
19:  47c7e6bb9b = 19:  3e0da7e5ed t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
20:  6a708858b1 = 20:  4e0d49a2dd t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
21:  1eaa744b24 = 21:  47eba8ecf9 t5319: don't write MIDX bitmaps in t5319
22:  a4a899e31f = 22:  3d78afa2ad t7700: update to work with MIDX bitmap test knob
23:  50865e52a3 = 23:  c2f94e033d midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
24:  0f1fd6e7d4 = 24:  6b03016c99 p5310: extract full and partial bitmap tests
25:  82e8133bf4 = 25:  d98faa4c2c p5326: perf tests for MIDX bitmaps
-- 
2.31.1.163.ga65ce7f831


* [PATCH v4 01/25] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
@ 2021-08-24 16:15   ` Taylor Blau
  2021-08-24 16:15   ` [PATCH v4 02/25] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps Taylor Blau
                     ` (24 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 16:15 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

The special `--test-bitmap` mode of `git rev-list` is used to compare
the result of an object traversal with a bitmap to check its integrity.
This mode does not, however, assert that the types of reachable objects
are stored correctly.

Harden this mode by teaching it to also check that each time an object's
bit is marked, the corresponding bit should be set in exactly one of the
type bitmaps (whose type matches the object's true type).

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index d999616c9e..9b11af87aa 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1325,10 +1325,52 @@ void count_bitmap_commit_list(struct bitmap_index *bitmap_git,
 struct bitmap_test_data {
 	struct bitmap_index *bitmap_git;
 	struct bitmap *base;
+	struct bitmap *commits;
+	struct bitmap *trees;
+	struct bitmap *blobs;
+	struct bitmap *tags;
 	struct progress *prg;
 	size_t seen;
 };
 
+static void test_bitmap_type(struct bitmap_test_data *tdata,
+			     struct object *obj, int pos)
+{
+	enum object_type bitmap_type = OBJ_NONE;
+	int bitmaps_nr = 0;
+
+	if (bitmap_get(tdata->commits, pos)) {
+		bitmap_type = OBJ_COMMIT;
+		bitmaps_nr++;
+	}
+	if (bitmap_get(tdata->trees, pos)) {
+		bitmap_type = OBJ_TREE;
+		bitmaps_nr++;
+	}
+	if (bitmap_get(tdata->blobs, pos)) {
+		bitmap_type = OBJ_BLOB;
+		bitmaps_nr++;
+	}
+	if (bitmap_get(tdata->tags, pos)) {
+		bitmap_type = OBJ_TAG;
+		bitmaps_nr++;
+	}
+
+	if (bitmap_type == OBJ_NONE)
+		die("object %s not found in type bitmaps",
+		    oid_to_hex(&obj->oid));
+
+	if (bitmaps_nr > 1)
+		die("object %s does not have a unique type",
+		    oid_to_hex(&obj->oid));
+
+	if (bitmap_type != obj->type)
+		die("object %s: real type %s, expected: %s",
+		    oid_to_hex(&obj->oid),
+		    type_name(obj->type),
+		    type_name(bitmap_type));
+}
+
 static void test_show_object(struct object *object, const char *name,
 			     void *data)
 {
@@ -1338,6 +1380,7 @@ static void test_show_object(struct object *object, const char *name,
 	bitmap_pos = bitmap_position(tdata->bitmap_git, &object->oid);
 	if (bitmap_pos < 0)
 		die("Object not in bitmap: %s\n", oid_to_hex(&object->oid));
+	test_bitmap_type(tdata, object, bitmap_pos);
 
 	bitmap_set(tdata->base, bitmap_pos);
 	display_progress(tdata->prg, ++tdata->seen);
@@ -1352,6 +1395,7 @@ static void test_show_commit(struct commit *commit, void *data)
 				     &commit->object.oid);
 	if (bitmap_pos < 0)
 		die("Object not in bitmap: %s\n", oid_to_hex(&commit->object.oid));
+	test_bitmap_type(tdata, &commit->object, bitmap_pos);
 
 	bitmap_set(tdata->base, bitmap_pos);
 	display_progress(tdata->prg, ++tdata->seen);
@@ -1399,6 +1443,10 @@ void test_bitmap_walk(struct rev_info *revs)
 
 	tdata.bitmap_git = bitmap_git;
 	tdata.base = bitmap_new();
+	tdata.commits = ewah_to_bitmap(bitmap_git->commits);
+	tdata.trees = ewah_to_bitmap(bitmap_git->trees);
+	tdata.blobs = ewah_to_bitmap(bitmap_git->blobs);
+	tdata.tags = ewah_to_bitmap(bitmap_git->tags);
 	tdata.prg = start_progress("Verifying bitmap entries", result_popcnt);
 	tdata.seen = 0;
 
-- 
2.31.1.163.ga65ce7f831



* [PATCH v4 02/25] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
  2021-08-24 16:15   ` [PATCH v4 01/25] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps Taylor Blau
@ 2021-08-24 16:15   ` Taylor Blau
  2021-08-24 16:15   ` [PATCH v4 03/25] pack-bitmap-write.c: free existing bitmaps Taylor Blau
                     ` (23 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 16:15 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

The set of objects covered by a bitmap must be closed under
reachability, since it must be the case that there is a valid bit
position assigned for every possible reachable object (otherwise the
bitmaps would be incomplete).

Pack bitmaps are never written from 'git repack' unless repacking
all-into-one, and so we never write non-closed bitmaps (except in the
case of partial clones where we aren't guaranteed to have all objects).

But multi-pack bitmaps change this, since it isn't known whether the
set of objects in the MIDX is closed under reachability until walking
them. Plumb through a bit that is set when a reachable object isn't
found.

As soon as a reachable object isn't found in the set of objects to
include in the bitmap, bitmap_writer_build() knows that the set is not
closed, and so it now fails gracefully.
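
As an illustration only (this is not code from the patch), the "found"
plumbing can be sketched on a toy object graph. The names toy_object,
toy_find_pos() and toy_fill() are hypothetical stand-ins for the real
find_object_pos() and fill_bitmap_tree(); objects are plain array
indices and the "bitmap" is a byte array:

```c
#include <stddef.h>
#include <stdio.h>

/* Toy object: names up to two other objects it references (-1 = none). */
struct toy_object {
	int refs[2];
};

/*
 * Hypothetical stand-in for find_object_pos(): instead of dying, report
 * through 'found' whether 'id' is in the set being bitmapped.
 */
static unsigned toy_find_pos(const int *in_set, int id, int *found)
{
	if (!in_set[id]) {
		if (found)
			*found = 0;
		fprintf(stderr, "warning: object %d is missing\n", id);
		return 0;
	}
	if (found)
		*found = 1;
	return (unsigned)id;
}

/*
 * Mark everything reachable from 'id' in 'bits'. Returns 0 on success,
 * or -1 as soon as a reachable object is outside the set, i.e. the set
 * is not closed under reachability.
 */
static int toy_fill(const struct toy_object *objs, const int *in_set,
		    unsigned char *bits, int id)
{
	int found, i;
	unsigned pos = toy_find_pos(in_set, id, &found);

	if (!found)
		return -1;
	if (bits[pos])
		return 0; /* already marked, along with its children */
	bits[pos] = 1;

	for (i = 0; i < 2; i++) {
		if (objs[id].refs[i] < 0)
			continue;
		if (toy_fill(objs, in_set, bits, objs[id].refs[i]) < 0)
			return -1;
	}
	return 0;
}
```

A caller would treat a negative return from toy_fill() the same way
bitmap_writer_build() treats a failed fill: stop and report that no
bitmap can be written for this set.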

A test is added in t0410 to trigger a bitmap write without full
reachability closure by removing local copies of some reachable objects
from a promisor remote.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c   |  3 +-
 pack-bitmap-write.c      | 76 ++++++++++++++++++++++++++++------------
 pack-bitmap.h            |  2 +-
 t/t0410-partial-clone.sh |  9 ++++-
 4 files changed, 64 insertions(+), 26 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index de00adbb9e..8a523624a1 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1256,7 +1256,8 @@ static void write_pack_file(void)
 
 				bitmap_writer_show_progress(progress);
 				bitmap_writer_select_commits(indexed_commits, indexed_commits_nr, -1);
-				bitmap_writer_build(&to_pack);
+				if (bitmap_writer_build(&to_pack) < 0)
+					die(_("failed to write bitmap index"));
 				bitmap_writer_finish(written_list, nr_written,
 						     tmpname.buf, write_bitmap_options);
 				write_bitmap_index = 0;
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 88d9e696a5..d374f7884b 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -125,15 +125,20 @@ static inline void push_bitmapped_commit(struct commit *commit)
 	writer.selected_nr++;
 }
 
-static uint32_t find_object_pos(const struct object_id *oid)
+static uint32_t find_object_pos(const struct object_id *oid, int *found)
 {
 	struct object_entry *entry = packlist_find(writer.to_pack, oid);
 
 	if (!entry) {
-		die("Failed to write bitmap index. Packfile doesn't have full closure "
+		if (found)
+			*found = 0;
+		warning("Failed to write bitmap index. Packfile doesn't have full closure "
 			"(object %s is missing)", oid_to_hex(oid));
+		return 0;
 	}
 
+	if (found)
+		*found = 1;
 	return oe_in_pack_pos(writer.to_pack, entry);
 }
 
@@ -331,9 +336,10 @@ static void bitmap_builder_clear(struct bitmap_builder *bb)
 	bb->commits_nr = bb->commits_alloc = 0;
 }
 
-static void fill_bitmap_tree(struct bitmap *bitmap,
-			     struct tree *tree)
+static int fill_bitmap_tree(struct bitmap *bitmap,
+			    struct tree *tree)
 {
+	int found;
 	uint32_t pos;
 	struct tree_desc desc;
 	struct name_entry entry;
@@ -342,9 +348,11 @@ static void fill_bitmap_tree(struct bitmap *bitmap,
 	 * If our bit is already set, then there is nothing to do. Both this
 	 * tree and all of its children will be set.
 	 */
-	pos = find_object_pos(&tree->object.oid);
+	pos = find_object_pos(&tree->object.oid, &found);
+	if (!found)
+		return -1;
 	if (bitmap_get(bitmap, pos))
-		return;
+		return 0;
 	bitmap_set(bitmap, pos);
 
 	if (parse_tree(tree) < 0)
@@ -355,11 +363,15 @@ static void fill_bitmap_tree(struct bitmap *bitmap,
 	while (tree_entry(&desc, &entry)) {
 		switch (object_type(entry.mode)) {
 		case OBJ_TREE:
-			fill_bitmap_tree(bitmap,
-					 lookup_tree(the_repository, &entry.oid));
+			if (fill_bitmap_tree(bitmap,
+					     lookup_tree(the_repository, &entry.oid)) < 0)
+				return -1;
 			break;
 		case OBJ_BLOB:
-			bitmap_set(bitmap, find_object_pos(&entry.oid));
+			pos = find_object_pos(&entry.oid, &found);
+			if (!found)
+				return -1;
+			bitmap_set(bitmap, pos);
 			break;
 		default:
 			/* Gitlink, etc; not reachable */
@@ -368,15 +380,18 @@ static void fill_bitmap_tree(struct bitmap *bitmap,
 	}
 
 	free_tree_buffer(tree);
+	return 0;
 }
 
-static void fill_bitmap_commit(struct bb_commit *ent,
-			       struct commit *commit,
-			       struct prio_queue *queue,
-			       struct prio_queue *tree_queue,
-			       struct bitmap_index *old_bitmap,
-			       const uint32_t *mapping)
+static int fill_bitmap_commit(struct bb_commit *ent,
+			      struct commit *commit,
+			      struct prio_queue *queue,
+			      struct prio_queue *tree_queue,
+			      struct bitmap_index *old_bitmap,
+			      const uint32_t *mapping)
 {
+	int found;
+	uint32_t pos;
 	if (!ent->bitmap)
 		ent->bitmap = bitmap_new();
 
@@ -401,11 +416,16 @@ static void fill_bitmap_commit(struct bb_commit *ent,
 		 * Mark ourselves and queue our tree. The commit
 		 * walk ensures we cover all parents.
 		 */
-		bitmap_set(ent->bitmap, find_object_pos(&c->object.oid));
+		pos = find_object_pos(&c->object.oid, &found);
+		if (!found)
+			return -1;
+		bitmap_set(ent->bitmap, pos);
 		prio_queue_put(tree_queue, get_commit_tree(c));
 
 		for (p = c->parents; p; p = p->next) {
-			int pos = find_object_pos(&p->item->object.oid);
+			pos = find_object_pos(&p->item->object.oid, &found);
+			if (!found)
+				return -1;
 			if (!bitmap_get(ent->bitmap, pos)) {
 				bitmap_set(ent->bitmap, pos);
 				prio_queue_put(queue, p->item);
@@ -413,8 +433,12 @@ static void fill_bitmap_commit(struct bb_commit *ent,
 		}
 	}
 
-	while (tree_queue->nr)
-		fill_bitmap_tree(ent->bitmap, prio_queue_get(tree_queue));
+	while (tree_queue->nr) {
+		if (fill_bitmap_tree(ent->bitmap,
+				     prio_queue_get(tree_queue)) < 0)
+			return -1;
+	}
+	return 0;
 }
 
 static void store_selected(struct bb_commit *ent, struct commit *commit)
@@ -432,7 +456,7 @@ static void store_selected(struct bb_commit *ent, struct commit *commit)
 	kh_value(writer.bitmaps, hash_pos) = stored;
 }
 
-void bitmap_writer_build(struct packing_data *to_pack)
+int bitmap_writer_build(struct packing_data *to_pack)
 {
 	struct bitmap_builder bb;
 	size_t i;
@@ -441,6 +465,7 @@ void bitmap_writer_build(struct packing_data *to_pack)
 	struct prio_queue tree_queue = { NULL };
 	struct bitmap_index *old_bitmap;
 	uint32_t *mapping;
+	int closed = 1; /* until proven otherwise */
 
 	writer.bitmaps = kh_init_oid_map();
 	writer.to_pack = to_pack;
@@ -463,8 +488,11 @@ void bitmap_writer_build(struct packing_data *to_pack)
 		struct commit *child;
 		int reused = 0;
 
-		fill_bitmap_commit(ent, commit, &queue, &tree_queue,
-				   old_bitmap, mapping);
+		if (fill_bitmap_commit(ent, commit, &queue, &tree_queue,
+				       old_bitmap, mapping) < 0) {
+			closed = 0;
+			break;
+		}
 
 		if (ent->selected) {
 			store_selected(ent, commit);
@@ -499,7 +527,9 @@ void bitmap_writer_build(struct packing_data *to_pack)
 
 	stop_progress(&writer.progress);
 
-	compute_xor_offsets();
+	if (closed)
+		compute_xor_offsets();
+	return closed ? 0 : -1;
 }
 
 /**
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 99d733eb26..020cd8d868 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -87,7 +87,7 @@ struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
 				      struct commit *commit);
 void bitmap_writer_select_commits(struct commit **indexed_commits,
 		unsigned int indexed_commits_nr, int max_bitmaps);
-void bitmap_writer_build(struct packing_data *to_pack);
+int bitmap_writer_build(struct packing_data *to_pack);
 void bitmap_writer_finish(struct pack_idx_entry **index,
 			  uint32_t index_nr,
 			  const char *filename,
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index a211a66c67..bbcc51ee8e 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -536,7 +536,13 @@ test_expect_success 'gc does not repack promisor objects if there are none' '
 repack_and_check () {
 	rm -rf repo2 &&
 	cp -r repo repo2 &&
-	git -C repo2 repack $1 -d &&
+	if test x"$1" = "x--must-fail"
+	then
+		shift
+		test_must_fail git -C repo2 repack $1 -d
+	else
+		git -C repo2 repack $1 -d
+	fi &&
 	git -C repo2 fsck &&
 
 	git -C repo2 cat-file -e $2 &&
@@ -561,6 +567,7 @@ test_expect_success 'repack -d does not irreversibly delete promisor objects' '
 	printf "$THREE\n" | pack_as_from_promisor &&
 	delete_object repo "$ONE" &&
 
+	repack_and_check --must-fail -ab "$TWO" "$THREE" &&
 	repack_and_check -a "$TWO" "$THREE" &&
 	repack_and_check -A "$TWO" "$THREE" &&
 	repack_and_check -l "$TWO" "$THREE"
-- 
2.31.1.163.ga65ce7f831



* [PATCH v4 03/25] pack-bitmap-write.c: free existing bitmaps
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
  2021-08-24 16:15   ` [PATCH v4 01/25] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps Taylor Blau
  2021-08-24 16:15   ` [PATCH v4 02/25] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps Taylor Blau
@ 2021-08-24 16:15   ` Taylor Blau
  2021-08-24 16:15   ` [PATCH v4 04/25] Documentation: describe MIDX-based bitmaps Taylor Blau
                     ` (22 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 16:15 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

When writing a new bitmap, the bitmap writer code attempts to read the
existing bitmap (if one is present). This is done in order to quickly
permute the bits of any bitmaps for commits which appear in the existing
bitmap, and were also selected for the new bitmap.

But since this code was added in 341fa34887 (pack-bitmap-write: use
existing bitmaps, 2020-12-08), the resources associated with opening an
existing bitmap were never released.

It's fine to ignore this, but it's bad hygiene. It will also cause a
problem for the multi-pack-index builtin, which will be responsible not
only for writing bitmaps, but also for expiring any old multi-pack
bitmaps.

If an existing bitmap was reused here, it will also be expired. That
will cause a problem on platforms which require file resources to be
closed before unlinking them, like Windows. Avoid this by ensuring we
close reused bitmaps with free_bitmap_index() before removing them.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index d374f7884b..142fd0adb8 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -520,6 +520,7 @@ int bitmap_writer_build(struct packing_data *to_pack)
 	clear_prio_queue(&queue);
 	clear_prio_queue(&tree_queue);
 	bitmap_builder_clear(&bb);
+	free_bitmap_index(old_bitmap);
 	free(mapping);
 
 	trace2_region_leave("pack-bitmap-write", "building_bitmaps_total",
-- 
2.31.1.163.ga65ce7f831



* [PATCH v4 04/25] Documentation: describe MIDX-based bitmaps
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
                     ` (2 preceding siblings ...)
  2021-08-24 16:15   ` [PATCH v4 03/25] pack-bitmap-write.c: free existing bitmaps Taylor Blau
@ 2021-08-24 16:15   ` Taylor Blau
  2021-08-24 16:16   ` [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX Taylor Blau
                     ` (21 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 16:15 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Update the technical documentation to describe the multi-pack bitmap
format. This patch merely introduces the new format, and describes its
high-level ideas. Git does not yet know how to read or write these
multi-pack variants, and so the subsequent patches will:

  - Introduce code to interpret multi-pack bitmaps, according to this
    document.

  - Then, introduce code to write multi-pack bitmaps from the 'git
    multi-pack-index write' sub-command.

Finally, the implementation will gain tests in subsequent patches (as
opposed to inline with the patch teaching Git how to write multi-pack
bitmaps) to avoid a cyclic dependency.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/technical/bitmap-format.txt    | 71 ++++++++++++++++----
 Documentation/technical/multi-pack-index.txt | 10 +--
 2 files changed, 60 insertions(+), 21 deletions(-)

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index f8c18a0f7a..04b3ec2178 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -1,6 +1,44 @@
 GIT bitmap v1 format
 ====================
 
+== Pack and multi-pack bitmaps
+
+Bitmaps store reachability information about the set of objects in a packfile,
+or a multi-pack index (MIDX). The former is defined obviously, and the latter is
+defined as the union of objects in packs contained in the MIDX.
+
+A bitmap may belong to either one pack, or the repository's multi-pack index (if
+it exists). A repository may have at most one bitmap.
+
+An object is uniquely described by its bit position within a bitmap:
+
+	- If the bitmap belongs to a packfile, the __n__th bit corresponds to
+	the __n__th object in pack order. For a function `offset` which maps
+	objects to their byte offset within a pack, pack order is defined as
+	follows:
+
+		o1 <= o2 <==> offset(o1) <= offset(o2)
+
+	- If the bitmap belongs to a MIDX, the __n__th bit corresponds to the
+	__n__th object in MIDX order. With an additional function `pack` which
+	maps objects to the pack they were selected from by the MIDX, MIDX order
+	is defined as follows:
+
+		o1 <= o2 <==> pack(o1) <= pack(o2) /\ offset(o1) <= offset(o2)
+
+	The ordering between packs is done according to the MIDX's .rev file.
+	Notably, the preferred pack sorts ahead of all other packs.
+
+The on-disk representation (described below) of a bitmap is the same regardless
+of whether or not that bitmap belongs to a packfile or a MIDX. The only
+difference is the interpretation of the bits, which is described above.
+
+Certain bitmap extensions are supported (see: Appendix B). No extensions are
+required for bitmaps corresponding to packfiles. For bitmaps that correspond to
+MIDXs, both the bit-cache and rev-cache extensions are required.
+
+== On-disk format
+
 	- A header appears at the beginning:
 
 		4-byte signature: {'B', 'I', 'T', 'M'}
@@ -14,17 +52,19 @@ GIT bitmap v1 format
 			The following flags are supported:
 
 			- BITMAP_OPT_FULL_DAG (0x1) REQUIRED
-			This flag must always be present. It implies that the bitmap
-			index has been generated for a packfile with full closure
-			(i.e. where every single object in the packfile can find
-			 its parent links inside the same packfile). This is a
-			requirement for the bitmap index format, also present in JGit,
-			that greatly reduces the complexity of the implementation.
+			This flag must always be present. It implies that the
+			bitmap index has been generated for a packfile or
+			multi-pack index (MIDX) with full closure (i.e. where
+			every single object in the packfile/MIDX can find its
+			parent links inside the same packfile/MIDX). This is a
+			requirement for the bitmap index format, also present in
+			JGit, that greatly reduces the complexity of the
+			implementation.
 
 			- BITMAP_OPT_HASH_CACHE (0x4)
 			If present, the end of the bitmap file contains
 			`N` 32-bit name-hash values, one per object in the
-			pack. The format and meaning of the name-hash is
+			pack/MIDX. The format and meaning of the name-hash is
 			described below.
 
 		4-byte entry count (network byte order)
@@ -33,7 +73,8 @@ GIT bitmap v1 format
 
 		20-byte checksum
 
-			The SHA1 checksum of the pack this bitmap index belongs to.
+			The SHA1 checksum of the pack/MIDX this bitmap index
+			belongs to.
 
 	- 4 EWAH bitmaps that act as type indexes
 
@@ -50,7 +91,7 @@ GIT bitmap v1 format
 			- Tags
 
 		In each bitmap, the `n`th bit is set to true if the `n`th object
-		in the packfile is of that type.
+		in the packfile or multi-pack index is of that type.
 
 		The obvious consequence is that the OR of all 4 bitmaps will result
 		in a full set (all bits set), and the AND of all 4 bitmaps will
@@ -62,8 +103,9 @@ GIT bitmap v1 format
 		Each entry contains the following:
 
 		- 4-byte object position (network byte order)
-			The position **in the index for the packfile** where the
-			bitmap for this commit is found.
+			The position **in the index for the packfile or
+			multi-pack index** where the bitmap for this commit is
+			found.
 
 		- 1-byte XOR-offset
 			The xor offset used to compress this bitmap. For an entry
@@ -146,10 +188,11 @@ Name-hash cache
 ---------------
 
 If the BITMAP_OPT_HASH_CACHE flag is set, the end of the bitmap contains
-a cache of 32-bit values, one per object in the pack. The value at
+a cache of 32-bit values, one per object in the pack/MIDX. The value at
 position `i` is the hash of the pathname at which the `i`th object
-(counting in index order) in the pack can be found.  This can be fed
-into the delta heuristics to compare objects with similar pathnames.
+(counting in index or multi-pack index order) in the pack/MIDX can be found.
+This can be fed into the delta heuristics to compare objects with similar
+pathnames.
 
 The hash algorithm used is:
 
diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
index fb688976c4..1a73c3ee20 100644
--- a/Documentation/technical/multi-pack-index.txt
+++ b/Documentation/technical/multi-pack-index.txt
@@ -71,14 +71,10 @@ Future Work
   still reducing the number of binary searches required for object
   lookups.
 
-- The reachability bitmap is currently paired directly with a single
-  packfile, using the pack-order as the object order to hopefully
-  compress the bitmaps well using run-length encoding. This could be
-  extended to pair a reachability bitmap with a multi-pack-index. If
-  the multi-pack-index is extended to store a "stable object order"
+- If the multi-pack-index is extended to store a "stable object order"
   (a function Order(hash) = integer that is constant for a given hash,
-  even as the multi-pack-index is updated) then a reachability bitmap
-  could point to a multi-pack-index and be updated independently.
+  even as the multi-pack-index is updated) then MIDX bitmaps could be
+  updated independently of the MIDX.
 
 - Packfiles can be marked as "special" using empty files that share
   the initial name but replace ".pack" with ".keep" or ".promisor".
-- 
2.31.1.163.ga65ce7f831



* [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
                     ` (3 preceding siblings ...)
  2021-08-24 16:15   ` [PATCH v4 04/25] Documentation: describe MIDX-based bitmaps Taylor Blau
@ 2021-08-24 16:16   ` Taylor Blau
  2021-08-24 20:27     ` Junio C Hamano
  2021-08-24 16:16   ` [PATCH v4 06/25] midx: reject empty `--preferred-pack`'s Taylor Blau
                     ` (20 subsequent siblings)
  25 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 16:16 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

When writing a new multi-pack index, write_midx_internal() attempts to
clean up any auxiliary files (currently just the MIDX's `.rev` file, but
soon to include a `.bitmap`, too) corresponding to the MIDX it's
replacing.

This step should happen after the new MIDX is written into place, since
doing so beforehand means that the old MIDX could be read without its
corresponding .rev file.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 321c6fdd2f..73b199ca49 100644
--- a/midx.c
+++ b/midx.c
@@ -1086,10 +1086,11 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 	if (flags & MIDX_WRITE_REV_INDEX)
 		write_midx_reverse_index(midx_name, midx_hash, &ctx);
-	clear_midx_files_ext(the_repository, ".rev", midx_hash);
 
 	commit_lock_file(&lk);
 
+	clear_midx_files_ext(the_repository, ".rev", midx_hash);
+
 cleanup:
 	for (i = 0; i < ctx.nr; i++) {
 		if (ctx.info[i].p) {
-- 
2.31.1.163.ga65ce7f831



* [PATCH v4 06/25] midx: reject empty `--preferred-pack`'s
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
                     ` (4 preceding siblings ...)
  2021-08-24 16:16   ` [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX Taylor Blau
@ 2021-08-24 16:16   ` Taylor Blau
  2021-08-24 16:16   ` [PATCH v4 07/25] midx: infer preferred pack when not given one Taylor Blau
                     ` (19 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 16:16 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

The soon-to-be-implemented multi-pack bitmap treats the object in the first
bit position specially by assuming that all objects in the pack it was
selected from are also represented from that pack in the MIDX. In other
words, the pack from which the first object was selected must also have
all of its other objects selected from that same pack in the MIDX in
case of any duplicates.

But this assumption relies on the fact that there is at least one object
in that pack to begin with; otherwise the object in the first bit
position isn't from a preferred pack, in which case we can no longer
assume that all objects in that pack were also selected from the same
pack.

Guard this assumption by checking the number of objects in the given
preferred pack, and failing if the given pack is empty.

To make sure we can safely perform this check, open any packs which are
contained in an existing MIDX via prepare_midx_pack(). The same is done
for new packs via the add_pack_to_midx() callback, but packs picked up
from a previous MIDX will not yet have these opened.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.txt |  6 +++---
 midx.c                                 | 29 ++++++++++++++++++++++++++
 t/t5319-multi-pack-index.sh            | 17 +++++++++++++++
 3 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index ffd601bc17..c9b063d31e 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -37,9 +37,9 @@ write::
 --
 	--preferred-pack=<pack>::
 		Optionally specify the tie-breaking pack used when
-		multiple packs contain the same object. If not given,
-		ties are broken in favor of the pack with the lowest
-		mtime.
+		multiple packs contain the same object. `<pack>` must
+		contain at least one object. If not given, ties are
+		broken in favor of the pack with the lowest mtime.
 --
 
 verify::
diff --git a/midx.c b/midx.c
index 73b199ca49..551e5c2ee5 100644
--- a/midx.c
+++ b/midx.c
@@ -934,6 +934,25 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 			ctx.info[ctx.nr].pack_name = xstrdup(ctx.m->pack_names[i]);
 			ctx.info[ctx.nr].p = NULL;
 			ctx.info[ctx.nr].expired = 0;
+
+			if (flags & MIDX_WRITE_REV_INDEX) {
+				/*
+				 * If generating a reverse index, need to have
+				 * packed_git's loaded to compare their
+				 * mtimes and object count.
+				 */
+				if (prepare_midx_pack(the_repository, ctx.m, i)) {
+					error(_("could not load pack"));
+					result = 1;
+					goto cleanup;
+				}
+
+				if (open_pack_index(ctx.m->packs[i]))
+					die(_("could not open index for %s"),
+					    ctx.m->packs[i]->pack_name);
+				ctx.info[ctx.nr].p = ctx.m->packs[i];
+			}
+
 			ctx.nr++;
 		}
 	}
@@ -961,6 +980,16 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 		}
 	}
 
+	if (ctx.preferred_pack_idx > -1) {
+		struct packed_git *preferred = ctx.info[ctx.preferred_pack_idx].p;
+		if (!preferred->num_objects) {
+			error(_("cannot select preferred pack %s with no objects"),
+			      preferred->pack_name);
+			result = 1;
+			goto cleanup;
+		}
+	}
+
 	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr,
 					 ctx.preferred_pack_idx);
 
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 3d4d9f10c3..9b184bd45e 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -277,6 +277,23 @@ test_expect_success 'midx picks objects from preferred pack' '
 	)
 '
 
+test_expect_success 'preferred packs must be non-empty' '
+	test_when_finished rm -rf preferred.git &&
+	git init preferred.git &&
+	(
+		cd preferred.git &&
+
+		test_commit base &&
+		git repack -ad &&
+
+		empty="$(git pack-objects $objdir/pack/pack </dev/null)" &&
+
+		test_must_fail git multi-pack-index write \
+			--preferred-pack=pack-$empty.pack 2>err &&
+		grep "with no objects" err
+	)
+'
+
 test_expect_success 'verify multi-pack-index success' '
 	git multi-pack-index verify --object-dir=$objdir
 '
-- 
2.31.1.163.ga65ce7f831



* [PATCH v4 07/25] midx: infer preferred pack when not given one
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
                     ` (5 preceding siblings ...)
  2021-08-24 16:16   ` [PATCH v4 06/25] midx: reject empty `--preferred-pack`'s Taylor Blau
@ 2021-08-24 16:16   ` Taylor Blau
  2021-08-24 16:16   ` [PATCH v4 08/25] midx: close linked MIDXs, avoid leaking memory Taylor Blau
                     ` (18 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 16:16 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

In 9218c6a40c (midx: allow marking a pack as preferred, 2021-03-30), the
multi-pack index code learned how to select a pack which all duplicate
objects are selected from. That is, if an object appears in multiple
packs, select the copy in the preferred pack before breaking ties
according to the other rules like pack mtime and readdir() order.

Not specifying a preferred pack can cause serious problems with
multi-pack reachability bitmaps, because these bitmaps rely on having at
least one pack from which all duplicates are selected. Not having such a
pack causes problems with the code in pack-objects to reuse packs
verbatim (e.g., that code assumes that a delta object in a chunk of pack
sent verbatim will have its base object sent from the same pack).

So why does not marking a pack preferred cause problems here? The reason
is roughly as follows:

  - Ties are broken (when handling duplicate objects) by sorting
    according to midx_oid_compare(), which sorts objects by OID,
    preferred-ness, pack mtime, and finally pack ID (more on that
    later).

  - The pseudo pack-order (described in
    Documentation/technical/pack-format.txt under the section
    "multi-pack-index reverse indexes") is computed by
    midx_pack_order(), and sorts by pack ID and pack offset, with
    preferred packs sorting first.

  - But! Pack IDs come from incrementing the pack count in
    add_pack_to_midx(), which is a callback to
    for_each_file_in_pack_dir(), meaning that pack IDs are assigned in
    readdir() order.

When specifying a preferred pack, all of that works fine, because
duplicate objects are correctly resolved in favor of the copy in the
preferred pack, and the preferred pack sorts first in the object order.

"Sorting first" is critical, because the bitmap code relies on finding
out which pack holds the first object in the MIDX's pseudo pack-order to
determine which pack is preferred.

But if we didn't specify a preferred pack, and the pack which comes
first in readdir() order does not also have the lowest timestamp, then
it's possible that that pack (the one that sorts first in pseudo-pack
order, which the bitmap code will treat as the preferred one) did *not*
have all duplicate objects resolved in its favor, resulting in breakage.

The fix is simple: pick a (semi-arbitrary, non-empty) preferred pack
when none was specified. This forces that pack to have duplicates
resolved in its favor, and (critically) to sort first in pseudo-pack
order.  Unfortunately, testing this behavior portably isn't possible,
since it depends on readdir() order which isn't guaranteed by POSIX.
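
As a rough sketch of the selection rule described above (not the loop
from the patch, which is written slightly differently), picking the
oldest non-empty pack could look like the following. struct toy_pack
and toy_pick_preferred() are hypothetical names; only the mtime and
num_objects fields mirror what the real code consults:

```c
#include <stddef.h>

/* Toy stand-in for the two packed_git fields consulted here. */
struct toy_pack {
	long mtime;
	unsigned num_objects;
};

/*
 * Return the index of the oldest non-empty pack, or -1 when there are
 * no packs or every pack is empty. In the all-empty case there are no
 * duplicate objects to resolve, so leaving the preferred pack unset is
 * harmless.
 */
static int toy_pick_preferred(const struct toy_pack *packs, size_t nr)
{
	size_t i;
	int best = -1;

	for (i = 0; i < nr; i++) {
		if (!packs[i].num_objects)
			continue; /* an empty pack can never be preferred */
		if (best < 0 || packs[i].mtime < packs[best].mtime)
			best = (int)i;
	}
	return best;
}
```

The point of the sketch is only that the choice is deterministic and
never lands on an empty pack; which non-empty pack wins is, as noted
above, semi-arbitrary.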

(Note that multi-pack reachability bitmaps have yet to be implemented;
so in that sense this patch is fixing a bug which does not yet exist.
But by having this patch beforehand, we can prevent the bug from ever
materializing.)

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 50 ++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 44 insertions(+), 6 deletions(-)

diff --git a/midx.c b/midx.c
index 551e5c2ee5..e5b17483af 100644
--- a/midx.c
+++ b/midx.c
@@ -969,15 +969,57 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
 		goto cleanup;
 
-	ctx.preferred_pack_idx = -1;
 	if (preferred_pack_name) {
+		int found = 0;
 		for (i = 0; i < ctx.nr; i++) {
 			if (!cmp_idx_or_pack_name(preferred_pack_name,
 						  ctx.info[i].pack_name)) {
 				ctx.preferred_pack_idx = i;
+				found = 1;
 				break;
 			}
 		}
+
+		if (!found)
+			warning(_("unknown preferred pack: '%s'"),
+				preferred_pack_name);
+	} else if (ctx.nr && (flags & MIDX_WRITE_REV_INDEX)) {
+		struct packed_git *oldest = ctx.info[ctx.preferred_pack_idx].p;
+		ctx.preferred_pack_idx = 0;
+
+		if (packs_to_drop && packs_to_drop->nr)
+			BUG("cannot write a MIDX bitmap during expiration");
+
+		/*
+		 * set a preferred pack when writing a bitmap to ensure that
+		 * the pack from which the first object is selected in pseudo
+		 * pack-order has all of its objects selected from that pack
+		 * (and not another pack containing a duplicate)
+		 */
+		for (i = 1; i < ctx.nr; i++) {
+			struct packed_git *p = ctx.info[i].p;
+
+			if (!oldest->num_objects || p->mtime < oldest->mtime) {
+				oldest = p;
+				ctx.preferred_pack_idx = i;
+			}
+		}
+
+		if (!oldest->num_objects) {
+			/*
+			 * If all packs are empty; unset the preferred index.
+			 * This is acceptable since there will be no duplicate
+			 * objects to resolve, so the preferred value doesn't
+			 * matter.
+			 */
+			ctx.preferred_pack_idx = -1;
+		}
+	} else {
+		/*
+		 * otherwise don't mark any pack as preferred to avoid
+		 * interfering with expiration logic below
+		 */
+		ctx.preferred_pack_idx = -1;
 	}
 
 	if (ctx.preferred_pack_idx > -1) {
@@ -1058,11 +1100,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 						      ctx.info, ctx.nr,
 						      sizeof(*ctx.info),
 						      idx_or_pack_name_cmp);
-
-		if (!preferred)
-			warning(_("unknown preferred pack: '%s'"),
-				preferred_pack_name);
-		else {
+		if (preferred) {
 			uint32_t perm = ctx.pack_perm[preferred->orig_pack_int_id];
 			if (perm == PACK_EXPIRED)
 				warning(_("preferred pack '%s' is expired"),
-- 
2.31.1.163.ga65ce7f831



* [PATCH v4 08/25] midx: close linked MIDXs, avoid leaking memory
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
                     ` (6 preceding siblings ...)
  2021-08-24 16:16   ` [PATCH v4 07/25] midx: infer preferred pack when not given one Taylor Blau
@ 2021-08-24 16:16   ` Taylor Blau
  2021-08-24 16:16   ` [PATCH v4 09/25] midx: avoid opening multiple MIDXs when writing Taylor Blau
                     ` (17 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 16:16 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

When a repository has at least one alternate, the MIDX belonging to each
alternate is accessed through the `next` pointer on the main object
store's copy of the MIDX. close_midx() didn't bother to close any
of the linked MIDXs. It likewise didn't free the memory pointed to by
`m`, leaving uninitialized bytes with live pointers to them in the
heap.

Clean this up by closing linked MIDXs, and freeing up the memory pointed
to by each of them. When callers call close_midx(), then they can
discard the entire linked list of MIDXs and set their pointer to the
head of that list to NULL.
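
The shape of the fix can be sketched on a toy linked list (illustrative
only; struct toy_midx, toy_close() and toy_closed_count are
hypothetical and stand in for struct multi_pack_index and
close_midx()):

```c
#include <stdlib.h>

/* Toy stand-in for a MIDX with a link to the next alternate's MIDX. */
struct toy_midx {
	struct toy_midx *next;
	char *data;
};

/* Counts released nodes; present only to make the sketch observable. */
static int toy_closed_count;

/*
 * Release 'm' and every MIDX linked after it. Recursing on m->next
 * first means the whole list is gone once the call returns, so the
 * caller only has to NULL out its pointer to the head.
 */
static void toy_close(struct toy_midx *m)
{
	if (!m)
		return;

	toy_close(m->next);

	free(m->data);
	free(m);
	toy_closed_count++;
}
```

After toy_close(head), the caller is expected to set head to NULL;
closing only part of the list (for example toy_close(head->next)) would
still leave head->next dangling, which is the caveat mentioned below.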

This isn't strictly required for the upcoming patches, but it makes it
much more difficult (though still possible, e.g., by calling
`close_midx(m->next)` which leaves `m->next` pointing at uninitialized
bytes) to have pointers to uninitialized memory.
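
As a rough illustration of the pattern described above (not the code in
midx.c), the following sketch closes a singly linked chain recursively
and frees every node. `struct node`, `close_chain()`, `make_chain()` and
`closed_count` are hypothetical names used only for this example; the
caller is expected to set its own head pointer to NULL afterwards.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct multi_pack_index; only `next` matters. */
struct node {
	struct node *next;
	int *payload;
};

/* Counts how many nodes have been closed, for demonstration only. */
static int closed_count;

/* Close `n` and every node linked after it, freeing each one. */
static void close_chain(struct node *n)
{
	if (!n)
		return;

	close_chain(n->next);

	free(n->payload);
	free(n);
	closed_count++;
}

/* Build a chain of `len` nodes and return its head. */
static struct node *make_chain(int len)
{
	struct node *head = NULL;

	while (len-- > 0) {
		struct node *n = calloc(1, sizeof(*n));
		if (!n)
			abort();
		n->payload = malloc(sizeof(int));
		n->next = head;
		head = n;
	}
	return head;
}
```

Closing only part of a chain (the analogue of `close_midx(m->next)`)
would still leave the previous node's `next` pointer dangling, which is
why discarding the whole list from its head is the safer convention.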

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/midx.c b/midx.c
index e5b17483af..0a515d8711 100644
--- a/midx.c
+++ b/midx.c
@@ -195,6 +195,8 @@ void close_midx(struct multi_pack_index *m)
 	if (!m)
 		return;
 
+	close_midx(m->next);
+
 	munmap((unsigned char *)m->data, m->data_len);
 
 	for (i = 0; i < m->num_packs; i++) {
@@ -203,6 +205,7 @@ void close_midx(struct multi_pack_index *m)
 	}
 	FREE_AND_NULL(m->packs);
 	FREE_AND_NULL(m->pack_names);
+	free(m);
 }
 
 int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id)
-- 
2.31.1.163.ga65ce7f831



* [PATCH v4 09/25] midx: avoid opening multiple MIDXs when writing
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
                     ` (7 preceding siblings ...)
  2021-08-24 16:16   ` [PATCH v4 08/25] midx: close linked MIDXs, avoid leaking memory Taylor Blau
@ 2021-08-24 16:16   ` Taylor Blau
  2021-08-24 16:16   ` [PATCH v4 10/25] pack-bitmap.c: introduce 'bitmap_num_objects()' Taylor Blau
                     ` (16 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 16:16 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Opening multiple instances of the same MIDX can lead to problems like two
separate packed_git structures which represent the same pack being added
to the repository's object store.

The above scenario can happen because prepare_midx_pack() checks if
`m->packs[pack_int_id]` is NULL in order to determine if a pack has been
opened and installed in the repository before. But a caller can
construct two copies of the same MIDX by calling get_multi_pack_index()
and load_multi_pack_index() since the former manipulates the
object store directly but the latter is a lower-level routine which
allocates a new MIDX for each call.

So if prepare_midx_pack() is called on multiple MIDXs with the same
pack_int_id, then that pack will be installed twice in the object
store's packed_git pointer.

This can lead to problems in, e.g., the pack-bitmap code, which does
something like the following (in pack-bitmap.c:open_pack_bitmap()):

    struct bitmap_index *bitmap_git = ...;
    for (p = get_all_packs(r); p; p = p->next) {
      if (open_pack_bitmap_1(bitmap_git, p) == 0)
        ret = 0;
    }

which is a problem if two copies of the same pack exist in the
packed_git list because pack-bitmap.c:open_pack_bitmap_1() contains a
conditional like the following:

    if (bitmap_git->pack || bitmap_git->midx) {
      /* ignore extra bitmap file; we can only handle one */
      warning("ignoring extra bitmap file: %s", packfile->pack_name);
      close(fd);
      return -1;
    }

Avoid this scenario by not letting write_midx_internal() open a MIDX
that isn't also pointed at by the object store. So long as this is the
case, other routines should prefer to open MIDXs with
get_multi_pack_index() or reprepare_packed_git() instead of creating
instances on their own. Because get_multi_pack_index() returns
`r->object_store->multi_pack_index` if it is non-NULL, we'll only have
one instance of a MIDX open at one time, avoiding these problems.

To encourage this, drop the `struct multi_pack_index *` parameter from
`write_midx_internal()`, and rely instead on the `object_dir` to find
(or initialize) the correct MIDX instance.
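
A minimal sketch of that lookup, assuming a simplified list type: rather
than allocating a fresh instance, walk the one shared list and pick the
entry whose directory matches. `struct midx_node` and
`find_midx_for_dir()` are hypothetical names for illustration, not part
of midx.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct multi_pack_index. */
struct midx_node {
	const char *object_dir;
	struct midx_node *next;
};

/*
 * Return the already-open entry for `object_dir`, or NULL if the shared
 * list has none. No new instance is ever allocated here, so two lookups
 * for the same directory always yield the same pointer.
 */
static struct midx_node *find_midx_for_dir(struct midx_node *head,
					   const char *object_dir)
{
	struct midx_node *cur;

	for (cur = head; cur; cur = cur->next) {
		if (!strcmp(object_dir, cur->object_dir))
			return cur;
	}
	return NULL;
}
```

Returning NULL for a directory that is not on the list corresponds to
the behavior change noted in this message: an existing MIDX in a
non-alternate directory is simply not reused.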

Likewise, replace the call to `close_midx()` with
`close_object_store()`, since we're about to replace the MIDX with a new
one and should invalidate the object store's memory of any MIDX that
might have existed beforehand.

Note that this now forbids passing object directories that don't belong
to alternate repositories over `--object-dir`, since before we would
have happily opened a MIDX in any directory, but now restrict ourselves
to only those reachable by `r->objects->multi_pack_index` (and alternate
MIDXs that we can see by walking the `next` pointer).

As far as I can tell, supporting arbitrary directories with
`--object-dir` was a historical accident, since even the documentation
says `<alt>` when referring to the value passed to this option.

A future patch could clean this up and provide a warning() when a
non-alternate directory is given: we'll still write a new MIDX there,
but we won't reuse any MIDX that might already exist in that directory.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/midx.c b/midx.c
index 0a515d8711..3dacb31f9d 100644
--- a/midx.c
+++ b/midx.c
@@ -893,7 +893,7 @@ static int midx_checksum_valid(struct multi_pack_index *m)
 	return hashfile_checksum_valid(m->data, m->data_len);
 }
 
-static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
+static int write_midx_internal(const char *object_dir,
 			       struct string_list *packs_to_drop,
 			       const char *preferred_pack_name,
 			       unsigned flags)
@@ -904,6 +904,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	struct hashfile *f = NULL;
 	struct lock_file lk;
 	struct write_midx_context ctx = { 0 };
+	struct multi_pack_index *cur;
 	int pack_name_concat_len = 0;
 	int dropped_packs = 0;
 	int result = 0;
@@ -914,10 +915,12 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 		die_errno(_("unable to create leading directories of %s"),
 			  midx_name);
 
-	if (m)
-		ctx.m = m;
-	else
-		ctx.m = load_multi_pack_index(object_dir, 1);
+	for (cur = get_multi_pack_index(the_repository); cur; cur = cur->next) {
+		if (!strcmp(object_dir, cur->object_dir)) {
+			ctx.m = cur;
+			break;
+		}
+	}
 
 	if (ctx.m && !midx_checksum_valid(ctx.m)) {
 		warning(_("ignoring existing multi-pack-index; checksum mismatch"));
@@ -1119,7 +1122,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
 
 	if (ctx.m)
-		close_midx(ctx.m);
+		close_object_store(the_repository->objects);
 
 	if (ctx.nr - dropped_packs == 0) {
 		error(_("no pack files to index."));
@@ -1182,8 +1185,7 @@ int write_midx_file(const char *object_dir,
 		    const char *preferred_pack_name,
 		    unsigned flags)
 {
-	return write_midx_internal(object_dir, NULL, NULL, preferred_pack_name,
-				   flags);
+	return write_midx_internal(object_dir, NULL, preferred_pack_name, flags);
 }
 
 struct clear_midx_data {
@@ -1461,8 +1463,10 @@ int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla
 
 	free(count);
 
-	if (packs_to_drop.nr)
-		result = write_midx_internal(object_dir, m, &packs_to_drop, NULL, flags);
+	if (packs_to_drop.nr) {
+		result = write_midx_internal(object_dir, &packs_to_drop, NULL, flags);
+		m = NULL;
+	}
 
 	string_list_clear(&packs_to_drop, 0);
 	return result;
@@ -1651,7 +1655,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 		goto cleanup;
 	}
 
-	result = write_midx_internal(object_dir, m, NULL, NULL, flags);
+	result = write_midx_internal(object_dir, NULL, NULL, flags);
 	m = NULL;
 
 cleanup:
-- 
2.31.1.163.ga65ce7f831



* [PATCH v4 10/25] pack-bitmap.c: introduce 'bitmap_num_objects()'
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
                     ` (8 preceding siblings ...)
  2021-08-24 16:16   ` [PATCH v4 09/25] midx: avoid opening multiple MIDXs when writing Taylor Blau
@ 2021-08-24 16:16   ` Taylor Blau
  2021-08-24 16:16   ` [PATCH v4 11/25] pack-bitmap.c: introduce 'nth_bitmap_object_oid()' Taylor Blau
                     ` (15 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 16:16 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

A subsequent patch to support reading MIDX bitmaps will be less noisy
after extracting a generic function to return how many objects are
contained in a bitmap.
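
To show why a single accessor helps, here is a small sketch under the
assumption that a bitmap is backed by exactly one of two sources. All of
the `fake_*` types and both function names are hypothetical and exist
only for this example; they are not the structures in pack-bitmap.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the real structures. */
struct fake_pack { uint32_t num_objects; };
struct fake_midx { uint32_t num_objects; };

struct fake_bitmap_index {
	/* Exactly one of these is assumed to be non-NULL. */
	struct fake_pack *pack;
	struct fake_midx *midx;
};

/* One place decides which backing source provides the object count. */
static uint32_t num_objects_of(const struct fake_bitmap_index *index)
{
	if (index->midx)
		return index->midx->num_objects;
	return index->pack->num_objects;
}

/*
 * Objects that are not part of the backing source are numbered after
 * it, so their bit position is the object count plus their own index.
 */
static uint32_t extended_position(const struct fake_bitmap_index *index,
				  uint32_t i)
{
	return num_objects_of(index) + i;
}
```

With the count behind one function, every caller that computes an
extended position keeps working when a second kind of backing source is
introduced.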

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 37 +++++++++++++++++++++----------------
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 9b11af87aa..65356f9657 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -136,6 +136,11 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
 	return b;
 }
 
+static uint32_t bitmap_num_objects(struct bitmap_index *index)
+{
+	return index->pack->num_objects;
+}
+
 static int load_bitmap_header(struct bitmap_index *index)
 {
 	struct bitmap_disk_header *header = (void *)index->map;
@@ -154,7 +159,7 @@ static int load_bitmap_header(struct bitmap_index *index)
 	/* Parse known bitmap format options */
 	{
 		uint32_t flags = ntohs(header->options);
-		size_t cache_size = st_mult(index->pack->num_objects, sizeof(uint32_t));
+		size_t cache_size = st_mult(bitmap_num_objects(index), sizeof(uint32_t));
 		unsigned char *index_end = index->map + index->map_size - the_hash_algo->rawsz;
 
 		if ((flags & BITMAP_OPT_FULL_DAG) == 0)
@@ -404,7 +409,7 @@ static inline int bitmap_position_extended(struct bitmap_index *bitmap_git,
 
 	if (pos < kh_end(positions)) {
 		int bitmap_pos = kh_value(positions, pos);
-		return bitmap_pos + bitmap_git->pack->num_objects;
+		return bitmap_pos + bitmap_num_objects(bitmap_git);
 	}
 
 	return -1;
@@ -456,7 +461,7 @@ static int ext_index_add_object(struct bitmap_index *bitmap_git,
 		bitmap_pos = kh_value(eindex->positions, hash_pos);
 	}
 
-	return bitmap_pos + bitmap_git->pack->num_objects;
+	return bitmap_pos + bitmap_num_objects(bitmap_git);
 }
 
 struct bitmap_show_data {
@@ -673,7 +678,7 @@ static void show_extended_objects(struct bitmap_index *bitmap_git,
 	for (i = 0; i < eindex->count; ++i) {
 		struct object *obj;
 
-		if (!bitmap_get(objects, bitmap_git->pack->num_objects + i))
+		if (!bitmap_get(objects, bitmap_num_objects(bitmap_git) + i))
 			continue;
 
 		obj = eindex->objects[i];
@@ -832,7 +837,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
 	 * them individually.
 	 */
 	for (i = 0; i < eindex->count; i++) {
-		uint32_t pos = i + bitmap_git->pack->num_objects;
+		uint32_t pos = i + bitmap_num_objects(bitmap_git);
 		if (eindex->objects[i]->type == type &&
 		    bitmap_get(to_filter, pos) &&
 		    !bitmap_get(tips, pos))
@@ -859,7 +864,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
 
 	oi.sizep = &size;
 
-	if (pos < pack->num_objects) {
+	if (pos < bitmap_num_objects(bitmap_git)) {
 		off_t ofs = pack_pos_to_offset(pack, pos);
 		if (packed_object_info(the_repository, pack, ofs, &oi) < 0) {
 			struct object_id oid;
@@ -869,7 +874,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
 		}
 	} else {
 		struct eindex *eindex = &bitmap_git->ext_index;
-		struct object *obj = eindex->objects[pos - pack->num_objects];
+		struct object *obj = eindex->objects[pos - bitmap_num_objects(bitmap_git)];
 		if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0)
 			die(_("unable to get size of %s"), oid_to_hex(&obj->oid));
 	}
@@ -911,7 +916,7 @@ static void filter_bitmap_blob_limit(struct bitmap_index *bitmap_git,
 	}
 
 	for (i = 0; i < eindex->count; i++) {
-		uint32_t pos = i + bitmap_git->pack->num_objects;
+		uint32_t pos = i + bitmap_num_objects(bitmap_git);
 		if (eindex->objects[i]->type == OBJ_BLOB &&
 		    bitmap_get(to_filter, pos) &&
 		    !bitmap_get(tips, pos) &&
@@ -1137,8 +1142,8 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 	enum object_type type;
 	unsigned long size;
 
-	if (pos >= bitmap_git->pack->num_objects)
-		return; /* not actually in the pack */
+	if (pos >= bitmap_num_objects(bitmap_git))
+		return; /* not actually in the pack or MIDX */
 
 	offset = header = pack_pos_to_offset(bitmap_git->pack, pos);
 	type = unpack_object_header(bitmap_git->pack, w_curs, &offset, &size);
@@ -1204,6 +1209,7 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 	struct pack_window *w_curs = NULL;
 	size_t i = 0;
 	uint32_t offset;
+	uint32_t objects_nr = bitmap_num_objects(bitmap_git);
 
 	assert(result);
 
@@ -1211,8 +1217,8 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 		i++;
 
 	/* Don't mark objects not in the packfile */
-	if (i > bitmap_git->pack->num_objects / BITS_IN_EWORD)
-		i = bitmap_git->pack->num_objects / BITS_IN_EWORD;
+	if (i > objects_nr / BITS_IN_EWORD)
+		i = objects_nr / BITS_IN_EWORD;
 
 	reuse = bitmap_word_alloc(i);
 	memset(reuse->words, 0xFF, i * sizeof(eword_t));
@@ -1296,7 +1302,7 @@ static uint32_t count_object_type(struct bitmap_index *bitmap_git,
 
 	for (i = 0; i < eindex->count; ++i) {
 		if (eindex->objects[i]->type == type &&
-			bitmap_get(objects, bitmap_git->pack->num_objects + i))
+			bitmap_get(objects, bitmap_num_objects(bitmap_git) + i))
 			count++;
 	}
 
@@ -1517,7 +1523,7 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
 	uint32_t i, num_objects;
 	uint32_t *reposition;
 
-	num_objects = bitmap_git->pack->num_objects;
+	num_objects = bitmap_num_objects(bitmap_git);
 	CALLOC_ARRAY(reposition, num_objects);
 
 	for (i = 0; i < num_objects; ++i) {
@@ -1600,7 +1606,6 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
 static off_t get_disk_usage_for_extended(struct bitmap_index *bitmap_git)
 {
 	struct bitmap *result = bitmap_git->result;
-	struct packed_git *pack = bitmap_git->pack;
 	struct eindex *eindex = &bitmap_git->ext_index;
 	off_t total = 0;
 	struct object_info oi = OBJECT_INFO_INIT;
@@ -1612,7 +1617,7 @@ static off_t get_disk_usage_for_extended(struct bitmap_index *bitmap_git)
 	for (i = 0; i < eindex->count; i++) {
 		struct object *obj = eindex->objects[i];
 
-		if (!bitmap_get(result, pack->num_objects + i))
+		if (!bitmap_get(result, bitmap_num_objects(bitmap_git) + i))
 			continue;
 
 		if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0)
-- 
2.31.1.163.ga65ce7f831



* [PATCH v4 11/25] pack-bitmap.c: introduce 'nth_bitmap_object_oid()'
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
                     ` (9 preceding siblings ...)
  2021-08-24 16:16   ` [PATCH v4 10/25] pack-bitmap.c: introduce 'bitmap_num_objects()' Taylor Blau
@ 2021-08-24 16:16   ` Taylor Blau
  2021-08-24 16:16   ` [PATCH v4 12/25] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()' Taylor Blau
                     ` (14 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 16:16 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

A subsequent patch to support reading MIDX bitmaps will be less noisy
after extracting a generic function to fetch the nth OID contained in
the bitmap.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 65356f9657..612f62da97 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -223,6 +223,13 @@ static inline uint8_t read_u8(const unsigned char *buffer, size_t *pos)
 
 #define MAX_XOR_OFFSET 160
 
+static int nth_bitmap_object_oid(struct bitmap_index *index,
+				 struct object_id *oid,
+				 uint32_t n)
+{
+	return nth_packed_object_id(oid, index->pack, n);
+}
+
 static int load_bitmap_entries_v1(struct bitmap_index *index)
 {
 	uint32_t i;
@@ -242,7 +249,7 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 		xor_offset = read_u8(index->map, &index->map_pos);
 		flags = read_u8(index->map, &index->map_pos);
 
-		if (nth_packed_object_id(&oid, index->pack, commit_idx_pos) < 0)
+		if (nth_bitmap_object_oid(index, &oid, commit_idx_pos) < 0)
 			return error("corrupt ewah bitmap: commit index %u out of range",
 				     (unsigned)commit_idx_pos);
 
@@ -868,8 +875,8 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
 		off_t ofs = pack_pos_to_offset(pack, pos);
 		if (packed_object_info(the_repository, pack, ofs, &oi) < 0) {
 			struct object_id oid;
-			nth_packed_object_id(&oid, pack,
-					     pack_pos_to_index(pack, pos));
+			nth_bitmap_object_oid(bitmap_git, &oid,
+					      pack_pos_to_index(pack, pos));
 			die(_("unable to get size of %s"), oid_to_hex(&oid));
 		}
 	} else {
-- 
2.31.1.163.ga65ce7f831



* [PATCH v4 12/25] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()'
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
                     ` (10 preceding siblings ...)
  2021-08-24 16:16   ` [PATCH v4 11/25] pack-bitmap.c: introduce 'nth_bitmap_object_oid()' Taylor Blau
@ 2021-08-24 16:16   ` Taylor Blau
  2021-08-24 16:16   ` [PATCH v4 13/25] pack-bitmap.c: avoid redundant calls to try_partial_reuse Taylor Blau
                     ` (13 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 16:16 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

In a recent commit, pack-objects learned support for the
'pack.preferBitmapTips' configuration. This patch prepares the
multi-pack bitmap code to respect this configuration, too.

The yet-to-be-implemented code will find that it is more efficient to
check whether each reference contains a prefix found in the configured
set of values rather than doing an additional traversal.

Implement a function 'bitmap_is_preferred_refname()' which will perform
that check. Its caller will be added in a subsequent patch.
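
The check itself is a plain prefix match against each configured value.
A standalone sketch, assuming the configured values are available as an
array of strings (`has_prefix()` and `is_preferred_refname()` are
hypothetical helpers, standing in for starts_with() and the string_list
iteration used in the patch):

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if `str` begins with `prefix`, 0 otherwise. */
static int has_prefix(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/*
 * Return 1 if `refname` begins with any of the `nr` configured
 * prefixes, 0 otherwise (including when nothing is configured).
 */
static int is_preferred_refname(const char *refname,
				const char *const *prefixes, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (has_prefix(refname, prefixes[i]))
			return 1;
	}
	return 0;
}
```

For example, with the prefixes "refs/heads/" and "refs/tags/v", the
name "refs/tags/v2.33.0" matches while "refs/tags/rc1" does not.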

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 16 ++++++++++++++++
 pack-bitmap.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 612f62da97..d5296750eb 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1658,3 +1658,19 @@ const struct string_list *bitmap_preferred_tips(struct repository *r)
 {
 	return repo_config_get_value_multi(r, "pack.preferbitmaptips");
 }
+
+int bitmap_is_preferred_refname(struct repository *r, const char *refname)
+{
+	const struct string_list *preferred_tips = bitmap_preferred_tips(r);
+	struct string_list_item *item;
+
+	if (!preferred_tips)
+		return 0;
+
+	for_each_string_list_item(item, preferred_tips) {
+		if (starts_with(refname, item->string))
+			return 1;
+	}
+
+	return 0;
+}
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 020cd8d868..52ea10de51 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -94,5 +94,6 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 			  uint16_t options);
 
 const struct string_list *bitmap_preferred_tips(struct repository *r);
+int bitmap_is_preferred_refname(struct repository *r, const char *refname);
 
 #endif
-- 
2.31.1.163.ga65ce7f831



* [PATCH v4 13/25] pack-bitmap.c: avoid redundant calls to try_partial_reuse
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
                     ` (11 preceding siblings ...)
  2021-08-24 16:16   ` [PATCH v4 12/25] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()' Taylor Blau
@ 2021-08-24 16:16   ` Taylor Blau
  2021-08-24 16:16   ` [PATCH v4 14/25] pack-bitmap: read multi-pack bitmaps Taylor Blau
                     ` (12 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 16:16 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

try_partial_reuse() is used to mark any bits in the beginning of a
bitmap whose objects can be reused verbatim from the pack they came
from.

Currently this function returns void, and signals nothing to the caller
when bits could not be reused. But multi-pack bitmaps would benefit from
having such a signal, because they may try to pass objects which are in
bounds, but from a pack other than the preferred one.

Any extra calls are noops because of a conditional in
reuse_partial_packfile_from_bitmap(), but those loop iterations can be
avoided by letting try_partial_reuse() indicate when it can't accept any
more bits for reuse, and then listening to that signal.
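
The control flow can be sketched in isolation as follows. This is an
illustration only: `try_one()` is a hypothetical stand-in for
try_partial_reuse() that rejects every position at or beyond `limit`,
and `scan_words()` is a simplified version of the word-by-word loop.

```c
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD 64

/* Number of times try_one() was called, for demonstration only. */
static int calls;

/*
 * Hypothetical stand-in for try_partial_reuse(): -1 means "stop trying
 * further positions", 0 means "keep feeding bits".
 */
static int try_one(size_t pos, size_t limit)
{
	calls++;
	if (pos >= limit)
		return -1;
	return 0;
}

/*
 * Visit every set bit in `words`, in order, and leave both loops as
 * soon as try_one() signals that no later position can be accepted.
 * Returns the number of positions visited before that signal.
 */
static size_t scan_words(const uint64_t *words, size_t nr, size_t limit)
{
	size_t i, accepted = 0;

	for (i = 0; i < nr; i++) {
		unsigned offset;

		for (offset = 0; offset < BITS_PER_WORD; offset++) {
			if (!((words[i] >> offset) & 1))
				continue;
			if (try_one(i * BITS_PER_WORD + offset, limit) < 0)
				goto done;
			accepted++;
		}
	}
done:
	return accepted;
}
```

Without the early exit, every remaining set bit would still trigger a
call that immediately returns; with it, the scan stops at the first
rejected position.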

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 40 +++++++++++++++++++++++++++++-----------
 1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index d5296750eb..4e37f5d574 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1140,22 +1140,26 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	return NULL;
 }
 
-static void try_partial_reuse(struct bitmap_index *bitmap_git,
-			      size_t pos,
-			      struct bitmap *reuse,
-			      struct pack_window **w_curs)
+/*
+ * -1 means "stop trying further objects"; 0 means we may or may not have
+ * reused, but you can keep feeding bits.
+ */
+static int try_partial_reuse(struct bitmap_index *bitmap_git,
+			     size_t pos,
+			     struct bitmap *reuse,
+			     struct pack_window **w_curs)
 {
 	off_t offset, header;
 	enum object_type type;
 	unsigned long size;
 
 	if (pos >= bitmap_num_objects(bitmap_git))
-		return; /* not actually in the pack or MIDX */
+		return -1; /* not actually in the pack or MIDX */
 
 	offset = header = pack_pos_to_offset(bitmap_git->pack, pos);
 	type = unpack_object_header(bitmap_git->pack, w_curs, &offset, &size);
 	if (type < 0)
-		return; /* broken packfile, punt */
+		return -1; /* broken packfile, punt */
 
 	if (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA) {
 		off_t base_offset;
@@ -1172,9 +1176,9 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 		base_offset = get_delta_base(bitmap_git->pack, w_curs,
 					     &offset, type, header);
 		if (!base_offset)
-			return;
+			return 0;
 		if (offset_to_pack_pos(bitmap_git->pack, base_offset, &base_pos) < 0)
-			return;
+			return 0;
 
 		/*
 		 * We assume delta dependencies always point backwards. This
@@ -1186,7 +1190,7 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 		 * odd parameters.
 		 */
 		if (base_pos >= pos)
-			return;
+			return 0;
 
 		/*
 		 * And finally, if we're not sending the base as part of our
@@ -1197,13 +1201,14 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 		 * object_entry code path handle it.
 		 */
 		if (!bitmap_get(reuse, base_pos))
-			return;
+			return 0;
 	}
 
 	/*
 	 * If we got here, then the object is OK to reuse. Mark it.
 	 */
 	bitmap_set(reuse, pos);
+	return 0;
 }
 
 int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
@@ -1239,10 +1244,23 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 				break;
 
 			offset += ewah_bit_ctz64(word >> offset);
-			try_partial_reuse(bitmap_git, pos + offset, reuse, &w_curs);
+			if (try_partial_reuse(bitmap_git, pos + offset, reuse,
+					      &w_curs) < 0) {
+				/*
+				 * try_partial_reuse indicated we couldn't reuse
+				 * any bits, so there is no point in trying more
+				 * bits in the current word, or any other words
+				 * in result.
+				 *
+				 * Jump out of both loops to avoid future
+				 * unnecessary calls to try_partial_reuse.
+				 */
+				goto done;
+			}
 		}
 	}
 
+done:
 	unuse_pack(&w_curs);
 
 	*entries = bitmap_popcount(reuse);
-- 
2.31.1.163.ga65ce7f831



* [PATCH v4 14/25] pack-bitmap: read multi-pack bitmaps
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
                     ` (12 preceding siblings ...)
  2021-08-24 16:16   ` [PATCH v4 13/25] pack-bitmap.c: avoid redundant calls to try_partial_reuse Taylor Blau
@ 2021-08-24 16:16   ` Taylor Blau
  2021-08-24 16:16   ` [PATCH v4 15/25] pack-bitmap: write " Taylor Blau
                     ` (11 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 16:16 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

This prepares the code in pack-bitmap to interpret the new multi-pack
bitmaps described in Documentation/technical/bitmap-format.txt, which
mostly involves converting bit positions to accommodate looking them up
in a MIDX.

Note that there are currently no writers who write multi-pack bitmaps,
and that this will be implemented in the subsequent commit. Note also
that get_midx_checksum() and get_midx_filename() are made non-static so
they can be called from pack-bitmap.c.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c |   5 +
 midx.c                 |   4 +-
 midx.h                 |   2 +
 pack-bitmap-write.c    |   2 +-
 pack-bitmap.c          | 357 ++++++++++++++++++++++++++++++++++++-----
 pack-bitmap.h          |   6 +
 packfile.c             |   2 +-
 7 files changed, 336 insertions(+), 42 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 8a523624a1..e11d3ac2e5 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1124,6 +1124,11 @@ static void write_reused_pack(struct hashfile *f)
 				break;
 
 			offset += ewah_bit_ctz64(word >> offset);
+			/*
+			 * Can use bit positions directly, even for MIDX
+			 * bitmaps. See comment in try_partial_reuse()
+			 * for why.
+			 */
 			write_reused_pack_one(pos + offset, f, &w_curs);
 			display_progress(progress_state, ++written);
 		}
diff --git a/midx.c b/midx.c
index 3dacb31f9d..2dceaf9565 100644
--- a/midx.c
+++ b/midx.c
@@ -48,12 +48,12 @@ static uint8_t oid_version(void)
 	}
 }
 
-static const unsigned char *get_midx_checksum(struct multi_pack_index *m)
+const unsigned char *get_midx_checksum(struct multi_pack_index *m)
 {
 	return m->data + m->data_len - the_hash_algo->rawsz;
 }
 
-static char *get_midx_filename(const char *object_dir)
+char *get_midx_filename(const char *object_dir)
 {
 	return xstrfmt("%s/pack/multi-pack-index", object_dir);
 }
diff --git a/midx.h b/midx.h
index 8684cf0fef..1172df1a71 100644
--- a/midx.h
+++ b/midx.h
@@ -42,6 +42,8 @@ struct multi_pack_index {
 #define MIDX_PROGRESS     (1 << 0)
 #define MIDX_WRITE_REV_INDEX (1 << 1)
 
+const unsigned char *get_midx_checksum(struct multi_pack_index *m);
+char *get_midx_filename(const char *object_dir);
 char *get_midx_rev_filename(struct multi_pack_index *m);
 
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 142fd0adb8..9c55c1531e 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -48,7 +48,7 @@ void bitmap_writer_show_progress(int show)
 }
 
 /**
- * Build the initial type index for the packfile
+ * Build the initial type index for the packfile or multi-pack-index
  */
 void bitmap_writer_build_type_index(struct packing_data *to_pack,
 				    struct pack_idx_entry **index,
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 4e37f5d574..fa69ed7a6d 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -13,6 +13,7 @@
 #include "repository.h"
 #include "object-store.h"
 #include "list-objects-filter-options.h"
+#include "midx.h"
 #include "config.h"
 
 /*
@@ -35,8 +36,15 @@ struct stored_bitmap {
  * the active bitmap index is the largest one.
  */
 struct bitmap_index {
-	/* Packfile to which this bitmap index belongs to */
+	/*
+	 * The pack or multi-pack index (MIDX) that this bitmap index belongs
+	 * to.
+	 *
+	 * Exactly one of these must be non-NULL; this specifies the object
+	 * order used to interpret this bitmap.
+	 */
 	struct packed_git *pack;
+	struct multi_pack_index *midx;
 
 	/*
 	 * Mark the first `reuse_objects` in the packfile as reused:
@@ -71,6 +79,9 @@ struct bitmap_index {
 	/* If not NULL, this is a name-hash cache pointing into map. */
 	uint32_t *hashes;
 
+	/* The checksum of the packfile or MIDX; points into map. */
+	const unsigned char *checksum;
+
 	/*
 	 * Extended index.
 	 *
@@ -138,6 +149,8 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
 
 static uint32_t bitmap_num_objects(struct bitmap_index *index)
 {
+	if (index->midx)
+		return index->midx->num_objects;
 	return index->pack->num_objects;
 }
 
@@ -175,6 +188,7 @@ static int load_bitmap_header(struct bitmap_index *index)
 	}
 
 	index->entry_count = ntohl(header->entry_count);
+	index->checksum = header->checksum;
 	index->map_pos += header_size;
 	return 0;
 }
@@ -227,6 +241,8 @@ static int nth_bitmap_object_oid(struct bitmap_index *index,
 				 struct object_id *oid,
 				 uint32_t n)
 {
+	if (index->midx)
+		return nth_midxed_object_oid(oid, index->midx, n) ? 0 : -1;
 	return nth_packed_object_id(oid, index->pack, n);
 }
 
@@ -274,7 +290,14 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 	return 0;
 }
 
-static char *pack_bitmap_filename(struct packed_git *p)
+char *midx_bitmap_filename(struct multi_pack_index *midx)
+{
+	return xstrfmt("%s-%s.bitmap",
+		       get_midx_filename(midx->object_dir),
+		       hash_to_hex(get_midx_checksum(midx)));
+}
+
+char *pack_bitmap_filename(struct packed_git *p)
 {
 	size_t len;
 
@@ -283,6 +306,57 @@ static char *pack_bitmap_filename(struct packed_git *p)
 	return xstrfmt("%.*s.bitmap", (int)len, p->pack_name);
 }
 
+static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
+			      struct multi_pack_index *midx)
+{
+	struct stat st;
+	char *idx_name = midx_bitmap_filename(midx);
+	int fd = git_open(idx_name);
+
+	free(idx_name);
+
+	if (fd < 0)
+		return -1;
+
+	if (fstat(fd, &st)) {
+		close(fd);
+		return -1;
+	}
+
+	if (bitmap_git->pack || bitmap_git->midx) {
+		/* ignore extra bitmap file; we can only handle one */
+		warning("ignoring extra bitmap file: %s",
+			get_midx_filename(midx->object_dir));
+		close(fd);
+		return -1;
+	}
+
+	bitmap_git->midx = midx;
+	bitmap_git->map_size = xsize_t(st.st_size);
+	bitmap_git->map_pos = 0;
+	bitmap_git->map = xmmap(NULL, bitmap_git->map_size, PROT_READ,
+				MAP_PRIVATE, fd, 0);
+	close(fd);
+
+	if (load_bitmap_header(bitmap_git) < 0)
+		goto cleanup;
+
+	if (!hasheq(get_midx_checksum(bitmap_git->midx), bitmap_git->checksum))
+		goto cleanup;
+
+	if (load_midx_revindex(bitmap_git->midx) < 0) {
+		warning(_("multi-pack bitmap is missing required reverse index"));
+		goto cleanup;
+	}
+	return 0;
+
+cleanup:
+	munmap(bitmap_git->map, bitmap_git->map_size);
+	bitmap_git->map_size = 0;
+	bitmap_git->map = NULL;
+	return -1;
+}
+
 static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git *packfile)
 {
 	int fd;
@@ -304,7 +378,8 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
 		return -1;
 	}
 
-	if (bitmap_git->pack) {
+	if (bitmap_git->pack || bitmap_git->midx) {
+		/* ignore extra bitmap file; we can only handle one */
 		warning("ignoring extra bitmap file: %s", packfile->pack_name);
 		close(fd);
 		return -1;
@@ -331,13 +406,39 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
 	return 0;
 }
 
-static int load_pack_bitmap(struct bitmap_index *bitmap_git)
+static int load_reverse_index(struct bitmap_index *bitmap_git)
+{
+	if (bitmap_is_midx(bitmap_git)) {
+		uint32_t i;
+		int ret;
+
+		/*
+		 * The multi-pack-index's .rev file is already loaded via
+		 * open_midx_bitmap_1().
+		 *
+		 * But we still need to open the individual pack .rev files,
+		 * since we will need to make use of them in pack-objects.
+		 */
+		for (i = 0; i < bitmap_git->midx->num_packs; i++) {
+			if (prepare_midx_pack(the_repository, bitmap_git->midx, i))
+				die(_("load_reverse_index: could not open pack"));
+			ret = load_pack_revindex(bitmap_git->midx->packs[i]);
+			if (ret)
+				return ret;
+		}
+		return 0;
+	}
+	return load_pack_revindex(bitmap_git->pack);
+}
+
+static int load_bitmap(struct bitmap_index *bitmap_git)
 {
 	assert(bitmap_git->map);
 
 	bitmap_git->bitmaps = kh_init_oid_map();
 	bitmap_git->ext_index.positions = kh_init_oid_pos();
-	if (load_pack_revindex(bitmap_git->pack))
+
+	if (load_reverse_index(bitmap_git))
 		goto failed;
 
 	if (!(bitmap_git->commits = read_bitmap_1(bitmap_git)) ||
@@ -381,11 +482,47 @@ static int open_pack_bitmap(struct repository *r,
 	return ret;
 }
 
+static int open_midx_bitmap(struct repository *r,
+			    struct bitmap_index *bitmap_git)
+{
+	struct multi_pack_index *midx;
+
+	assert(!bitmap_git->map);
+
+	for (midx = get_multi_pack_index(r); midx; midx = midx->next) {
+		if (!open_midx_bitmap_1(bitmap_git, midx))
+			return 0;
+	}
+	return -1;
+}
+
+static int open_bitmap(struct repository *r,
+		       struct bitmap_index *bitmap_git)
+{
+	assert(!bitmap_git->map);
+
+	if (!open_midx_bitmap(r, bitmap_git))
+		return 0;
+	return open_pack_bitmap(r, bitmap_git);
+}
+
 struct bitmap_index *prepare_bitmap_git(struct repository *r)
 {
 	struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
 
-	if (!open_pack_bitmap(r, bitmap_git) && !load_pack_bitmap(bitmap_git))
+	if (!open_bitmap(r, bitmap_git) && !load_bitmap(bitmap_git))
+		return bitmap_git;
+
+	free_bitmap_index(bitmap_git);
+	return NULL;
+}
+
+struct bitmap_index *prepare_midx_bitmap_git(struct repository *r,
+					     struct multi_pack_index *midx)
+{
+	struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
+
+	if (!open_midx_bitmap_1(bitmap_git, midx) && !load_bitmap(bitmap_git))
 		return bitmap_git;
 
 	free_bitmap_index(bitmap_git);
@@ -435,10 +572,26 @@ static inline int bitmap_position_packfile(struct bitmap_index *bitmap_git,
 	return pos;
 }
 
+static int bitmap_position_midx(struct bitmap_index *bitmap_git,
+				const struct object_id *oid)
+{
+	uint32_t want, got;
+	if (!bsearch_midx(oid, bitmap_git->midx, &want))
+		return -1;
+
+	if (midx_to_pack_pos(bitmap_git->midx, want, &got) < 0)
+		return -1;
+	return got;
+}
+
 static int bitmap_position(struct bitmap_index *bitmap_git,
 			   const struct object_id *oid)
 {
-	int pos = bitmap_position_packfile(bitmap_git, oid);
+	int pos;
+	if (bitmap_is_midx(bitmap_git))
+		pos = bitmap_position_midx(bitmap_git, oid);
+	else
+		pos = bitmap_position_packfile(bitmap_git, oid);
 	return (pos >= 0) ? pos : bitmap_position_extended(bitmap_git, oid);
 }
 
@@ -749,6 +902,7 @@ static void show_objects_for_type(
 			continue;
 
 		for (offset = 0; offset < BITS_IN_EWORD; ++offset) {
+			struct packed_git *pack;
 			struct object_id oid;
 			uint32_t hash = 0, index_pos;
 			off_t ofs;
@@ -758,14 +912,28 @@ static void show_objects_for_type(
 
 			offset += ewah_bit_ctz64(word >> offset);
 
-			index_pos = pack_pos_to_index(bitmap_git->pack, pos + offset);
-			ofs = pack_pos_to_offset(bitmap_git->pack, pos + offset);
-			nth_packed_object_id(&oid, bitmap_git->pack, index_pos);
+			if (bitmap_is_midx(bitmap_git)) {
+				struct multi_pack_index *m = bitmap_git->midx;
+				uint32_t pack_id;
+
+				index_pos = pack_pos_to_midx(m, pos + offset);
+				ofs = nth_midxed_offset(m, index_pos);
+				nth_midxed_object_oid(&oid, m, index_pos);
+
+				pack_id = nth_midxed_pack_int_id(m, index_pos);
+				pack = bitmap_git->midx->packs[pack_id];
+			} else {
+				index_pos = pack_pos_to_index(bitmap_git->pack, pos + offset);
+				ofs = pack_pos_to_offset(bitmap_git->pack, pos + offset);
+				nth_bitmap_object_oid(bitmap_git, &oid, index_pos);
+
+				pack = bitmap_git->pack;
+			}
 
 			if (bitmap_git->hashes)
 				hash = get_be32(bitmap_git->hashes + index_pos);
 
-			show_reach(&oid, object_type, 0, hash, bitmap_git->pack, ofs);
+			show_reach(&oid, object_type, 0, hash, pack, ofs);
 		}
 	}
 }
@@ -777,8 +945,13 @@ static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
 		struct object *object = roots->item;
 		roots = roots->next;
 
-		if (find_pack_entry_one(object->oid.hash, bitmap_git->pack) > 0)
-			return 1;
+		if (bitmap_is_midx(bitmap_git)) {
+			if (bsearch_midx(&object->oid, bitmap_git->midx, NULL))
+				return 1;
+		} else {
+			if (find_pack_entry_one(object->oid.hash, bitmap_git->pack) > 0)
+				return 1;
+		}
 	}
 
 	return 0;
@@ -865,14 +1038,26 @@ static void filter_bitmap_blob_none(struct bitmap_index *bitmap_git,
 static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
 				     uint32_t pos)
 {
-	struct packed_git *pack = bitmap_git->pack;
 	unsigned long size;
 	struct object_info oi = OBJECT_INFO_INIT;
 
 	oi.sizep = &size;
 
 	if (pos < bitmap_num_objects(bitmap_git)) {
-		off_t ofs = pack_pos_to_offset(pack, pos);
+		struct packed_git *pack;
+		off_t ofs;
+
+		if (bitmap_is_midx(bitmap_git)) {
+			uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
+			uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
+
+			pack = bitmap_git->midx->packs[pack_id];
+			ofs = nth_midxed_offset(bitmap_git->midx, midx_pos);
+		} else {
+			pack = bitmap_git->pack;
+			ofs = pack_pos_to_offset(pack, pos);
+		}
+
 		if (packed_object_info(the_repository, pack, ofs, &oi) < 0) {
 			struct object_id oid;
 			nth_bitmap_object_oid(bitmap_git, &oid,
@@ -1053,7 +1238,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	/* try to open a bitmapped pack, but don't parse it yet
 	 * because we may not need to use it */
 	CALLOC_ARRAY(bitmap_git, 1);
-	if (open_pack_bitmap(revs->repo, bitmap_git) < 0)
+	if (open_bitmap(revs->repo, bitmap_git) < 0)
 		goto cleanup;
 
 	for (i = 0; i < revs->pending.nr; ++i) {
@@ -1097,7 +1282,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	 * from disk. this is the point of no return; after this the rev_list
 	 * becomes invalidated and we must perform the revwalk through bitmaps
 	 */
-	if (load_pack_bitmap(bitmap_git) < 0)
+	if (load_bitmap(bitmap_git) < 0)
 		goto cleanup;
 
 	object_array_clear(&revs->pending);
@@ -1145,19 +1330,43 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
  * reused, but you can keep feeding bits.
  */
 static int try_partial_reuse(struct bitmap_index *bitmap_git,
+			     struct packed_git *pack,
 			     size_t pos,
 			     struct bitmap *reuse,
 			     struct pack_window **w_curs)
 {
-	off_t offset, header;
+	off_t offset, delta_obj_offset;
 	enum object_type type;
 	unsigned long size;
 
-	if (pos >= bitmap_num_objects(bitmap_git))
-		return -1; /* not actually in the pack or MIDX */
+	/*
+	 * try_partial_reuse() is called either on (a) objects in the
+	 * bitmapped pack (in the case of a single-pack bitmap) or (b)
+	 * objects in the preferred pack of a multi-pack bitmap.
+	 * Importantly, the latter can behave as if only a single pack
+	 * exists because:
+	 *
+	 *   - The first pack->num_objects bits of a MIDX bitmap are
+	 *     reserved for the preferred pack, and
+	 *
+	 *   - Ties due to duplicate objects are always resolved in
+	 *     favor of the preferred pack.
+	 *
+	 * Therefore we do not need to ever ask the MIDX for its copy of
+	 * an object by OID, since it will always select it from the
+	 * preferred pack. Likewise, the selected copy of the base
+	 * object for any deltas will reside in the same pack.
+	 *
+	 * This means that we can reuse pos when looking up the bit in
+	 * the reuse bitmap, too, since bits corresponding to the
+	 * preferred pack precede all bits from other packs.
+	 */
 
-	offset = header = pack_pos_to_offset(bitmap_git->pack, pos);
-	type = unpack_object_header(bitmap_git->pack, w_curs, &offset, &size);
+	if (pos >= pack->num_objects)
+		return -1; /* not actually in the pack or MIDX preferred pack */
+
+	offset = delta_obj_offset = pack_pos_to_offset(pack, pos);
+	type = unpack_object_header(pack, w_curs, &offset, &size);
 	if (type < 0)
 		return -1; /* broken packfile, punt */
 
@@ -1173,11 +1382,11 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
 		 * and the normal slow path will complain about it in
 		 * more detail.
 		 */
-		base_offset = get_delta_base(bitmap_git->pack, w_curs,
-					     &offset, type, header);
+		base_offset = get_delta_base(pack, w_curs, &offset, type,
+					     delta_obj_offset);
 		if (!base_offset)
 			return 0;
-		if (offset_to_pack_pos(bitmap_git->pack, base_offset, &base_pos) < 0)
+		if (offset_to_pack_pos(pack, base_offset, &base_pos) < 0)
 			return 0;
 
 		/*
@@ -1211,24 +1420,48 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
 	return 0;
 }
 
+static uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git)
+{
+	struct multi_pack_index *m = bitmap_git->midx;
+	if (!m)
+		BUG("midx_preferred_pack: requires non-empty MIDX");
+	return nth_midxed_pack_int_id(m, pack_pos_to_midx(bitmap_git->midx, 0));
+}
+
 int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 				       struct packed_git **packfile_out,
 				       uint32_t *entries,
 				       struct bitmap **reuse_out)
 {
+	struct packed_git *pack;
 	struct bitmap *result = bitmap_git->result;
 	struct bitmap *reuse;
 	struct pack_window *w_curs = NULL;
 	size_t i = 0;
 	uint32_t offset;
-	uint32_t objects_nr = bitmap_num_objects(bitmap_git);
+	uint32_t objects_nr;
 
 	assert(result);
 
+	load_reverse_index(bitmap_git);
+
+	if (bitmap_is_midx(bitmap_git))
+		pack = bitmap_git->midx->packs[midx_preferred_pack(bitmap_git)];
+	else
+		pack = bitmap_git->pack;
+	objects_nr = pack->num_objects;
+
 	while (i < result->word_alloc && result->words[i] == (eword_t)~0)
 		i++;
 
-	/* Don't mark objects not in the packfile */
+	/*
+	 * Don't mark objects not in the packfile or preferred pack. This bitmap
+	 * marks objects eligible for reuse, but the pack-reuse code only
+	 * understands how to reuse a single pack. Since the preferred pack is
+	 * guaranteed to have all bases for its deltas (in a multi-pack bitmap),
+	 * we use it instead of another pack. In single-pack bitmaps, the choice
+	 * is made for us.
+	 */
 	if (i > objects_nr / BITS_IN_EWORD)
 		i = objects_nr / BITS_IN_EWORD;
 
@@ -1244,8 +1477,8 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 				break;
 
 			offset += ewah_bit_ctz64(word >> offset);
-			if (try_partial_reuse(bitmap_git, pos + offset, reuse,
-					      &w_curs) < 0) {
+			if (try_partial_reuse(bitmap_git, pack, pos + offset,
+					      reuse, &w_curs) < 0) {
 				/*
 				 * try_partial_reuse indicated we couldn't reuse
 				 * any bits, so there is no point in trying more
@@ -1274,7 +1507,7 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 	 * need to be handled separately.
 	 */
 	bitmap_and_not(result, reuse);
-	*packfile_out = bitmap_git->pack;
+	*packfile_out = pack;
 	*reuse_out = reuse;
 	return 0;
 }
@@ -1548,6 +1781,12 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
 	uint32_t i, num_objects;
 	uint32_t *reposition;
 
+	if (!bitmap_is_midx(bitmap_git))
+		load_reverse_index(bitmap_git);
+	else if (load_midx_revindex(bitmap_git->midx) < 0)
+		BUG("rebuild_existing_bitmaps: missing required rev-cache "
+		    "extension");
+
 	num_objects = bitmap_num_objects(bitmap_git);
 	CALLOC_ARRAY(reposition, num_objects);
 
@@ -1555,8 +1794,13 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
 		struct object_id oid;
 		struct object_entry *oe;
 
-		nth_packed_object_id(&oid, bitmap_git->pack,
-				     pack_pos_to_index(bitmap_git->pack, i));
+		if (bitmap_is_midx(bitmap_git))
+			nth_midxed_object_oid(&oid,
+					      bitmap_git->midx,
+					      pack_pos_to_midx(bitmap_git->midx, i));
+		else
+			nth_packed_object_id(&oid, bitmap_git->pack,
+					     pack_pos_to_index(bitmap_git->pack, i));
 		oe = packlist_find(mapping, &oid);
 
 		if (oe)
@@ -1582,6 +1826,19 @@ void free_bitmap_index(struct bitmap_index *b)
 	free(b->ext_index.hashes);
 	bitmap_free(b->result);
 	bitmap_free(b->haves);
+	if (bitmap_is_midx(b)) {
+		/*
+		 * Multi-pack bitmaps need to have resources associated with
+		 * their on-disk reverse indexes unmapped so that stale .rev and
+		 * .bitmap files can be removed.
+		 *
+		 * Unlike pack-based bitmaps, multi-pack bitmaps can be read and
+		 * written in the same 'git multi-pack-index write --bitmap'
+		 * process. Close resources so they can be removed safely on
+		 * platforms like Windows.
+		 */
+		close_midx_revindex(b->midx);
+	}
 	free(b);
 }
 
@@ -1596,7 +1853,6 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
 				     enum object_type object_type)
 {
 	struct bitmap *result = bitmap_git->result;
-	struct packed_git *pack = bitmap_git->pack;
 	off_t total = 0;
 	struct ewah_iterator it;
 	eword_t filter;
@@ -1613,15 +1869,35 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
 			continue;
 
 		for (offset = 0; offset < BITS_IN_EWORD; offset++) {
-			size_t pos;
-
 			if ((word >> offset) == 0)
 				break;
 
 			offset += ewah_bit_ctz64(word >> offset);
-			pos = base + offset;
-			total += pack_pos_to_offset(pack, pos + 1) -
-				 pack_pos_to_offset(pack, pos);
+
+			if (bitmap_is_midx(bitmap_git)) {
+				uint32_t pack_pos;
+				uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, base + offset);
+				off_t offset = nth_midxed_offset(bitmap_git->midx, midx_pos);
+
+				uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
+				struct packed_git *pack = bitmap_git->midx->packs[pack_id];
+
+				if (offset_to_pack_pos(pack, offset, &pack_pos) < 0) {
+					struct object_id oid;
+					nth_midxed_object_oid(&oid, bitmap_git->midx, midx_pos);
+
+					die(_("could not find %s in pack %s at offset %"PRIuMAX),
+					    oid_to_hex(&oid),
+					    pack->pack_name,
+					    (uintmax_t)offset);
+				}
+
+				total += pack_pos_to_offset(pack, pack_pos + 1) - offset;
+			} else {
+				size_t pos = base + offset;
+				total += pack_pos_to_offset(bitmap_git->pack, pos + 1) -
+					 pack_pos_to_offset(bitmap_git->pack, pos);
+			}
 		}
 	}
 
@@ -1672,6 +1948,11 @@ off_t get_disk_usage_from_bitmap(struct bitmap_index *bitmap_git,
 	return total;
 }
 
+int bitmap_is_midx(struct bitmap_index *bitmap_git)
+{
+	return !!bitmap_git->midx;
+}
+
 const struct string_list *bitmap_preferred_tips(struct repository *r)
 {
 	return repo_config_get_value_multi(r, "pack.preferbitmaptips");
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 52ea10de51..81664f933f 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -44,6 +44,8 @@ typedef int (*show_reachable_fn)(
 struct bitmap_index;
 
 struct bitmap_index *prepare_bitmap_git(struct repository *r);
+struct bitmap_index *prepare_midx_bitmap_git(struct repository *r,
+					     struct multi_pack_index *midx);
 void count_bitmap_commit_list(struct bitmap_index *, uint32_t *commits,
 			      uint32_t *trees, uint32_t *blobs, uint32_t *tags);
 void traverse_bitmap_commit_list(struct bitmap_index *,
@@ -92,6 +94,10 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 			  uint32_t index_nr,
 			  const char *filename,
 			  uint16_t options);
+char *midx_bitmap_filename(struct multi_pack_index *midx);
+char *pack_bitmap_filename(struct packed_git *p);
+
+int bitmap_is_midx(struct bitmap_index *bitmap_git);
 
 const struct string_list *bitmap_preferred_tips(struct repository *r);
 int bitmap_is_preferred_refname(struct repository *r, const char *refname);
diff --git a/packfile.c b/packfile.c
index 9ef6d98292..371f5488cf 100644
--- a/packfile.c
+++ b/packfile.c
@@ -860,7 +860,7 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
 	if (!strcmp(file_name, "multi-pack-index"))
 		return;
 	if (starts_with(file_name, "multi-pack-index") &&
-	    ends_with(file_name, ".rev"))
+	    (ends_with(file_name, ".bitmap") || ends_with(file_name, ".rev")))
 		return;
 	if (ends_with(file_name, ".idx") ||
 	    ends_with(file_name, ".rev") ||
-- 
2.31.1.163.ga65ce7f831



* [PATCH v4 15/25] pack-bitmap: write multi-pack bitmaps
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
                     ` (13 preceding siblings ...)
  2021-08-24 16:16   ` [PATCH v4 14/25] pack-bitmap: read multi-pack bitmaps Taylor Blau
@ 2021-08-24 16:16   ` Taylor Blau
  2021-08-24 16:16   ` [PATCH v4 16/25] t5310: move some tests to lib-bitmap.sh Taylor Blau
                     ` (10 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 16:16 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Write multi-pack bitmaps in the format described by
Documentation/technical/bitmap-format.txt. A bitmap is written only when
'--bitmap' is given; without it, no multi-pack bitmap is produced.

To write a multi-pack bitmap, this patch attempts to reuse as much of
the existing machinery from pack-objects as possible. Specifically, the
MIDX code prepares a packing_data struct that behaves as if a single
packfile had been generated containing every object in the MIDX.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
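
A note for readers of this patch, not part of the commit message: the
"single packfile" illusion rests on two orderings of the same objects.
The sketch below is a standalone toy, not code from midx.c. It assumes
the pseudo-pack order is "preferred pack first, then pack id, then
offset", which is my reading of midx_pack_order_cmp(); the names
toy_entry, toy_pack_order() and toy_scatter_to_lex() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one MIDX entry; not Git's pack_midx_entry. */
struct toy_entry {
	uint32_t pack_id;	/* pack holding the selected copy */
	uint64_t offset;	/* byte offset of the object in that pack */
};

/* qsort() takes no user data, so the sort context lives at file scope. */
static const struct toy_entry *sort_entries;
static uint32_t sort_preferred;

/* Assumed order: preferred pack first, then pack id, then offset. */
static int toy_pack_order_cmp(const void *va, const void *vb)
{
	const struct toy_entry *a = &sort_entries[*(const uint32_t *)va];
	const struct toy_entry *b = &sort_entries[*(const uint32_t *)vb];
	int a_pref = a->pack_id == sort_preferred;
	int b_pref = b->pack_id == sort_preferred;

	if (a_pref != b_pref)
		return b_pref - a_pref;
	if (a->pack_id != b->pack_id)
		return a->pack_id < b->pack_id ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/*
 * Return a newly allocated array mapping pseudo-pack position to an
 * index into 'entries', which is assumed to be sorted by object ID.
 * The caller frees the result.
 */
static uint32_t *toy_pack_order(const struct toy_entry *entries,
				uint32_t nr, uint32_t preferred)
{
	uint32_t i;
	uint32_t *order = malloc(nr * sizeof(*order));

	if (!order)
		return NULL;
	for (i = 0; i < nr; i++)
		order[i] = i;

	sort_entries = entries;
	sort_preferred = preferred;
	qsort(order, nr, sizeof(*order), toy_pack_order_cmp);
	return order;
}

/*
 * 'in_pack_order[i]' is the item at pseudo-pack position i. Place it
 * in its object-ID-sorted slot, mirroring the patch's
 * "index[ctx->pack_order[i]] = &pdata.objects[i].idx" loop.
 */
static void toy_scatter_to_lex(const uint32_t *order,
			       const int *in_pack_order,
			       int *in_lex_order, uint32_t nr)
{
	uint32_t i;

	assert(order && in_pack_order && in_lex_order);
	for (i = 0; i < nr; i++)
		in_lex_order[order[i]] = in_pack_order[i];
}
```

With four entries in OID order living in packs {1, 0, 1, 0} and pack 1
preferred, both pack-1 objects take the lowest pseudo-pack positions,
which is what lets the reuse code treat the first pack->num_objects
bits as belonging to the preferred pack alone.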
 Documentation/git-multi-pack-index.txt |  12 +-
 builtin/multi-pack-index.c             |   2 +
 midx.c                                 | 208 ++++++++++++++++++++++++-
 midx.h                                 |   1 +
 4 files changed, 214 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index c9b063d31e..ed52459a9d 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -10,7 +10,7 @@ SYNOPSIS
 --------
 [verse]
 'git multi-pack-index' [--object-dir=<dir>] [--[no-]progress]
-	[--preferred-pack=<pack>] <subcommand>
+	[--preferred-pack=<pack>] [--[no-]bitmap] <subcommand>
 
 DESCRIPTION
 -----------
@@ -40,6 +40,9 @@ write::
 		multiple packs contain the same object. `<pack>` must
 		contain at least one object. If not given, ties are
 		broken in favor of the pack with the lowest mtime.
+
+	--[no-]bitmap::
+		Control whether or not a multi-pack bitmap is written.
 --
 
 verify::
@@ -81,6 +84,13 @@ EXAMPLES
 $ git multi-pack-index write
 -----------------------------------------------
 
+* Write a MIDX file for the packfiles in the current .git folder with a
+corresponding bitmap.
++
+-------------------------------------------------------------
+$ git multi-pack-index write --preferred-pack=<pack> --bitmap
+-------------------------------------------------------------
+
 * Write a MIDX file for the packfiles in an alternate object store.
 +
 -----------------------------------------------
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 8ff0dee2ec..73c0113b48 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -68,6 +68,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
 		OPT_STRING(0, "preferred-pack", &opts.preferred_pack,
 			   N_("preferred-pack"),
 			   N_("pack for reuse when computing a multi-pack bitmap")),
+		OPT_BIT(0, "bitmap", &opts.flags, N_("write multi-pack bitmap"),
+			MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
 		OPT_END(),
 	};
 
diff --git a/midx.c b/midx.c
index 2dceaf9565..4574e6d411 100644
--- a/midx.c
+++ b/midx.c
@@ -13,6 +13,10 @@
 #include "repository.h"
 #include "chunk-format.h"
 #include "pack.h"
+#include "pack-bitmap.h"
+#include "refs.h"
+#include "revision.h"
+#include "list-objects.h"
 
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
 #define MIDX_VERSION 1
@@ -893,6 +897,166 @@ static int midx_checksum_valid(struct multi_pack_index *m)
 	return hashfile_checksum_valid(m->data, m->data_len);
 }
 
+static void prepare_midx_packing_data(struct packing_data *pdata,
+				      struct write_midx_context *ctx)
+{
+	uint32_t i;
+
+	memset(pdata, 0, sizeof(struct packing_data));
+	prepare_packing_data(the_repository, pdata);
+
+	for (i = 0; i < ctx->entries_nr; i++) {
+		struct pack_midx_entry *from = &ctx->entries[ctx->pack_order[i]];
+		struct object_entry *to = packlist_alloc(pdata, &from->oid);
+
+		oe_set_in_pack(pdata, to,
+			       ctx->info[ctx->pack_perm[from->pack_int_id]].p);
+	}
+}
+
+static int add_ref_to_pending(const char *refname,
+			      const struct object_id *oid,
+			      int flag, void *cb_data)
+{
+	struct rev_info *revs = (struct rev_info*)cb_data;
+	struct object *object;
+
+	if ((flag & REF_ISSYMREF) && (flag & REF_ISBROKEN)) {
+		warning("symbolic ref is dangling: %s", refname);
+		return 0;
+	}
+
+	object = parse_object_or_die(oid, refname);
+	if (object->type != OBJ_COMMIT)
+		return 0;
+
+	add_pending_object(revs, object, "");
+	if (bitmap_is_preferred_refname(revs->repo, refname))
+		object->flags |= NEEDS_BITMAP;
+	return 0;
+}
+
+struct bitmap_commit_cb {
+	struct commit **commits;
+	size_t commits_nr, commits_alloc;
+
+	struct write_midx_context *ctx;
+};
+
+static const struct object_id *bitmap_oid_access(size_t index,
+						 const void *_entries)
+{
+	const struct pack_midx_entry *entries = _entries;
+	return &entries[index].oid;
+}
+
+static void bitmap_show_commit(struct commit *commit, void *_data)
+{
+	struct bitmap_commit_cb *data = _data;
+	int pos = oid_pos(&commit->object.oid, data->ctx->entries,
+			  data->ctx->entries_nr,
+			  bitmap_oid_access);
+	if (pos < 0)
+		return;
+
+	ALLOC_GROW(data->commits, data->commits_nr + 1, data->commits_alloc);
+	data->commits[data->commits_nr++] = commit;
+}
+
+static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr_p,
+						    struct write_midx_context *ctx)
+{
+	struct rev_info revs;
+	struct bitmap_commit_cb cb = {0};
+
+	cb.ctx = ctx;
+
+	repo_init_revisions(the_repository, &revs, NULL);
+	setup_revisions(0, NULL, &revs, NULL);
+	for_each_ref(add_ref_to_pending, &revs);
+
+	/*
+	 * Skipping promisor objects here is intentional, since it only excludes
+	 * them from the list of reachable commits that we want to select from
+	 * when computing the selection of MIDX'd commits to receive bitmaps.
+	 *
+	 * Reachability bitmaps do require that their objects be closed under
+	 * reachability, but fetching any objects missing from promisors at this
+	 * point is too late. But, if one of those objects can be reached from
+	 * another object that is included in the bitmap, then we will
+	 * complain later that we don't have reachability closure (and fail
+	 * appropriately).
+	 */
+	fetch_if_missing = 0;
+	revs.exclude_promisor_objects = 1;
+
+	if (prepare_revision_walk(&revs))
+		die(_("revision walk setup failed"));
+
+	traverse_commit_list(&revs, bitmap_show_commit, NULL, &cb);
+	if (indexed_commits_nr_p)
+		*indexed_commits_nr_p = cb.commits_nr;
+
+	return cb.commits;
+}
+
+static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
+			     struct write_midx_context *ctx,
+			     unsigned flags)
+{
+	struct packing_data pdata;
+	struct pack_idx_entry **index;
+	struct commit **commits = NULL;
+	uint32_t i, commits_nr;
+	char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash));
+	int ret;
+
+	prepare_midx_packing_data(&pdata, ctx);
+
+	commits = find_commits_for_midx_bitmap(&commits_nr, ctx);
+
+	/*
+	 * Build the MIDX-order index based on pdata.objects (which is already
+	 * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of
+	 * this order).
+	 */
+	ALLOC_ARRAY(index, pdata.nr_objects);
+	for (i = 0; i < pdata.nr_objects; i++)
+		index[i] = &pdata.objects[i].idx;
+
+	bitmap_writer_show_progress(flags & MIDX_PROGRESS);
+	bitmap_writer_build_type_index(&pdata, index, pdata.nr_objects);
+
+	/*
+	 * bitmap_writer_finish expects objects in lex order, and pack_order
+	 * gives us exactly that. Use it directly instead of re-sorting the
+	 * array.
+	 *
+	 * This changes the order of objects in 'index' between
+	 * bitmap_writer_build_type_index and bitmap_writer_finish.
+	 *
+	 * The same re-ordering takes place in the single-pack bitmap code via
+	 * write_idx_file(), which is called by finish_tmp_packfile(), which
+	 * happens between bitmap_writer_build_type_index() and
+	 * bitmap_writer_finish().
+	 */
+	for (i = 0; i < pdata.nr_objects; i++)
+		index[ctx->pack_order[i]] = &pdata.objects[i].idx;
+
+	bitmap_writer_select_commits(commits, commits_nr, -1);
+	ret = bitmap_writer_build(&pdata);
+	if (ret < 0)
+		goto cleanup;
+
+	bitmap_writer_set_checksum(midx_hash);
+	bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, 0);
+
+cleanup:
+	free(index);
+	free(bitmap_name);
+	return ret;
+}
+
 static int write_midx_internal(const char *object_dir,
 			       struct string_list *packs_to_drop,
 			       const char *preferred_pack_name,
@@ -938,7 +1102,7 @@ static int write_midx_internal(const char *object_dir,
 
 			ctx.info[ctx.nr].orig_pack_int_id = i;
 			ctx.info[ctx.nr].pack_name = xstrdup(ctx.m->pack_names[i]);
-			ctx.info[ctx.nr].p = NULL;
+			ctx.info[ctx.nr].p = ctx.m->packs[i];
 			ctx.info[ctx.nr].expired = 0;
 
 			if (flags & MIDX_WRITE_REV_INDEX) {
@@ -972,8 +1136,26 @@ static int write_midx_internal(const char *object_dir,
 	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
 	stop_progress(&ctx.progress);
 
-	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
-		goto cleanup;
+	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop) {
+		struct bitmap_index *bitmap_git;
+		int bitmap_exists;
+		int want_bitmap = flags & MIDX_WRITE_BITMAP;
+
+		bitmap_git = prepare_midx_bitmap_git(the_repository, ctx.m);
+		bitmap_exists = bitmap_git && bitmap_is_midx(bitmap_git);
+		free_bitmap_index(bitmap_git);
+
+		if (bitmap_exists || !want_bitmap) {
+			/*
+			 * The correct MIDX already exists, and so does a
+			 * corresponding bitmap (or one wasn't requested).
+			 */
+			if (!want_bitmap)
+				clear_midx_files_ext(the_repository, ".bitmap",
+						     NULL);
+			goto cleanup;
+		}
+	}
 
 	if (preferred_pack_name) {
 		int found = 0;
@@ -989,7 +1171,8 @@ static int write_midx_internal(const char *object_dir,
 		if (!found)
 			warning(_("unknown preferred pack: '%s'"),
 				preferred_pack_name);
-	} else if (ctx.nr && (flags & MIDX_WRITE_REV_INDEX)) {
+	} else if (ctx.nr &&
+		   (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))) {
 		struct packed_git *oldest = ctx.info[ctx.preferred_pack_idx].p;
 		ctx.preferred_pack_idx = 0;
 
@@ -1121,9 +1304,6 @@ static int write_midx_internal(const char *object_dir,
 	hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR);
 	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
 
-	if (ctx.m)
-		close_object_store(the_repository->objects);
-
 	if (ctx.nr - dropped_packs == 0) {
 		error(_("no pack files to index."));
 		result = 1;
@@ -1154,14 +1334,24 @@ static int write_midx_internal(const char *object_dir,
 	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
 	free_chunkfile(cf);
 
-	if (flags & MIDX_WRITE_REV_INDEX)
+	if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))
 		ctx.pack_order = midx_pack_order(&ctx);
 
 	if (flags & MIDX_WRITE_REV_INDEX)
 		write_midx_reverse_index(midx_name, midx_hash, &ctx);
+	if (flags & MIDX_WRITE_BITMAP) {
+		if (write_midx_bitmap(midx_name, midx_hash, &ctx, flags) < 0) {
+			error(_("could not write multi-pack bitmap"));
+			result = 1;
+			goto cleanup;
+		}
+	}
+
+	close_object_store(the_repository->objects);
 
 	commit_lock_file(&lk);
 
+	clear_midx_files_ext(the_repository, ".bitmap", midx_hash);
 	clear_midx_files_ext(the_repository, ".rev", midx_hash);
 
 cleanup:
@@ -1178,6 +1368,7 @@ static int write_midx_internal(const char *object_dir,
 	free(ctx.pack_perm);
 	free(ctx.pack_order);
 	free(midx_name);
+
 	return result;
 }
 
@@ -1238,6 +1429,7 @@ void clear_midx_file(struct repository *r)
 	if (remove_path(midx))
 		die(_("failed to clear multi-pack-index at %s"), midx);
 
+	clear_midx_files_ext(r, ".bitmap", NULL);
 	clear_midx_files_ext(r, ".rev", NULL);
 
 	free(midx);
diff --git a/midx.h b/midx.h
index 1172df1a71..350f4d0a7b 100644
--- a/midx.h
+++ b/midx.h
@@ -41,6 +41,7 @@ struct multi_pack_index {
 
 #define MIDX_PROGRESS     (1 << 0)
 #define MIDX_WRITE_REV_INDEX (1 << 1)
+#define MIDX_WRITE_BITMAP (1 << 2)
 
 const unsigned char *get_midx_checksum(struct multi_pack_index *m);
 char *get_midx_filename(const char *object_dir);
-- 
2.31.1.163.ga65ce7f831



* [PATCH v4 16/25] t5310: move some tests to lib-bitmap.sh
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
                     ` (14 preceding siblings ...)
  2021-08-24 16:16   ` [PATCH v4 15/25] pack-bitmap: write " Taylor Blau
@ 2021-08-24 16:16   ` Taylor Blau
  2021-08-24 16:16   ` [PATCH v4 17/25] t/helper/test-read-midx.c: add --checksum mode Taylor Blau
                     ` (9 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 16:16 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

We'll soon be adding a test script that will cover many of the same
bitmap concepts as t5310, but for MIDX bitmaps. Let's pull out as many
of the applicable tests as we can so we don't have to rewrite them.

There should be no functional change to t5310; we still run the same
operations in the same order.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/lib-bitmap.sh         | 236 ++++++++++++++++++++++++++++++++++++++++
 t/t5310-pack-bitmaps.sh | 227 +-------------------------------------
 2 files changed, 240 insertions(+), 223 deletions(-)

diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index fe3f98be24..77464da6fd 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -1,3 +1,6 @@
+# Helpers for scripts testing bitmap functionality; see t5310 for
+# example usage.
+
 # Compare a file containing rev-list bitmap traversal output to its non-bitmap
 # counterpart. You can't just use test_cmp for this, because the two produce
 # subtly different output:
@@ -24,3 +27,236 @@ test_bitmap_traversal () {
 	test_cmp "$1.normalized" "$2.normalized" &&
 	rm -f "$1.normalized" "$2.normalized"
 }
+
+# To ensure the logic for "maximal commits" is exercised, make
+# the repository a bit more complicated.
+#
+#    other                         second
+#      *                             *
+# (99 commits)                  (99 commits)
+#      *                             *
+#      |\                           /|
+#      | * octo-other  octo-second * |
+#      |/|\_________  ____________/|\|
+#      | \          \/  __________/  |
+#      |  | ________/\ /             |
+#      *  |/          * merge-right  *
+#      | _|__________/ \____________ |
+#      |/ |                         \|
+# (l1) *  * merge-left               * (r1)
+#      | / \________________________ |
+#      |/                           \|
+# (l2) *                             * (r2)
+#       \___________________________ |
+#                                   \|
+#                                    * (base)
+#
+# We only push bits down the first-parent history, which
+# makes some of these commits unimportant!
+#
+# The important part for the maximal commit algorithm is how
+# the bitmasks are extended. Assuming starting bit positions
+# for second (bit 0) and other (bit 1), the bitmasks at the
+# end should be:
+#
+#      second: 1       (maximal, selected)
+#       other: 01      (maximal, selected)
+#      (base): 11 (maximal)
+#
+# This complicated history was important for a previous
+# version of the walk that guarantees never walking a
+# commit multiple times. That goal might be important
+# again, so preserve this complicated case. For now, this
+# test will guarantee that the bitmaps are computed
+# correctly, even with the repeat calculations.
+setup_bitmap_history() {
+	test_expect_success 'setup repo with moderate-sized history' '
+		test_commit_bulk --id=file 10 &&
+		git branch -M second &&
+		git checkout -b other HEAD~5 &&
+		test_commit_bulk --id=side 10 &&
+
+		# add complicated history setup, including merges and
+		# ambiguous merge-bases
+
+		git checkout -b merge-left other~2 &&
+		git merge second~2 -m "merge-left" &&
+
+		git checkout -b merge-right second~1 &&
+		git merge other~1 -m "merge-right" &&
+
+		git checkout -b octo-second second &&
+		git merge merge-left merge-right -m "octopus-second" &&
+
+		git checkout -b octo-other other &&
+		git merge merge-left merge-right -m "octopus-other" &&
+
+		git checkout other &&
+		git merge octo-other -m "pull octopus" &&
+
+		git checkout second &&
+		git merge octo-second -m "pull octopus" &&
+
+		# Remove these branches so they are not selected
+		# as bitmap tips
+		git branch -D merge-left &&
+		git branch -D merge-right &&
+		git branch -D octo-other &&
+		git branch -D octo-second &&
+
+		# add padding to make these merges less interesting
+		# and avoid having them selected for bitmaps
+		test_commit_bulk --id=file 100 &&
+		git checkout other &&
+		test_commit_bulk --id=side 100 &&
+		git checkout second &&
+
+		bitmaptip=$(git rev-parse second) &&
+		blob=$(echo tagged-blob | git hash-object -w --stdin) &&
+		git tag tagged-blob $blob
+	'
+}
+
+rev_list_tests_head () {
+	test_expect_success "counting commits via bitmap ($state, $branch)" '
+		git rev-list --count $branch >expect &&
+		git rev-list --use-bitmap-index --count $branch >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting partial commits via bitmap ($state, $branch)" '
+		git rev-list --count $branch~5..$branch >expect &&
+		git rev-list --use-bitmap-index --count $branch~5..$branch >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting commits with limit ($state, $branch)" '
+		git rev-list --count -n 1 $branch >expect &&
+		git rev-list --use-bitmap-index --count -n 1 $branch >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting non-linear history ($state, $branch)" '
+		git rev-list --count other...second >expect &&
+		git rev-list --use-bitmap-index --count other...second >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting commits with limiting ($state, $branch)" '
+		git rev-list --count $branch -- 1.t >expect &&
+		git rev-list --use-bitmap-index --count $branch -- 1.t >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting objects via bitmap ($state, $branch)" '
+		git rev-list --count --objects $branch >expect &&
+		git rev-list --use-bitmap-index --count --objects $branch >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "enumerate commits ($state, $branch)" '
+		git rev-list --use-bitmap-index $branch >actual &&
+		git rev-list $branch >expect &&
+		test_bitmap_traversal --no-confirm-bitmaps expect actual
+	'
+
+	test_expect_success "enumerate --objects ($state, $branch)" '
+		git rev-list --objects --use-bitmap-index $branch >actual &&
+		git rev-list --objects $branch >expect &&
+		test_bitmap_traversal expect actual
+	'
+
+	test_expect_success "bitmap --objects handles non-commit objects ($state, $branch)" '
+		git rev-list --objects --use-bitmap-index $branch tagged-blob >actual &&
+		grep $blob actual
+	'
+}
+
+rev_list_tests () {
+	state=$1
+
+	for branch in "second" "other"
+	do
+		rev_list_tests_head
+	done
+}
+
+basic_bitmap_tests () {
+	tip="$1"
+	test_expect_success 'rev-list --test-bitmap verifies bitmaps' "
+		git rev-list --test-bitmap \"${tip:-HEAD}\"
+	"
+
+	rev_list_tests 'full bitmap'
+
+	test_expect_success 'clone from bitmapped repository' '
+		rm -fr clone.git &&
+		git clone --no-local --bare . clone.git &&
+		git rev-parse HEAD >expect &&
+		git --git-dir=clone.git rev-parse HEAD >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success 'partial clone from bitmapped repository' '
+		test_config uploadpack.allowfilter true &&
+		rm -fr partial-clone.git &&
+		git clone --no-local --bare --filter=blob:none . partial-clone.git &&
+		(
+			cd partial-clone.git &&
+			pack=$(echo objects/pack/*.pack) &&
+			git verify-pack -v "$pack" >have &&
+			awk "/blob/ { print \$1 }" <have >blobs &&
+			# we expect this single blob because of the direct ref
+			git rev-parse refs/tags/tagged-blob >expect &&
+			test_cmp expect blobs
+		)
+	'
+
+	test_expect_success 'setup further non-bitmapped commits' '
+		test_commit_bulk --id=further 10
+	'
+
+	rev_list_tests 'partial bitmap'
+
+	test_expect_success 'fetch (partial bitmap)' '
+		git --git-dir=clone.git fetch origin second:second &&
+		git rev-parse HEAD >expect &&
+		git --git-dir=clone.git rev-parse HEAD >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success 'enumerating progress counts pack-reused objects' '
+		count=$(git rev-list --objects --all --count) &&
+		git repack -adb &&
+
+		# check first with only reused objects; confirm that our
+		# progress showed the right number, and also that we did
+		# pack-reuse as expected.  Check only the final "done"
+		# line of the meter (there may be an arbitrary number of
+		# intermediate lines ending with CR).
+		GIT_PROGRESS_DELAY=0 \
+			git pack-objects --all --stdout --progress \
+			</dev/null >/dev/null 2>stderr &&
+		grep "Enumerating objects: $count, done" stderr &&
+		grep "pack-reused $count" stderr &&
+
+		# now the same but with one non-reused object
+		git commit --allow-empty -m "an extra commit object" &&
+		GIT_PROGRESS_DELAY=0 \
+			git pack-objects --all --stdout --progress \
+			</dev/null >/dev/null 2>stderr &&
+		grep "Enumerating objects: $((count+1)), done" stderr &&
+		grep "pack-reused $count" stderr
+	'
+}
+
+# have_delta <obj> <expected_base>
+#
+# Note that because this relies on cat-file, it might find _any_ copy of an
+# object in the repository. The caller is responsible for making sure
+# there's only one (e.g., via "repack -ad", or having just fetched a copy).
+have_delta () {
+	echo $2 >expect &&
+	echo $1 | git cat-file --batch-check="%(deltabase)" >actual &&
+	test_cmp expect actual
+}
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index b02838750e..4318f84d53 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -25,93 +25,10 @@ has_any () {
 	grep -Ff "$1" "$2"
 }
 
-# To ensure the logic for "maximal commits" is exercised, make
-# the repository a bit more complicated.
-#
-#    other                         second
-#      *                             *
-# (99 commits)                  (99 commits)
-#      *                             *
-#      |\                           /|
-#      | * octo-other  octo-second * |
-#      |/|\_________  ____________/|\|
-#      | \          \/  __________/  |
-#      |  | ________/\ /             |
-#      *  |/          * merge-right  *
-#      | _|__________/ \____________ |
-#      |/ |                         \|
-# (l1) *  * merge-left               * (r1)
-#      | / \________________________ |
-#      |/                           \|
-# (l2) *                             * (r2)
-#       \___________________________ |
-#                                   \|
-#                                    * (base)
-#
-# We only push bits down the first-parent history, which
-# makes some of these commits unimportant!
-#
-# The important part for the maximal commit algorithm is how
-# the bitmasks are extended. Assuming starting bit positions
-# for second (bit 0) and other (bit 1), the bitmasks at the
-# end should be:
-#
-#      second: 1       (maximal, selected)
-#       other: 01      (maximal, selected)
-#      (base): 11 (maximal)
-#
-# This complicated history was important for a previous
-# version of the walk that guarantees never walking a
-# commit multiple times. That goal might be important
-# again, so preserve this complicated case. For now, this
-# test will guarantee that the bitmaps are computed
-# correctly, even with the repeat calculations.
+setup_bitmap_history
 
-test_expect_success 'setup repo with moderate-sized history' '
-	test_commit_bulk --id=file 10 &&
-	git branch -M second &&
-	git checkout -b other HEAD~5 &&
-	test_commit_bulk --id=side 10 &&
-
-	# add complicated history setup, including merges and
-	# ambiguous merge-bases
-
-	git checkout -b merge-left other~2 &&
-	git merge second~2 -m "merge-left" &&
-
-	git checkout -b merge-right second~1 &&
-	git merge other~1 -m "merge-right" &&
-
-	git checkout -b octo-second second &&
-	git merge merge-left merge-right -m "octopus-second" &&
-
-	git checkout -b octo-other other &&
-	git merge merge-left merge-right -m "octopus-other" &&
-
-	git checkout other &&
-	git merge octo-other -m "pull octopus" &&
-
-	git checkout second &&
-	git merge octo-second -m "pull octopus" &&
-
-	# Remove these branches so they are not selected
-	# as bitmap tips
-	git branch -D merge-left &&
-	git branch -D merge-right &&
-	git branch -D octo-other &&
-	git branch -D octo-second &&
-
-	# add padding to make these merges less interesting
-	# and avoid having them selected for bitmaps
-	test_commit_bulk --id=file 100 &&
-	git checkout other &&
-	test_commit_bulk --id=side 100 &&
-	git checkout second &&
-
-	bitmaptip=$(git rev-parse second) &&
-	blob=$(echo tagged-blob | git hash-object -w --stdin) &&
-	git tag tagged-blob $blob &&
-	git config repack.writebitmaps true
+test_expect_success 'setup writing bitmaps during repack' '
+	git config repack.writeBitmaps true
 '
 
 test_expect_success 'full repack creates bitmaps' '
@@ -123,109 +40,7 @@ test_expect_success 'full repack creates bitmaps' '
 	grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace
 '
 
-test_expect_success 'rev-list --test-bitmap verifies bitmaps' '
-	git rev-list --test-bitmap HEAD
-'
-
-rev_list_tests_head () {
-	test_expect_success "counting commits via bitmap ($state, $branch)" '
-		git rev-list --count $branch >expect &&
-		git rev-list --use-bitmap-index --count $branch >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting partial commits via bitmap ($state, $branch)" '
-		git rev-list --count $branch~5..$branch >expect &&
-		git rev-list --use-bitmap-index --count $branch~5..$branch >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting commits with limit ($state, $branch)" '
-		git rev-list --count -n 1 $branch >expect &&
-		git rev-list --use-bitmap-index --count -n 1 $branch >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting non-linear history ($state, $branch)" '
-		git rev-list --count other...second >expect &&
-		git rev-list --use-bitmap-index --count other...second >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting commits with limiting ($state, $branch)" '
-		git rev-list --count $branch -- 1.t >expect &&
-		git rev-list --use-bitmap-index --count $branch -- 1.t >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting objects via bitmap ($state, $branch)" '
-		git rev-list --count --objects $branch >expect &&
-		git rev-list --use-bitmap-index --count --objects $branch >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "enumerate commits ($state, $branch)" '
-		git rev-list --use-bitmap-index $branch >actual &&
-		git rev-list $branch >expect &&
-		test_bitmap_traversal --no-confirm-bitmaps expect actual
-	'
-
-	test_expect_success "enumerate --objects ($state, $branch)" '
-		git rev-list --objects --use-bitmap-index $branch >actual &&
-		git rev-list --objects $branch >expect &&
-		test_bitmap_traversal expect actual
-	'
-
-	test_expect_success "bitmap --objects handles non-commit objects ($state, $branch)" '
-		git rev-list --objects --use-bitmap-index $branch tagged-blob >actual &&
-		grep $blob actual
-	'
-}
-
-rev_list_tests () {
-	state=$1
-
-	for branch in "second" "other"
-	do
-		rev_list_tests_head
-	done
-}
-
-rev_list_tests 'full bitmap'
-
-test_expect_success 'clone from bitmapped repository' '
-	git clone --no-local --bare . clone.git &&
-	git rev-parse HEAD >expect &&
-	git --git-dir=clone.git rev-parse HEAD >actual &&
-	test_cmp expect actual
-'
-
-test_expect_success 'partial clone from bitmapped repository' '
-	test_config uploadpack.allowfilter true &&
-	git clone --no-local --bare --filter=blob:none . partial-clone.git &&
-	(
-		cd partial-clone.git &&
-		pack=$(echo objects/pack/*.pack) &&
-		git verify-pack -v "$pack" >have &&
-		awk "/blob/ { print \$1 }" <have >blobs &&
-		# we expect this single blob because of the direct ref
-		git rev-parse refs/tags/tagged-blob >expect &&
-		test_cmp expect blobs
-	)
-'
-
-test_expect_success 'setup further non-bitmapped commits' '
-	test_commit_bulk --id=further 10
-'
-
-rev_list_tests 'partial bitmap'
-
-test_expect_success 'fetch (partial bitmap)' '
-	git --git-dir=clone.git fetch origin second:second &&
-	git rev-parse HEAD >expect &&
-	git --git-dir=clone.git rev-parse HEAD >actual &&
-	test_cmp expect actual
-'
+basic_bitmap_tests
 
 test_expect_success 'incremental repack fails when bitmaps are requested' '
 	test_commit more-1 &&
@@ -461,40 +276,6 @@ test_expect_success 'truncated bitmap fails gracefully (cache)' '
 	test_i18ngrep corrupted.bitmap.index stderr
 '
 
-test_expect_success 'enumerating progress counts pack-reused objects' '
-	count=$(git rev-list --objects --all --count) &&
-	git repack -adb &&
-
-	# check first with only reused objects; confirm that our progress
-	# showed the right number, and also that we did pack-reuse as expected.
-	# Check only the final "done" line of the meter (there may be an
-	# arbitrary number of intermediate lines ending with CR).
-	GIT_PROGRESS_DELAY=0 \
-		git pack-objects --all --stdout --progress \
-		</dev/null >/dev/null 2>stderr &&
-	grep "Enumerating objects: $count, done" stderr &&
-	grep "pack-reused $count" stderr &&
-
-	# now the same but with one non-reused object
-	git commit --allow-empty -m "an extra commit object" &&
-	GIT_PROGRESS_DELAY=0 \
-		git pack-objects --all --stdout --progress \
-		</dev/null >/dev/null 2>stderr &&
-	grep "Enumerating objects: $((count+1)), done" stderr &&
-	grep "pack-reused $count" stderr
-'
-
-# have_delta <obj> <expected_base>
-#
-# Note that because this relies on cat-file, it might find _any_ copy of an
-# object in the repository. The caller is responsible for making sure
-# there's only one (e.g., via "repack -ad", or having just fetched a copy).
-have_delta () {
-	echo $2 >expect &&
-	echo $1 | git cat-file --batch-check="%(deltabase)" >actual &&
-	test_cmp expect actual
-}
-
 # Create a state of history with these properties:
 #
 #  - refs that allow a client to fetch some new history, while sharing some old
-- 
2.31.1.163.ga65ce7f831



* [PATCH v4 17/25] t/helper/test-read-midx.c: add --checksum mode
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
                     ` (15 preceding siblings ...)
  2021-08-24 16:16   ` [PATCH v4 16/25] t5310: move some tests to lib-bitmap.sh Taylor Blau
@ 2021-08-24 16:16   ` Taylor Blau
  2021-08-24 16:16   ` [PATCH v4 18/25] t5326: test multi-pack bitmap behavior Taylor Blau
                     ` (8 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 16:16 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Subsequent tests will want to check for the existence of a multi-pack
bitmap which matches the multi-pack-index stored in the pack directory.

The multi-pack bitmap includes the hex checksum of the MIDX it
corresponds to in its filename (for example,
'$packdir/multi-pack-index-<checksum>.bitmap'). As a result, some tests
want a way to learn what '<checksum>' is.

This helper addresses that need by printing the checksum of the
repository's multi-pack-index.
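
As a rough sketch of how a test might use this (the 'midx_bitmap_path'
name is hypothetical and not part of this patch), the expected location
of the bitmap can be assembled from the object directory and the
checksum, following the filename pattern described above:

```shell
# Hypothetical helper (not part of this patch): print the path at which a
# multi-pack bitmap is expected, given an object directory ($1) and the
# MIDX checksum in hex ($2).
midx_bitmap_path () {
	printf '%s/pack/multi-pack-index-%s.bitmap\n' "$1" "$2"
}

# Intended use inside a test, where the checksum comes from the new mode:
#   midx_bitmap_path "$objdir" "$(test-tool read-midx --checksum "$objdir")"
```

Keeping the checksum lookup separate from the path construction means
the same value can be reused for other files named after the MIDX
checksum.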

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/helper/test-read-midx.c | 16 +++++++++++++++-
 t/lib-bitmap.sh           |  4 ++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index 7c2eb11a8e..cb0d27049a 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -60,12 +60,26 @@ static int read_midx_file(const char *object_dir, int show_objects)
 	return 0;
 }
 
+static int read_midx_checksum(const char *object_dir)
+{
+	struct multi_pack_index *m;
+
+	setup_git_directory();
+	m = load_multi_pack_index(object_dir, 1);
+	if (!m)
+		return 1;
+	printf("%s\n", hash_to_hex(get_midx_checksum(m)));
+	return 0;
+}
+
 int cmd__read_midx(int argc, const char **argv)
 {
 	if (!(argc == 2 || argc == 3))
-		usage("read-midx [--show-objects] <object-dir>");
+		usage("read-midx [--show-objects|--checksum] <object-dir>");
 
 	if (!strcmp(argv[1], "--show-objects"))
 		return read_midx_file(argv[2], 1);
+	else if (!strcmp(argv[1], "--checksum"))
+		return read_midx_checksum(argv[2]);
 	return read_midx_file(argv[1], 0);
 }
diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index 77464da6fd..21d0392dda 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -260,3 +260,7 @@ have_delta () {
 	echo $1 | git cat-file --batch-check="%(deltabase)" >actual &&
 	test_cmp expect actual
 }
+
+midx_checksum () {
+	test-tool read-midx --checksum "$1"
+}
-- 
2.31.1.163.ga65ce7f831



* [PATCH v4 18/25] t5326: test multi-pack bitmap behavior
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
                     ` (16 preceding siblings ...)
  2021-08-24 16:16   ` [PATCH v4 17/25] t/helper/test-read-midx.c: add --checksum mode Taylor Blau
@ 2021-08-24 16:16   ` Taylor Blau
  2021-08-24 16:16   ` [PATCH v4 19/25] t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP Taylor Blau
                     ` (7 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 16:16 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

This patch introduces a new test, t5326, which tests the basic
functionality of multi-pack bitmaps.

Some trivial behavior is tested, such as:

  - Whether bitmaps can be generated with more than one pack.
  - Whether clones can be served with all objects in the bitmap.
  - Whether follow-up fetches can be served with some objects outside of
    the server's bitmap.

These use lib-bitmap's tests (which in turn were pulled from t5310), and
we cover cases where the MIDX represents both a single pack and multiple
packs.

In addition, some non-trivial and MIDX-specific behavior is tested, too,
including:

  - Whether multi-pack bitmaps behave correctly with respect to the
    pack-reuse machinery when the base for some object is selected from
    a different pack than the delta.
  - Whether multi-pack bitmaps correctly respect the
    pack.preferBitmapTips configuration.
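
The pack.preferBitmapTips check boils down to a set difference: list
every commit, list the commits that received a bitmap, and look at what
is left over. A minimal sketch of that comparison, using made-up object
names and a hypothetical function name ('commits_without_bitmaps'):

```shell
# Hypothetical helper: print the commits that have no bitmap.
#   $1: file listing commits that have a bitmap, one per line
#   $2: file listing all commits, one per line
# Both inputs are sorted first because comm(1) requires sorted input;
# "comm -13" keeps only the lines unique to the second file.
commits_without_bitmaps () {
	sort "$1" >bitmaps.sorted &&
	sort "$2" >commits.sorted &&
	comm -13 bitmaps.sorted commits.sorted
}

# Made-up object names, for illustration only.
printf '%s\n' aaa1 ccc3 >with-bitmaps
printf '%s\n' ccc3 aaa1 bbb2 >all-commits
commits_without_bitmaps with-bitmaps all-commits >missing
```

The test then tags the left-over commits under the configured prefix,
rewrites the MIDX bitmap, and expects this list to change.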

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5326-multi-pack-bitmaps.sh | 286 ++++++++++++++++++++++++++++++++++
 1 file changed, 286 insertions(+)
 create mode 100755 t/t5326-multi-pack-bitmaps.sh

diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
new file mode 100755
index 0000000000..4ad7c2c969
--- /dev/null
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -0,0 +1,286 @@
+#!/bin/sh
+
+test_description='exercise basic multi-pack bitmap functionality'
+. ./test-lib.sh
+. "${TEST_DIRECTORY}/lib-bitmap.sh"
+
+# We'll be writing our own midx and bitmaps, so avoid getting confused by the
+# automatic ones.
+GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+
+objdir=.git/objects
+midx=$objdir/pack/multi-pack-index
+
+# midx_pack_source <obj>
+midx_pack_source () {
+	test-tool read-midx --show-objects .git/objects | grep "^$1 " | cut -f2
+}
+
+setup_bitmap_history
+
+test_expect_success 'enable core.multiPackIndex' '
+	git config core.multiPackIndex true
+'
+
+test_expect_success 'create single-pack midx with bitmaps' '
+	git repack -ad &&
+	git multi-pack-index write --bitmap &&
+	test_path_is_file $midx &&
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+	test_path_is_file $midx-$(midx_checksum $objdir).rev
+'
+
+basic_bitmap_tests
+
+test_expect_success 'create new additional packs' '
+	for i in $(test_seq 1 16)
+	do
+		test_commit "$i" &&
+		git repack -d || return 1
+	done &&
+
+	git checkout -b other2 HEAD~8 &&
+	for i in $(test_seq 1 8)
+	do
+		test_commit "side-$i" &&
+		git repack -d || return 1
+	done &&
+	git checkout second
+'
+
+test_expect_success 'create multi-pack midx with bitmaps' '
+	git multi-pack-index write --bitmap &&
+
+	ls $objdir/pack/pack-*.pack >packs &&
+	test_line_count = 25 packs &&
+
+	test_path_is_file $midx &&
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+	test_path_is_file $midx-$(midx_checksum $objdir).rev
+'
+
+basic_bitmap_tests
+
+test_expect_success '--no-bitmap is respected when bitmaps exist' '
+	git multi-pack-index write --bitmap &&
+
+	test_commit respect--no-bitmap &&
+	git repack -d &&
+
+	test_path_is_file $midx &&
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+	test_path_is_file $midx-$(midx_checksum $objdir).rev &&
+
+	git multi-pack-index write --no-bitmap &&
+
+	test_path_is_file $midx &&
+	test_path_is_missing $midx-$(midx_checksum $objdir).bitmap &&
+	test_path_is_missing $midx-$(midx_checksum $objdir).rev
+'
+
+test_expect_success 'setup midx with base from later pack' '
+	# Write a and b so that "a" is a delta on top of base "b", since Git
+	# prefers to delete contents out of a base rather than add to a shorter
+	# object.
+	test_seq 1 128 >a &&
+	test_seq 1 130 >b &&
+
+	git add a b &&
+	git commit -m "initial commit" &&
+
+	a=$(git rev-parse HEAD:a) &&
+	b=$(git rev-parse HEAD:b) &&
+
+	# In the first pack, "a" is stored as a delta to "b".
+	p1=$(git pack-objects .git/objects/pack/pack <<-EOF
+	$a
+	$b
+	EOF
+	) &&
+
+	# In the second pack, "a" is missing, and "b" is neither a delta nor a
+	# base for any other object.
+	p2=$(git pack-objects .git/objects/pack/pack <<-EOF
+	$b
+	$(git rev-parse HEAD)
+	$(git rev-parse HEAD^{tree})
+	EOF
+	) &&
+
+	git prune-packed &&
+	# Use the second pack as the preferred source, so that "b" occurs
+	# earlier in the MIDX object order, rendering "a" unusable for pack
+	# reuse.
+	git multi-pack-index write --bitmap --preferred-pack=pack-$p2.idx &&
+
+	have_delta $a $b &&
+	test $(midx_pack_source $a) != $(midx_pack_source $b)
+'
+
+rev_list_tests 'full bitmap with backwards delta'
+
+test_expect_success 'clone with bitmaps enabled' '
+	git clone --no-local --bare . clone-reverse-delta.git &&
+	test_when_finished "rm -fr clone-reverse-delta.git" &&
+
+	git rev-parse HEAD >expect &&
+	git --git-dir=clone-reverse-delta.git rev-parse HEAD >actual &&
+	test_cmp expect actual
+'
+
+bitmap_reuse_tests() {
+	from=$1
+	to=$2
+
+	test_expect_success "setup pack reuse tests ($from -> $to)" '
+		rm -fr repo &&
+		git init repo &&
+		(
+			cd repo &&
+			test_commit_bulk 16 &&
+			git tag old-tip &&
+
+			git config core.multiPackIndex true &&
+			if test "MIDX" = "$from"
+			then
+				git repack -Ad &&
+				git multi-pack-index write --bitmap
+			else
+				git repack -Adb
+			fi
+		)
+	'
+
+	test_expect_success "build bitmap from existing ($from -> $to)" '
+		(
+			cd repo &&
+			test_commit_bulk --id=further 16 &&
+			git tag new-tip &&
+
+			if test "MIDX" = "$to"
+			then
+				git repack -d &&
+				git multi-pack-index write --bitmap
+			else
+				git repack -Adb
+			fi
+		)
+	'
+
+	test_expect_success "verify resulting bitmaps ($from -> $to)" '
+		(
+			cd repo &&
+			git for-each-ref &&
+			git rev-list --test-bitmap refs/tags/old-tip &&
+			git rev-list --test-bitmap refs/tags/new-tip
+		)
+	'
+}
+
+bitmap_reuse_tests 'pack' 'MIDX'
+bitmap_reuse_tests 'MIDX' 'pack'
+bitmap_reuse_tests 'MIDX' 'MIDX'
+
+test_expect_success 'missing object closure fails gracefully' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit loose &&
+		test_commit packed &&
+
+		# Do not pass "--revs"; we want a pack without the "loose"
+		# commit.
+		git pack-objects $objdir/pack/pack <<-EOF &&
+		$(git rev-parse packed)
+		EOF
+
+		test_must_fail git multi-pack-index write --bitmap 2>err &&
+		grep "doesn.t have full closure" err &&
+		test_path_is_missing $midx
+	)
+'
+
+test_expect_success 'setup partial bitmaps' '
+	test_commit packed &&
+	git repack &&
+	test_commit loose &&
+	git multi-pack-index write --bitmap 2>err &&
+	test_path_is_file $midx &&
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+	test_path_is_file $midx-$(midx_checksum $objdir).rev
+'
+
+basic_bitmap_tests HEAD~
+
+test_expect_success 'removing a MIDX clears stale bitmaps' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+		test_commit base &&
+		git repack &&
+		git multi-pack-index write --bitmap &&
+
+		# Write a MIDX and bitmap; remove the MIDX but leave the bitmap.
+		stale_bitmap=$midx-$(midx_checksum $objdir).bitmap &&
+		stale_rev=$midx-$(midx_checksum $objdir).rev &&
+		rm $midx &&
+
+		# Then write a new MIDX.
+		test_commit new &&
+		git repack &&
+		git multi-pack-index write --bitmap &&
+
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+		test_path_is_file $midx-$(midx_checksum $objdir).rev &&
+		test_path_is_missing $stale_bitmap &&
+		test_path_is_missing $stale_rev
+	)
+'
+
+test_expect_success 'pack.preferBitmapTips' '
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit_bulk --message="%s" 103 &&
+
+		git log --format="%H" >commits.raw &&
+		sort <commits.raw >commits &&
+
+		git log --format="create refs/tags/%s %H" HEAD >refs &&
+		git update-ref --stdin <refs &&
+
+		git multi-pack-index write --bitmap &&
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+		test_path_is_file $midx-$(midx_checksum $objdir).rev &&
+
+		test-tool bitmap list-commits | sort >bitmaps &&
+		comm -13 bitmaps commits >before &&
+		test_line_count = 1 before &&
+
+		perl -ne "printf(\"create refs/tags/include/%d \", $.); print" \
+			<before | git update-ref --stdin &&
+
+		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
+		rm -fr $midx-$(midx_checksum $objdir).rev &&
+		rm -fr $midx &&
+
+		git -c pack.preferBitmapTips=refs/tags/include \
+			multi-pack-index write --bitmap &&
+		test-tool bitmap list-commits | sort >bitmaps &&
+		comm -13 bitmaps commits >after &&
+
+		! test_cmp before after
+	)
+'
+
+test_done
-- 
2.31.1.163.ga65ce7f831



* [PATCH v4 19/25] t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
                     ` (17 preceding siblings ...)
  2021-08-24 16:16   ` [PATCH v4 18/25] t5326: test multi-pack bitmap behavior Taylor Blau
@ 2021-08-24 16:16   ` Taylor Blau
  2021-08-24 16:16   ` [PATCH v4 20/25] t5310: " Taylor Blau
                     ` (6 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 16:16 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

From: Jeff King <peff@peff.net>

Generating a MIDX bitmap causes tests which repack in a partial clone to
fail because they are missing objects. Missing objects are an expected
part of the tests in t0410, so disable this knob altogether. Graceful
degradation when writing a bitmap with missing objects is tested in
t5326.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t0410-partial-clone.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index bbcc51ee8e..bba679685f 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -4,6 +4,9 @@ test_description='partial clone'
 
 . ./test-lib.sh
 
+# missing promisor objects cause repacks which write bitmaps to fail
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+
 delete_object () {
 	rm $1/.git/objects/$(echo $2 | sed -e 's|^..|&/|')
 }
-- 
2.31.1.163.ga65ce7f831



* [PATCH v4 20/25] t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
                     ` (18 preceding siblings ...)
  2021-08-24 16:16   ` [PATCH v4 19/25] t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP Taylor Blau
@ 2021-08-24 16:16   ` Taylor Blau
  2021-08-24 16:16   ` [PATCH v4 21/25] t5319: don't write MIDX bitmaps in t5319 Taylor Blau
                     ` (5 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 16:16 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

From: Jeff King <peff@peff.net>

Generating a MIDX bitmap confuses many of the tests in t5310, which
expect to control whether and how bitmaps are written. Since the
relevant MIDX-bitmap tests here are covered already in t5326, let's just
disable the flag for the whole t5310 script.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5310-pack-bitmaps.sh | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 4318f84d53..673baa5c3c 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -8,6 +8,10 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 . "$TEST_DIRECTORY"/lib-bundle.sh
 . "$TEST_DIRECTORY"/lib-bitmap.sh
 
+# t5310 deals only with single-pack bitmaps, so don't write MIDX bitmaps in
+# their place.
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+
 objpath () {
 	echo ".git/objects/$(echo "$1" | sed -e 's|\(..\)|\1/|')"
 }
-- 
2.31.1.163.ga65ce7f831



* [PATCH v4 21/25] t5319: don't write MIDX bitmaps in t5319
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
                     ` (19 preceding siblings ...)
  2021-08-24 16:16   ` [PATCH v4 20/25] t5310: " Taylor Blau
@ 2021-08-24 16:16   ` Taylor Blau
  2021-08-24 16:16   ` [PATCH v4 22/25] t7700: update to work with MIDX bitmap test knob Taylor Blau
                     ` (4 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 16:16 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

This test checks that a pack-based bitmap file is still respected after
a MIDX is generated. Generating a MIDX bitmap would confuse the test.
Let's override the 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' variable to
make sure we don't write one.
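
For illustration, a one-shot assignment placed in front of a command
affects only that command's environment, which is why the override here
does not leak into the rest of the script. A small sketch, using 'sh -c'
as a stand-in for git (the stand-in and the 'during'/'after' names are
assumptions made for the example):

```shell
# Pretend the test suite was started with the knob enabled.
GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=1
export GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP

# One-shot override: only this child process sees the value 0.
during=$(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
	sh -c 'echo "$GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP"')

# Later commands still inherit the script-wide value.
after=$(sh -c 'echo "$GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP"')

echo "during=$during after=$after"
```

The prefix form is applied to an external command only; combining it
with a shell function is not portable, since some shells keep the
assignment in effect after the function returns.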

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5319-multi-pack-index.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 9b184bd45e..a81375d920 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -504,7 +504,8 @@ test_expect_success 'repack preserves multi-pack-index when creating packs' '
 compare_results_with_midx "after repack"
 
 test_expect_success 'multi-pack-index and pack-bitmap' '
-	git -c repack.writeBitmaps=true repack -ad &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -c repack.writeBitmaps=true repack -ad &&
 	git multi-pack-index write &&
 	git rev-list --test-bitmap HEAD
 '
-- 
2.31.1.163.ga65ce7f831



* [PATCH v4 22/25] t7700: update to work with MIDX bitmap test knob
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
                     ` (20 preceding siblings ...)
  2021-08-24 16:16   ` [PATCH v4 21/25] t5319: don't write MIDX bitmaps in t5319 Taylor Blau
@ 2021-08-24 16:16   ` Taylor Blau
  2021-08-24 16:16   ` [PATCH v4 23/25] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
                     ` (3 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 16:16 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

A number of these tests are focused only on pack-based bitmaps and need
to be updated to disable 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' where
necessary.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t7700-repack.sh | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 25b235c063..98eda3bfeb 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -63,13 +63,14 @@ test_expect_success 'objects in packs marked .keep are not repacked' '
 
 test_expect_success 'writing bitmaps via command-line can duplicate .keep objects' '
 	# build on $oid, $packid, and .keep state from previous
-	git repack -Adbl &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 git repack -Adbl &&
 	test_has_duplicate_object true
 '
 
 test_expect_success 'writing bitmaps via config can duplicate .keep objects' '
 	# build on $oid, $packid, and .keep state from previous
-	git -c repack.writebitmaps=true repack -Adl &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -c repack.writebitmaps=true repack -Adl &&
 	test_has_duplicate_object true
 '
 
@@ -189,7 +190,9 @@ test_expect_success 'repack --keep-pack' '
 
 test_expect_success 'bitmaps are created by default in bare repos' '
 	git clone --bare .git bare.git &&
-	git -C bare.git repack -ad &&
+	rm -f bare.git/objects/pack/*.bitmap &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -C bare.git repack -ad &&
 	bitmap=$(ls bare.git/objects/pack/*.bitmap) &&
 	test_path_is_file "$bitmap"
 '
@@ -200,7 +203,8 @@ test_expect_success 'incremental repack does not complain' '
 '
 
 test_expect_success 'bitmaps can be disabled on bare repos' '
-	git -c repack.writeBitmaps=false -C bare.git repack -ad &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -c repack.writeBitmaps=false -C bare.git repack -ad &&
 	bitmap=$(ls bare.git/objects/pack/*.bitmap || :) &&
 	test -z "$bitmap"
 '
@@ -211,7 +215,8 @@ test_expect_success 'no bitmaps created if .keep files present' '
 	keep=${pack%.pack}.keep &&
 	test_when_finished "rm -f \"\$keep\"" &&
 	>"$keep" &&
-	git -C bare.git repack -ad 2>stderr &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -C bare.git repack -ad 2>stderr &&
 	test_must_be_empty stderr &&
 	find bare.git/objects/pack/ -type f -name "*.bitmap" >actual &&
 	test_must_be_empty actual
@@ -222,7 +227,8 @@ test_expect_success 'auto-bitmaps do not complain if unavailable' '
 	blob=$(test-tool genrandom big $((1024*1024)) |
 	       git -C bare.git hash-object -w --stdin) &&
 	git -C bare.git update-ref refs/tags/big $blob &&
-	git -C bare.git repack -ad 2>stderr &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -C bare.git repack -ad 2>stderr &&
 	test_must_be_empty stderr &&
 	find bare.git/objects/pack -type f -name "*.bitmap" >actual &&
 	test_must_be_empty actual
-- 
2.31.1.163.ga65ce7f831


^ permalink raw reply related	[flat|nested] 273+ messages in thread

* [PATCH v4 23/25] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
                     ` (21 preceding siblings ...)
  2021-08-24 16:16   ` [PATCH v4 22/25] t7700: update to work with MIDX bitmap test knob Taylor Blau
@ 2021-08-24 16:16   ` Taylor Blau
  2021-08-24 16:16   ` [PATCH v4 24/25] p5310: extract full and partial bitmap tests Taylor Blau
                     ` (2 subsequent siblings)
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 16:16 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Introduce a new 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' environment
variable to also write a multi-pack bitmap when
'GIT_TEST_MULTI_PACK_INDEX' is set.
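
As a rough illustration of how the two knobs combine, here is a
standalone sketch. It does not use Git's internals:
sketch_parse_bool() is a simplified, hypothetical stand-in for the
parsing done by git_env_bool(), and the flag values are assumed for the
example rather than copied from midx.h.

```c
#include <stddef.h>
#include <string.h>

/* Flag values are assumptions for this sketch, not taken from midx.h. */
#define SKETCH_MIDX_WRITE_REV_INDEX (1u << 0)
#define SKETCH_MIDX_WRITE_BITMAP    (1u << 1)

/*
 * Simplified boolean parsing (hypothetical helper): an unset value
 * yields 'def'; "", "0", "false", "no" and "off" are false; anything
 * else is treated as true.
 */
static int sketch_parse_bool(const char *value, int def)
{
	if (!value)
		return def;
	if (!*value || !strcmp(value, "0") || !strcmp(value, "false") ||
	    !strcmp(value, "no") || !strcmp(value, "off"))
		return 0;
	return 1;
}

/*
 * Returns -1 when no MIDX should be written after a repack, otherwise
 * the flags that would be passed along when writing it.
 */
static int sketch_midx_flags(const char *midx_knob, const char *bitmap_knob)
{
	unsigned flags = 0;

	if (!sketch_parse_bool(midx_knob, 0))
		return -1;
	if (sketch_parse_bool(bitmap_knob, 0))
		flags |= SKETCH_MIDX_WRITE_BITMAP | SKETCH_MIDX_WRITE_REV_INDEX;
	return (int)flags;
}
```

A caller would pass the results of getenv("GIT_TEST_MULTI_PACK_INDEX")
and getenv("GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP"); the bitmap knob
has no effect unless the first knob is also true.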

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c          | 12 ++++++++++--
 ci/run-build-and-tests.sh |  1 +
 midx.h                    |  2 ++
 t/README                  |  4 ++++
 4 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 5f9bc74adc..82ab668272 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -515,6 +515,10 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		if (!(pack_everything & ALL_INTO_ONE) ||
 		    !is_bare_repository())
 			write_bitmaps = 0;
+	} else if (write_bitmaps &&
+		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0) &&
+		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0)) {
+		write_bitmaps = 0;
 	}
 	if (pack_kept_objects < 0)
 		pack_kept_objects = write_bitmaps > 0;
@@ -725,8 +729,12 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		update_server_info(0);
 	remove_temporary_files();
 
-	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
-		write_midx_file(get_object_directory(), NULL, 0);
+	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0)) {
+		unsigned flags = 0;
+		if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0))
+			flags |= MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX;
+		write_midx_file(get_object_directory(), NULL, flags);
+	}
 
 	string_list_clear(&names, 0);
 	string_list_clear(&rollback, 0);
diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index 3ce81ffee9..7ee9ba9325 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -23,6 +23,7 @@ linux-gcc)
 	export GIT_TEST_COMMIT_GRAPH=1
 	export GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=1
 	export GIT_TEST_MULTI_PACK_INDEX=1
+	export GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=1
 	export GIT_TEST_ADD_I_USE_BUILTIN=1
 	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master
 	export GIT_TEST_WRITE_REV_INDEX=1
diff --git a/midx.h b/midx.h
index 350f4d0a7b..aa3da557bb 100644
--- a/midx.h
+++ b/midx.h
@@ -8,6 +8,8 @@ struct pack_entry;
 struct repository;
 
 #define GIT_TEST_MULTI_PACK_INDEX "GIT_TEST_MULTI_PACK_INDEX"
+#define GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP \
+	"GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP"
 
 struct multi_pack_index {
 	struct multi_pack_index *next;
diff --git a/t/README b/t/README
index 9e70122302..12014aa988 100644
--- a/t/README
+++ b/t/README
@@ -425,6 +425,10 @@ GIT_TEST_MULTI_PACK_INDEX=<boolean>, when true, forces the multi-pack-
 index to be written after every 'git repack' command, and overrides the
 'core.multiPackIndex' setting to true.
 
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=<boolean>, when true, sets the
+'--bitmap' option on all invocations of 'git multi-pack-index write',
+and ignores pack-objects' '--write-bitmap-index'.
+
 GIT_TEST_SIDEBAND_ALL=<boolean>, when true, overrides the
 'uploadpack.allowSidebandAll' setting to true, and when false, forces
 fetch-pack to not request sideband-all (even if the server advertises
-- 
2.31.1.163.ga65ce7f831


^ permalink raw reply related	[flat|nested] 273+ messages in thread

* [PATCH v4 24/25] p5310: extract full and partial bitmap tests
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
                     ` (22 preceding siblings ...)
  2021-08-24 16:16   ` [PATCH v4 23/25] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
@ 2021-08-24 16:16   ` Taylor Blau
  2021-08-24 16:16   ` [PATCH v4 25/25] p5326: perf tests for MIDX bitmaps Taylor Blau
  2021-08-25  0:28   ` [PATCH v4 00/25] multi-pack reachability bitmaps Jeff King
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 16:16 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

A new p5326 introduced by the next patch will want these same tests,
interjecting its own setup in between. Move them out so that both perf
tests can reuse them.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/perf/lib-bitmap.sh         | 69 ++++++++++++++++++++++++++++++++++++
 t/perf/p5310-pack-bitmaps.sh | 65 ++-------------------------------
 2 files changed, 72 insertions(+), 62 deletions(-)
 create mode 100644 t/perf/lib-bitmap.sh

diff --git a/t/perf/lib-bitmap.sh b/t/perf/lib-bitmap.sh
new file mode 100644
index 0000000000..63d3bc7cec
--- /dev/null
+++ b/t/perf/lib-bitmap.sh
@@ -0,0 +1,69 @@
+# Helper functions for testing bitmap performance; see p5310.
+
+test_full_bitmap () {
+	test_perf 'simulated clone' '
+		git pack-objects --stdout --all </dev/null >/dev/null
+	'
+
+	test_perf 'simulated fetch' '
+		have=$(git rev-list HEAD~100 -1) &&
+		{
+			echo HEAD &&
+			echo ^$have
+		} | git pack-objects --revs --stdout >/dev/null
+	'
+
+	test_perf 'pack to file (bitmap)' '
+		git pack-objects --use-bitmap-index --all pack1b </dev/null >/dev/null
+	'
+
+	test_perf 'rev-list (commits)' '
+		git rev-list --all --use-bitmap-index >/dev/null
+	'
+
+	test_perf 'rev-list (objects)' '
+		git rev-list --all --use-bitmap-index --objects >/dev/null
+	'
+
+	test_perf 'rev-list with tag negated via --not --all (objects)' '
+		git rev-list perf-tag --not --all --use-bitmap-index --objects >/dev/null
+	'
+
+	test_perf 'rev-list with negative tag (objects)' '
+		git rev-list HEAD --not perf-tag --use-bitmap-index --objects >/dev/null
+	'
+
+	test_perf 'rev-list count with blob:none' '
+		git rev-list --use-bitmap-index --count --objects --all \
+			--filter=blob:none >/dev/null
+	'
+
+	test_perf 'rev-list count with blob:limit=1k' '
+		git rev-list --use-bitmap-index --count --objects --all \
+			--filter=blob:limit=1k >/dev/null
+	'
+
+	test_perf 'rev-list count with tree:0' '
+		git rev-list --use-bitmap-index --count --objects --all \
+			--filter=tree:0 >/dev/null
+	'
+
+	test_perf 'simulated partial clone' '
+		git pack-objects --stdout --all --filter=blob:none </dev/null >/dev/null
+	'
+}
+
+test_partial_bitmap () {
+	test_perf 'clone (partial bitmap)' '
+		git pack-objects --stdout --all </dev/null >/dev/null
+	'
+
+	test_perf 'pack to file (partial bitmap)' '
+		git pack-objects --use-bitmap-index --all pack2b </dev/null >/dev/null
+	'
+
+	test_perf 'rev-list with tree filter (partial bitmap)' '
+		git rev-list --use-bitmap-index --count --objects --all \
+			--filter=tree:0 >/dev/null
+	'
+}
diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index 452be01056..7ad4f237bc 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -2,6 +2,7 @@
 
 test_description='Tests pack performance using bitmaps'
 . ./perf-lib.sh
+. "${TEST_DIRECTORY}/perf/lib-bitmap.sh"
 
 test_perf_large_repo
 
@@ -25,56 +26,7 @@ test_perf 'repack to disk' '
 	git repack -ad
 '
 
-test_perf 'simulated clone' '
-	git pack-objects --stdout --all </dev/null >/dev/null
-'
-
-test_perf 'simulated fetch' '
-	have=$(git rev-list HEAD~100 -1) &&
-	{
-		echo HEAD &&
-		echo ^$have
-	} | git pack-objects --revs --stdout >/dev/null
-'
-
-test_perf 'pack to file (bitmap)' '
-	git pack-objects --use-bitmap-index --all pack1b </dev/null >/dev/null
-'
-
-test_perf 'rev-list (commits)' '
-	git rev-list --all --use-bitmap-index >/dev/null
-'
-
-test_perf 'rev-list (objects)' '
-	git rev-list --all --use-bitmap-index --objects >/dev/null
-'
-
-test_perf 'rev-list with tag negated via --not --all (objects)' '
-	git rev-list perf-tag --not --all --use-bitmap-index --objects >/dev/null
-'
-
-test_perf 'rev-list with negative tag (objects)' '
-	git rev-list HEAD --not perf-tag --use-bitmap-index --objects >/dev/null
-'
-
-test_perf 'rev-list count with blob:none' '
-	git rev-list --use-bitmap-index --count --objects --all \
-		--filter=blob:none >/dev/null
-'
-
-test_perf 'rev-list count with blob:limit=1k' '
-	git rev-list --use-bitmap-index --count --objects --all \
-		--filter=blob:limit=1k >/dev/null
-'
-
-test_perf 'rev-list count with tree:0' '
-	git rev-list --use-bitmap-index --count --objects --all \
-		--filter=tree:0 >/dev/null
-'
-
-test_perf 'simulated partial clone' '
-	git pack-objects --stdout --all --filter=blob:none </dev/null >/dev/null
-'
+test_full_bitmap
 
 test_expect_success 'create partial bitmap state' '
 	# pick a commit to represent the repo tip in the past
@@ -97,17 +49,6 @@ test_expect_success 'create partial bitmap state' '
 	git update-ref HEAD $orig_tip
 '
 
-test_perf 'clone (partial bitmap)' '
-	git pack-objects --stdout --all </dev/null >/dev/null
-'
-
-test_perf 'pack to file (partial bitmap)' '
-	git pack-objects --use-bitmap-index --all pack2b </dev/null >/dev/null
-'
-
-test_perf 'rev-list with tree filter (partial bitmap)' '
-	git rev-list --use-bitmap-index --count --objects --all \
-		--filter=tree:0 >/dev/null
-'
+test_partial_bitmap
 
 test_done
-- 
2.31.1.163.ga65ce7f831


^ permalink raw reply related	[flat|nested] 273+ messages in thread

* [PATCH v4 25/25] p5326: perf tests for MIDX bitmaps
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
                     ` (23 preceding siblings ...)
  2021-08-24 16:16   ` [PATCH v4 24/25] p5310: extract full and partial bitmap tests Taylor Blau
@ 2021-08-24 16:16   ` Taylor Blau
  2021-08-25  0:28   ` [PATCH v4 00/25] multi-pack reachability bitmaps Jeff King
  25 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 16:16 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

These new performance tests demonstrate effectively the same behavior as
p5310, but use a multi-pack bitmap instead of a single-pack one.

Notably, p5326 does not create a MIDX bitmap with multiple packs. This
is so that we can compare it directly against p5310: any difference
between the two measures just the overhead of using MIDX bitmaps.

Here are the results of p5310 and p5326 together, measured at the same
time and on the same machine (using a Xeon W-2255 CPU):

    Test                                                  HEAD
    ------------------------------------------------------------------------
    5310.2: repack to disk                                96.78(93.39+11.33)
    5310.3: simulated clone                               9.98(9.79+0.19)
    5310.4: simulated fetch                               1.75(4.26+0.19)
    5310.5: pack to file (bitmap)                         28.20(27.87+8.70)
    5310.6: rev-list (commits)                            0.41(0.36+0.05)
    5310.7: rev-list (objects)                            1.61(1.54+0.07)
    5310.8: rev-list count with blob:none                 0.25(0.21+0.04)
    5310.9: rev-list count with blob:limit=1k             2.65(2.54+0.10)
    5310.10: rev-list count with tree:0                   0.23(0.19+0.04)
    5310.11: simulated partial clone                      4.34(4.21+0.12)
    5310.13: clone (partial bitmap)                       11.05(12.21+0.48)
    5310.14: pack to file (partial bitmap)                31.25(34.22+3.70)
    5310.15: rev-list with tree filter (partial bitmap)   0.26(0.22+0.04)

versus the same tests (this time using a multi-pack index):

    Test                                                  HEAD
    ------------------------------------------------------------------------
    5326.2: setup multi-pack index                        78.99(75.29+11.58)
    5326.3: simulated clone                               11.78(11.56+0.22)
    5326.4: simulated fetch                               1.70(4.49+0.13)
    5326.5: pack to file (bitmap)                         28.02(27.72+8.76)
    5326.6: rev-list (commits)                            0.42(0.36+0.06)
    5326.7: rev-list (objects)                            1.65(1.58+0.06)
    5326.8: rev-list count with blob:none                 0.26(0.21+0.05)
    5326.9: rev-list count with blob:limit=1k             2.97(2.86+0.10)
    5326.10: rev-list count with tree:0                   0.25(0.20+0.04)
    5326.11: simulated partial clone                      5.65(5.49+0.16)
    5326.13: clone (partial bitmap)                       12.22(13.43+0.38)
    5326.14: pack to file (partial bitmap)                30.05(31.57+7.25)
    5326.15: rev-list with tree filter (partial bitmap)   0.24(0.20+0.04)

There is slight overhead in "simulated clone" (9.98s vs. 11.78s),
"simulated partial clone" (4.34s vs. 5.65s), and "clone (partial
bitmap)" (11.05s vs. 12.22s). Unsurprisingly, that overhead is due to
using the MIDX's reverse index to map between bit positions and MIDX
positions.

This can be reproduced by running "git repack -adb" along with "git
multi-pack-index write --bitmap" in a large-ish repository. Then run:

    $ perf record -o pack.perf git -c core.multiPackIndex=false \
      pack-objects --all --stdout >/dev/null </dev/null
    $ perf record -o midx.perf git -c core.multiPackIndex=true \
      pack-objects --all --stdout >/dev/null </dev/null

and compare the two with "perf diff -c delta -o 1 pack.perf midx.perf".
The most notable results are below (the next largest positive delta is
+0.14%):

    # Event 'cycles'
    #
    # Baseline    Delta  Shared Object       Symbol
    # ........  .......  ..................  ..........................
    #
                 +5.86%  git                 [.] nth_midxed_offset
                 +5.24%  git                 [.] nth_midxed_pack_int_id
         3.45%   +0.97%  git                 [.] offset_to_pack_pos
         3.30%   +0.57%  git                 [.] pack_pos_to_offset
                 +0.30%  git                 [.] pack_pos_to_midx
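
The symbols above are consistent with an extra level of indirection
when resolving a bit position. As a toy model of that indirection (not
Git's implementation; the struct layout and all names below are
hypothetical), a MIDX-backed lookup can be pictured as going from bit
position to MIDX position via the reverse index, and only then to a
pack and an offset:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one object entry in a MIDX. */
struct toy_midx_entry {
	uint32_t pack_id; /* which pack holds the object */
	uint64_t offset;  /* offset of the object within that pack */
};

struct toy_midx {
	const struct toy_midx_entry *entries; /* in MIDX order */
	const uint32_t *rev;                  /* bit position -> MIDX position */
	size_t nr;                            /* number of objects */
};

/*
 * Resolve a bit position to a (pack, offset) pair. Returns 0 on
 * success, or -1 if the position or the reverse index entry is out of
 * range.
 */
static int toy_bit_to_location(const struct toy_midx *m, size_t bit_pos,
			       uint32_t *pack_id, uint64_t *offset)
{
	uint32_t midx_pos;

	if (bit_pos >= m->nr)
		return -1;
	midx_pos = m->rev[bit_pos];          /* first hop: reverse index */
	if (midx_pos >= m->nr)
		return -1;
	*pack_id = m->entries[midx_pos].pack_id; /* second hop: pack id */
	*offset = m->entries[midx_pos].offset;   /* second hop: offset */
	return 0;
}
```

In this model a single-pack bitmap would skip the first hop entirely,
which is roughly the difference the two profiles are measuring.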

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/perf/p5326-multi-pack-bitmaps.sh | 43 ++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)
 create mode 100755 t/perf/p5326-multi-pack-bitmaps.sh

diff --git a/t/perf/p5326-multi-pack-bitmaps.sh b/t/perf/p5326-multi-pack-bitmaps.sh
new file mode 100755
index 0000000000..5845109ac7
--- /dev/null
+++ b/t/perf/p5326-multi-pack-bitmaps.sh
@@ -0,0 +1,43 @@
+#!/bin/sh
+
+test_description='Tests performance using midx bitmaps'
+. ./perf-lib.sh
+. "${TEST_DIRECTORY}/perf/lib-bitmap.sh"
+
+test_perf_large_repo
+
+test_expect_success 'enable multi-pack index' '
+	git config core.multiPackIndex true
+'
+
+test_perf 'setup multi-pack index' '
+	git repack -ad &&
+	git multi-pack-index write --bitmap
+'
+
+test_full_bitmap
+
+test_expect_success 'create partial bitmap state' '
+	# pick a commit to represent the repo tip in the past
+	cutoff=$(git rev-list HEAD~100 -1) &&
+	orig_tip=$(git rev-parse HEAD) &&
+
+	# now pretend we have just one tip
+	rm -rf .git/logs .git/refs/* .git/packed-refs &&
+	git update-ref HEAD $cutoff &&
+
+	# and then repack, which will leave us with a nice
+	# big bitmap pack of the "old" history, and all of
+	# the new history will be loose, as if it had been pushed
+	# up incrementally and exploded via unpack-objects
+	git repack -Ad &&
+	git multi-pack-index write --bitmap &&
+
+	# and now restore our original tip, as if the pushes
+	# had happened
+	git update-ref HEAD $orig_tip
+'
+
+test_partial_bitmap
+
+test_done
-- 
2.31.1.163.ga65ce7f831

^ permalink raw reply related	[flat|nested] 273+ messages in thread

* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-24 16:16   ` [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX Taylor Blau
@ 2021-08-24 20:27     ` Junio C Hamano
  2021-08-24 20:34       ` Taylor Blau
  0 siblings, 1 reply; 273+ messages in thread
From: Junio C Hamano @ 2021-08-24 20:27 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee, jonathantanmy

Taylor Blau <me@ttaylorr.com> writes:

> When writing a new multi-pack index, write_midx_internal() attempts to
> clean up any auxiliary files (currently just the MIDX's `.rev` file, but
> soon to include a `.bitmap`, too) corresponding to the MIDX it's
> replacing.
>
> This step should happen after the new MIDX is written into place, since
> doing so beforehand means that the old MIDX could be read without its
> corresponding .rev file.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  midx.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/midx.c b/midx.c
> index 321c6fdd2f..73b199ca49 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -1086,10 +1086,11 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
>  
>  	if (flags & MIDX_WRITE_REV_INDEX)
>  		write_midx_reverse_index(midx_name, midx_hash, &ctx);
> -	clear_midx_files_ext(the_repository, ".rev", midx_hash);
>  
>  	commit_lock_file(&lk);
>  
> +	clear_midx_files_ext(the_repository, ".rev", midx_hash);

This needs to take object_dir into account, no?

There are a few more calls to clear_midx_files_ext() added in 15/25
and they use the_repository, too.

>  cleanup:
>  	for (i = 0; i < ctx.nr; i++) {
>  		if (ctx.info[i].p) {

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-24 20:27     ` Junio C Hamano
@ 2021-08-24 20:34       ` Taylor Blau
  2021-08-24 21:12         ` Junio C Hamano
  0 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 20:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, peff, dstolee, jonathantanmy

On Tue, Aug 24, 2021 at 01:27:34PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > When writing a new multi-pack index, write_midx_internal() attempts to
> > clean up any auxiliary files (currently just the MIDX's `.rev` file, but
> > soon to include a `.bitmap`, too) corresponding to the MIDX it's
> > replacing.
> >
> > This step should happen after the new MIDX is written into place, since
> > doing so beforehand means that the old MIDX could be read without its
> > corresponding .rev file.
> >
> > Signed-off-by: Taylor Blau <me@ttaylorr.com>
> > ---
> >  midx.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/midx.c b/midx.c
> > index 321c6fdd2f..73b199ca49 100644
> > --- a/midx.c
> > +++ b/midx.c
> > @@ -1086,10 +1086,11 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
> >
> >  	if (flags & MIDX_WRITE_REV_INDEX)
> >  		write_midx_reverse_index(midx_name, midx_hash, &ctx);
> > -	clear_midx_files_ext(the_repository, ".rev", midx_hash);
> >
> >  	commit_lock_file(&lk);
> >
> > +	clear_midx_files_ext(the_repository, ".rev", midx_hash);
>
> This needs to take object_dir into account, no?

Yes and no; clear_midx_files_ext() still takes a pointer to a 'struct
repository' until we pick up [1].

I asked for some changes in the latest version that Johannes posted. So
I'd be OK to live with this behavior for the time being, and then I can
send another patch on top that fixes the new and existing callers
(incorporating [1] with some new tests).

Or we can hold one up and expedite the other. I would suggest that we
pick up this series to next if you're otherwise happy with it and then I
can send the trivial fixes on top.

Thanks,
Taylor

[1]: https://lore.kernel.org/git/20210823171011.80588-1-johannes@sipsolutions.net/

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-24 20:34       ` Taylor Blau
@ 2021-08-24 21:12         ` Junio C Hamano
  2021-08-24 21:24           ` Taylor Blau
  0 siblings, 1 reply; 273+ messages in thread
From: Junio C Hamano @ 2021-08-24 21:12 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee, jonathantanmy

Taylor Blau <me@ttaylorr.com> writes:

>> This needs to take object_dir into account, no?
>
> Yes and no; clear_midx_files_ext() still takes a pointer to a 'struct
> repository' until we pick up [1].

I was hoping that [1] will become part of this series as a trivial
clean-up and bugfix, perhaps in its early part.


^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-24 21:12         ` Junio C Hamano
@ 2021-08-24 21:24           ` Taylor Blau
  2021-08-24 22:01             ` Taylor Blau
  2021-08-24 22:04             ` Junio C Hamano
  0 siblings, 2 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 21:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, peff, dstolee, jonathantanmy

On Tue, Aug 24, 2021 at 02:12:42PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> >> This needs to take object_dir into account, no?
> >
> > Yes and no; clear_midx_files_ext() still takes a pointer to a 'struct
> > repository' until we pick up [1].
>
> I was hoping that [1] will become part of this series as a trivial
> clean-up and bugfix, perhaps in its early part.

Sure, that works even better. I'll send a reroll incorporating it as
soon as I finish re-testing.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-24 21:24           ` Taylor Blau
@ 2021-08-24 22:01             ` Taylor Blau
  2021-08-24 22:04             ` Junio C Hamano
  1 sibling, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 22:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, peff, dstolee, jonathantanmy

On Tue, Aug 24, 2021 at 05:24:45PM -0400, Taylor Blau wrote:
> On Tue, Aug 24, 2021 at 02:12:42PM -0700, Junio C Hamano wrote:
> > Taylor Blau <me@ttaylorr.com> writes:
> >
> > >> This needs to take object_dir into account, no?
> > >
> > > Yes and no; clear_midx_files_ext() still takes a pointer to a 'struct
> > > repository' until we pick up [1].
> >
> > I was hoping that [1] will become part of this series as a trivial
> > clean-up and bugfix, perhaps in its early part.
>
> Sure, that works even better. I'll send a reroll incorporating it as
> soon as I finish re-testing.

Hmm, this got me wondering: what should the behavior be when we run
the multi-pack-index command outside of a Git repository? For example,
in patch 15 we do:

    for (cur = get_multi_pack_index(the_repository); cur; cur = cur->next) {
      if (!strcmp(object_dir, cur->object_dir)) {
        ctx.m = cur;
        break;
      }
    }

but obviously get_multi_pack_index(the_repository) will fail when there
is no repository to begin with.

The real question is whether we should allow munging arbitrary MIDXs, or
restrict the ones we can modify to just our alternates. If we allow the
former, then that code needs to be tweaked. If not, and we only allow
touching alternates, then we need to require a repository for the
multi-pack-index builtin.
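
To make the trade-off concrete, the lookup can be modelled outside of
Git roughly as follows. This is only a sketch under assumptions:
'struct toy_midx_node' and find_midx_for_dir() are hypothetical names,
the real 'struct multi_pack_index' has many more fields, and "no
repository" is represented here simply by an empty (NULL) list.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, pared-down stand-in for a node in the MIDX list. */
struct toy_midx_node {
	struct toy_midx_node *next;
	const char *object_dir;
};

/*
 * Walk the list of known MIDXs (the local one followed by those of any
 * alternates) and return the one whose object directory matches, or
 * NULL if 'object_dir' is not among them.
 */
static struct toy_midx_node *find_midx_for_dir(struct toy_midx_node *head,
					       const char *object_dir)
{
	struct toy_midx_node *cur;

	for (cur = head; cur; cur = cur->next) {
		if (!strcmp(object_dir, cur->object_dir))
			return cur;
	}
	return NULL;
}
```

In this model, a directory that is neither the local object store nor
an alternate never matches, so any MIDX already present there would not
be reused.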

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-24 21:24           ` Taylor Blau
  2021-08-24 22:01             ` Taylor Blau
@ 2021-08-24 22:04             ` Junio C Hamano
  2021-08-24 22:06               ` Junio C Hamano
  1 sibling, 1 reply; 273+ messages in thread
From: Junio C Hamano @ 2021-08-24 22:04 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee, jonathantanmy

Taylor Blau <me@ttaylorr.com> writes:

> On Tue, Aug 24, 2021 at 02:12:42PM -0700, Junio C Hamano wrote:
>> Taylor Blau <me@ttaylorr.com> writes:
>>
>> >> This needs to take object_dir into account, no?
>> >
>> > Yes and no; clear_midx_files_ext() still takes a pointer to a 'struct
>> > repository' until we pick up [1].
>>
>> I was hoping that [1] will become part of this series as a trivial
>> clean-up and bugfix, perhaps in its early part.
>
> Sure, that works even better. I'll send a reroll incorporating it as
> soon as I finish re-testing.

FWIW, here is what I have somewhere in 'seen' where two topics meet.

diff --cc midx.c
index c0209751b5,4574e6d411..0000000000
--- i/midx.c
+++ w/midx.c
@@@ -1090,6 -1351,9 +1351,9 @@@ static int write_midx_internal(const ch
  
  	commit_lock_file(&lk);
  
 -	clear_midx_files_ext(the_repository, ".bitmap", midx_hash);
 -	clear_midx_files_ext(the_repository, ".rev", midx_hash);
++	clear_midx_files_ext(object_dir, ".bitmap", midx_hash);
++	clear_midx_files_ext(object_dir, ".rev", midx_hash);
+ 
  cleanup:
  	for (i = 0; i < ctx.nr; i++) {
  		if (ctx.info[i].p) {
@@@ -1165,7 -1429,8 +1429,8 @@@ void clear_midx_file(struct repository 
  	if (remove_path(midx))
  		die(_("failed to clear multi-pack-index at %s"), midx);
  
 -	clear_midx_files_ext(r, ".bitmap", NULL);
 -	clear_midx_files_ext(r, ".rev", NULL);
++	clear_midx_files_ext(r->objects->odb->path, ".bitmap", NULL);
 +	clear_midx_files_ext(r->objects->odb->path, ".rev", NULL);
  
  	free(midx);
  }

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-24 22:04             ` Junio C Hamano
@ 2021-08-24 22:06               ` Junio C Hamano
  2021-08-24 22:10                 ` Taylor Blau
  0 siblings, 1 reply; 273+ messages in thread
From: Junio C Hamano @ 2021-08-24 22:06 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee, jonathantanmy

Junio C Hamano <gitster@pobox.com> writes:

> FWIW, here is what I have somewhere in 'seen' where two topics meet.

Oops, one change missed.

diff --cc midx.c
index c0209751b5,4574e6d411..0000000000
--- i/midx.c
+++ w/midx.c
@@@ -947,11 -1136,29 +1136,29 @@@ static int write_midx_internal(const ch
  	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
  	stop_progress(&ctx.progress);
  
- 	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
- 		goto cleanup;
+ 	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop) {
+ 		struct bitmap_index *bitmap_git;
+ 		int bitmap_exists;
+ 		int want_bitmap = flags & MIDX_WRITE_BITMAP;
+ 
+ 		bitmap_git = prepare_midx_bitmap_git(the_repository, ctx.m);
+ 		bitmap_exists = bitmap_git && bitmap_is_midx(bitmap_git);
+ 		free_bitmap_index(bitmap_git);
+ 
+ 		if (bitmap_exists || !want_bitmap) {
+ 			/*
+ 			 * The correct MIDX already exists, and so does a
+ 			 * corresponding bitmap (or one wasn't requested).
+ 			 */
+ 			if (!want_bitmap)
 -				clear_midx_files_ext(the_repository, ".bitmap",
++				clear_midx_files_ext(object_dir, ".bitmap",
+ 						     NULL);
+ 			goto cleanup;
+ 		}
+ 	}
  
- 	ctx.preferred_pack_idx = -1;
  	if (preferred_pack_name) {
+ 		int found = 0;
  		for (i = 0; i < ctx.nr; i++) {
  			if (!cmp_idx_or_pack_name(preferred_pack_name,
  						  ctx.info[i].pack_name)) {
@@@ -1090,6 -1351,9 +1351,9 @@@
  
  	commit_lock_file(&lk);
  
 -	clear_midx_files_ext(the_repository, ".bitmap", midx_hash);
 -	clear_midx_files_ext(the_repository, ".rev", midx_hash);
++	clear_midx_files_ext(object_dir, ".bitmap", midx_hash);
++	clear_midx_files_ext(object_dir, ".rev", midx_hash);
+ 
  cleanup:
  	for (i = 0; i < ctx.nr; i++) {
  		if (ctx.info[i].p) {
@@@ -1165,7 -1429,8 +1429,8 @@@ void clear_midx_file(struct repository 
  	if (remove_path(midx))
  		die(_("failed to clear multi-pack-index at %s"), midx);
  
 -	clear_midx_files_ext(r, ".bitmap", NULL);
 -	clear_midx_files_ext(r, ".rev", NULL);
++	clear_midx_files_ext(r->objects->odb->path, ".bitmap", NULL);
 +	clear_midx_files_ext(r->objects->odb->path, ".rev", NULL);
  
  	free(midx);
  }

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-24 22:06               ` Junio C Hamano
@ 2021-08-24 22:10                 ` Taylor Blau
  2021-08-27  6:01                   ` Junio C Hamano
  0 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-08-24 22:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, peff, dstolee, jonathantanmy

On Tue, Aug 24, 2021 at 03:06:55PM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
> > FWIW, here is what I have somewhere in 'seen' where two topics meet.
>
> Oops, one change missed.

Thanks; that matches my own resolution. I noticed that it does fail the
new test in t5319, since writing a MIDX wants to make sure that we are
only touching an alternate's object directory (which will fail if we are
running `git multi-pack-index` from outside of a repository).

My opinion is that we should require being inside of a repository to run
the MIDX builtin. Otherwise we're allowing that command to modify any
old MIDX, which doesn't make sense.

I think we probably need a single unifying topic, so I'm happy if you
want to discard one of our two topics from seen in the meantime.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v4 00/25] multi-pack reachability bitmaps
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
                     ` (24 preceding siblings ...)
  2021-08-24 16:16   ` [PATCH v4 25/25] p5326: perf tests for MIDX bitmaps Taylor Blau
@ 2021-08-25  0:28   ` Jeff King
  2021-08-25  2:10     ` Taylor Blau
  25 siblings, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-08-25  0:28 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Tue, Aug 24, 2021 at 12:15:47PM -0400, Taylor Blau wrote:

> Range-diff against v3:
> [...]
>  9:  40cff5beb5 !  9:  c9fea31fa8 midx: avoid opening multiple MIDXs when writing
>     @@ Commit message
>          one and should invalidate the object store's memory of any MIDX that
>          might have existed beforehand.
>      
>     +    Note that this now forbids passing object directories that don't belong
>     +    to alternate repositories over `--object-dir`, since before we would
>     +    have happily opened a MIDX in any directory, but now restrict ourselves
>     +    to only those reachable by `r->objects->multi_pack_index` (and alternate
>     +    MIDXs that we can see by walking the `next` pointer).
>     +
>     +    As far as I can tell, supporting arbitrary directories with
>     +    `--object-dir` was a historical accident, since even the documentation
>     +    says `<alt>` when referring to the value passed to this option.
>     +
>     +    A future patch could clean this up and provide a warning() when a
>     +    non-alternate directory was given, since we'll still write a new MIDX
>     +    there, we just won't reuse any MIDX that might happen to already exist
>     +    in that directory.
>     +

So this is definitely fixed as we discussed. But since that discussion,
we've had the thread over in:

  https://lore.kernel.org/git/20210820195558.44275-1-johannes@sipsolutions.net/

and its siblings:

  https://lore.kernel.org/git/20210823094049.44136-1-johannes@sipsolutions.net/

  https://lore.kernel.org/git/20210823171011.80588-1-johannes@sipsolutions.net/

It's not clear to me that we have a resolution on whether calling "cd ..
&& git multi-pack-index write --object-dir repo.git" is supposed to
work.

It has traditionally worked (at least for trivial cases, AFAICT), but I
find the behavior surprising and unlike most of the rest of Git, and I'm
not at all certain that there aren't subtle bugs lurking (basically
anything that wants to do object lookup, like oh say, a bitmap
generator).

But if we do want to support it, then we have to find a different
solution here, don't we?  I think the least-painful version of that is
probably recording _whether_ we found ctx.m in the_repository's
object_store, and switching behavior based on that (e.g., calling
close_midx() versus close_object_store() depending).
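
As a rough sketch of that idea (this is not code from the series:
`struct midx`, `struct object_store`, `struct write_ctx` and the `*_stub`
helpers are hypothetical stand-ins for Git's real types and for
close_midx() / close_object_store()), the writer could remember where
its MIDX came from and pick the teardown accordingly:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for Git's multi_pack_index and object store. */
struct midx {
	struct midx *next;	/* next MIDX in the alternates chain */
	int closed;
};

struct object_store {
	struct midx *multi_pack_index;	/* head of the chain */
	int closed;
};

struct write_ctx {
	struct midx *m;
	int m_from_object_store;	/* set when m was found in the chain */
};

/* Stubs standing in for close_midx() and close_object_store(). */
static void close_midx_stub(struct midx *m)
{
	m->closed = 1;
}

static void close_object_store_stub(struct object_store *o)
{
	struct midx *m;

	for (m = o->multi_pack_index; m; m = m->next)
		m->closed = 1;
	o->closed = 1;
}

/* Remember whether the MIDX being rewritten is one the object store owns. */
static void ctx_set_midx(struct write_ctx *ctx, struct object_store *o,
			 struct midx *m)
{
	struct midx *cur;

	assert(ctx && o);
	ctx->m = m;
	ctx->m_from_object_store = 0;
	for (cur = o->multi_pack_index; cur; cur = cur->next) {
		if (cur == m) {
			ctx->m_from_object_store = 1;
			break;
		}
	}
}

/* Tear down via the object store only if the MIDX came from it. */
static void ctx_release_midx(struct write_ctx *ctx, struct object_store *o)
{
	if (!ctx->m)
		return;
	if (ctx->m_from_object_store)
		close_object_store_stub(o);
	else
		close_midx_stub(ctx->m);
	ctx->m = NULL;
}
```

The point of the flag is that a MIDX opened from an arbitrary
`--object-dir` is never reachable from the object store, so closing the
object store would leave it dangling, while closing it directly is safe.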

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v4 00/25] multi-pack reachability bitmaps
  2021-08-25  0:28   ` [PATCH v4 00/25] multi-pack reachability bitmaps Jeff King
@ 2021-08-25  2:10     ` Taylor Blau
  2021-08-25  2:13       ` Taylor Blau
  2021-08-25  7:36       ` Jeff King
  0 siblings, 2 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-25  2:10 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, dstolee, gitster, jonathantanmy

On Tue, Aug 24, 2021 at 08:28:36PM -0400, Jeff King wrote:
> On Tue, Aug 24, 2021 at 12:15:47PM -0400, Taylor Blau wrote:
>
> > Range-diff against v3:
> > [...]
> >  9:  40cff5beb5 !  9:  c9fea31fa8 midx: avoid opening multiple MIDXs when writing
> >     @@ Commit message
> >          one and should invalidate the object store's memory of any MIDX that
> >          might have existed beforehand.
> >
> >     +    Note that this now forbids passing object directories that don't belong
> >     +    to alternate repositories over `--object-dir`, since before we would
> >     +    have happily opened a MIDX in any directory, but now restrict ourselves
> >     +    to only those reachable by `r->objects->multi_pack_index` (and alternate
> >     +    MIDXs that we can see by walking the `next` pointer).
> >     +
> >     +    As far as I can tell, supporting arbitrary directories with
> >     +    `--object-dir` was a historical accident, since even the documentation
> >     +    says `<alt>` when referring to the value passed to this option.
> >     +
> >     +    A future patch could clean this up and provide a warning() when a
> >     +    non-alternate directory was given, since we'll still write a new MIDX
> >     +    there, we just won't reuse any MIDX that might happen to already exist
> >     +    in that directory.
> >     +
>
> So this is definitely fixed as we discussed. But since that discussion,
> we've had the thread over in:
>
>   https://lore.kernel.org/git/20210820195558.44275-1-johannes@sipsolutions.net/
>
> and its siblings:
>
>   https://lore.kernel.org/git/20210823094049.44136-1-johannes@sipsolutions.net/
>
>   https://lore.kernel.org/git/20210823171011.80588-1-johannes@sipsolutions.net/
>
> It's not clear to me that we have a resolution on whether calling "cd ..
> && git multi-pack-index write --object-dir repo.git" is supposed to
> work.

My recommendation would be to do the following things, all in a reroll
of this series:

  - Fix the bug by which we would delete a .rev or .bitmap file out of a
    different object store than we were working in (when the caller
    passes `--object-dir`).

  - Disallow running `git multi-pack-index` outside of a Git repository.

  - Restrict `--object-dir` to only work with alternates of the
    repository in the current working directory.

To me, that seems like both the least-surprising behavior, and what
would lend itself to the easiest implementation. I would probably argue
that the existing behavior (where `--object-dir` would work against
arbitrary repositories) is a bug, and shouldn't continue to be
supported.

So my plan would be to do that, which would generate something like the
following range-diff. If nobody has any objections, I'd like to send
what I currently have in ttaylorr/git on GitHub in the
tb/multi-pack-bitmaps branch as a reroll of this series, and then merge
that early in the cycle to give it a chance to be tested before we cut
2.34.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v4 00/25] multi-pack reachability bitmaps
  2021-08-25  2:10     ` Taylor Blau
@ 2021-08-25  2:13       ` Taylor Blau
  2021-08-25  7:36       ` Jeff King
  1 sibling, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-25  2:13 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jeff King, git, dstolee, gitster, jonathantanmy

On Tue, Aug 24, 2021 at 10:10:12PM -0400, Taylor Blau wrote:
> My recommendation would be to do the following things, all in a reroll
> of this series:

For what it's worth, the substantive changes (which I have not figured
out how to include in a range-diff since they are entirely new patches)
are these:

  - Replacing Johannes' patch with:
    https://github.com/ttaylorr/git/commit/2b1afbd516a75bb43a8aae6ff1cac6a83ed7f589,

  - and then adding another patch immediately after it:
    https://github.com/git/git/commit/0a2d4d8dbf3c50eb3e2b659d1dcdf432d3b4d223

...and otherwise keeping the remainder of the series unchanged.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v4 00/25] multi-pack reachability bitmaps
  2021-08-25  2:10     ` Taylor Blau
  2021-08-25  2:13       ` Taylor Blau
@ 2021-08-25  7:36       ` Jeff King
  2021-08-25  7:48         ` Johannes Berg
  2021-08-26 18:49         ` Taylor Blau
  1 sibling, 2 replies; 273+ messages in thread
From: Jeff King @ 2021-08-25  7:36 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Johannes Berg, git, dstolee, gitster, jonathantanmy

On Tue, Aug 24, 2021 at 10:10:12PM -0400, Taylor Blau wrote:

> > It's not clear to me that we have a resolution on whether calling "cd ..
> > && git multi-pack-index write --object-dir repo.git" is supposed to
> > work.
> 
> My recommendation would be to do the following things, all in a reroll
> of this series:
> 
>   - Fix the bug by which we would delete a .rev or .bitmap file out of a
>     different object store than we were working in (when the caller
>     passes `--object-dir`).
> 
>   - Disallow running `git multi-pack-index` outside of a Git repository.
> 
>   - Restrict `--object-dir` to only work with alternates of the
>     repository in the current working directory.
> 
> To me, that seems like both the least-surprising behavior, and what
> would lend itself to the easiest implementation. I would probably argue
> that the existing behavior (where `--object-dir` would work against
> arbitrary repositories) is a bug, and shouldn't continue to be
> supported.

All of those seem reasonable to me, and are what I would suggest if we
were starting from scratch. My only hesitation is whether people are
using the weird behavior of --object-dir in the wild (e.g., are bup
folks relying on it).

Johannes, is this something you're using _now_, and it works, or
something you hoped to use in the future?

In a sense, "hope to use" does not make you any less disappointed. ;)
But what I'm wondering is whether using --object-dir from outside a repo
entirely is actually something that even works. I.e., would we be
disabling a behavior that was not intended, but does happen to work? Or
are we closing off a possibly buggy and half-working part of the system?

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v4 00/25] multi-pack reachability bitmaps
  2021-08-25  7:36       ` Jeff King
@ 2021-08-25  7:48         ` Johannes Berg
  2021-08-26 18:49         ` Taylor Blau
  1 sibling, 0 replies; 273+ messages in thread
From: Johannes Berg @ 2021-08-25  7:48 UTC (permalink / raw)
  To: Jeff King, Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Wed, 2021-08-25 at 03:36 -0400, Jeff King wrote:
> On Tue, Aug 24, 2021 at 10:10:12PM -0400, Taylor Blau wrote:
> 
> > > It's not clear to me that we have a resolution on whether calling "cd ..
> > > && git multi-pack-index write --object-dir repo.git" is supposed to
> > > work.
> > 
> > My recommendation would be to do the following things, all in a reroll
> > of this series:
> > 
> >   - Fix the bug by which we would delete a .rev or .bitmap file out of a
> >     different object store than we were working in (when the caller
> >     passes `--object-dir`).

That was what my patch did, afaict.

> >   - Disallow running `git multi-pack-index` outside of a Git repository.
> > 
> >   - Restrict `--object-dir` to only work with alternates of the
> >     repository in the current working directory.
> > 
> > To me, that seems like both the least-surprising behavior, and what
> > would lend itself to the easiest implementation. I would probably argue
> > that the existing behavior (where `--object-dir` would work against
> > arbitrary repositories) is a bug, and shouldn't continue to be
> > supported.
> 
> All of those seem reasonable to me, and are what I would suggest if we
> were starting from scratch. My only hesitation is whether people are
> using the weird behavior of --object-dir in the wild (e.g., are bup
> folks relying on it).
> 
> Johannes, is this something you're using _now_, and it works, or
> something you hoped to use in the future?

I was "hoping" to use

	git multi-pack-index --object-dir=... write

but never

	$ git multi-pack-index write --object-dir=...

which almost seems like it really is more like

	$ git -C ... multi-pack-index write

anyway, because you specify a repo? At least per the above example, I
never tried.


As I started playing with that again (I had done before, and it worked)
I noticed the segfault, hence my previous patch.


However, what I was thinking of doing is more outlined in this thread:
https://lore.kernel.org/git/20210820195558.44275-1-johannes@sipsolutions.net/


And essentially, as I described later in
https://lore.kernel.org/git/dbb24573efc3dd945acd8acdfd9fe627ad7cbcd2.camel@sipsolutions.net/

I have two only vaguely overlapping use cases.

One of them doesn't need "--object-dir", and the other requires that
[RFC PATCH] to be applied as well, which would basically let me use only
the small subset of git that is "git multi-pack-index" as machinery to
*just* do indexing, *without* really ever having a real "repository"
that git could otherwise operate on and worry about the actual objects
etc.

I might resend that with the code style issues fixed, but the
objections seemed more fundamental.

> But what I'm wondering is whether using --object-dir from outside a repo
> entirely is actually something that even works. I.e., would we be
> disabling a behavior that was not intended, but does happen to work? Or
> are we closing off a possibly buggy and half-working part of the system?

Well, it does work now, modulo the segfault, but that is actually a very
recent addition, I'd tried this before :)

johannes


^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v4 00/25] multi-pack reachability bitmaps
  2021-08-25  7:36       ` Jeff King
  2021-08-25  7:48         ` Johannes Berg
@ 2021-08-26 18:49         ` Taylor Blau
  2021-08-26 21:22           ` Taylor Blau
  1 sibling, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-08-26 18:49 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Berg, git, dstolee, gitster, jonathantanmy

On Wed, Aug 25, 2021 at 03:36:15AM -0400, Jeff King wrote:
> On Tue, Aug 24, 2021 at 10:10:12PM -0400, Taylor Blau wrote:
>
> > > It's not clear to me that we have a resolution on whether calling "cd ..
> > > && git multi-pack-index write --object-dir repo.git" is supposed to
> > > work.
> >
> > My recommendation would be to do the following things, all in a reroll
> > of this series:
> >
> >   - Fix the bug by which we would delete a .rev or .bitmap file out of a
> >     different object store than we were working in (when the caller
> >     passes `--object-dir`).
> >
> >   - Disallow running `git multi-pack-index` outside of a Git repository.
> >
> >   - Restrict `--object-dir` to only work with alternates of the
> >     repository in the current working directory.
> >
> > To me, that seems like both the least-surprising behavior, and what
> > would lend itself to the easiest implementation. I would probably argue
> > that the existing behavior (where `--object-dir` would work against
> > arbitrary repositories) is a bug, and shouldn't continue to be
> > supported.
>
> All of those seem reasonable to me, and are what I would suggest if we
> were starting from scratch. My only hesitation is whether people are
> using the weird behavior of --object-dir in the wild (e.g., are bup
> folks relying on it).
>
> Johannes, is this something you're using _now_, and it works, or
> something you hoped to use in the future?

I did some research[1] on what parts of `--object-dir` have worked (and not
worked) in the past, and came to the conclusion that although this
behavior is surprising, we do bear the responsibility of continuing to
maintain it.

And in that sense, I agree with your "only call close_object_store() if
the MIDX we are using came from the object store, or otherwise call
close_midx() if it didn't", so that's what I did in the
tb/multi-pack-bitmaps branch of my fork[2].

I think that this is the most reasonable path forward, since it resolves
Johannes' concerns while also not breaking any existing functionality in
the meantime as we add new features on top. It has the added benefit of
closing some holes that were open in the past, so I think that it's
worth doing.

Before I drop 27 patches onto the inboxes of list subscribers, would you
mind taking a look at [1] (and the rest of the patches in [2]) to make
sure that you're OK with the approach too?

Thanks,
Taylor

[1]: https://github.com/ttaylorr/git/commit/a24290489c2b30f3caed7e33fe8f85226a12778f
[2]: https://github.com/ttaylorr/git/compare/tb/multi-pack-bitmaps

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v4 00/25] multi-pack reachability bitmaps
  2021-08-26 18:49         ` Taylor Blau
@ 2021-08-26 21:22           ` Taylor Blau
  2021-08-27 21:30             ` Jeff King
  0 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-08-26 21:22 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Berg, git, dstolee, gitster, jonathantanmy

On Thu, Aug 26, 2021 at 02:49:10PM -0400, Taylor Blau wrote:
> On Wed, Aug 25, 2021 at 03:36:15AM -0400, Jeff King wrote:
> > On Tue, Aug 24, 2021 at 10:10:12PM -0400, Taylor Blau wrote:
> >
> > > > It's not clear to me that we have a resolution on whether calling "cd ..
> > > > && git multi-pack-index write --object-dir repo.git" is supposed to
> > > > work.
> > >
> > > My recommendation would be to do the following things, all in a reroll
> > > of this series:
> > >
> > >   - Fix the bug by which we would delete a .rev or .bitmap file out of a
> > >     different object store than we were working in (when the caller
> > >     passes `--object-dir`).
> > >
> > >   - Disallow running `git multi-pack-index` outside of a Git repository.
> > >
> > >   - Restrict `--object-dir` to only work with alternates of the
> > >     repository in the current working directory.
> > >
> > > To me, that seems like both the least-surprising behavior, and what
> > > would lend itself to the easiest implementation. I would probably argue
> > > that the existing behavior (where `--object-dir` would work against
> > > arbitrary repositories) is a bug, and shouldn't continue to be
> > > supported.
> >
> > All of those seem reasonable to me, and are what I would suggest if we
> > were starting from scratch. My only hesitation is whether people are
> > using the weird behavior of --object-dir in the wild (e.g., are bup
> > folks relying on it).
> >
> > Johannes, is this something you're using _now_, and it works, or
> > something you hoped to use in the future?
>
> I did some research[1] on what parts of `--object-dir` have worked (and not
> worked) in the past, and came to the conclusion that although this
> behavior is surprising, we do bear the responsibility of continuing to
> maintain it.

Hmm. Upon thinking on it more, here is some evidence to the contrary.
The new test, specifically this snippet:

    git init repo &&
    test_when_finished "rm -fr repo" &&
    (
      cd repo &&
      test_commit base &&
      git repack -d
    ) &&

    nongit git multi-pack-index --object-dir=$(pwd)/repo/.git/objects write

will fail with GIT_TEST_DEFAULT_HASH=sha256, since the MIDX internals
settle on the hash size via `the_hash_algo` which doesn't respect the
hash algorithm used by the target repository.

And that seems like it never could have worked. Try this at your shell
to observe the failure:

    git init --object-format=sha256 repo &&
    git -C repo commit --allow-empty -m initial &&
    git -C repo repack -d &&

    git multi-pack-index write --object-dir=$(pwd)/repo/.git/objects

and get:

    error: wrong index v2 file size in
    /home/ttaylorr/repo/.git/objects/pack/pack-9f08dc78ae6f37407a5acad69e3fdf5a1887eb7da5c043a1ddedc56ea7160814.idx
    warning: failed to open pack-index
    '/home/ttaylorr/repo/.git/objects/pack/pack-9f08dc78ae6f37407a5acad69e3fdf5a1887eb7da5c043a1ddedc56ea7160814.idx'

since we're trying to open a sha256 index with the_hash_algo in
sha1-mode.
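
To make the size mismatch concrete, here is a small sketch of the bounds
check that a pack index v2 reader applies. It is modeled on my reading of
check_packed_git_idx() in packfile.c; the helper names are made up for
illustration, and the overflow-checked arithmetic of the real code is
omitted:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Smallest valid pack index v2 size for nr objects and a given hash
 * size: an 8-byte header, a 256-entry fanout table (4 bytes each),
 * then per object one object ID, a 32-bit CRC and a 32-bit offset,
 * followed by the pack checksum and the index checksum.
 */
static size_t idx_v2_min_size(size_t nr, size_t hashsz)
{
	return 8 + 4 * 256 + nr * (hashsz + 4 + 4) + hashsz + hashsz;
}

/* Up to nr - 1 extra 8-byte entries may follow for large offsets. */
static size_t idx_v2_max_size(size_t nr, size_t hashsz)
{
	size_t max = idx_v2_min_size(nr, hashsz);

	if (nr)
		max += (nr - 1) * 8;
	return max;
}

/* Returns 1 if idx_size is plausible for the reader's hash size. */
static int idx_v2_size_ok(size_t idx_size, size_t nr, size_t hashsz)
{
	assert(hashsz > 0);
	return idx_size >= idx_v2_min_size(nr, hashsz) &&
	       idx_size <= idx_v2_max_size(nr, hashsz);
}
```

For an index holding two objects and no large offsets, a SHA-256 writer
(32-byte hashes) produces 1176 bytes, while a reader assuming 20-byte
SHA-1 hashes accepts only 1128 to 1136 bytes, so the file is rejected.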

The question is do we consider this to be a bug in the existing behavior
that we should patch, or an indication that the feature shouldn't exist
in the first place?

I think that I tend to agree more with the latter, so I'm inclined to
drop support for it (where "it" is running the midx command outside of a
repository) in this series (i.e., by making the midx builtin have the
RUN_SETUP flag instead of RUN_SETUP_GENTLY).
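
A toy model of what that flag change means for callers (the flag names
are the ones git.c uses, but the values and the may_run_builtin() helper
are assumptions made up for this sketch, not Git's actual dispatch code):

```c
#include <assert.h>

/* Flag names follow git.c; the values here are only illustrative. */
#define RUN_SETUP        (1 << 0)
#define RUN_SETUP_GENTLY (1 << 1)

/*
 * Hypothetical model of builtin dispatch: returns 1 if the command is
 * allowed to start, 0 if it would die because no repository was found.
 */
static int may_run_builtin(unsigned int flags, int inside_repo)
{
	/* A builtin asks for one kind of setup, not both. */
	assert(!((flags & RUN_SETUP) && (flags & RUN_SETUP_GENTLY)));

	if (flags & RUN_SETUP)
		return inside_repo;	/* a repository is mandatory */
	return 1;			/* gentle setup tolerates its absence */
}
```

Under this model the `nongit` invocation from the test above starts
today (gentle setup) but would be refused once the builtin asks for
RUN_SETUP.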

Thoughts?

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-24 22:10                 ` Taylor Blau
@ 2021-08-27  6:01                   ` Junio C Hamano
  2021-08-27 18:03                     ` Taylor Blau
  0 siblings, 1 reply; 273+ messages in thread
From: Junio C Hamano @ 2021-08-27  6:01 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee, jonathantanmy

Taylor Blau <me@ttaylorr.com> writes:

> On Tue, Aug 24, 2021 at 03:06:55PM -0700, Junio C Hamano wrote:
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>> > FWIW, here is what I have somewhere in 'seen' where two topics meet.
>>
>> Oops, one change missed.
>
> Thanks; that matches my own resolution. I noticed that it does fail the
> new test in t5319, since writing a MIDX wants to make sure that we are
> only touching an alternate's object directory (which will fail if we are
> running `git multi-pack-index` from outside of a repository).
>
> My opinion is that we should require being inside of a repository to run
> the MIDX builtin. Otherwise we're allowing that command to modify any
> old MIDX, which doesn't make sense.
>
> I think we probably need a single unifying topic, so I'm happy if you
> want to discard one of our two topics from seen in the meantime.

It seems that the *.rev test (probably added by the other topic that
is a single patch fix) fails under sha256 hash.  I am not going to
dig it any further myself, but for the interested, CI breakage is
here:

  https://github.com/git/git/runs/3440068613?check_suite_focus=true#step:5:1219

Thanks.


^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-27  6:01                   ` Junio C Hamano
@ 2021-08-27 18:03                     ` Taylor Blau
  2021-08-29 22:56                       ` Junio C Hamano
  0 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-08-27 18:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, peff, dstolee, jonathantanmy

On Thu, Aug 26, 2021 at 11:01:26PM -0700, Junio C Hamano wrote:
> It seems that the *.rev test (probably added by the other topic that
> is a single patch fix) fails under sha256 hash.  I am not going to
> dig it any further myself, but for the interested, CI breakage is
> here:
>
>   https://github.com/git/git/runs/3440068613?check_suite_focus=true#step:5:1219
>
> Thanks.

I saw the same error myself when integrating that patch into my series.
I discussed it more in [1], but the failure is basically caused by the
midx code using the_hash_algo even when operating in a different
repository via --object-dir.

If the_hash_algo doesn't match (as is the case when using `--object-dir`
to point at a SHA-256 repository when invoking the builtin from a
repository using SHA-1 or outside of a repository altogether), then
we'll fail when trying to open the pack indexes.

Thanks,
Taylor

[1]: https://lore.kernel.org/git/YSgGBxh24UAZR5X3@nand.local/

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v4 00/25] multi-pack reachability bitmaps
  2021-08-26 21:22           ` Taylor Blau
@ 2021-08-27 21:30             ` Jeff King
  2021-08-29 22:42               ` Junio C Hamano
  0 siblings, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-08-27 21:30 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Johannes Berg, git, dstolee, gitster, jonathantanmy

On Thu, Aug 26, 2021 at 05:22:15PM -0400, Taylor Blau wrote:

> > I did some research[1] on what parts of `--object-dir` have worked (and not
> > worked) in the past, and came to the conclusion that although this
> > behavior is surprising, we do bear the responsibility of continuing to
> > maintain it.
> 
> Hmm. Upon thinking on it more, here is some evidence to the contrary.
> The new test, specifically this snippet:
> 
>     git init repo &&
>     test_when_finished "rm -fr repo" &&
>     (
>       cd repo &&
>       test_commit base &&
>       git repack -d
>     ) &&
> 
>     nongit git multi-pack-index --object-dir=$(pwd)/repo/.git/objects write
> 
> will fail with GIT_TEST_DEFAULT_HASH=sha256, since the MIDX internals
> settle on the hash size via `the_hash_algo` which doesn't respect the
> hash algorithm used by the target repository.

Yeah, I think this is a good example of the class of things that might
fail: anything that requires the repo config to behave correctly.

I do think the hash format is somewhat unusual here. Most of the changes
to the on-disk files are reflected in the files themselves (e.g., pack
index v2 is chosen by config at _write_ time, but readers can interpret
the file stand-alone).

There may be other config that could influence the writing of the midx,
and we'd skip it in this kind of non-repo setup. An example here is
repack.usedeltabaseoffset, which midx_repack() tries to respect.
Ignoring that doesn't produce a nonsense result, but it doesn't follow
what would happen if run from inside the repo.

The other class of problems I'd expect is where part of the midx
operation needs to look at other parts of the repo. Bitmap generation is
an obvious one there, since we'd want to look at refs to find the
reachable tips. Now obviously that's a new feature we're trying to
introduce here, so it can't be an existing breakage. But it does make me
wonder what other problems might be lurking.

So I dunno. Even if it mostly works now, I'm not sure it's something
that I'm all that happy about supporting going forward. It seems like a
recipe for subtle bugs where the midx code calls into other library code
that assumes that it can look at the repository struct.

-Peff

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v4 00/25] multi-pack reachability bitmaps
  2021-08-27 21:30             ` Jeff King
@ 2021-08-29 22:42               ` Junio C Hamano
  0 siblings, 0 replies; 273+ messages in thread
From: Junio C Hamano @ 2021-08-29 22:42 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, Johannes Berg, git, dstolee, jonathantanmy

Jeff King <peff@peff.net> writes:

>> The new test, specifically this snippet:
>> 
>>     git init repo &&
>>     test_when_finished "rm -fr repo" &&
>>     (
>>       cd repo &&
>>       test_commit base &&
>>       git repack -d
>>     ) &&
>> 
>>     nongit git multi-pack-index --object-dir=$(pwd)/repo/.git/objects write
>> 
>> will fail with GIT_TEST_DEFAULT_HASH=sha256, since the MIDX internals
>> settle on the hash size via `the_hash_algo` which doesn't respect the
>> hash algorithm used by the target repository.
>
> Yeah, I think this is a good example of the class of things that might
> fail: anything that requires the repo config to behave correctly.
>
> I do think the hash format is somewhat unusual here. Most of the changes
> to the on-disk files are reflected in the files themselves (e.g., pack
> index v2 is chosen by config at _write_ time, but readers can interpret
> the file stand-alone).
>
> There may be other config that could influence the writing of the midx,
> and we'd skip it in this kind of non-repo setup. An example here is
> repack.usedeltabaseoffset, which midx_repack() tries to respect.
> Ignoring that doesn't produce a nonsense result, but it doesn't follow
> what would happen if run from inside the repo.
>
> The other class of problems I'd expect is where part of the midx
> operation needs to look at other parts of the repo. Bitmap generation is
> an obvious one there, since we'd want to look at refs to find the
> reachable tips. Now obviously that's a new feature we're trying to
> introduce here, so it can't be an existing breakage. But it does make me
> wonder what other problems might be lurking.
>
> So I dunno. Even if it mostly works now, I'm not sure it's something
> that I'm all that happy about supporting going forward. It seems like a
> recipe for subtle bugs where the midx code calls into other library code
> that assumes that it can look at the repository struct.

I tend to agree that we should first disallow things that are not
what we know we definitely need, and the non-repo setup is something
we would want to punt on to make sure we have a solid support for
the mainstream usecase.

Thanks.

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-27 18:03                     ` Taylor Blau
@ 2021-08-29 22:56                       ` Junio C Hamano
  2021-08-30  0:07                         ` Taylor Blau
  0 siblings, 1 reply; 273+ messages in thread
From: Junio C Hamano @ 2021-08-29 22:56 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee, jonathantanmy

Taylor Blau <me@ttaylorr.com> writes:

> On Thu, Aug 26, 2021 at 11:01:26PM -0700, Junio C Hamano wrote:
>> It seems that the *.rev test (probably added by the other topic that
>> is a single patch fix) fails under sha256 hash.  I am not going to
>> dig it any further myself, but for the interested, CI breakage is
>> here:
>>
>>   https://github.com/git/git/runs/3440068613?check_suite_focus=true#step:5:1219
>>
>> Thanks.
>
> I saw the same error myself when integrating that patch into my series.
> I discussed it more in [1], but the failure is basically caused by the
> midx code using the_hash_algo even when operating in a different
> repository via --object-dir.
>
> If the_hash_algo doesn't match (as is the case when using `--object-dir`
> to point at a SHA-256 repository when invoking the builtin from a
> repository using SHA-1 or outside of a repository altogether), then
> we'll fail when trying to open the pack indexes.

My recollection is that "--object-dir" is mostly about the alternate
odb usecase---am I correct?  It is unfortunate that we didn't start
with "alternate repository" and said "we only care about the objects
in the object store they have, and we do not have to care what refs
they point into their object database or what configuration they
have" instead.

I wonder if it is safe to assume that in practice a directory given
to the "--object-dir" option is always the "objects" subdirectory in
a repository, and it is an error if there is no "config" file next
to the directory.  Then, we could check ../config relative to the
given directory and error out if they use different hash.

I do not recall offhand how careful link_alt_odb_entries() is, but I
suspect it isn't at all (back when I invented it, there was no need
for configuration to switch between hashes, and since then I do not
recall seeing any heavy update to the alternate odb code).  Perhaps
we should tighten it so that we check the accompanying "config" file
first and ignore the entry with incompatible "hash" (and we may
later discover other traits of a repository that are incompatible
with the current one)?

Thanks.





^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-29 22:56                       ` Junio C Hamano
@ 2021-08-30  0:07                         ` Taylor Blau
  2021-08-30  0:34                           ` Junio C Hamano
  2021-08-31  1:21                           ` Derrick Stolee
  0 siblings, 2 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-30  0:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, peff, dstolee, jonathantanmy

On Sun, Aug 29, 2021 at 03:56:31PM -0700, Junio C Hamano wrote:
> My recollection is that "--object-dir" is mostly about the alternate
> odb usecase---am I correct?

That matches my understanding. The documentation refers to the value of
this flag as `<alt>`, making me think that supporting non-alternates is
a historical accident.

> I wonder if it is safe to assume that in practice a directory given
> to the "--object-dir" option is always the "objects" subdirectory in
> a repository, and it is an error if there is no "config" file next
> to the directory.  Then, we could check ../config relative to the
> given directory and error out if they use different hash.

Maybe... although I have to admit to not being very excited about it. Is
the idea to read ../config to try and check for any incompatibilities
between the in-core state and the target repository's settings? If so,
this seems like a recipe for catching bugs too late.

E.g., catching the_hash_algo != target_repository->hash would
definitely squash the bug you saw when integrating, but we would have to
remember to update this spot later on if, say, the target repository
started using a different reference storage backend (since bitmap
generation necessarily iterates the references to figure out which
commits should receive coverage).

> I do not recall offhand how careful link_alt_odb_entries() is, but I
> suspect it isn't at all (back when I invented it, there weren't need
> for configuration to switch between hashes, and since then I do not
> recall seeing any heavy update to the alternate odb code).  Perhaps
> we should tighten it so that we check the accompanying "config" file
> first and ignore the entry with incompatible "hash" (and we may
> later discover other trait on a repository that is incompatible with
> the current one)?

Or are you saying you're concerned about an alternates chain whose
members don't all use the same object format?

If the former, then I would say:

    "Supporting arbitrary --object-dir when invoked from outside a
    repository is a bug that happened to not cause any problems, but
    is surprising, error-prone, and should fall outside of the burden of
    backwards compatibility, so we should get rid of it."

If the latter, then I agree we could and should do better at detecting
it and providing a helpful error message, but I don't see how doing so
now or later would affect this series. Even if we just disallow
--object-dir pointing at a non-alternate repository, we would still have
the issue of having alternate chains which don't all have the same
object format.

So that makes me feel like the latter is a problem outside of this
series that can be dealt with later.

I'm admittedly a little unsure of how to progress here. Given that this
series has received positive review over the complicated parts, it seems
that it is getting stuck on how to deal with `--object-dir`, especially
when invoked outside of a Git repository. My inclination would be to
send a new version that simply requires the MIDX builtin to be run from
within a repository (as well as the cleanups from Johannes).

Does that seem like a good direction forward to you? If not, let me know
if there's another issue that we should deal with first and I'd be happy
to start there.

> Thanks.

Thanks,
Taylor


* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-30  0:07                         ` Taylor Blau
@ 2021-08-30  0:34                           ` Junio C Hamano
  2021-08-30  0:43                             ` Taylor Blau
  2021-08-31  1:21                           ` Derrick Stolee
  1 sibling, 1 reply; 273+ messages in thread
From: Junio C Hamano @ 2021-08-30  0:34 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee, jonathantanmy

Taylor Blau <me@ttaylorr.com> writes:

> now or later would affect this series. Even if we just disallow
> --object-dir pointing at a non-alternate repository, we would still have
> the issue of having alternate chains which don't all have the same
> object format.

Exactly.  That is why I feel that it probably needs to be dealt with
before doing anything else.  The alternate mechanism pulling in an
object store that uses incompatible hash algo would break not just
the multi-pack-index but probably the basic object access layer as
well, which would be more grave problem, no?

> My inclination would be to
> send a new version that simply requires the MIDX builtin to be run from
> within a repository (as well as the cleanups from Johannes).

Sounds like a good first step.


* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-30  0:34                           ` Junio C Hamano
@ 2021-08-30  0:43                             ` Taylor Blau
  2021-08-30 22:10                               ` brian m. carlson
  0 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-08-30  0:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, peff, dstolee, jonathantanmy, sandals

On Sun, Aug 29, 2021 at 05:34:18PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > now or later would affect this series. Even if we just disallow
> > --object-dir pointing at a non-alternate repository, we would still have
> > the issue of having alternate chains which don't all have the same
> > object format.
>
> Exactly.  That is why I feel that it probably needs to be dealt with
> before doing anything else.  The alternate mechanism pulling in an
> object store that uses incompatible hash algo would break not just
> the multi-pack-index but probably the basic object access layer as
> well, which would be more grave problem, no?

Yeah; it does. Maybe I'm holding it wrong (and brian, cc'd, can help
me), but this is an easy way to see the problem:

  git init repo
  git init alternate

  git -C repo commit --allow-empty -m foo
  ( cd repo/.git/objects && pwd ) >alternate/.git/objects/info/alternates
  git -C alternate rev-list --objects --alternate-refs

which will produce:

    $ git rev-list --objects --alternate-refs
    warning: invalid line while parsing alternate refs: <sha256 id>

But I don't know if I quite understand your "probably needs to be dealt
with before doing anything else". I think we can proceed with this
series and deal with the alternate object-format thing separately, no?

Thanks,
Taylor


* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-30  0:43                             ` Taylor Blau
@ 2021-08-30 22:10                               ` brian m. carlson
  2021-08-30 22:28                                 ` Junio C Hamano
  0 siblings, 1 reply; 273+ messages in thread
From: brian m. carlson @ 2021-08-30 22:10 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Junio C Hamano, git, peff, dstolee, jonathantanmy


On 2021-08-30 at 00:43:30, Taylor Blau wrote:
> On Sun, Aug 29, 2021 at 05:34:18PM -0700, Junio C Hamano wrote:
> > Taylor Blau <me@ttaylorr.com> writes:
> >
> > > now or later would affect this series. Even if we just disallow
> > > --object-dir pointing at a non-alternate repository, we would still have
> > > the issue of having alternate chains which don't all have the same
> > > object format.
> >
> > Exactly.  That is why I feel that it probably needs to be dealt with
> > before doing anything else.  The alternate mechanism pulling in an
> > object store that uses incompatible hash algo would break not just
> > the multi-pack-index but probably the basic object access layer as
> > well, which would be more grave problem, no?
> 
> Yeah; it does. Maybe I'm holding it wrong (and brian, cc'd, can help
> me), but this is an easy way to see the problem:
> 
>   git init repo
>   git init alternate
> 
>   git -C repo commit --allow-empty -m foo
>   ( cd repo/.git/objects && pwd ) >alternate/.git/objects/info/alternates
>   git -C alternate rev-list --objects --alternate-refs
> 
> which will produce:
> 
>     $ git rev-list --objects --alternate-refs
>     warning: invalid line while parsing alternate refs: <sha256 id>
> 
> But I don't know if I quite understand your "probably needs to be dealt
> with before doing anything else". I think we can proceed with this
> series and deal with the alternate object-format thing separately, no?

Yeah, this is a possible problem.  You can also see it when using git
index-pack outside of a repository with an incorrect --object-format
option.

I'm not sure how folks want to deal with that; I'm just fine saying,
"Well, don't do that," but other folks may have different opinions.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA



* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-30 22:10                               ` brian m. carlson
@ 2021-08-30 22:28                                 ` Junio C Hamano
  2021-08-30 22:33                                   ` Taylor Blau
  0 siblings, 1 reply; 273+ messages in thread
From: Junio C Hamano @ 2021-08-30 22:28 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Taylor Blau, git, peff, dstolee, jonathantanmy

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> Yeah, this is a possible problem.  You can also see it when using git
> index-pack outside of a repository with an incorrect --object-format
> option.
>
> I'm not sure how folks want to deal with that; I'm just fine saying,
> "Well, don't do that," but other folks may have different opinions.

OK, so if we go back to the original breakage of the test script
that triggered this discussion, the right solution would be to make
sure both test repositories/object stores are prepared with the
algorithm specified with GIT_TEST_DEFAULT_HASH?

Thanks.


* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-30 22:28                                 ` Junio C Hamano
@ 2021-08-30 22:33                                   ` Taylor Blau
  2021-08-31  5:19                                     ` Jeff King
  2021-08-31 16:29                                     ` Junio C Hamano
  0 siblings, 2 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-30 22:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: brian m. carlson, Taylor Blau, git, peff, dstolee, jonathantanmy

On Mon, Aug 30, 2021 at 03:28:47PM -0700, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>
> > Yeah, this is a possible problem.  You can also see it when using git
> > index-pack outside of a repository with an incorrect --object-format
> > option.
> >
> > I'm not sure how folks want to deal with that; I'm just fine saying,
> > "Well, don't do that," but other folks may have different opinions.
>
> OK, so if we go back to the original breakage of the test script
> that triggered this discussion, the right solution would be to make
> sure both test repositories/object stores are prepared with the
> algorithm specified with GIT_TEST_DEFAULT_HASH?

Just to make sure: do you still see this as a separate issue from running
the midx builtin outside of a repository?

I.e., if we require the midx builtin to be run in a repository, it
side-steps this issue (but presumably not completely, and so we should
deal with both eventually). I want to make sure that I'm on the same
page before I drop 25+ emails on the list.

Thanks,
Taylor


* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-30  0:07                         ` Taylor Blau
  2021-08-30  0:34                           ` Junio C Hamano
@ 2021-08-31  1:21                           ` Derrick Stolee
  2021-08-31  5:37                             ` Jeff King
  1 sibling, 1 reply; 273+ messages in thread
From: Derrick Stolee @ 2021-08-31  1:21 UTC (permalink / raw)
  To: Taylor Blau, Junio C Hamano; +Cc: git, peff, dstolee, jonathantanmy

On 8/29/21 8:07 PM, Taylor Blau wrote:
> On Sun, Aug 29, 2021 at 03:56:31PM -0700, Junio C Hamano wrote:
>> My recollection is that "--object-dir" is mostly about the alternate
>> odb usecase---am I correct?
> 
> That matches my understanding. The documentation refers to the value of
> this flag as `<alt>`, making me think that supporting non-alternates is
> a historical accident.

Yes, supporting non-alternates is a historical accident. Supporting
alternates that are not actually the core object database of a full
repository is on purpose.

So, hopefully the remaining discussion that I am seeing can be
solved by a decision such as:

  "If we add the restriction that the builtin always runs with a
   repository and --object-dir always points to its objects dir
   or one of its registered alternates, then we have access to a
   local config file to learn how to interpret that object directory."

>> I wonder if it is safe to assume that in practice a directory given
>> to the "--object-dir" option is always the "objects" subdirectory in
>> a repository, and it is an error if there is no "config" file next
>> to the directory.  Then, we could check ../config relative to the
>> given directory and error out if they use different hash.

I would say that is not always the case, and we should not error out.

I think taking a look to see if ../config exists to use the data
might be helpful for some cases, but should not be a blocker for
completing the requested operation. The config from the non-alternate
repo should be sufficient for this (somewhat strange) case.

> I'm admittedly a little unsure of how to progress here. Given that this
> series has received positive review over the complicated parts, it seems
> that it is getting stuck on how to deal with `--object-dir`, especially
> when invoked outside of a Git repository. My inclination would be to
> send a new version that simply requires the MIDX builtin to be run from
> within a repository (as well as the cleanups from Johannes).
> 
> Does that seem like a good direction forward to you? If not, let me know
> if there's another issue that we should deal with first and I'd be happy
> to start there.

I think it is sensible to restrict 'git multi-pack-index' to run
inside a repository on its own merits. It happens to also solve
some tricky problems that have come up since its creation.

Sorry I'm so late to this thread. I gave most of the messages in this
chain a quick read and this seemed like the best place to chime in.
Hopefully this isn't too much of a re-tread of things covered elsewhere.

Thanks,
-Stolee


* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-30 22:33                                   ` Taylor Blau
@ 2021-08-31  5:19                                     ` Jeff King
  2021-08-31 16:29                                     ` Junio C Hamano
  1 sibling, 0 replies; 273+ messages in thread
From: Jeff King @ 2021-08-31  5:19 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Junio C Hamano, brian m. carlson, git, dstolee, jonathantanmy

On Mon, Aug 30, 2021 at 06:33:18PM -0400, Taylor Blau wrote:

> On Mon, Aug 30, 2021 at 03:28:47PM -0700, Junio C Hamano wrote:
> > "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> >
> > > Yeah, this is a possible problem.  You can also see it when using git
> > > index-pack outside of a repository with an incorrect --object-format
> > > option.
> > >
> > > I'm not sure how folks want to deal with that; I'm just fine saying,
> > > "Well, don't do that," but other folks may have different opinions.
> >
> > OK, so if we go back to the original breakage of the test script
> > that triggered this discussion, the right solution would be to make
> > sure both test repositories/object stores are prepared with the
> > algorithm specified with GIT_TEST_DEFAULT_HASH?
> 
> Just to make sure do you still see this as a separate issue from running
> the midx builtin outside of a repository?

Adding my two cents: yes, I think it most definitely should be a
separate issue. As you demonstrated, differing config between alternates
and repos that point to them is not specific to the midx code. I agree
with brian's "well, don't do that". But _if_ we want to try to behave
better in such a case, whatever changes we make would then naturally
apply to the midx code as well.

The two midx-specific things we have to care about are:

  - is it OK for the midx command to refuse to operate when we are not
    in a repository at all? I think yes; we can't even know which hash
    is being used, along with who knows what other lurking
    complications.

  - is it OK to restrict the midx command's --object-dir to only operate
    on a directory which is an alternate of the current repo? I think
    yes again. If it _isn't_ related, we have all the lurking problems
    from the first point, but even worse (because we use config, refs,
    and other information from our current repo with the _totally
    unrelated_ object dir).

So I'm all in favor of locking those down now before things get any more
complicated. If we later want to make the object store more aware of
differences between alternates and the main store (like say, the object
hash in use), then we could consider loosening using the same mechanism.

-Peff


* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-31  1:21                           ` Derrick Stolee
@ 2021-08-31  5:37                             ` Jeff King
  2021-08-31 16:33                               ` Junio C Hamano
  0 siblings, 1 reply; 273+ messages in thread
From: Jeff King @ 2021-08-31  5:37 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, Junio C Hamano, git, dstolee, jonathantanmy

On Mon, Aug 30, 2021 at 09:21:31PM -0400, Derrick Stolee wrote:

> Yes, supporting non-alternates is a historical accident. Supporting
> alternates that are not actually the core object database of a full
> repository is on purpose.
> 
> So, hopefully the remaining discussion that I am seeing can be
> solved by a decision such as:
> 
>   "If we add the restriction that the builtin always runs with a
>    repository and --object-dir always points to its objects dir
>    or one of its registered alternates, then we have access to a
>    local config file to learn how to interpret that object directory."

I left a similar comment in the other part of the thread. :)

> >> I wonder if it is safe to assume that in practice a directory given
> >> to the "--object-dir" option is always the "objects" subdirectory in
> >> a repository, and it is an error if there is no "config" file next
> >> to the directory.  Then, we could check ../config relative to the
> >> given directory and error out if they use different hash.
> 
> I would say that is not always the case, and we should not error out.
> 
> I think taking a look to see if ../config exists to use the data
> might be helpful for some cases, but should not be a blocker for
> completing the requested operation. The config from the non-alternate
> repo should be sufficient for this (somewhat strange) case.

Yes, agreed. We have long supported these kind of "bare" alternates, and
I wouldn't be surprised if they are in wide use (though I do wonder how
folks actually modify them, since most commands that touch objects
really do want to be in a repository).

In other cases where we may benefit from there being a containing repo
(e.g., accessing the ref tips of the alternate), we speculatively look
at ".." and see if there are any refs. See refs_from_alternate_cb()[0].

The natural extension for the hash-format problem would probably be to
call check_repository_format_gently() on the parent directory of the
alternate-objects dir. If it succeeds, then we can pull out the
hash_algo parameter from its repository_format struct. And if not, then
we just assume it matches the main repo.
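
As a rough shell illustration of that fallback (not the C code path,
which would go through check_repository_format_gently() and validate a
good deal more): the helper below, whose name alt_object_format is made
up for this sketch, looks for a "config" file next to the given object
directory and reads extensions.objectFormat from it with
"git config --file". If no such file exists, it reports whatever format
the caller passes in as the main repository's.

```shell
# Hypothetical helper, not part of Git: print the object format declared
# by the repository containing object directory $1, or the fallback $2
# when no "config" file sits next to that directory (a "bare" alternate).
alt_object_format () {
	objdir=$1
	fallback=${2:-sha1}
	config="$objdir/../config"

	if [ -f "$config" ]
	then
		# extensions.objectFormat is normally unset in SHA-1
		# repositories; "git config --get" then exits non-zero
		# and we report sha1.
		format=$(git config --file "$config" --get extensions.objectformat) ||
			format=sha1
		printf '%s\n' "${format:-sha1}"
	else
		# No containing repository: assume it matches the caller's.
		printf '%s\n' "$fallback"
	fi
}
```

A caller could compare the result against its own format, e.g.
`[ "$(alt_object_format "$alt" "$mine")" = "$mine" ]`, and warn about
or skip the alternate when the two differ. An unreadable or malformed
config is treated like a SHA-1 repository here, which is a
simplification.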

But I suspect all of this is moot for now, beyond being able to return a
nicer error message. The rest of the code is not at all ready to handle
packs with two different hashes in the same process. And I suspect it
would take a reasonable amount of refactoring to make it so. If somebody
wants to work on that, I won't stop them, but I kind of doubt it is
worth anybody's time.

[0] Looking at refs_from_alternate_cb(), I did wonder if it would work
    at all with a reftable alternate, but I suspect it would. I think we
    ended up still having a "refs/" directory in that case, so we'd
    recognize it as a repo (though really, it ought to be using
    is_git_directory() instead of its hacky check). And then we farm out
    the actual ref iteration to a separate for-each-ref process, passing
    along --git-dir, which will read that alternate repo's config. So it
    should Just Work, even with a different ref backend. It's almost
    certainly broken if the hash algorithms don't match, though, because
    we'd get oddly sized results from for-each-ref's output.

    That's all just an interesting tangent, though. :)

-Peff


* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-30 22:33                                   ` Taylor Blau
  2021-08-31  5:19                                     ` Jeff King
@ 2021-08-31 16:29                                     ` Junio C Hamano
  2021-08-31 16:39                                       ` Taylor Blau
  1 sibling, 1 reply; 273+ messages in thread
From: Junio C Hamano @ 2021-08-31 16:29 UTC (permalink / raw)
  To: Taylor Blau; +Cc: brian m. carlson, git, peff, dstolee, jonathantanmy

Taylor Blau <me@ttaylorr.com> writes:

> On Mon, Aug 30, 2021 at 03:28:47PM -0700, Junio C Hamano wrote:
>> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>>
>> > Yeah, this is a possible problem.  You can also see it when using git
>> > index-pack outside of a repository with an incorrect --object-format
>> > option.
>> >
>> > I'm not sure how folks want to deal with that; I'm just fine saying,
>> > "Well, don't do that," but other folks may have different opinions.
>>
>> OK, so if we go back to the original breakage of the test script
>> that triggered this discussion, the right solution would be to make
>> sure both test repositories/object stores are prepared with the
>> algorithm specified with GIT_TEST_DEFAULT_HASH?
>
> Just to make sure do you still see this as a separate issue from running
> the midx builtin outside of a repository?

They are separate issues, but the .midx issue has a small overlap
with the much bigger "do not mix repositories and object stores with
different hashes" issue.

The users of raw object stores (e.g. $GIT_OBJECT_DIRECTORIES,
"--object-dir", there may be others) need to be updated so that the
code paths involved can reliably learn what hash algorithm is used
and other traits that may not be available in the object store alone
(e.g. refs might be relevant if the using code needs to learn which
objects are still reachable) for the latter.  It would need a couple
of things that are fairly isolated to solve, I would imagine:

 (1) convention to either tie a raw object store with its repository
     or declare a raw object store is unusable because "other
     traits" are not found for it.

 (2) given a repository, inspect it and decide if it is "compatible"
     with the current repository.

 (3) update code paths involved in prepare_alt_odb() to use (1) and
     (2) to inspect and reject incompatible object store as
     alternate.

And once we have that, "git multi-pack-index --object-dir=X" can use
(1) and (2) for the same "Is this other object store compatible with
the current repository?" check, no?

The other side of the coin is that midx needs to do equivalents of
(1) and (2) anyway, and the required amount of the work for (3)
smells a lot smaller than work for (1) and (2).  (3) may be just a
matter of "add a call to is_odb_compatible(dir) for the directory
being added as an alt odb", and the same single validation call may
be all it needs on the --object-dir argument on the midx side.

I think it makes sense for the midx command to require being in a
repository to run (to establish what "the current repository" is)
and insist on the other object store given with --object-dir to be
"compatible" with the current repository (i.e. the same hash
algorithm, there may be others).  I am a bit fuzzy why we want it
to be already our alternate.

Thanks.


* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-31  5:37                             ` Jeff King
@ 2021-08-31 16:33                               ` Junio C Hamano
  2021-08-31 16:43                                 ` Taylor Blau
  2021-09-01 10:03                                 ` Jeff King
  0 siblings, 2 replies; 273+ messages in thread
From: Junio C Hamano @ 2021-08-31 16:33 UTC (permalink / raw)
  To: Jeff King; +Cc: Derrick Stolee, Taylor Blau, git, dstolee, jonathantanmy

Jeff King <peff@peff.net> writes:

>> I think taking a look to see if ../config exists to use the data
>> might be helpful for some cases, but should not be a blocker for
>> completing the requested operation. The config from the non-alternate
>> repo should be sufficient for this (somewhat strange) case.
>
> Yes, agreed. We have long supported these kind of "bare" alternates, and
> I wouldn't be surprised if they are in wide use (though I do wonder how
> folks actually modify them, since most commands that touch objects
> really do want to be in a repository).

I kind of find the above two somewhat surprising, but I am willing
to go with the less safe option if that is what people want.

It has been perfectly OK in the pre-alternative-hash-algorithms
world, but we no longer live in such a world, so we'd need to come
up with a way to keep using alternates in a safer way.

I do not see the reasoning behind "should not be a blocker" from
Derrick substantiated.  What's the reason why that raw object store
cannot come from an existing repository, and what's the benefit we
get from not having to have a repository there?

> The natural extension for the hash-format problem would probably be to
> call check_repository_format_gently() on the parent directory of the
> alternate-objects dir. If it succeeds, then we can pull out the
> hash_algo parameter from its repository_format struct. And if not, then
> we just assume it matches the main repo.
>
> But I suspect all of this is moot for now, beyond being able to return a
> nicer error message. The rest of the code is not at all ready to handle
> packs with two different hashes in the same process.

I do not think it is all that urgent to make it possible for packs
with different algorithms to be used.  It is sufficient to _ignore_
(or error out) configured odb that is incompatible with the current
repository.

Thanks.


* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-31 16:29                                     ` Junio C Hamano
@ 2021-08-31 16:39                                       ` Taylor Blau
  2021-08-31 17:44                                         ` Junio C Hamano
  0 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 16:39 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Taylor Blau, brian m. carlson, git, peff, dstolee, jonathantanmy

On Tue, Aug 31, 2021 at 09:29:46AM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > On Mon, Aug 30, 2021 at 03:28:47PM -0700, Junio C Hamano wrote:
> >> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> >>
> >> > Yeah, this is a possible problem.  You can also see it when using git
> >> > index-pack outside of a repository with an incorrect --object-format
> >> > option.
> >> >
> >> > I'm not sure how folks want to deal with that; I'm just fine saying,
> >> > "Well, don't do that," but other folks may have different opinions.
> >>
> >> OK, so if we go back to the original breakage of the test script
> >> that triggered this discussion, the right solution would be to make
> >> sure both test repositories/object stores are prepared with the
> >> algorithm specified with GIT_TEST_DEFAULT_HASH?
> >
> > Just to make sure do you still see this as a separate issue from running
> > the midx builtin outside of a repository?
>
> They are separate issues, but the .midx issue has a small overlap
> with the much bigger "do not mix repositories and object stores with
> different hashes" issue.

OK, good. Everything you wrote below (which I snipped off in my reply)
makes sense to me, and seems like a worthwhile direction to pursue
outside of this series, especially as more users start using sha256
repositories.

> I think it makes sense for the midx command to require being in a
> repository to run (to establish what "the current repository" is)
> and insist on the other object store given with --object-dir to be
> "compatible" with the current repository (i.e. the same hash
> algorithm, there may be others).  I am a bit fuzzy why we want it
> to be already our alternate.

I don't think there's any strict requirement for the other repository
to be our alternate, other than that touching arbitrary repositories is
surprising behavior that appears (to me, at least) to be inconsistent
with the rest of Git.

After (the rerolled version of) this series, we'll be in a state where:

  - `git multi-pack-index` will not run when outside of a Git
    repository.
  - The `--object-dir` argument will only recognize object directories
    belonging to an alternate of the current repository.
  - Using `--object-dir` to point to a repository which uses a
    different hash than the repository in the current working directory
    will continue to not work (as was the case before this series).
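
The second restriction can be pictured with a small shell sketch. This
is only an illustration under stated assumptions, not the check the
series implements in C: the helper names canon_dir and
is_known_object_dir are invented here, only the first level of
objects/info/alternates is consulted (Git itself follows alternates of
alternates), and quoted paths in that file are not handled.

```shell
# Print the physical (symlink-free) absolute path of directory $1.
canon_dir () {
	( cd "$1" 2>/dev/null && pwd -P )
}

# Hypothetical helper: succeed if $2 is the object directory $1 itself,
# or one of the directories listed in $1/info/alternates.
is_known_object_dir () {
	objdir=$(canon_dir "$1") || return 1
	wanted=$(canon_dir "$2") || return 1

	# The repository's own object directory is always acceptable.
	[ "$objdir" = "$wanted" ] && return 0

	alternates="$objdir/info/alternates"
	[ -f "$alternates" ] || return 1

	while IFS= read -r line || [ -n "$line" ]
	do
		case "$line" in
		''|'#'*) continue ;;		# blank line or comment
		/*) alt=$line ;;		# absolute path
		*) alt="$objdir/$line" ;;	# relative to the object dir
		esac
		alt=$(canon_dir "$alt") || continue	# skip missing dirs
		[ "$alt" = "$wanted" ] && return 0
	done <"$alternates"
	return 1
}
```

With something like this, `is_known_object_dir .git/objects "$dir"`
would succeed only for the repository's own object directory or for a
directory registered as one of its alternates, and fail for an
unrelated path.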

I think(?) that there is consensus for that approach, so patches
incoming...

> Thanks.

(Thank you, by the way, for clarifying this all in so much detail. I
would much rather just have code to talk about, but it feels
particularly important to be on the same page beforehand in this
instance, since there is *so much* code, and this discussion is centered
around so little of it).

Thanks,
Taylor


* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-31 16:33                               ` Junio C Hamano
@ 2021-08-31 16:43                                 ` Taylor Blau
  2021-08-31 17:17                                   ` Derrick Stolee
  2021-09-01 10:03                                 ` Jeff King
  1 sibling, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 16:43 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Derrick Stolee, Taylor Blau, git, dstolee, jonathantanmy

On Tue, Aug 31, 2021 at 09:33:38AM -0700, Junio C Hamano wrote:
> I do not see the reasoning behind "should not be a blocker" from
> Derrick substantiated.  What's the reason why that raw object store
> cannot come from an existing repository, and what's the benefit we
> get from not having to have a repository there?

I also didn't find the reasoning spelled out in his response, but I have
definitely had off-list discussions with Stolee where it was important to
be able to pass a value to `--object-dir` which does *not* belong to a
Git repository (but is used as a dumping ground for packs, a MIDX, and
loose objects).

It may be worthwhile to recapitulate that discussion here on the list.
(I'm hoping that Stolee won't mind filling in the details, since I seem
to have forgotten most of them).

Thanks,
Taylor


* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-31 16:43                                 ` Taylor Blau
@ 2021-08-31 17:17                                   ` Derrick Stolee
  0 siblings, 0 replies; 273+ messages in thread
From: Derrick Stolee @ 2021-08-31 17:17 UTC (permalink / raw)
  To: Taylor Blau, Junio C Hamano; +Cc: Jeff King, git, dstolee, jonathantanmy

On 8/31/2021 12:43 PM, Taylor Blau wrote:
> On Tue, Aug 31, 2021 at 09:33:38AM -0700, Junio C Hamano wrote:
>> I do not see the reasoning behind "should not be a blocker" from
>> Derrick substantiated.  What's the reason why that raw object store
>> cannot come from an existing repository, and what's the benefit we
>> get from not having to have a repository there?
> 
> I also didn't find the reasoning spelled out in his response, but I have
> definitely had off-list discussions with Stolee where it was important to
> be able to pass a value to `--object-dir` which does *not* belong to a
> Git repository (but is used as a dumping ground for packs, a MIDX, and
> loose objects).
> 
> It may be worthwhile to recapitulate that discussion here on the list.
> (I'm hoping that Stolee won't mind filling in the details, since I seem
> to have forgotten most of them).

The way we have been using alternates in VFS for Git and Scalar is as a
"shared object cache" that is shared across multiple full Git repositories
with their own working trees. The shared object cache lives in a
location that can be found during "scalar clone", such as

	~/.scalarCache/url_<hash-of-URL>/

This directory contains the same data as a .git/objects directory would.

Data is added to that cache using hooks during 'git fetch' or other
requests for remote data. This means that the second "scalar clone"
command is much faster than the first, because it already has most of
the commit and tree data required to satisfy the partial clone.

(Note: this feature does not exist in the current Scalar CLI RFC, but
would be contributed later.)

These caches were designed before the multi-pack-index -- in fact,
they were an inspiration for it, because deleting a repo would no longer
clean up old pack-files. The data would be added as a raw pack-file that
is processed with 'git index-pack' or as loose objects. The --object-dir
option was created specifically as a way to target the creation and
maintenance of a multi-pack-index within one of these caches, which do
not exist as full repositories. Clearly, there were some gaps in that
implementation and I regret creating those gaps.

If I were to redesign the shared object cache, then I would create
the cache directories as bare repos and then create the "clone" repo as
a worktree linked to that base. That would allow all objects and refs to
be shared, achieving the same goals and an even better user experience.

I'm advocating that we continue to allow this feature to exist without
requiring an on-upgrade conversion of these non-repos to full repos.
Such a conversion may be the best thing to do in the long term, but it
will take some time. Keeping compatibility for now seems like it won't
hurt too much.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-31 16:39                                       ` Taylor Blau
@ 2021-08-31 17:44                                         ` Junio C Hamano
  2021-08-31 18:48                                           ` Taylor Blau
  0 siblings, 1 reply; 273+ messages in thread
From: Junio C Hamano @ 2021-08-31 17:44 UTC (permalink / raw)
  To: Taylor Blau; +Cc: brian m. carlson, git, peff, dstolee, jonathantanmy

Taylor Blau <me@ttaylorr.com> writes:

> After (the rerolled version of) this series, we'll be in a state where:
>
>   - `git multi-pack-index` will not run when outside of a Git
>     repository.
>   - The `--object-dir` argument will only recognize object directories
>     belonging to an alternate of the current repository.
>   - Using `--object-dir` to point to a repository which uses a
>     different hash than the repository in the current working directory
>     will continue to not work (as was the case before this series).

Hmph, re-reading the document for midx:

    --object-dir=<dir>::
            Use given directory for the location of Git objects. We check
            `<dir>/packs/multi-pack-index` for the current MIDX file, and
            `<dir>/packs` for the pack-files to index.

why does it matter if we are in a repository in the first place?
It's not like we combine the objects from the specified object dir
and our local object store (if that were the case, these two object
stores must be compatible).

How old is --object-dir option and how widely is it used?  Can we
just remove it and have users go to the repository that uses it
as its object store with "git -C <there>" mechanism, or have we come
too far with this (apparently broken) design to make such a fix
infeasible?

Thanks.

^ permalink raw reply	[flat|nested] 273+ messages in thread

* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-31 17:44                                         ` Junio C Hamano
@ 2021-08-31 18:48                                           ` Taylor Blau
  0 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 18:48 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Taylor Blau, brian m. carlson, git, peff, dstolee, jonathantanmy

On Tue, Aug 31, 2021 at 10:44:38AM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > After (the rerolled version of) this series, we'll be in a state where:
> >
> >   - `git multi-pack-index` will not run when outside of a Git
> >     repository.
> >   - The `--object-dir` argument will only recognize object directories
> >     belonging to an alternate of the current repository.
> >   - Using `--object-dir` to point to a repository which uses a
> >     different hash than the repository in the current working directory
> >     will continue to not work (as was the case before this series).
>
> Hmph, re-reading the document for midx:
>
>     --object-dir=<dir>::
>             Use given directory for the location of Git objects. We check
>             `<dir>/packs/multi-pack-index` for the current MIDX file, and
>             `<dir>/packs` for the pack-files to index.
>
> why does it matter if we are in a repository in the first place?
> It's not like we combine the objects from the specified object dir
> and our local object store (if that were the case, these two object
> stores must be compatible).

It shouldn't matter, but the use-case is described in [1] by Stolee.  He
explains it in detail, but I do think we have to live with
`--object-dir` in one way or another. He does say it'd be OK to only
be able to invoke it from within a repository, and to only be able to
reference alternates, though.

Thanks,
Taylor

[1]: https://lore.kernel.org/git/d23bca9b-9da2-984f-065c-6cf60a80ddef@gmail.com/

^ permalink raw reply	[flat|nested] 273+ messages in thread

* [PATCH v5 00/27] multi-pack reachability bitmaps
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (24 preceding siblings ...)
  2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
@ 2021-08-31 20:51 ` Taylor Blau
  2021-08-31 20:51   ` [PATCH v5 01/27] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps Taylor Blau
                     ` (28 more replies)
  25 siblings, 29 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:51 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Here is another version of the multi-pack reachability bitmaps series. It is
virtually unchanged since last time.

The one change that did occur is that I integrated Johannes' patch from [1] to fix
cleaning up MIDX .rev and .bitmap files when using `--object-dir`. That inspired
a lengthy discussion [2] about `--object-dir`, alternates, object-format and
running the MIDX builtin outside of a Git repository.

This series resolves that discussion by leaving everything as-is, and only
changing the following:

  - `git multi-pack-index` will not run when outside of a Git
    repository.

  - The `--object-dir` argument will only recognize object directories
    belonging to an alternate of the current repository.

  - Using `--object-dir` to point to a repository which uses a
    different hash than the repository in the current working directory
    will continue to not work (as was the case before this series).

And because this incorporates [1], we will also not accidentally clean `.rev`
files from the wrong object directory.

I think that this version is ready-to-go, and that we can turn our attention to
squashing some of these cross-alternate buglets, and integrating MIDX bitmaps
with `git repack`.

[1]: https://lore.kernel.org/git/20210823171011.80588-1-johannes@sipsolutions.net/
[2]: https://lore.kernel.org/git/YSVsHo2wLhnraBnv@nand.local/

Jeff King (2):
  t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
  t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP

Taylor Blau (25):
  pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps
  pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
  pack-bitmap-write.c: free existing bitmaps
  Documentation: describe MIDX-based bitmaps
  midx: disallow running outside of a repository
  midx: fix `*.rev` cleanups with `--object-dir`
  midx: clear auxiliary .rev after replacing the MIDX
  midx: reject empty `--preferred-pack`'s
  midx: infer preferred pack when not given one
  midx: close linked MIDXs, avoid leaking memory
  midx: avoid opening multiple MIDXs when writing
  pack-bitmap.c: introduce 'bitmap_num_objects()'
  pack-bitmap.c: introduce 'nth_bitmap_object_oid()'
  pack-bitmap.c: introduce 'bitmap_is_preferred_refname()'
  pack-bitmap.c: avoid redundant calls to try_partial_reuse
  pack-bitmap: read multi-pack bitmaps
  pack-bitmap: write multi-pack bitmaps
  t5310: move some tests to lib-bitmap.sh
  t/helper/test-read-midx.c: add --checksum mode
  t5326: test multi-pack bitmap behavior
  t5319: don't write MIDX bitmaps in t5319
  t7700: update to work with MIDX bitmap test knob
  midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  p5310: extract full and partial bitmap tests
  p5326: perf tests for MIDX bitmaps

 Documentation/git-multi-pack-index.txt       |  20 +-
 Documentation/technical/bitmap-format.txt    |  71 ++-
 Documentation/technical/multi-pack-index.txt |  10 +-
 builtin/multi-pack-index.c                   |   2 +
 builtin/pack-objects.c                       |   8 +-
 builtin/repack.c                             |  12 +-
 ci/run-build-and-tests.sh                    |   1 +
 git.c                                        |   2 +-
 midx.c                                       | 328 ++++++++++--
 midx.h                                       |   5 +
 pack-bitmap-write.c                          |  79 ++-
 pack-bitmap.c                                | 499 ++++++++++++++++---
 pack-bitmap.h                                |   9 +-
 packfile.c                                   |   2 +-
 t/README                                     |   4 +
 t/helper/test-read-midx.c                    |  16 +-
 t/lib-bitmap.sh                              | 240 +++++++++
 t/perf/lib-bitmap.sh                         |  69 +++
 t/perf/p5310-pack-bitmaps.sh                 |  65 +--
 t/perf/p5326-multi-pack-bitmaps.sh           |  43 ++
 t/t0410-partial-clone.sh                     |  12 +-
 t/t5310-pack-bitmaps.sh                      | 231 +--------
 t/t5319-multi-pack-index.sh                  |  53 +-
 t/t5326-multi-pack-bitmaps.sh                | 286 +++++++++++
 t/t7700-repack.sh                            |  18 +-
 25 files changed, 1644 insertions(+), 441 deletions(-)
 create mode 100644 t/perf/lib-bitmap.sh
 create mode 100755 t/perf/p5326-multi-pack-bitmaps.sh
 create mode 100755 t/t5326-multi-pack-bitmaps.sh

Range-diff against v4:
 1:  92dc0bbc0d =  1:  7815d9929d pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps
 2:  979276bc74 =  2:  629171115a pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
 3:  8f00493955 =  3:  d469c1d8f6 pack-bitmap-write.c: free existing bitmaps
 4:  bc7db926d8 =  4:  158ff797c4 Documentation: describe MIDX-based bitmaps
 -:  ---------- >  5:  5f24be8985 midx: disallow running outside of a repository
 -:  ---------- >  6:  0aacaa9283 midx: fix `*.rev` cleanups with `--object-dir`
 5:  771741844b !  7:  d30e6fe9a5 midx: clear auxiliary .rev after replacing the MIDX
    @@ midx.c: static int write_midx_internal(const char *object_dir, struct multi_pack
      
      	if (flags & MIDX_WRITE_REV_INDEX)
      		write_midx_reverse_index(midx_name, midx_hash, &ctx);
    --	clear_midx_files_ext(the_repository, ".rev", midx_hash);
    +-	clear_midx_files_ext(object_dir, ".rev", midx_hash);
      
      	commit_lock_file(&lk);
      
    -+	clear_midx_files_ext(the_repository, ".rev", midx_hash);
    ++	clear_midx_files_ext(object_dir, ".rev", midx_hash);
     +
      cleanup:
      	for (i = 0; i < ctx.nr; i++) {
 6:  dab5dbf228 =  8:  db2a24a8ae midx: reject empty `--preferred-pack`'s
 7:  31f4517de0 =  9:  059c583e34 midx: infer preferred pack when not given one
 8:  aa3bd96d9b = 10:  6f5ca446f3 midx: close linked MIDXs, avoid leaking memory
 9:  c9fea31fa8 ! 11:  4656608f73 midx: avoid opening multiple MIDXs when writing
    @@ Commit message
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    + ## Documentation/git-multi-pack-index.txt ##
    +@@ Documentation/git-multi-pack-index.txt: OPTIONS
    + 	Use given directory for the location of Git objects. We check
    + 	`<dir>/packs/multi-pack-index` for the current MIDX file, and
    + 	`<dir>/packs` for the pack-files to index.
    +++
    ++`<dir>` must be an alternate of the current repository.
    + 
    + --[no-]progress::
    + 	Turn progress on/off explicitly. If neither is specified, progress is
    +
      ## midx.c ##
     @@ midx.c: static int midx_checksum_valid(struct multi_pack_index *m)
      	return hashfile_checksum_valid(m->data, m->data_len);
10:  ee72fb7e38 = 12:  4c793df9d1 pack-bitmap.c: introduce 'bitmap_num_objects()'
11:  ede0bf1ce1 = 13:  9f165037ce pack-bitmap.c: introduce 'nth_bitmap_object_oid()'
12:  df6844def0 = 14:  ba5fd71fb3 pack-bitmap.c: introduce 'bitmap_is_preferred_refname()'
13:  4e06f051a7 = 15:  06db8dbbc1 pack-bitmap.c: avoid redundant calls to try_partial_reuse
14:  a0d73eb3d3 = 16:  61798853b6 pack-bitmap: read multi-pack bitmaps
15:  9d83ad77ab ! 17:  4968229663 pack-bitmap: write multi-pack bitmaps
    @@ midx.c: static int write_midx_internal(const char *object_dir,
     +			 * corresponding bitmap (or one wasn't requested).
     +			 */
     +			if (!want_bitmap)
    -+				clear_midx_files_ext(the_repository, ".bitmap",
    ++				clear_midx_files_ext(object_dir, ".bitmap",
     +						     NULL);
     +			goto cleanup;
     +		}
    @@ midx.c: static int write_midx_internal(const char *object_dir,
     +		}
     +	}
     +
    -+	close_object_store(the_repository->objects);
    ++	if (ctx.m)
    ++		close_object_store(the_repository->objects);
      
      	commit_lock_file(&lk);
      
    -+	clear_midx_files_ext(the_repository, ".bitmap", midx_hash);
    - 	clear_midx_files_ext(the_repository, ".rev", midx_hash);
    ++	clear_midx_files_ext(object_dir, ".bitmap", midx_hash);
    + 	clear_midx_files_ext(object_dir, ".rev", midx_hash);
      
      cleanup:
     @@ midx.c: static int write_midx_internal(const char *object_dir,
    @@ midx.c: void clear_midx_file(struct repository *r)
      	if (remove_path(midx))
      		die(_("failed to clear multi-pack-index at %s"), midx);
      
    -+	clear_midx_files_ext(r, ".bitmap", NULL);
    - 	clear_midx_files_ext(r, ".rev", NULL);
    ++	clear_midx_files_ext(r->objects->odb->path, ".bitmap", NULL);
    + 	clear_midx_files_ext(r->objects->odb->path, ".rev", NULL);
      
      	free(midx);
     
16:  a92af89884 = 18:  5d60b07e2e t5310: move some tests to lib-bitmap.sh
17:  d47aa4a919 = 19:  1a9c3538db t/helper/test-read-midx.c: add --checksum mode
18:  9d9d9f28a6 = 20:  8895114ace t5326: test multi-pack bitmap behavior
19:  3e0da7e5ed = 21:  94b1317e0c t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
20:  4e0d49a2dd = 22:  a4f4d90bba t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
21:  47eba8ecf9 = 23:  92a6370e77 t5319: don't write MIDX bitmaps in t5319
22:  3d78afa2ad = 24:  c49dc46fb2 t7700: update to work with MIDX bitmap test knob
23:  c2f94e033d = 25:  44a4800756 midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
24:  6b03016c99 = 26:  bf0981b606 p5310: extract full and partial bitmap tests
25:  d98faa4c2c = 27:  6888fe01aa p5326: perf tests for MIDX bitmaps
-- 
2.33.0.96.g73915697e6

^ permalink raw reply	[flat|nested] 273+ messages in thread

* [PATCH v5 01/27] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
@ 2021-08-31 20:51   ` Taylor Blau
  2021-08-31 20:51   ` [PATCH v5 02/27] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps Taylor Blau
                     ` (27 subsequent siblings)
  28 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:51 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

The special `--test-bitmap` mode of `git rev-list` is used to compare
the result of an object traversal with a bitmap to check its integrity.
This mode does not, however, assert that the types of reachable objects
are stored correctly.

Harden this mode by teaching it to also check that each time an object's
bit is marked, the corresponding bit is set in exactly one of the type
bitmaps (and that this bitmap's type matches the object's true type).

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index d999616c9e..9b11af87aa 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1325,10 +1325,52 @@ void count_bitmap_commit_list(struct bitmap_index *bitmap_git,
 struct bitmap_test_data {
 	struct bitmap_index *bitmap_git;
 	struct bitmap *base;
+	struct bitmap *commits;
+	struct bitmap *trees;
+	struct bitmap *blobs;
+	struct bitmap *tags;
 	struct progress *prg;
 	size_t seen;
 };
 
+static void test_bitmap_type(struct bitmap_test_data *tdata,
+			     struct object *obj, int pos)
+{
+	enum object_type bitmap_type = OBJ_NONE;
+	int bitmaps_nr = 0;
+
+	if (bitmap_get(tdata->commits, pos)) {
+		bitmap_type = OBJ_COMMIT;
+		bitmaps_nr++;
+	}
+	if (bitmap_get(tdata->trees, pos)) {
+		bitmap_type = OBJ_TREE;
+		bitmaps_nr++;
+	}
+	if (bitmap_get(tdata->blobs, pos)) {
+		bitmap_type = OBJ_BLOB;
+		bitmaps_nr++;
+	}
+	if (bitmap_get(tdata->tags, pos)) {
+		bitmap_type = OBJ_TAG;
+		bitmaps_nr++;
+	}
+
+	if (bitmap_type == OBJ_NONE)
+		die("object %s not found in type bitmaps",
+		    oid_to_hex(&obj->oid));
+
+	if (bitmaps_nr > 1)
+		die("object %s does not have a unique type",
+		    oid_to_hex(&obj->oid));
+
+	if (bitmap_type != obj->type)
+		die("object %s: real type %s, expected: %s",
+		    oid_to_hex(&obj->oid),
+		    type_name(obj->type),
+		    type_name(bitmap_type));
+}
+
 static void test_show_object(struct object *object, const char *name,
 			     void *data)
 {
@@ -1338,6 +1380,7 @@ static void test_show_object(struct object *object, const char *name,
 	bitmap_pos = bitmap_position(tdata->bitmap_git, &object->oid);
 	if (bitmap_pos < 0)
 		die("Object not in bitmap: %s\n", oid_to_hex(&object->oid));
+	test_bitmap_type(tdata, object, bitmap_pos);
 
 	bitmap_set(tdata->base, bitmap_pos);
 	display_progress(tdata->prg, ++tdata->seen);
@@ -1352,6 +1395,7 @@ static void test_show_commit(struct commit *commit, void *data)
 				     &commit->object.oid);
 	if (bitmap_pos < 0)
 		die("Object not in bitmap: %s\n", oid_to_hex(&commit->object.oid));
+	test_bitmap_type(tdata, &commit->object, bitmap_pos);
 
 	bitmap_set(tdata->base, bitmap_pos);
 	display_progress(tdata->prg, ++tdata->seen);
@@ -1399,6 +1443,10 @@ void test_bitmap_walk(struct rev_info *revs)
 
 	tdata.bitmap_git = bitmap_git;
 	tdata.base = bitmap_new();
+	tdata.commits = ewah_to_bitmap(bitmap_git->commits);
+	tdata.trees = ewah_to_bitmap(bitmap_git->trees);
+	tdata.blobs = ewah_to_bitmap(bitmap_git->blobs);
+	tdata.tags = ewah_to_bitmap(bitmap_git->tags);
 	tdata.prg = start_progress("Verifying bitmap entries", result_popcnt);
 	tdata.seen = 0;
 
-- 
2.33.0.96.g73915697e6
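
The "exactly one type bitmap" rule that this patch enforces can be
sketched outside of Git. The following is only a toy model, not the
code from the patch: the toy_* names are hypothetical, each bitmap is a
single 64-bit word (so at most 64 object positions), and the function
returns an error code where the patch calls die().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for Git's object types and type bitmaps. */
enum toy_type { TOY_NONE = 0, TOY_COMMIT, TOY_TREE, TOY_BLOB, TOY_TAG };

struct toy_type_bitmaps {
	uint64_t commits, trees, blobs, tags; /* one bit per object position */
};

static int toy_bit(uint64_t map, unsigned pos)
{
	return (int)((map >> pos) & 1);
}

/*
 * Returns 0 when exactly one type bitmap has bit `pos` set and that
 * bitmap matches the object's real type. Otherwise returns:
 *   -1  no type bitmap has the bit set
 *   -2  more than one type bitmap has the bit set
 *   -3  the single matching bitmap disagrees with the real type
 */
int toy_check_type(const struct toy_type_bitmaps *t, unsigned pos,
		   enum toy_type real)
{
	enum toy_type found = TOY_NONE;
	int nr = 0;

	assert(pos < 64); /* toy limit: one 64-bit word per bitmap */

	if (toy_bit(t->commits, pos)) { found = TOY_COMMIT; nr++; }
	if (toy_bit(t->trees, pos))   { found = TOY_TREE;   nr++; }
	if (toy_bit(t->blobs, pos))   { found = TOY_BLOB;   nr++; }
	if (toy_bit(t->tags, pos))    { found = TOY_TAG;    nr++; }

	if (found == TOY_NONE)
		return -1;
	if (nr > 1)
		return -2;
	if (found != real)
		return -3;
	return 0;
}
```

A caller would invoke toy_check_type() once per object visited during
the traversal, in the same place where the patch calls
test_bitmap_type().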


^ permalink raw reply related	[flat|nested] 273+ messages in thread

* [PATCH v5 02/27] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
  2021-08-31 20:51   ` [PATCH v5 01/27] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps Taylor Blau
@ 2021-08-31 20:51   ` Taylor Blau
  2021-08-31 20:51   ` [PATCH v5 03/27] pack-bitmap-write.c: free existing bitmaps Taylor Blau
                     ` (26 subsequent siblings)
  28 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:51 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

The set of objects covered by a bitmap must be closed under
reachability, since it must be the case that there is a valid bit
position assigned for every possible reachable object (otherwise the
bitmaps would be incomplete).

Pack bitmaps are never written from 'git repack' unless repacking
all-into-one, and so we never write non-closed bitmaps (except in the
case of partial clones where we aren't guaranteed to have all objects).

But multi-pack bitmaps change this, since it isn't known whether the
set of objects in the MIDX is closed under reachability until walking
them. Plumb through a bit that is set when a reachable object isn't
found.

As soon as a reachable object isn't found in the set of objects to
include in the bitmap, bitmap_writer_build() knows that the set is not
closed, and so it now fails gracefully.

A test is added in t0410 to trigger a bitmap write without full
reachability closure by removing local copies of some reachable objects
from a promisor remote.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c   |  3 +-
 pack-bitmap-write.c      | 76 ++++++++++++++++++++++++++++------------
 pack-bitmap.h            |  2 +-
 t/t0410-partial-clone.sh |  9 ++++-
 4 files changed, 64 insertions(+), 26 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index df49f656b9..b63e06e46c 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1256,7 +1256,8 @@ static void write_pack_file(void)
 
 				bitmap_writer_show_progress(progress);
 				bitmap_writer_select_commits(indexed_commits, indexed_commits_nr, -1);
-				bitmap_writer_build(&to_pack);
+				if (bitmap_writer_build(&to_pack) < 0)
+					die(_("failed to write bitmap index"));
 				bitmap_writer_finish(written_list, nr_written,
 						     tmpname.buf, write_bitmap_options);
 				write_bitmap_index = 0;
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 88d9e696a5..d374f7884b 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -125,15 +125,20 @@ static inline void push_bitmapped_commit(struct commit *commit)
 	writer.selected_nr++;
 }
 
-static uint32_t find_object_pos(const struct object_id *oid)
+static uint32_t find_object_pos(const struct object_id *oid, int *found)
 {
 	struct object_entry *entry = packlist_find(writer.to_pack, oid);
 
 	if (!entry) {
-		die("Failed to write bitmap index. Packfile doesn't have full closure "
+		if (found)
+			*found = 0;
+		warning("Failed to write bitmap index. Packfile doesn't have full closure "
 			"(object %s is missing)", oid_to_hex(oid));
+		return 0;
 	}
 
+	if (found)
+		*found = 1;
 	return oe_in_pack_pos(writer.to_pack, entry);
 }
 
@@ -331,9 +336,10 @@ static void bitmap_builder_clear(struct bitmap_builder *bb)
 	bb->commits_nr = bb->commits_alloc = 0;
 }
 
-static void fill_bitmap_tree(struct bitmap *bitmap,
-			     struct tree *tree)
+static int fill_bitmap_tree(struct bitmap *bitmap,
+			    struct tree *tree)
 {
+	int found;
 	uint32_t pos;
 	struct tree_desc desc;
 	struct name_entry entry;
@@ -342,9 +348,11 @@ static void fill_bitmap_tree(struct bitmap *bitmap,
 	 * If our bit is already set, then there is nothing to do. Both this
 	 * tree and all of its children will be set.
 	 */
-	pos = find_object_pos(&tree->object.oid);
+	pos = find_object_pos(&tree->object.oid, &found);
+	if (!found)
+		return -1;
 	if (bitmap_get(bitmap, pos))
-		return;
+		return 0;
 	bitmap_set(bitmap, pos);
 
 	if (parse_tree(tree) < 0)
@@ -355,11 +363,15 @@ static void fill_bitmap_tree(struct bitmap *bitmap,
 	while (tree_entry(&desc, &entry)) {
 		switch (object_type(entry.mode)) {
 		case OBJ_TREE:
-			fill_bitmap_tree(bitmap,
-					 lookup_tree(the_repository, &entry.oid));
+			if (fill_bitmap_tree(bitmap,
+					     lookup_tree(the_repository, &entry.oid)) < 0)
+				return -1;
 			break;
 		case OBJ_BLOB:
-			bitmap_set(bitmap, find_object_pos(&entry.oid));
+			pos = find_object_pos(&entry.oid, &found);
+			if (!found)
+				return -1;
+			bitmap_set(bitmap, pos);
 			break;
 		default:
 			/* Gitlink, etc; not reachable */
@@ -368,15 +380,18 @@ static void fill_bitmap_tree(struct bitmap *bitmap,
 	}
 
 	free_tree_buffer(tree);
+	return 0;
 }
 
-static void fill_bitmap_commit(struct bb_commit *ent,
-			       struct commit *commit,
-			       struct prio_queue *queue,
-			       struct prio_queue *tree_queue,
-			       struct bitmap_index *old_bitmap,
-			       const uint32_t *mapping)
+static int fill_bitmap_commit(struct bb_commit *ent,
+			      struct commit *commit,
+			      struct prio_queue *queue,
+			      struct prio_queue *tree_queue,
+			      struct bitmap_index *old_bitmap,
+			      const uint32_t *mapping)
 {
+	int found;
+	uint32_t pos;
 	if (!ent->bitmap)
 		ent->bitmap = bitmap_new();
 
@@ -401,11 +416,16 @@ static void fill_bitmap_commit(struct bb_commit *ent,
 		 * Mark ourselves and queue our tree. The commit
 		 * walk ensures we cover all parents.
 		 */
-		bitmap_set(ent->bitmap, find_object_pos(&c->object.oid));
+		pos = find_object_pos(&c->object.oid, &found);
+		if (!found)
+			return -1;
+		bitmap_set(ent->bitmap, pos);
 		prio_queue_put(tree_queue, get_commit_tree(c));
 
 		for (p = c->parents; p; p = p->next) {
-			int pos = find_object_pos(&p->item->object.oid);
+			pos = find_object_pos(&p->item->object.oid, &found);
+			if (!found)
+				return -1;
 			if (!bitmap_get(ent->bitmap, pos)) {
 				bitmap_set(ent->bitmap, pos);
 				prio_queue_put(queue, p->item);
@@ -413,8 +433,12 @@ static void fill_bitmap_commit(struct bb_commit *ent,
 		}
 	}
 
-	while (tree_queue->nr)
-		fill_bitmap_tree(ent->bitmap, prio_queue_get(tree_queue));
+	while (tree_queue->nr) {
+		if (fill_bitmap_tree(ent->bitmap,
+				     prio_queue_get(tree_queue)) < 0)
+			return -1;
+	}
+	return 0;
 }
 
 static void store_selected(struct bb_commit *ent, struct commit *commit)
@@ -432,7 +456,7 @@ static void store_selected(struct bb_commit *ent, struct commit *commit)
 	kh_value(writer.bitmaps, hash_pos) = stored;
 }
 
-void bitmap_writer_build(struct packing_data *to_pack)
+int bitmap_writer_build(struct packing_data *to_pack)
 {
 	struct bitmap_builder bb;
 	size_t i;
@@ -441,6 +465,7 @@ void bitmap_writer_build(struct packing_data *to_pack)
 	struct prio_queue tree_queue = { NULL };
 	struct bitmap_index *old_bitmap;
 	uint32_t *mapping;
+	int closed = 1; /* until proven otherwise */
 
 	writer.bitmaps = kh_init_oid_map();
 	writer.to_pack = to_pack;
@@ -463,8 +488,11 @@ void bitmap_writer_build(struct packing_data *to_pack)
 		struct commit *child;
 		int reused = 0;
 
-		fill_bitmap_commit(ent, commit, &queue, &tree_queue,
-				   old_bitmap, mapping);
+		if (fill_bitmap_commit(ent, commit, &queue, &tree_queue,
+				       old_bitmap, mapping) < 0) {
+			closed = 0;
+			break;
+		}
 
 		if (ent->selected) {
 			store_selected(ent, commit);
@@ -499,7 +527,9 @@ void bitmap_writer_build(struct packing_data *to_pack)
 
 	stop_progress(&writer.progress);
 
-	compute_xor_offsets();
+	if (closed)
+		compute_xor_offsets();
+	return closed ? 0 : -1;
 }
 
 /**
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 99d733eb26..020cd8d868 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -87,7 +87,7 @@ struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
 				      struct commit *commit);
 void bitmap_writer_select_commits(struct commit **indexed_commits,
 		unsigned int indexed_commits_nr, int max_bitmaps);
-void bitmap_writer_build(struct packing_data *to_pack);
+int bitmap_writer_build(struct packing_data *to_pack);
 void bitmap_writer_finish(struct pack_idx_entry **index,
 			  uint32_t index_nr,
 			  const char *filename,
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index a211a66c67..bbcc51ee8e 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -536,7 +536,13 @@ test_expect_success 'gc does not repack promisor objects if there are none' '
 repack_and_check () {
 	rm -rf repo2 &&
 	cp -r repo repo2 &&
-	git -C repo2 repack $1 -d &&
+	if test x"$1" = "x--must-fail"
+	then
+		shift
+		test_must_fail git -C repo2 repack $1 -d
+	else
+		git -C repo2 repack $1 -d
+	fi &&
 	git -C repo2 fsck &&
 
 	git -C repo2 cat-file -e $2 &&
@@ -561,6 +567,7 @@ test_expect_success 'repack -d does not irreversibly delete promisor objects' '
 	printf "$THREE\n" | pack_as_from_promisor &&
 	delete_object repo "$ONE" &&
 
+	repack_and_check --must-fail -ab "$TWO" "$THREE" &&
 	repack_and_check -a "$TWO" "$THREE" &&
 	repack_and_check -A "$TWO" "$THREE" &&
 	repack_and_check -l "$TWO" "$THREE"
-- 
2.33.0.96.g73915697e6
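
The error-propagation pattern used by this patch (a lookup that reports
"not found" through an out-parameter, with callers returning -1 instead
of dying) can be illustrated in isolation. This is a minimal sketch
under stated assumptions, not Git's implementation: the toy_* names are
hypothetical, objects are plain integers, and the bitmap is a single
64-bit word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical "pack": an array of object ids; the index is the bit position. */
struct toy_pack {
	const uint32_t *ids;
	size_t nr;
};

/*
 * Look up `id`. On a miss, warn, set *found to 0 and return 0, leaving
 * the decision about how to fail to the caller.
 */
static uint32_t toy_find_pos(const struct toy_pack *p, uint32_t id, int *found)
{
	size_t i;

	for (i = 0; i < p->nr; i++) {
		if (p->ids[i] == id) {
			if (found)
				*found = 1;
			return (uint32_t)i;
		}
	}
	if (found)
		*found = 0;
	fprintf(stderr, "warning: object %u is missing\n", (unsigned)id);
	return 0;
}

/*
 * Set one bit per reachable object. Returns 0 on success, or -1 as soon
 * as a reachable object is not in the pack (the set is not closed).
 */
int toy_fill_bitmap(const struct toy_pack *p, const uint32_t *reachable,
		    size_t nr, uint64_t *bitmap)
{
	size_t i;

	assert(p->nr <= 64); /* toy limit: the bitmap is one 64-bit word */

	for (i = 0; i < nr; i++) {
		int found;
		uint32_t pos = toy_find_pos(p, reachable[i], &found);

		if (!found)
			return -1;
		*bitmap |= (uint64_t)1 << pos;
	}
	return 0;
}
```

As in the patch, the bits set before the first missing object are left
in place on failure, so the caller is expected to discard the partial
result.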


^ permalink raw reply related	[flat|nested] 273+ messages in thread

* [PATCH v5 03/27] pack-bitmap-write.c: free existing bitmaps
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
  2021-08-31 20:51   ` [PATCH v5 01/27] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps Taylor Blau
  2021-08-31 20:51   ` [PATCH v5 02/27] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps Taylor Blau
@ 2021-08-31 20:51   ` Taylor Blau
  2021-08-31 20:51   ` [PATCH v5 04/27] Documentation: describe MIDX-based bitmaps Taylor Blau
                     ` (25 subsequent siblings)
  28 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:51 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

When writing a new bitmap, the bitmap writer code attempts to read the
existing bitmap (if one is present). This is done in order to quickly
permute the bits of any bitmaps for commits which appear in the existing
bitmap, and were also selected for the new bitmap.

But since this code was added in 341fa34887 (pack-bitmap-write: use
existing bitmaps, 2020-12-08), the resources associated with opening an
existing bitmap were never released.

It's fine to ignore this, but it's bad hygiene. It will also cause a
problem for the multi-pack-index builtin, which will be responsible not
only for writing bitmaps, but also for expiring any old multi-pack
bitmaps.

If an existing bitmap was reused here, it will also be expired. That
will cause a problem on platforms which require file resources to be
closed before unlinking them, like Windows. Avoid this by ensuring we
close reused bitmaps with free_bitmap_index() before removing them.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index d374f7884b..142fd0adb8 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -520,6 +520,7 @@ int bitmap_writer_build(struct packing_data *to_pack)
 	clear_prio_queue(&queue);
 	clear_prio_queue(&tree_queue);
 	bitmap_builder_clear(&bb);
+	free_bitmap_index(old_bitmap);
 	free(mapping);
 
 	trace2_region_leave("pack-bitmap-write", "building_bitmaps_total",
-- 
2.33.0.96.g73915697e6
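
The ordering constraint described above (release the handle first, then
unlink) can be shown with plain stdio. This is only an illustrative
sketch: toy_close_then_unlink() is a hypothetical helper, and Git itself
releases the bitmap with free_bitmap_index() rather than fclose().

```c
#include <assert.h>
#include <stdio.h>

/*
 * Close an open handle before removing the file it refers to. On
 * platforms such as Windows, removing a file that is still open
 * typically fails, so the close must come first.
 * Returns 0 on success, -1 if closing failed, or the result of remove().
 */
int toy_close_then_unlink(FILE *fp, const char *path)
{
	assert(path != NULL);

	if (fp && fclose(fp) != 0)
		return -1;
	return remove(path);
}
```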


^ permalink raw reply related	[flat|nested] 273+ messages in thread

* [PATCH v5 04/27] Documentation: describe MIDX-based bitmaps
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
                     ` (2 preceding siblings ...)
  2021-08-31 20:51   ` [PATCH v5 03/27] pack-bitmap-write.c: free existing bitmaps Taylor Blau
@ 2021-08-31 20:51   ` Taylor Blau
  2021-08-31 20:51   ` [PATCH v5 05/27] midx: disallow running outside of a repository Taylor Blau
                     ` (24 subsequent siblings)
  28 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:51 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Update the technical documentation to describe the multi-pack bitmap
format. This patch merely introduces the new format, and describes its
high-level ideas. Git does not yet know how to read or write these
multi-pack variants, and so the subsequent patches will:

  - Introduce code to interpret multi-pack bitmaps, according to this
    document.

  - Then, introduce code to write multi-pack bitmaps from the 'git
    multi-pack-index write' sub-command.

Finally, the implementation will gain tests in subsequent patches (as
opposed to inline with the patch teaching Git how to write multi-pack
bitmaps) to avoid a cyclic dependency.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/technical/bitmap-format.txt    | 71 ++++++++++++++++----
 Documentation/technical/multi-pack-index.txt | 10 +--
 2 files changed, 60 insertions(+), 21 deletions(-)

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index f8c18a0f7a..04b3ec2178 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -1,6 +1,44 @@
 GIT bitmap v1 format
 ====================
 
+== Pack and multi-pack bitmaps
+
+Bitmaps store reachability information about the set of objects in a packfile,
+or a multi-pack index (MIDX). The former is defined obviously, and the latter is
+defined as the union of objects in packs contained in the MIDX.
+
+A bitmap may belong to either one pack, or the repository's multi-pack index (if
+it exists). A repository may have at most one bitmap.
+
+An object is uniquely described by its bit position within a bitmap:
+
+	- If the bitmap belongs to a packfile, the __n__th bit corresponds to
+	the __n__th object in pack order. For a function `offset` which maps
+	objects to their byte offset within a pack, pack order is defined as
+	follows:
+
+		o1 <= o2 <==> offset(o1) <= offset(o2)
+
+	- If the bitmap belongs to a MIDX, the __n__th bit corresponds to the
+	__n__th object in MIDX order. With an additional function `pack` which
+	maps objects to the pack they were selected from by the MIDX, MIDX order
+	is defined as follows:
+
+		o1 <= o2 <==> pack(o1) <= pack(o2) /\ offset(o1) <= offset(o2)
+
+	The ordering between packs is done according to the MIDX's .rev file.
+	Notably, the preferred pack sorts ahead of all other packs.
+
+The on-disk representation (described below) of a bitmap is the same regardless
+of whether or not that bitmap belongs to a packfile or a MIDX. The only
+difference is the interpretation of the bits, which is described above.
+
+Certain bitmap extensions are supported (see: Appendix B). No extensions are
+required for bitmaps corresponding to packfiles. For bitmaps that correspond to
+MIDXs, both the bit-cache and rev-cache extensions are required.
+
+== On-disk format
+
 	- A header appears at the beginning:
 
 		4-byte signature: {'B', 'I', 'T', 'M'}
@@ -14,17 +52,19 @@ GIT bitmap v1 format
 			The following flags are supported:
 
 			- BITMAP_OPT_FULL_DAG (0x1) REQUIRED
-			This flag must always be present. It implies that the bitmap
-			index has been generated for a packfile with full closure
-			(i.e. where every single object in the packfile can find
-			 its parent links inside the same packfile). This is a
-			requirement for the bitmap index format, also present in JGit,
-			that greatly reduces the complexity of the implementation.
+			This flag must always be present. It implies that the
+			bitmap index has been generated for a packfile or
+			multi-pack index (MIDX) with full closure (i.e. where
+			every single object in the packfile/MIDX can find its
+			parent links inside the same packfile/MIDX). This is a
+			requirement for the bitmap index format, also present in
+			JGit, that greatly reduces the complexity of the
+			implementation.
 
 			- BITMAP_OPT_HASH_CACHE (0x4)
 			If present, the end of the bitmap file contains
 			`N` 32-bit name-hash values, one per object in the
-			pack. The format and meaning of the name-hash is
+			pack/MIDX. The format and meaning of the name-hash is
 			described below.
 
 		4-byte entry count (network byte order)
@@ -33,7 +73,8 @@ GIT bitmap v1 format
 
 		20-byte checksum
 
-			The SHA1 checksum of the pack this bitmap index belongs to.
+			The SHA1 checksum of the pack/MIDX this bitmap index
+			belongs to.
 
 	- 4 EWAH bitmaps that act as type indexes
 
@@ -50,7 +91,7 @@ GIT bitmap v1 format
 			- Tags
 
 		In each bitmap, the `n`th bit is set to true if the `n`th object
-		in the packfile is of that type.
+		in the packfile or multi-pack index is of that type.
 
 		The obvious consequence is that the OR of all 4 bitmaps will result
 		in a full set (all bits set), and the AND of all 4 bitmaps will
@@ -62,8 +103,9 @@ GIT bitmap v1 format
 		Each entry contains the following:
 
 		- 4-byte object position (network byte order)
-			The position **in the index for the packfile** where the
-			bitmap for this commit is found.
+			The position **in the index for the packfile or
+			multi-pack index** where the bitmap for this commit is
+			found.
 
 		- 1-byte XOR-offset
 			The xor offset used to compress this bitmap. For an entry
@@ -146,10 +188,11 @@ Name-hash cache
 ---------------
 
 If the BITMAP_OPT_HASH_CACHE flag is set, the end of the bitmap contains
-a cache of 32-bit values, one per object in the pack. The value at
+a cache of 32-bit values, one per object in the pack/MIDX. The value at
 position `i` is the hash of the pathname at which the `i`th object
-(counting in index order) in the pack can be found.  This can be fed
-into the delta heuristics to compare objects with similar pathnames.
+(counting in index or multi-pack index order) in the pack/MIDX can be found.
+This can be fed into the delta heuristics to compare objects with similar
+pathnames.
 
 The hash algorithm used is:
 
diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
index fb688976c4..1a73c3ee20 100644
--- a/Documentation/technical/multi-pack-index.txt
+++ b/Documentation/technical/multi-pack-index.txt
@@ -71,14 +71,10 @@ Future Work
   still reducing the number of binary searches required for object
   lookups.
 
-- The reachability bitmap is currently paired directly with a single
-  packfile, using the pack-order as the object order to hopefully
-  compress the bitmaps well using run-length encoding. This could be
-  extended to pair a reachability bitmap with a multi-pack-index. If
-  the multi-pack-index is extended to store a "stable object order"
+- If the multi-pack-index is extended to store a "stable object order"
   (a function Order(hash) = integer that is constant for a given hash,
-  even as the multi-pack-index is updated) then a reachability bitmap
-  could point to a multi-pack-index and be updated independently.
+  even as the multi-pack-index is updated) then MIDX bitmaps could be
+  updated independently of the MIDX.
 
 - Packfiles can be marked as "special" using empty files that share
   the initial name but replace ".pack" with ".keep" or ".promisor".
-- 
2.33.0.96.g73915697e6



* [PATCH v5 05/27] midx: disallow running outside of a repository
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
                     ` (3 preceding siblings ...)
  2021-08-31 20:51   ` [PATCH v5 04/27] Documentation: describe MIDX-based bitmaps Taylor Blau
@ 2021-08-31 20:51   ` Taylor Blau
  2021-08-31 20:51   ` [PATCH v5 06/27] midx: fix `*.rev` cleanups with `--object-dir` Taylor Blau
                     ` (23 subsequent siblings)
  28 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:51 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

The multi-pack-index command supports working with arbitrary object
directories via the `--object-dir` flag. Though this has historically
worked in arbitrary repositories (including when the command itself was
run outside of a Git repository), this has been somewhat of an accident.

For example, running:

    git multi-pack-index write --object-dir=/path/to/repo/objects

outside of a Git repository causes a BUG(). This is because the
top-level `cmd_multi_pack_index()` function stops parsing when it sees
"write", and then fills in the default object directory (the result of
calling `get_object_directory()`) before handing off to
`cmd_multi_pack_index_write()`. But there is no repository to
initialize, and so calling `get_object_directory()` results in a BUG()
(indicating that the current repository is not initialized).

Another case where this doesn't quite work as expected is when operating
in a SHA-256 repository. To see the failure, try this in your shell:

    git init --object-format=sha256 repo
    git -C repo commit --allow-empty -m base
    git -C repo repack -d

    git multi-pack-index --object-dir=$(pwd)/repo/.git/objects write

and observe that we cannot open the `.idx` file in "repo", because the
outermost process assumes that any repository that it works in also uses
the default value of `the_hash_algo` (at the time of writing, SHA-1).

There may be compelling reasons for trying to work around these bugs,
but working in arbitrary `--object-dir`'s is non-standard enough (and
likewise, these bugs prevalent enough) that I don't think any workflows
would be broken by abandoning this behavior.

Accordingly, restrict the `multi-pack-index` builtin to only work when
inside of a Git repository (i.e., its main utility becomes selecting
which alternate to operate in), which avoids both of the bugs above.

(Note that you can still trigger a bug when writing a MIDX in an
alternate which does not use the same object format as the repository
which it is an alternate of, but that is an unrelated bug to this one).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 git.c                       | 2 +-
 t/t5319-multi-pack-index.sh | 5 +++++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/git.c b/git.c
index 18bed9a996..60c2784be4 100644
--- a/git.c
+++ b/git.c
@@ -561,7 +561,7 @@ static struct cmd_struct commands[] = {
 	{ "merge-tree", cmd_merge_tree, RUN_SETUP | NO_PARSEOPT },
 	{ "mktag", cmd_mktag, RUN_SETUP | NO_PARSEOPT },
 	{ "mktree", cmd_mktree, RUN_SETUP },
-	{ "multi-pack-index", cmd_multi_pack_index, RUN_SETUP_GENTLY },
+	{ "multi-pack-index", cmd_multi_pack_index, RUN_SETUP },
 	{ "mv", cmd_mv, RUN_SETUP | NEED_WORK_TREE },
 	{ "name-rev", cmd_name_rev, RUN_SETUP },
 	{ "notes", cmd_notes, RUN_SETUP },
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 3d4d9f10c3..9034e94c0a 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -842,4 +842,9 @@ test_expect_success 'usage shown without sub-command' '
 	! test_i18ngrep "unrecognized subcommand" err
 '
 
+test_expect_success 'complains when run outside of a repository' '
+	nongit test_must_fail git multi-pack-index write 2>err &&
+	grep "not a git repository" err
+'
+
 test_done
-- 
2.33.0.96.g73915697e6



* [PATCH v5 06/27] midx: fix `*.rev` cleanups with `--object-dir`
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
                     ` (4 preceding siblings ...)
  2021-08-31 20:51   ` [PATCH v5 05/27] midx: disallow running outside of a repository Taylor Blau
@ 2021-08-31 20:51   ` Taylor Blau
  2021-08-31 20:51   ` [PATCH v5 07/27] midx: clear auxiliary .rev after replacing the MIDX Taylor Blau
                     ` (22 subsequent siblings)
  28 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:51 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

If using --object-dir to point into an object directory which belongs to
a different repository than the one in the current working directory,
such as:

  git init repo
  git -C repo ... # add some objects
  cd alternate
  git multi-pack-index --object-dir ../repo/.git/objects write

the binary will segfault trying to access the object-dir via the repo it
found, but that's not fully initialized. Worse, if we later call
clear_midx_files_ext(), we will use `the_repository` and remove files
out of the wrong object directory.

Fix this by using the given object_dir (or the object directory of
`the_repository` if `--object-dir` wasn't given) to properly clean up
the *.rev files, avoiding the crash.
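As a rough illustration (not code from this patch), the per-file decision made
during cleanup can be sketched as a pure function on file names. The helper
names `has_suffix()` and `is_stale_midx_file()` are hypothetical, and the
"multi-pack-index-<hex><ext>" naming is assumed from the test added below. The
point of the fix is that this decision must be applied to the files of the
directory given by `object_dir`, not to those of `the_repository`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: does `name` end with `suffix`? */
static int has_suffix(const char *name, const char *suffix)
{
	size_t n = strlen(name), s = strlen(suffix);
	return n >= s && !strcmp(name + n - s, suffix);
}

/*
 * Return 1 if `name` (a file inside <object_dir>/pack) looks like
 * "multi-pack-index-<hex><ext>" but is not the file named by `keep_hex`.
 * A NULL `keep_hex` means "keep nothing".
 */
static int is_stale_midx_file(const char *name, const char *ext,
			      const char *keep_hex)
{
	static const char prefix[] = "multi-pack-index-";
	char keep_name[256];

	assert(name && ext);
	if (strncmp(name, prefix, strlen(prefix)) || !has_suffix(name, ext))
		return 0; /* not a MIDX auxiliary file of this kind */
	if (!keep_hex)
		return 1;
	snprintf(keep_name, sizeof(keep_name), "%s%s%s", prefix, keep_hex, ext);
	return strcmp(name, keep_name) != 0;
}
```

A caller would walk the pack directory under the chosen object directory and
remove every file for which this returns 1.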

Original-patch-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c                      | 10 +++++-----
 t/t5319-multi-pack-index.sh | 28 ++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/midx.c b/midx.c
index 321c6fdd2f..902e1a7a7d 100644
--- a/midx.c
+++ b/midx.c
@@ -882,7 +882,7 @@ static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash,
 	strbuf_release(&buf);
 }
 
-static void clear_midx_files_ext(struct repository *r, const char *ext,
+static void clear_midx_files_ext(const char *object_dir, const char *ext,
 				 unsigned char *keep_hash);
 
 static int midx_checksum_valid(struct multi_pack_index *m)
@@ -1086,7 +1086,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 	if (flags & MIDX_WRITE_REV_INDEX)
 		write_midx_reverse_index(midx_name, midx_hash, &ctx);
-	clear_midx_files_ext(the_repository, ".rev", midx_hash);
+	clear_midx_files_ext(object_dir, ".rev", midx_hash);
 
 	commit_lock_file(&lk);
 
@@ -1135,7 +1135,7 @@ static void clear_midx_file_ext(const char *full_path, size_t full_path_len,
 		die_errno(_("failed to remove %s"), full_path);
 }
 
-static void clear_midx_files_ext(struct repository *r, const char *ext,
+static void clear_midx_files_ext(const char *object_dir, const char *ext,
 				 unsigned char *keep_hash)
 {
 	struct clear_midx_data data;
@@ -1146,7 +1146,7 @@ static void clear_midx_files_ext(struct repository *r, const char *ext,
 				    hash_to_hex(keep_hash), ext);
 	data.ext = ext;
 
-	for_each_file_in_pack_dir(r->objects->odb->path,
+	for_each_file_in_pack_dir(object_dir,
 				  clear_midx_file_ext,
 				  &data);
 
@@ -1165,7 +1165,7 @@ void clear_midx_file(struct repository *r)
 	if (remove_path(midx))
 		die(_("failed to clear multi-pack-index at %s"), midx);
 
-	clear_midx_files_ext(r, ".rev", NULL);
+	clear_midx_files_ext(r->objects->odb->path, ".rev", NULL);
 
 	free(midx);
 }
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 9034e94c0a..e953cdd6d1 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -201,6 +201,34 @@ test_expect_success 'write midx with twelve packs' '
 
 compare_results_with_midx "twelve packs"
 
+test_expect_success 'multi-pack-index *.rev cleanup with --object-dir' '
+	git init repo &&
+	git clone -s repo alternate &&
+
+	test_when_finished "rm -rf repo alternate" &&
+
+	(
+		cd repo &&
+		test_commit base &&
+		git repack -d
+	) &&
+
+	ours="alternate/.git/objects/pack/multi-pack-index-123.rev" &&
+	theirs="repo/.git/objects/pack/multi-pack-index-abc.rev" &&
+	touch "$ours" "$theirs" &&
+
+	(
+		cd alternate &&
+		git multi-pack-index --object-dir ../repo/.git/objects write
+	) &&
+
+	# writing a midx in "repo" should not remove the .rev file in the
+	# alternate
+	test_path_is_file repo/.git/objects/pack/multi-pack-index &&
+	test_path_is_file $ours &&
+	test_path_is_missing $theirs
+'
+
 test_expect_success 'warn on improper hash version' '
 	git init --object-format=sha1 sha1 &&
 	(
-- 
2.33.0.96.g73915697e6



* [PATCH v5 07/27] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
                     ` (5 preceding siblings ...)
  2021-08-31 20:51   ` [PATCH v5 06/27] midx: fix `*.rev` cleanups with `--object-dir` Taylor Blau
@ 2021-08-31 20:51   ` Taylor Blau
  2021-08-31 20:52   ` [PATCH v5 08/27] midx: reject empty `--preferred-pack`'s Taylor Blau
                     ` (21 subsequent siblings)
  28 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:51 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

When writing a new multi-pack index, write_midx_internal() attempts to
clean up any auxiliary files (currently just the MIDX's `.rev` file, but
soon to include a `.bitmap`, too) corresponding to the MIDX it's
replacing.

This step should happen after the new MIDX is written into place, since
doing so beforehand means that the old MIDX could be read without its
corresponding .rev file.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 902e1a7a7d..0bcb403bae 100644
--- a/midx.c
+++ b/midx.c
@@ -1086,10 +1086,11 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 	if (flags & MIDX_WRITE_REV_INDEX)
 		write_midx_reverse_index(midx_name, midx_hash, &ctx);
-	clear_midx_files_ext(object_dir, ".rev", midx_hash);
 
 	commit_lock_file(&lk);
 
+	clear_midx_files_ext(object_dir, ".rev", midx_hash);
+
 cleanup:
 	for (i = 0; i < ctx.nr; i++) {
 		if (ctx.info[i].p) {
-- 
2.33.0.96.g73915697e6



* [PATCH v5 08/27] midx: reject empty `--preferred-pack`'s
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
                     ` (6 preceding siblings ...)
  2021-08-31 20:51   ` [PATCH v5 07/27] midx: clear auxiliary .rev after replacing the MIDX Taylor Blau
@ 2021-08-31 20:52   ` Taylor Blau
  2021-08-31 20:52   ` [PATCH v5 09/27] midx: infer preferred pack when not given one Taylor Blau
                     ` (20 subsequent siblings)
  28 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:52 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

The soon-to-be-implemented multi-pack bitmap treats the object in the first
bit position specially by assuming that all objects in the pack it was
selected from are also represented from that pack in the MIDX. In other
words, the pack from which the first object was selected must also have
all of its other objects selected from that same pack in the MIDX in
case of any duplicates.

But this assumption relies on the fact that there is at least one object
in that pack to begin with; otherwise the object in the first bit
position isn't from a preferred pack, in which case we can no longer
assume that all objects in that pack were also selected from the same
pack.

Guard this assumption by checking the number of objects in the given
preferred pack, and failing if the given pack is empty.

To make sure we can safely perform this check, open any packs which are
contained in an existing MIDX via prepare_midx_pack(). The same is done
for new packs via the add_pack_to_midx() callback, but packs picked up
from a previous MIDX will not yet have these opened.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.txt |  6 +++---
 midx.c                                 | 29 ++++++++++++++++++++++++++
 t/t5319-multi-pack-index.sh            | 17 +++++++++++++++
 3 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index ffd601bc17..c9b063d31e 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -37,9 +37,9 @@ write::
 --
 	--preferred-pack=<pack>::
 		Optionally specify the tie-breaking pack used when
-		multiple packs contain the same object. If not given,
-		ties are broken in favor of the pack with the lowest
-		mtime.
+		multiple packs contain the same object. `<pack>` must
+		contain at least one object. If not given, ties are
+		broken in favor of the pack with the lowest mtime.
 --
 
 verify::
diff --git a/midx.c b/midx.c
index 0bcb403bae..26089ec9c7 100644
--- a/midx.c
+++ b/midx.c
@@ -934,6 +934,25 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 			ctx.info[ctx.nr].pack_name = xstrdup(ctx.m->pack_names[i]);
 			ctx.info[ctx.nr].p = NULL;
 			ctx.info[ctx.nr].expired = 0;
+
+			if (flags & MIDX_WRITE_REV_INDEX) {
+				/*
+				 * If generating a reverse index, need to have
+				 * packed_git's loaded to compare their
+				 * mtimes and object count.
+				 */
+				if (prepare_midx_pack(the_repository, ctx.m, i)) {
+					error(_("could not load pack"));
+					result = 1;
+					goto cleanup;
+				}
+
+				if (open_pack_index(ctx.m->packs[i]))
+					die(_("could not open index for %s"),
+					    ctx.m->packs[i]->pack_name);
+				ctx.info[ctx.nr].p = ctx.m->packs[i];
+			}
+
 			ctx.nr++;
 		}
 	}
@@ -961,6 +980,16 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 		}
 	}
 
+	if (ctx.preferred_pack_idx > -1) {
+		struct packed_git *preferred = ctx.info[ctx.preferred_pack_idx].p;
+		if (!preferred->num_objects) {
+			error(_("cannot select preferred pack %s with no objects"),
+			      preferred->pack_name);
+			result = 1;
+			goto cleanup;
+		}
+	}
+
 	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr,
 					 ctx.preferred_pack_idx);
 
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index e953cdd6d1..d7e4988f2b 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -305,6 +305,23 @@ test_expect_success 'midx picks objects from preferred pack' '
 	)
 '
 
+test_expect_success 'preferred packs must be non-empty' '
+	test_when_finished rm -rf preferred.git &&
+	git init preferred.git &&
+	(
+		cd preferred.git &&
+
+		test_commit base &&
+		git repack -ad &&
+
+		empty="$(git pack-objects $objdir/pack/pack </dev/null)" &&
+
+		test_must_fail git multi-pack-index write \
+			--preferred-pack=pack-$empty.pack 2>err &&
+		grep "with no objects" err
+	)
+'
+
 test_expect_success 'verify multi-pack-index success' '
 	git multi-pack-index verify --object-dir=$objdir
 '
-- 
2.33.0.96.g73915697e6



* [PATCH v5 09/27] midx: infer preferred pack when not given one
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
                     ` (7 preceding siblings ...)
  2021-08-31 20:52   ` [PATCH v5 08/27] midx: reject empty `--preferred-pack`'s Taylor Blau
@ 2021-08-31 20:52   ` Taylor Blau
  2021-08-31 20:52   ` [PATCH v5 10/27] midx: close linked MIDXs, avoid leaking memory Taylor Blau
                     ` (19 subsequent siblings)
  28 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:52 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

In 9218c6a40c (midx: allow marking a pack as preferred, 2021-03-30), the
multi-pack index code learned how to select a pack which all duplicate
objects are selected from. That is, if an object appears in multiple
packs, select the copy in the preferred pack before breaking ties
according to the other rules like pack mtime and readdir() order.

Not specifying a preferred pack can cause serious problems with
multi-pack reachability bitmaps, because these bitmaps rely on having at
least one pack from which all duplicates are selected. Not having such a
pack causes problems with the code in pack-objects to reuse packs
verbatim (e.g., that code assumes that a delta object in a chunk of pack
sent verbatim will have its base object sent from the same pack).

So why does not marking a pack preferred cause problems here? The reason
is roughly as follows:

  - Ties are broken (when handling duplicate objects) by sorting
    according to midx_oid_compare(), which sorts objects by OID,
    preferred-ness, pack mtime, and finally pack ID (more on that
    later).

  - The pseudo pack-order (described in
    Documentation/technical/pack-format.txt under the section
    "multi-pack-index reverse indexes") is computed by
    midx_pack_order(), and sorts by pack ID and pack offset, with
    preferred packs sorting first.

  - But! Pack IDs come from incrementing the pack count in
    add_pack_to_midx(), which is a callback to
    for_each_file_in_pack_dir(), meaning that pack IDs are assigned in
    readdir() order.
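
The two orderings in the list above can be sketched with simplified stand-ins.
This is only an illustration: `struct entry`, `dedup_cmp()`,
`pseudo_pack_cmp()` and `pack_of_min()` are hypothetical names, the object ID
is reduced to an integer, and the direction of the mtime comparison (older
pack wins) is an assumption taken from the `--preferred-pack` documentation
text rather than from midx.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one copy of an object in a pack. */
struct entry {
	uint32_t oid;        /* stand-in for the object ID */
	uint32_t pack_id;    /* assigned in readdir() order */
	uint64_t pack_mtime; /* mtime of the containing pack */
	uint64_t offset;     /* byte offset within that pack */
	int preferred;       /* 1 if the containing pack is preferred */
};

static int cmp_u64(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}

/* Duplicate resolution: OID, preferred-ness, pack mtime, then pack ID. */
static int dedup_cmp(const void *va, const void *vb)
{
	const struct entry *a = va, *b = vb;
	int c;

	if ((c = cmp_u64(a->oid, b->oid)))
		return c;
	if (a->preferred != b->preferred)
		return b->preferred - a->preferred; /* preferred first */
	if ((c = cmp_u64(a->pack_mtime, b->pack_mtime)))
		return c; /* assumption: lowest mtime wins */
	return cmp_u64(a->pack_id, b->pack_id);
}

/* Pseudo pack-order: preferred pack first, then pack ID, then offset. */
static int pseudo_pack_cmp(const void *va, const void *vb)
{
	const struct entry *a = va, *b = vb;
	int c;

	if (a->preferred != b->preferred)
		return b->preferred - a->preferred;
	if ((c = cmp_u64(a->pack_id, b->pack_id)))
		return c;
	return cmp_u64(a->offset, b->offset);
}

/* Pack ID of the smallest entry under `cmp`; `n` must be non-zero. */
static uint32_t pack_of_min(const struct entry *e, size_t n,
			    int (*cmp)(const void *, const void *))
{
	size_t i, best = 0;

	assert(n > 0);
	for (i = 1; i < n; i++)
		if (cmp(&e[i], &e[best]) < 0)
			best = i;
	return e[best].pack_id;
}
```

With no preferred pack, a duplicate stored in pack 0 (newer) and pack 1 (older)
is resolved in favor of pack 1 by `dedup_cmp()`, while `pseudo_pack_cmp()`
still puts an object from pack 0 first. The two orderings then disagree about
which pack is "first".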

When specifying a preferred pack, all of that works fine, because
duplicate objects are correctly resolved in favor of the copy in the
preferred pack, and the preferred pack sorts first in the object order.

"Sorting first" is critical, because the bitmap code relies on finding
out which pack holds the first object in the MIDX's pseudo pack-order to
determine which pack is preferred.

But if we didn't specify a preferred pack, and the pack which comes
first in readdir() order does not also have the lowest timestamp, then
it's possible that that pack (the one that sorts first in pseudo-pack
order, which the bitmap code will treat as the preferred one) did *not*
have all duplicate objects resolved in its favor, resulting in breakage.

The fix is simple: pick a (semi-arbitrary, non-empty) preferred pack
when none was specified. This forces that pack to have duplicates
resolved in its favor, and (critically) to sort first in pseudo-pack
order.  Unfortunately, testing this behavior portably isn't possible,
since it depends on readdir() order which isn't guaranteed by POSIX.
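
A minimal sketch of the selection rule, assuming the intent is "oldest
non-empty pack": `struct pack_info` and `infer_preferred_pack()` are
hypothetical names, and this simplified version skips empty packs outright,
which is not exactly how the loop in the diff below is written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields of a pack that matter here. */
struct pack_info {
	uint32_t num_objects;
	uint64_t mtime;
};

/*
 * Return the index of the oldest non-empty pack, or -1 if every pack is
 * empty (in which case there are no duplicates to resolve anyway).
 */
static int infer_preferred_pack(const struct pack_info *packs, size_t nr)
{
	int best = -1;
	size_t i;

	assert(packs || !nr);
	for (i = 0; i < nr; i++) {
		if (!packs[i].num_objects)
			continue; /* an empty pack can never be preferred */
		if (best < 0 || packs[i].mtime < packs[best].mtime)
			best = (int)i;
	}
	return best;
}
```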

(Note that multi-pack reachability bitmaps have yet to be implemented;
so in that sense this patch is fixing a bug which does not yet exist.
But by having this patch beforehand, we can prevent the bug from ever
materializing.)

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 50 ++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 44 insertions(+), 6 deletions(-)

diff --git a/midx.c b/midx.c
index 26089ec9c7..67de1dbaeb 100644
--- a/midx.c
+++ b/midx.c
@@ -969,15 +969,57 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
 		goto cleanup;
 
-	ctx.preferred_pack_idx = -1;
 	if (preferred_pack_name) {
+		int found = 0;
 		for (i = 0; i < ctx.nr; i++) {
 			if (!cmp_idx_or_pack_name(preferred_pack_name,
 						  ctx.info[i].pack_name)) {
 				ctx.preferred_pack_idx = i;
+				found = 1;
 				break;
 			}
 		}
+
+		if (!found)
+			warning(_("unknown preferred pack: '%s'"),
+				preferred_pack_name);
+	} else if (ctx.nr && (flags & MIDX_WRITE_REV_INDEX)) {
+		struct packed_git *oldest = ctx.info[ctx.preferred_pack_idx].p;
+		ctx.preferred_pack_idx = 0;
+
+		if (packs_to_drop && packs_to_drop->nr)
+			BUG("cannot write a MIDX bitmap during expiration");
+
+		/*
+		 * set a preferred pack when writing a bitmap to ensure that
+		 * the pack from which the first object is selected in pseudo
+		 * pack-order has all of its objects selected from that pack
+		 * (and not another pack containing a duplicate)
+		 */
+		for (i = 1; i < ctx.nr; i++) {
+			struct packed_git *p = ctx.info[i].p;
+
+			if (!oldest->num_objects || p->mtime < oldest->mtime) {
+				oldest = p;
+				ctx.preferred_pack_idx = i;
+			}
+		}
+
+		if (!oldest->num_objects) {
+			/*
+			 * If all packs are empty; unset the preferred index.
+			 * This is acceptable since there will be no duplicate
+			 * objects to resolve, so the preferred value doesn't
+			 * matter.
+			 */
+			ctx.preferred_pack_idx = -1;
+		}
+	} else {
+		/*
+		 * otherwise don't mark any pack as preferred to avoid
+		 * interfering with expiration logic below
+		 */
+		ctx.preferred_pack_idx = -1;
 	}
 
 	if (ctx.preferred_pack_idx > -1) {
@@ -1058,11 +1100,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 						      ctx.info, ctx.nr,
 						      sizeof(*ctx.info),
 						      idx_or_pack_name_cmp);
-
-		if (!preferred)
-			warning(_("unknown preferred pack: '%s'"),
-				preferred_pack_name);
-		else {
+		if (preferred) {
 			uint32_t perm = ctx.pack_perm[preferred->orig_pack_int_id];
 			if (perm == PACK_EXPIRED)
 				warning(_("preferred pack '%s' is expired"),
-- 
2.33.0.96.g73915697e6



* [PATCH v5 10/27] midx: close linked MIDXs, avoid leaking memory
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
                     ` (8 preceding siblings ...)
  2021-08-31 20:52   ` [PATCH v5 09/27] midx: infer preferred pack when not given one Taylor Blau
@ 2021-08-31 20:52   ` Taylor Blau
  2021-08-31 20:52   ` [PATCH v5 11/27] midx: avoid opening multiple MIDXs when writing Taylor Blau
                     ` (18 subsequent siblings)
  28 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:52 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

When a repository has at least one alternate, the MIDX belonging to each
alternate is accessed through the `next` pointer on the main object
store's copy of the MIDX. close_midx() didn't bother to close any
of the linked MIDXs. It likewise didn't free the memory pointed to by
`m`, leaving uninitialized bytes with live pointers to them lying around
in the heap.

Clean this up by closing linked MIDXs, and freeing up the memory pointed
to by each of them. When callers call close_midx(), then they can
discard the entire linked list of MIDXs and set their pointer to the
head of that list to NULL.

This isn't strictly required for the upcoming patches, but it makes it
much more difficult (though still possible, e.g., by calling
`close_midx(m->next)` which leaves `m->next` pointing at uninitialized
bytes) to have pointers to uninitialized memory.
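
The ownership pattern described here can be sketched on a toy linked list.
This is an illustration only: `struct midx_like`, `push()`, `close_all()` and
the `closed_count` counter are hypothetical and exist purely so the behavior
can be observed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a MIDX chained to its alternates' MIDXs. */
struct midx_like {
	struct midx_like *next;
	int *resource; /* stands in for mmap'd data, pack arrays, ... */
};

static int closed_count; /* instrumentation for the example only */

/* Close the whole chain: linked nodes first, then this node itself. */
static void close_all(struct midx_like *m)
{
	if (!m)
		return;
	close_all(m->next);
	free(m->resource);
	free(m); /* the caller must not touch `m` after this */
	closed_count++;
}

/* Allocate a new node in front of `head` and return it. */
static struct midx_like *push(struct midx_like *head)
{
	struct midx_like *m = calloc(1, sizeof(*m));

	assert(m);
	m->resource = malloc(sizeof(int));
	assert(m->resource);
	m->next = head;
	return m;
}
```

After `close_all(head)`, the caller sets `head = NULL`, since every node in
the chain has been freed. Calling `close_all()` on an interior node instead
would leave the previous node's `next` dangling, which is the remaining
hazard mentioned above.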

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/midx.c b/midx.c
index 67de1dbaeb..e83f22b5ee 100644
--- a/midx.c
+++ b/midx.c
@@ -195,6 +195,8 @@ void close_midx(struct multi_pack_index *m)
 	if (!m)
 		return;
 
+	close_midx(m->next);
+
 	munmap((unsigned char *)m->data, m->data_len);
 
 	for (i = 0; i < m->num_packs; i++) {
@@ -203,6 +205,7 @@ void close_midx(struct multi_pack_index *m)
 	}
 	FREE_AND_NULL(m->packs);
 	FREE_AND_NULL(m->pack_names);
+	free(m);
 }
 
 int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id)
-- 
2.33.0.96.g73915697e6



* [PATCH v5 11/27] midx: avoid opening multiple MIDXs when writing
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
                     ` (9 preceding siblings ...)
  2021-08-31 20:52   ` [PATCH v5 10/27] midx: close linked MIDXs, avoid leaking memory Taylor Blau
@ 2021-08-31 20:52   ` Taylor Blau
  2021-08-31 20:52   ` [PATCH v5 12/27] pack-bitmap.c: introduce 'bitmap_num_objects()' Taylor Blau
                     ` (17 subsequent siblings)
  28 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:52 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Opening multiple instances of the same MIDX can lead to problems like two
separate packed_git structures which represent the same pack being added
to the repository's object store.

The above scenario can happen because prepare_midx_pack() checks if
`m->packs[pack_int_id]` is NULL in order to determine if a pack has been
opened and installed in the repository before. But a caller can
construct two copies of the same MIDX by calling get_multi_pack_index()
and load_multi_pack_index() since the former manipulates the
object store directly but the latter is a lower-level routine which
allocates a new MIDX for each call.

So if prepare_midx_pack() is called on multiple MIDXs with the same
pack_int_id, then that pack will be installed twice in the object
store's packed_git pointer.

This can lead to problems in, e.g., the pack-bitmap code, which does
something like the following (in pack-bitmap.c:open_pack_bitmap()):

    struct bitmap_index *bitmap_git = ...;
    for (p = get_all_packs(r); p; p = p->next) {
      if (open_pack_bitmap_1(bitmap_git, p) == 0)
        ret = 0;
    }

which is a problem if two copies of the same pack exist in the
packed_git list because pack-bitmap.c:open_pack_bitmap_1() contains a
conditional like the following:

    if (bitmap_git->pack || bitmap_git->midx) {
      /* ignore extra bitmap file; we can only handle one */
      warning("ignoring extra bitmap file: %s", packfile->pack_name);
      close(fd);
      return -1;
    }

Avoid this scenario by not letting write_midx_internal() open a MIDX
that isn't also pointed at by the object store. So long as this is the
case, other routines should prefer to open MIDXs with
get_multi_pack_index() or reprepare_packed_git() instead of creating
instances on their own. Because get_multi_pack_index() returns
`r->object_store->multi_pack_index` if it is non-NULL, we'll only have
one instance of a MIDX open at one time, avoiding these problems.
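
The difference between the two ways of obtaining a MIDX can be sketched
as follows. This is a simplified model under stated assumptions:
`toy_store`, `toy_load_midx()` and `toy_get_midx()` are hypothetical
names that only mimic the "accessor caches, loader allocates" split
described above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not Git's real structures. */
struct toy_midx { int num_packs; };
struct toy_store { struct toy_midx *midx; };

/* Lower-level loader: allocates a fresh instance on every call. */
static struct toy_midx *toy_load_midx(void)
{
	struct toy_midx *m = calloc(1, sizeof(*m));
	assert(m);
	return m;
}

/* Accessor: loads at most once, then returns the cached instance. */
static struct toy_midx *toy_get_midx(struct toy_store *store)
{
	if (!store->midx)
		store->midx = toy_load_midx();
	return store->midx;
}
```

Two calls to the accessor yield the same pointer, whereas mixing in a
direct call to the loader yields a second, independent instance. That
second instance is what would let the same pack be installed twice.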

To encourage this, drop the `struct multi_pack_index *` parameter from
`write_midx_internal()`, and rely instead on the `object_dir` to find
(or initialize) the correct MIDX instance.

Likewise, replace the call to `close_midx()` with
`close_object_store()`, since we're about to replace the MIDX with a new
one and should invalidate the object store's memory of any MIDX that
might have existed beforehand.

Note that this now forbids passing object directories that don't belong
to alternate repositories over `--object-dir`, since before we would
have happily opened a MIDX in any directory, but now restrict ourselves
to only those reachable by `r->objects->multi_pack_index` (and alternate
MIDXs that we can see by walking the `next` pointer).

As far as I can tell, supporting arbitrary directories with
`--object-dir` was a historical accident, since even the documentation
says `<alt>` when referring to the value passed to this option.

A future patch could clean this up and provide a warning() when a
non-alternate directory is given: we'll still write a new MIDX there,
but we won't reuse any MIDX that might happen to already exist in that
directory.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.txt |  2 ++
 midx.c                                 | 26 +++++++++++++++-----------
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index c9b063d31e..0af6beb2dd 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -23,6 +23,8 @@ OPTIONS
 	Use given directory for the location of Git objects. We check
 	`<dir>/packs/multi-pack-index` for the current MIDX file, and
 	`<dir>/packs` for the pack-files to index.
++
+`<dir>` must be an alternate of the current repository.
 
 --[no-]progress::
 	Turn progress on/off explicitly. If neither is specified, progress is
diff --git a/midx.c b/midx.c
index e83f22b5ee..43510290dc 100644
--- a/midx.c
+++ b/midx.c
@@ -893,7 +893,7 @@ static int midx_checksum_valid(struct multi_pack_index *m)
 	return hashfile_checksum_valid(m->data, m->data_len);
 }
 
-static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
+static int write_midx_internal(const char *object_dir,
 			       struct string_list *packs_to_drop,
 			       const char *preferred_pack_name,
 			       unsigned flags)
@@ -904,6 +904,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	struct hashfile *f = NULL;
 	struct lock_file lk;
 	struct write_midx_context ctx = { 0 };
+	struct multi_pack_index *cur;
 	int pack_name_concat_len = 0;
 	int dropped_packs = 0;
 	int result = 0;
@@ -914,10 +915,12 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 		die_errno(_("unable to create leading directories of %s"),
 			  midx_name);
 
-	if (m)
-		ctx.m = m;
-	else
-		ctx.m = load_multi_pack_index(object_dir, 1);
+	for (cur = get_multi_pack_index(the_repository); cur; cur = cur->next) {
+		if (!strcmp(object_dir, cur->object_dir)) {
+			ctx.m = cur;
+			break;
+		}
+	}
 
 	if (ctx.m && !midx_checksum_valid(ctx.m)) {
 		warning(_("ignoring existing multi-pack-index; checksum mismatch"));
@@ -1119,7 +1122,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
 
 	if (ctx.m)
-		close_midx(ctx.m);
+		close_object_store(the_repository->objects);
 
 	if (ctx.nr - dropped_packs == 0) {
 		error(_("no pack files to index."));
@@ -1182,8 +1185,7 @@ int write_midx_file(const char *object_dir,
 		    const char *preferred_pack_name,
 		    unsigned flags)
 {
-	return write_midx_internal(object_dir, NULL, NULL, preferred_pack_name,
-				   flags);
+	return write_midx_internal(object_dir, NULL, preferred_pack_name, flags);
 }
 
 struct clear_midx_data {
@@ -1461,8 +1463,10 @@ int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla
 
 	free(count);
 
-	if (packs_to_drop.nr)
-		result = write_midx_internal(object_dir, m, &packs_to_drop, NULL, flags);
+	if (packs_to_drop.nr) {
+		result = write_midx_internal(object_dir, &packs_to_drop, NULL, flags);
+		m = NULL;
+	}
 
 	string_list_clear(&packs_to_drop, 0);
 	return result;
@@ -1651,7 +1655,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 		goto cleanup;
 	}
 
-	result = write_midx_internal(object_dir, m, NULL, NULL, flags);
+	result = write_midx_internal(object_dir, NULL, NULL, flags);
 	m = NULL;
 
 cleanup:
-- 
2.33.0.96.g73915697e6



* [PATCH v5 12/27] pack-bitmap.c: introduce 'bitmap_num_objects()'
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
                     ` (10 preceding siblings ...)
  2021-08-31 20:52   ` [PATCH v5 11/27] midx: avoid opening multiple MIDXs when writing Taylor Blau
@ 2021-08-31 20:52   ` Taylor Blau
  2021-08-31 20:52   ` [PATCH v5 13/27] pack-bitmap.c: introduce 'nth_bitmap_object_oid()' Taylor Blau
                     ` (16 subsequent siblings)
  28 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:52 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

A subsequent patch to support reading MIDX bitmaps will be less noisy
after extracting a generic function to return how many objects are
contained in a bitmap.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 37 +++++++++++++++++++++----------------
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 9b11af87aa..65356f9657 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -136,6 +136,11 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
 	return b;
 }
 
+static uint32_t bitmap_num_objects(struct bitmap_index *index)
+{
+	return index->pack->num_objects;
+}
+
 static int load_bitmap_header(struct bitmap_index *index)
 {
 	struct bitmap_disk_header *header = (void *)index->map;
@@ -154,7 +159,7 @@ static int load_bitmap_header(struct bitmap_index *index)
 	/* Parse known bitmap format options */
 	{
 		uint32_t flags = ntohs(header->options);
-		size_t cache_size = st_mult(index->pack->num_objects, sizeof(uint32_t));
+		size_t cache_size = st_mult(bitmap_num_objects(index), sizeof(uint32_t));
 		unsigned char *index_end = index->map + index->map_size - the_hash_algo->rawsz;
 
 		if ((flags & BITMAP_OPT_FULL_DAG) == 0)
@@ -404,7 +409,7 @@ static inline int bitmap_position_extended(struct bitmap_index *bitmap_git,
 
 	if (pos < kh_end(positions)) {
 		int bitmap_pos = kh_value(positions, pos);
-		return bitmap_pos + bitmap_git->pack->num_objects;
+		return bitmap_pos + bitmap_num_objects(bitmap_git);
 	}
 
 	return -1;
@@ -456,7 +461,7 @@ static int ext_index_add_object(struct bitmap_index *bitmap_git,
 		bitmap_pos = kh_value(eindex->positions, hash_pos);
 	}
 
-	return bitmap_pos + bitmap_git->pack->num_objects;
+	return bitmap_pos + bitmap_num_objects(bitmap_git);
 }
 
 struct bitmap_show_data {
@@ -673,7 +678,7 @@ static void show_extended_objects(struct bitmap_index *bitmap_git,
 	for (i = 0; i < eindex->count; ++i) {
 		struct object *obj;
 
-		if (!bitmap_get(objects, bitmap_git->pack->num_objects + i))
+		if (!bitmap_get(objects, bitmap_num_objects(bitmap_git) + i))
 			continue;
 
 		obj = eindex->objects[i];
@@ -832,7 +837,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
 	 * them individually.
 	 */
 	for (i = 0; i < eindex->count; i++) {
-		uint32_t pos = i + bitmap_git->pack->num_objects;
+		uint32_t pos = i + bitmap_num_objects(bitmap_git);
 		if (eindex->objects[i]->type == type &&
 		    bitmap_get(to_filter, pos) &&
 		    !bitmap_get(tips, pos))
@@ -859,7 +864,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
 
 	oi.sizep = &size;
 
-	if (pos < pack->num_objects) {
+	if (pos < bitmap_num_objects(bitmap_git)) {
 		off_t ofs = pack_pos_to_offset(pack, pos);
 		if (packed_object_info(the_repository, pack, ofs, &oi) < 0) {
 			struct object_id oid;
@@ -869,7 +874,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
 		}
 	} else {
 		struct eindex *eindex = &bitmap_git->ext_index;
-		struct object *obj = eindex->objects[pos - pack->num_objects];
+		struct object *obj = eindex->objects[pos - bitmap_num_objects(bitmap_git)];
 		if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0)
 			die(_("unable to get size of %s"), oid_to_hex(&obj->oid));
 	}
@@ -911,7 +916,7 @@ static void filter_bitmap_blob_limit(struct bitmap_index *bitmap_git,
 	}
 
 	for (i = 0; i < eindex->count; i++) {
-		uint32_t pos = i + bitmap_git->pack->num_objects;
+		uint32_t pos = i + bitmap_num_objects(bitmap_git);
 		if (eindex->objects[i]->type == OBJ_BLOB &&
 		    bitmap_get(to_filter, pos) &&
 		    !bitmap_get(tips, pos) &&
@@ -1137,8 +1142,8 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 	enum object_type type;
 	unsigned long size;
 
-	if (pos >= bitmap_git->pack->num_objects)
-		return; /* not actually in the pack */
+	if (pos >= bitmap_num_objects(bitmap_git))
+		return; /* not actually in the pack or MIDX */
 
 	offset = header = pack_pos_to_offset(bitmap_git->pack, pos);
 	type = unpack_object_header(bitmap_git->pack, w_curs, &offset, &size);
@@ -1204,6 +1209,7 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 	struct pack_window *w_curs = NULL;
 	size_t i = 0;
 	uint32_t offset;
+	uint32_t objects_nr = bitmap_num_objects(bitmap_git);
 
 	assert(result);
 
@@ -1211,8 +1217,8 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 		i++;
 
 	/* Don't mark objects not in the packfile */
-	if (i > bitmap_git->pack->num_objects / BITS_IN_EWORD)
-		i = bitmap_git->pack->num_objects / BITS_IN_EWORD;
+	if (i > objects_nr / BITS_IN_EWORD)
+		i = objects_nr / BITS_IN_EWORD;
 
 	reuse = bitmap_word_alloc(i);
 	memset(reuse->words, 0xFF, i * sizeof(eword_t));
@@ -1296,7 +1302,7 @@ static uint32_t count_object_type(struct bitmap_index *bitmap_git,
 
 	for (i = 0; i < eindex->count; ++i) {
 		if (eindex->objects[i]->type == type &&
-			bitmap_get(objects, bitmap_git->pack->num_objects + i))
+			bitmap_get(objects, bitmap_num_objects(bitmap_git) + i))
 			count++;
 	}
 
@@ -1517,7 +1523,7 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
 	uint32_t i, num_objects;
 	uint32_t *reposition;
 
-	num_objects = bitmap_git->pack->num_objects;
+	num_objects = bitmap_num_objects(bitmap_git);
 	CALLOC_ARRAY(reposition, num_objects);
 
 	for (i = 0; i < num_objects; ++i) {
@@ -1600,7 +1606,6 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
 static off_t get_disk_usage_for_extended(struct bitmap_index *bitmap_git)
 {
 	struct bitmap *result = bitmap_git->result;
-	struct packed_git *pack = bitmap_git->pack;
 	struct eindex *eindex = &bitmap_git->ext_index;
 	off_t total = 0;
 	struct object_info oi = OBJECT_INFO_INIT;
@@ -1612,7 +1617,7 @@ static off_t get_disk_usage_for_extended(struct bitmap_index *bitmap_git)
 	for (i = 0; i < eindex->count; i++) {
 		struct object *obj = eindex->objects[i];
 
-		if (!bitmap_get(result, pack->num_objects + i))
+		if (!bitmap_get(result, bitmap_num_objects(bitmap_git) + i))
 			continue;
 
 		if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0)
-- 
2.33.0.96.g73915697e6



* [PATCH v5 13/27] pack-bitmap.c: introduce 'nth_bitmap_object_oid()'
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
                     ` (11 preceding siblings ...)
  2021-08-31 20:52   ` [PATCH v5 12/27] pack-bitmap.c: introduce 'bitmap_num_objects()' Taylor Blau
@ 2021-08-31 20:52   ` Taylor Blau
  2021-08-31 20:52   ` [PATCH v5 14/27] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()' Taylor Blau
                     ` (15 subsequent siblings)
  28 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:52 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

A subsequent patch to support reading MIDX bitmaps will be less noisy
after extracting a generic function to fetch the nth OID contained in
the bitmap.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 65356f9657..612f62da97 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -223,6 +223,13 @@ static inline uint8_t read_u8(const unsigned char *buffer, size_t *pos)
 
 #define MAX_XOR_OFFSET 160
 
+static int nth_bitmap_object_oid(struct bitmap_index *index,
+				 struct object_id *oid,
+				 uint32_t n)
+{
+	return nth_packed_object_id(oid, index->pack, n);
+}
+
 static int load_bitmap_entries_v1(struct bitmap_index *index)
 {
 	uint32_t i;
@@ -242,7 +249,7 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 		xor_offset = read_u8(index->map, &index->map_pos);
 		flags = read_u8(index->map, &index->map_pos);
 
-		if (nth_packed_object_id(&oid, index->pack, commit_idx_pos) < 0)
+		if (nth_bitmap_object_oid(index, &oid, commit_idx_pos) < 0)
 			return error("corrupt ewah bitmap: commit index %u out of range",
 				     (unsigned)commit_idx_pos);
 
@@ -868,8 +875,8 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
 		off_t ofs = pack_pos_to_offset(pack, pos);
 		if (packed_object_info(the_repository, pack, ofs, &oi) < 0) {
 			struct object_id oid;
-			nth_packed_object_id(&oid, pack,
-					     pack_pos_to_index(pack, pos));
+			nth_bitmap_object_oid(bitmap_git, &oid,
+					      pack_pos_to_index(pack, pos));
 			die(_("unable to get size of %s"), oid_to_hex(&oid));
 		}
 	} else {
-- 
2.33.0.96.g73915697e6



* [PATCH v5 14/27] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()'
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
                     ` (12 preceding siblings ...)
  2021-08-31 20:52   ` [PATCH v5 13/27] pack-bitmap.c: introduce 'nth_bitmap_object_oid()' Taylor Blau
@ 2021-08-31 20:52   ` Taylor Blau
  2021-08-31 20:52   ` [PATCH v5 15/27] pack-bitmap.c: avoid redundant calls to try_partial_reuse Taylor Blau
                     ` (14 subsequent siblings)
  28 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:52 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

In a recent commit, pack-objects learned support for the
'pack.preferBitmapTips' configuration. This patch prepares the
multi-pack bitmap code to respect this configuration, too.

The yet-to-be-implemented code will find that it is more efficient to
check whether each reference contains a prefix found in the configured
set of values rather than doing an additional traversal.

Implement a function 'bitmap_is_preferred_refname()' which will perform
that check. Its caller will be added in a subsequent patch.
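
The prefix check itself can be sketched outside of Git like this. The
`toy_*` helpers and the plain array of prefixes are assumptions made for
the example; the real function reads the configured values from a
string_list instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if str begins with prefix. */
static int toy_starts_with(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/* Returns 1 if refname begins with any of the nr given prefixes. */
static int toy_is_preferred_refname(const char *refname,
				    const char *const *prefixes, size_t nr)
{
	size_t i;

	assert(refname);
	for (i = 0; i < nr; i++)
		if (toy_starts_with(refname, prefixes[i]))
			return 1;
	return 0;
}
```

With the prefixes "refs/heads/" and "refs/tags/v", a ref such as
"refs/heads/main" matches while "refs/remotes/origin/main" does not. An
empty list of prefixes never matches.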

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 16 ++++++++++++++++
 pack-bitmap.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 612f62da97..d5296750eb 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1658,3 +1658,19 @@ const struct string_list *bitmap_preferred_tips(struct repository *r)
 {
 	return repo_config_get_value_multi(r, "pack.preferbitmaptips");
 }
+
+int bitmap_is_preferred_refname(struct repository *r, const char *refname)
+{
+	const struct string_list *preferred_tips = bitmap_preferred_tips(r);
+	struct string_list_item *item;
+
+	if (!preferred_tips)
+		return 0;
+
+	for_each_string_list_item(item, preferred_tips) {
+		if (starts_with(refname, item->string))
+			return 1;
+	}
+
+	return 0;
+}
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 020cd8d868..52ea10de51 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -94,5 +94,6 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 			  uint16_t options);
 
 const struct string_list *bitmap_preferred_tips(struct repository *r);
+int bitmap_is_preferred_refname(struct repository *r, const char *refname);
 
 #endif
-- 
2.33.0.96.g73915697e6



* [PATCH v5 15/27] pack-bitmap.c: avoid redundant calls to try_partial_reuse
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
                     ` (13 preceding siblings ...)
  2021-08-31 20:52   ` [PATCH v5 14/27] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()' Taylor Blau
@ 2021-08-31 20:52   ` Taylor Blau
  2021-08-31 20:52   ` [PATCH v5 16/27] pack-bitmap: read multi-pack bitmaps Taylor Blau
                     ` (13 subsequent siblings)
  28 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:52 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

try_partial_reuse() is used to mark any bits in the beginning of a
bitmap whose objects can be reused verbatim from the pack they came
from.

Currently this function returns void, and signals nothing to the caller
when bits could not be reused. But multi-pack bitmaps would benefit from
having such a signal, because they may try to pass objects which are in
bounds, but from a pack other than the preferred one.

Any extra calls are noops because of a conditional in
reuse_partial_packfile_from_bitmap(), but those loop iterations can be
avoided by letting try_partial_reuse() indicate when it can't accept any
more bits for reuse, and then listening to that signal.
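
The control flow can be illustrated with a small standalone sketch. It
is a simplification under stated assumptions: `toy_try_reuse()` only
checks an upper bound, `toy_scan()` walks the bits one at a time rather
than using a count-trailing-zeros helper, and all `toy_*` names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BITS_IN_WORD 64

static int toy_calls; /* how many times toy_try_reuse() ran */

/* -1: stop feeding positions; 0: keep going. */
static int toy_try_reuse(size_t pos, size_t limit)
{
	toy_calls++;
	if (pos >= limit)
		return -1;
	return 0;
}

/* Visit each set bit in order; stop everything on the first -1. */
static void toy_scan(const uint64_t *words, size_t nr, size_t limit)
{
	size_t i;
	unsigned offset;

	assert(words || !nr);
	for (i = 0; i < nr; i++) {
		for (offset = 0; offset < TOY_BITS_IN_WORD; offset++) {
			if (!(words[i] >> offset))
				break; /* no set bits left in this word */
			if (!((words[i] >> offset) & 1))
				continue;
			if (toy_try_reuse(i * TOY_BITS_IN_WORD + offset, limit) < 0)
				goto done; /* leave both loops at once */
		}
	}
done:
	return;
}
```

Once a position past the limit is seen, no later bit is offered to the
callee, which is the saving this patch is after.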

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 40 +++++++++++++++++++++++++++++-----------
 1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index d5296750eb..4e37f5d574 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1140,22 +1140,26 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	return NULL;
 }
 
-static void try_partial_reuse(struct bitmap_index *bitmap_git,
-			      size_t pos,
-			      struct bitmap *reuse,
-			      struct pack_window **w_curs)
+/*
+ * -1 means "stop trying further objects"; 0 means we may or may not have
+ * reused, but you can keep feeding bits.
+ */
+static int try_partial_reuse(struct bitmap_index *bitmap_git,
+			     size_t pos,
+			     struct bitmap *reuse,
+			     struct pack_window **w_curs)
 {
 	off_t offset, header;
 	enum object_type type;
 	unsigned long size;
 
 	if (pos >= bitmap_num_objects(bitmap_git))
-		return; /* not actually in the pack or MIDX */
+		return -1; /* not actually in the pack or MIDX */
 
 	offset = header = pack_pos_to_offset(bitmap_git->pack, pos);
 	type = unpack_object_header(bitmap_git->pack, w_curs, &offset, &size);
 	if (type < 0)
-		return; /* broken packfile, punt */
+		return -1; /* broken packfile, punt */
 
 	if (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA) {
 		off_t base_offset;
@@ -1172,9 +1176,9 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 		base_offset = get_delta_base(bitmap_git->pack, w_curs,
 					     &offset, type, header);
 		if (!base_offset)
-			return;
+			return 0;
 		if (offset_to_pack_pos(bitmap_git->pack, base_offset, &base_pos) < 0)
-			return;
+			return 0;
 
 		/*
 		 * We assume delta dependencies always point backwards. This
@@ -1186,7 +1190,7 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 		 * odd parameters.
 		 */
 		if (base_pos >= pos)
-			return;
+			return 0;
 
 		/*
 		 * And finally, if we're not sending the base as part of our
@@ -1197,13 +1201,14 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 		 * object_entry code path handle it.
 		 */
 		if (!bitmap_get(reuse, base_pos))
-			return;
+			return 0;
 	}
 
 	/*
 	 * If we got here, then the object is OK to reuse. Mark it.
 	 */
 	bitmap_set(reuse, pos);
+	return 0;
 }
 
 int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
@@ -1239,10 +1244,23 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 				break;
 
 			offset += ewah_bit_ctz64(word >> offset);
-			try_partial_reuse(bitmap_git, pos + offset, reuse, &w_curs);
+			if (try_partial_reuse(bitmap_git, pos + offset, reuse,
+					      &w_curs) < 0) {
+				/*
+				 * try_partial_reuse indicated we couldn't reuse
+				 * any bits, so there is no point in trying more
+				 * bits in the current word, or any other words
+				 * in result.
+				 *
+				 * Jump out of both loops to avoid future
+				 * unnecessary calls to try_partial_reuse.
+				 */
+				goto done;
+			}
 		}
 	}
 
+done:
 	unuse_pack(&w_curs);
 
 	*entries = bitmap_popcount(reuse);
-- 
2.33.0.96.g73915697e6



* [PATCH v5 16/27] pack-bitmap: read multi-pack bitmaps
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
                     ` (14 preceding siblings ...)
  2021-08-31 20:52   ` [PATCH v5 15/27] pack-bitmap.c: avoid redundant calls to try_partial_reuse Taylor Blau
@ 2021-08-31 20:52   ` Taylor Blau
  2021-08-31 20:52   ` [PATCH v5 17/27] pack-bitmap: write " Taylor Blau
                     ` (12 subsequent siblings)
  28 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:52 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

This prepares the code in pack-bitmap to interpret the new multi-pack
bitmaps described in Documentation/technical/bitmap-format.txt, which
mostly involves converting bit positions to accommodate looking them up
in a MIDX.

Note that there are currently no writers who write multi-pack bitmaps,
and that this will be implemented in the subsequent commit. Note also
that get_midx_checksum() and get_midx_filename() are made non-static so
they can be called from pack-bitmap.c.
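
To make the position conversion concrete, here is a toy model of the two
orderings involved. It is only a sketch under simplifying assumptions:
object IDs are small integers, `rev[]` maps a bit position (pseudo-pack
order) to a MIDX position (sorted order), and the inverse lookup is a
linear scan. All `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy MIDX: object IDs are small integers, sorted ascending. */
struct toy_midx {
	const uint32_t *oids;   /* oids[midx_pos], sorted */
	const uint32_t *rev;    /* rev[bit_pos] = midx_pos (pseudo-pack order) */
	uint32_t num_objects;
};

/* Binary search for oid; on success store its MIDX position and return 1. */
static int toy_bsearch_midx(const struct toy_midx *m, uint32_t oid, uint32_t *pos)
{
	uint32_t lo = 0, hi = m->num_objects;

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		if (m->oids[mid] == oid) {
			*pos = mid;
			return 1;
		}
		if (m->oids[mid] < oid)
			lo = mid + 1;
		else
			hi = mid;
	}
	return 0;
}

/* Bit position -> MIDX position. */
static uint32_t toy_pack_pos_to_midx(const struct toy_midx *m, uint32_t bit_pos)
{
	assert(bit_pos < m->num_objects);
	return m->rev[bit_pos];
}

/* Object ID -> bit position, or -1 if the MIDX does not contain it. */
static int toy_bitmap_position(const struct toy_midx *m, uint32_t oid)
{
	uint32_t want, i;

	if (!toy_bsearch_midx(m, oid, &want))
		return -1;
	for (i = 0; i < m->num_objects; i++) /* linear scan keeps the toy short */
		if (m->rev[i] == want)
			return (int)i;
	return -1;
}
```

In this model, a caller that gets -1 back would fall through to some
other lookup, much as bitmap_position() falls back to the extended index
in the diff below.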

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c |   5 +
 midx.c                 |   4 +-
 midx.h                 |   2 +
 pack-bitmap-write.c    |   2 +-
 pack-bitmap.c          | 357 ++++++++++++++++++++++++++++++++++++-----
 pack-bitmap.h          |   6 +
 packfile.c             |   2 +-
 7 files changed, 336 insertions(+), 42 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index b63e06e46c..c26dedfe5d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1124,6 +1124,11 @@ static void write_reused_pack(struct hashfile *f)
 				break;
 
 			offset += ewah_bit_ctz64(word >> offset);
+			/*
+			 * Can use bit positions directly, even for MIDX
+			 * bitmaps. See comment in try_partial_reuse()
+			 * for why.
+			 */
 			write_reused_pack_one(pos + offset, f, &w_curs);
 			display_progress(progress_state, ++written);
 		}
diff --git a/midx.c b/midx.c
index 43510290dc..6a10f7a042 100644
--- a/midx.c
+++ b/midx.c
@@ -48,12 +48,12 @@ static uint8_t oid_version(void)
 	}
 }
 
-static const unsigned char *get_midx_checksum(struct multi_pack_index *m)
+const unsigned char *get_midx_checksum(struct multi_pack_index *m)
 {
 	return m->data + m->data_len - the_hash_algo->rawsz;
 }
 
-static char *get_midx_filename(const char *object_dir)
+char *get_midx_filename(const char *object_dir)
 {
 	return xstrfmt("%s/pack/multi-pack-index", object_dir);
 }
diff --git a/midx.h b/midx.h
index 8684cf0fef..1172df1a71 100644
--- a/midx.h
+++ b/midx.h
@@ -42,6 +42,8 @@ struct multi_pack_index {
 #define MIDX_PROGRESS     (1 << 0)
 #define MIDX_WRITE_REV_INDEX (1 << 1)
 
+const unsigned char *get_midx_checksum(struct multi_pack_index *m);
+char *get_midx_filename(const char *object_dir);
 char *get_midx_rev_filename(struct multi_pack_index *m);
 
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 142fd0adb8..9c55c1531e 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -48,7 +48,7 @@ void bitmap_writer_show_progress(int show)
 }
 
 /**
- * Build the initial type index for the packfile
+ * Build the initial type index for the packfile or multi-pack-index
  */
 void bitmap_writer_build_type_index(struct packing_data *to_pack,
 				    struct pack_idx_entry **index,
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 4e37f5d574..fa69ed7a6d 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -13,6 +13,7 @@
 #include "repository.h"
 #include "object-store.h"
 #include "list-objects-filter-options.h"
+#include "midx.h"
 #include "config.h"
 
 /*
@@ -35,8 +36,15 @@ struct stored_bitmap {
  * the active bitmap index is the largest one.
  */
 struct bitmap_index {
-	/* Packfile to which this bitmap index belongs to */
+	/*
+	 * The pack or multi-pack index (MIDX) that this bitmap index belongs
+	 * to.
+	 *
+	 * Exactly one of these must be non-NULL; this specifies the object
+	 * order used to interpret this bitmap.
+	 */
 	struct packed_git *pack;
+	struct multi_pack_index *midx;
 
 	/*
 	 * Mark the first `reuse_objects` in the packfile as reused:
@@ -71,6 +79,9 @@ struct bitmap_index {
 	/* If not NULL, this is a name-hash cache pointing into map. */
 	uint32_t *hashes;
 
+	/* The checksum of the packfile or MIDX; points into map. */
+	const unsigned char *checksum;
+
 	/*
 	 * Extended index.
 	 *
@@ -138,6 +149,8 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
 
 static uint32_t bitmap_num_objects(struct bitmap_index *index)
 {
+	if (index->midx)
+		return index->midx->num_objects;
 	return index->pack->num_objects;
 }
 
@@ -175,6 +188,7 @@ static int load_bitmap_header(struct bitmap_index *index)
 	}
 
 	index->entry_count = ntohl(header->entry_count);
+	index->checksum = header->checksum;
 	index->map_pos += header_size;
 	return 0;
 }
@@ -227,6 +241,8 @@ static int nth_bitmap_object_oid(struct bitmap_index *index,
 				 struct object_id *oid,
 				 uint32_t n)
 {
+	if (index->midx)
+		return nth_midxed_object_oid(oid, index->midx, n) ? 0 : -1;
 	return nth_packed_object_id(oid, index->pack, n);
 }
 
@@ -274,7 +290,14 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 	return 0;
 }
 
-static char *pack_bitmap_filename(struct packed_git *p)
+char *midx_bitmap_filename(struct multi_pack_index *midx)
+{
+	return xstrfmt("%s-%s.bitmap",
+		       get_midx_filename(midx->object_dir),
+		       hash_to_hex(get_midx_checksum(midx)));
+}
+
+char *pack_bitmap_filename(struct packed_git *p)
 {
 	size_t len;
 
@@ -283,6 +306,57 @@ static char *pack_bitmap_filename(struct packed_git *p)
 	return xstrfmt("%.*s.bitmap", (int)len, p->pack_name);
 }
 
+static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
+			      struct multi_pack_index *midx)
+{
+	struct stat st;
+	char *idx_name = midx_bitmap_filename(midx);
+	int fd = git_open(idx_name);
+
+	free(idx_name);
+
+	if (fd < 0)
+		return -1;
+
+	if (fstat(fd, &st)) {
+		close(fd);
+		return -1;
+	}
+
+	if (bitmap_git->pack || bitmap_git->midx) {
+		/* ignore extra bitmap file; we can only handle one */
+		warning("ignoring extra bitmap file: %s",
+			get_midx_filename(midx->object_dir));
+		close(fd);
+		return -1;
+	}
+
+	bitmap_git->midx = midx;
+	bitmap_git->map_size = xsize_t(st.st_size);
+	bitmap_git->map_pos = 0;
+	bitmap_git->map = xmmap(NULL, bitmap_git->map_size, PROT_READ,
+				MAP_PRIVATE, fd, 0);
+	close(fd);
+
+	if (load_bitmap_header(bitmap_git) < 0)
+		goto cleanup;
+
+	if (!hasheq(get_midx_checksum(bitmap_git->midx), bitmap_git->checksum))
+		goto cleanup;
+
+	if (load_midx_revindex(bitmap_git->midx) < 0) {
+		warning(_("multi-pack bitmap is missing required reverse index"));
+		goto cleanup;
+	}
+	return 0;
+
+cleanup:
+	munmap(bitmap_git->map, bitmap_git->map_size);
+	bitmap_git->map_size = 0;
+	bitmap_git->map = NULL;
+	return -1;
+}
+
 static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git *packfile)
 {
 	int fd;
@@ -304,7 +378,8 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
 		return -1;
 	}
 
-	if (bitmap_git->pack) {
+	if (bitmap_git->pack || bitmap_git->midx) {
+		/* ignore extra bitmap file; we can only handle one */
 		warning("ignoring extra bitmap file: %s", packfile->pack_name);
 		close(fd);
 		return -1;
@@ -331,13 +406,39 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
 	return 0;
 }
 
-static int load_pack_bitmap(struct bitmap_index *bitmap_git)
+static int load_reverse_index(struct bitmap_index *bitmap_git)
+{
+	if (bitmap_is_midx(bitmap_git)) {
+		uint32_t i;
+		int ret;
+
+		/*
+		 * The multi-pack-index's .rev file is already loaded via
+		 * open_pack_bitmap_1().
+		 *
+		 * But we still need to open the individual pack .rev files,
+		 * since we will need to make use of them in pack-objects.
+		 */
+		for (i = 0; i < bitmap_git->midx->num_packs; i++) {
+			if (prepare_midx_pack(the_repository, bitmap_git->midx, i))
+				die(_("load_reverse_index: could not open pack"));
+			ret = load_pack_revindex(bitmap_git->midx->packs[i]);
+			if (ret)
+				return ret;
+		}
+		return 0;
+	}
+	return load_pack_revindex(bitmap_git->pack);
+}
+
+static int load_bitmap(struct bitmap_index *bitmap_git)
 {
 	assert(bitmap_git->map);
 
 	bitmap_git->bitmaps = kh_init_oid_map();
 	bitmap_git->ext_index.positions = kh_init_oid_pos();
-	if (load_pack_revindex(bitmap_git->pack))
+
+	if (load_reverse_index(bitmap_git))
 		goto failed;
 
 	if (!(bitmap_git->commits = read_bitmap_1(bitmap_git)) ||
@@ -381,11 +482,47 @@ static int open_pack_bitmap(struct repository *r,
 	return ret;
 }
 
+static int open_midx_bitmap(struct repository *r,
+			    struct bitmap_index *bitmap_git)
+{
+	struct multi_pack_index *midx;
+
+	assert(!bitmap_git->map);
+
+	for (midx = get_multi_pack_index(r); midx; midx = midx->next) {
+		if (!open_midx_bitmap_1(bitmap_git, midx))
+			return 0;
+	}
+	return -1;
+}
+
+static int open_bitmap(struct repository *r,
+		       struct bitmap_index *bitmap_git)
+{
+	assert(!bitmap_git->map);
+
+	if (!open_midx_bitmap(r, bitmap_git))
+		return 0;
+	return open_pack_bitmap(r, bitmap_git);
+}
+
 struct bitmap_index *prepare_bitmap_git(struct repository *r)
 {
 	struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
 
-	if (!open_pack_bitmap(r, bitmap_git) && !load_pack_bitmap(bitmap_git))
+	if (!open_bitmap(r, bitmap_git) && !load_bitmap(bitmap_git))
+		return bitmap_git;
+
+	free_bitmap_index(bitmap_git);
+	return NULL;
+}
+
+struct bitmap_index *prepare_midx_bitmap_git(struct repository *r,
+					     struct multi_pack_index *midx)
+{
+	struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
+
+	if (!open_midx_bitmap_1(bitmap_git, midx) && !load_bitmap(bitmap_git))
 		return bitmap_git;
 
 	free_bitmap_index(bitmap_git);
@@ -435,10 +572,26 @@ static inline int bitmap_position_packfile(struct bitmap_index *bitmap_git,
 	return pos;
 }
 
+static int bitmap_position_midx(struct bitmap_index *bitmap_git,
+				const struct object_id *oid)
+{
+	uint32_t want, got;
+	if (!bsearch_midx(oid, bitmap_git->midx, &want))
+		return -1;
+
+	if (midx_to_pack_pos(bitmap_git->midx, want, &got) < 0)
+		return -1;
+	return got;
+}
+
 static int bitmap_position(struct bitmap_index *bitmap_git,
 			   const struct object_id *oid)
 {
-	int pos = bitmap_position_packfile(bitmap_git, oid);
+	int pos;
+	if (bitmap_is_midx(bitmap_git))
+		pos = bitmap_position_midx(bitmap_git, oid);
+	else
+		pos = bitmap_position_packfile(bitmap_git, oid);
 	return (pos >= 0) ? pos : bitmap_position_extended(bitmap_git, oid);
 }
 
@@ -749,6 +902,7 @@ static void show_objects_for_type(
 			continue;
 
 		for (offset = 0; offset < BITS_IN_EWORD; ++offset) {
+			struct packed_git *pack;
 			struct object_id oid;
 			uint32_t hash = 0, index_pos;
 			off_t ofs;
@@ -758,14 +912,28 @@ static void show_objects_for_type(
 
 			offset += ewah_bit_ctz64(word >> offset);
 
-			index_pos = pack_pos_to_index(bitmap_git->pack, pos + offset);
-			ofs = pack_pos_to_offset(bitmap_git->pack, pos + offset);
-			nth_packed_object_id(&oid, bitmap_git->pack, index_pos);
+			if (bitmap_is_midx(bitmap_git)) {
+				struct multi_pack_index *m = bitmap_git->midx;
+				uint32_t pack_id;
+
+				index_pos = pack_pos_to_midx(m, pos + offset);
+				ofs = nth_midxed_offset(m, index_pos);
+				nth_midxed_object_oid(&oid, m, index_pos);
+
+				pack_id = nth_midxed_pack_int_id(m, index_pos);
+				pack = bitmap_git->midx->packs[pack_id];
+			} else {
+				index_pos = pack_pos_to_index(bitmap_git->pack, pos + offset);
+				ofs = pack_pos_to_offset(bitmap_git->pack, pos + offset);
+				nth_bitmap_object_oid(bitmap_git, &oid, index_pos);
+
+				pack = bitmap_git->pack;
+			}
 
 			if (bitmap_git->hashes)
 				hash = get_be32(bitmap_git->hashes + index_pos);
 
-			show_reach(&oid, object_type, 0, hash, bitmap_git->pack, ofs);
+			show_reach(&oid, object_type, 0, hash, pack, ofs);
 		}
 	}
 }
@@ -777,8 +945,13 @@ static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
 		struct object *object = roots->item;
 		roots = roots->next;
 
-		if (find_pack_entry_one(object->oid.hash, bitmap_git->pack) > 0)
-			return 1;
+		if (bitmap_is_midx(bitmap_git)) {
+			if (bsearch_midx(&object->oid, bitmap_git->midx, NULL))
+				return 1;
+		} else {
+			if (find_pack_entry_one(object->oid.hash, bitmap_git->pack) > 0)
+				return 1;
+		}
 	}
 
 	return 0;
@@ -865,14 +1038,26 @@ static void filter_bitmap_blob_none(struct bitmap_index *bitmap_git,
 static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
 				     uint32_t pos)
 {
-	struct packed_git *pack = bitmap_git->pack;
 	unsigned long size;
 	struct object_info oi = OBJECT_INFO_INIT;
 
 	oi.sizep = &size;
 
 	if (pos < bitmap_num_objects(bitmap_git)) {
-		off_t ofs = pack_pos_to_offset(pack, pos);
+		struct packed_git *pack;
+		off_t ofs;
+
+		if (bitmap_is_midx(bitmap_git)) {
+			uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
+			uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
+
+			pack = bitmap_git->midx->packs[pack_id];
+			ofs = nth_midxed_offset(bitmap_git->midx, midx_pos);
+		} else {
+			pack = bitmap_git->pack;
+			ofs = pack_pos_to_offset(pack, pos);
+		}
+
 		if (packed_object_info(the_repository, pack, ofs, &oi) < 0) {
 			struct object_id oid;
 			nth_bitmap_object_oid(bitmap_git, &oid,
@@ -1053,7 +1238,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	/* try to open a bitmapped pack, but don't parse it yet
 	 * because we may not need to use it */
 	CALLOC_ARRAY(bitmap_git, 1);
-	if (open_pack_bitmap(revs->repo, bitmap_git) < 0)
+	if (open_bitmap(revs->repo, bitmap_git) < 0)
 		goto cleanup;
 
 	for (i = 0; i < revs->pending.nr; ++i) {
@@ -1097,7 +1282,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	 * from disk. this is the point of no return; after this the rev_list
 	 * becomes invalidated and we must perform the revwalk through bitmaps
 	 */
-	if (load_pack_bitmap(bitmap_git) < 0)
+	if (load_bitmap(bitmap_git) < 0)
 		goto cleanup;
 
 	object_array_clear(&revs->pending);
@@ -1145,19 +1330,43 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
  * reused, but you can keep feeding bits.
  */
 static int try_partial_reuse(struct bitmap_index *bitmap_git,
+			     struct packed_git *pack,
 			     size_t pos,
 			     struct bitmap *reuse,
 			     struct pack_window **w_curs)
 {
-	off_t offset, header;
+	off_t offset, delta_obj_offset;
 	enum object_type type;
 	unsigned long size;
 
-	if (pos >= bitmap_num_objects(bitmap_git))
-		return -1; /* not actually in the pack or MIDX */
+	/*
+	 * try_partial_reuse() is called either on (a) objects in the
+	 * bitmapped pack (in the case of a single-pack bitmap) or (b)
+	 * objects in the preferred pack of a multi-pack bitmap.
+	 * Importantly, the latter can pretend as if only a single pack
+	 * exists because:
+	 *
+	 *   - The first pack->num_objects bits of a MIDX bitmap are
+	 *     reserved for the preferred pack, and
+	 *
+	 *   - Ties due to duplicate objects are always resolved in
+	 *     favor of the preferred pack.
+	 *
+	 * Therefore we do not need to ever ask the MIDX for its copy of
+	 * an object by OID, since it will always select it from the
+	 * preferred pack. Likewise, the selected copy of the base
+	 * object for any deltas will reside in the same pack.
+	 *
+	 * This means that we can reuse pos when looking up the bit in
+	 * the reuse bitmap, too, since bits corresponding to the
+	 * preferred pack precede all bits from other packs.
+	 */
 
-	offset = header = pack_pos_to_offset(bitmap_git->pack, pos);
-	type = unpack_object_header(bitmap_git->pack, w_curs, &offset, &size);
+	if (pos >= pack->num_objects)
+		return -1; /* not actually in the pack or MIDX preferred pack */
+
+	offset = delta_obj_offset = pack_pos_to_offset(pack, pos);
+	type = unpack_object_header(pack, w_curs, &offset, &size);
 	if (type < 0)
 		return -1; /* broken packfile, punt */
 
@@ -1173,11 +1382,11 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
 		 * and the normal slow path will complain about it in
 		 * more detail.
 		 */
-		base_offset = get_delta_base(bitmap_git->pack, w_curs,
-					     &offset, type, header);
+		base_offset = get_delta_base(pack, w_curs, &offset, type,
+					     delta_obj_offset);
 		if (!base_offset)
 			return 0;
-		if (offset_to_pack_pos(bitmap_git->pack, base_offset, &base_pos) < 0)
+		if (offset_to_pack_pos(pack, base_offset, &base_pos) < 0)
 			return 0;
 
 		/*
@@ -1211,24 +1420,48 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
 	return 0;
 }
 
+static uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git)
+{
+	struct multi_pack_index *m = bitmap_git->midx;
+	if (!m)
+		BUG("midx_preferred_pack: requires non-empty MIDX");
+	return nth_midxed_pack_int_id(m, pack_pos_to_midx(bitmap_git->midx, 0));
+}
+
 int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 				       struct packed_git **packfile_out,
 				       uint32_t *entries,
 				       struct bitmap **reuse_out)
 {
+	struct packed_git *pack;
 	struct bitmap *result = bitmap_git->result;
 	struct bitmap *reuse;
 	struct pack_window *w_curs = NULL;
 	size_t i = 0;
 	uint32_t offset;
-	uint32_t objects_nr = bitmap_num_objects(bitmap_git);
+	uint32_t objects_nr;
 
 	assert(result);
 
+	load_reverse_index(bitmap_git);
+
+	if (bitmap_is_midx(bitmap_git))
+		pack = bitmap_git->midx->packs[midx_preferred_pack(bitmap_git)];
+	else
+		pack = bitmap_git->pack;
+	objects_nr = pack->num_objects;
+
 	while (i < result->word_alloc && result->words[i] == (eword_t)~0)
 		i++;
 
-	/* Don't mark objects not in the packfile */
+	/*
+	 * Don't mark objects not in the packfile or preferred pack. This bitmap
+	 * marks objects eligible for reuse, but the pack-reuse code only
+	 * understands how to reuse a single pack. Since the preferred pack is
+	 * guaranteed to have all bases for its deltas (in a multi-pack bitmap),
+	 * we use it instead of another pack. In single-pack bitmaps, the choice
+	 * is made for us.
+	 */
 	if (i > objects_nr / BITS_IN_EWORD)
 		i = objects_nr / BITS_IN_EWORD;
 
@@ -1244,8 +1477,8 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 				break;
 
 			offset += ewah_bit_ctz64(word >> offset);
-			if (try_partial_reuse(bitmap_git, pos + offset, reuse,
-					      &w_curs) < 0) {
+			if (try_partial_reuse(bitmap_git, pack, pos + offset,
+					      reuse, &w_curs) < 0) {
 				/*
 				 * try_partial_reuse indicated we couldn't reuse
 				 * any bits, so there is no point in trying more
@@ -1274,7 +1507,7 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 	 * need to be handled separately.
 	 */
 	bitmap_and_not(result, reuse);
-	*packfile_out = bitmap_git->pack;
+	*packfile_out = pack;
 	*reuse_out = reuse;
 	return 0;
 }
@@ -1548,6 +1781,12 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
 	uint32_t i, num_objects;
 	uint32_t *reposition;
 
+	if (!bitmap_is_midx(bitmap_git))
+		load_reverse_index(bitmap_git);
+	else if (load_midx_revindex(bitmap_git->midx) < 0)
+		BUG("create_bitmap_mapping: missing required rev-cache "
+		    "extension");
+
 	num_objects = bitmap_num_objects(bitmap_git);
 	CALLOC_ARRAY(reposition, num_objects);
 
@@ -1555,8 +1794,13 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
 		struct object_id oid;
 		struct object_entry *oe;
 
-		nth_packed_object_id(&oid, bitmap_git->pack,
-				     pack_pos_to_index(bitmap_git->pack, i));
+		if (bitmap_is_midx(bitmap_git))
+			nth_midxed_object_oid(&oid,
+					      bitmap_git->midx,
+					      pack_pos_to_midx(bitmap_git->midx, i));
+		else
+			nth_packed_object_id(&oid, bitmap_git->pack,
+					     pack_pos_to_index(bitmap_git->pack, i));
 		oe = packlist_find(mapping, &oid);
 
 		if (oe)
@@ -1582,6 +1826,19 @@ void free_bitmap_index(struct bitmap_index *b)
 	free(b->ext_index.hashes);
 	bitmap_free(b->result);
 	bitmap_free(b->haves);
+	if (bitmap_is_midx(b)) {
+		/*
+		 * Multi-pack bitmaps need to have resources associated with
+		 * their on-disk reverse indexes unmapped so that stale .rev and
+		 * .bitmap files can be removed.
+		 *
+		 * Unlike pack-based bitmaps, multi-pack bitmaps can be read and
+		 * written in the same 'git multi-pack-index write --bitmap'
+		 * process. Close resources so they can be removed safely on
+		 * platforms like Windows.
+		 */
+		close_midx_revindex(b->midx);
+	}
 	free(b);
 }
 
@@ -1596,7 +1853,6 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
 				     enum object_type object_type)
 {
 	struct bitmap *result = bitmap_git->result;
-	struct packed_git *pack = bitmap_git->pack;
 	off_t total = 0;
 	struct ewah_iterator it;
 	eword_t filter;
@@ -1613,15 +1869,35 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
 			continue;
 
 		for (offset = 0; offset < BITS_IN_EWORD; offset++) {
-			size_t pos;
-
 			if ((word >> offset) == 0)
 				break;
 
 			offset += ewah_bit_ctz64(word >> offset);
-			pos = base + offset;
-			total += pack_pos_to_offset(pack, pos + 1) -
-				 pack_pos_to_offset(pack, pos);
+
+			if (bitmap_is_midx(bitmap_git)) {
+				uint32_t pack_pos;
+				uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, base + offset);
+				off_t offset = nth_midxed_offset(bitmap_git->midx, midx_pos);
+
+				uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
+				struct packed_git *pack = bitmap_git->midx->packs[pack_id];
+
+				if (offset_to_pack_pos(pack, offset, &pack_pos) < 0) {
+					struct object_id oid;
+					nth_midxed_object_oid(&oid, bitmap_git->midx, midx_pos);
+
+					die(_("could not find %s in pack %s at offset %"PRIuMAX),
+					    oid_to_hex(&oid),
+					    pack->pack_name,
+					    (uintmax_t)offset);
+				}
+
+				total += pack_pos_to_offset(pack, pack_pos + 1) - offset;
+			} else {
+				size_t pos = base + offset;
+				total += pack_pos_to_offset(bitmap_git->pack, pos + 1) -
+					 pack_pos_to_offset(bitmap_git->pack, pos);
+			}
 		}
 	}
 
@@ -1672,6 +1948,11 @@ off_t get_disk_usage_from_bitmap(struct bitmap_index *bitmap_git,
 	return total;
 }
 
+int bitmap_is_midx(struct bitmap_index *bitmap_git)
+{
+	return !!bitmap_git->midx;
+}
+
 const struct string_list *bitmap_preferred_tips(struct repository *r)
 {
 	return repo_config_get_value_multi(r, "pack.preferbitmaptips");
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 52ea10de51..81664f933f 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -44,6 +44,8 @@ typedef int (*show_reachable_fn)(
 struct bitmap_index;
 
 struct bitmap_index *prepare_bitmap_git(struct repository *r);
+struct bitmap_index *prepare_midx_bitmap_git(struct repository *r,
+					     struct multi_pack_index *midx);
 void count_bitmap_commit_list(struct bitmap_index *, uint32_t *commits,
 			      uint32_t *trees, uint32_t *blobs, uint32_t *tags);
 void traverse_bitmap_commit_list(struct bitmap_index *,
@@ -92,6 +94,10 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 			  uint32_t index_nr,
 			  const char *filename,
 			  uint16_t options);
+char *midx_bitmap_filename(struct multi_pack_index *midx);
+char *pack_bitmap_filename(struct packed_git *p);
+
+int bitmap_is_midx(struct bitmap_index *bitmap_git);
 
 const struct string_list *bitmap_preferred_tips(struct repository *r);
 int bitmap_is_preferred_refname(struct repository *r, const char *refname);
diff --git a/packfile.c b/packfile.c
index 9ef6d98292..371f5488cf 100644
--- a/packfile.c
+++ b/packfile.c
@@ -860,7 +860,7 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
 	if (!strcmp(file_name, "multi-pack-index"))
 		return;
 	if (starts_with(file_name, "multi-pack-index") &&
-	    ends_with(file_name, ".rev"))
+	    (ends_with(file_name, ".bitmap") || ends_with(file_name, ".rev")))
 		return;
 	if (ends_with(file_name, ".idx") ||
 	    ends_with(file_name, ".rev") ||
-- 
2.33.0.96.g73915697e6
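
The loops in show_objects_for_type() and get_disk_usage_for_type()
visit only the set bits of each bitmap word by jumping ahead with
ewah_bit_ctz64(). A minimal standalone sketch of that idiom follows.
Every name prefixed with "sketch_" is hypothetical and not part of
git, and the portable count-trailing-zeros helper only stands in for
ewah_bit_ctz64().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BITS_IN_WORD 64

/* Portable count-trailing-zeros; the caller must pass a non-zero value. */
static unsigned sketch_ctz64(uint64_t x)
{
	unsigned n = 0;

	assert(x != 0);
	while (!(x & 1)) {
		x >>= 1;
		n++;
	}
	return n;
}

/*
 * Append the bit position of every set bit in 'words' to 'out' and
 * return how many positions were written. Bit 'b' of word 'i' has
 * position i * 64 + b.
 */
static size_t sketch_collect_set_bits(const uint64_t *words, size_t nr_words,
				      uint32_t *out, size_t out_cap)
{
	size_t i, n = 0;

	for (i = 0; i < nr_words; i++) {
		uint64_t word = words[i];
		unsigned offset;

		for (offset = 0; offset < SKETCH_BITS_IN_WORD; offset++) {
			/* No set bits at or above 'offset': next word. */
			if ((word >> offset) == 0)
				break;

			/* Skip directly to the next set bit. */
			offset += sketch_ctz64(word >> offset);

			assert(n < out_cap);
			out[n++] = (uint32_t)(i * SKETCH_BITS_IN_WORD + offset);
		}
	}
	return n;
}
```

In the patch, each position found this way is then translated either
through the pack's reverse index or through the MIDX's .rev file,
depending on bitmap_is_midx().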



* [PATCH v5 17/27] pack-bitmap: write multi-pack bitmaps
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
                     ` (15 preceding siblings ...)
  2021-08-31 20:52   ` [PATCH v5 16/27] pack-bitmap: read multi-pack bitmaps Taylor Blau
@ 2021-08-31 20:52   ` Taylor Blau
  2021-08-31 20:52   ` [PATCH v5 18/27] t5310: move some tests to lib-bitmap.sh Taylor Blau
                     ` (11 subsequent siblings)
  28 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:52 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Write multi-pack bitmaps in the format described by
Documentation/technical/bitmap-format.txt when '--bitmap' is given; in
its absence, no multi-pack bitmap is written.

To write a multi-pack bitmap, this patch attempts to reuse as much of
the existing machinery from pack-objects as possible. Specifically, the
MIDX code prepares a packing_data struct that pretends as if a single
packfile has been generated containing all of the objects contained
within the MIDX.
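
As a rough illustration of the "pretend pack" ordering, the sketch
below sorts a simplified entry list so that objects from the preferred
pack come first. That much follows from the comments in this series;
ordering the remaining objects by pack ID and then by offset is an
assumption made for the sketch (see midx_pack_order_cmp() for the real
definition). All "sketch_" names and the struct are hypothetical and
not git's own.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for one MIDX entry. */
struct sketch_entry {
	uint32_t pack_id; /* pack holding the selected copy of the object */
	uint64_t offset;  /* byte offset of the object within that pack */
};

/* qsort() has no context argument, so the sketch keeps this in a static. */
static uint32_t sketch_preferred_pack;

static int sketch_order_cmp(const void *va, const void *vb)
{
	const struct sketch_entry *a = va;
	const struct sketch_entry *b = vb;
	int a_pref = a->pack_id == sketch_preferred_pack;
	int b_pref = b->pack_id == sketch_preferred_pack;

	/* Objects from the preferred pack sort before all others. */
	if (a_pref != b_pref)
		return a_pref ? -1 : 1;
	/* Assumed tie-breakers: pack ID, then offset within the pack. */
	if (a->pack_id != b->pack_id)
		return a->pack_id < b->pack_id ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Reorder 'entries' in place; index i afterwards is bit position i. */
static void sketch_pseudo_pack_order(struct sketch_entry *entries, size_t nr,
				     uint32_t preferred_pack)
{
	sketch_preferred_pack = preferred_pack;
	qsort(entries, nr, sizeof(*entries), sketch_order_cmp);
}
```

With this ordering, the first bits of the bitmap all belong to the
preferred pack, which is the property the pack-reuse code relies on.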

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.txt |  12 +-
 builtin/multi-pack-index.c             |   2 +
 midx.c                                 | 209 ++++++++++++++++++++++++-
 midx.h                                 |   1 +
 4 files changed, 215 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index 0af6beb2dd..a9df3dbd32 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -10,7 +10,7 @@ SYNOPSIS
 --------
 [verse]
 'git multi-pack-index' [--object-dir=<dir>] [--[no-]progress]
-	[--preferred-pack=<pack>] <subcommand>
+	[--preferred-pack=<pack>] [--[no-]bitmap] <subcommand>
 
 DESCRIPTION
 -----------
@@ -42,6 +42,9 @@ write::
 		multiple packs contain the same object. `<pack>` must
 		contain at least one object. If not given, ties are
 		broken in favor of the pack with the lowest mtime.
+
+	--[no-]bitmap::
+		Control whether or not a multi-pack bitmap is written.
 --
 
 verify::
@@ -83,6 +86,13 @@ EXAMPLES
 $ git multi-pack-index write
 -----------------------------------------------
 
+* Write a MIDX file for the packfiles in the current .git folder with a
+corresponding bitmap.
++
+-------------------------------------------------------------
+$ git multi-pack-index write --preferred-pack=<pack> --bitmap
+-------------------------------------------------------------
+
 * Write a MIDX file for the packfiles in an alternate object store.
 +
 -----------------------------------------------
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 8ff0dee2ec..73c0113b48 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -68,6 +68,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
 		OPT_STRING(0, "preferred-pack", &opts.preferred_pack,
 			   N_("preferred-pack"),
 			   N_("pack for reuse when computing a multi-pack bitmap")),
+		OPT_BIT(0, "bitmap", &opts.flags, N_("write multi-pack bitmap"),
+			MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
 		OPT_END(),
 	};
 
diff --git a/midx.c b/midx.c
index 6a10f7a042..284221ae62 100644
--- a/midx.c
+++ b/midx.c
@@ -13,6 +13,10 @@
 #include "repository.h"
 #include "chunk-format.h"
 #include "pack.h"
+#include "pack-bitmap.h"
+#include "refs.h"
+#include "revision.h"
+#include "list-objects.h"
 
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
 #define MIDX_VERSION 1
@@ -893,6 +897,166 @@ static int midx_checksum_valid(struct multi_pack_index *m)
 	return hashfile_checksum_valid(m->data, m->data_len);
 }
 
+static void prepare_midx_packing_data(struct packing_data *pdata,
+				      struct write_midx_context *ctx)
+{
+	uint32_t i;
+
+	memset(pdata, 0, sizeof(struct packing_data));
+	prepare_packing_data(the_repository, pdata);
+
+	for (i = 0; i < ctx->entries_nr; i++) {
+		struct pack_midx_entry *from = &ctx->entries[ctx->pack_order[i]];
+		struct object_entry *to = packlist_alloc(pdata, &from->oid);
+
+		oe_set_in_pack(pdata, to,
+			       ctx->info[ctx->pack_perm[from->pack_int_id]].p);
+	}
+}
+
+static int add_ref_to_pending(const char *refname,
+			      const struct object_id *oid,
+			      int flag, void *cb_data)
+{
+	struct rev_info *revs = (struct rev_info*)cb_data;
+	struct object *object;
+
+	if ((flag & REF_ISSYMREF) && (flag & REF_ISBROKEN)) {
+		warning("symbolic ref is dangling: %s", refname);
+		return 0;
+	}
+
+	object = parse_object_or_die(oid, refname);
+	if (object->type != OBJ_COMMIT)
+		return 0;
+
+	add_pending_object(revs, object, "");
+	if (bitmap_is_preferred_refname(revs->repo, refname))
+		object->flags |= NEEDS_BITMAP;
+	return 0;
+}
+
+struct bitmap_commit_cb {
+	struct commit **commits;
+	size_t commits_nr, commits_alloc;
+
+	struct write_midx_context *ctx;
+};
+
+static const struct object_id *bitmap_oid_access(size_t index,
+						 const void *_entries)
+{
+	const struct pack_midx_entry *entries = _entries;
+	return &entries[index].oid;
+}
+
+static void bitmap_show_commit(struct commit *commit, void *_data)
+{
+	struct bitmap_commit_cb *data = _data;
+	int pos = oid_pos(&commit->object.oid, data->ctx->entries,
+			  data->ctx->entries_nr,
+			  bitmap_oid_access);
+	if (pos < 0)
+		return;
+
+	ALLOC_GROW(data->commits, data->commits_nr + 1, data->commits_alloc);
+	data->commits[data->commits_nr++] = commit;
+}
+
+static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr_p,
+						    struct write_midx_context *ctx)
+{
+	struct rev_info revs;
+	struct bitmap_commit_cb cb = {0};
+
+	cb.ctx = ctx;
+
+	repo_init_revisions(the_repository, &revs, NULL);
+	setup_revisions(0, NULL, &revs, NULL);
+	for_each_ref(add_ref_to_pending, &revs);
+
+	/*
+	 * Skipping promisor objects here is intentional, since it only excludes
+	 * them from the list of reachable commits that we want to select from
+	 * when computing the selection of MIDX'd commits to receive bitmaps.
+	 *
+	 * Reachability bitmaps do require that their objects be closed under
+	 * reachability, but fetching any objects missing from promisors at this
+	 * point is too late. But, if one of those objects can be reached from
+	 * another object that is included in the bitmap, then we will
+	 * complain later that we don't have reachability closure (and fail
+	 * appropriately).
+	 */
+	fetch_if_missing = 0;
+	revs.exclude_promisor_objects = 1;
+
+	if (prepare_revision_walk(&revs))
+		die(_("revision walk setup failed"));
+
+	traverse_commit_list(&revs, bitmap_show_commit, NULL, &cb);
+	if (indexed_commits_nr_p)
+		*indexed_commits_nr_p = cb.commits_nr;
+
+	return cb.commits;
+}
+
+static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
+			     struct write_midx_context *ctx,
+			     unsigned flags)
+{
+	struct packing_data pdata;
+	struct pack_idx_entry **index;
+	struct commit **commits = NULL;
+	uint32_t i, commits_nr;
+	char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash));
+	int ret;
+
+	prepare_midx_packing_data(&pdata, ctx);
+
+	commits = find_commits_for_midx_bitmap(&commits_nr, ctx);
+
+	/*
+	 * Build the MIDX-order index based on pdata.objects (which is already
+	 * in MIDX order; see 'midx_pack_order_cmp()' for the definition of
+	 * this order).
+	 */
+	ALLOC_ARRAY(index, pdata.nr_objects);
+	for (i = 0; i < pdata.nr_objects; i++)
+		index[i] = &pdata.objects[i].idx;
+
+	bitmap_writer_show_progress(flags & MIDX_PROGRESS);
+	bitmap_writer_build_type_index(&pdata, index, pdata.nr_objects);
+
+	/*
+	 * bitmap_writer_finish() expects objects in lex order, and pack_order
+	 * gives us exactly that. Use it directly instead of re-sorting the
+	 * array.
+	 *
+	 * This changes the order of objects in 'index' between
+	 * bitmap_writer_build_type_index and bitmap_writer_finish.
+	 *
+	 * The same re-ordering takes place in the single-pack bitmap code via
+	 * write_idx_file(), which is called by finish_tmp_packfile(), which
+	 * happens between bitmap_writer_build_type_index() and
+	 * bitmap_writer_finish().
+	 */
+	for (i = 0; i < pdata.nr_objects; i++)
+		index[ctx->pack_order[i]] = &pdata.objects[i].idx;
+
+	bitmap_writer_select_commits(commits, commits_nr, -1);
+	ret = bitmap_writer_build(&pdata);
+	if (ret < 0)
+		goto cleanup;
+
+	bitmap_writer_set_checksum(midx_hash);
+	bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, 0);
+
+cleanup:
+	free(index);
+	free(bitmap_name);
+	return ret;
+}
+
 static int write_midx_internal(const char *object_dir,
 			       struct string_list *packs_to_drop,
 			       const char *preferred_pack_name,
@@ -938,7 +1102,7 @@ static int write_midx_internal(const char *object_dir,
 
 			ctx.info[ctx.nr].orig_pack_int_id = i;
 			ctx.info[ctx.nr].pack_name = xstrdup(ctx.m->pack_names[i]);
-			ctx.info[ctx.nr].p = NULL;
+			ctx.info[ctx.nr].p = ctx.m->packs[i];
 			ctx.info[ctx.nr].expired = 0;
 
 			if (flags & MIDX_WRITE_REV_INDEX) {
@@ -972,8 +1136,26 @@ static int write_midx_internal(const char *object_dir,
 	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
 	stop_progress(&ctx.progress);
 
-	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
-		goto cleanup;
+	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop) {
+		struct bitmap_index *bitmap_git;
+		int bitmap_exists;
+		int want_bitmap = flags & MIDX_WRITE_BITMAP;
+
+		bitmap_git = prepare_midx_bitmap_git(the_repository, ctx.m);
+		bitmap_exists = bitmap_git && bitmap_is_midx(bitmap_git);
+		free_bitmap_index(bitmap_git);
+
+		if (bitmap_exists || !want_bitmap) {
+			/*
+			 * The correct MIDX already exists, and so does a
+			 * corresponding bitmap (or one wasn't requested).
+			 */
+			if (!want_bitmap)
+				clear_midx_files_ext(object_dir, ".bitmap",
+						     NULL);
+			goto cleanup;
+		}
+	}
 
 	if (preferred_pack_name) {
 		int found = 0;
@@ -989,7 +1171,8 @@ static int write_midx_internal(const char *object_dir,
 		if (!found)
 			warning(_("unknown preferred pack: '%s'"),
 				preferred_pack_name);
-	} else if (ctx.nr && (flags & MIDX_WRITE_REV_INDEX)) {
+	} else if (ctx.nr &&
+		   (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))) {
 		struct packed_git *oldest = ctx.info[ctx.preferred_pack_idx].p;
 		ctx.preferred_pack_idx = 0;
 
@@ -1121,9 +1304,6 @@ static int write_midx_internal(const char *object_dir,
 	hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR);
 	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
 
-	if (ctx.m)
-		close_object_store(the_repository->objects);
-
 	if (ctx.nr - dropped_packs == 0) {
 		error(_("no pack files to index."));
 		result = 1;
@@ -1154,14 +1334,25 @@ static int write_midx_internal(const char *object_dir,
 	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
 	free_chunkfile(cf);
 
-	if (flags & MIDX_WRITE_REV_INDEX)
+	if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))
 		ctx.pack_order = midx_pack_order(&ctx);
 
 	if (flags & MIDX_WRITE_REV_INDEX)
 		write_midx_reverse_index(midx_name, midx_hash, &ctx);
+	if (flags & MIDX_WRITE_BITMAP) {
+		if (write_midx_bitmap(midx_name, midx_hash, &ctx, flags) < 0) {
+			error(_("could not write multi-pack bitmap"));
+			result = 1;
+			goto cleanup;
+		}
+	}
+
+	if (ctx.m)
+		close_object_store(the_repository->objects);
 
 	commit_lock_file(&lk);
 
+	clear_midx_files_ext(object_dir, ".bitmap", midx_hash);
 	clear_midx_files_ext(object_dir, ".rev", midx_hash);
 
 cleanup:
@@ -1178,6 +1369,7 @@ static int write_midx_internal(const char *object_dir,
 	free(ctx.pack_perm);
 	free(ctx.pack_order);
 	free(midx_name);
+
 	return result;
 }
 
@@ -1238,6 +1430,7 @@ void clear_midx_file(struct repository *r)
 	if (remove_path(midx))
 		die(_("failed to clear multi-pack-index at %s"), midx);
 
+	clear_midx_files_ext(r->objects->odb->path, ".bitmap", NULL);
 	clear_midx_files_ext(r->objects->odb->path, ".rev", NULL);
 
 	free(midx);
diff --git a/midx.h b/midx.h
index 1172df1a71..350f4d0a7b 100644
--- a/midx.h
+++ b/midx.h
@@ -41,6 +41,7 @@ struct multi_pack_index {
 
 #define MIDX_PROGRESS     (1 << 0)
 #define MIDX_WRITE_REV_INDEX (1 << 1)
+#define MIDX_WRITE_BITMAP (1 << 2)
 
 const unsigned char *get_midx_checksum(struct multi_pack_index *m);
 char *get_midx_filename(const char *object_dir);
-- 
2.33.0.96.g73915697e6
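
The re-ordering step in write_midx_bitmap() can be read as a scatter
through a permutation: pack_order[i] names the lexical (OID-sorted)
slot of the object at pseudo-pack position i, so writing element i to
slot pack_order[i] yields a lex-ordered array without sorting. A
minimal sketch of that idea, using plain ints as payload; the function
name and parameter names are hypothetical, not git's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * 'by_pack_pos' holds one payload per object in pseudo-pack order.
 * 'pack_order[i]' is the lexical index of the object at pseudo-pack
 * position i. On return, 'by_lex_pos' holds the same payloads in
 * lexical order. 'pack_order' must be a permutation of 0..nr-1.
 */
static void sketch_scatter_to_lex_order(const int *by_pack_pos,
					const uint32_t *pack_order,
					size_t nr, int *by_lex_pos)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		assert(pack_order[i] < nr);
		by_lex_pos[pack_order[i]] = by_pack_pos[i];
	}
}
```

Because the permutation is already known, the scatter runs in linear
time, whereas re-sorting by OID would need O(n log n) comparisons.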



* [PATCH v5 18/27] t5310: move some tests to lib-bitmap.sh
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
                     ` (16 preceding siblings ...)
  2021-08-31 20:52   ` [PATCH v5 17/27] pack-bitmap: write " Taylor Blau
@ 2021-08-31 20:52   ` Taylor Blau
  2021-08-31 20:52   ` [PATCH v5 19/27] t/helper/test-read-midx.c: add --checksum mode Taylor Blau
                     ` (10 subsequent siblings)
  28 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:52 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

We'll soon be adding a test script that will cover many of the same
bitmap concepts as t5310, but for MIDX bitmaps. Let's pull out as many
of the applicable tests as we can so we don't have to rewrite them.

There should be no functional change to t5310; we still run the same
operations in the same order.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/lib-bitmap.sh         | 236 ++++++++++++++++++++++++++++++++++++++++
 t/t5310-pack-bitmaps.sh | 227 +-------------------------------------
 2 files changed, 240 insertions(+), 223 deletions(-)

diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index fe3f98be24..77464da6fd 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -1,3 +1,6 @@
+# Helpers for scripts testing bitmap functionality; see t5310 for
+# example usage.
+
 # Compare a file containing rev-list bitmap traversal output to its non-bitmap
 # counterpart. You can't just use test_cmp for this, because the two produce
 # subtly different output:
@@ -24,3 +27,236 @@ test_bitmap_traversal () {
 	test_cmp "$1.normalized" "$2.normalized" &&
 	rm -f "$1.normalized" "$2.normalized"
 }
+
+# To ensure the logic for "maximal commits" is exercised, make
+# the repository a bit more complicated.
+#
+#    other                         second
+#      *                             *
+# (99 commits)                  (99 commits)
+#      *                             *
+#      |\                           /|
+#      | * octo-other  octo-second * |
+#      |/|\_________  ____________/|\|
+#      | \          \/  __________/  |
+#      |  | ________/\ /             |
+#      *  |/          * merge-right  *
+#      | _|__________/ \____________ |
+#      |/ |                         \|
+# (l1) *  * merge-left               * (r1)
+#      | / \________________________ |
+#      |/                           \|
+# (l2) *                             * (r2)
+#       \___________________________ |
+#                                   \|
+#                                    * (base)
+#
+# We only push bits down the first-parent history, which
+# makes some of these commits unimportant!
+#
+# The important part for the maximal commit algorithm is how
+# the bitmasks are extended. Assuming starting bit positions
+# for second (bit 0) and other (bit 1), the bitmasks at the
+# end should be:
+#
+#      second: 1       (maximal, selected)
+#       other: 01      (maximal, selected)
+#      (base): 11 (maximal)
+#
+# This complicated history was important for a previous
+# version of the walk that guarantees never walking a
+# commit multiple times. That goal might be important
+# again, so preserve this complicated case. For now, this
+# test will guarantee that the bitmaps are computed
+# correctly, even with the repeat calculations.
+setup_bitmap_history() {
+	test_expect_success 'setup repo with moderate-sized history' '
+		test_commit_bulk --id=file 10 &&
+		git branch -M second &&
+		git checkout -b other HEAD~5 &&
+		test_commit_bulk --id=side 10 &&
+
+		# add complicated history setup, including merges and
+		# ambiguous merge-bases
+
+		git checkout -b merge-left other~2 &&
+		git merge second~2 -m "merge-left" &&
+
+		git checkout -b merge-right second~1 &&
+		git merge other~1 -m "merge-right" &&
+
+		git checkout -b octo-second second &&
+		git merge merge-left merge-right -m "octopus-second" &&
+
+		git checkout -b octo-other other &&
+		git merge merge-left merge-right -m "octopus-other" &&
+
+		git checkout other &&
+		git merge octo-other -m "pull octopus" &&
+
+		git checkout second &&
+		git merge octo-second -m "pull octopus" &&
+
+		# Remove these branches so they are not selected
+		# as bitmap tips
+		git branch -D merge-left &&
+		git branch -D merge-right &&
+		git branch -D octo-other &&
+		git branch -D octo-second &&
+
+		# add padding to make these merges less interesting
+		# and avoid having them selected for bitmaps
+		test_commit_bulk --id=file 100 &&
+		git checkout other &&
+		test_commit_bulk --id=side 100 &&
+		git checkout second &&
+
+		bitmaptip=$(git rev-parse second) &&
+		blob=$(echo tagged-blob | git hash-object -w --stdin) &&
+		git tag tagged-blob $blob
+	'
+}
+
+rev_list_tests_head () {
+	test_expect_success "counting commits via bitmap ($state, $branch)" '
+		git rev-list --count $branch >expect &&
+		git rev-list --use-bitmap-index --count $branch >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting partial commits via bitmap ($state, $branch)" '
+		git rev-list --count $branch~5..$branch >expect &&
+		git rev-list --use-bitmap-index --count $branch~5..$branch >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting commits with limit ($state, $branch)" '
+		git rev-list --count -n 1 $branch >expect &&
+		git rev-list --use-bitmap-index --count -n 1 $branch >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting non-linear history ($state, $branch)" '
+		git rev-list --count other...second >expect &&
+		git rev-list --use-bitmap-index --count other...second >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting commits with limiting ($state, $branch)" '
+		git rev-list --count $branch -- 1.t >expect &&
+		git rev-list --use-bitmap-index --count $branch -- 1.t >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting objects via bitmap ($state, $branch)" '
+		git rev-list --count --objects $branch >expect &&
+		git rev-list --use-bitmap-index --count --objects $branch >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "enumerate commits ($state, $branch)" '
+		git rev-list --use-bitmap-index $branch >actual &&
+		git rev-list $branch >expect &&
+		test_bitmap_traversal --no-confirm-bitmaps expect actual
+	'
+
+	test_expect_success "enumerate --objects ($state, $branch)" '
+		git rev-list --objects --use-bitmap-index $branch >actual &&
+		git rev-list --objects $branch >expect &&
+		test_bitmap_traversal expect actual
+	'
+
+	test_expect_success "bitmap --objects handles non-commit objects ($state, $branch)" '
+		git rev-list --objects --use-bitmap-index $branch tagged-blob >actual &&
+		grep $blob actual
+	'
+}
+
+rev_list_tests () {
+	state=$1
+
+	for branch in "second" "other"
+	do
+		rev_list_tests_head
+	done
+}
+
+basic_bitmap_tests () {
+	tip="$1"
+	test_expect_success 'rev-list --test-bitmap verifies bitmaps' "
+		git rev-list --test-bitmap \"${tip:-HEAD}\"
+	"
+
+	rev_list_tests 'full bitmap'
+
+	test_expect_success 'clone from bitmapped repository' '
+		rm -fr clone.git &&
+		git clone --no-local --bare . clone.git &&
+		git rev-parse HEAD >expect &&
+		git --git-dir=clone.git rev-parse HEAD >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success 'partial clone from bitmapped repository' '
+		test_config uploadpack.allowfilter true &&
+		rm -fr partial-clone.git &&
+		git clone --no-local --bare --filter=blob:none . partial-clone.git &&
+		(
+			cd partial-clone.git &&
+			pack=$(echo objects/pack/*.pack) &&
+			git verify-pack -v "$pack" >have &&
+			awk "/blob/ { print \$1 }" <have >blobs &&
+			# we expect this single blob because of the direct ref
+			git rev-parse refs/tags/tagged-blob >expect &&
+			test_cmp expect blobs
+		)
+	'
+
+	test_expect_success 'setup further non-bitmapped commits' '
+		test_commit_bulk --id=further 10
+	'
+
+	rev_list_tests 'partial bitmap'
+
+	test_expect_success 'fetch (partial bitmap)' '
+		git --git-dir=clone.git fetch origin second:second &&
+		git rev-parse HEAD >expect &&
+		git --git-dir=clone.git rev-parse HEAD >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success 'enumerating progress counts pack-reused objects' '
+		count=$(git rev-list --objects --all --count) &&
+		git repack -adb &&
+
+		# check first with only reused objects; confirm that our
+		# progress showed the right number, and also that we did
+		# pack-reuse as expected.  Check only the final "done"
+		# line of the meter (there may be an arbitrary number of
+		# intermediate lines ending with CR).
+		GIT_PROGRESS_DELAY=0 \
+			git pack-objects --all --stdout --progress \
+			</dev/null >/dev/null 2>stderr &&
+		grep "Enumerating objects: $count, done" stderr &&
+		grep "pack-reused $count" stderr &&
+
+		# now the same but with one non-reused object
+		git commit --allow-empty -m "an extra commit object" &&
+		GIT_PROGRESS_DELAY=0 \
+			git pack-objects --all --stdout --progress \
+			</dev/null >/dev/null 2>stderr &&
+		grep "Enumerating objects: $((count+1)), done" stderr &&
+		grep "pack-reused $count" stderr
+	'
+}
+
+# have_delta <obj> <expected_base>
+#
+# Note that because this relies on cat-file, it might find _any_ copy of an
+# object in the repository. The caller is responsible for making sure
+# there's only one (e.g., via "repack -ad", or having just fetched a copy).
+have_delta () {
+	echo $2 >expect &&
+	echo $1 | git cat-file --batch-check="%(deltabase)" >actual &&
+	test_cmp expect actual
+}
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index b02838750e..4318f84d53 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -25,93 +25,10 @@ has_any () {
 	grep -Ff "$1" "$2"
 }
 
-# To ensure the logic for "maximal commits" is exercised, make
-# the repository a bit more complicated.
-#
-#    other                         second
-#      *                             *
-# (99 commits)                  (99 commits)
-#      *                             *
-#      |\                           /|
-#      | * octo-other  octo-second * |
-#      |/|\_________  ____________/|\|
-#      | \          \/  __________/  |
-#      |  | ________/\ /             |
-#      *  |/          * merge-right  *
-#      | _|__________/ \____________ |
-#      |/ |                         \|
-# (l1) *  * merge-left               * (r1)
-#      | / \________________________ |
-#      |/                           \|
-# (l2) *                             * (r2)
-#       \___________________________ |
-#                                   \|
-#                                    * (base)
-#
-# We only push bits down the first-parent history, which
-# makes some of these commits unimportant!
-#
-# The important part for the maximal commit algorithm is how
-# the bitmasks are extended. Assuming starting bit positions
-# for second (bit 0) and other (bit 1), the bitmasks at the
-# end should be:
-#
-#      second: 1       (maximal, selected)
-#       other: 01      (maximal, selected)
-#      (base): 11 (maximal)
-#
-# This complicated history was important for a previous
-# version of the walk that guarantees never walking a
-# commit multiple times. That goal might be important
-# again, so preserve this complicated case. For now, this
-# test will guarantee that the bitmaps are computed
-# correctly, even with the repeat calculations.
+setup_bitmap_history
 
-test_expect_success 'setup repo with moderate-sized history' '
-	test_commit_bulk --id=file 10 &&
-	git branch -M second &&
-	git checkout -b other HEAD~5 &&
-	test_commit_bulk --id=side 10 &&
-
-	# add complicated history setup, including merges and
-	# ambiguous merge-bases
-
-	git checkout -b merge-left other~2 &&
-	git merge second~2 -m "merge-left" &&
-
-	git checkout -b merge-right second~1 &&
-	git merge other~1 -m "merge-right" &&
-
-	git checkout -b octo-second second &&
-	git merge merge-left merge-right -m "octopus-second" &&
-
-	git checkout -b octo-other other &&
-	git merge merge-left merge-right -m "octopus-other" &&
-
-	git checkout other &&
-	git merge octo-other -m "pull octopus" &&
-
-	git checkout second &&
-	git merge octo-second -m "pull octopus" &&
-
-	# Remove these branches so they are not selected
-	# as bitmap tips
-	git branch -D merge-left &&
-	git branch -D merge-right &&
-	git branch -D octo-other &&
-	git branch -D octo-second &&
-
-	# add padding to make these merges less interesting
-	# and avoid having them selected for bitmaps
-	test_commit_bulk --id=file 100 &&
-	git checkout other &&
-	test_commit_bulk --id=side 100 &&
-	git checkout second &&
-
-	bitmaptip=$(git rev-parse second) &&
-	blob=$(echo tagged-blob | git hash-object -w --stdin) &&
-	git tag tagged-blob $blob &&
-	git config repack.writebitmaps true
+test_expect_success 'setup writing bitmaps during repack' '
+	git config repack.writeBitmaps true
 '
 
 test_expect_success 'full repack creates bitmaps' '
@@ -123,109 +40,7 @@ test_expect_success 'full repack creates bitmaps' '
 	grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace
 '
 
-test_expect_success 'rev-list --test-bitmap verifies bitmaps' '
-	git rev-list --test-bitmap HEAD
-'
-
-rev_list_tests_head () {
-	test_expect_success "counting commits via bitmap ($state, $branch)" '
-		git rev-list --count $branch >expect &&
-		git rev-list --use-bitmap-index --count $branch >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting partial commits via bitmap ($state, $branch)" '
-		git rev-list --count $branch~5..$branch >expect &&
-		git rev-list --use-bitmap-index --count $branch~5..$branch >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting commits with limit ($state, $branch)" '
-		git rev-list --count -n 1 $branch >expect &&
-		git rev-list --use-bitmap-index --count -n 1 $branch >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting non-linear history ($state, $branch)" '
-		git rev-list --count other...second >expect &&
-		git rev-list --use-bitmap-index --count other...second >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting commits with limiting ($state, $branch)" '
-		git rev-list --count $branch -- 1.t >expect &&
-		git rev-list --use-bitmap-index --count $branch -- 1.t >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting objects via bitmap ($state, $branch)" '
-		git rev-list --count --objects $branch >expect &&
-		git rev-list --use-bitmap-index --count --objects $branch >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "enumerate commits ($state, $branch)" '
-		git rev-list --use-bitmap-index $branch >actual &&
-		git rev-list $branch >expect &&
-		test_bitmap_traversal --no-confirm-bitmaps expect actual
-	'
-
-	test_expect_success "enumerate --objects ($state, $branch)" '
-		git rev-list --objects --use-bitmap-index $branch >actual &&
-		git rev-list --objects $branch >expect &&
-		test_bitmap_traversal expect actual
-	'
-
-	test_expect_success "bitmap --objects handles non-commit objects ($state, $branch)" '
-		git rev-list --objects --use-bitmap-index $branch tagged-blob >actual &&
-		grep $blob actual
-	'
-}
-
-rev_list_tests () {
-	state=$1
-
-	for branch in "second" "other"
-	do
-		rev_list_tests_head
-	done
-}
-
-rev_list_tests 'full bitmap'
-
-test_expect_success 'clone from bitmapped repository' '
-	git clone --no-local --bare . clone.git &&
-	git rev-parse HEAD >expect &&
-	git --git-dir=clone.git rev-parse HEAD >actual &&
-	test_cmp expect actual
-'
-
-test_expect_success 'partial clone from bitmapped repository' '
-	test_config uploadpack.allowfilter true &&
-	git clone --no-local --bare --filter=blob:none . partial-clone.git &&
-	(
-		cd partial-clone.git &&
-		pack=$(echo objects/pack/*.pack) &&
-		git verify-pack -v "$pack" >have &&
-		awk "/blob/ { print \$1 }" <have >blobs &&
-		# we expect this single blob because of the direct ref
-		git rev-parse refs/tags/tagged-blob >expect &&
-		test_cmp expect blobs
-	)
-'
-
-test_expect_success 'setup further non-bitmapped commits' '
-	test_commit_bulk --id=further 10
-'
-
-rev_list_tests 'partial bitmap'
-
-test_expect_success 'fetch (partial bitmap)' '
-	git --git-dir=clone.git fetch origin second:second &&
-	git rev-parse HEAD >expect &&
-	git --git-dir=clone.git rev-parse HEAD >actual &&
-	test_cmp expect actual
-'
+basic_bitmap_tests
 
 test_expect_success 'incremental repack fails when bitmaps are requested' '
 	test_commit more-1 &&
@@ -461,40 +276,6 @@ test_expect_success 'truncated bitmap fails gracefully (cache)' '
 	test_i18ngrep corrupted.bitmap.index stderr
 '
 
-test_expect_success 'enumerating progress counts pack-reused objects' '
-	count=$(git rev-list --objects --all --count) &&
-	git repack -adb &&
-
-	# check first with only reused objects; confirm that our progress
-	# showed the right number, and also that we did pack-reuse as expected.
-	# Check only the final "done" line of the meter (there may be an
-	# arbitrary number of intermediate lines ending with CR).
-	GIT_PROGRESS_DELAY=0 \
-		git pack-objects --all --stdout --progress \
-		</dev/null >/dev/null 2>stderr &&
-	grep "Enumerating objects: $count, done" stderr &&
-	grep "pack-reused $count" stderr &&
-
-	# now the same but with one non-reused object
-	git commit --allow-empty -m "an extra commit object" &&
-	GIT_PROGRESS_DELAY=0 \
-		git pack-objects --all --stdout --progress \
-		</dev/null >/dev/null 2>stderr &&
-	grep "Enumerating objects: $((count+1)), done" stderr &&
-	grep "pack-reused $count" stderr
-'
-
-# have_delta <obj> <expected_base>
-#
-# Note that because this relies on cat-file, it might find _any_ copy of an
-# object in the repository. The caller is responsible for making sure
-# there's only one (e.g., via "repack -ad", or having just fetched a copy).
-have_delta () {
-	echo $2 >expect &&
-	echo $1 | git cat-file --batch-check="%(deltabase)" >actual &&
-	test_cmp expect actual
-}
-
 # Create a state of history with these properties:
 #
 #  - refs that allow a client to fetch some new history, while sharing some old
-- 
2.33.0.96.g73915697e6



* [PATCH v5 19/27] t/helper/test-read-midx.c: add --checksum mode
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
                     ` (17 preceding siblings ...)
  2021-08-31 20:52   ` [PATCH v5 18/27] t5310: move some tests to lib-bitmap.sh Taylor Blau
@ 2021-08-31 20:52   ` Taylor Blau
  2021-08-31 20:52   ` [PATCH v5 20/27] t5326: test multi-pack bitmap behavior Taylor Blau
                     ` (9 subsequent siblings)
  28 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:52 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Subsequent tests will want to check for the existence of a multi-pack
bitmap which matches the multi-pack-index stored in the pack directory.

The multi-pack bitmap includes the hex checksum of the MIDX it
corresponds to in its filename (for example,
'$packdir/multi-pack-index-<checksum>.bitmap'). As a result, some tests
want a way to learn what '<checksum>' is.

This helper addresses that need by printing the checksum of the
repository's multi-pack-index.
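
As a rough illustration of the naming scheme described above, the bitmap
path can be assembled from the pack directory and the checksum. This is
only a sketch: 'midx_bitmap_path' is a hypothetical helper that is not
part of this patch, and the checksum below is a made-up placeholder.

```sh
# midx_bitmap_path <packdir> <checksum>
#
# Hypothetical helper (not part of the patch): print the path where the
# multi-pack bitmap for a MIDX with the given checksum is expected.
midx_bitmap_path () {
	printf '%s/multi-pack-index-%s.bitmap\n' "$1" "$2"
}

# "0123abcd" is a made-up placeholder, not a real MIDX checksum.
path=$(midx_bitmap_path .git/objects/pack 0123abcd)
echo "$path"
```

Inside the test suite, the placeholder would instead come from the new
mode, e.g. "$(test-tool read-midx --checksum "$objdir")".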

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/helper/test-read-midx.c | 16 +++++++++++++++-
 t/lib-bitmap.sh           |  4 ++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index 7c2eb11a8e..cb0d27049a 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -60,12 +60,26 @@ static int read_midx_file(const char *object_dir, int show_objects)
 	return 0;
 }
 
+static int read_midx_checksum(const char *object_dir)
+{
+	struct multi_pack_index *m;
+
+	setup_git_directory();
+	m = load_multi_pack_index(object_dir, 1);
+	if (!m)
+		return 1;
+	printf("%s\n", hash_to_hex(get_midx_checksum(m)));
+	return 0;
+}
+
 int cmd__read_midx(int argc, const char **argv)
 {
 	if (!(argc == 2 || argc == 3))
-		usage("read-midx [--show-objects] <object-dir>");
+		usage("read-midx [--show-objects|--checksum] <object-dir>");
 
 	if (!strcmp(argv[1], "--show-objects"))
 		return read_midx_file(argv[2], 1);
+	else if (!strcmp(argv[1], "--checksum"))
+		return read_midx_checksum(argv[2]);
 	return read_midx_file(argv[1], 0);
 }
diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index 77464da6fd..21d0392dda 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -260,3 +260,7 @@ have_delta () {
 	echo $1 | git cat-file --batch-check="%(deltabase)" >actual &&
 	test_cmp expect actual
 }
+
+midx_checksum () {
+	test-tool read-midx --checksum "$1"
+}
-- 
2.33.0.96.g73915697e6



* [PATCH v5 20/27] t5326: test multi-pack bitmap behavior
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
                     ` (18 preceding siblings ...)
  2021-08-31 20:52   ` [PATCH v5 19/27] t/helper/test-read-midx.c: add --checksum mode Taylor Blau
@ 2021-08-31 20:52   ` Taylor Blau
  2021-08-31 20:52   ` [PATCH v5 21/27] t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP Taylor Blau
                     ` (8 subsequent siblings)
  28 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:52 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

This patch introduces a new test, t5326, which tests the basic
functionality of multi-pack bitmaps.

Some trivial behavior is tested, such as:

  - Whether bitmaps can be generated with more than one pack.
  - Whether clones can be served with all objects in the bitmap.
  - Whether follow-up fetches can be served with some objects outside of
    the server's bitmap.

These use lib-bitmap's tests (which in turn were pulled from t5310), and
we cover cases where the MIDX represents both a single pack and multiple
packs.

In addition, some non-trivial and MIDX-specific behavior is tested, too,
including:

  - Whether multi-pack bitmaps behave correctly with respect to the
    pack-reuse machinery when the base for some object is selected from
    a different pack than the delta.
  - Whether multi-pack bitmaps correctly respect the
    pack.preferBitmapTips configuration.
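
The pack.preferBitmapTips test below finds commits that did not receive
a bitmap by comparing two sorted lists with "comm -13". A minimal sketch
of that comparison, using made-up identifiers (c1..c4) in place of real
object IDs:

```sh
# Made-up stand-ins for object IDs; comm requires both inputs sorted.
workdir=$(mktemp -d) &&
printf '%s\n' c1 c2 c3 c4 >"$workdir/commits" &&
printf '%s\n' c1 c2 c4 >"$workdir/bitmaps" &&

# -1 drops lines found only in "bitmaps", -3 drops lines common to
# both, leaving the commits which have no bitmap.
unbitmapped=$(comm -13 "$workdir/bitmaps" "$workdir/commits") &&
echo "$unbitmapped"

rm -fr "$workdir"
```

Here only "c3" is absent from the bitmap list, so it is the only line
that remains.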

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5326-multi-pack-bitmaps.sh | 286 ++++++++++++++++++++++++++++++++++
 1 file changed, 286 insertions(+)
 create mode 100755 t/t5326-multi-pack-bitmaps.sh

diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
new file mode 100755
index 0000000000..4ad7c2c969
--- /dev/null
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -0,0 +1,286 @@
+#!/bin/sh
+
+test_description='exercise basic multi-pack bitmap functionality'
+. ./test-lib.sh
+. "${TEST_DIRECTORY}/lib-bitmap.sh"
+
+# We'll be writing our own midx and bitmaps, so avoid getting confused by the
+# automatic ones.
+GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+
+objdir=.git/objects
+midx=$objdir/pack/multi-pack-index
+
+# midx_pack_source <obj>
+midx_pack_source () {
+	test-tool read-midx --show-objects .git/objects | grep "^$1 " | cut -f2
+}
+
+setup_bitmap_history
+
+test_expect_success 'enable core.multiPackIndex' '
+	git config core.multiPackIndex true
+'
+
+test_expect_success 'create single-pack midx with bitmaps' '
+	git repack -ad &&
+	git multi-pack-index write --bitmap &&
+	test_path_is_file $midx &&
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+	test_path_is_file $midx-$(midx_checksum $objdir).rev
+'
+
+basic_bitmap_tests
+
+test_expect_success 'create new additional packs' '
+	for i in $(test_seq 1 16)
+	do
+		test_commit "$i" &&
+		git repack -d || return 1
+	done &&
+
+	git checkout -b other2 HEAD~8 &&
+	for i in $(test_seq 1 8)
+	do
+		test_commit "side-$i" &&
+		git repack -d || return 1
+	done &&
+	git checkout second
+'
+
+test_expect_success 'create multi-pack midx with bitmaps' '
+	git multi-pack-index write --bitmap &&
+
+	ls $objdir/pack/pack-*.pack >packs &&
+	test_line_count = 25 packs &&
+
+	test_path_is_file $midx &&
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+	test_path_is_file $midx-$(midx_checksum $objdir).rev
+'
+
+basic_bitmap_tests
+
+test_expect_success '--no-bitmap is respected when bitmaps exist' '
+	git multi-pack-index write --bitmap &&
+
+	test_commit respect--no-bitmap &&
+	git repack -d &&
+
+	test_path_is_file $midx &&
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+	test_path_is_file $midx-$(midx_checksum $objdir).rev &&
+
+	git multi-pack-index write --no-bitmap &&
+
+	test_path_is_file $midx &&
+	test_path_is_missing $midx-$(midx_checksum $objdir).bitmap &&
+	test_path_is_missing $midx-$(midx_checksum $objdir).rev
+'
+
+test_expect_success 'setup midx with base from later pack' '
+	# Write a and b so that "a" is a delta on top of base "b", since Git
+	# prefers to delete contents out of a base rather than add to a shorter
+	# object.
+	test_seq 1 128 >a &&
+	test_seq 1 130 >b &&
+
+	git add a b &&
+	git commit -m "initial commit" &&
+
+	a=$(git rev-parse HEAD:a) &&
+	b=$(git rev-parse HEAD:b) &&
+
+	# In the first pack, "a" is stored as a delta to "b".
+	p1=$(git pack-objects .git/objects/pack/pack <<-EOF
+	$a
+	$b
+	EOF
+	) &&
+
+	# In the second pack, "a" is missing, and "b" is neither a delta nor a
+	# base for any other object.
+	p2=$(git pack-objects .git/objects/pack/pack <<-EOF
+	$b
+	$(git rev-parse HEAD)
+	$(git rev-parse HEAD^{tree})
+	EOF
+	) &&
+
+	git prune-packed &&
+	# Use the second pack as the preferred source, so that "b" occurs
+	# earlier in the MIDX object order, rendering "a" unusable for pack
+	# reuse.
+	git multi-pack-index write --bitmap --preferred-pack=pack-$p2.idx &&
+
+	have_delta $a $b &&
+	test $(midx_pack_source $a) != $(midx_pack_source $b)
+'
+
+rev_list_tests 'full bitmap with backwards delta'
+
+test_expect_success 'clone with bitmaps enabled' '
+	git clone --no-local --bare . clone-reverse-delta.git &&
+	test_when_finished "rm -fr clone-reverse-delta.git" &&
+
+	git rev-parse HEAD >expect &&
+	git --git-dir=clone-reverse-delta.git rev-parse HEAD >actual &&
+	test_cmp expect actual
+'
+
+bitmap_reuse_tests () {
+	from=$1
+	to=$2
+
+	test_expect_success "setup pack reuse tests ($from -> $to)" '
+		rm -fr repo &&
+		git init repo &&
+		(
+			cd repo &&
+			test_commit_bulk 16 &&
+			git tag old-tip &&
+
+			git config core.multiPackIndex true &&
+			if test "MIDX" = "$from"
+			then
+				git repack -Ad &&
+				git multi-pack-index write --bitmap
+			else
+				git repack -Adb
+			fi
+		)
+	'
+
+	test_expect_success "build bitmap from existing ($from -> $to)" '
+		(
+			cd repo &&
+			test_commit_bulk --id=further 16 &&
+			git tag new-tip &&
+
+			if test "MIDX" = "$to"
+			then
+				git repack -d &&
+				git multi-pack-index write --bitmap
+			else
+				git repack -Adb
+			fi
+		)
+	'
+
+	test_expect_success "verify resulting bitmaps ($from -> $to)" '
+		(
+			cd repo &&
+			git for-each-ref &&
+			git rev-list --test-bitmap refs/tags/old-tip &&
+			git rev-list --test-bitmap refs/tags/new-tip
+		)
+	'
+}
+
+bitmap_reuse_tests 'pack' 'MIDX'
+bitmap_reuse_tests 'MIDX' 'pack'
+bitmap_reuse_tests 'MIDX' 'MIDX'
+
+test_expect_success 'missing object closure fails gracefully' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit loose &&
+		test_commit packed &&
+
+		# Do not pass "--revs"; we want a pack without the "loose"
+		# commit.
+		git pack-objects $objdir/pack/pack <<-EOF &&
+		$(git rev-parse packed)
+		EOF
+
+		test_must_fail git multi-pack-index write --bitmap 2>err &&
+		grep "doesn.t have full closure" err &&
+		test_path_is_missing $midx
+	)
+'
+
+test_expect_success 'setup partial bitmaps' '
+	test_commit packed &&
+	git repack &&
+	test_commit loose &&
+	git multi-pack-index write --bitmap 2>err &&
+	test_path_is_file $midx &&
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+	test_path_is_file $midx-$(midx_checksum $objdir).rev
+'
+
+basic_bitmap_tests HEAD~
+
+test_expect_success 'removing a MIDX clears stale bitmaps' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+		test_commit base &&
+		git repack &&
+		git multi-pack-index write --bitmap &&
+
+		# Write a MIDX and bitmap; remove the MIDX but leave the bitmap.
+		stale_bitmap=$midx-$(midx_checksum $objdir).bitmap &&
+		stale_rev=$midx-$(midx_checksum $objdir).rev &&
+		rm $midx &&
+
+		# Then write a new MIDX.
+		test_commit new &&
+		git repack &&
+		git multi-pack-index write --bitmap &&
+
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+		test_path_is_file $midx-$(midx_checksum $objdir).rev &&
+		test_path_is_missing $stale_bitmap &&
+		test_path_is_missing $stale_rev
+	)
+'
+
+test_expect_success 'pack.preferBitmapTips' '
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit_bulk --message="%s" 103 &&
+
+		git log --format="%H" >commits.raw &&
+		sort <commits.raw >commits &&
+
+		git log --format="create refs/tags/%s %H" HEAD >refs &&
+		git update-ref --stdin <refs &&
+
+		git multi-pack-index write --bitmap &&
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+		test_path_is_file $midx-$(midx_checksum $objdir).rev &&
+
+		test-tool bitmap list-commits | sort >bitmaps &&
+		comm -13 bitmaps commits >before &&
+		test_line_count = 1 before &&
+
+		perl -ne "printf(\"create refs/tags/include/%d \", $.); print" \
+			<before | git update-ref --stdin &&
+
+		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
+		rm -fr $midx-$(midx_checksum $objdir).rev &&
+		rm -fr $midx &&
+
+		git -c pack.preferBitmapTips=refs/tags/include \
+			multi-pack-index write --bitmap &&
+		test-tool bitmap list-commits | sort >bitmaps &&
+		comm -13 bitmaps commits >after &&
+
+		! test_cmp before after
+	)
+'
+
+test_done
-- 
2.33.0.96.g73915697e6



* [PATCH v5 21/27] t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
                     ` (19 preceding siblings ...)
  2021-08-31 20:52   ` [PATCH v5 20/27] t5326: test multi-pack bitmap behavior Taylor Blau
@ 2021-08-31 20:52   ` Taylor Blau
  2021-08-31 20:52   ` [PATCH v5 22/27] t5310: " Taylor Blau
                     ` (7 subsequent siblings)
  28 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:52 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

From: Jeff King <peff@peff.net>

Generating a MIDX bitmap causes tests which repack in a partial clone to
fail because they are missing objects. Missing objects are an expected
part of the tests in t0410, so disable this knob altogether. Graceful
degradation when writing a bitmap with missing objects is tested in
t5326.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t0410-partial-clone.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index bbcc51ee8e..bba679685f 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -4,6 +4,9 @@ test_description='partial clone'
 
 . ./test-lib.sh
 
+# missing promisor objects cause repacks which write bitmaps to fail
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+
 delete_object () {
 	rm $1/.git/objects/$(echo $2 | sed -e 's|^..|&/|')
 }
-- 
2.33.0.96.g73915697e6



* [PATCH v5 22/27] t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
                     ` (20 preceding siblings ...)
  2021-08-31 20:52   ` [PATCH v5 21/27] t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP Taylor Blau
@ 2021-08-31 20:52   ` Taylor Blau
  2021-08-31 20:52   ` [PATCH v5 23/27] t5319: don't write MIDX bitmaps in t5319 Taylor Blau
                     ` (6 subsequent siblings)
  28 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:52 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

From: Jeff King <peff@peff.net>

Generating a MIDX bitmap confuses many of the tests in t5310, which
expect to control whether and how bitmaps are written. Since the
relevant MIDX-bitmap tests here are covered already in t5326, let's just
disable the flag for the whole t5310 script.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5310-pack-bitmaps.sh | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 4318f84d53..673baa5c3c 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -8,6 +8,10 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 . "$TEST_DIRECTORY"/lib-bundle.sh
 . "$TEST_DIRECTORY"/lib-bitmap.sh
 
+# t5310 deals only with single-pack bitmaps, so don't write MIDX bitmaps in
+# their place.
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+
 objpath () {
 	echo ".git/objects/$(echo "$1" | sed -e 's|\(..\)|\1/|')"
 }
-- 
2.33.0.96.g73915697e6



* [PATCH v5 23/27] t5319: don't write MIDX bitmaps in t5319
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
                     ` (21 preceding siblings ...)
  2021-08-31 20:52   ` [PATCH v5 22/27] t5310: " Taylor Blau
@ 2021-08-31 20:52   ` Taylor Blau
  2021-08-31 20:52   ` [PATCH v5 24/27] t7700: update to work with MIDX bitmap test knob Taylor Blau
                     ` (5 subsequent siblings)
  28 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:52 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

This test checks specifically that a pack-based bitmap file is still
respected after a MIDX is generated. Generating a MIDX bitmap would
confuse the test, so override the
'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' variable to make sure we don't
write one.
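
The override applies to a single command only. A small sketch of that
scoping, assuming the knob reaches the script through the environment;
"sh -c" stands in for the git invocation and simply prints the value
the child process sees:

```sh
# Stand-in for the knob being exported to every command in the script.
GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=1
export GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP

# Stand-in for git: print the value visible to the child process.
print_knob='printf "%s\n" "$GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP"'

# One-shot assignment: only this child process sees 0.
during=$(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 sh -c "$print_knob")

# The next command sees the original value again.
after=$(sh -c "$print_knob")

echo "during=$during after=$after"
```

This is why the prefix is placed on the repack alone; the following
commands in the test keep whatever value the test run was started with.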

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5319-multi-pack-index.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index d7e4988f2b..b3f9f3969d 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -532,7 +532,8 @@ test_expect_success 'repack preserves multi-pack-index when creating packs' '
 compare_results_with_midx "after repack"
 
 test_expect_success 'multi-pack-index and pack-bitmap' '
-	git -c repack.writeBitmaps=true repack -ad &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -c repack.writeBitmaps=true repack -ad &&
 	git multi-pack-index write &&
 	git rev-list --test-bitmap HEAD
 '
-- 
2.33.0.96.g73915697e6



* [PATCH v5 24/27] t7700: update to work with MIDX bitmap test knob
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
                     ` (22 preceding siblings ...)
  2021-08-31 20:52   ` [PATCH v5 23/27] t5319: don't write MIDX bitmaps in t5319 Taylor Blau
@ 2021-08-31 20:52   ` Taylor Blau
  2021-08-31 20:52   ` [PATCH v5 25/27] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
                     ` (4 subsequent siblings)
  28 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:52 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

A number of these tests are focused only on pack-based bitmaps and need
to be updated to disable 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' where
necessary.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t7700-repack.sh | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 25b235c063..98eda3bfeb 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -63,13 +63,14 @@ test_expect_success 'objects in packs marked .keep are not repacked' '
 
 test_expect_success 'writing bitmaps via command-line can duplicate .keep objects' '
 	# build on $oid, $packid, and .keep state from previous
-	git repack -Adbl &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 git repack -Adbl &&
 	test_has_duplicate_object true
 '
 
 test_expect_success 'writing bitmaps via config can duplicate .keep objects' '
 	# build on $oid, $packid, and .keep state from previous
-	git -c repack.writebitmaps=true repack -Adl &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -c repack.writebitmaps=true repack -Adl &&
 	test_has_duplicate_object true
 '
 
@@ -189,7 +190,9 @@ test_expect_success 'repack --keep-pack' '
 
 test_expect_success 'bitmaps are created by default in bare repos' '
 	git clone --bare .git bare.git &&
-	git -C bare.git repack -ad &&
+	rm -f bare.git/objects/pack/*.bitmap &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -C bare.git repack -ad &&
 	bitmap=$(ls bare.git/objects/pack/*.bitmap) &&
 	test_path_is_file "$bitmap"
 '
@@ -200,7 +203,8 @@ test_expect_success 'incremental repack does not complain' '
 '
 
 test_expect_success 'bitmaps can be disabled on bare repos' '
-	git -c repack.writeBitmaps=false -C bare.git repack -ad &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -c repack.writeBitmaps=false -C bare.git repack -ad &&
 	bitmap=$(ls bare.git/objects/pack/*.bitmap || :) &&
 	test -z "$bitmap"
 '
@@ -211,7 +215,8 @@ test_expect_success 'no bitmaps created if .keep files present' '
 	keep=${pack%.pack}.keep &&
 	test_when_finished "rm -f \"\$keep\"" &&
 	>"$keep" &&
-	git -C bare.git repack -ad 2>stderr &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -C bare.git repack -ad 2>stderr &&
 	test_must_be_empty stderr &&
 	find bare.git/objects/pack/ -type f -name "*.bitmap" >actual &&
 	test_must_be_empty actual
@@ -222,7 +227,8 @@ test_expect_success 'auto-bitmaps do not complain if unavailable' '
 	blob=$(test-tool genrandom big $((1024*1024)) |
 	       git -C bare.git hash-object -w --stdin) &&
 	git -C bare.git update-ref refs/tags/big $blob &&
-	git -C bare.git repack -ad 2>stderr &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -C bare.git repack -ad 2>stderr &&
 	test_must_be_empty stderr &&
 	find bare.git/objects/pack -type f -name "*.bitmap" >actual &&
 	test_must_be_empty actual
-- 
2.33.0.96.g73915697e6



* [PATCH v5 25/27] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
                     ` (23 preceding siblings ...)
  2021-08-31 20:52   ` [PATCH v5 24/27] t7700: update to work with MIDX bitmap test knob Taylor Blau
@ 2021-08-31 20:52   ` Taylor Blau
  2021-08-31 20:52   ` [PATCH v5 26/27] p5310: extract full and partial bitmap tests Taylor Blau
                     ` (3 subsequent siblings)
  28 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:52 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Introduce a new 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' environment
variable to also write a multi-pack bitmap when
'GIT_TEST_MULTI_PACK_INDEX' is set.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c          | 12 ++++++++++--
 ci/run-build-and-tests.sh |  1 +
 midx.h                    |  2 ++
 t/README                  |  4 ++++
 4 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 5f9bc74adc..82ab668272 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -515,6 +515,10 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		if (!(pack_everything & ALL_INTO_ONE) ||
 		    !is_bare_repository())
 			write_bitmaps = 0;
+	} else if (write_bitmaps &&
+		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0) &&
+		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0)) {
+		write_bitmaps = 0;
 	}
 	if (pack_kept_objects < 0)
 		pack_kept_objects = write_bitmaps > 0;
@@ -725,8 +729,12 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		update_server_info(0);
 	remove_temporary_files();
 
-	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
-		write_midx_file(get_object_directory(), NULL, 0);
+	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0)) {
+		unsigned flags = 0;
+		if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0))
+			flags |= MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX;
+		write_midx_file(get_object_directory(), NULL, flags);
+	}
 
 	string_list_clear(&names, 0);
 	string_list_clear(&rollback, 0);
diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index 3ce81ffee9..7ee9ba9325 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -23,6 +23,7 @@ linux-gcc)
 	export GIT_TEST_COMMIT_GRAPH=1
 	export GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=1
 	export GIT_TEST_MULTI_PACK_INDEX=1
+	export GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=1
 	export GIT_TEST_ADD_I_USE_BUILTIN=1
 	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master
 	export GIT_TEST_WRITE_REV_INDEX=1
diff --git a/midx.h b/midx.h
index 350f4d0a7b..aa3da557bb 100644
--- a/midx.h
+++ b/midx.h
@@ -8,6 +8,8 @@ struct pack_entry;
 struct repository;
 
 #define GIT_TEST_MULTI_PACK_INDEX "GIT_TEST_MULTI_PACK_INDEX"
+#define GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP \
+	"GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP"
 
 struct multi_pack_index {
 	struct multi_pack_index *next;
diff --git a/t/README b/t/README
index 9e70122302..12014aa988 100644
--- a/t/README
+++ b/t/README
@@ -425,6 +425,10 @@ GIT_TEST_MULTI_PACK_INDEX=<boolean>, when true, forces the multi-pack-
 index to be written after every 'git repack' command, and overrides the
 'core.multiPackIndex' setting to true.
 
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=<boolean>, when true, sets the
+'--bitmap' option on all invocations of 'git multi-pack-index write',
+and ignores pack-objects' '--write-bitmap-index'.
+
 GIT_TEST_SIDEBAND_ALL=<boolean>, when true, overrides the
 'uploadpack.allowSidebandAll' setting to true, and when false, forces
 fetch-pack to not request sideband-all (even if the server advertises
-- 
2.33.0.96.g73915697e6



* [PATCH v5 26/27] p5310: extract full and partial bitmap tests
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
                     ` (24 preceding siblings ...)
  2021-08-31 20:52   ` [PATCH v5 25/27] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
@ 2021-08-31 20:52   ` Taylor Blau
  2021-08-31 20:52   ` [PATCH v5 27/27] p5326: perf tests for MIDX bitmaps Taylor Blau
                     ` (2 subsequent siblings)
  28 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:52 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

A new p5326 introduced by the next patch will want these same tests,
interjecting its own setup in between. Move them out so that both perf
tests can reuse them.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/perf/lib-bitmap.sh         | 69 ++++++++++++++++++++++++++++++++++++
 t/perf/p5310-pack-bitmaps.sh | 65 ++-------------------------------
 2 files changed, 72 insertions(+), 62 deletions(-)
 create mode 100644 t/perf/lib-bitmap.sh

diff --git a/t/perf/lib-bitmap.sh b/t/perf/lib-bitmap.sh
new file mode 100644
index 0000000000..63d3bc7cec
--- /dev/null
+++ b/t/perf/lib-bitmap.sh
@@ -0,0 +1,69 @@
+# Helper functions for testing bitmap performance; see p5310.
+
+test_full_bitmap () {
+	test_perf 'simulated clone' '
+		git pack-objects --stdout --all </dev/null >/dev/null
+	'
+
+	test_perf 'simulated fetch' '
+		have=$(git rev-list HEAD~100 -1) &&
+		{
+			echo HEAD &&
+			echo ^$have
+		} | git pack-objects --revs --stdout >/dev/null
+	'
+
+	test_perf 'pack to file (bitmap)' '
+		git pack-objects --use-bitmap-index --all pack1b </dev/null >/dev/null
+	'
+
+	test_perf 'rev-list (commits)' '
+		git rev-list --all --use-bitmap-index >/dev/null
+	'
+
+	test_perf 'rev-list (objects)' '
+		git rev-list --all --use-bitmap-index --objects >/dev/null
+	'
+
+	test_perf 'rev-list with tag negated via --not --all (objects)' '
+		git rev-list perf-tag --not --all --use-bitmap-index --objects >/dev/null
+	'
+
+	test_perf 'rev-list with negative tag (objects)' '
+		git rev-list HEAD --not perf-tag --use-bitmap-index --objects >/dev/null
+	'
+
+	test_perf 'rev-list count with blob:none' '
+		git rev-list --use-bitmap-index --count --objects --all \
+			--filter=blob:none >/dev/null
+	'
+
+	test_perf 'rev-list count with blob:limit=1k' '
+		git rev-list --use-bitmap-index --count --objects --all \
+			--filter=blob:limit=1k >/dev/null
+	'
+
+	test_perf 'rev-list count with tree:0' '
+		git rev-list --use-bitmap-index --count --objects --all \
+			--filter=tree:0 >/dev/null
+	'
+
+	test_perf 'simulated partial clone' '
+		git pack-objects --stdout --all --filter=blob:none </dev/null >/dev/null
+	'
+}
+
+test_partial_bitmap () {
+	test_perf 'clone (partial bitmap)' '
+		git pack-objects --stdout --all </dev/null >/dev/null
+	'
+
+	test_perf 'pack to file (partial bitmap)' '
+		git pack-objects --use-bitmap-index --all pack2b </dev/null >/dev/null
+	'
+
+	test_perf 'rev-list with tree filter (partial bitmap)' '
+		git rev-list --use-bitmap-index --count --objects --all \
+			--filter=tree:0 >/dev/null
+	'
+}
diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index 452be01056..7ad4f237bc 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -2,6 +2,7 @@
 
 test_description='Tests pack performance using bitmaps'
 . ./perf-lib.sh
+. "${TEST_DIRECTORY}/perf/lib-bitmap.sh"
 
 test_perf_large_repo
 
@@ -25,56 +26,7 @@ test_perf 'repack to disk' '
 	git repack -ad
 '
 
-test_perf 'simulated clone' '
-	git pack-objects --stdout --all </dev/null >/dev/null
-'
-
-test_perf 'simulated fetch' '
-	have=$(git rev-list HEAD~100 -1) &&
-	{
-		echo HEAD &&
-		echo ^$have
-	} | git pack-objects --revs --stdout >/dev/null
-'
-
-test_perf 'pack to file (bitmap)' '
-	git pack-objects --use-bitmap-index --all pack1b </dev/null >/dev/null
-'
-
-test_perf 'rev-list (commits)' '
-	git rev-list --all --use-bitmap-index >/dev/null
-'
-
-test_perf 'rev-list (objects)' '
-	git rev-list --all --use-bitmap-index --objects >/dev/null
-'
-
-test_perf 'rev-list with tag negated via --not --all (objects)' '
-	git rev-list perf-tag --not --all --use-bitmap-index --objects >/dev/null
-'
-
-test_perf 'rev-list with negative tag (objects)' '
-	git rev-list HEAD --not perf-tag --use-bitmap-index --objects >/dev/null
-'
-
-test_perf 'rev-list count with blob:none' '
-	git rev-list --use-bitmap-index --count --objects --all \
-		--filter=blob:none >/dev/null
-'
-
-test_perf 'rev-list count with blob:limit=1k' '
-	git rev-list --use-bitmap-index --count --objects --all \
-		--filter=blob:limit=1k >/dev/null
-'
-
-test_perf 'rev-list count with tree:0' '
-	git rev-list --use-bitmap-index --count --objects --all \
-		--filter=tree:0 >/dev/null
-'
-
-test_perf 'simulated partial clone' '
-	git pack-objects --stdout --all --filter=blob:none </dev/null >/dev/null
-'
+test_full_bitmap
 
 test_expect_success 'create partial bitmap state' '
 	# pick a commit to represent the repo tip in the past
@@ -97,17 +49,6 @@ test_expect_success 'create partial bitmap state' '
 	git update-ref HEAD $orig_tip
 '
 
-test_perf 'clone (partial bitmap)' '
-	git pack-objects --stdout --all </dev/null >/dev/null
-'
-
-test_perf 'pack to file (partial bitmap)' '
-	git pack-objects --use-bitmap-index --all pack2b </dev/null >/dev/null
-'
-
-test_perf 'rev-list with tree filter (partial bitmap)' '
-	git rev-list --use-bitmap-index --count --objects --all \
-		--filter=tree:0 >/dev/null
-'
+test_partial_bitmap
 
 test_done
-- 
2.33.0.96.g73915697e6



* [PATCH v5 27/27] p5326: perf tests for MIDX bitmaps
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
                     ` (25 preceding siblings ...)
  2021-08-31 20:52   ` [PATCH v5 26/27] p5310: extract full and partial bitmap tests Taylor Blau
@ 2021-08-31 20:52   ` Taylor Blau
  2021-09-01 18:07   ` [PATCH v5 00/27] multi-pack reachability bitmaps Junio C Hamano
  2021-09-02  9:45   ` Jeff King
  28 siblings, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-08-31 20:52 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

These new performance tests demonstrate effectively the same behavior as
p5310, but use a multi-pack bitmap instead of a single-pack one.

Notably, p5326 does not create a MIDX bitmap with multiple packs. This
is so we can measure a direct comparison between it and p5310. Any
difference between the two is measuring just the overhead of using MIDX
bitmaps.

Here are the results of p5310 and p5326 together, measured at the same
time and on the same machine (using a Xeon W-2255 CPU):

    Test                                                  HEAD
    ------------------------------------------------------------------------
    5310.2: repack to disk                                96.78(93.39+11.33)
    5310.3: simulated clone                               9.98(9.79+0.19)
    5310.4: simulated fetch                               1.75(4.26+0.19)
    5310.5: pack to file (bitmap)                         28.20(27.87+8.70)
    5310.6: rev-list (commits)                            0.41(0.36+0.05)
    5310.7: rev-list (objects)                            1.61(1.54+0.07)
    5310.8: rev-list count with blob:none                 0.25(0.21+0.04)
    5310.9: rev-list count with blob:limit=1k             2.65(2.54+0.10)
    5310.10: rev-list count with tree:0                   0.23(0.19+0.04)
    5310.11: simulated partial clone                      4.34(4.21+0.12)
    5310.13: clone (partial bitmap)                       11.05(12.21+0.48)
    5310.14: pack to file (partial bitmap)                31.25(34.22+3.70)
    5310.15: rev-list with tree filter (partial bitmap)   0.26(0.22+0.04)

versus the same tests (this time using a multi-pack index):

    Test                                                  HEAD
    ------------------------------------------------------------------------
    5326.2: setup multi-pack index                        78.99(75.29+11.58)
    5326.3: simulated clone                               11.78(11.56+0.22)
    5326.4: simulated fetch                               1.70(4.49+0.13)
    5326.5: pack to file (bitmap)                         28.02(27.72+8.76)
    5326.6: rev-list (commits)                            0.42(0.36+0.06)
    5326.7: rev-list (objects)                            1.65(1.58+0.06)
    5326.8: rev-list count with blob:none                 0.26(0.21+0.05)
    5326.9: rev-list count with blob:limit=1k             2.97(2.86+0.10)
    5326.10: rev-list count with tree:0                   0.25(0.20+0.04)
    5326.11: simulated partial clone                      5.65(5.49+0.16)
    5326.13: clone (partial bitmap)                       12.22(13.43+0.38)
    5326.14: pack to file (partial bitmap)                30.05(31.57+7.25)
    5326.15: rev-list with tree filter (partial bitmap)   0.24(0.20+0.04)

There is slight overhead in "simulated clone", "simulated partial
clone", and "clone (partial bitmap)". Unsurprisingly, that overhead is
due to using the MIDX's reverse index to map between bit positions and
MIDX positions.

This can be reproduced by running "git repack -adb" along with "git
multi-pack-index write --bitmap" in a large-ish repository. Then run:

    $ perf record -o pack.perf git -c core.multiPackIndex=false \
      pack-objects --all --stdout >/dev/null </dev/null
    $ perf record -o midx.perf git -c core.multiPackIndex=true \
      pack-objects --all --stdout >/dev/null </dev/null

and compare the two with "perf diff -c delta -o 1 pack.perf midx.perf".
The most notable results are below (the next largest positive delta is
+0.14%):

    # Event 'cycles'
    #
    # Baseline    Delta  Shared Object       Symbol
    # ........  .......  ..................  ..........................
    #
                 +5.86%  git                 [.] nth_midxed_offset
                 +5.24%  git                 [.] nth_midxed_pack_int_id
         3.45%   +0.97%  git                 [.] offset_to_pack_pos
         3.30%   +0.57%  git                 [.] pack_pos_to_offset
                 +0.30%  git                 [.] pack_pos_to_midx

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/perf/p5326-multi-pack-bitmaps.sh | 43 ++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)
 create mode 100755 t/perf/p5326-multi-pack-bitmaps.sh

diff --git a/t/perf/p5326-multi-pack-bitmaps.sh b/t/perf/p5326-multi-pack-bitmaps.sh
new file mode 100755
index 0000000000..5845109ac7
--- /dev/null
+++ b/t/perf/p5326-multi-pack-bitmaps.sh
@@ -0,0 +1,43 @@
+#!/bin/sh
+
+test_description='Tests performance using midx bitmaps'
+. ./perf-lib.sh
+. "${TEST_DIRECTORY}/perf/lib-bitmap.sh"
+
+test_perf_large_repo
+
+test_expect_success 'enable multi-pack index' '
+	git config core.multiPackIndex true
+'
+
+test_perf 'setup multi-pack index' '
+	git repack -ad &&
+	git multi-pack-index write --bitmap
+'
+
+test_full_bitmap
+
+test_expect_success 'create partial bitmap state' '
+	# pick a commit to represent the repo tip in the past
+	cutoff=$(git rev-list HEAD~100 -1) &&
+	orig_tip=$(git rev-parse HEAD) &&
+
+	# now pretend we have just one tip
+	rm -rf .git/logs .git/refs/* .git/packed-refs &&
+	git update-ref HEAD $cutoff &&
+
+	# and then repack, which will leave us with a nice
+	# big bitmap pack of the "old" history, and all of
+	# the new history will be loose, as if it had been pushed
+	# up incrementally and exploded via unpack-objects
+	git repack -Ad &&
+	git multi-pack-index write --bitmap &&
+
+	# and now restore our original tip, as if the pushes
+	# had happened
+	git update-ref HEAD $orig_tip
+'
+
+test_partial_bitmap
+
+test_done
-- 
2.33.0.96.g73915697e6


* Re: [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX
  2021-08-31 16:33                               ` Junio C Hamano
  2021-08-31 16:43                                 ` Taylor Blau
@ 2021-09-01 10:03                                 ` Jeff King
  1 sibling, 0 replies; 273+ messages in thread
From: Jeff King @ 2021-09-01 10:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, Taylor Blau, git, dstolee, jonathantanmy

On Tue, Aug 31, 2021 at 09:33:38AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> >> I think taking a look to see if ../config exists to use the data
> >> might be helpful for some cases, but should not be a blocker for
> >> completing the requested operation. The config from the non-alternate
> >> repo should be sufficient for this (somewhat strange) case.
> >
> > Yes, agreed. We have long supported these kinds of "bare" alternates, and
> > I wouldn't be surprised if they are in wide use (though I do wonder how
> > folks actually modify them, since most commands that touch objects
> > really do want to be in a repository).
> 
> I kind of find the above two somewhat surprising, but I am willing
> to go with the less safe option if that is what people want.
> 
> It has been perfectly OK in the pre-alternative-hash-algorithms
> world, but we no longer live in such a world, so we'd need to come
> up with a way to keep using alternates in a safer way.

I think the point is that most people _do_ still live in that world.
They have not started using the new hash algorithm yet, and what they
have been doing for years will continue to work. Likewise, once they
switch, things will continue to work as long as each repo's alternates
use the same hash.

So my reasoning was less "this is useful, and a good idea" and more "it
works now, and will probably continue to work OK in practice, so taking
it away will probably bother people".

Now if somebody wants to make an argument that they are not actually
workable now, I could buy that. ;) You cannot even run "pack-objects"
without a repository, though it is not too hard to copy the result
around.

> > But I suspect all of this is moot for now, beyond being able to return a
> > nicer error message. The rest of the code is not at all ready to handle
> > packs with two different hashes in the same process.
> 
> I do not think it is all that urgent to make it possible for packs
> with different algorithms to be used.  It is sufficient to _ignore_
> (or error out) configured odb that is incompatible with the current
> repository.

Yes, I think that would be an improvement. I just don't find it all that
urgent, since they're likely to get an error anyway (just probably one
that is more mysterious). Given the work involved to even detect the
situation, it doesn't seem like that high a priority to me.

-Peff


* Re: [PATCH v5 00/27] multi-pack reachability bitmaps
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
                     ` (26 preceding siblings ...)
  2021-08-31 20:52   ` [PATCH v5 27/27] p5326: perf tests for MIDX bitmaps Taylor Blau
@ 2021-09-01 18:07   ` Junio C Hamano
  2021-09-01 19:08     ` Taylor Blau
  2021-09-02  9:45   ` Jeff King
  28 siblings, 1 reply; 273+ messages in thread
From: Junio C Hamano @ 2021-09-01 18:07 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee, jonathantanmy

Taylor Blau <me@ttaylorr.com> writes:

> Here is another version of the multi-pack reachability bitmaps series. It is
> virtually unchanged since last time.
>
> The one change that did occur is that I integrated Johannes' patch from [1] to fix
> cleaning up MIDX .rev and .bitmap files when using `--object-dir`. That inspired
> a lengthy discussion [2] about `--object-dir`, alternates, object-format and
> running the MIDX builtin outside of a Git repository.
>
> This series resolves that discussion by leaving everything as-is, and only
> changing the following:
>
>   - `git multi-pack-index` will not run when outside of a Git
>     repository.
>
>   - The `--object-dir` argument will only recognize object directories
>     belonging to an alternate of the current repository.
>
>   - Using `--object-dir` to point to a repository which uses a
>     different hash than the repository in the current working directory
>     will continue to not work (as was the case before this series).
>
> And because this incorporates [1], we will also not accidentally clean `.rev`
> files from the wrong object directory.
>
> I think that this version is ready-to-go, and that we can turn our attention to
> squashing some of these cross-alternate buglets, and integrating MIDX bitmaps
> with `git repack`.

Thanks.

>     +@@ Documentation/git-multi-pack-index.txt: OPTIONS
>     + 	Use given directory for the location of Git objects. We check
>     + 	`<dir>/packs/multi-pack-index` for the current MIDX file, and
>     + 	`<dir>/packs` for the pack-files to index.
>     +++
>     ++`<dir>` must be an alternate of the current repository.

After replacing the previous round with this round and running "git
diff @{1}" on the branch, I noticed this documentation update, but
didn't find any new code that tries to ensure that the requirement is
met.  It's a bit of a curious omission.

I think it is OK to allow running this command on <dir> and then add
it as a new alternate (iow, the <dir> being an alternate is not a
strict requirement for correct computation and writing of the midx,
even though it may be a requirement for correct use of the resulting
midx), so perhaps that is where the lack of validation comes from?

Thanks.



* Re: [PATCH v5 00/27] multi-pack reachability bitmaps
  2021-09-01 18:07   ` [PATCH v5 00/27] multi-pack reachability bitmaps Junio C Hamano
@ 2021-09-01 19:08     ` Taylor Blau
  2021-09-01 19:23       ` Junio C Hamano
  0 siblings, 1 reply; 273+ messages in thread
From: Taylor Blau @ 2021-09-01 19:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, peff, dstolee, jonathantanmy

On Wed, Sep 01, 2021 at 11:07:59AM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
> >     +@@ Documentation/git-multi-pack-index.txt: OPTIONS
> >     + 	Use given directory for the location of Git objects. We check
> >     + 	`<dir>/packs/multi-pack-index` for the current MIDX file, and
> >     + 	`<dir>/packs` for the pack-files to index.
> >     +++
> >     ++`<dir>` must be an alternate of the current repository.
>
> After replacing the previous round with this round and running "git
> diff @{1}" on the branch, I noticed this documentation update, but
> didn't find any new code that tries to ensure that the requirement is
> met.  It's a bit of a curious omission.
>
> I think it is OK to allow running this command on <dir> and then add
> it as a new alternate (iow, the <dir> being an alternate is not a
> strict requirement for correct computation and writing of the midx,
> even though it may be a requirement for correct use of the resulting
> midx), so perhaps that is where the lack of validation comes from?

I wasn't sure whether to include it or not, since we technically will
still write a MIDX in that object directory (alternate or not), but we
won't load up an existing MIDX that is already there to reference. So
we'll get the same result, just slower.

I'm comfortable with saying what's written in the documentation, since
even though it happens to work today, we should leave ourselves open to
not supporting directories which aren't alternates.

But I'm equally OK if you would rather drop this hunk from the
documentation when staging.

Thanks,
Taylor


* Re: [PATCH v5 00/27] multi-pack reachability bitmaps
  2021-09-01 19:08     ` Taylor Blau
@ 2021-09-01 19:23       ` Junio C Hamano
  2021-09-01 20:34         ` Taylor Blau
  0 siblings, 1 reply; 273+ messages in thread
From: Junio C Hamano @ 2021-09-01 19:23 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee, jonathantanmy

Taylor Blau <me@ttaylorr.com> writes:

> I'm comfortable with saying what's written in the documentation, since
> even though it happens to work today, we should leave ourselves open to
> not supporting directories which aren't alternates.
>
> But I'm equally OK if you would rather drop this hunk from the
> documentation when staging.

Oh, no, don't get me wrong.  I am comfortable with the documented
limitation, as that is what the area experts have agreed is
reasonable given the expected use case.

I however am much less comfortable with a documented limitation that
we make no attempt to enforce, and that is why the first thing I
looked for after seeing the documentation update was new code to
make sure we reject a random directory that is not our alternate
object store.

Thanks.


* Re: [PATCH v5 00/27] multi-pack reachability bitmaps
  2021-09-01 19:23       ` Junio C Hamano
@ 2021-09-01 20:34         ` Taylor Blau
  2021-09-01 20:49           ` Junio C Hamano
  2021-09-02  9:38           ` Jeff King
  0 siblings, 2 replies; 273+ messages in thread
From: Taylor Blau @ 2021-09-01 20:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, peff, dstolee, jonathantanmy

On Wed, Sep 01, 2021 at 12:23:26PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > I'm comfortable with saying what's written in the documentation, since
> > even though it happens to work today, we should leave ourselves open to
> > not supporting directories which aren't alternates.
> >
> > But I'm equally OK if you would rather drop this hunk from the
> > documentation when staging.
>
> Oh, no, don't get me wrong.  I am comfortable with the documented
> limitation, as that is what the area experts have agreed is
> reasonable given the expected use case.
>
> I however am much less comfortable with a documented limitation that
> we make no attempt to enforce, and that is why the first thing I
> looked for after seeing the documentation update was new code to
> make sure we reject a random directory that is not our alternate
> object store.

Sure, I don't mind getting more strict here in this series. If you want,
the below could be queued instead of the original 11/27:

--- 8< ---

Subject: [PATCH] midx: avoid opening multiple MIDXs when writing

Opening multiple instances of the same MIDX can lead to problems like two
separate packed_git structures which represent the same pack being added
to the repository's object store.

The above scenario can happen because prepare_midx_pack() checks if
`m->packs[pack_int_id]` is NULL in order to determine if a pack has been
opened and installed in the repository before. But a caller can
construct two copies of the same MIDX by calling get_multi_pack_index()
and load_multi_pack_index() since the former manipulates the
object store directly but the latter is a lower-level routine which
allocates a new MIDX for each call.

So if prepare_midx_pack() is called on multiple MIDXs with the same
pack_int_id, then that pack will be installed twice in the object
store's packed_git pointer.

This can lead to problems in, e.g., the pack-bitmap code, which does
something like the following (in pack-bitmap.c:open_pack_bitmap()):

    struct bitmap_index *bitmap_git = ...;
    for (p = get_all_packs(r); p; p = p->next) {
      if (open_pack_bitmap_1(bitmap_git, p) == 0)
        ret = 0;
    }

which is a problem if two copies of the same pack exist in the
packed_git list because pack-bitmap.c:open_pack_bitmap_1() contains a
conditional like the following:

    if (bitmap_git->pack || bitmap_git->midx) {
      /* ignore extra bitmap file; we can only handle one */
      warning("ignoring extra bitmap file: %s", packfile->pack_name);
      close(fd);
      return -1;
    }

Avoid this scenario by not letting write_midx_internal() open a MIDX
that isn't also pointed at by the object store. So long as this is the
case, other routines should prefer to open MIDXs with
get_multi_pack_index() or reprepare_packed_git() instead of creating
instances on their own. Because get_multi_pack_index() returns
`r->object_store->multi_pack_index` if it is non-NULL, we'll only have
one instance of a MIDX open at one time, avoiding these problems.

To encourage this, drop the `struct multi_pack_index *` parameter from
`write_midx_internal()`, and rely instead on the `object_dir` to find
(or initialize) the correct MIDX instance.

Likewise, replace the call to `close_midx()` with
`close_object_store()`, since we're about to replace the MIDX with a new
one and should invalidate the object store's memory of any MIDX that
might have existed beforehand.

Note that this now forbids passing object directories that don't belong
to alternate repositories over `--object-dir`, since before we would
have happily opened a MIDX in any directory, but now restrict ourselves
to only those reachable by `r->objects->multi_pack_index` (and alternate
MIDXs that we can see by walking the `next` pointer).

As far as I can tell, supporting arbitrary directories with
`--object-dir` was a historical accident, since even the documentation
says `<alt>` when referring to the value passed to this option.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.txt |  2 ++
 builtin/commit-graph.c                 | 22 -------------------
 midx.c                                 | 29 ++++++++++++++++----------
 object-file.c                          | 21 +++++++++++++++++++
 object-store.h                         |  1 +
 t/t5319-multi-pack-index.sh            | 10 ++++++++-
 6 files changed, 51 insertions(+), 34 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index c9b063d31e..0af6beb2dd 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -23,6 +23,8 @@ OPTIONS
 	Use given directory for the location of Git objects. We check
 	`<dir>/packs/multi-pack-index` for the current MIDX file, and
 	`<dir>/packs` for the pack-files to index.
++
+`<dir>` must be an alternate of the current repository.

 --[no-]progress::
 	Turn progress on/off explicitly. If neither is specified, progress is
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index cd86315221..003eaaac5c 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -43,28 +43,6 @@ static struct opts_commit_graph {
 	int enable_changed_paths;
 } opts;

-static struct object_directory *find_odb(struct repository *r,
-					 const char *obj_dir)
-{
-	struct object_directory *odb;
-	char *obj_dir_real = real_pathdup(obj_dir, 1);
-	struct strbuf odb_path_real = STRBUF_INIT;
-
-	prepare_alt_odb(r);
-	for (odb = r->objects->odb; odb; odb = odb->next) {
-		strbuf_realpath(&odb_path_real, odb->path, 1);
-		if (!strcmp(obj_dir_real, odb_path_real.buf))
-			break;
-	}
-
-	free(obj_dir_real);
-	strbuf_release(&odb_path_real);
-
-	if (!odb)
-		die(_("could not find object directory matching %s"), obj_dir);
-	return odb;
-}
-
 static int graph_verify(int argc, const char **argv)
 {
 	struct commit_graph *graph = NULL;
diff --git a/midx.c b/midx.c
index e83f22b5ee..25906044ff 100644
--- a/midx.c
+++ b/midx.c
@@ -893,7 +893,7 @@ static int midx_checksum_valid(struct multi_pack_index *m)
 	return hashfile_checksum_valid(m->data, m->data_len);
 }

-static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
+static int write_midx_internal(const char *object_dir,
 			       struct string_list *packs_to_drop,
 			       const char *preferred_pack_name,
 			       unsigned flags)
@@ -904,20 +904,26 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	struct hashfile *f = NULL;
 	struct lock_file lk;
 	struct write_midx_context ctx = { 0 };
+	struct multi_pack_index *cur;
 	int pack_name_concat_len = 0;
 	int dropped_packs = 0;
 	int result = 0;
 	struct chunkfile *cf;

+	/* Ensure the given object_dir is local, or a known alternate. */
+	find_odb(the_repository, object_dir);
+
 	midx_name = get_midx_filename(object_dir);
 	if (safe_create_leading_directories(midx_name))
 		die_errno(_("unable to create leading directories of %s"),
 			  midx_name);

-	if (m)
-		ctx.m = m;
-	else
-		ctx.m = load_multi_pack_index(object_dir, 1);
+	for (cur = get_multi_pack_index(the_repository); cur; cur = cur->next) {
+		if (!strcmp(object_dir, cur->object_dir)) {
+			ctx.m = cur;
+			break;
+		}
+	}

 	if (ctx.m && !midx_checksum_valid(ctx.m)) {
 		warning(_("ignoring existing multi-pack-index; checksum mismatch"));
@@ -1119,7 +1125,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));

 	if (ctx.m)
-		close_midx(ctx.m);
+		close_object_store(the_repository->objects);

 	if (ctx.nr - dropped_packs == 0) {
 		error(_("no pack files to index."));
@@ -1182,8 +1188,7 @@ int write_midx_file(const char *object_dir,
 		    const char *preferred_pack_name,
 		    unsigned flags)
 {
-	return write_midx_internal(object_dir, NULL, NULL, preferred_pack_name,
-				   flags);
+	return write_midx_internal(object_dir, NULL, preferred_pack_name, flags);
 }

 struct clear_midx_data {
@@ -1461,8 +1466,10 @@ int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla

 	free(count);

-	if (packs_to_drop.nr)
-		result = write_midx_internal(object_dir, m, &packs_to_drop, NULL, flags);
+	if (packs_to_drop.nr) {
+		result = write_midx_internal(object_dir, &packs_to_drop, NULL, flags);
+		m = NULL;
+	}

 	string_list_clear(&packs_to_drop, 0);
 	return result;
@@ -1651,7 +1658,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 		goto cleanup;
 	}

-	result = write_midx_internal(object_dir, m, NULL, NULL, flags);
+	result = write_midx_internal(object_dir, NULL, NULL, flags);
 	m = NULL;

 cleanup:
diff --git a/object-file.c b/object-file.c
index a8be899481..a4d720b4f5 100644
--- a/object-file.c
+++ b/object-file.c
@@ -820,6 +820,27 @@ char *compute_alternate_path(const char *path, struct strbuf *err)
 	return ref_git;
 }

+struct object_directory *find_odb(struct repository *r, const char *obj_dir)
+{
+	struct object_directory *odb;
+	char *obj_dir_real = real_pathdup(obj_dir, 1);
+	struct strbuf odb_path_real = STRBUF_INIT;
+
+	prepare_alt_odb(r);
+	for (odb = r->objects->odb; odb; odb = odb->next) {
+		strbuf_realpath(&odb_path_real, odb->path, 1);
+		if (!strcmp(obj_dir_real, odb_path_real.buf))
+			break;
+	}
+
+	free(obj_dir_real);
+	strbuf_release(&odb_path_real);
+
+	if (!odb)
+		die(_("could not find object directory matching %s"), obj_dir);
+	return odb;
+}
+
 static void fill_alternate_refs_command(struct child_process *cmd,
 					const char *repo_path)
 {
diff --git a/object-store.h b/object-store.h
index d24915ced1..250aa5f33c 100644
--- a/object-store.h
+++ b/object-store.h
@@ -38,6 +38,7 @@ KHASH_INIT(odb_path_map, const char * /* key: odb_path */,

 void prepare_alt_odb(struct repository *r);
 char *compute_alternate_path(const char *path, struct strbuf *err);
+struct object_directory *find_odb(struct repository *r, const char *obj_dir);
 typedef int alt_odb_fn(struct object_directory *, void *);
 int foreach_alt_odb(alt_odb_fn, void*);
 typedef void alternate_ref_fn(const struct object_id *oid, void *);
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index d7e4988f2b..bd09c3194b 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -582,7 +582,15 @@ test_expect_success 'force some 64-bit offsets with pack-objects' '
 	idx64=objects64/pack/test-64-$pack64.idx &&
 	chmod u+w $idx64 &&
 	corrupt_data $idx64 $(test_oid idxoff) "\02" &&
-	midx64=$(git multi-pack-index --object-dir=objects64 write) &&
+	# objects64 is not a real repository, but can serve as an alternate
+	# anyway so we can write a MIDX into it
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+		( cd ../objects64 && pwd ) >.git/objects/info/alternates &&
+		midx64=$(git multi-pack-index --object-dir=../objects64 write)
+	) &&
 	midx_read_expect 1 63 5 objects64 " large-offsets"
 '

--
2.33.0.96.g73915697e6


* Re: [PATCH v5 00/27] multi-pack reachability bitmaps
  2021-09-01 20:34         ` Taylor Blau
@ 2021-09-01 20:49           ` Junio C Hamano
  2021-09-01 20:54             ` Taylor Blau
  2021-09-02  9:40             ` Jeff King
  2021-09-02  9:38           ` Jeff King
  1 sibling, 2 replies; 273+ messages in thread
From: Junio C Hamano @ 2021-09-01 20:49 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee, jonathantanmy

Taylor Blau <me@ttaylorr.com> writes:

> Sure, I don't mind getting more strict here in this series. If you want,
> the below could be queued instead of the original 11/27:

That may make the documentation and the code more consistent.

> As far as I can tell, supporting arbitrary directories with
> `--object-dir` was a historical accident, since even the documentation
> says `<alt>` when referring to the value passed to this option.

The synopsis has [--object-dir=<dir>], which wants to be cleaned up
for consistency (or <alt> updated to <dir>, but I tend to agree with
you that unifying to <alt> may make our intention more clear).

It is unfortunate that "git multi-pack-index -h" says <file>, which
is probably doubly wrong.  It seems this is the only instance that
abuses OPT_FILENAME() for a non-file, so perhaps it is not too bad
to fix it using the lower-level OPTION_FILENAME (instead of adding
a one-off OPT_DIRECTORY_NAME() helper).

Neither is something that would block this step, of course.

* Re: [PATCH v5 00/27] multi-pack reachability bitmaps
  2021-09-01 20:49           ` Junio C Hamano
@ 2021-09-01 20:54             ` Taylor Blau
  2021-09-02  9:40             ` Jeff King
  1 sibling, 0 replies; 273+ messages in thread
From: Taylor Blau @ 2021-09-01 20:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, peff, dstolee, jonathantanmy

On Wed, Sep 01, 2021 at 01:49:47PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > Sure, I don't mind getting more strict here in this series. If you want,
> > the below could be queued instead of the original 11/27:
>
> That may make the documentation and the code more consistent.
>
> > As far as I can tell, supporting arbitrary directories with
> > `--object-dir` was a historical accident, since even the documentation
> > says `<alt>` when referring to the value passed to this option.
>
> The synopsis has [--object-dir=<dir>], which wants to be cleaned up
> for consistency (or <alt> updated to <dir>, but I tend to agree with
> you that unifying to <alt> may make our intention more clear).
>
> It is unfortunate that "git multi-pack-index -h" says <file>, which
> is probably doubly wrong.  It seems this is the only instance that
> abuses OPT_FILENAME() for a non-file, so perhaps it is not too bad
> to fix it using the lower-level OPTION_FILENAME (instead of adding
> a one-off OPT_DIRECTORY_NAME() helper).
>
> Neither is something that would block this step, of course.

I think there is definitely plenty of opportunity to clean all of this
up even more. But I don't think this already-long series is the place to
do it necessarily, since we don't want to let these last-minute (mostly)
cosmetic issues get in the way of this series as a whole.

Hopefully this v5 is at a point where we could start merging it down to
'next' and then address things like the helptext, `s/dir/alt` and so on.

Thanks,
Taylor

* Re: [PATCH v5 00/27] multi-pack reachability bitmaps
  2021-09-01 20:34         ` Taylor Blau
  2021-09-01 20:49           ` Junio C Hamano
@ 2021-09-02  9:38           ` Jeff King
  1 sibling, 0 replies; 273+ messages in thread
From: Jeff King @ 2021-09-02  9:38 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Junio C Hamano, git, dstolee, jonathantanmy

On Wed, Sep 01, 2021 at 04:34:01PM -0400, Taylor Blau wrote:

> > Oh, no, don't get me wrong.  I am comfortable with the documented
> > limitation, as that is what the area experts have agreed that is
> > reasonable given the expected use case.
> >
> > I however am much less comfortable with a documented limitation that
> > we make no attempt to enforce, and that is why the first thing I
> > looked for after seeing the documentation update was new code to
> > make sure we reject a random directory that is not our alternate
> > object store.
> 
> Sure, I don't mind getting more strict here in this series. If you want,
> the below could be queued instead of the original 11/27:
> 
> --- 8< ---
> 
> Subject: [PATCH] midx: avoid opening multiple MIDXs when writing

I think this is worth doing here, as part of this series.

Two observations (neither of which would lead to changing the patch):

> diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
> index c9b063d31e..0af6beb2dd 100644
> --- a/Documentation/git-multi-pack-index.txt
> +++ b/Documentation/git-multi-pack-index.txt
> @@ -23,6 +23,8 @@ OPTIONS
>  	Use given directory for the location of Git objects. We check
>  	`<dir>/packs/multi-pack-index` for the current MIDX file, and
>  	`<dir>/packs` for the pack-files to index.
> ++
> +`<dir>` must be an alternate of the current repository.

I wondered if this needed to say "must be the main object directory of
or an alternate of the current repository". But if you are intending to
operate in the main object directory, you would simply omit --object-dir
entirely. It is good that it will still work if you specified it
explicitly, but I don't think we need to clutter the documentation with
it.

> index cd86315221..003eaaac5c 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -43,28 +43,6 @@ static struct opts_commit_graph {
>  	int enable_changed_paths;
>  } opts;
> 
> -static struct object_directory *find_odb(struct repository *r,
> -					 const char *obj_dir)
> -{
> -	struct object_directory *odb;
> -	char *obj_dir_real = real_pathdup(obj_dir, 1);
> -	struct strbuf odb_path_real = STRBUF_INIT;
> -
> -	prepare_alt_odb(r);
> -	for (odb = r->objects->odb; odb; odb = odb->next) {
> -		strbuf_realpath(&odb_path_real, odb->path, 1);
> -		if (!strcmp(obj_dir_real, odb_path_real.buf))
> -			break;
> -	}
> -
> -	free(obj_dir_real);
> -	strbuf_release(&odb_path_real);
> -
> -	if (!odb)
> -		die(_("could not find object directory matching %s"), obj_dir);
> -	return odb;
> -}

Ah, right, commit-graph faces this same conundrum we've been discussing.
And it behaves in the way that we concluded:

  $ git init one
  $ git commit-graph write --object-dir $PWD/one/.git/objects
  fatal: not a git repository (or any of the parent directories): .git

  $ git init two
  $ git -C two commit-graph write --object-dir $PWD/one/.git/objects
  fatal: could not find object directory matching /home/peff/tmp/one/.git/objects

That gives me more confidence in the direction we decided on.

(Apologies if this was obvious to others, but I didn't see any mention
of commit-graph's similar option in the recent discussion).
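
For anyone who wants to poke at the lookup outside of Git, here is a
minimal, self-contained sketch of the same idea: canonicalize the
requested directory and compare it against each known object directory
in turn. The names `struct odb_entry` and `lookup_odb()` are made up for
illustration and are not Git's API. Unlike `find_odb()`, which dies when
nothing matches, this sketch returns NULL. It also assumes a POSIX
`realpath()` that accepts a NULL buffer.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Git's `struct object_directory`. */
struct odb_entry {
	const char *path;
	struct odb_entry *next;
};

/*
 * Return the first entry whose canonical path equals the canonical
 * form of `dir`, or NULL when `dir` cannot be resolved or is not in
 * the list.
 */
static struct odb_entry *lookup_odb(struct odb_entry *head, const char *dir)
{
	struct odb_entry *e;
	char *want;

	assert(dir != NULL);
	want = realpath(dir, NULL);
	if (!want)
		return NULL;

	for (e = head; e; e = e->next) {
		/* Entries that no longer resolve simply never match. */
		char *have = realpath(e->path, NULL);
		int match = have && !strcmp(want, have);

		free(have);
		if (match)
			break;
	}

	free(want);
	return e;
}
```

Comparing canonical paths rather than the raw strings is what lets a
relative or symlinked `--object-dir` still match the alternate it names.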

-Peff

* Re: [PATCH v5 00/27] multi-pack reachability bitmaps
  2021-09-01 20:49           ` Junio C Hamano
  2021-09-01 20:54             ` Taylor Blau
@ 2021-09-02  9:40             ` Jeff King
  1 sibling, 0 replies; 273+ messages in thread
From: Jeff King @ 2021-09-02  9:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, dstolee, jonathantanmy

On Wed, Sep 01, 2021 at 01:49:47PM -0700, Junio C Hamano wrote:

> > As far as I can tell, supporting arbitrary directories with
> > `--object-dir` was a historical accident, since even the documentation
> > says `<alt>` when referring to the value passed to this option.
> 
> The synopsis has [--object-dir=<dir>], which wants to be cleaned up
> for consistency (or <alt> updated to <dir>, but I tend to agree with
> you that unifying to <alt> may make our intention more clear).
> 
> It is unfortunate that "git multi-pack-index -h" says <file>, which
> is probably doubly wrong.  It seems this is the only instance that
> abuses OPT_FILENAME() for a non-file, so perhaps it is not too bad
> to fix it using the lower-level OPTION_FILENAME (instead of adding
> a one-off OPT_DIRECTORY_NAME() helper).

That made me wonder what "git commit-graph -h" says. It says "<dir>"
(even though it already must be an alternate), because it uses
OPT_STRING().

I think using OPTION_FILENAME or similar is better there, too, though,
because it reinterprets the name after the repo-setup chdir() step. But
now you'd have to add an OPT_DIRNAME() helper. :)
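
To illustrate why the reinterpretation matters, here is a rough sketch
of the kind of fixup a filename-style option needs once the command has
moved to the top of the repository. The helper `path_after_chdir()` is
hypothetical and is not the parse-options code. It assumes the prefix is
the subdirectory the command started in, written with a trailing slash.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: interpret `arg` relative to `prefix`, the
 * subdirectory (with trailing slash, e.g. "sub/dir/") the command was
 * started in before chdir()ing to the top of the repository. Absolute
 * paths and an empty or NULL prefix leave `arg` unchanged. The caller
 * frees the result; NULL is returned on allocation failure.
 */
static char *path_after_chdir(const char *prefix, const char *arg)
{
	size_t plen, alen;
	char *out;

	assert(arg != NULL);
	if (!prefix || !*prefix || arg[0] == '/')
		return strdup(arg);

	plen = strlen(prefix);
	alen = strlen(arg);
	out = malloc(plen + alen + 1);
	if (!out)
		return NULL;

	memcpy(out, prefix, plen);
	memcpy(out + plen, arg, alen + 1); /* copies the trailing NUL too */
	return out;
}
```

A plain string option would pass "../objects" through untouched, and it
would then be resolved against the wrong directory after the chdir().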

> Neither is something that would block this step, of course.

Yeah, very much agreed that this can come later on top.

-Peff

* Re: [PATCH v5 00/27] multi-pack reachability bitmaps
  2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
                     ` (27 preceding siblings ...)
  2021-09-01 18:07   ` [PATCH v5 00/27] multi-pack reachability bitmaps Junio C Hamano
@ 2021-09-02  9:45   ` Jeff King
  28 siblings, 0 replies; 273+ messages in thread
From: Jeff King @ 2021-09-02  9:45 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Tue, Aug 31, 2021 at 04:51:33PM -0400, Taylor Blau wrote:

> This series resolves that discussion by leaving everything as-is, and only
> changing the following:
> 
>   - `git multi-pack-index` will not run when outside of a Git
>     repository.
> 
>   - The `--object-dir` argument will only recognize object directories
>     belonging to an alternate of the current repository.
> 
>   - Using `--object-dir` to point to a repository which uses a
>     different hash than the repository in the current working directory
>     will continue to not work (as was the case before this series).
> 
> And because this incorporates [1], we will also not accidentally clean `.rev`
> files from the wrong object directory.

Thanks. I read over the new patches, and all looks good to me (using the
revised patch 11 you already sent, and which I commented on separately).

-Peff

Thread overview: 273+ messages
2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
2021-04-09 18:10 ` [PATCH 01/22] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps Taylor Blau
2021-04-09 18:10 ` [PATCH 02/22] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps Taylor Blau
2021-04-16  2:46   ` Jonathan Tan
2021-04-09 18:10 ` [PATCH 03/22] pack-bitmap-write.c: free existing bitmaps Taylor Blau
2021-04-09 18:10 ` [PATCH 04/22] Documentation: build 'technical/bitmap-format' by default Taylor Blau
2021-04-09 18:11 ` [PATCH 05/22] Documentation: describe MIDX-based bitmaps Taylor Blau
2021-04-09 18:11 ` [PATCH 06/22] midx: make a number of functions non-static Taylor Blau
2021-04-09 18:11 ` [PATCH 07/22] midx: clear auxiliary .rev after replacing the MIDX Taylor Blau
2021-04-09 18:11 ` [PATCH 08/22] midx: respect 'core.multiPackIndex' when writing Taylor Blau
2021-04-09 18:11 ` [PATCH 09/22] pack-bitmap.c: introduce 'bitmap_num_objects()' Taylor Blau
2021-04-09 18:11 ` [PATCH 10/22] pack-bitmap.c: introduce 'nth_bitmap_object_oid()' Taylor Blau
2021-04-09 18:11 ` [PATCH 11/22] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()' Taylor Blau
2021-04-09 18:11 ` [PATCH 12/22] pack-bitmap: read multi-pack bitmaps Taylor Blau
2021-04-16  2:39   ` Jonathan Tan
2021-04-16  3:13     ` Taylor Blau
2021-04-09 18:11 ` [PATCH 13/22] pack-bitmap: write " Taylor Blau
2021-05-04  5:02   ` Jonathan Tan
2021-05-06 20:18     ` Taylor Blau
2021-05-06 22:00       ` Jonathan Tan
2021-04-09 18:11 ` [PATCH 14/22] t5310: move some tests to lib-bitmap.sh Taylor Blau
2021-04-09 18:11 ` [PATCH 15/22] t/helper/test-read-midx.c: add --checksum mode Taylor Blau
2021-04-09 18:12 ` [PATCH 16/22] t5326: test multi-pack bitmap behavior Taylor Blau
2021-05-04 17:51   ` Jonathan Tan
2021-04-09 18:12 ` [PATCH 17/22] t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP Taylor Blau
2021-04-09 18:12 ` [PATCH 18/22] t5319: don't write MIDX bitmaps in t5319 Taylor Blau
2021-04-09 18:12 ` [PATCH 19/22] t7700: update to work with MIDX bitmap test knob Taylor Blau
2021-04-09 18:12 ` [PATCH 20/22] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
2021-04-09 18:12 ` [PATCH 21/22] p5310: extract full and partial bitmap tests Taylor Blau
2021-04-09 18:12 ` [PATCH 22/22] p5326: perf tests for MIDX bitmaps Taylor Blau
2021-05-04 18:00   ` Jonathan Tan
2021-05-05  0:55     ` Junio C Hamano
2021-06-21 22:24 ` [PATCH v2 00/24] multi-pack reachability bitmaps Taylor Blau
2021-06-21 22:24   ` [PATCH v2 01/24] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps Taylor Blau
2021-06-24 23:02     ` Ævar Arnfjörð Bjarmason
2021-07-14 17:24       ` Taylor Blau
2021-07-21  9:45     ` Jeff King
2021-07-21 17:15       ` Taylor Blau
2021-06-21 22:25   ` [PATCH v2 02/24] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps Taylor Blau
2021-06-24 23:23     ` Ævar Arnfjörð Bjarmason
2021-07-14 17:32       ` Taylor Blau
2021-07-14 18:44         ` Ævar Arnfjörð Bjarmason
2021-07-21  9:53         ` Jeff King
2021-07-21  9:50     ` Jeff King
2021-07-21 17:20       ` Taylor Blau
2021-07-23  7:37         ` Jeff King
2021-07-26 18:48           ` Taylor Blau
2021-07-27 17:11             ` Jeff King
2021-06-21 22:25   ` [PATCH v2 03/24] pack-bitmap-write.c: free existing bitmaps Taylor Blau
2021-07-21  9:54     ` Jeff King
2021-06-21 22:25   ` [PATCH v2 04/24] Documentation: build 'technical/bitmap-format' by default Taylor Blau
2021-06-24 23:35     ` Ævar Arnfjörð Bjarmason
2021-07-14 17:41       ` Taylor Blau
2021-07-14 22:58         ` Ævar Arnfjörð Bjarmason
2021-07-21 10:04           ` Jeff King
2021-07-21 10:10             ` Jeff King
2021-07-21  9:58     ` Jeff King
2021-07-21 10:08       ` Jeff King
2021-07-21 17:23         ` Taylor Blau
2021-07-23  7:39           ` Jeff King
2021-07-26 18:49             ` Taylor Blau
2021-06-21 22:25   ` [PATCH v2 05/24] Documentation: describe MIDX-based bitmaps Taylor Blau
2021-07-21 10:18     ` Jeff King
2021-07-21 17:53       ` Taylor Blau
2021-07-23  7:45         ` Jeff King
2021-06-21 22:25   ` [PATCH v2 06/24] midx: make a number of functions non-static Taylor Blau
2021-06-24 23:42     ` Ævar Arnfjörð Bjarmason
2021-07-14 23:01       ` Taylor Blau
2021-06-21 22:25   ` [PATCH v2 07/24] midx: clear auxiliary .rev after replacing the MIDX Taylor Blau
2021-07-21 10:19     ` Jeff King
2021-06-21 22:25   ` [PATCH v2 08/24] midx: respect 'core.multiPackIndex' when writing Taylor Blau
2021-06-24 23:43     ` Ævar Arnfjörð Bjarmason
2021-07-21 10:23     ` Jeff King
2021-07-21 19:22       ` Taylor Blau
2021-07-23  8:29         ` Jeff King
2021-07-26 18:59           ` Taylor Blau
2021-07-26 22:14             ` Taylor Blau
2021-07-27 17:29               ` Jeff King
2021-07-27 17:36                 ` Taylor Blau
2021-07-27 17:42                   ` Jeff King
2021-07-27 17:47                     ` Taylor Blau
2021-07-27 17:55                       ` Jeff King
2021-07-27 20:05                         ` Taylor Blau
2021-07-28 17:46                           ` Jeff King
2021-07-29 19:44                             ` Taylor Blau
2021-08-12 19:59                               ` Jeff King
2021-07-27 17:17             ` Jeff King
2021-06-21 22:25   ` [PATCH v2 09/24] midx: infer preferred pack when not given one Taylor Blau
2021-07-21 10:34     ` Jeff King
2021-07-21 20:16       ` Taylor Blau
2021-07-23  8:50         ` Jeff King
2021-07-26 19:44           ` Taylor Blau
2021-06-21 22:25   ` [PATCH v2 10/24] pack-bitmap.c: introduce 'bitmap_num_objects()' Taylor Blau
2021-07-21 10:35     ` Jeff King
2021-06-21 22:25   ` [PATCH v2 11/24] pack-bitmap.c: introduce 'nth_bitmap_object_oid()' Taylor Blau
2021-06-24 14:59     ` Taylor Blau
2021-07-21 10:37     ` Jeff King
2021-07-21 10:38       ` Jeff King
2021-06-21 22:25   ` [PATCH v2 12/24] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()' Taylor Blau
2021-07-21 10:39     ` Jeff King
2021-07-21 20:18       ` Taylor Blau
2021-06-21 22:25   ` [PATCH v2 13/24] pack-bitmap: read multi-pack bitmaps Taylor Blau
2021-07-21 11:32     ` Jeff King
2021-07-21 23:01       ` Taylor Blau
2021-07-23  9:40         ` Jeff King
2021-07-23 10:00         ` Jeff King
2021-07-26 20:36           ` Taylor Blau
2021-06-21 22:25   ` [PATCH v2 14/24] pack-bitmap: write " Taylor Blau
2021-06-24 23:45     ` Ævar Arnfjörð Bjarmason
2021-07-15 14:33       ` Taylor Blau
2021-07-21 12:09     ` Jeff King
2021-07-26 18:12       ` Taylor Blau
2021-07-26 18:23         ` Taylor Blau
2021-07-27 17:11         ` Jeff King
2021-07-27 20:33           ` Taylor Blau
2021-07-28 17:52             ` Jeff King
2021-07-29 19:33               ` Taylor Blau
2021-08-12 20:00                 ` Jeff King
2021-06-21 22:25   ` [PATCH v2 15/24] t5310: move some tests to lib-bitmap.sh Taylor Blau
2021-06-21 22:25   ` [PATCH v2 16/24] t/helper/test-read-midx.c: add --checksum mode Taylor Blau
2021-06-21 22:25   ` [PATCH v2 17/24] t5326: test multi-pack bitmap behavior Taylor Blau
2021-06-21 22:25   ` [PATCH v2 18/24] t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP Taylor Blau
2021-06-21 22:25   ` [PATCH v2 19/24] t5310: " Taylor Blau
2021-06-21 22:25   ` [PATCH v2 20/24] t5319: don't write MIDX bitmaps in t5319 Taylor Blau
2021-06-21 22:25   ` [PATCH v2 21/24] t7700: update to work with MIDX bitmap test knob Taylor Blau
2021-06-21 22:25   ` [PATCH v2 22/24] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
2021-06-25  0:03     ` Ævar Arnfjörð Bjarmason
2021-06-21 22:25   ` [PATCH v2 23/24] p5310: extract full and partial bitmap tests Taylor Blau
2021-06-21 22:26   ` [PATCH v2 24/24] p5326: perf tests for MIDX bitmaps Taylor Blau
2021-06-25  9:06   ` [PATCH v2 00/24] multi-pack reachability bitmaps Ævar Arnfjörð Bjarmason
2021-07-15 14:36     ` Taylor Blau
2021-07-21 12:12       ` Jeff King
2021-07-27 21:19 ` [PATCH v3 00/25] " Taylor Blau
2021-07-27 21:19   ` [PATCH v3 01/25] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps Taylor Blau
2021-07-27 21:19   ` [PATCH v3 02/25] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps Taylor Blau
2021-07-27 21:19   ` [PATCH v3 03/25] pack-bitmap-write.c: free existing bitmaps Taylor Blau
2021-07-27 21:19   ` [PATCH v3 04/25] Documentation: describe MIDX-based bitmaps Taylor Blau
2021-07-27 21:19   ` [PATCH v3 05/25] midx: clear auxiliary .rev after replacing the MIDX Taylor Blau
2021-07-27 21:19   ` [PATCH v3 06/25] midx: reject empty `--preferred-pack`'s Taylor Blau
2021-07-27 21:19   ` [PATCH v3 07/25] midx: infer preferred pack when not given one Taylor Blau
2021-07-27 21:19   ` [PATCH v3 08/25] midx: close linked MIDXs, avoid leaking memory Taylor Blau
2021-07-27 21:19   ` [PATCH v3 09/25] midx: avoid opening multiple MIDXs when writing Taylor Blau
2021-07-29 19:30     ` Taylor Blau
2021-08-12 20:15     ` Jeff King
2021-08-12 20:22       ` Jeff King
2021-08-12 21:20         ` Taylor Blau
2021-07-27 21:19   ` [PATCH v3 10/25] pack-bitmap.c: introduce 'bitmap_num_objects()' Taylor Blau
2021-07-27 21:19   ` [PATCH v3 11/25] pack-bitmap.c: introduce 'nth_bitmap_object_oid()' Taylor Blau
2021-07-27 21:19   ` [PATCH v3 12/25] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()' Taylor Blau
2021-07-27 21:19   ` [PATCH v3 13/25] pack-bitmap.c: avoid redundant calls to try_partial_reuse Taylor Blau
2021-07-27 21:19   ` [PATCH v3 14/25] pack-bitmap: read multi-pack bitmaps Taylor Blau
2021-07-27 21:20   ` [PATCH v3 15/25] pack-bitmap: write " Taylor Blau
2021-07-27 21:20   ` [PATCH v3 16/25] t5310: move some tests to lib-bitmap.sh Taylor Blau
2021-08-12 20:25     ` Jeff King
2021-07-27 21:20   ` [PATCH v3 17/25] t/helper/test-read-midx.c: add --checksum mode Taylor Blau
2021-08-12 20:31     ` Jeff King
2021-08-12 21:31       ` Taylor Blau
2021-07-27 21:20   ` [PATCH v3 18/25] t5326: test multi-pack bitmap behavior Taylor Blau
2021-08-12 21:02     ` Jeff King
2021-08-12 21:07       ` Jeff King
2021-08-12 22:38       ` Taylor Blau
2021-08-12 23:23         ` Jeff King
2021-07-27 21:20   ` [PATCH v3 19/25] t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP Taylor Blau
2021-07-27 21:20   ` [PATCH v3 20/25] t5310: " Taylor Blau
2021-07-27 21:20   ` [PATCH v3 21/25] t5319: don't write MIDX bitmaps in t5319 Taylor Blau
2021-07-27 21:20   ` [PATCH v3 22/25] t7700: update to work with MIDX bitmap test knob Taylor Blau
2021-07-27 21:20   ` [PATCH v3 23/25] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
2021-08-12 21:09     ` Jeff King
2021-07-27 21:20   ` [PATCH v3 24/25] p5310: extract full and partial bitmap tests Taylor Blau
2021-07-27 21:20   ` [PATCH v3 25/25] p5326: perf tests for MIDX bitmaps Taylor Blau
2021-08-12 21:18     ` Jeff King
2021-08-12 21:21   ` [PATCH v3 00/25] multi-pack reachability bitmaps Jeff King
2021-08-12 22:41     ` Taylor Blau
2021-08-24 16:15 ` [PATCH v4 " Taylor Blau
2021-08-24 16:15   ` [PATCH v4 01/25] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps Taylor Blau
2021-08-24 16:15   ` [PATCH v4 02/25] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps Taylor Blau
2021-08-24 16:15   ` [PATCH v4 03/25] pack-bitmap-write.c: free existing bitmaps Taylor Blau
2021-08-24 16:15   ` [PATCH v4 04/25] Documentation: describe MIDX-based bitmaps Taylor Blau
2021-08-24 16:16   ` [PATCH v4 05/25] midx: clear auxiliary .rev after replacing the MIDX Taylor Blau
2021-08-24 20:27     ` Junio C Hamano
2021-08-24 20:34       ` Taylor Blau
2021-08-24 21:12         ` Junio C Hamano
2021-08-24 21:24           ` Taylor Blau
2021-08-24 22:01             ` Taylor Blau
2021-08-24 22:04             ` Junio C Hamano
2021-08-24 22:06               ` Junio C Hamano
2021-08-24 22:10                 ` Taylor Blau
2021-08-27  6:01                   ` Junio C Hamano
2021-08-27 18:03                     ` Taylor Blau
2021-08-29 22:56                       ` Junio C Hamano
2021-08-30  0:07                         ` Taylor Blau
2021-08-30  0:34                           ` Junio C Hamano
2021-08-30  0:43                             ` Taylor Blau
2021-08-30 22:10                               ` brian m. carlson
2021-08-30 22:28                                 ` Junio C Hamano
2021-08-30 22:33                                   ` Taylor Blau
2021-08-31  5:19                                     ` Jeff King
2021-08-31 16:29                                     ` Junio C Hamano
2021-08-31 16:39                                       ` Taylor Blau
2021-08-31 17:44                                         ` Junio C Hamano
2021-08-31 18:48                                           ` Taylor Blau
2021-08-31  1:21                           ` Derrick Stolee
2021-08-31  5:37                             ` Jeff King
2021-08-31 16:33                               ` Junio C Hamano
2021-08-31 16:43                                 ` Taylor Blau
2021-08-31 17:17                                   ` Derrick Stolee
2021-09-01 10:03                                 ` Jeff King
2021-08-24 16:16   ` [PATCH v4 06/25] midx: reject empty `--preferred-pack`'s Taylor Blau
2021-08-24 16:16   ` [PATCH v4 07/25] midx: infer preferred pack when not given one Taylor Blau
2021-08-24 16:16   ` [PATCH v4 08/25] midx: close linked MIDXs, avoid leaking memory Taylor Blau
2021-08-24 16:16   ` [PATCH v4 09/25] midx: avoid opening multiple MIDXs when writing Taylor Blau
2021-08-24 16:16   ` [PATCH v4 10/25] pack-bitmap.c: introduce 'bitmap_num_objects()' Taylor Blau
2021-08-24 16:16   ` [PATCH v4 11/25] pack-bitmap.c: introduce 'nth_bitmap_object_oid()' Taylor Blau
2021-08-24 16:16   ` [PATCH v4 12/25] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()' Taylor Blau
2021-08-24 16:16   ` [PATCH v4 13/25] pack-bitmap.c: avoid redundant calls to try_partial_reuse Taylor Blau
2021-08-24 16:16   ` [PATCH v4 14/25] pack-bitmap: read multi-pack bitmaps Taylor Blau
2021-08-24 16:16   ` [PATCH v4 15/25] pack-bitmap: write " Taylor Blau
2021-08-24 16:16   ` [PATCH v4 16/25] t5310: move some tests to lib-bitmap.sh Taylor Blau
2021-08-24 16:16   ` [PATCH v4 17/25] t/helper/test-read-midx.c: add --checksum mode Taylor Blau
2021-08-24 16:16   ` [PATCH v4 18/25] t5326: test multi-pack bitmap behavior Taylor Blau
2021-08-24 16:16   ` [PATCH v4 19/25] t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP Taylor Blau
2021-08-24 16:16   ` [PATCH v4 20/25] t5310: " Taylor Blau
2021-08-24 16:16   ` [PATCH v4 21/25] t5319: don't write MIDX bitmaps in t5319 Taylor Blau
2021-08-24 16:16   ` [PATCH v4 22/25] t7700: update to work with MIDX bitmap test knob Taylor Blau
2021-08-24 16:16   ` [PATCH v4 23/25] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
2021-08-24 16:16   ` [PATCH v4 24/25] p5310: extract full and partial bitmap tests Taylor Blau
2021-08-24 16:16   ` [PATCH v4 25/25] p5326: perf tests for MIDX bitmaps Taylor Blau
2021-08-25  0:28   ` [PATCH v4 00/25] multi-pack reachability bitmaps Jeff King
2021-08-25  2:10     ` Taylor Blau
2021-08-25  2:13       ` Taylor Blau
2021-08-25  7:36       ` Jeff King
2021-08-25  7:48         ` Johannes Berg
2021-08-26 18:49         ` Taylor Blau
2021-08-26 21:22           ` Taylor Blau
2021-08-27 21:30             ` Jeff King
2021-08-29 22:42               ` Junio C Hamano
2021-08-31 20:51 ` [PATCH v5 00/27] " Taylor Blau
2021-08-31 20:51   ` [PATCH v5 01/27] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps Taylor Blau
2021-08-31 20:51   ` [PATCH v5 02/27] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps Taylor Blau
2021-08-31 20:51   ` [PATCH v5 03/27] pack-bitmap-write.c: free existing bitmaps Taylor Blau
2021-08-31 20:51   ` [PATCH v5 04/27] Documentation: describe MIDX-based bitmaps Taylor Blau
2021-08-31 20:51   ` [PATCH v5 05/27] midx: disallow running outside of a repository Taylor Blau
2021-08-31 20:51   ` [PATCH v5 06/27] midx: fix `*.rev` cleanups with `--object-dir` Taylor Blau
2021-08-31 20:51   ` [PATCH v5 07/27] midx: clear auxiliary .rev after replacing the MIDX Taylor Blau
2021-08-31 20:52   ` [PATCH v5 08/27] midx: reject empty `--preferred-pack`'s Taylor Blau
2021-08-31 20:52   ` [PATCH v5 09/27] midx: infer preferred pack when not given one Taylor Blau
2021-08-31 20:52   ` [PATCH v5 10/27] midx: close linked MIDXs, avoid leaking memory Taylor Blau
2021-08-31 20:52   ` [PATCH v5 11/27] midx: avoid opening multiple MIDXs when writing Taylor Blau
2021-08-31 20:52   ` [PATCH v5 12/27] pack-bitmap.c: introduce 'bitmap_num_objects()' Taylor Blau
2021-08-31 20:52   ` [PATCH v5 13/27] pack-bitmap.c: introduce 'nth_bitmap_object_oid()' Taylor Blau
2021-08-31 20:52   ` [PATCH v5 14/27] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()' Taylor Blau
2021-08-31 20:52   ` [PATCH v5 15/27] pack-bitmap.c: avoid redundant calls to try_partial_reuse Taylor Blau
2021-08-31 20:52   ` [PATCH v5 16/27] pack-bitmap: read multi-pack bitmaps Taylor Blau
2021-08-31 20:52   ` [PATCH v5 17/27] pack-bitmap: write " Taylor Blau
2021-08-31 20:52   ` [PATCH v5 18/27] t5310: move some tests to lib-bitmap.sh Taylor Blau
2021-08-31 20:52   ` [PATCH v5 19/27] t/helper/test-read-midx.c: add --checksum mode Taylor Blau
2021-08-31 20:52   ` [PATCH v5 20/27] t5326: test multi-pack bitmap behavior Taylor Blau
2021-08-31 20:52   ` [PATCH v5 21/27] t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP Taylor Blau
2021-08-31 20:52   ` [PATCH v5 22/27] t5310: " Taylor Blau
2021-08-31 20:52   ` [PATCH v5 23/27] t5319: don't write MIDX bitmaps in t5319 Taylor Blau
2021-08-31 20:52   ` [PATCH v5 24/27] t7700: update to work with MIDX bitmap test knob Taylor Blau
2021-08-31 20:52   ` [PATCH v5 25/27] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
2021-08-31 20:52   ` [PATCH v5 26/27] p5310: extract full and partial bitmap tests Taylor Blau
2021-08-31 20:52   ` [PATCH v5 27/27] p5326: perf tests for MIDX bitmaps Taylor Blau
2021-09-01 18:07   ` [PATCH v5 00/27] multi-pack reachability bitmaps Junio C Hamano
2021-09-01 19:08     ` Taylor Blau
2021-09-01 19:23       ` Junio C Hamano
2021-09-01 20:34         ` Taylor Blau
2021-09-01 20:49           ` Junio C Hamano
2021-09-01 20:54             ` Taylor Blau
2021-09-02  9:40             ` Jeff King
2021-09-02  9:38           ` Jeff King
2021-09-02  9:45   ` Jeff King
