git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] pack-bitmap: permute existing namehash values
@ 2021-09-07 21:17 Taylor Blau
  2021-09-07 21:17 ` [PATCH 1/4] t/helper/test-bitmap.c: add 'dump-hashes' mode Taylor Blau
                   ` (5 more replies)
  0 siblings, 6 replies; 55+ messages in thread
From: Taylor Blau @ 2021-09-07 21:17 UTC (permalink / raw)
  To: git; +Cc: peff

This short series depends on tb/multi-pack-bitmaps and is used to propagate
the hashcache between .bitmap files when generating multi-pack reachability
bitmaps.

The gist is that single pack bitmaps get the namehash values via the traversal
used to generate the pack, but there is no such traversal for MIDX bitmaps, and
thus the hash-cache extension (if we were to write it) would contain all zeros.

This series teaches the pack-bitmap code how to permute the hashcache values
from an existing bitmap into the new one. This doesn't keep the hashcache
up-to-date, but it at least prevents existing data from falling out when
generating a MIDX bitmap.

Taylor Blau (4):
  t/helper/test-bitmap.c: add 'dump-hashes' mode
  pack-bitmap.c: propagate namehash values from existing bitmaps
  midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps
  t5326: test propagating hashcache values

 Documentation/config/pack.txt |  4 ++++
 builtin/multi-pack-index.c    | 21 ++++++++++++++++++++
 midx.c                        |  6 +++++-
 midx.h                        |  1 +
 pack-bitmap.c                 | 36 +++++++++++++++++++++++++++++------
 pack-bitmap.h                 |  1 +
 t/helper/test-bitmap.c        | 10 +++++++++-
 t/t5326-multi-pack-bitmaps.sh | 32 +++++++++++++++++++++++++++++++
 8 files changed, 103 insertions(+), 8 deletions(-)

-- 
2.33.0.96.g73915697e6

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 1/4] t/helper/test-bitmap.c: add 'dump-hashes' mode
  2021-09-07 21:17 [PATCH 0/4] pack-bitmap: permute existing namehash values Taylor Blau
@ 2021-09-07 21:17 ` Taylor Blau
  2021-09-08  1:37   ` Ævar Arnfjörð Bjarmason
  2021-09-07 21:17 ` [PATCH 2/4] pack-bitmap.c: propagate namehash values from existing bitmaps Taylor Blau
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 55+ messages in thread
From: Taylor Blau @ 2021-09-07 21:17 UTC (permalink / raw)
  To: git; +Cc: peff

The pack-bitmap writer code is about to learn how to propagate values
from an existing hash-cache. To prepare, teach the test-bitmap helper to
dump the values from a bitmap's hash-cache extension in order to test
those changes.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c          | 22 ++++++++++++++++++++++
 pack-bitmap.h          |  1 +
 t/helper/test-bitmap.c | 10 +++++++++-
 3 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index fa69ed7a6d..e44af36933 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1744,6 +1744,28 @@ int test_bitmap_commits(struct repository *r)
 	return 0;
 }
 
+int test_bitmap_hashes(struct repository *r)
+{
+	struct bitmap_index *bitmap_git = prepare_bitmap_git(r);
+	struct object_id oid;
+	uint32_t i;
+
+	if (!bitmap_git->hashes)
+		goto cleanup;
+
+	for (i = 0; i < bitmap_num_objects(bitmap_git); i++) {
+		nth_bitmap_object_oid(bitmap_git, &oid, i);
+
+		printf("%s %"PRIu32"\n",
+		       oid_to_hex(&oid), get_be32(bitmap_git->hashes + i));
+	}
+
+cleanup:
+	free_bitmap_index(bitmap_git);
+
+	return 0;
+}
+
 int rebuild_bitmap(const uint32_t *reposition,
 		   struct ewah_bitmap *source,
 		   struct bitmap *dest)
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 81664f933f..b29514418c 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -53,6 +53,7 @@ void traverse_bitmap_commit_list(struct bitmap_index *,
 				 show_reachable_fn show_reachable);
 void test_bitmap_walk(struct rev_info *revs);
 int test_bitmap_commits(struct repository *r);
+int test_bitmap_hashes(struct repository *r);
 struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 					 struct list_objects_filter_options *filter,
 					 int filter_provided_objects);
diff --git a/t/helper/test-bitmap.c b/t/helper/test-bitmap.c
index 134a1e9d76..ff35f5999b 100644
--- a/t/helper/test-bitmap.c
+++ b/t/helper/test-bitmap.c
@@ -7,6 +7,11 @@ static int bitmap_list_commits(void)
 	return test_bitmap_commits(the_repository);
 }
 
+static int bitmap_dump_hashes(void)
+{
+	return test_bitmap_hashes(the_repository);
+}
+
 int cmd__bitmap(int argc, const char **argv)
 {
 	setup_git_directory();
@@ -16,9 +21,12 @@ int cmd__bitmap(int argc, const char **argv)
 
 	if (!strcmp(argv[1], "list-commits"))
 		return bitmap_list_commits();
+	if (!strcmp(argv[1], "dump-hashes"))
+		return bitmap_dump_hashes();
 
 usage:
-	usage("\ttest-tool bitmap list-commits");
+	usage("\ttest-tool bitmap list-commits\n"
+	      "\ttest-tool bitmap dump-hashes");
 
 	return -1;
 }
-- 
2.33.0.96.g73915697e6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 2/4] pack-bitmap.c: propagate namehash values from existing bitmaps
  2021-09-07 21:17 [PATCH 0/4] pack-bitmap: permute existing namehash values Taylor Blau
  2021-09-07 21:17 ` [PATCH 1/4] t/helper/test-bitmap.c: add 'dump-hashes' mode Taylor Blau
@ 2021-09-07 21:17 ` Taylor Blau
  2021-09-07 21:18 ` [PATCH 3/4] midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps Taylor Blau
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 55+ messages in thread
From: Taylor Blau @ 2021-09-07 21:17 UTC (permalink / raw)
  To: git; +Cc: peff

When an old bitmap exists while writing a new one, we load it and build
a "reposition" table which maps bit positions of objects from the old
bitmap to their respective positions in the new bitmap. This can help
when we encounter a commit which was selected in both the old and new
bitmap, since we only need to permute its bit (not recompute it from
scratch).

We do not, however, repurpose existing namehash values in the case of
the hash-cache extension. There has been thus far no good reason to do
so, since all of the namehash values for objects in the new bitmap would
be populated during the traversal that was just performed by
pack-objects when generating single-pack reachability bitmaps.

But this isn't the case for multi-pack bitmaps, which are written via
`git multi-pack-index write --bitmap` and do not perform any traversal.
In this case all namehash values are set to zero, but we don't even
bother to check the `pack.writeBitmapHashcache` option anyway, so it
fails to matter.

There are two approaches we could take to fill in non-zero hash-cache
values:

  - have either the multi-pack-index builtin run its own
    traversal to attempt to fill in some values, or let a hypothetical
    caller (like `pack-objects` when `repack` eventually drives the
    `multi-pack-index` builtin) fill in the values they found during
    their traversal

  - or copy any existing namehash values that were stored in an
    existing bitmap to their corresponding positions in the new bitmap

In a system where a repository is generally repacked with `git repack
--geometric=<d>` and occasionally repacked with `git repack -a`, the
hash-cache coverage will tend towards all objects.

Since populating the hash-cache is additive (i.e., doing so only helps
our delta search), any intermediate lack of full coverage is just fine.
So let's start by just propagating any values from the existing
hash-cache if we see one.

The next patch will respect the `pack.writeBitmapHashcache` option while
writing MIDX bitmaps, and then test this new behavior.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index e44af36933..cb876e7e9d 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1815,18 +1815,20 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
 	for (i = 0; i < num_objects; ++i) {
 		struct object_id oid;
 		struct object_entry *oe;
+		uint32_t index_pos;
 
 		if (bitmap_is_midx(bitmap_git))
-			nth_midxed_object_oid(&oid,
-					      bitmap_git->midx,
-					      pack_pos_to_midx(bitmap_git->midx, i));
+			index_pos = pack_pos_to_midx(bitmap_git->midx, i);
 		else
-			nth_packed_object_id(&oid, bitmap_git->pack,
-					     pack_pos_to_index(bitmap_git->pack, i));
+			index_pos = pack_pos_to_index(bitmap_git->pack, i);
+		nth_bitmap_object_oid(bitmap_git, &oid, index_pos);
 		oe = packlist_find(mapping, &oid);
 
-		if (oe)
+		if (oe) {
 			reposition[i] = oe_in_pack_pos(mapping, oe) + 1;
+			if (bitmap_git->hashes && !oe->hash)
+				oe->hash = get_be32(bitmap_git->hashes + index_pos);
+		}
 	}
 
 	return reposition;
-- 
2.33.0.96.g73915697e6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 3/4] midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps
  2021-09-07 21:17 [PATCH 0/4] pack-bitmap: permute existing namehash values Taylor Blau
  2021-09-07 21:17 ` [PATCH 1/4] t/helper/test-bitmap.c: add 'dump-hashes' mode Taylor Blau
  2021-09-07 21:17 ` [PATCH 2/4] pack-bitmap.c: propagate namehash values from existing bitmaps Taylor Blau
@ 2021-09-07 21:18 ` Taylor Blau
  2021-09-08  1:40   ` Ævar Arnfjörð Bjarmason
  2021-09-13  0:38   ` Junio C Hamano
  2021-09-07 21:18 ` [PATCH 4/4] t5326: test propagating hashcache values Taylor Blau
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 55+ messages in thread
From: Taylor Blau @ 2021-09-07 21:18 UTC (permalink / raw)
  To: git; +Cc: peff

In the previous commit, the bitmap writing code learned to propagate
values from an existing hash-cache extension into the bitmap that it is
writing.

Now that that functionality exists, let's expose it by teaching the 'git
multi-pack-index' builtin to respect the `pack.writeBitmapHashCache`
option so that the hash-cache may be written at all.

Two minor points worth noting here:

  - The 'git multi-pack-index write' sub-command didn't previously read
    any configuration (instead this is handled in the base command). A
    separate handler is added here to respect this write-specific
    config option.

  - I briefly considered adding a 'bitmap_flags' field to the static
    options struct, but decided against it since it would require
    plumbing through a new parameter to the write_midx_file() function.

    Instead, a new MIDX-specific flag is added, which is translated to
    the corresponding bitmap one.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/config/pack.txt |  4 ++++
 builtin/multi-pack-index.c    | 21 +++++++++++++++++++++
 midx.c                        |  6 +++++-
 midx.h                        |  1 +
 4 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index 763f7af7c4..ad7f73a1ea 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -159,6 +159,10 @@ pack.writeBitmapHashCache::
 	between an older, bitmapped pack and objects that have been
 	pushed since the last gc). The downside is that it consumes 4
 	bytes per object of disk space. Defaults to true.
++
+When writing a multi-pack reachability bitmap, no new namehashes are
+computed; instead, any namehashes stored in an existing bitmap are
+permuted into their appropriate location when writing a new bitmap.
 
 pack.writeReverseIndex::
 	When true, git will write a corresponding .rev file (see:
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 73c0113b48..dd1652531b 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -61,6 +61,23 @@ static struct option *add_common_options(struct option *prev)
 	return parse_options_concat(common_opts, prev);
 }
 
+static int git_multi_pack_index_write_config(const char *var, const char *value,
+					     void *cb)
+{
+	if (!strcmp(var, "pack.writebitmaphashcache")) {
+		if (git_config_bool(var, value))
+			opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
+		else
+			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
+	}
+
+	/*
+	 * No need to fall-back to 'git_default_config', since this was already
+	 * called in 'cmd_multi_pack_index()'.
+	 */
+	return 0;
+}
+
 static int cmd_multi_pack_index_write(int argc, const char **argv)
 {
 	struct option *options;
@@ -73,6 +90,10 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
 		OPT_END(),
 	};
 
+	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
+
+	git_config(git_multi_pack_index_write_config, NULL);
+
 	options = add_common_options(builtin_multi_pack_index_write_options);
 
 	trace2_cmd_mode(argv[0]);
diff --git a/midx.c b/midx.c
index ccdc3e5702..6c35dcd557 100644
--- a/midx.c
+++ b/midx.c
@@ -1008,9 +1008,13 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
 	struct pack_idx_entry **index;
 	struct commit **commits = NULL;
 	uint32_t i, commits_nr;
+	uint16_t options = 0;
 	char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash));
 	int ret;
 
+	if (flags & MIDX_WRITE_BITMAP_HASH_CACHE)
+		options |= BITMAP_OPT_HASH_CACHE;
+
 	prepare_midx_packing_data(&pdata, ctx);
 
 	commits = find_commits_for_midx_bitmap(&commits_nr, ctx);
@@ -1049,7 +1053,7 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
 		goto cleanup;
 
 	bitmap_writer_set_checksum(midx_hash);
-	bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, 0);
+	bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, options);
 
 cleanup:
 	free(index);
diff --git a/midx.h b/midx.h
index aa3da557bb..541d9ac728 100644
--- a/midx.h
+++ b/midx.h
@@ -44,6 +44,7 @@ struct multi_pack_index {
 #define MIDX_PROGRESS     (1 << 0)
 #define MIDX_WRITE_REV_INDEX (1 << 1)
 #define MIDX_WRITE_BITMAP (1 << 2)
+#define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3)
 
 const unsigned char *get_midx_checksum(struct multi_pack_index *m);
 char *get_midx_filename(const char *object_dir);
-- 
2.33.0.96.g73915697e6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 4/4] t5326: test propagating hashcache values
  2021-09-07 21:17 [PATCH 0/4] pack-bitmap: permute existing namehash values Taylor Blau
                   ` (2 preceding siblings ...)
  2021-09-07 21:18 ` [PATCH 3/4] midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps Taylor Blau
@ 2021-09-07 21:18 ` Taylor Blau
  2021-09-08  1:46   ` Ævar Arnfjörð Bjarmason
  2021-09-13  0:46   ` Junio C Hamano
  2021-09-14 22:05 ` [PATCH v2 0/7] pack-bitmap: permute existing namehash values Taylor Blau
  2021-09-17 21:21 ` [PATCH v3 " Taylor Blau
  5 siblings, 2 replies; 55+ messages in thread
From: Taylor Blau @ 2021-09-07 21:18 UTC (permalink / raw)
  To: git; +Cc: peff

Now that we both can propagate values from the hashcache, and respect
the configuration to enable the hashcache at all, test that both of
these function correctly by hardening their behavior with a test.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5326-multi-pack-bitmaps.sh | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 4ad7c2c969..24148ca35b 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -283,4 +283,36 @@ test_expect_success 'pack.preferBitmapTips' '
 	)
 '
 
+test_expect_success 'hash-cache values are propagated from pack bitmaps' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		git config pack.writeBitmapHashCache true &&
+
+		test_commit base &&
+		test_commit base2 &&
+		git repack -adb &&
+
+		test-tool bitmap dump-hashes >pack.raw &&
+		test_file_not_empty pack.raw &&
+		sort pack.raw >pack.hashes &&
+
+		test_commit new &&
+		git repack &&
+		git multi-pack-index write --bitmap &&
+
+		test-tool bitmap dump-hashes >midx.raw &&
+		sort midx.raw >midx.hashes &&
+
+		# ensure that every namehash in the pack bitmap can be found in
+		# the midx bitmap (i.e., that there are no oid-namehash pairs
+		# unique to the pack bitmap).
+		comm -23 pack.hashes midx.hashes >dropped.hashes &&
+		test_must_be_empty dropped.hashes
+	)
+'
+
 test_done
-- 
2.33.0.96.g73915697e6

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] t/helper/test-bitmap.c: add 'dump-hashes' mode
  2021-09-07 21:17 ` [PATCH 1/4] t/helper/test-bitmap.c: add 'dump-hashes' mode Taylor Blau
@ 2021-09-08  1:37   ` Ævar Arnfjörð Bjarmason
  2021-09-08  2:24     ` Taylor Blau
  0 siblings, 1 reply; 55+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-08  1:37 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff


On Tue, Sep 07 2021, Taylor Blau wrote:

> +int test_bitmap_hashes(struct repository *r)
> +{
> +	struct bitmap_index *bitmap_git = prepare_bitmap_git(r);
> +	struct object_id oid;
> +	uint32_t i;
> +
> +	if (!bitmap_git->hashes)
> +		goto cleanup;
> +
> +	for (i = 0; i < bitmap_num_objects(bitmap_git); i++) {
> +		nth_bitmap_object_oid(bitmap_git, &oid, i);
> +
> +		printf("%s %"PRIu32"\n",
> +		       oid_to_hex(&oid), get_be32(bitmap_git->hashes + i));
> +	}
> +
> +cleanup:
> +	free_bitmap_index(bitmap_git);
> +
> +	return 0;
> +}

Seems like it should return void, it doesn't ever return non-0, nor in
the rest of this series....

> +static int bitmap_dump_hashes(void)
> +{
> +	return test_bitmap_hashes(the_repository);
> +}
> [...]
>  		return bitmap_list_commits();
> +	if (!strcmp(argv[1], "dump-hashes"))
> +		return bitmap_dump_hashes();

Perhaps the return code only for the brevity of this test-only code?
Seems like having bitmap_dump_hashes() do the "return 0" would be better
in that case.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/4] midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps
  2021-09-07 21:18 ` [PATCH 3/4] midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps Taylor Blau
@ 2021-09-08  1:40   ` Ævar Arnfjörð Bjarmason
  2021-09-08  2:28     ` Taylor Blau
  2021-09-13  0:38   ` Junio C Hamano
  1 sibling, 1 reply; 55+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-08  1:40 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff


On Tue, Sep 07 2021, Taylor Blau wrote:

> +static int git_multi_pack_index_write_config(const char *var, const char *value,
> +					     void *cb)
> +{
> +	if (!strcmp(var, "pack.writebitmaphashcache")) {
> +		if (git_config_bool(var, value))
> +			opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
> +		else
> +			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
> +	}
> +
> +	/*
> +	 * No need to fall-back to 'git_default_config', since this was already
> +	 * called in 'cmd_multi_pack_index()'.
> +	 */
> +	return 0;
> +}
> +
>  static int cmd_multi_pack_index_write(int argc, const char **argv)
>  {
>  	struct option *options;
> @@ -73,6 +90,10 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
>  		OPT_END(),
>  	};
>  
> +	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
> +
> +	git_config(git_multi_pack_index_write_config, NULL);
> +

Since this is a write-only config option it would seem more logical to
just call git_config() once, and have a git_multip_pack_index_config,
which then would fall back on git_default_config, so we iterate it once,
and no need for a comment about the oddity.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/4] t5326: test propagating hashcache values
  2021-09-07 21:18 ` [PATCH 4/4] t5326: test propagating hashcache values Taylor Blau
@ 2021-09-08  1:46   ` Ævar Arnfjörð Bjarmason
  2021-09-08  2:30     ` Taylor Blau
  2021-09-13  0:46   ` Junio C Hamano
  1 sibling, 1 reply; 55+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-08  1:46 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff


On Tue, Sep 07 2021, Taylor Blau wrote:

> Now that we both can propagate values from the hashcache, and respect
> the configuration to enable the hashcache at all, test that both of
> these function correctly by hardening their behavior with a test.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  t/t5326-multi-pack-bitmaps.sh | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
>
> diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
> index 4ad7c2c969..24148ca35b 100755
> --- a/t/t5326-multi-pack-bitmaps.sh
> +++ b/t/t5326-multi-pack-bitmaps.sh
> @@ -283,4 +283,36 @@ test_expect_success 'pack.preferBitmapTips' '
>  	)
>  '
>  
> +test_expect_success 'hash-cache values are propagated from pack bitmaps' '
> +	rm -fr repo &&
> +	git init repo &&
> +	test_when_finished "rm -fr repo" &&

It seems the need for this "rm -fr repo" dance instead of just relying
on test_when_finished "rm -rf repo" is because of a 3x tests in a
function in tb/multi-pack-bitmaps that should probably be combined into
one, i.e. they share the same logical "repo" setup.

> +	(
> +		cd repo &&
> +
> +		git config pack.writeBitmapHashCache true &&

s/git config/test_config/, surely.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/4] t/helper/test-bitmap.c: add 'dump-hashes' mode
  2021-09-08  1:37   ` Ævar Arnfjörð Bjarmason
@ 2021-09-08  2:24     ` Taylor Blau
  0 siblings, 0 replies; 55+ messages in thread
From: Taylor Blau @ 2021-09-08  2:24 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, peff

On Wed, Sep 08, 2021 at 03:37:05AM +0200, Ævar Arnfjörð Bjarmason wrote:
> > +static int bitmap_dump_hashes(void)
> > +{
> > +	return test_bitmap_hashes(the_repository);
> > +}
> > [...]
> >  		return bitmap_list_commits();
> > +	if (!strcmp(argv[1], "dump-hashes"))
> > +		return bitmap_dump_hashes();
>
> Perhaps the return code only for the brevity of this test-only code?
> Seems like having bitmap_dump_hashes() do the "return 0" would be better
> in that case.

Yeah, it is silly to just return a constant from bitmap_dump_hashes
(and ditto for bitmap_list_commits), but it makes this easier to write.

I don't care either way. If you feel strongly, then please say so (but I
can't imagine that you do).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/4] midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps
  2021-09-08  1:40   ` Ævar Arnfjörð Bjarmason
@ 2021-09-08  2:28     ` Taylor Blau
  2021-09-09  8:18       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 55+ messages in thread
From: Taylor Blau @ 2021-09-08  2:28 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, peff

On Wed, Sep 08, 2021 at 03:40:19AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Tue, Sep 07 2021, Taylor Blau wrote:
>
> > +static int git_multi_pack_index_write_config(const char *var, const char *value,
> > +					     void *cb)
> > +{
> > +	if (!strcmp(var, "pack.writebitmaphashcache")) {
> > +		if (git_config_bool(var, value))
> > +			opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
> > +		else
> > +			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
> > +	}
> > +
> > +	/*
> > +	 * No need to fall-back to 'git_default_config', since this was already
> > +	 * called in 'cmd_multi_pack_index()'.
> > +	 */
> > +	return 0;
> > +}
> > +
> >  static int cmd_multi_pack_index_write(int argc, const char **argv)
> >  {
> >  	struct option *options;
> > @@ -73,6 +90,10 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
> >  		OPT_END(),
> >  	};
> >
> > +	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
> > +
> > +	git_config(git_multi_pack_index_write_config, NULL);
> > +
>
> Since this is a write-only config option it would seem more logical to
> just call git_config() once, and have a git_multip_pack_index_config,
> which then would fall back on git_default_config, so we iterate it once,
> and no need for a comment about the oddity.

Perhaps, but I'm not crazy about each sub-command having to call
git_config() itself when 'write' is the only one that actually has any
values to read.

FWIW, the commit-graph builtin does the same thing as is written here
(calling git_config() twice, once in cmd_commit_graph() with
git_default_config as the callback and again in cmd_commit_graph_write()
with git_commit_graph_write_config as the callback).

So I'm not opposed to cleaning it up, but I'd rather be consistent with
the existing behavior. To be honest, I'm not at all convinced that
reading the config twice is a bottleneck here when compared to
generating a MIDX.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/4] t5326: test propagating hashcache values
  2021-09-08  1:46   ` Ævar Arnfjörð Bjarmason
@ 2021-09-08  2:30     ` Taylor Blau
  2021-09-17  8:56       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 55+ messages in thread
From: Taylor Blau @ 2021-09-08  2:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, peff

On Wed, Sep 08, 2021 at 03:46:29AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Tue, Sep 07 2021, Taylor Blau wrote:
>
> > Now that we both can propagate values from the hashcache, and respect
> > the configuration to enable the hashcache at all, test that both of
> > these function correctly by hardening their behavior with a test.
> >
> > Signed-off-by: Taylor Blau <me@ttaylorr.com>
> > ---
> >  t/t5326-multi-pack-bitmaps.sh | 32 ++++++++++++++++++++++++++++++++
> >  1 file changed, 32 insertions(+)
> >
> > diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
> > index 4ad7c2c969..24148ca35b 100755
> > --- a/t/t5326-multi-pack-bitmaps.sh
> > +++ b/t/t5326-multi-pack-bitmaps.sh
> > @@ -283,4 +283,36 @@ test_expect_success 'pack.preferBitmapTips' '
> >  	)
> >  '
> >
> > +test_expect_success 'hash-cache values are propagated from pack bitmaps' '
> > +	rm -fr repo &&
> > +	git init repo &&
> > +	test_when_finished "rm -fr repo" &&
>
> It seems the need for this "rm -fr repo" dance instead of just relying
> on test_when_finished "rm -rf repo" is because of a 3x tests in a
> function in tb/multi-pack-bitmaps that should probably be combined into
> one, i.e. they share the same logical "repo" setup.

Yeah. There's definitely room for clean-up there if we want to have each
of the tests operate on the same repo. I have always found sharing a
repo between tests difficult to reason about, since we have to assume
that any --run parameters could be passed in.

So I have tended to err on the side of creating and throwing away a new
repo per test, but I understand that's somewhat atypical for Git's
tests.

> > +	(
> > +		cd repo &&
> > +
> > +		git config pack.writeBitmapHashCache true &&
>
> s/git config/test_config/, surely.

No, since this is in a subshell (and we don't care about unsetting the
value of a config in a repo that we're going to throw away, anyways).

(Side-note: since this has a /bin/sh shebang, we can't detect that we're
in a subshell and so test_config actually _would_ work here. But
switching this test to run with /bin/bash where we can detect whether or
not we're in a subshell would cause this test to fail with test_config).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/4] midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps
  2021-09-08  2:28     ` Taylor Blau
@ 2021-09-09  8:18       ` Ævar Arnfjörð Bjarmason
  2021-09-09  9:34         ` Ævar Arnfjörð Bjarmason
  2021-09-09 14:47         ` Taylor Blau
  0 siblings, 2 replies; 55+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-09  8:18 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff


On Tue, Sep 07 2021, Taylor Blau wrote:

> On Wed, Sep 08, 2021 at 03:40:19AM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Tue, Sep 07 2021, Taylor Blau wrote:
>>
>> > +static int git_multi_pack_index_write_config(const char *var, const char *value,
>> > +					     void *cb)
>> > +{
>> > +	if (!strcmp(var, "pack.writebitmaphashcache")) {
>> > +		if (git_config_bool(var, value))
>> > +			opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
>> > +		else
>> > +			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
>> > +	}
>> > +
>> > +	/*
>> > +	 * No need to fall-back to 'git_default_config', since this was already
>> > +	 * called in 'cmd_multi_pack_index()'.
>> > +	 */
>> > +	return 0;
>> > +}
>> > +
>> >  static int cmd_multi_pack_index_write(int argc, const char **argv)
>> >  {
>> >  	struct option *options;
>> > @@ -73,6 +90,10 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
>> >  		OPT_END(),
>> >  	};
>> >
>> > +	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
>> > +
>> > +	git_config(git_multi_pack_index_write_config, NULL);
>> > +
>>
>> Since this is a write-only config option it would seem more logical to
>> just call git_config() once, and have a git_multip_pack_index_config,
>> which then would fall back on git_default_config, so we iterate it once,
>> and no need for a comment about the oddity.
>
> Perhaps, but I'm not crazy about each sub-command having to call
> git_config() itself when 'write' is the only one that actually has any
> values to read.
>
> FWIW, the commit-graph builtin does the same thing as is written here
> (calling git_config() twice, once in cmd_commit_graph() with
> git_default_config as the callback and again in cmd_commit_graph_write()
> with git_commit_graph_write_config as the callback).

I didn't notice your earlier d356d5debe5 (commit-graph: introduce
'commitGraph.maxNewFilters', 2020-09-17). As an aside the test added in
that commit seems to be broken or not testing that code change at all,
if I comment out the git_config(git_commit_graph_write_config, &opts)
it'll pass.

As a comment on this series I'd find 4/4 squashed into 3/4 easier to
read, when I did a "git blame" and found d356d5debe5 I discovered the
test right away, if and when this gets merged someone might do the same,
but not find the test as easily (they'd probably then grep the config
variable name and find it eventually...).

More importantly, the same issue with the commit-graph test seems to be
the case here, if I comment out the added config reading code it'll
still pass, it seems to be testing something, but not that the config is
being read.

> So I'm not opposed to cleaning it up, but I'd rather be consistent with
> the existing behavior. To be honest, I'm not at all convinced that
> reading the config twice is a bottleneck here when compared to
> generating a MIDX.

It's never going to matter at all for performance, I should have been
clearer with my comments. I meant them purely as a "this code is hard to
follow" comment.

I.e. since we read the config twice, and in both commit-graph.c and
multi-pack-index.c munge and write to the "opts" struct on
parse_options(), you'll need to follow logic like:

    1. Read config in cmd_X(), might set variable xyz
    2. Do parse_options() in cmd_X(), might set variable xyz also
    3. Now in cmd_X_subcmd(), read config, might set variable xyz
    4. Do parse_options() in cmd_X(), migh set variable xyz also

Of course in this case the relevant opts.flags only matters for the
"write" subcommand, so on more careful reading we don't need to worry
about the value flip-flopping between config defaults and getopts
settings, but just in terms of establishing a pattern we'll be following
in the subcommand built-ins I think this is setting us up for more
complexity than is needed.

As far as being consistent with existing behavior, in git-worktree,
git-stash which are both similarly structured subcommands we follow the
pattern of calling git_config() once, it seems to me better to follow
that pattern than the one in d356d5debe5 if the config can be
unambiguously parsed in one pass.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/4] midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps
  2021-09-09  8:18       ` Ævar Arnfjörð Bjarmason
@ 2021-09-09  9:34         ` Ævar Arnfjörð Bjarmason
  2021-09-09 14:55           ` Taylor Blau
  2021-09-09 14:47         ` Taylor Blau
  1 sibling, 1 reply; 55+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-09  9:34 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff


On Thu, Sep 09 2021, Ævar Arnfjörð Bjarmason wrote:

> On Tue, Sep 07 2021, Taylor Blau wrote:
>
>> On Wed, Sep 08, 2021 at 03:40:19AM +0200, Ævar Arnfjörð Bjarmason wrote:
>>>
>>> On Tue, Sep 07 2021, Taylor Blau wrote:
>>>
>>> > +static int git_multi_pack_index_write_config(const char *var, const char *value,
>>> > +					     void *cb)
>>> > +{
>>> > +	if (!strcmp(var, "pack.writebitmaphashcache")) {
>>> > +		if (git_config_bool(var, value))
>>> > +			opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
>>> > +		else
>>> > +			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
>>> > +	}
>>> > +
>>> > +	/*
>>> > +	 * No need to fall-back to 'git_default_config', since this was already
>>> > +	 * called in 'cmd_multi_pack_index()'.
>>> > +	 */
>>> > +	return 0;
>>> > +}
>>> > +
>>> >  static int cmd_multi_pack_index_write(int argc, const char **argv)
>>> >  {
>>> >  	struct option *options;
>>> > @@ -73,6 +90,10 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
>>> >  		OPT_END(),
>>> >  	};
>>> >
>>> > +	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
>>> > +
>>> > +	git_config(git_multi_pack_index_write_config, NULL);
>>> > +
>>>
>>> Since this is a write-only config option it would seem more logical to
>>> just call git_config() once, and have a git_multip_pack_index_config,
>>> which then would fall back on git_default_config, so we iterate it once,
>>> and no need for a comment about the oddity.
>>
>> Perhaps, but I'm not crazy about each sub-command having to call
>> git_config() itself when 'write' is the only one that actually has any
>> values to read.
>>
>> FWIW, the commit-graph builtin does the same thing as is written here
>> (calling git_config() twice, once in cmd_commit_graph() with
>> git_default_config as the callback and again in cmd_commit_graph_write()
>> with git_commit_graph_write_config as the callback).
>
> I didn't notice your earlier d356d5debe5 (commit-graph: introduce
> 'commitGraph.maxNewFilters', 2020-09-17). As an aside the test added in
> that commit seems to be broken or not testing that code change at all,
> if I comment out the git_config(git_commit_graph_write_config, &opts)
> it'll pass.
>
> As a comment on this series I'd find 4/4 squashed into 3/4 easier to
> read, when I did a "git blame" and found d356d5debe5 I discovered the
> test right away, if and when this gets merged someone might do the same,
> but not find the test as easily (they'd probably then grep the config
> variable name and find it eventually...).
>
> More importantly, the same issue with the commit-graph test seems to be
> the case here, if I comment out the added config reading code it'll
> still pass, it seems to be testing something, but not that the config is
> being read.
>
>> So I'm not opposed to cleaning it up, but I'd rather be consistent with
>> the existing behavior. To be honest, I'm not at all convinced that
>> reading the config twice is a bottleneck here when compared to
>> generating a MIDX.
>
> It's never going to matter at all for performance, I should have been
> clearer with my comments. I meant them purely as a "this code is hard to
> follow" comment.
>
> I.e. since we read the config twice, and in both commit-graph.c and
> multi-pack-index.c munge and write to the "opts" struct on
> parse_options(), you'll need to follow logic like:
>
>     1. Read config in cmd_X(), might set variable xyz
>     2. Do parse_options() in cmd_X(), might set variable xyz also
>     3. Now in cmd_X_subcmd(), read config, might set variable xyz
>     4. Do parse_options() in cmd_X(), migh set variable xyz also
>
> Of course in this case the relevant opts.flags only matters for the
> "write" subcommand, so on more careful reading we don't need to worry
> about the value flip-flopping between config defaults and getopts
> settings, but just in terms of establishing a pattern we'll be following
> in the subcommand built-ins I think this is setting us up for more
> complexity than is needed.
>
> As far as being consistent with existing behavior, in git-worktree,
> git-stash which are both similarly structured subcommands we follow the
> pattern of calling git_config() once, it seems to me better to follow
> that pattern than the one in d356d5debe5 if the config can be
> unambiguously parsed in one pass.

In similar spirit as my
https://lore.kernel.org/git/87v93bidhn.fsf@evledraar.gmail.com/ I
started seeing if not doing the flags via getopt but instead variables &
setting the flags later was better, and came up with this on top. Not
for this series, more to muse on how we can write these subcommands in a
simpler manner (or not).

I may have discovered a subtle bug in the process, in
cmd_multi_pack_index_repack() we end up calling write_midx_internal(),
which cares about MIDX_WRITE_REV_INDEX, but only
cmd_multi_pack_index_write() will set that flag, both before & after my
patch. Are we using the wrong flags during repack as a result?

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index dd1652531bf..1b97b2ee4e1 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -45,14 +45,16 @@ static char const * const builtin_multi_pack_index_usage[] = {
 static struct opts_multi_pack_index {
 	const char *object_dir;
 	const char *preferred_pack;
-	unsigned long batch_size;
-	unsigned flags;
-} opts;
+	int progress;
+	int write_bitmap_hash_cache;
+} opts = {
+	.write_bitmap_hash_cache = -1,
+};
 
 static struct option common_opts[] = {
 	OPT_FILENAME(0, "object-dir", &opts.object_dir,
 	  N_("object directory containing set of packfile and pack-index pairs")),
-	OPT_BIT(0, "progress", &opts.flags, N_("force progress reporting"), MIDX_PROGRESS),
+	OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")),
 	OPT_END(),
 };
 
@@ -61,38 +63,29 @@ static struct option *add_common_options(struct option *prev)
 	return parse_options_concat(common_opts, prev);
 }
 
-static int git_multi_pack_index_write_config(const char *var, const char *value,
-					     void *cb)
+static int git_multi_pack_index_config(const char *var, const char *value,
+				       void *cb)
 {
 	if (!strcmp(var, "pack.writebitmaphashcache")) {
-		if (git_config_bool(var, value))
-			opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
-		else
-			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
+		opts.write_bitmap_hash_cache = git_config_bool(var, value);
+		return 0;
 	}
 
-	/*
-	 * No need to fall-back to 'git_default_config', since this was already
-	 * called in 'cmd_multi_pack_index()'.
-	 */
-	return 0;
+	return git_default_config(var, value, NULL);
 }
 
 static int cmd_multi_pack_index_write(int argc, const char **argv)
 {
 	struct option *options;
+	static int write_bitmap = 0;
 	static struct option builtin_multi_pack_index_write_options[] = {
 		OPT_STRING(0, "preferred-pack", &opts.preferred_pack,
 			   N_("preferred-pack"),
 			   N_("pack for reuse when computing a multi-pack bitmap")),
-		OPT_BIT(0, "bitmap", &opts.flags, N_("write multi-pack bitmap"),
-			MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
+		OPT_BOOL(0, "bitmap", &write_bitmap, N_("write multi-pack bitmap")),
 		OPT_END(),
 	};
-
-	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
-
-	git_config(git_multi_pack_index_write_config, NULL);
+	unsigned flags = 0;
 
 	options = add_common_options(builtin_multi_pack_index_write_options);
 
@@ -107,8 +100,15 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
 
 	FREE_AND_NULL(options);
 
-	return write_midx_file(opts.object_dir, opts.preferred_pack,
-			       opts.flags);
+	if (opts.progress)
+		flags |= MIDX_PROGRESS;
+	/* Both -1 default and 1 via config */
+	if (!opts.write_bitmap_hash_cache)
+		flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
+	if (write_bitmap)
+		flags |= MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX;
+
+	return write_midx_file(opts.object_dir, opts.preferred_pack, flags);
 }
 
 static int cmd_multi_pack_index_verify(int argc, const char **argv)
@@ -124,7 +124,7 @@ static int cmd_multi_pack_index_verify(int argc, const char **argv)
 		usage_with_options(builtin_multi_pack_index_verify_usage,
 				   options);
 
-	return verify_midx_file(the_repository, opts.object_dir, opts.flags);
+	return verify_midx_file(the_repository, opts.object_dir, opts.progress);
 }
 
 static int cmd_multi_pack_index_expire(int argc, const char **argv)
@@ -140,14 +140,15 @@ static int cmd_multi_pack_index_expire(int argc, const char **argv)
 		usage_with_options(builtin_multi_pack_index_expire_usage,
 				   options);
 
-	return expire_midx_packs(the_repository, opts.object_dir, opts.flags);
+	return expire_midx_packs(the_repository, opts.object_dir, opts.progress);
 }
 
 static int cmd_multi_pack_index_repack(int argc, const char **argv)
 {
+	static unsigned long batch_size = 0;
 	struct option *options;
 	static struct option builtin_multi_pack_index_repack_options[] = {
-		OPT_MAGNITUDE(0, "batch-size", &opts.batch_size,
+		OPT_MAGNITUDE(0, "batch-size", &batch_size,
 		  N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")),
 		OPT_END(),
 	};
@@ -167,7 +168,8 @@ static int cmd_multi_pack_index_repack(int argc, const char **argv)
 	FREE_AND_NULL(options);
 
 	return midx_repack(the_repository, opts.object_dir,
-			   (size_t)opts.batch_size, opts.flags);
+			   (size_t)batch_size,
+			   opts.progress ? MIDX_PROGRESS : 0);
 }
 
 int cmd_multi_pack_index(int argc, const char **argv,
@@ -175,10 +177,10 @@ int cmd_multi_pack_index(int argc, const char **argv,
 {
 	struct option *builtin_multi_pack_index_options = common_opts;
 
-	git_config(git_default_config, NULL);
+	git_config(git_multi_pack_index_config, NULL);
 
 	if (isatty(2))
-		opts.flags |= MIDX_PROGRESS;
+		opts.progress = 1;
 	argc = parse_options(argc, argv, prefix,
 			     builtin_multi_pack_index_options,
 			     builtin_multi_pack_index_usage,
diff --git a/midx.c b/midx.c
index 6c35dcd557c..3e722888d69 100644
--- a/midx.c
+++ b/midx.c
@@ -1482,7 +1482,7 @@ static int compare_pair_pos_vs_id(const void *_a, const void *_b)
 			display_progress(progress, _n); \
 	} while (0)
 
-int verify_midx_file(struct repository *r, const char *object_dir, unsigned flags)
+int verify_midx_file(struct repository *r, const char *object_dir, int opt_progress)
 {
 	struct pair_pos_vs_id *pairs = NULL;
 	uint32_t i;
@@ -1505,7 +1505,7 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
 	if (!midx_checksum_valid(m))
 		midx_report(_("incorrect checksum"));
 
-	if (flags & MIDX_PROGRESS)
+	if (opt_progress)
 		progress = start_delayed_progress(_("Looking for referenced packfiles"),
 					  m->num_packs);
 	for (i = 0; i < m->num_packs; i++) {
@@ -1534,7 +1534,7 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
 		return verify_midx_error;
 	}
 
-	if (flags & MIDX_PROGRESS)
+	if (opt_progress)
 		progress = start_sparse_progress(_("Verifying OID order in multi-pack-index"),
 						 m->num_objects - 1);
 	for (i = 0; i < m->num_objects - 1; i++) {
@@ -1563,14 +1563,14 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
 		pairs[i].pack_int_id = nth_midxed_pack_int_id(m, i);
 	}
 
-	if (flags & MIDX_PROGRESS)
+	if (opt_progress)
 		progress = start_sparse_progress(_("Sorting objects by packfile"),
 						 m->num_objects);
 	display_progress(progress, 0); /* TODO: Measure QSORT() progress */
 	QSORT(pairs, m->num_objects, compare_pair_pos_vs_id);
 	stop_progress(&progress);
 
-	if (flags & MIDX_PROGRESS)
+	if (opt_progress)
 		progress = start_sparse_progress(_("Verifying object offsets"), m->num_objects);
 	for (i = 0; i < m->num_objects; i++) {
 		struct object_id oid;
diff --git a/midx.h b/midx.h
index 541d9ac728d..0dfe6a54ef3 100644
--- a/midx.h
+++ b/midx.h
@@ -64,7 +64,7 @@ int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, i
 
 int write_midx_file(const char *object_dir, const char *preferred_pack_name, unsigned flags);
 void clear_midx_file(struct repository *r);
-int verify_midx_file(struct repository *r, const char *object_dir, unsigned flags);
+int verify_midx_file(struct repository *r, const char *object_dir, int opt_progress);
 int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags);
 int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags);
 

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/4] midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps
  2021-09-09  8:18       ` Ævar Arnfjörð Bjarmason
  2021-09-09  9:34         ` Ævar Arnfjörð Bjarmason
@ 2021-09-09 14:47         ` Taylor Blau
  1 sibling, 0 replies; 55+ messages in thread
From: Taylor Blau @ 2021-09-09 14:47 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Taylor Blau, git, peff

On Thu, Sep 09, 2021 at 10:18:10AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Tue, Sep 07 2021, Taylor Blau wrote:
>
> > On Wed, Sep 08, 2021 at 03:40:19AM +0200, Ævar Arnfjörð Bjarmason wrote:
> >>
> >> On Tue, Sep 07 2021, Taylor Blau wrote:
> >>
> >> > +static int git_multi_pack_index_write_config(const char *var, const char *value,
> >> > +					     void *cb)
> >> > +{
> >> > +	if (!strcmp(var, "pack.writebitmaphashcache")) {
> >> > +		if (git_config_bool(var, value))
> >> > +			opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
> >> > +		else
> >> > +			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
> >> > +	}
> >> > +
> >> > +	/*
> >> > +	 * No need to fall-back to 'git_default_config', since this was already
> >> > +	 * called in 'cmd_multi_pack_index()'.
> >> > +	 */
> >> > +	return 0;
> >> > +}
> >> > +
> >> >  static int cmd_multi_pack_index_write(int argc, const char **argv)
> >> >  {
> >> >  	struct option *options;
> >> > @@ -73,6 +90,10 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
> >> >  		OPT_END(),
> >> >  	};
> >> >
> >> > +	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
> >> > +
> >> > +	git_config(git_multi_pack_index_write_config, NULL);
> >> > +
> >>
> >> Since this is a write-only config option it would seem more logical to
> >> just call git_config() once, and have a git_multip_pack_index_config,
> >> which then would fall back on git_default_config, so we iterate it once,
> >> and no need for a comment about the oddity.
> >
> > Perhaps, but I'm not crazy about each sub-command having to call
> > git_config() itself when 'write' is the only one that actually has any
> > values to read.
> >
> > FWIW, the commit-graph builtin does the same thing as is written here
> > (calling git_config() twice, once in cmd_commit_graph() with
> > git_default_config as the callback and again in cmd_commit_graph_write()
> > with git_commit_graph_write_config as the callback).
>
> I didn't notice your earlier d356d5debe5 (commit-graph: introduce
> 'commitGraph.maxNewFilters', 2020-09-17). As an aside the test added in
> that commit seems to be broken or not testing that code change at all,
> if I comment out the git_config(git_commit_graph_write_config, &opts)
> it'll pass.

That makes sense; the test that d356d5debe5 added is ensuring that the
option `--max-new-filters` overrides any configured value of
`commitGraph.maxNewFilters` (so not reading the configuration would be
fine there).

> More importantly, the same issue with the commit-graph test seems to be
> the case here, if I comment out the added config reading code it'll
> still pass, it seems to be testing something, but not that the config is
> being read.

I think this also makes sense; since MIDX_WRITE_BITMAP_HASH_CACHE is the
default and is set in cmd_multi_pack_index_write(). So it may be worth
adding a test to say "make sure the hash-cache _isn't_ written when I:

    git config pack.writeBitmapHashCache &&
    git multi-pack-index write --bitmap

But I don't feel strongly about it (hence I didn't write such a test in
the original version which I sent here). If you think it would be
helpful in a newer version, I'm happy to add it.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/4] midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps
  2021-09-09  9:34         ` Ævar Arnfjörð Bjarmason
@ 2021-09-09 14:55           ` Taylor Blau
  2021-09-09 15:50             ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 55+ messages in thread
From: Taylor Blau @ 2021-09-09 14:55 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Taylor Blau, git, peff

On Thu, Sep 09, 2021 at 11:34:16AM +0200, Ævar Arnfjörð Bjarmason wrote:
> In similar spirit as my
> https://lore.kernel.org/git/87v93bidhn.fsf@evledraar.gmail.com/ I
> started seeing if not doing the flags via getopt but instead variables &
> setting the flags later was better, and came up with this on top. Not
> for this series, more to muse on how we can write these subcommands in a
> simpler manner (or not).

Sure, I think that everything you wrote here is correct. I don't have a
strong opinion, really, but one benefit of not manipulating a single
'flags' int is that we don't have to do stuff like:

  if (git_config_bool(var, value))
    opts.flags |= FLAG_FOO;
  else
    opts.flags &= ~FLAG_FOO;

and instead can write `opts.foo = git_config_bool(var, value)`.

Of course, the trade-off is that you later have to turn `opts.foo` into
flags at some point (or pass each option as an individual parameter). So
nothing's free, and I see it as a toss-up between which is easier to
read and write.

> I may have discovered a subtle bug in the process, in
> cmd_multi_pack_index_repack() we end up calling write_midx_internal(),
> which cares about MIDX_WRITE_REV_INDEX, but only
> cmd_multi_pack_index_write() will set that flag, both before & after my
> patch. Are we using the wrong flags during repack as a result?

Only the `write` sub-command would ever want to set that flag, since we
don't support writing a bitmap after `expire`. So that part seems right,
but perhaps there is a another problem you're seeing?

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/4] midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps
  2021-09-09 14:55           ` Taylor Blau
@ 2021-09-09 15:50             ` Ævar Arnfjörð Bjarmason
  2021-09-09 16:23               ` Taylor Blau
  0 siblings, 1 reply; 55+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-09 15:50 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff


On Thu, Sep 09 2021, Taylor Blau wrote:

> On Thu, Sep 09, 2021 at 11:34:16AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> In similar spirit as my
>> https://lore.kernel.org/git/87v93bidhn.fsf@evledraar.gmail.com/ I
>> started seeing if not doing the flags via getopt but instead variables &
>> setting the flags later was better, and came up with this on top. Not
>> for this series, more to muse on how we can write these subcommands in a
>> simpler manner (or not).
>
> Sure, I think that everything you wrote here is correct. I don't have a
> strong opinion, really, but one benefit of not manipulating a single
> 'flags' int is that we don't have to do stuff like:
>
>   if (git_config_bool(var, value))
>     opts.flags |= FLAG_FOO;
>   else
>     opts.flags &= ~FLAG_FOO;
>
> and instead can write `opts.foo = git_config_bool(var, value)`.

*nod*

> Of course, the trade-off is that you later have to turn `opts.foo` into
> flags at some point (or pass each option as an individual parameter). So
> nothing's free, and I see it as a toss-up between which is easier to
> read and write.

I think that trade-off is usually a benefit, also in the pack-write.c
etc. case, i.e. you enforce a clear boundary between the built-in and
the underlying API, and don't have to e.g. wonder if some write flag is
handled by verify() (which will just care about the progress flag), as
you pass some moral equivalent of a "struct
all_the_options_for_all_the_things" around between all of them.

I realize trying to solve that problem from a different angle may be how
you ended up going for per-subcommand config reading....

>> I may have discovered a subtle bug in the process, in
>> cmd_multi_pack_index_repack() we end up calling write_midx_internal(),
>> which cares about MIDX_WRITE_REV_INDEX, but only
>> cmd_multi_pack_index_write() will set that flag, both before & after my
>> patch. Are we using the wrong flags during repack as a result?
>
> Only the `write` sub-command would ever want to set that flag, since we
> don't support writing a bitmap after `expire`. So that part seems right,
> but perhaps there is a another problem you're seeing?

In midx_repack() we'll call write_midx_internal(). That function gets
the "flags" we pass to midx_repack() and will check
MIDX_WRITE_REV_INDEX. I haven't checked whether we actually reach that,
but that's what I was wondering, i.e. whether the repack routine would
"write" when repacking, and we missed the flag option there.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/4] midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps
  2021-09-09 15:50             ` Ævar Arnfjörð Bjarmason
@ 2021-09-09 16:23               ` Taylor Blau
  0 siblings, 0 replies; 55+ messages in thread
From: Taylor Blau @ 2021-09-09 16:23 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Taylor Blau, git, peff

On Thu, Sep 09, 2021 at 05:50:47PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >> I may have discovered a subtle bug in the process, in
> >> cmd_multi_pack_index_repack() we end up calling write_midx_internal(),
> >> which cares about MIDX_WRITE_REV_INDEX, but only
> >> cmd_multi_pack_index_write() will set that flag, both before & after my
> >> patch. Are we using the wrong flags during repack as a result?
> >
> > Only the `write` sub-command would ever want to set that flag, since we
> > don't support writing a bitmap after `expire`. So that part seems right,
> > but perhaps there is a another problem you're seeing?
>
> In midx_repack() we'll call write_midx_internal(). That function gets
> the "flags" we pass to midx_repack() and will check
> MIDX_WRITE_REV_INDEX. I haven't checked whether we actually reach that,
> but that's what I was wondering, i.e. whether the repack routine would
> "write" when repacking, and we missed the flag option there.

I don't think it's a problem in practice. We would never have
MIDX_WRITE_REV_INDEX set when executing cmd_multi_pack_index_repack(),
(and the same is true for all other subcommands besides `write`) because
the default value for flags is just MIDX_PROGRESS (if isatty(2)), and we
only add the WRITE_REV_INDEX bit from within the write handler.

More generally, that is to say "we only support writing a bitmap from
the `write` sub-command". There is no reason that we couldn't lift this
limitation and support writing a bitmap on the resulting MIDX after
`expire` or `repack` we just haven't done so.

But I don't see any problems with not getting the right flags, etc.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/4] midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps
  2021-09-07 21:18 ` [PATCH 3/4] midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps Taylor Blau
  2021-09-08  1:40   ` Ævar Arnfjörð Bjarmason
@ 2021-09-13  0:38   ` Junio C Hamano
  2021-09-14  1:15     ` Taylor Blau
  1 sibling, 1 reply; 55+ messages in thread
From: Junio C Hamano @ 2021-09-13  0:38 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff

Taylor Blau <me@ttaylorr.com> writes:

> +static int git_multi_pack_index_write_config(const char *var, const char *value,
> +					     void *cb)
> +{
> +	if (!strcmp(var, "pack.writebitmaphashcache")) {
> +		if (git_config_bool(var, value))
> +			opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
> +		else
> +			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
> +	}
> +
> +	/*
> +	 * No need to fall-back to 'git_default_config', since this was already
> +	 * called in 'cmd_multi_pack_index()'.
> +	 */

It's probably not just "No need to", but calling default_config() or
any "more generic" config this late is a wrong pattern, as the one
that was already called in the caller may have been overridden by
the command line options parser, and calling "more generic" config
callback after that would be a no-no.  So I think this should read
"we should never make a fall-back call to ...", instead.

Granted, cmd_multi_pack_index() only uses default_config(), and what
it sets will not be overridden by the command line argument parser
of the multi-pack-index command, so it is not a problem yet, but if
we ever introduce configuration variables that are applicable to
multiple subcommands of multi-pack-index, this distinction may start
to matter.

Unless a Git subcommand cannot work with just a single call to
git_config() with a callback upfront, I actually think it may be
better for them to issue a more pointed git_config_get_*() on the
variables they are interested in, never making a separate call to
git_config().

But that can be left for the future.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/4] t5326: test propagating hashcache values
  2021-09-07 21:18 ` [PATCH 4/4] t5326: test propagating hashcache values Taylor Blau
  2021-09-08  1:46   ` Ævar Arnfjörð Bjarmason
@ 2021-09-13  0:46   ` Junio C Hamano
  2021-09-14  1:12     ` Taylor Blau
  1 sibling, 1 reply; 55+ messages in thread
From: Junio C Hamano @ 2021-09-13  0:46 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff

Taylor Blau <me@ttaylorr.com> writes:

> Now that we both can propagate values from the hashcache, and respect
> the configuration to enable the hashcache at all, test that both of
> these function correctly by hardening their behavior with a test.

When we serve "git fetch" requests from a resulting repository,
because we have namehash available without traversing, we can serve
a packfile to the requestor, using the reachability bitmap to pick
objects that need to be sent, grouping the blobs with the same
namehash to be deltified against each other, resulting in a pack
that is better compressed than before?

What I am wondering is if we can come up with a "now, because we no
longer lose hashcache, we can do X so much better, here are the
numbers".


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/4] t5326: test propagating hashcache values
  2021-09-13  0:46   ` Junio C Hamano
@ 2021-09-14  1:12     ` Taylor Blau
  2021-09-14  2:05       ` Junio C Hamano
  0 siblings, 1 reply; 55+ messages in thread
From: Taylor Blau @ 2021-09-14  1:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, peff

On Sun, Sep 12, 2021 at 05:46:18PM -0700, Junio C Hamano wrote:
> What I am wondering is if we can come up with a "now, because we no
> longer lose hashcache, we can do X so much better, here are the
> numbers".

Sure, here they are. Bear in mind that these numbers are (a) on git.git
and (b) with `pack.threads` set to 1 so the overall runtime looks more
like CPU time.

  Test                            origin/tb/multi-pack-bitmaps   HEAD
  -------------------------------------------------------------------------------------
  5326.4: simulated clone         1.87(1.80+0.07)                1.46(1.42+0.03) -21.9%
  5326.5: simulated fetch         2.66(2.61+0.04)                1.47(1.43+0.04) -44.7%
  5326.6: pack to file (bitmap)   2.74(2.62+0.12)                1.89(1.82+0.07) -31.0%

Apologies for taking a little while to respond, I spent longer than I'm
willing to admit double checking these numbers with Peff because of
inconsistencies in my testing setup.

Alas, there they are. They are basically no different than having the
name-hash for single pack bitmaps, it's just now we don't throw them
away when generating a MIDX bitmap from a state where the repository
already has a single-pack bitmap.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/4] midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps
  2021-09-13  0:38   ` Junio C Hamano
@ 2021-09-14  1:15     ` Taylor Blau
  0 siblings, 0 replies; 55+ messages in thread
From: Taylor Blau @ 2021-09-14  1:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, peff

On Sun, Sep 12, 2021 at 05:38:36PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > +static int git_multi_pack_index_write_config(const char *var, const char *value,
> > +					     void *cb)
> > +{
> > +	if (!strcmp(var, "pack.writebitmaphashcache")) {
> > +		if (git_config_bool(var, value))
> > +			opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
> > +		else
> > +			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
> > +	}
> > +
> > +	/*
> > +	 * No need to fall-back to 'git_default_config', since this was already
> > +	 * called in 'cmd_multi_pack_index()'.
> > +	 */
>
> It's probably not just "No need to", but calling default_config() or
> any "more generic" config this late is a wrong pattern [...]

Makes sense to me, will apply. Thanks!

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/4] t5326: test propagating hashcache values
  2021-09-14  1:12     ` Taylor Blau
@ 2021-09-14  2:05       ` Junio C Hamano
  2021-09-14  5:11         ` Taylor Blau
  2021-09-14  5:23         ` Jeff King
  0 siblings, 2 replies; 55+ messages in thread
From: Junio C Hamano @ 2021-09-14  2:05 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff

Taylor Blau <me@ttaylorr.com> writes:

> Alas, there they are. They are basically no different than having the
> name-hash for single pack bitmaps, it's just now we don't throw them
> away when generating a MIDX bitmap from a state where the repository
> already has a single-pack bitmap.

I actually wasn't expecting any CPU/time difference.

I hope that we are talking about the same name-hash, which is used
to sort the blobs so that when pack-objects try to find a good delta
base, the blobs from the same path will sit close to each other and
hopefully fit in the pack window.

The effect I was hoping to see by not discarding the information was
that we find better delta base hence smaller deltas in the resulting
packfiles.

Thanks.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/4] t5326: test propagating hashcache values
  2021-09-14  2:05       ` Junio C Hamano
@ 2021-09-14  5:11         ` Taylor Blau
  2021-09-14  5:17           ` Taylor Blau
  2021-09-14  5:27           ` Jeff King
  2021-09-14  5:23         ` Jeff King
  1 sibling, 2 replies; 55+ messages in thread
From: Taylor Blau @ 2021-09-14  5:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, peff

On Mon, Sep 13, 2021 at 07:05:32PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > Alas, there they are. They are basically no different than having the
> > name-hash for single pack bitmaps, it's just now we don't throw them
> > away when generating a MIDX bitmap from a state where the repository
> > already has a single-pack bitmap.
>
> I actually wasn't expecting any CPU/time difference.

I think it is possible to see the CPU usage go down without affecting the
resulting pack size. See below for a more detailed analysis.

> I hope that we are talking about the same name-hash, which is used
> to sort the blobs so that when pack-objects try to find a good delta
> base, the blobs from the same path will sit close to each other and
> hopefully fit in the pack window.

Yes, of course.

> The effect I was hoping to see by not discarding the information was
> that we find better delta base hence smaller deltas in the resulting
> packfiles.

I think it is possible to observe either a decrease in CPU or a decrease
in the resulting pack size.

In my experience having the name-hash filled in results in finding good
delta pairs much more quickly than without, but that in many
repositories the resulting pack size is basically the same. In other
words, the resulting pack is pretty similar whether you use the
name-hash or not, it just affects how quickly you get there.

Some experiments to back that up: I instrumented the existing p5326 by
replacing anything like "pack-objects ... --stdout >/dev/null" with
"pack-objects ... --stdout >pack.tmp" and then added test_size's to
measure the size of each pack.

On the tip of this branch, the results are:

		Test                              origin/tb/multi-pack-bitmaps   HEAD
		----------------------------------------------------------------------------
		5326.5: simulated clone size                 3.3G                 3.3G +0.0%
		5326.7: simulated fetch size                10.5M                10.5M -0.2%
		5326.21: clone (partial bitmap)              3.3G                 3.3G +0.0%

Looking at c171d3e677 (pack-bitmap: implement optional name_hash cache,
2013-10-22), I modified[1] that script to replace timing pack-objects with
counting the number of bytes it wrote.

Doing that shows that the name-hash doesn't make a substantial difference in
the resulting pack size (numbers on a recent-ish copy of the kernel):

		Test                      c171d3e677d777c50231d8dea32ae691936da819^   c171d3e677d777c50231d8dea32ae691936da819
		--------------------------------------------------------------------------------------------------------------
		9999.3: simulated clone              3.2G                                        3.2G +0.0%
		9999.4: simulated fetch                32                                          32 +0.0%
		9999.6: partial bitmap               3.1G                                        3.1G +0.0%

(As a mostly-unrelated aside, I was curious why the pack size jumped from 3.2GB
to 3.3GB, but I can reproduce that jump even in p5310--the single pack bitmap
test--on the tip of my branch. So it does appear to be a regression which I'll
look into, but it's unrelated to this branch or MIDX bitmaps).

Thanks,
Taylor

[1]: https://gist.github.com/ttaylorr/6cfa3eb9fd012f81b833873d50f96f71

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/4] t5326: test propagating hashcache values
  2021-09-14  5:11         ` Taylor Blau
@ 2021-09-14  5:17           ` Taylor Blau
  2021-09-14  5:27           ` Jeff King
  1 sibling, 0 replies; 55+ messages in thread
From: Taylor Blau @ 2021-09-14  5:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, peff

On Tue, Sep 14, 2021 at 01:11:34AM -0400, Taylor Blau wrote:
> On the tip of this branch, the results are:
>
> [...]

Eek, those tabs are horrific. I must have left my editor in paste mode
when inserting them, sorry about that.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/4] t5326: test propagating hashcache values
  2021-09-14  2:05       ` Junio C Hamano
  2021-09-14  5:11         ` Taylor Blau
@ 2021-09-14  5:23         ` Jeff King
  2021-09-14  5:49           ` Junio C Hamano
  1 sibling, 1 reply; 55+ messages in thread
From: Jeff King @ 2021-09-14  5:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git

On Mon, Sep 13, 2021 at 07:05:32PM -0700, Junio C Hamano wrote:

> Taylor Blau <me@ttaylorr.com> writes:
> 
> > Alas, there they are. They are basically no different than having the
> > name-hash for single pack bitmaps, it's just now we don't throw them
> > away when generating a MIDX bitmap from a state where the repository
> > already has a single-pack bitmap.
> 
> I actually wasn't expecting any CPU/time difference.

I was, for the same reason we saw an improvement there in ae4f07fbcc
(pack-bitmap: implement optional name_hash cache, 2013-12-21): without a
name-hash, we try a bunch of fruitless deltas before we find a decent
one.

> I hope that we are talking about the same name-hash, which is used
> to sort the blobs so that when pack-objects try to find a good delta
> base, the blobs from the same path will sit close to each other and
> hopefully fit in the pack window.

Yes, exactly. We spend less time finding the good ones if the likely
candidates are close together. We may _also_ find better ones overall,
depending on the number of candidates and the window size.

The bitmap perf tests (neither p5310 nor its new midx cousin p5326)
don't check the output size.

> The effect I was hoping to see by not discarding the information was
> that we find better delta base hence smaller deltas in the resulting
> packfiles.

If we add a size check like so[1]:

diff --git a/t/perf/lib-bitmap.sh b/t/perf/lib-bitmap.sh
index 63d3bc7cec..648cd5b13d 100644
--- a/t/perf/lib-bitmap.sh
+++ b/t/perf/lib-bitmap.sh
@@ -10,7 +10,11 @@ test_full_bitmap () {
 		{
 			echo HEAD &&
 			echo ^$have
-		} | git pack-objects --revs --stdout >/dev/null
+		} | git pack-objects --revs --stdout >tmp.pack
+	'
+
+	test_size 'fetch size' '
+		wc -c <tmp.pack
 	'
 
 	test_perf 'pack to file (bitmap)' '

then the results I get using linux.git are:

Test                       origin/tb/multi-pack-bitmaps   origin/tb/midx-write-propagate-namehash
-------------------------------------------------------------------------------------------------
5326.4: simulated fetch    2.32(7.16+0.21)                2.00(3.79+0.18) -13.8%
5326.5: fetch size         16.7M                          15.5M -7.1%

so you can see that we spent about half as much CPU (ignore the
wall-clock percentage; the interesting thing is the userspace time,
because my machine has 8 cores). But we also shaved off a bit from the
pack, so we really did manage to find better deltas, too.

I see that Taylor just posted a very similar response, and independently
did the exact same experiment I did. ;) I'll send this anyway, though,
as my particular run showed slightly different results.

-Peff

[1] The other thing you'd want (and I presume Taylor was using for his
    earlier timings) is:

diff --git a/t/perf/p5326-multi-pack-bitmaps.sh b/t/perf/p5326-multi-pack-bitmaps.sh
index 5845109ac7..a4ac7746a7 100755
--- a/t/perf/p5326-multi-pack-bitmaps.sh
+++ b/t/perf/p5326-multi-pack-bitmaps.sh
@@ -11,7 +11,7 @@ test_expect_success 'enable multi-pack index' '
 '
 
 test_perf 'setup multi-pack index' '
-	git repack -ad &&
+	git repack -adb &&
 	git multi-pack-index write --bitmap
 '
 

since otherwise there is no pack bitmap for the midx to pull the
name-hashes from.

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/4] t5326: test propagating hashcache values
  2021-09-14  5:11         ` Taylor Blau
  2021-09-14  5:17           ` Taylor Blau
@ 2021-09-14  5:27           ` Jeff King
  2021-09-14  5:31             ` Taylor Blau
  1 sibling, 1 reply; 55+ messages in thread
From: Jeff King @ 2021-09-14  5:27 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Junio C Hamano, git

On Tue, Sep 14, 2021 at 01:11:34AM -0400, Taylor Blau wrote:

> Some experiments to back that up: I instrumented the existing p5326 by
> replacing anything like "pack-objects ... --stdout >/dev/null" with
> "pack-objects ... --stdout >pack.tmp" and then added test_size's to
> measure the size of each pack.
> 
> On the tip of this branch, the results are:
> 
> 		Test                              origin/tb/multi-pack-bitmaps   HEAD
> 		----------------------------------------------------------------------------
> 		5326.5: simulated clone size                 3.3G                 3.3G +0.0%
> 		5326.7: simulated fetch size                10.5M                10.5M -0.2%
> 		5326.21: clone (partial bitmap)              3.3G                 3.3G +0.0%

I wouldn't expect a change in the clone size. We're already including
all the objects from the single pack, so we won't even look for new
deltas.

In my run, I did see a small improvement in the fetch size (though my
size both before and after was larger than yours). This is going to
depend on the exact set of deltas we have (which in turn depends on how
your repo happens to have been packed before the script even starts) and
which ones the client actually wants (which may depend on the exact tip
of your repo).

Presumably you also saw a decrease in the user CPU time of 5326.6 here.
If not, you may have forgotten the extra patch to create the pack
bitmap.

-Peff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/4] t5326: test propagating hashcache values
  2021-09-14  5:27           ` Jeff King
@ 2021-09-14  5:31             ` Taylor Blau
  0 siblings, 0 replies; 55+ messages in thread
From: Taylor Blau @ 2021-09-14  5:31 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, Junio C Hamano, git

On Tue, Sep 14, 2021 at 01:27:31AM -0400, Jeff King wrote:
> On Tue, Sep 14, 2021 at 01:11:34AM -0400, Taylor Blau wrote:
>
> > Some experiments to back that up: I instrumented the existing p5326 by
> > replacing anything like "pack-objects ... --stdout >/dev/null" with
> > "pack-objects ... --stdout >pack.tmp" and then added test_size's to
> > measure the size of each pack.
> >
> > On the tip of this branch, the results are:
> >
> > 		Test                              origin/tb/multi-pack-bitmaps   HEAD
> > 		----------------------------------------------------------------------------
> > 		5326.5: simulated clone size                 3.3G                 3.3G +0.0%
> > 		5326.7: simulated fetch size                10.5M                10.5M -0.2%
> > 		5326.21: clone (partial bitmap)              3.3G                 3.3G +0.0%
>
> I wouldn't expect a change in the clone size. We're already including
> all the objects from the single pack, so we won't even look for new
> deltas.
>
> In my run, I did see a small improvement in the fetch size (though my
> size both before and after was larger than yours). This is going to
> depend on the exact set of deltas we have (which in turn depends on how
> your repo happens to have been packed before the script even starts) and
> which ones the client actually wants (which may depend on the exact tip
> of your repo).

Yes, I agree with all of that. I am still interested in trying to figure
out why the resulting clone size seems to go up (independent of the
changes here). I'm bisecting it, but it's slow, since every step
requires you to repack the kernel.

> Presumably you also saw a decrease in the user CPU time of 5326.6 here.
> If not, you may have forgotten the extra patch to create the pack
> bitmap.

I did, but didn't bother to include the timings in the quoted part,
since I already shared them in [1].

I have a handful of new patches for an updated version of this series
which explains the extra patch you are talking about, too.

Thanks,
Taylor

[1]: https://lore.kernel.org/git/YT%2F3BuDa7KfUN%2F38@nand.local/

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/4] t5326: test propagating hashcache values
  2021-09-14  5:23         ` Jeff King
@ 2021-09-14  5:49           ` Junio C Hamano
  0 siblings, 0 replies; 55+ messages in thread
From: Junio C Hamano @ 2021-09-14  5:49 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git

Jeff King <peff@peff.net> writes:

> On Mon, Sep 13, 2021 at 07:05:32PM -0700, Junio C Hamano wrote:
>
>> Taylor Blau <me@ttaylorr.com> writes:
>> 
>> > Alas, there they are. They are basically no different than having the
>> > name-hash for single pack bitmaps, it's just now we don't throw them
>> > away when generating a MIDX bitmap from a state where the repository
>> > already has a single-pack bitmap.
>> 
>> I actually wasn't expecting any CPU/time difference.
>
> I was, for the same reason we saw an improvement there in ae4f07fbcc
> (pack-bitmap: implement optional name_hash cache, 2013-12-21): without a
> name-hash, we try a bunch of fruitless deltas before we find a decent
> one.

Nice.  We learn new things every once in a while ;-)

>> I hope that we are talking about the same name-hash, which is used
>> to sort the blobs so that when pack-objects try to find a good delta
>> base, the blobs from the same path will sit close to each other and
>> hopefully fit in the pack window.
>
> Yes, exactly. We spend less time finding the good ones if the likely
> candidates are close together. We may _also_ find better ones overall,
> depending on the number of candidates and the window size.

It is a pleasant surprise that in a real history like linux.git we
can even find good delta base without the name hash (unless we are
using insanely wide window size, that is).  The objects in such a
case will be sorted solely by size, larger to smaller, and we need
to find a good delta base within that window.  It may not be as
horrible as fast-import (which only tries to delta against the
previous single object), but with ~70k paths in a single revision
with history that is ~1m deep, I was pessimistic to see a size-only
sort to drop even a pair of blobs from the same path within the
default window size of 10.  But it seems that such a pessimism is
unwarranted, which is a good news ;-).

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v2 0/7] pack-bitmap: permute existing namehash values
  2021-09-07 21:17 [PATCH 0/4] pack-bitmap: permute existing namehash values Taylor Blau
                   ` (3 preceding siblings ...)
  2021-09-07 21:18 ` [PATCH 4/4] t5326: test propagating hashcache values Taylor Blau
@ 2021-09-14 22:05 ` Taylor Blau
  2021-09-14 22:06   ` [PATCH v2 1/7] t/helper/test-bitmap.c: add 'dump-hashes' mode Taylor Blau
                     ` (7 more replies)
  2021-09-17 21:21 ` [PATCH v3 " Taylor Blau
  5 siblings, 8 replies; 55+ messages in thread
From: Taylor Blau @ 2021-09-14 22:05 UTC (permalink / raw)
  To: git; +Cc: peff, gitster, avarab

Here is a small-ish rerool of my series to carry forward values from an existing
hash-cache when generating multi-pack reachability bitmaps.

Since last time, the performance tests in p5326 were cleaned up in order to
produce timings for generating a pack when using a MIDX bitmap with the
hash-cache extension.

The test-tool implementation in the second patch was also fixed to print the
correct OID for each name-hash. In the previous version, the order we looked up
read the hash-cache was according to pack order, when it should be in index
order. This bug still allowed the tests to pass, because the two orderings are
permutations of one another, so the expected and actual orderings were wrong in
the same way.

It is based on the `tb/multi-pack-bitmaps` topic, which graduated to master.
Thanks in advance for your review.

Taylor Blau (7):
  t/helper/test-bitmap.c: add 'dump-hashes' mode
  pack-bitmap.c: propagate namehash values from existing bitmaps
  midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps
  p5326: create missing 'perf-tag' tag
  p5326: don't set core.multiPackIndex unnecessarily
  p5326: generate pack bitmaps before writing the MIDX bitmap
  t5326: test propagating hashcache values

 Documentation/config/pack.txt      |  4 +++
 builtin/multi-pack-index.c         | 21 +++++++++++++++
 midx.c                             |  6 ++++-
 midx.h                             |  1 +
 pack-bitmap.c                      | 41 +++++++++++++++++++++++++-----
 pack-bitmap.h                      |  1 +
 t/helper/test-bitmap.c             | 10 +++++++-
 t/perf/p5326-multi-pack-bitmaps.sh |  8 +++---
 t/t5326-multi-pack-bitmaps.sh      | 32 +++++++++++++++++++++++
 9 files changed, 113 insertions(+), 11 deletions(-)

Range-diff against v1:
1:  918f9b275a ! 1:  4f2b8d9530 t/helper/test-bitmap.c: add 'dump-hashes' mode
    @@ pack-bitmap.c: int test_bitmap_commits(struct repository *r)
     +{
     +	struct bitmap_index *bitmap_git = prepare_bitmap_git(r);
     +	struct object_id oid;
    -+	uint32_t i;
    ++	uint32_t i, index_pos;
     +
     +	if (!bitmap_git->hashes)
     +		goto cleanup;
     +
     +	for (i = 0; i < bitmap_num_objects(bitmap_git); i++) {
    -+		nth_bitmap_object_oid(bitmap_git, &oid, i);
    ++		if (bitmap_is_midx(bitmap_git))
    ++			index_pos = pack_pos_to_midx(bitmap_git->midx, i);
    ++		else
    ++			index_pos = pack_pos_to_index(bitmap_git->pack, i);
    ++
    ++		nth_bitmap_object_oid(bitmap_git, &oid, index_pos);
     +
     +		printf("%s %"PRIu32"\n",
    -+		       oid_to_hex(&oid), get_be32(bitmap_git->hashes + i));
    ++		       oid_to_hex(&oid), get_be32(bitmap_git->hashes + index_pos));
     +	}
     +
     +cleanup:
2:  fa9f5633f6 = 2:  2cd2f3aa5e pack-bitmap.c: propagate namehash values from existing bitmaps
3:  be8f47e13c ! 3:  f0d8f106c2 midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps
    @@ builtin/multi-pack-index.c: static struct option *add_common_options(struct opti
     +	}
     +
     +	/*
    -+	 * No need to fall-back to 'git_default_config', since this was already
    -+	 * called in 'cmd_multi_pack_index()'.
    ++	 * We should never make a fall-back call to 'git_default_config', since
    ++	 * this was already called in 'cmd_multi_pack_index()'.
     +	 */
     +	return 0;
     +}
-:  ---------- > 4:  a8c6e845ad p5326: create missing 'perf-tag' tag
-:  ---------- > 5:  191922c8f2 p5326: don't set core.multiPackIndex unnecessarily
-:  ---------- > 6:  040bb40548 p5326: generate pack bitmaps before writing the MIDX bitmap
4:  acf3ec60cb ! 7:  fdf71432b3 t5326: test propagating hashcache values
    @@ Commit message
         the configuration to enable the hashcache at all, test that both of
         these function correctly by hardening their behavior with a test.
     
    +    Like the hash-cache in classic single-pack bitmaps, this helps more
    +    proportionally the more up-to-date your bitmap coverage is. When our
    +    bitmap coverage is out-of-date with the ref tips, we spend more time
    +    proportionally traversing, and all of that traversal gets the name-hash
    +    filled in.
    +
    +    But for the up-to-date bitmaps, this helps quite a bit. These numbers
    +    are on git.git, with `pack.threads=1` to help see the difference
    +    reflected in the overall runtime.
    +
    +        Test                            origin/tb/multi-pack-bitmaps   HEAD
    +        -------------------------------------------------------------------------------------
    +        5326.4: simulated clone         1.87(1.80+0.07)                1.46(1.42+0.03) -21.9%
    +        5326.5: simulated fetch         2.66(2.61+0.04)                1.47(1.43+0.04) -44.7%
    +        5326.6: pack to file (bitmap)   2.74(2.62+0.12)                1.89(1.82+0.07) -31.0%
    +
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## t/t5326-multi-pack-bitmaps.sh ##
-- 
2.33.0.96.g73915697e6

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v2 1/7] t/helper/test-bitmap.c: add 'dump-hashes' mode
  2021-09-14 22:05 ` [PATCH v2 0/7] pack-bitmap: permute existing namehash values Taylor Blau
@ 2021-09-14 22:06   ` Taylor Blau
  2021-09-14 22:06   ` [PATCH v2 2/7] pack-bitmap.c: propagate namehash values from existing bitmaps Taylor Blau
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 55+ messages in thread
From: Taylor Blau @ 2021-09-14 22:06 UTC (permalink / raw)
  To: git; +Cc: peff, gitster, avarab

The pack-bitmap writer code is about to learn how to propagate values
from an existing hash-cache. To prepare, teach the test-bitmap helper to
dump the values from a bitmap's hash-cache extension in order to test
those changes.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c          | 27 +++++++++++++++++++++++++++
 pack-bitmap.h          |  1 +
 t/helper/test-bitmap.c | 10 +++++++++-
 3 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 8504110a4d..04de387318 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1742,6 +1742,33 @@ int test_bitmap_commits(struct repository *r)
 	return 0;
 }
 
+int test_bitmap_hashes(struct repository *r)
+{
+	struct bitmap_index *bitmap_git = prepare_bitmap_git(r);
+	struct object_id oid;
+	uint32_t i, index_pos;
+
+	if (!bitmap_git->hashes)
+		goto cleanup;
+
+	for (i = 0; i < bitmap_num_objects(bitmap_git); i++) {
+		if (bitmap_is_midx(bitmap_git))
+			index_pos = pack_pos_to_midx(bitmap_git->midx, i);
+		else
+			index_pos = pack_pos_to_index(bitmap_git->pack, i);
+
+		nth_bitmap_object_oid(bitmap_git, &oid, index_pos);
+
+		printf("%s %"PRIu32"\n",
+		       oid_to_hex(&oid), get_be32(bitmap_git->hashes + index_pos));
+	}
+
+cleanup:
+	free_bitmap_index(bitmap_git);
+
+	return 0;
+}
+
 int rebuild_bitmap(const uint32_t *reposition,
 		   struct ewah_bitmap *source,
 		   struct bitmap *dest)
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 469090bad2..ed46d27077 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -52,6 +52,7 @@ void traverse_bitmap_commit_list(struct bitmap_index *,
 				 show_reachable_fn show_reachable);
 void test_bitmap_walk(struct rev_info *revs);
 int test_bitmap_commits(struct repository *r);
+int test_bitmap_hashes(struct repository *r);
 struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 					 struct list_objects_filter_options *filter,
 					 int filter_provided_objects);
diff --git a/t/helper/test-bitmap.c b/t/helper/test-bitmap.c
index 134a1e9d76..ff35f5999b 100644
--- a/t/helper/test-bitmap.c
+++ b/t/helper/test-bitmap.c
@@ -7,6 +7,11 @@ static int bitmap_list_commits(void)
 	return test_bitmap_commits(the_repository);
 }
 
+static int bitmap_dump_hashes(void)
+{
+	return test_bitmap_hashes(the_repository);
+}
+
 int cmd__bitmap(int argc, const char **argv)
 {
 	setup_git_directory();
@@ -16,9 +21,12 @@ int cmd__bitmap(int argc, const char **argv)
 
 	if (!strcmp(argv[1], "list-commits"))
 		return bitmap_list_commits();
+	if (!strcmp(argv[1], "dump-hashes"))
+		return bitmap_dump_hashes();
 
 usage:
-	usage("\ttest-tool bitmap list-commits");
+	usage("\ttest-tool bitmap list-commits\n"
+	      "\ttest-tool bitmap dump-hashes");
 
 	return -1;
 }
-- 
2.33.0.96.g73915697e6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v2 2/7] pack-bitmap.c: propagate namehash values from existing bitmaps
  2021-09-14 22:05 ` [PATCH v2 0/7] pack-bitmap: permute existing namehash values Taylor Blau
  2021-09-14 22:06   ` [PATCH v2 1/7] t/helper/test-bitmap.c: add 'dump-hashes' mode Taylor Blau
@ 2021-09-14 22:06   ` Taylor Blau
  2021-09-14 22:06   ` [PATCH v2 3/7] midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps Taylor Blau
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 55+ messages in thread
From: Taylor Blau @ 2021-09-14 22:06 UTC (permalink / raw)
  To: git; +Cc: peff, gitster, avarab

When an old bitmap exists while writing a new one, we load it and build
a "reposition" table which maps bit positions of objects from the old
bitmap to their respective positions in the new bitmap. This can help
when we encounter a commit which was selected in both the old and new
bitmap, since we only need to permute its bit (not recompute it from
scratch).

We do not, however, repurpose existing namehash values in the case of
the hash-cache extension. There has been thus far no good reason to do
so, since all of the namehash values for objects in the new bitmap would
be populated during the traversal that was just performed by
pack-objects when generating single-pack reachability bitmaps.

But this isn't the case for multi-pack bitmaps, which are written via
`git multi-pack-index write --bitmap` and do not perform any traversal.
In this case all namehash values are set to zero, but we don't even
bother to check the `pack.writeBitmapHashcache` option anyway, so it
fails to matter.

There are two approaches we could take to fill in non-zero hash-cache
values:

  - have either the multi-pack-index builtin run its own
    traversal to attempt to fill in some values, or let a hypothetical
    caller (like `pack-objects` when `repack` eventually drives the
    `multi-pack-index` builtin) fill in the values they found during
    their traversal

  - or copy any existing namehash values that were stored in an
    existing bitmap to their corresponding positions in the new bitmap

In a system where a repository is generally repacked with `git repack
--geometric=<d>` and occasionally repacked with `git repack -a`, the
hash-cache coverage will tend towards all objects.

Since populating the hash-cache is additive (i.e., doing so only helps
our delta search), any intermediate lack of full coverage is just fine.
So let's start by just propagating any values from the existing
hash-cache if we see one.

The next patch will respect the `pack.writeBitmapHashcache` option while
writing MIDX bitmaps, and then test this new behavior.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 04de387318..33a3732992 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1818,18 +1818,20 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
 	for (i = 0; i < num_objects; ++i) {
 		struct object_id oid;
 		struct object_entry *oe;
+		uint32_t index_pos;
 
 		if (bitmap_is_midx(bitmap_git))
-			nth_midxed_object_oid(&oid,
-					      bitmap_git->midx,
-					      pack_pos_to_midx(bitmap_git->midx, i));
+			index_pos = pack_pos_to_midx(bitmap_git->midx, i);
 		else
-			nth_packed_object_id(&oid, bitmap_git->pack,
-					     pack_pos_to_index(bitmap_git->pack, i));
+			index_pos = pack_pos_to_index(bitmap_git->pack, i);
+		nth_bitmap_object_oid(bitmap_git, &oid, index_pos);
 		oe = packlist_find(mapping, &oid);
 
-		if (oe)
+		if (oe) {
 			reposition[i] = oe_in_pack_pos(mapping, oe) + 1;
+			if (bitmap_git->hashes && !oe->hash)
+				oe->hash = get_be32(bitmap_git->hashes + index_pos);
+		}
 	}
 
 	return reposition;
-- 
2.33.0.96.g73915697e6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v2 3/7] midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps
  2021-09-14 22:05 ` [PATCH v2 0/7] pack-bitmap: permute existing namehash values Taylor Blau
  2021-09-14 22:06   ` [PATCH v2 1/7] t/helper/test-bitmap.c: add 'dump-hashes' mode Taylor Blau
  2021-09-14 22:06   ` [PATCH v2 2/7] pack-bitmap.c: propagate namehash values from existing bitmaps Taylor Blau
@ 2021-09-14 22:06   ` Taylor Blau
  2021-09-14 22:06   ` [PATCH v2 4/7] p5326: create missing 'perf-tag' tag Taylor Blau
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 55+ messages in thread
From: Taylor Blau @ 2021-09-14 22:06 UTC (permalink / raw)
  To: git; +Cc: peff, gitster, avarab

In the previous commit, the bitmap writing code learned to propagate
values from an existing hash-cache extension into the bitmap that it is
writing.

Now that that functionality exists, let's expose it by teaching the 'git
multi-pack-index' builtin to respect the `pack.writeBitmapHashCache`
option so that the hash-cache may be written at all.

Two minor points worth noting here:

  - The 'git multi-pack-index write' sub-command didn't previously read
    any configuration (instead this is handled in the base command). A
    separate handler is added here to respect this write-specific
    config option.

  - I briefly considered adding a 'bitmap_flags' field to the static
    options struct, but decided against it since it would require
    plumbing through a new parameter to the write_midx_file() function.

    Instead, a new MIDX-specific flag is added, which is translated to
    the corresponding bitmap one.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/config/pack.txt |  4 ++++
 builtin/multi-pack-index.c    | 21 +++++++++++++++++++++
 midx.c                        |  6 +++++-
 midx.h                        |  1 +
 4 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index 763f7af7c4..ad7f73a1ea 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -159,6 +159,10 @@ pack.writeBitmapHashCache::
 	between an older, bitmapped pack and objects that have been
 	pushed since the last gc). The downside is that it consumes 4
 	bytes per object of disk space. Defaults to true.
++
+When writing a multi-pack reachability bitmap, no new namehashes are
+computed; instead, any namehashes stored in an existing bitmap are
+permuted into their appropriate location when writing a new bitmap.
 
 pack.writeReverseIndex::
 	When true, git will write a corresponding .rev file (see:
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 73c0113b48..578ffea6e8 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -61,6 +61,23 @@ static struct option *add_common_options(struct option *prev)
 	return parse_options_concat(common_opts, prev);
 }
 
+static int git_multi_pack_index_write_config(const char *var, const char *value,
+					     void *cb)
+{
+	if (!strcmp(var, "pack.writebitmaphashcache")) {
+		if (git_config_bool(var, value))
+			opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
+		else
+			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
+	}
+
+	/*
+	 * We should never make a fall-back call to 'git_default_config', since
+	 * this was already called in 'cmd_multi_pack_index()'.
+	 */
+	return 0;
+}
+
 static int cmd_multi_pack_index_write(int argc, const char **argv)
 {
 	struct option *options;
@@ -73,6 +90,10 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
 		OPT_END(),
 	};
 
+	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
+
+	git_config(git_multi_pack_index_write_config, NULL);
+
 	options = add_common_options(builtin_multi_pack_index_write_options);
 
 	trace2_cmd_mode(argv[0]);
diff --git a/midx.c b/midx.c
index 864034a6ad..38c8600458 100644
--- a/midx.c
+++ b/midx.c
@@ -1008,9 +1008,13 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
 	struct pack_idx_entry **index;
 	struct commit **commits = NULL;
 	uint32_t i, commits_nr;
+	uint16_t options = 0;
 	char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash));
 	int ret;
 
+	if (flags & MIDX_WRITE_BITMAP_HASH_CACHE)
+		options |= BITMAP_OPT_HASH_CACHE;
+
 	prepare_midx_packing_data(&pdata, ctx);
 
 	commits = find_commits_for_midx_bitmap(&commits_nr, ctx);
@@ -1049,7 +1053,7 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
 		goto cleanup;
 
 	bitmap_writer_set_checksum(midx_hash);
-	bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, 0);
+	bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, options);
 
 cleanup:
 	free(index);
diff --git a/midx.h b/midx.h
index aa3da557bb..541d9ac728 100644
--- a/midx.h
+++ b/midx.h
@@ -44,6 +44,7 @@ struct multi_pack_index {
 #define MIDX_PROGRESS     (1 << 0)
 #define MIDX_WRITE_REV_INDEX (1 << 1)
 #define MIDX_WRITE_BITMAP (1 << 2)
+#define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3)
 
 const unsigned char *get_midx_checksum(struct multi_pack_index *m);
 char *get_midx_filename(const char *object_dir);
-- 
2.33.0.96.g73915697e6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v2 4/7] p5326: create missing 'perf-tag' tag
  2021-09-14 22:05 ` [PATCH v2 0/7] pack-bitmap: permute existing namehash values Taylor Blau
                     ` (2 preceding siblings ...)
  2021-09-14 22:06   ` [PATCH v2 3/7] midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps Taylor Blau
@ 2021-09-14 22:06   ` Taylor Blau
  2021-09-16 22:36     ` Jeff King
  2021-09-14 22:06   ` [PATCH v2 5/7] p5326: don't set core.multiPackIndex unnecessarily Taylor Blau
                     ` (3 subsequent siblings)
  7 siblings, 1 reply; 55+ messages in thread
From: Taylor Blau @ 2021-09-14 22:06 UTC (permalink / raw)
  To: git; +Cc: peff, gitster, avarab

Some of the tests in test_full_bitmap rely on having a tag named
perf-tag in place. We could create it in test_full_bitmap(), but we want
to have it in place before the repack starts.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/perf/p5326-multi-pack-bitmaps.sh | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/t/perf/p5326-multi-pack-bitmaps.sh b/t/perf/p5326-multi-pack-bitmaps.sh
index 5845109ac7..51b5636259 100755
--- a/t/perf/p5326-multi-pack-bitmaps.sh
+++ b/t/perf/p5326-multi-pack-bitmaps.sh
@@ -10,6 +10,12 @@ test_expect_success 'enable multi-pack index' '
 	git config core.multiPackIndex true
 '
 
+# we need to create the tag up front such that it is covered by the repack and
+# thus by generated bitmaps.
+test_expect_success 'create tags' '
+	git tag --message="tag pointing to HEAD" perf-tag HEAD
+'
+
 test_perf 'setup multi-pack index' '
 	git repack -ad &&
 	git multi-pack-index write --bitmap
-- 
2.33.0.96.g73915697e6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v2 5/7] p5326: don't set core.multiPackIndex unnecessarily
  2021-09-14 22:05 ` [PATCH v2 0/7] pack-bitmap: permute existing namehash values Taylor Blau
                     ` (3 preceding siblings ...)
  2021-09-14 22:06   ` [PATCH v2 4/7] p5326: create missing 'perf-tag' tag Taylor Blau
@ 2021-09-14 22:06   ` Taylor Blau
  2021-09-16 22:38     ` Jeff King
  2021-09-14 22:06   ` [PATCH v2 6/7] p5326: generate pack bitmaps before writing the MIDX bitmap Taylor Blau
                     ` (2 subsequent siblings)
  7 siblings, 1 reply; 55+ messages in thread
From: Taylor Blau @ 2021-09-14 22:06 UTC (permalink / raw)
  To: git; +Cc: peff, gitster, avarab

When this performance test was originally written, `core.multiPackIndex`
was not the default and thus had to be enabled. But now that we have
18e449f86b (midx: enable core.multiPackIndex by default, 2020-09-25), we
no longer need this.

Drop the unnecessary setup (even though it's not hurting anything, it is
unnecessary at best and confusing at worst).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/perf/p5326-multi-pack-bitmaps.sh | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/t/perf/p5326-multi-pack-bitmaps.sh b/t/perf/p5326-multi-pack-bitmaps.sh
index 51b5636259..a9c5499537 100755
--- a/t/perf/p5326-multi-pack-bitmaps.sh
+++ b/t/perf/p5326-multi-pack-bitmaps.sh
@@ -6,10 +6,6 @@ test_description='Tests performance using midx bitmaps'
 
 test_perf_large_repo
 
-test_expect_success 'enable multi-pack index' '
-	git config core.multiPackIndex true
-'
-
 # we need to create the tag up front such that it is covered by the repack and
 # thus by generated bitmaps.
 test_expect_success 'create tags' '
-- 
2.33.0.96.g73915697e6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v2 6/7] p5326: generate pack bitmaps before writing the MIDX bitmap
  2021-09-14 22:05 ` [PATCH v2 0/7] pack-bitmap: permute existing namehash values Taylor Blau
                     ` (4 preceding siblings ...)
  2021-09-14 22:06   ` [PATCH v2 5/7] p5326: don't set core.multiPackIndex unnecessarily Taylor Blau
@ 2021-09-14 22:06   ` Taylor Blau
  2021-09-16 22:45     ` Jeff King
  2021-09-14 22:06   ` [PATCH v2 7/7] t5326: test propagating hashcache values Taylor Blau
  2021-09-16 22:52   ` [PATCH v2 0/7] pack-bitmap: permute existing namehash values Jeff King
  7 siblings, 1 reply; 55+ messages in thread
From: Taylor Blau @ 2021-09-14 22:06 UTC (permalink / raw)
  To: git; +Cc: peff, gitster, avarab

To help test the performance of permuting the contents of the hash-cache
when generating a MIDX bitmap, we need a bitmap which has its hash-cache
populated.

And since multi-pack bitmaps don't add *new* values to the hash-cache,
we have to rely on a single-pack bitmap to generate those values for us.

Therefore, generate a pack bitmap before the MIDX one in order to ensure
that the MIDX bitmap has entries in its hash-cache.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/perf/p5326-multi-pack-bitmaps.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/perf/p5326-multi-pack-bitmaps.sh b/t/perf/p5326-multi-pack-bitmaps.sh
index a9c5499537..38557859b7 100755
--- a/t/perf/p5326-multi-pack-bitmaps.sh
+++ b/t/perf/p5326-multi-pack-bitmaps.sh
@@ -13,7 +13,7 @@ test_expect_success 'create tags' '
 '
 
 test_perf 'setup multi-pack index' '
-	git repack -ad &&
+	git repack -adb &&
 	git multi-pack-index write --bitmap
 '
 
-- 
2.33.0.96.g73915697e6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v2 7/7] t5326: test propagating hashcache values
  2021-09-14 22:05 ` [PATCH v2 0/7] pack-bitmap: permute existing namehash values Taylor Blau
                     ` (5 preceding siblings ...)
  2021-09-14 22:06   ` [PATCH v2 6/7] p5326: generate pack bitmaps before writing the MIDX bitmap Taylor Blau
@ 2021-09-14 22:06   ` Taylor Blau
  2021-09-16 22:49     ` Jeff King
  2021-09-16 22:52   ` [PATCH v2 0/7] pack-bitmap: permute existing namehash values Jeff King
  7 siblings, 1 reply; 55+ messages in thread
From: Taylor Blau @ 2021-09-14 22:06 UTC (permalink / raw)
  To: git; +Cc: peff, gitster, avarab

Now that we both can propagate values from the hashcache, and respect
the configuration to enable the hashcache at all, test that both of
these function correctly by hardening their behavior with a test.

Like the hash-cache in classic single-pack bitmaps, this helps more
proportionally the more up-to-date your bitmap coverage is. When our
bitmap coverage is out-of-date with the ref tips, we spend more time
proportionally traversing, and all of that traversal gets the name-hash
filled in.

But for the up-to-date bitmaps, this helps quite a bit. These numbers
are on git.git, with `pack.threads=1` to help see the difference
reflected in the overall runtime.

    Test                            origin/tb/multi-pack-bitmaps   HEAD
    -------------------------------------------------------------------------------------
    5326.4: simulated clone         1.87(1.80+0.07)                1.46(1.42+0.03) -21.9%
    5326.5: simulated fetch         2.66(2.61+0.04)                1.47(1.43+0.04) -44.7%
    5326.6: pack to file (bitmap)   2.74(2.62+0.12)                1.89(1.82+0.07) -31.0%

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5326-multi-pack-bitmaps.sh | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 4ad7c2c969..24148ca35b 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -283,4 +283,36 @@ test_expect_success 'pack.preferBitmapTips' '
 	)
 '
 
+test_expect_success 'hash-cache values are propagated from pack bitmaps' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		git config pack.writeBitmapHashCache true &&
+
+		test_commit base &&
+		test_commit base2 &&
+		git repack -adb &&
+
+		test-tool bitmap dump-hashes >pack.raw &&
+		test_file_not_empty pack.raw &&
+		sort pack.raw >pack.hashes &&
+
+		test_commit new &&
+		git repack &&
+		git multi-pack-index write --bitmap &&
+
+		test-tool bitmap dump-hashes >midx.raw &&
+		sort midx.raw >midx.hashes &&
+
+		# ensure that every namehash in the pack bitmap can be found in
+		# the midx bitmap (i.e., that there are no oid-namehash pairs
+		# unique to the pack bitmap).
+		comm -23 pack.hashes midx.hashes >dropped.hashes &&
+		test_must_be_empty dropped.hashes
+	)
+'
+
 test_done
-- 
2.33.0.96.g73915697e6

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 4/7] p5326: create missing 'perf-tag' tag
  2021-09-14 22:06   ` [PATCH v2 4/7] p5326: create missing 'perf-tag' tag Taylor Blau
@ 2021-09-16 22:36     ` Jeff King
  2021-09-17  4:14       ` Taylor Blau
  0 siblings, 1 reply; 55+ messages in thread
From: Jeff King @ 2021-09-16 22:36 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, gitster, avarab

On Tue, Sep 14, 2021 at 06:06:09PM -0400, Taylor Blau wrote:

> Some of the tests in test_full_bitmap rely on having a tag named
> perf-tag in place. We could create it in test_full_bitmap(), but we want
> to have it in place before the repack starts.

I wondered how p5326 ever could have worked without this.

But I think the answer is that it was added in a parallel-ish branch via
540cdc11ad (pack-bitmap: avoid traversal of objects referenced by
uninteresting tag, 2021-03-22).

If that had then been merged with the midx-bitmap topic, we would have
seen a conflict as yours tried to delete the surrounding tests that got
moved into lib-bitmap.sh. But we didn't see such a merge.  Your
9387fbd646 (p5310: extract full and partial bitmap tests, moves the
perf-tag test, so p5326 was broken then (well, in the follow-on commit
that introduced it).

Knowing the history of the midx-bitmap series, I'm almost certain what
happened is that it got rebased, and you probably _did_ see a textual
conflict, which you resolved correctly, moving the perf-tag test into
lib-bitmap.sh But you missed the semantic conflict that p5326 also
needed to add in the setup step.

All of which is to say this looks fine to me. ;) I was just talking out
loud to try to understand what had happened.

-Peff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 5/7] p5326: don't set core.multiPackIndex unnecessarily
  2021-09-14 22:06   ` [PATCH v2 5/7] p5326: don't set core.multiPackIndex unnecessarily Taylor Blau
@ 2021-09-16 22:38     ` Jeff King
  0 siblings, 0 replies; 55+ messages in thread
From: Jeff King @ 2021-09-16 22:38 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, gitster, avarab

On Tue, Sep 14, 2021 at 06:06:11PM -0400, Taylor Blau wrote:

> When this performance test was originally written, `core.multiPackIndex`
> was not the default and thus had to be enabled. But now that we have
> 18e449f86b (midx: enable core.multiPackIndex by default, 2020-09-25), we
> no longer need this.
> 
> Drop the unnecessary setup (even though it's not hurting anything, it is
> unnecessary at best and confusing at worst).

Makes sense. Sometimes it is worthwhile to carry extra code in the perf
tests for older versions, so that we can test performance across a wider
span. But there were no midx bitmaps at all when 18e449f86b happened, so
there is little point in worrying about going back that far for this
test.

-Peff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 6/7] p5326: generate pack bitmaps before writing the MIDX bitmap
  2021-09-14 22:06   ` [PATCH v2 6/7] p5326: generate pack bitmaps before writing the MIDX bitmap Taylor Blau
@ 2021-09-16 22:45     ` Jeff King
  2021-09-17  4:20       ` Taylor Blau
  0 siblings, 1 reply; 55+ messages in thread
From: Jeff King @ 2021-09-16 22:45 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, gitster, avarab

On Tue, Sep 14, 2021 at 06:06:14PM -0400, Taylor Blau wrote:

> To help test the performance of permuting the contents of the hash-cache
> when generating a MIDX bitmap, we need a bitmap which has its hash-cache
> populated.
> 
> And since multi-pack bitmaps don't add *new* values to the hash-cache,
> we have to rely on a single-pack bitmap to generate those values for us.
> 
> Therefore, generate a pack bitmap before the MIDX one in order to ensure
> that the MIDX bitmap has entries in its hash-cache.

Makes sense. This is a little more contrived of a setup than the
original, but an utterly realistic one. If you are using midx bitmaps,
you are probably interspersing them with occasional full pack bitmaps.

It might be interesting to also do:

  rm -f .git/objects/pack/pack-*.bitmap

after generating the midx bitmap. That would confirm the further timing
tests are using the midx bitmap, and not ever "cheating" by looking at
the pack one (having poked in this direction before, I know that this
all works, so it would be a future-proofing thing).

> diff --git a/t/perf/p5326-multi-pack-bitmaps.sh b/t/perf/p5326-multi-pack-bitmaps.sh
> index a9c5499537..38557859b7 100755
> --- a/t/perf/p5326-multi-pack-bitmaps.sh
> +++ b/t/perf/p5326-multi-pack-bitmaps.sh
> @@ -13,7 +13,7 @@ test_expect_success 'create tags' '
>  '
>  
>  test_perf 'setup multi-pack index' '
> -	git repack -ad &&
> +	git repack -adb &&
>  	git multi-pack-index write --bitmap
>  '

This sort-of existed before your series, but I think is a bit "worse"
now: we are timing both "repack" and "multi-pack-index" write together.
So:

  - the timing for the midx write that we are interested in timing will
    be diluted by the much-bigger full-repack

  - we'll actually do _three_ full repacks (the default
    GIT_PERF_REPEAT_COUNT for the "run" script), since it's inside a
    test_perf()

So:

  test_expect_success 'start with bitmapped pack' '
	git repack -adb
  '

  test_perf 'setup multi-pack index' '
	git multi-pack-index write --bitmap
  '

would run faster and give us more interesting timings. Possibly you'd
want to drop the midx and its bitmaps as part of that test_perf, too.
The first run will be using the pack bitmap, and the others will use the
midx. I doubt it makes much difference either way, though.

And of course if you want to take my earlier suggestion, then it's easy
to add:

  test_expect_success 'drop pack bitmap' '
	rm -f .git/objects/pack/pack-*.bitmap
  '

afterwards; you wouldn't want to do it inside the test_perf() call
because of the repeat-count.

-Peff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 7/7] t5326: test propagating hashcache values
  2021-09-14 22:06   ` [PATCH v2 7/7] t5326: test propagating hashcache values Taylor Blau
@ 2021-09-16 22:49     ` Jeff King
  0 siblings, 0 replies; 55+ messages in thread
From: Jeff King @ 2021-09-16 22:49 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, gitster, avarab

On Tue, Sep 14, 2021 at 06:06:16PM -0400, Taylor Blau wrote:

> Now that we both can propagate values from the hashcache, and respect
> the configuration to enable the hashcache at all, test that both of
> these function correctly by hardening their behavior with a test.
> 
> Like the hash-cache in classic single-pack bitmaps, this helps more
> proportionally the more up-to-date your bitmap coverage is. When our
> bitmap coverage is out-of-date with the ref tips, we spend more time
> proportionally traversing, and all of that traversal gets the name-hash
> filled in.
> 
> But for the up-to-date bitmaps, this helps quite a bit. These numbers
> are on git.git, with `pack.threads=1` to help see the difference
> reflected in the overall runtime.
> 
>     Test                            origin/tb/multi-pack-bitmaps   HEAD
>     -------------------------------------------------------------------------------------
>     5326.4: simulated clone         1.87(1.80+0.07)                1.46(1.42+0.03) -21.9%
>     5326.5: simulated fetch         2.66(2.61+0.04)                1.47(1.43+0.04) -44.7%
>     5326.6: pack to file (bitmap)   2.74(2.62+0.12)                1.89(1.82+0.07) -31.0%

The percentages here match timings I did. Doing it with linux.git gives
bigger absolute numbers, but I think this is sufficient (and a lot less
painful when people are trying to replicate).

> +test_expect_success 'hash-cache values are propagated from pack bitmaps' '
> +	rm -fr repo &&
> +	git init repo &&
> +	test_when_finished "rm -fr repo" &&
> +	(
> +		cd repo &&
> +
> +		git config pack.writeBitmapHashCache true &&

This is the default as of your earlier commits, so we could probably
drop this. I don't mind keeping it as explicit documentation of what we
expect, though.

-Peff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 0/7] pack-bitmap: permute existing namehash values
  2021-09-14 22:05 ` [PATCH v2 0/7] pack-bitmap: permute existing namehash values Taylor Blau
                     ` (6 preceding siblings ...)
  2021-09-14 22:06   ` [PATCH v2 7/7] t5326: test propagating hashcache values Taylor Blau
@ 2021-09-16 22:52   ` Jeff King
  7 siblings, 0 replies; 55+ messages in thread
From: Jeff King @ 2021-09-16 22:52 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, gitster, avarab

On Tue, Sep 14, 2021 at 06:05:59PM -0400, Taylor Blau wrote:

> Here is a small-ish rerool of my series to carry forward values from an existing
> hash-cache when generating multi-pack reachability bitmaps.
> 
> Since last time, the performance tests in p5326 were cleaned up in order to
> produce timings for generating a pack when using a MIDX bitmap with the
> hash-cache extension.

I had seen most of these in prototype form before, but I hadn't yet
carefully looked at your cleanups for upstream. I gave them all a
careful read, and it all looks good to me.

I did have a few comments on patch 6 that I think are worth following up
on. We can also do so on top if you prefer.

-Peff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 4/7] p5326: create missing 'perf-tag' tag
  2021-09-16 22:36     ` Jeff King
@ 2021-09-17  4:14       ` Taylor Blau
  0 siblings, 0 replies; 55+ messages in thread
From: Taylor Blau @ 2021-09-17  4:14 UTC (permalink / raw)
  To: Jeff King; +Cc: git, gitster, avarab

On Thu, Sep 16, 2021 at 06:36:54PM -0400, Jeff King wrote:
> On Tue, Sep 14, 2021 at 06:06:09PM -0400, Taylor Blau wrote:
>
> > Some of the tests in test_full_bitmap rely on having a tag named
> > perf-tag in place. We could create it in test_full_bitmap(), but we want
> > to have it in place before the repack starts.
>
> I wondered how p5326 ever could have worked without this.
>
> [...]
>
> Knowing the history of the midx-bitmap series, I'm almost certain what
> happened is that it got rebased, and you probably _did_ see a textual
> conflict, which you resolved correctly, moving the perf-tag test into
> lib-bitmap.sh But you missed the semantic conflict that p5326 also
> needed to add in the setup step.

Sounds about right :-).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 6/7] p5326: generate pack bitmaps before writing the MIDX bitmap
  2021-09-16 22:45     ` Jeff King
@ 2021-09-17  4:20       ` Taylor Blau
  0 siblings, 0 replies; 55+ messages in thread
From: Taylor Blau @ 2021-09-17  4:20 UTC (permalink / raw)
  To: Jeff King; +Cc: git, gitster, avarab

On Thu, Sep 16, 2021 at 06:45:33PM -0400, Jeff King wrote:
> It might be interesting to also do:
>
>   rm -f .git/objects/pack/pack-*.bitmap
>
> after generating the midx bitmap. That would confirm the further timing
> tests are using the midx bitmap, and not ever "cheating" by looking at
> the pack one (having poked in this direction before, I know that this
> all works, so it would be a future-proofing thing).

This and the suggestion to avoid timing the single pack bitmap
generation are both good ones.

I think to be totally accurate we would want to drop the pack bitmap
before every MIDX bitmap generation, but I also think that in practice
it does not matter much.

The reuse code currently does not try too hard to recognize the
situation of "oh, I selected the same exact commits and don't have to do
any work". It kind of does eventually, but only after doing a lot of
preparation.

So I'm dubious as to whether we're really timing anything *that* useful,
but it's probably worth keeping around anyway.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/4] t5326: test propagating hashcache values
  2021-09-08  2:30     ` Taylor Blau
@ 2021-09-17  8:56       ` Ævar Arnfjörð Bjarmason
  2021-09-17 17:32         ` Taylor Blau
  0 siblings, 1 reply; 55+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-17  8:56 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff


On Tue, Sep 07 2021, Taylor Blau wrote:

> On Wed, Sep 08, 2021 at 03:46:29AM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Tue, Sep 07 2021, Taylor Blau wrote:
>>
>> > Now that we both can propagate values from the hashcache, and respect
>> > the configuration to enable the hashcache at all, test that both of
>> > these function correctly by hardening their behavior with a test.
>> >
>> > Signed-off-by: Taylor Blau <me@ttaylorr.com>
>> > ---
>> >  t/t5326-multi-pack-bitmaps.sh | 32 ++++++++++++++++++++++++++++++++
>> >  1 file changed, 32 insertions(+)
>> >
>> > diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
>> > index 4ad7c2c969..24148ca35b 100755
>> > --- a/t/t5326-multi-pack-bitmaps.sh
>> > +++ b/t/t5326-multi-pack-bitmaps.sh
>> > @@ -283,4 +283,36 @@ test_expect_success 'pack.preferBitmapTips' '
>> >  	)
>> >  '
>> >
>> > +test_expect_success 'hash-cache values are propagated from pack bitmaps' '
>> > +	rm -fr repo &&
>> > +	git init repo &&
>> > +	test_when_finished "rm -fr repo" &&
>>
>> It seems the need for this "rm -fr repo" dance instead of just relying
>> on test_when_finished "rm -rf repo" is because of a 3x tests in a
>> function in tb/multi-pack-bitmaps that should probably be combined into
>> one, i.e. they share the same logical "repo" setup.
>
> Yeah. There's definitely room for clean-up there if we want to have each
> of the tests operate on the same repo. I have always found sharing a
> repo between tests difficult to reason about, since we have to assume
> that any --run parameters could be passed in.
>
> So I have tended to err on the side of creating and throwing away a new
> repo per test, but I understand that's somewhat atypical for Git's
> tests.

Just in an effort to make myself clear here, because between your note
in https://lore.kernel.org/git/YUOds4kcHRgMk5nC@nand.local/ and
re-reading my mail here I can barely understand what I meant here :)

What I mean is that the whole "rm -rf" dance at the start of tests is
fallout from an earlier call you made in c51f5a6437c (t5326: test
multi-pack bitmap behavior, 2021-08-31) to split up three tests that
really are the same logical test in terms of setup/teardown.

If they were to be re-combined as shown in the diff-on-top at the end
here none of the tests need a "rm -rf repo" at the beginning, because
they can all rely on a starting pattern of:

    test_when_finished "rm -rf repo" &&
    git init repo &&
    ( cd repo ... )

Or, you can also just not re-use the "repo" name, which is what I did in
the fsck tests at
https://lore.kernel.org/git/patch-v6-01.22-ebe89f65354-20210907T104559Z-avarab@gmail.com/;
then you don't even need test_when_finished "rm -rf[ ...]".

None of this requires fixing here or a re-roll, unless you happen to
think this diff is the best thing since sliced bread (assume my
Signed-off-by).

But just as a general musing on patterns to use in tests, I see why you
used that 1x test as 3x, because you want "test_expect_success" to give
you the label on for each "step".

I think that would be worth fixing in test-lib.sh, there's no inherent
reason we couldn't support "checkpoints" or "subtests" within
"test_expect_success" (the latter being a part of the TAP protocol).

But until then IMNSHO this sort of thing is an anti-pattern,
i.e. needing to write things like:

    test_expect_success '[...]' 'A'
    test_expect_success '[...]' 'B'

Where a part of what B needs to do is to clean up the mess left behind
A, or relies on its setup.

That's just added verbosity and context when reading the code. Ideally
you'd just need to read the "setup" at the start, and then individual
tests, with this pattern you need to read the preceding test(s) to see
where the crap they're cleaning came from, and if it's even needed etc.

(For the "setup" part I've suggested a test_expect_setup, see
https://lore.kernel.org/git/8735vrvg39.fsf@evledraar.gmail.com/).

The "needs the setup" part of this has the negative side-effect of
breaking the semantics of the --run option. As noted in the E-Mail
linked in the last paragraph it doesn't work well in general because of
things like this, but let's steer in the right direction.

I.e. with this change you can run:

    ./t5326-multi-pack-bitmaps.sh --run=111-113

Which is great, usually you need at least 1,<nums you want> ,r
1-10,<nums you want> to get the setup. But because you split them this
breaks:

    ./t5326-multi-pack-bitmaps.sh --run=112-113

I.e. we'll run only the 2nd and 3rd test in a sequence that needs the
1st for setup.

For context: My preference for this isn't just an asthetic or
theoretical thing. It's really nice to be hacking some code, find that
I've broken the bitmap code somehow in your test #112, and be able to
just run (since running the whole thing takes 13s on my box, and I'd
like to test in a tight loop):

    ./t5326-multi-pack-bitmaps.sh --run=112

But that breaks as noted above, so I need read earlier tests (I was
probably looking at test #112 already at this point) and see how their
setup works etc.

Anyway, as with so many things in git.git being able to just assume
--run works like this isn't something you can rely on in the general
case, but just as a matter of where we should be headed...

>> > +	(
>> > +		cd repo &&
>> > +
>> > +		git config pack.writeBitmapHashCache true &&
>>
>> s/git config/test_config/, surely.
>
> No, since this is in a subshell (and we don't care about unsetting the
> value of a config in a repo that we're going to throw away, anyways).
>
> (Side-note: since this has a /bin/sh shebang, we can't detect that we're
> in a subshell and so test_config actually _would_ work here. But
> switching this test to run with /bin/bash where we can detect whether or
> not we're in a subshell would cause this test to fail with test_config).

Yes, you're 100% right here. This was purely a misreading on my part, I
managed to somehow not see/take into account the subshell. Using
test_config makes no sense here.

The diff-on-top for discussion mentioned above:

diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 24148ca35b3..a6bb0abb387 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -133,8 +133,8 @@ bitmap_reuse_tests() {
 	from=$1
 	to=$2
 
-	test_expect_success "setup pack reuse tests ($from -> $to)" '
-		rm -fr repo &&
+	test_expect_success "setup/build/verify ($from -> $to) pack reuse bitmaps" '
+		test_when_finished "rm -rf repo" &&
 		git init repo &&
 		(
 			cd repo &&
@@ -148,13 +148,9 @@ bitmap_reuse_tests() {
 				git multi-pack-index write --bitmap
 			else
 				git repack -Adb
-			fi
-		)
-	'
+			fi &&
 
-	test_expect_success "build bitmap from existing ($from -> $to)" '
-		(
-			cd repo &&
+			# Build
 			test_commit_bulk --id=further 16 &&
 			git tag new-tip &&
 
@@ -164,13 +160,9 @@ bitmap_reuse_tests() {
 				git multi-pack-index write --bitmap
 			else
 				git repack -Adb
-			fi
-		)
-	'
+			fi &&
 
-	test_expect_success "verify resulting bitmaps ($from -> $to)" '
-		(
-			cd repo &&
+			# Verify
 			git for-each-ref &&
 			git rev-list --test-bitmap refs/tags/old-tip &&
 			git rev-list --test-bitmap refs/tags/new-tip
@@ -183,9 +175,8 @@ bitmap_reuse_tests 'MIDX' 'pack'
 bitmap_reuse_tests 'MIDX' 'MIDX'
 
 test_expect_success 'missing object closure fails gracefully' '
-	rm -fr repo &&
-	git init repo &&
 	test_when_finished "rm -fr repo" &&
+	git init repo &&
 	(
 		cd repo &&
 
@@ -217,9 +208,8 @@ test_expect_success 'setup partial bitmaps' '
 basic_bitmap_tests HEAD~
 
 test_expect_success 'removing a MIDX clears stale bitmaps' '
-	rm -fr repo &&
-	git init repo &&
 	test_when_finished "rm -fr repo" &&
+	git init repo &&
 	(
 		cd repo &&
 		test_commit base &&
@@ -245,8 +235,8 @@ test_expect_success 'removing a MIDX clears stale bitmaps' '
 '
 
 test_expect_success 'pack.preferBitmapTips' '
-	git init repo &&
 	test_when_finished "rm -fr repo" &&
+	git init repo &&
 	(
 		cd repo &&
 
@@ -284,9 +274,8 @@ test_expect_success 'pack.preferBitmapTips' '
 '
 
 test_expect_success 'hash-cache values are propagated from pack bitmaps' '
-	rm -fr repo &&
-	git init repo &&
 	test_when_finished "rm -fr repo" &&
+	git init repo &&
 	(
 		cd repo &&
 

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/4] t5326: test propagating hashcache values
  2021-09-17  8:56       ` Ævar Arnfjörð Bjarmason
@ 2021-09-17 17:32         ` Taylor Blau
  2021-09-17 19:22           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 55+ messages in thread
From: Taylor Blau @ 2021-09-17 17:32 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, peff

On Fri, Sep 17, 2021 at 10:56:09AM +0200, Ævar Arnfjörð Bjarmason wrote:
> >> It seems the need for this "rm -fr repo" dance instead of just relying
> >> on test_when_finished "rm -rf repo" is because of a 3x tests in a
> >> function in tb/multi-pack-bitmaps that should probably be combined into
> >> one, i.e. they share the same logical "repo" setup.
> >
> > Yeah. There's definitely room for clean-up there if we want to have each
> > of the tests operate on the same repo. I have always found sharing a
> > repo between tests difficult to reason about, since we have to assume
> > that any --run parameters could be passed in.
> >
> > So I have tended to err on the side of creating and throwing away a new
> > repo per test, but I understand that's somewhat atypical for Git's
> > tests.
>
> Just in an effort to make myself clear here, because between your note
> in https://lore.kernel.org/git/YUOds4kcHRgMk5nC@nand.local/ and
> re-reading my mail here I can barely understand what I meant here :)

Thanks for clarifying; if I could summarize I would say that this
discussion got started since a new test I added starts with:

    rm -fr repo &&
    git init repo &&
    test_when_finished "rm -fr repo"

where the first rm is designed to clean any state left behind from the
split-up tests added in c51f5a6437c.

I understand your suggestion, and I even think that it may be worth
doing, but I'm not sure that I buy that any such cleanup is related to
this series for any reason other than "you happened to add a new test in
this file which extended the pattern".

So let's pursue the cleanup, but outside of this series (and maybe once
the dust has settled more generally on the MIDX bitmaps topics to avoid
having the maintainer deal with any conflicts).

> Or, you can also just not re-use the "repo" name, which is what I did in
> the fsck tests at
> https://lore.kernel.org/git/patch-v6-01.22-ebe89f65354-20210907T104559Z-avarab@gmail.com/;
> then you don't even need test_when_finished "rm -rf[ ...]".

TBH, I think that this is the most appealing thing to do. It's easy and
doesn't require us to think too hard about test_expect_setup or
sub-tests or anything like that where I'm not sure the complexity is
warranted.

I would probably do something like this:

--- >8 ---

diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index ec4aa89f63..11845f67ae 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -132,12 +132,12 @@ test_expect_success 'clone with bitmaps enabled' '
 bitmap_reuse_tests() {
 	from=$1
 	to=$2
+	repo="bitmap-reuse-$from-$to"

 	test_expect_success "setup pack reuse tests ($from -> $to)" '
-		rm -fr repo &&
-		git init repo &&
+		git init $repo &&
 		(
-			cd repo &&
+			cd $repo &&
 			test_commit_bulk 16 &&
 			git tag old-tip &&

@@ -154,7 +154,7 @@ bitmap_reuse_tests() {

 	test_expect_success "build bitmap from existing ($from -> $to)" '
 		(
-			cd repo &&
+			cd $repo &&
 			test_commit_bulk --id=further 16 &&
 			git tag new-tip &&

@@ -170,7 +170,7 @@ bitmap_reuse_tests() {

 	test_expect_success "verify resulting bitmaps ($from -> $to)" '
 		(
-			cd repo &&
+			cd $repo &&
 			git for-each-ref &&
 			git rev-list --test-bitmap refs/tags/old-tip &&
 			git rev-list --test-bitmap refs/tags/new-tip
@@ -183,7 +183,6 @@ bitmap_reuse_tests 'MIDX' 'pack'
 bitmap_reuse_tests 'MIDX' 'MIDX'

 test_expect_success 'missing object closure fails gracefully' '
-	rm -fr repo &&
 	git init repo &&
 	test_when_finished "rm -fr repo" &&
 	(
@@ -217,7 +216,6 @@ test_expect_success 'setup partial bitmaps' '
 basic_bitmap_tests HEAD~

 test_expect_success 'removing a MIDX clears stale bitmaps' '
-	rm -fr repo &&
 	git init repo &&
 	test_when_finished "rm -fr repo" &&
 	(
@@ -284,7 +282,6 @@ test_expect_success 'pack.preferBitmapTips' '
 '

 test_expect_success 'hash-cache values are propagated from pack bitmaps' '
-	rm -fr repo &&
 	git init repo &&
 	test_when_finished "rm -fr repo" &&
 	(

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 4/4] t5326: test propagating hashcache values
  2021-09-17 17:32         ` Taylor Blau
@ 2021-09-17 19:22           ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 55+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-17 19:22 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff


On Fri, Sep 17 2021, Taylor Blau wrote:

> On Fri, Sep 17, 2021 at 10:56:09AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> >> It seems the need for this "rm -fr repo" dance instead of just relying
>> >> on test_when_finished "rm -rf repo" is because of a 3x tests in a
>> >> function in tb/multi-pack-bitmaps that should probably be combined into
>> >> one, i.e. they share the same logical "repo" setup.
>> >
>> > Yeah. There's definitely room for clean-up there if we want to have each
>> > of the tests operate on the same repo. I have always found sharing a
>> > repo between tests difficult to reason about, since we have to assume
>> > that any --run parameters could be passed in.
>> >
>> > So I have tended to err on the side of creating and throwing away a new
>> > repo per test, but I understand that's somewhat atypical for Git's
>> > tests.
>>
>> Just in an effort to make myself clear here, because between your note
>> in https://lore.kernel.org/git/YUOds4kcHRgMk5nC@nand.local/ and
>> re-reading my mail here I can barely understand what I meant here :)
>
> Thanks for clarifying; if I could summarize I would say that this
> discussion got started since a new test I added starts with:
>
>     rm -fr repo &&
>     git init repo &&
>     test_when_finished "rm -fr repo"
>
> where the first rm is designed to clean any state left behind from the
> split-up tests added in c51f5a6437c.
>
> I understand your suggestion, and I even think that it may be worth
> doing, but I'm not sure that I buy that any such cleanup is related to
> this series for any reason other than "you happened to add a new test in
> this file which extended the pattern".
>
> So let's pursue the cleanup, but outside of this series (and maybe once
> the dust has settled more generally on the MIDX bitmaps topics to avoid
> having the maintainer deal with any conflicts).

Yes I agree, FWIW I didn't look carefully at what was in-flight where at
the time, or the series this depends on was itself in "seen", I can't
remember which..

>> Or, you can also just not re-use the "repo" name, which is what I did in
>> the fsck tests at
>> https://lore.kernel.org/git/patch-v6-01.22-ebe89f65354-20210907T104559Z-avarab@gmail.com/;
>> then you don't even need test_when_finished "rm -rf[ ...]".
>
> TBH, I think that this is the most appealing thing to do. It's easy and
> doesn't require us to think too hard about test_expect_setup or
> sub-tests or anything like that where I'm not sure the complexity is
> warranted.
>
> I would probably do something like this:

Yeah, that bypasses the cleanup bit, but leaves --run broken as I
described. For new code I think it's good practice to have --run work
too.

Again, I think that's a non-issue here considering the rest of the
dumpster fire in the test suite when it comes to that, so I'm not
suggesting a re-roll or whatever.

I just used the upthread as a jumping off point since you had a question
about these patterns in the other topic, and perhaps you/some bystander
would be convinced and follow that pattern in the future.

> --- >8 ---
>
> diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
> index ec4aa89f63..11845f67ae 100755
> --- a/t/t5326-multi-pack-bitmaps.sh
> +++ b/t/t5326-multi-pack-bitmaps.sh
> @@ -132,12 +132,12 @@ test_expect_success 'clone with bitmaps enabled' '
>  bitmap_reuse_tests() {
>  	from=$1
>  	to=$2
> +	repo="bitmap-reuse-$from-$to"
>
>  	test_expect_success "setup pack reuse tests ($from -> $to)" '
> -		rm -fr repo &&
> -		git init repo &&
> +		git init $repo &&
>  		(
> -			cd repo &&
> +			cd $repo &&
>  			test_commit_bulk 16 &&
>  			git tag old-tip &&
>
> @@ -154,7 +154,7 @@ bitmap_reuse_tests() {
>
>  	test_expect_success "build bitmap from existing ($from -> $to)" '
>  		(
> -			cd repo &&
> +			cd $repo &&
>  			test_commit_bulk --id=further 16 &&
>  			git tag new-tip &&
>
> @@ -170,7 +170,7 @@ bitmap_reuse_tests() {
>
>  	test_expect_success "verify resulting bitmaps ($from -> $to)" '
>  		(
> -			cd repo &&
> +			cd $repo &&
>  			git for-each-ref &&
>  			git rev-list --test-bitmap refs/tags/old-tip &&
>  			git rev-list --test-bitmap refs/tags/new-tip
> @@ -183,7 +183,6 @@ bitmap_reuse_tests 'MIDX' 'pack'
>  bitmap_reuse_tests 'MIDX' 'MIDX'
>
>  test_expect_success 'missing object closure fails gracefully' '
> -	rm -fr repo &&
>  	git init repo &&
>  	test_when_finished "rm -fr repo" &&
>  	(
> @@ -217,7 +216,6 @@ test_expect_success 'setup partial bitmaps' '
>  basic_bitmap_tests HEAD~
>
>  test_expect_success 'removing a MIDX clears stale bitmaps' '
> -	rm -fr repo &&
>  	git init repo &&
>  	test_when_finished "rm -fr repo" &&
>  	(
> @@ -284,7 +282,6 @@ test_expect_success 'pack.preferBitmapTips' '
>  '
>
>  test_expect_success 'hash-cache values are propagated from pack bitmaps' '
> -	rm -fr repo &&
>  	git init repo &&
>  	test_when_finished "rm -fr repo" &&
>  	(


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v3 0/7] pack-bitmap: permute existing namehash values
  2021-09-07 21:17 [PATCH 0/4] pack-bitmap: permute existing namehash values Taylor Blau
                   ` (4 preceding siblings ...)
  2021-09-14 22:05 ` [PATCH v2 0/7] pack-bitmap: permute existing namehash values Taylor Blau
@ 2021-09-17 21:21 ` Taylor Blau
  2021-09-17 21:21   ` [PATCH v3 1/7] t/helper/test-bitmap.c: add 'dump-hashes' mode Taylor Blau
                     ` (7 more replies)
  5 siblings, 8 replies; 55+ messages in thread
From: Taylor Blau @ 2021-09-17 21:21 UTC (permalink / raw)
  To: git; +Cc: peff, avarab, gitster

Here is a very small reroll of my series to permute values from an existing
hash-cache when generating multi-pack reachability bitmaps.

The only changes since last time are a slightly modified version of the
performance tests which don't time generating the pack-bitmap. And the new test
in t5326 no longer has a redundant `git config pack.writeBitmapHashCache` since
it is true by default.

It is based on the `tb/multi-pack-bitmaps` topic, which graduated to master.
Thanks in advance for your review on this hopefully-final iteration :-).

Taylor Blau (7):
  t/helper/test-bitmap.c: add 'dump-hashes' mode
  pack-bitmap.c: propagate namehash values from existing bitmaps
  midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps
  p5326: create missing 'perf-tag' tag
  p5326: don't set core.multiPackIndex unnecessarily
  p5326: generate pack bitmaps before writing the MIDX bitmap
  t5326: test propagating hashcache values

 Documentation/config/pack.txt      |  4 +++
 builtin/multi-pack-index.c         | 21 +++++++++++++++
 midx.c                             |  6 ++++-
 midx.h                             |  1 +
 pack-bitmap.c                      | 41 +++++++++++++++++++++++++-----
 pack-bitmap.h                      |  1 +
 t/helper/test-bitmap.c             | 10 +++++++-
 t/perf/p5326-multi-pack-bitmaps.sh | 15 ++++++++---
 t/t5326-multi-pack-bitmaps.sh      | 30 ++++++++++++++++++++++
 9 files changed, 118 insertions(+), 11 deletions(-)

Range-diff against v2:
1:  4f2b8d9530 = 1:  4f2b8d9530 t/helper/test-bitmap.c: add 'dump-hashes' mode
2:  2cd2f3aa5e = 2:  2cd2f3aa5e pack-bitmap.c: propagate namehash values from existing bitmaps
3:  f0d8f106c2 = 3:  f0d8f106c2 midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps
4:  a8c6e845ad = 4:  a8c6e845ad p5326: create missing 'perf-tag' tag
5:  191922c8f2 = 5:  191922c8f2 p5326: don't set core.multiPackIndex unnecessarily
6:  040bb40548 ! 6:  59b6914ef8 p5326: generate pack bitmaps before writing the MIDX bitmap
    @@ Commit message
         we have to rely on a single-pack bitmap to generate those values for us.

         Therefore, generate a pack bitmap before the MIDX one in order to ensure
    -    that the MIDX bitmap has entries in its hash-cache.
    +    that the MIDX bitmap has entries in its hash-cache. Since we don't want
    +    to time generating the pack bitmap, move that to a non-perf test run
    +    before we try to generate the MIDX bitmap.
    +
    +    Likewise, get rid of the pack bitmap afterwords, to make certain that we
    +    are not accidentally using it in the performance tests run later on.

         Signed-off-by: Taylor Blau <me@ttaylorr.com>

      ## t/perf/p5326-multi-pack-bitmaps.sh ##
     @@ t/perf/p5326-multi-pack-bitmaps.sh: test_expect_success 'create tags' '
    + 	git tag --message="tag pointing to HEAD" perf-tag HEAD
      '

    ++test_expect_success 'start with bitmapped pack' '
    ++	git repack -adb
    ++'
    ++
      test_perf 'setup multi-pack index' '
     -	git repack -ad &&
    -+	git repack -adb &&
      	git multi-pack-index write --bitmap
      '

    ++test_expect_success 'drop pack bitmap' '
    ++	rm -f .git/objects/pack/pack-*.bitmap
    ++'
    ++
    + test_full_bitmap
    +
    + test_expect_success 'create partial bitmap state' '
7:  fdf71432b3 ! 7:  bb16125915 t5326: test propagating hashcache values
    @@ t/t5326-multi-pack-bitmaps.sh: test_expect_success 'pack.preferBitmapTips' '
     +	(
     +		cd repo &&
     +
    -+		git config pack.writeBitmapHashCache true &&
    -+
     +		test_commit base &&
     +		test_commit base2 &&
     +		git repack -adb &&
--
2.33.0.96.g73915697e6

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v3 1/7] t/helper/test-bitmap.c: add 'dump-hashes' mode
  2021-09-17 21:21 ` [PATCH v3 " Taylor Blau
@ 2021-09-17 21:21   ` Taylor Blau
  2021-09-17 21:21   ` [PATCH v3 2/7] pack-bitmap.c: propagate namehash values from existing bitmaps Taylor Blau
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 55+ messages in thread
From: Taylor Blau @ 2021-09-17 21:21 UTC (permalink / raw)
  To: git; +Cc: peff, avarab, gitster

The pack-bitmap writer code is about to learn how to propagate values
from an existing hash-cache. To prepare, teach the test-bitmap helper to
dump the values from a bitmap's hash-cache extension in order to test
those changes.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c          | 27 +++++++++++++++++++++++++++
 pack-bitmap.h          |  1 +
 t/helper/test-bitmap.c | 10 +++++++++-
 3 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 8504110a4d..04de387318 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1742,6 +1742,33 @@ int test_bitmap_commits(struct repository *r)
 	return 0;
 }
 
+int test_bitmap_hashes(struct repository *r)
+{
+	struct bitmap_index *bitmap_git = prepare_bitmap_git(r);
+	struct object_id oid;
+	uint32_t i, index_pos;
+
+	if (!bitmap_git->hashes)
+		goto cleanup;
+
+	for (i = 0; i < bitmap_num_objects(bitmap_git); i++) {
+		if (bitmap_is_midx(bitmap_git))
+			index_pos = pack_pos_to_midx(bitmap_git->midx, i);
+		else
+			index_pos = pack_pos_to_index(bitmap_git->pack, i);
+
+		nth_bitmap_object_oid(bitmap_git, &oid, index_pos);
+
+		printf("%s %"PRIu32"\n",
+		       oid_to_hex(&oid), get_be32(bitmap_git->hashes + index_pos));
+	}
+
+cleanup:
+	free_bitmap_index(bitmap_git);
+
+	return 0;
+}
+
 int rebuild_bitmap(const uint32_t *reposition,
 		   struct ewah_bitmap *source,
 		   struct bitmap *dest)
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 469090bad2..ed46d27077 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -52,6 +52,7 @@ void traverse_bitmap_commit_list(struct bitmap_index *,
 				 show_reachable_fn show_reachable);
 void test_bitmap_walk(struct rev_info *revs);
 int test_bitmap_commits(struct repository *r);
+int test_bitmap_hashes(struct repository *r);
 struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 					 struct list_objects_filter_options *filter,
 					 int filter_provided_objects);
diff --git a/t/helper/test-bitmap.c b/t/helper/test-bitmap.c
index 134a1e9d76..ff35f5999b 100644
--- a/t/helper/test-bitmap.c
+++ b/t/helper/test-bitmap.c
@@ -7,6 +7,11 @@ static int bitmap_list_commits(void)
 	return test_bitmap_commits(the_repository);
 }
 
+static int bitmap_dump_hashes(void)
+{
+	return test_bitmap_hashes(the_repository);
+}
+
 int cmd__bitmap(int argc, const char **argv)
 {
 	setup_git_directory();
@@ -16,9 +21,12 @@ int cmd__bitmap(int argc, const char **argv)
 
 	if (!strcmp(argv[1], "list-commits"))
 		return bitmap_list_commits();
+	if (!strcmp(argv[1], "dump-hashes"))
+		return bitmap_dump_hashes();
 
 usage:
-	usage("\ttest-tool bitmap list-commits");
+	usage("\ttest-tool bitmap list-commits\n"
+	      "\ttest-tool bitmap dump-hashes");
 
 	return -1;
 }
-- 
2.33.0.96.g73915697e6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 2/7] pack-bitmap.c: propagate namehash values from existing bitmaps
  2021-09-17 21:21 ` [PATCH v3 " Taylor Blau
  2021-09-17 21:21   ` [PATCH v3 1/7] t/helper/test-bitmap.c: add 'dump-hashes' mode Taylor Blau
@ 2021-09-17 21:21   ` Taylor Blau
  2021-09-17 21:21   ` [PATCH v3 3/7] midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps Taylor Blau
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 55+ messages in thread
From: Taylor Blau @ 2021-09-17 21:21 UTC (permalink / raw)
  To: git; +Cc: peff, avarab, gitster

When an old bitmap exists while writing a new one, we load it and build
a "reposition" table which maps bit positions of objects from the old
bitmap to their respective positions in the new bitmap. This can help
when we encounter a commit which was selected in both the old and new
bitmap, since we only need to permute its bit (not recompute it from
scratch).

We do not, however, repurpose existing namehash values in the case of
the hash-cache extension. There has been thus far no good reason to do
so, since all of the namehash values for objects in the new bitmap would
be populated during the traversal that was just performed by
pack-objects when generating single-pack reachability bitmaps.

But this isn't the case for multi-pack bitmaps, which are written via
`git multi-pack-index write --bitmap` and do not perform any traversal.
In this case all namehash values are set to zero, but we don't even
bother to check the `pack.writeBitmapHashcache` option anyway, so it
fails to matter.

There are two approaches we could take to fill in non-zero hash-cache
values:

  - have either the multi-pack-index builtin run its own
    traversal to attempt to fill in some values, or let a hypothetical
    caller (like `pack-objects` when `repack` eventually drives the
    `multi-pack-index` builtin) fill in the values they found during
    their traversal

  - or copy any existing namehash values that were stored in an
    existing bitmap to their corresponding positions in the new bitmap

In a system where a repository is generally repacked with `git repack
--geometric=<d>` and occasionally repacked with `git repack -a`, the
hash-cache coverage will tend towards all objects.

Since populating the hash-cache is additive (i.e., doing so only helps
our delta search), any intermediate lack of full coverage is just fine.
So let's start by just propagating any values from the existing
hash-cache if we see one.

The next patch will respect the `pack.writeBitmapHashcache` option while
writing MIDX bitmaps, and then test this new behavior.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 04de387318..33a3732992 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1818,18 +1818,20 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
 	for (i = 0; i < num_objects; ++i) {
 		struct object_id oid;
 		struct object_entry *oe;
+		uint32_t index_pos;
 
 		if (bitmap_is_midx(bitmap_git))
-			nth_midxed_object_oid(&oid,
-					      bitmap_git->midx,
-					      pack_pos_to_midx(bitmap_git->midx, i));
+			index_pos = pack_pos_to_midx(bitmap_git->midx, i);
 		else
-			nth_packed_object_id(&oid, bitmap_git->pack,
-					     pack_pos_to_index(bitmap_git->pack, i));
+			index_pos = pack_pos_to_index(bitmap_git->pack, i);
+		nth_bitmap_object_oid(bitmap_git, &oid, index_pos);
 		oe = packlist_find(mapping, &oid);
 
-		if (oe)
+		if (oe) {
 			reposition[i] = oe_in_pack_pos(mapping, oe) + 1;
+			if (bitmap_git->hashes && !oe->hash)
+				oe->hash = get_be32(bitmap_git->hashes + index_pos);
+		}
 	}
 
 	return reposition;
-- 
2.33.0.96.g73915697e6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 3/7] midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps
  2021-09-17 21:21 ` [PATCH v3 " Taylor Blau
  2021-09-17 21:21   ` [PATCH v3 1/7] t/helper/test-bitmap.c: add 'dump-hashes' mode Taylor Blau
  2021-09-17 21:21   ` [PATCH v3 2/7] pack-bitmap.c: propagate namehash values from existing bitmaps Taylor Blau
@ 2021-09-17 21:21   ` Taylor Blau
  2021-09-17 21:21   ` [PATCH v3 4/7] p5326: create missing 'perf-tag' tag Taylor Blau
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 55+ messages in thread
From: Taylor Blau @ 2021-09-17 21:21 UTC (permalink / raw)
  To: git; +Cc: peff, avarab, gitster

In the previous commit, the bitmap writing code learned to propagate
values from an existing hash-cache extension into the bitmap that it is
writing.

Now that that functionality exists, let's expose it by teaching the 'git
multi-pack-index' builtin to respect the `pack.writeBitmapHashCache`
option so that the hash-cache may be written at all.

Two minor points worth noting here:

  - The 'git multi-pack-index write' sub-command didn't previously read
    any configuration (instead this is handled in the base command). A
    separate handler is added here to respect this write-specific
    config option.

  - I briefly considered adding a 'bitmap_flags' field to the static
    options struct, but decided against it since it would require
    plumbing through a new parameter to the write_midx_file() function.

    Instead, a new MIDX-specific flag is added, which is translated to
    the corresponding bitmap one.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/config/pack.txt |  4 ++++
 builtin/multi-pack-index.c    | 21 +++++++++++++++++++++
 midx.c                        |  6 +++++-
 midx.h                        |  1 +
 4 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index 763f7af7c4..ad7f73a1ea 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -159,6 +159,10 @@ pack.writeBitmapHashCache::
 	between an older, bitmapped pack and objects that have been
 	pushed since the last gc). The downside is that it consumes 4
 	bytes per object of disk space. Defaults to true.
++
+When writing a multi-pack reachability bitmap, no new namehashes are
+computed; instead, any namehashes stored in an existing bitmap are
+permuted into their appropriate location when writing a new bitmap.
 
 pack.writeReverseIndex::
 	When true, git will write a corresponding .rev file (see:
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 73c0113b48..578ffea6e8 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -61,6 +61,23 @@ static struct option *add_common_options(struct option *prev)
 	return parse_options_concat(common_opts, prev);
 }
 
+static int git_multi_pack_index_write_config(const char *var, const char *value,
+					     void *cb)
+{
+	if (!strcmp(var, "pack.writebitmaphashcache")) {
+		if (git_config_bool(var, value))
+			opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
+		else
+			opts.flags &= ~MIDX_WRITE_BITMAP_HASH_CACHE;
+	}
+
+	/*
+	 * We should never make a fall-back call to 'git_default_config', since
+	 * this was already called in 'cmd_multi_pack_index()'.
+	 */
+	return 0;
+}
+
 static int cmd_multi_pack_index_write(int argc, const char **argv)
 {
 	struct option *options;
@@ -73,6 +90,10 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
 		OPT_END(),
 	};
 
+	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
+
+	git_config(git_multi_pack_index_write_config, NULL);
+
 	options = add_common_options(builtin_multi_pack_index_write_options);
 
 	trace2_cmd_mode(argv[0]);
diff --git a/midx.c b/midx.c
index 864034a6ad..38c8600458 100644
--- a/midx.c
+++ b/midx.c
@@ -1008,9 +1008,13 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
 	struct pack_idx_entry **index;
 	struct commit **commits = NULL;
 	uint32_t i, commits_nr;
+	uint16_t options = 0;
 	char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash));
 	int ret;
 
+	if (flags & MIDX_WRITE_BITMAP_HASH_CACHE)
+		options |= BITMAP_OPT_HASH_CACHE;
+
 	prepare_midx_packing_data(&pdata, ctx);
 
 	commits = find_commits_for_midx_bitmap(&commits_nr, ctx);
@@ -1049,7 +1053,7 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
 		goto cleanup;
 
 	bitmap_writer_set_checksum(midx_hash);
-	bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, 0);
+	bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, options);
 
 cleanup:
 	free(index);
diff --git a/midx.h b/midx.h
index aa3da557bb..541d9ac728 100644
--- a/midx.h
+++ b/midx.h
@@ -44,6 +44,7 @@ struct multi_pack_index {
 #define MIDX_PROGRESS     (1 << 0)
 #define MIDX_WRITE_REV_INDEX (1 << 1)
 #define MIDX_WRITE_BITMAP (1 << 2)
+#define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3)
 
 const unsigned char *get_midx_checksum(struct multi_pack_index *m);
 char *get_midx_filename(const char *object_dir);
-- 
2.33.0.96.g73915697e6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 4/7] p5326: create missing 'perf-tag' tag
  2021-09-17 21:21 ` [PATCH v3 " Taylor Blau
                     ` (2 preceding siblings ...)
  2021-09-17 21:21   ` [PATCH v3 3/7] midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps Taylor Blau
@ 2021-09-17 21:21   ` Taylor Blau
  2021-09-17 21:21   ` [PATCH v3 5/7] p5326: don't set core.multiPackIndex unnecessarily Taylor Blau
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 55+ messages in thread
From: Taylor Blau @ 2021-09-17 21:21 UTC (permalink / raw)
  To: git; +Cc: peff, avarab, gitster

Some of the tests in test_full_bitmap rely on having a tag named
perf-tag in place. We could create it in test_full_bitmap(), but we want
to have it in place before the repack starts.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/perf/p5326-multi-pack-bitmaps.sh | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/t/perf/p5326-multi-pack-bitmaps.sh b/t/perf/p5326-multi-pack-bitmaps.sh
index 5845109ac7..51b5636259 100755
--- a/t/perf/p5326-multi-pack-bitmaps.sh
+++ b/t/perf/p5326-multi-pack-bitmaps.sh
@@ -10,6 +10,12 @@ test_expect_success 'enable multi-pack index' '
 	git config core.multiPackIndex true
 '
 
+# we need to create the tag up front such that it is covered by the repack and
+# thus by generated bitmaps.
+test_expect_success 'create tags' '
+	git tag --message="tag pointing to HEAD" perf-tag HEAD
+'
+
 test_perf 'setup multi-pack index' '
 	git repack -ad &&
 	git multi-pack-index write --bitmap
-- 
2.33.0.96.g73915697e6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 5/7] p5326: don't set core.multiPackIndex unnecessarily
  2021-09-17 21:21 ` [PATCH v3 " Taylor Blau
                     ` (3 preceding siblings ...)
  2021-09-17 21:21   ` [PATCH v3 4/7] p5326: create missing 'perf-tag' tag Taylor Blau
@ 2021-09-17 21:21   ` Taylor Blau
  2021-09-17 21:21   ` [PATCH v3 6/7] p5326: generate pack bitmaps before writing the MIDX bitmap Taylor Blau
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 55+ messages in thread
From: Taylor Blau @ 2021-09-17 21:21 UTC (permalink / raw)
  To: git; +Cc: peff, avarab, gitster

When this performance test was originally written, `core.multiPackIndex`
was not the default and thus had to be enabled. But now that we have
18e449f86b (midx: enable core.multiPackIndex by default, 2020-09-25), we
no longer need this.

Drop the unnecessary setup (even though it's not hurting anything, it is
unnecessary at best and confusing at worst).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/perf/p5326-multi-pack-bitmaps.sh | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/t/perf/p5326-multi-pack-bitmaps.sh b/t/perf/p5326-multi-pack-bitmaps.sh
index 51b5636259..a9c5499537 100755
--- a/t/perf/p5326-multi-pack-bitmaps.sh
+++ b/t/perf/p5326-multi-pack-bitmaps.sh
@@ -6,10 +6,6 @@ test_description='Tests performance using midx bitmaps'
 
 test_perf_large_repo
 
-test_expect_success 'enable multi-pack index' '
-	git config core.multiPackIndex true
-'
-
 # we need to create the tag up front such that it is covered by the repack and
 # thus by generated bitmaps.
 test_expect_success 'create tags' '
-- 
2.33.0.96.g73915697e6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 6/7] p5326: generate pack bitmaps before writing the MIDX bitmap
  2021-09-17 21:21 ` [PATCH v3 " Taylor Blau
                     ` (4 preceding siblings ...)
  2021-09-17 21:21   ` [PATCH v3 5/7] p5326: don't set core.multiPackIndex unnecessarily Taylor Blau
@ 2021-09-17 21:21   ` Taylor Blau
  2021-09-17 21:21   ` [PATCH v3 7/7] t5326: test propagating hashcache values Taylor Blau
  2021-09-17 22:12   ` [PATCH v3 0/7] pack-bitmap: permute existing namehash values Jeff King
  7 siblings, 0 replies; 55+ messages in thread
From: Taylor Blau @ 2021-09-17 21:21 UTC (permalink / raw)
  To: git; +Cc: peff, avarab, gitster

To help test the performance of permuting the contents of the hash-cache
when generating a MIDX bitmap, we need a bitmap which has its hash-cache
populated.

And since multi-pack bitmaps don't add *new* values to the hash-cache,
we have to rely on a single-pack bitmap to generate those values for us.

Therefore, generate a pack bitmap before the MIDX one in order to ensure
that the MIDX bitmap has entries in its hash-cache. Since we don't want
to time generating the pack bitmap, move that to a non-perf test run
before we try to generate the MIDX bitmap.

Likewise, get rid of the pack bitmap afterwords, to make certain that we
are not accidentally using it in the performance tests run later on.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/perf/p5326-multi-pack-bitmaps.sh | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/t/perf/p5326-multi-pack-bitmaps.sh b/t/perf/p5326-multi-pack-bitmaps.sh
index a9c5499537..f2fa228f16 100755
--- a/t/perf/p5326-multi-pack-bitmaps.sh
+++ b/t/perf/p5326-multi-pack-bitmaps.sh
@@ -12,11 +12,18 @@ test_expect_success 'create tags' '
 	git tag --message="tag pointing to HEAD" perf-tag HEAD
 '
 
+test_expect_success 'start with bitmapped pack' '
+	git repack -adb
+'
+
 test_perf 'setup multi-pack index' '
-	git repack -ad &&
 	git multi-pack-index write --bitmap
 '
 
+test_expect_success 'drop pack bitmap' '
+	rm -f .git/objects/pack/pack-*.bitmap
+'
+
 test_full_bitmap
 
 test_expect_success 'create partial bitmap state' '
-- 
2.33.0.96.g73915697e6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 7/7] t5326: test propagating hashcache values
  2021-09-17 21:21 ` [PATCH v3 " Taylor Blau
                     ` (5 preceding siblings ...)
  2021-09-17 21:21   ` [PATCH v3 6/7] p5326: generate pack bitmaps before writing the MIDX bitmap Taylor Blau
@ 2021-09-17 21:21   ` Taylor Blau
  2021-09-17 22:12   ` [PATCH v3 0/7] pack-bitmap: permute existing namehash values Jeff King
  7 siblings, 0 replies; 55+ messages in thread
From: Taylor Blau @ 2021-09-17 21:21 UTC (permalink / raw)
  To: git; +Cc: peff, avarab, gitster

Now that we both can propagate values from the hashcache, and respect
the configuration to enable the hashcache at all, test that both of
these function correctly by hardening their behavior with a test.

Like the hash-cache in classic single-pack bitmaps, this helps more
proportionally the more up-to-date your bitmap coverage is. When our
bitmap coverage is out-of-date with the ref tips, we spend more time
proportionally traversing, and all of that traversal gets the name-hash
filled in.

But for the up-to-date bitmaps, this helps quite a bit. These numbers
are on git.git, with `pack.threads=1` to help see the difference
reflected in the overall runtime.

    Test                            origin/tb/multi-pack-bitmaps   HEAD
    -------------------------------------------------------------------------------------
    5326.4: simulated clone         1.87(1.80+0.07)                1.46(1.42+0.03) -21.9%
    5326.5: simulated fetch         2.66(2.61+0.04)                1.47(1.43+0.04) -44.7%
    5326.6: pack to file (bitmap)   2.74(2.62+0.12)                1.89(1.82+0.07) -31.0%

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5326-multi-pack-bitmaps.sh | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 4ad7c2c969..ec4aa89f63 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -283,4 +283,34 @@ test_expect_success 'pack.preferBitmapTips' '
 	)
 '
 
+test_expect_success 'hash-cache values are propagated from pack bitmaps' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit base &&
+		test_commit base2 &&
+		git repack -adb &&
+
+		test-tool bitmap dump-hashes >pack.raw &&
+		test_file_not_empty pack.raw &&
+		sort pack.raw >pack.hashes &&
+
+		test_commit new &&
+		git repack &&
+		git multi-pack-index write --bitmap &&
+
+		test-tool bitmap dump-hashes >midx.raw &&
+		sort midx.raw >midx.hashes &&
+
+		# ensure that every namehash in the pack bitmap can be found in
+		# the midx bitmap (i.e., that there are no oid-namehash pairs
+		# unique to the pack bitmap).
+		comm -23 pack.hashes midx.hashes >dropped.hashes &&
+		test_must_be_empty dropped.hashes
+	)
+'
+
 test_done
-- 
2.33.0.96.g73915697e6

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 0/7] pack-bitmap: permute existing namehash values
  2021-09-17 21:21 ` [PATCH v3 " Taylor Blau
                     ` (6 preceding siblings ...)
  2021-09-17 21:21   ` [PATCH v3 7/7] t5326: test propagating hashcache values Taylor Blau
@ 2021-09-17 22:12   ` Jeff King
  7 siblings, 0 replies; 55+ messages in thread
From: Jeff King @ 2021-09-17 22:12 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, avarab, gitster

On Fri, Sep 17, 2021 at 05:21:11PM -0400, Taylor Blau wrote:

> Here is a very small reroll of my series to permute values from an existing
> hash-cache when generating multi-pack reachability bitmaps.
> 
> The only changes since last time are a slightly modified version of the
> performance tests which don't time generating the pack-bitmap. And the new test
> in t5326 no longer has a redundant `git config pack.writeBitmapHashCache` since
> it is true by default.

Thanks for picking up my suggestions. This all looks good to me!

-Peff

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2021-09-17 22:12 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-07 21:17 [PATCH 0/4] pack-bitmap: permute existing namehash values Taylor Blau
2021-09-07 21:17 ` [PATCH 1/4] t/helper/test-bitmap.c: add 'dump-hashes' mode Taylor Blau
2021-09-08  1:37   ` Ævar Arnfjörð Bjarmason
2021-09-08  2:24     ` Taylor Blau
2021-09-07 21:17 ` [PATCH 2/4] pack-bitmap.c: propagate namehash values from existing bitmaps Taylor Blau
2021-09-07 21:18 ` [PATCH 3/4] midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps Taylor Blau
2021-09-08  1:40   ` Ævar Arnfjörð Bjarmason
2021-09-08  2:28     ` Taylor Blau
2021-09-09  8:18       ` Ævar Arnfjörð Bjarmason
2021-09-09  9:34         ` Ævar Arnfjörð Bjarmason
2021-09-09 14:55           ` Taylor Blau
2021-09-09 15:50             ` Ævar Arnfjörð Bjarmason
2021-09-09 16:23               ` Taylor Blau
2021-09-09 14:47         ` Taylor Blau
2021-09-13  0:38   ` Junio C Hamano
2021-09-14  1:15     ` Taylor Blau
2021-09-07 21:18 ` [PATCH 4/4] t5326: test propagating hashcache values Taylor Blau
2021-09-08  1:46   ` Ævar Arnfjörð Bjarmason
2021-09-08  2:30     ` Taylor Blau
2021-09-17  8:56       ` Ævar Arnfjörð Bjarmason
2021-09-17 17:32         ` Taylor Blau
2021-09-17 19:22           ` Ævar Arnfjörð Bjarmason
2021-09-13  0:46   ` Junio C Hamano
2021-09-14  1:12     ` Taylor Blau
2021-09-14  2:05       ` Junio C Hamano
2021-09-14  5:11         ` Taylor Blau
2021-09-14  5:17           ` Taylor Blau
2021-09-14  5:27           ` Jeff King
2021-09-14  5:31             ` Taylor Blau
2021-09-14  5:23         ` Jeff King
2021-09-14  5:49           ` Junio C Hamano
2021-09-14 22:05 ` [PATCH v2 0/7] pack-bitmap: permute existing namehash values Taylor Blau
2021-09-14 22:06   ` [PATCH v2 1/7] t/helper/test-bitmap.c: add 'dump-hashes' mode Taylor Blau
2021-09-14 22:06   ` [PATCH v2 2/7] pack-bitmap.c: propagate namehash values from existing bitmaps Taylor Blau
2021-09-14 22:06   ` [PATCH v2 3/7] midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps Taylor Blau
2021-09-14 22:06   ` [PATCH v2 4/7] p5326: create missing 'perf-tag' tag Taylor Blau
2021-09-16 22:36     ` Jeff King
2021-09-17  4:14       ` Taylor Blau
2021-09-14 22:06   ` [PATCH v2 5/7] p5326: don't set core.multiPackIndex unnecessarily Taylor Blau
2021-09-16 22:38     ` Jeff King
2021-09-14 22:06   ` [PATCH v2 6/7] p5326: generate pack bitmaps before writing the MIDX bitmap Taylor Blau
2021-09-16 22:45     ` Jeff King
2021-09-17  4:20       ` Taylor Blau
2021-09-14 22:06   ` [PATCH v2 7/7] t5326: test propagating hashcache values Taylor Blau
2021-09-16 22:49     ` Jeff King
2021-09-16 22:52   ` [PATCH v2 0/7] pack-bitmap: permute existing namehash values Jeff King
2021-09-17 21:21 ` [PATCH v3 " Taylor Blau
2021-09-17 21:21   ` [PATCH v3 1/7] t/helper/test-bitmap.c: add 'dump-hashes' mode Taylor Blau
2021-09-17 21:21   ` [PATCH v3 2/7] pack-bitmap.c: propagate namehash values from existing bitmaps Taylor Blau
2021-09-17 21:21   ` [PATCH v3 3/7] midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps Taylor Blau
2021-09-17 21:21   ` [PATCH v3 4/7] p5326: create missing 'perf-tag' tag Taylor Blau
2021-09-17 21:21   ` [PATCH v3 5/7] p5326: don't set core.multiPackIndex unnecessarily Taylor Blau
2021-09-17 21:21   ` [PATCH v3 6/7] p5326: generate pack bitmaps before writing the MIDX bitmap Taylor Blau
2021-09-17 21:21   ` [PATCH v3 7/7] t5326: test propagating hashcache values Taylor Blau
2021-09-17 22:12   ` [PATCH v3 0/7] pack-bitmap: permute existing namehash values Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).