git.vger.kernel.org archive mirror
* [PATCH] pack-bitmap: gracefully handle missing BTMP chunks
@ 2024-04-09  5:59 Patrick Steinhardt
  2024-04-10 15:02 ` Taylor Blau
  2024-04-15  6:41 ` [PATCH v2] " Patrick Steinhardt
  0 siblings, 2 replies; 13+ messages in thread
From: Patrick Steinhardt @ 2024-04-09  5:59 UTC
  To: git; +Cc: Taylor Blau

In 0fea6b73f1 (Merge branch 'tb/multi-pack-verbatim-reuse', 2024-01-12)
we have introduced multi-pack verbatim reuse of objects. This series has
introduced a new BTMP chunk, which encodes information about bitmapped
objects in the multi-pack index. Starting with dab60934e3 (pack-bitmap:
pass `bitmapped_pack` struct to pack-reuse functions, 2023-12-14) we use
this information to figure out objects which we can reuse from each of
the packfiles.

One thing that we glossed over though is backwards compatibility with
repositories that do not yet have BTMP chunks in their multi-pack index.
In that case, `nth_bitmapped_pack()` would return an error, which causes
us to emit an error message followed by a warning. These messages are
visible to users that fetch from a repository:

```
$ git fetch
...
remote: error: MIDX does not contain the BTMP chunk
remote: warning: unable to load pack: 'pack-f6bb7bd71d345ea9fe604b60cab9ba9ece54ffbe.idx', disabling pack-reuse
remote: Enumerating objects: 40, done.
remote: Counting objects: 100% (40/40), done.
remote: Compressing objects: 100% (39/39), done.
remote: Total 40 (delta 5), reused 0 (delta 0), pack-reused 0 (from 0)
...
```

While the fetch succeeds the user is left wondering what they did wrong.
Furthermore, as visible both from the warning and from the reuse stats,
pack-reuse is completely disabled in such repositories.

What is quite interesting is that this issue can even be triggered in
case `pack.allowPackReuse=single` is set, which is the default value.
One could have expected that in this case we fall back to the old logic,
which is to use the preferred packfile without consulting BTMP chunks at
all. But either we fail with the above error in case they are missing,
or we use the first pack in the multi-pack-index. The former case
disables pack-reuse altogether, whereas the latter case may result in
reusing objects from a suboptimal packfile.

Fix this issue by partially reverting the logic back to what we had
before this patch series landed. Namely, in the case where we have no
BTMP chunks or when `pack.allowPackReuse=single` is set, we use the
preferred pack instead of consulting the BTMP chunks.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 midx.c                        |  7 ++++---
 pack-bitmap.c                 | 36 ++++++++++++++++++-----------------
 t/t5326-multi-pack-bitmaps.sh | 22 +++++++++++++++++++++
 3 files changed, 45 insertions(+), 20 deletions(-)

diff --git a/midx.c b/midx.c
index 41521e019c..6903e9dfd2 100644
--- a/midx.c
+++ b/midx.c
@@ -1661,9 +1661,10 @@ static int write_midx_internal(const char *object_dir,
 		add_chunk(cf, MIDX_CHUNKID_REVINDEX,
 			  st_mult(ctx.entries_nr, sizeof(uint32_t)),
 			  write_midx_revindex);
-		add_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
-			  bitmapped_packs_concat_len,
-			  write_midx_bitmapped_packs);
+		if (git_env_bool("GIT_TEST_MIDX_WRITE_BTMP", 1))
+			add_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
+				  bitmapped_packs_concat_len,
+				  write_midx_bitmapped_packs);
 	}
 
 	write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 2baeabacee..f286805724 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2049,7 +2049,25 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 
 	load_reverse_index(r, bitmap_git);
 
-	if (bitmap_is_midx(bitmap_git)) {
+	if (bitmap_is_midx(bitmap_git) &&
+	    (!multi_pack_reuse || !bitmap_git->midx->chunk_bitmapped_packs)) {
+		uint32_t preferred_pack_pos;
+		struct packed_git *pack;
+
+		if (midx_preferred_pack(bitmap_git->midx, &preferred_pack_pos) < 0) {
+			warning(_("unable to compute preferred pack, disabling pack-reuse"));
+			return;
+		}
+
+		pack = bitmap_git->midx->packs[preferred_pack_pos];
+
+		ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
+		packs[packs_nr].p = pack;
+		packs[packs_nr].bitmap_nr = pack->num_objects;
+		packs[packs_nr].bitmap_pos = 0;
+
+		objects_nr = packs[packs_nr++].bitmap_nr;
+	} else if (bitmap_is_midx(bitmap_git)) {
 		for (i = 0; i < bitmap_git->midx->num_packs; i++) {
 			struct bitmapped_pack pack;
 			if (nth_bitmapped_pack(r, bitmap_git->midx, &pack, i) < 0) {
@@ -2062,26 +2080,10 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 			if (!pack.bitmap_nr)
 				continue;
 
-			if (!multi_pack_reuse && pack.bitmap_pos) {
-				/*
-				 * If we're only reusing a single pack, skip
-				 * over any packs which are not positioned at
-				 * the beginning of the MIDX bitmap.
-				 *
-				 * This is consistent with the existing
-				 * single-pack reuse behavior, which only reuses
-				 * parts of the MIDX's preferred pack.
-				 */
-				continue;
-			}
-
 			ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
 			memcpy(&packs[packs_nr++], &pack, sizeof(pack));
 
 			objects_nr += pack.p->num_objects;
-
-			if (!multi_pack_reuse)
-				break;
 		}
 
 		QSORT(packs, packs_nr, bitmapped_pack_cmp);
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 70d1b58709..ee3843b239 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -513,4 +513,26 @@ test_expect_success 'corrupt MIDX with bitmap causes fallback' '
 	)
 '
 
+for allow_pack_reuse in single multi
+do
+	test_expect_success "MIDX without BTMP chunk does not complain with $allow_pack_reuse pack reuse" '
+		test_when_finished "rm -rf midx-without-btmp" &&
+		git init midx-without-btmp &&
+		(
+			cd midx-without-btmp &&
+			test_commit initial &&
+
+			# Write a multi-pack index that does have a bitmap, but
+			# no BTMP chunk. Such MIDX files would not be generated
+			# by modern Git anymore, but they were generated by
+			# older Git versions.
+			GIT_TEST_MIDX_WRITE_BTMP=false \
+				git repack -Adbl --write-bitmap-index --write-midx &&
+			git -c pack.allowPackReuse=$allow_pack_reuse \
+				pack-objects --all --use-bitmap-index --stdout </dev/null >/dev/null 2>err &&
+			test_must_be_empty err
+		)
+	'
+done
+
 test_done
-- 
2.44.GIT



* Re: [PATCH] pack-bitmap: gracefully handle missing BTMP chunks
  2024-04-09  5:59 [PATCH] pack-bitmap: gracefully handle missing BTMP chunks Patrick Steinhardt
@ 2024-04-10 15:02 ` Taylor Blau
  2024-04-15  6:34   ` Patrick Steinhardt
  2024-04-15  6:41 ` [PATCH v2] " Patrick Steinhardt
  1 sibling, 1 reply; 13+ messages in thread
From: Taylor Blau @ 2024-04-10 15:02 UTC
  To: Patrick Steinhardt; +Cc: git

On Tue, Apr 09, 2024 at 07:59:25AM +0200, Patrick Steinhardt wrote:
> One thing that we glossed over though is backwards compatibility with
> repositories that do not yet have BTMP chunks in their multi-pack index.
> In that case, `nth_bitmapped_pack()` would return an error, which causes
> us to emit an error message followed by a warning. These messages are
> visible to users that fetch from a repository:
>
> ```
> $ git fetch
> ...
> remote: error: MIDX does not contain the BTMP chunk
> remote: warning: unable to load pack: 'pack-f6bb7bd71d345ea9fe604b60cab9ba9ece54ffbe.idx', disabling pack-reuse
> remote: Enumerating objects: 40, done.
> remote: Counting objects: 100% (40/40), done.
> remote: Compressing objects: 100% (39/39), done.
> remote: Total 40 (delta 5), reused 0 (delta 0), pack-reused 0 (from 0)
> ...
> ```

Nice catch. This is definitely an oversight from my original series,
which should not disrupt single-pack reuse from the MIDX's preferred
pack when using a MIDX/bitmap written prior to the introduction of the
BTMP chunk.

> Fix this issue by partially reverting the logic back to what we had
> before this patch series landed. Namely, in the case where we have no
> BTMP chunks or when `pack.allowPackReuse=single` is set, we use the
> preferred pack instead of consulting the BTMP chunks.

The approach here makes sense to me. The gist is:

- If we don't have a BTMP chunk, then only reuse objects from the
  preferred pack.

- Otherwise, we do have a BTMP chunk. If we are reusing objects from
  multiple packs, then do the usual MIDX reuse routines.

- Otherwise, we're doing single-pack reuse. In that case, only reuse
  objects from the preferred pack.
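
As a rough illustration of the cases above, the decision can be sketched
as a standalone function. The names `reuse_mode` and
`choose_reuse_mode()` are hypothetical and do not exist in Git; only the
conditions are meant to mirror the patch under discussion:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names, for illustration only; not part of Git. */
enum reuse_mode {
	REUSE_SINGLE_PACK,    /* non-MIDX bitmap: reuse from its one pack */
	REUSE_PREFERRED_PACK, /* MIDX bitmap: reuse from the preferred pack only */
	REUSE_MULTI_PACK      /* MIDX bitmap with BTMP: reuse from all bitmapped packs */
};

enum reuse_mode choose_reuse_mode(bool is_midx, bool has_btmp,
				  bool multi_pack_reuse)
{
	/* A BTMP chunk lives in the MIDX, so it cannot exist without one. */
	assert(is_midx || !has_btmp);

	/* Without a MIDX or without a BTMP chunk, multi-pack reuse is off. */
	if (!is_midx || !has_btmp)
		multi_pack_reuse = false;

	if (multi_pack_reuse)
		return REUSE_MULTI_PACK;

	/* Single-pack reuse: preferred pack for a MIDX, else the lone pack. */
	return is_midx ? REUSE_PREFERRED_PACK : REUSE_SINGLE_PACK;
}
```

With this shape, a MIDX lacking BTMP chunks ends up in the same branch as
`pack.allowPackReuse=single`, so `nth_bitmapped_pack()` is never
consulted in either case.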

> diff --git a/midx.c b/midx.c
> index 41521e019c..6903e9dfd2 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -1661,9 +1661,10 @@ static int write_midx_internal(const char *object_dir,
>  		add_chunk(cf, MIDX_CHUNKID_REVINDEX,
>  			  st_mult(ctx.entries_nr, sizeof(uint32_t)),
>  			  write_midx_revindex);
> -		add_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
> -			  bitmapped_packs_concat_len,
> -			  write_midx_bitmapped_packs);
> +		if (git_env_bool("GIT_TEST_MIDX_WRITE_BTMP", 1))
> +			add_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
> +				  bitmapped_packs_concat_len,
> +				  write_midx_bitmapped_packs);

I wish that this were possible to exercise without a new
GIT_TEST_-variable. I think there are a couple of alternatives:

You could introduce a new GIT_TEST_MIDX_READ_BTMP variable, and then set
that to control whether or not we read the BTMP chunk. This is what we
did in:

  - 28cd730680d (pack-bitmap: prepare to read lookup table extension,
    2022-08-14), as well as in

  - 7f514b7a5e7 (midx: read `RIDX` chunk when present, 2022-01-25).

I have a vague preference towards controlling whether or not we read
the BTMP chunk (as opposed to whether or not we write it) as this
removes a potential footgun for users who might accidentally disable
writing a BTMP chunk (in which case you have to rewrite the whole MIDX)
as opposed to reading it (in which case you just change your environment
variable).
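
To make the read-side idea concrete, here is a minimal sketch of gating
a chunk on an environment variable at load time. `env_bool_sketch()` is
a simplified stand-in for `git_env_bool()` (it accepts fewer spellings
and treats any unrecognized value as true), and `visible_btmp_chunk()`
is a hypothetical helper, not a function in Git:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for git_env_bool(); an assumption, not Git's code. */
bool env_bool_sketch(const char *name, bool fallback)
{
	const char *v = getenv(name);

	if (!v)
		return fallback; /* unset: use the default */
	if (!strcmp(v, "") || !strcmp(v, "0") || !strcmp(v, "false"))
		return false;
	return true;
}

/*
 * Hypothetical reader-side gate: the on-disk data is left untouched,
 * the reader merely pretends the chunk is absent when the knob is off.
 */
const unsigned char *visible_btmp_chunk(const unsigned char *on_disk)
{
	if (!env_bool_sketch("GIT_TEST_MIDX_READ_BTMP", true))
		return NULL;
	return on_disk;
}
```

Because nothing on disk changes, unsetting the variable restores the
normal behavior without rewriting the MIDX.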

Of course, that is still using a GIT_TEST_-variable, which is less than
ideal IMHO. The other alternative would be to store a MIDX file as a
test fixture in the tree (which we do in a couple of places). But with
the recent xz shenanigans, I'm not sure that's a great idea either ;-).

> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 2baeabacee..f286805724 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -2049,7 +2049,25 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
>
>  	load_reverse_index(r, bitmap_git);
>
> -	if (bitmap_is_midx(bitmap_git)) {
> +	if (bitmap_is_midx(bitmap_git) &&
> +	    (!multi_pack_reuse || !bitmap_git->midx->chunk_bitmapped_packs)) {
> +		uint32_t preferred_pack_pos;
> +		struct packed_git *pack;
> +
> +		if (midx_preferred_pack(bitmap_git->midx, &preferred_pack_pos) < 0) {
> +			warning(_("unable to compute preferred pack, disabling pack-reuse"));
> +			return;
> +		}
> +
> +		pack = bitmap_git->midx->packs[preferred_pack_pos];
> +
> +		ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
> +		packs[packs_nr].p = pack;
> +		packs[packs_nr].bitmap_nr = pack->num_objects;
> +		packs[packs_nr].bitmap_pos = 0;
> +
> +		objects_nr = packs[packs_nr++].bitmap_nr;
> +	} else if (bitmap_is_midx(bitmap_git)) {
>  		for (i = 0; i < bitmap_git->midx->num_packs; i++) {
>  			struct bitmapped_pack pack;
>  			if (nth_bitmapped_pack(r, bitmap_git->midx, &pack, i) < 0) {

This all makes sense to me. I think we could make the result slightly
more readable by handling the case where we're doing multi-pack reuse
separately from the case where we're not.

I tried to make that change locally to see if it was a good idea, and
I'm reasonably happy with the result. I can't think of a great way to
talk about it without just showing the resulting patch (as the
inter-diff is fairly difficult to read IMHO). So here is the resulting
patch (forging your s-o-b):

--- 8< ---
Subject: [PATCH] pack-bitmap: gracefully handle missing BTMP chunks

In 0fea6b73f1 (Merge branch 'tb/multi-pack-verbatim-reuse', 2024-01-12)
we have introduced multi-pack verbatim reuse of objects. This series has
introduced a new BTMP chunk, which encodes information about bitmapped
objects in the multi-pack index. Starting with dab60934e3 (pack-bitmap:
pass `bitmapped_pack` struct to pack-reuse functions, 2023-12-14) we use
this information to figure out objects which we can reuse from each of
the packfiles.

One thing that we glossed over though is backwards compatibility with
repositories that do not yet have BTMP chunks in their multi-pack index.
In that case, `nth_bitmapped_pack()` would return an error, which causes
us to emit an error message followed by a warning. These messages are
visible to users that fetch from a repository:

```
$ git fetch
...
remote: error: MIDX does not contain the BTMP chunk
remote: warning: unable to load pack: 'pack-f6bb7bd71d345ea9fe604b60cab9ba9ece54ffbe.idx', disabling pack-reuse
remote: Enumerating objects: 40, done.
remote: Counting objects: 100% (40/40), done.
remote: Compressing objects: 100% (39/39), done.
remote: Total 40 (delta 5), reused 0 (delta 0), pack-reused 0 (from 0)
...
```

While the fetch succeeds the user is left wondering what they did wrong.
Furthermore, as visible both from the warning and from the reuse stats,
pack-reuse is completely disabled in such repositories.

What is quite interesting is that this issue can even be triggered in
case `pack.allowPackReuse=single` is set, which is the default value.
One could have expected that in this case we fall back to the old logic,
which is to use the preferred packfile without consulting BTMP chunks at
all. But either we fail with the above error in case they are missing,
or we use the first pack in the multi-pack-index. The former case
disables pack-reuse altogether, whereas the latter case may result in
reusing objects from a suboptimal packfile.

Fix this issue by partially reverting the logic back to what we had
before this patch series landed. Namely, in the case where we have no
BTMP chunks or when `pack.allowPackReuse=single` is set, we use the
preferred pack instead of consulting the BTMP chunks.

Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c                        |  7 +++---
 pack-bitmap.c                 | 44 +++++++++++++++++++----------------
 t/t5326-multi-pack-bitmaps.sh | 22 ++++++++++++++++++
 3 files changed, 50 insertions(+), 23 deletions(-)

diff --git a/midx.c b/midx.c
index 41521e019c6..6903e9dfd25 100644
--- a/midx.c
+++ b/midx.c
@@ -1661,9 +1661,10 @@ static int write_midx_internal(const char *object_dir,
 		add_chunk(cf, MIDX_CHUNKID_REVINDEX,
 			  st_mult(ctx.entries_nr, sizeof(uint32_t)),
 			  write_midx_revindex);
-		add_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
-			  bitmapped_packs_concat_len,
-			  write_midx_bitmapped_packs);
+		if (git_env_bool("GIT_TEST_MIDX_WRITE_BTMP", 1))
+			add_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
+				  bitmapped_packs_concat_len,
+				  write_midx_bitmapped_packs);
 	}

 	write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 2baeabacee1..44b32ee3561 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2049,7 +2049,14 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,

 	load_reverse_index(r, bitmap_git);

-	if (bitmap_is_midx(bitmap_git)) {
+	if (!bitmap_is_midx(bitmap_git) ||
+	    !bitmap_git->midx->chunk_bitmapped_packs)
+		multi_pack_reuse = 0;
+
+	if (multi_pack_reuse) {
+		if (!bitmap_is_midx(bitmap_git))
+			BUG("attempting to perform multi-pack reuse on non-MIDX bitmap");
+
 		for (i = 0; i < bitmap_git->midx->num_packs; i++) {
 			struct bitmapped_pack pack;
 			if (nth_bitmapped_pack(r, bitmap_git->midx, &pack, i) < 0) {
@@ -2062,36 +2069,33 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 			if (!pack.bitmap_nr)
 				continue;

-			if (!multi_pack_reuse && pack.bitmap_pos) {
-				/*
-				 * If we're only reusing a single pack, skip
-				 * over any packs which are not positioned at
-				 * the beginning of the MIDX bitmap.
-				 *
-				 * This is consistent with the existing
-				 * single-pack reuse behavior, which only reuses
-				 * parts of the MIDX's preferred pack.
-				 */
-				continue;
-			}
-
 			ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
 			memcpy(&packs[packs_nr++], &pack, sizeof(pack));

 			objects_nr += pack.p->num_objects;
-
-			if (!multi_pack_reuse)
-				break;
 		}

 		QSORT(packs, packs_nr, bitmapped_pack_cmp);
 	} else {
+		struct packed_git *pack;
+		if (bitmap_is_midx(bitmap_git)) {
+			uint32_t pack_int_id;
+			if (midx_preferred_pack(bitmap_git->midx, &pack_int_id) < 0) {
+				warning(_("unable to compute preferred pack, "
+					  "disabling pack-reuse"));
+				return;
+			}
+
+			pack = bitmap_git->midx->packs[pack_int_id];
+		} else {
+			pack = bitmap_git->pack;
+		}
+
 		ALLOC_GROW(packs, packs_nr + 1, packs_alloc);

-		packs[packs_nr].p = bitmap_git->pack;
-		packs[packs_nr].bitmap_nr = bitmap_git->pack->num_objects;
+		packs[packs_nr].p = pack;
+		packs[packs_nr].bitmap_nr = pack->num_objects;
 		packs[packs_nr].bitmap_pos = 0;
-
 		objects_nr = packs[packs_nr++].bitmap_nr;
 	}

diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 70d1b58709a..ee3843b2390 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -513,4 +513,26 @@ test_expect_success 'corrupt MIDX with bitmap causes fallback' '
 	)
 '

+for allow_pack_reuse in single multi
+do
+	test_expect_success "MIDX without BTMP chunk does not complain with $allow_pack_reuse pack reuse" '
+		test_when_finished "rm -rf midx-without-btmp" &&
+		git init midx-without-btmp &&
+		(
+			cd midx-without-btmp &&
+			test_commit initial &&
+
+			# Write a multi-pack index that does have a bitmap, but
+			# no BTMP chunk. Such MIDX files would not be generated
+			# by modern Git anymore, but they were generated by
+			# older Git versions.
+			GIT_TEST_MIDX_WRITE_BTMP=false \
+				git repack -Adbl --write-bitmap-index --write-midx &&
+			git -c pack.allowPackReuse=$allow_pack_reuse \
+				pack-objects --all --use-bitmap-index --stdout </dev/null >/dev/null 2>err &&
+			test_must_be_empty err
+		)
+	'
+done
+
 test_done

base-commit: 91ec36f2cca02d33ab0ed6e87195c6fe801debae
--
2.44.0.549.g74a2f60dcb0
--- >8 ---

The way I would structure this series is to first apply the portion of
the above patch *without* these lines:

-	if (bitmap_is_midx(bitmap_git)) {
+	if (!bitmap_is_midx(bitmap_git) ||
+	    !bitmap_git->midx->chunk_bitmapped_packs)
+		multi_pack_reuse = 0;
+

, so we're still able to reproduce the issue. Then, apply the remaining
portions (the above diff, the test, and the GIT_TEST_MIDX_WRITE_BTMP
stuff) to demonstrate that the issue is fixed via a separate commit.

I'm happy to write that up, and equally happy to not do it ;-). Sorry
for the lengthy review, but thank you very much for spotting and fixing
this issue.

Thanks,
Taylor


* Re: [PATCH] pack-bitmap: gracefully handle missing BTMP chunks
  2024-04-10 15:02 ` Taylor Blau
@ 2024-04-15  6:34   ` Patrick Steinhardt
  2024-04-15 22:42     ` Taylor Blau
  0 siblings, 1 reply; 13+ messages in thread
From: Patrick Steinhardt @ 2024-04-15  6:34 UTC
  To: Taylor Blau; +Cc: git

On Wed, Apr 10, 2024 at 11:02:10AM -0400, Taylor Blau wrote:
> On Tue, Apr 09, 2024 at 07:59:25AM +0200, Patrick Steinhardt wrote:
[snip]
> > diff --git a/midx.c b/midx.c
> > index 41521e019c..6903e9dfd2 100644
> > --- a/midx.c
> > +++ b/midx.c
> > @@ -1661,9 +1661,10 @@ static int write_midx_internal(const char *object_dir,
> >  		add_chunk(cf, MIDX_CHUNKID_REVINDEX,
> >  			  st_mult(ctx.entries_nr, sizeof(uint32_t)),
> >  			  write_midx_revindex);
> > -		add_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
> > -			  bitmapped_packs_concat_len,
> > -			  write_midx_bitmapped_packs);
> > +		if (git_env_bool("GIT_TEST_MIDX_WRITE_BTMP", 1))
> > +			add_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
> > +				  bitmapped_packs_concat_len,
> > +				  write_midx_bitmapped_packs);
> 
> I wish that this were possible to exercise without a new
> GIT_TEST_-variable. I think there are a couple of alternatives:
> 
> You could introduce a new GIT_TEST_MIDX_READ_BTMP variable, and then set
> that to control whether or not we read the BTMP chunk. This is what we
> did in:
> 
>   - 28cd730680d (pack-bitmap: prepare to read lookup table extension,
>     2022-08-14), as well as in
> 
>   - 7f514b7a5e7 (midx: read `RIDX` chunk when present, 2022-01-25)
> 
> . I have a vague preference towards controlling whether or not we read
> the BTMP chunk (as opposed to whether or not we write it) as this
> removes a potential footgun for users who might accidentally disable
> writing a BTMP chunk (in which case you have to rewrite the whole MIDX)
> as opposed to reading it (in which case you just change your environment
> variable).
> 
> Of course, that is still using a GIT_TEST_-variable, which is less than
> ideal IMHO. The other alternative would be to store a MIDX file as a
> test fixture in the tree (which we do in a couple of places). But with
> the recent xz shenanigans, I'm not sure that's a great idea either ;-).

I'm happy to convert this to use `GIT_TEST_MIDX_READ_BTMP` instead.

> > diff --git a/pack-bitmap.c b/pack-bitmap.c
> > index 2baeabacee..f286805724 100644
> > --- a/pack-bitmap.c
> > +++ b/pack-bitmap.c
> > @@ -2049,7 +2049,25 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
> >
> >  	load_reverse_index(r, bitmap_git);
> >
> > -	if (bitmap_is_midx(bitmap_git)) {
> > +	if (bitmap_is_midx(bitmap_git) &&
> > +	    (!multi_pack_reuse || !bitmap_git->midx->chunk_bitmapped_packs)) {
> > +		uint32_t preferred_pack_pos;
> > +		struct packed_git *pack;
> > +
> > +		if (midx_preferred_pack(bitmap_git->midx, &preferred_pack_pos) < 0) {
> > +			warning(_("unable to compute preferred pack, disabling pack-reuse"));
> > +			return;
> > +		}
> > +
> > +		pack = bitmap_git->midx->packs[preferred_pack_pos];
> > +
> > +		ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
> > +		packs[packs_nr].p = pack;
> > +		packs[packs_nr].bitmap_nr = pack->num_objects;
> > +		packs[packs_nr].bitmap_pos = 0;
> > +
> > +		objects_nr = packs[packs_nr++].bitmap_nr;
> > +	} else if (bitmap_is_midx(bitmap_git)) {
> >  		for (i = 0; i < bitmap_git->midx->num_packs; i++) {
> >  			struct bitmapped_pack pack;
> >  			if (nth_bitmapped_pack(r, bitmap_git->midx, &pack, i) < 0) {
> 
> This all makes sense to me. I think we could make the result slightly
> more readable by handling the case where we're doing multi-pack reuse
> separately from the case where we're not.
> 
> I tried to make that change locally to see if it was a good idea, and
> I'm reasonably happy with the result. I can't think of a great way to
> talk about it without just showing the resulting patch (as the
> inter-diff is fairly difficult to read IMHO). So here is the resulting
> patch (forging your s-o-b):

Yup, the result indeed looks nicer, thanks!

[snip]
> The way I would structure this series is to first apply the portion of
> the above patch *without* these lines:
> 
> -	if (bitmap_is_midx(bitmap_git)) {
> +	if (!bitmap_is_midx(bitmap_git) ||
> +	    !bitmap_git->midx->chunk_bitmapped_packs)
> +		multi_pack_reuse = 0;
> +
> 
> , so we're still able to reproduce the issue. Then, apply the remaining
> portions (the above diff, the test, and the GIT_TEST_MIDX_WRITE_BTMP
> stuff) to demonstrate that the issue is fixed via a separate commit.
> 
> I'm happy to write that up, and equally happy to not do it ;-). Sorry
> for the lengthy review, but thank you very much for spotting and fixing
> this issue.

I'd prefer to leave it as a single patch. Junio has expressed at times
that he doesn't see much value in these splits only to demonstrate a
broken test. An interested reader can easily revert the fix and see that
the test would fail.

Patrick


* [PATCH v2] pack-bitmap: gracefully handle missing BTMP chunks
  2024-04-09  5:59 [PATCH] pack-bitmap: gracefully handle missing BTMP chunks Patrick Steinhardt
  2024-04-10 15:02 ` Taylor Blau
@ 2024-04-15  6:41 ` Patrick Steinhardt
  2024-04-15  8:51   ` Patrick Steinhardt
  2024-04-15 22:51   ` Taylor Blau
  1 sibling, 2 replies; 13+ messages in thread
From: Patrick Steinhardt @ 2024-04-15  6:41 UTC
  To: git; +Cc: Taylor Blau

In 0fea6b73f1 (Merge branch 'tb/multi-pack-verbatim-reuse', 2024-01-12)
we have introduced multi-pack verbatim reuse of objects. This series has
introduced a new BTMP chunk, which encodes information about bitmapped
objects in the multi-pack index. Starting with dab60934e3 (pack-bitmap:
pass `bitmapped_pack` struct to pack-reuse functions, 2023-12-14) we use
this information to figure out objects which we can reuse from each of
the packfiles.

One thing that we glossed over though is backwards compatibility with
repositories that do not yet have BTMP chunks in their multi-pack index.
In that case, `nth_bitmapped_pack()` would return an error, which causes
us to emit an error message followed by a warning. These messages are
visible to users that fetch from a repository:

```
$ git fetch
...
remote: error: MIDX does not contain the BTMP chunk
remote: warning: unable to load pack: 'pack-f6bb7bd71d345ea9fe604b60cab9ba9ece54ffbe.idx', disabling pack-reuse
remote: Enumerating objects: 40, done.
remote: Counting objects: 100% (40/40), done.
remote: Compressing objects: 100% (39/39), done.
remote: Total 40 (delta 5), reused 0 (delta 0), pack-reused 0 (from 0)
...
```

While the fetch succeeds the user is left wondering what they did wrong.
Furthermore, as visible both from the warning and from the reuse stats,
pack-reuse is completely disabled in such repositories.

What is quite interesting is that this issue can even be triggered in
case `pack.allowPackReuse=single` is set, which is the default value.
One could have expected that in this case we fall back to the old logic,
which is to use the preferred packfile without consulting BTMP chunks at
all. But either we fail with the above error in case they are missing,
or we use the first pack in the multi-pack-index. The former case
disables pack-reuse altogether, whereas the latter case may result in
reusing objects from a suboptimal packfile.

Fix this issue by partially reverting the logic back to what we had
before this patch series landed. Namely, in the case where we have no
BTMP chunks or when `pack.allowPackReuse=single` is set, we use the
preferred pack instead of consulting the BTMP chunks.

Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
Range-diff against v1:
1:  5933a302b5 ! 1:  a8251f8278 pack-bitmap: gracefully handle missing BTMP chunks
    @@ Commit message
         BTMP chunks or when `pack.allowPackReuse=single` are set, we use the
         preferred pack instead of consulting the BTMP chunks.
     
    +    Helped-by: Taylor Blau <me@ttaylorr.com>
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
      ## midx.c ##
    -@@ midx.c: static int write_midx_internal(const char *object_dir,
    - 		add_chunk(cf, MIDX_CHUNKID_REVINDEX,
    - 			  st_mult(ctx.entries_nr, sizeof(uint32_t)),
    - 			  write_midx_revindex);
    --		add_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
    --			  bitmapped_packs_concat_len,
    --			  write_midx_bitmapped_packs);
    -+		if (git_env_bool("GIT_TEST_MIDX_WRITE_BTMP", 1))
    -+			add_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
    -+				  bitmapped_packs_concat_len,
    -+				  write_midx_bitmapped_packs);
    - 	}
    +@@ midx.c: struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
      
    - 	write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
    + 	pair_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS, &m->chunk_large_offsets,
    + 		   &m->chunk_large_offsets_len);
    +-	pair_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
    +-		   (const unsigned char **)&m->chunk_bitmapped_packs,
    +-		   &m->chunk_bitmapped_packs_len);
    ++	if (git_env_bool("GIT_TEST_MIDX_READ_BTMP", 1))
    ++		pair_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
    ++			   (const unsigned char **)&m->chunk_bitmapped_packs,
    ++			   &m->chunk_bitmapped_packs_len);
    + 
    + 	if (git_env_bool("GIT_TEST_MIDX_READ_RIDX", 1))
    + 		pair_chunk(cf, MIDX_CHUNKID_REVINDEX, &m->chunk_revindex,
     
      ## pack-bitmap.c ##
     @@ pack-bitmap.c: void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
    @@ pack-bitmap.c: void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitm
      	load_reverse_index(r, bitmap_git);
      
     -	if (bitmap_is_midx(bitmap_git)) {
    -+	if (bitmap_is_midx(bitmap_git) &&
    -+	    (!multi_pack_reuse || !bitmap_git->midx->chunk_bitmapped_packs)) {
    -+		uint32_t preferred_pack_pos;
    -+		struct packed_git *pack;
    -+
    -+		if (midx_preferred_pack(bitmap_git->midx, &preferred_pack_pos) < 0) {
    -+			warning(_("unable to compute preferred pack, disabling pack-reuse"));
    -+			return;
    -+		}
    -+
    -+		pack = bitmap_git->midx->packs[preferred_pack_pos];
    ++	if (!bitmap_is_midx(bitmap_git) || !bitmap_git->midx->chunk_bitmapped_packs)
    ++		multi_pack_reuse = 0;
     +
    -+		ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
    -+		packs[packs_nr].p = pack;
    -+		packs[packs_nr].bitmap_nr = pack->num_objects;
    -+		packs[packs_nr].bitmap_pos = 0;
    -+
    -+		objects_nr = packs[packs_nr++].bitmap_nr;
    -+	} else if (bitmap_is_midx(bitmap_git)) {
    ++	if (multi_pack_reuse) {
      		for (i = 0; i < bitmap_git->midx->num_packs; i++) {
      			struct bitmapped_pack pack;
      			if (nth_bitmapped_pack(r, bitmap_git->midx, &pack, i) < 0) {
    @@ pack-bitmap.c: void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitm
      		}
      
      		QSORT(packs, packs_nr, bitmapped_pack_cmp);
    + 	} else {
    +-		ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
    ++		struct packed_git *pack;
    ++
    ++		if (bitmap_is_midx(bitmap_git)) {
    ++			uint32_t preferred_pack_pos;
    ++
    ++			if (midx_preferred_pack(bitmap_git->midx, &preferred_pack_pos) < 0) {
    ++				warning(_("unable to compute preferred pack, disabling pack-reuse"));
    ++				return;
    ++			}
    + 
    +-		packs[packs_nr].p = bitmap_git->pack;
    +-		packs[packs_nr].bitmap_nr = bitmap_git->pack->num_objects;
    ++			pack = bitmap_git->midx->packs[preferred_pack_pos];
    ++		} else {
    ++			pack = bitmap_git->pack;
    ++		}
    ++
    ++		ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
    ++		packs[packs_nr].p = pack;
    ++		packs[packs_nr].bitmap_nr = pack->num_objects;
    + 		packs[packs_nr].bitmap_pos = 0;
    + 
    + 		objects_nr = packs[packs_nr++].bitmap_nr;
     
      ## t/t5326-multi-pack-bitmaps.sh ##
     @@ t/t5326-multi-pack-bitmaps.sh: test_expect_success 'corrupt MIDX with bitmap causes fallback' '
    @@ t/t5326-multi-pack-bitmaps.sh: test_expect_success 'corrupt MIDX with bitmap cau
      
     +for allow_pack_reuse in single multi
     +do
    -+	test_expect_success "MIDX without BTMP chunk does not complain with $allow_pack_reuse pack reuse" '
    ++	test_expect_success "reading MIDX without BTMP chunk does not complain with $allow_pack_reuse pack reuse" '
     +		test_when_finished "rm -rf midx-without-btmp" &&
     +		git init midx-without-btmp &&
     +		(
     +			cd midx-without-btmp &&
     +			test_commit initial &&
     +
    -+			# Write a multi-pack index that does have a bitmap, but
    -+			# no BTMP chunk. Such MIDX files would not be generated
    -+			# by modern Git anymore, but they were generated by
    -+			# older Git versions.
    -+			GIT_TEST_MIDX_WRITE_BTMP=false \
    -+				git repack -Adbl --write-bitmap-index --write-midx &&
    -+			git -c pack.allowPackReuse=$allow_pack_reuse \
    ++			git repack -Adbl --write-bitmap-index --write-midx &&
    ++			GIT_TEST_MIDX_READ_BTMP=false git -c pack.allowPackReuse=$allow_pack_reuse \
     +				pack-objects --all --use-bitmap-index --stdout </dev/null >/dev/null 2>err &&
     +			test_must_be_empty err
     +		)

 midx.c                        |  7 +++---
 pack-bitmap.c                 | 41 ++++++++++++++++++-----------------
 t/t5326-multi-pack-bitmaps.sh | 17 +++++++++++++++
 3 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/midx.c b/midx.c
index ae3b49166c..6f07de3688 100644
--- a/midx.c
+++ b/midx.c
@@ -170,9 +170,10 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 
 	pair_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS, &m->chunk_large_offsets,
 		   &m->chunk_large_offsets_len);
-	pair_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
-		   (const unsigned char **)&m->chunk_bitmapped_packs,
-		   &m->chunk_bitmapped_packs_len);
+	if (git_env_bool("GIT_TEST_MIDX_READ_BTMP", 1))
+		pair_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
+			   (const unsigned char **)&m->chunk_bitmapped_packs,
+			   &m->chunk_bitmapped_packs_len);
 
 	if (git_env_bool("GIT_TEST_MIDX_READ_RIDX", 1))
 		pair_chunk(cf, MIDX_CHUNKID_REVINDEX, &m->chunk_revindex,
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 2baeabacee..35c5ef9d3c 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2049,7 +2049,10 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 
 	load_reverse_index(r, bitmap_git);
 
-	if (bitmap_is_midx(bitmap_git)) {
+	if (!bitmap_is_midx(bitmap_git) || !bitmap_git->midx->chunk_bitmapped_packs)
+		multi_pack_reuse = 0;
+
+	if (multi_pack_reuse) {
 		for (i = 0; i < bitmap_git->midx->num_packs; i++) {
 			struct bitmapped_pack pack;
 			if (nth_bitmapped_pack(r, bitmap_git->midx, &pack, i) < 0) {
@@ -2062,34 +2065,32 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 			if (!pack.bitmap_nr)
 				continue;
 
-			if (!multi_pack_reuse && pack.bitmap_pos) {
-				/*
-				 * If we're only reusing a single pack, skip
-				 * over any packs which are not positioned at
-				 * the beginning of the MIDX bitmap.
-				 *
-				 * This is consistent with the existing
-				 * single-pack reuse behavior, which only reuses
-				 * parts of the MIDX's preferred pack.
-				 */
-				continue;
-			}
-
 			ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
 			memcpy(&packs[packs_nr++], &pack, sizeof(pack));
 
 			objects_nr += pack.p->num_objects;
-
-			if (!multi_pack_reuse)
-				break;
 		}
 
 		QSORT(packs, packs_nr, bitmapped_pack_cmp);
 	} else {
+		struct packed_git *pack;
+
+		if (bitmap_is_midx(bitmap_git)) {
+			uint32_t preferred_pack_pos;
+
+			if (midx_preferred_pack(bitmap_git->midx, &preferred_pack_pos) < 0) {
+				warning(_("unable to compute preferred pack, disabling pack-reuse"));
+				return;
+			}
+
+			pack = bitmap_git->midx->packs[preferred_pack_pos];
+		} else {
+			pack = bitmap_git->pack;
+		}
+
 		ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
-
-		packs[packs_nr].p = bitmap_git->pack;
-		packs[packs_nr].bitmap_nr = bitmap_git->pack->num_objects;
+		packs[packs_nr].p = pack;
+		packs[packs_nr].bitmap_nr = pack->num_objects;
 		packs[packs_nr].bitmap_pos = 0;
 
 		objects_nr = packs[packs_nr++].bitmap_nr;
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 70d1b58709..5d7d321840 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -513,4 +513,21 @@ test_expect_success 'corrupt MIDX with bitmap causes fallback' '
 	)
 '
 
+for allow_pack_reuse in single multi
+do
+	test_expect_success "reading MIDX without BTMP chunk does not complain with $allow_pack_reuse pack reuse" '
+		test_when_finished "rm -rf midx-without-btmp" &&
+		git init midx-without-btmp &&
+		(
+			cd midx-without-btmp &&
+			test_commit initial &&
+
+			git repack -Adbl --write-bitmap-index --write-midx &&
+			GIT_TEST_MIDX_READ_BTMP=false git -c pack.allowPackReuse=$allow_pack_reuse \
+				pack-objects --all --use-bitmap-index --stdout </dev/null >/dev/null 2>err &&
+			test_must_be_empty err
+		)
+	'
+done
+
 test_done

base-commit: 19981daefd7c147444462739375462b49412ce33
-- 
2.44.GIT



* Re: [PATCH v2] pack-bitmap: gracefully handle missing BTMP chunks
  2024-04-15  6:41 ` [PATCH v2] " Patrick Steinhardt
@ 2024-04-15  8:51   ` Patrick Steinhardt
  2024-04-15 17:41     ` Junio C Hamano
  2024-04-15 22:51   ` Taylor Blau
  1 sibling, 1 reply; 13+ messages in thread
From: Patrick Steinhardt @ 2024-04-15  8:51 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Junio C Hamano

On Mon, Apr 15, 2024 at 08:41:25AM +0200, Patrick Steinhardt wrote:
> In 0fea6b73f1 (Merge branch 'tb/multi-pack-verbatim-reuse', 2024-01-12)
> we have introduced multi-pack verbatim reuse of objects. This series has
> introduced a new BTMP chunk, which encodes information about bitmapped
> objects in the multi-pack index. Starting with dab60934e3 (pack-bitmap:
> pass `bitmapped_pack` struct to pack-reuse functions, 2023-12-14) we use
> this information to figure out objects which we can reuse from each of
> the packfiles.
> 
> One thing that we glossed over though is backwards compatibility with
> repositories that do not yet have BTMP chunks in their multi-pack index.
> In that case, `nth_bitmapped_pack()` would return an error, which causes
> us to emit a warning followed by another error message. These warnings
> are visible to users that fetch from a repository:
> 
> ```
> $ git fetch
> ...
> remote: error: MIDX does not contain the BTMP chunk
> remote: warning: unable to load pack: 'pack-f6bb7bd71d345ea9fe604b60cab9ba9ece54ffbe.idx', disabling pack-reuse
> remote: Enumerating objects: 40, done.
> remote: Counting objects: 100% (40/40), done.
> remote: Compressing objects: 100% (39/39), done.
> remote: Total 40 (delta 5), reused 0 (delta 0), pack-reused 0 (from 0)
> ...
> ```
> 
> While the fetch succeeds the user is left wondering what they did wrong.
> Furthermore, as visible both from the warning and from the reuse stats,
> pack-reuse is completely disabled in such repositories.
> 
> What is quite interesting is that this issue can even be triggered in
> case `pack.allowPackReuse=single` is set, which is the default value.
> One could have expected that in this case we fall back to the old logic,
> which is to use the preferred packfile without consulting BTMP chunks at
> all. But either we fail with the above error in case they are missing,
> or we use the first pack in the multi-pack-index. The former case
> disables pack-reuse altogether, whereas the latter case may result in
> reusing objects from a suboptimal packfile.
> 
> Fix this issue by partially reverting the logic back to what we had
> before this patch series landed. Namely, in the case where we have no
> BTMP chunks or when `pack.allowPackReuse=single` is set, we use the
> preferred pack instead of consulting the BTMP chunks.
> 
> Helped-by: Taylor Blau <me@ttaylorr.com>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>

Junio, it would be great if we could still land this fix in Git v2.45
given that it is addressing a regression in Git v2.44. This of course
assumes that the current version of this patch looks good to Taylor.

Patrick


* Re: [PATCH v2] pack-bitmap: gracefully handle missing BTMP chunks
  2024-04-15  8:51   ` Patrick Steinhardt
@ 2024-04-15 17:41     ` Junio C Hamano
  2024-04-15 22:51       ` Taylor Blau
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2024-04-15 17:41 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Taylor Blau

Patrick Steinhardt <ps@pks.im> writes:

>> Helped-by: Taylor Blau <me@ttaylorr.com>
>> Signed-off-by: Patrick Steinhardt <ps@pks.im>
>
> Junio, it would be great if we could still land this fix in Git v2.45
> given that it is addressing a regression in Git v2.44. This of course
> assumes that the current version of this patch looks good to Taylor.

Indeed.  It would be nice to see an acked by or something.

Will queue, in the meantime.  Thanks for a ping.


* Re: [PATCH] pack-bitmap: gracefully handle missing BTMP chunks
  2024-04-15  6:34   ` Patrick Steinhardt
@ 2024-04-15 22:42     ` Taylor Blau
  0 siblings, 0 replies; 13+ messages in thread
From: Taylor Blau @ 2024-04-15 22:42 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

On Mon, Apr 15, 2024 at 08:34:33AM +0200, Patrick Steinhardt wrote:
> > Of course, that is still using a GIT_TEST_-variable, which is less than
> > ideal IMHO. The other alternative would be to store a MIDX file as a
> > test fixture in the tree (which we do in a couple of places). But with
> > the recent xz shenanigans, I'm not sure that's a great idea either ;-).
>
> I'm happy to convert this to use `GIT_TEST_MIDX_READ_BTMP` instead.

Thanks, I think that is definitely an improvement.

> > The way I would structure this series is to first apply the portion of
> > the above patch *without* these lines:
> >
> > -	if (bitmap_is_midx(bitmap_git)) {
> > +	if (!bitmap_is_midx(bitmap_git) ||
> > +	    !bitmap_git->midx->chunk_bitmapped_packs)
> > +		multi_pack_reuse = 0;
> > +
> >
> > , so we're still able to reproduce the issue. Then, apply the remaining
> > portions (the above diff, the test, and the GIT_TEST_MIDX_WRITE_BTMP
> > stuff) to demonstrate that the issue is fixed via a separate commit.
> >
> > I'm happy to write that up, and equally happy to not do it ;-). Sorry
> > for the lengthy review, but thank you very much for spotting and fixing
> > this issue.
>
> I'd prefer to leave it as a single patch. Junio has expressed at times
> that he doesn't see much value in these splits only to demonstrate a
> broken test. An interested reader can easily revert the fix and see that
> the test would fail.

Yeah, to be clear, I wasn't suggesting adding a test_expect_failure
here. I was suggesting instead that you do a refactoring that doesn't
change the behavior in the first step, and then add the test plus the
fix (after it is made easier to land by the first step) in the second
step.

But I don't feel strongly about it whatsoever, so I think that the new
version you have here is fine. I'll take a closer look at it right
now...

Thanks,
Taylor


* Re: [PATCH v2] pack-bitmap: gracefully handle missing BTMP chunks
  2024-04-15  6:41 ` [PATCH v2] " Patrick Steinhardt
  2024-04-15  8:51   ` Patrick Steinhardt
@ 2024-04-15 22:51   ` Taylor Blau
  2024-04-16  4:47     ` Patrick Steinhardt
  1 sibling, 1 reply; 13+ messages in thread
From: Taylor Blau @ 2024-04-15 22:51 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

On Mon, Apr 15, 2024 at 08:41:25AM +0200, Patrick Steinhardt wrote:
> diff --git a/midx.c b/midx.c
> index ae3b49166c..6f07de3688 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -170,9 +170,10 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
>
>  	pair_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS, &m->chunk_large_offsets,
>  		   &m->chunk_large_offsets_len);
> -	pair_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
> -		   (const unsigned char **)&m->chunk_bitmapped_packs,
> -		   &m->chunk_bitmapped_packs_len);
> +	if (git_env_bool("GIT_TEST_MIDX_READ_BTMP", 1))
> +		pair_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
> +			   (const unsigned char **)&m->chunk_bitmapped_packs,
> +			   &m->chunk_bitmapped_packs_len);

OK, so we're switching to a new GIT_TEST_-variable here, which controls
whether or not we read the BTMP chunk. That makes sense, and is much
appreciated :-).
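
As a rough illustration of how such a boolean toggle with a default behaves, here is a minimal shell sketch. The `env_bool` helper is hypothetical and only approximates the idea of a "default to on unless explicitly disabled" switch; it is not Git's actual parser, and rejecting unknown values with an error is a choice made for this sketch.

```shell
#!/bin/sh
# Hypothetical helper (not part of Git): prints 1 or 0 for the named
# environment variable, falling back to the default when it is unset.
env_bool () {
	name=$1 default=$2
	# Indirect lookup of the variable named by $name; "+set" detects unset.
	eval "is_set=\${$name+set}; value=\${$name-}"
	if test -z "$is_set"
	then
		echo "$default"
		return 0
	fi
	case "$(printf '%s' "$value" | tr '[:upper:]' '[:lower:]')" in
	true|yes|on|1) echo 1 ;;
	false|no|off|0|'') echo 0 ;;
	*)
		# Assumption of this sketch: unknown strings are an error.
		echo "env_bool: bad boolean value for $name" >&2
		return 1
		;;
	esac
}

# Unset: the default of 1 applies, so the chunk would be read.
unset GIT_TEST_MIDX_READ_BTMP
default_result=$(env_bool GIT_TEST_MIDX_READ_BTMP 1)

# Explicitly disabled: the chunk would be skipped, as in the new test.
GIT_TEST_MIDX_READ_BTMP=false
disabled_result=$(env_bool GIT_TEST_MIDX_READ_BTMP 1)

echo "unset: $default_result, false: $disabled_result"
```

With the default of 1, production behavior is unchanged unless the variable is set, which is what lets the test simulate an older MIDX without writing one.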

> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 2baeabacee..35c5ef9d3c 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -2049,7 +2049,10 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
>
>  	load_reverse_index(r, bitmap_git);
>
> -	if (bitmap_is_midx(bitmap_git)) {
> +	if (!bitmap_is_midx(bitmap_git) || !bitmap_git->midx->chunk_bitmapped_packs)
> +		multi_pack_reuse = 0;
> +

Either we don't have a MIDX, or we do, but it doesn't have a BTMP chunk.
In either case, we should disable multi-pack reuse (either using the
single pack corresponding with a classic pack-bitmap, or the preferred
pack if using a MIDX bitmap written prior to the BTMP chunk).

Looking good.
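
The decision described above can be modeled in a few lines of shell. This is only a sketch of the control flow in the patch: the `pick_reuse_source` function, its 1/0 flag arguments, and the printed labels are all hypothetical, and the mapping from `pack.allowPackReuse=multi` to the `multi_pack_reuse` flag is an assumption about the caller.

```shell
#!/bin/sh
# Hypothetical model of the reuse decision. Arguments: is_midx (1/0),
# has_btmp (1/0), and the configured value ("single" or "multi").
pick_reuse_source () {
	is_midx=$1 has_btmp=$2 allow=$3
	multi_pack_reuse=0
	test "$allow" = multi && multi_pack_reuse=1

	# No MIDX, or a MIDX without a BTMP chunk: multi-pack reuse is off.
	if test "$is_midx" = 0 || test "$has_btmp" = 0
	then
		multi_pack_reuse=0
	fi

	if test "$multi_pack_reuse" = 1
	then
		echo "all-bitmapped-packs"
	elif test "$is_midx" = 1
	then
		echo "midx-preferred-pack"
	else
		echo "single-pack-bitmap"
	fi
}

old_midx=$(pick_reuse_source 1 0 multi)   # MIDX written before BTMP existed
new_midx=$(pick_reuse_source 1 1 multi)   # MIDX with a BTMP chunk
no_midx=$(pick_reuse_source 0 0 single)   # classic single-pack bitmap

echo "old MIDX: $old_midx"
echo "new MIDX: $new_midx"
echo "no MIDX:  $no_midx"
```

The point of the model is that a missing BTMP chunk downgrades to the preferred pack rather than producing an error.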

> +	if (multi_pack_reuse) {
>  		for (i = 0; i < bitmap_git->midx->num_packs; i++) {
>  			struct bitmapped_pack pack;
>  			if (nth_bitmapped_pack(r, bitmap_git->midx, &pack, i) < 0) {
> @@ -2062,34 +2065,32 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
>  			if (!pack.bitmap_nr)
>  				continue;
>
> -			if (!multi_pack_reuse && pack.bitmap_pos) {
> -				/*
> -				 * If we're only reusing a single pack, skip
> -				 * over any packs which are not positioned at
> -				 * the beginning of the MIDX bitmap.
> -				 *
> -				 * This is consistent with the existing
> -				 * single-pack reuse behavior, which only reuses
> -				 * parts of the MIDX's preferred pack.
> -				 */
> -				continue;
> -			}

Yep, this hunk can go since it used to belong to the outer if-statement
in the pre-image that was conditioned on 'bitmap_is_midx()'. This is
dealt with separately, since we know ahead of time we're doing
multi-pack reuse (and can do so).
> -
>  			ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
>  			memcpy(&packs[packs_nr++], &pack, sizeof(pack));
>
>  			objects_nr += pack.p->num_objects;
> -
> -			if (!multi_pack_reuse)
> -				break;
>  		}
>
>  		QSORT(packs, packs_nr, bitmapped_pack_cmp);
>  	} else {
> +		struct packed_git *pack;
> +
> +		if (bitmap_is_midx(bitmap_git)) {
> +			uint32_t preferred_pack_pos;
> +
> +			if (midx_preferred_pack(bitmap_git->midx, &preferred_pack_pos) < 0) {
> +				warning(_("unable to compute preferred pack, disabling pack-reuse"));
> +				return;
> +			}
> +
> +			pack = bitmap_git->midx->packs[preferred_pack_pos];
> +		} else {
> +			pack = bitmap_git->pack;
> +		}
> +

Looking good. Here we're doing single-pack reuse (either from the pack
corresponding with the bitmap or the MIDX's preferred pack). Either way
we set the 'pack' variable to point at the appropriate pack, and then
add that pack to the list of reusable packs below. Good.

>  		ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
> -
> -		packs[packs_nr].p = bitmap_git->pack;
> -		packs[packs_nr].bitmap_nr = bitmap_git->pack->num_objects;
> +		packs[packs_nr].p = pack;
> +		packs[packs_nr].bitmap_nr = pack->num_objects;
>  		packs[packs_nr].bitmap_pos = 0;
>
>  		objects_nr = packs[packs_nr++].bitmap_nr;

Makes sense.

> diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
> index 70d1b58709..5d7d321840 100755
> --- a/t/t5326-multi-pack-bitmaps.sh
> +++ b/t/t5326-multi-pack-bitmaps.sh
> @@ -513,4 +513,21 @@ test_expect_success 'corrupt MIDX with bitmap causes fallback' '
>  	)
>  '
>
> +for allow_pack_reuse in single multi
> +do
> +	test_expect_success "reading MIDX without BTMP chunk does not complain with $allow_pack_reuse pack reuse" '
> +		test_when_finished "rm -rf midx-without-btmp" &&
> +		git init midx-without-btmp &&
> +		(
> +			cd midx-without-btmp &&
> +			test_commit initial &&
> +
> +			git repack -Adbl --write-bitmap-index --write-midx &&

`-b` is redundant with `--write-bitmap-index`.

> +			GIT_TEST_MIDX_READ_BTMP=false git -c pack.allowPackReuse=$allow_pack_reuse \
> +				pack-objects --all --use-bitmap-index --stdout </dev/null >/dev/null 2>err &&

A small note here, but setting stdin to read from /dev/null is
unnecessary with `--all`.

> +			test_must_be_empty err
> +		)
> +	'
> +done
> +

This test looks like it's exercising the right thing, but I'm not sure
why it was split into two separate tests. Perhaps to allow the two to
fail separately?

Either way, the repository initialization, test_commit, and repacking
could probably be combined into a single step to avoid re-running them
for different values of $allow_pack_reuse.

I would probably have written:

    git init midx-without-btmp &&
    (
        cd midx-without-btmp &&

        test_commit base &&
        git repack -adb --write-midx &&

        for c in single multi
        do
            GIT_TEST_MIDX_READ_BTMP=false git -c pack.allowPackReuse=$c pack-objects \
              --all --use-bitmap-index --stdout >/dev/null 2>err &&
            test_must_be_empty err || return 1
        done
    )

TBH, I would like to see this test cleaned up before merging this one
down. But otherwise this patch is looking good.

Thanks,
Taylor


* Re: [PATCH v2] pack-bitmap: gracefully handle missing BTMP chunks
  2024-04-15 17:41     ` Junio C Hamano
@ 2024-04-15 22:51       ` Taylor Blau
  2024-04-15 23:46         ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Taylor Blau @ 2024-04-15 22:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Patrick Steinhardt, git

On Mon, Apr 15, 2024 at 10:41:09AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> >> Helped-by: Taylor Blau <me@ttaylorr.com>
> >> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> >
> > Junio, it would be great if we could still land this fix in Git v2.45
> > given that it is addressing a regression in Git v2.44. This of course
> > assumes that the current version of this patch looks good to Taylor.
>
> Indeed.  It would be nice to see an acked by or something.
>
> Will queue, in the meantime.  Thanks for a ping.

I took a look, and I think the patch is good. I have a couple of notes
on the test that I would prefer to see addressed before merging it down,
though.

Thanks,
Taylor


* Re: [PATCH v2] pack-bitmap: gracefully handle missing BTMP chunks
  2024-04-15 22:51       ` Taylor Blau
@ 2024-04-15 23:46         ` Junio C Hamano
  0 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2024-04-15 23:46 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Patrick Steinhardt, git

Taylor Blau <me@ttaylorr.com> writes:

> On Mon, Apr 15, 2024 at 10:41:09AM -0700, Junio C Hamano wrote:
>> Patrick Steinhardt <ps@pks.im> writes:
>>
>> >> Helped-by: Taylor Blau <me@ttaylorr.com>
>> >> Signed-off-by: Patrick Steinhardt <ps@pks.im>
>> >
>> > Junio, it would be great if we could still land this fix in Git v2.45
>> > given that it is addressing a regression in Git v2.44. This of course
>> > assumes that the current version of this patch looks good to Taylor.
>>
>> Indeed.  It would be nice to see an acked by or something.
>>
>> Will queue, in the meantime.  Thanks for a ping.
>
> I took a look, and I think the patch is good. I have a couple of notes
> on the test that I would prefer to see addressed before merging it down,
> though.

Thanks.


* Re: [PATCH v2] pack-bitmap: gracefully handle missing BTMP chunks
  2024-04-15 22:51   ` Taylor Blau
@ 2024-04-16  4:47     ` Patrick Steinhardt
  2024-04-16  5:12       ` Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: Patrick Steinhardt @ 2024-04-16  4:47 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git

On Mon, Apr 15, 2024 at 06:51:16PM -0400, Taylor Blau wrote:
> On Mon, Apr 15, 2024 at 08:41:25AM +0200, Patrick Steinhardt wrote:
[snip]
> > diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
> > index 70d1b58709..5d7d321840 100755
> > --- a/t/t5326-multi-pack-bitmaps.sh
> > +++ b/t/t5326-multi-pack-bitmaps.sh
> > @@ -513,4 +513,21 @@ test_expect_success 'corrupt MIDX with bitmap causes fallback' '
> >  	)
> >  '
> >
> > +for allow_pack_reuse in single multi
> > +do
> > +	test_expect_success "reading MIDX without BTMP chunk does not complain with $allow_pack_reuse pack reuse" '
> > +		test_when_finished "rm -rf midx-without-btmp" &&
> > +		git init midx-without-btmp &&
> > +		(
> > +			cd midx-without-btmp &&
> > +			test_commit initial &&
> > +
> > +			git repack -Adbl --write-bitmap-index --write-midx &&
> 
> `-b` is redundant with `--write-bitmap-index`.

Oops, right.

> > +			GIT_TEST_MIDX_READ_BTMP=false git -c pack.allowPackReuse=$allow_pack_reuse \
> > +				pack-objects --all --use-bitmap-index --stdout </dev/null >/dev/null 2>err &&
> 
> A small note here, but setting stdin to read from /dev/null is
> unnecessary with `--all`.

Is it really? Executing `git pack-objects --all --stdout` on my system
blocks until stdin is closed. It _seems_ to work in the tests alright,
but doesn't work outside of them. Which is puzzling on its own.

> > +			test_must_be_empty err
> > +		)
> > +	'
> > +done
> > +
> 
> This test looks like it's exercising the right thing, but I'm not sure
> why it was split into two separate tests. Perhaps to allow the two to
> fail separately?

Exactly. It makes it easier to see which of the two tests fails in case
only one does.

> Either way, the repository initialization, test_commit, and repacking
> could probably be combined into a single step to avoid re-running them
> for different values of $allow_pack_reuse.
> 
> I would probably have written:
> 
>     git init midx-without-btmp &&
>     (
>         cd midx-without-btmp &&
> 
>         test_commit base &&
>         git repack -adb --write-midx &&
> 
>         for c in single multi
>         do
>             GIT_TEST_MIDX_READ_BTMP=false git -c pack.allowPackReuse=$c pack-objects \
>               --all --use-bitmap-index --stdout >/dev/null 2>err &&
>             test_must_be_empty err || return 1
>         done
>     )
> 
> TBH, I would like to see this test cleaned up before merging this one
> down. But otherwise this patch is looking good.

So I'm a bit torn here. I think your proposed way to test things is
inferior regarding usability, even though it is superior regarding
performance. We could move the common setup into a separate test, but
that has the issue that tests cannot easily be run as self-contained
units.

Patrick


* Re: [PATCH v2] pack-bitmap: gracefully handle missing BTMP chunks
  2024-04-16  4:47     ` Patrick Steinhardt
@ 2024-04-16  5:12       ` Jeff King
  2024-04-16  5:14         ` Patrick Steinhardt
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2024-04-16  5:12 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Taylor Blau, git

On Tue, Apr 16, 2024 at 06:47:51AM +0200, Patrick Steinhardt wrote:

> > > +			GIT_TEST_MIDX_READ_BTMP=false git -c pack.allowPackReuse=$allow_pack_reuse \
> > > +				pack-objects --all --use-bitmap-index --stdout </dev/null >/dev/null 2>err &&
> > 
> > A small note here, but setting stdin to read from /dev/null is
> > unnecessary with `--all`.
> 
> Is it really? Executing `git pack-objects --all --stdout` on my system
> blocks until stdin is closed. It _seems_ to work in the tests alright,
> but doesn't work outside of them. Which is puzzling on its own.

Inside a test_expect block, stdin is already redirected from /dev/null.
See 781f76b158 (test-lib: redirect stdin of tests, 2011-12-15).

I do think it's still good practice to redirect from /dev/null
explicitly to indicate the intent.
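
The effect can be reproduced outside the test suite with a small sketch. The `count_stdin_bytes` helper is hypothetical and merely stands in for any command that reads stdin until EOF, which is what makes a command appear to hang when stdin is a terminal.

```shell
#!/bin/sh
# Hypothetical helper: consumes stdin until EOF and prints the byte count,
# standing in for a command that blocks while stdin stays open.
count_stdin_bytes () {
	wc -c | tr -d ' '
}

# With stdin redirected from /dev/null the read hits EOF immediately,
# so the command returns instead of waiting for input.
bytes=$(count_stdin_bytes </dev/null)
echo "read $bytes bytes from /dev/null"
```

Writing the redirect explicitly keeps the command's behavior the same whether it is run inside or outside the test harness.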

-Peff


* Re: [PATCH v2] pack-bitmap: gracefully handle missing BTMP chunks
  2024-04-16  5:12       ` Jeff King
@ 2024-04-16  5:14         ` Patrick Steinhardt
  0 siblings, 0 replies; 13+ messages in thread
From: Patrick Steinhardt @ 2024-04-16  5:14 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git

On Tue, Apr 16, 2024 at 01:12:32AM -0400, Jeff King wrote:
> On Tue, Apr 16, 2024 at 06:47:51AM +0200, Patrick Steinhardt wrote:
> 
> > > > +			GIT_TEST_MIDX_READ_BTMP=false git -c pack.allowPackReuse=$allow_pack_reuse \
> > > > +				pack-objects --all --use-bitmap-index --stdout </dev/null >/dev/null 2>err &&
> > > 
> > > A small note here, but setting stdin to read from /dev/null is
> > > unnecessary with `--all`.
> > 
> > Is it really? Executing `git pack-objects --all --stdout` on my system
> > blocks until stdin is closed. It _seems_ to work in the tests alright,
> > but doesn't work outside of them. Which is puzzling on its own.
> 
> Inside a test_expect block, stdin is already redirected from /dev/null.
> See 781f76b158 (test-lib: redirect stdin of tests, 2011-12-15).
> 
> I do think it's still good practice to redirect from /dev/null
> explicitly to indicate the intent.

Ah, that explains. Thanks!

Patrick


end of thread, other threads:[~2024-04-16  5:14 UTC | newest]

Thread overview: 13+ messages
2024-04-09  5:59 [PATCH] pack-bitmap: gracefully handle missing BTMP chunks Patrick Steinhardt
2024-04-10 15:02 ` Taylor Blau
2024-04-15  6:34   ` Patrick Steinhardt
2024-04-15 22:42     ` Taylor Blau
2024-04-15  6:41 ` [PATCH v2] " Patrick Steinhardt
2024-04-15  8:51   ` Patrick Steinhardt
2024-04-15 17:41     ` Junio C Hamano
2024-04-15 22:51       ` Taylor Blau
2024-04-15 23:46         ` Junio C Hamano
2024-04-15 22:51   ` Taylor Blau
2024-04-16  4:47     ` Patrick Steinhardt
2024-04-16  5:12       ` Jeff King
2024-04-16  5:14         ` Patrick Steinhardt
