All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] repack: don't remove .keep packs with `--pack-kept-objects`
@ 2022-10-18  2:26 Taylor Blau
  2022-10-18  6:54 ` Jeff King
  0 siblings, 1 reply; 3+ messages in thread
From: Taylor Blau @ 2022-10-18  2:26 UTC (permalink / raw)
  To: git; +Cc: vdye, derrickstolee, peff

`git repack` supports a `--pack-kept-objects` flag which more or less
translates to whether or not we pass `--honor-pack-keep` down to `git
pack-objects` when assembling a new pack.

This behavior has existed since ee34a2bead (repack: add
`repack.packKeptObjects` config var, 2014-03-03). In that commit, the
documentation was extended to say:

    [...] Note that we still do not delete `.keep` packs after
    `pack-objects` finishes.

Unfortunately, this is not the case when `--pack-kept-objects` is
combined with a `--geometric` repack. When doing a geometric repack, we
include `.keep` packs when enumerating available packs only when
`pack_kept_objects` is set.

So this all works fine when `--no-pack-kept-objects` (or similar) is
given. Kept packs are excluded from the geometric roll-up, so when we go
to delete redundant packs (with `-d`), no `.keep` packs appear "below
the split" in our geometric progression.

But when `--pack-kept-objects` is given, things can go awry. Namely,
when a kept pack is included in the list of packs tracked by the
`pack_geometry` struct *and* part of the pack roll-up, we will delete
the `.keep` pack when we shouldn't.

Note that this *doesn't* result in object corruption, since the `.keep`
pack's objects are still present in the new pack. But the `.keep` pack
itself is removed, which violates our promise from back in ee34a2bead.

But there's more. Because `repack` computes the geometric roll-up
independently from selecting which packs belong in a MIDX (with
`--write-midx`), this can lead to odd behavior. Consider when a `.keep`
pack appears below the geometric split (ie., its objects will be part of
the new pack we generate).

We'll write a MIDX containing the new pack along with the existing
`.keep` pack. But because the `.keep` pack appears below the geometric
split line, we'll (incorrectly) try to remove it. While this doesn't
corrupt the repository, it does cause us to remove the MIDX we just
wrote, since removing that pack would invalidate the new MIDX.

Funny enough, this behavior became far less noticeable after e4d0c11c04
(repack: respect kept objects with '--write-midx -b', 2021-12-20), which
made `pack_kept_objects` be enabled by default only when we were writing
a non-MIDX bitmap.

But e4d0c11c04 didn't resolve this bug, it just made it harder to notice
unless callers explicitly passed `--pack-kept-objects`.

The solution is to avoid trying to remove `.keep` packs during
`--geometric` repacks, even when they appear below the geometric split
line, which is the approach this patch implements.

Co-authored-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
We at GitHub had a slow stream of repacks that removed the just-written
MIDX.

This had mostly been a mystery that didn't occur at high enough volume
to justify looking into. But it went away entirely after merging in
v2.36.x, which contains e4d0c11c04.

Some investigating with Victoria today led to the patch above, which is
still relevant since e4d0c11c04 papers over an existing bug.

 builtin/repack.c            |  5 +++++
 t/t7700-repack.sh           | 24 ++++++++++++++++++++++++
 t/t7703-repack-geometric.sh |  6 +++++-
 3 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index a5bacc7797..f71909696d 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -1089,6 +1089,11 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 				strbuf_addstr(&buf, pack_basename(p));
 				strbuf_strip_suffix(&buf, ".pack");

+				if ((p->pack_keep) ||
+				    (string_list_has_string(&existing_kept_packs,
+							    buf.buf)))
+					continue;
+
 				remove_redundant_pack(packdir, buf.buf);
 			}
 			strbuf_release(&buf);
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index ca45c4cd2c..df8e94d7a8 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -426,6 +426,30 @@ test_expect_success '--write-midx -b packs non-kept objects' '
 	)
 '

+test_expect_success '--write-midx with --pack-kept-objects' '
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit one &&
+		test_commit two &&
+
+		one="$(echo "one" | git pack-objects --revs $objdir/pack/pack)" &&
+		two="$(echo "one..two" | git pack-objects --revs $objdir/pack/pack)" &&
+
+		keep="$objdir/pack/pack-$one.keep" &&
+		touch "$keep" &&
+
+		git repack --write-midx --write-bitmap-index --geometric=2 -d \
+			--pack-kept-objects &&
+
+		test_path_is_file $keep &&
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap
+	)
+'
+
 test_expect_success TTY '--quiet disables progress' '
 	test_terminal env GIT_PROGRESS_DELAY=0 \
 		git -C midx repack -ad --quiet --write-midx 2>stderr &&
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index da87f8b2d8..8821fbd2dd 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -176,8 +176,12 @@ test_expect_success '--geometric ignores kept packs' '
 		# be repacked, too.
 		git repack --geometric 2 -d --pack-kept-objects &&

+		# After repacking, two packs remain: one new one (containing the
+		# objects in both the .keep and non-kept pack), and the .keep
+		# pack (since `--pack-kept-objects -d` does not actually delete
+		# the kept pack).
 		find $objdir/pack -name "*.pack" >after &&
-		test_line_count = 1 after
+		test_line_count = 2 after
 	)
 '

--
2.38.0.16.g393fd4c6db

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] repack: don't remove .keep packs with `--pack-kept-objects`
  2022-10-18  2:26 [PATCH] repack: don't remove .keep packs with `--pack-kept-objects` Taylor Blau
@ 2022-10-18  6:54 ` Jeff King
  2022-10-21  3:14   ` Taylor Blau
  0 siblings, 1 reply; 3+ messages in thread
From: Jeff King @ 2022-10-18  6:54 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, vdye, derrickstolee

On Mon, Oct 17, 2022 at 10:26:06PM -0400, Taylor Blau wrote:

> But when `--pack-kept-objects` is given, things can go awry. Namely,
> when a kept pack is included in the list of packs tracked by the
> `pack_geometry` struct *and* part of the pack roll-up, we will delete
> the `.keep` pack when we shouldn't.

Oops. Your explanation makes sense, and this is an easy case to miss.

After reading the message but before seeing the patch, I'd guess we just
need to add "if (!this_pack_is_kept)" to the removal loop.

And indeed, it's close to that:

> diff --git a/builtin/repack.c b/builtin/repack.c
> index a5bacc7797..f71909696d 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -1089,6 +1089,11 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  				strbuf_addstr(&buf, pack_basename(p));
>  				strbuf_strip_suffix(&buf, ".pack");
> 
> +				if ((p->pack_keep) ||
> +				    (string_list_has_string(&existing_kept_packs,
> +							    buf.buf)))
> +					continue;
> +
>  				remove_redundant_pack(packdir, buf.buf);
>  			}
>  			strbuf_release(&buf);

but what is the difference between p->pack_keep, and being in the
existing_kept_packs list? From the similar logic in
init_pack_geometry():

         /*
          * Any pack that has its pack_keep bit set will appear
          * in existing_kept_packs below, but this saves us from
          * doing a more expensive check.
          */
         if (p->pack_keep)
                 continue;

         /*
          * The pack may be kept via the --keep-pack option;
          * check 'existing_kept_packs' to determine whether to
          * ignore it.
          */
         strbuf_reset(&buf);
         strbuf_addstr(&buf, pack_basename(p));
         strbuf_strip_suffix(&buf, ".pack");

         if (string_list_has_string(existing_kept_packs, buf.buf))
                  continue;

it looks like one is a superset of the other, but checking p->pack_keep
is just slightly cheaper, so we do it first. And checking just
p->pack_keep isn't sufficient; the list has extra packs from the
--keep-pack command-line. So your new code checking both is correct.

But one thing to note. In that existing logic, the two checks are split
apart, since in the fast path (i.e., when p->pack_keep is set) we can
skip the extra work to form the pack string. We could do that here, but
I doubt it matters much:

  1. It's really not that expensive. It's a little extra string
     manipulation and a binary-search lookup for each pack.

  2. Normally most packs aren't kept, so we'd end up in the "slow" path
     most of the time anyway.

So I suspect that existing code is premature optimization, and you
really could just look in existing_kept_packs (both there and in this
patch).

But I don't mind this patch copying the existing logic in the meantime.
It's exactly this kind of "while we are in the area" cleanup where you
end up surprised that the p->pack_keep check really was covering some
subtle corner case. ;)

So the patch looks good to me as-is (and sorry for the verbose review;
we've just had enough tricky corner cases in this repack code that I
wanted to make sure I understood all the implications).

> This had mostly been a mystery that didn't occur at high enough volume
> to justify looking into. But it went away entirely after merging in
> v2.36.x, which contains e4d0c11c04.
> 
> Some investigating with Victoria today led to the patch above, which is
> still relevant since e4d0c11c04 papers over an existing bug.

Thank you both for digging into this. Because of the scale, GitHub has a
unique opportunity to trigger these kind of weird corner cases and races
that would otherwise go unreported and just cause occasional
head-scratching among people running sane-sized Git servers.

-Peff

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] repack: don't remove .keep packs with `--pack-kept-objects`
  2022-10-18  6:54 ` Jeff King
@ 2022-10-21  3:14   ` Taylor Blau
  0 siblings, 0 replies; 3+ messages in thread
From: Taylor Blau @ 2022-10-21  3:14 UTC (permalink / raw)
  To: Jeff King; +Cc: git, vdye, derrickstolee

On Tue, Oct 18, 2022 at 02:54:56AM -0400, Jeff King wrote:
> So the patch looks good to me as-is (and sorry for the verbose review;
> we've just had enough tricky corner cases in this repack code that I
> wanted to make sure I understood all the implications).

Yeah, I agree that this is probably premature optimization, but in
practice it doesn't hurt much. Copying it was definitely intentional,
and I share your general unease about changing it in this patch.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-10-21  3:14 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-18  2:26 [PATCH] repack: don't remove .keep packs with `--pack-kept-objects` Taylor Blau
2022-10-18  6:54 ` Jeff King
2022-10-21  3:14   ` Taylor Blau

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.