All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] builtin/repack.c: avoid making cruft packs preferred
@ 2023-10-03 16:27 Taylor Blau
  2023-10-03 16:30 ` Taylor Blau
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Taylor Blau @ 2023-10-03 16:27 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano, Patrick Steinhardt

When doing a `--geometric` repack, we make sure that the preferred pack
(if writing a MIDX) is the largest pack that we *didn't* repack. That
has the effect of keeping the preferred pack in sync with the pack
containing a majority of the repository's reachable objects.

But if the repository happens to double in size, we'll repack
everything. Here we don't specify any `--preferred-pack`, and instead
let the MIDX code choose.

In the past, that worked fine, since there would only be one pack to
choose from: the one we just wrote. But it's no longer necessarily the
case that there is one pack to choose from. It's possible that the
repository also has a cruft pack, too.

If the cruft pack happens to come earlier in lexical order (and has an
earlier mtime than any non-cruft pack), we'll pick that pack as
preferred. This makes it impossible to reuse chunks of the reachable
pack verbatim from pack-objects, so is sub-optimal.

Luckily, this is a somewhat rare circumstance to be in, since we would
have to repack the entire repository during a `--geometric` repack, and
the cruft pack would have to sort ahead of the pack we just created.

Note that this behavior is usually just a performance regression. But
it's possible it could be a correctness issue.

Suppose an object was duplicated among the cruft and non-cruft pack. The
MIDX will pick the one from the pack with the lowest mtime, which will
always be the cruft one. But if the non-cruft pack happens to sort
earlier in lexical order, we'll treat that one as preferred, but not all
duplicates will be resolved in favor of that pack.

So if we happened to have an object which appears in both packs
(e.g., due to a cruft object being freshened, causing it to appear
loose, and then repacking it via the `--geometric` repack) it's possible
the duplicate would be picked from the non-preferred pack.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
I've had this sitting in my patch queue for a while now. It's a
non-critical performance fix that avoids the repack/MIDX machinery from
ever choosing a cruft pack as preferred when writing a MIDX bitmap
without a given --preferred-pack.

There is no correctness issue here, but choosing a pack with few/no
reachable objects means that our pack reuse mechanism will rarely kick
in, resulting in performance degradation.

 builtin/repack.c        | 47 ++++++++++++++++++++++++++++++++++++++++-
 t/t7704-repack-cruft.sh | 39 ++++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+), 1 deletion(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 04770b15fe..a1a893d952 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -355,6 +355,18 @@ static struct generated_pack_data *populate_pack_exts(const char *name)
 	return data;
 }

+static int has_pack_ext(const struct generated_pack_data *data,
+			const char *ext)
+{
+	int i;
+	for (i = 0; i < ARRAY_SIZE(exts); i++) {
+		if (strcmp(exts[i].name, ext))
+			continue;
+		return !!data->tempfiles[i];
+	}
+	BUG("unknown pack extension: '%s'", ext);
+}
+
 static void repack_promisor_objects(const struct pack_objects_args *args,
 				    struct string_list *names)
 {
@@ -772,6 +784,7 @@ static void midx_included_packs(struct string_list *include,

 static int write_midx_included_packs(struct string_list *include,
 				     struct pack_geometry *geometry,
+				     struct string_list *names,
 				     const char *refs_snapshot,
 				     int show_progress, int write_bitmaps)
 {
@@ -801,6 +814,38 @@ static int write_midx_included_packs(struct string_list *include,
 	if (preferred)
 		strvec_pushf(&cmd.args, "--preferred-pack=%s",
 			     pack_basename(preferred));
+	else if (names->nr) {
+		/* The largest pack was repacked, meaning that either
+		 * one or two packs exist depending on whether the
+		 * repository has a cruft pack or not.
+		 *
+		 * Select the non-cruft one as preferred to encourage
+		 * pack-reuse among packs containing reachable objects
+		 * over unreachable ones.
+		 *
+		 * (Note we could write multiple packs here if
+		 * `--max-pack-size` was given, but any one of them
+		 * will suffice, so pick the first one.)
+		 */
+		for_each_string_list_item(item, names) {
+			struct generated_pack_data *data = item->util;
+			if (has_pack_ext(data, ".mtimes"))
+				continue;
+
+			strvec_pushf(&cmd.args, "--preferred-pack=pack-%s.pack",
+				     item->string);
+			break;
+		}
+	} else {
+		/*
+		 * No packs were kept, and no packs were written. The
+		 * only thing remaining are .keep packs (unless
+		 * --pack-kept-objects was given).
+		 *
+		 * Set the `--preferred-pack` arbitrarily here.
+		 */
+		;
+	}

 	if (refs_snapshot)
 		strvec_pushf(&cmd.args, "--refs-snapshot=%s", refs_snapshot);
@@ -1360,7 +1405,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		struct string_list include = STRING_LIST_INIT_NODUP;
 		midx_included_packs(&include, &existing, &names, &geometry);

-		ret = write_midx_included_packs(&include, &geometry,
+		ret = write_midx_included_packs(&include, &geometry, &names,
 						refs_snapshot ? get_tempfile_path(refs_snapshot) : NULL,
 						show_progress, write_bitmaps > 0);

diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index dc86ca8269..be3735dff0 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -372,4 +372,43 @@ test_expect_success '--max-cruft-size ignores non-local packs' '
 	)
 '

+test_expect_success 'reachable packs are preferred over cruft ones' '
+	repo="cruft-preferred-packs" &&
+	git init "$repo" &&
+	(
+		cd "$repo" &&
+
+		# This test needs to exercise careful control over when a MIDX
+		# is and is not written. Unset the corresponding TEST variable
+		# accordingly.
+		sane_unset GIT_TEST_MULTI_PACK_INDEX &&
+
+		test_commit base &&
+		test_commit --no-tag cruft &&
+
+		non_cruft="$(echo base | git pack-objects --revs $packdir/pack)" &&
+		# Write a cruft pack which both (a) sorts ahead of the non-cruft
+		# pack in lexical order, and (b) has an older mtime to appease
+		# the MIDX preferred pack selection routine.
+		cruft="$(echo pack-$non_cruft.pack | git pack-objects --cruft $packdir/pack-A)" &&
+		test-tool chmtime -1000 $packdir/pack-A-$cruft.pack &&
+
+		test_commit other &&
+		git repack -d &&
+
+		git repack --geometric 2 -d --write-midx --write-bitmap-index &&
+
+		# After repacking, there are two packs left: one reachable one
+		# (which is the result of combining both of the existing two
+		# non-cruft packs), and one cruft pack.
+		find .git/objects/pack -type f -name "*.pack" >packs &&
+		test_line_count = 2 packs &&
+
+		# Make sure that the pack we just wrote is marked as preferred,
+		# not the cruft one.
+		pack="$(test-tool read-midx --preferred-pack $objdir)" &&
+		test_path_is_missing "$packdir/$(basename "$pack" ".idx").mtimes"
+	)
+'
+
 test_done
--
2.42.0.311.ga7f7a0e966.dirty

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] builtin/repack.c: avoid making cruft packs preferred
  2023-10-03 16:27 [PATCH] builtin/repack.c: avoid making cruft packs preferred Taylor Blau
@ 2023-10-03 16:30 ` Taylor Blau
  2023-10-03 20:20   ` Junio C Hamano
  2023-10-03 20:16 ` Jeff King
  2023-10-03 21:54 ` [PATCH v2] " Taylor Blau
  2 siblings, 1 reply; 10+ messages in thread
From: Taylor Blau @ 2023-10-03 16:30 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano, Patrick Steinhardt

On Tue, Oct 03, 2023 at 12:27:51PM -0400, Taylor Blau wrote:
> I've had this sitting in my patch queue for a while now. It's a
> non-critical performance fix that avoids the repack/MIDX machinery from
> ever choosing a cruft pack as preferred when writing a MIDX bitmap
> without a given --preferred-pack.
>
> There is no correctness issue here, but choosing a pack with few/no
> reachable objects means that our pack reuse mechanism will rarely kick
> in, resulting in performance degradation.
>
>  builtin/repack.c        | 47 ++++++++++++++++++++++++++++++++++++++++-
>  t/t7704-repack-cruft.sh | 39 ++++++++++++++++++++++++++++++++++
>  2 files changed, 85 insertions(+), 1 deletion(-)

Oops, I should have mentioned that this is meant to be applied on top of
'tb/multi-cruft-pack' to reduce the conflict resolution burden. Sorry
about that.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] builtin/repack.c: avoid making cruft packs preferred
  2023-10-03 16:27 [PATCH] builtin/repack.c: avoid making cruft packs preferred Taylor Blau
  2023-10-03 16:30 ` Taylor Blau
@ 2023-10-03 20:16 ` Jeff King
  2023-10-03 21:52   ` Taylor Blau
  2023-10-03 21:54 ` [PATCH v2] " Taylor Blau
  2 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2023-10-03 20:16 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Junio C Hamano, Patrick Steinhardt

On Tue, Oct 03, 2023 at 12:27:51PM -0400, Taylor Blau wrote:

> Note that this behavior is usually just a performance regression. But
> it's possible it could be a correctness issue.
> 
> Suppose an object was duplicated among the cruft and non-cruft pack. The
> MIDX will pick the one from the pack with the lowest mtime, which will
> always be the cruft one. But if the non-cruft pack happens to sort
> earlier in lexical order, we'll treat that one as preferred, but not all
> duplicates will be resolved in favor of that pack.
> 
> So if we happened to have an object which appears in both packs
> (e.g., due to a cruft object being freshened, causing it to appear
> loose, and then repacking it via the `--geometric` repack) it's possible
> the duplicate would be picked from the non-preferred pack.

I'm not sure I understand how that is a correctness issue. The contents
of the object are the same in either pack. Or do you mean that the
pack-reuse code in pack-objects.c may get confused and try to use the
wrong pack/offset when sending the object out? I would think it would
always be coming from the preferred pack for that code (so the outcome
is just that we fail to do the pack-reuse optimization very well, but we
don't generate a wrong answer).

Other than that, the explanation and patch make perfect sense to me, and
the patch looks good.

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] builtin/repack.c: avoid making cruft packs preferred
  2023-10-03 16:30 ` Taylor Blau
@ 2023-10-03 20:20   ` Junio C Hamano
  2023-10-03 21:39     ` Taylor Blau
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2023-10-03 20:20 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Patrick Steinhardt

Taylor Blau <me@ttaylorr.com> writes:

> On Tue, Oct 03, 2023 at 12:27:51PM -0400, Taylor Blau wrote:
>> I've had this sitting in my patch queue for a while now. It's a
>> non-critical performance fix that avoids the repack/MIDX machinery from
>> ever choosing a cruft pack as preferred when writing a MIDX bitmap
>> without a given --preferred-pack.
>>
>> There is no correctness issue here, but choosing a pack with few/no
>> reachable objects means that our pack reuse mechanism will rarely kick
>> in, resulting in performance degradation.
>>
>>  builtin/repack.c        | 47 ++++++++++++++++++++++++++++++++++++++++-
>>  t/t7704-repack-cruft.sh | 39 ++++++++++++++++++++++++++++++++++
>>  2 files changed, 85 insertions(+), 1 deletion(-)
>
> Oops, I should have mentioned that this is meant to be applied on top of
> 'tb/multi-cruft-pack' to reduce the conflict resolution burden. Sorry
> about that.

Sorry, but I do not follow.  tb/multi-cruft-pack was merged to
'master' at c0b5d46d (Documentation/gitformat-pack.txt: drop mixed
version section, 2023-08-28) but back then t7704 did not exist.  Do
you mean the other topic in-flight from you about max-cruft-size?


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] builtin/repack.c: avoid making cruft packs preferred
  2023-10-03 20:20   ` Junio C Hamano
@ 2023-10-03 21:39     ` Taylor Blau
  0 siblings, 0 replies; 10+ messages in thread
From: Taylor Blau @ 2023-10-03 21:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Patrick Steinhardt

On Tue, Oct 03, 2023 at 01:20:24PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > On Tue, Oct 03, 2023 at 12:27:51PM -0400, Taylor Blau wrote:
> >> I've had this sitting in my patch queue for a while now. It's a
> >> non-critical performance fix that avoids the repack/MIDX machinery from
> >> ever choosing a cruft pack as preferred when writing a MIDX bitmap
> >> without a given --preferred-pack.
> >>
> >> There is no correctness issue here, but choosing a pack with few/no
> >> reachable objects means that our pack reuse mechanism will rarely kick
> >> in, resulting in performance degradation.
> >>
> >>  builtin/repack.c        | 47 ++++++++++++++++++++++++++++++++++++++++-
> >>  t/t7704-repack-cruft.sh | 39 ++++++++++++++++++++++++++++++++++
> >>  2 files changed, 85 insertions(+), 1 deletion(-)
> >
> > Oops, I should have mentioned that this is meant to be applied on top of
> > 'tb/multi-cruft-pack' to reduce the conflict resolution burden. Sorry
> > about that.
>
> Sorry, but I do not follow.  tb/multi-cruft-pack was merged to
> 'master' at c0b5d46d (Documentation/gitformat-pack.txt: drop mixed
> version section, 2023-08-28) but back then t7704 did not exist.  Do
> you mean the other topic in-flight from you about max-cruft-size?

My mistake -- this should be applied on top of the patches at this
series:

    https://lore.kernel.org/git/cover.1696293862.git.me@ttaylorr.com/

which is the other topic that you're referring to. I ran "git
for-each-ref 'refs/remotes/origin/tb/*' | grep cruft" and mistakenly
grabbed 'tb/multi-cruft-pack' thinking that that was that series and not
the one already merged to 'master'.

But it looks like the series linked above hasn't yet grown a topic
branch in gitster/git yet, hence my mistake. Sorry about that!

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] builtin/repack.c: avoid making cruft packs preferred
  2023-10-03 20:16 ` Jeff King
@ 2023-10-03 21:52   ` Taylor Blau
  0 siblings, 0 replies; 10+ messages in thread
From: Taylor Blau @ 2023-10-03 21:52 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, Patrick Steinhardt

On Tue, Oct 03, 2023 at 04:16:17PM -0400, Jeff King wrote:
> On Tue, Oct 03, 2023 at 12:27:51PM -0400, Taylor Blau wrote:
>
> > Note that this behavior is usually just a performance regression. But
> > it's possible it could be a correctness issue.
> >
> > Suppose an object was duplicated among the cruft and non-cruft pack. The
> > MIDX will pick the one from the pack with the lowest mtime, which will
> > always be the cruft one. But if the non-cruft pack happens to sort
> > earlier in lexical order, we'll treat that one as preferred, but not all
> > duplicates will be resolved in favor of that pack.
> >
> > So if we happened to have an object which appears in both packs
> > (e.g., due to a cruft object being freshened, causing it to appear
> > loose, and then repacking it via the `--geometric` repack) it's possible
> > the duplicate would be picked from the non-preferred pack.
>
> I'm not sure I understand how that is a correctness issue. The contents
> of the object are the same in either pack. Or do you mean that the
> pack-reuse code in pack-objects.c may get confused and try to use the
> wrong pack/offset when sending the object out? I would think it would
> always be coming from the preferred pack for that code (so the outcome
> is just that we fail to do the pack-reuse optimization very well, but we
> don't generate a wrong answer).

Admittedly I'm not 100% sure of my interpretation here either, since I
wrote most of this patch's log message nearly two years ago ;-).

I think the issue was as follows:

  - you have an object duplicated among some cruft and non-cruft pack
  - the cruft pack happens to have an older mtime than the non-cruft one
  - the repack reused no non-cruft packs, which (before this patch)
    would let the MIDX avoid picking a preferred pack
  - if the non-cruft pack happens to sort ahead of the cruft one in
    lexical order, it'll appear in the first few bits of the bitmap's
    ordering, causing us to treat it as if it were the preferred pack
  - ...but the MIDX will resolve duplicate objects in favor of the
    oldest pack when neither are preferred, causing us to pick a
    duplicate from the non-"preferred" pack.

(The last point is what causes the pack-bitmap reading code to get
confused, since it expects and makes assumption based on resolving ties
in favor of the preferred pack.)

That feels right to me, but admittedly my memory is pretty hazy on those
bitmap bugs. However, I don't think that this is an issue anymore, since
this patch was written before we had 177c0d6e63 (midx: infer preferred
pack when not given one, 2021-08-31) in GitHub's private fork, and the
patch message here incorrectly refers to statements about GitHub's fork,
not git.git.

But since we have 177c0d6e63 in git.git, this paragraph can and should
be dropped as finding ourselves in that situation is impossible, since
we would infer a preferred pack when not given one, and resolve
duplicates in favor of it.

That makes this purely a performance issue.

> Other than that, the explanation and patch make perfect sense to me, and
> the patch looks good.

Thanks!

> -Peff

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2] builtin/repack.c: avoid making cruft packs preferred
  2023-10-03 16:27 [PATCH] builtin/repack.c: avoid making cruft packs preferred Taylor Blau
  2023-10-03 16:30 ` Taylor Blau
  2023-10-03 20:16 ` Jeff King
@ 2023-10-03 21:54 ` Taylor Blau
  2023-10-04 13:13   ` Jeff King
  2023-10-05 11:14   ` Patrick Steinhardt
  2 siblings, 2 replies; 10+ messages in thread
From: Taylor Blau @ 2023-10-03 21:54 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano, Patrick Steinhardt

When doing a `--geometric` repack, we make sure that the preferred pack
(if writing a MIDX) is the largest pack that we *didn't* repack. That
has the effect of keeping the preferred pack in sync with the pack
containing a majority of the repository's reachable objects.

But if the repository happens to double in size, we'll repack
everything. Here we don't specify any `--preferred-pack`, and instead
let the MIDX code choose.

In the past, that worked fine, since there would only be one pack to
choose from: the one we just wrote. But it's no longer necessarily the
case that there is one pack to choose from. It's possible that the
repository also has a cruft pack, too.

If the cruft pack happens to come earlier in lexical order (and has an
earlier mtime than any non-cruft pack), we'll pick that pack as
preferred. This makes it impossible to reuse chunks of the reachable
pack verbatim from pack-objects, so is sub-optimal.

Luckily, this is a somewhat rare circumstance to be in, since we would
have to repack the entire repository during a `--geometric` repack, and
the cruft pack would have to sort ahead of the pack we just created.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
A small re-roll of the original patch, removing a few paragraphs from
the proposed log message that are irrelevant and impossible to produce
in today's git.git.

For more details, see <ZRyNHRf+tQwV+T6P@nand.local>.

Range-diff against v1:
1:  19d9aae08e ! 1:  0353939351 builtin/repack.c: avoid making cruft packs preferred
    @@ Commit message
         have to repack the entire repository during a `--geometric` repack, and
         the cruft pack would have to sort ahead of the pack we just created.

    -    Note that this behavior is usually just a performance regression. But
    -    it's possible it could be a correctness issue.
    -
    -    Suppose an object was duplicated among the cruft and non-cruft pack. The
    -    MIDX will pick the one from the pack with the lowest mtime, which will
    -    always be the cruft one. But if the non-cruft pack happens to sort
    -    earlier in lexical order, we'll treat that one as preferred, but not all
    -    duplicates will be resolved in favor of that pack.
    -
    -    So if we happened to have an object which appears in both packs
    -    (e.g., due to a cruft object being freshened, causing it to appear
    -    loose, and then repacking it via the `--geometric` repack) it's possible
    -    the duplicate would be picked from the non-preferred pack.
    -
         Signed-off-by: Taylor Blau <me@ttaylorr.com>

      ## builtin/repack.c ##

 builtin/repack.c        | 47 ++++++++++++++++++++++++++++++++++++++++-
 t/t7704-repack-cruft.sh | 39 ++++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+), 1 deletion(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 04770b15fe..a1a893d952 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -355,6 +355,18 @@ static struct generated_pack_data *populate_pack_exts(const char *name)
 	return data;
 }

+static int has_pack_ext(const struct generated_pack_data *data,
+			const char *ext)
+{
+	int i;
+	for (i = 0; i < ARRAY_SIZE(exts); i++) {
+		if (strcmp(exts[i].name, ext))
+			continue;
+		return !!data->tempfiles[i];
+	}
+	BUG("unknown pack extension: '%s'", ext);
+}
+
 static void repack_promisor_objects(const struct pack_objects_args *args,
 				    struct string_list *names)
 {
@@ -772,6 +784,7 @@ static void midx_included_packs(struct string_list *include,

 static int write_midx_included_packs(struct string_list *include,
 				     struct pack_geometry *geometry,
+				     struct string_list *names,
 				     const char *refs_snapshot,
 				     int show_progress, int write_bitmaps)
 {
@@ -801,6 +814,38 @@ static int write_midx_included_packs(struct string_list *include,
 	if (preferred)
 		strvec_pushf(&cmd.args, "--preferred-pack=%s",
 			     pack_basename(preferred));
+	else if (names->nr) {
+		/* The largest pack was repacked, meaning that either
+		 * one or two packs exist depending on whether the
+		 * repository has a cruft pack or not.
+		 *
+		 * Select the non-cruft one as preferred to encourage
+		 * pack-reuse among packs containing reachable objects
+		 * over unreachable ones.
+		 *
+		 * (Note we could write multiple packs here if
+		 * `--max-pack-size` was given, but any one of them
+		 * will suffice, so pick the first one.)
+		 */
+		for_each_string_list_item(item, names) {
+			struct generated_pack_data *data = item->util;
+			if (has_pack_ext(data, ".mtimes"))
+				continue;
+
+			strvec_pushf(&cmd.args, "--preferred-pack=pack-%s.pack",
+				     item->string);
+			break;
+		}
+	} else {
+		/*
+		 * No packs were kept, and no packs were written. The
+		 * only thing remaining are .keep packs (unless
+		 * --pack-kept-objects was given).
+		 *
+		 * Set the `--preferred-pack` arbitrarily here.
+		 */
+		;
+	}

 	if (refs_snapshot)
 		strvec_pushf(&cmd.args, "--refs-snapshot=%s", refs_snapshot);
@@ -1360,7 +1405,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		struct string_list include = STRING_LIST_INIT_NODUP;
 		midx_included_packs(&include, &existing, &names, &geometry);

-		ret = write_midx_included_packs(&include, &geometry,
+		ret = write_midx_included_packs(&include, &geometry, &names,
 						refs_snapshot ? get_tempfile_path(refs_snapshot) : NULL,
 						show_progress, write_bitmaps > 0);

diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
index dc86ca8269..be3735dff0 100755
--- a/t/t7704-repack-cruft.sh
+++ b/t/t7704-repack-cruft.sh
@@ -372,4 +372,43 @@ test_expect_success '--max-cruft-size ignores non-local packs' '
 	)
 '

+test_expect_success 'reachable packs are preferred over cruft ones' '
+	repo="cruft-preferred-packs" &&
+	git init "$repo" &&
+	(
+		cd "$repo" &&
+
+		# This test needs to exercise careful control over when a MIDX
+		# is and is not written. Unset the corresponding TEST variable
+		# accordingly.
+		sane_unset GIT_TEST_MULTI_PACK_INDEX &&
+
+		test_commit base &&
+		test_commit --no-tag cruft &&
+
+		non_cruft="$(echo base | git pack-objects --revs $packdir/pack)" &&
+		# Write a cruft pack which both (a) sorts ahead of the non-cruft
+		# pack in lexical order, and (b) has an older mtime to appease
+		# the MIDX preferred pack selection routine.
+		cruft="$(echo pack-$non_cruft.pack | git pack-objects --cruft $packdir/pack-A)" &&
+		test-tool chmtime -1000 $packdir/pack-A-$cruft.pack &&
+
+		test_commit other &&
+		git repack -d &&
+
+		git repack --geometric 2 -d --write-midx --write-bitmap-index &&
+
+		# After repacking, there are two packs left: one reachable one
+		# (which is the result of combining both of the existing two
+		# non-cruft packs), and one cruft pack.
+		find .git/objects/pack -type f -name "*.pack" >packs &&
+		test_line_count = 2 packs &&
+
+		# Make sure that the pack we just wrote is marked as preferred,
+		# not the cruft one.
+		pack="$(test-tool read-midx --preferred-pack $objdir)" &&
+		test_path_is_missing "$packdir/$(basename "$pack" ".idx").mtimes"
+	)
+'
+
 test_done
--
2.42.0.341.g9d86323b73.dirty

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] builtin/repack.c: avoid making cruft packs preferred
  2023-10-03 21:54 ` [PATCH v2] " Taylor Blau
@ 2023-10-04 13:13   ` Jeff King
  2023-10-05 11:14   ` Patrick Steinhardt
  1 sibling, 0 replies; 10+ messages in thread
From: Jeff King @ 2023-10-04 13:13 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Junio C Hamano, Patrick Steinhardt

On Tue, Oct 03, 2023 at 05:54:19PM -0400, Taylor Blau wrote:

> A small re-roll of the original patch, removing a few paragraphs from
> the proposed log message that are irrelevant and impossible to produce
> in today's git.git.
> 
> For more details, see <ZRyNHRf+tQwV+T6P@nand.local>.

Thanks, this looks good to me, and the explanation in the linked message
makes sense.

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] builtin/repack.c: avoid making cruft packs preferred
  2023-10-03 21:54 ` [PATCH v2] " Taylor Blau
  2023-10-04 13:13   ` Jeff King
@ 2023-10-05 11:14   ` Patrick Steinhardt
  2023-10-05 20:27     ` Eric Sunshine
  1 sibling, 1 reply; 10+ messages in thread
From: Patrick Steinhardt @ 2023-10-05 11:14 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 1146 bytes --]

On Tue, Oct 03, 2023 at 05:54:19PM -0400, Taylor Blau wrote:
[snip]
> @@ -801,6 +814,38 @@ static int write_midx_included_packs(struct string_list *include,
>  	if (preferred)
>  		strvec_pushf(&cmd.args, "--preferred-pack=%s",
>  			     pack_basename(preferred));
> +	else if (names->nr) {
> +		/* The largest pack was repacked, meaning that either
> +		 * one or two packs exist depending on whether the
> +		 * repository has a cruft pack or not.

Nit: this comment will grow stale soonish once your patch series lands
that introduces a maximum packfile size for cruft packs as there can be
arbitrarily many cruft packs from thereon.

> +		 * Select the non-cruft one as preferred to encourage
> +		 * pack-reuse among packs containing reachable objects
> +		 * over unreachable ones.
> +		 *
> +		 * (Note we could write multiple packs here if
> +		 * `--max-pack-size` was given, but any one of them
> +		 * will suffice, so pick the first one.)
> +		 */

Well, okay, you kind of acknowledge this here.

The rest of this patch series looks good to me and makes sense. I don't
really think that this comment here is worth a reroll.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] builtin/repack.c: avoid making cruft packs preferred
  2023-10-05 11:14   ` Patrick Steinhardt
@ 2023-10-05 20:27     ` Eric Sunshine
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Sunshine @ 2023-10-05 20:27 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Taylor Blau, git, Jeff King, Junio C Hamano

On Thu, Oct 5, 2023 at 10:46 AM Patrick Steinhardt <ps@pks.im> wrote:
> On Tue, Oct 03, 2023 at 05:54:19PM -0400, Taylor Blau wrote:
> > +             /* The largest pack was repacked, meaning that either
> > +              * one or two packs exist depending on whether the
> > +              * repository has a cruft pack or not.
>
> Nit: this comment will grow stale soonish once your patch series lands
> that introduces a maximum packfile size for cruft packs as there can be
> arbitrarily many cruft packs from thereon.
>
> The rest of this patch series looks good to me and makes sense. I don't
> really think that this comment here is worth a reroll.

If you do happen to reroll for some reason, though, perhaps take care
of the minor style violation, as well. Use:

    /*
     * Multi-line comment
     * style.
     */

rather than:

    /* Multi-line comment
     * style.
     */

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-10-05 20:28 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-03 16:27 [PATCH] builtin/repack.c: avoid making cruft packs preferred Taylor Blau
2023-10-03 16:30 ` Taylor Blau
2023-10-03 20:20   ` Junio C Hamano
2023-10-03 21:39     ` Taylor Blau
2023-10-03 20:16 ` Jeff King
2023-10-03 21:52   ` Taylor Blau
2023-10-03 21:54 ` [PATCH v2] " Taylor Blau
2023-10-04 13:13   ` Jeff King
2023-10-05 11:14   ` Patrick Steinhardt
2023-10-05 20:27     ` Eric Sunshine

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.