git.vger.kernel.org archive mirror
* [PATCH] pack-objects: update "nr_seen" progress based on pack-reused count
@ 2021-04-12  3:41 Jeff King
From: Jeff King @ 2021-04-12  3:41 UTC
  To: git

When serving a clone or fetch with bitmaps, after deciding which objects
need to be sent, our "pack reuse" mechanism kicks in: we try to send
more-or-less verbatim a bunch of objects from the beginning of the
bitmapped packfile without even adding them to the to_pack.objects
array.

After deciding which objects will be in the "reused" portion, we update
nr_result to account for those, and then trigger display_progress() to
show the user (who is undoubtedly dazzled that we managed to enumerate
so many objects so quickly).

But then something confusing happens: the "Enumerating objects" progress
meter jumps _backwards_, counting up from zero the number of objects we
actually add into to_pack.objects.

This worked correctly once upon a time, but was broken in 5af050437a
(pack-objects: show some progress when counting kept objects,
2018-04-15), when the latter half of that progress meter switched to
using a separate nr_seen counter, rather than nr_result. Nobody noticed
for two reasons:

  - prior to the pack-reuse fixes from a14aebeac3 (Merge branch
    'jk/packfile-reuse-cleanup', 2020-02-14), the reuse code almost
    never kicked in anyway

  - the output looks _kind of_ correct. The "backwards" moment is hard
    to catch, because we overwrite the old progress number with the new
    one, and the larger number is displayed only for a second. So unless
    you look at that exact second, you just see the much smaller value,
    counting up to the number of non-reused objects (though of course if
    you catch it in stderr, or look at GIT_TRACE_PACKET from a server
    with bitmaps, you can see both values).
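
To illustrate the overwrite effect from the second point, here is a
minimal C sketch that replays a progress stream the way a terminal would.
render_last_line() is a hypothetical helper written for this example, not
part of Git, and the sample stream is an assumption modeled on the numbers
mentioned later in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a Git function): replay "stream" into
 * "screen" as a terminal would show its first line. A carriage
 * return moves the cursor back to column 0, so later text
 * overwrites earlier text; a newline ends the line.
 */
static void render_last_line(const char *stream, char *screen, size_t size)
{
	size_t col = 0, len = 0;

	if (!size)
		return;
	for (; *stream && *stream != '\n'; stream++) {
		if (*stream == '\r') {
			col = 0; /* cursor returns to the start of the line */
			continue;
		}
		if (col + 1 < size) {
			screen[col++] = *stream;
			if (col > len)
				len = col;
		}
	}
	screen[len] = '\0';
}
```

Feeding it "Enumerating objects: 674\rEnumerating objects: 1, done.\n"
leaves only the smaller count visible, even though the raw stream (what
the test captures in stderr) still contains both values.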

This smaller output isn't wrong per se, but it doesn't count what we
intended it to. We should give the user the whole number of objects we
considered (which, as per 5af050437a's original purpose, is already
_not_ a count of what goes into to_pack.objects). The follow-on
"Counting objects" meter shows the actual number of objects we feed into
that array.

We can easily fix this by bumping (and showing) nr_seen for the
pack-reused objects. When the included test is run without this patch,
the second pack-objects invocation produces "Enumerating objects: 1" to
show the one loose object, even though the resulting pack has hundreds
of objects in it. With it, we jump to "Enumerating objects: 674" after
deciding on reuse, and then "675" when we add in the loose object.
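
The difference between the two behaviours can be sketched with a small
standalone model. This is a simplification under stated assumptions, not
the code in builtin/pack-objects.c: struct counters,
mock_display_progress(), account_for_reuse(), add_one_object() and
final_enumerating_count() are all hypothetical names, and the real
counters are static globals rather than struct members.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the static counters in pack-objects. */
struct counters {
	uint32_t nr_result; /* objects that end up in the output pack */
	uint32_t nr_seen;   /* objects considered; drives the meter */
	uint32_t shown;     /* last value given to the mock meter */
};

/* Mock of display_progress(): only remembers the last value. */
static void mock_display_progress(struct counters *c, uint32_t n)
{
	c->shown = n;
}

/* The reuse step; a non-zero "fixed" models the patched code. */
static void account_for_reuse(struct counters *c, uint32_t reused, int fixed)
{
	assert(reused);
	c->nr_result += reused;
	if (fixed) {
		c->nr_seen += reused;
		mock_display_progress(c, c->nr_seen);
	} else {
		/* old behaviour: nr_seen is left at zero */
		mock_display_progress(c, c->nr_result);
	}
}

/* Adding one non-reused object always reports nr_seen. */
static void add_one_object(struct counters *c)
{
	c->nr_result++;
	mock_display_progress(c, ++c->nr_seen);
}

/* Last "Enumerating objects" value after reuse plus extra objects. */
static uint32_t final_enumerating_count(uint32_t reused, uint32_t extra,
					int fixed)
{
	struct counters c = { 0, 0, 0 };
	uint32_t i;

	account_for_reuse(&c, reused, fixed);
	for (i = 0; i < extra; i++)
		add_one_object(&c);
	return c.shown;
}
```

With 674 reused objects and one loose object, the model ends at 1
without the fix and at 675 with it, matching the numbers above.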

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/pack-objects.c  |  3 ++-
 t/t5310-pack-bitmaps.sh | 23 +++++++++++++++++++++++
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 525c2d8552..faee5a5c76 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3527,7 +3527,8 @@ static int get_object_list_from_bitmap(struct rev_info *revs)
 			&reuse_packfile_bitmap)) {
 		assert(reuse_packfile_objects);
 		nr_result += reuse_packfile_objects;
-		display_progress(progress_state, nr_result);
+		nr_seen += reuse_packfile_objects;
+		display_progress(progress_state, nr_seen);
 	}
 
 	traverse_bitmap_commit_list(bitmap_git, revs,
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 40b9f63244..8d0933b6e5 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -461,6 +461,29 @@ test_expect_success 'truncated bitmap fails gracefully (cache)' '
 	test_i18ngrep corrupted.bitmap.index stderr
 '
 
+test_expect_success 'enumerating progress counts pack-reused objects' '
+	count=$(git rev-list --objects --all --count) &&
+	git repack -adb &&
+
+	# check first with only reused objects; confirm that our progress
+	# showed the right number, and also that we did pack-reuse as expected.
+	# Check only the final "done" line of the meter (there may be an
+	# arbitrary number of intermediate lines ending with CR).
+	GIT_PROGRESS_DELAY=0 \
+		git pack-objects --all --stdout --progress \
+		</dev/null >/dev/null 2>stderr &&
+	grep "Enumerating objects: $count, done" stderr &&
+	grep "pack-reused $count" stderr &&
+
+	# now the same but with one non-reused object
+	git commit --allow-empty -m "an extra commit object" &&
+	GIT_PROGRESS_DELAY=0 \
+		git pack-objects --all --stdout --progress \
+		</dev/null >/dev/null 2>stderr &&
+	grep "Enumerating objects: $((count+1)), done" stderr &&
+	grep "pack-reused $count" stderr
+'
+
 # have_delta <obj> <expected_base>
 #
 # Note that because this relies on cat-file, it might find _any_ copy of an
-- 
2.31.1.657.gecf191e18d


* Re: [PATCH] pack-objects: update "nr_seen" progress based on pack-reused count
From: Derrick Stolee @ 2021-04-13  0:48 UTC
  To: Jeff King, git

On 4/11/2021 11:41 PM, Jeff King wrote:
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 525c2d8552..faee5a5c76 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3527,7 +3527,8 @@ static int get_object_list_from_bitmap(struct rev_info *revs)
>  			&reuse_packfile_bitmap)) {
>  		assert(reuse_packfile_objects);
>  		nr_result += reuse_packfile_objects;
> -		display_progress(progress_state, nr_result);
> +		nr_seen += reuse_packfile_objects;
> +		display_progress(progress_state, nr_seen);

nr_seen and nr_result are defined in the same line with nr_written,
as static globals. I can understand how this mistake happened.

There are no other places where nr_result is used for
display_progress() while nr_seen _is_ used a couple more times.

> +test_expect_success 'enumerating progress counts pack-reused objects' '
> +	count=$(git rev-list --objects --all --count) &&
> +	git repack -adb &&
> +
> +	# check first with only reused objects; confirm that our progress
> +	# showed the right number, and also that we did pack-reuse as expected.
> +	# Check only the final "done" line of the meter (there may be an
> +	# arbitrary number of intermediate lines ending with CR).
> +	GIT_PROGRESS_DELAY=0 \
> +		git pack-objects --all --stdout --progress \
> +		</dev/null >/dev/null 2>stderr &&
> +	grep "Enumerating objects: $count, done" stderr &&
> +	grep "pack-reused $count" stderr &&
> +
> +	# now the same but with one non-reused object
> +	git commit --allow-empty -m "an extra commit object" &&
> +	GIT_PROGRESS_DELAY=0 \
> +		git pack-objects --all --stdout --progress \
> +		</dev/null >/dev/null 2>stderr &&
> +	grep "Enumerating objects: $((count+1)), done" stderr &&
> +	grep "pack-reused $count" stderr
> +'

Good test. LGTM.

Thanks,
-Stolee


* Re: [PATCH] pack-objects: update "nr_seen" progress based on pack-reused count
From: Jeff King @ 2021-04-13  7:39 UTC
  To: Derrick Stolee; +Cc: git

On Mon, Apr 12, 2021 at 08:48:24PM -0400, Derrick Stolee wrote:

> On 4/11/2021 11:41 PM, Jeff King wrote:
> > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> > index 525c2d8552..faee5a5c76 100644
> > --- a/builtin/pack-objects.c
> > +++ b/builtin/pack-objects.c
> > @@ -3527,7 +3527,8 @@ static int get_object_list_from_bitmap(struct rev_info *revs)
> >  			&reuse_packfile_bitmap)) {
> >  		assert(reuse_packfile_objects);
> >  		nr_result += reuse_packfile_objects;
> > -		display_progress(progress_state, nr_result);
> > +		nr_seen += reuse_packfile_objects;
> > +		display_progress(progress_state, nr_seen);
> 
> nr_seen and nr_result are defined in the same line with nr_written,
> as static globals. I can understand how this mistake happened.

I think it is even more subtle than that. Both topics (the one to
convert callers to use nr_seen as progress, and the one adding this call
to use nr_result for progress) were "in flight" at the same time, but in
a funny way. The latter was written much earlier in a fork of Git, but
not sent upstream immediately. So the nr_seen conversion didn't know
about it, and later the other topic was "merged" (actually,
cherry-picked) into upstream.

So it really is a semantic conflict when merging two branches that
happened simultaneously. It's a little odd in that the branches were
running simultaneously for years, but the same thing could happen even
with two topics much closer together.

Anyway, that's all just an interesting sidenote.

> There are no other places where nr_result is used for
> display_progress() while nr_seen _is_ used a couple more times.

Thanks for double-checking. That makes sense, since we added only this
one call, and the others were all converted to nr_seen when it was
introduced.

-Peff
