From: Taylor Blau <me@ttaylorr.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, peff@peff.net
Subject: Re: [PATCH 4/4] t5326: test propagating hashcache values
Date: Tue, 14 Sep 2021 01:11:34 -0400
Message-ID: <YUAvBso+UsBTYizb@nand.local>
In-Reply-To: <xmqqpmtbc3o3.fsf@gitster.g>

On Mon, Sep 13, 2021 at 07:05:32PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > Alas, there they are. They are basically no different than having the
> > name-hash for single pack bitmaps, it's just now we don't throw them
> > away when generating a MIDX bitmap from a state where the repository
> > already has a single-pack bitmap.
>
> I actually wasn't expecting any CPU/time difference.

I think it is possible to see the CPU usage go down without affecting the
resulting pack size. See below for a more detailed analysis.

> I hope that we are talking about the same name-hash, which is used
> to sort the blobs so that when pack-objects try to find a good delta
> base, the blobs from the same path will sit close to each other and
> hopefully fit in the pack window.

Yes, of course.
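
To make the name-hash concrete, here is a minimal sketch in Python. It is
modeled on my understanding of git's pack_name_hash() (which is C), not
copied from it; the whitespace set and the sample paths are assumptions
chosen for illustration.

```python
# Assumption: these are the bytes treated as whitespace and skipped.
WHITESPACE = b" \t\n\r"


def name_hash(path: str) -> int:
    """Fold a path into a 32-bit sortable number.

    Each new byte lands in the top bits and older bytes are shifted
    down by two, so the trailing characters of the path weigh the most
    and only roughly the last sixteen bytes still contribute.
    """
    h = 0
    for byte in path.encode("utf-8"):
        if byte in WHITESPACE:
            continue
        h = ((h >> 2) + (byte << 24)) & 0xFFFFFFFF  # emulate uint32_t wraparound
    return h


# Hypothetical paths: two files sharing a basename, plus two unrelated ones.
paths = ["Makefile", "src/a.c", "README", "lib/a.c"]
by_hash = sorted(paths, key=name_hash)
```

Sorting by this value puts src/a.c and lib/a.c next to each other, which
is the property that lets similar blobs land inside the same delta window.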

> The effect I was hoping to see by not discarding the information was
> that we find better delta base hence smaller deltas in the resulting
> packfiles.

I think it is possible to observe either a decrease in CPU or a decrease
in the resulting pack size.

In my experience, having the name-hash filled in results in finding good
delta pairs much more quickly than without, but in many repositories the
resulting pack size is basically the same. In other words, the resulting
pack is pretty similar whether you use the name-hash or not; it just
affects how quickly you get there.

Some experiments to back that up: I instrumented the existing p5326 by
replacing anything like "pack-objects ... --stdout >/dev/null" with
"pack-objects ... --stdout >pack.tmp" and then added test_size calls to
measure the size of each pack.

On the tip of this branch, the results are:

		Test                              origin/tb/multi-pack-bitmaps   HEAD
		----------------------------------------------------------------------------
		5326.5: simulated clone size                 3.3G                 3.3G +0.0%
		5326.7: simulated fetch size                10.5M                10.5M -0.2%
		5326.21: clone (partial bitmap)              3.3G                 3.3G +0.0%

Looking at c171d3e677 (pack-bitmap: implement optional name_hash cache,
2013-10-22), I modified[1] that script to replace timing pack-objects with
counting the number of bytes it wrote.

Doing that shows that the name-hash doesn't make a substantial difference in
the resulting pack size (numbers on a recent-ish copy of the kernel):

		Test                      c171d3e677d777c50231d8dea32ae691936da819^   c171d3e677d777c50231d8dea32ae691936da819
		--------------------------------------------------------------------------------------------------------------
		9999.3: simulated clone              3.2G                                        3.2G +0.0%
		9999.4: simulated fetch                32                                          32 +0.0%
		9999.6: partial bitmap               3.1G                                        3.1G +0.0%

(As a mostly-unrelated aside, I was curious why the pack size jumped from 3.2GB
to 3.3GB, but I can reproduce that jump even in p5310--the single pack bitmap
test--on the tip of my branch. So it does appear to be a regression, which I'll
look into, but it's unrelated to this branch or MIDX bitmaps.)

Thanks,
Taylor

[1]: https://gist.github.com/ttaylorr/6cfa3eb9fd012f81b833873d50f96f71
