All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, "Jeff King" <peff@peff.net>,
	"Felipe Contreras" <felipe.contreras@gmail.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>,
	"Chris Torek" <chris.torek@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>
Subject: Re: [PATCH v2 1/3] p5400: add perf tests for git-receive-pack(1)
Date: Tue, 29 Jun 2021 14:09:25 +0200	[thread overview]
Message-ID: <87r1gkg8rf.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <YNq7MD/fMQp05I21@ncase>


On Tue, Jun 29 2021, Patrick Steinhardt wrote:

> [[PGP Signed Part:Undecided]]
> On Mon, Jun 28, 2021 at 09:49:54AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Mon, Jun 28 2021, Patrick Steinhardt wrote:
>> 
>> > [[PGP Signed Part:Undecided]]
>> > We'll the connectivity check logic for git-receive-pack(1) in the
>> > following commits to make it perform better. As a preparatory step, add
>> > some benchmarks such that we can measure these changes.
>> >
>> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
>> > ---
>> >  t/perf/p5400-receive-pack.sh | 97 ++++++++++++++++++++++++++++++++++++
>> >  1 file changed, 97 insertions(+)
>> >  create mode 100755 t/perf/p5400-receive-pack.sh
>> >
>> > diff --git a/t/perf/p5400-receive-pack.sh b/t/perf/p5400-receive-pack.sh
>> > new file mode 100755
>> > index 0000000000..a945e014a3
>> > --- /dev/null
>> > +++ b/t/perf/p5400-receive-pack.sh
>> > @@ -0,0 +1,97 @@
>> > +#!/bin/sh
>> > +
>> > +test_description="Tests performance of receive-pack"
>> > +
>> > +. ./perf-lib.sh
>> > +
>> > +test_perf_large_repo
>> 
>> From the runtime I think this just needs test_perf_default_repo, no?
>> I.e. we should only have *_large_* for cases where git.git is too small
>> to produce meaningful results.
>> 
>> Part of th problem is that git.git has become larger over time...
>
> I did these tests for 3/3 with git.git first, and results were
> significantly different. The performance issues I'm trying to fix with
> the connectivity check really only start to show up with largish
> repositories.
>
>> > +test_expect_success 'setup' '
>> > +	# Create a main branch such that we do not have to rely on any specific
>> > +	# branch to exist in the perf repository.
>> > +	git switch --force-create main &&
>> > +
>> > +	# Set up a pre-receive hook such that no refs will ever be changed.
>> > +	# This easily allows multiple perf runs, but still exercises
>> > +	# server-side reference negotiation and checking for consistency.
>> > +	mkdir hooks &&
>> > +	write_script hooks/pre-receive <<-EOF &&
>> > +		#!/bin/sh
>> 
>> You don't need the #!/bin/sh here, and it won't be used. write_script()
>> adds it (or the wanted shell path).
>
> Makes sense.
>
>> > +		echo "failed in pre-receive hook"
>> > +		exit 1
>> > +	EOF
>> > +	cat >config <<-EOF &&
>> > +		[core]
>> > +			hooksPath=$(pwd)/hooks
>> > +	EOF
>> 
>> Easier understood IMO as:
>> 
>>     git config -f config core.hooksPath ...
>
> Yup, will change.
>
>> > +	GIT_CONFIG_GLOBAL="$(pwd)/config" &&
>> > +	export GIT_CONFIG_GLOBAL &&
>> > +
>> > +	git switch --create updated &&
>> > +	test_commit --no-tag updated
>> > +'
>> > +
>> > +setup_empty() {
>> > +	git init --bare "$2"
>> > +}
>> 
>> I searched ahead for setup_empty, looked unused, but...
>> 
>> > +setup_clone() {
>> > +	git clone --bare --no-local --branch main "$1" "$2"
>> > +}
>> > +
>> > +setup_clone_bitmap() {
>> > +	git clone --bare --no-local --branch main "$1" "$2" &&
>> > +	git -C "$2" repack -Adb
>> > +}
>> > +
>> > +# Create a reference for each commit in the target repository with extra-refs.
>> > +# While this may be an atypical setup, biggish repositories easily end up with
>> > +# hundreds of thousands of refs, and this is a good enough approximation.
>> > +setup_extrarefs() {
>> > +	git clone --bare --no-local --branch main "$1" "$2" &&
>> > +	git -C "$2" log --all --format="tformat:create refs/commit/%h %H" |
>> > +		git -C "$2" update-ref --stdin
>> > +}
>> > +
>> > +# Create a reference for each commit in the target repository with extra-refs.
>> > +# While this may be an atypical setup, biggish repositories easily end up with
>> > +# hundreds of thousands of refs, and this is a good enough approximation.
>> > +setup_extrarefs_bitmap() {
>> > +	git clone --bare --no-local --branch main "$1" "$2" &&
>> > +	git -C "$2" log --all --format="tformat:create refs/commit/%h %H" |
>> > +		git -C "$2" update-ref --stdin &&
>> > +	git -C "$2" repack -Adb
>> > +}
>> > +
>> > +for repo in empty clone clone_bitmap extrarefs extrarefs_bitmap
>> > +do
>> > +	test_expect_success "$repo setup" '
>> 
>> > +		rm -rf target.git &&
>> > +		setup_$repo "$(pwd)" target.git
>> 
>> ...here we use it via interpolation.
>> 
>> I'd find this whole pattern much easier to understand if the setups were
>> just a bunch of test_expect_success that created a repo_empty.git,
>> repo_extrarefs.git etc. Then this loop would be:
>> 
>>     for repo in repo*.git ...
>> 
>> I'd think that would also give you more meaningful perf data, as now the
>> OS will churn between the clone & the subsequent push tests, better to
>> do all the setup, then all the different perf tests.
>> 
>> Perhaps there's also a way to re-use this setup across different runs, I
>> don't know/can't remember if t/perf has a less transient thing than the
>> normal trash directory to use for that.
>
> I originally had code like this, but the issue with first creating all
> the repos is that it requires lots of disk space with large repos given
> that we'll clone it once per setup. Combined with the fact that I
> often run tests in tmpfs, this led to out-of-memory situations quite
> fast given that I had 3x6GB repositories plus the seeded packfiles in
> RAM.
>
> This is why I've changed the setup to do the setup as we go, to bring
> disk usage down to something sane.
>
> Patrick
>
> [[End of PGP Signed Part]]

Ah, I see. In that case wouldn't it be even better/faster with/without
my suggestion to not use "clone" here, which would either be manually
set up with alternates, or removing the --no-local flag.

You'd then share bulk of the object database, and just have different
references. B.t.w. you'll probably get less noise/more relevant results
if you then do a "pack-refs" after creating those N references.

So you should have:

 0. Your "big" test repop (not used directly)
 1. An empty repo
 2. The "big" test repo itself, but just the HEAD branch, using
    alternates to point to #0
 3. Ditto, but we create a crapload of refs for each commit for a
    version of #2.
 4. Ditto #3 (could even "cp" over the packed refs file to save time),
    but add a bitmap on top.

Well, presumably for #4 we'd actually want to do the "git repack -Adb"
for #2 (or enforce that #0 must have it), then just move the *.bitmap
file(s) to #4. Now the test case conflates whether we have bitmaps with
how well (re)packed something is.

I think this might also allow you to get rid of the pre-receive hook for
a "real" push test, since the side-repos would be so cheap at this point
that you could perhaps setup N of them to push into.

  reply	other threads:[~2021-06-29 12:23 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-28  5:33 [PATCH v2 0/3] Speed up connectivity checks via bitmaps Patrick Steinhardt
2021-06-28  5:33 ` [PATCH v2 1/3] p5400: add perf tests for git-receive-pack(1) Patrick Steinhardt
2021-06-28  7:49   ` Ævar Arnfjörð Bjarmason
2021-06-29  6:18     ` Patrick Steinhardt
2021-06-29 12:09       ` Ævar Arnfjörð Bjarmason [this message]
2021-06-28  5:33 ` [PATCH v2 2/3] receive-pack: skip connectivity checks on delete-only commands Patrick Steinhardt
2021-06-28  8:00   ` Ævar Arnfjörð Bjarmason
2021-06-28  8:06     ` Ævar Arnfjörð Bjarmason
2021-06-29  6:26     ` Patrick Steinhardt
2021-06-30  1:31   ` Jeff King
2021-06-30  1:35     ` Jeff King
2021-06-30 13:52     ` Patrick Steinhardt
2021-06-28  5:33 ` [PATCH v2 3/3] connected: implement connectivity check using bitmaps Patrick Steinhardt
2021-06-28 20:23   ` Taylor Blau
2021-06-29 22:44     ` Taylor Blau
2021-06-30  2:04       ` Jeff King
2021-06-30  3:07         ` Taylor Blau
2021-06-30  5:45           ` Jeff King
2021-07-02 17:44             ` Taylor Blau
2021-07-02 21:21               ` Jeff King
2021-06-30  1:51   ` Jeff King
2021-07-20 14:26     ` Patrick Steinhardt
2021-08-02  9:37 ` [PATCH v3 0/4] Speed up connectivity checks Patrick Steinhardt
2021-08-02  9:38   ` [PATCH v3 1/4] connected: do not sort input revisions Patrick Steinhardt
2021-08-02 12:49     ` Ævar Arnfjörð Bjarmason
2021-08-03  8:50       ` Patrick Steinhardt
2021-08-04 11:01         ` Ævar Arnfjörð Bjarmason
2021-08-02 19:00     ` Junio C Hamano
2021-08-03  8:55       ` Patrick Steinhardt
2021-08-03 21:47         ` Junio C Hamano
2021-08-02  9:38   ` [PATCH v3 2/4] revision: stop retrieving reference twice Patrick Steinhardt
2021-08-02 12:53     ` Ævar Arnfjörð Bjarmason
2021-08-02  9:38   ` [PATCH v3 3/4] revision: avoid loading object headers multiple times Patrick Steinhardt
2021-08-02 12:55     ` Ævar Arnfjörð Bjarmason
2021-08-05 10:12       ` Patrick Steinhardt
2021-08-02 19:40     ` Junio C Hamano
2021-08-03  9:07       ` Patrick Steinhardt
2021-08-06 14:17         ` Patrick Steinhardt
2021-08-02  9:38   ` [PATCH v3 4/4] revision: avoid hitting packfiles when commits are in commit-graph Patrick Steinhardt
2021-08-02 20:01     ` Junio C Hamano
2021-08-03  9:16       ` Patrick Steinhardt
2021-08-03 21:56         ` Junio C Hamano
2021-08-05 11:01           ` Patrick Steinhardt
2021-08-05 16:16             ` Junio C Hamano
2021-08-04 10:51         ` Ævar Arnfjörð Bjarmason
2021-08-05 11:25   ` [PATCH v4 0/6] Speed up connectivity checks Patrick Steinhardt
2021-08-05 11:25     ` [PATCH v4 1/6] revision: separate walk and unsorted flags Patrick Steinhardt
2021-08-05 18:47       ` Junio C Hamano
2021-08-05 11:25     ` [PATCH v4 2/6] connected: do not sort input revisions Patrick Steinhardt
2021-08-05 18:44       ` Junio C Hamano
2021-08-06  6:00         ` Patrick Steinhardt
2021-08-06 16:50           ` Junio C Hamano
2021-08-05 11:25     ` [PATCH v4 3/6] revision: stop retrieving reference twice Patrick Steinhardt
2021-08-05 11:25     ` [PATCH v4 4/6] revision: avoid loading object headers multiple times Patrick Steinhardt
2021-08-05 11:25     ` [PATCH v4 5/6] commit-graph: split out function to search commit position Patrick Steinhardt
2021-08-05 11:25     ` [PATCH v4 6/6] revision: avoid hitting packfiles when commits are in commit-graph Patrick Steinhardt
2021-08-09  8:00 ` [PATCH v5 0/5] Speed up connectivity checks Patrick Steinhardt
2021-08-09  8:02   ` Patrick Steinhardt
2021-08-09  8:11 ` Patrick Steinhardt
2021-08-09  8:11   ` [PATCH v5 1/5] revision: separate walk and unsorted flags Patrick Steinhardt
2021-08-09  8:11   ` [PATCH v5 2/5] connected: do not sort input revisions Patrick Steinhardt
2021-08-09  8:11   ` [PATCH v5 3/5] revision: stop retrieving reference twice Patrick Steinhardt
2021-08-09  8:11   ` [PATCH v5 4/5] commit-graph: split out function to search commit position Patrick Steinhardt
2021-08-09  8:12   ` [PATCH v5 5/5] revision: avoid hitting packfiles when commits are in commit-graph Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87r1gkg8rf.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=chris.torek@gmail.com \
    --cc=felipe.contreras@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.