From: Taylor Blau <me@ttaylorr.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: "Taylor Blau" <me@ttaylorr.com>,
git@vger.kernel.org, "SZEDER Gábor" <szeder.dev@gmail.com>,
"Junio C Hamano" <gitster@pobox.com>
Subject: Re: [PATCH v2] refs: implement reference transaction hook
Date: Mon, 15 Jun 2020 10:46:39 -0600 [thread overview]
Message-ID: <20200615164639.GD71506@syl.local> (raw)
In-Reply-To: <20200604073632.GA498923@tanuki.pks.im>
Hi Patrick,
Sorry for the slow response. I was out of the office last week and am
only just now getting a chance to catch up on the backlog of emails that
I missed.
On Thu, Jun 04, 2020 at 09:36:32AM +0200, Patrick Steinhardt wrote:
> On Wed, Jun 03, 2020 at 10:51:42AM -0600, Taylor Blau wrote:
> > Hi Patrick,
> >
> > On Wed, Jun 03, 2020 at 02:27:50PM +0200, Patrick Steinhardt wrote:
> > > In order to test the impact on the case where we don't have any
> > > "reference-transaction" hook installed in the repository, this commit
> > > introduces a new performance test for git-update-refs(1). Run against an
> > > empty repository, it produces the following results:
> > >
> > > Test HEAD~ HEAD
> > > ------------------------------------------------------------------------------
> > > 1400.2: update existing reference 2.05(1.58+0.54) 2.08(1.58+0.57) +1.5%
> > > 1400.3: create and destroy reference 1.79(1.38+0.49) 1.82(1.39+0.51) +1.7%
> > >
> > > So the overhead is around ~1.5%. Given that git-update-refs(1) is a
> > > near-direct wrapper around reference transactions, there likely is no
> > > other command that is impacted much worse than this.
> >
> > This is a serious performance regression that is worth considering. For
> > example, a 1.5% slow-down on average in reference transactions would
> > cause be enough for me to bisect the regression down to see what
> > changed.
> >
> > What are ways that this could be avoided? I bet that this would work
> > quite well with Emily's work on hooks, where we could check in the
> > configuration first whether a hook is even configured.
> >
> > Could this be integrated with that? If not, could you cache the result
> > of whether or not the hook exists, and/or implement some mechanism to
> > say something like, for eg., "only run reference transaction hooks
> > core.blah = 1" is true?
>
> I wasn't aware of her work until now, so I'll take a look.
>
> Meanwhile, I toyed around with performance and tried out two different
> caching mechanisms:
>
> - map-cache: `find_hook()` uses a map of hook names mapped to their
> resolved hook path (or `NULL` if none was found). This is a
> generic mechanism working across all hooks, but has some overhead
> in maintaining the map.
>
> - reftx-cache: `run_transaction_hook()` caches the path returned by
> `find_hook()`. It's got less overhead as it only caches the path,
> but obviously only applies to the reference-transaction hook.
>
> In theory, we could go even further and cache the hook's file
> descriptor, executing it via fexecve(3P) whenever it's required, but I
> didn't go down that route yet. This would also solve the atomicity
> issue, though, if the administrator replaces the reference-transactions
> hook halfway through the transaction.
>
> Benchmarking results are mixed, mostly in the sense that I can choose
> which run of the benchmarks I take in order to have my own work look
> better or worse, as timings vary quite a lot between runs. Which
> probably hints at the fact that the benchmarks themselves aren't really
> reliable. The issue is that a single git-update-ref(1) run finishes so
> quick that it's hard to measure with our benchmarks, but spawning
> thousands of them to get something different than 0.0s very much depends
> on the operating system and thus fluctuates. On the other hand, if we
> only spawn a single git-update-refs and have it perform a few thousand
> ref updates, overhead of the hook will not show at all as it is only
> executed once per prepare/commit of the transaction.
>
> The following timings are for the case where we execute
>
> git update-ref refs/heads/update-branch PRE POST &&
> git update-ref refs/heads/update-branch POST PRE
>
> respectively
>
> git update-ref refs/heads/new POST &&
> git update-ref -d refs/heads/new
>
> a thousand times:
>
> Test master ref-hook-no-cache ref-hook-map-cache ref-hook-reftx-cache
> ------------------------------------------------------------------------------------------------------------------------------
> 1400.2: update existing reference 1.96(1.50+0.53) 2.00(1.54+0.53) +2.0% 2.02(1.54+0.55) +3.1% 1.98(1.52+0.52) +1.0%
> 1400.3: create and destroy reference 1.74(1.33+0.49) 1.77(1.38+0.47) +1.7% 1.77(1.36+0.48) +1.7% 1.76(1.35+0.49) +1.1%
Huh. It is super interesting (to me, at least) that caching the set of
hooks that are on disk and should be run makes this *slower* than
without the cache. What's going on there? In p1400.2, I'd expect to see
'ref-hook-map-cache' attain at most a 2.0% slow-down, plus a little bit
to process the hook when it is there, but not much more than that.
I think that this just seems a little contrived to me. I can understand
why server administrators may want this feature, but the general
user-base of Git doesn't seem to stand to benefit much from this change
(in my own mind, at least).
So... I'm not sure where this leaves us. Maybe a 2.0% slow-down on an
already fast 'git update-ref' invocation for the average user won't be
noticeable. It certainly *will* be noticeable to administrators who
processes a much higher volume of such transactions.
> For such a short-lived program like git-update-refs(1), one can see that
> the overhead of using a map negatively impacts performance compared to
> the no-cache case. But using the reftx-cache roughly cuts the overhead
> in half as expected, as we only need to look up the hook once instead of
> twice.
>
> And here's the results if we use a single `git update-ref --stdin` with a
> thousand reference updates at once:
>
> Test master ref-hook-no-cache ref-hook-map-cache ref-hook-reftx-cache
> ------------------------------------------------------------------------------------------------------------------------
> 1400.2: git update-ref --stdin 0.21(0.09+0.12) 0.21(0.07+0.14) +0.0% 0.21(0.07+0.13) +0.0% 0.21(0.07+0.13) +0.0%
>
> As expected, there's nothing much to be seen here because there's only a
> single transaction for all ref updates and, as a result, at most two
> calls to `access(refhook_path, X_OK)`.
Better, but I have to imagine that real-world usage will look much more
like a thousand tiny transactions than one large one.
> Patrick
Thanks,
Taylor
next prev parent reply other threads:[~2020-06-15 16:46 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-06-02 8:25 [PATCH] refs: implement reference transaction hooks Patrick Steinhardt
2020-06-02 17:47 ` Junio C Hamano
2020-06-03 11:26 ` Patrick Steinhardt
2020-06-07 20:12 ` SZEDER Gábor
2020-06-08 5:36 ` Patrick Steinhardt
2020-06-02 18:09 ` SZEDER Gábor
2020-06-03 9:46 ` Patrick Steinhardt
2020-06-03 12:27 ` [PATCH v2] refs: implement reference transaction hook Patrick Steinhardt
2020-06-03 16:51 ` Taylor Blau
2020-06-04 7:36 ` Patrick Steinhardt
2020-06-15 16:46 ` Taylor Blau [this message]
2020-06-16 5:45 ` Patrick Steinhardt
2020-06-03 17:44 ` Han-Wen Nienhuys
2020-06-03 18:03 ` Junio C Hamano
2020-06-18 10:27 ` [PATCH v3] " Patrick Steinhardt
2020-06-18 22:23 ` Junio C Hamano
2020-06-19 6:56 ` [PATCH v4] " Patrick Steinhardt
2020-10-23 1:03 ` Jeff King
2020-10-23 3:59 ` Junio C Hamano
2020-10-23 19:57 ` Taylor Blau
2020-10-23 22:07 ` Taylor Blau
2020-10-26 7:43 ` Patrick Steinhardt
2020-10-26 23:52 ` Taylor Blau
2020-10-27 5:37 ` Jeff King
2020-10-28 18:22 ` Patrick Steinhardt
2020-10-23 20:00 ` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200615164639.GD71506@syl.local \
--to=me@ttaylorr.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=ps@pks.im \
--cc=szeder.dev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).