git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: "Taylor Blau" <me@ttaylorr.com>,
	git@vger.kernel.org, "SZEDER Gábor" <szeder.dev@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>
Subject: Re: [PATCH v2] refs: implement reference transaction hook
Date: Mon, 15 Jun 2020 10:46:39 -0600	[thread overview]
Message-ID: <20200615164639.GD71506@syl.local> (raw)
In-Reply-To: <20200604073632.GA498923@tanuki.pks.im>

Hi Patrick,

Sorry for the slow response. I was out of the office last week and am
only just now getting a chance to catch up on the backlog of emails that
I missed.

On Thu, Jun 04, 2020 at 09:36:32AM +0200, Patrick Steinhardt wrote:
> On Wed, Jun 03, 2020 at 10:51:42AM -0600, Taylor Blau wrote:
> > Hi Patrick,
> >
> > On Wed, Jun 03, 2020 at 02:27:50PM +0200, Patrick Steinhardt wrote:
> > > In order to test the impact on the case where we don't have any
> > > "reference-transaction" hook installed in the repository, this commit
> > > introduces a new performance test for git-update-refs(1). Run against an
> > > empty repository, it produces the following results:
> > >
> > >   Test                                   HEAD~             HEAD
> > >   ------------------------------------------------------------------------------
> > >   1400.2: update existing reference      2.05(1.58+0.54)   2.08(1.58+0.57) +1.5%
> > >   1400.3: create and destroy reference   1.79(1.38+0.49)   1.82(1.39+0.51) +1.7%
> > >
> > > So the overhead is around ~1.5%. Given that git-update-refs(1) is a
> > > near-direct wrapper around reference transactions, there likely is no
> > > other command that is impacted much worse than this.
> >
> > This is a serious performance regression that is worth considering. For
> > example, a 1.5% slow-down on average in reference transactions would
> > cause be enough for me to bisect the regression down to see what
> > changed.
> >
> > What are ways that this could be avoided? I bet that this would work
> > quite well with Emily's work on hooks, where we could check in the
> > configuration first whether a hook is even configured.
> >
> > Could this be integrated with that? If not, could you cache the result
> > of whether or not the hook exists, and/or implement some mechanism to
> > say something like, for eg., "only run reference transaction hooks
> > core.blah = 1" is true?
>
> I wasn't aware of her work until now, so I'll take a look.
>
> Meanwhile, I toyed around with performance and tried out two different
> caching mechanisms:
>
>     - map-cache: `find_hook()` uses a map of hook names mapped to their
>       resolved hook path (or `NULL` if none was found). This is a
>       generic mechanism working across all hooks, but has some overhead
>       in maintaining the map.
>
>     - reftx-cache: `run_transaction_hook()` caches the path returned by
>       `find_hook()`. It's got less overhead as it only caches the path,
>       but obviously only applies to the reference-transaction hook.
>
> In theory, we could go even further and cache the hook's file
> descriptor, executing it via fexecve(3P) whenever it's required, but I
> didn't go down that route yet. This would also solve the atomicity
> issue, though, if the administrator replaces the reference-transactions
> hook halfway through the transaction.
>
> Benchmarking results are mixed, mostly in the sense that I can choose
> which run of the benchmarks I take in order to have my own work look
> better or worse, as timings vary quite a lot between runs. Which
> probably hints at the fact that the benchmarks themselves aren't really
> reliable. The issue is that a single git-update-ref(1) run finishes so
> quick that it's hard to measure with our benchmarks, but spawning
> thousands of them to get something different than 0.0s very much depends
> on the operating system and thus fluctuates. On the other hand, if we
> only spawn a single git-update-refs and have it perform a few thousand
> ref updates, overhead of the hook will not show at all as it is only
> executed once per prepare/commit of the transaction.
>
> The following timings are for the case where we execute
>
>     git update-ref refs/heads/update-branch PRE POST &&
>     git update-ref refs/heads/update-branch POST PRE
>
> respectively
>
>     git update-ref refs/heads/new POST &&
>     git update-ref -d refs/heads/new
>
> a thousand times:
>
> Test                                   master            ref-hook-no-cache       ref-hook-map-cache      ref-hook-reftx-cache
> ------------------------------------------------------------------------------------------------------------------------------
> 1400.2: update existing reference      1.96(1.50+0.53)   2.00(1.54+0.53) +2.0%   2.02(1.54+0.55) +3.1%   1.98(1.52+0.52) +1.0%
> 1400.3: create and destroy reference   1.74(1.33+0.49)   1.77(1.38+0.47) +1.7%   1.77(1.36+0.48) +1.7%   1.76(1.35+0.49) +1.1%

Huh. It is super interesting (to me, at least) that caching the set of
hooks that are on disk and should be run makes this *slower* than
without the cache. What's going on there? In p1400.2, I'd expect to see
'ref-hook-map-cache' attain at most a 2.0% slow-down, plus a little bit
to process the hook when it is there, but not much more than that.

I think that this just seems a little contrived to me. I can understand
why server administrators may want this feature, but the general
user-base of Git doesn't seem to stand to benefit much from this change
(in my own mind, at least).

So... I'm not sure where this leaves us. Maybe a 2.0% slow-down on an
already fast 'git update-ref' invocation for the average user won't be
noticeable. It certainly *will* be noticeable to administrators who
processes a much higher volume of such transactions.

> For such a short-lived program like git-update-refs(1), one can see that
> the overhead of using a map negatively impacts performance compared to
> the no-cache case. But using the reftx-cache roughly cuts the overhead
> in half as expected, as we only need to look up the hook once instead of
> twice.
>
> And here's the results if we use a single `git update-ref --stdin` with a
> thousand reference updates at once:
>
> Test                             master            ref-hook-no-cache       ref-hook-map-cache      ref-hook-reftx-cache
> ------------------------------------------------------------------------------------------------------------------------
> 1400.2: git update-ref --stdin   0.21(0.09+0.12)   0.21(0.07+0.14) +0.0%   0.21(0.07+0.13) +0.0%   0.21(0.07+0.13) +0.0%
>
> As expected, there's nothing much to be seen here because there's only a
> single transaction for all ref updates and, as a result, at most two
> calls to `access(refhook_path, X_OK)`.

Better, but I have to imagine that real-world usage will look much more
like a thousand tiny transactions than one large one.

> Patrick

Thanks,
Taylor

  reply	other threads:[~2020-06-15 16:46 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-02  8:25 [PATCH] refs: implement reference transaction hooks Patrick Steinhardt
2020-06-02 17:47 ` Junio C Hamano
2020-06-03 11:26   ` Patrick Steinhardt
2020-06-07 20:12     ` SZEDER Gábor
2020-06-08  5:36       ` Patrick Steinhardt
2020-06-02 18:09 ` SZEDER Gábor
2020-06-03  9:46   ` Patrick Steinhardt
2020-06-03 12:27 ` [PATCH v2] refs: implement reference transaction hook Patrick Steinhardt
2020-06-03 16:51   ` Taylor Blau
2020-06-04  7:36     ` Patrick Steinhardt
2020-06-15 16:46       ` Taylor Blau [this message]
2020-06-16  5:45         ` Patrick Steinhardt
2020-06-03 17:44   ` Han-Wen Nienhuys
2020-06-03 18:03     ` Junio C Hamano
2020-06-18 10:27 ` [PATCH v3] " Patrick Steinhardt
2020-06-18 22:23   ` Junio C Hamano
2020-06-19  6:56 ` [PATCH v4] " Patrick Steinhardt
2020-10-23  1:03   ` Jeff King
2020-10-23  3:59     ` Junio C Hamano
2020-10-23 19:57       ` Taylor Blau
2020-10-23 22:07         ` Taylor Blau
2020-10-26  7:43       ` Patrick Steinhardt
2020-10-26 23:52         ` Taylor Blau
2020-10-27  5:37           ` Jeff King
2020-10-28 18:22           ` Patrick Steinhardt
2020-10-23 20:00     ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200615164639.GD71506@syl.local \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=ps@pks.im \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).