All of lore.kernel.org
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Jeff King <peff@peff.net>, git@vger.kernel.org
Subject: Re: [PATCH] refs: sync loose refs to disk before committing them
Date: Fri, 5 Nov 2021 10:22:23 +0100	[thread overview]
Message-ID: <YYT3zx+P5maUAu5K@ncase> (raw)
In-Reply-To: <nycvar.QRO.7.76.6.2111051010070.56@tvgsbejvaqbjf.bet>

[-- Attachment #1: Type: text/plain, Size: 5107 bytes --]

On Fri, Nov 05, 2021 at 10:12:25AM +0100, Johannes Schindelin wrote:
> Hi Peff & Patrick,
> 
> On Fri, 5 Nov 2021, Jeff King wrote:
> 
> > On Fri, Nov 05, 2021 at 03:07:18AM -0400, Jeff King wrote:
> >
> > >   2. It's not clear what the performance implications will be,
> > >      especially on a busy server doing a lot of ref updates, or on a
> > >      filesystem where fsync() ends up syncing everything, not just the
> > >      one file (my impression is ext3 is such a system, but not ext4).
> > >      Whereas another solution may be journaling data and metadata writes
> > >      in order without worrying about the durability of writing them to
> > >      disk.
> > >
> > >      I suspect for small updates (say, a push of one or two refs), this
> > >      will have little impact. We'd generally fsync the incoming packfile
> > >      and its idx anyway, so we're adding may one or two fsyncs on top of
> > >      that. But if you're pushing 100 refs, that will be 100 sequential
> > >      fsyncs, which may add up to quite a bit of latency. It would be
> > >      nice if we could batch these by somehow (e.g., by opening up all of
> > >      the lockfiles, writing and fsyncing them, and then renaming one by
> > >      one).
> >
> > So here's a quick experiment that shows a worst case: a small push that
> > updates a bunch of refs. After building Git with and without your patch,
> > I set up a small repo like:
> >
> >   git init
> >   git commit --allow-empty -m foo
> >   for i in $(seq 100); do
> >     git update-ref refs/heads/$i HEAD
> >   done
> >
> > To give a clean slate between runs, I stuck this in a script called
> > "setup":
> >
> >   #!/bin/sh
> >   rm -rf dst.git
> >   git init --bare dst.git
> >   sync
> >
> > And then ran:
> >
> >   $ hyperfine -L v orig,fsync -p ./setup '/tmp/{v}/bin/git push dst.git refs/heads/*'
> >   Benchmark 1: /tmp/orig/bin/git push dst.git refs/heads/*
> >     Time (mean ± σ):       9.9 ms ±   0.2 ms    [User: 6.3 ms, System: 4.7 ms]
> >     Range (min … max):     9.5 ms …  10.5 ms    111 runs
> >
> >   Benchmark 2: /tmp/fsync/bin/git push dst.git refs/heads/*
> >     Time (mean ± σ):     401.0 ms ±   7.7 ms    [User: 9.4 ms, System: 15.2 ms]
> >     Range (min … max):   389.4 ms … 412.4 ms    10 runs
> >
> >   Summary
> >     '/tmp/orig/bin/git push dst.git refs/heads/*' ran
> >      40.68 ± 1.16 times faster than '/tmp/fsync/bin/git push dst.git refs/heads/*'
> >
> > So it really does produce a noticeable impact (this is on a system with
> > a decent SSD and no other disk load, so I'd expect it to be about
> > average for modern hardware).
> >
> > Now this test isn't entirely fair. 100 refs is a larger than average
> > number to be pushing, and the effect is out-sized because there's
> > virtually no time spent dealing with the objects themselves, nor is
> > there any network latency. But 400ms feels like a non-trivial amount of
> > time just in absolute numbers.
> >
> > The numbers scale pretty linearly, as you'd expect. Pushing 10 refs
> > takes ~40ms, 100 takes ~400ms, and 1000 takes ~4s. The non-fsyncing
> > version gets slower, too (there's more work to do), but much more slowly
> > (6ms, 10ms, and 50ms respectively).
> >
> > So this will definitely hurt at edge / pathological cases.
> 
> Ouch.
> 
> I wonder whether this could be handled similarly to the
> `core.fsyncObjectFiles=batch` mode that has been proposed in
> https://lore.kernel.org/git/pull.1076.v8.git.git.1633366667.gitgitgadget@gmail.com/
> 
> Essentially, we would have to find a better layer to do this, where we
> can synchronize after a potentially quite large number of ref updates has
> happened. That would definitely be a different layer than the file-based
> refs backend, of course, and would probably apply in a different way to
> other refs backends.
> 
> Ciao,
> Dscho

Good question. We'd likely have to make this part of the ref_transaction
layer to make this work without having to update all callers. It would
likely work something like the following:

    1. The caller queues up all ref updates.

    2. The caller moves the transaction into prepared state, which
       creates all the lockfiles for us.

    3. After all lockfiles have been created, we must sync them to disk.
       This may either happen when preparing the transaction, or when
       committing it. I think it's better located in prepare though so
       that the likelihood that the transaction succeeds after a
       successful prepare is high.

    4. Now we can rename all files into place.

Naturally, we'd only benefit from this batching in case the caller
actually queues up all ref updates into a single transaction. And
there's going to be at least some which don't by default. The first that
comes to my mind is git-fetch(1), which explicitly requires the user to
use `--atomic` to queue them all up into a single transaction. And I
guess there's going to be others which use one transaction per ref.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2021-11-05  9:22 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-04 12:38 [PATCH] refs: sync loose refs to disk before committing them Patrick Steinhardt
2021-11-04 13:14 ` Ævar Arnfjörð Bjarmason
2021-11-04 14:51   ` Patrick Steinhardt
2021-11-04 21:24   ` Junio C Hamano
2021-11-04 22:36     ` Neeraj Singh
2021-11-05  1:40       ` Junio C Hamano
2021-11-05  6:36         ` Jeff King
2021-11-05  8:35       ` Ævar Arnfjörð Bjarmason
2021-11-05  9:04         ` Jeff King
2021-11-05  7:07 ` Jeff King
2021-11-05  7:17   ` Jeff King
2021-11-05  9:12     ` Johannes Schindelin
2021-11-05  9:22       ` Patrick Steinhardt [this message]
2021-11-05  9:34       ` Jeff King
2021-11-09 11:25         ` Patrick Steinhardt
2021-11-10  8:36           ` Jeff King
2021-11-10  9:16             ` Patrick Steinhardt
2021-11-10 11:40 ` [PATCH v2 0/3] " Patrick Steinhardt
2021-11-10 11:40   ` [PATCH v2 1/3] wrapper: handle EINTR in `git_fsync()` Patrick Steinhardt
2021-11-10 14:33     ` Johannes Schindelin
2021-11-10 14:39     ` Ævar Arnfjörð Bjarmason
2021-11-10 11:40   ` [PATCH v2 2/3] wrapper: provide function to sync directories Patrick Steinhardt
2021-11-10 14:40     ` Ævar Arnfjörð Bjarmason
2021-11-10 11:41   ` [PATCH v2 3/3] refs: add configuration to enable flushing of refs Patrick Steinhardt
2021-11-10 14:49     ` Ævar Arnfjörð Bjarmason
2021-11-10 19:15       ` Neeraj Singh
2021-11-10 20:23         ` Ævar Arnfjörð Bjarmason
2021-11-11  0:03           ` Neeraj Singh
2021-11-11 12:14           ` Patrick Steinhardt
2021-11-11 12:06       ` Patrick Steinhardt
2021-11-11  0:18     ` Neeraj Singh
2021-11-10 14:44   ` [PATCH v2 0/3] refs: sync loose refs to disk before committing them Johannes Schindelin
2021-11-10 20:45   ` Jeff King
2021-11-11 11:47     ` Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YYT3zx+P5maUAu5K@ncase \
    --to=ps@pks.im \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.