git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Jeff King <peff@peff.net>
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Neeraj Singh <neerajsi@microsoft.com>,
	git@vger.kernel.org
Subject: Re: [PATCH] refs: sync loose refs to disk before committing them
Date: Wed, 10 Nov 2021 10:16:39 +0100	[thread overview]
Message-ID: <YYuN971dStHrlJh3@ncase> (raw)
In-Reply-To: <YYuEq42toR6mycem@coredump.intra.peff.net>

[-- Attachment #1: Type: text/plain, Size: 3193 bytes --]

On Wed, Nov 10, 2021 at 03:36:59AM -0500, Jeff King wrote:
> On Tue, Nov 09, 2021 at 12:25:46PM +0100, Patrick Steinhardt wrote:
> 
> > So I've finally found the time to have another look at massaging this
> > into the ref_transaction mechanism. If we do want to batch the fsync(3P)
> > calls, then we basically have two different alternatives:
> > 
> >     1. We first lock all loose refs by creating the respective lock
> >        files and writing the updated ref into that file. We keep the
> >        file descriptor open such that we can then flush them all in one
> >        go.
> > 
> >     2. Same as before, we lock all refs and write the updated pointers
> >        into the lockfiles, but this time we close each lockfile after
> >        having written to it. Later, we reopen them all to fsync(3P) them
> >        to disk.
> > 
> > I'm afraid both alternatives aren't any good: the first alternative
> > risks running out of file descriptors if you queue up lots of refs. And
> > the second alternative is going to be slow, especially so on Windows if
> > I'm not mistaken.
> 
> I agree the first is a dead end. I had imagined something like the
> second, but I agree that we'd have to measure the cost of re-opening.
> It's too bad there is not a syscall to sync a particular set of paths
> (or even better, a directory tree recursively).
> 
> There is another option, though: the batch-fsync code for objects does a
> "cheap" fsync while we have the descriptor open, and then later triggers
> a to-disk sync on a separate file. My understanding is that this works
> because modern filesystems will make sure the data write is in the
> journal on the cheap sync, and then the separate-file sync makes sure
> the journal goes to disk.
> 
> We could do something like that here. In fact, if you don't care about
> durability and just filesystem corruption, then you only care about the
> first sync (because the bad case is when the rename gets journaled but
> the data write doesn't).

Ah, interesting. That does sound like a good way forward to me, thanks
for the pointers!

Patrick

> In fact, even if you did choose to re-open and fsync each one, that's
> still sequential. We'd need some way to tell the kernel to sync them all
> at once. The batch-fsync trickery above is one such way (I haven't
> tried, but I wonder if making a bunch of fsync calls in parallel would
> work similarly).
> 
> > So with both not being feasible, we'll likely have to come up with a
> > more complex scheme if we want to batch-sync files. One idea would be to
> > fsync(3P) all lockfiles every $n refs, but it adds complexity in a place
> > where I'd really like to have things as simple as possible. It also
> > raises the question what $n would have to be.
> 
> I do think syncing every $n would not be too hard to implement. It could
> all be hidden behind a state machine API that collects lock files and
> flushes when it sees fit. You'd just call a magic "batch_fsync_and_close"
> instead of "fsync() and close()", though you would have to remember to
> do a final flush call to tell it there are no more coming.
> 
> -Peff

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2021-11-10  9:17 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-04 12:38 [PATCH] refs: sync loose refs to disk before committing them Patrick Steinhardt
2021-11-04 13:14 ` Ævar Arnfjörð Bjarmason
2021-11-04 14:51   ` Patrick Steinhardt
2021-11-04 21:24   ` Junio C Hamano
2021-11-04 22:36     ` Neeraj Singh
2021-11-05  1:40       ` Junio C Hamano
2021-11-05  6:36         ` Jeff King
2021-11-05  8:35       ` Ævar Arnfjörð Bjarmason
2021-11-05  9:04         ` Jeff King
2021-11-05  7:07 ` Jeff King
2021-11-05  7:17   ` Jeff King
2021-11-05  9:12     ` Johannes Schindelin
2021-11-05  9:22       ` Patrick Steinhardt
2021-11-05  9:34       ` Jeff King
2021-11-09 11:25         ` Patrick Steinhardt
2021-11-10  8:36           ` Jeff King
2021-11-10  9:16             ` Patrick Steinhardt [this message]
2021-11-10 11:40 ` [PATCH v2 0/3] " Patrick Steinhardt
2021-11-10 11:40   ` [PATCH v2 1/3] wrapper: handle EINTR in `git_fsync()` Patrick Steinhardt
2021-11-10 14:33     ` Johannes Schindelin
2021-11-10 14:39     ` Ævar Arnfjörð Bjarmason
2021-11-10 11:40   ` [PATCH v2 2/3] wrapper: provide function to sync directories Patrick Steinhardt
2021-11-10 14:40     ` Ævar Arnfjörð Bjarmason
2021-11-10 11:41   ` [PATCH v2 3/3] refs: add configuration to enable flushing of refs Patrick Steinhardt
2021-11-10 14:49     ` Ævar Arnfjörð Bjarmason
2021-11-10 19:15       ` Neeraj Singh
2021-11-10 20:23         ` Ævar Arnfjörð Bjarmason
2021-11-11  0:03           ` Neeraj Singh
2021-11-11 12:14           ` Patrick Steinhardt
2021-11-11 12:06       ` Patrick Steinhardt
2021-11-11  0:18     ` Neeraj Singh
2021-11-10 14:44   ` [PATCH v2 0/3] refs: sync loose refs to disk before committing them Johannes Schindelin
2021-11-10 20:45   ` Jeff King
2021-11-11 11:47     ` Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YYuN971dStHrlJh3@ncase \
    --to=ps@pks.im \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=neerajsi@microsoft.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).