Re: [PATCH v2 3/3] refs: add configuration to enable flushing of refs

From: Patrick Steinhardt <ps@pks.im>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Neeraj Singh <nksingh85@gmail.com>,
	git@vger.kernel.org, Jeff King <peff@peff.net>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Junio C Hamano <gitster@pobox.com>, Eric Wong <e@80x24.org>,
	"Neeraj K. Singh" <neerajsi@microsoft.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH v2 3/3] refs: add configuration to enable flushing of refs
Date: Thu, 11 Nov 2021 13:14:03 +0100	[thread overview]
Message-ID: <YY0JC/zS/2rqbZIp@ncase> (raw)
In-Reply-To: <211110.8635o3rbdc.gmgdl@evledraar.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5709 bytes --]

On Wed, Nov 10, 2021 at 09:23:04PM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Nov 10 2021, Neeraj Singh wrote:
> 
> > On Wed, Nov 10, 2021 at 03:49:02PM +0100, Ævar Arnfjörð Bjarmason wrote:
> >> 
> >> On Wed, Nov 10 2021, Patrick Steinhardt wrote:
[snip]
> >> ...[continued from above]: Again, per my potentially wrong understanding
> >> of syncing a "x" and "y" via an fsync of a subsequent "z" that's
> >> adjacent on the FS to those two.
> >
> > I suspect Patrick is concerned about the case where the worktree is on
> > a separate filesystem from the main repo, so there might be a motivation
> > to sync the worktree refs separately. Is that right, Patrick?
> 
> But he's syncing .git/objects, not .git/worktrees/<NAME>/refs/, no?

That'd be a bug then ;) My intent was to sync .git/refs and the
per-worktree refs.

[snip]
> > In my view there are two separable concerns:
> >
> >     1) What durability do we want for the user's data when an entire 'git'
> >        command completes? I.e. if I run 'git add <1000 files>; git commit' and the
> >        system loses power after both commands return, should I see all of those files
> >        in the committed state of the repo?
> >
> >     2) What internal consistency do we want in the git databases (ODB and REFS) if
> >        we crash midway through a 'git' command? I.e. if 'git add <1000 files>' crashes
> >        before returning, can there be an invalid object or a torn ref state?
> >
> > If were only concerned with (1), then doing a single fsync at the end of the high-level
> > git command would be sufficient. However, (2) requires more fsyncs to provide barriers
> > between different phases internal to the git commands. For instance, we need a barrier
> > between creating the ODB state for a commit (blobs, trees, commit obj) and the refs
> > pointing to it.
> >
> > I am not concerned with a few additional fsyncs for (2). On recent mainstream filesystems/ssds
> > each fsync may cost tens of milliseconds, so the difference between 1 to 3 fsyncs would not
> > be user perceptible. However, going from a couple fsyncs to 1000 fsyncs (one per blob added)
> > would become apparent to the user.
> >
> > The more optimal way to handle consistency and durability is Write-ahead-logging with log
> > replay on crash recovery. That approach can reduce the number of fsyncs in the active workload
> > to at most two (one to sync the log with a commit record and then one before truncating the
> > log after updating the main database). At that point, however, I think it would be best to
> > use an existing database engine with some modifications to create a good data layout for git.
> 
> I think that git should safely store your data by default, we clearly
> don't do that well enough in some cases, and should fix it.
> 
> But there's also cases where people would much rather have performance,
> and I think we should support that. E.g. running git in CI, doing a
> one-off import/fetch that you'll invoke "sync(1)" yourself after
> etc. This is the direction Eric Wong's patches are going into.
> 
> I understand from his patches/comments that you're not correct that just
> 1-3 fsyncs are OK, i.e. for some use-cases 0 is OK, and any >0 is too
> many, since it'll force a flush of the whole disk or something.
> 
> Even when git is is operating in a completely safe mode I think we'd
> still have license to play it fast and loose with /some/ user data,
> because users don't really care about all of their data in the .git/
> directory.

I think we need to discern two important cases: the first case is us
losing data, and the second case is us leaving the repository in a
corrupt state. I'm okay-ish with the first case: if your machine crashes
it's not completely unexpected that losing data would be a natural
consequence (well, losing the data that's currently in transit at least).

But we should make sure that we're not leaving the repo in a corrupt
state under any circumstance, and that's exactly what can happen right
now because we don't flush refs to disk before doing the renames.
Assuming we've got semantics correct, we are thus forced to do a page
cache flush to make sure that data is on disk before doing a rename to
not corrupt the repo. But in the case where we're fine with losing some
data, we may skip doing the final fsync to also synchronize the rename
to disk.

Patrick

> I.e. you do care about the *.pack file, but an associated *.bitmap is a
> derived file, so we probably don't need to fsync that too. Is it worth
> worrying about? Probably not, but that's what I had in mind with the
> above-linked proposed config schema.
> 
> Similarly for say writing 1k loose objects I think it would be
> completely fine to end up with a corrupt object during a "git fetch", as
> long as we also guarantee that we can gracefully recover from that
> corruption.
> 
> I.e. 1) we didn't update the ref(s) involved 2) we know the FS has the
> sorts of properties you're aiming for with "batch".
> 
> Now, as some patches I've had to object*.c recently show we don't handle
> those cases gracefully either. I.e. if we find a short loose object
> we'll panic, even if we'd be perfectly capable of getting it from
> elsewhere, and nothing otherwise references it.
> 
> But if we fixed that and gc/fsck/fetch etc. learned to deal with that
> content I don't see why wouldn't e.g. default to not fsyncing loose
> objects when invoked from say "fetch" by default, depending on FS/OS
> detection, and if we couldn't say it was safe defaulted to some "POSIX"
> mode that would be pedantic but slow and safe.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]