Re: RFC: A configuration design for future-proofing fsync() configuration

From: Neeraj Singh <nksingh85@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Git List <git@vger.kernel.org>, Patrick Steinhardt <ps@pks.im>,
	Jeff King <peff@peff.net>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Junio C Hamano <gitster@pobox.com>,
	"Neeraj K. Singh" <neerajsi@microsoft.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Eric Wong <e@80x24.org>, Christoph Hellwig <hch@lst.de>,
	Emily Shaffer <emilyshaffer@google.com>
Subject: Re: RFC: A configuration design for future-proofing fsync() configuration
Date: Wed, 17 Nov 2021 14:16:35 -0800	[thread overview]
Message-ID: <CANQDOdcdhfGtPg0PxpXQA5gQ4x9VknKDKCCi4HEB0Z1xgnjKzg@mail.gmail.com> (raw)
In-Reply-To: <211111.86pmr7pk9o.gmgdl@evledraar.gmail.com>

On Wed, Nov 10, 2021 at 5:13 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> On Wed, Nov 10 2021, Neeraj Singh wrote:
>
> > On Wed, Nov 10, 2021 at 04:09:33PM +0100, Ævar Arnfjörð Bjarmason wrote:
> >> I think this sort of config schema would make everyone above happy
> >>
> >> It would:
> >>
> >>  A) Be easy to extend for any future fsync behavior we'd reasonably
> >>     implement
> >>
> >>  B) Not make older git versions die. It's fine if they warn(), but not die.
> >>
> >>  C) Has some pretty contrived key names, but I'm trying to maintain the
> >>     constraint that you can set both fsck.X=Y and
> >>     e.g. fetch.fsck.X=Y. I.e. we should be able to configure things
> >>     globally *and* per-command, like color.*, fsck.* etc.
> >>
> >> Proposal:
> >>
> >>   ; Turns on/off all fsync, whatever the method is. I.e. allows you to
> >>   ; never make any fsync() calls whatsoever (which we have another
> >>   ; in-flight topic for).
> >>
> >>   ; The "false" was controversial, and we could just leave it
> >>   ; unimplemented
> >>   core.fsync = <bool>
> >>
> >>   ; Optional, by default we'd use the most pedantic (I'd call our
> >>   ; current "loose", whether we want to forward-support it is another
> >>   ; matter.
> >>   ;
> >>   ; Whatever names we pick an option like this should ignore (or at most
> >>   ; warn about) values it doesn't know about, not hard die on it.
> >>   ;
> >>   ; Here "bach" is what Neeraj and Patrick are pursuing, a hypothetical
> >>   ; POSIX would be a pedantic way of exhaustively fsyncing everything.
> >>   ;
> >>   ; We'd leave door open to e.g. setting it to "linux:ext4" or whatever,
> >>   ; to do only the work needed on some specific popular FS
> >>   core.fsyncMethod = loose | POSIX | batch | linux:ext4 | NTFS | ...
> >>
> >>   ; Turn on or off entire categories of files we'd like to sync. This
> >>   ; way Neeraj's and Patrick's approach would be to set
> >>   ; core.fsyncMethod=batch, and then core.fsyncGroup=files &
> >>   ; core.fsyncGroup=refs.
> >>
> >>   ; If we learn about a new core.fsyncGroup = xyz in the future a <bool>
> >>   ; in "core.fsyncGroupDefault" will prevail. I.e. if true it's
> >>   ; included, if false not.
> >>   ;
> >>   ; Whether "false" or "true" is the default depends on
> >>   ; core.fsyncMethod. For POSIX it would be true, for "loose" it's
> >>   ; false.
> >>   core.fsyncGroup = files
> >>   core.fsyncGroup = refs
> >>   core.fsyncGroup = objects
> >>
> >> I'm not sure I like calling it "group". Maybe "class", "category"? Doing
> >> it with this structure is extensible to the two-level keys, as noted
> >> above.
> >>
> >>   ; Our existing config knob. When "false" synonymous with:
> >>   ;
> >>   ;     core.fsync = true
> >>   ;     core.fsyncMethod = loose
> >>   ;     core.fsyncGroup = pack
> >>   ;
> >>   ; When "true" synonymous with the same as the above, plus:
> >>   ;     core.fsyncGroup = loose
> >>   ;
> >>   : Or something like that. I.e. we'll fsync *.pack, *.bitmap etc, and ;
> >>   ; probably some other stuff, but not loose objects etc.
> >>   ;
> >>   ; Whatever we fsync now exactly this schema should be generic enough
> >>   ; to support it.
> >>   core.fsyncObjectFiles = <bool>
> >>
> >>   ; A namespace for core.fsyncMethod = <X>. Specific methods will
> >>   ; own this namespace and can configure whatever they want.
> >>   fsyncMethod.<x>.<a> = <b>
> >>
> >> E.g. we might have:
> >>
> >>   fsyncMethod.POSIX.content = true
> >>   fsyncMethod.POSIX.metadata = false
> >>
> >> If we know we'd like to (depending on other config) to fsync things
> >> exhaustively or not, but do different things depending on file content
> >> or metadata. I.e. maybe your FS's fsync() on a file fd always implies a
> >> sync of the metadata, and maybe not.
> >>
> >>   ; Change whatever fsync configuration you want per-command, similar to
> >>   ; fsck.* and fetch.fsck.*
> >>   transfer.fsyncGroup=*
> >>   fetch.fsyncGroup=*
> >>   ...
> >>
> >> 1. https://lore.kernel.org/git/211110.86v910gi9a.gmgdl@evledraar.gmail.com/
> >> 2. https://lore.kernel.org/git/20211028002102.19384-1-e@80x24.org/
> >> 3. https://lore.kernel.org/git/cover.1636544377.git.ps@pks.im/
> > Hi Ævar,
> >
> > Thanks for noticing the backwards compatibility issue with the 'batch' flag. I
> > agree that we need to fix that before committing my changes to master.
> >
> > I'm hoping that we can agree to a version of what you're proposing, but my
> > preference would be to cut out the more granular controls. I'd prefer to see
> > just:
> >       core.fsync = [bool]             - Turn fsyncing on or off.
> >       core.fsyncMethod = [string]     - Controls how it's done (with a non-fatal warn on unrecognized values).
> >       core.fsyncObjectFiles = [bool]  - Sets core.fsync if that setting doesn't already have a value. For back-compat.
>
> I'm fine with something simpler as long as we don't think we'll
> plausibly start painting ourselves into a corner.
>
> But core.fsyncObjectFiles is *not* a setting of a "core.fsync" in the
> sense that Eric suggested we have.
>
> I.e. it's effectively a sort of early and partial Linux-only version of
> what your "batch" mode is. I.e. to skip fsyncing the loose object files,
> and only fsync() the eventual refs we write.
>
> "Sort of" because we'd e.g. fsync packs unconditionally etc, but if we
> make core.fsyncObjectFiles=false be core.fsync=false then we can't have
> a "real" core.fsync=false, i.e. one that guarantees no fsync() calls at
> all.
>
> We could also simply decide that it's a bad setting and we're going to
> deprecate it, but another way is having a generic config layout that can
> express what it's doing and more.
>
> > I don't think either we or the users should have to reason about what it means
> > for some parts of the repo to be fsynced and others not to be. If core.fsync is
> > 'false' and someone gets a weird state after a system crash, no one should be
> > surprised.
>
> Yes. I'm fine with leaving this on the table. I should have be more
> explicit that I'm not suggesting we implement all this exhaustive config
> support, but if we imagine a sensible config schema that is extensible
> (my proposal may or may not be that) then we can implement just 1-2
> variables in it and know that we have room to grow in the future.
>
> > If core.fsync is 'true', and people are running on a reasonable
> > common filesystem, we should be trying to give decent performance and good
> > durability.
>
> Yeah, I just wonder if we can easily provide config to have people
> decide that trade-off themselves.
>
> E.g. from the performance numbers in [1] I might turn off fsyncing when
> we write anything in the working tree.
>
> We don't do that particular thing now, but if we're being pulled in one
> direction of always being fsync-safe by default...
>
> I can also see it being useful to e.g. do:
>
>     gc.fsync = false
>
> Or blacklist other similar batch operations, although with a global knob
> that can also rather easily be:
>
>     git -c core.fsync gc
>
> So maybe the whole "fsck" rationale doesn't apply here.
>
> > It would be nice to loop in some Linux fs developers to find out what can be
> > done on current implementations to get the durability without terrible
> > performance. From reading the docs and mailing threads it looks like the
> > sync_file_range + bulk fsync approach should actually work on the current XFS
> > implementation.
>

After sleeping on it for a while, I'm willing to consolidate the
configuration along the lines that you've specified, but I'd like to
reduce the number of degrees of freedom.

My proposal in Documentation form:

core.fsync::
A comma-separated list of parts of the repository which should be hardened by
calling fsync when created or modified. When an aggregate option is
specified, a subcomponent can be overriden by prefixing it with a '-'. For
example, `core.fsync=all,-index` means "fsync everything except the index".
Items which are not fsync'ed may be lost in the even of an unclean system
shutdown. This setting defaults to `objects,-loose-objects`
+
* `loose-objects` hardens objects added to the repo in loose-object form.
* `packs` hardens objects added to the repo in packfile form and the related
  bitmap and index files.
* `commit-graph` hardens the commit graph file.
* `refs` (future) hardens references when they are modified.
* `index` (future) hardens the index when it is modified.
* `objects` is an aggregate option that includes `loose-objects`, `packs`, and
  `commit-graph`.
* `all` is an aggregate option that syncs all individual components above.
* `none` is an aggregate option that disables fsync completely.

core.fsyncMethod::
A value indicating the strategy Git will use to harden repository data using
fsync and related primitives.
+
* 'default' uses the fsync(2) system call or platform equivalents.
* 'batch' uses APIs such as sync_file_range or equivalent to reduce the number
  of hardware FLUSH CACHE requests sent to the storage hardware.
* 'writeout-only' (future) issues requests to send the writes to the storage
* hardware, but does not send any FLUSH CACHE request.
* 'syncfs' (future) uses the syncfs API, where available, to sync all of the
  files on the same filesystem as the Git repo.

core.fsyncObjectFiles::
If `true`, this legacy setting is equivalent to `core.fsync=objects`. If
`core.fsync` is explicitly specified, then this setting is ignored.

Thanks,
Neeraj