Re: RFC: A configuration design for future-proofing fsync() configuration

From: Neeraj Singh <nksingh85@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Git List" <git@vger.kernel.org>,
	"Patrick Steinhardt" <ps@pks.im>, "Jeff King" <peff@peff.net>,
	"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
	"Neeraj K. Singh" <neerajsi@microsoft.com>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Eric Wong" <e@80x24.org>, "Christoph Hellwig" <hch@lst.de>,
	"Emily Shaffer" <emilyshaffer@google.com>,
	"Derrick Stolee" <stolee@gmail.com>
Subject: Re: RFC: A configuration design for future-proofing fsync() configuration
Date: Thu, 18 Nov 2021 11:46:59 -0800	[thread overview]
Message-ID: <CANQDOdc+j0PgRJw0bzTLoAnSq=taabXSE4r9jrYczNaBHX9XuQ@mail.gmail.com> (raw)
In-Reply-To: <xmqq35ntxp9y.fsf@gitster.g>

On Thu, Nov 18, 2021 at 11:00 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Neeraj Singh <nksingh85@gmail.com> writes:
>
> > After sleeping on it for a while, I'm willing to consolidate the
> > configuration along the lines that you've specified, but I'd like to
> > reduce the number of degrees of freedom.
> >
> > My proposal in Documentation form:
> >
> > core.fsync::
> > A comma-separated list of parts of the repository which should be hardened by
> > calling fsync when created or modified. When an aggregate option is
> > specified, a subcomponent can be overriden by prefixing it with a '-'. For
> > example, `core.fsync=all,-index` means "fsync everything except the index".
> > Items which are not fsync'ed may be lost in the even of an unclean system
> > shutdown. This setting defaults to `objects,-loose-objects`
> > +
> > * `loose-objects` hardens objects added to the repo in loose-object form.
> > * `packs` hardens objects added to the repo in packfile form and the related
> >   bitmap and index files.
> > * `commit-graph` hardens the commit graph file.
> > * `refs` (future) hardens references when they are modified.
> > * `index` (future) hardens the index when it is modified.
> > * `objects` is an aggregate option that includes `loose-objects`, `packs`, and
> >   `commit-graph`.
> > * `all` is an aggregate option that syncs all individual components above.
> > * `none` is an aggregate option that disables fsync completely.
>
> I wasn't closely following the discussion at all, but the above
> simplification may still even be too fine-grained?  For example,
> what does it mean to care less about the robustness of loose objects
> than packs or ref updates?  How does an existing fine-grained
> classification interact with new classes of filesystem entity we
> will introduce under .git in the future?  Imagine that we didn't
> have .midx and multi-pack bitmap yet; since 'loose-objects',
> 'packs', and 'commit-graph' are the only three groups we can choose
> to place any "objects and reachability" related data in, we need to
> pick one, and choosing 'packs' class may be the choice of least
> resistance, the default kitchen-sync category for anything related
> to "object".  Or just like 'commit-graph' has its own category,
> would we invent a new class and call it 'multi-pack'?
>
> I cannot shake the feeling that these are making everything
> unnecessarily complex and adding more things that we need to explain
> to the end-user---and the worst part is I doubt it would help the
> end-users very much tot understand what gets explained.
 I agree with you that this is fairly complex. I did two things to
come up with this specific list:
1) Looked at what we're fsyncing today (most of these items are being
synced through the CSUM_FSYNC flag).  Some of these things should
perhaps not be fsynced if they are derived metadata that can be
reconstructed easily.  For instance, can the commit-graph file be
recomputed easily enough from the ODB and the refs? How hard is it to
reconstruct the pack-indexes? Maybe they should get their own item.
2) Thought about what's necessary to be able to retrieve data out of
git after a series of commands followed by a system crash.  If I fetch
a repo, modify some worktree files, add the modifications, and then
commit them, what does Git need to persist to not lose unique work?
We need to sync the packfiles since they form the base of the commit
graph and trees and it may be difficult to construct a sane repo if we
have objects without their dependencies. We obviously need to sync the
new objects the user is adding. We need to sync the index after 'add'
in case we crash between 'add' and 'commit', so the user can find
their new objects.  Lastly we need to sync the refs after a commit so
that the user can find the added objects after switching branches.

I also wanted to make sure we could express the current state of
fsyncing so that users could go back to that if the cost of syncing
some particular thing is too high in their workload.

I expect that people should really specify `objects` to sync all of
the things that comprise the object store and leave it to us to decide
which subcomponents are considered derived metadata.

>
> > core.fsyncMethod::
> > A value indicating the strategy Git will use to harden repository data using
> > fsync and related primitives.
> > +
> > * 'default' uses the fsync(2) system call or platform equivalents.
> > * 'batch' uses APIs such as sync_file_range or equivalent to reduce the number
> >   of hardware FLUSH CACHE requests sent to the storage hardware.
> > * 'writeout-only' (future) issues requests to send the writes to the storage
> > * hardware, but does not send any FLUSH CACHE request.
> > * 'syncfs' (future) uses the syncfs API, where available, to sync all of the
> >   files on the same filesystem as the Git repo.
>
> How would an end-user choose among these?  If they assume that the
> version of Git they use is bug-free, is there a reason why they
> should ever pick 'default' over 'batch', for example?  Shouldn't we
> be the one to choose the best approach on the underlying filesystem
> for the users, instead of forcing them to choose?
>
> As implementors, these choices may be of interest and give you a
> handy way to compare different design, but I am not sure if we want
> to give anything more complex than a binary choice, "default" and
> "eatmydata".

Maybe `default` should be renamed `fsync` to indicate the specific
action to be performed and no value can be "let git decide".  I think
the "eatmydata" option would be core.fsync=none and core.fsyncMethod
would then be ignored.  One reason to have this option would be to
allow distributors and organizations to choose a good config based on
the actual filesystem them deploy on.  On Windows, it would be clear
that we should use 'batch' because we're committed to making sure that
setting is actually safe on our filesystems (we'll fix bugs in the FS
if we find out that people are reporting corrupt repos).  Given that
the Linux and POSIX durability situation is really murky, it's hard to
see how Git can give any useful guarantee on those platforms.

>
> > core.fsyncObjectFiles::
> > If `true`, this legacy setting is equivalent to `core.fsync=objects`. If
> > `core.fsync` is explicitly specified, then this setting is ignored.
>
> I think deprecating this very-specific knob is a good idea,
> regardless of how complex we'd want to make the alternative.
>

Glad you agree.  I hope we see others weigh in on the tradeoff between
complexity and control.

Thanks,
Neeraj