All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Neeraj Singh <nksingh85@gmail.com>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Git List" <git@vger.kernel.org>,
	"Patrick Steinhardt" <ps@pks.im>, "Jeff King" <peff@peff.net>,
	"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
	"Neeraj K. Singh" <neerajsi@microsoft.com>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Eric Wong" <e@80x24.org>, "Christoph Hellwig" <hch@lst.de>,
	"Emily Shaffer" <emilyshaffer@google.com>
Subject: Re: RFC: A configuration design for future-proofing fsync() configuration
Date: Thu, 18 Nov 2021 11:00:25 -0800	[thread overview]
Message-ID: <xmqq35ntxp9y.fsf@gitster.g> (raw)
In-Reply-To: <CANQDOdcdhfGtPg0PxpXQA5gQ4x9VknKDKCCi4HEB0Z1xgnjKzg@mail.gmail.com> (Neeraj Singh's message of "Wed, 17 Nov 2021 14:16:35 -0800")

Neeraj Singh <nksingh85@gmail.com> writes:

> After sleeping on it for a while, I'm willing to consolidate the
> configuration along the lines that you've specified, but I'd like to
> reduce the number of degrees of freedom.
>
> My proposal in Documentation form:
>
> core.fsync::
> A comma-separated list of parts of the repository which should be hardened by
> calling fsync when created or modified. When an aggregate option is
> specified, a subcomponent can be overriden by prefixing it with a '-'. For
> example, `core.fsync=all,-index` means "fsync everything except the index".
> Items which are not fsync'ed may be lost in the even of an unclean system
> shutdown. This setting defaults to `objects,-loose-objects`
> +
> * `loose-objects` hardens objects added to the repo in loose-object form.
> * `packs` hardens objects added to the repo in packfile form and the related
>   bitmap and index files.
> * `commit-graph` hardens the commit graph file.
> * `refs` (future) hardens references when they are modified.
> * `index` (future) hardens the index when it is modified.
> * `objects` is an aggregate option that includes `loose-objects`, `packs`, and
>   `commit-graph`.
> * `all` is an aggregate option that syncs all individual components above.
> * `none` is an aggregate option that disables fsync completely.

I wasn't closely following the discussion at all, but the above
simplification may still even be too fine-grained?  For example,
what does it mean to care less about the robustness of loose objects
than packs or ref updates?  How does an existing fine-grained
classification interact with new classes of filesystem entity we
will introduce under .git in the future?  Imagine that we didn't
have .midx and multi-pack bitmap yet; since 'loose-objects',
'packs', and 'commit-graph' are the only three groups we can choose
to place any "objects and reachability" related data in, we need to
pick one, and choosing 'packs' class may be the choice of least
resistance, the default kitchen-sync category for anything related
to "object".  Or just like 'commit-graph' has its own category,
would we invent a new class and call it 'multi-pack'?

I cannot shake the feeling that these are making everything
unnecessarily complex and adding more things that we need to explain
to the end-user---and the worst part is I doubt it would help the
end-users very much tot understand what gets explained.

> core.fsyncMethod::
> A value indicating the strategy Git will use to harden repository data using
> fsync and related primitives.
> +
> * 'default' uses the fsync(2) system call or platform equivalents.
> * 'batch' uses APIs such as sync_file_range or equivalent to reduce the number
>   of hardware FLUSH CACHE requests sent to the storage hardware.
> * 'writeout-only' (future) issues requests to send the writes to the storage
> * hardware, but does not send any FLUSH CACHE request.
> * 'syncfs' (future) uses the syncfs API, where available, to sync all of the
>   files on the same filesystem as the Git repo.

How would an end-user choose among these?  If they assume that the
version of Git they use is bug-free, is there a reason why they
should ever pick 'default' over 'batch', for example?  Shouldn't we
be the one to choose the best approach on the underlying filesystem
for the users, instead of forcing them to choose?

As implementors, these choices may be of interest and give you a
handy way to compare different design, but I am not sure if we want
to give anything more complex than a binary choice, "default" and
"eatmydata".

> core.fsyncObjectFiles::
> If `true`, this legacy setting is equivalent to `core.fsync=objects`. If
> `core.fsync` is explicitly specified, then this setting is ignored.

I think deprecating this very-specific knob is a good idea,
regardless of how complex we'd want to make the alternative.

Thanks.

  reply	other threads:[~2021-11-18 19:00 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-10 15:09 RFC: A configuration design for future-proofing fsync() configuration Ævar Arnfjörð Bjarmason
2021-11-11  0:47 ` Neeraj Singh
2021-11-11  0:57   ` Ævar Arnfjörð Bjarmason
2021-11-17 22:16     ` Neeraj Singh
2021-11-18 19:00       ` Junio C Hamano [this message]
2021-11-18 19:46         ` Neeraj Singh
2021-11-12  5:54   ` Christoph Hellwig
2021-11-17 18:49     ` Neeraj Singh
2021-11-11 18:03 ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq35ntxp9y.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=e@80x24.org \
    --cc=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    --cc=hch@lst.de \
    --cc=neerajsi@microsoft.com \
    --cc=nksingh85@gmail.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.