git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Neeraj Singh <nksingh85@gmail.com>
Cc: gitgitgadget@gmail.com, Johannes.Schindelin@gmx.de,
	bagasdotme@gmail.com, git@vger.kernel.org,
	neerajsi@microsoft.com, newren@gmail.com, ps@pks.im,
	rsbecker@nexbridge.com, sandals@crustytoothpaste.net
Subject: do we have too much fsync() configuration in 'next'? (was: [PATCH v7] core.fsync: documentation and user-friendly aggregate options)
Date: Wed, 23 Mar 2022 15:20:52 +0100	[thread overview]
Message-ID: <220323.86fsn8ohg8.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <20220315191245.17990-1-neerajsi@microsoft.com>


On Tue, Mar 15 2022, Neeraj Singh wrote:

I know this is probably 80% my fault by egging you on about initially
adding the wildmatch() based thing you didn't go for.

But having looked at this with fresh eyes quite deeply I really think
we're severely over-configuring things here:

> +core.fsync::
> +	A comma-separated list of components of the repository that
> +	should be hardened via the core.fsyncMethod when created or
> +	modified.  You can disable hardening of any component by
> +	prefixing it with a '-'.  Items that are not hardened may be
> +	lost in the event of an unclean	system shutdown. Unless you
> +	have special requirements, it is recommended that you leave
> +	this option empty or pick one of `committed`, `added`,
> +	or `all`.
> ++
> +When this configuration is encountered, the set of components starts with
> +the platform default value, disabled components are removed, and additional
> +components are added. `none` resets the state so that the platform default
> +is ignored.
> ++
> +The empty string resets the fsync configuration to the platform
> +default. The default on most platforms is equivalent to
> +`core.fsync=committed,-loose-object`, which has good performance,
> +but risks losing recent work in the event of an unclean system shutdown.
> ++
> +* `none` clears the set of fsynced components.
> +* `loose-object` hardens objects added to the repo in loose-object form.
> +* `pack` hardens objects added to the repo in packfile form.
> +* `pack-metadata` hardens packfile bitmaps and indexes.
> +* `commit-graph` hardens the commit graph file.
> +* `index` hardens the index when it is modified.
> +* `objects` is an aggregate option that is equivalent to
> +  `loose-object,pack`.
> +* `derived-metadata` is an aggregate option that is equivalent to
> +  `pack-metadata,commit-graph`.
> +* `committed` is an aggregate option that is currently equivalent to
> +  `objects`. This mode sacrifices some performance to ensure that work
> +  that is committed to the repository with `git commit` or similar commands
> +  is hardened.
> +* `added` is an aggregate option that is currently equivalent to
> +  `committed,index`. This mode sacrifices additional performance to
> +  ensure that the results of commands like `git add` and similar operations
> +  are hardened.
> +* `all` is an aggregate option that syncs all individual components above.
> +
>  core.fsyncMethod::
>  	A value indicating the strategy Git will use to harden repository data
>  	using fsync and related primitives.

On top of my
https://lore.kernel.org/git/RFC-patch-v2-7.7-a5951366c6e-20220323T140753Z-avarab@gmail.com/
which makes the tmp-objdir part of your not-in-next-just-seen follow-up
series configurable via "fsyncMethod.batch.quarantine" I really think we
should just go for something like the belwo patch (note that
misspelled/mistook "bulk" for "batch" in that linked-t patch, fixed
below.

I.e. I think we should just do our default fsync() of everything, and
probably SOON make the fsync-ing of loose objects the default. Those who
care about performance will have "batch" (or "writeout-only"), which we
can have OS-specific detections for.

But really, all of the rest of this is unduly boxing us into
overconfiguration that I think nobody really needs.

If someone really needs this level of detail they can LD_PRELOAD
something to have fsync intercept fd's and paths, and act appropriately.

Worse, as the RFC series I sent
(https://lore.kernel.org/git/RFC-cover-v2-0.7-00000000000-20220323T140753Z-avarab@gmail.com/)
shows we can and should "batch" up fsync() operations across these
configuration boundaries, which this level of configuration would seem
to preclude.

Or, we'd need to explain why "core.fsync=loose-object" won't *actually*
call fsync() on a single loose object's fd under "batch" as I had to do
on top of this in
https://lore.kernel.org/git/RFC-patch-v2-6.7-c20301d7967-20220323T140753Z-avarab@gmail.com/

The same is going to apply for almost all of the rest of these
configuration categories.

I.e. a natural follow-up to e.g. batching across objects & index as I'm
doing in
https://lore.kernel.org/git/RFC-patch-v2-4.7-61f4f3d7ef4-20220323T140753Z-avarab@gmail.com/
is to do likewise for all the PACK-related stuff before we rename it
in-place. Or even have "git gc" issue only a single fsync() for all of
PACKs, their metadata files, commit-graph etc., and then rename() things
in-place as appropriate afterwards.

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index 365a12dc7ae..536238e209b 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -548,49 +548,35 @@ core.whitespace::
   errors. The default tab width is 8. Allowed values are 1 to 63.
 
 core.fsync::
-	A comma-separated list of components of the repository that
-	should be hardened via the core.fsyncMethod when created or
-	modified.  You can disable hardening of any component by
-	prefixing it with a '-'.  Items that are not hardened may be
-	lost in the event of an unclean	system shutdown. Unless you
-	have special requirements, it is recommended that you leave
-	this option empty or pick one of `committed`, `added`,
-	or `all`.
-+
-When this configuration is encountered, the set of components starts with
-the platform default value, disabled components are removed, and additional
-components are added. `none` resets the state so that the platform default
-is ignored.
-+
-The empty string resets the fsync configuration to the platform
-default. The default on most platforms is equivalent to
-`core.fsync=committed,-loose-object`, which has good performance,
-but risks losing recent work in the event of an unclean system shutdown.
-+
-* `none` clears the set of fsynced components.
-* `loose-object` hardens objects added to the repo in loose-object form.
-* `pack` hardens objects added to the repo in packfile form.
-* `pack-metadata` hardens packfile bitmaps and indexes.
-* `commit-graph` hardens the commit graph file.
-* `index` hardens the index when it is modified.
-* `objects` is an aggregate option that is equivalent to
-  `loose-object,pack`.
-* `derived-metadata` is an aggregate option that is equivalent to
-  `pack-metadata,commit-graph`.
-* `committed` is an aggregate option that is currently equivalent to
-  `objects`. This mode sacrifices some performance to ensure that work
-  that is committed to the repository with `git commit` or similar commands
-  is hardened.
-* `added` is an aggregate option that is currently equivalent to
-  `committed,index`. This mode sacrifices additional performance to
-  ensure that the results of commands like `git add` and similar operations
-  are hardened.
-* `all` is an aggregate option that syncs all individual components above.
+	A boolen defaulting to `true`. To ensure data integrity git
+	will fsync() its objects, index and refu updates etc. This can
+	be set to `false` to disable `fsync()`-ing.
++
+Only set this to `false` if you know what you're doing, and are
+prepared to deal with data corruption. Valid use-cases include
+throwaway uses of repositories on ramdisks, one-off mass-imports
+followed by calling `sync(1)` etc.
++
+Note that the syncing of loose objects is currently excluded from
+`core.fsync=true`. To turn on all fsync-ing you'll need
+`core.fsync=true` and `core.fsyncObjectFiles=true`, but see
+`core.fsyncMethod=batch` below for a much faster alternative that's
+just as safe on various modern OS's.
++
+The default is in flux and may change in the future, in particular the
+equivalent of the already-deprecated `core.fsyncObjectFiles` setting
+might soon default to `true`, and `core.fsyncMethod`'s default of
+`fsync` might default to a setting deemed to be safe on the local OS,
+suc has `batch` or `writeout-only`
 
 core.fsyncMethod::
 	A value indicating the strategy Git will use to harden repository data
 	using fsync and related primitives.
 +
+Defaults to `fsync`, but as discussed for `core.fsync` above might
+change to use one of the values below taking advantage of
+platform-specific "faster `fsync()`".
++
 * `fsync` uses the fsync() system call or platform equivalents.
 * `writeout-only` issues pagecache writeback requests, but depending on the
   filesystem and storage hardware, data added to the repository may not be
@@ -680,8 +666,8 @@ backed up by any standard (e.g. POSIX), but worked in practice on some
 Linux setups.
 +
 Nowadays you should almost certainly want to use
-`core.fsync=loose-object` instead in combination with
-`core.fsyncMethod=bulk`, and possibly with
+`core.fsync=true` instead in combination with
+`core.fsyncMethod=batch`, and possibly with
 `fsyncMethod.batch.quarantine=true`, see above. On modern OS's (Linux,
 OSX, Windows) that gives you most of the performance benefit of
 `core.fsyncObjectFiles=false` with all of the safety of the old

  parent reply	other threads:[~2022-03-23 14:46 UTC|newest]

Thread overview: 122+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-04  3:28 [PATCH 0/2] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2021-12-04  3:28 ` [PATCH 1/2] fsync: add writeout-only mode for fsyncing repo data Neeraj Singh via GitGitGadget
2021-12-06  7:54   ` Neeraj Singh
2021-12-04  3:28 ` [PATCH 2/2] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2021-12-07  2:46 ` [PATCH v2 0/3] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2021-12-07  2:46   ` [PATCH v2 1/3] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2021-12-07 11:44     ` Patrick Steinhardt
2021-12-07 12:14       ` Ævar Arnfjörð Bjarmason
2021-12-07 23:29       ` Neeraj Singh
2021-12-07 12:18     ` Ævar Arnfjörð Bjarmason
2021-12-07 23:58       ` Neeraj Singh
2021-12-07  2:46   ` [PATCH v2 2/3] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2021-12-07 11:53     ` Patrick Steinhardt
2021-12-07 20:46       ` Neeraj Singh
2021-12-07 12:29     ` Ævar Arnfjörð Bjarmason
2021-12-07 21:44       ` Neeraj Singh
2021-12-08 10:05         ` Ævar Arnfjörð Bjarmason
2021-12-09  0:14           ` Neeraj Singh
2021-12-09  0:44             ` Junio C Hamano
2021-12-09  4:08             ` Ævar Arnfjörð Bjarmason
2021-12-09  6:18               ` Neeraj Singh
2022-01-18 23:50                 ` Neeraj Singh
2022-01-19 15:28                   ` Ævar Arnfjörð Bjarmason
2022-01-19 14:52                 ` Ævar Arnfjörð Bjarmason
2022-01-28  1:28                   ` Neeraj Singh
2021-12-07  2:46   ` [PATCH v2 3/3] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2021-12-07 11:56   ` [PATCH v2 0/3] A design for future-proofing fsync() configuration Patrick Steinhardt
2021-12-08  0:44     ` Neeraj Singh
2021-12-09  0:57   ` [PATCH v3 0/4] " Neeraj K. Singh via GitGitGadget
2021-12-09  0:57     ` [PATCH v3 1/4] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2021-12-09  0:57     ` [PATCH v3 2/4] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2021-12-09  0:57     ` [PATCH v3 3/4] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2021-12-09  0:57     ` [PATCH v3 4/4] core.fsync: add a `derived-metadata` aggregate option Neeraj Singh via GitGitGadget
2022-01-08  1:13     ` [PATCH v3 0/4] A design for future-proofing fsync() configuration Neeraj Singh
2022-01-09  0:55       ` rsbecker
2022-01-10 19:00         ` Neeraj Singh
2022-02-01  3:33     ` [PATCH v4 " Neeraj K. Singh via GitGitGadget
2022-02-01  3:33       ` [PATCH v4 1/4] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2022-02-01  3:33       ` [PATCH v4 2/4] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2022-02-02  0:51         ` Junio C Hamano
2022-02-02  1:42           ` Junio C Hamano
2022-02-11 21:18             ` Neeraj Singh
2022-02-11 22:19               ` Junio C Hamano
2022-02-11 23:04                 ` Neeraj Singh
2022-02-11 23:15                   ` Junio C Hamano
2022-02-12  0:39                     ` rsbecker
2022-02-14  7:04                     ` Patrick Steinhardt
2022-02-14 17:17                       ` Junio C Hamano
2022-03-09 13:42                         ` Patrick Steinhardt
2022-03-09 18:50                           ` Ævar Arnfjörð Bjarmason
2022-03-09 20:03                           ` Junio C Hamano
2022-03-10 12:33                             ` Patrick Steinhardt
2022-03-10 17:15                               ` Junio C Hamano
2022-03-09 20:05                           ` Neeraj Singh
2022-02-11 20:38           ` Neeraj Singh
2022-02-01  3:33       ` [PATCH v4 3/4] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2022-02-01  3:33       ` [PATCH v4 4/4] core.fsync: add a `derived-metadata` aggregate option Neeraj Singh via GitGitGadget
2022-03-09 23:03       ` [PATCH v5 0/5] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2022-03-09 23:03         ` [PATCH v5 1/5] wrapper: move inclusion of CSPRNG headers the wrapper.c file Neeraj Singh via GitGitGadget
2022-03-09 23:29           ` Junio C Hamano
2022-03-10  1:21             ` Neeraj Singh
2022-03-10  1:26           ` brian m. carlson
2022-03-10  1:56             ` Neeraj Singh
2022-03-09 23:03         ` [PATCH v5 2/5] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2022-03-09 23:48           ` Junio C Hamano
2022-03-09 23:03         ` [PATCH v5 3/5] core.fsync: introduce granular fsync control Neeraj Singh via GitGitGadget
2022-03-10  0:21           ` Junio C Hamano
2022-03-10  2:53             ` Neeraj Singh
2022-03-10  7:19               ` Junio C Hamano
2022-03-10 18:38                 ` Neeraj Singh
2022-03-10 18:44                   ` Junio C Hamano
2022-03-10 19:57                     ` Junio C Hamano
2022-03-10 20:25                       ` Neeraj Singh
2022-03-10 21:17                         ` Junio C Hamano
2022-03-10 13:11               ` Johannes Schindelin
2022-03-10 17:18               ` Junio C Hamano
2022-03-09 23:03         ` [PATCH v5 4/5] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2022-03-09 23:03         ` [PATCH v5 5/5] core.fsync: documentation and user-friendly aggregate options Neeraj Singh via GitGitGadget
2022-03-10  9:53         ` Future-proofed syncing of refs Patrick Steinhardt
2022-03-10  9:53         ` [PATCH 6/8] core.fsync: add `fsync_component()` wrapper which doesn't die Patrick Steinhardt
2022-03-10 17:34           ` Junio C Hamano
2022-03-10 18:40             ` Neeraj Singh
2022-03-10  9:53         ` [PATCH 7/8] core.fsync: new option to harden loose references Patrick Steinhardt
2022-03-10 18:25           ` Junio C Hamano
2022-03-10 19:03             ` Neeraj Singh
2022-03-10 22:54           ` Neeraj Singh
2022-03-11  6:40           ` Junio C Hamano
2022-03-11  9:15             ` Patrick Steinhardt
2022-03-11  9:36               ` Ævar Arnfjörð Bjarmason
2022-03-10  9:53         ` [PATCH 8/8] core.fsync: new option to harden packed references Patrick Steinhardt
2022-03-10 18:28           ` Junio C Hamano
2022-03-11  9:10             ` Patrick Steinhardt
2022-03-10 22:43         ` [PATCH v6 0/6] A design for future-proofing fsync() configuration Neeraj K. Singh via GitGitGadget
2022-03-10 22:43           ` [PATCH v6 1/6] wrapper: make inclusion of Windows csprng header tightly scoped Neeraj Singh via GitGitGadget
2022-03-10 22:43           ` [PATCH v6 2/6] core.fsyncmethod: add writeout-only mode Neeraj Singh via GitGitGadget
2022-03-10 22:43           ` [PATCH v6 3/6] core.fsync: introduce granular fsync control infrastructure Neeraj Singh via GitGitGadget
2022-03-10 22:43           ` [PATCH v6 4/6] core.fsync: add configuration parsing Neeraj Singh via GitGitGadget
2022-03-28 11:06             ` Jiang Xin
2022-03-28 19:45               ` Neeraj Singh
2022-03-10 22:43           ` [PATCH v6 5/6] core.fsync: new option to harden the index Neeraj Singh via GitGitGadget
2022-03-10 22:43           ` [PATCH v6 6/6] core.fsync: documentation and user-friendly aggregate options Neeraj Singh via GitGitGadget
2022-03-15 19:12             ` [PATCH v7] " Neeraj Singh
2022-03-15 19:32               ` Junio C Hamano
2022-03-15 19:56                 ` Neeraj Singh
2022-03-23 14:20               ` Ævar Arnfjörð Bjarmason [this message]
2022-03-25 21:24                 ` do we have too much fsync() configuration in 'next'? (was: [PATCH v7] core.fsync: documentation and user-friendly aggregate options) Neeraj Singh
2022-03-26  0:24                   ` Ævar Arnfjörð Bjarmason
2022-03-26  1:23                     ` do we have too much fsync() configuration in 'next'? Junio C Hamano
2022-03-26  1:25                     ` do we have too much fsync() configuration in 'next'? (was: [PATCH v7] core.fsync: documentation and user-friendly aggregate options) Neeraj Singh
2022-03-26 15:31                       ` Ævar Arnfjörð Bjarmason
2022-03-27  5:27                         ` Neeraj Singh
2022-03-27 12:43                           ` Ævar Arnfjörð Bjarmason
2022-03-28 10:56                             ` Patrick Steinhardt
2022-03-28 11:25                               ` Ævar Arnfjörð Bjarmason
2022-03-28 19:56                                 ` Neeraj Singh
2022-03-30 16:59                                   ` Neeraj Singh
2022-03-10 23:34           ` [PATCH v6 0/6] A design for future-proofing fsync() configuration Junio C Hamano
2022-03-11  0:03             ` Neeraj Singh
2022-03-11 18:50               ` Neeraj Singh
2022-03-13 23:50             ` Junio C Hamano
2022-03-11  9:58           ` [PATCH v2] core.fsync: new option to harden references Patrick Steinhardt
2022-03-25  6:11             ` SZEDER Gábor

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=220323.86fsn8ohg8.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=bagasdotme@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=neerajsi@microsoft.com \
    --cc=newren@gmail.com \
    --cc=nksingh85@gmail.com \
    --cc=ps@pks.im \
    --cc=rsbecker@nexbridge.com \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).