git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* RFC: A configuration design for future-proofing fsync() configuration
@ 2021-11-10 15:09 Ævar Arnfjörð Bjarmason
  2021-11-11  0:47 ` Neeraj Singh
  2021-11-11 18:03 ` Junio C Hamano
  0 siblings, 2 replies; 9+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-10 15:09 UTC (permalink / raw)
  To: git
  Cc: Patrick Steinhardt, Jeff King, Johannes Schindelin,
	Junio C Hamano, Neeraj K. Singh, Linus Torvalds, Eric Wong,
	Christoph Hellwig, Emily Shaffer

As a follow-up to various fsync topics in-flight I've been encouraging
those involved to come up with some way to configure fsync() in a way
that'll make holistic sense in the end-state.

Continuing a discussion from [1] currently we have:

    ; Defaults to 'false'
    core.fsyncObjectFiles = [true|false]

In master..next this has been extended to this by Neeraj:

   core.fsyncObjectFiles = [true|false|batch]

Which, as an aside I hadn't considered before and I think we need to
change before it lands on "master", we really don't want config users
want to enable that makes older versions hard die. It's annoying to want
to configure a new thing and not being able to put it in .gitconfig
because older versions die on it:

    $ git -c core.fsyncObjectFiles=batch status; echo $?
    fatal: bad boolean config value 'batch' for 'core.fsyncobjectfiles'
    128

Then there's Eric Wong's proposed[2]:

    core.fsync = <bool>

And now Patrick Steinhardt has a proposal to extend Neeraj's with[3]:

    ; Like core.fsyncObjectFiles, but apparently for .git/refs, not
    ; .git/objects (but see my confusion on that topic in [1])
    core.fsyncRefFiles = [<bool>|batch]

I think this sort of config schema would make everyone above happy

It would:

 A) Be easy to extend for any future fsync behavior we'd reasonably
    implement
 
 B) Not make older git versions die. It's fine if they warn(), but not die.

 C) Has some pretty contrived key names, but I'm trying to maintain the
    constraint that you can set both fsck.X=Y and
    e.g. fetch.fsck.X=Y. I.e. we should be able to configure things
    globally *and* per-command, like color.*, fsck.* etc.

Proposal:

  ; Turns on/off all fsync, whatever the method is. I.e. allows you to
  ; never make any fsync() calls whatsoever (which we have another
  ; in-flight topic for).

  ; The "false" was controversial, and we could just leave it
  ; unimplemented
  core.fsync = <bool>

  ; Optional, by default we'd use the most pedantic (I'd call our
  ; current "loose", whether we want to forward-support it is another
  ; matter.
  ;
  ; Whatever names we pick an option like this should ignore (or at most
  ; warn about) values it doesn't know about, not hard die on it.
  ;
  ; Here "bach" is what Neeraj and Patrick are pursuing, a hypothetical
  ; POSIX would be a pedantic way of exhaustively fsyncing everything.
  ; 
  ; We'd leave door open to e.g. setting it to "linux:ext4" or whatever,
  ; to do only the work needed on some specific popular FS
  core.fsyncMethod = loose | POSIX | batch | linux:ext4 | NTFS | ...

  ; Turn on or off entire categories of files we'd like to sync. This
  ; way Neeraj's and Patrick's approach would be to set
  ; core.fsyncMethod=batch, and then core.fsyncGroup=files &
  ; core.fsyncGroup=refs.

  ; If we learn about a new core.fsyncGroup = xyz in the future a <bool>
  ; in "core.fsyncGroupDefault" will prevail. I.e. if true it's
  ; included, if false not.
  ;
  ; Whether "false" or "true" is the default depends on
  ; core.fsyncMethod. For POSIX it would be true, for "loose" it's
  ; false.
  core.fsyncGroup = files
  core.fsyncGroup = refs
  core.fsyncGroup = objects

I'm not sure I like calling it "group". Maybe "class", "category"? Doing
it with this structure is extensible to the two-level keys, as noted
above.

  ; Our existing config knob. When "false" synonymous with:
  ;
  ;     core.fsync = true
  ;     core.fsyncMethod = loose
  ;     core.fsyncGroup = pack
  ;
  ; When "true" synonymous with the same as the above, plus:
  ;     core.fsyncGroup = loose
  ;
  : Or something like that. I.e. we'll fsync *.pack, *.bitmap etc, and ;
  ; probably some other stuff, but not loose objects etc.
  ;
  ; Whatever we fsync now exactly this schema should be generic enough
  ; to support it.
  core.fsyncObjectFiles = <bool>

  ; A namespace for core.fsyncMethod = <X>. Specific methods will
  ; own this namespace and can configure whatever they want.
  fsyncMethod.<x>.<a> = <b>

E.g. we might have:

  fsyncMethod.POSIX.content = true
  fsyncMethod.POSIX.metadata = false

If we know we'd like to (depending on other config) to fsync things
exhaustively or not, but do different things depending on file content
or metadata. I.e. maybe your FS's fsync() on a file fd always implies a
sync of the metadata, and maybe not.

  ; Change whatever fsync configuration you want per-command, similar to
  ; fsck.* and fetch.fsck.*
  transfer.fsyncGroup=*
  fetch.fsyncGroup=*
  ...

1. https://lore.kernel.org/git/211110.86v910gi9a.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/20211028002102.19384-1-e@80x24.org/
3. https://lore.kernel.org/git/cover.1636544377.git.ps@pks.im/

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: RFC: A configuration design for future-proofing fsync() configuration
  2021-11-10 15:09 RFC: A configuration design for future-proofing fsync() configuration Ævar Arnfjörð Bjarmason
@ 2021-11-11  0:47 ` Neeraj Singh
  2021-11-11  0:57   ` Ævar Arnfjörð Bjarmason
  2021-11-12  5:54   ` Christoph Hellwig
  2021-11-11 18:03 ` Junio C Hamano
  1 sibling, 2 replies; 9+ messages in thread
From: Neeraj Singh @ 2021-11-11  0:47 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Patrick Steinhardt, Jeff King, Johannes Schindelin,
	Junio C Hamano, Neeraj K. Singh, Linus Torvalds, Eric Wong,
	Christoph Hellwig, Emily Shaffer

On Wed, Nov 10, 2021 at 04:09:33PM +0100, Ævar Arnfjörð Bjarmason wrote:
> As a follow-up to various fsync topics in-flight I've been encouraging
> those involved to come up with some way to configure fsync() in a way
> that'll make holistic sense in the end-state.
> 
> Continuing a discussion from [1] currently we have:
> 
>     ; Defaults to 'false'
>     core.fsyncObjectFiles = [true|false]
> 
> In master..next this has been extended to this by Neeraj:
> 
>    core.fsyncObjectFiles = [true|false|batch]
> 
> Which, as an aside I hadn't considered before and I think we need to
> change before it lands on "master", we really don't want config users
> want to enable that makes older versions hard die. It's annoying to want
> to configure a new thing and not being able to put it in .gitconfig
> because older versions die on it:
> 
>     $ git -c core.fsyncObjectFiles=batch status; echo $?
>     fatal: bad boolean config value 'batch' for 'core.fsyncobjectfiles'
>     128
> 
> Then there's Eric Wong's proposed[2]:
> 
>     core.fsync = <bool>
> 
> And now Patrick Steinhardt has a proposal to extend Neeraj's with[3]:
> 
>     ; Like core.fsyncObjectFiles, but apparently for .git/refs, not
>     ; .git/objects (but see my confusion on that topic in [1])
>     core.fsyncRefFiles = [<bool>|batch]
> 
> I think this sort of config schema would make everyone above happy
> 
> It would:
> 
>  A) Be easy to extend for any future fsync behavior we'd reasonably
>     implement
>  
>  B) Not make older git versions die. It's fine if they warn(), but not die.
> 
>  C) Has some pretty contrived key names, but I'm trying to maintain the
>     constraint that you can set both fsck.X=Y and
>     e.g. fetch.fsck.X=Y. I.e. we should be able to configure things
>     globally *and* per-command, like color.*, fsck.* etc.
> 
> Proposal:
> 
>   ; Turns on/off all fsync, whatever the method is. I.e. allows you to
>   ; never make any fsync() calls whatsoever (which we have another
>   ; in-flight topic for).
> 
>   ; The "false" was controversial, and we could just leave it
>   ; unimplemented
>   core.fsync = <bool>
> 
>   ; Optional, by default we'd use the most pedantic (I'd call our
>   ; current "loose", whether we want to forward-support it is another
>   ; matter.
>   ;
>   ; Whatever names we pick an option like this should ignore (or at most
>   ; warn about) values it doesn't know about, not hard die on it.
>   ;
>   ; Here "bach" is what Neeraj and Patrick are pursuing, a hypothetical
>   ; POSIX would be a pedantic way of exhaustively fsyncing everything.
>   ; 
>   ; We'd leave door open to e.g. setting it to "linux:ext4" or whatever,
>   ; to do only the work needed on some specific popular FS
>   core.fsyncMethod = loose | POSIX | batch | linux:ext4 | NTFS | ...
> 
>   ; Turn on or off entire categories of files we'd like to sync. This
>   ; way Neeraj's and Patrick's approach would be to set
>   ; core.fsyncMethod=batch, and then core.fsyncGroup=files &
>   ; core.fsyncGroup=refs.
> 
>   ; If we learn about a new core.fsyncGroup = xyz in the future a <bool>
>   ; in "core.fsyncGroupDefault" will prevail. I.e. if true it's
>   ; included, if false not.
>   ;
>   ; Whether "false" or "true" is the default depends on
>   ; core.fsyncMethod. For POSIX it would be true, for "loose" it's
>   ; false.
>   core.fsyncGroup = files
>   core.fsyncGroup = refs
>   core.fsyncGroup = objects
> 
> I'm not sure I like calling it "group". Maybe "class", "category"? Doing
> it with this structure is extensible to the two-level keys, as noted
> above.
> 
>   ; Our existing config knob. When "false" synonymous with:
>   ;
>   ;     core.fsync = true
>   ;     core.fsyncMethod = loose
>   ;     core.fsyncGroup = pack
>   ;
>   ; When "true" synonymous with the same as the above, plus:
>   ;     core.fsyncGroup = loose
>   ;
>   : Or something like that. I.e. we'll fsync *.pack, *.bitmap etc, and ;
>   ; probably some other stuff, but not loose objects etc.
>   ;
>   ; Whatever we fsync now exactly this schema should be generic enough
>   ; to support it.
>   core.fsyncObjectFiles = <bool>
> 
>   ; A namespace for core.fsyncMethod = <X>. Specific methods will
>   ; own this namespace and can configure whatever they want.
>   fsyncMethod.<x>.<a> = <b>
> 
> E.g. we might have:
> 
>   fsyncMethod.POSIX.content = true
>   fsyncMethod.POSIX.metadata = false
> 
> If we know we'd like to (depending on other config) to fsync things
> exhaustively or not, but do different things depending on file content
> or metadata. I.e. maybe your FS's fsync() on a file fd always implies a
> sync of the metadata, and maybe not.
> 
>   ; Change whatever fsync configuration you want per-command, similar to
>   ; fsck.* and fetch.fsck.*
>   transfer.fsyncGroup=*
>   fetch.fsyncGroup=*
>   ...
> 
> 1. https://lore.kernel.org/git/211110.86v910gi9a.gmgdl@evledraar.gmail.com/
> 2. https://lore.kernel.org/git/20211028002102.19384-1-e@80x24.org/
> 3. https://lore.kernel.org/git/cover.1636544377.git.ps@pks.im/
Hi Ævar,

Thanks for noticing the backwards compatibility issue with the 'batch' flag. I
agree that we need to fix that before committing my changes to master.

I'm hoping that we can agree to a version of what you're proposing, but my
preference would be to cut out the more granular controls. I'd prefer to see
just:
	core.fsync = [bool]   		- Turn fsyncing on or off.
	core.fsyncMethod = [string] 	- Controls how it's done (with a non-fatal warn on unrecognized values).
	core.fsyncObjectFiles = [bool]  - Sets core.fsync if that setting doesn't already have a value. For back-compat.

I don't think either we or the users should have to reason about what it means
for some parts of the repo to be fsynced and others not to be. If core.fsync is
'false' and someone gets a weird state after a system crash, no one should be
surprised. If core.fsync is 'true', and people are running on a reasonable
common filesystem, we should be trying to give decent performance and good
durability.

It would be nice to loop in some Linux fs developers to find out what can be
done on current implementations to get the durability without terrible
performance. From reading the docs and mailing threads it looks like the
sync_file_range + bulk fsync approach should actually work on the current XFS
implementation.

Thanks,
Neeraj
Windows Core Filesystem Dev

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: RFC: A configuration design for future-proofing fsync() configuration
  2021-11-11  0:47 ` Neeraj Singh
@ 2021-11-11  0:57   ` Ævar Arnfjörð Bjarmason
  2021-11-17 22:16     ` Neeraj Singh
  2021-11-12  5:54   ` Christoph Hellwig
  1 sibling, 1 reply; 9+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-11  0:57 UTC (permalink / raw)
  To: Neeraj Singh
  Cc: git, Patrick Steinhardt, Jeff King, Johannes Schindelin,
	Junio C Hamano, Neeraj K. Singh, Linus Torvalds, Eric Wong,
	Christoph Hellwig, Emily Shaffer


On Wed, Nov 10 2021, Neeraj Singh wrote:

> On Wed, Nov 10, 2021 at 04:09:33PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> As a follow-up to various fsync topics in-flight I've been encouraging
>> those involved to come up with some way to configure fsync() in a way
>> that'll make holistic sense in the end-state.
>> 
>> Continuing a discussion from [1] currently we have:
>> 
>>     ; Defaults to 'false'
>>     core.fsyncObjectFiles = [true|false]
>> 
>> In master..next this has been extended to this by Neeraj:
>> 
>>    core.fsyncObjectFiles = [true|false|batch]
>> 
>> Which, as an aside I hadn't considered before and I think we need to
>> change before it lands on "master", we really don't want config users
>> want to enable that makes older versions hard die. It's annoying to want
>> to configure a new thing and not being able to put it in .gitconfig
>> because older versions die on it:
>> 
>>     $ git -c core.fsyncObjectFiles=batch status; echo $?
>>     fatal: bad boolean config value 'batch' for 'core.fsyncobjectfiles'
>>     128
>> 
>> Then there's Eric Wong's proposed[2]:
>> 
>>     core.fsync = <bool>
>> 
>> And now Patrick Steinhardt has a proposal to extend Neeraj's with[3]:
>> 
>>     ; Like core.fsyncObjectFiles, but apparently for .git/refs, not
>>     ; .git/objects (but see my confusion on that topic in [1])
>>     core.fsyncRefFiles = [<bool>|batch]
>> 
>> I think this sort of config schema would make everyone above happy
>> 
>> It would:
>> 
>>  A) Be easy to extend for any future fsync behavior we'd reasonably
>>     implement
>>  
>>  B) Not make older git versions die. It's fine if they warn(), but not die.
>> 
>>  C) Has some pretty contrived key names, but I'm trying to maintain the
>>     constraint that you can set both fsck.X=Y and
>>     e.g. fetch.fsck.X=Y. I.e. we should be able to configure things
>>     globally *and* per-command, like color.*, fsck.* etc.
>> 
>> Proposal:
>> 
>>   ; Turns on/off all fsync, whatever the method is. I.e. allows you to
>>   ; never make any fsync() calls whatsoever (which we have another
>>   ; in-flight topic for).
>> 
>>   ; The "false" was controversial, and we could just leave it
>>   ; unimplemented
>>   core.fsync = <bool>
>> 
>>   ; Optional, by default we'd use the most pedantic (I'd call our
>>   ; current "loose", whether we want to forward-support it is another
>>   ; matter.
>>   ;
>>   ; Whatever names we pick an option like this should ignore (or at most
>>   ; warn about) values it doesn't know about, not hard die on it.
>>   ;
>>   ; Here "bach" is what Neeraj and Patrick are pursuing, a hypothetical
>>   ; POSIX would be a pedantic way of exhaustively fsyncing everything.
>>   ; 
>>   ; We'd leave door open to e.g. setting it to "linux:ext4" or whatever,
>>   ; to do only the work needed on some specific popular FS
>>   core.fsyncMethod = loose | POSIX | batch | linux:ext4 | NTFS | ...
>> 
>>   ; Turn on or off entire categories of files we'd like to sync. This
>>   ; way Neeraj's and Patrick's approach would be to set
>>   ; core.fsyncMethod=batch, and then core.fsyncGroup=files &
>>   ; core.fsyncGroup=refs.
>> 
>>   ; If we learn about a new core.fsyncGroup = xyz in the future a <bool>
>>   ; in "core.fsyncGroupDefault" will prevail. I.e. if true it's
>>   ; included, if false not.
>>   ;
>>   ; Whether "false" or "true" is the default depends on
>>   ; core.fsyncMethod. For POSIX it would be true, for "loose" it's
>>   ; false.
>>   core.fsyncGroup = files
>>   core.fsyncGroup = refs
>>   core.fsyncGroup = objects
>> 
>> I'm not sure I like calling it "group". Maybe "class", "category"? Doing
>> it with this structure is extensible to the two-level keys, as noted
>> above.
>> 
>>   ; Our existing config knob. When "false" synonymous with:
>>   ;
>>   ;     core.fsync = true
>>   ;     core.fsyncMethod = loose
>>   ;     core.fsyncGroup = pack
>>   ;
>>   ; When "true" synonymous with the same as the above, plus:
>>   ;     core.fsyncGroup = loose
>>   ;
>>   : Or something like that. I.e. we'll fsync *.pack, *.bitmap etc, and ;
>>   ; probably some other stuff, but not loose objects etc.
>>   ;
>>   ; Whatever we fsync now exactly this schema should be generic enough
>>   ; to support it.
>>   core.fsyncObjectFiles = <bool>
>> 
>>   ; A namespace for core.fsyncMethod = <X>. Specific methods will
>>   ; own this namespace and can configure whatever they want.
>>   fsyncMethod.<x>.<a> = <b>
>> 
>> E.g. we might have:
>> 
>>   fsyncMethod.POSIX.content = true
>>   fsyncMethod.POSIX.metadata = false
>> 
>> If we know we'd like to (depending on other config) to fsync things
>> exhaustively or not, but do different things depending on file content
>> or metadata. I.e. maybe your FS's fsync() on a file fd always implies a
>> sync of the metadata, and maybe not.
>> 
>>   ; Change whatever fsync configuration you want per-command, similar to
>>   ; fsck.* and fetch.fsck.*
>>   transfer.fsyncGroup=*
>>   fetch.fsyncGroup=*
>>   ...
>> 
>> 1. https://lore.kernel.org/git/211110.86v910gi9a.gmgdl@evledraar.gmail.com/
>> 2. https://lore.kernel.org/git/20211028002102.19384-1-e@80x24.org/
>> 3. https://lore.kernel.org/git/cover.1636544377.git.ps@pks.im/
> Hi Ævar,
>
> Thanks for noticing the backwards compatibility issue with the 'batch' flag. I
> agree that we need to fix that before committing my changes to master.
>
> I'm hoping that we can agree to a version of what you're proposing, but my
> preference would be to cut out the more granular controls. I'd prefer to see
> just:
> 	core.fsync = [bool]   		- Turn fsyncing on or off.
> 	core.fsyncMethod = [string] 	- Controls how it's done (with a non-fatal warn on unrecognized values).
> 	core.fsyncObjectFiles = [bool]  - Sets core.fsync if that setting doesn't already have a value. For back-compat.

I'm fine with something simpler as long as we don't think we'll
plausibly start painting ourselves into a corner.

But core.fsyncObjectFiles is *not* a setting of a "core.fsync" in the
sense that Eric suggested we have.

I.e. it's effectively a sort of early and partial Linux-only version of
what your "batch" mode is. I.e. to skip fsyncing the loose object files,
and only fsync() the eventual refs we write.

"Sort of" because we'd e.g. fsync packs unconditionally etc, but if we
make core.fsyncObjectFiles=false be core.fsync=false then we can't have
a "real" core.fsync=false, i.e. one that guarantees no fsync() calls at
all.

We could also simply decide that it's a bad setting and we're going to
deprecate it, but another way is having a generic config layout that can
express what it's doing and more.

> I don't think either we or the users should have to reason about what it means
> for some parts of the repo to be fsynced and others not to be. If core.fsync is
> 'false' and someone gets a weird state after a system crash, no one should be
> surprised.

Yes. I'm fine with leaving this on the table. I should have be more
explicit that I'm not suggesting we implement all this exhaustive config
support, but if we imagine a sensible config schema that is extensible
(my proposal may or may not be that) then we can implement just 1-2
variables in it and know that we have room to grow in the future.

> If core.fsync is 'true', and people are running on a reasonable
> common filesystem, we should be trying to give decent performance and good
> durability.

Yeah, I just wonder if we can easily provide config to have people
decide that trade-off themselves.

E.g. from the performance numbers in [1] I might turn off fsyncing when
we write anything in the working tree.

We don't do that particular thing now, but if we're being pulled in one
direction of always being fsync-safe by default...

I can also see it being useful to e.g. do:

    gc.fsync = false

Or blacklist other similar batch operations, although with a global knob
that can also rather easily be:

    git -c core.fsync gc

So maybe the whole "fsck" rationale doesn't apply here.

> It would be nice to loop in some Linux fs developers to find out what can be
> done on current implementations to get the durability without terrible
> performance. From reading the docs and mailing threads it looks like the
> sync_file_range + bulk fsync approach should actually work on the current XFS
> implementation.

I CC'd Linus on the topic of core.fsyncObjectFiles on this thread, and
Christoph Hellwig who chimed in on the topic of the behavior of Linux
FS's on recent threads, I don't know where we'd find a focused set of
Linux devs who might be interested (I'm not going to just spam LKML). If
anyone does pointing them to this thread would be most welcome.

1. https://lore.kernel.org/git/YYwvVy6AWDjkWazn@coredump.intra.peff.net/

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: RFC: A configuration design for future-proofing fsync() configuration
  2021-11-10 15:09 RFC: A configuration design for future-proofing fsync() configuration Ævar Arnfjörð Bjarmason
  2021-11-11  0:47 ` Neeraj Singh
@ 2021-11-11 18:03 ` Junio C Hamano
  1 sibling, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2021-11-11 18:03 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Patrick Steinhardt, Jeff King, Johannes Schindelin,
	Neeraj K. Singh, Linus Torvalds, Eric Wong, Christoph Hellwig,
	Emily Shaffer

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> Continuing a discussion from [1] currently we have:
>
>     ; Defaults to 'false'
>     core.fsyncObjectFiles = [true|false]
>
> In master..next this has been extended to this by Neeraj:
>
>    core.fsyncObjectFiles = [true|false|batch]
>
> Which, as an aside I hadn't considered before and I think we need to
> change before it lands on "master", we really don't want config users
> want to enable that makes older versions hard die. It's annoying to want
> to configure a new thing and not being able to put it in .gitconfig
> because older versions die on it:
>
>     $ git -c core.fsyncObjectFiles=batch status; echo $?
>     fatal: bad boolean config value 'batch' for 'core.fsyncobjectfiles'
>     128

But then it is also annoying to find out that the shiny new toy you
thought you configured silently is not kicking in.  I actually think
Neeraj's "if you are in a mixed environment, you need to be aware of
which copies of Git you use are prepared to use it" would be better
for end users.

For us Git developers, it would be less convenient, though.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: RFC: A configuration design for future-proofing fsync() configuration
  2021-11-11  0:47 ` Neeraj Singh
  2021-11-11  0:57   ` Ævar Arnfjörð Bjarmason
@ 2021-11-12  5:54   ` Christoph Hellwig
  2021-11-17 18:49     ` Neeraj Singh
  1 sibling, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2021-11-12  5:54 UTC (permalink / raw)
  To: Neeraj Singh
  Cc: Ævar Arnfjörð Bjarmason, git, Patrick Steinhardt,
	Jeff King, Johannes Schindelin, Junio C Hamano, Neeraj K. Singh,
	Linus Torvalds, Eric Wong, Christoph Hellwig, Emily Shaffer

On Wed, Nov 10, 2021 at 04:47:24PM -0800, Neeraj Singh wrote:
> It would be nice to loop in some Linux fs developers to find out what can be
> done on current implementations to get the durability without terrible
> performance. From reading the docs and mailing threads it looks like the
> sync_file_range + bulk fsync approach should actually work on the current XFS
> implementation.

If you want more than just my advice linux-fsdevel@vger.kernel.org is
a good place to find a wide range of opinions.

Anyway, I think syncfs is the biggest band for the buck as it will give
you very efficient syncing with very little overhead in git, but it does
have a huge noisy neighbor problem that might make it unattractive
for multi-tenant file systems or git hosting.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: RFC: A configuration design for future-proofing fsync() configuration
  2021-11-12  5:54   ` Christoph Hellwig
@ 2021-11-17 18:49     ` Neeraj Singh
  0 siblings, 0 replies; 9+ messages in thread
From: Neeraj Singh @ 2021-11-17 18:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ævar Arnfjörð Bjarmason, Git List,
	Patrick Steinhardt, Jeff King, Johannes Schindelin,
	Junio C Hamano, Neeraj K. Singh, Linus Torvalds, Eric Wong,
	Emily Shaffer, linux-fsdevel, Amir Goldstein

On Thu, Nov 11, 2021 at 9:54 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Wed, Nov 10, 2021 at 04:47:24PM -0800, Neeraj Singh wrote:
> > It would be nice to loop in some Linux fs developers to find out what can be
> > done on current implementations to get the durability without terrible
> > performance. From reading the docs and mailing threads it looks like the
> > sync_file_range + bulk fsync approach should actually work on the current XFS
> > implementation.
>
> If you want more than just my advice linux-fsdevel@vger.kernel.org is
> a good place to find a wide range of opinions.
>
> Anyway, I think syncfs is the biggest band for the buck as it will give
> you very efficient syncing with very little overhead in git, but it does
> have a huge noisy neighbor problem that might make it unattractive
> for multi-tenant file systems or git hosting.

To summarize where we are at for linux-fsdevel:
We're working on making Git preserve data added to the repo even if
the system crashes or loses power at some point soon after a Git
command completes. The default behavior of git-for-windows is to set
core.fsyncobjectfiles=true, which at least ensures durability for
loose object files.

The current implementation of core.fsyncobjectfiles inserts an fsync
between writing each new object to a temp name and renaming it to its
final hash-based name. This approach is slow when adding hundreds of
files to the repo [1]. The main cost on the hardware we tested is
actually the CACHE_FLUSH request sent down to
the storage hardware. There is also work in-flight by Patrick
Steinhardt to sync ref files [2].

In a patch series at [3], I implemented a batch mode that issues
pagecache writeback for each object file when it's being written and
then before any of the files are renamed to their final destination we
do an fsync to a dummy file on the same filesystem.  On linux, this is
using the sync_file_range(fd,0,0,  SYNC_FILE_RANGE_WRITE_AND_WAIT) to
do the pagecache writeback.  According to Amir's thread at [4] this
flag combo should actually trigger the desired writeback. The
expectation is that the fsync of the dummy file should trigger a log
writeback and one or more CACHE_FLUSH commands to harden the block
mapping metadata and directory entries such that the data would be
retrievable after the fsync completes.

The equivalent sequence is specified to work on the common Windows
filesystems [5]. The question I have for the Linux community is
whether the same sequence will work on any of the common extant Linux
filesystems such that it can provide value to Git users on Linux. My
understanding from Christoph Hellwig's comments is that on XFS at
least the sync_file_range, fsync, and rename sequence would allow us
to guarantee that the complete written contents of the file would be
visible if the new name is visible.  I also expect that additional
fsync to a dummy file after the renames would also ensure that the log
is forced again, which should ensure that all of the renames are
visible before a ref file could be written that points at one of the
object names.

I wasn't able to find any clear semantics about the ext4 filesystem,
and I gather from what I've read that the btrfs filesystem does not
support the desired semantics.  Christoph mentioned that syncfs would
efficiently provide a batched CACHE_FLUSH with the cost of picking up
dirty cached data unrelated to Git.

Are there any opinions on the Linux side about what APIs we should use
to provide durability across multiple Git files while not completely
tanking performance by adding one CACHE_FLUSH per file modified?  What
are the semantics of the ext4 log (when it is enabled) with regards to
creating a temp file, populating its contents and then renaming it?
Are they similar enough to XFS's 'log force' such that our batch mode
would work there?

Thanks,
Neeraj
Windows Core Filesystem Dev

[1] https://docs.google.com/spreadsheets/d/1uxMBkEXFFnQ1Y3lXKqcKpw6Mq44BzhpCAcPex14T-QQ/edit#gid=1898936117
[2] https://lore.kernel.org/git/cover.1636544377.git.ps@pks.im/
[3] https://lore.kernel.org/git/b9d3d87443266767f00e77c967bd77357fe50484.1633366667.git.gitgitgadget@gmail.com/
[4] https://lore.kernel.org/linux-fsdevel/20190419072938.31320-1-amir73il@gmail.com/
[5] See FLUSH_FLAGS_NO_SYNC -
https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntflushbuffersfileex

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: RFC: A configuration design for future-proofing fsync() configuration
  2021-11-11  0:57   ` Ævar Arnfjörð Bjarmason
@ 2021-11-17 22:16     ` Neeraj Singh
  2021-11-18 19:00       ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Neeraj Singh @ 2021-11-17 22:16 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git List, Patrick Steinhardt, Jeff King, Johannes Schindelin,
	Junio C Hamano, Neeraj K. Singh, Linus Torvalds, Eric Wong,
	Christoph Hellwig, Emily Shaffer

On Wed, Nov 10, 2021 at 5:13 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> On Wed, Nov 10 2021, Neeraj Singh wrote:
>
> > On Wed, Nov 10, 2021 at 04:09:33PM +0100, Ævar Arnfjörð Bjarmason wrote:
> >> I think this sort of config schema would make everyone above happy
> >>
> >> It would:
> >>
> >>  A) Be easy to extend for any future fsync behavior we'd reasonably
> >>     implement
> >>
> >>  B) Not make older git versions die. It's fine if they warn(), but not die.
> >>
> >>  C) Has some pretty contrived key names, but I'm trying to maintain the
> >>     constraint that you can set both fsck.X=Y and
> >>     e.g. fetch.fsck.X=Y. I.e. we should be able to configure things
> >>     globally *and* per-command, like color.*, fsck.* etc.
> >>
> >> Proposal:
> >>
> >>   ; Turns on/off all fsync, whatever the method is. I.e. allows you to
> >>   ; never make any fsync() calls whatsoever (which we have another
> >>   ; in-flight topic for).
> >>
> >>   ; The "false" was controversial, and we could just leave it
> >>   ; unimplemented
> >>   core.fsync = <bool>
> >>
> >>   ; Optional, by default we'd use the most pedantic (I'd call our
> >>   ; current "loose", whether we want to forward-support it is another
> >>   ; matter.
> >>   ;
> >>   ; Whatever names we pick an option like this should ignore (or at most
> >>   ; warn about) values it doesn't know about, not hard die on it.
> >>   ;
> >>   ; Here "bach" is what Neeraj and Patrick are pursuing, a hypothetical
> >>   ; POSIX would be a pedantic way of exhaustively fsyncing everything.
> >>   ;
> >>   ; We'd leave door open to e.g. setting it to "linux:ext4" or whatever,
> >>   ; to do only the work needed on some specific popular FS
> >>   core.fsyncMethod = loose | POSIX | batch | linux:ext4 | NTFS | ...
> >>
> >>   ; Turn on or off entire categories of files we'd like to sync. This
> >>   ; way Neeraj's and Patrick's approach would be to set
> >>   ; core.fsyncMethod=batch, and then core.fsyncGroup=files &
> >>   ; core.fsyncGroup=refs.
> >>
> >>   ; If we learn about a new core.fsyncGroup = xyz in the future a <bool>
> >>   ; in "core.fsyncGroupDefault" will prevail. I.e. if true it's
> >>   ; included, if false not.
> >>   ;
> >>   ; Whether "false" or "true" is the default depends on
> >>   ; core.fsyncMethod. For POSIX it would be true, for "loose" it's
> >>   ; false.
> >>   core.fsyncGroup = files
> >>   core.fsyncGroup = refs
> >>   core.fsyncGroup = objects
> >>
> >> I'm not sure I like calling it "group". Maybe "class", "category"? Doing
> >> it with this structure is extensible to the two-level keys, as noted
> >> above.
> >>
> >>   ; Our existing config knob. When "false" synonymous with:
> >>   ;
> >>   ;     core.fsync = true
> >>   ;     core.fsyncMethod = loose
> >>   ;     core.fsyncGroup = pack
> >>   ;
> >>   ; When "true" synonymous with the same as the above, plus:
> >>   ;     core.fsyncGroup = loose
> >>   ;
> >>   : Or something like that. I.e. we'll fsync *.pack, *.bitmap etc, and ;
> >>   ; probably some other stuff, but not loose objects etc.
> >>   ;
> >>   ; Whatever we fsync now exactly this schema should be generic enough
> >>   ; to support it.
> >>   core.fsyncObjectFiles = <bool>
> >>
> >>   ; A namespace for core.fsyncMethod = <X>. Specific methods will
> >>   ; own this namespace and can configure whatever they want.
> >>   fsyncMethod.<x>.<a> = <b>
> >>
> >> E.g. we might have:
> >>
> >>   fsyncMethod.POSIX.content = true
> >>   fsyncMethod.POSIX.metadata = false
> >>
> >> If we know we'd like to (depending on other config) to fsync things
> >> exhaustively or not, but do different things depending on file content
> >> or metadata. I.e. maybe your FS's fsync() on a file fd always implies a
> >> sync of the metadata, and maybe not.
> >>
> >>   ; Change whatever fsync configuration you want per-command, similar to
> >>   ; fsck.* and fetch.fsck.*
> >>   transfer.fsyncGroup=*
> >>   fetch.fsyncGroup=*
> >>   ...
> >>
> >> 1. https://lore.kernel.org/git/211110.86v910gi9a.gmgdl@evledraar.gmail.com/
> >> 2. https://lore.kernel.org/git/20211028002102.19384-1-e@80x24.org/
> >> 3. https://lore.kernel.org/git/cover.1636544377.git.ps@pks.im/
> > Hi Ævar,
> >
> > Thanks for noticing the backwards compatibility issue with the 'batch' flag. I
> > agree that we need to fix that before committing my changes to master.
> >
> > I'm hoping that we can agree to a version of what you're proposing, but my
> > preference would be to cut out the more granular controls. I'd prefer to see
> > just:
> >       core.fsync = [bool]             - Turn fsyncing on or off.
> >       core.fsyncMethod = [string]     - Controls how it's done (with a non-fatal warn on unrecognized values).
> >       core.fsyncObjectFiles = [bool]  - Sets core.fsync if that setting doesn't already have a value. For back-compat.
>
> I'm fine with something simpler as long as we don't think we'll
> plausibly start painting ourselves into a corner.
>
> But core.fsyncObjectFiles is *not* a setting of a "core.fsync" in the
> sense that Eric suggested we have.
>
> I.e. it's effectively a sort of early and partial Linux-only version of
> what your "batch" mode is. I.e. to skip fsyncing the loose object files,
> and only fsync() the eventual refs we write.
>
> "Sort of" because we'd e.g. fsync packs unconditionally etc, but if we
> make core.fsyncObjectFiles=false be core.fsync=false then we can't have
> a "real" core.fsync=false, i.e. one that guarantees no fsync() calls at
> all.
>
> We could also simply decide that it's a bad setting and we're going to
> deprecate it, but another way is having a generic config layout that can
> express what it's doing and more.
>
> > I don't think either we or the users should have to reason about what it means
> > for some parts of the repo to be fsynced and others not to be. If core.fsync is
> > 'false' and someone gets a weird state after a system crash, no one should be
> > surprised.
>
> Yes. I'm fine with leaving this on the table. I should have be more
> explicit that I'm not suggesting we implement all this exhaustive config
> support, but if we imagine a sensible config schema that is extensible
> (my proposal may or may not be that) then we can implement just 1-2
> variables in it and know that we have room to grow in the future.
>
> > If core.fsync is 'true', and people are running on a reasonable
> > common filesystem, we should be trying to give decent performance and good
> > durability.
>
> Yeah, I just wonder if we can easily provide config to have people
> decide that trade-off themselves.
>
> E.g. from the performance numbers in [1] I might turn off fsyncing when
> we write anything in the working tree.
>
> We don't do that particular thing now, but if we're being pulled in one
> direction of always being fsync-safe by default...
>
> I can also see it being useful to e.g. do:
>
>     gc.fsync = false
>
> Or blacklist other similar batch operations, although with a global knob
> that can also rather easily be:
>
>     git -c core.fsync gc
>
> So maybe the whole "fsck" rationale doesn't apply here.
>
> > It would be nice to loop in some Linux fs developers to find out what can be
> > done on current implementations to get the durability without terrible
> > performance. From reading the docs and mailing threads it looks like the
> > sync_file_range + bulk fsync approach should actually work on the current XFS
> > implementation.
>

After sleeping on it for a while, I'm willing to consolidate the
configuration along the lines that you've specified, but I'd like to
reduce the number of degrees of freedom.

My proposal in Documentation form:

core.fsync::
A comma-separated list of parts of the repository which should be hardened by
calling fsync when created or modified. When an aggregate option is
specified, a subcomponent can be overriden by prefixing it with a '-'. For
example, `core.fsync=all,-index` means "fsync everything except the index".
Items which are not fsync'ed may be lost in the even of an unclean system
shutdown. This setting defaults to `objects,-loose-objects`
+
* `loose-objects` hardens objects added to the repo in loose-object form.
* `packs` hardens objects added to the repo in packfile form and the related
  bitmap and index files.
* `commit-graph` hardens the commit graph file.
* `refs` (future) hardens references when they are modified.
* `index` (future) hardens the index when it is modified.
* `objects` is an aggregate option that includes `loose-objects`, `packs`, and
  `commit-graph`.
* `all` is an aggregate option that syncs all individual components above.
* `none` is an aggregate option that disables fsync completely.

core.fsyncMethod::
A value indicating the strategy Git will use to harden repository data using
fsync and related primitives.
+
* 'default' uses the fsync(2) system call or platform equivalents.
* 'batch' uses APIs such as sync_file_range or equivalent to reduce the number
  of hardware FLUSH CACHE requests sent to the storage hardware.
* 'writeout-only' (future) issues requests to send the writes to the storage
* hardware, but does not send any FLUSH CACHE request.
* 'syncfs' (future) uses the syncfs API, where available, to sync all of the
  files on the same filesystem as the Git repo.

core.fsyncObjectFiles::
If `true`, this legacy setting is equivalent to `core.fsync=objects`. If
`core.fsync` is explicitly specified, then this setting is ignored.

Thanks,
Neeraj

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: RFC: A configuration design for future-proofing fsync() configuration
  2021-11-17 22:16     ` Neeraj Singh
@ 2021-11-18 19:00       ` Junio C Hamano
  2021-11-18 19:46         ` Neeraj Singh
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2021-11-18 19:00 UTC (permalink / raw)
  To: Neeraj Singh
  Cc: Ævar Arnfjörð Bjarmason, Git List,
	Patrick Steinhardt, Jeff King, Johannes Schindelin,
	Neeraj K. Singh, Linus Torvalds, Eric Wong, Christoph Hellwig,
	Emily Shaffer

Neeraj Singh <nksingh85@gmail.com> writes:

> After sleeping on it for a while, I'm willing to consolidate the
> configuration along the lines that you've specified, but I'd like to
> reduce the number of degrees of freedom.
>
> My proposal in Documentation form:
>
> core.fsync::
> A comma-separated list of parts of the repository which should be hardened by
> calling fsync when created or modified. When an aggregate option is
> specified, a subcomponent can be overriden by prefixing it with a '-'. For
> example, `core.fsync=all,-index` means "fsync everything except the index".
> Items which are not fsync'ed may be lost in the even of an unclean system
> shutdown. This setting defaults to `objects,-loose-objects`
> +
> * `loose-objects` hardens objects added to the repo in loose-object form.
> * `packs` hardens objects added to the repo in packfile form and the related
>   bitmap and index files.
> * `commit-graph` hardens the commit graph file.
> * `refs` (future) hardens references when they are modified.
> * `index` (future) hardens the index when it is modified.
> * `objects` is an aggregate option that includes `loose-objects`, `packs`, and
>   `commit-graph`.
> * `all` is an aggregate option that syncs all individual components above.
> * `none` is an aggregate option that disables fsync completely.

I wasn't closely following the discussion at all, but the above
simplification may still even be too fine-grained?  For example,
what does it mean to care less about the robustness of loose objects
than packs or ref updates?  How does an existing fine-grained
classification interact with new classes of filesystem entity we
will introduce under .git in the future?  Imagine that we didn't
have .midx and multi-pack bitmap yet; since 'loose-objects',
'packs', and 'commit-graph' are the only three groups we can choose
to place any "objects and reachability" related data in, we need to
pick one, and choosing 'packs' class may be the choice of least
resistance, the default kitchen-sync category for anything related
to "object".  Or just like 'commit-graph' has its own category,
would we invent a new class and call it 'multi-pack'?

I cannot shake the feeling that these are making everything
unnecessarily complex and adding more things that we need to explain
to the end-user---and the worst part is I doubt it would help the
end-users very much tot understand what gets explained.

> core.fsyncMethod::
> A value indicating the strategy Git will use to harden repository data using
> fsync and related primitives.
> +
> * 'default' uses the fsync(2) system call or platform equivalents.
> * 'batch' uses APIs such as sync_file_range or equivalent to reduce the number
>   of hardware FLUSH CACHE requests sent to the storage hardware.
> * 'writeout-only' (future) issues requests to send the writes to the storage
> * hardware, but does not send any FLUSH CACHE request.
> * 'syncfs' (future) uses the syncfs API, where available, to sync all of the
>   files on the same filesystem as the Git repo.

How would an end-user choose among these?  If they assume that the
version of Git they use is bug-free, is there a reason why they
should ever pick 'default' over 'batch', for example?  Shouldn't we
be the one to choose the best approach on the underlying filesystem
for the users, instead of forcing them to choose?

As implementors, these choices may be of interest and give you a
handy way to compare different design, but I am not sure if we want
to give anything more complex than a binary choice, "default" and
"eatmydata".

> core.fsyncObjectFiles::
> If `true`, this legacy setting is equivalent to `core.fsync=objects`. If
> `core.fsync` is explicitly specified, then this setting is ignored.

I think deprecating this very-specific knob is a good idea,
regardless of how complex we'd want to make the alternative.

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: RFC: A configuration design for future-proofing fsync() configuration
  2021-11-18 19:00       ` Junio C Hamano
@ 2021-11-18 19:46         ` Neeraj Singh
  0 siblings, 0 replies; 9+ messages in thread
From: Neeraj Singh @ 2021-11-18 19:46 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, Git List,
	Patrick Steinhardt, Jeff King, Johannes Schindelin,
	Neeraj K. Singh, Linus Torvalds, Eric Wong, Christoph Hellwig,
	Emily Shaffer, Derrick Stolee

On Thu, Nov 18, 2021 at 11:00 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Neeraj Singh <nksingh85@gmail.com> writes:
>
> > After sleeping on it for a while, I'm willing to consolidate the
> > configuration along the lines that you've specified, but I'd like to
> > reduce the number of degrees of freedom.
> >
> > My proposal in Documentation form:
> >
> > core.fsync::
> > A comma-separated list of parts of the repository which should be hardened by
> > calling fsync when created or modified. When an aggregate option is
> > specified, a subcomponent can be overriden by prefixing it with a '-'. For
> > example, `core.fsync=all,-index` means "fsync everything except the index".
> > Items which are not fsync'ed may be lost in the even of an unclean system
> > shutdown. This setting defaults to `objects,-loose-objects`
> > +
> > * `loose-objects` hardens objects added to the repo in loose-object form.
> > * `packs` hardens objects added to the repo in packfile form and the related
> >   bitmap and index files.
> > * `commit-graph` hardens the commit graph file.
> > * `refs` (future) hardens references when they are modified.
> > * `index` (future) hardens the index when it is modified.
> > * `objects` is an aggregate option that includes `loose-objects`, `packs`, and
> >   `commit-graph`.
> > * `all` is an aggregate option that syncs all individual components above.
> > * `none` is an aggregate option that disables fsync completely.
>
> I wasn't closely following the discussion at all, but the above
> simplification may still even be too fine-grained?  For example,
> what does it mean to care less about the robustness of loose objects
> than packs or ref updates?  How does an existing fine-grained
> classification interact with new classes of filesystem entity we
> will introduce under .git in the future?  Imagine that we didn't
> have .midx and multi-pack bitmap yet; since 'loose-objects',
> 'packs', and 'commit-graph' are the only three groups we can choose
> to place any "objects and reachability" related data in, we need to
> pick one, and choosing 'packs' class may be the choice of least
> resistance, the default kitchen-sync category for anything related
> to "object".  Or just like 'commit-graph' has its own category,
> would we invent a new class and call it 'multi-pack'?
>
> I cannot shake the feeling that these are making everything
> unnecessarily complex and adding more things that we need to explain
> to the end-user---and the worst part is I doubt it would help the
> end-users very much tot understand what gets explained.
 I agree with you that this is fairly complex. I did two things to
come up with this specific list:
1) Looked at what we're fsyncing today (most of these items are being
synced through the CSUM_FSYNC flag).  Some of these things should
perhaps not be fsynced if they are derived metadata that can be
reconstructed easily.  For instance, can the commit-graph file be
recomputed easily enough from the ODB and the refs? How hard is it to
reconstruct the pack-indexes? Maybe they should get their own item.
2) Thought about what's necessary to be able to retrieve data out of
git after a series of commands followed by a system crash.  If I fetch
a repo, modify some worktree files, add the modifications, and then
commit them, what does Git need to persist to not lose unique work?
We need to sync the packfiles since they form the base of the commit
graph and trees and it may be difficult to construct a sane repo if we
have objects without their dependencies. We obviously need to sync the
new objects the user is adding. We need to sync the index after 'add'
in case we crash between 'add' and 'commit', so the user can find
their new objects.  Lastly we need to sync the refs after a commit so
that the user can find the added objects after switching branches.

I also wanted to make sure we could express the current state of
fsyncing so that users could go back to that if the cost of syncing
some particular thing is too high in their workload.

I expect that people should really specify `objects` to sync all of
the things that comprise the object store and leave it to us to decide
which subcomponents are considered derived metadata.

>
> > core.fsyncMethod::
> > A value indicating the strategy Git will use to harden repository data using
> > fsync and related primitives.
> > +
> > * 'default' uses the fsync(2) system call or platform equivalents.
> > * 'batch' uses APIs such as sync_file_range or equivalent to reduce the number
> >   of hardware FLUSH CACHE requests sent to the storage hardware.
> > * 'writeout-only' (future) issues requests to send the writes to the storage
> > * hardware, but does not send any FLUSH CACHE request.
> > * 'syncfs' (future) uses the syncfs API, where available, to sync all of the
> >   files on the same filesystem as the Git repo.
>
> How would an end-user choose among these?  If they assume that the
> version of Git they use is bug-free, is there a reason why they
> should ever pick 'default' over 'batch', for example?  Shouldn't we
> be the one to choose the best approach on the underlying filesystem
> for the users, instead of forcing them to choose?
>
> As implementors, these choices may be of interest and give you a
> handy way to compare different design, but I am not sure if we want
> to give anything more complex than a binary choice, "default" and
> "eatmydata".

Maybe `default` should be renamed `fsync` to indicate the specific
action to be performed and no value can be "let git decide".  I think
the "eatmydata" option would be core.fsync=none and core.fsyncMethod
would then be ignored.  One reason to have this option would be to
allow distributors and organizations to choose a good config based on
the actual filesystem them deploy on.  On Windows, it would be clear
that we should use 'batch' because we're committed to making sure that
setting is actually safe on our filesystems (we'll fix bugs in the FS
if we find out that people are reporting corrupt repos).  Given that
the Linux and POSIX durability situation is really murky, it's hard to
see how Git can give any useful guarantee on those platforms.

>
> > core.fsyncObjectFiles::
> > If `true`, this legacy setting is equivalent to `core.fsync=objects`. If
> > `core.fsync` is explicitly specified, then this setting is ignored.
>
> I think deprecating this very-specific knob is a good idea,
> regardless of how complex we'd want to make the alternative.
>

Glad you agree.  I hope we see others weigh in on the tradeoff between
complexity and control.

Thanks,
Neeraj

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-11-18 19:47 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-10 15:09 RFC: A configuration design for future-proofing fsync() configuration Ævar Arnfjörð Bjarmason
2021-11-11  0:47 ` Neeraj Singh
2021-11-11  0:57   ` Ævar Arnfjörð Bjarmason
2021-11-17 22:16     ` Neeraj Singh
2021-11-18 19:00       ` Junio C Hamano
2021-11-18 19:46         ` Neeraj Singh
2021-11-12  5:54   ` Christoph Hellwig
2021-11-17 18:49     ` Neeraj Singh
2021-11-11 18:03 ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).