git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] enable core.fsyncObjectFiles by default
@ 2018-01-17 18:48 Christoph Hellwig
  2018-01-17 19:04 ` Junio C Hamano
  2018-01-17 20:55 ` Jeff King
  0 siblings, 2 replies; 52+ messages in thread
From: Christoph Hellwig @ 2018-01-17 18:48 UTC (permalink / raw)
  To: git; +Cc: linux-fsdevel

fsync is required for data integrity as there is no gurantee that
data makes it to disk at any specified time without it.  Even for
ext3 with data=ordered mode the file system will only commit all
data at some point in time that is not guaranteed.

I've lost data on development machines with various times countless
times due to the lack of this option, and now lost trees on a
git server with ext4 as well yesterday.  It's time to make git
safe by default.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 Documentation/config.txt | 6 ++----
 environment.c            | 2 +-
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0e25b2c92..9a1cec5c8 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -866,10 +866,8 @@ core.whitespace::
 core.fsyncObjectFiles::
 	This boolean will enable 'fsync()' when writing object files.
 +
-This is a total waste of time and effort on a filesystem that orders
-data writes properly, but can be useful for filesystems that do not use
-journalling (traditional UNIX filesystems) or that only journal metadata
-and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
+This option is enabled by default and ensures actual data integrity
+by calling fsync after writing object files.
 
 core.preloadIndex::
 	Enable parallel index preload for operations like 'git diff'
diff --git a/environment.c b/environment.c
index 63ac38a46..c74375b5e 100644
--- a/environment.c
+++ b/environment.c
@@ -36,7 +36,7 @@ const char *git_hooks_path;
 int zlib_compression_level = Z_BEST_SPEED;
 int core_compression_level;
 int pack_compression_level = Z_DEFAULT_COMPRESSION;
-int fsync_object_files;
+int fsync_object_files = 1;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 18:48 [PATCH] enable core.fsyncObjectFiles by default Christoph Hellwig
@ 2018-01-17 19:04 ` Junio C Hamano
  2018-01-17 19:35   ` Christoph Hellwig
                     ` (2 more replies)
  2018-01-17 20:55 ` Jeff King
  1 sibling, 3 replies; 52+ messages in thread
From: Junio C Hamano @ 2018-01-17 19:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: git, linux-fsdevel

Christoph Hellwig <hch@lst.de> writes:

> fsync is required for data integrity as there is no gurantee that
> data makes it to disk at any specified time without it.  Even for
> ext3 with data=ordered mode the file system will only commit all
> data at some point in time that is not guaranteed.

It comes from this one:

commit aafe9fbaf4f1d1f27a6f6e3eb3e246fff81240ef
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Jun 18 15:18:44 2008 -0700

    Add config option to enable 'fsync()' of object files
    
    As explained in the documentation[*] this is totally useless on
    filesystems that do ordered/journalled data writes, but it can be a
    useful safety feature on filesystems like HFS+ that only journal the
    metadata, not the actual file contents.
    
    It defaults to off, although we could presumably in theory some day
    auto-enable it on a per-filesystem basis.
    
    [*] Yes, I updated the docs for the thing.  Hell really _has_ frozen
        over, and the four horsemen are probably just beyond the horizon.
        EVERYBODY PANIC!
    
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 0e25b2c92..9a1cec5c8 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -866,10 +866,8 @@ core.whitespace::
>  core.fsyncObjectFiles::
>  	This boolean will enable 'fsync()' when writing object files.
>  +
> -This is a total waste of time and effort on a filesystem that orders
> -data writes properly, but can be useful for filesystems that do not use
> -journalling (traditional UNIX filesystems) or that only journal metadata
> -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
> +This option is enabled by default and ensures actual data integrity
> +by calling fsync after writing object files.

I am somewhat sympathetic to the desire to flip the default to
"safe" and allow those who know they are already safe to tweak the
knob for performance, and it also makes sense to document that the
default is "true" here.  But I do not see the point of removing the
four lines from this paragraph; the sole effect of the removal is to
rob information from readers that they can use to decide if they
want to disable the configuration, no?

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 19:04 ` Junio C Hamano
@ 2018-01-17 19:35   ` Christoph Hellwig
  2018-01-17 20:05     ` Andreas Schwab
  2018-01-17 19:37   ` Matthew Wilcox
  2018-01-17 21:44   ` Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 52+ messages in thread
From: Christoph Hellwig @ 2018-01-17 19:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Christoph Hellwig, git, linux-fsdevel

On Wed, Jan 17, 2018 at 11:04:32AM -0800, Junio C Hamano wrote:
> I am somewhat sympathetic to the desire to flip the default to
> "safe" and allow those who know they are already safe to tweak the
> knob for performance, and it also makes sense to document that the
> default is "true" here.  But I do not see the point of removing the
> four lines from this paragraph; the sole effect of the removal is to
> rob information from readers that they can use to decide if they
> want to disable the configuration, no?

Does this sound any better?  It is a little to technical for my
taste, but I couldn't come up with anything better:

---
From ab8f2d38dfe40e74de5399af0d069427c7473b76 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Wed, 17 Jan 2018 19:42:46 +0100
Subject: enable core.fsyncObjectFiles by default

fsync is required for data integrity as there is no gurantee that
data makes it to disk at any specified time without it.  Even for
ext3 with data=ordered mode the file system will only commit all
data at some point in time that is not guaranteed.

I've lost data on development machines with various times countless
times due to the lack of this option, and now lost trees on a
git server with ext4 as well yesterday.  It's time to make git
safe by default.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 Documentation/config.txt | 11 +++++++----
 environment.c            |  2 +-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0e25b2c92..8b99f1389 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -866,10 +866,13 @@ core.whitespace::
 core.fsyncObjectFiles::
 	This boolean will enable 'fsync()' when writing object files.
 +
-This is a total waste of time and effort on a filesystem that orders
-data writes properly, but can be useful for filesystems that do not use
-journalling (traditional UNIX filesystems) or that only journal metadata
-and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
+This option is enabled by default and ensures actual data integrity
+by calling fsync after writing object files.
+
+Note that this might be really slow on ext3 in the traditional
+data=ordered mode, in which case you might want to disable this option.
+ext3 in data=ordered mode will order the actual data writeout with
+metadata operation, but not actually guarantee data integrity either.
 
 core.preloadIndex::
 	Enable parallel index preload for operations like 'git diff'
diff --git a/environment.c b/environment.c
index 63ac38a46..c74375b5e 100644
--- a/environment.c
+++ b/environment.c
@@ -36,7 +36,7 @@ const char *git_hooks_path;
 int zlib_compression_level = Z_BEST_SPEED;
 int core_compression_level;
 int pack_compression_level = Z_DEFAULT_COMPRESSION;
-int fsync_object_files;
+int fsync_object_files = 1;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
-- 
2.14.2




^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 19:04 ` Junio C Hamano
  2018-01-17 19:35   ` Christoph Hellwig
@ 2018-01-17 19:37   ` Matthew Wilcox
  2018-01-17 19:42     ` Christoph Hellwig
  2018-01-17 21:44   ` Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 52+ messages in thread
From: Matthew Wilcox @ 2018-01-17 19:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Christoph Hellwig, git, linux-fsdevel

On Wed, Jan 17, 2018 at 11:04:32AM -0800, Junio C Hamano wrote:
> Christoph Hellwig <hch@lst.de> writes:
> > @@ -866,10 +866,8 @@ core.whitespace::
> >  core.fsyncObjectFiles::
> >  	This boolean will enable 'fsync()' when writing object files.
> >  +
> > -This is a total waste of time and effort on a filesystem that orders
> > -data writes properly, but can be useful for filesystems that do not use
> > -journalling (traditional UNIX filesystems) or that only journal metadata
> > -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
> > +This option is enabled by default and ensures actual data integrity
> > +by calling fsync after writing object files.
> 
> I am somewhat sympathetic to the desire to flip the default to
> "safe" and allow those who know they are already safe to tweak the
> knob for performance, and it also makes sense to document that the
> default is "true" here.  But I do not see the point of removing the
> four lines from this paragraph; the sole effect of the removal is to
> rob information from readers that they can use to decide if they
> want to disable the configuration, no?

How about this instead?

This option is enabled by default and ensures data integrity by calling
fsync after writing object files.  It is not necessary on filesystems
which journal data writes, but is still necessary on filesystems which
do not use journalling (ext2), or that only journal metadata writes
(OS X's HFS+, or Linux's ext4 with "data=writeback").  Turning this
option off will increase performance at the possible risk of data loss.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 19:37   ` Matthew Wilcox
@ 2018-01-17 19:42     ` Christoph Hellwig
  0 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2018-01-17 19:42 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Junio C Hamano, Christoph Hellwig, git, linux-fsdevel

On Wed, Jan 17, 2018 at 11:37:31AM -0800, Matthew Wilcox wrote:
> How about this instead?
> 
> This option is enabled by default and ensures data integrity by calling
> fsync after writing object files.  It is not necessary on filesystems
> which journal data writes, but is still necessary on filesystems which
> do not use journalling (ext2), or that only journal metadata writes
> (OS X's HFS+, or Linux's ext4 with "data=writeback").  Turning this
> option off will increase performance at the possible risk of data loss.

I think this goes entirely into the wrong direction.  The point is
fsync is always the right thing to do.  But on ext3 (and ext3 only)
the right thing is way too painful, and conveniently ext3 happens
to be almost ok without it.  So if anything should get a special
mention it is ext3.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 19:35   ` Christoph Hellwig
@ 2018-01-17 20:05     ` Andreas Schwab
  0 siblings, 0 replies; 52+ messages in thread
From: Andreas Schwab @ 2018-01-17 20:05 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Junio C Hamano, git, linux-fsdevel

On Jan 17 2018, Christoph Hellwig <hch@lst.de> wrote:

> I've lost data on development machines with various times countless
> times due to the lack of this option, and now lost trees on a

Too many times. :-)

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 18:48 [PATCH] enable core.fsyncObjectFiles by default Christoph Hellwig
  2018-01-17 19:04 ` Junio C Hamano
@ 2018-01-17 20:55 ` Jeff King
  2018-01-17 21:10   ` Christoph Hellwig
  1 sibling, 1 reply; 52+ messages in thread
From: Jeff King @ 2018-01-17 20:55 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: git, linux-fsdevel

On Wed, Jan 17, 2018 at 07:48:28PM +0100, Christoph Hellwig wrote:

> fsync is required for data integrity as there is no gurantee that
> data makes it to disk at any specified time without it.  Even for
> ext3 with data=ordered mode the file system will only commit all
> data at some point in time that is not guaranteed.
> 
> I've lost data on development machines with various times countless
> times due to the lack of this option, and now lost trees on a
> git server with ext4 as well yesterday.  It's time to make git
> safe by default.

I'm definitely sympathetic, and I've contemplated a patch like this a
few times. But I'm not sure we're "safe by default" here after this
patch. In particular:

  1. This covers only loose objects. We generally sync pack writes
     already, so we're covered there. But we do not sync ref updates at
     all, which we'd probably want to in a default-safe setup (a common
     post-crash symptom I've seen is zero-length ref files).

  2. Is it sufficient to fsync() the individual file's descriptors?
     We often do other filesystem operations (like hardlinking or
     renaming) that also need to be committed to disk before an
     operation can be considered saved.

  3. Related to (2), we often care about the order of metadata commits.
     E.g., a common sequence is:

       a. Write object contents to tempfile.

       b. rename() or hardlink tempfile to final name.

       c. Write object name into ref.lock file.

       d. rename() ref.lock to ref

     If we see (d) but not (b), then the result is a corrupted
     repository. Is this guaranteed by ext4 journaling with
     data=ordered?

It may be that data=ordered gets us what we need for (2) and (3). But I
think at the very least we should consider fsyncing ref updates based on
a config option, too.

-Peff

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 20:55 ` Jeff King
@ 2018-01-17 21:10   ` Christoph Hellwig
  0 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2018-01-17 21:10 UTC (permalink / raw)
  To: Jeff King; +Cc: Christoph Hellwig, git, linux-fsdevel

On Wed, Jan 17, 2018 at 03:55:09PM -0500, Jeff King wrote:
> I'm definitely sympathetic, and I've contemplated a patch like this a
> few times. But I'm not sure we're "safe by default" here after this
> patch. In particular:
> 
>   1. This covers only loose objects. We generally sync pack writes
>      already, so we're covered there. But we do not sync ref updates at
>      all, which we'd probably want to in a default-safe setup (a common
>      post-crash symptom I've seen is zero-length ref files).

I've not seen them myself yet, but yes, they need an fsync.

>   2. Is it sufficient to fsync() the individual file's descriptors?
>      We often do other filesystem operations (like hardlinking or
>      renaming) that also need to be committed to disk before an
>      operation can be considered saved.

No, for metadata operations we need to fsync the directory as well.

>   3. Related to (2), we often care about the order of metadata commits.
>      E.g., a common sequence is:
> 
>        a. Write object contents to tempfile.
> 
>        b. rename() or hardlink tempfile to final name.
> 
>        c. Write object name into ref.lock file.
> 
>        d. rename() ref.lock to ref
> 
>      If we see (d) but not (b), then the result is a corrupted
>      repository. Is this guaranteed by ext4 journaling with
>      data=ordered?

It is not generally guranteed by Linux file system semantics.  Various
file system will actually start writeback of file data before rename,
but not actually wait on it.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 19:04 ` Junio C Hamano
  2018-01-17 19:35   ` Christoph Hellwig
  2018-01-17 19:37   ` Matthew Wilcox
@ 2018-01-17 21:44   ` Ævar Arnfjörð Bjarmason
  2018-01-17 22:07     ` Linus Torvalds
  2 siblings, 1 reply; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-01-17 21:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Christoph Hellwig, git, linux-fsdevel, Linus Torvalds


On Wed, Jan 17 2018, Junio C. Hamano jotted:

> Christoph Hellwig <hch@lst.de> writes:
>
>> fsync is required for data integrity as there is no gurantee that
>> data makes it to disk at any specified time without it.  Even for
>> ext3 with data=ordered mode the file system will only commit all
>> data at some point in time that is not guaranteed.
>
> It comes from this one:
>
> commit aafe9fbaf4f1d1f27a6f6e3eb3e246fff81240ef
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date:   Wed Jun 18 15:18:44 2008 -0700
>
>     Add config option to enable 'fsync()' of object files
>
>     As explained in the documentation[*] this is totally useless on
>     filesystems that do ordered/journalled data writes, but it can be a
>     useful safety feature on filesystems like HFS+ that only journal the
>     metadata, not the actual file contents.
>
>     It defaults to off, although we could presumably in theory some day
>     auto-enable it on a per-filesystem basis.
>
>     [*] Yes, I updated the docs for the thing.  Hell really _has_ frozen
>         over, and the four horsemen are probably just beyond the horizon.
>         EVERYBODY PANIC!
>
>> diff --git a/Documentation/config.txt b/Documentation/config.txt
>> index 0e25b2c92..9a1cec5c8 100644
>> --- a/Documentation/config.txt
>> +++ b/Documentation/config.txt
>> @@ -866,10 +866,8 @@ core.whitespace::
>>  core.fsyncObjectFiles::
>>  	This boolean will enable 'fsync()' when writing object files.
>>  +
>> -This is a total waste of time and effort on a filesystem that orders
>> -data writes properly, but can be useful for filesystems that do not use
>> -journalling (traditional UNIX filesystems) or that only journal metadata
>> -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
>> +This option is enabled by default and ensures actual data integrity
>> +by calling fsync after writing object files.
>
> I am somewhat sympathetic to the desire to flip the default to
> "safe" and allow those who know they are already safe to tweak the
> knob for performance, and it also makes sense to document that the
> default is "true" here.  But I do not see the point of removing the
> four lines from this paragraph; the sole effect of the removal is to
> rob information from readers that they can use to decide if they
> want to disable the configuration, no?

[CC'd the author of the current behavior]

Some points/questions:

 a) Is there some reliable way to test whether this is needed from
    userspace? I'm thinking something like `git update-index
    --test-untracked-cache` but for fsync().

 b) On the filesystems that don't need this, what's the performance
    impact?

    I ran a small test myself on CentOS 7 (3.10) with ext4 data=ordered
    on the tests I thought might do a lot of loose object writes:

      $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS="NO_OPENSSL=Y CFLAGS=-O3 -j56" ./run origin/master fsync-on~ fsync-on p3400-rebase.sh p0007-write-cache.sh
      [...]
      Test                                                            fsync-on~         fsync-on
      -------------------------------------------------------------------------------------------------------
      3400.2: rebase on top of a lot of unrelated changes             1.45(1.30+0.17)   1.45(1.28+0.20) +0.0%
      3400.4: rebase a lot of unrelated changes without split-index   4.34(3.71+0.66)   4.33(3.69+0.66) -0.2%
      3400.6: rebase a lot of unrelated changes with split-index      3.38(2.94+0.47)   3.38(2.93+0.47) +0.0%
      0007.2: write_locked_index 3 times (3214 files)                 0.01(0.00+0.00)   0.01(0.00+0.00) +0.0%

   No impact. However I did my own test of running the test suite 10%
   times with/without this patch, and it runs 9% slower:

     fsync-off: avg:21.59 21.50 21.50 21.52 21.53 21.54 21.57 21.59 21.61 21.63 21.95
      fsync-on: avg:23.43 23.21 23.25 23.26 23.26 23.27 23.32 23.49 23.51 23.83 23.88

   Test script at the end of this E-Mail.

 c) What sort of guarantees in this regard do NFS-mounted filesystems
    commonly make?

Test script:

use v5.10.0;
use strict;
use warnings;
use Time::HiRes qw(time);
use List::Util qw(sum);
use Data::Dumper;

my %time;
for my $ref (@ARGV) {
    system "git checkout $ref";
    system qq[make -j56 CFLAGS="-O3 -g" NO_OPENSSL=Y all];
    for (1..10) {
        my $t0 = -time();
        system "(cd t && NO_SVN_TESTS=1 GIT_TEST_HTTPD=0 prove -j56 --state=slow,save t[0-9]*.sh)";
        $t0 += time();
        push @{$time{$ref}} => $t0;
    }
}
for my $ref (sort keys %time) {
    printf "%20s: avg:%.2f %s\n",
        $ref,
        sum(@{$time{$ref}})/@{$time{$ref}},
        join(" ", map { sprintf "%.02f", $_ } sort { $a <=> $b } @{$time{$ref}});
}

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 21:44   ` Ævar Arnfjörð Bjarmason
@ 2018-01-17 22:07     ` Linus Torvalds
  2018-01-17 22:25       ` Linus Torvalds
                         ` (2 more replies)
  0 siblings, 3 replies; 52+ messages in thread
From: Linus Torvalds @ 2018-01-17 22:07 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Christoph Hellwig, Git Mailing List, linux-fsdevel

On Wed, Jan 17, 2018 at 1:44 PM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>     I ran a small test myself on CentOS 7 (3.10) with ext4 data=ordered
>     on the tests I thought might do a lot of loose object writes:
>
>       $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS="NO_OPENSSL=Y CFLAGS=-O3 -j56" ./run origin/master fsync-on~ fsync-on p3400-rebase.sh p0007-write-cache.sh
>       [...]
>       Test                                                            fsync-on~         fsync-on
>       -------------------------------------------------------------------------------------------------------
>       3400.2: rebase on top of a lot of unrelated changes             1.45(1.30+0.17)   1.45(1.28+0.20) +0.0%
>       3400.4: rebase a lot of unrelated changes without split-index   4.34(3.71+0.66)   4.33(3.69+0.66) -0.2%
>       3400.6: rebase a lot of unrelated changes with split-index      3.38(2.94+0.47)   3.38(2.93+0.47) +0.0%
>       0007.2: write_locked_index 3 times (3214 files)                 0.01(0.00+0.00)   0.01(0.00+0.00) +0.0%
>
>    No impact. However I did my own test of running the test suite 10%
>    times with/without this patch, and it runs 9% slower:
>
>      fsync-off: avg:21.59 21.50 21.50 21.52 21.53 21.54 21.57 21.59 21.61 21.63 21.95
>       fsync-on: avg:23.43 23.21 23.25 23.26 23.26 23.27 23.32 23.49 23.51 23.83 23.88

That's not the thing you should check.

Now re-do the test while another process writes to a totally unrelated
a huge file (say, do a ISO file copy or something).

That was the thing that several filesystems get completely and
horribly wrong. Generally _particularly_ the logging filesystems that
don't even need the fsync, because they use a single log for
everything (so fsync serializes all the writes, not just the writes to
the one file it's fsync'ing).

The original git design was very much to write each object file
without any syncing, because they don't matter since a new object file
- by definition - isn't really reachable. Then sync before writing the
index file or a new ref.

But things have changed, I'm not arguing that the code shouldn't be
made safe by default. I personally refuse to use rotating media on my
machines anyway, largely exactly because of the fsync() issue (things
like "firefox" started doing fsync on the mysql database for stupid
things, and you'd get huge pauses).

But I do think your benchmark is wrong. The case where only git does
something is not interesting or relevant. It really is "git does
something _and_ somebody else writes something entirely unrelated at
the same time" that matters.

                  Linus

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 22:07     ` Linus Torvalds
@ 2018-01-17 22:25       ` Linus Torvalds
  2018-01-17 23:16       ` Ævar Arnfjörð Bjarmason
  2018-01-17 23:52       ` Theodore Ts'o
  2 siblings, 0 replies; 52+ messages in thread
From: Linus Torvalds @ 2018-01-17 22:25 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Christoph Hellwig, Git Mailing List, linux-fsdevel

On Wed, Jan 17, 2018 at 2:07 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> The original git design was very much to write each object file
> without any syncing, because they don't matter since a new object file
> - by definition - isn't really reachable. Then sync before writing the
> index file or a new ref.

.. actually, I think it originally sync'ed only before starting to
actually remove objects due to repacking.

The theory was that you can lose the last commit series or whatever,
and have to git-fsck, and have to re-do it, but you could never lose
long-term work. If your machine crashes, you still remember what you
did just before the crash.

That theory may have been more correct back in the days than it is
now. People who use git might be less willing to treat it like a
filesystem that you can fsck than I was back 10+ ago.

It's worth noting that the commit that Junio pointed to (from 2008)
didn't actually change any behavior. It just allowed people who cared
to change behavior. The original "let's not waste time on fsync every
object write, because we can just re-create the state anyway" behavior
goes back to 2005.

              Linus

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 22:07     ` Linus Torvalds
  2018-01-17 22:25       ` Linus Torvalds
@ 2018-01-17 23:16       ` Ævar Arnfjörð Bjarmason
  2018-01-17 23:42         ` Linus Torvalds
  2018-01-17 23:52       ` Theodore Ts'o
  2 siblings, 1 reply; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-01-17 23:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Junio C Hamano, Christoph Hellwig, Git Mailing List,
	linux-fsdevel, Roberto Tyley


On Wed, Jan 17 2018, Linus Torvalds jotted:

> On Wed, Jan 17, 2018 at 1:44 PM, Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>>     I ran a small test myself on CentOS 7 (3.10) with ext4 data=ordered
>>     on the tests I thought might do a lot of loose object writes:
>>
>>       $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS="NO_OPENSSL=Y CFLAGS=-O3 -j56" ./run origin/master fsync-on~ fsync-on p3400-rebase.sh p0007-write-cache.sh
>>       [...]
>>       Test                                                            fsync-on~         fsync-on
>>       -------------------------------------------------------------------------------------------------------
>>       3400.2: rebase on top of a lot of unrelated changes             1.45(1.30+0.17)   1.45(1.28+0.20) +0.0%
>>       3400.4: rebase a lot of unrelated changes without split-index   4.34(3.71+0.66)   4.33(3.69+0.66) -0.2%
>>       3400.6: rebase a lot of unrelated changes with split-index      3.38(2.94+0.47)   3.38(2.93+0.47) +0.0%
>>       0007.2: write_locked_index 3 times (3214 files)                 0.01(0.00+0.00)   0.01(0.00+0.00) +0.0%
>>
>>    No impact. However I did my own test of running the test suite 10%
>>    times with/without this patch, and it runs 9% slower:

That should be "10 times" b.t.w., not "10% times"

>>
>>      fsync-off: avg:21.59 21.50 21.50 21.52 21.53 21.54 21.57 21.59 21.61 21.63 21.95
>>       fsync-on: avg:23.43 23.21 23.25 23.26 23.26 23.27 23.32 23.49 23.51 23.83 23.88
>
> That's not the thing you should check.
>
> Now re-do the test while another process writes to a totally unrelated
> a huge file (say, do a ISO file copy or something).
>
> That was the thing that several filesystems get completely and
> horribly wrong. Generally _particularly_ the logging filesystems that
> don't even need the fsync, because they use a single log for
> everything (so fsync serializes all the writes, not just the writes to
> the one file it's fsync'ing).
>
> The original git design was very much to write each object file
> without any syncing, because they don't matter since a new object file
> - by definition - isn't really reachable. Then sync before writing the
> index file or a new ref.
>>
> But things have changed, I'm not arguing that the code shouldn't be
> made safe by default. I personally refuse to use rotating media on my
> machines anyway, largely exactly because of the fsync() issue (things
> like "firefox" started doing fsync on the mysql database for stupid
> things, and you'd get huge pauses).
>
> But I do think your benchmark is wrong. The case where only git does
> something is not interesting or relevant. It really is "git does
> something _and_ somebody else writes something entirely unrelated at
> the same time" that matters.

Yeah it's shitty, just a quick hack to get some since there was a
discussion of performance, but neither your original patch or this
thread had quoted any.

One thing you may have missed is that this is a parallel (56 tests at a
time) run of the full test suite. So there's plenty of other git
processes (and test setup/teardown) racing with any given git
process. Running the test suite in a loop like this gives me ~100K IO
ops/s & ~50% disk utilization.

Or does overall FS activity and raw throughput (e.g. with an ISO copy)
matter more than general FS contention?

Tweaking it to emulate this iso copy case, running another test with one
of these running concurrently:

    # setup
    dd if=/dev/urandom of=/tmp/fake.iso bs=1024 count=$((1000*1024))
    # run in a loop (shuf to not always write the same thing)
    while sleep 0.1; do shuf /tmp/fake.iso | pv >/tmp/fake.shuf.iso; done

Gives throughput that spikes to 100% (not consistently) and:

    fsync-off: avg:36.37 31.74 33.83 35.12 36.19 36.32 37.04 37.34 37.71 37.93 40.43
     fsync-on: avg:38.09 34.56 35.14 35.69 36.41 36.41 37.96 38.25 40.45 41.44 44.59

~4.7% slower, v.s. ~8.5% in my earlier
87h8rki2iu.fsf@evledraar.gmail.com without that running.

Which is not an argument for / against this patch, but those numbers
seem significant, and generally if the entire test suite slows down by
that much there's going to be sub-parts of it that are much worse.

Which might be a reason to tread more carefully and if it *does* slow
things down perhaps do it with more granularity, e.g. turning it on in
git-receive-pack might be more sensible than in git-filter-branch.

I remember Roberto Tyley's BFG writing an amazing amount of loose
objects, but it doesn't seem to have an fsync() option, I wonder if
adding one would be a representative pathological test case.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 23:16       ` Ævar Arnfjörð Bjarmason
@ 2018-01-17 23:42         ` Linus Torvalds
  0 siblings, 0 replies; 52+ messages in thread
From: Linus Torvalds @ 2018-01-17 23:42 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Christoph Hellwig, Git Mailing List,
	linux-fsdevel, Roberto Tyley

On Wed, Jan 17, 2018 at 3:16 PM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> Or does overall FS activity and raw throughput (e.g. with an ISO copy)
> matter more than general FS contention?

Traditionally, yes.

Also note that none of this is about "throughput". It's about waiting
for a second or two when you do a simple "git commit" or similar.

Also, hardware has changed. I haven't used a rotational disk in about
10 years now, and I don't worry about latencies so much more.

The fact that you get ~100k iops indicates that you probably aren't
using those stupid platters of rust either. So I doubt you can even
trigger the bad cases.

                Linus

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 22:07     ` Linus Torvalds
  2018-01-17 22:25       ` Linus Torvalds
  2018-01-17 23:16       ` Ævar Arnfjörð Bjarmason
@ 2018-01-17 23:52       ` Theodore Ts'o
  2018-01-17 23:57         ` Linus Torvalds
  2020-09-17 11:06         ` Ævar Arnfjörð Bjarmason
  2 siblings, 2 replies; 52+ messages in thread
From: Theodore Ts'o @ 2018-01-17 23:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Christoph Hellwig, Git Mailing List, linux-fsdevel

On Wed, Jan 17, 2018 at 02:07:22PM -0800, Linus Torvalds wrote:
> 
> Now re-do the test while another process writes to a totally unrelated
> a huge file (say, do a ISO file copy or something).
> 
> That was the thing that several filesystems get completely and
> horribly wrong. Generally _particularly_ the logging filesystems that
> don't even need the fsync, because they use a single log for
> everything (so fsync serializes all the writes, not just the writes to
> the one file it's fsync'ing).

Well, let's be fair; this is something *ext3* got wrong, and it was
the default file system back them.  All of the modern file systems now
do delayed allocation, which means that an fsync of one file doesn't
actually imply an fsync of another file.  Hence...

> The original git design was very much to write each object file
> without any syncing, because they don't matter since a new object file
> - by definition - isn't really reachable. Then sync before writing the
> index file or a new ref.

This isn't really safe any more.  Yes, there's a single log.  But
files which are subject to delayed allocation are in the page cache,
and just because you fsync the index file doesn't mean that the object
file is now written to disk.  It was true for ext3, but it's not true
for ext4, xfs, btrfs, etc.

The good news is that if you have another process downloading a huge
ISO image, the fsync of the index file won't force the ISO file to be
written out.  The bad news is that it won't force out the other git
object files, either.

Now, there is a potential downside of fsync'ing each object file, and
that is the cost of doing a CACHE FLUSH on a HDD is non-trivial, and
even on a SSD, it's not optimal to call CACHE FLUSH thousands of times
in a second.  So if you are creating thousands of tiny files, and you
fsync each one, each fsync(2) call is a serializing instruction, which
means it won't return until that one file is written to disk.  If you
are writing lots of small files, and you are using a HDD, you'll be
bottlenecked to around 30 files per second on a 5400 RPM HDD, and this
is true regardless of what file system you use, because the bottle
neck is the CACHE FLUSH operation, and how you organize the metadata
and how you do the block allocation, is largely lost in the noise
compared to the CACHE FLUSH command, which serializes everything.

There are solutions to this; you could simply not call fsync(2) a
thousand times, and instead write a pack file, and call fsync once on
the pack file.  That's probably the smartest approach.

You could also create a thousand threads, and call fsync(2) on those
thousand threads at roughly the same time.  Or you could use a
bleeding edge kernel with the latest AIO patch, and use the newly
added IOCB_CMD_FSYNC support.

But I'd simply recommend writing a pack and fsync'ing the pack,
instead of trying to write a gazillion object files.  (git-repack -A,
I'm looking at you....)

					- Ted

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 23:52       ` Theodore Ts'o
@ 2018-01-17 23:57         ` Linus Torvalds
  2018-01-18 16:27           ` Christoph Hellwig
  2020-09-17 11:06         ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 52+ messages in thread
From: Linus Torvalds @ 2018-01-17 23:57 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Christoph Hellwig, Git Mailing List, linux-fsdevel

On Wed, Jan 17, 2018 at 3:52 PM, Theodore Ts'o <tytso@mit.edu> wrote:
>
> Well, let's be fair; this is something *ext3* got wrong, and it was
> the default file system back them.

I'm pretty sure reiserfs and btrfs did too..

          Linus

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 23:57         ` Linus Torvalds
@ 2018-01-18 16:27           ` Christoph Hellwig
  2018-01-19 19:08             ` Junio C Hamano
  2018-01-21 21:32             ` Chris Mason
  0 siblings, 2 replies; 52+ messages in thread
From: Christoph Hellwig @ 2018-01-18 16:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Ts'o, Ævar Arnfjörð Bjarmason,
	Junio C Hamano, Christoph Hellwig, Git Mailing List,
	linux-fsdevel, Chris Mason

[adding Chris to the Cc list - this is about the awful ext3 data=ordered
behavior of syncing the whole file system data and metadata on each
fsync]

On Wed, Jan 17, 2018 at 03:57:53PM -0800, Linus Torvalds wrote:
> On Wed, Jan 17, 2018 at 3:52 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> >
> > Well, let's be fair; this is something *ext3* got wrong, and it was
> > the default file system back them.
> 
> I'm pretty sure reiserfs and btrfs did too..

I'm pretty sure btrfs never did, and reiserfs at least looks like
it currently doesn't but I'd have to dig into history to check if
it ever did.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-18 16:27           ` Christoph Hellwig
@ 2018-01-19 19:08             ` Junio C Hamano
  2018-01-20 22:14               ` Theodore Ts'o
  2018-01-21 21:32             ` Chris Mason
  1 sibling, 1 reply; 52+ messages in thread
From: Junio C Hamano @ 2018-01-19 19:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, Theodore Ts'o,
	Ævar Arnfjörð Bjarmason, Git Mailing List,
	linux-fsdevel, Chris Mason

Christoph Hellwig <hch@lst.de> writes:

> [adding Chris to the Cc list - this is about the awful ext3 data=ordered
> behavior of syncing the whole file system data and metadata on each
> fsync]
>
> On Wed, Jan 17, 2018 at 03:57:53PM -0800, Linus Torvalds wrote:
>> On Wed, Jan 17, 2018 at 3:52 PM, Theodore Ts'o <tytso@mit.edu> wrote:
>> >
>> > Well, let's be fair; this is something *ext3* got wrong, and it was
>> > the default file system back them.
>> 
>> I'm pretty sure reiserfs and btrfs did too..
>
> I'm pretty sure btrfs never did, and reiserfs at least looks like
> it currently doesn't but I'd have to dig into history to check if
> it ever did.

So..., is it fair to say that the one you sent in

  https://public-inbox.org/git/20180117193510.GA30657@lst.de/

is the best variant we have seen in this thread so far?  I'll keep
that in my inbox so that I do not forget, but I think we would want
to deal with a hotfix for 2.16 on case insensitive platforms before
this topic.

Thanks.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-19 19:08             ` Junio C Hamano
@ 2018-01-20 22:14               ` Theodore Ts'o
  2018-01-20 22:27                 ` Junio C Hamano
  0 siblings, 1 reply; 52+ messages in thread
From: Theodore Ts'o @ 2018-01-20 22:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Christoph Hellwig, Linus Torvalds,
	Ævar Arnfjörð Bjarmason, Git Mailing List,
	linux-fsdevel, Chris Mason

On Fri, Jan 19, 2018 at 11:08:46AM -0800, Junio C Hamano wrote:
> So..., is it fair to say that the one you sent in
> 
>   https://public-inbox.org/git/20180117193510.GA30657@lst.de/
> 
> is the best variant we have seen in this thread so far?  I'll keep
> that in my inbox so that I do not forget, but I think we would want
> to deal with a hotfix for 2.16 on case insensitive platforms before
> this topic.

It's a simplistic fix, but it will work.  There may very well be
certain workloads which generate a large number of loose objects
(e.g., git repack -A) which will make things go significantly more
slowly as a result.  It might very well be the case that if nothing
else is going on, something like "write all the files without
fsync(2), then use syncfs(2)" would be much faster.  The downside with
that approach is if indeed you were downloading a multi-gigabyte DVD
image at the same time, the syncfs(2) will force a writeback of the
partially writte DVD image, or some other unrelated files.

But if the goal is to just change the default, and then see what
shakes out, and then apply other optimizations later, that's certainly
a valid result.  I've never been fond of the "git repack -A" behavior
where it can generate huge numbers of loose files.  I'd much prefer it
if the other objects ended up in a separate pack file, and then some
other provision made for nuking that pack file some time later.  But
that's expanding the scope significantly over what's currently being
discussed.

						- Ted

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-20 22:14               ` Theodore Ts'o
@ 2018-01-20 22:27                 ` Junio C Hamano
  2018-01-22 15:09                   ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 52+ messages in thread
From: Junio C Hamano @ 2018-01-20 22:27 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Christoph Hellwig, Linus Torvalds,
	Ævar Arnfjörð Bjarmason, Git Mailing List,
	linux-fsdevel, Chris Mason, Jeff King

Theodore Ts'o <tytso@mit.edu> writes:

> ....  I've never been fond of the "git repack -A" behavior
> where it can generate huge numbers of loose files.  I'd much prefer it
> if the other objects ended up in a separate pack file, and then some
> other provision made for nuking that pack file some time later....

Yes, a "cruft pack" that holds unreachable object has been discussed
a few times recently on list, and I do agree that it is a desirable
thing to have in the longer run.

What's tricky is to devise a way to allow us to salvage objects that
are placed in a cruft pack because they are accessed recently,
proving themselves to be no longer crufts.  It could be that a good
way to resurrect them is to explode them to loose form when they are
accessed out of a cruft pack.  We need to worry about interactions
with read-only users if we go that route, but with the current
"explode unreachable to loose, touch their mtime when they are
accessed" scheme ends up ignoring accesses from read-only users that
cannot update mtime, so it might not be too bad.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-18 16:27           ` Christoph Hellwig
  2018-01-19 19:08             ` Junio C Hamano
@ 2018-01-21 21:32             ` Chris Mason
  1 sibling, 0 replies; 52+ messages in thread
From: Chris Mason @ 2018-01-21 21:32 UTC (permalink / raw)
  To: Christoph Hellwig, Linus Torvalds
  Cc: Theodore Ts'o, Ævar Arnfjörð Bjarmason,
	Junio C Hamano, Git Mailing List, linux-fsdevel

On 01/18/2018 11:27 AM, Christoph Hellwig wrote:
> [adding Chris to the Cc list - this is about the awful ext3 data=ordered
> behavior of syncing the whole file system data and metadata on each
> fsync]
> 
> On Wed, Jan 17, 2018 at 03:57:53PM -0800, Linus Torvalds wrote:
>> On Wed, Jan 17, 2018 at 3:52 PM, Theodore Ts'o <tytso@mit.edu> wrote:
>>>
>>> Well, let's be fair; this is something *ext3* got wrong, and it was
>>> the default file system back them.
>>
>> I'm pretty sure reiserfs and btrfs did too..
> 
> I'm pretty sure btrfs never did, and reiserfs at least looks like
> it currently doesn't but I'd have to dig into history to check if
> it ever did.
> 

Christoph has this right, btrfs only fsyncs the one file that you've 
asked for, and unrelated data/metadata won't be included.

We've seen big fsync stalls on ext4 caused by data=ordered, so it's 
still possible to trigger on ext4, but much better than ext3.

I do share Ted's concern about the perf impact of the fsyncs on tons of 
loose files, but the defaults should be safety first.

-chris

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-20 22:27                 ` Junio C Hamano
@ 2018-01-22 15:09                   ` Ævar Arnfjörð Bjarmason
  2018-01-22 18:09                     ` Theodore Ts'o
  2018-01-23  0:25                     ` Jeff King
  0 siblings, 2 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-01-22 15:09 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Theodore Ts'o, Christoph Hellwig, Linus Torvalds,
	Git Mailing List, linux-fsdevel, Chris Mason, Jeff King


On Sat, Jan 20 2018, Junio C. Hamano jotted:

> Theodore Ts'o <tytso@mit.edu> writes:
>
>> ....  I've never been fond of the "git repack -A" behavior
>> where it can generate huge numbers of loose files.  I'd much prefer it
>> if the other objects ended up in a separate pack file, and then some
>> other provision made for nuking that pack file some time later....
>
> Yes, a "cruft pack" that holds unreachable object has been discussed
> a few times recently on list, and I do agree that it is a desirable
> thing to have in the longer run.
>
> What's tricky is to devise a way to allow us to salvage objects that
> are placed in a cruft pack because they are accessed recently,
> proving themselves to be no longer crufts.  It could be that a good
> way to resurrect them is to explode them to loose form when they are
> accessed out of a cruft pack.  We need to worry about interactions
> with read-only users if we go that route, but with the current
> "explode unreachable to loose, touch their mtime when they are
> accessed" scheme ends up ignoring accesses from read-only users that
> cannot update mtime, so it might not be too bad.

Wouldn't it also make gc pruning more expensive? Now you can repack
regularly and loose objects will be left out of the pack, and then just
rm'd, whereas now it would entail creating new packs (unless the whole
pack was objects meant for removal).

Probably still worth it, but something to keep in mind.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-22 15:09                   ` Ævar Arnfjörð Bjarmason
@ 2018-01-22 18:09                     ` Theodore Ts'o
  2018-01-23  0:47                       ` Jeff King
  2018-01-23  0:25                     ` Jeff King
  1 sibling, 1 reply; 52+ messages in thread
From: Theodore Ts'o @ 2018-01-22 18:09 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Christoph Hellwig, Linus Torvalds,
	Git Mailing List, linux-fsdevel, Chris Mason, Jeff King

On Mon, Jan 22, 2018 at 04:09:23PM +0100, Ævar Arnfjörð Bjarmason wrote:
> > What's tricky is to devise a way to allow us to salvage objects that
> > are placed in a cruft pack because they are accessed recently,
> > proving themselves to be no longer crufts.  It could be that a good
> > way to resurrect them is to explode them to loose form when they are
> > accessed out of a cruft pack.  We need to worry about interactions
> > with read-only users if we go that route, but with the current
> > "explode unreachable to loose, touch their mtime when they are
> > accessed" scheme ends up ignoring accesses from read-only users that
> > cannot update mtime, so it might not be too bad.
> 
> Wouldn't it also make gc pruning more expensive? Now you can repack
> regularly and loose objects will be left out of the pack, and then just
> rm'd, whereas now it would entail creating new packs (unless the whole
> pack was objects meant for removal).

The idea is that the cruft pack would be all objects that were no
longer referenced.  Hence the proposal that if they ever *are*
accessed, they would be exploded to a loose object at that point.  So
in the common case, the GC would go quickly since the entire pack
could just be rm'ed once it hit the designated expiry time.

Another way of doing things would be to use the mtime of the cruft
pack for the expiry time, and if the curft pack is ever referenced,
its mtime would get updated.  Yet a third way would be to simply clear
the "cruft" bit if it ever *is* referenced.  In the common case, it
would never be referenced, so it could just get deleted, but in the
case where the user has manually "rescued" a set of commits (perhaps
by explicitly setting a branch head to commit id found from a reflog),
the objects would be saved.

So there are many ways it could be managed.

							- Ted

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-22 15:09                   ` Ævar Arnfjörð Bjarmason
  2018-01-22 18:09                     ` Theodore Ts'o
@ 2018-01-23  0:25                     ` Jeff King
  1 sibling, 0 replies; 52+ messages in thread
From: Jeff King @ 2018-01-23  0:25 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Theodore Ts'o, Christoph Hellwig,
	Linus Torvalds, Git Mailing List, linux-fsdevel, Chris Mason

On Mon, Jan 22, 2018 at 04:09:23PM +0100, Ævar Arnfjörð Bjarmason wrote:

> > Yes, a "cruft pack" that holds unreachable object has been discussed
> > a few times recently on list, and I do agree that it is a desirable
> > thing to have in the longer run.
> >
> > What's tricky is to devise a way to allow us to salvage objects that
> > are placed in a cruft pack because they are accessed recently,
> > proving themselves to be no longer crufts.  It could be that a good
> > way to resurrect them is to explode them to loose form when they are
> > accessed out of a cruft pack.  We need to worry about interactions
> > with read-only users if we go that route, but with the current
> > "explode unreachable to loose, touch their mtime when they are
> > accessed" scheme ends up ignoring accesses from read-only users that
> > cannot update mtime, so it might not be too bad.
> 
> Wouldn't it also make gc pruning more expensive? Now you can repack
> regularly and loose objects will be left out of the pack, and then just
> rm'd, whereas now it would entail creating new packs (unless the whole
> pack was objects meant for removal).
> 
> Probably still worth it, but something to keep in mind.

That's a good point. I think it would be OK in practice, though, since
normal operations don't tend to create a huge number of unreachable
loose objects (at least compared to the _reachable_ loose objects, which
we're already dealing with). We tend to get unbalanced explosions of
loose objects only because a huge chunk of packed history expired.

It is something to keep in mind when implementing the scheme, though.
Luckily we already have the right behavior implemented via
--pack-loose-unreachable (which is used for "repack -k" currently), so I
think it would just be a matter of passing the right flags from
git-repack.

-Peff

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-22 18:09                     ` Theodore Ts'o
@ 2018-01-23  0:47                       ` Jeff King
  2018-01-23  5:45                         ` Theodore Ts'o
  0 siblings, 1 reply; 52+ messages in thread
From: Jeff King @ 2018-01-23  0:47 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Christoph Hellwig, Linus Torvalds, Git Mailing List,
	linux-fsdevel, Chris Mason

On Mon, Jan 22, 2018 at 01:09:03PM -0500, Theodore Ts'o wrote:

> > Wouldn't it also make gc pruning more expensive? Now you can repack
> > regularly and loose objects will be left out of the pack, and then just
> > rm'd, whereas now it would entail creating new packs (unless the whole
> > pack was objects meant for removal).
> 
> The idea is that the cruft pack would be all objects that were no
> longer referenced.  Hence the proposal that if they ever *are*
> accessed, they would be exploded to a loose object at that point.  So
> in the common case, the GC would go quickly since the entire pack
> could just be rm'ed once it hit the designated expiry time.

I think Ævar is talking about the case of:

  1. You make 100 objects that aren't referenced. They're loose.

  2. You run git-gc. They're still too recent to be deleted.

Right now those recent loose objects sit loose, and have zero cost at
the time of gc.  In a "cruft pack" world, you'd pay some I/O to copy
them into the cruft pack, and some CPU to zlib and delta-compress them.
I think that's probably fine, though.

That said, some of what you wrote left me confused, and whether we're
all talking about the same idea. ;) Let me describe the idea I had
mentioned in another thread.  Right now the behavior is basically this:

If an unreachable object becomes referenced, it doesn't immediately get
exploded. During the next gc, whatever new object referenced them would
be one of:

  1. Reachable from refs, in which case it carries along the
     formerly-cruft object into the new pack, since it is now also
     reachable.

  2. Unreachable but still recent by mtime; we keep such objects, and
     anything they reference (now as unreachable, in this proposal in
     the cruft pack). Now these get either left loose, or exploded loose
     if they were previously packed.

  3. Unreachable and old. Both objects can be dropped totally.

The current strategy is to use the mtimes for "recent", and we use the
pack's mtime for every object in the pack.

So if we pack all the loose objects into a cruft pack, the mtime of the
cruft pack becomes the new gauge for "recent". And if we migrate objects
from old cruft pack to new cruft pack at each gc, then they'll keep
getting their mtimes refreshed, and we'll never drop them.

So we need to either:

  - keep per-object mtimes, so that old ones can age out (i.e., they'd
    hit case 3 and just not get migrated to either the new "real" pack
    or the new cruft pack).

  - keep multiple cruft packs, and let whole packs age out. But then
    cruft objects which get referenced again by other cruft have to get
    copied (not moved!) to new packs. That _probably_ doesn't happen all
    that often, so it might be OK.

> Another way of doing things would be to use the mtime of the cruft
> pack for the expiry time, and if the curft pack is ever referenced,
> its mtime would get updated.  Yet a third way would be to simply clear
> the "cruft" bit if it ever *is* referenced.  In the common case, it
> would never be referenced, so it could just get deleted, but in the
> case where the user has manually "rescued" a set of commits (perhaps
> by explicitly setting a branch head to commit id found from a reflog),
> the objects would be saved.

I don't think we have to worry about "rescued" objects. Those are
reachable, so they'd get copied into the new "real" pack (and then their
cruft pack eventually deleted).

-Peff

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-23  0:47                       ` Jeff King
@ 2018-01-23  5:45                         ` Theodore Ts'o
  2018-01-23 16:17                           ` Jeff King
  0 siblings, 1 reply; 52+ messages in thread
From: Theodore Ts'o @ 2018-01-23  5:45 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Christoph Hellwig, Linus Torvalds, Git Mailing List,
	linux-fsdevel, Chris Mason

On Mon, Jan 22, 2018 at 07:47:10PM -0500, Jeff King wrote:
> 
> I think Ævar is talking about the case of:
> 
>   1. You make 100 objects that aren't referenced. They're loose.
> 
>   2. You run git-gc. They're still too recent to be deleted.
> 
> Right now those recent loose objects sit loose, and have zero cost at
> the time of gc.  In a "cruft pack" world, you'd pay some I/O to copy
> them into the cruft pack, and some CPU to zlib and delta-compress them.
> I think that's probably fine, though.

I wasn't assuming that git-gc would create a cruft pack --- although I
guess it could.  As you say, recent loose objects have relatively zero
cost at the time of gc.  To the extent that the gc has to read lots of
loose files, there may be more seeks in the cold cache case, so there
is actually *some* cost to having the loose objects, but it's not
great.

What I was thinking about instead is that in cases where we know we
are likely to be creating a large number of loose objects (whether
they referenced or not), in a world where we will be calling fsync(2)
after every single loose object being created, pack files start
looking *way* more efficient.  So in general, if you know you will be
creating N loose objects, where N is probably around 50 or so, you'll
want to create a pack instead.

One of those cases is "repack -A", and in that case the loose objects
are all going tobe not referenced, so it would be a "cruft pack".  But
in many other cases where we might be importing from another DCVS,
which will be another case where doing an fsync(2) after every loose
object creation (and where I have sometimes seen it create them *all*
loose, and not use a pack at all), is going to get extremely slow and
painful.

> So if we pack all the loose objects into a cruft pack, the mtime of the
> cruft pack becomes the new gauge for "recent". And if we migrate objects
> from old cruft pack to new cruft pack at each gc, then they'll keep
> getting their mtimes refreshed, and we'll never drop them.

Well, I was assuming that gc would be a special case which doesn't the
mtime of the old cruft pack.  (Or more generally, any time an object
is gets copied out of the cruft pack, either to a loose object, or to
another pack, the mtime on the source pack should not be touched.)

	      	  	       	      	   	   - Ted

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-23  5:45                         ` Theodore Ts'o
@ 2018-01-23 16:17                           ` Jeff King
  0 siblings, 0 replies; 52+ messages in thread
From: Jeff King @ 2018-01-23 16:17 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Christoph Hellwig, Linus Torvalds, Git Mailing List,
	linux-fsdevel, Chris Mason

On Tue, Jan 23, 2018 at 12:45:53AM -0500, Theodore Ts'o wrote:

> What I was thinking about instead is that in cases where we know we
> are likely to be creating a large number of loose objects (whether
> they referenced or not), in a world where we will be calling fsync(2)
> after every single loose object being created, pack files start
> looking *way* more efficient.  So in general, if you know you will be
> creating N loose objects, where N is probably around 50 or so, you'll
> want to create a pack instead.
> 
> One of those cases is "repack -A", and in that case the loose objects
> are all going tobe not referenced, so it would be a "cruft pack".  But
> in many other cases where we might be importing from another DCVS,
> which will be another case where doing an fsync(2) after every loose
> object creation (and where I have sometimes seen it create them *all*
> loose, and not use a pack at all), is going to get extremely slow and
> painful.

Ah, I see. I think in the general case of git operations this is hard
(because most object writes don't realize the larger operation that
they're a part of). But I agree that those two are the low-hanging fruit
(imports should already be using fast-import, and "cruft packs" are not
too hard an idea to implement).

I agree that a cruft-pack implementation could just be for "repack -A",
and does not have to collect otherwise loose objects. I think part of my
confusion was that you and I are coming to the idea from different
angles: you care about minimizing fsyncs, and I'm interested in stopping
the problem where you have too many loose objects after running auto-gc.
So I care more about collecting those loose objects for that case.

> > So if we pack all the loose objects into a cruft pack, the mtime of the
> > cruft pack becomes the new gauge for "recent". And if we migrate objects
> > from old cruft pack to new cruft pack at each gc, then they'll keep
> > getting their mtimes refreshed, and we'll never drop them.
> 
> Well, I was assuming that gc would be a special case which doesn't the
> mtime of the old cruft pack.  (Or more generally, any time an object
> is gets copied out of the cruft pack, either to a loose object, or to
> another pack, the mtime on the source pack should not be touched.)

Right, that's the "you have multiple cruft packs" idea which has been
discussed[1] (each one just hangs around until its mtime expires, and
may duplicate objects found elsewhere).

That does end up with one pack per gc, which just punts the "too many
loose objects" to "too many packs". But unless the number of gc runs you
do is very high compared to the expiration time, we can probably ignore
that.

-Peff

[1] https://public-inbox.org/git/20170610080626.sjujpmgkli4muh7h@sigill.intra.peff.net/

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 23:52       ` Theodore Ts'o
  2018-01-17 23:57         ` Linus Torvalds
@ 2020-09-17 11:06         ` Ævar Arnfjörð Bjarmason
  2020-09-17 11:28           ` [RFC PATCH 0/2] should core.fsyncObjectFiles fsync the dir entry + docs Ævar Arnfjörð Bjarmason
                             ` (4 more replies)
  1 sibling, 5 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2020-09-17 11:06 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Linus Torvalds, Junio C Hamano, Christoph Hellwig,
	Git Mailing List, linux-fsdevel, Jacob Vosmaer


On Thu, Jan 18 2018, Theodore Ts'o wrote:

> On Wed, Jan 17, 2018 at 02:07:22PM -0800, Linus Torvalds wrote:
>> 
>> Now re-do the test while another process writes to a totally unrelated
>> a huge file (say, do a ISO file copy or something).
>> 
>> That was the thing that several filesystems get completely and
>> horribly wrong. Generally _particularly_ the logging filesystems that
>> don't even need the fsync, because they use a single log for
>> everything (so fsync serializes all the writes, not just the writes to
>> the one file it's fsync'ing).
>
> Well, let's be fair; this is something *ext3* got wrong, and it was
> the default file system back them.  All of the modern file systems now
> do delayed allocation, which means that an fsync of one file doesn't
> actually imply an fsync of another file.  Hence...
>
>> The original git design was very much to write each object file
>> without any syncing, because they don't matter since a new object file
>> - by definition - isn't really reachable. Then sync before writing the
>> index file or a new ref.
>
> This isn't really safe any more.  Yes, there's a single log.  But
> files which are subject to delayed allocation are in the page cache,
> and just because you fsync the index file doesn't mean that the object
> file is now written to disk.  It was true for ext3, but it's not true
> for ext4, xfs, btrfs, etc.
>
> The good news is that if you have another process downloading a huge
> ISO image, the fsync of the index file won't force the ISO file to be
> written out.  The bad news is that it won't force out the other git
> object files, either.
>
> Now, there is a potential downside of fsync'ing each object file, and
> that is the cost of doing a CACHE FLUSH on a HDD is non-trivial, and
> even on a SSD, it's not optimal to call CACHE FLUSH thousands of times
> in a second.  So if you are creating thousands of tiny files, and you
> fsync each one, each fsync(2) call is a serializing instruction, which
> means it won't return until that one file is written to disk.  If you
> are writing lots of small files, and you are using a HDD, you'll be
> bottlenecked to around 30 files per second on a 5400 RPM HDD, and this
> is true regardless of what file system you use, because the bottle
> neck is the CACHE FLUSH operation, and how you organize the metadata
> and how you do the block allocation, is largely lost in the noise
> compared to the CACHE FLUSH command, which serializes everything.
>
> There are solutions to this; you could simply not call fsync(2) a
> thousand times, and instead write a pack file, and call fsync once on
> the pack file.  That's probably the smartest approach.
>
> You could also create a thousand threads, and call fsync(2) on those
> thousand threads at roughly the same time.  Or you could use a
> bleeding edge kernel with the latest AIO patch, and use the newly
> added IOCB_CMD_FSYNC support.
>
> But I'd simply recommend writing a pack and fsync'ing the pack,
> instead of trying to write a gazillion object files.  (git-repack -A,
> I'm looking at you....)
>
> 					- Ted

[I didn't find an ideal message to reply to in this thread, but this
seemed to probably be the best]

Just an update on this since I went back and looked at this thread,
GitLab about ~1yr ago turned on core.fsyncObjectFiles=true by
default.

The reason is detailed in [1], tl;dr: empty loose object file issue on
ext4 allegedly caused by a lack of core.fsyncObjectFiles=true, but I
didn't do any root cause analysis. Just noting it here for for future
reference.

1. https://gitlab.com/gitlab-org/gitlab-foss/-/issues/51680#note_180508774

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC PATCH 0/2] should core.fsyncObjectFiles fsync the dir entry + docs
  2020-09-17 11:06         ` Ævar Arnfjörð Bjarmason
@ 2020-09-17 11:28           ` Ævar Arnfjörð Bjarmason
  2020-09-17 11:28           ` [RFC PATCH 1/2] sha1-file: fsync() loose dir entry when core.fsyncObjectFiles Ævar Arnfjörð Bjarmason
                             ` (3 subsequent siblings)
  4 siblings, 0 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2020-09-17 11:28 UTC (permalink / raw)
  To: git
  Cc: tytso, Junio C Hamano, Christoph Hellwig, Linus Torvalds,
	linux-fsdevel, Ævar Arnfjörð Bjarmason

I was re-reading the
https://lore.kernel.org/git/20180117184828.31816-1-hch@lst.de/ thread
today and thought we should at least update the docs, and per my
earlier E-Mail in
https://lore.kernel.org/git/87sgbghdbp.fsf@evledraar.gmail.com/
perhaps the directory entry should also be synced.

I kept linux-fsdevel@vger.kernel.org in the CC, it was in the original
thread, but more importantly it would be really nice to have people
who know more about the state of filesystems on Linux and other OS's
to give 2/2 a read to see how accurate what I put together is.

Ævar Arnfjörð Bjarmason (2):
  sha1-file: fsync() loose dir entry when core.fsyncObjectFiles
  core.fsyncObjectFiles: make the docs less flippant

 Documentation/config/core.txt | 42 ++++++++++++++++++++++++++++++-----
 sha1-file.c                   | 19 +++++++++++-----
 2 files changed, 50 insertions(+), 11 deletions(-)

-- 
2.28.0.297.g1956fa8f8d


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC PATCH 1/2] sha1-file: fsync() loose dir entry when core.fsyncObjectFiles
  2020-09-17 11:06         ` Ævar Arnfjörð Bjarmason
  2020-09-17 11:28           ` [RFC PATCH 0/2] should core.fsyncObjectFiles fsync the dir entry + docs Ævar Arnfjörð Bjarmason
@ 2020-09-17 11:28           ` Ævar Arnfjörð Bjarmason
  2020-09-17 13:16             ` Jeff King
                               ` (2 more replies)
  2020-09-17 11:28           ` [RFC PATCH 2/2] core.fsyncObjectFiles: make the docs less flippant Ævar Arnfjörð Bjarmason
                             ` (2 subsequent siblings)
  4 siblings, 3 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2020-09-17 11:28 UTC (permalink / raw)
  To: git
  Cc: tytso, Junio C Hamano, Christoph Hellwig, Linus Torvalds,
	linux-fsdevel, Ævar Arnfjörð Bjarmason

Change the behavior of core.fsyncObjectFiles to also sync the
directory entry. I don't have a case where this broke, just going by
paranoia and the fsync(2) manual page's guarantees about its behavior.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 sha1-file.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/sha1-file.c b/sha1-file.c
index dd65bd5c68..d286346921 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1784,10 +1784,14 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
 }
 
 /* Finalize a file on disk, and close it. */
-static void close_loose_object(int fd)
+static void close_loose_object(int fd, const struct strbuf *dirname)
 {
-	if (fsync_object_files)
+	int dirfd;
+	if (fsync_object_files) {
 		fsync_or_die(fd, "loose object file");
+		dirfd = xopen(dirname->buf, O_RDONLY);
+		fsync_or_die(dirfd, "loose object directory");
+	}
 	if (close(fd) != 0)
 		die_errno(_("error when closing loose object file"));
 }
@@ -1808,12 +1812,15 @@ static inline int directory_size(const char *filename)
  * We want to avoid cross-directory filename renames, because those
  * can have problems on various filesystems (FAT, NFS, Coda).
  */
-static int create_tmpfile(struct strbuf *tmp, const char *filename)
+static int create_tmpfile(struct strbuf *tmp,
+			  const char *filename,
+			  struct strbuf *dirname)
 {
 	int fd, dirlen = directory_size(filename);
 
 	strbuf_reset(tmp);
 	strbuf_add(tmp, filename, dirlen);
+	strbuf_add(dirname, filename, dirlen);
 	strbuf_addstr(tmp, "tmp_obj_XXXXXX");
 	fd = git_mkstemp_mode(tmp->buf, 0444);
 	if (fd < 0 && dirlen && errno == ENOENT) {
@@ -1848,10 +1855,11 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 	struct object_id parano_oid;
 	static struct strbuf tmp_file = STRBUF_INIT;
 	static struct strbuf filename = STRBUF_INIT;
+	static struct strbuf dirname = STRBUF_INIT;
 
 	loose_object_path(the_repository, &filename, oid);
 
-	fd = create_tmpfile(&tmp_file, filename.buf);
+	fd = create_tmpfile(&tmp_file, filename.buf, &dirname);
 	if (fd < 0) {
 		if (errno == EACCES)
 			return error(_("insufficient permission for adding an object to repository database %s"), get_object_directory());
@@ -1897,7 +1905,8 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 		die(_("confused by unstable object source data for %s"),
 		    oid_to_hex(oid));
 
-	close_loose_object(fd);
+	close_loose_object(fd, &dirname);
+	strbuf_release(&dirname);
 
 	if (mtime) {
 		struct utimbuf utb;
-- 
2.28.0.297.g1956fa8f8d


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [RFC PATCH 2/2] core.fsyncObjectFiles: make the docs less flippant
  2020-09-17 11:06         ` Ævar Arnfjörð Bjarmason
  2020-09-17 11:28           ` [RFC PATCH 0/2] should core.fsyncObjectFiles fsync the dir entry + docs Ævar Arnfjörð Bjarmason
  2020-09-17 11:28           ` [RFC PATCH 1/2] sha1-file: fsync() loose dir entry when core.fsyncObjectFiles Ævar Arnfjörð Bjarmason
@ 2020-09-17 11:28           ` Ævar Arnfjörð Bjarmason
  2020-09-17 14:12             ` Christoph Hellwig
                               ` (2 more replies)
  2020-09-17 14:14           ` [PATCH] enable core.fsyncObjectFiles by default Christoph Hellwig
  2020-09-17 15:30           ` Junio C Hamano
  4 siblings, 3 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2020-09-17 11:28 UTC (permalink / raw)
  To: git
  Cc: tytso, Junio C Hamano, Christoph Hellwig, Linus Torvalds,
	linux-fsdevel, Ævar Arnfjörð Bjarmason

As amusing as Linus's original prose[1] is here it doesn't really explain
in any detail to the uninitiated why you would or wouldn't enable
this, and the counter-intuitive reason for why git wouldn't fsync your
precious data.

So elaborate (a lot) on why this may or may not be needed. This is my
best-effort attempt to summarize the various points raised in the last
ML[2] discussion about this.

1.  aafe9fbaf4 ("Add config option to enable 'fsync()' of object
    files", 2008-06-18)
2. https://lore.kernel.org/git/20180117184828.31816-1-hch@lst.de/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config/core.txt | 42 ++++++++++++++++++++++++++++++-----
 1 file changed, 36 insertions(+), 6 deletions(-)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index 74619a9c03..5b47670c16 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -548,12 +548,42 @@ core.whitespace::
   errors. The default tab width is 8. Allowed values are 1 to 63.
 
 core.fsyncObjectFiles::
-	This boolean will enable 'fsync()' when writing object files.
-+
-This is a total waste of time and effort on a filesystem that orders
-data writes properly, but can be useful for filesystems that do not use
-journalling (traditional UNIX filesystems) or that only journal metadata
-and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
+	This boolean will enable 'fsync()' when writing loose object
+	files. Both the file itself and its containng directory will
+	be fsynced.
++
+When git writes data any required object writes will precede the
+corresponding reference update(s). For example, a
+linkgit:git-receive-pack[1] accepting a push might write a pack or
+loose objects (depending on settings such as `transfer.unpackLimit`).
++
+Therefore on a journaled file system which ensures that data is
+flushed to disk in chronological order an fsync shouldn't be
+needed. The loose objects might be lost with a crash, but so will the
+ref update that would have referenced them. Git's own state in such a
+crash will remain consistent.
++
+This option exists because that assumption doesn't hold on filesystems
+where the data ordering is not preserved, such as on ext3 and ext4
+with "data=writeback". On such a filesystem the `rename()` that drops
+the new reference in place might be preserved, but the contents or
+directory entry for the loose object(s) might not have been synced to
+disk.
++
+Enabling this option might slow git down by a lot in some
+cases. E.g. in the case of a naïve bulk import tool which might create
+a million loose objects before a final ref update and `gc`. In other
+more common cases such as on a server being pushed to with default
+`transfer.unpackLimit` settings the difference might not be noticable.
++
+However, that's highly filesystem-dependent, on some filesystems
+simply calling fsync() might force an unrelated bulk background write
+to be serialized to disk. Such edge cases are the reason this option
+is off by default. That default setting might change in future
+versions.
++
+In older versions of git only the descriptor for the file itself was
+fsynced, not its directory entry.
 
 core.preloadIndex::
 	Enable parallel index preload for operations like 'git diff'
-- 
2.28.0.297.g1956fa8f8d


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/2] sha1-file: fsync() loose dir entry when core.fsyncObjectFiles
  2020-09-17 11:28           ` [RFC PATCH 1/2] sha1-file: fsync() loose dir entry when core.fsyncObjectFiles Ævar Arnfjörð Bjarmason
@ 2020-09-17 13:16             ` Jeff King
  2020-09-17 15:09               ` Christoph Hellwig
  2020-09-17 14:09             ` Christoph Hellwig
  2020-09-17 20:21             ` Johannes Sixt
  2 siblings, 1 reply; 52+ messages in thread
From: Jeff King @ 2020-09-17 13:16 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, tytso, Junio C Hamano, Christoph Hellwig, Linus Torvalds,
	linux-fsdevel

On Thu, Sep 17, 2020 at 01:28:29PM +0200, Ævar Arnfjörð Bjarmason wrote:

> Change the behavior of core.fsyncObjectFiles to also sync the
> directory entry. I don't have a case where this broke, just going by
> paranoia and the fsync(2) manual page's guarantees about its behavior.

I've also often wondered whether this is necessary. Given the symptom of
"oops, this object is there but with 0 bytes" after a hard crash (power
off, etc), my assumption is that the metadata is being journaled but the
actual data is not. Which would imply this isn't needed, but may just be
revealing my naive view of how filesystems work.

And of course all of my experience is on ext4 (which doubly confuses me,
because my systems typically have data=ordered, which I thought would
solve this). Non-journalling filesystems or other modes likely behave
differently, but if this extra fsync carries a cost, we may want to make
it optional.

>  sha1-file.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)

We already fsync pack files, but we don't fsync their directories. If
this is important to do, we should be doing it there, too.

We also don't fsync ref files (nor packed-refs) at all. If fsyncing
files is important for reliability, we should be including those, too.
It may be tempting to say that the important stuff is in objects and the
refs can be salvaged from the commit graph, but my experience says
otherwise. Missing, broken, or mysteriously-rewound refs cause confusing
user-visible behavior, and when compounded with pruning operations like
"git gc" they _do_ result in losing objects.

-Peff

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/2] sha1-file: fsync() loose dir entry when core.fsyncObjectFiles
  2020-09-17 11:28           ` [RFC PATCH 1/2] sha1-file: fsync() loose dir entry when core.fsyncObjectFiles Ævar Arnfjörð Bjarmason
  2020-09-17 13:16             ` Jeff King
@ 2020-09-17 14:09             ` Christoph Hellwig
  2020-09-17 14:55               ` Jeff King
  2020-09-22 10:42               ` Ævar Arnfjörð Bjarmason
  2020-09-17 20:21             ` Johannes Sixt
  2 siblings, 2 replies; 52+ messages in thread
From: Christoph Hellwig @ 2020-09-17 14:09 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, tytso, Junio C Hamano, Christoph Hellwig, Linus Torvalds,
	linux-fsdevel

On Thu, Sep 17, 2020 at 01:28:29PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Change the behavior of core.fsyncObjectFiles to also sync the
> directory entry. I don't have a case where this broke, just going by
> paranoia and the fsync(2) manual page's guarantees about its behavior.

It is not just paranoia, but indeed what is required from the standards
POV.  At least for many Linux file systems your second fsync will be
very cheap (basically a NULL syscall) as the log has alredy been forced
all the way by the first one, but you can't rely on that.

Acked-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 2/2] core.fsyncObjectFiles: make the docs less flippant
  2020-09-17 11:28           ` [RFC PATCH 2/2] core.fsyncObjectFiles: make the docs less flippant Ævar Arnfjörð Bjarmason
@ 2020-09-17 14:12             ` Christoph Hellwig
  2020-09-17 15:43             ` Junio C Hamano
  2020-09-17 19:21             ` Marc Branchaud
  2 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2020-09-17 14:12 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, tytso, Junio C Hamano, Christoph Hellwig, Linus Torvalds,
	linux-fsdevel

>  core.fsyncObjectFiles::
> +	This boolean will enable 'fsync()' when writing loose object
> +	files. Both the file itself and its containng directory will
> +	be fsynced.
> ++
> +When git writes data any required object writes will precede the
> +corresponding reference update(s). For example, a
> +linkgit:git-receive-pack[1] accepting a push might write a pack or
> +loose objects (depending on settings such as `transfer.unpackLimit`).
> ++
> +Therefore on a journaled file system which ensures that data is
> +flushed to disk in chronological order an fsync shouldn't be
> +needed. The loose objects might be lost with a crash, but so will the
> +ref update that would have referenced them. Git's own state in such a
> +crash will remain consistent.

While this is much better than what we had before I'm not sure it is
all that useful.  The only file system I know of that actually had the
above behavior was ext3, and the fact that it always wrote back that
way made it a complete performance desaster.  So even mentioning this
here will probably create a lot more confusion than actually clearing
things up.

> ++
> +This option exists because that assumption doesn't hold on filesystems
> +where the data ordering is not preserved, such as on ext3 and ext4
> +with "data=writeback". On such a filesystem the `rename()` that drops
> +the new reference in place might be preserved, but the contents or
> +directory entry for the loose object(s) might not have been synced to
> +disk.

As well as just about any other file system.  Which is another argument
on why it needs to be on by default.  Every time I install a new
development system (aka one that often crashes) and forget to enable
the option I keep corrupting my git repos.  And that is with at least
btrfs, ext4 and xfs as it is pretty much by design.

> +However, that's highly filesystem-dependent, on some filesystems
> +simply calling fsync() might force an unrelated bulk background write
> +to be serialized to disk. Such edge cases are the reason this option
> +is off by default. That default setting might change in future
> +versions.

Again the only "some file system" that was widely used that did this
was ext3.  And ext3 has long been removed from the Linux kernel..

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2020-09-17 11:06         ` Ævar Arnfjörð Bjarmason
                             ` (2 preceding siblings ...)
  2020-09-17 11:28           ` [RFC PATCH 2/2] core.fsyncObjectFiles: make the docs less flippant Ævar Arnfjörð Bjarmason
@ 2020-09-17 14:14           ` Christoph Hellwig
  2020-09-17 15:30           ` Junio C Hamano
  4 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2020-09-17 14:14 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Theodore Ts'o, Linus Torvalds, Junio C Hamano,
	Christoph Hellwig, Git Mailing List, linux-fsdevel,
	Jacob Vosmaer

On Thu, Sep 17, 2020 at 01:06:50PM +0200, Ævar Arnfjörð Bjarmason wrote:
> The reason is detailed in [1], tl;dr: empty loose object file issue on
> ext4 allegedly caused by a lack of core.fsyncObjectFiles=true, but I
> didn't do any root cause analysis. Just noting it here for for future
> reference.

All the modern Linux file systems first write the data, and then only
write the metadata after the data write has finished.  So your data might
have been partially or fully on disk, but until the transaction to commit
the size change and/or extent state change you're not going to be able
to read it back.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/2] sha1-file: fsync() loose dir entry when core.fsyncObjectFiles
  2020-09-17 14:09             ` Christoph Hellwig
@ 2020-09-17 14:55               ` Jeff King
  2020-09-17 14:56                 ` Christoph Hellwig
  2020-09-22 10:42               ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 52+ messages in thread
From: Jeff King @ 2020-09-17 14:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ævar Arnfjörð Bjarmason, git, tytso,
	Junio C Hamano, Linus Torvalds, linux-fsdevel

On Thu, Sep 17, 2020 at 04:09:12PM +0200, Christoph Hellwig wrote:

> On Thu, Sep 17, 2020 at 01:28:29PM +0200, Ævar Arnfjörð Bjarmason wrote:
> > Change the behavior of core.fsyncObjectFiles to also sync the
> > directory entry. I don't have a case where this broke, just going by
> > paranoia and the fsync(2) manual page's guarantees about its behavior.
> 
> It is not just paranoia, but indeed what is required from the standards
> POV.  At least for many Linux file systems your second fsync will be
> very cheap (basically a NULL syscall) as the log has alredy been forced
> all the way by the first one, but you can't rely on that.

Is it sufficient to fsync() just the surrounding directory? I.e., if I
do:

  mkdir("a");
  mkdir("a/b");
  open("a/b/c", O_WRONLY);

is it enough to fsync() a descriptor pointing to "a/b", or should I
also do "a"?

-Peff

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/2] sha1-file: fsync() loose dir entry when core.fsyncObjectFiles
  2020-09-17 14:55               ` Jeff King
@ 2020-09-17 14:56                 ` Christoph Hellwig
  2020-09-17 15:37                   ` Junio C Hamano
  0 siblings, 1 reply; 52+ messages in thread
From: Christoph Hellwig @ 2020-09-17 14:56 UTC (permalink / raw)
  To: Jeff King
  Cc: Christoph Hellwig, Ævar Arnfjörð Bjarmason, git,
	tytso, Junio C Hamano, Linus Torvalds, linux-fsdevel

On Thu, Sep 17, 2020 at 10:55:23AM -0400, Jeff King wrote:
> On Thu, Sep 17, 2020 at 04:09:12PM +0200, Christoph Hellwig wrote:
> 
> > On Thu, Sep 17, 2020 at 01:28:29PM +0200, Ævar Arnfjörð Bjarmason wrote:
> > > Change the behavior of core.fsyncObjectFiles to also sync the
> > > directory entry. I don't have a case where this broke, just going by
> > > paranoia and the fsync(2) manual page's guarantees about its behavior.
> > 
> > It is not just paranoia, but indeed what is required from the standards
> > POV.  At least for many Linux file systems your second fsync will be
> > very cheap (basically a NULL syscall) as the log has alredy been forced
> > all the way by the first one, but you can't rely on that.
> 
> Is it sufficient to fsync() just the surrounding directory? I.e., if I
> do:
> 
>   mkdir("a");
>   mkdir("a/b");
>   open("a/b/c", O_WRONLY);
> 
> is it enough to fsync() a descriptor pointing to "a/b", or should I
> also do "a"?

You need to fsync both to be fully compliant, even if just fsyncing b
will work for most but not all file systems.  The good news is that
for those common file systems the extra fsync of a is almost free.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/2] sha1-file: fsync() loose dir entry when core.fsyncObjectFiles
  2020-09-17 13:16             ` Jeff King
@ 2020-09-17 15:09               ` Christoph Hellwig
  0 siblings, 0 replies; 52+ messages in thread
From: Christoph Hellwig @ 2020-09-17 15:09 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, git, tytso,
	Junio C Hamano, Christoph Hellwig, Linus Torvalds, linux-fsdevel

On Thu, Sep 17, 2020 at 09:16:05AM -0400, Jeff King wrote:
> I've also often wondered whether this is necessary. Given the symptom of
> "oops, this object is there but with 0 bytes" after a hard crash (power
> off, etc), my assumption is that the metadata is being journaled but the
> actual data is not. Which would imply this isn't needed, but may just be
> revealing my naive view of how filesystems work.
> 
> And of course all of my experience is on ext4 (which doubly confuses me,
> because my systems typically have data=ordered, which I thought would
> solve this). Non-journalling filesystems or other modes likely behave
> differently, but if this extra fsync carries a cost, we may want to make
> it optional.

I hope my other mail clarified how this works at a high level, if not
feel free to ask more questions.

> >  sha1-file.c | 19 ++++++++++++++-----
> >  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> We already fsync pack files, but we don't fsync their directories. If
> this is important to do, we should be doing it there, too.
> 
> We also don't fsync ref files (nor packed-refs) at all. If fsyncing
> files is important for reliability, we should be including those, too.
> It may be tempting to say that the important stuff is in objects and the
> refs can be salvaged from the commit graph, but my experience says
> otherwise. Missing, broken, or mysteriously-rewound refs cause confusing
> user-visible behavior, and when compounded with pruning operations like
> "git gc" they _do_ result in losing objects.

True, this probably needs to do for the directories of other files
as well.

One interesting optimization under linux is the syncfs syscall, that
syncs all files on a file system - if you need to do a large number
of fsyncs that do not depend on each other for transaction semantics
it can provide a huge speedup.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2020-09-17 11:06         ` Ævar Arnfjörð Bjarmason
                             ` (3 preceding siblings ...)
  2020-09-17 14:14           ` [PATCH] enable core.fsyncObjectFiles by default Christoph Hellwig
@ 2020-09-17 15:30           ` Junio C Hamano
  4 siblings, 0 replies; 52+ messages in thread
From: Junio C Hamano @ 2020-09-17 15:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Theodore Ts'o, Linus Torvalds, Christoph Hellwig,
	Git Mailing List, linux-fsdevel, Jacob Vosmaer

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> [I didn't find an ideal message to reply to in this thread, but this
> seemed to probably be the best]
>
> Just an update on this since I went back and looked at this thread,
> GitLab about ~1yr ago turned on core.fsyncObjectFiles=true by
> default.
>
> The reason is detailed in [1], tl;dr: empty loose object file issue on
> ext4 allegedly caused by a lack of core.fsyncObjectFiles=true, but I
> didn't do any root cause analysis. Just noting it here for for future
> reference.
>
> 1. https://gitlab.com/gitlab-org/gitlab-foss/-/issues/51680#note_180508774

Thanks for bringing the original discussion back.  I do recall the
discussion and the list of issues to think about raised by Peff back
then in [2] is still relevant.


2. https://public-inbox.org/git/20180117205509.GA14828@sigill.intra.peff.net/




^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/2] sha1-file: fsync() loose dir entry when core.fsyncObjectFiles
  2020-09-17 14:56                 ` Christoph Hellwig
@ 2020-09-17 15:37                   ` Junio C Hamano
  2020-09-17 17:12                     ` Jeff King
  0 siblings, 1 reply; 52+ messages in thread
From: Junio C Hamano @ 2020-09-17 15:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jeff King, Ævar Arnfjörð Bjarmason, git, tytso,
	Linus Torvalds, linux-fsdevel

Christoph Hellwig <hch@lst.de> writes:

> On Thu, Sep 17, 2020 at 10:55:23AM -0400, Jeff King wrote:
>> On Thu, Sep 17, 2020 at 04:09:12PM +0200, Christoph Hellwig wrote:
>> 
>> > On Thu, Sep 17, 2020 at 01:28:29PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> > > Change the behavior of core.fsyncObjectFiles to also sync the
>> > > directory entry. I don't have a case where this broke, just going by
>> > > paranoia and the fsync(2) manual page's guarantees about its behavior.
>> > 
>> > It is not just paranoia, but indeed what is required from the standards
>> > POV.  At least for many Linux file systems your second fsync will be
>> > very cheap (basically a NULL syscall) as the log has alredy been forced
>> > all the way by the first one, but you can't rely on that.
>> 
>> Is it sufficient to fsync() just the surrounding directory? I.e., if I
>> do:
>> 
>>   mkdir("a");
>>   mkdir("a/b");
>>   open("a/b/c", O_WRONLY);
>> 
>> is it enough to fsync() a descriptor pointing to "a/b", or should I
>> also do "a"?
>
> You need to fsync both to be fully compliant, even if just fsyncing b
> will work for most but not all file systems.  The good news is that
> for those common file systems the extra fsync of a is almost free.

Back to Ævar's patch, when creating a new loose object, we do these
things:

 1. create temporary file and write the compressed contents to it
    while computing its object name

 2. create the fan-out directory under .git/objects/ if needed

 3. mv temporary file to its final name

and the patch adds open+fsync+close on the fan-out directory.  In
the above exchange with Peff, we learned that open+fsync+close needs
to be done on .git/objects if we created the fan-out directory, too.

Am I reading the above correctly?

Thanks.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 2/2] core.fsyncObjectFiles: make the docs less flippant
  2020-09-17 11:28           ` [RFC PATCH 2/2] core.fsyncObjectFiles: make the docs less flippant Ævar Arnfjörð Bjarmason
  2020-09-17 14:12             ` Christoph Hellwig
@ 2020-09-17 15:43             ` Junio C Hamano
  2020-09-17 20:15               ` Johannes Sixt
  2020-10-08  8:13               ` Johannes Schindelin
  2020-09-17 19:21             ` Marc Branchaud
  2 siblings, 2 replies; 52+ messages in thread
From: Junio C Hamano @ 2020-09-17 15:43 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, tytso, Christoph Hellwig, Linus Torvalds, linux-fsdevel

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> As amusing as Linus's original prose[1] is here it doesn't really explain
> in any detail to the uninitiated why you would or wouldn't enable
> this, and the counter-intuitive reason for why git wouldn't fsync your
> precious data.
>
> So elaborate (a lot) on why this may or may not be needed. This is my
> best-effort attempt to summarize the various points raised in the last
> ML[2] discussion about this.
>
> 1.  aafe9fbaf4 ("Add config option to enable 'fsync()' of object
>     files", 2008-06-18)
> 2. https://lore.kernel.org/git/20180117184828.31816-1-hch@lst.de/
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  Documentation/config/core.txt | 42 ++++++++++++++++++++++++++++++-----
>  1 file changed, 36 insertions(+), 6 deletions(-)

When I saw the subject in my mailbox, I expected to see that you
would resurrect Christoph's updated text in [*1*], but you wrote a
whole lot more ;-) And they are quite informative to help readers to
understand what the option does.  I am not sure if the understanding
directly help readers to decide if it is appropriate for their own
repositories, though X-<.


Thanks.

[Reference]

*1* https://public-inbox.org/git/20180117193510.GA30657@lst.de/

>
> diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
> index 74619a9c03..5b47670c16 100644
> --- a/Documentation/config/core.txt
> +++ b/Documentation/config/core.txt
> @@ -548,12 +548,42 @@ core.whitespace::
>    errors. The default tab width is 8. Allowed values are 1 to 63.
>  
>  core.fsyncObjectFiles::
> -	This boolean will enable 'fsync()' when writing object files.
> -+
> -This is a total waste of time and effort on a filesystem that orders
> -data writes properly, but can be useful for filesystems that do not use
> -journalling (traditional UNIX filesystems) or that only journal metadata
> -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
> +	This boolean will enable 'fsync()' when writing loose object
> +	files. Both the file itself and its containng directory will
> +	be fsynced.
> ++
> +When git writes data any required object writes will precede the
> +corresponding reference update(s). For example, a
> +linkgit:git-receive-pack[1] accepting a push might write a pack or
> +loose objects (depending on settings such as `transfer.unpackLimit`).
> ++
> +Therefore on a journaled file system which ensures that data is
> +flushed to disk in chronological order an fsync shouldn't be
> +needed. The loose objects might be lost with a crash, but so will the
> +ref update that would have referenced them. Git's own state in such a
> +crash will remain consistent.
> ++
> +This option exists because that assumption doesn't hold on filesystems
> +where the data ordering is not preserved, such as on ext3 and ext4
> +with "data=writeback". On such a filesystem the `rename()` that drops
> +the new reference in place might be preserved, but the contents or
> +directory entry for the loose object(s) might not have been synced to
> +disk.
> ++
> +Enabling this option might slow git down by a lot in some
> +cases. E.g. in the case of a naïve bulk import tool which might create
> +a million loose objects before a final ref update and `gc`. In other
> +more common cases such as on a server being pushed to with default
> +`transfer.unpackLimit` settings the difference might not be noticable.
> ++
> +However, that's highly filesystem-dependent, on some filesystems
> +simply calling fsync() might force an unrelated bulk background write
> +to be serialized to disk. Such edge cases are the reason this option
> +is off by default. That default setting might change in future
> +versions.
> ++
> +In older versions of git only the descriptor for the file itself was
> +fsynced, not its directory entry.
>  
>  core.preloadIndex::
>  	Enable parallel index preload for operations like 'git diff'

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/2] sha1-file: fsync() loose dir entry when core.fsyncObjectFiles
  2020-09-17 15:37                   ` Junio C Hamano
@ 2020-09-17 17:12                     ` Jeff King
  2020-09-17 20:37                       ` Taylor Blau
  0 siblings, 1 reply; 52+ messages in thread
From: Jeff King @ 2020-09-17 17:12 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Christoph Hellwig, Ævar Arnfjörð Bjarmason, git,
	tytso, Linus Torvalds, linux-fsdevel

On Thu, Sep 17, 2020 at 08:37:19AM -0700, Junio C Hamano wrote:

> > You need to fsync both to be fully compliant, even if just fsyncing b
> > will work for most but not all file systems.  The good news is that
> > for those common file systems the extra fsync of a is almost free.
> 
> Back to Ævar's patch, when creating a new loose object, we do these
> things:
> 
>  1. create temporary file and write the compressed contents to it
>     while computing its object name
> 
>  2. create the fan-out directory under .git/objects/ if needed
> 
>  3. mv temporary file to its final name
> 
> and the patch adds open+fsync+close on the fan-out directory.  In
> the above exchange with Peff, we learned that open+fsync+close needs
> to be done on .git/objects if we created the fan-out directory, too.
> 
> Am I reading the above correctly?

That's my understanding. It gets trickier with refs (which I think we
also ought to consider fsyncing), as we may create arbitrarily deep
hierarchies (so we'd have to keep track of which parts got created, or
just conservatively fsync up the whole hierarchy).

It gets a lot easier if we move to reftables that have a more
predictable file/disk structure.

-Peff

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 2/2] core.fsyncObjectFiles: make the docs less flippant
  2020-09-17 11:28           ` [RFC PATCH 2/2] core.fsyncObjectFiles: make the docs less flippant Ævar Arnfjörð Bjarmason
  2020-09-17 14:12             ` Christoph Hellwig
  2020-09-17 15:43             ` Junio C Hamano
@ 2020-09-17 19:21             ` Marc Branchaud
  2 siblings, 0 replies; 52+ messages in thread
From: Marc Branchaud @ 2020-09-17 19:21 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git
  Cc: Junio C Hamano, Christoph Hellwig

On 2020-09-17 7:28 a.m., Ævar Arnfjörð Bjarmason wrote:
> As amusing as Linus's original prose[1] is here it doesn't really explain
> in any detail to the uninitiated why you would or wouldn't enable
> this, and the counter-intuitive reason for why git wouldn't fsync your
> precious data.
> 
> So elaborate (a lot) on why this may or may not be needed. This is my
> best-effort attempt to summarize the various points raised in the last
> ML[2] discussion about this.
> 
> 1.  aafe9fbaf4 ("Add config option to enable 'fsync()' of object
>      files", 2008-06-18)
> 2. https://lore.kernel.org/git/20180117184828.31816-1-hch@lst.de/
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>   Documentation/config/core.txt | 42 ++++++++++++++++++++++++++++++-----
>   1 file changed, 36 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
> index 74619a9c03..5b47670c16 100644
> --- a/Documentation/config/core.txt
> +++ b/Documentation/config/core.txt
> @@ -548,12 +548,42 @@ core.whitespace::
>     errors. The default tab width is 8. Allowed values are 1 to 63.
>   
>   core.fsyncObjectFiles::
> -	This boolean will enable 'fsync()' when writing object files.
> -+
> -This is a total waste of time and effort on a filesystem that orders
> -data writes properly, but can be useful for filesystems that do not use
> -journalling (traditional UNIX filesystems) or that only journal metadata
> -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
> +	This boolean will enable 'fsync()' when writing loose object
> +	files. Both the file itself and its containng directory will

Typo: containng

		M.

> +	be fsynced.
> ++
> +When git writes data any required object writes will precede the
> +corresponding reference update(s). For example, a
> +linkgit:git-receive-pack[1] accepting a push might write a pack or
> +loose objects (depending on settings such as `transfer.unpackLimit`).
> ++
> +Therefore on a journaled file system which ensures that data is
> +flushed to disk in chronological order an fsync shouldn't be
> +needed. The loose objects might be lost with a crash, but so will the
> +ref update that would have referenced them. Git's own state in such a
> +crash will remain consistent.
> ++
> +This option exists because that assumption doesn't hold on filesystems
> +where the data ordering is not preserved, such as on ext3 and ext4
> +with "data=writeback". On such a filesystem the `rename()` that drops
> +the new reference in place might be preserved, but the contents or
> +directory entry for the loose object(s) might not have been synced to
> +disk.
> ++
> +Enabling this option might slow git down by a lot in some
> +cases. E.g. in the case of a naïve bulk import tool which might create
> +a million loose objects before a final ref update and `gc`. In other
> +more common cases such as on a server being pushed to with default
> +`transfer.unpackLimit` settings the difference might not be noticable.
> ++
> +However, that's highly filesystem-dependent, on some filesystems
> +simply calling fsync() might force an unrelated bulk background write
> +to be serialized to disk. Such edge cases are the reason this option
> +is off by default. That default setting might change in future
> +versions.
> ++
> +In older versions of git only the descriptor for the file itself was
> +fsynced, not its directory entry.
>   
>   core.preloadIndex::
>   	Enable parallel index preload for operations like 'git diff'
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 2/2] core.fsyncObjectFiles: make the docs less flippant
  2020-09-17 15:43             ` Junio C Hamano
@ 2020-09-17 20:15               ` Johannes Sixt
  2020-10-08  8:13               ` Johannes Schindelin
  1 sibling, 0 replies; 52+ messages in thread
From: Johannes Sixt @ 2020-09-17 20:15 UTC (permalink / raw)
  To: Junio C Hamano, Ævar Arnfjörð Bjarmason
  Cc: git, tytso, Christoph Hellwig, Linus Torvalds, linux-fsdevel

Am 17.09.20 um 17:43 schrieb Junio C Hamano:
> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
> 
>> As amusing as Linus's original prose[1] is here it doesn't really explain
>> in any detail to the uninitiated why you would or wouldn't enable
>> this, and the counter-intuitive reason for why git wouldn't fsync your
>> precious data.
>>
>> So elaborate (a lot) on why this may or may not be needed. This is my
>> best-effort attempt to summarize the various points raised in the last
>> ML[2] discussion about this.
>>
>> 1.  aafe9fbaf4 ("Add config option to enable 'fsync()' of object
>>     files", 2008-06-18)
>> 2. https://lore.kernel.org/git/20180117184828.31816-1-hch@lst.de/
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>  Documentation/config/core.txt | 42 ++++++++++++++++++++++++++++++-----
>>  1 file changed, 36 insertions(+), 6 deletions(-)
> 
> When I saw the subject in my mailbox, I expected to see that you
> would resurrect Christoph's updated text in [*1*], but you wrote a
> whole lot more ;-) And they are quite informative to help readers to
> understand what the option does.  I am not sure if the understanding
> directly help readers to decide if it is appropriate for their own
> repositories, though X-<.

Not only that; the new text also uses the term "fsync" in a manner that
I could be persuaded that it is actually an English word. Which, so far,
I doubt that it is ;) A little bit less 1337 wording would help the
users better.

> 
> 
> Thanks.
> 
> [Reference]
> 
> *1* https://public-inbox.org/git/20180117193510.GA30657@lst.de/
> 
>>
>> diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
>> index 74619a9c03..5b47670c16 100644
>> --- a/Documentation/config/core.txt
>> +++ b/Documentation/config/core.txt
>> @@ -548,12 +548,42 @@ core.whitespace::
>>    errors. The default tab width is 8. Allowed values are 1 to 63.
>>  
>>  core.fsyncObjectFiles::
>> -	This boolean will enable 'fsync()' when writing object files.
>> -+
>> -This is a total waste of time and effort on a filesystem that orders
>> -data writes properly, but can be useful for filesystems that do not use
>> -journalling (traditional UNIX filesystems) or that only journal metadata
>> -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
>> +	This boolean will enable 'fsync()' when writing loose object
>> +	files. Both the file itself and its containng directory will
>> +	be fsynced.
>> ++
>> +When git writes data any required object writes will precede the
>> +corresponding reference update(s). For example, a
>> +linkgit:git-receive-pack[1] accepting a push might write a pack or
>> +loose objects (depending on settings such as `transfer.unpackLimit`).
>> ++
>> +Therefore on a journaled file system which ensures that data is
>> +flushed to disk in chronological order an fsync shouldn't be
>> +needed. The loose objects might be lost with a crash, but so will the
>> +ref update that would have referenced them. Git's own state in such a
>> +crash will remain consistent.
>> ++
>> +This option exists because that assumption doesn't hold on filesystems
>> +where the data ordering is not preserved, such as on ext3 and ext4
>> +with "data=writeback". On such a filesystem the `rename()` that drops
>> +the new reference in place might be preserved, but the contents or
>> +directory entry for the loose object(s) might not have been synced to
>> +disk.
>> ++
>> +Enabling this option might slow git down by a lot in some
>> +cases. E.g. in the case of a naïve bulk import tool which might create
>> +a million loose objects before a final ref update and `gc`. In other
>> +more common cases such as on a server being pushed to with default
>> +`transfer.unpackLimit` settings the difference might not be noticable.
>> ++
>> +However, that's highly filesystem-dependent, on some filesystems
>> +simply calling fsync() might force an unrelated bulk background write
>> +to be serialized to disk. Such edge cases are the reason this option
>> +is off by default. That default setting might change in future
>> +versions.
>> ++
>> +In older versions of git only the descriptor for the file itself was
>> +fsynced, not its directory entry.
>>  
>>  core.preloadIndex::
>>  	Enable parallel index preload for operations like 'git diff'
> 


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/2] sha1-file: fsync() loose dir entry when core.fsyncObjectFiles
  2020-09-17 11:28           ` [RFC PATCH 1/2] sha1-file: fsync() loose dir entry when core.fsyncObjectFiles Ævar Arnfjörð Bjarmason
  2020-09-17 13:16             ` Jeff King
  2020-09-17 14:09             ` Christoph Hellwig
@ 2020-09-17 20:21             ` Johannes Sixt
  2020-09-22  8:24               ` Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 52+ messages in thread
From: Johannes Sixt @ 2020-09-17 20:21 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git
  Cc: tytso, Junio C Hamano, Christoph Hellwig, Linus Torvalds, linux-fsdevel

Am 17.09.20 um 13:28 schrieb Ævar Arnfjörð Bjarmason:
> Change the behavior of core.fsyncObjectFiles to also sync the
> directory entry. I don't have a case where this broke, just going by
> paranoia and the fsync(2) manual page's guarantees about its behavior.
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  sha1-file.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/sha1-file.c b/sha1-file.c
> index dd65bd5c68..d286346921 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -1784,10 +1784,14 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
>  }
>  
>  /* Finalize a file on disk, and close it. */
> -static void close_loose_object(int fd)
> +static void close_loose_object(int fd, const struct strbuf *dirname)
>  {
> -	if (fsync_object_files)
> +	int dirfd;
> +	if (fsync_object_files) {
>  		fsync_or_die(fd, "loose object file");
> +		dirfd = xopen(dirname->buf, O_RDONLY);
> +		fsync_or_die(dirfd, "loose object directory");

Did you have the opportunity to verify that this works on Windows?
Opening a directory with open(2), I mean: It's disallowed according to
the docs:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/open-wopen?view=vs-2019#return-value

> +	}
>  	if (close(fd) != 0)
>  		die_errno(_("error when closing loose object file"));
>  }

-- Hannes

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/2] sha1-file: fsync() loose dir entry when core.fsyncObjectFiles
  2020-09-17 17:12                     ` Jeff King
@ 2020-09-17 20:37                       ` Taylor Blau
  0 siblings, 0 replies; 52+ messages in thread
From: Taylor Blau @ 2020-09-17 20:37 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, git, tytso,
	Linus Torvalds, linux-fsdevel

On Thu, Sep 17, 2020 at 01:12:12PM -0400, Jeff King wrote:
> On Thu, Sep 17, 2020 at 08:37:19AM -0700, Junio C Hamano wrote:
> > Am I reading the above correctly?
>
> That's my understanding. It gets trickier with refs (which I think we
> also ought to consider fsyncing), as we may create arbitrarily deep
> hierarchies (so we'd have to keep track of which parts got created, or
> just conservatively fsync up the whole hierarchy).

Yeah, it definitely gets trickier, but hopefully not by much. I
appreciate Christoph's explanation, and certainly buy into it. I can't
think of any reason why we wouldn't want to apply the same reasoning to
storing refs, too.

It shouldn't be a hold-up for this series, though.

> It gets a lot easier if we move to reftables that have a more
> predictable file/disk structure.

In the sense that we don't have to worry about arbitrary-depth loose
references, yes, but I think we'll still have to deal with both cases.

> -Peff

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/2] sha1-file: fsync() loose dir entry when core.fsyncObjectFiles
  2020-09-17 20:21             ` Johannes Sixt
@ 2020-09-22  8:24               ` Ævar Arnfjörð Bjarmason
  2020-11-19 11:38                 ` Johannes Schindelin
  0 siblings, 1 reply; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2020-09-22  8:24 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: git, tytso, Junio C Hamano, Christoph Hellwig, Linus Torvalds,
	linux-fsdevel


On Thu, Sep 17 2020, Johannes Sixt wrote:

> Am 17.09.20 um 13:28 schrieb Ævar Arnfjörð Bjarmason:
>> Change the behavior of core.fsyncObjectFiles to also sync the
>> directory entry. I don't have a case where this broke, just going by
>> paranoia and the fsync(2) manual page's guarantees about its behavior.
>> 
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>  sha1-file.c | 19 ++++++++++++++-----
>>  1 file changed, 14 insertions(+), 5 deletions(-)
>> 
>> diff --git a/sha1-file.c b/sha1-file.c
>> index dd65bd5c68..d286346921 100644
>> --- a/sha1-file.c
>> +++ b/sha1-file.c
>> @@ -1784,10 +1784,14 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
>>  }
>>  
>>  /* Finalize a file on disk, and close it. */
>> -static void close_loose_object(int fd)
>> +static void close_loose_object(int fd, const struct strbuf *dirname)
>>  {
>> -	if (fsync_object_files)
>> +	int dirfd;
>> +	if (fsync_object_files) {
>>  		fsync_or_die(fd, "loose object file");
>> +		dirfd = xopen(dirname->buf, O_RDONLY);
>> +		fsync_or_die(dirfd, "loose object directory");
>
> Did you have the opportunity to verify that this works on Windows?
> Opening a directory with open(2), I mean: It's disallowed according to
> the docs:
> https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/open-wopen?view=vs-2019#return-value

I did not, just did a quick hack for an RFC discussion (didn't even
close() that fd), but if I pursue this I'll do it properly.

Doing some research on it now reveals that we should probably have some
Windows-specific code here, e.g. browsing GNUlib's source code reveals
that it uses FlushFileBuffers(), and that code itself is taken from
sqlite. SQLite also has special-case code for some Unix warts,
e.g. OSX's and AIX's special fsync behaviors in its src/os_unix.c

>> +	}
>>  	if (close(fd) != 0)
>>  		die_errno(_("error when closing loose object file"));
>>  }
>
> -- Hannes


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/2] sha1-file: fsync() loose dir entry when core.fsyncObjectFiles
  2020-09-17 14:09             ` Christoph Hellwig
  2020-09-17 14:55               ` Jeff King
@ 2020-09-22 10:42               ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2020-09-22 10:42 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: git, tytso, Junio C Hamano, Linus Torvalds, linux-fsdevel


On Thu, Sep 17 2020, Christoph Hellwig wrote:

> On Thu, Sep 17, 2020 at 01:28:29PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> Change the behavior of core.fsyncObjectFiles to also sync the
>> directory entry. I don't have a case where this broke, just going by
>> paranoia and the fsync(2) manual page's guarantees about its behavior.
>
> It is not just paranoia, but indeed what is required from the standards
> POV.  At least for many Linux file systems your second fsync will be
> very cheap (basically a NULL syscall) as the log has alredy been forced
> all the way by the first one, but you can't rely on that.
>
> Acked-by: Christoph Hellwig <hch@lst.de>

Thanks a lot for your advice in this thread.

Can you (or someone else) suggest a Linux fs setup that's as unforgiving
as possible vis-a-vis fsync() for testing? I'd like to hack on making
git better at this, but one of the problems of testing it is that modern
filesystems generally do a pretty good job of not losing your data.

So something like ext4's commit=N is an obvious start, but for git's own
test suite it would be ideal to have process A write file X, and then
have process B try to read it and just not see it if X hadn't been
fsynced (or not see its directory if that hadn't been synced).

It would turn our test suite into pretty much a 100% failure, but one
that could then be fixed by fixing the relevant file writing code.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 2/2] core.fsyncObjectFiles: make the docs less flippant
  2020-09-17 15:43             ` Junio C Hamano
  2020-09-17 20:15               ` Johannes Sixt
@ 2020-10-08  8:13               ` Johannes Schindelin
  2020-10-08 15:57                 ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 52+ messages in thread
From: Johannes Schindelin @ 2020-10-08  8:13 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, git, tytso,
	Christoph Hellwig, Linus Torvalds, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 1441 bytes --]

Hi Junio and Ævar,

On Thu, 17 Sep 2020, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
> > As amusing as Linus's original prose[1] is here it doesn't really explain
> > in any detail to the uninitiated why you would or wouldn't enable
> > this, and the counter-intuitive reason for why git wouldn't fsync your
> > precious data.
> >
> > So elaborate (a lot) on why this may or may not be needed. This is my
> > best-effort attempt to summarize the various points raised in the last
> > ML[2] discussion about this.
> >
> > 1.  aafe9fbaf4 ("Add config option to enable 'fsync()' of object
> >     files", 2008-06-18)
> > 2. https://lore.kernel.org/git/20180117184828.31816-1-hch@lst.de/
> >
> > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> > ---
> >  Documentation/config/core.txt | 42 ++++++++++++++++++++++++++++++-----
> >  1 file changed, 36 insertions(+), 6 deletions(-)
>
> When I saw the subject in my mailbox, I expected to see that you
> would resurrect Christoph's updated text in [*1*], but you wrote a
> whole lot more ;-) And they are quite informative to help readers to
> understand what the option does.  I am not sure if the understanding
> directly help readers to decide if it is appropriate for their own
> repositories, though X-<.

I agree that it is an improvement, and am therefore in favor of applying
the patch.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 2/2] core.fsyncObjectFiles: make the docs less flippant
  2020-10-08  8:13               ` Johannes Schindelin
@ 2020-10-08 15:57                 ` Ævar Arnfjörð Bjarmason
  2020-10-08 18:53                   ` Junio C Hamano
  2020-10-09 10:44                   ` Johannes Schindelin
  0 siblings, 2 replies; 52+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2020-10-08 15:57 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, git, tytso, Christoph Hellwig, Linus Torvalds,
	linux-fsdevel


On Thu, Oct 08 2020, Johannes Schindelin wrote:

> Hi Junio and Ævar,
>
> On Thu, 17 Sep 2020, Junio C Hamano wrote:
>
>> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>>
>> > As amusing as Linus's original prose[1] is here it doesn't really explain
>> > in any detail to the uninitiated why you would or wouldn't enable
>> > this, and the counter-intuitive reason for why git wouldn't fsync your
>> > precious data.
>> >
>> > So elaborate (a lot) on why this may or may not be needed. This is my
>> > best-effort attempt to summarize the various points raised in the last
>> > ML[2] discussion about this.
>> >
>> > 1.  aafe9fbaf4 ("Add config option to enable 'fsync()' of object
>> >     files", 2008-06-18)
>> > 2. https://lore.kernel.org/git/20180117184828.31816-1-hch@lst.de/
>> >
>> > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> > ---
>> >  Documentation/config/core.txt | 42 ++++++++++++++++++++++++++++++-----
>> >  1 file changed, 36 insertions(+), 6 deletions(-)
>>
>> When I saw the subject in my mailbox, I expected to see that you
>> would resurrect Christoph's updated text in [*1*], but you wrote a
>> whole lot more ;-) And they are quite informative to help readers to
>> understand what the option does.  I am not sure if the understanding
>> directly help readers to decide if it is appropriate for their own
>> repositories, though X-<.
>
> I agree that it is an improvement, and am therefore in favor of applying
> the patch.

Just the improved docs, or flipping the default of core.fsyncObjectFiles
to "true"?

I've been meaning to re-roll this. I won't have time anytime soon to fix
git's fsync() use, i.e. ensure that we run up & down modified
directories and fsync()/fdatasync() file/dir fd's as appropriate but I
think documenting it and changing the core.fsyncObjectFiles default
makes sense and is at least a step in the right direction.

I do think it makes more sense for a v2 to split most of this out into
some section that generally discusses data integrity in the .git
directory. I.e. that says that currently where we use fsync() (such as
pack/commit-graph writes) we don't fsync() the corresponding
director{y,ies), and ref updates don't fsync() at all.

Where to put that though? gitrepository-layout(5)? Or a new page like
gitrepository-integrity(5) (other suggestions welcome..).

Looking at the code again it seems easier than I thought to make this
right if we ignore .git/refs (which reftable can fix for us). Just:

1. Change fsync_or_die() and its callsites to also pass/sync the
   containing directory, which is always created already
   (e.g. .git/objects/pack/)...).

2. ..Or in the case where it's not created already such as
   .git/objects/??/ (or .git/objects/pack/) itself) it's not N-deep like
   the refs hierarchy, so "did we create it" state is pretty simple, or
   we can just always do it unconditionally.

3. Without reftable the .git/refs/ case shouldn't be too hard if we're
   OK with redundantly fsyncing all the way down, i.e. to make it
   simpler by not tracking the state of exactly what was changed.

4. Now that I'm writing this there's also .git/{config,rr-cache} and any
   number of other things we need to change for 100% coverage, but the
   above 1-3 should be good enough for repo integrity where repo = refs
   & objects.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 2/2] core.fsyncObjectFiles: make the docs less flippant
  2020-10-08 15:57                 ` Ævar Arnfjörð Bjarmason
@ 2020-10-08 18:53                   ` Junio C Hamano
  2020-10-09 10:44                   ` Johannes Schindelin
  1 sibling, 0 replies; 52+ messages in thread
From: Junio C Hamano @ 2020-10-08 18:53 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Johannes Schindelin, git, tytso, Christoph Hellwig,
	Linus Torvalds, linux-fsdevel

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>>> When I saw the subject in my mailbox, I expected to see that you
>>> would resurrect Christoph's updated text in [*1*], but you wrote a
>>> whole lot more ;-) And they are quite informative to help readers to
>>> understand what the option does.  I am not sure if the understanding
>>> directly help readers to decide if it is appropriate for their own
>>> repositories, though X-<.
>>
>> I agree that it is an improvement, and am therefore in favor of applying
>> the patch.
>
> Just the improved docs, or flipping the default of core.fsyncObjectFiles
> to "true"?

I am not Dscho, but "applying THE patch" meant, at least to me, the
patch [2/2] to the docs, which was the message we are responding to.

> I've been meaning to re-roll this. I won't have time anytime soon to fix
> git's fsync() use, i.e. ensure that we run up & down modified
> directories and fsync()/fdatasync() file/dir fd's as appropriate but I
> think documenting it and changing the core.fsyncObjectFiles default
> makes sense and is at least a step in the right direction.
>
> I do think it makes more sense for a v2 to split most of this out into
> some section that generally discusses data integrity in the .git
> directory. I.e. that says that currently where we use fsync() (such as
> pack/commit-graph writes) we don't fsync() the corresponding
> director{y,ies), and ref updates don't fsync() at all.

Yes, I think all of these are sensible things to do sometime in the
future.


> Where to put that though? gitrepository-layout(5)? Or a new page like
> gitrepository-integrity(5) (other suggestions welcome..).

I do not have a good suggestion at this moment on this.





^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 2/2] core.fsyncObjectFiles: make the docs less flippant
  2020-10-08 15:57                 ` Ævar Arnfjörð Bjarmason
  2020-10-08 18:53                   ` Junio C Hamano
@ 2020-10-09 10:44                   ` Johannes Schindelin
  1 sibling, 0 replies; 52+ messages in thread
From: Johannes Schindelin @ 2020-10-09 10:44 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, git, tytso, Christoph Hellwig, Linus Torvalds,
	linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 5249 bytes --]

Hi Ævar,

On Thu, 8 Oct 2020, Ævar Arnfjörð Bjarmason wrote:

> On Thu, Oct 08 2020, Johannes Schindelin wrote:
>
> > On Thu, 17 Sep 2020, Junio C Hamano wrote:
> >
> >> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
> >>
> >> > As amusing as Linus's original prose[1] is here it doesn't really explain
> >> > in any detail to the uninitiated why you would or wouldn't enable
> >> > this, and the counter-intuitive reason for why git wouldn't fsync your
> >> > precious data.
> >> >
> >> > So elaborate (a lot) on why this may or may not be needed. This is my
> >> > best-effort attempt to summarize the various points raised in the last
> >> > ML[2] discussion about this.
> >> >
> >> > 1.  aafe9fbaf4 ("Add config option to enable 'fsync()' of object
> >> >     files", 2008-06-18)
> >> > 2. https://lore.kernel.org/git/20180117184828.31816-1-hch@lst.de/
> >> >
> >> > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> >> > ---
> >> >  Documentation/config/core.txt | 42 ++++++++++++++++++++++++++++++-----
> >> >  1 file changed, 36 insertions(+), 6 deletions(-)
> >>
> >> When I saw the subject in my mailbox, I expected to see that you
> >> would resurrect Christoph's updated text in [*1*], but you wrote a
> >> whole lot more ;-) And they are quite informative to help readers to
> >> understand what the option does.  I am not sure if the understanding
> >> directly help readers to decide if it is appropriate for their own
> >> repositories, though X-<.
> >
> > I agree that it is an improvement, and am therefore in favor of applying
> > the patch.
>
> Just the improved docs, or flipping the default of core.fsyncObjectFiles
> to "true"?

I am actually also in favor of flipping the default. We carry
https://github.com/git-for-windows/git/commit/14dad078c28159b250be599c0890ece2d6f4d635
in Git for Windows for over three years. The commit message:

	mingw: change core.fsyncObjectFiles = 1 by default

	From the documentation of said setting:

		This boolean will enable fsync() when writing object files.

		This is a total waste of time and effort on a filesystem that
		orders data writes properly, but can be useful for filesystems
		that do not use journalling (traditional UNIX filesystems) or
		that only journal metadata and not file contents (OS X’s HFS+,
		or Linux ext3 with "data=writeback").

	The most common file system on Windows (NTFS) does not guarantee that
	order, therefore a sudden loss of power (or any other event causing an
	unclean shutdown) would cause corrupt files (i.e. files filled with
	NULs). Therefore we need to change the default.

	Note that the documentation makes it sound as if this causes really bad
	performance. In reality, writing loose objects is something that is done
	only rarely, and only a handful of files at a time.

	Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

The patch itself limits this change to Windows, but if this becomes a
platform-independent change, all the better for me!

> I've been meaning to re-roll this. I won't have time anytime soon to fix
> git's fsync() use, i.e. ensure that we run up & down modified
> directories and fsync()/fdatasync() file/dir fd's as appropriate but I
> think documenting it and changing the core.fsyncObjectFiles default
> makes sense and is at least a step in the right direction.

Agreed.

> I do think it makes more sense for a v2 to split most of this out into
> some section that generally discusses data integrity in the .git
> directory. I.e. that says that currently where we use fsync() (such as
> pack/commit-graph writes) we don't fsync() the corresponding
> director{y,ies), and ref updates don't fsync() at all.
>
> Where to put that though? gitrepository-layout(5)? Or a new page like
> gitrepository-integrity(5) (other suggestions welcome..).

I think `gitrepository-layout` is probably the best location for now.

> Looking at the code again it seems easier than I thought to make this
> right if we ignore .git/refs (which reftable can fix for us). Just:
>
> 1. Change fsync_or_die() and its callsites to also pass/sync the
>    containing directory, which is always created already
>    (e.g. .git/objects/pack/)...).
>
> 2. ..Or in the case where it's not created already such as
>    .git/objects/??/ (or .git/objects/pack/) itself) it's not N-deep like
>    the refs hierarchy, so "did we create it" state is pretty simple, or
>    we can just always do it unconditionally.
>
> 3. Without reftable the .git/refs/ case shouldn't be too hard if we're
>    OK with redundantly fsyncing all the way down, i.e. to make it
>    simpler by not tracking the state of exactly what was changed.
>
> 4. Now that I'm writing this there's also .git/{config,rr-cache} and any
>    number of other things we need to change for 100% coverage, but the
>    above 1-3 should be good enough for repo integrity where repo = refs
>    & objects.

I fear that the changes to `fsync` also the directory need to be guarded
behind a preprocessor flag, though: if you try to `open()` a directory on
Windows, it fails (our emulated `open()` sets `errno = EISDIR`).

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH 1/2] sha1-file: fsync() loose dir entry when core.fsyncObjectFiles
  2020-09-22  8:24               ` Ævar Arnfjörð Bjarmason
@ 2020-11-19 11:38                 ` Johannes Schindelin
  0 siblings, 0 replies; 52+ messages in thread
From: Johannes Schindelin @ 2020-11-19 11:38 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Johannes Sixt, git, tytso, Junio C Hamano, Christoph Hellwig,
	Linus Torvalds, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 3109 bytes --]

Hi Ævar,

On Tue, 22 Sep 2020, Ævar Arnfjörð Bjarmason wrote:

> On Thu, Sep 17 2020, Johannes Sixt wrote:
>
> > Am 17.09.20 um 13:28 schrieb Ævar Arnfjörð Bjarmason:
> >> Change the behavior of core.fsyncObjectFiles to also sync the
> >> directory entry. I don't have a case where this broke, just going by
> >> paranoia and the fsync(2) manual page's guarantees about its behavior.
> >>
> >> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> >> ---
> >>  sha1-file.c | 19 ++++++++++++++-----
> >>  1 file changed, 14 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/sha1-file.c b/sha1-file.c
> >> index dd65bd5c68..d286346921 100644
> >> --- a/sha1-file.c
> >> +++ b/sha1-file.c
> >> @@ -1784,10 +1784,14 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
> >>  }
> >>
> >>  /* Finalize a file on disk, and close it. */
> >> -static void close_loose_object(int fd)
> >> +static void close_loose_object(int fd, const struct strbuf *dirname)
> >>  {
> >> -	if (fsync_object_files)
> >> +	int dirfd;
> >> +	if (fsync_object_files) {
> >>  		fsync_or_die(fd, "loose object file");
> >> +		dirfd = xopen(dirname->buf, O_RDONLY);
> >> +		fsync_or_die(dirfd, "loose object directory");
> >
> > Did you have the opportunity to verify that this works on Windows?
> > Opening a directory with open(2), I mean: It's disallowed according to
> > the docs:
> > https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/open-wopen?view=vs-2019#return-value
>
> I did not, just did a quick hack for an RFC discussion (didn't even
> close() that fd), but if I pursue this I'll do it properly.
>
> Doing some research on it now reveals that we should probably have some
> Windows-specific code here, e.g. browsing GNUlib's source code reveals
> that it uses FlushFileBuffers(), and that code itself is taken from
> sqlite. SQLite also has special-case code for some Unix warts,
> e.g. OSX's and AIX's special fsync behaviors in its src/os_unix.c

If I understand correctly, the idea to `fsync()` directories is to ensure
that metadata updates (such as renames) are flushed, too?

If so (and please note that my understanding of NTFS is not super deep in
this regard), I think that we need not worry on Windows. I have come to
believe that the `rename()` operations are flushed pretty much
immediately, without any `FlushFileBuffers()` (or `_commit()`, as we
actually do in `compat/mingw.h`, to convince yourself see
https://github.com/git/git/blob/v2.29.2/compat/mingw.h#L135-L136).

Directories are not mentioned in `FlushFileBuffers()`'s documentation
(https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-flushfilebuffers)
nor in the documentation of `_commit()`:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/commit?view=msvc-160

Therefore, I believe that there is not even a Win32 equivalent of
`fsync()`ing directories.

Ciao,
Dscho

>
> >> +	} if (close(fd) != 0) die_errno(_("error when closing loose object
> >> file")); }
> >
> > -- Hannes
>
>
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2020-11-19 11:38 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-17 18:48 [PATCH] enable core.fsyncObjectFiles by default Christoph Hellwig
2018-01-17 19:04 ` Junio C Hamano
2018-01-17 19:35   ` Christoph Hellwig
2018-01-17 20:05     ` Andreas Schwab
2018-01-17 19:37   ` Matthew Wilcox
2018-01-17 19:42     ` Christoph Hellwig
2018-01-17 21:44   ` Ævar Arnfjörð Bjarmason
2018-01-17 22:07     ` Linus Torvalds
2018-01-17 22:25       ` Linus Torvalds
2018-01-17 23:16       ` Ævar Arnfjörð Bjarmason
2018-01-17 23:42         ` Linus Torvalds
2018-01-17 23:52       ` Theodore Ts'o
2018-01-17 23:57         ` Linus Torvalds
2018-01-18 16:27           ` Christoph Hellwig
2018-01-19 19:08             ` Junio C Hamano
2018-01-20 22:14               ` Theodore Ts'o
2018-01-20 22:27                 ` Junio C Hamano
2018-01-22 15:09                   ` Ævar Arnfjörð Bjarmason
2018-01-22 18:09                     ` Theodore Ts'o
2018-01-23  0:47                       ` Jeff King
2018-01-23  5:45                         ` Theodore Ts'o
2018-01-23 16:17                           ` Jeff King
2018-01-23  0:25                     ` Jeff King
2018-01-21 21:32             ` Chris Mason
2020-09-17 11:06         ` Ævar Arnfjörð Bjarmason
2020-09-17 11:28           ` [RFC PATCH 0/2] should core.fsyncObjectFiles fsync the dir entry + docs Ævar Arnfjörð Bjarmason
2020-09-17 11:28           ` [RFC PATCH 1/2] sha1-file: fsync() loose dir entry when core.fsyncObjectFiles Ævar Arnfjörð Bjarmason
2020-09-17 13:16             ` Jeff King
2020-09-17 15:09               ` Christoph Hellwig
2020-09-17 14:09             ` Christoph Hellwig
2020-09-17 14:55               ` Jeff King
2020-09-17 14:56                 ` Christoph Hellwig
2020-09-17 15:37                   ` Junio C Hamano
2020-09-17 17:12                     ` Jeff King
2020-09-17 20:37                       ` Taylor Blau
2020-09-22 10:42               ` Ævar Arnfjörð Bjarmason
2020-09-17 20:21             ` Johannes Sixt
2020-09-22  8:24               ` Ævar Arnfjörð Bjarmason
2020-11-19 11:38                 ` Johannes Schindelin
2020-09-17 11:28           ` [RFC PATCH 2/2] core.fsyncObjectFiles: make the docs less flippant Ævar Arnfjörð Bjarmason
2020-09-17 14:12             ` Christoph Hellwig
2020-09-17 15:43             ` Junio C Hamano
2020-09-17 20:15               ` Johannes Sixt
2020-10-08  8:13               ` Johannes Schindelin
2020-10-08 15:57                 ` Ævar Arnfjörð Bjarmason
2020-10-08 18:53                   ` Junio C Hamano
2020-10-09 10:44                   ` Johannes Schindelin
2020-09-17 19:21             ` Marc Branchaud
2020-09-17 14:14           ` [PATCH] enable core.fsyncObjectFiles by default Christoph Hellwig
2020-09-17 15:30           ` Junio C Hamano
2018-01-17 20:55 ` Jeff King
2018-01-17 21:10   ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).