git.vger.kernel.org archive mirror
* [PATCH] Enable core.fsyncObjectFiles by default
@ 2015-06-23 21:57 Stefan Beller
  2015-06-23 22:21 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Stefan Beller @ 2015-06-23 21:57 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, torvalds, Stefan Beller

Linus Torvalds started a discussion[1] about whether we should play it
safe rather than use defaults which make sense only for power users of Git:

> So git is "safe" in the sense that you won't really lose any data,
> but you may well be inconvenienced.  The "fsync each object" config
> option is there in case you don't want that inconvenience, but it
> should be noted that it can make for a hell of a performance impact.

> Of course, it might well be the case that the actual default
> might be worth turning around. Most git users probably don't
> care about that kind of "apply two hundred patches from Andrew
> Morton" kind of workload, although "rebase a big patch-series"
> does end up doing basically the same thing, and might be more
> common.

This patch enables fsync_object_files by default.

[1] https://plus.google.com/u/1/+JonathanCorbet/posts/JBxiKPe3VXa

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/config.txt | 8 ++++----
 environment.c            | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 43bb53c..dce2640 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -693,10 +693,10 @@ core.whitespace::
 core.fsyncObjectFiles::
 	This boolean will enable 'fsync()' when writing object files.
 +
-This is a total waste of time and effort on a filesystem that orders
-data writes properly, but can be useful for filesystems that do not use
-journalling (traditional UNIX filesystems) or that only journal metadata
-and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
+This ensures objects are written to disk instead of relying on the
+operating system's cache and eventual writeback. Disabling this option
+yields better performance at the cost of possible repository corruption
+during a power loss.
 
 core.preloadIndex::
 	Enable parallel index preload for operations like 'git diff'
diff --git a/environment.c b/environment.c
index 61c685b..b406f5e 100644
--- a/environment.c
+++ b/environment.c
@@ -35,7 +35,7 @@ const char *git_attributes_file;
 int zlib_compression_level = Z_BEST_SPEED;
 int core_compression_level;
 int core_compression_seen;
-int fsync_object_files;
+int fsync_object_files = 1;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
-- 
2.4.1.345.gab207b6.dirty


* Re: [PATCH] Enable core.fsyncObjectFiles by default
  2015-06-23 21:57 [PATCH] Enable core.fsyncObjectFiles by default Stefan Beller
@ 2015-06-23 22:21 ` Junio C Hamano
  2015-06-23 23:29   ` Theodore Ts'o
  2015-06-24  1:07 ` Duy Nguyen
  2015-06-24  3:37 ` Jeff King
  2 siblings, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2015-06-23 22:21 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, peff, torvalds

Stefan Beller <sbeller@google.com> writes:

> Linus Torvalds started a discussion[1] about whether we should play it
> safe rather than use defaults which make sense only for power users of Git:
>
>> So git is "safe" in the sense that you won't really lose any data,
>> but you may well be inconvenienced.  The "fsync each object" config
>> option is there in case you don't want that inconvenience, but it
>> should be noted that it can make for a hell of a performance impact.
>
>> Of course, it might well be the case that the actual default
>> might be worth turning around. Most git users probably don't
>> care about that kind of "apply two hundred patches from Andrew
>> Morton" kind of workload, although "rebase a big patch-series"
>> does end up doing basically the same thing, and might be more
>> common.
>
> This patch enables fsync_object_files by default.

Sorry, but I fail to see which part of what Linus said (which is the
only thing you quoted from the discussion) argues for this patch.
If anything, I read that as cautioning people against making a
tradeoff based on an incorrect perception of risks and blindly
flipping this bit ON (the original discussion a few days ago, where
Ted says he has this bit ON while clarifying that he does so on SSD,
is also a sensible description of how he made his trade-off).

It's a different matter whom you would want to align with when
assessing your own risk tolerance.  If you quoted Corbet's original
message, then that would have been more consistent.

>
> [1] https://plus.google.com/u/1/+JonathanCorbet/posts/JBxiKPe3VXa
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  Documentation/config.txt | 8 ++++----
>  environment.c            | 2 +-
>  2 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 43bb53c..dce2640 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -693,10 +693,10 @@ core.whitespace::
>  core.fsyncObjectFiles::
>  	This boolean will enable 'fsync()' when writing object files.
>  +
> -This is a total waste of time and effort on a filesystem that orders
> -data writes properly, but can be useful for filesystems that do not use
> -journalling (traditional UNIX filesystems) or that only journal metadata
> -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
> +This ensures objects are written to disk instead of relying on the
> +operating system's cache and eventual writeback. Disabling this option
> +yields better performance at the cost of possible repository corruption
> +during a power loss.
>  
>  core.preloadIndex::
>  	Enable parallel index preload for operations like 'git diff'
> diff --git a/environment.c b/environment.c
> index 61c685b..b406f5e 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -35,7 +35,7 @@ const char *git_attributes_file;
>  int zlib_compression_level = Z_BEST_SPEED;
>  int core_compression_level;
>  int core_compression_seen;
> -int fsync_object_files;
> +int fsync_object_files = 1;
>  size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
>  size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
>  size_t delta_base_cache_limit = 96 * 1024 * 1024;


* Re: [PATCH] Enable core.fsyncObjectFiles by default
  2015-06-23 22:21 ` Junio C Hamano
@ 2015-06-23 23:29   ` Theodore Ts'o
  2015-06-24  5:32     ` Junio C Hamano
  0 siblings, 1 reply; 37+ messages in thread
From: Theodore Ts'o @ 2015-06-23 23:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, git, peff, torvalds

The main issue is that non-expert users might not realize that they
really need to run "git fsck" after a crash; otherwise, although git
is only appending, you might end up with some zero-length (or
partially written) git object or pack files, and while you might not
notice at that moment, it might come back to bite you later.
If you do try to recover immediately after a crash, in the worst case
you might have to re-do that "git am -s /tmp/mbox-filled-with-patches"
command, but otherwise you won't lose much data.

So perhaps one alternative strategy to calling fsync(2) after every
single git object file write might be to have git create a zero-length
".git/in-progress-<pid>" file, which gets fsync'ed, and then it can do
the "git am -s /tmp/akpm-big-bag-o-patches" processing nice and fast,
and once git is done, then we call sync(2) and then delete the
in-progress file.

If there is an in-progress file in the .git directory, git would then
automatically run git fsck to make sure the repository is consistent.
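
(To make the sequence concrete, here is a minimal C sketch of that idea; it
is not code from git, and the path format and helper names are made up:)

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Create ".git/in-progress-<pid>" and make the marker itself durable.
     * (A really careful version would also fsync the .git directory.) */
    static int start_bulk_object_writes(char *marker, size_t len)
    {
            int fd;

            snprintf(marker, len, ".git/in-progress-%ld", (long)getpid());
            fd = open(marker, O_CREAT | O_EXCL | O_WRONLY, 0666);
            if (fd < 0)
                    return -1;
            if (fsync(fd) < 0) {
                    close(fd);
                    return -1;
            }
            return close(fd);
    }

    /* After the un-fsync'ed bulk object writes: flush everything once,
     * then drop the marker so a later crash no longer demands a fsck. */
    static void end_bulk_object_writes(const char *marker)
    {
            sync();
            unlink(marker);
    }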

For people who care, maybe that's a good compromise.  (Me, the way
things are right now is just fine since I have a nice fast SSD, and so
setting fsyncObjectFiles is a perfectly fine thing as far as I am
concerned. :-)

		      		     	   - Ted


* Re: [PATCH] Enable core.fsyncObjectFiles by default
  2015-06-23 21:57 [PATCH] Enable core.fsyncObjectFiles by default Stefan Beller
  2015-06-23 22:21 ` Junio C Hamano
@ 2015-06-24  1:07 ` Duy Nguyen
  2015-06-24  3:37 ` Jeff King
  2 siblings, 0 replies; 37+ messages in thread
From: Duy Nguyen @ 2015-06-24  1:07 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Git Mailing List, Junio C Hamano, Jeff King, Linus Torvalds

On Wed, Jun 24, 2015 at 4:57 AM, Stefan Beller <sbeller@google.com> wrote:
> Linus Torvalds started a discussion[1] about whether we should play it
> safe rather than use defaults which make sense only for power users of Git:
>
>> So git is "safe" in the sense that you won't really lose any data,
>> but you may well be inconvenienced.  The "fsync each object" config
>> option is there in case you don't want that inconvenience, but it
>> should be noted that it can make for a hell of a performance impact.
>
>> Of course, it might well be the case that the actual default
>> might be worth turning around. Most git users probably don't
>> care about that kind of "apply two hundred patches from Andrew
>> Morton" kind of workload, although "rebase a big patch-series"
>> does end up doing basically the same thing, and might be more
>> common.
>
> This patch enables fsync_object_files by default.

Will this make NFS performance a lot worse, or keep it within an acceptable range?
-- 
Duy


* Re: [PATCH] Enable core.fsyncObjectFiles by default
  2015-06-23 21:57 [PATCH] Enable core.fsyncObjectFiles by default Stefan Beller
  2015-06-23 22:21 ` Junio C Hamano
  2015-06-24  1:07 ` Duy Nguyen
@ 2015-06-24  3:37 ` Jeff King
  2015-06-24  5:20   ` Junio C Hamano
  2 siblings, 1 reply; 37+ messages in thread
From: Jeff King @ 2015-06-24  3:37 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Theodore Ts'o, git, gitster, torvalds

On Tue, Jun 23, 2015 at 02:57:23PM -0700, Stefan Beller wrote:

> Linus Torvalds started a discussion[1] about whether we should play it
> safe rather than use defaults which make sense only for power users of Git:
> 
> > So git is "safe" in the sense that you won't really lose any data,
> > but you may well be inconvenienced.  The "fsync each object" config
> > option is there in case you don't want that inconvenience, but it
> > should be noted that it can make for a hell of a performance impact.
> 
> > Of course, it might well be the case that the actual default
> > might be worth turning around. Most git users probably don't
> > care about that kind of "apply two hundred patches from Andrew
> > Morton" kind of workload, although "rebase a big patch-series"
> > does end up doing basically the same thing, and might be more
> > common.
> 
> This patch enables fsync_object_files by default.

If you are looking for safety out of the box, I think this falls far
short, as we do not fsync all of the other files. For instance, we do
not fsync refs before they are written (nor anything else that uses the
commit_lock_file() interface to rename, such as the index).  We do
always fsync packfiles and their indices.

I had always assumed this was fine on ext4 with data=ordered (i.e.,
either the rename and its pointed-to content will go through, or not; so
you either get your update or the old state, but not a garbage or empty
file). But it sounds from what Ted wrote in:

  http://article.gmane.org/gmane.linux.file-systems/97255

that this may not be the case. If it's not, I think we should consider
fsyncing ref writes.
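
(For reference, the belt-and-suspenders sequence for such a write is
roughly the sketch below; this illustrates the general pattern, not the
actual commit_lock_file() code, and the function name is made up:)

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Illustration only: write a temp file, fsync it, rename it over the
     * ref, then fsync the containing directory so the rename is durable.
     * (Short writes are not retried here, to keep the sketch small.) */
    static int durable_ref_update(const char *dir, const char *tmp,
                                  const char *ref, const char *buf, size_t len)
    {
            int fd = open(tmp, O_CREAT | O_WRONLY | O_TRUNC, 0666);
            int dirfd;

            if (fd < 0)
                    return -1;
            if (write(fd, buf, len) != (ssize_t)len || fsync(fd) < 0) {
                    close(fd);
                    return -1;
            }
            close(fd);
            if (rename(tmp, ref) < 0)
                    return -1;
            dirfd = open(dir, O_RDONLY);
            if (dirfd < 0)
                    return -1;
            fsync(dirfd);
            return close(dirfd);
    }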

-Peff


* Re: [PATCH] Enable core.fsyncObjectFiles by default
  2015-06-24  3:37 ` Jeff King
@ 2015-06-24  5:20   ` Junio C Hamano
  0 siblings, 0 replies; 37+ messages in thread
From: Junio C Hamano @ 2015-06-24  5:20 UTC (permalink / raw)
  To: Jeff King; +Cc: Stefan Beller, Theodore Ts'o, git, torvalds

Jeff King <peff@peff.net> writes:

> I had always assumed this was fine on ext4 with data=ordered (i.e.,
> either the rename and its pointed-to content will go through, or not; so
> you either get your update or the old state, but not a garbage or empty
> file). But it sounds from what Ted wrote in:
>
>   http://article.gmane.org/gmane.linux.file-systems/97255
>
> that this may not be the case. If it's not, I think we should consider
> fsyncing ref writes.

That matches my understanding.  IIRC, we do fsync (without a knob to
turn it off) packs, we do not fsync refs (as you described above),
the index, or working tree files (via write_entry()).  Among these:

 * If we are paranoid enough about losing new loose objects (as
   opposed to "Heh, it is just a matter of 'git add' again") to
   set core.fsyncObjectFiles, we should definitely fsync ref
   writes.  They are even more precious.

 * The index is much less precious.  In the worst case, you can 'git
   reset' (no flags) and re-add from the working tree and nothing
   unrecoverable is lost.  I do not mind a knob to force us to fsync,
   but we may want it to be separate from core.fsyncObjectFiles.

 * The working tree files?  I am not sure.  Again, recovery is just
   to run "git diff" to notice what was eaten by the power failure
   and then "git checkout $path" to restore from a known state,
   so...


* Re: [PATCH] Enable core.fsyncObjectFiles by default
  2015-06-23 23:29   ` Theodore Ts'o
@ 2015-06-24  5:32     ` Junio C Hamano
  2015-06-24 14:30       ` Theodore Ts'o
  0 siblings, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2015-06-24  5:32 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Stefan Beller, git, peff, torvalds

Theodore Ts'o <tytso@mit.edu> writes:

> The main issue is that non-expert users might not realize that they
> really need to run "git fsck" after a crash; otherwise, although git
> is only appending, you might end up with some zero-length (or
> partially written) git object or pack files, and while you might not
> notice at that moment, it might come back to bite you later.

Regarding loose object files, given that we write to a temporary,
optionally fsync, close and then move to the final name, would we
still see a partially written file if we omit the fsync, or would the
corruption be limited to the file being either empty or missing?

The reason I am wondering is that the codepath to create an
object (i.e. "update-index --add", "hash-object -w", or "add") first
checks if a packed or a loose object file _exists_ and if so
bypasses writing the same thing anew, but the existence check for a
loose object merely makes sure that access(F_OK) (and
optionally utime()) succeeds.  If the potential breakage is limited
to truncation to empty, then we could replace it with a stat(2) and
st.st_size check, as no loose object file can be empty.
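
(A tiny sketch of that replacement check; the helper name is hypothetical,
and the point is simply that a zero-length file is treated like a missing
one, since no valid loose object can be empty:)

    #include <sys/stat.h>

    /* Hypothetical replacement for the bare access(F_OK) existence test. */
    static int loose_object_usable(const char *path)
    {
            struct stat st;

            return !stat(path, &st) && st.st_size > 0;
    }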


* Re: [PATCH] Enable core.fsyncObjectFiles by default
  2015-06-24  5:32     ` Junio C Hamano
@ 2015-06-24 14:30       ` Theodore Ts'o
  0 siblings, 0 replies; 37+ messages in thread
From: Theodore Ts'o @ 2015-06-24 14:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, git, peff, torvalds

On Tue, Jun 23, 2015 at 10:32:08PM -0700, Junio C Hamano wrote:
> 
> Regarding loose object files, given that we write to a temporary,
> optionally fsync, close and then move to the final name, would we
> still see a partially written file if we omit the fsync, or would the
> corruption be limited to the file being either empty or missing?

*Most* of the time the corruption will be an empty or missing file.
It's possible that the file could be partially written.  This is a
relatively low-probability event, with the probability going up if the
object file is large, and/or if the system is under memory pressure.

> The reason I am wondering is that the codepath to create an
> object (i.e. "update-index --add", "hash-object -w", or "add") first
> checks if a packed or a loose object file _exists_ and if so
> bypasses writing the same thing anew, but the existence check for a
> loose object merely makes sure that access(F_OK) (and
> optionally utime()) succeeds.  If the potential breakage is limited
> to truncation to empty, then we could replace it with a stat(2) and
> st.st_size check, as no loose object file can be empty.

It would certainly be a good thing to do an st_size check; it can't
possibly hurt, and it will catch a large number of failures after a
power failure.  I could also imagine some heuristics that force an
fsync if the object file is larger than a certain size (say, 4k if you
are very paranoid, a few hundred kilobytes if you are less so), but
past a certain point, it might be better just to tell the user to use
fsyncObjectFiles and be done with it.
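
(Such a heuristic could look something like the sketch below; the
threshold value and helper name are purely illustrative:)

    #include <sys/types.h>
    #include <unistd.h>

    #define LOOSE_FSYNC_THRESHOLD (256 * 1024)   /* example value only */

    /* fsync only the larger loose objects; small ones take their chances. */
    static int maybe_fsync_loose_object(int fd, off_t size)
    {
            if (size < LOOSE_FSYNC_THRESHOLD)
                    return 0;
            return fsync(fd);
    }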

						- Ted


* Re: [PATCH] enable core.fsyncObjectFiles by default
  2020-09-17 11:06         ` Ævar Arnfjörð Bjarmason
  2020-09-17 14:14           ` Christoph Hellwig
@ 2020-09-17 15:30           ` Junio C Hamano
  1 sibling, 0 replies; 37+ messages in thread
From: Junio C Hamano @ 2020-09-17 15:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Theodore Ts'o, Linus Torvalds, Christoph Hellwig,
	Git Mailing List, linux-fsdevel, Jacob Vosmaer

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> [I didn't find an ideal message to reply to in this thread, but this
> seemed to probably be the best]
>
> Just an update on this since I went back and looked at this thread:
> GitLab turned on core.fsyncObjectFiles=true by default about a year
> ago.
>
> The reason is detailed in [1], tl;dr: empty loose object file issue on
> ext4 allegedly caused by a lack of core.fsyncObjectFiles=true, but I
> didn't do any root cause analysis. Just noting it here for future
> reference.
>
> 1. https://gitlab.com/gitlab-org/gitlab-foss/-/issues/51680#note_180508774

Thanks for bringing the original discussion back.  I do recall it,
and the list of issues to think about that Peff raised back then in
[2] is still relevant.


2. https://public-inbox.org/git/20180117205509.GA14828@sigill.intra.peff.net/

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2020-09-17 11:06         ` Ævar Arnfjörð Bjarmason
@ 2020-09-17 14:14           ` Christoph Hellwig
  2020-09-17 15:30           ` Junio C Hamano
  1 sibling, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2020-09-17 14:14 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Theodore Ts'o, Linus Torvalds, Junio C Hamano,
	Christoph Hellwig, Git Mailing List, linux-fsdevel,
	Jacob Vosmaer

On Thu, Sep 17, 2020 at 01:06:50PM +0200, Ævar Arnfjörð Bjarmason wrote:
> The reason is detailed in [1], tl;dr: empty loose object file issue on
> ext4 allegedly caused by a lack of core.fsyncObjectFiles=true, but I
> didn't do any root cause analysis. Just noting it here for future
> reference.

All the modern Linux file systems first write the data, and then only
write the metadata after the data write has finished.  So your data might
have been partially or fully on disk, but until the transaction that
commits the size change and/or extent state change completes, you're
not going to be able to read it back.


* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 23:52       ` Theodore Ts'o
  2018-01-17 23:57         ` Linus Torvalds
@ 2020-09-17 11:06         ` Ævar Arnfjörð Bjarmason
  2020-09-17 14:14           ` Christoph Hellwig
  2020-09-17 15:30           ` Junio C Hamano
  1 sibling, 2 replies; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2020-09-17 11:06 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Linus Torvalds, Junio C Hamano, Christoph Hellwig,
	Git Mailing List, linux-fsdevel, Jacob Vosmaer


On Thu, Jan 18 2018, Theodore Ts'o wrote:

> On Wed, Jan 17, 2018 at 02:07:22PM -0800, Linus Torvalds wrote:
>> 
>> Now re-do the test while another process writes to a totally unrelated
>> huge file (say, do an ISO file copy or something).
>> 
>> That was the thing that several filesystems get completely and
>> horribly wrong. Generally _particularly_ the logging filesystems that
>> don't even need the fsync, because they use a single log for
>> everything (so fsync serializes all the writes, not just the writes to
>> the one file it's fsync'ing).
>
> Well, let's be fair; this is something *ext3* got wrong, and it was
> the default file system back then.  All of the modern file systems now
> do delayed allocation, which means that an fsync of one file doesn't
> actually imply an fsync of another file.  Hence...
>
>> The original git design was very much to write each object file
>> without any syncing, because they don't matter since a new object file
>> - by definition - isn't really reachable. Then sync before writing the
>> index file or a new ref.
>
> This isn't really safe any more.  Yes, there's a single log.  But
> files which are subject to delayed allocation are in the page cache,
> and just because you fsync the index file doesn't mean that the object
> file is now written to disk.  It was true for ext3, but it's not true
> for ext4, xfs, btrfs, etc.
>
> The good news is that if you have another process downloading a huge
> ISO image, the fsync of the index file won't force the ISO file to be
> written out.  The bad news is that it won't force out the other git
> object files, either.
>
> Now, there is a potential downside of fsync'ing each object file, and
> that is the cost of doing a CACHE FLUSH on a HDD is non-trivial, and
> even on a SSD, it's not optimal to call CACHE FLUSH thousands of times
> in a second.  So if you are creating thousands of tiny files, and you
> fsync each one, each fsync(2) call is a serializing instruction, which
> means it won't return until that one file is written to disk.  If you
> are writing lots of small files, and you are using a HDD, you'll be
> bottlenecked to around 30 files per second on a 5400 RPM HDD, and this
> is true regardless of what file system you use, because the
> bottleneck is the CACHE FLUSH operation, and how you organize the metadata
> and how you do the block allocation, is largely lost in the noise
> compared to the CACHE FLUSH command, which serializes everything.
>
> There are solutions to this; you could simply not call fsync(2) a
> thousand times, and instead write a pack file, and call fsync once on
> the pack file.  That's probably the smartest approach.
>
> You could also create a thousand threads, and call fsync(2) on those
> thousand threads at roughly the same time.  Or you could use a
> bleeding edge kernel with the latest AIO patch, and use the newly
> added IOCB_CMD_FSYNC support.
>
> But I'd simply recommend writing a pack and fsync'ing the pack,
> instead of trying to write a gazillion object files.  (git-repack -A,
> I'm looking at you....)
>
> 					- Ted

[I didn't find an ideal message to reply to in this thread, but this
seemed to probably be the best]

Just an update on this since I went back and looked at this thread:
GitLab turned on core.fsyncObjectFiles=true by default about a year
ago.

The reason is detailed in [1], tl;dr: empty loose object file issue on
ext4 allegedly caused by a lack of core.fsyncObjectFiles=true, but I
didn't do any root cause analysis. Just noting it here for future
reference.

1. https://gitlab.com/gitlab-org/gitlab-foss/-/issues/51680#note_180508774


* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-23  5:45                         ` Theodore Ts'o
@ 2018-01-23 16:17                           ` Jeff King
  0 siblings, 0 replies; 37+ messages in thread
From: Jeff King @ 2018-01-23 16:17 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Christoph Hellwig, Linus Torvalds, Git Mailing List,
	linux-fsdevel, Chris Mason

On Tue, Jan 23, 2018 at 12:45:53AM -0500, Theodore Ts'o wrote:

> What I was thinking about instead is that in cases where we know we
> are likely to be creating a large number of loose objects (whether
> they are referenced or not), in a world where we will be calling
> fsync(2) after every single loose object being created, pack files
> start looking *way* more efficient.  So in general, if you know you
> will be creating N loose objects, where N is probably around 50 or
> so, you'll want to create a pack instead.
>
> One of those cases is "repack -A", and in that case the loose objects
> are all going to be unreferenced, so it would be a "cruft pack".
> Importing from another DVCS is another case where doing an fsync(2)
> after every loose object creation (and I have sometimes seen such
> imports create them *all* loose, and not use a pack at all) is going
> to get extremely slow and painful.

Ah, I see. I think in the general case of git operations this is hard
(because most object writes don't realize the larger operation that
they're a part of). But I agree that those two are the low-hanging fruit
(imports should already be using fast-import, and "cruft packs" are not
too hard an idea to implement).

I agree that a cruft-pack implementation could just be for "repack -A",
and does not have to collect otherwise loose objects. I think part of my
confusion was that you and I are coming to the idea from different
angles: you care about minimizing fsyncs, and I'm interested in stopping
the problem where you have too many loose objects after running auto-gc.
So I care more about collecting those loose objects for that case.

> > So if we pack all the loose objects into a cruft pack, the mtime of the
> > cruft pack becomes the new gauge for "recent". And if we migrate objects
> > from old cruft pack to new cruft pack at each gc, then they'll keep
> > getting their mtimes refreshed, and we'll never drop them.
> 
> Well, I was assuming that gc would be a special case which doesn't
> touch the mtime of the old cruft pack.  (Or more generally, any time
> an object gets copied out of the cruft pack, either to a loose object
> or to another pack, the mtime on the source pack should not be touched.)

Right, that's the "you have multiple cruft packs" idea which has been
discussed[1] (each one just hangs around until its mtime expires, and
may duplicate objects found elsewhere).

That does end up with one pack per gc, which just punts the "too many
loose objects" problem to "too many packs". But unless the number of gc runs you
do is very high compared to the expiration time, we can probably ignore
that.

-Peff

[1] https://public-inbox.org/git/20170610080626.sjujpmgkli4muh7h@sigill.intra.peff.net/


* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-23  0:47                       ` Jeff King
@ 2018-01-23  5:45                         ` Theodore Ts'o
  2018-01-23 16:17                           ` Jeff King
  0 siblings, 1 reply; 37+ messages in thread
From: Theodore Ts'o @ 2018-01-23  5:45 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Christoph Hellwig, Linus Torvalds, Git Mailing List,
	linux-fsdevel, Chris Mason

On Mon, Jan 22, 2018 at 07:47:10PM -0500, Jeff King wrote:
> 
> I think Ævar is talking about the case of:
> 
>   1. You make 100 objects that aren't referenced. They're loose.
> 
>   2. You run git-gc. They're still too recent to be deleted.
> 
> Right now those recent loose objects sit loose, and have zero cost at
> the time of gc.  In a "cruft pack" world, you'd pay some I/O to copy
> them into the cruft pack, and some CPU to zlib and delta-compress them.
> I think that's probably fine, though.

I wasn't assuming that git-gc would create a cruft pack --- although I
guess it could.  As you say, recent loose objects have relatively zero
cost at the time of gc.  To the extent that the gc has to read lots of
loose files, there may be more seeks in the cold cache case, so there
is actually *some* cost to having the loose objects, but it's not
great.

What I was thinking about instead is that in cases where we know we
are likely to be creating a large number of loose objects (whether
they are referenced or not), in a world where we will be calling
fsync(2) after every single loose object being created, pack files
start looking *way* more efficient.  So in general, if you know you
will be creating N loose objects, where N is probably around 50 or
so, you'll want to create a pack instead.

One of those cases is "repack -A", and in that case the loose objects
are all going to be unreferenced, so it would be a "cruft pack".
Importing from another DVCS is another case where doing an fsync(2)
after every loose object creation (and I have sometimes seen such
imports create them *all* loose, and not use a pack at all) is going
to get extremely slow and painful.

> So if we pack all the loose objects into a cruft pack, the mtime of the
> cruft pack becomes the new gauge for "recent". And if we migrate objects
> from old cruft pack to new cruft pack at each gc, then they'll keep
> getting their mtimes refreshed, and we'll never drop them.

Well, I was assuming that gc would be a special case which doesn't
touch the mtime of the old cruft pack.  (Or more generally, any time
an object gets copied out of the cruft pack, either to a loose object
or to another pack, the mtime on the source pack should not be touched.)

	      	  	       	      	   	   - Ted


* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-22 18:09                     ` Theodore Ts'o
@ 2018-01-23  0:47                       ` Jeff King
  2018-01-23  5:45                         ` Theodore Ts'o
  0 siblings, 1 reply; 37+ messages in thread
From: Jeff King @ 2018-01-23  0:47 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Christoph Hellwig, Linus Torvalds, Git Mailing List,
	linux-fsdevel, Chris Mason

On Mon, Jan 22, 2018 at 01:09:03PM -0500, Theodore Ts'o wrote:

> > Wouldn't it also make gc pruning more expensive? Right now you can
> > repack regularly and loose objects will be left out of the pack, and
> > then just rm'd, whereas with this it would entail creating new packs
> > (unless the whole pack was objects meant for removal).
> 
> The idea is that the cruft pack would be all objects that were no
> longer referenced.  Hence the proposal that if they ever *are*
> accessed, they would be exploded to a loose object at that point.  So
> in the common case, the GC would go quickly since the entire pack
> could just be rm'ed once it hit the designated expiry time.

I think Ævar is talking about the case of:

  1. You make 100 objects that aren't referenced. They're loose.

  2. You run git-gc. They're still too recent to be deleted.

Right now those recent loose objects sit loose, and have zero cost at
the time of gc.  In a "cruft pack" world, you'd pay some I/O to copy
them into the cruft pack, and some CPU to zlib and delta-compress them.
I think that's probably fine, though.

That said, some of what you wrote left me confused, and wondering
whether we're all talking about the same idea. ;) Let me describe the
idea I had mentioned in another thread.  Right now the behavior is
basically this:

If an unreachable object becomes referenced, it doesn't immediately get
exploded. During the next gc, whatever new object referenced it would
be one of:

  1. Reachable from refs, in which case it carries along the
     formerly-cruft object into the new pack, since it is now also
     reachable.

  2. Unreachable but still recent by mtime; we keep such objects, and
     anything they reference (now as unreachable, in this proposal in
     the cruft pack). Now these get either left loose, or exploded loose
     if they were previously packed.

  3. Unreachable and old. Both objects can be dropped totally.

The current strategy is to use the mtimes for "recent", and we use the
pack's mtime for every object in the pack.
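
(Schematically, the per-object decision described above is something like
this sketch; the names are invented, not git's internals:)

    #include <time.h>

    enum gc_fate { KEEP_REACHABLE, KEEP_RECENT, DROP };

    /* Case 1: reachable objects go into the new pack.
     * Case 2: unreachable but recent objects are kept (loose or cruft).
     * Case 3: unreachable and old objects are dropped entirely. */
    static enum gc_fate classify(int reachable, time_t mtime, time_t expire_before)
    {
            if (reachable)
                    return KEEP_REACHABLE;
            if (mtime >= expire_before)
                    return KEEP_RECENT;
            return DROP;
    }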

So if we pack all the loose objects into a cruft pack, the mtime of the
cruft pack becomes the new gauge for "recent". And if we migrate objects
from old cruft pack to new cruft pack at each gc, then they'll keep
getting their mtimes refreshed, and we'll never drop them.

So we need to either:

  - keep per-object mtimes, so that old ones can age out (i.e., they'd
    hit case 3 and just not get migrated to either the new "real" pack
    or the new cruft pack).

  - keep multiple cruft packs, and let whole packs age out. But then
    cruft objects which get referenced again by other cruft have to get
    copied (not moved!) to new packs. That _probably_ doesn't happen all
    that often, so it might be OK.

> Another way of doing things would be to use the mtime of the cruft
> pack for the expiry time, and if the cruft pack is ever referenced,
> its mtime would get updated.  Yet a third way would be to simply clear
> the "cruft" bit if it ever *is* referenced.  In the common case, it
> would never be referenced, so it could just get deleted, but in the
> case where the user has manually "rescued" a set of commits (perhaps
> by explicitly setting a branch head to a commit id found in a reflog),
> the objects would be saved.

I don't think we have to worry about "rescued" objects. Those are
reachable, so they'd get copied into the new "real" pack (and then their
cruft pack eventually deleted).

-Peff


* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-22 15:09                   ` Ævar Arnfjörð Bjarmason
  2018-01-22 18:09                     ` Theodore Ts'o
@ 2018-01-23  0:25                     ` Jeff King
  1 sibling, 0 replies; 37+ messages in thread
From: Jeff King @ 2018-01-23  0:25 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Theodore Ts'o, Christoph Hellwig,
	Linus Torvalds, Git Mailing List, linux-fsdevel, Chris Mason

On Mon, Jan 22, 2018 at 04:09:23PM +0100, Ævar Arnfjörð Bjarmason wrote:

> > Yes, a "cruft pack" that holds unreachable objects has been discussed
> > a few times recently on list, and I do agree that it is a desirable
> > thing to have in the longer run.
> >
> > What's tricky is to devise a way to allow us to salvage objects that
> > are placed in a cruft pack because they are accessed recently,
> > proving themselves to be no longer crufts.  It could be that a good
> > way to resurrect them is to explode them to loose form when they are
> > accessed out of a cruft pack.  We need to worry about interactions
> > with read-only users if we go that route, but the current
> > "explode unreachable to loose, touch their mtime when they are
> > accessed" scheme ends up ignoring accesses from read-only users that
> > cannot update mtime, so it might not be too bad.
> 
> Wouldn't it also make gc pruning more expensive? Right now you can
> repack regularly and loose objects will be left out of the pack, and
> then just rm'd, whereas with this it would entail creating new packs
> (unless the whole pack was objects meant for removal).
> 
> Probably still worth it, but something to keep in mind.

That's a good point. I think it would be OK in practice, though, since
normal operations don't tend to create a huge number of unreachable
loose objects (at least compared to the _reachable_ loose objects, which
we're already dealing with). We tend to get unbalanced explosions of
loose objects only because a huge chunk of packed history expired.

It is something to keep in mind when implementing the scheme, though.
Luckily we already have the right behavior implemented via
--pack-loose-unreachable (which is used for "repack -k" currently), so I
think it would just be a matter of passing the right flags from
git-repack.

-Peff


* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-22 15:09                   ` Ævar Arnfjörð Bjarmason
@ 2018-01-22 18:09                     ` Theodore Ts'o
  2018-01-23  0:47                       ` Jeff King
  2018-01-23  0:25                     ` Jeff King
  1 sibling, 1 reply; 37+ messages in thread
From: Theodore Ts'o @ 2018-01-22 18:09 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Christoph Hellwig, Linus Torvalds,
	Git Mailing List, linux-fsdevel, Chris Mason, Jeff King

On Mon, Jan 22, 2018 at 04:09:23PM +0100, Ævar Arnfjörð Bjarmason wrote:
> > What's tricky is to devise a way to allow us to salvage objects that
> > are placed in a cruft pack because they are accessed recently,
> > proving themselves to be no longer crufts.  It could be that a good
> > way to resurrect them is to explode them to loose form when they are
> > accessed out of a cruft pack.  We need to worry about interactions
> > with read-only users if we go that route, but the current
> > "explode unreachable to loose, touch their mtime when they are
> > accessed" scheme ends up ignoring accesses from read-only users that
> > cannot update mtime, so it might not be too bad.
> 
> Wouldn't it also make gc pruning more expensive? Right now you can
> repack regularly and loose objects will be left out of the pack, and
> then just rm'd, whereas with this it would entail creating new packs
> (unless the whole pack was objects meant for removal).

The idea is that the cruft pack would be all objects that were no
longer referenced.  Hence the proposal that if they ever *are*
accessed, they would be exploded to a loose object at that point.  So
in the common case, the GC would go quickly since the entire pack
could just be rm'ed once it hit the designated expiry time.

Another way of doing things would be to use the mtime of the cruft
pack for the expiry time, and if the cruft pack is ever referenced,
its mtime would get updated.  Yet a third way would be to simply clear
the "cruft" bit if it ever *is* referenced.  In the common case, it
would never be referenced, so it could just get deleted, but in the
case where the user has manually "rescued" a set of commits (perhaps
by explicitly setting a branch head to a commit id found in a reflog),
the objects would be saved.

So there are many ways it could be managed.

							- Ted


* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-20 22:27                 ` Junio C Hamano
@ 2018-01-22 15:09                   ` Ævar Arnfjörð Bjarmason
  2018-01-22 18:09                     ` Theodore Ts'o
  2018-01-23  0:25                     ` Jeff King
  0 siblings, 2 replies; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-01-22 15:09 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Theodore Ts'o, Christoph Hellwig, Linus Torvalds,
	Git Mailing List, linux-fsdevel, Chris Mason, Jeff King


On Sat, Jan 20 2018, Junio C. Hamano jotted:

> Theodore Ts'o <tytso@mit.edu> writes:
>
>> ....  I've never been fond of the "git repack -A" behavior
>> where it can generate huge numbers of loose files.  I'd much prefer it
>> if the other objects ended up in a separate pack file, and then some
>> other provision made for nuking that pack file some time later....
>
> Yes, a "cruft pack" that holds unreachable objects has been discussed
> a few times recently on list, and I do agree that it is a desirable
> thing to have in the longer run.
>
> What's tricky is to devise a way to allow us to salvage objects that
> are placed in a cruft pack because they are accessed recently,
> proving themselves to be no longer crufts.  It could be that a good
> way to resurrect them is to explode them to loose form when they are
> accessed out of a cruft pack.  We need to worry about interactions
> with read-only users if we go that route, but the current
> "explode unreachable to loose, touch their mtime when they are
> accessed" scheme ends up ignoring accesses from read-only users that
> cannot update mtime, so it might not be too bad.

Wouldn't it also make gc pruning more expensive? Right now you can
repack regularly and loose objects will be left out of the pack, and
then just rm'd, whereas with this it would entail creating new packs
(unless the whole pack was objects meant for removal).

Probably still worth it, but something to keep in mind.


* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-18 16:27           ` Christoph Hellwig
  2018-01-19 19:08             ` Junio C Hamano
@ 2018-01-21 21:32             ` Chris Mason
  1 sibling, 0 replies; 37+ messages in thread
From: Chris Mason @ 2018-01-21 21:32 UTC (permalink / raw)
  To: Christoph Hellwig, Linus Torvalds
  Cc: Theodore Ts'o, Ævar Arnfjörð Bjarmason,
	Junio C Hamano, Git Mailing List, linux-fsdevel

On 01/18/2018 11:27 AM, Christoph Hellwig wrote:
> [adding Chris to the Cc list - this is about the awful ext3 data=ordered
> behavior of syncing the whole file system data and metadata on each
> fsync]
> 
> On Wed, Jan 17, 2018 at 03:57:53PM -0800, Linus Torvalds wrote:
>> On Wed, Jan 17, 2018 at 3:52 PM, Theodore Ts'o <tytso@mit.edu> wrote:
>>>
>>> Well, let's be fair; this is something *ext3* got wrong, and it was
>>> the default file system back then.
>>
>> I'm pretty sure reiserfs and btrfs did too..
> 
> I'm pretty sure btrfs never did, and reiserfs at least looks like
> it currently doesn't but I'd have to dig into history to check if
> it ever did.
> 

Christoph has this right: btrfs only fsyncs the one file that you've
asked for, and unrelated data/metadata won't be included.

We've seen big fsync stalls on ext4 caused by data=ordered, so it's
still possible to trigger this on ext4, but it's much better than ext3.

I do share Ted's concern about the perf impact of the fsyncs on tons of 
loose files, but the defaults should be safety first.

-chris


* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-20 22:14               ` Theodore Ts'o
@ 2018-01-20 22:27                 ` Junio C Hamano
  2018-01-22 15:09                   ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2018-01-20 22:27 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Christoph Hellwig, Linus Torvalds,
	Ævar Arnfjörð Bjarmason, Git Mailing List,
	linux-fsdevel, Chris Mason, Jeff King

Theodore Ts'o <tytso@mit.edu> writes:

> ....  I've never been fond of the "git repack -A" behavior
> where it can generate huge numbers of loose files.  I'd much prefer it
> if the other objects ended up in a separate pack file, and then some
> other provision made for nuking that pack file some time later....

Yes, a "cruft pack" that holds unreachable object has been discussed
a few times recently on list, and I do agree that it is a desirable
thing to have in the longer run.

What's tricky is to devise a way to allow us to salvage objects that
are placed in a cruft pack because they are accessed recently,
proving themselves to be no longer crufts.  It could be that a good
way to resurrect them is to explode them to loose form when they are
accessed out of a cruft pack.  We need to worry about interactions
with read-only users if we go that route, but the current
"explode unreachable to loose, touch their mtime when they are
accessed" scheme ends up ignoring accesses from read-only users that
cannot update mtime, so it might not be too bad.



* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-19 19:08             ` Junio C Hamano
@ 2018-01-20 22:14               ` Theodore Ts'o
  2018-01-20 22:27                 ` Junio C Hamano
  0 siblings, 1 reply; 37+ messages in thread
From: Theodore Ts'o @ 2018-01-20 22:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Christoph Hellwig, Linus Torvalds,
	Ævar Arnfjörð Bjarmason, Git Mailing List,
	linux-fsdevel, Chris Mason

On Fri, Jan 19, 2018 at 11:08:46AM -0800, Junio C Hamano wrote:
> So..., is it fair to say that the one you sent in
> 
>   https://public-inbox.org/git/20180117193510.GA30657@lst.de/
> 
> is the best variant we have seen in this thread so far?  I'll keep
> that in my inbox so that I do not forget, but I think we would want
> to deal with a hotfix for 2.16 on case insensitive platforms before
> this topic.

It's a simplistic fix, but it will work.  There may very well be
certain workloads which generate a large number of loose objects
(e.g., git repack -A) which will make things go significantly more
slowly as a result.  It might very well be the case that if nothing
else is going on, something like "write all the files without
fsync(2), then use syncfs(2)" would be much faster.  The downside with
that approach is that if you were downloading a multi-gigabyte DVD
image at the same time, the syncfs(2) will force a writeback of the
partially written DVD image, or some other unrelated files.

But if the goal is to just change the default, and then see what
shakes out, and then apply other optimizations later, that's certainly
a valid result.  I've never been fond of the "git repack -A" behavior
where it can generate huge numbers of loose files.  I'd much prefer it
if the other objects ended up in a separate pack file, and then some
other provision made for nuking that pack file some time later.  But
that's expanding the scope significantly over what's currently being
discussed.

						- Ted


* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-18 16:27           ` Christoph Hellwig
@ 2018-01-19 19:08             ` Junio C Hamano
  2018-01-20 22:14               ` Theodore Ts'o
  2018-01-21 21:32             ` Chris Mason
  1 sibling, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2018-01-19 19:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, Theodore Ts'o,
	Ævar Arnfjörð Bjarmason, Git Mailing List,
	linux-fsdevel, Chris Mason

Christoph Hellwig <hch@lst.de> writes:

> [adding Chris to the Cc list - this is about the awful ext3 data=ordered
> behavior of syncing the whole file system data and metadata on each
> fsync]
>
> On Wed, Jan 17, 2018 at 03:57:53PM -0800, Linus Torvalds wrote:
>> On Wed, Jan 17, 2018 at 3:52 PM, Theodore Ts'o <tytso@mit.edu> wrote:
>> >
>> > Well, let's be fair; this is something *ext3* got wrong, and it was
>> > the default file system back then.
>> 
>> I'm pretty sure reiserfs and btrfs did too..
>
> I'm pretty sure btrfs never did, and reiserfs at least looks like
> it currently doesn't but I'd have to dig into history to check if
> it ever did.

So..., is it fair to say that the one you sent in

  https://public-inbox.org/git/20180117193510.GA30657@lst.de/

is the best variant we have seen in this thread so far?  I'll keep
that in my inbox so that I do not forget, but I think we would want
to deal with a hotfix for 2.16 on case insensitive platforms before
this topic.

Thanks.



* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 23:57         ` Linus Torvalds
@ 2018-01-18 16:27           ` Christoph Hellwig
  2018-01-19 19:08             ` Junio C Hamano
  2018-01-21 21:32             ` Chris Mason
  0 siblings, 2 replies; 37+ messages in thread
From: Christoph Hellwig @ 2018-01-18 16:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Ts'o, Ævar Arnfjörð Bjarmason,
	Junio C Hamano, Christoph Hellwig, Git Mailing List,
	linux-fsdevel, Chris Mason

[adding Chris to the Cc list - this is about the awful ext3 data=ordered
behavior of syncing the whole file system data and metadata on each
fsync]

On Wed, Jan 17, 2018 at 03:57:53PM -0800, Linus Torvalds wrote:
> On Wed, Jan 17, 2018 at 3:52 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> >
> > Well, let's be fair; this is something *ext3* got wrong, and it was
> > the default file system back then.
> 
> I'm pretty sure reiserfs and btrfs did too..

I'm pretty sure btrfs never did, and reiserfs at least looks like
it currently doesn't but I'd have to dig into history to check if
it ever did.


* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 23:52       ` Theodore Ts'o
@ 2018-01-17 23:57         ` Linus Torvalds
  2018-01-18 16:27           ` Christoph Hellwig
  2020-09-17 11:06         ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2018-01-17 23:57 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Christoph Hellwig, Git Mailing List, linux-fsdevel

On Wed, Jan 17, 2018 at 3:52 PM, Theodore Ts'o <tytso@mit.edu> wrote:
>
> Well, let's be fair; this is something *ext3* got wrong, and it was
> the default file system back then.

I'm pretty sure reiserfs and btrfs did too..

          Linus


* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 22:07     ` Linus Torvalds
  2018-01-17 22:25       ` Linus Torvalds
  2018-01-17 23:16       ` Ævar Arnfjörð Bjarmason
@ 2018-01-17 23:52       ` Theodore Ts'o
  2018-01-17 23:57         ` Linus Torvalds
  2020-09-17 11:06         ` Ævar Arnfjörð Bjarmason
  2 siblings, 2 replies; 37+ messages in thread
From: Theodore Ts'o @ 2018-01-17 23:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Christoph Hellwig, Git Mailing List, linux-fsdevel

On Wed, Jan 17, 2018 at 02:07:22PM -0800, Linus Torvalds wrote:
> 
> Now re-do the test while another process writes to a totally unrelated
> huge file (say, do an ISO file copy or something).
> 
> That was the thing that several filesystems get completely and
> horribly wrong. Generally _particularly_ the logging filesystems that
> don't even need the fsync, because they use a single log for
> everything (so fsync serializes all the writes, not just the writes to
> the one file it's fsync'ing).

Well, let's be fair; this is something *ext3* got wrong, and it was
the default file system back then.  All of the modern file systems now
do delayed allocation, which means that an fsync of one file doesn't
actually imply an fsync of another file.  Hence...

> The original git design was very much to write each object file
> without any syncing, because they don't matter since a new object file
> - by definition - isn't really reachable. Then sync before writing the
> index file or a new ref.

This isn't really safe any more.  Yes, there's a single log.  But
files which are subject to delayed allocation are in the page cache,
and just because you fsync the index file doesn't mean that the object
file is now written to disk.  It was true for ext3, but it's not true
for ext4, xfs, btrfs, etc.

The good news is that if you have another process downloading a huge
ISO image, the fsync of the index file won't force the ISO file to be
written out.  The bad news is that it won't force out the other git
object files, either.

Now, there is a potential downside of fsync'ing each object file, and
that is the cost of doing a CACHE FLUSH on a HDD is non-trivial, and
even on a SSD, it's not optimal to call CACHE FLUSH thousands of times
in a second.  So if you are creating thousands of tiny files, and you
fsync each one, each fsync(2) call is a serializing instruction, which
means it won't return until that one file is written to disk.  If you
are writing lots of small files, and you are using a HDD, you'll be
bottlenecked to around 30 files per second on a 5400 RPM HDD, and this
is true regardless of what file system you use, because the
bottleneck is the CACHE FLUSH operation, and how you organize the metadata
and how you do the block allocation, is largely lost in the noise
compared to the CACHE FLUSH command, which serializes everything.

There are solutions to this; you could simply not call fsync(2) a
thousand times, and instead write a pack file, and call fsync once on
the pack file.  That's probably the smartest approach.
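
(To make the batching argument concrete: the difference is roughly the
sketch below, N fsync calls versus one.  The struct and the error handling
are stand-ins, not git's real pack-writing code:)

    #include <unistd.h>

    struct obj_buf { const void *data; size_t len; };   /* stand-in type */

    /* Stream N objects into one pack-like fd and flush once, instead of
     * paying for a CACHE FLUSH per loose object file. */
    static int write_batch(int fd, const struct obj_buf *obj, size_t n)
    {
            size_t i;

            for (i = 0; i < n; i++)
                    if (write(fd, obj[i].data, obj[i].len) != (ssize_t)obj[i].len)
                            return -1;              /* short write: just bail */
            return fsync(fd);                       /* one flush for the batch */
    }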

You could also create a thousand threads, and call fsync(2) on those
thousand threads at roughly the same time.  Or you could use a
bleeding edge kernel with the latest AIO patch, and use the newly
added IOCB_CMD_FSYNC support.

But I'd simply recommend writing a pack and fsync'ing the pack,
instead of trying to write a gazillion object files.  (git-repack -A,
I'm looking at you....)

					- Ted


* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 23:16       ` Ævar Arnfjörð Bjarmason
@ 2018-01-17 23:42         ` Linus Torvalds
  0 siblings, 0 replies; 37+ messages in thread
From: Linus Torvalds @ 2018-01-17 23:42 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Christoph Hellwig, Git Mailing List,
	linux-fsdevel, Roberto Tyley

On Wed, Jan 17, 2018 at 3:16 PM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> Or does overall FS activity and raw throughput (e.g. with an ISO copy)
> matter more than general FS contention?

Traditionally, yes.

Also note that none of this is about "throughput". It's about waiting
for a second or two when you do a simple "git commit" or similar.

Also, hardware has changed. I haven't used a rotational disk in about
10 years now, and I don't worry about latencies so much any more.

The fact that you get ~100k iops indicates that you probably aren't
using those stupid platters of rust either. So I doubt you can even
trigger the bad cases.

                Linus


* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 22:07     ` Linus Torvalds
  2018-01-17 22:25       ` Linus Torvalds
@ 2018-01-17 23:16       ` Ævar Arnfjörð Bjarmason
  2018-01-17 23:42         ` Linus Torvalds
  2018-01-17 23:52       ` Theodore Ts'o
  2 siblings, 1 reply; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-01-17 23:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Junio C Hamano, Christoph Hellwig, Git Mailing List,
	linux-fsdevel, Roberto Tyley


On Wed, Jan 17 2018, Linus Torvalds jotted:

> On Wed, Jan 17, 2018 at 1:44 PM, Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>>     I ran a small test myself on CentOS 7 (3.10) with ext4 data=ordered
>>     on the tests I thought might do a lot of loose object writes:
>>
>>       $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS="NO_OPENSSL=Y CFLAGS=-O3 -j56" ./run origin/master fsync-on~ fsync-on p3400-rebase.sh p0007-write-cache.sh
>>       [...]
>>       Test                                                            fsync-on~         fsync-on
>>       -------------------------------------------------------------------------------------------------------
>>       3400.2: rebase on top of a lot of unrelated changes             1.45(1.30+0.17)   1.45(1.28+0.20) +0.0%
>>       3400.4: rebase a lot of unrelated changes without split-index   4.34(3.71+0.66)   4.33(3.69+0.66) -0.2%
>>       3400.6: rebase a lot of unrelated changes with split-index      3.38(2.94+0.47)   3.38(2.93+0.47) +0.0%
>>       0007.2: write_locked_index 3 times (3214 files)                 0.01(0.00+0.00)   0.01(0.00+0.00) +0.0%
>>
>>    No impact. However I did my own test of running the test suite 10%
>>    times with/without this patch, and it runs 9% slower:

That should be "10 times" b.t.w., not "10% times"

>>
>>      fsync-off: avg:21.59 21.50 21.50 21.52 21.53 21.54 21.57 21.59 21.61 21.63 21.95
>>       fsync-on: avg:23.43 23.21 23.25 23.26 23.26 23.27 23.32 23.49 23.51 23.83 23.88
>
> That's not the thing you should check.
>
> Now re-do the test while another process writes to a totally unrelated
> huge file (say, do an ISO file copy or something).
>
> That was the thing that several filesystems get completely and
> horribly wrong. Generally _particularly_ the logging filesystems that
> don't even need the fsync, because they use a single log for
> everything (so fsync serializes all the writes, not just the writes to
> the one file it's fsync'ing).
>
> The original git design was very much to write each object file
> without any syncing, because they don't matter since a new object file
> - by definition - isn't really reachable. Then sync before writing the
> index file or a new ref.
>
> But things have changed, I'm not arguing that the code shouldn't be
> made safe by default. I personally refuse to use rotating media on my
> machines anyway, largely exactly because of the fsync() issue (things
> like "firefox" started doing fsync on the mysql database for stupid
> things, and you'd get huge pauses).
>
> But I do think your benchmark is wrong. The case where only git does
> something is not interesting or relevant. It really is "git does
> something _and_ somebody else writes something entirely unrelated at
> the same time" that matters.

Yeah it's shitty, just a quick hack to get some numbers since there was a
discussion of performance, but neither your original patch nor this
thread had quoted any.

One thing you may have missed is that this is a parallel (56 tests at a
time) run of the full test suite. So there are plenty of other git
processes (and test setup/teardown) racing with any given git
process. Running the test suite in a loop like this gives me ~100K IO
ops/s & ~50% disk utilization.

Or does overall FS activity and raw throughput (e.g. with an ISO copy)
matter more than general FS contention?

Tweaking it to emulate this ISO copy case, I ran another test with one
of these loops running concurrently:

    # setup
    dd if=/dev/urandom of=/tmp/fake.iso bs=1024 count=$((1000*1024))
    # run in a loop (shuf to not always write the same thing)
    while sleep 0.1; do shuf /tmp/fake.iso | pv >/tmp/fake.shuf.iso; done

This gives disk utilization that spikes to 100% (not consistently), and:

    fsync-off: avg:36.37 31.74 33.83 35.12 36.19 36.32 37.04 37.34 37.71 37.93 40.43
     fsync-on: avg:38.09 34.56 35.14 35.69 36.41 36.41 37.96 38.25 40.45 41.44 44.59

~4.7% slower, vs. ~8.5% in my earlier
87h8rki2iu.fsf@evledraar.gmail.com without that load running.
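
(Spelling out where those two percentages come from, using the averages
quoted above:)

    (38.09 - 36.37) / 36.37 ~= 0.047  ->  ~4.7% slower with the fake ISO load
    (23.43 - 21.59) / 21.59 ~= 0.085  ->  ~8.5% slower without it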

Which is not an argument for / against this patch, but those numbers
seem significant, and generally if the entire test suite slows down by
that much there's going to be sub-parts of it that are much worse.

Which might be a reason to tread more carefully, and if it *does* slow
things down, perhaps do it with more granularity; e.g. turning it on in
git-receive-pack might be more sensible than in git-filter-branch.
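
To make that concrete, here is a purely hypothetical sketch (no such
helper exists in git, and the command names are only examples) of what a
per-command default could look like:

    #include <string.h>

    /*
     * Hypothetical only: pick a default for fsyncing loose-object writes
     * based on which command is running, instead of one global boolean.
     */
    int want_object_fsync(const char *cmd)
    {
    	if (!strcmp(cmd, "receive-pack"))
    		return 1;	/* storing other people's pushes: stay safe */
    	if (!strcmp(cmd, "filter-branch"))
    		return 0;	/* mass rewrite that can simply be re-run */
    	return 1;		/* otherwise default to the safe behaviour */
    }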

I remember Roberto Tyley's BFG writing an amazing number of loose
objects, but it doesn't seem to have an fsync() option; I wonder if
adding one would be a representative pathological test case.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 22:07     ` Linus Torvalds
@ 2018-01-17 22:25       ` Linus Torvalds
  2018-01-17 23:16       ` Ævar Arnfjörð Bjarmason
  2018-01-17 23:52       ` Theodore Ts'o
  2 siblings, 0 replies; 37+ messages in thread
From: Linus Torvalds @ 2018-01-17 22:25 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Christoph Hellwig, Git Mailing List, linux-fsdevel

On Wed, Jan 17, 2018 at 2:07 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> The original git design was very much to write each object file
> without any syncing, because they don't matter since a new object file
> - by definition - isn't really reachable. Then sync before writing the
> index file or a new ref.

.. actually, I think it originally sync'ed only before starting to
actually remove objects due to repacking.

The theory was that you can lose the last commit series or whatever,
and have to git-fsck, and have to re-do it, but you could never lose
long-term work. If your machine crashes, you still remember what you
did just before the crash.

That theory may have been more correct back in the day than it is
now. People who use git might be less willing to treat it like a
filesystem that you can fsck than I was 10+ years ago.

It's worth noting that the commit that Junio pointed to (from 2008)
didn't actually change any behavior. It just allowed people who cared
to change behavior. The original "let's not waste time on fsync every
object write, because we can just re-create the state anyway" behavior
goes back to 2005.

              Linus

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 21:44   ` Ævar Arnfjörð Bjarmason
@ 2018-01-17 22:07     ` Linus Torvalds
  2018-01-17 22:25       ` Linus Torvalds
                         ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Linus Torvalds @ 2018-01-17 22:07 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Christoph Hellwig, Git Mailing List, linux-fsdevel

On Wed, Jan 17, 2018 at 1:44 PM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>     I ran a small test myself on CentOS 7 (3.10) with ext4 data=ordered
>     on the tests I thought might do a lot of loose object writes:
>
>       $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS="NO_OPENSSL=Y CFLAGS=-O3 -j56" ./run origin/master fsync-on~ fsync-on p3400-rebase.sh p0007-write-cache.sh
>       [...]
>       Test                                                            fsync-on~         fsync-on
>       -------------------------------------------------------------------------------------------------------
>       3400.2: rebase on top of a lot of unrelated changes             1.45(1.30+0.17)   1.45(1.28+0.20) +0.0%
>       3400.4: rebase a lot of unrelated changes without split-index   4.34(3.71+0.66)   4.33(3.69+0.66) -0.2%
>       3400.6: rebase a lot of unrelated changes with split-index      3.38(2.94+0.47)   3.38(2.93+0.47) +0.0%
>       0007.2: write_locked_index 3 times (3214 files)                 0.01(0.00+0.00)   0.01(0.00+0.00) +0.0%
>
>    No impact. However I did my own test of running the test suite 10%
>    times with/without this patch, and it runs 9% slower:
>
>      fsync-off: avg:21.59 21.50 21.50 21.52 21.53 21.54 21.57 21.59 21.61 21.63 21.95
>       fsync-on: avg:23.43 23.21 23.25 23.26 23.26 23.27 23.32 23.49 23.51 23.83 23.88

That's not the thing you should check.

Now re-do the test while another process writes to a totally unrelated
huge file (say, do an ISO file copy or something).

That was the thing that several filesystems got completely and
horribly wrong. Generally _particularly_ the logging filesystems that
don't even need the fsync, because they use a single log for
everything (so fsync serializes all the writes, not just the writes to
the one file it's fsync'ing).

The original git design was very much to write each object file
without any syncing, because they don't matter since a new object file
- by definition - isn't really reachable. Then sync before writing the
index file or a new ref.

But things have changed, I'm not arguing that the code shouldn't be
made safe by default. I personally refuse to use rotating media on my
machines anyway, largely exactly because of the fsync() issue (things
like "firefox" started doing fsync on the mysql database for stupid
things, and you'd get huge pauses).

But I do think your benchmark is wrong. The case where only git does
something is not interesting or relevant. It really is "git does
something _and_ somebody else writes something entirely unrelated at
the same time" that matters.

                  Linus

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 19:04 ` Junio C Hamano
  2018-01-17 19:35   ` Christoph Hellwig
  2018-01-17 19:37   ` Matthew Wilcox
@ 2018-01-17 21:44   ` Ævar Arnfjörð Bjarmason
  2018-01-17 22:07     ` Linus Torvalds
  2 siblings, 1 reply; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-01-17 21:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Christoph Hellwig, git, linux-fsdevel, Linus Torvalds


On Wed, Jan 17 2018, Junio C. Hamano jotted:

> Christoph Hellwig <hch@lst.de> writes:
>
>> fsync is required for data integrity as there is no guarantee that
>> data makes it to disk at any specified time without it.  Even for
>> ext3 with data=ordered mode the file system will only commit all
>> data at some point in time that is not guaranteed.
>
> It comes from this one:
>
> commit aafe9fbaf4f1d1f27a6f6e3eb3e246fff81240ef
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date:   Wed Jun 18 15:18:44 2008 -0700
>
>     Add config option to enable 'fsync()' of object files
>
>     As explained in the documentation[*] this is totally useless on
>     filesystems that do ordered/journalled data writes, but it can be a
>     useful safety feature on filesystems like HFS+ that only journal the
>     metadata, not the actual file contents.
>
>     It defaults to off, although we could presumably in theory some day
>     auto-enable it on a per-filesystem basis.
>
>     [*] Yes, I updated the docs for the thing.  Hell really _has_ frozen
>         over, and the four horsemen are probably just beyond the horizon.
>         EVERYBODY PANIC!
>
>> diff --git a/Documentation/config.txt b/Documentation/config.txt
>> index 0e25b2c92..9a1cec5c8 100644
>> --- a/Documentation/config.txt
>> +++ b/Documentation/config.txt
>> @@ -866,10 +866,8 @@ core.whitespace::
>>  core.fsyncObjectFiles::
>>  	This boolean will enable 'fsync()' when writing object files.
>>  +
>> -This is a total waste of time and effort on a filesystem that orders
>> -data writes properly, but can be useful for filesystems that do not use
>> -journalling (traditional UNIX filesystems) or that only journal metadata
>> -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
>> +This option is enabled by default and ensures actual data integrity
>> +by calling fsync after writing object files.
>
> I am somewhat sympathetic to the desire to flip the default to
> "safe" and allow those who know they are already safe to tweak the
> knob for performance, and it also makes sense to document that the
> default is "true" here.  But I do not see the point of removing the
> four lines from this paragraph; the sole effect of the removal is to
> rob information from readers that they can use to decide if they
> want to disable the configuration, no?

[CC'd the author of the current behavior]

Some points/questions:

 a) Is there some reliable way to test whether this is needed from
    userspace? I'm thinking something like `git update-index
    --test-untracked-cache` but for fsync().

 b) On the filesystems that don't need this, what's the performance
    impact?

    I ran a small test myself on CentOS 7 (3.10) with ext4 data=ordered
    on the tests I thought might do a lot of loose object writes:

      $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS="NO_OPENSSL=Y CFLAGS=-O3 -j56" ./run origin/master fsync-on~ fsync-on p3400-rebase.sh p0007-write-cache.sh
      [...]
      Test                                                            fsync-on~         fsync-on
      -------------------------------------------------------------------------------------------------------
      3400.2: rebase on top of a lot of unrelated changes             1.45(1.30+0.17)   1.45(1.28+0.20) +0.0%
      3400.4: rebase a lot of unrelated changes without split-index   4.34(3.71+0.66)   4.33(3.69+0.66) -0.2%
      3400.6: rebase a lot of unrelated changes with split-index      3.38(2.94+0.47)   3.38(2.93+0.47) +0.0%
      0007.2: write_locked_index 3 times (3214 files)                 0.01(0.00+0.00)   0.01(0.00+0.00) +0.0%

   No impact. However I did my own test of running the test suite 10%
   times with/without this patch, and it runs 9% slower:

     fsync-off: avg:21.59 21.50 21.50 21.52 21.53 21.54 21.57 21.59 21.61 21.63 21.95
      fsync-on: avg:23.43 23.21 23.25 23.26 23.26 23.27 23.32 23.49 23.51 23.83 23.88

   Test script at the end of this E-Mail.

 c) What sort of guarantees in this regard do NFS-mounted filesystems
    commonly make?

Test script:

use v5.10.0;
use strict;
use warnings;
use Time::HiRes qw(time);
use List::Util qw(sum);
use Data::Dumper;

my %time;
for my $ref (@ARGV) {
    system "git checkout $ref";
    system qq[make -j56 CFLAGS="-O3 -g" NO_OPENSSL=Y all];
    for (1..10) {
        my $t0 = -time();
        system "(cd t && NO_SVN_TESTS=1 GIT_TEST_HTTPD=0 prove -j56 --state=slow,save t[0-9]*.sh)";
        $t0 += time();
        push @{$time{$ref}} => $t0;
    }
}
for my $ref (sort keys %time) {
    printf "%20s: avg:%.2f %s\n",
        $ref,
        sum(@{$time{$ref}})/@{$time{$ref}},
        join(" ", map { sprintf "%.02f", $_ } sort { $a <=> $b } @{$time{$ref}});
}

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 20:55 ` Jeff King
@ 2018-01-17 21:10   ` Christoph Hellwig
  0 siblings, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2018-01-17 21:10 UTC (permalink / raw)
  To: Jeff King; +Cc: Christoph Hellwig, git, linux-fsdevel

On Wed, Jan 17, 2018 at 03:55:09PM -0500, Jeff King wrote:
> I'm definitely sympathetic, and I've contemplated a patch like this a
> few times. But I'm not sure we're "safe by default" here after this
> patch. In particular:
> 
>   1. This covers only loose objects. We generally sync pack writes
>      already, so we're covered there. But we do not sync ref updates at
>      all, which we'd probably want to in a default-safe setup (a common
>      post-crash symptom I've seen is zero-length ref files).

I've not seen them myself yet, but yes, they need an fsync.

>   2. Is it sufficient to fsync() the individual file's descriptors?
>      We often do other filesystem operations (like hardlinking or
>      renaming) that also need to be committed to disk before an
>      operation can be considered saved.

No, for metadata operations we need to fsync the directory as well.

>   3. Related to (2), we often care about the order of metadata commits.
>      E.g., a common sequence is:
> 
>        a. Write object contents to tempfile.
> 
>        b. rename() or hardlink tempfile to final name.
> 
>        c. Write object name into ref.lock file.
> 
>        d. rename() ref.lock to ref
> 
>      If we see (d) but not (b), then the result is a corrupted
>      repository. Is this guaranteed by ext4 journaling with
>      data=ordered?

It is not generally guaranteed by Linux file system semantics.  Various
file systems will actually start writeback of file data before rename,
but not actually wait on it.
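
A minimal sketch of that sequence with plain POSIX calls (this is not
code from git): flushing the file's data is not enough; the directory
entry created by the rename has to be made durable as well, which means
fsyncing the directory itself:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Write-to-tempfile-and-publish, done durably.  Sketch only. */
    int publish_file(int tmpfd, const char *tmppath,
    		 const char *finalpath, const char *dirpath)
    {
    	int dirfd, ret;

    	if (fsync(tmpfd) < 0 || close(tmpfd) < 0)
    		return -1;		/* flush the file contents */
    	if (rename(tmppath, finalpath) < 0)
    		return -1;		/* the metadata operation... */
    	dirfd = open(dirpath, O_RDONLY | O_DIRECTORY);
    	if (dirfd < 0)
    		return -1;
    	ret = fsync(dirfd);		/* ...needs the directory flushed too */
    	close(dirfd);
    	return ret;
    }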

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 18:48 [PATCH] enable " Christoph Hellwig
  2018-01-17 19:04 ` Junio C Hamano
@ 2018-01-17 20:55 ` Jeff King
  2018-01-17 21:10   ` Christoph Hellwig
  1 sibling, 1 reply; 37+ messages in thread
From: Jeff King @ 2018-01-17 20:55 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: git, linux-fsdevel

On Wed, Jan 17, 2018 at 07:48:28PM +0100, Christoph Hellwig wrote:

> fsync is required for data integrity as there is no guarantee that
> data makes it to disk at any specified time without it.  Even for
> ext3 with data=ordered mode the file system will only commit all
> data at some point in time that is not guaranteed.
> 
> I've lost data on development machines with various times countless
> times due to the lack of this option, and now lost trees on a
> git server with ext4 as well yesterday.  It's time to make git
> safe by default.

I'm definitely sympathetic, and I've contemplated a patch like this a
few times. But I'm not sure we're "safe by default" here after this
patch. In particular:

  1. This covers only loose objects. We generally sync pack writes
     already, so we're covered there. But we do not sync ref updates at
     all, which we'd probably want to in a default-safe setup (a common
     post-crash symptom I've seen is zero-length ref files).

  2. Is it sufficient to fsync() the individual file's descriptors?
     We often do other filesystem operations (like hardlinking or
     renaming) that also need to be committed to disk before an
     operation can be considered saved.

  3. Related to (2), we often care about the order of metadata commits.
     E.g., a common sequence is:

       a. Write object contents to tempfile.

       b. rename() or hardlink tempfile to final name.

       c. Write object name into ref.lock file.

       d. rename() ref.lock to ref

     If we see (d) but not (b), then the result is a corrupted
     repository. Is this guaranteed by ext4 journaling with
     data=ordered?

It may be that data=ordered gets us what we need for (2) and (3). But I
think at the very least we should consider fsyncing ref updates based on
a config option, too.
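
For (3), the property we want is roughly "never let (d) reach the disk
before (b)".  A sketch of that ordering with plain POSIX calls (this is
not git code; the helper and the path arguments are made up for
illustration):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Open a path read-only, fsync it, close it.  Works for directories too. */
    static int flush(const char *path, int extra_flags)
    {
    	int fd = open(path, O_RDONLY | extra_flags);
    	int ret;

    	if (fd < 0)
    		return -1;
    	ret = fsync(fd);
    	close(fd);
    	return ret;
    }

    /*
     * Steps (a)-(d) with the sync points needed so that a crash can leave
     * an unreachable object behind, but never a ref that points at a
     * missing object.
     */
    int update_ref_safely(const char *obj_path, const char *obj_dir,
    		      const char *lock_path, const char *ref_path,
    		      const char *ref_dir)
    {
    	/* (a)+(b) already happened: obj_path holds the new object.
    	 * Make it and its directory entry durable before touching the ref. */
    	if (flush(obj_path, 0) < 0 || flush(obj_dir, O_DIRECTORY) < 0)
    		return -1;
    	/* (c): lock_path already contains the new object name; flush it. */
    	if (flush(lock_path, 0) < 0)
    		return -1;
    	/* (d): publish the ref, then flush the refs directory entry. */
    	if (rename(lock_path, ref_path) < 0)
    		return -1;
    	return flush(ref_dir, O_DIRECTORY);
    }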

-Peff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 19:35   ` Christoph Hellwig
@ 2018-01-17 20:05     ` Andreas Schwab
  0 siblings, 0 replies; 37+ messages in thread
From: Andreas Schwab @ 2018-01-17 20:05 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Junio C Hamano, git, linux-fsdevel

On Jan 17 2018, Christoph Hellwig <hch@lst.de> wrote:

> I've lost data on development machines with various times countless
> times due to the lack of this option, and now lost trees on a

Too many times. :-)

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 19:37   ` Matthew Wilcox
@ 2018-01-17 19:42     ` Christoph Hellwig
  0 siblings, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2018-01-17 19:42 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Junio C Hamano, Christoph Hellwig, git, linux-fsdevel

On Wed, Jan 17, 2018 at 11:37:31AM -0800, Matthew Wilcox wrote:
> How about this instead?
> 
> This option is enabled by default and ensures data integrity by calling
> fsync after writing object files.  It is not necessary on filesystems
> which journal data writes, but is still necessary on filesystems which
> do not use journalling (ext2), or that only journal metadata writes
> (OS X's HFS+, or Linux's ext4 with "data=writeback").  Turning this
> option off will increase performance at the possible risk of data loss.

I think this goes entirely in the wrong direction.  The point is that
fsync is always the right thing to do.  But on ext3 (and ext3 only)
the right thing is way too painful, and conveniently ext3 happens
to be almost OK without it.  So if anything should get a special
mention it is ext3.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 19:04 ` Junio C Hamano
  2018-01-17 19:35   ` Christoph Hellwig
@ 2018-01-17 19:37   ` Matthew Wilcox
  2018-01-17 19:42     ` Christoph Hellwig
  2018-01-17 21:44   ` Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 37+ messages in thread
From: Matthew Wilcox @ 2018-01-17 19:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Christoph Hellwig, git, linux-fsdevel

On Wed, Jan 17, 2018 at 11:04:32AM -0800, Junio C Hamano wrote:
> Christoph Hellwig <hch@lst.de> writes:
> > @@ -866,10 +866,8 @@ core.whitespace::
> >  core.fsyncObjectFiles::
> >  	This boolean will enable 'fsync()' when writing object files.
> >  +
> > -This is a total waste of time and effort on a filesystem that orders
> > -data writes properly, but can be useful for filesystems that do not use
> > -journalling (traditional UNIX filesystems) or that only journal metadata
> > -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
> > +This option is enabled by default and ensures actual data integrity
> > +by calling fsync after writing object files.
> 
> I am somewhat sympathetic to the desire to flip the default to
> "safe" and allow those who know they are already safe to tweak the
> knob for performance, and it also makes sense to document that the
> default is "true" here.  But I do not see the point of removing the
> four lines from this paragraph; the sole effect of the removal is to
> rob information from readers that they can use to decide if they
> want to disable the configuration, no?

How about this instead?

This option is enabled by default and ensures data integrity by calling
fsync after writing object files.  It is not necessary on filesystems
which journal data writes, but is still necessary on filesystems which
do not use journalling (ext2), or that only journal metadata writes
(OS X's HFS+, or Linux's ext4 with "data=writeback").  Turning this
option off will increase performance at the possible risk of data loss.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 19:04 ` Junio C Hamano
@ 2018-01-17 19:35   ` Christoph Hellwig
  2018-01-17 20:05     ` Andreas Schwab
  2018-01-17 19:37   ` Matthew Wilcox
  2018-01-17 21:44   ` Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2018-01-17 19:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Christoph Hellwig, git, linux-fsdevel

On Wed, Jan 17, 2018 at 11:04:32AM -0800, Junio C Hamano wrote:
> I am somewhat sympathetic to the desire to flip the default to
> "safe" and allow those who know they are already safe to tweak the
> knob for performance, and it also makes sense to document that the
> default is "true" here.  But I do not see the point of removing the
> four lines from this paragraph; the sole effect of the removal is to
> rob information from readers that they can use to decide if they
> want to disable the configuration, no?

Does this sound any better?  It is a little too technical for my
taste, but I couldn't come up with anything better:

---
From ab8f2d38dfe40e74de5399af0d069427c7473b76 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Wed, 17 Jan 2018 19:42:46 +0100
Subject: enable core.fsyncObjectFiles by default

fsync is required for data integrity as there is no guarantee that
data makes it to disk at any specified time without it.  Even for
ext3 with data=ordered mode the file system will only commit all
data at some point in time that is not guaranteed.

I've lost data on development machines with various times countless
times due to the lack of this option, and now lost trees on a
git server with ext4 as well yesterday.  It's time to make git
safe by default.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 Documentation/config.txt | 11 +++++++----
 environment.c            |  2 +-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0e25b2c92..8b99f1389 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -866,10 +866,13 @@ core.whitespace::
 core.fsyncObjectFiles::
 	This boolean will enable 'fsync()' when writing object files.
 +
-This is a total waste of time and effort on a filesystem that orders
-data writes properly, but can be useful for filesystems that do not use
-journalling (traditional UNIX filesystems) or that only journal metadata
-and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
+This option is enabled by default and ensures actual data integrity
+by calling fsync after writing object files.
+
+Note that this might be really slow on ext3 in the traditional
+data=ordered mode, in which case you might want to disable this option.
+ext3 in data=ordered mode will order the actual data writeout with
+metadata operation, but not actually guarantee data integrity either.
 
 core.preloadIndex::
 	Enable parallel index preload for operations like 'git diff'
diff --git a/environment.c b/environment.c
index 63ac38a46..c74375b5e 100644
--- a/environment.c
+++ b/environment.c
@@ -36,7 +36,7 @@ const char *git_hooks_path;
 int zlib_compression_level = Z_BEST_SPEED;
 int core_compression_level;
 int pack_compression_level = Z_DEFAULT_COMPRESSION;
-int fsync_object_files;
+int fsync_object_files = 1;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
-- 
2.14.2




^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH] enable core.fsyncObjectFiles by default
  2018-01-17 18:48 [PATCH] enable " Christoph Hellwig
@ 2018-01-17 19:04 ` Junio C Hamano
  2018-01-17 19:35   ` Christoph Hellwig
                     ` (2 more replies)
  2018-01-17 20:55 ` Jeff King
  1 sibling, 3 replies; 37+ messages in thread
From: Junio C Hamano @ 2018-01-17 19:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: git, linux-fsdevel

Christoph Hellwig <hch@lst.de> writes:

> fsync is required for data integrity as there is no guarantee that
> data makes it to disk at any specified time without it.  Even for
> ext3 with data=ordered mode the file system will only commit all
> data at some point in time that is not guaranteed.

It comes from this one:

commit aafe9fbaf4f1d1f27a6f6e3eb3e246fff81240ef
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Jun 18 15:18:44 2008 -0700

    Add config option to enable 'fsync()' of object files
    
    As explained in the documentation[*] this is totally useless on
    filesystems that do ordered/journalled data writes, but it can be a
    useful safety feature on filesystems like HFS+ that only journal the
    metadata, not the actual file contents.
    
    It defaults to off, although we could presumably in theory some day
    auto-enable it on a per-filesystem basis.
    
    [*] Yes, I updated the docs for the thing.  Hell really _has_ frozen
        over, and the four horsemen are probably just beyond the horizon.
        EVERYBODY PANIC!
    
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 0e25b2c92..9a1cec5c8 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -866,10 +866,8 @@ core.whitespace::
>  core.fsyncObjectFiles::
>  	This boolean will enable 'fsync()' when writing object files.
>  +
> -This is a total waste of time and effort on a filesystem that orders
> -data writes properly, but can be useful for filesystems that do not use
> -journalling (traditional UNIX filesystems) or that only journal metadata
> -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
> +This option is enabled by default and ensures actual data integrity
> +by calling fsync after writing object files.

I am somewhat sympathetic to the desire to flip the default to
"safe" and allow those who know they are already safe to tweak the
knob for performance, and it also makes sense to document that the
default is "true" here.  But I do not see the point of removing the
four lines from this paragraph; the sole effect of the removal is to
rob information from readers that they can use to decide if they
want to disable the configuration, no?
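
For what it's worth, whatever the compiled-in default ends up being, an
explicit setting always overrides it; the handling in config.c looks
roughly like the stand-alone sketch below (paraphrased, with a trivial
stand-in for git_config_bool() so it compiles on its own; the code in
the tree may differ in detail):

    #include <stdio.h>
    #include <string.h>

    static int fsync_object_files = 1;	/* the compiled-in default the patch flips */

    /* Roughly how the config parser wires the variable up. */
    static void handle_core_config(const char *var, const char *value)
    {
    	if (!strcmp(var, "core.fsyncobjectfiles"))
    		fsync_object_files = strcmp(value, "false") != 0;
    }

    int main(void)
    {
    	handle_core_config("core.fsyncobjectfiles", "false");
    	printf("%d\n", fsync_object_files);	/* 0: the user's override wins */
    	return 0;
    }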

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH] enable core.fsyncObjectFiles by default
@ 2018-01-17 18:48 Christoph Hellwig
  2018-01-17 19:04 ` Junio C Hamano
  2018-01-17 20:55 ` Jeff King
  0 siblings, 2 replies; 37+ messages in thread
From: Christoph Hellwig @ 2018-01-17 18:48 UTC (permalink / raw)
  To: git; +Cc: linux-fsdevel

fsync is required for data integrity as there is no guarantee that
data makes it to disk at any specified time without it.  Even for
ext3 with data=ordered mode the file system will only commit all
data at some point in time that is not guaranteed.

I've lost data on development machines with various times countless
times due to the lack of this option, and now lost trees on a
git server with ext4 as well yesterday.  It's time to make git
safe by default.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 Documentation/config.txt | 6 ++----
 environment.c            | 2 +-
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0e25b2c92..9a1cec5c8 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -866,10 +866,8 @@ core.whitespace::
 core.fsyncObjectFiles::
 	This boolean will enable 'fsync()' when writing object files.
 +
-This is a total waste of time and effort on a filesystem that orders
-data writes properly, but can be useful for filesystems that do not use
-journalling (traditional UNIX filesystems) or that only journal metadata
-and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
+This option is enabled by default and ensures actual data integrity
+by calling fsync after writing object files.
 
 core.preloadIndex::
 	Enable parallel index preload for operations like 'git diff'
diff --git a/environment.c b/environment.c
index 63ac38a46..c74375b5e 100644
--- a/environment.c
+++ b/environment.c
@@ -36,7 +36,7 @@ const char *git_hooks_path;
 int zlib_compression_level = Z_BEST_SPEED;
 int core_compression_level;
 int pack_compression_level = Z_DEFAULT_COMPRESSION;
-int fsync_object_files;
+int fsync_object_files = 1;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

Thread overview: 37+ messages
2015-06-23 21:57 [PATCH] Enable core.fsyncObjectFiles by default Stefan Beller
2015-06-23 22:21 ` Junio C Hamano
2015-06-23 23:29   ` Theodore Ts'o
2015-06-24  5:32     ` Junio C Hamano
2015-06-24 14:30       ` Theodore Ts'o
2015-06-24  1:07 ` Duy Nguyen
2015-06-24  3:37 ` Jeff King
2015-06-24  5:20   ` Junio C Hamano
2018-01-17 18:48 [PATCH] enable " Christoph Hellwig
2018-01-17 19:04 ` Junio C Hamano
2018-01-17 19:35   ` Christoph Hellwig
2018-01-17 20:05     ` Andreas Schwab
2018-01-17 19:37   ` Matthew Wilcox
2018-01-17 19:42     ` Christoph Hellwig
2018-01-17 21:44   ` Ævar Arnfjörð Bjarmason
2018-01-17 22:07     ` Linus Torvalds
2018-01-17 22:25       ` Linus Torvalds
2018-01-17 23:16       ` Ævar Arnfjörð Bjarmason
2018-01-17 23:42         ` Linus Torvalds
2018-01-17 23:52       ` Theodore Ts'o
2018-01-17 23:57         ` Linus Torvalds
2018-01-18 16:27           ` Christoph Hellwig
2018-01-19 19:08             ` Junio C Hamano
2018-01-20 22:14               ` Theodore Ts'o
2018-01-20 22:27                 ` Junio C Hamano
2018-01-22 15:09                   ` Ævar Arnfjörð Bjarmason
2018-01-22 18:09                     ` Theodore Ts'o
2018-01-23  0:47                       ` Jeff King
2018-01-23  5:45                         ` Theodore Ts'o
2018-01-23 16:17                           ` Jeff King
2018-01-23  0:25                     ` Jeff King
2018-01-21 21:32             ` Chris Mason
2020-09-17 11:06         ` Ævar Arnfjörð Bjarmason
2020-09-17 14:14           ` Christoph Hellwig
2020-09-17 15:30           ` Junio C Hamano
2018-01-17 20:55 ` Jeff King
2018-01-17 21:10   ` Christoph Hellwig
