* [PATCH] Enable core.fsyncObjectFiles by default @ 2015-06-23 21:57 Stefan Beller 2015-06-23 22:21 ` Junio C Hamano ` (2 more replies) 0 siblings, 3 replies; 37+ messages in thread From: Stefan Beller @ 2015-06-23 21:57 UTC (permalink / raw) To: git; +Cc: gitster, peff, torvalds, Stefan Beller Linus Torvalds started a discussion[1] if we want to play rather safe than use defaults which make sense only for the most power users of Git: > So git is "safe" in the sense that you won't really lose any data, > but you may well be inconvenienced. The "fsync each object" config > option is there in case you don't want that inconvenience, but it > should be noted that it can make for a hell of a performance impact. > Of course, it might well be the case that the actual default > might be worth turning around. Most git users probably don't > care about that kind of "apply two hundred patches from Andrew > Morton" kind of workload, although "rebase a big patch-series" > does end up doing basically the same thing, and might be more > common. This patch enables fsync_object_files by default. [1] https://plus.google.com/u/1/+JonathanCorbet/posts/JBxiKPe3VXa Signed-off-by: Stefan Beller <sbeller@google.com> --- Documentation/config.txt | 8 ++++---- environment.c | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 43bb53c..dce2640 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -693,10 +693,10 @@ core.whitespace:: core.fsyncObjectFiles:: This boolean will enable 'fsync()' when writing object files. + -This is a total waste of time and effort on a filesystem that orders -data writes properly, but can be useful for filesystems that do not use -journalling (traditional UNIX filesystems) or that only journal metadata -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback"). 
+This ensures objects are written to disk instead of relying on the
+operating systems cache and eventual write. Disabling this option will
+yield performance with a trade off in safety for repository corruption
+during power loss.

 core.preloadIndex::
 	Enable parallel index preload for operations like 'git diff'

diff --git a/environment.c b/environment.c
index 61c685b..b406f5e 100644
--- a/environment.c
+++ b/environment.c
@@ -35,7 +35,7 @@ const char *git_attributes_file;
 int zlib_compression_level = Z_BEST_SPEED;
 int core_compression_level;
 int core_compression_seen;
-int fsync_object_files;
+int fsync_object_files = 1;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
--
2.4.1.345.gab207b6.dirty

^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH] Enable core.fsyncObjectFiles by default 2015-06-23 21:57 [PATCH] Enable core.fsyncObjectFiles by default Stefan Beller @ 2015-06-23 22:21 ` Junio C Hamano 2015-06-23 23:29 ` Theodore Ts'o 2015-06-24 1:07 ` Duy Nguyen 2015-06-24 3:37 ` Jeff King 2 siblings, 1 reply; 37+ messages in thread From: Junio C Hamano @ 2015-06-23 22:21 UTC (permalink / raw) To: Stefan Beller; +Cc: git, peff, torvalds Stefan Beller <sbeller@google.com> writes: > Linus Torvalds started a discussion[1] if we want to play rather safe > than use defaults which make sense only for the most power users of Git: > >> So git is "safe" in the sense that you won't really lose any data, >> but you may well be inconvenienced. The "fsync each object" config >> option is there in case you don't want that inconvenience, but it >> should be noted that it can make for a hell of a performance impact. > >> Of course, it might well be the case that the actual default >> might be worth turning around. Most git users probably don't >> care about that kind of "apply two hundred patches from Andrew >> Morton" kind of workload, although "rebase a big patch-series" >> does end up doing basically the same thing, and might be more >> common. > > This patch enables fsync_object_files by default. Sorry, but I fail to see which part of what Linus said (which is the only thing you quoted from the discussion) argues for this patch. If anything, I read that as cautioning people against making a tradeoff based on an incorrect perception of risks and blindly flipping this bit ON (the original discussion a few days ago, where Ted says he has this bit ON while clarifying that he does so on SSD, is also a sensible description on how he made his trade-off). It's a different matter whom you would want to align with when assessing your own risk tolerance. If you quoted Corbet's original message, then that would have been more consistent. 
> > [1] https://plus.google.com/u/1/+JonathanCorbet/posts/JBxiKPe3VXa > > Signed-off-by: Stefan Beller <sbeller@google.com> > --- > Documentation/config.txt | 8 ++++---- > environment.c | 2 +- > 2 files changed, 5 insertions(+), 5 deletions(-) > > diff --git a/Documentation/config.txt b/Documentation/config.txt > index 43bb53c..dce2640 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -693,10 +693,10 @@ core.whitespace:: > core.fsyncObjectFiles:: > This boolean will enable 'fsync()' when writing object files. > + > -This is a total waste of time and effort on a filesystem that orders > -data writes properly, but can be useful for filesystems that do not use > -journalling (traditional UNIX filesystems) or that only journal metadata > -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback"). > +This ensures objects are written to disk instead of relying on the > +operating systems cache and eventual write. Disabling this option will > +yield performance with a trade off in safety for repository corruption > +during power loss. > > core.preloadIndex:: > Enable parallel index preload for operations like 'git diff' > diff --git a/environment.c b/environment.c > index 61c685b..b406f5e 100644 > --- a/environment.c > +++ b/environment.c > @@ -35,7 +35,7 @@ const char *git_attributes_file; > int zlib_compression_level = Z_BEST_SPEED; > int core_compression_level; > int core_compression_seen; > -int fsync_object_files; > +int fsync_object_files = 1; > size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE; > size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT; > size_t delta_base_cache_limit = 96 * 1024 * 1024; ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Enable core.fsyncObjectFiles by default 2015-06-23 22:21 ` Junio C Hamano @ 2015-06-23 23:29 ` Theodore Ts'o 2015-06-24 5:32 ` Junio C Hamano 0 siblings, 1 reply; 37+ messages in thread From: Theodore Ts'o @ 2015-06-23 23:29 UTC (permalink / raw) To: Junio C Hamano; +Cc: Stefan Beller, git, peff, torvalds The main issue is that non-expert users might not realize that they really need to run "git fsck" after a crash; otherwise, what might happen is that although git is only appending, that you might have some zero-length (or partially written) git object or pack files, and while you might not notice at that moment, it might come and bite you later. If you do try to recover immediately after a crash, in the worst case you might have to do that "git am -s /tmp/mbox-filled-with patches" command, but otherwise you won't lose much data. So perhaps one alternative strategy to calling fsync(2) after every single git object file write might be to have git create a zero-length ".git/in-progress-<pid>" file, which gets fsync'ed, and then it can do the "git am -s /tmp/akpm-big-bag-o-patches" processing nice and fast, and once git is done, then we call call sync(2) and then delete the in-progress file. If there is an in-progress file in the .git directory, git would then automatically run git fsck to make sure the repository is consistent. For people who care, maybe that's a good compromise. (Me, the way things are right now is just fine since I have a nice fast SSD, and so setting fsyncObjectfiles is a perfectly fine thing as far as I am concerned. :-) - Ted ^ permalink raw reply [flat|nested] 37+ messages in thread
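Ted's compromise is concrete enough to sketch. The following Python sketch is illustrative only, not anything in git itself; the helper names (`durable_touch`, `guarded_bulk_write`, `needs_fsck`) and the marker filename are invented for this example. The idea: fsync one small marker file up front, do the bulk object writes without per-file fsync, issue a single global `sync(2)`, then remove the marker; a leftover marker after a crash signals that `git fsck` should be run.

```python
import os

def durable_touch(path):
    # Create an empty marker file and fsync both it and its directory,
    # so the marker itself is guaranteed to survive a crash.
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    os.fsync(fd)
    os.close(fd)
    dfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    os.fsync(dfd)
    os.close(dfd)

def guarded_bulk_write(git_dir, work):
    # Bracket many fast, un-fsynced writes (e.g. applying a big patch
    # series) with a crash marker, as in the scheme described above.
    marker = os.path.join(git_dir, "in-progress-%d" % os.getpid())
    durable_touch(marker)
    work()       # bulk object writes, no per-file fsync
    os.sync()    # one global flush instead of one fsync per object
    os.remove(marker)

def needs_fsck(git_dir):
    # After a crash, any leftover in-progress-* marker means the
    # repository should be checked with 'git fsck'.
    return any(name.startswith("in-progress-")
               for name in os.listdir(git_dir))
```

Note that `os.sync()` (like `sync(2)`) flushes dirty data for the whole system, so this trades many small fsyncs for one large flush at the end of the batch.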
* Re: [PATCH] Enable core.fsyncObjectFiles by default 2015-06-23 23:29 ` Theodore Ts'o @ 2015-06-24 5:32 ` Junio C Hamano 2015-06-24 14:30 ` Theodore Ts'o 0 siblings, 1 reply; 37+ messages in thread From: Junio C Hamano @ 2015-06-24 5:32 UTC (permalink / raw) To: Theodore Ts'o; +Cc: Stefan Beller, git, peff, torvalds Theodore Ts'o <tytso@mit.edu> writes: > The main issue is that non-expert users might not realize that they > really need to run "git fsck" after a crash; otherwise, what might > happen is that although git is only appending, that you might have > some zero-length (or partially written) git object or pack files, and > while you might not notice at that moment, it might come and bite you > later. Regarding loose object files, given that we write to a temporary, optionally fsync, close and then move to the final name, would we still see partially written file if we omit the fsync, or would the corruption be limited to either empty or missing? The reason I am wondering is because the codepath to create an object (i.e. "update-index --add", "hash-object -w", or "add") first checks if a packed or a loose object file _exists_ and if so bypasses writing the same thing anew, but the existence check for a loose object is to merely making sure that access(F_OK) (and optionally utime()) succeeds. If the potential breakage is limited to truncation to empty, then we could replace it with stat(2) and st.st_size check, as no loose object file can be empty. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Enable core.fsyncObjectFiles by default 2015-06-24 5:32 ` Junio C Hamano @ 2015-06-24 14:30 ` Theodore Ts'o 0 siblings, 0 replies; 37+ messages in thread From: Theodore Ts'o @ 2015-06-24 14:30 UTC (permalink / raw) To: Junio C Hamano; +Cc: Stefan Beller, git, peff, torvalds On Tue, Jun 23, 2015 at 10:32:08PM -0700, Junio C Hamano wrote: > > Regarding loose object files, given that we write to a temporary, > optionally fsync, close and then move to the final name, would we > still see partially written file if we omit the fsync, or would the > corruption be limited to either empty or missing? *Most* of the time the corruption will be an empty or missing file. It's possible that the file could be partially written. This is a relatively low-probability event, with the probability going up if the object file is large, and/or if the system is under memory pressure. > The reason I am wondering is because the codepath to create an > object (i.e. "update-index --add", "hash-object -w", or "add") first > checks if a packed or a loose object file _exists_ and if so > bypasses writing the same thing anew, but the existence check for a > loose object is to merely making sure that access(F_OK) (and > optionally utime()) succeeds. If the potential breakage is limited > to truncation to empty, then we could replace it with stat(2) and > st.st_size check, as no loose object file can be empty. It would certainly be a good thing to do a st_size check; it can't possible hurt, and it will catch a large number of failures after a power failure. I could also imagine some hueristics that force an fsync if the object file is larger than a certain size (say, 4k if you are very paranoid, a few hundred kilobytes if you are less so), but past a certain point, it might be better just to tell the user to use fsyncObjectFiles and be done with it. - Ted ^ permalink raw reply [flat|nested] 37+ messages in thread
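The `st_size` check Junio proposes and Ted endorses is simple to express. A minimal sketch (the function name is invented for illustration; git's actual check lives in its object-lookup code paths): replace the bare `access(F_OK)` existence test with `stat(2)` plus a size check, since no valid loose object file can be zero-length.

```python
import os

def loose_object_exists(path):
    # Existence check for a loose object file that also rejects the
    # zero-length files a crash can leave behind. Even an empty blob
    # compresses to a non-empty file with a header, so st_size == 0
    # always indicates a truncated/corrupt object.
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return False
    return st.st_size > 0
```

As Ted notes, this catches the common post-power-failure case (empty files) but not the rarer partially written file, which only a content check or `git fsck` would detect.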
* Re: [PATCH] Enable core.fsyncObjectFiles by default 2015-06-23 21:57 [PATCH] Enable core.fsyncObjectFiles by default Stefan Beller 2015-06-23 22:21 ` Junio C Hamano @ 2015-06-24 1:07 ` Duy Nguyen 2015-06-24 3:37 ` Jeff King 2 siblings, 0 replies; 37+ messages in thread From: Duy Nguyen @ 2015-06-24 1:07 UTC (permalink / raw) To: Stefan Beller; +Cc: Git Mailing List, Junio C Hamano, Jeff King, Linus Torvalds On Wed, Jun 24, 2015 at 4:57 AM, Stefan Beller <sbeller@google.com> wrote: > Linus Torvalds started a discussion[1] if we want to play rather safe > than use defaults which make sense only for the most power users of Git: > >> So git is "safe" in the sense that you won't really lose any data, >> but you may well be inconvenienced. The "fsync each object" config >> option is there in case you don't want that inconvenience, but it >> should be noted that it can make for a hell of a performance impact. > >> Of course, it might well be the case that the actual default >> might be worth turning around. Most git users probably don't >> care about that kind of "apply two hundred patches from Andrew >> Morton" kind of workload, although "rebase a big patch-series" >> does end up doing basically the same thing, and might be more >> common. > > This patch enables fsync_object_files by default. Will this make nfs performance a lot worse or still within acceptable range? -- Duy ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Enable core.fsyncObjectFiles by default 2015-06-23 21:57 [PATCH] Enable core.fsyncObjectFiles by default Stefan Beller 2015-06-23 22:21 ` Junio C Hamano 2015-06-24 1:07 ` Duy Nguyen @ 2015-06-24 3:37 ` Jeff King 2015-06-24 5:20 ` Junio C Hamano 2 siblings, 1 reply; 37+ messages in thread From: Jeff King @ 2015-06-24 3:37 UTC (permalink / raw) To: Stefan Beller; +Cc: Theodore Ts'o, git, gitster, torvalds On Tue, Jun 23, 2015 at 02:57:23PM -0700, Stefan Beller wrote: > Linus Torvalds started a discussion[1] if we want to play rather safe > than use defaults which make sense only for the most power users of Git: > > > So git is "safe" in the sense that you won't really lose any data, > > but you may well be inconvenienced. The "fsync each object" config > > option is there in case you don't want that inconvenience, but it > > should be noted that it can make for a hell of a performance impact. > > > Of course, it might well be the case that the actual default > > might be worth turning around. Most git users probably don't > > care about that kind of "apply two hundred patches from Andrew > > Morton" kind of workload, although "rebase a big patch-series" > > does end up doing basically the same thing, and might be more > > common. > > This patch enables fsync_object_files by default. If you are looking for safety out of the box, I think this falls far short, as we do not fsync all of the other files. For instance, we do not fsync refs before they are written (nor anything else that uses the commit_lock_file() interface to rename, such as the index). We do always fsync packfiles and their indices. I had always assumed this was fine on ext4 with data=ordered (i.e., either the rename and its pointed-to content will go through, or not; so you either get your update or the old state, but not a garbage or empty file). But it sounds from what Ted wrote in: http://article.gmane.org/gmane.linux.file-systems/97255 that this may not be the case. 
If it's not, I think we should consider fsyncing ref writes. -Peff ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] Enable core.fsyncObjectFiles by default 2015-06-24 3:37 ` Jeff King @ 2015-06-24 5:20 ` Junio C Hamano 0 siblings, 0 replies; 37+ messages in thread From: Junio C Hamano @ 2015-06-24 5:20 UTC (permalink / raw) To: Jeff King; +Cc: Stefan Beller, Theodore Ts'o, git, torvalds Jeff King <peff@peff.net> writes: > I had always assumed this was fine on ext4 with data=ordered (i.e., > either the rename and its pointed-to content will go through, or not; so > you either get your update or the old state, but not a garbage or empty > file). But it sounds from what Ted wrote in: > > http://article.gmane.org/gmane.linux.file-systems/97255 > > that this may not be the case. If it's not, I think we should consider > fsyncing ref writes. That matches my understanding. IIRC, we do fsync (without a knob to turn it off) packs, we do not fsync refs (as you described above), the index, or working tree files (via write_entry()). Among these: * If we are so paranoid that loss of new loose objects matter (as opposed to "Heh, it is just the matter of 'git add' again") to cause us to set core.fsyncObjectFiles, we should definitely fsync ref writes. They are even more precious. * The index is much less precious. In the worst case, you can 'git reset' (no flags) and re-add from the working tree and nothing unrecoverable is lost. I do not mind a knob to force us fsync, but we may want it to be separate from core.fsyncObjectFiles. * The working tree files? I am not sure. Again, recovery is just to run "git diff" to notice what was eaten by the power failure and then "git checkout $path" to restore from a known state, so... ^ permalink raw reply [flat|nested] 37+ messages in thread
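The write-to-temporary/rename pattern discussed above (used for loose objects and, via `commit_lock_file()`, for refs and the index) only gives crash durability when both the file data and the rename are flushed. A sketch of the fully synced variant, in Python for brevity (the function name is invented; this is not git's implementation):

```python
import os

def durable_replace(path, data):
    # Atomically and durably replace 'path' with 'data':
    #   1. write a temporary file in the same directory,
    #   2. fsync it, so the data is on disk before the rename,
    #   3. rename() over the target (atomic in the namespace),
    #   4. fsync the directory, so the rename itself is on disk.
    # Skipping steps 2/4 keeps the rename atomic but allows a crash
    # to leave an empty or partially written file behind.
    dirname = os.path.dirname(path) or "."
    tmp = os.path.join(dirname, ".tmp-%d" % os.getpid())
    fd = os.open(tmp, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)
    finally:
        os.close(fd)
    os.rename(tmp, path)
    dfd = os.open(dirname, os.O_RDONLY)
    os.fsync(dfd)
    os.close(dfd)
```

This is the sequence whose middle step (`fsync` before rename) `core.fsyncObjectFiles` toggles for loose objects; the point in this subthread is that ref writes currently skip it entirely.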
* [PATCH] enable core.fsyncObjectFiles by default @ 2018-01-17 18:48 Christoph Hellwig 2018-01-17 19:04 ` Junio C Hamano 2018-01-17 20:55 ` Jeff King 0 siblings, 2 replies; 37+ messages in thread From: Christoph Hellwig @ 2018-01-17 18:48 UTC (permalink / raw) To: git; +Cc: linux-fsdevel fsync is required for data integrity as there is no gurantee that data makes it to disk at any specified time without it. Even for ext3 with data=ordered mode the file system will only commit all data at some point in time that is not guaranteed. I've lost data on development machines with various times countless times due to the lack of this option, and now lost trees on a git server with ext4 as well yesterday. It's time to make git safe by default. Signed-off-by: Christoph Hellwig <hch@lst.de> --- Documentation/config.txt | 6 ++---- environment.c | 2 +- 2 files changed, 3 insertions(+), 5 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 0e25b2c92..9a1cec5c8 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -866,10 +866,8 @@ core.whitespace:: core.fsyncObjectFiles:: This boolean will enable 'fsync()' when writing object files. + -This is a total waste of time and effort on a filesystem that orders -data writes properly, but can be useful for filesystems that do not use -journalling (traditional UNIX filesystems) or that only journal metadata -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback"). +This option is enabled by default and ensures actual data integrity +by calling fsync after writing object files. 
 core.preloadIndex::
 	Enable parallel index preload for operations like 'git diff'

diff --git a/environment.c b/environment.c
index 63ac38a46..c74375b5e 100644
--- a/environment.c
+++ b/environment.c
@@ -36,7 +36,7 @@ const char *git_hooks_path;
 int zlib_compression_level = Z_BEST_SPEED;
 int core_compression_level;
 int pack_compression_level = Z_DEFAULT_COMPRESSION;
-int fsync_object_files;
+int fsync_object_files = 1;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
--
2.14.2

^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH] enable core.fsyncObjectFiles by default 2018-01-17 18:48 [PATCH] enable " Christoph Hellwig @ 2018-01-17 19:04 ` Junio C Hamano 2018-01-17 19:35 ` Christoph Hellwig ` (2 more replies) 2018-01-17 20:55 ` Jeff King 1 sibling, 3 replies; 37+ messages in thread From: Junio C Hamano @ 2018-01-17 19:04 UTC (permalink / raw) To: Christoph Hellwig; +Cc: git, linux-fsdevel Christoph Hellwig <hch@lst.de> writes: > fsync is required for data integrity as there is no gurantee that > data makes it to disk at any specified time without it. Even for > ext3 with data=ordered mode the file system will only commit all > data at some point in time that is not guaranteed. It comes from this one: commit aafe9fbaf4f1d1f27a6f6e3eb3e246fff81240ef Author: Linus Torvalds <torvalds@linux-foundation.org> Date: Wed Jun 18 15:18:44 2008 -0700 Add config option to enable 'fsync()' of object files As explained in the documentation[*] this is totally useless on filesystems that do ordered/journalled data writes, but it can be a useful safety feature on filesystems like HFS+ that only journal the metadata, not the actual file contents. It defaults to off, although we could presumably in theory some day auto-enable it on a per-filesystem basis. [*] Yes, I updated the docs for the thing. Hell really _has_ frozen over, and the four horsemen are probably just beyond the horizon. EVERYBODY PANIC! > diff --git a/Documentation/config.txt b/Documentation/config.txt > index 0e25b2c92..9a1cec5c8 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -866,10 +866,8 @@ core.whitespace:: > core.fsyncObjectFiles:: > This boolean will enable 'fsync()' when writing object files. > + > -This is a total waste of time and effort on a filesystem that orders > -data writes properly, but can be useful for filesystems that do not use > -journalling (traditional UNIX filesystems) or that only journal metadata > -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback"). 
> +This option is enabled by default and ensures actual data integrity > +by calling fsync after writing object files. I am somewhat sympathetic to the desire to flip the default to "safe" and allow those who know they are already safe to tweak the knob for performance, and it also makes sense to document that the default is "true" here. But I do not see the point of removing the four lines from this paragraph; the sole effect of the removal is to rob information from readers that they can use to decide if they want to disable the configuration, no? ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] enable core.fsyncObjectFiles by default 2018-01-17 19:04 ` Junio C Hamano @ 2018-01-17 19:35 ` Christoph Hellwig 2018-01-17 20:05 ` Andreas Schwab 2018-01-17 19:37 ` Matthew Wilcox 2018-01-17 21:44 ` Ævar Arnfjörð Bjarmason 2 siblings, 1 reply; 37+ messages in thread From: Christoph Hellwig @ 2018-01-17 19:35 UTC (permalink / raw) To: Junio C Hamano; +Cc: Christoph Hellwig, git, linux-fsdevel On Wed, Jan 17, 2018 at 11:04:32AM -0800, Junio C Hamano wrote: > I am somewhat sympathetic to the desire to flip the default to > "safe" and allow those who know they are already safe to tweak the > knob for performance, and it also makes sense to document that the > default is "true" here. But I do not see the point of removing the > four lines from this paragraph; the sole effect of the removal is to > rob information from readers that they can use to decide if they > want to disable the configuration, no? Does this sound any better? It is a little to technical for my taste, but I couldn't come up with anything better: --- From ab8f2d38dfe40e74de5399af0d069427c7473b76 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig <hch@lst.de> Date: Wed, 17 Jan 2018 19:42:46 +0100 Subject: enable core.fsyncObjectFiles by default fsync is required for data integrity as there is no gurantee that data makes it to disk at any specified time without it. Even for ext3 with data=ordered mode the file system will only commit all data at some point in time that is not guaranteed. I've lost data on development machines with various times countless times due to the lack of this option, and now lost trees on a git server with ext4 as well yesterday. It's time to make git safe by default. 
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 Documentation/config.txt | 11 +++++++----
 environment.c            |  2 +-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0e25b2c92..8b99f1389 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -866,10 +866,13 @@ core.whitespace::
 core.fsyncObjectFiles::
 	This boolean will enable 'fsync()' when writing object files.
 +
-This is a total waste of time and effort on a filesystem that orders
-data writes properly, but can be useful for filesystems that do not use
-journalling (traditional UNIX filesystems) or that only journal metadata
-and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
+This option is enabled by default and ensures actual data integrity
+by calling fsync after writing object files.
+
+Note that this might be really slow on ext3 in the traditional
+data=ordered mode, in which case you might want to disable this option.
+ext3 in data=ordered mode will order the actual data writeout with
+metadata operation, but not actually guarantee data integrity either.

 core.preloadIndex::
 	Enable parallel index preload for operations like 'git diff'

diff --git a/environment.c b/environment.c
index 63ac38a46..c74375b5e 100644
--- a/environment.c
+++ b/environment.c
@@ -36,7 +36,7 @@ const char *git_hooks_path;
 int zlib_compression_level = Z_BEST_SPEED;
 int core_compression_level;
 int pack_compression_level = Z_DEFAULT_COMPRESSION;
-int fsync_object_files;
+int fsync_object_files = 1;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
--
2.14.2

^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH] enable core.fsyncObjectFiles by default 2018-01-17 19:35 ` Christoph Hellwig @ 2018-01-17 20:05 ` Andreas Schwab 0 siblings, 0 replies; 37+ messages in thread From: Andreas Schwab @ 2018-01-17 20:05 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Junio C Hamano, git, linux-fsdevel On Jan 17 2018, Christoph Hellwig <hch@lst.de> wrote: > I've lost data on development machines with various times countless > times due to the lack of this option, and now lost trees on a Too many times. :-) Andreas. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] enable core.fsyncObjectFiles by default 2018-01-17 19:04 ` Junio C Hamano 2018-01-17 19:35 ` Christoph Hellwig @ 2018-01-17 19:37 ` Matthew Wilcox 2018-01-17 19:42 ` Christoph Hellwig 2018-01-17 21:44 ` Ævar Arnfjörð Bjarmason 2 siblings, 1 reply; 37+ messages in thread From: Matthew Wilcox @ 2018-01-17 19:37 UTC (permalink / raw) To: Junio C Hamano; +Cc: Christoph Hellwig, git, linux-fsdevel On Wed, Jan 17, 2018 at 11:04:32AM -0800, Junio C Hamano wrote: > Christoph Hellwig <hch@lst.de> writes: > > @@ -866,10 +866,8 @@ core.whitespace:: > > core.fsyncObjectFiles:: > > This boolean will enable 'fsync()' when writing object files. > > + > > -This is a total waste of time and effort on a filesystem that orders > > -data writes properly, but can be useful for filesystems that do not use > > -journalling (traditional UNIX filesystems) or that only journal metadata > > -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback"). > > +This option is enabled by default and ensures actual data integrity > > +by calling fsync after writing object files. > > I am somewhat sympathetic to the desire to flip the default to > "safe" and allow those who know they are already safe to tweak the > knob for performance, and it also makes sense to document that the > default is "true" here. But I do not see the point of removing the > four lines from this paragraph; the sole effect of the removal is to > rob information from readers that they can use to decide if they > want to disable the configuration, no? How about this instead? This option is enabled by default and ensures data integrity by calling fsync after writing object files. It is not necessary on filesystems which journal data writes, but is still necessary on filesystems which do not use journalling (ext2), or that only journal metadata writes (OS X's HFS+, or Linux's ext4 with "data=writeback"). Turning this option off will increase performance at the possible risk of data loss. 
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] enable core.fsyncObjectFiles by default 2018-01-17 19:37 ` Matthew Wilcox @ 2018-01-17 19:42 ` Christoph Hellwig 0 siblings, 0 replies; 37+ messages in thread From: Christoph Hellwig @ 2018-01-17 19:42 UTC (permalink / raw) To: Matthew Wilcox; +Cc: Junio C Hamano, Christoph Hellwig, git, linux-fsdevel On Wed, Jan 17, 2018 at 11:37:31AM -0800, Matthew Wilcox wrote: > How about this instead? > > This option is enabled by default and ensures data integrity by calling > fsync after writing object files. It is not necessary on filesystems > which journal data writes, but is still necessary on filesystems which > do not use journalling (ext2), or that only journal metadata writes > (OS X's HFS+, or Linux's ext4 with "data=writeback"). Turning this > option off will increase performance at the possible risk of data loss. I think this goes entirely into the wrong direction. The point is fsync is always the right thing to do. But on ext3 (and ext3 only) the right thing is way too painful, and conveniently ext3 happens to be almost ok without it. So if anything should get a special mention it is ext3. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] enable core.fsyncObjectFiles by default 2018-01-17 19:04 ` Junio C Hamano 2018-01-17 19:35 ` Christoph Hellwig 2018-01-17 19:37 ` Matthew Wilcox @ 2018-01-17 21:44 ` Ævar Arnfjörð Bjarmason 2018-01-17 22:07 ` Linus Torvalds 2 siblings, 1 reply; 37+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2018-01-17 21:44 UTC (permalink / raw) To: Junio C Hamano; +Cc: Christoph Hellwig, git, linux-fsdevel, Linus Torvalds On Wed, Jan 17 2018, Junio C. Hamano jotted: > Christoph Hellwig <hch@lst.de> writes: > >> fsync is required for data integrity as there is no gurantee that >> data makes it to disk at any specified time without it. Even for >> ext3 with data=ordered mode the file system will only commit all >> data at some point in time that is not guaranteed. > > It comes from this one: > > commit aafe9fbaf4f1d1f27a6f6e3eb3e246fff81240ef > Author: Linus Torvalds <torvalds@linux-foundation.org> > Date: Wed Jun 18 15:18:44 2008 -0700 > > Add config option to enable 'fsync()' of object files > > As explained in the documentation[*] this is totally useless on > filesystems that do ordered/journalled data writes, but it can be a > useful safety feature on filesystems like HFS+ that only journal the > metadata, not the actual file contents. > > It defaults to off, although we could presumably in theory some day > auto-enable it on a per-filesystem basis. > > [*] Yes, I updated the docs for the thing. Hell really _has_ frozen > over, and the four horsemen are probably just beyond the horizon. > EVERYBODY PANIC! > >> diff --git a/Documentation/config.txt b/Documentation/config.txt >> index 0e25b2c92..9a1cec5c8 100644 >> --- a/Documentation/config.txt >> +++ b/Documentation/config.txt >> @@ -866,10 +866,8 @@ core.whitespace:: >> core.fsyncObjectFiles:: >> This boolean will enable 'fsync()' when writing object files. 
>> + >> -This is a total waste of time and effort on a filesystem that orders >> -data writes properly, but can be useful for filesystems that do not use >> -journalling (traditional UNIX filesystems) or that only journal metadata >> -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback"). >> +This option is enabled by default and ensures actual data integrity >> +by calling fsync after writing object files. > > I am somewhat sympathetic to the desire to flip the default to > "safe" and allow those who know they are already safe to tweak the > knob for performance, and it also makes sense to document that the > default is "true" here. But I do not see the point of removing the > four lines from this paragraph; the sole effect of the removal is to > rob information from readers that they can use to decide if they > want to disable the configuration, no? [CC'd the author of the current behavior] Some points/questions: a) Is there some reliable way to test whether this is needed from userspace? I'm thinking something like `git update-index --test-untracked-cache` but for fsync(). b) On the filesystems that don't need this, what's the performance impact? I ran a small test myself on CentOS 7 (3.10) with ext4 data=ordered on the tests I thought might do a lot of loose object writes: $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS="NO_OPENSSL=Y CFLAGS=-O3 -j56" ./run origin/master fsync-on~ fsync-on p3400-rebase.sh p0007-write-cache.sh [...] 
Test fsync-on~ fsync-on ------------------------------------------------------------------------------------------------------- 3400.2: rebase on top of a lot of unrelated changes 1.45(1.30+0.17) 1.45(1.28+0.20) +0.0% 3400.4: rebase a lot of unrelated changes without split-index 4.34(3.71+0.66) 4.33(3.69+0.66) -0.2% 3400.6: rebase a lot of unrelated changes with split-index 3.38(2.94+0.47) 3.38(2.93+0.47) +0.0% 0007.2: write_locked_index 3 times (3214 files) 0.01(0.00+0.00) 0.01(0.00+0.00) +0.0% No impact. However I did my own test of running the test suite 10% times with/without this patch, and it runs 9% slower: fsync-off: avg:21.59 21.50 21.50 21.52 21.53 21.54 21.57 21.59 21.61 21.63 21.95 fsync-on: avg:23.43 23.21 23.25 23.26 23.26 23.27 23.32 23.49 23.51 23.83 23.88 Test script at the end of this E-Mail. c) What sort of guarantees in this regard do NFS-mounted filesystems commonly make? Test script: use v5.10.0; use strict; use warnings; use Time::HiRes qw(time); use List::Util qw(sum); use Data::Dumper; my %time; for my $ref (@ARGV) { system "git checkout $ref"; system qq[make -j56 CFLAGS="-O3 -g" NO_OPENSSL=Y all]; for (1..10) { my $t0 = -time(); system "(cd t && NO_SVN_TESTS=1 GIT_TEST_HTTPD=0 prove -j56 --state=slow,save t[0-9]*.sh)"; $t0 += time(); push @{$time{$ref}} => $t0; } } for my $ref (sort keys %time) { printf "%20s: avg:%.2f %s\n", $ref, sum(@{$time{$ref}})/@{$time{$ref}}, join(" ", map { sprintf "%.02f", $_ } sort { $a <=> $b } @{$time{$ref}}); } ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] enable core.fsyncObjectFiles by default 2018-01-17 21:44 ` Ævar Arnfjörð Bjarmason @ 2018-01-17 22:07 ` Linus Torvalds 2018-01-17 22:25 ` Linus Torvalds ` (2 more replies) 0 siblings, 3 replies; 37+ messages in thread From: Linus Torvalds @ 2018-01-17 22:07 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Junio C Hamano, Christoph Hellwig, Git Mailing List, linux-fsdevel On Wed, Jan 17, 2018 at 1:44 PM, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > I ran a small test myself on CentOS 7 (3.10) with ext4 data=ordered > on the tests I thought might do a lot of loose object writes: > > $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS="NO_OPENSSL=Y CFLAGS=-O3 -j56" ./run origin/master fsync-on~ fsync-on p3400-rebase.sh p0007-write-cache.sh > [...] > Test fsync-on~ fsync-on > ------------------------------------------------------------------------------------------------------- > 3400.2: rebase on top of a lot of unrelated changes 1.45(1.30+0.17) 1.45(1.28+0.20) +0.0% > 3400.4: rebase a lot of unrelated changes without split-index 4.34(3.71+0.66) 4.33(3.69+0.66) -0.2% > 3400.6: rebase a lot of unrelated changes with split-index 3.38(2.94+0.47) 3.38(2.93+0.47) +0.0% > 0007.2: write_locked_index 3 times (3214 files) 0.01(0.00+0.00) 0.01(0.00+0.00) +0.0% > > No impact. However I did my own test of running the test suite 10% > times with/without this patch, and it runs 9% slower: > > fsync-off: avg:21.59 21.50 21.50 21.52 21.53 21.54 21.57 21.59 21.61 21.63 21.95 > fsync-on: avg:23.43 23.21 23.25 23.26 23.26 23.27 23.32 23.49 23.51 23.83 23.88 That's not the thing you should check. Now re-do the test while another process writes to a totally unrelated huge file (say, do an ISO file copy or something). That was the thing that several filesystems get completely and horribly wrong.
Generally _particularly_ the logging filesystems that don't even need the fsync, because they use a single log for everything (so fsync serializes all the writes, not just the writes to the one file it's fsync'ing). The original git design was very much to write each object file without any syncing, because they don't matter since a new object file - by definition - isn't really reachable. Then sync before writing the index file or a new ref. But things have changed, I'm not arguing that the code shouldn't be made safe by default. I personally refuse to use rotating media on my machines anyway, largely exactly because of the fsync() issue (things like "firefox" started doing fsync on the mysql database for stupid things, and you'd get huge pauses). But I do think your benchmark is wrong. The case where only git does something is not interesting or relevant. It really is "git does something _and_ somebody else writes something entirely unrelated at the same time" that matters. Linus ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] enable core.fsyncObjectFiles by default 2018-01-17 22:07 ` Linus Torvalds @ 2018-01-17 22:25 ` Linus Torvalds 2018-01-17 23:16 ` Ævar Arnfjörð Bjarmason 2018-01-17 23:52 ` Theodore Ts'o 2 siblings, 0 replies; 37+ messages in thread From: Linus Torvalds @ 2018-01-17 22:25 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Junio C Hamano, Christoph Hellwig, Git Mailing List, linux-fsdevel On Wed, Jan 17, 2018 at 2:07 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > The original git design was very much to write each object file > without any syncing, because they don't matter since a new object file > - by definition - isn't really reachable. Then sync before writing the > index file or a new ref. .. actually, I think it originally sync'ed only before starting to actually remove objects due to repacking. The theory was that you can lose the last commit series or whatever, and have to git-fsck, and have to re-do it, but you could never lose long-term work. If your machine crashes, you still remember what you did just before the crash. That theory may have been more correct back in the days than it is now. People who use git might be less willing to treat it like a filesystem that you can fsck than I was back 10+ years ago. It's worth noting that the commit that Junio pointed to (from 2008) didn't actually change any behavior. It just allowed people who cared to change behavior. The original "let's not waste time on fsync every object write, because we can just re-create the state anyway" behavior goes back to 2005. Linus ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] enable core.fsyncObjectFiles by default 2018-01-17 22:07 ` Linus Torvalds 2018-01-17 22:25 ` Linus Torvalds @ 2018-01-17 23:16 ` Ævar Arnfjörð Bjarmason 2018-01-17 23:42 ` Linus Torvalds 2018-01-17 23:52 ` Theodore Ts'o 2 siblings, 1 reply; 37+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2018-01-17 23:16 UTC (permalink / raw) To: Linus Torvalds Cc: Junio C Hamano, Christoph Hellwig, Git Mailing List, linux-fsdevel, Roberto Tyley On Wed, Jan 17 2018, Linus Torvalds jotted: > On Wed, Jan 17, 2018 at 1:44 PM, Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: >> >> I ran a small test myself on CentOS 7 (3.10) with ext4 data=ordered >> on the tests I thought might do a lot of loose object writes: >> >> $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS="NO_OPENSSL=Y CFLAGS=-O3 -j56" ./run origin/master fsync-on~ fsync-on p3400-rebase.sh p0007-write-cache.sh >> [...] >> Test fsync-on~ fsync-on >> ------------------------------------------------------------------------------------------------------- >> 3400.2: rebase on top of a lot of unrelated changes 1.45(1.30+0.17) 1.45(1.28+0.20) +0.0% >> 3400.4: rebase a lot of unrelated changes without split-index 4.34(3.71+0.66) 4.33(3.69+0.66) -0.2% >> 3400.6: rebase a lot of unrelated changes with split-index 3.38(2.94+0.47) 3.38(2.93+0.47) +0.0% >> 0007.2: write_locked_index 3 times (3214 files) 0.01(0.00+0.00) 0.01(0.00+0.00) +0.0% >> >> No impact. However I did my own test of running the test suite 10% >> times with/without this patch, and it runs 9% slower: That should be "10 times" b.t.w., not "10% times" >> >> fsync-off: avg:21.59 21.50 21.50 21.52 21.53 21.54 21.57 21.59 21.61 21.63 21.95 >> fsync-on: avg:23.43 23.21 23.25 23.26 23.26 23.27 23.32 23.49 23.51 23.83 23.88 > > That's not the thing you should check. > > Now re-do the test while another process writes to a totally unrelated > a huge file (say, do a ISO file copy or something). 
> > That was the thing that several filesystems get completely and > horribly wrong. Generally _particularly_ the logging filesystems that > don't even need the fsync, because they use a single log for > everything (so fsync serializes all the writes, not just the writes to > the one file it's fsync'ing). > > The original git design was very much to write each object file > without any syncing, because they don't matter since a new object file > - by definition - isn't really reachable. Then sync before writing the > index file or a new ref. > > But things have changed, I'm not arguing that the code shouldn't be > made safe by default. I personally refuse to use rotating media on my > machines anyway, largely exactly because of the fsync() issue (things > like "firefox" started doing fsync on the mysql database for stupid > things, and you'd get huge pauses). > > But I do think your benchmark is wrong. The case where only git does > something is not interesting or relevant. It really is "git does > something _and_ somebody else writes something entirely unrelated at > the same time" that matters. Yeah it's shitty, just a quick hack to get some numbers since there was a discussion of performance, but neither your original patch nor this thread had quoted any. One thing you may have missed is that this is a parallel (56 tests at a time) run of the full test suite. So there's plenty of other git processes (and test setup/teardown) racing with any given git process. Running the test suite in a loop like this gives me ~100K IO ops/s & ~50% disk utilization. Or does overall FS activity and raw throughput (e.g. with an ISO copy) matter more than general FS contention?
Tweaking it to emulate this iso copy case, running another test with one of these running concurrently: # setup dd if=/dev/urandom of=/tmp/fake.iso bs=1024 count=$((1000*1024)) # run in a loop (shuf to not always write the same thing) while sleep 0.1; do shuf /tmp/fake.iso | pv >/tmp/fake.shuf.iso; done Gives throughput that spikes to 100% (not consistently) and: fsync-off: avg:36.37 31.74 33.83 35.12 36.19 36.32 37.04 37.34 37.71 37.93 40.43 fsync-on: avg:38.09 34.56 35.14 35.69 36.41 36.41 37.96 38.25 40.45 41.44 44.59 ~4.7% slower, v.s. ~8.5% in my earlier 87h8rki2iu.fsf@evledraar.gmail.com without that running. Which is not an argument for / against this patch, but those numbers seem significant, and generally if the entire test suite slows down by that much there's going to be sub-parts of it that are much worse. Which might be a reason to tread more carefully and if it *does* slow things down perhaps do it with more granularity, e.g. turning it on in git-receive-pack might be more sensible than in git-filter-branch. I remember Roberto Tyley's BFG writing an amazing amount of loose objects, but it doesn't seem to have an fsync() option, I wonder if adding one would be a representative pathological test case. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] enable core.fsyncObjectFiles by default 2018-01-17 23:16 ` Ævar Arnfjörð Bjarmason @ 2018-01-17 23:42 ` Linus Torvalds 0 siblings, 0 replies; 37+ messages in thread From: Linus Torvalds @ 2018-01-17 23:42 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Junio C Hamano, Christoph Hellwig, Git Mailing List, linux-fsdevel, Roberto Tyley On Wed, Jan 17, 2018 at 3:16 PM, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > Or does overall FS activity and raw throughput (e.g. with an ISO copy) > matter more than general FS contention? Traditionally, yes. Also note that none of this is about "throughput". It's about waiting for a second or two when you do a simple "git commit" or similar. Also, hardware has changed. I haven't used a rotational disk in about 10 years now, and I don't worry about latencies so much anymore. The fact that you get ~100k iops indicates that you probably aren't using those stupid platters of rust either. So I doubt you can even trigger the bad cases. Linus ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] enable core.fsyncObjectFiles by default 2018-01-17 22:07 ` Linus Torvalds 2018-01-17 22:25 ` Linus Torvalds 2018-01-17 23:16 ` Ævar Arnfjörð Bjarmason @ 2018-01-17 23:52 ` Theodore Ts'o 2018-01-17 23:57 ` Linus Torvalds 2020-09-17 11:06 ` Ævar Arnfjörð Bjarmason 2 siblings, 2 replies; 37+ messages in thread From: Theodore Ts'o @ 2018-01-17 23:52 UTC (permalink / raw) To: Linus Torvalds Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, Christoph Hellwig, Git Mailing List, linux-fsdevel On Wed, Jan 17, 2018 at 02:07:22PM -0800, Linus Torvalds wrote: > > Now re-do the test while another process writes to a totally unrelated > a huge file (say, do a ISO file copy or something). > > That was the thing that several filesystems get completely and > horribly wrong. Generally _particularly_ the logging filesystems that > don't even need the fsync, because they use a single log for > everything (so fsync serializes all the writes, not just the writes to > the one file it's fsync'ing). Well, let's be fair; this is something *ext3* got wrong, and it was the default file system back then. All of the modern file systems now do delayed allocation, which means that an fsync of one file doesn't actually imply an fsync of another file. Hence... > The original git design was very much to write each object file > without any syncing, because they don't matter since a new object file > - by definition - isn't really reachable. Then sync before writing the > index file or a new ref. This isn't really safe any more. Yes, there's a single log. But files which are subject to delayed allocation are in the page cache, and just because you fsync the index file doesn't mean that the object file is now written to disk. It was true for ext3, but it's not true for ext4, xfs, btrfs, etc. The good news is that if you have another process downloading a huge ISO image, the fsync of the index file won't force the ISO file to be written out.
The bad news is that it won't force out the other git object files, either. Now, there is a potential downside of fsync'ing each object file, and that is the cost of doing a CACHE FLUSH on a HDD is non-trivial, and even on a SSD, it's not optimal to call CACHE FLUSH thousands of times in a second. So if you are creating thousands of tiny files, and you fsync each one, each fsync(2) call is a serializing instruction, which means it won't return until that one file is written to disk. If you are writing lots of small files, and you are using a HDD, you'll be bottlenecked to around 30 files per second on a 5400 RPM HDD, and this is true regardless of what file system you use, because the bottleneck is the CACHE FLUSH operation, and how you organize the metadata and how you do the block allocation is largely lost in the noise compared to the CACHE FLUSH command, which serializes everything. There are solutions to this; you could simply not call fsync(2) a thousand times, and instead write a pack file, and call fsync once on the pack file. That's probably the smartest approach. You could also create a thousand threads, and call fsync(2) on those thousand threads at roughly the same time. Or you could use a bleeding edge kernel with the latest AIO patch, and use the newly added IOCB_CMD_FSYNC support. But I'd simply recommend writing a pack and fsync'ing the pack, instead of trying to write a gazillion object files. (git-repack -A, I'm looking at you....) - Ted ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] enable core.fsyncObjectFiles by default 2018-01-17 23:52 ` Theodore Ts'o @ 2018-01-17 23:57 ` Linus Torvalds 2018-01-18 16:27 ` Christoph Hellwig 2020-09-17 11:06 ` Ævar Arnfjörð Bjarmason 1 sibling, 1 reply; 37+ messages in thread From: Linus Torvalds @ 2018-01-17 23:57 UTC (permalink / raw) To: Theodore Ts'o Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, Christoph Hellwig, Git Mailing List, linux-fsdevel On Wed, Jan 17, 2018 at 3:52 PM, Theodore Ts'o <tytso@mit.edu> wrote: > > Well, let's be fair; this is something *ext3* got wrong, and it was > the default file system back then. I'm pretty sure reiserfs and btrfs did too.. Linus ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] enable core.fsyncObjectFiles by default 2018-01-17 23:57 ` Linus Torvalds @ 2018-01-18 16:27 ` Christoph Hellwig 2018-01-19 19:08 ` Junio C Hamano 2018-01-21 21:32 ` Chris Mason 0 siblings, 2 replies; 37+ messages in thread From: Christoph Hellwig @ 2018-01-18 16:27 UTC (permalink / raw) To: Linus Torvalds Cc: Theodore Ts'o, Ævar Arnfjörð Bjarmason, Junio C Hamano, Christoph Hellwig, Git Mailing List, linux-fsdevel, Chris Mason [adding Chris to the Cc list - this is about the awful ext3 data=ordered behavior of syncing the whole file system data and metadata on each fsync] On Wed, Jan 17, 2018 at 03:57:53PM -0800, Linus Torvalds wrote: > On Wed, Jan 17, 2018 at 3:52 PM, Theodore Ts'o <tytso@mit.edu> wrote: > > > > Well, let's be fair; this is something *ext3* got wrong, and it was > > the default file system back then. > > I'm pretty sure reiserfs and btrfs did too.. I'm pretty sure btrfs never did, and reiserfs at least looks like it currently doesn't, but I'd have to dig into history to check if it ever did. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] enable core.fsyncObjectFiles by default 2018-01-18 16:27 ` Christoph Hellwig @ 2018-01-19 19:08 ` Junio C Hamano 2018-01-20 22:14 ` Theodore Ts'o 2018-01-21 21:32 ` Chris Mason 1 sibling, 1 reply; 37+ messages in thread From: Junio C Hamano @ 2018-01-19 19:08 UTC (permalink / raw) To: Christoph Hellwig Cc: Linus Torvalds, Theodore Ts'o, Ævar Arnfjörð Bjarmason, Git Mailing List, linux-fsdevel, Chris Mason Christoph Hellwig <hch@lst.de> writes: > [adding Chris to the Cc list - this is about the awful ext3 data=ordered > behavior of syncing the whole file system data and metadata on each > fsync] > > On Wed, Jan 17, 2018 at 03:57:53PM -0800, Linus Torvalds wrote: >> On Wed, Jan 17, 2018 at 3:52 PM, Theodore Ts'o <tytso@mit.edu> wrote: >> > >> > Well, let's be fair; this is something *ext3* got wrong, and it was >> > the default file system back them. >> >> I'm pretty sure reiserfs and btrfs did too.. > > I'm pretty sure btrfs never did, and reiserfs at least looks like > it currently doesn't but I'd have to dig into history to check if > it ever did. So..., is it fair to say that the one you sent in https://public-inbox.org/git/20180117193510.GA30657@lst.de/ is the best variant we have seen in this thread so far? I'll keep that in my inbox so that I do not forget, but I think we would want to deal with a hotfix for 2.16 on case insensitive platforms before this topic. Thanks. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] enable core.fsyncObjectFiles by default 2018-01-19 19:08 ` Junio C Hamano @ 2018-01-20 22:14 ` Theodore Ts'o 2018-01-20 22:27 ` Junio C Hamano 0 siblings, 1 reply; 37+ messages in thread From: Theodore Ts'o @ 2018-01-20 22:14 UTC (permalink / raw) To: Junio C Hamano Cc: Christoph Hellwig, Linus Torvalds, Ævar Arnfjörð Bjarmason, Git Mailing List, linux-fsdevel, Chris Mason On Fri, Jan 19, 2018 at 11:08:46AM -0800, Junio C Hamano wrote: > So..., is it fair to say that the one you sent in > > https://public-inbox.org/git/20180117193510.GA30657@lst.de/ > > is the best variant we have seen in this thread so far? I'll keep > that in my inbox so that I do not forget, but I think we would want > to deal with a hotfix for 2.16 on case insensitive platforms before > this topic. It's a simplistic fix, but it will work. There may very well be certain workloads which generate a large number of loose objects (e.g., git repack -A) which will make things go significantly more slowly as a result. It might very well be the case that if nothing else is going on, something like "write all the files without fsync(2), then use syncfs(2)" would be much faster. The downside with that approach is that if indeed you were downloading a multi-gigabyte DVD image at the same time, the syncfs(2) will force a writeback of the partially written DVD image, or some other unrelated files. But if the goal is to just change the default, and then see what shakes out, and then apply other optimizations later, that's certainly a valid result. I've never been fond of the "git repack -A" behavior where it can generate huge numbers of loose files. I'd much prefer it if the other objects ended up in a separate pack file, and then some other provision made for nuking that pack file some time later. But that's expanding the scope significantly over what's currently being discussed. - Ted ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] enable core.fsyncObjectFiles by default 2018-01-20 22:14 ` Theodore Ts'o @ 2018-01-20 22:27 ` Junio C Hamano 2018-01-22 15:09 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 37+ messages in thread From: Junio C Hamano @ 2018-01-20 22:27 UTC (permalink / raw) To: Theodore Ts'o Cc: Christoph Hellwig, Linus Torvalds, Ævar Arnfjörð Bjarmason, Git Mailing List, linux-fsdevel, Chris Mason, Jeff King Theodore Ts'o <tytso@mit.edu> writes: > .... I've never been fond of the "git repack -A" behavior > where it can generate huge numbers of loose files. I'd much prefer it > if the other objects ended up in a separate pack file, and then some > other provision made for nuking that pack file some time later.... Yes, a "cruft pack" that holds unreachable object has been discussed a few times recently on list, and I do agree that it is a desirable thing to have in the longer run. What's tricky is to devise a way to allow us to salvage objects that are placed in a cruft pack because they are accessed recently, proving themselves to be no longer crufts. It could be that a good way to resurrect them is to explode them to loose form when they are accessed out of a cruft pack. We need to worry about interactions with read-only users if we go that route, but with the current "explode unreachable to loose, touch their mtime when they are accessed" scheme ends up ignoring accesses from read-only users that cannot update mtime, so it might not be too bad. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] enable core.fsyncObjectFiles by default 2018-01-20 22:27 ` Junio C Hamano @ 2018-01-22 15:09 ` Ævar Arnfjörð Bjarmason 2018-01-22 18:09 ` Theodore Ts'o 2018-01-23 0:25 ` Jeff King 0 siblings, 2 replies; 37+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2018-01-22 15:09 UTC (permalink / raw) To: Junio C Hamano Cc: Theodore Ts'o, Christoph Hellwig, Linus Torvalds, Git Mailing List, linux-fsdevel, Chris Mason, Jeff King On Sat, Jan 20 2018, Junio C. Hamano jotted: > Theodore Ts'o <tytso@mit.edu> writes: > >> .... I've never been fond of the "git repack -A" behavior >> where it can generate huge numbers of loose files. I'd much prefer it >> if the other objects ended up in a separate pack file, and then some >> other provision made for nuking that pack file some time later.... > > Yes, a "cruft pack" that holds unreachable object has been discussed > a few times recently on list, and I do agree that it is a desirable > thing to have in the longer run. > > What's tricky is to devise a way to allow us to salvage objects that > are placed in a cruft pack because they are accessed recently, > proving themselves to be no longer crufts. It could be that a good > way to resurrect them is to explode them to loose form when they are > accessed out of a cruft pack. We need to worry about interactions > with read-only users if we go that route, but with the current > "explode unreachable to loose, touch their mtime when they are > accessed" scheme ends up ignoring accesses from read-only users that > cannot update mtime, so it might not be too bad. Wouldn't it also make gc pruning more expensive? Now you can repack regularly and loose objects will be left out of the pack, and then just rm'd, whereas now it would entail creating new packs (unless the whole pack was objects meant for removal). Probably still worth it, but something to keep in mind. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] enable core.fsyncObjectFiles by default 2018-01-22 15:09 ` Ævar Arnfjörð Bjarmason @ 2018-01-22 18:09 ` Theodore Ts'o 2018-01-23 0:47 ` Jeff King 0 siblings, 1 reply; 37+ messages in thread From: Theodore Ts'o @ 2018-01-22 18:09 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Junio C Hamano, Christoph Hellwig, Linus Torvalds, Git Mailing List, linux-fsdevel, Chris Mason, Jeff King On Mon, Jan 22, 2018 at 04:09:23PM +0100, Ævar Arnfjörð Bjarmason wrote: > > What's tricky is to devise a way to allow us to salvage objects that > > are placed in a cruft pack because they are accessed recently, > > proving themselves to be no longer crufts. It could be that a good > > way to resurrect them is to explode them to loose form when they are > > accessed out of a cruft pack. We need to worry about interactions > > with read-only users if we go that route, but with the current > > "explode unreachable to loose, touch their mtime when they are > > accessed" scheme ends up ignoring accesses from read-only users that > > cannot update mtime, so it might not be too bad. > > Wouldn't it also make gc pruning more expensive? Now you can repack > regularly and loose objects will be left out of the pack, and then just > rm'd, whereas now it would entail creating new packs (unless the whole > pack was objects meant for removal). The idea is that the cruft pack would be all objects that were no longer referenced. Hence the proposal that if they ever *are* accessed, they would be exploded to a loose object at that point. So in the common case, the GC would go quickly since the entire pack could just be rm'ed once it hit the designated expiry time. Another way of doing things would be to use the mtime of the cruft pack for the expiry time, and if the cruft pack is ever referenced, its mtime would get updated. Yet a third way would be to simply clear the "cruft" bit if it ever *is* referenced.
In the common case, it would never be referenced, so it could just get deleted, but in the case where the user has manually "rescued" a set of commits (perhaps by explicitly setting a branch head to a commit id found in a reflog), the objects would be saved. So there are many ways it could be managed. - Ted ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] enable core.fsyncObjectFiles by default 2018-01-22 18:09 ` Theodore Ts'o @ 2018-01-23 0:47 ` Jeff King 2018-01-23 5:45 ` Theodore Ts'o 0 siblings, 1 reply; 37+ messages in thread From: Jeff King @ 2018-01-23 0:47 UTC (permalink / raw) To: Theodore Ts'o Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, Christoph Hellwig, Linus Torvalds, Git Mailing List, linux-fsdevel, Chris Mason On Mon, Jan 22, 2018 at 01:09:03PM -0500, Theodore Ts'o wrote: > > Wouldn't it also make gc pruning more expensive? Now you can repack > > regularly and loose objects will be left out of the pack, and then just > > rm'd, whereas now it would entail creating new packs (unless the whole > > pack was objects meant for removal). > > The idea is that the cruft pack would be all objects that were no > longer referenced. Hence the proposal that if they ever *are* > accessed, they would be exploded to a loose object at that point. So > in the common case, the GC would go quickly since the entire pack > could just be rm'ed once it hit the designated expiry time. I think Ævar is talking about the case of: 1. You make 100 objects that aren't referenced. They're loose. 2. You run git-gc. They're still too recent to be deleted. Right now those recent loose objects sit loose, and have zero cost at the time of gc. In a "cruft pack" world, you'd pay some I/O to copy them into the cruft pack, and some CPU to zlib and delta-compress them. I think that's probably fine, though. That said, some of what you wrote left me confused, and wondering whether we're all talking about the same idea. ;) Let me describe the idea I had mentioned in another thread. Right now the behavior is basically this: If an unreachable object becomes referenced, it doesn't immediately get exploded. During the next gc, whatever new object referenced them would be one of: 1. Reachable from refs, in which case it carries along the formerly-cruft object into the new pack, since it is now also reachable. 2.
Unreachable but still recent by mtime; we keep such objects, and anything they reference (now as unreachable, in this proposal in the cruft pack). Now these get either left loose, or exploded loose if they were previously packed. 3. Unreachable and old. Both objects can be dropped totally. The current strategy is to use the mtimes for "recent", and we use the pack's mtime for every object in the pack. So if we pack all the loose objects into a cruft pack, the mtime of the cruft pack becomes the new gauge for "recent". And if we migrate objects from old cruft pack to new cruft pack at each gc, then they'll keep getting their mtimes refreshed, and we'll never drop them. So we need to either: - keep per-object mtimes, so that old ones can age out (i.e., they'd hit case 3 and just not get migrated to either the new "real" pack or the new cruft pack). - keep multiple cruft packs, and let whole packs age out. But then cruft objects which get referenced again by other cruft have to get copied (not moved!) to new packs. That _probably_ doesn't happen all that often, so it might be OK. > Another way of doing things would be to use the mtime of the cruft > pack for the expiry time, and if the curft pack is ever referenced, > its mtime would get updated. Yet a third way would be to simply clear > the "cruft" bit if it ever *is* referenced. In the common case, it > would never be referenced, so it could just get deleted, but in the > case where the user has manually "rescued" a set of commits (perhaps > by explicitly setting a branch head to commit id found from a reflog), > the objects would be saved. I don't think we have to worry about "rescued" objects. Those are reachable, so they'd get copied into the new "real" pack (and then their cruft pack eventually deleted). -Peff ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] enable core.fsyncObjectFiles by default 2018-01-23 0:47 ` Jeff King @ 2018-01-23 5:45 ` Theodore Ts'o 2018-01-23 16:17 ` Jeff King 0 siblings, 1 reply; 37+ messages in thread From: Theodore Ts'o @ 2018-01-23 5:45 UTC (permalink / raw) To: Jeff King Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, Christoph Hellwig, Linus Torvalds, Git Mailing List, linux-fsdevel, Chris Mason On Mon, Jan 22, 2018 at 07:47:10PM -0500, Jeff King wrote: > > I think Ævar is talking about the case of: > > 1. You make 100 objects that aren't referenced. They're loose. > > 2. You run git-gc. They're still too recent to be deleted. > > Right now those recent loose objects sit loose, and have zero cost at > the time of gc. In a "cruft pack" world, you'd pay some I/O to copy > them into the cruft pack, and some CPU to zlib and delta-compress them. > I think that's probably fine, though. I wasn't assuming that git-gc would create a cruft pack --- although I guess it could. As you say, recent loose objects have relatively zero cost at the time of gc. To the extent that the gc has to read lots of loose files, there may be more seeks in the cold cache case, so there is actually *some* cost to having the loose objects, but it's not great. What I was thinking about instead is that in cases where we know we are likely to be creating a large number of loose objects (whether they are referenced or not), in a world where we will be calling fsync(2) after every single loose object being created, pack files start looking *way* more efficient. So in general, if you know you will be creating N loose objects, where N is probably around 50 or so, you'll want to create a pack instead. One of those cases is "repack -A", and in that case the loose objects are all going to be not referenced, so it would be a "cruft pack".
But there are many other cases, such as importing from another DVCS, where doing an fsync(2) after every loose object creation (and I have sometimes seen importers create them *all* loose, and not use a pack at all) is going to get extremely slow and painful. > So if we pack all the loose objects into a cruft pack, the mtime of the > cruft pack becomes the new gauge for "recent". And if we migrate objects > from old cruft pack to new cruft pack at each gc, then they'll keep > getting their mtimes refreshed, and we'll never drop them. Well, I was assuming that gc would be a special case which doesn't update the mtime of the old cruft pack. (Or more generally, any time an object gets copied out of the cruft pack, either to a loose object, or to another pack, the mtime on the source pack should not be touched.) - Ted ^ permalink raw reply [flat|nested] 37+ messages in thread
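Ted's back-of-the-envelope threshold (around 50 objects before a pack wins) can be modeled as a rough cost comparison. The constants below are assumptions chosen for illustration, not measurements; the ~33ms per fsync follows his estimate elsewhere in the thread of roughly 30 cache flushes per second on a 5400 RPM HDD, and the pack-build cost is picked so the crossover lands near his figure.

```python
# Rough cost model for the "around 50 objects" threshold; the constants
# are illustrative assumptions, not measurements.
FSYNC_MS = 33.0         # ~30 cache flushes/sec on a 5400 RPM HDD
PACK_BUILD_MS = 1600.0  # assumed CPU cost of zlib/delta-compressing a pack

def loose_cost_ms(n):
    return n * FSYNC_MS              # one fsync per loose object

def pack_cost_ms(n):
    return PACK_BUILD_MS + FSYNC_MS  # one fsync for the whole pack

def prefer_pack(n):
    return pack_cost_ms(n) < loose_cost_ms(n)

# With these assumed constants the crossover lands at N = 50:
assert not prefer_pack(49)
assert prefer_pack(50)
```

The exact crossover obviously depends on hardware, but the shape of the argument is what matters: per-object fsync cost grows linearly with N while the pack pays its fsync once.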
* Re: [PATCH] enable core.fsyncObjectFiles by default 2018-01-23 5:45 ` Theodore Ts'o @ 2018-01-23 16:17 ` Jeff King 0 siblings, 0 replies; 37+ messages in thread From: Jeff King @ 2018-01-23 16:17 UTC (permalink / raw) To: Theodore Ts'o Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, Christoph Hellwig, Linus Torvalds, Git Mailing List, linux-fsdevel, Chris Mason On Tue, Jan 23, 2018 at 12:45:53AM -0500, Theodore Ts'o wrote: > What I was thinking about instead is that in cases where we know we > are likely to be creating a large number of loose objects (whether > they are referenced or not), in a world where we will be calling fsync(2) > after every single loose object being created, pack files start > looking *way* more efficient. So in general, if you know you will be > creating N loose objects, where N is probably around 50 or so, you'll > want to create a pack instead. > > One of those cases is "repack -A", and in that case the loose objects > are all going to be not referenced, so it would be a "cruft pack". But > there are many other cases, such as importing from another DVCS, where > doing an fsync(2) after every loose object creation (and I have > sometimes seen importers create them *all* loose, and not use a pack > at all) is going to get extremely slow and painful. Ah, I see. I think in the general case of git operations this is hard (because most object writes don't realize the larger operation that they're a part of). But I agree that those two are the low-hanging fruit (imports should already be using fast-import, and "cruft packs" are not too hard an idea to implement). I agree that a cruft-pack implementation could just be for "repack -A", and does not have to collect otherwise loose objects. I think part of my confusion was that you and I are coming to the idea from different angles: you care about minimizing fsyncs, and I'm interested in stopping the problem where you have too many loose objects after running auto-gc.
So I care more about collecting those loose objects for that case. > > So if we pack all the loose objects into a cruft pack, the mtime of the > > cruft pack becomes the new gauge for "recent". And if we migrate objects > > from old cruft pack to new cruft pack at each gc, then they'll keep > > getting their mtimes refreshed, and we'll never drop them. > > Well, I was assuming that gc would be a special case which doesn't update the > mtime of the old cruft pack. (Or more generally, any time an object > gets copied out of the cruft pack, either to a loose object, or to > another pack, the mtime on the source pack should not be touched.) Right, that's the "you have multiple cruft packs" idea which has been discussed[1] (each one just hangs around until its mtime expires, and may duplicate objects found elsewhere). That does end up with one pack per gc, which just punts the "too many loose objects" to "too many packs". But unless the number of gc runs you do is very high compared to the expiration time, we can probably ignore that. -Peff [1] https://public-inbox.org/git/20170610080626.sjujpmgkli4muh7h@sigill.intra.peff.net/ ^ permalink raw reply [flat|nested] 37+ messages in thread
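The "multiple cruft packs" variant can be sketched like this. The class and function are invented for illustration; the rules that gc never refreshes a cruft pack's mtime, and that rescued objects are copied rather than moved, follow the discussion above.

```python
import time

EXPIRE = 14 * 24 * 3600  # assumed expiry window ("2 weeks")

class CruftPack:
    # Hypothetical stand-in for a cruft pack on disk.
    def __init__(self, objects, mtime):
        self.objects = set(objects)
        self.mtime = mtime  # gc deliberately never refreshes this

def gc(cruft_packs, rescued, now):
    """Expire whole cruft packs by their own mtime.

    Objects referenced again ('rescued') are copied, not moved, out of
    their cruft pack, so the source pack's mtime stays untouched and it
    can still age out on schedule.
    """
    copied_out = {o for p in cruft_packs for o in p.objects if o in rescued}
    survivors = [p for p in cruft_packs if now - p.mtime < EXPIRE]
    return survivors, copied_out

now = time.time()
old = CruftPack({'a', 'b'}, now - 30 * 24 * 3600)  # past the window
new = CruftPack({'c'}, now - 3600)                  # still recent
survivors, copied = gc([old, new], rescued={'a'}, now=now)
assert survivors == [new]  # the old pack ages out wholesale
assert copied == {'a'}     # but 'a' was copied to a real pack first
```

This is also where the "one pack per gc" accumulation shows up: each gc run appends one more CruftPack to the list until its window passes.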
* Re: [PATCH] enable core.fsyncObjectFiles by default 2018-01-22 15:09 ` Ævar Arnfjörð Bjarmason 2018-01-22 18:09 ` Theodore Ts'o @ 2018-01-23 0:25 ` Jeff King 1 sibling, 0 replies; 37+ messages in thread From: Jeff King @ 2018-01-23 0:25 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Junio C Hamano, Theodore Ts'o, Christoph Hellwig, Linus Torvalds, Git Mailing List, linux-fsdevel, Chris Mason On Mon, Jan 22, 2018 at 04:09:23PM +0100, Ævar Arnfjörð Bjarmason wrote: > > Yes, a "cruft pack" that holds unreachable object has been discussed > > a few times recently on list, and I do agree that it is a desirable > > thing to have in the longer run. > > > > What's tricky is to devise a way to allow us to salvage objects that > > are placed in a cruft pack because they are accessed recently, > > proving themselves to be no longer crufts. It could be that a good > > way to resurrect them is to explode them to loose form when they are > > accessed out of a cruft pack. We need to worry about interactions > > with read-only users if we go that route, but with the current > > "explode unreachable to loose, touch their mtime when they are > > accessed" scheme ends up ignoring accesses from read-only users that > > cannot update mtime, so it might not be too bad. > > Wouldn't it also make gc pruning more expensive? Now you can repack > regularly and loose objects will be left out of the pack, and then just > rm'd, whereas now it would entail creating new packs (unless the whole > pack was objects meant for removal). > > Probably still worth it, but something to keep in mind. That's a good point. I think it would be OK in practice, though, since normal operations don't tend to create a huge number of unreachable loose objects (at least compared to the _reachable_ loose objects, which we're already dealing with). We tend to get unbalanced explosions of loose objects only because a huge chunk of packed history expired. 
It is something to keep in mind when implementing the scheme, though. Luckily we already have the right behavior implemented via --pack-loose-unreachable (which is used for "repack -k" currently), so I think it would just be a matter of passing the right flags from git-repack. -Peff ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] enable core.fsyncObjectFiles by default 2018-01-18 16:27 ` Christoph Hellwig 2018-01-19 19:08 ` Junio C Hamano @ 2018-01-21 21:32 ` Chris Mason 1 sibling, 0 replies; 37+ messages in thread From: Chris Mason @ 2018-01-21 21:32 UTC (permalink / raw) To: Christoph Hellwig, Linus Torvalds Cc: Theodore Ts'o, Ævar Arnfjörð Bjarmason, Junio C Hamano, Git Mailing List, linux-fsdevel On 01/18/2018 11:27 AM, Christoph Hellwig wrote: > [adding Chris to the Cc list - this is about the awful ext3 data=ordered > behavior of syncing the whole file system data and metadata on each > fsync] > > On Wed, Jan 17, 2018 at 03:57:53PM -0800, Linus Torvalds wrote: >> On Wed, Jan 17, 2018 at 3:52 PM, Theodore Ts'o <tytso@mit.edu> wrote: >>> >>> Well, let's be fair; this is something *ext3* got wrong, and it was >>> the default file system back then. >> >> I'm pretty sure reiserfs and btrfs did too.. > > I'm pretty sure btrfs never did, and reiserfs at least looks like > it currently doesn't but I'd have to dig into history to check if > it ever did. > Christoph has this right, btrfs only fsyncs the one file that you've asked for, and unrelated data/metadata won't be included. We've seen big fsync stalls on ext4 caused by data=ordered, so it's still possible to trigger on ext4, but much better than ext3. I do share Ted's concern about the perf impact of the fsyncs on tons of loose files, but the defaults should be safety first. -chris ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] enable core.fsyncObjectFiles by default 2018-01-17 23:52 ` Theodore Ts'o 2018-01-17 23:57 ` Linus Torvalds @ 2020-09-17 11:06 ` Ævar Arnfjörð Bjarmason 2020-09-17 14:14 ` Christoph Hellwig 2020-09-17 15:30 ` Junio C Hamano 1 sibling, 2 replies; 37+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2020-09-17 11:06 UTC (permalink / raw) To: Theodore Ts'o Cc: Linus Torvalds, Junio C Hamano, Christoph Hellwig, Git Mailing List, linux-fsdevel, Jacob Vosmaer On Thu, Jan 18 2018, Theodore Ts'o wrote: > On Wed, Jan 17, 2018 at 02:07:22PM -0800, Linus Torvalds wrote: >> >> Now re-do the test while another process writes to a totally unrelated >> huge file (say, do an ISO file copy or something). >> >> That was the thing that several filesystems get completely and >> horribly wrong. Generally _particularly_ the logging filesystems that >> don't even need the fsync, because they use a single log for >> everything (so fsync serializes all the writes, not just the writes to >> the one file it's fsync'ing). > > Well, let's be fair; this is something *ext3* got wrong, and it was > the default file system back then. All of the modern file systems now > do delayed allocation, which means that an fsync of one file doesn't > actually imply an fsync of another file. Hence... > >> The original git design was very much to write each object file >> without any syncing, because they don't matter since a new object file >> - by definition - isn't really reachable. Then sync before writing the >> index file or a new ref. > > This isn't really safe any more. Yes, there's a single log. But > files which are subject to delayed allocation are in the page cache, > and just because you fsync the index file doesn't mean that the object > file is now written to disk. It was true for ext3, but it's not true > for ext4, xfs, btrfs, etc.
> > The good news is that if you have another process downloading a huge > ISO image, the fsync of the index file won't force the ISO file to be > written out. The bad news is that it won't force out the other git > object files, either. > > Now, there is a potential downside of fsync'ing each object file, and > that is the cost of doing a CACHE FLUSH on a HDD is non-trivial, and > even on a SSD, it's not optimal to call CACHE FLUSH thousands of times > in a second. So if you are creating thousands of tiny files, and you > fsync each one, each fsync(2) call is a serializing instruction, which > means it won't return until that one file is written to disk. If you > are writing lots of small files, and you are using a HDD, you'll be > bottlenecked to around 30 files per second on a 5400 RPM HDD, and this > is true regardless of what file system you use, because the > bottleneck is the CACHE FLUSH operation, and how you organize the metadata > and how you do the block allocation are largely lost in the noise > compared to the CACHE FLUSH command, which serializes everything. > > There are solutions to this; you could simply not call fsync(2) a > thousand times, and instead write a pack file, and call fsync once on > the pack file. That's probably the smartest approach. > > You could also create a thousand threads, and call fsync(2) on those > thousand threads at roughly the same time. Or you could use a > bleeding edge kernel with the latest AIO patch, and use the newly > added IOCB_CMD_FSYNC support. > > But I'd simply recommend writing a pack and fsync'ing the pack, > instead of trying to write a gazillion object files. (git-repack -A, > I'm looking at you....) > > - Ted [I didn't find an ideal message to reply to in this thread, but this seemed to probably be the best] Just an update on this since I went back and looked at this thread: GitLab turned on core.fsyncObjectFiles=true by default about a year ago.
The reason is detailed in [1], tl;dr: empty loose object file issue on ext4 allegedly caused by a lack of core.fsyncObjectFiles=true, but I didn't do any root cause analysis. Just noting it here for future reference. 1. https://gitlab.com/gitlab-org/gitlab-foss/-/issues/51680#note_180508774 ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] enable core.fsyncObjectFiles by default 2020-09-17 11:06 ` Ævar Arnfjörð Bjarmason @ 2020-09-17 14:14 ` Christoph Hellwig 2020-09-17 15:30 ` Junio C Hamano 1 sibling, 0 replies; 37+ messages in thread From: Christoph Hellwig @ 2020-09-17 14:14 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Theodore Ts'o, Linus Torvalds, Junio C Hamano, Christoph Hellwig, Git Mailing List, linux-fsdevel, Jacob Vosmaer On Thu, Sep 17, 2020 at 01:06:50PM +0200, Ævar Arnfjörð Bjarmason wrote: > The reason is detailed in [1], tl;dr: empty loose object file issue on > ext4 allegedly caused by a lack of core.fsyncObjectFiles=true, but I > didn't do any root cause analysis. Just noting it here for future > reference. All the modern Linux file systems first write the data, and then only write the metadata after the data write has finished. So your data might have been partially or fully on disk, but until the transaction committing the size change and/or extent state change has completed, you're not going to be able to read it back. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] enable core.fsyncObjectFiles by default 2020-09-17 11:06 ` Ævar Arnfjörð Bjarmason 2020-09-17 14:14 ` Christoph Hellwig @ 2020-09-17 15:30 ` Junio C Hamano 1 sibling, 0 replies; 37+ messages in thread From: Junio C Hamano @ 2020-09-17 15:30 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Theodore Ts'o, Linus Torvalds, Christoph Hellwig, Git Mailing List, linux-fsdevel, Jacob Vosmaer Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > [I didn't find an ideal message to reply to in this thread, but this > seemed to probably be the best] > > Just an update on this since I went back and looked at this thread, > GitLab about ~1yr ago turned on core.fsyncObjectFiles=true by > default. > > The reason is detailed in [1], tl;dr: empty loose object file issue on > ext4 allegedly caused by a lack of core.fsyncObjectFiles=true, but I > didn't do any root cause analysis. Just noting it here for future > reference. > > 1. https://gitlab.com/gitlab-org/gitlab-foss/-/issues/51680#note_180508774 Thanks for bringing the original discussion back. I do recall the discussion, and the list of issues to think about raised by Peff back then in [2] is still relevant. 2. https://public-inbox.org/git/20180117205509.GA14828@sigill.intra.peff.net/ ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] enable core.fsyncObjectFiles by default 2018-01-17 18:48 [PATCH] enable " Christoph Hellwig 2018-01-17 19:04 ` Junio C Hamano @ 2018-01-17 20:55 ` Jeff King 2018-01-17 21:10 ` Christoph Hellwig 1 sibling, 1 reply; 37+ messages in thread From: Jeff King @ 2018-01-17 20:55 UTC (permalink / raw) To: Christoph Hellwig; +Cc: git, linux-fsdevel On Wed, Jan 17, 2018 at 07:48:28PM +0100, Christoph Hellwig wrote: > fsync is required for data integrity as there is no guarantee that > data makes it to disk at any specified time without it. Even for > ext3 with data=ordered mode the file system will only commit all > data at some point in time that is not guaranteed. > > I've lost data on development machines countless > times due to the lack of this option, and now lost trees on a > git server with ext4 as well yesterday. It's time to make git > safe by default. I'm definitely sympathetic, and I've contemplated a patch like this a few times. But I'm not sure we're "safe by default" here after this patch. In particular: 1. This covers only loose objects. We generally sync pack writes already, so we're covered there. But we do not sync ref updates at all, which we'd probably want to in a default-safe setup (a common post-crash symptom I've seen is zero-length ref files). 2. Is it sufficient to fsync() the individual file's descriptors? We often do other filesystem operations (like hardlinking or renaming) that also need to be committed to disk before an operation can be considered saved. 3. Related to (2), we often care about the order of metadata commits. E.g., a common sequence is: a. Write object contents to tempfile. b. rename() or hardlink tempfile to final name. c. Write object name into ref.lock file. d. rename() ref.lock to ref If we see (d) but not (b), then the result is a corrupted repository. Is this guaranteed by ext4 journaling with data=ordered? It may be that data=ordered gets us what we need for (2) and (3).
But I think at the very least we should consider fsyncing ref updates based on a config option, too. -Peff ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH] enable core.fsyncObjectFiles by default 2018-01-17 20:55 ` Jeff King @ 2018-01-17 21:10 ` Christoph Hellwig 0 siblings, 0 replies; 37+ messages in thread From: Christoph Hellwig @ 2018-01-17 21:10 UTC (permalink / raw) To: Jeff King; +Cc: Christoph Hellwig, git, linux-fsdevel On Wed, Jan 17, 2018 at 03:55:09PM -0500, Jeff King wrote: > I'm definitely sympathetic, and I've contemplated a patch like this a > few times. But I'm not sure we're "safe by default" here after this > patch. In particular: > > 1. This covers only loose objects. We generally sync pack writes > already, so we're covered there. But we do not sync ref updates at > all, which we'd probably want to in a default-safe setup (a common > post-crash symptom I've seen is zero-length ref files). I've not seen them myself yet, but yes, they need an fsync. > 2. Is it sufficient to fsync() the individual file's descriptors? > We often do other filesystem operations (like hardlinking or > renaming) that also need to be committed to disk before an > operation can be considered saved. No, for metadata operations we need to fsync the directory as well. > 3. Related to (2), we often care about the order of metadata commits. > E.g., a common sequence is: > > a. Write object contents to tempfile. > > b. rename() or hardlink tempfile to final name. > > c. Write object name into ref.lock file. > > d. rename() ref.lock to ref > > If we see (d) but not (b), then the result is a corrupted > repository. Is this guaranteed by ext4 journaling with > data=ordered? It is not generally guaranteed by Linux file system semantics. Various file systems will actually start writeback of file data before rename, but not actually wait on it. ^ permalink raw reply [flat|nested] 37+ messages in thread
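The pattern this exchange converges on (fsync the file's contents, rename into place, then also fsync the containing directory so the new directory entry itself is durable) can be sketched portably in Python. This is an illustration only: git's real implementation is in C, and the helper name here is invented.

```python
import os
import tempfile

def durable_write(path, data):
    """Crash-safe(ish) file creation: write a tempfile, fsync it, rename
    into place, then fsync the containing directory. The directory fsync
    is the step an fd-only fsync misses, per the discussion above."""
    dirpath = os.path.dirname(os.path.abspath(path))
    tmp = os.path.join(dirpath, '.tmp-write')
    with open(tmp, 'wb') as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())   # file contents reach stable storage
    os.rename(tmp, path)       # atomic within a single filesystem
    dirfd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(dirfd)        # commit the rename's directory entry
    finally:
        os.close(dirfd)

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, 'object')
durable_write(target, b'example contents')
assert open(target, 'rb').read() == b'example contents'
```

Note that even this per-file pattern does not by itself give the a-through-d ordering Peff asks about; for that, each dependent step (the ref write) would need to start only after the pattern completes for the object file.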
end of thread, other threads:[~2020-09-17 15:31 UTC | newest] Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-06-23 21:57 [PATCH] Enable core.fsyncObjectFiles by default Stefan Beller 2015-06-23 22:21 ` Junio C Hamano 2015-06-23 23:29 ` Theodore Ts'o 2015-06-24 5:32 ` Junio C Hamano 2015-06-24 14:30 ` Theodore Ts'o 2015-06-24 1:07 ` Duy Nguyen 2015-06-24 3:37 ` Jeff King 2015-06-24 5:20 ` Junio C Hamano 2018-01-17 18:48 [PATCH] enable " Christoph Hellwig 2018-01-17 19:04 ` Junio C Hamano 2018-01-17 19:35 ` Christoph Hellwig 2018-01-17 20:05 ` Andreas Schwab 2018-01-17 19:37 ` Matthew Wilcox 2018-01-17 19:42 ` Christoph Hellwig 2018-01-17 21:44 ` Ævar Arnfjörð Bjarmason 2018-01-17 22:07 ` Linus Torvalds 2018-01-17 22:25 ` Linus Torvalds 2018-01-17 23:16 ` Ævar Arnfjörð Bjarmason 2018-01-17 23:42 ` Linus Torvalds 2018-01-17 23:52 ` Theodore Ts'o 2018-01-17 23:57 ` Linus Torvalds 2018-01-18 16:27 ` Christoph Hellwig 2018-01-19 19:08 ` Junio C Hamano 2018-01-20 22:14 ` Theodore Ts'o 2018-01-20 22:27 ` Junio C Hamano 2018-01-22 15:09 ` Ævar Arnfjörð Bjarmason 2018-01-22 18:09 ` Theodore Ts'o 2018-01-23 0:47 ` Jeff King 2018-01-23 5:45 ` Theodore Ts'o 2018-01-23 16:17 ` Jeff King 2018-01-23 0:25 ` Jeff King 2018-01-21 21:32 ` Chris Mason 2020-09-17 11:06 ` Ævar Arnfjörð Bjarmason 2020-09-17 14:14 ` Christoph Hellwig 2020-09-17 15:30 ` Junio C Hamano 2018-01-17 20:55 ` Jeff King 2018-01-17 21:10 ` Christoph Hellwig