git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org, tytso@mit.edu,
	Christoph Hellwig <hch@lst.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [RFC PATCH 2/2] core.fsyncObjectFiles: make the docs less flippant
Date: Fri, 9 Oct 2020 12:44:36 +0200 (CEST)	[thread overview]
Message-ID: <nycvar.QRO.7.76.6.2010091219480.50@tvgsbejvaqbjf.bet> (raw)
In-Reply-To: <87eem8hfrp.fsf@evledraar.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5249 bytes --]

Hi Ævar,

On Thu, 8 Oct 2020, Ævar Arnfjörð Bjarmason wrote:

> On Thu, Oct 08 2020, Johannes Schindelin wrote:
>
> > On Thu, 17 Sep 2020, Junio C Hamano wrote:
> >
> >> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
> >>
> >> > As amusing as Linus's original prose[1] is here it doesn't really explain
> >> > in any detail to the uninitiated why you would or wouldn't enable
> >> > this, and the counter-intuitive reason for why git wouldn't fsync your
> >> > precious data.
> >> >
> >> > So elaborate (a lot) on why this may or may not be needed. This is my
> >> > best-effort attempt to summarize the various points raised in the last
> >> > ML[2] discussion about this.
> >> >
> >> > 1.  aafe9fbaf4 ("Add config option to enable 'fsync()' of object
> >> >     files", 2008-06-18)
> >> > 2. https://lore.kernel.org/git/20180117184828.31816-1-hch@lst.de/
> >> >
> >> > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> >> > ---
> >> >  Documentation/config/core.txt | 42 ++++++++++++++++++++++++++++++-----
> >> >  1 file changed, 36 insertions(+), 6 deletions(-)
> >>
> >> When I saw the subject in my mailbox, I expected to see that you
> >> would resurrect Christoph's updated text in [*1*], but you wrote a
> >> whole lot more ;-) And they are quite informative to help readers to
> >> understand what the option does.  I am not sure if the understanding
> >> directly help readers to decide if it is appropriate for their own
> >> repositories, though X-<.
> >
> > I agree that it is an improvement, and am therefore in favor of applying
> > the patch.
>
> Just the improved docs, or flipping the default of core.fsyncObjectFiles
> to "true"?

I am actually also in favor of flipping the default. We carry
https://github.com/git-for-windows/git/commit/14dad078c28159b250be599c0890ece2d6f4d635
in Git for Windows for over three years. The commit message:

	mingw: change core.fsyncObjectFiles = 1 by default

	From the documentation of said setting:

		This boolean will enable fsync() when writing object files.

		This is a total waste of time and effort on a filesystem that
		orders data writes properly, but can be useful for filesystems
		that do not use journalling (traditional UNIX filesystems) or
		that only journal metadata and not file contents (OS X’s HFS+,
		or Linux ext3 with "data=writeback").

	The most common file system on Windows (NTFS) does not guarantee that
	order, therefore a sudden loss of power (or any other event causing an
	unclean shutdown) would cause corrupt files (i.e. files filled with
	NULs). Therefore we need to change the default.

	Note that the documentation makes it sound as if this causes really bad
	performance. In reality, writing loose objects is something that is done
	only rarely, and only a handful of files at a time.

	Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

The patch itself limits this change to Windows, but if this becomes a
platform-independent change, all the better for me!

> I've been meaning to re-roll this. I won't have time anytime soon to fix
> git's fsync() use, i.e. ensure that we run up & down modified
> directories and fsync()/fdatasync() file/dir fd's as appropriate but I
> think documenting it and changing the core.fsyncObjectFiles default
> makes sense and is at least a step in the right direction.

Agreed.

> I do think it makes more sense for a v2 to split most of this out into
> some section that generally discusses data integrity in the .git
> directory. I.e. that says that currently where we use fsync() (such as
> pack/commit-graph writes) we don't fsync() the corresponding
> director{y,ies), and ref updates don't fsync() at all.
>
> Where to put that though? gitrepository-layout(5)? Or a new page like
> gitrepository-integrity(5) (other suggestions welcome..).

I think `gitrepository-layout` is probably the best location for now.

> Looking at the code again it seems easier than I thought to make this
> right if we ignore .git/refs (which reftable can fix for us). Just:
>
> 1. Change fsync_or_die() and its callsites to also pass/sync the
>    containing directory, which is always created already
>    (e.g. .git/objects/pack/)...).
>
> 2. ..Or in the case where it's not created already such as
>    .git/objects/??/ (or .git/objects/pack/) itself) it's not N-deep like
>    the refs hierarchy, so "did we create it" state is pretty simple, or
>    we can just always do it unconditionally.
>
> 3. Without reftable the .git/refs/ case shouldn't be too hard if we're
>    OK with redundantly fsyncing all the way down, i.e. to make it
>    simpler by not tracking the state of exactly what was changed.
>
> 4. Now that I'm writing this there's also .git/{config,rr-cache} and any
>    number of other things we need to change for 100% coverage, but the
>    above 1-3 should be good enough for repo integrity where repo = refs
>    & objects.

I fear that the changes to `fsync` also the directory need to be guarded
behind a preprocessor flag, though: if you try to `open()` a directory on
Windows, it fails (our emulated `open()` sets `errno = EISDIR`).

Ciao,
Dscho

  parent reply	other threads:[~2020-10-09 13:02 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-17 18:48 [PATCH] enable core.fsyncObjectFiles by default Christoph Hellwig
2018-01-17 19:04 ` Junio C Hamano
2018-01-17 19:35   ` Christoph Hellwig
2018-01-17 20:05     ` Andreas Schwab
2018-01-17 19:37   ` Matthew Wilcox
2018-01-17 19:42     ` Christoph Hellwig
2018-01-17 21:44   ` Ævar Arnfjörð Bjarmason
2018-01-17 22:07     ` Linus Torvalds
2018-01-17 22:25       ` Linus Torvalds
2018-01-17 23:16       ` Ævar Arnfjörð Bjarmason
2018-01-17 23:42         ` Linus Torvalds
2018-01-17 23:52       ` Theodore Ts'o
2018-01-17 23:57         ` Linus Torvalds
2018-01-18 16:27           ` Christoph Hellwig
2018-01-19 19:08             ` Junio C Hamano
2018-01-20 22:14               ` Theodore Ts'o
2018-01-20 22:27                 ` Junio C Hamano
2018-01-22 15:09                   ` Ævar Arnfjörð Bjarmason
2018-01-22 18:09                     ` Theodore Ts'o
2018-01-23  0:47                       ` Jeff King
2018-01-23  5:45                         ` Theodore Ts'o
2018-01-23 16:17                           ` Jeff King
2018-01-23  0:25                     ` Jeff King
2018-01-21 21:32             ` Chris Mason
2020-09-17 11:06         ` Ævar Arnfjörð Bjarmason
2020-09-17 11:28           ` [RFC PATCH 0/2] should core.fsyncObjectFiles fsync the dir entry + docs Ævar Arnfjörð Bjarmason
2020-09-17 11:28           ` [RFC PATCH 1/2] sha1-file: fsync() loose dir entry when core.fsyncObjectFiles Ævar Arnfjörð Bjarmason
2020-09-17 13:16             ` Jeff King
2020-09-17 15:09               ` Christoph Hellwig
2020-09-17 14:09             ` Christoph Hellwig
2020-09-17 14:55               ` Jeff King
2020-09-17 14:56                 ` Christoph Hellwig
2020-09-17 15:37                   ` Junio C Hamano
2020-09-17 17:12                     ` Jeff King
2020-09-17 20:37                       ` Taylor Blau
2020-09-22 10:42               ` Ævar Arnfjörð Bjarmason
2020-09-17 20:21             ` Johannes Sixt
2020-09-22  8:24               ` Ævar Arnfjörð Bjarmason
2020-11-19 11:38                 ` Johannes Schindelin
2020-09-17 11:28           ` [RFC PATCH 2/2] core.fsyncObjectFiles: make the docs less flippant Ævar Arnfjörð Bjarmason
2020-09-17 14:12             ` Christoph Hellwig
2020-09-17 15:43             ` Junio C Hamano
2020-09-17 20:15               ` Johannes Sixt
2020-10-08  8:13               ` Johannes Schindelin
2020-10-08 15:57                 ` Ævar Arnfjörð Bjarmason
2020-10-08 18:53                   ` Junio C Hamano
2020-10-09 10:44                   ` Johannes Schindelin [this message]
2020-09-17 19:21             ` Marc Branchaud
2020-09-17 14:14           ` [PATCH] enable core.fsyncObjectFiles by default Christoph Hellwig
2020-09-17 15:30           ` Junio C Hamano
2018-01-17 20:55 ` Jeff King
2018-01-17 21:10   ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.QRO.7.76.6.2010091219480.50@tvgsbejvaqbjf.bet \
    --to=johannes.schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).