All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Eric Wong <e@80x24.org>
Cc: git@vger.kernel.org, Neeraj Singh <nksingh85@gmail.com>
Subject: Re: [PATCH] allow disabling fsync everywhere
Date: Fri, 29 Oct 2021 13:20:29 +0200	[thread overview]
Message-ID: <211029.86v91g3vv3.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <20211028002102.19384-1-e@80x24.org>


On Thu, Oct 28 2021, Eric Wong wrote:

> "core.fsync" and the "GIT_FSYNC" environment variable now exist
> for disabling fsync() even on packfiles and other critical data.

First off, I think this would be useful to add, even as a non-test
thing.

There's plenty of use-cases where users don't care about fsync(), and
aside from data reliability it also has implications for say laptop
battery use.

I care about my E-Mail, but my E-Mail syncing thingy's (mbsync) has an
fsync=off in its config, I'm pretty confident that I won't lose power on
that *nix laptop, and even if I did I have other recovery methods.

I wouldn't use this in general, but might for some operations, e.g. when
I'm doing some batch operation with git-annex or whatever.

> Running "make test -j8 NO_SVN_TESTS=1" on a noisy 8-core system
> on an HDD, adding "GIT_FSYNC=0" to the invocation brings my test
> runtime from ~4 minutes down to ~3 minutes.

On a related topic: Have you tried having it use an empty template
directory? I have some unsubmitted patches for that, and if you retain
all trash dirs you'll find that we have IIRC 100-200MB of just
accumulated *.sample etc. hooks, out of ~1GB total for the tests.

> +core.fsync::
> +	A boolean to control 'fsync()' on all files.
> ++
> +Setting this to false can speed up writes of unimportant data.
> +Disabling fsync will lead to data loss on power failure.  If set
> +to false, this also overrides core.fsyncObjectFiles.  Defaults to true.
> +

Per the recent threads about fsync semantics this should really be
worded to be more scary, i.e. it's not guaranteed that your data is
on-disk at all. So subsequent git operations might see repo corruption,
if we're being POSIX-pedantic.

So some more "all bets are off" wording would be better here, and any
extra promises could be e.g.:

    On commonly deployed OS's such as Linux, OSX etc. a lack of fsync()
    gives you extra guarantees that are non-standard, primarily that a
    subsequent running process that accesses the data will see the
    updated version, as the VFS will serve up the new version, even if
    it's not on-disk yet. Even these light guarantees may not be
    something you're getting from our OS. Here be dragons!

> [...]
> --- a/git-cvsserver.perl
> +++ b/git-cvsserver.perl
> @@ -3607,6 +3607,25 @@ package GITCVS::updater;
>  use strict;
>  use warnings;
>  use DBI;
> +our $_use_fsync;

s/our/my/?

> +# n.b. consider using Git.pm
> +sub use_fsync {
> +    if (!defined($_use_fsync)) {
> +        my $x = $ENV{GIT_FSYNC};
> +        if (defined $x) {
> +            local $ENV{GIT_CONFIG};
> +            delete $ENV{GIT_CONFIG};
> +            my $v = ::safe_pipe_capture('git', '-c', "core.fsync=$x",
> +                                        qw(config --type=bool core.fsync));
> +            $_use_fsync = defined($v) ? ($v eq "true\n") : 1;
> +        } else {
> +            my $v = `git config --type=bool core.fsync`;
> +            $_use_fsync = $v eq "false\n" ? 0 : 1;
> +        }
> +    }
> +    $_use_fsync;
> +}

I wonder most of all if we really need these perl changes, does it
really matter for the overall performance that you want, or was it just
to get all fsync() uses out of the test suite?

This is a more general comment, but so much of scaffolding like that in
the *.perl scripts could go away if we taught git.c some "not quite a
builtin, but it's ours" mode, and had say code for git-send-email,
git-svn etc. to just set the various data they need in the environment
before we invoke them. Then this would just be say:

    our $use_fsync = $ENV{GIT_FOR_CVSSERVER_FSYNC};


> +    if ($self->{dbdriver} eq 'SQLite' && !use_fsync()) {
> +        $self->{dbh}->do('PRAGMA synchronous = OFF');
> +    }

..in particular if we're doing this we should update the docs to say
that we'll also tweak third-party software's use of fsync() in some
cases.

Even if we "own" that sqlite DB a user might not expect a git option to
have gone so far as to be the cause of their on-disk sqlite DB's
corruption.

> [...]
>  	my $sync;
>  	# both of these options make our .rev_db file very, very important
>  	# and we can't afford to lose it because rebuild() won't work
> -	if ($self->use_svm_props || $self->no_metadata) {
> +	if (($self->use_svm_props || $self->no_metadata) && use_fsync()) {
>  		require File::Copy;
>  		$sync = 1;
>  		File::Copy::copy($db, $db_lock) or die "rev_map_set(@_): ",

This doesn't just impact $io->sync, but also $io->flush, which is a
different thing. PerlIO flushing to the OS != fsync().

So even on OS's that make the above noted guarantees you'd get nothing,
since we're now at the mercy of perl not crashing in the interim, not
kernel or stdlib behavior.

> @@ -57,7 +58,9 @@ void fprintf_or_die(FILE *f, const char *fmt, ...)
>  
>  void fsync_or_die(int fd, const char *msg)
>  {
> -	while (fsync(fd) < 0) {
> +	if (use_fsync < 0)
> +		use_fsync = git_env_bool("GIT_FSYNC", 1);
> +	while (use_fsync && fsync(fd) < 0) {
>  		if (errno != EINTR)
>  			die_errno("fsync error on '%s'", msg);
>  	}

This patch direction looks good to me, aside from the above we should
really update any other fsync() docs we have, maybe with a
cross-reference in core.fsyncObjectFiles?

I'm not sure offhand what the state of the other fsync patches that
Neeraj Singh has been submitting is, but let's make sure that whatever
docs/config knobs etc. we have all play nicely together, and are
somewhat future-proof if possible.

  parent reply	other threads:[~2021-10-29 11:37 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-28  0:21 [PATCH] allow disabling fsync everywhere Eric Wong
2021-10-28  1:21 ` Eric Sunshine
2021-10-28 14:36 ` Jeff King
2021-10-28 18:28   ` Eric Wong
2021-10-28 19:29     ` Junio C Hamano
2021-10-29  0:15       ` [PATCH] tests: disable " Eric Wong
2021-10-29  5:18         ` Junio C Hamano
2021-10-29  7:56           ` Eric Wong
2021-10-29 18:12             ` Junio C Hamano
2021-10-29  7:33         ` Junio C Hamano
2021-10-29  7:48           ` Eric Wong
2021-10-29 17:22             ` Junio C Hamano
2021-10-29 20:34         ` Jeff King
2021-10-29 20:42           ` Junio C Hamano
2021-10-28 21:40     ` [PATCH] allow disabling " brian m. carlson
2021-10-29 11:20 ` Ævar Arnfjörð Bjarmason [this message]
2021-10-30 10:39   ` Eric Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=211029.86v91g3vv3.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=nksingh85@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.