Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Junio C Hamano <gitster@pobox.com>,
	Christoph Hellwig <hch@lst.de>,
	Git Mailing List <git@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Roberto Tyley <roberto.tyley@gmail.com>
Subject: Re: [PATCH] enable core.fsyncObjectFiles by default
Date: Thu, 18 Jan 2018 00:16:51 +0100
Message-ID: <87efmohy8s.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <CA+55aFzJ2QO0MH3vgbUd8X-dzg_65A-jKmEBMSVt8ST2bpmzSQ@mail.gmail.com>


On Wed, Jan 17 2018, Linus Torvalds jotted:

> On Wed, Jan 17, 2018 at 1:44 PM, Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>>     I ran a small test myself on CentOS 7 (3.10) with ext4 data=ordered
>>     on the tests I thought might do a lot of loose object writes:
>>
>>       $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS="NO_OPENSSL=Y CFLAGS=-O3 -j56" ./run origin/master fsync-on~ fsync-on p3400-rebase.sh p0007-write-cache.sh
>>       [...]
>>       Test                                                            fsync-on~         fsync-on
>>       -------------------------------------------------------------------------------------------------------
>>       3400.2: rebase on top of a lot of unrelated changes             1.45(1.30+0.17)   1.45(1.28+0.20) +0.0%
>>       3400.4: rebase a lot of unrelated changes without split-index   4.34(3.71+0.66)   4.33(3.69+0.66) -0.2%
>>       3400.6: rebase a lot of unrelated changes with split-index      3.38(2.94+0.47)   3.38(2.93+0.47) +0.0%
>>       0007.2: write_locked_index 3 times (3214 files)                 0.01(0.00+0.00)   0.01(0.00+0.00) +0.0%
>>
>>    No impact. However I did my own test of running the test suite 10%
>>    times with/without this patch, and it runs 9% slower:

That should be "10 times" b.t.w., not "10% times"

>>
>>      fsync-off: avg:21.59 21.50 21.50 21.52 21.53 21.54 21.57 21.59 21.61 21.63 21.95
>>       fsync-on: avg:23.43 23.21 23.25 23.26 23.26 23.27 23.32 23.49 23.51 23.83 23.88
>
> That's not the thing you should check.
>
> Now re-do the test while another process writes to a totally unrelated
> a huge file (say, do a ISO file copy or something).
>
> That was the thing that several filesystems get completely and
> horribly wrong. Generally _particularly_ the logging filesystems that
> don't even need the fsync, because they use a single log for
> everything (so fsync serializes all the writes, not just the writes to
> the one file it's fsync'ing).
>
> The original git design was very much to write each object file
> without any syncing, because they don't matter since a new object file
> - by definition - isn't really reachable. Then sync before writing the
> index file or a new ref.
>>
> But things have changed, I'm not arguing that the code shouldn't be
> made safe by default. I personally refuse to use rotating media on my
> machines anyway, largely exactly because of the fsync() issue (things
> like "firefox" started doing fsync on the mysql database for stupid
> things, and you'd get huge pauses).
>
> But I do think your benchmark is wrong. The case where only git does
> something is not interesting or relevant. It really is "git does
> something _and_ somebody else writes something entirely unrelated at
> the same time" that matters.

Yeah it's shitty, just a quick hack to get some since there was a
discussion of performance, but neither your original patch or this
thread had quoted any.

One thing you may have missed is that this is a parallel (56 tests at a
time) run of the full test suite. So there's plenty of other git
processes (and test setup/teardown) racing with any given git
process. Running the test suite in a loop like this gives me ~100K IO
ops/s & ~50% disk utilization.

Or does overall FS activity and raw throughput (e.g. with an ISO copy)
matter more than general FS contention?

Tweaking it to emulate this iso copy case, running another test with one
of these running concurrently:

    # setup
    dd if=/dev/urandom of=/tmp/fake.iso bs=1024 count=$((1000*1024))
    # run in a loop (shuf to not always write the same thing)
    while sleep 0.1; do shuf /tmp/fake.iso | pv >/tmp/fake.shuf.iso; done

Gives throughput that spikes to 100% (not consistently) and:

    fsync-off: avg:36.37 31.74 33.83 35.12 36.19 36.32 37.04 37.34 37.71 37.93 40.43
     fsync-on: avg:38.09 34.56 35.14 35.69 36.41 36.41 37.96 38.25 40.45 41.44 44.59

~4.7% slower, v.s. ~8.5% in my earlier
87h8rki2iu.fsf@evledraar.gmail.com without that running.

Which is not an argument for / against this patch, but those numbers
seem significant, and generally if the entire test suite slows down by
that much there's going to be sub-parts of it that are much worse.

Which might be a reason to tread more carefully and if it *does* slow
things down perhaps do it with more granularity, e.g. turning it on in
git-receive-pack might be more sensible than in git-filter-branch.

I remember Roberto Tyley's BFG writing an amazing amount of loose
objects, but it doesn't seem to have an fsync() option, I wonder if
adding one would be a representative pathological test case.

  parent reply index

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-17 18:48 Christoph Hellwig
2018-01-17 19:04 ` Junio C Hamano
2018-01-17 19:35   ` Christoph Hellwig
2018-01-17 20:05     ` Andreas Schwab
2018-01-17 19:37   ` Matthew Wilcox
2018-01-17 19:42     ` Christoph Hellwig
2018-01-17 21:44   ` Ævar Arnfjörð Bjarmason
2018-01-17 22:07     ` Linus Torvalds
2018-01-17 22:25       ` Linus Torvalds
2018-01-17 23:16       ` Ævar Arnfjörð Bjarmason [this message]
2018-01-17 23:42         ` Linus Torvalds
2018-01-17 23:52       ` Theodore Ts'o
2018-01-17 23:57         ` Linus Torvalds
2018-01-18 16:27           ` Christoph Hellwig
2018-01-19 19:08             ` Junio C Hamano
2018-01-20 22:14               ` Theodore Ts'o
2018-01-20 22:27                 ` Junio C Hamano
2018-01-22 15:09                   ` Ævar Arnfjörð Bjarmason
2018-01-22 18:09                     ` Theodore Ts'o
2018-01-23  0:47                       ` Jeff King
2018-01-23  5:45                         ` Theodore Ts'o
2018-01-23 16:17                           ` Jeff King
2018-01-23  0:25                     ` Jeff King
2018-01-21 21:32             ` Chris Mason
2020-09-17 11:06         ` Ævar Arnfjörð Bjarmason
2020-09-17 11:28           ` [RFC PATCH 0/2] should core.fsyncObjectFiles fsync the dir entry + docs Ævar Arnfjörð Bjarmason
2020-09-17 11:28           ` [RFC PATCH 1/2] sha1-file: fsync() loose dir entry when core.fsyncObjectFiles Ævar Arnfjörð Bjarmason
2020-09-17 13:16             ` Jeff King
2020-09-17 15:09               ` Christoph Hellwig
2020-09-17 14:09             ` Christoph Hellwig
2020-09-17 14:55               ` Jeff King
2020-09-17 14:56                 ` Christoph Hellwig
2020-09-17 15:37                   ` Junio C Hamano
2020-09-17 17:12                     ` Jeff King
2020-09-17 20:37                       ` Taylor Blau
2020-09-22 10:42               ` Ævar Arnfjörð Bjarmason
2020-09-17 20:21             ` Johannes Sixt
2020-09-22  8:24               ` Ævar Arnfjörð Bjarmason
2020-11-19 11:38                 ` Johannes Schindelin
2020-09-17 11:28           ` [RFC PATCH 2/2] core.fsyncObjectFiles: make the docs less flippant Ævar Arnfjörð Bjarmason
2020-09-17 14:12             ` Christoph Hellwig
2020-09-17 15:43             ` Junio C Hamano
2020-09-17 20:15               ` Johannes Sixt
2020-10-08  8:13               ` Johannes Schindelin
2020-10-08 15:57                 ` Ævar Arnfjörð Bjarmason
2020-10-08 18:53                   ` Junio C Hamano
2020-10-09 10:44                   ` Johannes Schindelin
2020-09-17 19:21             ` Marc Branchaud
2020-09-17 14:14           ` [PATCH] enable core.fsyncObjectFiles by default Christoph Hellwig
2020-09-17 15:30           ` Junio C Hamano
2018-01-17 20:55 ` Jeff King
2018-01-17 21:10   ` Christoph Hellwig
  -- strict thread matches above, loose matches on Subject: below --
2015-06-23 21:57 [PATCH] Enable " Stefan Beller
2015-06-23 22:21 ` Junio C Hamano
2015-06-23 23:29   ` Theodore Ts'o
2015-06-24  5:32     ` Junio C Hamano
2015-06-24 14:30       ` Theodore Ts'o
2015-06-24  1:07 ` Duy Nguyen
2015-06-24  3:37 ` Jeff King
2015-06-24  5:20   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87efmohy8s.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=roberto.tyley@gmail.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git