From: Amir Goldstein <amir73il@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>, Jan Kara <jack@suse.cz>,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	Chris Mason <clm@fb.com>, Al Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	Ext4 <linux-ext4@vger.kernel.org>,
	Linux Btrfs <linux-btrfs@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>
Subject: Re: [RFC][PATCH] link.2: AT_ATOMIC_DATA and AT_ATOMIC_METADATA
Date: Sat, 1 Jun 2019 11:01:42 +0300
Message-ID: <CAOQ4uxi99NDYMrz-Q7xKta4beQiYFX3-MipZ_RxFNktFTA=vMA@mail.gmail.com>
In-Reply-To: <20190531232852.GG29573@dread.disaster.area>

On Sat, Jun 1, 2019 at 2:28 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Sat, Jun 01, 2019 at 08:45:49AM +1000, Dave Chinner wrote:
> > Given that we can already use AIO to provide this sort of ordering,
> > and AIO is vastly faster than synchronous IO, I don't see any point
> > in adding complex barrier interfaces that can be /easily implemented
> > in userspace/ using existing AIO primitives. You should start
> > thinking about expanding libaio with stuff like
> > "link_after_fdatasync()" and suddenly the whole problem of
> > filesystem data vs metadata ordering goes away because the
> > application directly controls all ordering without blocking and
> > doesn't need to care what the filesystem under it does....
>
> And let me point out that this is also how userspace can do an
> efficient atomic rename - rename_after_fdatasync(). i.e. on
> completion of the AIO_FSYNC, run the rename. This guarantees that
> the application will see either the old file or the complete new
> file, and it *doesn't have to wait for the operation to complete*.
> Once it is in flight, the file will contain the old data until some
> point in the near future when it will contain the new data....
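
For reference, the ordering described in the quoted text can be sketched in
userspace roughly as follows. This is only an illustration under stated
assumptions: it emulates the asynchronous fsync with a worker thread from
Python's standard library rather than with kernel AIO (AIO_FSYNC), and the
rename_after_fdatasync() name is borrowed from the suggestion above, not from
any existing libaio function. It shows ordering only; making the rename itself
durable would still need an fsync of the parent directory.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Worker pool standing in for the kernel AIO context (an assumption of this sketch).
_pool = ThreadPoolExecutor(max_workers=4)


def rename_after_fdatasync(fd, src, dst):
    """Queue fdatasync(fd), then rename src -> dst once it has completed.

    Returns a Future, so the caller does not block unless it asks for the result.
    """
    def work():
        os.fdatasync(fd)      # the new data reaches stable storage first
        os.rename(src, dst)   # only then does the new file replace the old one
        return dst

    return _pool.submit(work)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "config")
        tmp = os.path.join(d, "config.tmp")
        with open(target, "w") as f:
            f.write("old")
        fd = os.open(tmp, os.O_WRONLY | os.O_CREAT, 0o644)
        os.write(fd, b"new")
        future = rename_after_fdatasync(fd, tmp, target)
        # The application is free to do other work here; readers of
        # `target` see either the old or the complete new contents.
        future.result()
        os.close(fd)
```

The fdatasync and the rename run in the same worker, so the rename can never
be issued before the data sync has returned.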

What I am looking for is a way to isolate the effects of "atomic rename/link"
from the rest of the users. Sure, I/O bandwidth and queued bios are shared,
but at least other threads working on other files or metadata should not have
to contend with the "atomic rename" thread over journal flushes and the like.
Actually, one of my use cases is "atomic rename" of files with no data
(looking for atomicity w.r.t. xattr and mtime), so this "atomic rename"
thread should not be interfering with other workloads at all.

>
> Seriously, sit down and work out all the "atomic" data vs metadata
> behaviours you want, and then tell me how many of them cannot be
> implemented as "AIO_FSYNC w/ completion callback function" in
> userspace. This mechanism /guarantees ordering/ at the application
> level, the application does not block waiting for these data
> integrity operations to complete, and you don't need any new kernel
> side functionality to implement this.

So I think what I could have used is AIO_BATCH_FSYNC, an interface
that was proposed by Ric Wheeler and discussed at LSF:
https://lwn.net/Articles/789024/
Ric was looking for a way to efficiently fsync a "bunch of files".
Submitting several AIO_FSYNC calls is not an efficient way of doing that.
So it is either a new AIO_BATCH_FSYNC with a kernel implementation
that flushes the inodes and then calls ->sync_fs(), or a new AIO operation
that does just the ->sync_fs() bit, with sync_file_range() used for the inodes.
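
As a rough illustration of the second option, the synchronous userspace
equivalent would look something like the sketch below: sync_file_range() on
each file, then a single syncfs(). Assumptions: Linux with a libc that exports
both calls, all files on the same filesystem, and a hypothetical helper name
batch_sync(). Note that syncfs() flushes the whole filesystem and blocks,
so this is not the proposed AIO operation, only the sequence it would replace.

```python
import ctypes
import os
import tempfile

# Flag values from <linux/fs.h>.
SYNC_FILE_RANGE_WAIT_BEFORE = 1
SYNC_FILE_RANGE_WRITE = 2
SYNC_FILE_RANGE_WAIT_AFTER = 4

_libc = ctypes.CDLL(None, use_errno=True)
_libc.sync_file_range.argtypes = [ctypes.c_int, ctypes.c_int64,
                                  ctypes.c_int64, ctypes.c_uint]
_libc.sync_file_range.restype = ctypes.c_int
_libc.syncfs.argtypes = [ctypes.c_int]
_libc.syncfs.restype = ctypes.c_int


def _check(ret):
    """Raise OSError from errno if a libc call failed."""
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def batch_sync(paths):
    """Write back each file's data, then flush the filesystem once.

    Hypothetical helper; returns the number of files processed.
    """
    if not paths:
        return 0
    flags = (SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE |
             SYNC_FILE_RANGE_WAIT_AFTER)
    fds = [os.open(p, os.O_RDONLY) for p in paths]
    try:
        for fd in fds:
            # offset 0, nbytes 0: from the start to the end of the file
            _check(_libc.sync_file_range(fd, 0, 0, flags))
        # One flush for the filesystem containing the first file.
        _check(_libc.syncfs(fds[0]))
    finally:
        for fd in fds:
            os.close(fd)
    return len(fds)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        names = []
        for i in range(3):
            name = os.path.join(d, "file%d" % i)
            with open(name, "w") as f:
                f.write("data")
            names.append(name)
        batch_sync(names)
```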

To be more accurate, the AIO operation that would emulate my
proposed API more closely is AIO_WAIT_FOR_SYNCFS. I do not wish
to impose excessive journal flushes; I just need a completion callback
once they have happened, so that I can then perform the rename/link.

>
> Fundamentally, the assertion that disk cache flushes are not what
> causes fsync "to be slow" is incorrect. It's the synchronous

Too many double negatives. I am not sure I parsed this correctly.
But I think by now you understand that I don't care that fsync is "slow".
I care about frequent fsyncs making the entire system slow down.

Heck, xfs even has a mitigation in place to improve the performance
of too-frequent fsyncs, but that mitigation is partly gone since commit
47c7d0b19502 ("xfs: fix incorrect log_flushed on fsync").

The situation with frequent fsync on ext4 at the moment is probably
worse.

I am trying to reduce the number of fsyncs issued by applications,
and converting fsync to AIO_FSYNC is not going to help with that.

Thanks,
Amir.
