All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jamie Lokier <jamie@shareable.org>
To: Christoph Hellwig <hch@lst.de>
Cc: Christoph Hellwig <hch@infradead.org>, Jan Kara <jack@suse.cz>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org, Evgeniy Polyakov <zbr@ioremap.net>,
	ocfs2-devel@oss.oracle.com, Joel Becker <joel.becker@oracle.com>,
	Felix Blyakher <felixb@sgi.com>,
	xfs@oss.sgi.com, Anton Altaparmakov <aia21@cantab.net>,
	linux-ntfs-dev@lists.sourceforge.net,
	OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>,
	linux-ext4@vger.kernel.org, tytso@mit.edu
Subject: Re: [PATCH 07/17] vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode
Date: Sun, 30 Aug 2009 18:29:59 +0100	[thread overview]
Message-ID: <20090830172959.GD7129@shareable.org> (raw)
In-Reply-To: <20090830163917.GA23955@lst.de>

Christoph Hellwig wrote:
> Linux has sync_file_range which currently is a perfect way to lose your
> synced' data, but with two more flags and calls to ->fsync we could
> turn it into range-fsync/fdatasync.

Apart from the way it loses your data, the man page for
sync_file_range never manages to explain quite why you should use the
existing flags in various combinations.  It's only obvious if you've
worked on a kernel yourself.

Having asked this before, it appears one of the reasons for
sync_file_range working as it does is to give the application more
control over writeback order and to some extent, reduce the amount of
blocking.

But it's really difficult to manage the amount of blocking with it.
You need to know the request queue size among other things, and even
if you do it's dynamic.  Writeback order would be as easy with
fdatasync_range, and if you want to reduce blocking, a good
implementation of aio_fsync would be more useful.  Or, you have to use
application writeback threads anyway, so fdatasync_range again.

The one thing sync_file_range can do is let you submit multiple ranges
which the elevators can sort for the hardware.  You can't do that with
sequential calls to fdatasync_range, and it's not clear that aio_fsync
is implemented well enough (but it's a fairly good API for it).

Nick Piggin's idea to let fdatasync_range take multiple ranges might
help with that, but it's not clear how much.

> I'm not sure if that's a good
> idea or if we should just add a sys_fdatasync_rage systems call.

fdatasync_range has the advantage of being comprehensible.  People
will use it because it makes sense.

sync_file_range could be hijacked with new flags to implement
fdatasync_range.  If that's done, I'd rename the system call, but keep
it compatible with sync_file_range's flags, which would never be set
when userspace uses the new functionality.

> I don't quite see the point of a range-fsync, but it could be easily
> implemented as a flag.

A flags argument would be good anyway: to indicate if we want an
ordinary fdatasync, or something which flushes the relevant bit of
volatile hardware caches too.  With that as a capability, it is useful
to offer fsync, because that'd be the only way to get a volatile
hardware cache flush (or maybe the only way not to?).

For that reason, it should be permitted to give an infinitely large range.

I don't see the point of range-fsync either, but I'm not sure if I see
any harm in it.  If permitted, range-fsync with a zero-byte range
would flush just the inode state and none of the data.  If that's
technically available, maybe O_ISYNC and "#define O_SYNC
(O_DATASYNC|O_ISYNC)" isn't such as daft idea.

I'd call it fsync_range for consistency with aio_fsync (POSIX), which
takes flags O_DSYNC or O_SYNC to indicate the type of sync.  But I'd
use new flag names, to keep the space clear for other flags.  Just
sketching some ideas:

/* One of FSYNC_RANGE_SYNC or FSYNC_RANGE_DATASYNC must be set. */

#define FSYNC_RANGE_SYNC	(1 << 0)	/* Like fsync, O_SYNC. */
#define FSYNC_RANGE_DATASYNC	(1 << 1)	/* Like fdatasync, O_DSYNC. */
#define FSYNC_RANGE_NO_HWCACHE	(1 << 2)	/* Not hardware caches. */

-- Jamie

WARNING: multiple messages have this Message-ID (diff)
From: Jamie Lokier <jamie@shareable.org>
To: Christoph Hellwig <hch@lst.de>
Cc: tytso@mit.edu, linux-ext4@vger.kernel.org,
	Jan Kara <jack@suse.cz>,
	linux-ntfs-dev@lists.sourceforge.net,
	LKML <linux-kernel@vger.kernel.org>,
	Joel Becker <joel.becker@oracle.com>,
	Christoph Hellwig <hch@infradead.org>,
	Anton Altaparmakov <aia21@cantab.net>,
	OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>,
	linux-fsdevel@vger.kernel.org, Evgeniy Polyakov <zbr@ioremap.net>,
	xfs@oss.sgi.com, ocfs2-devel@oss.oracle.com
Subject: Re: [PATCH 07/17] vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode
Date: Sun, 30 Aug 2009 18:29:59 +0100	[thread overview]
Message-ID: <20090830172959.GD7129@shareable.org> (raw)
In-Reply-To: <20090830163917.GA23955@lst.de>

Christoph Hellwig wrote:
> Linux has sync_file_range which currently is a perfect way to lose your
> synced' data, but with two more flags and calls to ->fsync we could
> turn it into range-fsync/fdatasync.

Apart from the way it loses your data, the man page for
sync_file_range never manages to explain quite why you should use the
existing flags in various combinations.  It's only obvious if you've
worked on a kernel yourself.

Having asked this before, it appears one of the reasons for
sync_file_range working as it does is to give the application more
control over writeback order and to some extent, reduce the amount of
blocking.

But it's really difficult to manage the amount of blocking with it.
You need to know the request queue size among other things, and even
if you do it's dynamic.  Writeback order would be as easy with
fdatasync_range, and if you want to reduce blocking, a good
implementation of aio_fsync would be more useful.  Or, you have to use
application writeback threads anyway, so fdatasync_range again.

The one thing sync_file_range can do is let you submit multiple ranges
which the elevators can sort for the hardware.  You can't do that with
sequential calls to fdatasync_range, and it's not clear that aio_fsync
is implemented well enough (but it's a fairly good API for it).

Nick Piggin's idea to let fdatasync_range take multiple ranges might
help with that, but it's not clear how much.

> I'm not sure if that's a good
> idea or if we should just add a sys_fdatasync_rage systems call.

fdatasync_range has the advantage of being comprehensible.  People
will use it because it makes sense.

sync_file_range could be hijacked with new flags to implement
fdatasync_range.  If that's done, I'd rename the system call, but keep
it compatible with sync_file_range's flags, which would never be set
when userspace uses the new functionality.

> I don't quite see the point of a range-fsync, but it could be easily
> implemented as a flag.

A flags argument would be good anyway: to indicate if we want an
ordinary fdatasync, or something which flushes the relevant bit of
volatile hardware caches too.  With that as a capability, it is useful
to offer fsync, because that'd be the only way to get a volatile
hardware cache flush (or maybe the only way not to?).

For that reason, it should be permitted to give an infinitely large range.

I don't see the point of range-fsync either, but I'm not sure if I see
any harm in it.  If permitted, range-fsync with a zero-byte range
would flush just the inode state and none of the data.  If that's
technically available, maybe O_ISYNC and "#define O_SYNC
(O_DATASYNC|O_ISYNC)" isn't such as daft idea.

I'd call it fsync_range for consistency with aio_fsync (POSIX), which
takes flags O_DSYNC or O_SYNC to indicate the type of sync.  But I'd
use new flag names, to keep the space clear for other flags.  Just
sketching some ideas:

/* One of FSYNC_RANGE_SYNC or FSYNC_RANGE_DATASYNC must be set. */

#define FSYNC_RANGE_SYNC	(1 << 0)	/* Like fsync, O_SYNC. */
#define FSYNC_RANGE_DATASYNC	(1 << 1)	/* Like fdatasync, O_DSYNC. */
#define FSYNC_RANGE_NO_HWCACHE	(1 << 2)	/* Not hardware caches. */

-- Jamie

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

WARNING: multiple messages have this Message-ID (diff)
From: Jamie Lokier <jamie@shareable.org>
To: Christoph Hellwig <hch@lst.de>
Cc: Christoph Hellwig <hch@infradead.org>, Jan Kara <jack@suse.cz>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org, Evgeniy Polyakov <zbr@ioremap.net>,
	ocfs2-devel@oss.oracle.com, Joel Becker <joel.becker@oracle.com>,
	Felix Blyakher <felixb@sgi.com>,
	xfs@oss.sgi.com, Anton Altaparmakov <aia21@cantab.net>,
	linux-ntfs-dev@lists.sourceforge.net,
	OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>,
	linux-ext4@vger.kernel.org, tytso@mit.edu
Subject: [Ocfs2-devel] [PATCH 07/17] vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode
Date: Sun, 30 Aug 2009 17:30:07 -0000	[thread overview]
Message-ID: <20090830172959.GD7129@shareable.org> (raw)
In-Reply-To: <20090830163917.GA23955@lst.de>

Christoph Hellwig wrote:
> Linux has sync_file_range which currently is a perfect way to lose your
> synced' data, but with two more flags and calls to ->fsync we could
> turn it into range-fsync/fdatasync.

Apart from the way it loses your data, the man page for
sync_file_range never manages to explain quite why you should use the
existing flags in various combinations.  It's only obvious if you've
worked on a kernel yourself.

Having asked this before, it appears one of the reasons for
sync_file_range working as it does is to give the application more
control over writeback order and to some extent, reduce the amount of
blocking.

But it's really difficult to manage the amount of blocking with it.
You need to know the request queue size among other things, and even
if you do it's dynamic.  Writeback order would be as easy with
fdatasync_range, and if you want to reduce blocking, a good
implementation of aio_fsync would be more useful.  Or, you have to use
application writeback threads anyway, so fdatasync_range again.

The one thing sync_file_range can do is let you submit multiple ranges
which the elevators can sort for the hardware.  You can't do that with
sequential calls to fdatasync_range, and it's not clear that aio_fsync
is implemented well enough (but it's a fairly good API for it).

Nick Piggin's idea to let fdatasync_range take multiple ranges might
help with that, but it's not clear how much.

> I'm not sure if that's a good
> idea or if we should just add a sys_fdatasync_rage systems call.

fdatasync_range has the advantage of being comprehensible.  People
will use it because it makes sense.

sync_file_range could be hijacked with new flags to implement
fdatasync_range.  If that's done, I'd rename the system call, but keep
it compatible with sync_file_range's flags, which would never be set
when userspace uses the new functionality.

> I don't quite see the point of a range-fsync, but it could be easily
> implemented as a flag.

A flags argument would be good anyway: to indicate if we want an
ordinary fdatasync, or something which flushes the relevant bit of
volatile hardware caches too.  With that as a capability, it is useful
to offer fsync, because that'd be the only way to get a volatile
hardware cache flush (or maybe the only way not to?).

For that reason, it should be permitted to give an infinitely large range.

I don't see the point of range-fsync either, but I'm not sure if I see
any harm in it.  If permitted, range-fsync with a zero-byte range
would flush just the inode state and none of the data.  If that's
technically available, maybe O_ISYNC and "#define O_SYNC
(O_DATASYNC|O_ISYNC)" isn't such as daft idea.

I'd call it fsync_range for consistency with aio_fsync (POSIX), which
takes flags O_DSYNC or O_SYNC to indicate the type of sync.  But I'd
use new flag names, to keep the space clear for other flags.  Just
sketching some ideas:

/* One of FSYNC_RANGE_SYNC or FSYNC_RANGE_DATASYNC must be set. */

#define FSYNC_RANGE_SYNC	(1 << 0)	/* Like fsync, O_SYNC. */
#define FSYNC_RANGE_DATASYNC	(1 << 1)	/* Like fdatasync, O_DSYNC. */
#define FSYNC_RANGE_NO_HWCACHE	(1 << 2)	/* Not hardware caches. */

-- Jamie

  reply	other threads:[~2009-08-30 17:30 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-08-21 17:23 [PATCH 0/17] Make O_SYNC handling use standard syncing path (Version 2) Jan Kara
2009-08-21 17:23 ` [PATCH 01/17] vfs: Introduce filemap_fdatawait_range Jan Kara
2009-08-21 17:23 ` [PATCH 02/17] vfs: Export __generic_file_aio_write() and add some comments Jan Kara
2009-08-21 17:23   ` [Ocfs2-devel] " Jan Kara
2009-08-21 17:23   ` Jan Kara
2009-08-21 17:23 ` [PATCH 03/17] vfs: Remove syncing from generic_file_direct_write() and generic_file_buffered_write() Jan Kara
2009-08-21 17:23   ` [Ocfs2-devel] " Jan Kara
2009-08-21 17:23   ` Jan Kara
2009-08-21 17:23   ` Jan Kara
2009-08-21 17:23 ` [PATCH 04/17] pohmelfs: Use __generic_file_aio_write instead of generic_file_aio_write_nolock Jan Kara
2009-08-21 17:23 ` [PATCH 05/17] ocfs2: " Jan Kara
2009-08-21 17:23   ` [Ocfs2-devel] " Jan Kara
2009-08-21 17:23   ` Jan Kara
2009-08-25 18:58   ` Joel Becker
2009-08-25 18:58     ` [Ocfs2-devel] " Joel Becker
2009-08-25 18:58     ` Joel Becker
2009-08-21 17:23 ` [PATCH 06/17] vfs: Rename generic_file_aio_write_nolock Jan Kara
2009-08-21 17:30   ` Christoph Hellwig
2009-08-21 17:56     ` Jan Kara
2009-08-21 18:07       ` Christoph Hellwig
2009-08-21 17:23 ` [PATCH 07/17] vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode Jan Kara
2009-08-21 17:24   ` [Ocfs2-devel] " Jan Kara
2009-08-21 17:23   ` Jan Kara
2009-08-27 17:35   ` Christoph Hellwig
2009-08-27 17:35     ` [Ocfs2-devel] " Christoph Hellwig
2009-08-27 17:35     ` Christoph Hellwig
2009-08-30 16:35     ` Jamie Lokier
2009-08-30 16:36       ` [Ocfs2-devel] " Jamie Lokier
2009-08-30 16:35       ` Jamie Lokier
2009-08-30 16:39       ` Christoph Hellwig
2009-08-30 16:40         ` [Ocfs2-devel] " Christoph Hellwig
2009-08-30 16:39         ` Christoph Hellwig
2009-08-30 17:29         ` Jamie Lokier [this message]
2009-08-30 17:30           ` [Ocfs2-devel] " Jamie Lokier
2009-08-30 17:29           ` Jamie Lokier
2009-08-21 17:23 ` [PATCH 08/17] ext2: Update comment about generic_osync_inode Jan Kara
2009-08-21 17:23 ` [PATCH 09/17] ext3: Remove syncing logic from ext3_file_write Jan Kara
2009-08-21 17:24 ` [PATCH 10/17] ext4: Remove syncing logic from ext4_file_write Jan Kara
2009-08-21 17:24 ` [PATCH 11/17] ntfs: Use new syncing helpers and update comments Jan Kara
2009-08-21 17:24 ` [PATCH 12/17] ocfs2: Update syncing after splicing to match generic version Jan Kara
2009-08-21 17:24   ` [Ocfs2-devel] " Jan Kara
2009-08-21 17:24   ` Jan Kara
2009-08-24 18:35   ` [Ocfs2-devel] " Mark Fasheh
2009-08-24 18:35     ` Mark Fasheh
2009-08-24 18:40     ` Joel Becker
2009-08-24 18:40       ` Joel Becker
2009-08-25 13:13       ` Jan Kara
2009-08-25 13:13         ` Jan Kara
2009-08-21 17:24 ` [PATCH 13/17] xfs: Convert sync_page_range() to simple fdatawrite_range() Jan Kara
2009-08-21 17:24   ` Jan Kara
2009-08-21 17:28   ` Christoph Hellwig
2009-08-21 17:28     ` Christoph Hellwig
2009-08-21 17:59     ` Jan Kara
2009-08-21 17:59       ` Jan Kara
2009-08-21 17:24 ` [PATCH 14/17] pohmelfs: Use new syncing helper Jan Kara
2009-08-21 17:24 ` [PATCH 15/17] nfs: Remove reference to generic_osync_inode from a comment Jan Kara
2009-08-21 17:52   ` Trond Myklebust
2009-08-21 17:52     ` Trond Myklebust
2009-08-21 17:24 ` [PATCH 16/17] fat: Opencode sync_page_range_nolock() Jan Kara
2009-08-21 17:24 ` [PATCH 17/17] vfs: Remove generic_osync_inode() and sync_page_range{_nolock}() Jan Kara
2009-08-22 16:27 ` [PATCH 0/17] Make O_SYNC handling use standard syncing path (Version 2) Jamie Lokier
2009-08-24  9:29   ` Jan Kara
  -- strict thread matches above, loose matches on Subject: below --
2009-08-21 16:59 [PATCH 01/17] vfs: Introduce filemap_fdatawait_range Jan Kara
2009-08-21 16:59 ` [PATCH 07/17] vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode Jan Kara
2009-08-21 16:59   ` Jan Kara
2009-08-19 16:04 [PATCH 0/17] Make O_SYNC handling use standard syncing path Jan Kara
2009-08-19 16:04 ` [PATCH 07/17] vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode Jan Kara
2009-08-19 16:04   ` Jan Kara
2009-08-19 16:26   ` Christoph Hellwig
2009-08-19 16:26     ` Christoph Hellwig
2009-08-20 12:15     ` Jan Kara
2009-08-20 12:15       ` Jan Kara
2009-08-20 16:27       ` Christoph Hellwig
2009-08-20 16:27         ` Christoph Hellwig
2009-08-21 15:23         ` Jan Kara
2009-08-21 15:23           ` Jan Kara
2009-08-21 15:32           ` Christoph Hellwig
2009-08-21 15:32             ` Christoph Hellwig
2009-08-21 15:48             ` Jan Kara
2009-08-21 15:48               ` Jan Kara
2009-08-26 18:22         ` Christoph Hellwig
2009-08-26 18:22           ` Christoph Hellwig
2009-08-27  0:04           ` Christoph Hellwig
2009-08-27  0:04             ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090830172959.GD7129@shareable.org \
    --to=jamie@shareable.org \
    --cc=aia21@cantab.net \
    --cc=felixb@sgi.com \
    --cc=hch@infradead.org \
    --cc=hch@lst.de \
    --cc=hirofumi@mail.parknet.co.jp \
    --cc=jack@suse.cz \
    --cc=joel.becker@oracle.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-ntfs-dev@lists.sourceforge.net \
    --cc=ocfs2-devel@oss.oracle.com \
    --cc=tytso@mit.edu \
    --cc=xfs@oss.sgi.com \
    --cc=zbr@ioremap.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.