* avoid taking the iolock in fsync unless actually needed v2
@ 2021-01-22 16:46 Christoph Hellwig
  2021-01-22 16:46 ` [PATCH 1/2] xfs: refactor xfs_file_fsync Christoph Hellwig
  2021-01-22 16:46 ` [PATCH 2/2] xfs: reduce ilock acquisitions in xfs_file_fsync Christoph Hellwig
  0 siblings, 2 replies; 9+ messages in thread
From: Christoph Hellwig @ 2021-01-22 16:46 UTC (permalink / raw)
  To: linux-xfs

Hi all,

this series avoids taking the iolock in fsync if there is no dirty
metadata.

Changes since v1:
 - add a comment explaining the ipincount check
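The core idea of the series — skip the ilock/log-force path entirely when the inode has no pinned (i.e. dirty, logged) metadata — can be sketched as a toy userspace model. All names below (`model_inode`, `model_fsync`, `lock_cycles`) are illustrative and not kernel API:

```c
#include <assert.h>

/*
 * Toy model, not kernel code: an inode's pin count is non-zero while it
 * has dirty metadata in the log.  The series checks the pin count first
 * and only takes the (simulated) lock and forces the log when needed.
 */
struct model_inode {
	int pincount;		/* dirty logged metadata outstanding? */
};

static int lock_cycles;		/* counts simulated ilock round trips */

static int model_flush_log(struct model_inode *ip)
{
	lock_cycles++;		/* stands in for xfs_ilock()/xfs_iunlock() */
	ip->pincount = 0;	/* metadata now stable */
	return 0;
}

static int model_fsync(struct model_inode *ip)
{
	if (ip->pincount)	/* racy but safe: see patch 2's comment */
		return model_flush_log(ip);
	return 0;		/* clean inode: no lock, no log force */
}
```

Repeated fsync calls on a clean inode then cost no lock cycles at all, which is what enables concurrent O_DSYNC completions.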


* [PATCH 1/2] xfs: refactor xfs_file_fsync
  2021-01-22 16:46 avoid taking the iolock in fsync unless actually needed v2 Christoph Hellwig
@ 2021-01-22 16:46 ` Christoph Hellwig
  2021-01-22 21:08   ` Dave Chinner
  2021-01-22 16:46 ` [PATCH 2/2] xfs: reduce ilock acquisitions in xfs_file_fsync Christoph Hellwig
  1 sibling, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2021-01-22 16:46 UTC (permalink / raw)
  To: linux-xfs; +Cc: Brian Foster

Factor out the log syncing logic into two helpers to make the code easier
to read and more maintainable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_file.c | 81 +++++++++++++++++++++++++++++------------------
 1 file changed, 50 insertions(+), 31 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 39695b59dfcc92..588232c77f11e0 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -118,6 +118,54 @@ xfs_dir_fsync(
 	return xfs_log_force_inode(ip);
 }
 
+static xfs_lsn_t
+xfs_fsync_lsn(
+	struct xfs_inode	*ip,
+	bool			datasync)
+{
+	if (!xfs_ipincount(ip))
+		return 0;
+	if (datasync && !(ip->i_itemp->ili_fsync_fields & ~XFS_ILOG_TIMESTAMP))
+		return 0;
+	return ip->i_itemp->ili_last_lsn;
+}
+
+/*
+ * All metadata updates are logged, which means that we just have to flush the
+ * log up to the latest LSN that touched the inode.
+ *
+ * If we have concurrent fsync/fdatasync() calls, we need them to all block on
+ * the log force before we clear the ili_fsync_fields field. This ensures that
+ * we don't get a racing sync operation that does not wait for the metadata to
+ * hit the journal before returning.  If we race with clearing ili_fsync_fields,
+ * then all that will happen is the log force will do nothing as the lsn will
+ * already be on disk.  We can't race with setting ili_fsync_fields because that
+ * is done under XFS_ILOCK_EXCL, and that can't happen because we hold the lock
+ * shared until after the ili_fsync_fields is cleared.
+ */
+static  int
+xfs_fsync_flush_log(
+	struct xfs_inode	*ip,
+	bool			datasync,
+	int			*log_flushed)
+{
+	int			error = 0;
+	xfs_lsn_t		lsn;
+
+	xfs_ilock(ip, XFS_ILOCK_SHARED);
+	lsn = xfs_fsync_lsn(ip, datasync);
+	if (lsn) {
+		error = xfs_log_force_lsn(ip->i_mount, lsn, XFS_LOG_SYNC,
+					  log_flushed);
+
+		spin_lock(&ip->i_itemp->ili_lock);
+		ip->i_itemp->ili_fsync_fields = 0;
+		spin_unlock(&ip->i_itemp->ili_lock);
+	}
+	xfs_iunlock(ip, XFS_ILOCK_SHARED);
+	return error;
+}
+
 STATIC int
 xfs_file_fsync(
 	struct file		*file,
@@ -125,13 +173,10 @@ xfs_file_fsync(
 	loff_t			end,
 	int			datasync)
 {
-	struct inode		*inode = file->f_mapping->host;
-	struct xfs_inode	*ip = XFS_I(inode);
-	struct xfs_inode_log_item *iip = ip->i_itemp;
+	struct xfs_inode	*ip = XFS_I(file->f_mapping->host);
 	struct xfs_mount	*mp = ip->i_mount;
 	int			error = 0;
 	int			log_flushed = 0;
-	xfs_lsn_t		lsn = 0;
 
 	trace_xfs_file_fsync(ip);
 
@@ -155,33 +200,7 @@ xfs_file_fsync(
 	else if (mp->m_logdev_targp != mp->m_ddev_targp)
 		xfs_blkdev_issue_flush(mp->m_ddev_targp);
 
-	/*
-	 * All metadata updates are logged, which means that we just have to
-	 * flush the log up to the latest LSN that touched the inode. If we have
-	 * concurrent fsync/fdatasync() calls, we need them to all block on the
-	 * log force before we clear the ili_fsync_fields field. This ensures
-	 * that we don't get a racing sync operation that does not wait for the
-	 * metadata to hit the journal before returning. If we race with
-	 * clearing the ili_fsync_fields, then all that will happen is the log
-	 * force will do nothing as the lsn will already be on disk. We can't
-	 * race with setting ili_fsync_fields because that is done under
-	 * XFS_ILOCK_EXCL, and that can't happen because we hold the lock shared
-	 * until after the ili_fsync_fields is cleared.
-	 */
-	xfs_ilock(ip, XFS_ILOCK_SHARED);
-	if (xfs_ipincount(ip)) {
-		if (!datasync ||
-		    (iip->ili_fsync_fields & ~XFS_ILOG_TIMESTAMP))
-			lsn = iip->ili_last_lsn;
-	}
-
-	if (lsn) {
-		error = xfs_log_force_lsn(mp, lsn, XFS_LOG_SYNC, &log_flushed);
-		spin_lock(&iip->ili_lock);
-		iip->ili_fsync_fields = 0;
-		spin_unlock(&iip->ili_lock);
-	}
-	xfs_iunlock(ip, XFS_ILOCK_SHARED);
+	error = xfs_fsync_flush_log(ip, datasync, &log_flushed);
 
 	/*
 	 * If we only have a single device, and the log force about was
-- 
2.29.2



* [PATCH 2/2] xfs: reduce ilock acquisitions in xfs_file_fsync
  2021-01-22 16:46 avoid taking the iolock in fsync unless actually needed v2 Christoph Hellwig
  2021-01-22 16:46 ` [PATCH 1/2] xfs: refactor xfs_file_fsync Christoph Hellwig
@ 2021-01-22 16:46 ` Christoph Hellwig
  2021-01-22 21:08   ` Dave Chinner
  2021-01-25 13:16   ` Brian Foster
  1 sibling, 2 replies; 9+ messages in thread
From: Christoph Hellwig @ 2021-01-22 16:46 UTC (permalink / raw)
  To: linux-xfs

If the inode is not pinned by the time fsync is called we don't need the
ilock to protect against concurrent clearing of ili_fsync_fields as the
inode won't need a log flush or clearing of these fields.  Not taking
the iolock allows for full concurrency of fsync and thus O_DSYNC
completions with io_uring/aio write submissions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_file.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 588232c77f11e0..ffe2d7c37e26cd 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -200,7 +200,14 @@ xfs_file_fsync(
 	else if (mp->m_logdev_targp != mp->m_ddev_targp)
 		xfs_blkdev_issue_flush(mp->m_ddev_targp);
 
-	error = xfs_fsync_flush_log(ip, datasync, &log_flushed);
+	/*
+	 * Any inode that has dirty modifications in the log is pinned.  The
+	 * racy check here for a pinned inode while not catch modifications
+	 * that happen concurrently to the fsync call, but fsync semantics
+	 * only require to sync previously completed I/O.
+	 */
+	if (xfs_ipincount(ip))
+		error = xfs_fsync_flush_log(ip, datasync, &log_flushed);
 
 	/*
 	 * If we only have a single device, and the log force about was
-- 
2.29.2



* Re: [PATCH 2/2] xfs: reduce ilock acquisitions in xfs_file_fsync
  2021-01-22 16:46 ` [PATCH 2/2] xfs: reduce ilock acquisitions in xfs_file_fsync Christoph Hellwig
@ 2021-01-22 21:08   ` Dave Chinner
  2021-01-23  6:41     ` Christoph Hellwig
  2021-01-25 13:16   ` Brian Foster
  1 sibling, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2021-01-22 21:08 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Fri, Jan 22, 2021 at 05:46:43PM +0100, Christoph Hellwig wrote:
> If the inode is not pinned by the time fsync is called we don't need the
> ilock to protect against concurrent clearing of ili_fsync_fields as the
> inode won't need a log flush or clearing of these fields.  Not taking
> the iolock allows for full concurrency of fsync and thus O_DSYNC
> completions with io_uring/aio write submissions.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Code looks good, so

Reviewed-by: Dave Chinner <dchinner@redhat.com>

But it makes me wonder...

That is, we already elide the call to generic_write_sync() in direct
IO in the case that the device supports FUA and it's a pure
overwrite with no dirty metadata on the inode. Hence for a lot of
storage and AIO/io_uring+DIO w/ O_DSYNC workloads we're already
eliding this fsync-based lock cycle.

In the case where we can't do a REQ_FUA IO because it is not
supported by the device, then don't we really only need a cache
flush at IO completion rather than the full generic_write_sync()
call path? That would provide this optimisation to all the
filesystems using iomap_dio_rw(), not just XFS....

In fact, I wonder if we need to do anything other than just use
REQ_FUA unconditionally in iomap for this situation, as the block
layer will translate REQ_FUA to a write+post-flush if the device
doesn't support FUA writes directly.

Your thoughts on that, Christoph?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
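Dave's fallback observation — that the block layer emulates REQ_FUA as the write followed by a post-flush on devices without native FUA support — can be sketched as a toy model. `model_device` and `model_submit_fua_write` are illustrative names, not block-layer API:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model, not block-layer code: submitting a "FUA" write to a device
 * that supports FUA natively issues no extra flush; on a device without
 * FUA support the block layer inserts a post-flush to get the same
 * durability guarantee.
 */
struct model_device {
	bool supports_fua;
	int writes_issued;
	int flushes_issued;
};

static void model_submit_fua_write(struct model_device *dev)
{
	dev->writes_issued++;
	if (!dev->supports_fua)
		dev->flushes_issued++;	/* emulated post-flush */
}
```

Either way the caller gets a durable write, which is why unconditional REQ_FUA in iomap looks attractive at first glance.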


* Re: [PATCH 1/2] xfs: refactor xfs_file_fsync
  2021-01-22 16:46 ` [PATCH 1/2] xfs: refactor xfs_file_fsync Christoph Hellwig
@ 2021-01-22 21:08   ` Dave Chinner
  0 siblings, 0 replies; 9+ messages in thread
From: Dave Chinner @ 2021-01-22 21:08 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, Brian Foster

On Fri, Jan 22, 2021 at 05:46:42PM +0100, Christoph Hellwig wrote:
> Factor out the log syncing logic into two helpers to make the code easier
> to read and more maintainable.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Brian Foster <bfoster@redhat.com>

LGTM.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 2/2] xfs: reduce ilock acquisitions in xfs_file_fsync
  2021-01-22 21:08   ` Dave Chinner
@ 2021-01-23  6:41     ` Christoph Hellwig
  2021-01-26  6:56       ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2021-01-23  6:41 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, linux-xfs

On Sat, Jan 23, 2021 at 08:08:01AM +1100, Dave Chinner wrote:
> That is, we already elide the call to generic_write_sync() in direct
> IO in the case that the device supports FUA and it's a pure
> overwrite with no dirty metadata on the inode. Hence for a lot of
> storage and AIO/io_uring+DIO w/ O_DSYNC workloads we're already
> eliding this fsync-based lock cycle.
> 
> In the case where we can't do a REQ_FUA IO because it is not
> supported by the device, then don't we really only need a cache
> flush at IO completion rather than the full generic_write_sync()
> call path? That would provide this optimisation to all the
> filesystems using iomap_dio_rw(), not just XFS....
> 
> In fact, I wonder if we need to do anything other than just use
> REQ_FUA unconditionally in iomap for this situation, as the block
> layer will translate REQ_FUA to a write+post-flush if the device
> doesn't support FUA writes directly.
> 
> Your thoughts on that, Christoph?

For the pure overwrite O_DIRECT + O_DSYNC case we'd get away with just
a flush.  And using REQ_FUA will get us there, so it might be worth
a try.


* Re: [PATCH 2/2] xfs: reduce ilock acquisitions in xfs_file_fsync
  2021-01-22 16:46 ` [PATCH 2/2] xfs: reduce ilock acquisitions in xfs_file_fsync Christoph Hellwig
  2021-01-22 21:08   ` Dave Chinner
@ 2021-01-25 13:16   ` Brian Foster
  2021-01-28  8:00     ` Christoph Hellwig
  1 sibling, 1 reply; 9+ messages in thread
From: Brian Foster @ 2021-01-25 13:16 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Fri, Jan 22, 2021 at 05:46:43PM +0100, Christoph Hellwig wrote:
> If the inode is not pinned by the time fsync is called we don't need the
> ilock to protect against concurrent clearing of ili_fsync_fields as the
> inode won't need a log flush or clearing of these fields.  Not taking
> the iolock allows for full concurrency of fsync and thus O_DSYNC
> completions with io_uring/aio write submissions.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_file.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 588232c77f11e0..ffe2d7c37e26cd 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -200,7 +200,14 @@ xfs_file_fsync(
>  	else if (mp->m_logdev_targp != mp->m_ddev_targp)
>  		xfs_blkdev_issue_flush(mp->m_ddev_targp);
>  
> -	error = xfs_fsync_flush_log(ip, datasync, &log_flushed);
> +	/*
> +	 * Any inode that has dirty modifications in the log is pinned.  The
> +	 * racy check here for a pinned inode while not catch modifications

s/while/will/ ?

Otherwise looks good:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> +	 * that happen concurrently to the fsync call, but fsync semantics
> +	 * only require to sync previously completed I/O.
> +	 */
> +	if (xfs_ipincount(ip))
> +		error = xfs_fsync_flush_log(ip, datasync, &log_flushed);
>  
>  	/*
>  	 * If we only have a single device, and the log force about was
> -- 
> 2.29.2
> 



* Re: [PATCH 2/2] xfs: reduce ilock acquisitions in xfs_file_fsync
  2021-01-23  6:41     ` Christoph Hellwig
@ 2021-01-26  6:56       ` Christoph Hellwig
  0 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2021-01-26  6:56 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, linux-xfs

On Sat, Jan 23, 2021 at 07:41:39AM +0100, Christoph Hellwig wrote:
> > In fact, I wonder if we need to do anything other than just use
> > REQ_FUA unconditionally in iomap for this situation, as the block
> > layer will translate REQ_FUA to a write+post-flush if the device
> > doesn't support FUA writes directly.
> > 
> > Your thoughts on that, Christoph?
> 
> For the pure overwrite O_DIRECT + O_DSYNC case we'd get away with just
> a flush.  And using REQ_FUA will get us there, so it might be worth
> a try.

And looking at this a little more, while just using REQ_FUA would
work it would be rather suboptimal for many cases, as the block layer
flush state machine would do a flush for every bio.  So for each
O_DIRECT + O_DSYNC write that generates more than one bio we'd grow
extra flushes.
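
The flush amplification described here can be sketched as a toy model: with per-bio REQ_FUA on a device without native FUA, an O_DSYNC write that spans n bios pays n emulated post-flushes, where a single cache flush at I/O completion would do. Illustrative code, not block-layer API:

```c
#include <assert.h>

/*
 * Toy model: on a device without FUA support, the block layer turns each
 * REQ_FUA bio into a write plus a post-flush, so flushes scale with the
 * number of bios.
 */
static int flushes_with_fua_bios(int nr_bios)
{
	int flushes = 0;

	for (int i = 0; i < nr_bios; i++)
		flushes++;	/* one emulated post-flush per bio */
	return flushes;
}

/* Alternative: plain writes, one cache flush when the whole I/O completes. */
static int flushes_with_completion_flush(int nr_bios)
{
	(void)nr_bios;
	return 1;
}
```

So a 4-bio O_DSYNC write would issue four flushes under unconditional REQ_FUA but only one under a completion-time flush, which is the suboptimality being pointed out.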


* Re: [PATCH 2/2] xfs: reduce ilock acquisitions in xfs_file_fsync
  2021-01-25 13:16   ` Brian Foster
@ 2021-01-28  8:00     ` Christoph Hellwig
  0 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2021-01-28  8:00 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs

On Mon, Jan 25, 2021 at 08:16:18AM -0500, Brian Foster wrote:
> > -	error = xfs_fsync_flush_log(ip, datasync, &log_flushed);
> > +	/*
> > +	 * Any inode that has dirty modifications in the log is pinned.  The
> > +	 * racy check here for a pinned inode while not catch modifications
> 
> s/while/will/ ?

Yes.  Darrick, can you fix this up when applying the patch, or do you
want me to resend?

