* avoid taking the iolock in fsync unless actually needed v2
@ 2021-01-22 16:46 Christoph Hellwig
2021-01-22 16:46 ` [PATCH 1/2] xfs: refactor xfs_file_fsync Christoph Hellwig
2021-01-22 16:46 ` [PATCH 2/2] xfs: reduce ilock acquisitions in xfs_file_fsync Christoph Hellwig
0 siblings, 2 replies; 12+ messages in thread
From: Christoph Hellwig @ 2021-01-22 16:46 UTC (permalink / raw)
To: linux-xfs
Hi all,
this series avoids taking the iolock in fsync if there is no dirty
metadata.
Changes since v1:
- add a comment explaining the ipincount check
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/2] xfs: refactor xfs_file_fsync
2021-01-22 16:46 avoid taking the iolock in fsync unless actually needed v2 Christoph Hellwig
@ 2021-01-22 16:46 ` Christoph Hellwig
2021-01-22 21:08 ` Dave Chinner
2021-01-22 16:46 ` [PATCH 2/2] xfs: reduce ilock acquisitions in xfs_file_fsync Christoph Hellwig
1 sibling, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2021-01-22 16:46 UTC (permalink / raw)
To: linux-xfs; +Cc: Brian Foster
Factor out the log syncing logic into two helpers to make the code easier
to read and more maintainable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
---
fs/xfs/xfs_file.c | 81 +++++++++++++++++++++++++++++------------------
1 file changed, 50 insertions(+), 31 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 39695b59dfcc92..588232c77f11e0 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -118,6 +118,54 @@ xfs_dir_fsync(
return xfs_log_force_inode(ip);
}
+static xfs_lsn_t
+xfs_fsync_lsn(
+ struct xfs_inode *ip,
+ bool datasync)
+{
+ if (!xfs_ipincount(ip))
+ return 0;
+ if (datasync && !(ip->i_itemp->ili_fsync_fields & ~XFS_ILOG_TIMESTAMP))
+ return 0;
+ return ip->i_itemp->ili_last_lsn;
+}
+
+/*
+ * All metadata updates are logged, which means that we just have to flush the
+ * log up to the latest LSN that touched the inode.
+ *
+ * If we have concurrent fsync/fdatasync() calls, we need them to all block on
+ * the log force before we clear the ili_fsync_fields field. This ensures that
+ * we don't get a racing sync operation that does not wait for the metadata to
+ * hit the journal before returning. If we race with clearing ili_fsync_fields,
+ * then all that will happen is the log force will do nothing as the lsn will
+ * already be on disk. We can't race with setting ili_fsync_fields because that
+ * is done under XFS_ILOCK_EXCL, and that can't happen because we hold the lock
+ * shared until after the ili_fsync_fields is cleared.
+ */
+static int
+xfs_fsync_flush_log(
+ struct xfs_inode *ip,
+ bool datasync,
+ int *log_flushed)
+{
+ int error = 0;
+ xfs_lsn_t lsn;
+
+ xfs_ilock(ip, XFS_ILOCK_SHARED);
+ lsn = xfs_fsync_lsn(ip, datasync);
+ if (lsn) {
+ error = xfs_log_force_lsn(ip->i_mount, lsn, XFS_LOG_SYNC,
+ log_flushed);
+
+ spin_lock(&ip->i_itemp->ili_lock);
+ ip->i_itemp->ili_fsync_fields = 0;
+ spin_unlock(&ip->i_itemp->ili_lock);
+ }
+ xfs_iunlock(ip, XFS_ILOCK_SHARED);
+ return error;
+}
+
STATIC int
xfs_file_fsync(
struct file *file,
@@ -125,13 +173,10 @@ xfs_file_fsync(
loff_t end,
int datasync)
{
- struct inode *inode = file->f_mapping->host;
- struct xfs_inode *ip = XFS_I(inode);
- struct xfs_inode_log_item *iip = ip->i_itemp;
+ struct xfs_inode *ip = XFS_I(file->f_mapping->host);
struct xfs_mount *mp = ip->i_mount;
int error = 0;
int log_flushed = 0;
- xfs_lsn_t lsn = 0;
trace_xfs_file_fsync(ip);
@@ -155,33 +200,7 @@ xfs_file_fsync(
else if (mp->m_logdev_targp != mp->m_ddev_targp)
xfs_blkdev_issue_flush(mp->m_ddev_targp);
- /*
- * All metadata updates are logged, which means that we just have to
- * flush the log up to the latest LSN that touched the inode. If we have
- * concurrent fsync/fdatasync() calls, we need them to all block on the
- * log force before we clear the ili_fsync_fields field. This ensures
- * that we don't get a racing sync operation that does not wait for the
- * metadata to hit the journal before returning. If we race with
- * clearing the ili_fsync_fields, then all that will happen is the log
- * force will do nothing as the lsn will already be on disk. We can't
- * race with setting ili_fsync_fields because that is done under
- * XFS_ILOCK_EXCL, and that can't happen because we hold the lock shared
- * until after the ili_fsync_fields is cleared.
- */
- xfs_ilock(ip, XFS_ILOCK_SHARED);
- if (xfs_ipincount(ip)) {
- if (!datasync ||
- (iip->ili_fsync_fields & ~XFS_ILOG_TIMESTAMP))
- lsn = iip->ili_last_lsn;
- }
-
- if (lsn) {
- error = xfs_log_force_lsn(mp, lsn, XFS_LOG_SYNC, &log_flushed);
- spin_lock(&iip->ili_lock);
- iip->ili_fsync_fields = 0;
- spin_unlock(&iip->ili_lock);
- }
- xfs_iunlock(ip, XFS_ILOCK_SHARED);
+ error = xfs_fsync_flush_log(ip, datasync, &log_flushed);
/*
* If we only have a single device, and the log force about was
--
2.29.2
* [PATCH 2/2] xfs: reduce ilock acquisitions in xfs_file_fsync
2021-01-22 16:46 avoid taking the iolock in fsync unless actually needed v2 Christoph Hellwig
2021-01-22 16:46 ` [PATCH 1/2] xfs: refactor xfs_file_fsync Christoph Hellwig
@ 2021-01-22 16:46 ` Christoph Hellwig
2021-01-22 21:08 ` Dave Chinner
2021-01-25 13:16 ` Brian Foster
1 sibling, 2 replies; 12+ messages in thread
From: Christoph Hellwig @ 2021-01-22 16:46 UTC (permalink / raw)
To: linux-xfs
If the inode is not pinned by the time fsync is called we don't need the
ilock to protect against concurrent clearing of ili_fsync_fields as the
inode won't need a log flush or clearing of these fields. Not taking
the iolock allows for full concurrency of fsync and thus O_DSYNC
completions with io_uring/aio write submissions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_file.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 588232c77f11e0..ffe2d7c37e26cd 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -200,7 +200,14 @@ xfs_file_fsync(
else if (mp->m_logdev_targp != mp->m_ddev_targp)
xfs_blkdev_issue_flush(mp->m_ddev_targp);
- error = xfs_fsync_flush_log(ip, datasync, &log_flushed);
+ /*
+ * Any inode that has dirty modifications in the log is pinned. The
+ * racy check here for a pinned inode while not catch modifications
+ * that happen concurrently to the fsync call, but fsync semantics
+ * only require to sync previously completed I/O.
+ */
+ if (xfs_ipincount(ip))
+ error = xfs_fsync_flush_log(ip, datasync, &log_flushed);
/*
* If we only have a single device, and the log force about was
--
2.29.2
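The racy fast path this patch adds can be modeled in plain userspace C. The names below (pincount, lock_cycles, toy_fsync) are illustrative stand-ins for the kernel's xfs_ipincount()/xfs_ilock()/xfs_log_force_lsn() machinery, not real XFS code — the point is only to show why an unlocked pin-count check is safe for fsync semantics:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy userspace model of the patch's fsync fast path; nothing here is
 * real XFS code. */
struct toy_inode {
	atomic_int pincount;     /* nonzero while the inode is dirty in the log */
	bool       ilock_held;   /* stands in for XFS_ILOCK_SHARED */
	int        fsync_fields; /* stands in for ili_fsync_fields */
	int        log_forces;   /* counts simulated log flushes */
	int        lock_cycles;  /* counts ilock round trips */
};

static void toy_fsync(struct toy_inode *ip)
{
	/*
	 * Racy unlocked check, as in the patch: a modification racing with
	 * this load may be missed, which is fine because fsync only has to
	 * persist previously completed I/O.
	 */
	if (atomic_load(&ip->pincount) == 0)
		return;          /* fast path: no lock cycle, no log force */

	ip->ilock_held = true;   /* models xfs_ilock(ip, XFS_ILOCK_SHARED) */
	ip->lock_cycles++;
	ip->log_forces++;        /* models xfs_log_force_lsn(..., XFS_LOG_SYNC, ...) */
	ip->fsync_fields = 0;    /* cleared while holding the lock */
	ip->ilock_held = false;  /* models xfs_iunlock() */
}
```

With a clean (unpinned) inode the function returns without touching the lock at all, which is exactly the concurrency win for O_DSYNC completions the commit message describes.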
* Re: [PATCH 2/2] xfs: reduce ilock acquisitions in xfs_file_fsync
2021-01-22 16:46 ` [PATCH 2/2] xfs: reduce ilock acquisitions in xfs_file_fsync Christoph Hellwig
@ 2021-01-22 21:08 ` Dave Chinner
2021-01-23 6:41 ` Christoph Hellwig
2021-01-25 13:16 ` Brian Foster
1 sibling, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2021-01-22 21:08 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-xfs
On Fri, Jan 22, 2021 at 05:46:43PM +0100, Christoph Hellwig wrote:
> If the inode is not pinned by the time fsync is called we don't need the
> ilock to protect against concurrent clearing of ili_fsync_fields as the
> inode won't need a log flush or clearing of these fields. Not taking
> the iolock allows for full concurrency of fsync and thus O_DSYNC
> completions with io_uring/aio write submissions.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Code looks good, so
Reviewed-by: Dave Chinner <dchinner@redhat.com>
But it makes me wonder...
That is, we already elide the call to generic_write_sync() in direct
IO in the case that the device supports FUA and it's a pure
overwrite with no dirty metadata on the inode. Hence for a lot of
storage and AIO/io_uring+DIO w/ O_DSYNC workloads we're already
eliding this fsync-based lock cycle.
In the case where we can't do a REQ_FUA IO because it is not
supported by the device, then don't we really only need a cache
flush at IO completion rather than the full generic_write_sync()
call path? That would provide this optimisation to all the
filesystems using iomap_dio_rw(), not just XFS....
In fact, I wonder if we need to do anything other than just use
REQ_FUA unconditionally in iomap for this situation, as the block
layer will translate REQ_FUA to a write+post-flush if the device
doesn't support FUA writes directly.
Your thoughts on that, Christoph?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
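The block-layer translation Dave refers to can be sketched as a toy model. The flag and struct fields here are illustrative stand-ins, not the real block layer API; the sketch only captures the behavior that a REQ_FUA write on a device without native FUA support becomes a write followed by a post-flush:

```c
#include <stdbool.h>

#define TOY_REQ_FUA 0x1u /* illustrative stand-in for the REQ_FUA bio flag */

struct toy_device {
	bool supports_fua; /* can the hardware do forced-unit-access writes? */
	int  writes;       /* writes submitted to the device */
	int  flushes;      /* cache flushes issued by the "block layer" */
};

/*
 * Model of the fallback: if the device handles FUA natively, the flag is
 * passed straight through; otherwise the write is emulated as the write
 * itself plus a post-flush of the device cache.
 */
static void toy_submit_write(struct toy_device *dev, unsigned int flags)
{
	dev->writes++;
	if ((flags & TOY_REQ_FUA) && !dev->supports_fua)
		dev->flushes++; /* emulated post-flush */
}
```

On a FUA-capable device the same submission costs no extra flush, which is why unconditional REQ_FUA looks attractive at first glance.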
* Re: [PATCH 1/2] xfs: refactor xfs_file_fsync
2021-01-22 16:46 ` [PATCH 1/2] xfs: refactor xfs_file_fsync Christoph Hellwig
@ 2021-01-22 21:08 ` Dave Chinner
0 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2021-01-22 21:08 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-xfs, Brian Foster
On Fri, Jan 22, 2021 at 05:46:42PM +0100, Christoph Hellwig wrote:
> Factor out the log syncing logic into two helpers to make the code easier
> to read and more maintainable.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Brian Foster <bfoster@redhat.com>
LGTM.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 2/2] xfs: reduce ilock acquisitions in xfs_file_fsync
2021-01-22 21:08 ` Dave Chinner
@ 2021-01-23 6:41 ` Christoph Hellwig
2021-01-26 6:56 ` Christoph Hellwig
0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2021-01-23 6:41 UTC (permalink / raw)
To: Dave Chinner; +Cc: Christoph Hellwig, linux-xfs
On Sat, Jan 23, 2021 at 08:08:01AM +1100, Dave Chinner wrote:
> That is, we already elide the call to generic_write_sync() in direct
> IO in the case that the device supports FUA and it's a pure
> overwrite with no dirty metadata on the inode. Hence for a lot of
> storage and AIO/io_uring+DIO w/ O_DSYNC workloads we're already
> eliding this fsync-based lock cycle.
>
> In the case where we can't do a REQ_FUA IO because it is not
> supported by the device, then don't we really only need a cache
> flush at IO completion rather than the full generic_write_sync()
> call path? That would provide this optimisation to all the
> filesystems using iomap_dio_rw(), not just XFS....
>
> In fact, I wonder if we need to do anything other than just use
> REQ_FUA unconditionally in iomap for this situation, as the block
> layer will translate REQ_FUA to a write+post-flush if the device
> doesn't support FUA writes directly.
>
> Your thoughts on that, Christoph?
For the pure overwrite O_DIRECT + O_DSYNC case we'd get away with just
a flush. And using REQ_FUA will get us there, so it might be worth
a try.
* Re: [PATCH 2/2] xfs: reduce ilock acquisitions in xfs_file_fsync
2021-01-22 16:46 ` [PATCH 2/2] xfs: reduce ilock acquisitions in xfs_file_fsync Christoph Hellwig
2021-01-22 21:08 ` Dave Chinner
@ 2021-01-25 13:16 ` Brian Foster
2021-01-28 8:00 ` Christoph Hellwig
1 sibling, 1 reply; 12+ messages in thread
From: Brian Foster @ 2021-01-25 13:16 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-xfs
On Fri, Jan 22, 2021 at 05:46:43PM +0100, Christoph Hellwig wrote:
> If the inode is not pinned by the time fsync is called we don't need the
> ilock to protect against concurrent clearing of ili_fsync_fields as the
> inode won't need a log flush or clearing of these fields. Not taking
> the iolock allows for full concurrency of fsync and thus O_DSYNC
> completions with io_uring/aio write submissions.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/xfs/xfs_file.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 588232c77f11e0..ffe2d7c37e26cd 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -200,7 +200,14 @@ xfs_file_fsync(
> else if (mp->m_logdev_targp != mp->m_ddev_targp)
> xfs_blkdev_issue_flush(mp->m_ddev_targp);
>
> - error = xfs_fsync_flush_log(ip, datasync, &log_flushed);
> + /*
> + * Any inode that has dirty modifications in the log is pinned. The
> + * racy check here for a pinned inode while not catch modifications
s/while/will/ ?
Otherwise looks good:
Reviewed-by: Brian Foster <bfoster@redhat.com>
> + * that happen concurrently to the fsync call, but fsync semantics
> + * only require to sync previously completed I/O.
> + */
> + if (xfs_ipincount(ip))
> + error = xfs_fsync_flush_log(ip, datasync, &log_flushed);
>
> /*
> * If we only have a single device, and the log force about was
> --
> 2.29.2
>
* Re: [PATCH 2/2] xfs: reduce ilock acquisitions in xfs_file_fsync
2021-01-23 6:41 ` Christoph Hellwig
@ 2021-01-26 6:56 ` Christoph Hellwig
0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2021-01-26 6:56 UTC (permalink / raw)
To: Dave Chinner; +Cc: Christoph Hellwig, linux-xfs
On Sat, Jan 23, 2021 at 07:41:39AM +0100, Christoph Hellwig wrote:
> > In fact, I wonder if we need to do anything other than just use
> > REQ_FUA unconditionally in iomap for this situation, as the block
> > layer will translate REQ_FUA to a write+post-flush if the device
> > doesn't support FUA writes directly.
> >
> > Your thoughts on that, Christoph?
>
> For the pure overwrite O_DIRECT + O_DSYNC case we'd get away with just
> a flush. And using REQ_FUA will get us there, so it might be worth
> a try.
And looking at this a little more: while just using REQ_FUA would
work, it would be rather suboptimal for many cases, as the block layer
flush state machine would do a flush for every bio. So for each
O_DIRECT + O_DSYNC write that generates more than one bio we'd grow
extra flushes.
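The overhead Christoph points out can be made concrete with a bit of toy arithmetic (hypothetical helper names, for illustration only): on a device without native FUA, the two strategies cost the same for a single-bio write, but every additional bio of a multi-bio O_DSYNC write adds one emulated flush under unconditional REQ_FUA.

```c
/* Flush counts on a device without native FUA support, under the two
 * strategies discussed above (illustrative arithmetic, not block layer
 * code). */
static int flushes_fua_per_bio(int nr_bios)
{
	return nr_bios; /* every REQ_FUA bio is emulated as write + flush */
}

static int flushes_single_postflush(int nr_bios)
{
	(void)nr_bios;  /* the bio count does not matter here */
	return 1;       /* one cache flush once all bios complete */
}
```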
* Re: [PATCH 2/2] xfs: reduce ilock acquisitions in xfs_file_fsync
2021-01-25 13:16 ` Brian Foster
@ 2021-01-28 8:00 ` Christoph Hellwig
0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2021-01-28 8:00 UTC (permalink / raw)
To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs
On Mon, Jan 25, 2021 at 08:16:18AM -0500, Brian Foster wrote:
> > - error = xfs_fsync_flush_log(ip, datasync, &log_flushed);
> > + /*
> > + * Any inode that has dirty modifications in the log is pinned. The
> > + * racy check here for a pinned inode while not catch modifications
>
> s/while/will/ ?
Yes. Darrick, can you fix this up when applying the patch, or do you
want me to resend?
* Re: [PATCH 1/2] xfs: refactor xfs_file_fsync
2021-01-12 15:33 ` Brian Foster
@ 2021-01-12 17:12 ` Christoph Hellwig
0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2021-01-12 17:12 UTC (permalink / raw)
To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs
On Tue, Jan 12, 2021 at 10:33:47AM -0500, Brian Foster wrote:
> Looks fine, though it might be nice to find some commonality with
> xfs_log_force_inode():
The common logic is called xfs_log_force_lsn :)
The fact that fsync checks and modifies ili_fsync_fields makes it rather
impractical to share more code, unfortunately.
* Re: [PATCH 1/2] xfs: refactor xfs_file_fsync
2021-01-11 16:15 ` [PATCH 1/2] xfs: refactor xfs_file_fsync Christoph Hellwig
@ 2021-01-12 15:33 ` Brian Foster
2021-01-12 17:12 ` Christoph Hellwig
0 siblings, 1 reply; 12+ messages in thread
From: Brian Foster @ 2021-01-12 15:33 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-xfs
On Mon, Jan 11, 2021 at 05:15:43PM +0100, Christoph Hellwig wrote:
> Factor out the log syncing logic into two helpers to make the code easier
> to read and more maintainable.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
Looks fine, though it might be nice to find some commonality with
xfs_log_force_inode():
Reviewed-by: Brian Foster <bfoster@redhat.com>
> fs/xfs/xfs_file.c | 81 +++++++++++++++++++++++++++++------------------
> 1 file changed, 50 insertions(+), 31 deletions(-)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 5b0f93f738372d..414d856e2e755a 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -118,6 +118,54 @@ xfs_dir_fsync(
> return xfs_log_force_inode(ip);
> }
>
> +static xfs_lsn_t
> +xfs_fsync_lsn(
> + struct xfs_inode *ip,
> + bool datasync)
> +{
> + if (!xfs_ipincount(ip))
> + return 0;
> + if (datasync && !(ip->i_itemp->ili_fsync_fields & ~XFS_ILOG_TIMESTAMP))
> + return 0;
> + return ip->i_itemp->ili_last_lsn;
> +}
> +
> +/*
> + * All metadata updates are logged, which means that we just have to flush the
> + * log up to the latest LSN that touched the inode.
> + *
> + * If we have concurrent fsync/fdatasync() calls, we need them to all block on
> + * the log force before we clear the ili_fsync_fields field. This ensures that
> + * we don't get a racing sync operation that does not wait for the metadata to
> + * hit the journal before returning. If we race with clearing ili_fsync_fields,
> + * then all that will happen is the log force will do nothing as the lsn will
> + * already be on disk. We can't race with setting ili_fsync_fields because that
> + * is done under XFS_ILOCK_EXCL, and that can't happen because we hold the lock
> + * shared until after the ili_fsync_fields is cleared.
> + */
> +static int
> +xfs_fsync_flush_log(
> + struct xfs_inode *ip,
> + bool datasync,
> + int *log_flushed)
> +{
> + int error = 0;
> + xfs_lsn_t lsn;
> +
> + xfs_ilock(ip, XFS_ILOCK_SHARED);
> + lsn = xfs_fsync_lsn(ip, datasync);
> + if (lsn) {
> + error = xfs_log_force_lsn(ip->i_mount, lsn, XFS_LOG_SYNC,
> + log_flushed);
> +
> + spin_lock(&ip->i_itemp->ili_lock);
> + ip->i_itemp->ili_fsync_fields = 0;
> + spin_unlock(&ip->i_itemp->ili_lock);
> + }
> + xfs_iunlock(ip, XFS_ILOCK_SHARED);
> + return error;
> +}
> +
> STATIC int
> xfs_file_fsync(
> struct file *file,
> @@ -125,13 +173,10 @@ xfs_file_fsync(
> loff_t end,
> int datasync)
> {
> - struct inode *inode = file->f_mapping->host;
> - struct xfs_inode *ip = XFS_I(inode);
> - struct xfs_inode_log_item *iip = ip->i_itemp;
> + struct xfs_inode *ip = XFS_I(file->f_mapping->host);
> struct xfs_mount *mp = ip->i_mount;
> int error = 0;
> int log_flushed = 0;
> - xfs_lsn_t lsn = 0;
>
> trace_xfs_file_fsync(ip);
>
> @@ -155,33 +200,7 @@ xfs_file_fsync(
> else if (mp->m_logdev_targp != mp->m_ddev_targp)
> xfs_blkdev_issue_flush(mp->m_ddev_targp);
>
> - /*
> - * All metadata updates are logged, which means that we just have to
> - * flush the log up to the latest LSN that touched the inode. If we have
> - * concurrent fsync/fdatasync() calls, we need them to all block on the
> - * log force before we clear the ili_fsync_fields field. This ensures
> - * that we don't get a racing sync operation that does not wait for the
> - * metadata to hit the journal before returning. If we race with
> - * clearing the ili_fsync_fields, then all that will happen is the log
> - * force will do nothing as the lsn will already be on disk. We can't
> - * race with setting ili_fsync_fields because that is done under
> - * XFS_ILOCK_EXCL, and that can't happen because we hold the lock shared
> - * until after the ili_fsync_fields is cleared.
> - */
> - xfs_ilock(ip, XFS_ILOCK_SHARED);
> - if (xfs_ipincount(ip)) {
> - if (!datasync ||
> - (iip->ili_fsync_fields & ~XFS_ILOG_TIMESTAMP))
> - lsn = iip->ili_last_lsn;
> - }
> -
> - if (lsn) {
> - error = xfs_log_force_lsn(mp, lsn, XFS_LOG_SYNC, &log_flushed);
> - spin_lock(&iip->ili_lock);
> - iip->ili_fsync_fields = 0;
> - spin_unlock(&iip->ili_lock);
> - }
> - xfs_iunlock(ip, XFS_ILOCK_SHARED);
> + error = xfs_fsync_flush_log(ip, datasync, &log_flushed);
>
> /*
> * If we only have a single device, and the log force about was
> --
> 2.29.2
>
* [PATCH 1/2] xfs: refactor xfs_file_fsync
2021-01-11 16:15 avoid taking the iolock in fsync unless actually needed Christoph Hellwig
@ 2021-01-11 16:15 ` Christoph Hellwig
2021-01-12 15:33 ` Brian Foster
0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2021-01-11 16:15 UTC (permalink / raw)
To: linux-xfs
Factor out the log syncing logic into two helpers to make the code easier
to read and more maintainable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_file.c | 81 +++++++++++++++++++++++++++++------------------
1 file changed, 50 insertions(+), 31 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 5b0f93f738372d..414d856e2e755a 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -118,6 +118,54 @@ xfs_dir_fsync(
return xfs_log_force_inode(ip);
}
+static xfs_lsn_t
+xfs_fsync_lsn(
+ struct xfs_inode *ip,
+ bool datasync)
+{
+ if (!xfs_ipincount(ip))
+ return 0;
+ if (datasync && !(ip->i_itemp->ili_fsync_fields & ~XFS_ILOG_TIMESTAMP))
+ return 0;
+ return ip->i_itemp->ili_last_lsn;
+}
+
+/*
+ * All metadata updates are logged, which means that we just have to flush the
+ * log up to the latest LSN that touched the inode.
+ *
+ * If we have concurrent fsync/fdatasync() calls, we need them to all block on
+ * the log force before we clear the ili_fsync_fields field. This ensures that
+ * we don't get a racing sync operation that does not wait for the metadata to
+ * hit the journal before returning. If we race with clearing ili_fsync_fields,
+ * then all that will happen is the log force will do nothing as the lsn will
+ * already be on disk. We can't race with setting ili_fsync_fields because that
+ * is done under XFS_ILOCK_EXCL, and that can't happen because we hold the lock
+ * shared until after the ili_fsync_fields is cleared.
+ */
+static int
+xfs_fsync_flush_log(
+ struct xfs_inode *ip,
+ bool datasync,
+ int *log_flushed)
+{
+ int error = 0;
+ xfs_lsn_t lsn;
+
+ xfs_ilock(ip, XFS_ILOCK_SHARED);
+ lsn = xfs_fsync_lsn(ip, datasync);
+ if (lsn) {
+ error = xfs_log_force_lsn(ip->i_mount, lsn, XFS_LOG_SYNC,
+ log_flushed);
+
+ spin_lock(&ip->i_itemp->ili_lock);
+ ip->i_itemp->ili_fsync_fields = 0;
+ spin_unlock(&ip->i_itemp->ili_lock);
+ }
+ xfs_iunlock(ip, XFS_ILOCK_SHARED);
+ return error;
+}
+
STATIC int
xfs_file_fsync(
struct file *file,
@@ -125,13 +173,10 @@ xfs_file_fsync(
loff_t end,
int datasync)
{
- struct inode *inode = file->f_mapping->host;
- struct xfs_inode *ip = XFS_I(inode);
- struct xfs_inode_log_item *iip = ip->i_itemp;
+ struct xfs_inode *ip = XFS_I(file->f_mapping->host);
struct xfs_mount *mp = ip->i_mount;
int error = 0;
int log_flushed = 0;
- xfs_lsn_t lsn = 0;
trace_xfs_file_fsync(ip);
@@ -155,33 +200,7 @@ xfs_file_fsync(
else if (mp->m_logdev_targp != mp->m_ddev_targp)
xfs_blkdev_issue_flush(mp->m_ddev_targp);
- /*
- * All metadata updates are logged, which means that we just have to
- * flush the log up to the latest LSN that touched the inode. If we have
- * concurrent fsync/fdatasync() calls, we need them to all block on the
- * log force before we clear the ili_fsync_fields field. This ensures
- * that we don't get a racing sync operation that does not wait for the
- * metadata to hit the journal before returning. If we race with
- * clearing the ili_fsync_fields, then all that will happen is the log
- * force will do nothing as the lsn will already be on disk. We can't
- * race with setting ili_fsync_fields because that is done under
- * XFS_ILOCK_EXCL, and that can't happen because we hold the lock shared
- * until after the ili_fsync_fields is cleared.
- */
- xfs_ilock(ip, XFS_ILOCK_SHARED);
- if (xfs_ipincount(ip)) {
- if (!datasync ||
- (iip->ili_fsync_fields & ~XFS_ILOG_TIMESTAMP))
- lsn = iip->ili_last_lsn;
- }
-
- if (lsn) {
- error = xfs_log_force_lsn(mp, lsn, XFS_LOG_SYNC, &log_flushed);
- spin_lock(&iip->ili_lock);
- iip->ili_fsync_fields = 0;
- spin_unlock(&iip->ili_lock);
- }
- xfs_iunlock(ip, XFS_ILOCK_SHARED);
+ error = xfs_fsync_flush_log(ip, datasync, &log_flushed);
/*
* If we only have a single device, and the log force about was
--
2.29.2