All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] xfs: fixes for XFS_DIFLAG2_DAX support
@ 2016-02-15  5:22 Dave Chinner
  2016-02-15  5:22 ` [PATCH 1/4] xfs: XFS_DIFLAG_DAX is only for regular files or directories Dave Chinner
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Dave Chinner @ 2016-02-15  5:22 UTC (permalink / raw)
  To: xfs; +Cc: ross.zwisler, jack

Hi folks,

This is a series to add the correct constraints to using the on-disk
inode flag to enable DAX on per-file basis. The same constraints are
placed on setting the flag on directories for inheritance purposes.

These constraints are:
	- the inode flag is limited to regular files or directory
	  inodes.
	- the S_DAX flag is only ever set on regular files
	- the flag can only ever be set on filesystems which have
	  blocksize == PAGE_SIZE (for now)
	- When the flag is set or cleared, the current mapping
	  contents are flushed and then invalidated so that the new
	  access mode starts with an empty mapping.
	- Setting or clearing the flag is atomic w.r.t. IO and
	  page faults.

I've tested these manually with xfs_io (patchset for supporting
chattr +x/-x to be sent soon), and it all appears to work as
expected. I'd like to push these for 4.5-rc6 so the initial kernel
with support for this flag doesn't do silly things, so comments,
testing and review woul dbe appreciated.

Cheers,

Dave.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/4] xfs: XFS_DIFLAG_DAX is only for regular files or directories
  2016-02-15  5:22 [PATCH 0/4] xfs: fixes for XFS_DIFLAG2_DAX support Dave Chinner
@ 2016-02-15  5:22 ` Dave Chinner
  2016-02-15  5:22 ` [PATCH 2/4] xfs: S_DAX is only for regular files Dave Chinner
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2016-02-15  5:22 UTC (permalink / raw)
  To: xfs; +Cc: ross.zwisler, jack

From: Dave Chinner <dchinner@redhat.com>

Only file data can use DAX, so we should onyl be able to set this
flag on regular files. However, the flag also serves as an "inherit"
flag at file create time when set on directories, so limit the
FS_IOC_FSSETXATTR ioctl to only set this flag on regular files and
directories.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_ioctl.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 81d6d62..0895967 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1051,6 +1051,14 @@ xfs_ioctl_setattr_xflags(
 	    !capable(CAP_LINUX_IMMUTABLE))
 		return -EPERM;
 
+	/*
+	 * It is only valid to set the DAX flag on regular files and
+	 * directories. On directories it serves as an inherit hint.
+	 */
+	if ((fa->fsx_xflags & FS_XFLAG_DAX) &&
+	    !(S_ISREG(VFS_I(ip)->i_mode) || S_ISDIR(VFS_I(ip)->i_mode)))
+		return -EINVAL;
+
 	xfs_set_diflags(ip, fa->fsx_xflags);
 	xfs_diflags_to_linux(ip);
 	xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
-- 
2.5.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/4] xfs: S_DAX is only for regular files
  2016-02-15  5:22 [PATCH 0/4] xfs: fixes for XFS_DIFLAG2_DAX support Dave Chinner
  2016-02-15  5:22 ` [PATCH 1/4] xfs: XFS_DIFLAG_DAX is only for regular files or directories Dave Chinner
@ 2016-02-15  5:22 ` Dave Chinner
  2016-02-15  5:22 ` [PATCH 3/4] xfs: dynamically switch modes when XFS_DIFLAG2_DAX is set/cleared Dave Chinner
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2016-02-15  5:22 UTC (permalink / raw)
  To: xfs; +Cc: ross.zwisler, jack

From: Dave Chinner <dchinner@redhat.com>

Only regular files can use DAX for data operations, so we should
restrict setting it on the VFS inode to regular files. Setting it on
metadata inodes may cause the VFS to do the wrong thing for such
inodes, so avoid potential problems by restricting the scope of the
flag to what we know is supported.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_iops.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 0d38b1d..9c984a0 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1185,8 +1185,9 @@ xfs_diflags_to_iflags(
 		inode->i_flags |= S_SYNC;
 	if (flags & XFS_DIFLAG_NOATIME)
 		inode->i_flags |= S_NOATIME;
-	if (ip->i_mount->m_flags & XFS_MOUNT_DAX ||
-	    ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)
+	if (S_ISREG(inode->i_mode) &&
+	    (ip->i_mount->m_flags & XFS_MOUNT_DAX ||
+	     ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
 		inode->i_flags |= S_DAX;
 }
 
-- 
2.5.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/4] xfs: dynamically switch modes when XFS_DIFLAG2_DAX is set/cleared
  2016-02-15  5:22 [PATCH 0/4] xfs: fixes for XFS_DIFLAG2_DAX support Dave Chinner
  2016-02-15  5:22 ` [PATCH 1/4] xfs: XFS_DIFLAG_DAX is only for regular files or directories Dave Chinner
  2016-02-15  5:22 ` [PATCH 2/4] xfs: S_DAX is only for regular files Dave Chinner
@ 2016-02-15  5:22 ` Dave Chinner
  2016-02-17  7:31   ` [PATCH 3/4 v2] " Dave Chinner
  2016-02-15  5:22 ` [PATCH 4/4] xfs: XFS_DIFLAG2_DAX limited by PAGE_SIZE Dave Chinner
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2016-02-15  5:22 UTC (permalink / raw)
  To: xfs; +Cc: ross.zwisler, jack

From: Dave Chinner <dchinner@redhat.com>

When we set or clear the XFS_DIFLAG2_DAX flag, we should also
set/clear the S_DAX flag in the VFS inode. To do this, we need to
ensure that we first flush and remove any cached entries in the
radix tree to ensure the correct data access method is used when we
next try to read or write data. We ahve to be especially careful
here to lock out page faults so they don't race with the flush and
invalidation before we change the access mode.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_ioctl.c | 105 ++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 92 insertions(+), 13 deletions(-)

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 0895967..a870d16 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1051,20 +1051,63 @@ xfs_ioctl_setattr_xflags(
 	    !capable(CAP_LINUX_IMMUTABLE))
 		return -EPERM;
 
+	xfs_set_diflags(ip, fa->fsx_xflags);
+	xfs_diflags_to_linux(ip);
+	xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+	XFS_STATS_INC(mp, xs_ig_attrchg);
+	return 0;
+}
+
+/*
+ * If we are changing DAX flags, we have to ensure the file is clean and any
+ * cached objects in the address space are invalidated and removed. This
+ * requires us to lock out other IO and page faults similar to a truncate
+ * operation. The locks need to be held until the transaction has been committed
+ * so that the cache invalidation is atomic with respect to the DAX flag
+ * manipulation.
+ */
+static int
+xfs_ioctl_setattr_dax_invalidate(
+	struct xfs_inode	*ip,
+	struct fsxattr		*fa,
+	int			*join_flags)
+{
+	struct inode		*inode = VFS_I(ip);
+	int			error;
+
+	*join_flags = 0;
+
 	/*
 	 * It is only valid to set the DAX flag on regular files and
 	 * directories. On directories it serves as an inherit hint.
 	 */
 	if ((fa->fsx_xflags & FS_XFLAG_DAX) &&
-	    !(S_ISREG(VFS_I(ip)->i_mode) || S_ISDIR(VFS_I(ip)->i_mode)))
+	    !(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)))
 		return -EINVAL;
 
-	xfs_set_diflags(ip, fa->fsx_xflags);
-	xfs_diflags_to_linux(ip);
-	xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
-	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
-	XFS_STATS_INC(mp, xs_ig_attrchg);
+	/* If the DAX state is not changing, we have nothing to do here. */
+	if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
+		return 0;
+	if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode))
+		return 0;
+
+	/* lock, flush and invalidate mapping in preparation for flag change */
+	xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
+	error = filemap_write_and_wait(inode->i_mapping);
+	if (error)
+		goto out_unlock;
+	error = invalidate_inode_pages2(inode->i_mapping);
+	if (error)
+		goto out_unlock;
+
+	*join_flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL;
 	return 0;
+
+out_unlock:
+	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
+	return error;
+
 }
 
 /*
@@ -1072,19 +1115,27 @@ xfs_ioctl_setattr_xflags(
  * have permission to do so. On success, return a clean transaction and the
  * inode locked exclusively ready for further operation specific checks. On
  * failure, return an error without modifying or locking the inode.
+ *
+ * The inode might already be IO locked on call. If this is the case, it is
+ * indicated in @join_flags and we take full responsibility for ensuring they
+ * are unlocked from now on. Hence if we have an error here, we still have to
+ * unlock them. Otherwise, once they are joined to the transaction, they will
+ * be unlocked on commit/cancel.
  */
 static struct xfs_trans *
 xfs_ioctl_setattr_get_trans(
-	struct xfs_inode	*ip)
+	struct xfs_inode	*ip,
+	int			join_flags)
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_trans	*tp;
-	int			error;
+	int			error = -EROFS;
 
 	if (mp->m_flags & XFS_MOUNT_RDONLY)
-		return ERR_PTR(-EROFS);
+		goto out_unlock;
+	error = -EIO;
 	if (XFS_FORCED_SHUTDOWN(mp))
-		return ERR_PTR(-EIO);
+		goto out_unlock;
 
 	tp = xfs_trans_alloc(mp, XFS_TRANS_SETATTR_NOT_SIZE);
 	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ichange, 0, 0);
@@ -1092,7 +1143,8 @@ xfs_ioctl_setattr_get_trans(
 		goto out_cancel;
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | join_flags);
+	join_flags = 0;
 
 	/*
 	 * CAP_FOWNER overrides the following restrictions:
@@ -1112,6 +1164,9 @@ xfs_ioctl_setattr_get_trans(
 
 out_cancel:
 	xfs_trans_cancel(tp);
+out_unlock:
+	if (join_flags)
+		xfs_iunlock(ip, join_flags);
 	return ERR_PTR(error);
 }
 
@@ -1210,6 +1265,7 @@ xfs_ioctl_setattr(
 	struct xfs_dquot	*pdqp = NULL;
 	struct xfs_dquot	*olddquot = NULL;
 	int			code;
+	int			join_flags = 0;
 
 	trace_xfs_ioctl_setattr(ip);
 
@@ -1218,6 +1274,17 @@ xfs_ioctl_setattr(
 		return code;
 
 	/*
+	 * Changing DAX config may require inode locking for mapping
+	 * invalidation. These need to be held all the way to transaction commit
+	 * or cancel time, so need to be passed through to
+	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
+	 * appropriately.
+	 */
+	code = xfs_ioctl_setattr_dax_invalidate(ip, fa, &join_flags);
+	if (code)
+		return code;
+
+	/*
 	 * If disk quotas is on, we make sure that the dquots do exist on disk,
 	 * before we start any other transactions. Trying to do this later
 	 * is messy. We don't care to take a readlock to look at the ids
@@ -1233,7 +1300,7 @@ xfs_ioctl_setattr(
 			return code;
 	}
 
-	tp = xfs_ioctl_setattr_get_trans(ip);
+	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
 	if (IS_ERR(tp)) {
 		code = PTR_ERR(tp);
 		goto error_free_dquots;
@@ -1349,6 +1416,7 @@ xfs_ioc_setxflags(
 	struct xfs_trans	*tp;
 	struct fsxattr		fa;
 	unsigned int		flags;
+	int			join_flags = 0;
 	int			error;
 
 	if (copy_from_user(&flags, arg, sizeof(flags)))
@@ -1365,7 +1433,18 @@ xfs_ioc_setxflags(
 	if (error)
 		return error;
 
-	tp = xfs_ioctl_setattr_get_trans(ip);
+	/*
+	 * Changing DAX config may require inode locking for mapping
+	 * invalidation. These need to be held all the way to transaction commit
+	 * or cancel time, so need to be passed through to
+	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
+	 * appropriately.
+	 */
+	error = xfs_ioctl_setattr_dax_invalidate(ip, &fa, &join_flags);
+	if (error)
+		goto out_drop_write;
+
+	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
 	if (IS_ERR(tp)) {
 		error = PTR_ERR(tp);
 		goto out_drop_write;
-- 
2.5.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/4] xfs: XFS_DIFLAG2_DAX limited by PAGE_SIZE
  2016-02-15  5:22 [PATCH 0/4] xfs: fixes for XFS_DIFLAG2_DAX support Dave Chinner
                   ` (2 preceding siblings ...)
  2016-02-15  5:22 ` [PATCH 3/4] xfs: dynamically switch modes when XFS_DIFLAG2_DAX is set/cleared Dave Chinner
@ 2016-02-15  5:22 ` Dave Chinner
  2016-02-16  0:12   ` Eric Sandeen
  2016-02-16 23:53 ` [PATCH 0/4] xfs: fixes for XFS_DIFLAG2_DAX support Ross Zwisler
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2016-02-15  5:22 UTC (permalink / raw)
  To: xfs; +Cc: ross.zwisler, jack

From: Dave Chinner <dchinner@redhat.com>

If the block size of a filesystem is not at least PAGE_SIZEd, then
at this point in time DAX cannot be used due to the fact we can't
guarantee extents are page sized or aligned without further work.
Hence disallow setting the DAX flag on an inode if the block size is
too small. Also, be defensive and check the block size when reading
an inode in off disk.

In future, we want to allow DAX to work on any filesystem, so this
is temporary while we sort of the correct conbination of extent size
hints and allocation alignment configurations needed to guarantee
page sized and aligned extent allocation for DAX enabled files.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_ioctl.c | 12 ++++++++----
 fs/xfs/xfs_iops.c  |  1 +
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index a870d16..8e9cd3c 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1080,11 +1080,15 @@ xfs_ioctl_setattr_dax_invalidate(
 
 	/*
 	 * It is only valid to set the DAX flag on regular files and
-	 * directories. On directories it serves as an inherit hint.
+	 * directories on filesystems where the block size is at least the page
+	 * size. On directories it serves as an inherit hint.
 	 */
-	if ((fa->fsx_xflags & FS_XFLAG_DAX) &&
-	    !(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)))
-		return -EINVAL;
+	if (fa->fsx_xflags & FS_XFLAG_DAX) {
+		if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)))
+			return -EINVAL;
+		if (ip->i_mount->m_sb.sb_blocksize != PAGE_SIZE)
+			return -EINVAL;
+	}
 
 	/* If the DAX state is not changing, we have nothing to do here. */
 	if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 9c984a0..fb7dc61 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1186,6 +1186,7 @@ xfs_diflags_to_iflags(
 	if (flags & XFS_DIFLAG_NOATIME)
 		inode->i_flags |= S_NOATIME;
 	if (S_ISREG(inode->i_mode) &&
+	    ip->i_mount->m_sb.sb_blocksize == PAGE_SIZE &&
 	    (ip->i_mount->m_flags & XFS_MOUNT_DAX ||
 	     ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
 		inode->i_flags |= S_DAX;
-- 
2.5.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] xfs: XFS_DIFLAG2_DAX limited by PAGE_SIZE
  2016-02-15  5:22 ` [PATCH 4/4] xfs: XFS_DIFLAG2_DAX limited by PAGE_SIZE Dave Chinner
@ 2016-02-16  0:12   ` Eric Sandeen
  2016-02-16  0:39     ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Sandeen @ 2016-02-16  0:12 UTC (permalink / raw)
  To: xfs



On 2/14/16 11:22 PM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> If the block size of a filesystem is not at least PAGE_SIZEd, then
> at this point in time DAX cannot be used due to the fact we can't
> guarantee extents are page sized or aligned without further work.
> Hence disallow setting the DAX flag on an inode if the block size is
> too small. Also, be defensive and check the block size when reading
> an inode in off disk.
> 
> In future, we want to allow DAX to work on any filesystem, so this
> is temporary while we sort of the correct conbination of extent size
> hints and allocation alignment configurations needed to guarantee
> page sized and aligned extent allocation for DAX enabled files.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_ioctl.c | 12 ++++++++----
>  fs/xfs/xfs_iops.c  |  1 +
>  2 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index a870d16..8e9cd3c 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -1080,11 +1080,15 @@ xfs_ioctl_setattr_dax_invalidate(
>  
>  	/*
>  	 * It is only valid to set the DAX flag on regular files and
> -	 * directories. On directories it serves as an inherit hint.
> +	 * directories on filesystems where the block size is at least the page
                                                              ^^^^^^^^
> +	 * size. On directories it serves as an inherit hint.
>  	 */
> -	if ((fa->fsx_xflags & FS_XFLAG_DAX) &&
> -	    !(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)))
> -		return -EINVAL;
> +	if (fa->fsx_xflags & FS_XFLAG_DAX) {
> +		if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)))
> +			return -EINVAL;
> +		if (ip->i_mount->m_sb.sb_blocksize != PAGE_SIZE)
                                                   ^^

So which is it, at least PAGE_SIZE or == PAGE_SIZE?

> +			return -EINVAL;
> +	}
>  
>  	/* If the DAX state is not changing, we have nothing to do here. */
>  	if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 9c984a0..fb7dc61 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -1186,6 +1186,7 @@ xfs_diflags_to_iflags(
>  	if (flags & XFS_DIFLAG_NOATIME)
>  		inode->i_flags |= S_NOATIME;
>  	if (S_ISREG(inode->i_mode) &&
> +	    ip->i_mount->m_sb.sb_blocksize == PAGE_SIZE &&
>  	    (ip->i_mount->m_flags & XFS_MOUNT_DAX ||
>  	     ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
>  		inode->i_flags |= S_DAX;

Is it possible to get mounted with XFS_MOUNT_DAX if blocksize != PAGE_SIZE?
If so, should it be?  This seems like a strange place to catch this mismatch.

-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] xfs: XFS_DIFLAG2_DAX limited by PAGE_SIZE
  2016-02-16  0:12   ` Eric Sandeen
@ 2016-02-16  0:39     ` Dave Chinner
  2016-02-16  0:54       ` Eric Sandeen
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2016-02-16  0:39 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

On Mon, Feb 15, 2016 at 06:12:04PM -0600, Eric Sandeen wrote:
> 
> 
> On 2/14/16 11:22 PM, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > If the block size of a filesystem is not at least PAGE_SIZEd, then
> > at this point in time DAX cannot be used due to the fact we can't
> > guarantee extents are page sized or aligned without further work.
> > Hence disallow setting the DAX flag on an inode if the block size is
> > too small. Also, be defensive and check the block size when reading
> > an inode in off disk.
> > 
> > In future, we want to allow DAX to work on any filesystem, so this
> > is temporary while we sort of the correct conbination of extent size
> > hints and allocation alignment configurations needed to guarantee
> > page sized and aligned extent allocation for DAX enabled files.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_ioctl.c | 12 ++++++++----
> >  fs/xfs/xfs_iops.c  |  1 +
> >  2 files changed, 9 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > index a870d16..8e9cd3c 100644
> > --- a/fs/xfs/xfs_ioctl.c
> > +++ b/fs/xfs/xfs_ioctl.c
> > @@ -1080,11 +1080,15 @@ xfs_ioctl_setattr_dax_invalidate(
> >  
> >  	/*
> >  	 * It is only valid to set the DAX flag on regular files and
> > -	 * directories. On directories it serves as an inherit hint.
> > +	 * directories on filesystems where the block size is at least the page
>                                                               ^^^^^^^^
> > +	 * size. On directories it serves as an inherit hint.
> >  	 */
> > -	if ((fa->fsx_xflags & FS_XFLAG_DAX) &&
> > -	    !(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)))
> > -		return -EINVAL;
> > +	if (fa->fsx_xflags & FS_XFLAG_DAX) {
> > +		if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)))
> > +			return -EINVAL;
> > +		if (ip->i_mount->m_sb.sb_blocksize != PAGE_SIZE)
>                                                    ^^
> 
> So which is it, at least PAGE_SIZE or == PAGE_SIZE?

Linux does not support filesystems where the block size is larger
than the page size, so the supported set of "block size at least as
large as PAGE_SIZE" is only block size == PAGE_SIZE.

> >  	/* If the DAX state is not changing, we have nothing to do here. */
> >  	if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
> > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > index 9c984a0..fb7dc61 100644
> > --- a/fs/xfs/xfs_iops.c
> > +++ b/fs/xfs/xfs_iops.c
> > @@ -1186,6 +1186,7 @@ xfs_diflags_to_iflags(
> >  	if (flags & XFS_DIFLAG_NOATIME)
> >  		inode->i_flags |= S_NOATIME;
> >  	if (S_ISREG(inode->i_mode) &&
> > +	    ip->i_mount->m_sb.sb_blocksize == PAGE_SIZE &&
> >  	    (ip->i_mount->m_flags & XFS_MOUNT_DAX ||
> >  	     ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
> >  		inode->i_flags |= S_DAX;
> 
> Is it possible to get mounted with XFS_MOUNT_DAX if blocksize != PAGE_SIZE?

No, It's checked at mount time.

> If so, should it be?  This seems like a strange place to catch this mismatch.

It's not for catching a bad mount option (which will go away,
anyway). it's for catching an "in-pmem" inode flag that can't be
applied because, e.g, the kernel was rebuilt with a different base
page size and now the extents won't align correctly for DAX to work.

Or, perhaps, the pmem is a global pool similar interconnected to
individual processing domains that have different architectures and
the filesystem is moved to a different domain. IIUC, HP's Machine
architecture is based around such a shared-pmem, isolated host
layout.

Or, maybe there's some magic RMDA pixie dust allowing a different
physical machine to access the pmem the filesystem was created
on, in which case DAX won't work (e.g. VM gets shifted from one
machine to another due to load balancing).

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] xfs: XFS_DIFLAG2_DAX limited by PAGE_SIZE
  2016-02-16  0:39     ` Dave Chinner
@ 2016-02-16  0:54       ` Eric Sandeen
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Sandeen @ 2016-02-16  0:54 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs



On 2/15/16 6:39 PM, Dave Chinner wrote:
> On Mon, Feb 15, 2016 at 06:12:04PM -0600, Eric Sandeen wrote:

...

>> So which is it, at least PAGE_SIZE or == PAGE_SIZE?
> 
> Linux does not support filesystems where the block size is larger
> than the page size, so the supported set of "block size at least as
> large as PAGE_SIZE" is only block size == PAGE_SIZE.

Um, ok.  Comment is still confusing.  ;)

>>>  	/* If the DAX state is not changing, we have nothing to do here. */
>>>  	if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
>>> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
>>> index 9c984a0..fb7dc61 100644
>>> --- a/fs/xfs/xfs_iops.c
>>> +++ b/fs/xfs/xfs_iops.c
>>> @@ -1186,6 +1186,7 @@ xfs_diflags_to_iflags(
>>>  	if (flags & XFS_DIFLAG_NOATIME)
>>>  		inode->i_flags |= S_NOATIME;
>>>  	if (S_ISREG(inode->i_mode) &&
>>> +	    ip->i_mount->m_sb.sb_blocksize == PAGE_SIZE &&
>>>  	    (ip->i_mount->m_flags & XFS_MOUNT_DAX ||
>>>  	     ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
>>>  		inode->i_flags |= S_DAX;
>>
>> Is it possible to get mounted with XFS_MOUNT_DAX if blocksize != PAGE_SIZE?
> 
> No, It's checked at mount time.
> 
>> If so, should it be?  This seems like a strange place to catch this mismatch.
> 
> It's not for catching a bad mount option (which will go away,
> anyway). it's for catching an "in-pmem" inode flag that can't be
> applied because, e.g, the kernel was rebuilt with a different base
> page size and now the extents won't align correctly for DAX to work.

Oh, with the di_flags2 already set on disk, and now encountering a
new page size.  Right, ok.

-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/4] xfs: fixes for XFS_DIFLAG2_DAX support
  2016-02-15  5:22 [PATCH 0/4] xfs: fixes for XFS_DIFLAG2_DAX support Dave Chinner
                   ` (3 preceding siblings ...)
  2016-02-15  5:22 ` [PATCH 4/4] xfs: XFS_DIFLAG2_DAX limited by PAGE_SIZE Dave Chinner
@ 2016-02-16 23:53 ` Ross Zwisler
  2016-02-17  0:23   ` Dave Chinner
  2016-02-17 20:04 ` Brian Foster
  2016-02-17 22:56 ` Ross Zwisler
  6 siblings, 1 reply; 14+ messages in thread
From: Ross Zwisler @ 2016-02-16 23:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: ross.zwisler, jack, xfs

On Mon, Feb 15, 2016 at 04:22:10PM +1100, Dave Chinner wrote:
> Hi folks,
> 
> This is a series to add the correct constraints to using the on-disk
> inode flag to enable DAX on per-file basis. The same constraints are
> placed on setting the flag on directories for inheritance purposes.
> 
> These constraints are:
> 	- the inode flag is limited to regular files or directory
> 	  inodes.
> 	- the S_DAX flag is only ever set on regular files
> 	- the flag can only ever be set on filesystems which have
> 	  blocksize == PAGE_SIZE (for now)
> 	- When the flag is set or cleared, the current mapping
> 	  contents are flushed and then invalidated so that the new
> 	  access mode starts with an empty mapping.
> 	- Setting or clearing the flag is atomic w.r.t. IO and
> 	  page faults.
> 
> I've tested these manually with xfs_io (patchset for supporting
> chattr +x/-x to be sent soon), and it all appears to work as
> expected. I'd like to push these for 4.5-rc6 so the initial kernel
> with support for this flag doesn't do silly things, so comments,
> testing and review woul dbe appreciated.

I'm seeing the following errors with xfs/305 when running these four patches +
v4.5-rc4:

================================================
[ BUG: lock held when returning to user space! ]
4.5.0-rc4+ #4 Not tainted
------------------------------------------------
fsstress/2311 is leaving the kernel with locks still held!
2 locks held by fsstress/2311:
 #0:  (&(&ip->i_iolock)->mr_lock){++++++}, at: [<     inline     >] mrupdate_nested fs/xfs/mrlock.h:48
 #0:  (&(&ip->i_iolock)->mr_lock){++++++}, at: [<ffffffff8149ba82>] xfs_ilock+0x152/0x1f0 fs/xfs/xfs_inode.c:170
 #1:  (&(&ip->i_mmaplock)->mr_lock){+.+.+.}, at: [<     inline     >] mrupdate_nested fs/xfs/mrlock.h:48
 #1:  (&(&ip->i_mmaplock)->mr_lock){+.+.+.}, at: [<ffffffff8149baad>] xfs_ilock+0x17d/0x1f0 fs/xfs/xfs_inode.c:175
XFS: Assertion failed: !rwsem_is_locked(&ip->i_iolock.mr_lock), file: fs/xfs/xfs_super.c, line: 981
------------[ cut here ]------------
kernel BUG at fs/xfs/xfs_message.c:113!
invalid opcode: 0000 [#1] SMP
Modules linked in: nd_pmem nd_btt nd_e820 libnvdimm
CPU: 1 PID: 2332 Comm: fsstress Not tainted 4.5.0-rc4+ #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014
task: ffff88009a093180 ti: ffff88009993c000 task.ti: ffff88009993c000
RIP: 0010:[<ffffffff814a3830>]  [<ffffffff814a3830>] assfail+0x20/0x30
RSP: 0018:ffff88009993fe30  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88009843f018 RCX: 0000000000000000
RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff81ee7c48
RBP: ffff88009993fe30 R08: 0000000000000000 R09: 0000000000000000
R10: 000000000000000a R11: f000000000000000 R12: ffff88009843ec80
R13: ffff88009843f018 R14: ffff88009843f0a0 R15: ffff88009843f018
FS:  00007f5dfda64700(0000) GS:ffff88051a200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f5dfd8c1008 CR3: 000000009a20a000 CR4: 00000000000006e0
Stack:
 ffff88009993fe58 ffffffff814a659a ffff88009843f018 ffff88009843f1f0
 ffffffff81c52e40 ffff88009993fe80 ffffffff8127e608 ffff880097112000
 ffff88009843f018 ffffffff81c52e40 ffff88009993feb0 ffffffff8127e8db
Call Trace:
 [<ffffffff814a659a>] xfs_fs_evict_inode+0x3a/0x110 fs/xfs/xfs_super.c:981
 [<ffffffff8127e608>] evict+0xb8/0x180 fs/inode.c:542
 [<     inline     >] iput_final fs/inode.c:1477
 [<ffffffff8127e8db>] iput+0x1ab/0x230 fs/inode.c:1504
 [<ffffffff81270df1>] do_unlinkat+0x1d1/0x2a0 fs/namei.c:3939
 [<     inline     >] SYSC_unlink fs/namei.c:3980
 [<ffffffff812718e6>] SyS_unlink+0x16/0x20 fs/namei.c:3978
 [<ffffffff81a6b2f2>] entry_SYSCALL_64_fastpath+0x12/0x76 arch/x86/entry/entry_64.S:185
Code: 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 f1 41 89 d0 48 c7 c6 b8 cc f2 81 48 89 fa 31 ff 48 89 e5 e8 b0 f8 ff ff <0f> 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
RIP  [<ffffffff814a3830>] assfail+0x20/0x30 fs/xfs/xfs_message.c:111
 RSP <ffff88009993fe30>
---[ end trace 1f81c918d4ac8110 ]---

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/4] xfs: fixes for XFS_DIFLAG2_DAX support
  2016-02-16 23:53 ` [PATCH 0/4] xfs: fixes for XFS_DIFLAG2_DAX support Ross Zwisler
@ 2016-02-17  0:23   ` Dave Chinner
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2016-02-17  0:23 UTC (permalink / raw)
  To: Ross Zwisler; +Cc: jack, xfs

On Tue, Feb 16, 2016 at 04:53:53PM -0700, Ross Zwisler wrote:
> On Mon, Feb 15, 2016 at 04:22:10PM +1100, Dave Chinner wrote:
> > Hi folks,
> > 
> > This is a series to add the correct constraints to using the on-disk
> > inode flag to enable DAX on per-file basis. The same constraints are
> > placed on setting the flag on directories for inheritance purposes.
> > 
> > These constraints are:
> > 	- the inode flag is limited to regular files or directory
> > 	  inodes.
> > 	- the S_DAX flag is only ever set on regular files
> > 	- the flag can only ever be set on filesystems which have
> > 	  blocksize == PAGE_SIZE (for now)
> > 	- When the flag is set or cleared, the current mapping
> > 	  contents are flushed and then invalidated so that the new
> > 	  access mode starts with an empty mapping.
> > 	- Setting or clearing the flag is atomic w.r.t. IO and
> > 	  page faults.
> > 
> > I've tested these manually with xfs_io (patchset for supporting
> > chattr +x/-x to be sent soon), and it all appears to work as
> > expected. I'd like to push these for 4.5-rc6 so the initial kernel
> > with support for this flag doesn't do silly things, so comments,
> > testing and review woul dbe appreciated.
> 
> I'm seeing the following errors with xfs/305 when running these four patches +
> v4.5-rc4:
> 
> ================================================
> [ BUG: lock held when returning to user space! ]
> 4.5.0-rc4+ #4 Not tainted
> ------------------------------------------------
> fsstress/2311 is leaving the kernel with locks still held!
> 2 locks held by fsstress/2311:
>  #0:  (&(&ip->i_iolock)->mr_lock){++++++}, at: [<     inline     >] mrupdate_nested fs/xfs/mrlock.h:48
>  #0:  (&(&ip->i_iolock)->mr_lock){++++++}, at: [<ffffffff8149ba82>] xfs_ilock+0x152/0x1f0 fs/xfs/xfs_inode.c:170
>  #1:  (&(&ip->i_mmaplock)->mr_lock){+.+.+.}, at: [<     inline     >] mrupdate_nested fs/xfs/mrlock.h:48
>  #1:  (&(&ip->i_mmaplock)->mr_lock){+.+.+.}, at: [<ffffffff8149baad>] xfs_ilock+0x17d/0x1f0 fs/xfs/xfs_inode.c:175

I can see an error path where this might occur on a project quota
related test (fsstress can change projid, that can fail the quota
reservation, leaks locks). I'll fix and resend later today.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 3/4 v2] xfs: dynamically switch modes when XFS_DIFLAG2_DAX is set/cleared
  2016-02-15  5:22 ` [PATCH 3/4] xfs: dynamically switch modes when XFS_DIFLAG2_DAX is set/cleared Dave Chinner
@ 2016-02-17  7:31   ` Dave Chinner
  2016-02-17 20:57     ` Ross Zwisler
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2016-02-17  7:31 UTC (permalink / raw)
  To: xfs; +Cc: ross.zwisler, jack

xfs: dynamically switch modes when XFS_DIFLAG2_DAX is set/cleared

From: Dave Chinner <dchinner@redhat.com>

When we set or clear the XFS_DIFLAG2_DAX flag, we should also
set/clear the S_DAX flag in the VFS inode. To do this, we need to
ensure that we first flush and remove any cached entries in the
radix tree to ensure the correct data access method is used when we
next try to read or write data. We ahve to be especially careful
here to lock out page faults so they don't race with the flush and
invalidation before we change the access mode.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
Version 2:
- fix lock leak in xfs_ioctl_setattr where dquot allocation could
  fail after we'd locked the inode iolock and mmaplock but we didn't
  unlock them on error.

 fs/xfs/xfs_ioctl.c | 105 ++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 92 insertions(+), 13 deletions(-)

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 0895967..c1000e9 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1051,20 +1051,63 @@ xfs_ioctl_setattr_xflags(
 	    !capable(CAP_LINUX_IMMUTABLE))
 		return -EPERM;
 
+	xfs_set_diflags(ip, fa->fsx_xflags);
+	xfs_diflags_to_linux(ip);
+	xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+	XFS_STATS_INC(mp, xs_ig_attrchg);
+	return 0;
+}
+
+/*
+ * If we are changing DAX flags, we have to ensure the file is clean and any
+ * cached objects in the address space are invalidated and removed. This
+ * requires us to lock out other IO and page faults similar to a truncate
+ * operation. The locks need to be held until the transaction has been committed
+ * so that the cache invalidation is atomic with respect to the DAX flag
+ * manipulation.
+ */
+static int
+xfs_ioctl_setattr_dax_invalidate(
+	struct xfs_inode	*ip,
+	struct fsxattr		*fa,
+	int			*join_flags)
+{
+	struct inode		*inode = VFS_I(ip);
+	int			error;
+
+	*join_flags = 0;
+
 	/*
 	 * It is only valid to set the DAX flag on regular files and
 	 * directories. On directories it serves as an inherit hint.
 	 */
 	if ((fa->fsx_xflags & FS_XFLAG_DAX) &&
-	    !(S_ISREG(VFS_I(ip)->i_mode) || S_ISDIR(VFS_I(ip)->i_mode)))
+	    !(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)))
 		return -EINVAL;
 
-	xfs_set_diflags(ip, fa->fsx_xflags);
-	xfs_diflags_to_linux(ip);
-	xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
-	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
-	XFS_STATS_INC(mp, xs_ig_attrchg);
+	/* If the DAX state is not changing, we have nothing to do here. */
+	if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
+		return 0;
+	if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode))
+		return 0;
+
+	/* lock, flush and invalidate mapping in preparation for flag change */
+	xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
+	error = filemap_write_and_wait(inode->i_mapping);
+	if (error)
+		goto out_unlock;
+	error = invalidate_inode_pages2(inode->i_mapping);
+	if (error)
+		goto out_unlock;
+
+	*join_flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL;
 	return 0;
+
+out_unlock:
+	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
+	return error;
+
 }
 
 /*
@@ -1072,19 +1115,27 @@ xfs_ioctl_setattr_xflags(
  * have permission to do so. On success, return a clean transaction and the
  * inode locked exclusively ready for further operation specific checks. On
  * failure, return an error without modifying or locking the inode.
+ *
+ * The inode might already be IO locked on call. If this is the case, it is
+ * indicated in @join_flags and we take full responsibility for ensuring they
+ * are unlocked from now on. Hence if we have an error here, we still have to
+ * unlock them. Otherwise, once they are joined to the transaction, they will
+ * be unlocked on commit/cancel.
  */
 static struct xfs_trans *
 xfs_ioctl_setattr_get_trans(
-	struct xs_inode	*ip)
+	struct xfs_inode	*ip,
+	int			join_flags)
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_trans	*tp;
-	int			error;
+	int			error = -EROFS;
 
 	if (mp->m_flags & XFS_MOUNT_RDONLY)
-		return ERR_PTR(-EROFS);
+		goto out_unlock;
+	error = -EIO;
 	if (XFS_FORCED_SHUTDOWN(mp))
-		return ERR_PTR(-EIO);
+		goto out_unlock;
 
 	tp = xfs_trans_alloc(mp, XFS_TRANS_SETATTR_NOT_SIZE);
 	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ichange, 0, 0);
@@ -1092,7 +1143,8 @@ xfs_ioctl_setattr_get_trans(
 		goto out_cancel;
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | join_flags);
+	join_flags = 0;
 
 	/*
 	 * CAP_FOWNER overrides the following restrictions:
@@ -1112,6 +1164,9 @@ xfs_ioctl_setattr_get_trans(
 
 out_cancel:
 	xfs_trans_cancel(tp);
+out_unlock:
+	if (join_flags)
+		xfs_iunlock(ip, join_flags);
 	return ERR_PTR(error);
 }
 
@@ -1210,6 +1265,7 @@ xfs_ioctl_setattr(
 	struct xfs_dquot	*pdqp = NULL;
 	struct xfs_dquot	*olddquot = NULL;
 	int			code;
+	int			join_flags = 0;
 
 	trace_xfs_ioctl_setattr(ip);
 
@@ -1233,7 +1289,18 @@ xfs_ioctl_setattr(
 			return code;
 	}
 
-	tp = xfs_ioctl_setattr_get_trans(ip);
+	/*
+	 * Changing DAX config may require inode locking for mapping
+	 * invalidation. These need to be held all the way to transaction commit
+	 * or cancel time, so need to be passed through to
+	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
+	 * appropriately.
+	 */
+	code = xfs_ioctl_setattr_dax_invalidate(ip, fa, &join_flags);
+	if (code)
+		goto error_free_dquots;
+
+	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
 	if (IS_ERR(tp)) {
 		code = PTR_ERR(tp);
 		goto error_free_dquots;
@@ -1349,6 +1416,7 @@ xfs_ioc_setxflags(
 	struct xfs_trans	*tp;
 	struct fsxattr		fa;
 	unsigned int		flags;
+	int			join_flags = 0;
 	int			error;
 
 	if (copy_from_user(&flags, arg, sizeof(flags)))
@@ -1365,7 +1433,18 @@ xfs_ioc_setxflags(
 	if (error)
 		return error;
 
-	tp = xfs_ioctl_setattr_get_trans(ip);
+	/*
+	 * Changing DAX config may require inode locking for mapping
+	 * invalidation. These need to be held all the way to transaction commit
+	 * or cancel time, so need to be passed through to
+	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
+	 * appropriately.
+	 */
+	error = xfs_ioctl_setattr_dax_invalidate(ip, &fa, &join_flags);
+	if (error)
+		goto out_drop_write;
+
+	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
 	if (IS_ERR(tp)) {
 		error = PTR_ERR(tp);
 		goto out_drop_write;
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/4] xfs: fixes for XFS_DIFLAG2_DAX support
  2016-02-15  5:22 [PATCH 0/4] xfs: fixes for XFS_DIFLAG2_DAX support Dave Chinner
                   ` (4 preceding siblings ...)
  2016-02-16 23:53 ` [PATCH 0/4] xfs: fixes for XFS_DIFLAG2_DAX support Ross Zwisler
@ 2016-02-17 20:04 ` Brian Foster
  2016-02-17 22:56 ` Ross Zwisler
  6 siblings, 0 replies; 14+ messages in thread
From: Brian Foster @ 2016-02-17 20:04 UTC (permalink / raw)
  To: Dave Chinner; +Cc: ross.zwisler, jack, xfs

On Mon, Feb 15, 2016 at 04:22:10PM +1100, Dave Chinner wrote:
> Hi folks,
> 
> This is a series to add the correct constraints to using the on-disk
> inode flag to enable DAX on per-file basis. The same constraints are
> placed on setting the flag on directories for inheritance purposes.
> 
> These constraints are:
> 	- the inode flag is limited to regular files or directory
> 	  inodes.
> 	- the S_DAX flag is only ever set on regular files
> 	- the flag can only ever be set on filesystems which have
> 	  blocksize == PAGE_SIZE (for now)
> 	- When the flag is set or cleared, the current mapping
> 	  contents are flushed and then invalidated so that the new
> 	  access mode starts with an empty mapping.
> 	- Setting or clearing the flag is atomic w.r.t. IO and
> 	  page faults.
> 
> I've tested these manually with xfs_io (patchset for supporting
> chattr +x/-x to be sent soon), and it all appears to work as
> expected. I'd like to push these for 4.5-rc6 so the initial kernel
> with support for this flag doesn't do silly things, so comments,
> testing and review woul dbe appreciated.
> 

The commit log description for patch 3 (v2) looks like it needs to be
fixed up (multiple title lines and extraneous From: header), but
otherwise the series looks good to me:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> Cheers,
> 
> Dave.
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4 v2] xfs: dynamically switch modes when XFS_DIFLAG2_DAX is set/cleared
  2016-02-17  7:31   ` [PATCH 3/4 v2] " Dave Chinner
@ 2016-02-17 20:57     ` Ross Zwisler
  0 siblings, 0 replies; 14+ messages in thread
From: Ross Zwisler @ 2016-02-17 20:57 UTC (permalink / raw)
  To: Dave Chinner; +Cc: ross.zwisler, jack, xfs

On Wed, Feb 17, 2016 at 06:31:21PM +1100, Dave Chinner wrote:
> xfs: dynamically switch modes when XFS_DIFLAG2_DAX is set/cleared
> 
> From: Dave Chinner <dchinner@redhat.com>
> 
> When we set or clear the XFS_DIFLAG2_DAX flag, we should also
> set/clear the S_DAX flag in the VFS inode. To do this, we need to
> ensure that we first flush and remove any cached entries in the
> radix tree to ensure the correct data access method is used when we
> next try to read or write data. We ahve to be especially careful
> here to lock out page faults so they don't race with the flush and
> invalidation before we change the access mode.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> Version 2:
> - fix lock leak in xfs_ioctl_setattr where dquot allocation could
>   fail after we'd locked the inode iolock and mmaplock but we didn't
>   unlock them on error.
> 
>  fs/xfs/xfs_ioctl.c | 105 ++++++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 92 insertions(+), 13 deletions(-)
<>
> @@ -1072,19 +1115,27 @@ xfs_ioctl_setattr_xflags(
>   * have permission to do so. On success, return a clean transaction and the
>   * inode locked exclusively ready for further operation specific checks. On
>   * failure, return an error without modifying or locking the inode.
> + *
> + * The inode might already be IO locked on call. If this is the case, it is
> + * indicated in @join_flags and we take full responsibility for ensuring they
> + * are unlocked from now on. Hence if we have an error here, we still have to
> + * unlock them. Otherwise, once they are joined to the transaction, they will
> + * be unlocked on commit/cancel.
>   */
>  static struct xfs_trans *
>  xfs_ioctl_setattr_get_trans(
> -	struct xs_inode	*ip)
> +	struct xfs_inode	*ip,

Interesting - it looks like 'xfs_inode' got corrupted to 'xs_inode'?

After fixing this the series passed all my testing.
Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/4] xfs: fixes for XFS_DIFLAG2_DAX support
  2016-02-15  5:22 [PATCH 0/4] xfs: fixes for XFS_DIFLAG2_DAX support Dave Chinner
                   ` (5 preceding siblings ...)
  2016-02-17 20:04 ` Brian Foster
@ 2016-02-17 22:56 ` Ross Zwisler
  6 siblings, 0 replies; 14+ messages in thread
From: Ross Zwisler @ 2016-02-17 22:56 UTC (permalink / raw)
  To: Dave Chinner; +Cc: ross.zwisler, jack, xfs

On Mon, Feb 15, 2016 at 04:22:10PM +1100, Dave Chinner wrote:
> Hi folks,
> 
> This is a series to add the correct constraints to using the on-disk
> inode flag to enable DAX on per-file basis. The same constraints are
> placed on setting the flag on directories for inheritance purposes.
> 
> These constraints are:
> 	- the inode flag is limited to regular files or directory
> 	  inodes.
> 	- the S_DAX flag is only ever set on regular files
> 	- the flag can only ever be set on filesystems which have
> 	  blocksize == PAGE_SIZE (for now)
> 	- When the flag is set or cleared, the current mapping
> 	  contents are flushed and then invalidated so that the new
> 	  access mode starts with an empty mapping.
> 	- Setting or clearing the flag is atomic w.r.t. IO and
> 	  page faults.
> 
> I've tested these manually with xfs_io (patchset for supporting
> chattr +x/-x to be sent soon), and it all appears to work as
> expected. I'd like to push these for 4.5-rc6 so the initial kernel
> with support for this flag doesn't do silly things, so comments,
> testing and review woul dbe appreciated.
> 
> Cheers,
> 
> Dave.

This seems like what we want to be doing for raw block devices, when we add
back in support.  Default to DAX off so we have to opt-in, and when we get an
IOCTL to enable them block I/O and page faults, clear out all mappings and
radix tree entries, and then change the flag and turn back on I/O and page
faults.  We'll also need to add a check to prevent mounts on raw block devices
with S_DAX set.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2016-02-17 22:56 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-15  5:22 [PATCH 0/4] xfs: fixes for XFS_DIFLAG2_DAX support Dave Chinner
2016-02-15  5:22 ` [PATCH 1/4] xfs: XFS_DIFLAG_DAX is only for regular files or directories Dave Chinner
2016-02-15  5:22 ` [PATCH 2/4] xfs: S_DAX is only for regular files Dave Chinner
2016-02-15  5:22 ` [PATCH 3/4] xfs: dynamically switch modes when XFS_DIFLAG2_DAX is set/cleared Dave Chinner
2016-02-17  7:31   ` [PATCH 3/4 v2] " Dave Chinner
2016-02-17 20:57     ` Ross Zwisler
2016-02-15  5:22 ` [PATCH 4/4] xfs: XFS_DIFLAG2_DAX limited by PAGE_SIZE Dave Chinner
2016-02-16  0:12   ` Eric Sandeen
2016-02-16  0:39     ` Dave Chinner
2016-02-16  0:54       ` Eric Sandeen
2016-02-16 23:53 ` [PATCH 0/4] xfs: fixes for XFS_DIFLAG2_DAX support Ross Zwisler
2016-02-17  0:23   ` Dave Chinner
2016-02-17 20:04 ` Brian Foster
2016-02-17 22:56 ` Ross Zwisler

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.