linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V6 0/8] Enable per-file/per-directory DAX operations V6
@ 2020-04-07 18:29 ira.weiny
  2020-04-07 18:29 ` [PATCH V6 1/8] fs/xfs: Remove unnecessary initialization of i_rwsem ira.weiny
                   ` (7 more replies)
  0 siblings, 8 replies; 47+ messages in thread
From: ira.weiny @ 2020-04-07 18:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ira Weiny, Darrick J. Wong, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, linux-ext4,
	linux-xfs, linux-fsdevel, Jeff Moyer

From: Ira Weiny <ira.weiny@intel.com>

Changes from V5:
	* make dax mount option a tri-state
	* Reject changes to FS_XFLAG_DAX for regular files
		- Allow only on directories
	* Update documentation

At LSF/MM'19 [1] [2] we discussed applications that overestimate memory
consumption due to their inability to detect whether the kernel will
instantiate page cache for a file, and cases where a global dax enable via a
mount option is too coarse.

The following patch series enables the use of DAX on individual files and/or
directories on xfs, and lays some groundwork to do so in ext4.  It further
enhances the dax mount option to be a tri-state of 'always', 'never', or
'iflag' (default).  Furthermore, it maintians '-o dax' to be equivalent to '-o
dax=always'.

The insight at LSF/MM was to separate the per-mount or per-file "physical"
(FS_XFLAG_DAX) capability switch from an "effective" (S_DAX) attribute for the
file.

At LSF/MM we discussed the difficulties of switching the DAX state of a file
with active mappings / page cache.  It was thought the races could be avoided
by limiting DAX state flips to 0-length files.

However, this turns out to not be true.[3][5] This is because address space
operations (a_ops) may be in use at any time the inode is referenced.

For this reason direct manipulation of the FS_XFLAG_DAX file is prohibited on
files in this patch set.  File can only inherit this flag from their parent
directory on creation.

Details of when and how DAX state can be changed on a file is included in a
documentation patch.

It should be noted that FS_XFLAG_DAX inheritance is not shown in this patch set
as it was maintained from previous work on XFS.  FS_XFLAG_DAX and it's
inheritance will need to be added to other file systems for user control. 


[1] https://lwn.net/Articles/787973/
[2] https://lwn.net/Articles/787233/
[3] https://lkml.org/lkml/2019/10/20/96
[4] https://patchwork.kernel.org/patch/11310511/
[5] https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/


To: linux-kernel@vger.kernel.org
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "Theodore Y. Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Cc: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org


Changes from V4:
	* Open code the aops lock rather than add it to the xfs_ilock()
	  subsystem (Darrick's comments were obsoleted by this change)
	* Fix lkp build suggestions and bugs

Changes from V3:
	* Remove global locking...  :-D
	* put back per inode locking and remove pre-mature optimizations
	* Fix issues with Directories having IS_DAX() set
	* Fix kernel crash issues reported by Jeff
	* Add some clean up patches
	* Consolidate diflags to iflags functions
	* Update/add documentation
	* Reorder/rename patches quite a bit

Changes from V2:

	* Move i_dax_sem to be a global percpu_rw_sem rather than per inode
		Internal discussions with Dan determined this would be easier,
		just as performant, and slightly less overhead that having it
		in the SB as suggested by Jan
	* Fix locking order in comments and throughout code
	* Change "mode" to "state" throughout commits
	* Add CONFIG_FS_DAX wrapper to disable inode_[un]lock_state() when not
		configured
	* Add static branch for which is activated by a device which supports
		DAX in XFS
	* Change "lock/unlock" to up/down read/write as appropriate
		Previous names were over simplified
	* Update comments/documentation

	* Remove the xfs specific lock to the vfs (global) layer.
	* Fix i_dax_sem locking order and comments

	* Move 'i_mapped' count from struct inode to struct address_space and
		rename it to mmap_count
	* Add inode_has_mappings() call

	* Fix build issues
	* Clean up syntax spacing and minor issues
	* Update man page text for STATX_ATTR_DAX
	* Add reviewed-by's
	* Rebase to 5.6

	Rename patch:
		from: fs/xfs: Add lock/unlock state to xfs
		to: fs/xfs: Add write DAX lock to xfs layer
	Add patch:
		fs/xfs: Clarify lockdep dependency for xfs_isilocked()
	Drop patch:
		fs/xfs: Fix truncate up

Ira Weiny (8):
  fs/xfs: Remove unnecessary initialization of i_rwsem
  fs: Remove unneeded IS_DAX() check
  fs/stat: Define DAX statx attribute
  fs/xfs: Make DAX mount option a tri-state
  fs/xfs: Create function xfs_inode_enable_dax()
  fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  fs/xfs: Change xfs_ioctl_setattr_dax_invalidate() to
    xfs_ioctl_dax_check()
  Documentation/dax: Update Usage section

 Documentation/filesystems/dax.txt |  94 +++++++++++++++++++++-
 fs/stat.c                         |   3 +
 fs/xfs/xfs_icache.c               |   4 +-
 fs/xfs/xfs_inode.h                |   1 +
 fs/xfs/xfs_ioctl.c                | 124 +++---------------------------
 fs/xfs/xfs_iops.c                 |  62 ++++++++++-----
 fs/xfs/xfs_mount.h                |  26 ++++++-
 fs/xfs/xfs_super.c                |  34 ++++++--
 include/linux/fs.h                |   2 +-
 include/uapi/linux/stat.h         |   1 +
 10 files changed, 206 insertions(+), 145 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH V6 1/8] fs/xfs: Remove unnecessary initialization of i_rwsem
  2020-04-07 18:29 [PATCH V6 0/8] Enable per-file/per-directory DAX operations V6 ira.weiny
@ 2020-04-07 18:29 ` ira.weiny
  2020-04-07 23:46   ` Dave Chinner
  2020-04-07 18:29 ` [PATCH V6 2/8] fs: Remove unneeded IS_DAX() check ira.weiny
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 47+ messages in thread
From: ira.weiny @ 2020-04-07 18:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ira Weiny, Darrick J. Wong, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

From: Ira Weiny <ira.weiny@intel.com>

An earlier call of xfs_reinit_inode() from xfs_iget_cache_hit() already
handles initialization of i_rwsem.

Doing so again is unneeded.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V4:
	Update commit message to make it clear the xfs_iget_cache_hit()
	is actually doing the initialization via xfs_reinit_inode()

New for V4:

NOTE: This was found while ensuring the new i_aops_sem was properly
handled.  It seems like this is a layering violation so I think it is
worth cleaning up so as to not confuse others.
---
 fs/xfs/xfs_icache.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 8dc2e5414276..836a1f09be03 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -419,6 +419,7 @@ xfs_iget_cache_hit(
 		spin_unlock(&ip->i_flags_lock);
 		rcu_read_unlock();
 
+		ASSERT(!rwsem_is_locked(&inode->i_rwsem));
 		error = xfs_reinit_inode(mp, inode);
 		if (error) {
 			bool wake;
@@ -452,9 +453,6 @@ xfs_iget_cache_hit(
 		ip->i_sick = 0;
 		ip->i_checked = 0;
 
-		ASSERT(!rwsem_is_locked(&inode->i_rwsem));
-		init_rwsem(&inode->i_rwsem);
-
 		spin_unlock(&ip->i_flags_lock);
 		spin_unlock(&pag->pag_ici_lock);
 	} else {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V6 2/8] fs: Remove unneeded IS_DAX() check
  2020-04-07 18:29 [PATCH V6 0/8] Enable per-file/per-directory DAX operations V6 ira.weiny
  2020-04-07 18:29 ` [PATCH V6 1/8] fs/xfs: Remove unnecessary initialization of i_rwsem ira.weiny
@ 2020-04-07 18:29 ` ira.weiny
  2020-04-09  7:31   ` Christoph Hellwig
  2020-04-07 18:29 ` [PATCH V6 3/8] fs/stat: Define DAX statx attribute ira.weiny
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 47+ messages in thread
From: ira.weiny @ 2020-04-07 18:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ira Weiny, Dave Chinner, Jan Kara, Darrick J. Wong, Dan Williams,
	Dave Chinner, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-xfs, linux-fsdevel

From: Ira Weiny <ira.weiny@intel.com>

Remove the check because DAX now has it's own read/write methods and
file systems which support DAX check IS_DAX() prior to IOCB_DIRECT on
their own.  Therefore, it does not matter if the file state is DAX when
the iocb flags are created.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from v3:
	Reword commit message.
	Reordered to be a 'pre-cleanup' patch
---
 include/linux/fs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index abedbffe2c9e..f97b99c36cee 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3389,7 +3389,7 @@ extern int file_update_time(struct file *file);
 
 static inline bool io_is_direct(struct file *filp)
 {
-	return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping->host);
+	return (filp->f_flags & O_DIRECT);
 }
 
 static inline bool vma_is_dax(struct vm_area_struct *vma)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V6 3/8] fs/stat: Define DAX statx attribute
  2020-04-07 18:29 [PATCH V6 0/8] Enable per-file/per-directory DAX operations V6 ira.weiny
  2020-04-07 18:29 ` [PATCH V6 1/8] fs/xfs: Remove unnecessary initialization of i_rwsem ira.weiny
  2020-04-07 18:29 ` [PATCH V6 2/8] fs: Remove unneeded IS_DAX() check ira.weiny
@ 2020-04-07 18:29 ` ira.weiny
  2020-04-07 23:47   ` Dave Chinner
  2020-04-07 18:29 ` [PATCH V6 4/8] fs/xfs: Make DAX mount option a tri-state ira.weiny
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 47+ messages in thread
From: ira.weiny @ 2020-04-07 18:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ira Weiny, Jan Kara, Darrick J . Wong, Dan Williams,
	Dave Chinner, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-xfs, linux-fsdevel

From: Ira Weiny <ira.weiny@intel.com>

In order for users to determine if a file is currently operating in DAX
state (effective DAX).  Define a statx attribute value and set that
attribute if the effective DAX flag is set.

To go along with this we propose the following addition to the statx man
page:

STATX_ATTR_DAX

	The file is in the DAX (cpu direct access) state.  DAX state
	attempts to minimize software cache effects for both I/O and
	memory mappings of this file.  It requires a file system which
	has been configured to support DAX.

	DAX generally assumes all accesses are via cpu load / store
	instructions which can minimize overhead for small accesses, but
	may adversely affect cpu utilization for large transfers.

	File I/O is done directly to/from user-space buffers and memory
	mapped I/O may be performed with direct memory mappings that
	bypass kernel page cache.

	While the DAX property tends to result in data being transferred
	synchronously, it does not give the same guarantees of O_SYNC
	where data and the necessary metadata are transferred together.

	A DAX file may support being mapped with the MAP_SYNC flag,
	which enables a program to use CPU cache flush instructions to
	persist CPU store operations without an explicit fsync(2).  See
	mmap(2) for more information.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V2:
	Update man page text with comments from Darrick, Jan, Dan, and
	Dave.
---
 fs/stat.c                 | 3 +++
 include/uapi/linux/stat.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/fs/stat.c b/fs/stat.c
index 030008796479..894699c74dde 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -79,6 +79,9 @@ int vfs_getattr_nosec(const struct path *path, struct kstat *stat,
 	if (IS_AUTOMOUNT(inode))
 		stat->attributes |= STATX_ATTR_AUTOMOUNT;
 
+	if (IS_DAX(inode))
+		stat->attributes |= STATX_ATTR_DAX;
+
 	if (inode->i_op->getattr)
 		return inode->i_op->getattr(path, stat, request_mask,
 					    query_flags);
diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index ad80a5c885d5..e5f9d5517f6b 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -169,6 +169,7 @@ struct statx {
 #define STATX_ATTR_ENCRYPTED		0x00000800 /* [I] File requires key to decrypt in fs */
 #define STATX_ATTR_AUTOMOUNT		0x00001000 /* Dir: Automount trigger */
 #define STATX_ATTR_VERITY		0x00100000 /* [I] Verity protected file */
+#define STATX_ATTR_DAX			0x00002000 /* [I] File is DAX */
 
 
 #endif /* _UAPI_LINUX_STAT_H */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V6 4/8] fs/xfs: Make DAX mount option a tri-state
  2020-04-07 18:29 [PATCH V6 0/8] Enable per-file/per-directory DAX operations V6 ira.weiny
                   ` (2 preceding siblings ...)
  2020-04-07 18:29 ` [PATCH V6 3/8] fs/stat: Define DAX statx attribute ira.weiny
@ 2020-04-07 18:29 ` ira.weiny
  2020-04-07 23:59   ` Dave Chinner
  2020-04-07 18:29 ` [PATCH V6 5/8] fs/xfs: Create function xfs_inode_enable_dax() ira.weiny
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 47+ messages in thread
From: ira.weiny @ 2020-04-07 18:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ira Weiny, Darrick J. Wong, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

From: Ira Weiny <ira.weiny@intel.com>

As agreed upon[1].  We make the dax mount option a tri-state.  '-o dax'
continues to operate the same.  We add 'always', 'never', and 'iflag'
(default).

[1] https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from v5:
	New Patch
---
 fs/xfs/xfs_iops.c  |  2 +-
 fs/xfs/xfs_mount.h | 26 +++++++++++++++++++++++++-
 fs/xfs/xfs_super.c | 34 +++++++++++++++++++++++++++++-----
 3 files changed, 55 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 81f2f93caec0..1ec4a36917bd 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1248,7 +1248,7 @@ xfs_inode_supports_dax(
 		return false;
 
 	/* DAX mount option or DAX iflag must be set. */
-	if (!(mp->m_flags & XFS_MOUNT_DAX) &&
+	if (xfs_mount_dax_mode(mp) != XFS_DAX_ALWAYS &&
 	    !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
 		return false;
 
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 88ab09ed29e7..ce027ee06692 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -233,7 +233,31 @@ typedef struct xfs_mount {
 						   allocator */
 #define XFS_MOUNT_NOATTR2	(1ULL << 25)	/* disable use of attr2 format */
 
-#define XFS_MOUNT_DAX		(1ULL << 62)	/* TEST ONLY! */
+/* DAX flag is a 2 bit field representing a tri-state for dax
+ *      iflag, always, never
+ * We reserve/document the 2 bits using dax field/field2
+ */
+#define XFS_DAX_FIELD_MASK 0x3ULL
+#define XFS_DAX_FIELD_SHIFT 62
+#define XFS_MOUNT_DAX_FIELD	(1ULL << 62)
+#define XFS_MOUNT_DAX_FIELD2	(1ULL << 63)
+
+enum {
+	XFS_DAX_IFLAG = 0,
+	XFS_DAX_ALWAYS = 1,
+	XFS_DAX_NEVER = 2,
+};
+
+static inline void xfs_mount_set_dax(struct xfs_mount *mp, u32 val)
+{
+	mp->m_flags &= ~(XFS_DAX_FIELD_MASK << XFS_DAX_FIELD_SHIFT);
+	mp->m_flags |= ((val & XFS_DAX_FIELD_MASK) << XFS_DAX_FIELD_SHIFT);
+}
+
+static inline u32 xfs_mount_dax_mode(struct xfs_mount *mp)
+{
+	return (mp->m_flags >> XFS_DAX_FIELD_SHIFT) & XFS_DAX_FIELD_MASK;
+}
 
 /*
  * Max and min values for mount-option defined I/O
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 2094386af8ac..d2fd465eeed5 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -47,6 +47,13 @@ static struct kset *xfs_kset;		/* top-level xfs sysfs dir */
 static struct xfs_kobj xfs_dbg_kobj;	/* global debug sysfs attrs */
 #endif
 
+static const struct constant_table dax_param_enums[] = {
+	{"iflag",	XFS_DAX_IFLAG },
+	{"always",	XFS_DAX_ALWAYS },
+	{"never",	XFS_DAX_NEVER },
+	{}
+};
+
 /*
  * Table driven mount option parser.
  */
@@ -59,7 +66,7 @@ enum {
 	Opt_filestreams, Opt_quota, Opt_noquota, Opt_usrquota, Opt_grpquota,
 	Opt_prjquota, Opt_uquota, Opt_gquota, Opt_pquota,
 	Opt_uqnoenforce, Opt_gqnoenforce, Opt_pqnoenforce, Opt_qnoenforce,
-	Opt_discard, Opt_nodiscard, Opt_dax,
+	Opt_discard, Opt_nodiscard, Opt_dax, Opt_dax_enum,
 };
 
 static const struct fs_parameter_spec xfs_fs_parameters[] = {
@@ -103,6 +110,7 @@ static const struct fs_parameter_spec xfs_fs_parameters[] = {
 	fsparam_flag("discard",		Opt_discard),
 	fsparam_flag("nodiscard",	Opt_nodiscard),
 	fsparam_flag("dax",		Opt_dax),
+	fsparam_enum("dax",		Opt_dax_enum, dax_param_enums),
 	{}
 };
 
@@ -129,7 +137,6 @@ xfs_fs_show_options(
 		{ XFS_MOUNT_GRPID,		",grpid" },
 		{ XFS_MOUNT_DISCARD,		",discard" },
 		{ XFS_MOUNT_LARGEIO,		",largeio" },
-		{ XFS_MOUNT_DAX,		",dax" },
 		{ 0, NULL }
 	};
 	struct xfs_mount	*mp = XFS_M(root->d_sb);
@@ -185,6 +192,20 @@ xfs_fs_show_options(
 	if (!(mp->m_qflags & XFS_ALL_QUOTA_ACCT))
 		seq_puts(m, ",noquota");
 
+	switch (xfs_mount_dax_mode(mp)) {
+		case XFS_DAX_IFLAG:
+			seq_puts(m, ",dax=iflag");
+			break;
+		case XFS_DAX_ALWAYS:
+			seq_puts(m, ",dax=always");
+			break;
+		case XFS_DAX_NEVER:
+			seq_puts(m, ",dax=never");
+			break;
+		default:
+			break;
+	}
+
 	return 0;
 }
 
@@ -1244,7 +1265,10 @@ xfs_fc_parse_param(
 		return 0;
 #ifdef CONFIG_FS_DAX
 	case Opt_dax:
-		mp->m_flags |= XFS_MOUNT_DAX;
+		xfs_mount_set_dax(mp, XFS_DAX_ALWAYS);
+		return 0;
+	case Opt_dax_enum:
+		xfs_mount_set_dax(mp, result.uint_32);
 		return 0;
 #endif
 	default:
@@ -1437,7 +1461,7 @@ xfs_fc_fill_super(
 	if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5)
 		sb->s_flags |= SB_I_VERSION;
 
-	if (mp->m_flags & XFS_MOUNT_DAX) {
+	if (xfs_mount_dax_mode(mp) == XFS_DAX_ALWAYS) {
 		bool rtdev_is_dax = false, datadev_is_dax;
 
 		xfs_warn(mp,
@@ -1451,7 +1475,7 @@ xfs_fc_fill_super(
 		if (!rtdev_is_dax && !datadev_is_dax) {
 			xfs_alert(mp,
 			"DAX unsupported by block device. Turning off DAX.");
-			mp->m_flags &= ~XFS_MOUNT_DAX;
+			xfs_mount_set_dax(mp, XFS_DAX_NEVER);
 		}
 		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
 			xfs_alert(mp,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V6 5/8] fs/xfs: Create function xfs_inode_enable_dax()
  2020-04-07 18:29 [PATCH V6 0/8] Enable per-file/per-directory DAX operations V6 ira.weiny
                   ` (3 preceding siblings ...)
  2020-04-07 18:29 ` [PATCH V6 4/8] fs/xfs: Make DAX mount option a tri-state ira.weiny
@ 2020-04-07 18:29 ` ira.weiny
  2020-04-08  0:05   ` Dave Chinner
  2020-04-07 18:29 ` [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags() ira.weiny
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 47+ messages in thread
From: ira.weiny @ 2020-04-07 18:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ira Weiny, Darrick J. Wong, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

From: Ira Weiny <ira.weiny@intel.com>

xfs_inode_supports_dax() should reflect if the inode can support DAX not
that it is enabled for DAX.

Change the use of xfs_inode_supports_dax() to reflect only if the inode
and underlying storage support dax.

Add a new function xfs_inode_enable_dax() which reflects if the inode
should be enabled for DAX.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from v5:
	Update to reflect the new tri-state mount option

Changes from v3:
	Update functions and names to be more clear
	Update commit message
	Merge with
		'fs/xfs: Clean up DAX support check'
		don't allow IS_DAX() on a directory
		use STATIC macro for static
		make xfs_inode_supports_dax() static
---
 fs/xfs/xfs_iops.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 1ec4a36917bd..e07f7b641226 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1237,19 +1237,18 @@ static const struct inode_operations xfs_inline_symlink_inode_operations = {
 };
 
 /* Figure out if this file actually supports DAX. */
-static bool
+STATIC bool
 xfs_inode_supports_dax(
 	struct xfs_inode	*ip)
 {
 	struct xfs_mount	*mp = ip->i_mount;
 
 	/* Only supported on non-reflinked files. */
-	if (!S_ISREG(VFS_I(ip)->i_mode) || xfs_is_reflink_inode(ip))
+	if (xfs_is_reflink_inode(ip))
 		return false;
 
-	/* DAX mount option or DAX iflag must be set. */
-	if (xfs_mount_dax_mode(mp) != XFS_DAX_ALWAYS &&
-	    !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
+	/* Only supported on regular files. */
+	if (!S_ISREG(VFS_I(ip)->i_mode))
 		return false;
 
 	/* Block size must match page size */
@@ -1260,6 +1259,19 @@ xfs_inode_supports_dax(
 	return xfs_inode_buftarg(ip)->bt_daxdev != NULL;
 }
 
+STATIC bool
+xfs_inode_enable_dax(
+	struct xfs_inode *ip)
+{
+	u32 dax_mode = xfs_mount_dax_mode(ip->i_mount);
+
+	if (dax_mode == XFS_DAX_NEVER || !xfs_inode_supports_dax(ip))
+		return false;
+	if (dax_mode == XFS_DAX_ALWAYS || ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)
+		return true;
+	return false;
+}
+
 STATIC void
 xfs_diflags_to_iflags(
 	struct inode		*inode,
@@ -1278,7 +1290,7 @@ xfs_diflags_to_iflags(
 		inode->i_flags |= S_SYNC;
 	if (flags & XFS_DIFLAG_NOATIME)
 		inode->i_flags |= S_NOATIME;
-	if (xfs_inode_supports_dax(ip))
+	if (xfs_inode_enable_dax(ip))
 		inode->i_flags |= S_DAX;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  2020-04-07 18:29 [PATCH V6 0/8] Enable per-file/per-directory DAX operations V6 ira.weiny
                   ` (4 preceding siblings ...)
  2020-04-07 18:29 ` [PATCH V6 5/8] fs/xfs: Create function xfs_inode_enable_dax() ira.weiny
@ 2020-04-07 18:29 ` ira.weiny
  2020-04-08  2:08   ` Dave Chinner
  2020-04-07 18:29 ` [PATCH V6 7/8] fs/xfs: Change xfs_ioctl_setattr_dax_invalidate() to xfs_ioctl_dax_check() ira.weiny
  2020-04-07 18:29 ` [PATCH V6 8/8] Documentation/dax: Update Usage section ira.weiny
  7 siblings, 1 reply; 47+ messages in thread
From: ira.weiny @ 2020-04-07 18:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ira Weiny, Darrick J. Wong, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

From: Ira Weiny <ira.weiny@intel.com>

The functionality in xfs_diflags_to_linux() and xfs_diflags_to_iflags() are
nearly identical.  The only difference is that *_to_linux() is called after
inode setup and disallows changing the DAX flag.

Combining them can be done with a flag which indicates if this is the initial
setup to allow the DAX flag to be properly set only at init time.

So remove xfs_diflags_to_linux() and call the modified xfs_diflags_to_iflags()
directly.

While we are here simplify xfs_diflags_to_iflags() to take struct xfs_inode and
use xfs_ip2xflags() to ensure future diflags are included correctly.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V5:
	The functions are no longer identical so we can only combine
	them rather than deleting one completely.  This is reflected in
	the new init parameter.
---
 fs/xfs/xfs_inode.h |  1 +
 fs/xfs/xfs_ioctl.c | 33 +--------------------------------
 fs/xfs/xfs_iops.c  | 42 +++++++++++++++++++++++++++---------------
 3 files changed, 29 insertions(+), 47 deletions(-)

diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 492e53992fa9..e76ed9ca17f7 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -466,6 +466,7 @@ int	xfs_break_layouts(struct inode *inode, uint *iolock,
 /* from xfs_iops.c */
 extern void xfs_setup_inode(struct xfs_inode *ip);
 extern void xfs_setup_iops(struct xfs_inode *ip);
+extern void xfs_diflags_to_iflags(struct xfs_inode *ip, bool init);
 
 /*
  * When setting up a newly allocated inode, we need to call
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index d42de92cb283..c6cd92ef4a05 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1100,37 +1100,6 @@ xfs_flags2diflags2(
 	return di_flags2;
 }
 
-STATIC void
-xfs_diflags_to_linux(
-	struct xfs_inode	*ip)
-{
-	struct inode		*inode = VFS_I(ip);
-	unsigned int		xflags = xfs_ip2xflags(ip);
-
-	if (xflags & FS_XFLAG_IMMUTABLE)
-		inode->i_flags |= S_IMMUTABLE;
-	else
-		inode->i_flags &= ~S_IMMUTABLE;
-	if (xflags & FS_XFLAG_APPEND)
-		inode->i_flags |= S_APPEND;
-	else
-		inode->i_flags &= ~S_APPEND;
-	if (xflags & FS_XFLAG_SYNC)
-		inode->i_flags |= S_SYNC;
-	else
-		inode->i_flags &= ~S_SYNC;
-	if (xflags & FS_XFLAG_NOATIME)
-		inode->i_flags |= S_NOATIME;
-	else
-		inode->i_flags &= ~S_NOATIME;
-#if 0	/* disabled until the flag switching races are sorted out */
-	if (xflags & FS_XFLAG_DAX)
-		inode->i_flags |= S_DAX;
-	else
-		inode->i_flags &= ~S_DAX;
-#endif
-}
-
 static int
 xfs_ioctl_setattr_xflags(
 	struct xfs_trans	*tp,
@@ -1168,7 +1137,7 @@ xfs_ioctl_setattr_xflags(
 	ip->i_d.di_flags = xfs_flags2diflags(ip, fa->fsx_xflags);
 	ip->i_d.di_flags2 = di_flags2;
 
-	xfs_diflags_to_linux(ip);
+	xfs_diflags_to_iflags(ip, false);
 	xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 	XFS_STATS_INC(mp, xs_ig_attrchg);
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index e07f7b641226..a4ac8568c8c7 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1259,7 +1259,7 @@ xfs_inode_supports_dax(
 	return xfs_inode_buftarg(ip)->bt_daxdev != NULL;
 }
 
-STATIC bool
+static bool
 xfs_inode_enable_dax(
 	struct xfs_inode *ip)
 {
@@ -1272,26 +1272,38 @@ xfs_inode_enable_dax(
 	return false;
 }
 
-STATIC void
+void
 xfs_diflags_to_iflags(
-	struct inode		*inode,
-	struct xfs_inode	*ip)
+	struct xfs_inode	*ip,
+	bool init)
 {
-	uint16_t		flags = ip->i_d.di_flags;
-
-	inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC |
-			    S_NOATIME | S_DAX);
+	struct inode		*inode = VFS_I(ip);
+	uint			diflags = xfs_ip2xflags(ip);
 
-	if (flags & XFS_DIFLAG_IMMUTABLE)
+	if (diflags & FS_XFLAG_IMMUTABLE)
 		inode->i_flags |= S_IMMUTABLE;
-	if (flags & XFS_DIFLAG_APPEND)
+	else
+		inode->i_flags &= ~S_IMMUTABLE;
+	if (diflags & FS_XFLAG_APPEND)
 		inode->i_flags |= S_APPEND;
-	if (flags & XFS_DIFLAG_SYNC)
+	else
+		inode->i_flags &= ~S_APPEND;
+	if (diflags & FS_XFLAG_SYNC)
 		inode->i_flags |= S_SYNC;
-	if (flags & XFS_DIFLAG_NOATIME)
+	else
+		inode->i_flags &= ~S_SYNC;
+	if (diflags & FS_XFLAG_NOATIME)
 		inode->i_flags |= S_NOATIME;
-	if (xfs_inode_enable_dax(ip))
-		inode->i_flags |= S_DAX;
+	else
+		inode->i_flags &= ~S_NOATIME;
+
+	/* Only toggle the dax flag when initializing */
+	if (init) {
+		if (xfs_inode_enable_dax(ip))
+			inode->i_flags |= S_DAX;
+		else
+			inode->i_flags &= ~S_DAX;
+	}
 }
 
 /*
@@ -1320,7 +1332,7 @@ xfs_setup_inode(
 	inode->i_gid    = xfs_gid_to_kgid(ip->i_d.di_gid);
 
 	i_size_write(inode, ip->i_d.di_size);
-	xfs_diflags_to_iflags(inode, ip);
+	xfs_diflags_to_iflags(ip, true);
 
 	if (S_ISDIR(inode->i_mode)) {
 		/*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V6 7/8] fs/xfs: Change xfs_ioctl_setattr_dax_invalidate() to xfs_ioctl_dax_check()
  2020-04-07 18:29 [PATCH V6 0/8] Enable per-file/per-directory DAX operations V6 ira.weiny
                   ` (5 preceding siblings ...)
  2020-04-07 18:29 ` [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags() ira.weiny
@ 2020-04-07 18:29 ` ira.weiny
  2020-04-08  2:23   ` Dave Chinner
  2020-04-08 15:37   ` Darrick J. Wong
  2020-04-07 18:29 ` [PATCH V6 8/8] Documentation/dax: Update Usage section ira.weiny
  7 siblings, 2 replies; 47+ messages in thread
From: ira.weiny @ 2020-04-07 18:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ira Weiny, Darrick J. Wong, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

From: Ira Weiny <ira.weiny@intel.com>

We only support changing FS_XFLAG_DAX on directories.  Files get their
flag from the parent directory on creation only.  So no data
invalidation needs to happen.

Alter the xfs_ioctl_setattr_dax_invalidate() to be
xfs_ioctl_dax_check().

This also allows use to remove the join_flags logic.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from v5:
	New patch
---
 fs/xfs/xfs_ioctl.c | 91 +++++-----------------------------------------
 1 file changed, 10 insertions(+), 81 deletions(-)

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index c6cd92ef4a05..5472faab7c4f 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1145,63 +1145,18 @@ xfs_ioctl_setattr_xflags(
 }
 
 /*
- * If we are changing DAX flags, we have to ensure the file is clean and any
- * cached objects in the address space are invalidated and removed. This
- * requires us to lock out other IO and page faults similar to a truncate
- * operation. The locks need to be held until the transaction has been committed
- * so that the cache invalidation is atomic with respect to the DAX flag
- * manipulation.
+ * Only directories are allowed to change dax flags
  */
 static int
 xfs_ioctl_setattr_dax_invalidate(
-	struct xfs_inode	*ip,
-	struct fsxattr		*fa,
-	int			*join_flags)
+	struct xfs_inode	*ip)
 {
 	struct inode		*inode = VFS_I(ip);
-	struct super_block	*sb = inode->i_sb;
-	int			error;
-
-	*join_flags = 0;
-
-	/*
-	 * It is only valid to set the DAX flag on regular files and
-	 * directories on filesystems where the block size is equal to the page
-	 * size. On directories it serves as an inherited hint so we don't
-	 * have to check the device for dax support or flush pagecache.
-	 */
-	if (fa->fsx_xflags & FS_XFLAG_DAX) {
-		struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
-
-		if (!bdev_dax_supported(target->bt_bdev, sb->s_blocksize))
-			return -EINVAL;
-	}
-
-	/* If the DAX state is not changing, we have nothing to do here. */
-	if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
-		return 0;
-	if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode))
-		return 0;
-
-	if (S_ISDIR(inode->i_mode))
-		return 0;
 
-	/* lock, flush and invalidate mapping in preparation for flag change */
-	xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
-	error = filemap_write_and_wait(inode->i_mapping);
-	if (error)
-		goto out_unlock;
-	error = invalidate_inode_pages2(inode->i_mapping);
-	if (error)
-		goto out_unlock;
+	if (!S_ISDIR(inode->i_mode))
+		return -EINVAL;
 
-	*join_flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL;
 	return 0;
-
-out_unlock:
-	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
-	return error;
-
 }
 
 /*
@@ -1209,17 +1164,10 @@ xfs_ioctl_setattr_dax_invalidate(
  * have permission to do so. On success, return a clean transaction and the
  * inode locked exclusively ready for further operation specific checks. On
  * failure, return an error without modifying or locking the inode.
- *
- * The inode might already be IO locked on call. If this is the case, it is
- * indicated in @join_flags and we take full responsibility for ensuring they
- * are unlocked from now on. Hence if we have an error here, we still have to
- * unlock them. Otherwise, once they are joined to the transaction, they will
- * be unlocked on commit/cancel.
  */
 static struct xfs_trans *
 xfs_ioctl_setattr_get_trans(
-	struct xfs_inode	*ip,
-	int			join_flags)
+	struct xfs_inode	*ip)
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_trans	*tp;
@@ -1236,8 +1184,7 @@ xfs_ioctl_setattr_get_trans(
 		goto out_unlock;
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | join_flags);
-	join_flags = 0;
+	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
 
 	/*
 	 * CAP_FOWNER overrides the following restrictions:
@@ -1258,8 +1205,6 @@ xfs_ioctl_setattr_get_trans(
 out_cancel:
 	xfs_trans_cancel(tp);
 out_unlock:
-	if (join_flags)
-		xfs_iunlock(ip, join_flags);
 	return ERR_PTR(error);
 }
 
@@ -1386,7 +1331,6 @@ xfs_ioctl_setattr(
 	struct xfs_dquot	*pdqp = NULL;
 	struct xfs_dquot	*olddquot = NULL;
 	int			code;
-	int			join_flags = 0;
 
 	trace_xfs_ioctl_setattr(ip);
 
@@ -1410,18 +1354,11 @@ xfs_ioctl_setattr(
 			return code;
 	}
 
-	/*
-	 * Changing DAX config may require inode locking for mapping
-	 * invalidation. These need to be held all the way to transaction commit
-	 * or cancel time, so need to be passed through to
-	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
-	 * appropriately.
-	 */
-	code = xfs_ioctl_setattr_dax_invalidate(ip, fa, &join_flags);
+	code = xfs_ioctl_setattr_dax_invalidate(ip);
 	if (code)
 		goto error_free_dquots;
 
-	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
+	tp = xfs_ioctl_setattr_get_trans(ip);
 	if (IS_ERR(tp)) {
 		code = PTR_ERR(tp);
 		goto error_free_dquots;
@@ -1552,7 +1489,6 @@ xfs_ioc_setxflags(
 	struct fsxattr		fa;
 	struct fsxattr		old_fa;
 	unsigned int		flags;
-	int			join_flags = 0;
 	int			error;
 
 	if (copy_from_user(&flags, arg, sizeof(flags)))
@@ -1569,18 +1505,11 @@ xfs_ioc_setxflags(
 	if (error)
 		return error;
 
-	/*
-	 * Changing DAX config may require inode locking for mapping
-	 * invalidation. These need to be held all the way to transaction commit
-	 * or cancel time, so need to be passed through to
-	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
-	 * appropriately.
-	 */
-	error = xfs_ioctl_setattr_dax_invalidate(ip, &fa, &join_flags);
+	error = xfs_ioctl_setattr_dax_invalidate(ip);
 	if (error)
 		goto out_drop_write;
 
-	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
+	tp = xfs_ioctl_setattr_get_trans(ip);
 	if (IS_ERR(tp)) {
 		error = PTR_ERR(tp);
 		goto out_drop_write;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V6 8/8] Documentation/dax: Update Usage section
  2020-04-07 18:29 [PATCH V6 0/8] Enable per-file/per-directory DAX operations V6 ira.weiny
                   ` (6 preceding siblings ...)
  2020-04-07 18:29 ` [PATCH V6 7/8] fs/xfs: Change xfs_ioctl_setattr_dax_invalidate() to xfs_ioctl_dax_check() ira.weiny
@ 2020-04-07 18:29 ` ira.weiny
  7 siblings, 0 replies; 47+ messages in thread
From: ira.weiny @ 2020-04-07 18:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ira Weiny, Darrick J. Wong, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

From: Ira Weiny <ira.weiny@intel.com>

Update the Usage section to reflect the new individual dax selection
functionality.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V5:
	Update to reflect the agreed upon semantics
	https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
---
 Documentation/filesystems/dax.txt | 94 ++++++++++++++++++++++++++++++-
 1 file changed, 91 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
index 679729442fd2..d84e8101cf8a 100644
--- a/Documentation/filesystems/dax.txt
+++ b/Documentation/filesystems/dax.txt
@@ -17,11 +17,99 @@ For file mappings, the storage device is mapped directly into userspace.
 Usage
 -----
 
-If you have a block device which supports DAX, you can make a filesystem
+If you have a block device which supports DAX, you can make a file system
 on it as usual.  The DAX code currently only supports files with a block
 size equal to your kernel's PAGE_SIZE, so you may need to specify a block
-size when creating the filesystem.  When mounting it, use the "-o dax"
-option on the command line or add 'dax' to the options in /etc/fstab.
+size when creating the file system.
+
+Enabling DAX on an individual file basis (XFS)
+----------------------------------------------
+
+There are 2 per file dax flags.  One is a physical inode setting (FS_XFLAG_DAX) and
+the other a currently enabled state (S_DAX).
+
+FS_XFLAG_DAX is maintained on individual file and directory inodes.  It is
+preserved within the file system.  This 'physical' config setting can be set on
+directories using an ioctl and/or an application such as "xfs_io -c 'chattr
+[-+]x'".  Files and directories automatically inherit FS_XFLAG_DAX from their
+parent directory _when_ _created_.  Therefore, setting FS_XFLAG_DAX at
+directory creation time can be used to set a default behavior for an entire
+sub-tree.  (Doing so on the root directory acts to set a default for the entire
+file system.)
+
+To clarify inheritance here are 3 examples:
+
+Example A:
+
+mkdir -p a/b/c
+xfs_io 'chattr +x' a
+mkdir a/b/c/d
+mkdir a/e
+
+	dax: a,e
+	no dax: b,c,d
+
+Example B:
+
+mkdir a
+xfs_io 'chattr +x' a
+mkdir -p a/b/c/d
+
+	dax: a,b,c,d
+	no dax:
+
+Example C:
+
+mkdir -p a/b/c
+xfs_io 'chattr +x' c
+mkdir a/b/c/d
+
+	dax: c,d
+	no dax: a,b
+
+
+The current enabled state (S_DAX) is set when a file inode is loaded based on
+the underlying media support and the file systems dax mount option setting.  See
+below.
+
+statx can be used to query S_DAX.  NOTE that a directory will never have S_DAX
+set and therefore statx will always return false.  FS_XFLAG_DAX can be queried
+with ioctl or xfs_io on directories.
+
+NOTE: Setting FS_XFLAG_DAX on a directory is possible even if the underlying
+media does not support dax.  Furthermore, files and directories will continue
+to inherit FS_XLFAG_DAX even if the underlying media does not support dax.
+
+
+overriding FS_XFLAG_DAX (the dax= mount option)
+-----------------------------------------------
+
+The dax mount option is a tri-state option (never, always, iflag):
+
+   "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX"
+   "-o dax=always" means "always set S_DAX (at least on pmem), ignore FS_XFLAG_DAX"
+        "-o dax" by itself means "dax=always" to remain compatible with older
+	         kernels
+   "-o dax=iflag" means "follow FS_XFLAG_DAX"
+
+The default state is 'iflag'.  The following algorithm is used to determine the
+effective mode of the file S_DAX on a capable device.
+
+	S_DAX &= FS_XFLAG_DAX;
+
+	if (dax_mount == "always")
+		S_DAX = true;
+	else if (dax_mount == "off"
+		S_DAX = false;
+
+Using the mount option does not change the physical configured state of
+individual files.
+
+NOTE: Setting FS_XFLAG_DAX on a directory is possible while the file system is
+mounted with the dax override.  In addition, files and directories will inherit
+FS_XFLAG_DAX as normal while the file system is overriden.  However, the file's
+enabled state will continue to be the mount option until remounted with
+dax=iflag.
 
 
 Implementation Tips for Block Driver Writers
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 1/8] fs/xfs: Remove unnecessary initialization of i_rwsem
  2020-04-07 18:29 ` [PATCH V6 1/8] fs/xfs: Remove unnecessary initialization of i_rwsem ira.weiny
@ 2020-04-07 23:46   ` Dave Chinner
  0 siblings, 0 replies; 47+ messages in thread
From: Dave Chinner @ 2020-04-07 23:46 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, Darrick J. Wong, Dan Williams, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Tue, Apr 07, 2020 at 11:29:51AM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> An earlier call of xfs_reinit_inode() from xfs_iget_cache_hit() already
> handles initialization of i_rwsem.
> 
> Doing so again is unneeded.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> ---
> Changes from V4:
> 	Update commit message to make it clear the xfs_iget_cache_hit()
> 	is actually doing the initialization via xfs_reinit_inode()
> 
> New for V4:
> 
> NOTE: This was found while ensuring the new i_aops_sem was properly
> handled.  It seems like this is a layering violation so I think it is
> worth cleaning up so as to not confuse others.
> ---
>  fs/xfs/xfs_icache.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 8dc2e5414276..836a1f09be03 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -419,6 +419,7 @@ xfs_iget_cache_hit(
>  		spin_unlock(&ip->i_flags_lock);
>  		rcu_read_unlock();
>  
> +		ASSERT(!rwsem_is_locked(&inode->i_rwsem));
>  		error = xfs_reinit_inode(mp, inode);
>  		if (error) {
>  			bool wake;
> @@ -452,9 +453,6 @@ xfs_iget_cache_hit(
>  		ip->i_sick = 0;
>  		ip->i_checked = 0;
>  
> -		ASSERT(!rwsem_is_locked(&inode->i_rwsem));
> -		init_rwsem(&inode->i_rwsem);
> -
>  		spin_unlock(&ip->i_flags_lock);
>  		spin_unlock(&pag->pag_ici_lock);
>  	} else {

Looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 3/8] fs/stat: Define DAX statx attribute
  2020-04-07 18:29 ` [PATCH V6 3/8] fs/stat: Define DAX statx attribute ira.weiny
@ 2020-04-07 23:47   ` Dave Chinner
  0 siblings, 0 replies; 47+ messages in thread
From: Dave Chinner @ 2020-04-07 23:47 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, Jan Kara, Darrick J . Wong, Dan Williams,
	Christoph Hellwig, Theodore Y. Ts'o, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Tue, Apr 07, 2020 at 11:29:53AM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> In order for users to determine if a file is currently operating in DAX
> state (effective DAX).  Define a statx attribute value and set that
> attribute if the effective DAX flag is set.
> 
> To go along with this we propose the following addition to the statx man
> page:
> 
> STATX_ATTR_DAX
> 
> 	The file is in the DAX (cpu direct access) state.  DAX state
> 	attempts to minimize software cache effects for both I/O and
> 	memory mappings of this file.  It requires a file system which
> 	has been configured to support DAX.
> 
> 	DAX generally assumes all accesses are via cpu load / store
> 	instructions which can minimize overhead for small accesses, but
> 	may adversely affect cpu utilization for large transfers.
> 
> 	File I/O is done directly to/from user-space buffers and memory
> 	mapped I/O may be performed with direct memory mappings that
> 	bypass kernel page cache.
> 
> 	While the DAX property tends to result in data being transferred
> 	synchronously, it does not give the same guarantees of O_SYNC
> 	where data and the necessary metadata are transferred together.
> 
> 	A DAX file may support being mapped with the MAP_SYNC flag,
> 	which enables a program to use CPU cache flush instructions to
> 	persist CPU store operations without an explicit fsync(2).  See
> 	mmap(2) for more information.
> 
> Reviewed-by: Jan Kara <jack@suse.cz>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> ---
> Changes from V2:
> 	Update man page text with comments from Darrick, Jan, Dan, and
> 	Dave.
> ---
>  fs/stat.c                 | 3 +++
>  include/uapi/linux/stat.h | 1 +
>  2 files changed, 4 insertions(+)
> 
> diff --git a/fs/stat.c b/fs/stat.c
> index 030008796479..894699c74dde 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -79,6 +79,9 @@ int vfs_getattr_nosec(const struct path *path, struct kstat *stat,
>  	if (IS_AUTOMOUNT(inode))
>  		stat->attributes |= STATX_ATTR_AUTOMOUNT;
>  
> +	if (IS_DAX(inode))
> +		stat->attributes |= STATX_ATTR_DAX;
> +
>  	if (inode->i_op->getattr)
>  		return inode->i_op->getattr(path, stat, request_mask,
>  					    query_flags);
> diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
> index ad80a5c885d5..e5f9d5517f6b 100644
> --- a/include/uapi/linux/stat.h
> +++ b/include/uapi/linux/stat.h
> @@ -169,6 +169,7 @@ struct statx {
>  #define STATX_ATTR_ENCRYPTED		0x00000800 /* [I] File requires key to decrypt in fs */
>  #define STATX_ATTR_AUTOMOUNT		0x00001000 /* Dir: Automount trigger */
>  #define STATX_ATTR_VERITY		0x00100000 /* [I] Verity protected file */
> +#define STATX_ATTR_DAX			0x00002000 /* [I] File is DAX */
>  
>  
>  #endif /* _UAPI_LINUX_STAT_H */

Looks fine. Man page text seems ok, too.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 4/8] fs/xfs: Make DAX mount option a tri-state
  2020-04-07 18:29 ` [PATCH V6 4/8] fs/xfs: Make DAX mount option a tri-state ira.weiny
@ 2020-04-07 23:59   ` Dave Chinner
  2020-04-08  0:09     ` Ira Weiny
  0 siblings, 1 reply; 47+ messages in thread
From: Dave Chinner @ 2020-04-07 23:59 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, Darrick J. Wong, Dan Williams, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Tue, Apr 07, 2020 at 11:29:54AM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> As agreed upon[1].  We make the dax mount option a tri-state.  '-o dax'
> continues to operate the same.  We add 'always', 'never', and 'iflag'
> (default).
> 
> [1] https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> ---
> Changes from v5:
> 	New Patch
> ---
>  fs/xfs/xfs_iops.c  |  2 +-
>  fs/xfs/xfs_mount.h | 26 +++++++++++++++++++++++++-
>  fs/xfs/xfs_super.c | 34 +++++++++++++++++++++++++++++-----
>  3 files changed, 55 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 81f2f93caec0..1ec4a36917bd 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -1248,7 +1248,7 @@ xfs_inode_supports_dax(
>  		return false;
>  
>  	/* DAX mount option or DAX iflag must be set. */
> -	if (!(mp->m_flags & XFS_MOUNT_DAX) &&
> +	if (xfs_mount_dax_mode(mp) != XFS_DAX_ALWAYS &&
>  	    !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
>  		return false;
>  
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index 88ab09ed29e7..ce027ee06692 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -233,7 +233,31 @@ typedef struct xfs_mount {
>  						   allocator */
>  #define XFS_MOUNT_NOATTR2	(1ULL << 25)	/* disable use of attr2 format */
>  
> -#define XFS_MOUNT_DAX		(1ULL << 62)	/* TEST ONLY! */
> +/* DAX flag is a 2 bit field representing a tri-state for dax
> + *      iflag, always, never
> + * We reserve/document the 2 bits using dax field/field2
> + */
> +#define XFS_DAX_FIELD_MASK 0x3ULL
> +#define XFS_DAX_FIELD_SHIFT 62
> +#define XFS_MOUNT_DAX_FIELD	(1ULL << 62)
> +#define XFS_MOUNT_DAX_FIELD2	(1ULL << 63)
> +
> +enum {
> +	XFS_DAX_IFLAG = 0,
> +	XFS_DAX_ALWAYS = 1,
> +	XFS_DAX_NEVER = 2,
> +};
> +
> +static inline void xfs_mount_set_dax(struct xfs_mount *mp, u32 val)
> +{
> +	mp->m_flags &= ~(XFS_DAX_FIELD_MASK << XFS_DAX_FIELD_SHIFT);
> +	mp->m_flags |= ((val & XFS_DAX_FIELD_MASK) << XFS_DAX_FIELD_SHIFT);
> +}
> +
> +static inline u32 xfs_mount_dax_mode(struct xfs_mount *mp)
> +{
> +	return (mp->m_flags >> XFS_DAX_FIELD_SHIFT) & XFS_DAX_FIELD_MASK;
> +}

This is overly complex. Just use 2 flags:

#define XFS_MOUNT_DAX_ALWAYS	(1ULL << 26)
#define XFS_MOUNT_DAX_NEVER	(1ULL << 27)

and if no mount flag is set, we use the inode flag....

> @@ -59,7 +66,7 @@ enum {
>  	Opt_filestreams, Opt_quota, Opt_noquota, Opt_usrquota, Opt_grpquota,
>  	Opt_prjquota, Opt_uquota, Opt_gquota, Opt_pquota,
>  	Opt_uqnoenforce, Opt_gqnoenforce, Opt_pqnoenforce, Opt_qnoenforce,
> -	Opt_discard, Opt_nodiscard, Opt_dax,
> +	Opt_discard, Opt_nodiscard, Opt_dax, Opt_dax_enum,
>  };
>  
>  static const struct fs_parameter_spec xfs_fs_parameters[] = {
> @@ -103,6 +110,7 @@ static const struct fs_parameter_spec xfs_fs_parameters[] = {
>  	fsparam_flag("discard",		Opt_discard),
>  	fsparam_flag("nodiscard",	Opt_nodiscard),
>  	fsparam_flag("dax",		Opt_dax),
> +	fsparam_enum("dax",		Opt_dax_enum, dax_param_enums),
>  	{}
>  };
>  
> @@ -129,7 +137,6 @@ xfs_fs_show_options(
>  		{ XFS_MOUNT_GRPID,		",grpid" },
>  		{ XFS_MOUNT_DISCARD,		",discard" },
>  		{ XFS_MOUNT_LARGEIO,		",largeio" },
> -		{ XFS_MOUNT_DAX,		",dax" },
+		{ XFS_MOUNT_DAX_ALWAYS,		",dax=always" },
+		{ XFS_MOUNT_DAX_NEVER,		",dax=never" },

>  		{ 0, NULL }
>  	};
>  	struct xfs_mount	*mp = XFS_M(root->d_sb);
> @@ -185,6 +192,20 @@ xfs_fs_show_options(
>  	if (!(mp->m_qflags & XFS_ALL_QUOTA_ACCT))
>  		seq_puts(m, ",noquota");
>  
> +	switch (xfs_mount_dax_mode(mp)) {
> +		case XFS_DAX_IFLAG:
> +			seq_puts(m, ",dax=iflag");
> +			break;
> +		case XFS_DAX_ALWAYS:
> +			seq_puts(m, ",dax=always");
> +			break;
> +		case XFS_DAX_NEVER:
> +			seq_puts(m, ",dax=never");
> +			break;
> +		default:
> +			break;
> +	}

	if (!(mp->m_flags & (XFS_MOUNT_DAX_ALWAYS | XFS_MOUNT_DAX_NEVER))
		seq_puts(m, ",dax=iflag");

> +
>  	return 0;
>  }
>  
> @@ -1244,7 +1265,10 @@ xfs_fc_parse_param(
>  		return 0;
>  #ifdef CONFIG_FS_DAX
>  	case Opt_dax:
> -		mp->m_flags |= XFS_MOUNT_DAX;
> +		xfs_mount_set_dax(mp, XFS_DAX_ALWAYS);
> +		return 0;
> +	case Opt_dax_enum:
> +		xfs_mount_set_dax(mp, result.uint_32);
>  		return 0;
>  #endif
>  	default:
> @@ -1437,7 +1461,7 @@ xfs_fc_fill_super(
>  	if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5)
>  		sb->s_flags |= SB_I_VERSION;
>  
> -	if (mp->m_flags & XFS_MOUNT_DAX) {
> +	if (xfs_mount_dax_mode(mp) == XFS_DAX_ALWAYS) {

	if (mp->m_flags & XFS_MOUNT_DAX_ALWAYS) {

>  		bool rtdev_is_dax = false, datadev_is_dax;
>  
>  		xfs_warn(mp,
> @@ -1451,7 +1475,7 @@ xfs_fc_fill_super(
>  		if (!rtdev_is_dax && !datadev_is_dax) {
>  			xfs_alert(mp,
>  			"DAX unsupported by block device. Turning off DAX.");
> -			mp->m_flags &= ~XFS_MOUNT_DAX;
> +			xfs_mount_set_dax(mp, XFS_DAX_NEVER);
>  		}
>  		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
>  			xfs_alert(mp,
> -- 
> 2.25.1
> 
> 

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 5/8] fs/xfs: Create function xfs_inode_enable_dax()
  2020-04-07 18:29 ` [PATCH V6 5/8] fs/xfs: Create function xfs_inode_enable_dax() ira.weiny
@ 2020-04-08  0:05   ` Dave Chinner
  2020-04-08  0:13     ` Ira Weiny
  0 siblings, 1 reply; 47+ messages in thread
From: Dave Chinner @ 2020-04-08  0:05 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, Darrick J. Wong, Dan Williams, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Tue, Apr 07, 2020 at 11:29:55AM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> xfs_inode_supports_dax() should reflect if the inode can support DAX not
> that it is enabled for DAX.
> 
> Change the use of xfs_inode_supports_dax() to reflect only if the inode
> and underlying storage support dax.
> 
> Add a new function xfs_inode_enable_dax() which reflects if the inode
> should be enabled for DAX.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
....
>  
> +STATIC bool
> +xfs_inode_enable_dax(
> +	struct xfs_inode *ip)
> +{
> +	u32 dax_mode = xfs_mount_dax_mode(ip->i_mount);
> +
> +	if (dax_mode == XFS_DAX_NEVER || !xfs_inode_supports_dax(ip))
> +		return false;
> +	if (dax_mode == XFS_DAX_ALWAYS || ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)
> +		return true;

These compound || statements are better written as single conditions
as they are all sequential logic checks and we can't skip over
checks.

	if (mp->m_flags & XFS_MOUNT_DAX_NEVER)
		return false;
	if (!xfs_inode_supports_dax(ip))
		return false;
	if (mp->m_flags & XFS_MOUNT_DAX_ALWAYS)
		return true;
	if (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)
		return true;
	return false;

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 4/8] fs/xfs: Make DAX mount option a tri-state
  2020-04-07 23:59   ` Dave Chinner
@ 2020-04-08  0:09     ` Ira Weiny
  2020-04-08  0:48       ` Dave Chinner
  0 siblings, 1 reply; 47+ messages in thread
From: Ira Weiny @ 2020-04-08  0:09 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-kernel, Darrick J. Wong, Dan Williams, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Wed, Apr 08, 2020 at 09:59:09AM +1000, Dave Chinner wrote:
> On Tue, Apr 07, 2020 at 11:29:54AM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > As agreed upon[1].  We make the dax mount option a tri-state.  '-o dax'
> > continues to operate the same.  We add 'always', 'never', and 'iflag'
> > (default).
> > 
> > [1] https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > 
> > ---
> > Changes from v5:
> > 	New Patch
> > ---
> >  fs/xfs/xfs_iops.c  |  2 +-
> >  fs/xfs/xfs_mount.h | 26 +++++++++++++++++++++++++-
> >  fs/xfs/xfs_super.c | 34 +++++++++++++++++++++++++++++-----
> >  3 files changed, 55 insertions(+), 7 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > index 81f2f93caec0..1ec4a36917bd 100644
> > --- a/fs/xfs/xfs_iops.c
> > +++ b/fs/xfs/xfs_iops.c
> > @@ -1248,7 +1248,7 @@ xfs_inode_supports_dax(
> >  		return false;
> >  
> >  	/* DAX mount option or DAX iflag must be set. */
> > -	if (!(mp->m_flags & XFS_MOUNT_DAX) &&
> > +	if (xfs_mount_dax_mode(mp) != XFS_DAX_ALWAYS &&
> >  	    !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
> >  		return false;
> >  
> > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> > index 88ab09ed29e7..ce027ee06692 100644
> > --- a/fs/xfs/xfs_mount.h
> > +++ b/fs/xfs/xfs_mount.h
> > @@ -233,7 +233,31 @@ typedef struct xfs_mount {
> >  						   allocator */
> >  #define XFS_MOUNT_NOATTR2	(1ULL << 25)	/* disable use of attr2 format */
> >  
> > -#define XFS_MOUNT_DAX		(1ULL << 62)	/* TEST ONLY! */
> > +/* DAX flag is a 2 bit field representing a tri-state for dax
> > + *      iflag, always, never
> > + * We reserve/document the 2 bits using dax field/field2
> > + */
> > +#define XFS_DAX_FIELD_MASK 0x3ULL
> > +#define XFS_DAX_FIELD_SHIFT 62
> > +#define XFS_MOUNT_DAX_FIELD	(1ULL << 62)
> > +#define XFS_MOUNT_DAX_FIELD2	(1ULL << 63)
> > +
> > +enum {
> > +	XFS_DAX_IFLAG = 0,
> > +	XFS_DAX_ALWAYS = 1,
> > +	XFS_DAX_NEVER = 2,
> > +};
> > +
> > +static inline void xfs_mount_set_dax(struct xfs_mount *mp, u32 val)
> > +{
> > +	mp->m_flags &= ~(XFS_DAX_FIELD_MASK << XFS_DAX_FIELD_SHIFT);
> > +	mp->m_flags |= ((val & XFS_DAX_FIELD_MASK) << XFS_DAX_FIELD_SHIFT);
> > +}
> > +
> > +static inline u32 xfs_mount_dax_mode(struct xfs_mount *mp)
> > +{
> > +	return (mp->m_flags >> XFS_DAX_FIELD_SHIFT) & XFS_DAX_FIELD_MASK;
> > +}
> 
> This is overly complex. Just use 2 flags:

LOL...  I was afraid someone would say that.  At first I used 2 flags with
fsparam_string, but then I realized Darrick suggested fsparam_enum:

	"It would be handy if we could have a single fsparam_enum for figuring
	out the dax mount options."

	-- https://lore.kernel.org/lkml/20200403183746.GQ80283@magnolia/

So I changed it...

Darrick?

Ira

> 
> #define XFS_MOUNT_DAX_ALWAYS	(1ULL << 26)
> #define XFS_MOUNT_DAX_NEVER	(1ULL << 27)
> 
> and if no mount flag is set, we use the inode flag....
> 
> > @@ -59,7 +66,7 @@ enum {
> >  	Opt_filestreams, Opt_quota, Opt_noquota, Opt_usrquota, Opt_grpquota,
> >  	Opt_prjquota, Opt_uquota, Opt_gquota, Opt_pquota,
> >  	Opt_uqnoenforce, Opt_gqnoenforce, Opt_pqnoenforce, Opt_qnoenforce,
> > -	Opt_discard, Opt_nodiscard, Opt_dax,
> > +	Opt_discard, Opt_nodiscard, Opt_dax, Opt_dax_enum,
> >  };
> >  
> >  static const struct fs_parameter_spec xfs_fs_parameters[] = {
> > @@ -103,6 +110,7 @@ static const struct fs_parameter_spec xfs_fs_parameters[] = {
> >  	fsparam_flag("discard",		Opt_discard),
> >  	fsparam_flag("nodiscard",	Opt_nodiscard),
> >  	fsparam_flag("dax",		Opt_dax),
> > +	fsparam_enum("dax",		Opt_dax_enum, dax_param_enums),
> >  	{}
> >  };
> >  
> > @@ -129,7 +137,6 @@ xfs_fs_show_options(
> >  		{ XFS_MOUNT_GRPID,		",grpid" },
> >  		{ XFS_MOUNT_DISCARD,		",discard" },
> >  		{ XFS_MOUNT_LARGEIO,		",largeio" },
> > -		{ XFS_MOUNT_DAX,		",dax" },
> +		{ XFS_MOUNT_DAX_ALWAYS,		",dax=always" },
> +		{ XFS_MOUNT_DAX_NEVER,		",dax=never" },
> 
> >  		{ 0, NULL }
> >  	};
> >  	struct xfs_mount	*mp = XFS_M(root->d_sb);
> > @@ -185,6 +192,20 @@ xfs_fs_show_options(
> >  	if (!(mp->m_qflags & XFS_ALL_QUOTA_ACCT))
> >  		seq_puts(m, ",noquota");
> >  
> > +	switch (xfs_mount_dax_mode(mp)) {
> > +		case XFS_DAX_IFLAG:
> > +			seq_puts(m, ",dax=iflag");
> > +			break;
> > +		case XFS_DAX_ALWAYS:
> > +			seq_puts(m, ",dax=always");
> > +			break;
> > +		case XFS_DAX_NEVER:
> > +			seq_puts(m, ",dax=never");
> > +			break;
> > +		default:
> > +			break;
> > +	}
> 
> 	if (!(mp->m_flags & (XFS_MOUNT_DAX_ALWAYS | XFS_MOUNT_DAX_NEVER))
> 		seq_puts(m, ",dax=iflag");
> 
> > +
> >  	return 0;
> >  }
> >  
> > @@ -1244,7 +1265,10 @@ xfs_fc_parse_param(
> >  		return 0;
> >  #ifdef CONFIG_FS_DAX
> >  	case Opt_dax:
> > -		mp->m_flags |= XFS_MOUNT_DAX;
> > +		xfs_mount_set_dax(mp, XFS_DAX_ALWAYS);
> > +		return 0;
> > +	case Opt_dax_enum:
> > +		xfs_mount_set_dax(mp, result.uint_32);
> >  		return 0;
> >  #endif
> >  	default:
> > @@ -1437,7 +1461,7 @@ xfs_fc_fill_super(
> >  	if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5)
> >  		sb->s_flags |= SB_I_VERSION;
> >  
> > -	if (mp->m_flags & XFS_MOUNT_DAX) {
> > +	if (xfs_mount_dax_mode(mp) == XFS_DAX_ALWAYS) {
> 
> 	if (mp->m_flags & XFS_MOUNT_DAX_ALWAYS) {
> 
> >  		bool rtdev_is_dax = false, datadev_is_dax;
> >  
> >  		xfs_warn(mp,
> > @@ -1451,7 +1475,7 @@ xfs_fc_fill_super(
> >  		if (!rtdev_is_dax && !datadev_is_dax) {
> >  			xfs_alert(mp,
> >  			"DAX unsupported by block device. Turning off DAX.");
> > -			mp->m_flags &= ~XFS_MOUNT_DAX;
> > +			xfs_mount_set_dax(mp, XFS_DAX_NEVER);
> >  		}
> >  		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> >  			xfs_alert(mp,
> > -- 
> > 2.25.1
> > 
> > 
> 
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 5/8] fs/xfs: Create function xfs_inode_enable_dax()
  2020-04-08  0:05   ` Dave Chinner
@ 2020-04-08  0:13     ` Ira Weiny
  0 siblings, 0 replies; 47+ messages in thread
From: Ira Weiny @ 2020-04-08  0:13 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-kernel, Darrick J. Wong, Dan Williams, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Wed, Apr 08, 2020 at 10:05:33AM +1000, Dave Chinner wrote:
> On Tue, Apr 07, 2020 at 11:29:55AM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > xfs_inode_supports_dax() should reflect if the inode can support DAX not
> > that it is enabled for DAX.
> > 
> > Change the use of xfs_inode_supports_dax() to reflect only if the inode
> > and underlying storage support dax.
> > 
> > Add a new function xfs_inode_enable_dax() which reflects if the inode
> > should be enabled for DAX.
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ....
> >  
> > +STATIC bool
> > +xfs_inode_enable_dax(
> > +	struct xfs_inode *ip)
> > +{
> > +	u32 dax_mode = xfs_mount_dax_mode(ip->i_mount);
> > +
> > +	if (dax_mode == XFS_DAX_NEVER || !xfs_inode_supports_dax(ip))
> > +		return false;
> > +	if (dax_mode == XFS_DAX_ALWAYS || ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)
> > +		return true;
> 
> These compound || statements are better written as single conditions
> as they are all sequential logic checks and we can't skip over
> checks.
> 
> 	if (mp->m_flags & XFS_MOUNT_DAX_NEVER)
> 		return false;
> 	if (!xfs_inode_supports_dax(ip))
> 		return false;
> 	if (mp->m_flags & XFS_MOUNT_DAX_ALWAYS)
> 		return true;
> 	if (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)
> 		return true;
> 	return false;

Updated for V7

Thanks,
Ira

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 4/8] fs/xfs: Make DAX mount option a tri-state
  2020-04-08  0:09     ` Ira Weiny
@ 2020-04-08  0:48       ` Dave Chinner
  2020-04-09 15:03         ` Ira Weiny
  0 siblings, 1 reply; 47+ messages in thread
From: Dave Chinner @ 2020-04-08  0:48 UTC (permalink / raw)
  To: Ira Weiny
  Cc: linux-kernel, Darrick J. Wong, Dan Williams, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Tue, Apr 07, 2020 at 05:09:04PM -0700, Ira Weiny wrote:
> On Wed, Apr 08, 2020 at 09:59:09AM +1000, Dave Chinner wrote:
> > On Tue, Apr 07, 2020 at 11:29:54AM -0700, ira.weiny@intel.com wrote:
> > > From: Ira Weiny <ira.weiny@intel.com>
> > > 
> > > As agreed upon[1].  We make the dax mount option a tri-state.  '-o dax'
> > > continues to operate the same.  We add 'always', 'never', and 'iflag'
> > > (default).
> > > 
> > > [1] https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> > > 
> > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > > 
> > > ---
> > > Changes from v5:
> > > 	New Patch
> > > ---
> > >  fs/xfs/xfs_iops.c  |  2 +-
> > >  fs/xfs/xfs_mount.h | 26 +++++++++++++++++++++++++-
> > >  fs/xfs/xfs_super.c | 34 +++++++++++++++++++++++++++++-----
> > >  3 files changed, 55 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > > index 81f2f93caec0..1ec4a36917bd 100644
> > > --- a/fs/xfs/xfs_iops.c
> > > +++ b/fs/xfs/xfs_iops.c
> > > @@ -1248,7 +1248,7 @@ xfs_inode_supports_dax(
> > >  		return false;
> > >  
> > >  	/* DAX mount option or DAX iflag must be set. */
> > > -	if (!(mp->m_flags & XFS_MOUNT_DAX) &&
> > > +	if (xfs_mount_dax_mode(mp) != XFS_DAX_ALWAYS &&
> > >  	    !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
> > >  		return false;
> > >  
> > > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> > > index 88ab09ed29e7..ce027ee06692 100644
> > > --- a/fs/xfs/xfs_mount.h
> > > +++ b/fs/xfs/xfs_mount.h
> > > @@ -233,7 +233,31 @@ typedef struct xfs_mount {
> > >  						   allocator */
> > >  #define XFS_MOUNT_NOATTR2	(1ULL << 25)	/* disable use of attr2 format */
> > >  
> > > -#define XFS_MOUNT_DAX		(1ULL << 62)	/* TEST ONLY! */
> > > +/* DAX flag is a 2 bit field representing a tri-state for dax
> > > + *      iflag, always, never
> > > + * We reserve/document the 2 bits using dax field/field2
> > > + */
> > > +#define XFS_DAX_FIELD_MASK 0x3ULL
> > > +#define XFS_DAX_FIELD_SHIFT 62
> > > +#define XFS_MOUNT_DAX_FIELD	(1ULL << 62)
> > > +#define XFS_MOUNT_DAX_FIELD2	(1ULL << 63)
> > > +
> > > +enum {
> > > +	XFS_DAX_IFLAG = 0,
> > > +	XFS_DAX_ALWAYS = 1,
> > > +	XFS_DAX_NEVER = 2,
> > > +};
> > > +
> > > +static inline void xfs_mount_set_dax(struct xfs_mount *mp, u32 val)
> > > +{
> > > +	mp->m_flags &= ~(XFS_DAX_FIELD_MASK << XFS_DAX_FIELD_SHIFT);
> > > +	mp->m_flags |= ((val & XFS_DAX_FIELD_MASK) << XFS_DAX_FIELD_SHIFT);
> > > +}
> > > +
> > > +static inline u32 xfs_mount_dax_mode(struct xfs_mount *mp)
> > > +{
> > > +	return (mp->m_flags >> XFS_DAX_FIELD_SHIFT) & XFS_DAX_FIELD_MASK;
> > > +}
> > 
> > This is overly complex. Just use 2 flags:
> 
> LOL...  I was afraid someone would say that.  At first I used 2 flags with
> fsparam_string, but then I realized Darrick suggested fsparam_enum:

Well, I'm not concerned about the fsparam enum, it's just that
encoding an integer into a flags bit field is just ... messy.
Especially when encoding that state can be done with just 2 flags.

If you want to keep the xfs_mount_dax_mode() wrapper, then:

static inline uint32_t xfs_mount_dax_mode(struct xfs_mount *mp)
{
	if (mp->m_flags & XFS_MOUNT_DAX_NEVER)
		return XFS_DAX_NEVER;
	if (mp->m_flags & XFS_MOUNT_DAX_ALWAYS)
		return XFS_DAX_ALWAYS;
	return XFS_DAX_IFLAG;
}

but once it's encoded in flags like this, the wrapper really isn't
necessary...

Also, while I think of it, can we change "iflag" to "inode". i.e.
the DAX state is held on the inode. Saying it comes from an "inode
flag" encodes the implementation into the user interface. i.e. it
could well be held in an xattr on the inode on another filesystem,
so we shouldn't mention "flag" in the user API....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  2020-04-07 18:29 ` [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags() ira.weiny
@ 2020-04-08  2:08   ` Dave Chinner
  2020-04-08 17:09     ` Ira Weiny
  0 siblings, 1 reply; 47+ messages in thread
From: Dave Chinner @ 2020-04-08  2:08 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, Darrick J. Wong, Dan Williams, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Tue, Apr 07, 2020 at 11:29:56AM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> The functionality in xfs_diflags_to_linux() and xfs_diflags_to_iflags() are
> nearly identical.  The only difference is that *_to_linux() is called after
> inode setup and disallows changing the DAX flag.
> 
> Combining them can be done with a flag which indicates if this is the initial
> setup to allow the DAX flag to be properly set only at init time.
> 
> So remove xfs_diflags_to_linux() and call the modified xfs_diflags_to_iflags()
> directly.
> 
> While we are here simplify xfs_diflags_to_iflags() to take struct xfs_inode and
> use xfs_ip2xflags() to ensure future diflags are included correctly.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> ---
> Changes from V5:
> 	The functions are no longer identical so we can only combine
> 	them rather than deleting one completely.  This is reflected in
> 	the new init parameter.
> ---
>  fs/xfs/xfs_inode.h |  1 +
>  fs/xfs/xfs_ioctl.c | 33 +--------------------------------
>  fs/xfs/xfs_iops.c  | 42 +++++++++++++++++++++++++++---------------
>  3 files changed, 29 insertions(+), 47 deletions(-)
> 
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 492e53992fa9..e76ed9ca17f7 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -466,6 +466,7 @@ int	xfs_break_layouts(struct inode *inode, uint *iolock,
>  /* from xfs_iops.c */
>  extern void xfs_setup_inode(struct xfs_inode *ip);
>  extern void xfs_setup_iops(struct xfs_inode *ip);
> +extern void xfs_diflags_to_iflags(struct xfs_inode *ip, bool init);
>  
>  /*
>   * When setting up a newly allocated inode, we need to call
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index d42de92cb283..c6cd92ef4a05 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -1100,37 +1100,6 @@ xfs_flags2diflags2(
>  	return di_flags2;
>  }
>  
> -STATIC void
> -xfs_diflags_to_linux(
> -	struct xfs_inode	*ip)
> -{
> -	struct inode		*inode = VFS_I(ip);
> -	unsigned int		xflags = xfs_ip2xflags(ip);
> -
> -	if (xflags & FS_XFLAG_IMMUTABLE)
> -		inode->i_flags |= S_IMMUTABLE;
> -	else
> -		inode->i_flags &= ~S_IMMUTABLE;
> -	if (xflags & FS_XFLAG_APPEND)
> -		inode->i_flags |= S_APPEND;
> -	else
> -		inode->i_flags &= ~S_APPEND;
> -	if (xflags & FS_XFLAG_SYNC)
> -		inode->i_flags |= S_SYNC;
> -	else
> -		inode->i_flags &= ~S_SYNC;
> -	if (xflags & FS_XFLAG_NOATIME)
> -		inode->i_flags |= S_NOATIME;
> -	else
> -		inode->i_flags &= ~S_NOATIME;
> -#if 0	/* disabled until the flag switching races are sorted out */
> -	if (xflags & FS_XFLAG_DAX)
> -		inode->i_flags |= S_DAX;
> -	else
> -		inode->i_flags &= ~S_DAX;
> -#endif

So this variant will set the flag in the inode if the disk inode
flag is set, otherwise it will clear it.  It does it with if/else
branches.


> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index e07f7b641226..a4ac8568c8c7 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -1259,7 +1259,7 @@ xfs_inode_supports_dax(
>  	return xfs_inode_buftarg(ip)->bt_daxdev != NULL;
>  }
>  
> -STATIC bool
> +static bool
>  xfs_inode_enable_dax(
>  	struct xfs_inode *ip)
>  {

This belongs in the previous patch.

> @@ -1272,26 +1272,38 @@ xfs_inode_enable_dax(
>  	return false;
>  }
>  
> -STATIC void
> +void
>  xfs_diflags_to_iflags(
> -	struct inode		*inode,
> -	struct xfs_inode	*ip)
> +	struct xfs_inode	*ip,
> +	bool init)
>  {
> -	uint16_t		flags = ip->i_d.di_flags;
> -
> -	inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC |
> -			    S_NOATIME | S_DAX);

And this code cleared all the flags in the inode first, then
set them if the disk inode flag is set. This does not require
branches, resulting in more readable code and better code
generation.

> +	struct inode		*inode = VFS_I(ip);
> +	uint			diflags = xfs_ip2xflags(ip);
>  
> -	if (flags & XFS_DIFLAG_IMMUTABLE)
> +	if (diflags & FS_XFLAG_IMMUTABLE)
>  		inode->i_flags |= S_IMMUTABLE;
> -	if (flags & XFS_DIFLAG_APPEND)
> +	else
> +		inode->i_flags &= ~S_IMMUTABLE;

> +	if (diflags & FS_XFLAG_APPEND)
>  		inode->i_flags |= S_APPEND;
> -	if (flags & XFS_DIFLAG_SYNC)
> +	else
> +		inode->i_flags &= ~S_APPEND;
> +	if (diflags & FS_XFLAG_SYNC)
>  		inode->i_flags |= S_SYNC;
> -	if (flags & XFS_DIFLAG_NOATIME)
> +	else
> +		inode->i_flags &= ~S_SYNC;
> +	if (diflags & FS_XFLAG_NOATIME)
>  		inode->i_flags |= S_NOATIME;
> -	if (xfs_inode_enable_dax(ip))
> -		inode->i_flags |= S_DAX;
> +	else
> +		inode->i_flags &= ~S_NOATIME;
> +
> +	/* Only toggle the dax flag when initializing */
> +	if (init) {
> +		if (xfs_inode_enable_dax(ip))
> +			inode->i_flags |= S_DAX;
> +		else
> +			inode->i_flags &= ~S_DAX;
> +	}
>  }

IOWs, this:

        struct inode            *inode = VFS_I(ip);
        unsigned int            xflags = xfs_ip2xflags(ip);
        unsigned int            flags = 0;

        if (xflags & FS_XFLAG_IMMUTABLE)
                flags |= S_IMMUTABLE;
        if (xflags & FS_XFLAG_APPEND)
                flags |= S_APPEND;
        if (xflags & FS_XFLAG_SYNC)
                flags |= S_SYNC;
        if (xflags & FS_XFLAG_NOATIME)
                flags |= S_NOATIME;
	if ((xflags & FS_XFLAG_DAX) && init)
		flags |= S_DAX;

        inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC | S_NOATIME);
        inode->i_flags |= flags;

ends up being much easier to read and results in better code
generation. And we don't need to clear the S_DAX flag when "init" is
set, because we are starting from an inode that has no flags set
(because init!)...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 7/8] fs/xfs: Change xfs_ioctl_setattr_dax_invalidate() to xfs_ioctl_dax_check()
  2020-04-07 18:29 ` [PATCH V6 7/8] fs/xfs: Change xfs_ioctl_setattr_dax_invalidate() to xfs_ioctl_dax_check() ira.weiny
@ 2020-04-08  2:23   ` Dave Chinner
  2020-04-08  9:58     ` Jan Kara
  2020-04-08 15:37   ` Darrick J. Wong
  1 sibling, 1 reply; 47+ messages in thread
From: Dave Chinner @ 2020-04-08  2:23 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, Darrick J. Wong, Dan Williams, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Tue, Apr 07, 2020 at 11:29:57AM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> We only support changing FS_XFLAG_DAX on directories.  Files get their
> flag from the parent directory on creation only.  So no data
> invalidation needs to happen.

Which leads me to ask: how are users and/or admins supposed to
remove the flag from regular files once it is set in the filesystem?

Only being able to override the flag via the "dax=never" mount
option means that once the flag is set, nobody can ever remove it
and they can only globally turn off dax if it gets set incorrectly.
It also means a global interrupt because all apps on the filesystem
need to be stopped so the filesystem can be unmounted and mounted
again with dax=never. This is highly unfriendly to admins and users.

IOWs, we _must_ be able to clear this inode flag on regular inodes
in some way. I don't care if it doesn't change the current in-memory
state, but we must be able to clear the flags so that the next time
the inodes are instantiated DAX is not enabled for those files...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 7/8] fs/xfs: Change xfs_ioctl_setattr_dax_invalidate() to xfs_ioctl_dax_check()
  2020-04-08  2:23   ` Dave Chinner
@ 2020-04-08  9:58     ` Jan Kara
  2020-04-08 21:09       ` Dave Chinner
  0 siblings, 1 reply; 47+ messages in thread
From: Jan Kara @ 2020-04-08  9:58 UTC (permalink / raw)
  To: Dave Chinner
  Cc: ira.weiny, linux-kernel, Darrick J. Wong, Dan Williams,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

On Wed 08-04-20 12:23:18, Dave Chinner wrote:
> On Tue, Apr 07, 2020 at 11:29:57AM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > We only support changing FS_XFLAG_DAX on directories.  Files get their
> > flag from the parent directory on creation only.  So no data
> > invalidation needs to happen.
> 
> Which leads me to ask: how are users and/or admins supposed to
> remove the flag from regular files once it is set in the filesystem?
> 
> Only being able to override the flag via the "dax=never" mount
> option means that once the flag is set, nobody can ever remove it
> and they can only globally turn off dax if it gets set incorrectly.
> It also means a global interrupt because all apps on the filesystem
> need to be stopped so the filesystem can be unmounted and mounted
> again with dax=never. This is highly unfriendly to admins and users.
> 
> IOWs, we _must_ be able to clear this inode flag on regular inodes
> in some way. I don't care if it doesn't change the current in-memory
> state, but we must be able to clear the flags so that the next time
> the inodes are instantiated DAX is not enabled for those files...

Well, there's one way to clear the flag: delete the file. If you still care
about the data, you can copy the data first. It isn't very convenient, I
agree, and effectively means restarting whatever application that is using
the file. But it seems like more understandable API than letting user clear
the on-disk flag but the inode will still use DAX until kernel decides to
evict the inode - because that often means you need to restart the
application using the file anyway for the flag change to have any effect.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 7/8] fs/xfs: Change xfs_ioctl_setattr_dax_invalidate() to xfs_ioctl_dax_check()
  2020-04-07 18:29 ` [PATCH V6 7/8] fs/xfs: Change xfs_ioctl_setattr_dax_invalidate() to xfs_ioctl_dax_check() ira.weiny
  2020-04-08  2:23   ` Dave Chinner
@ 2020-04-08 15:37   ` Darrick J. Wong
  2020-04-08 18:13     ` Ira Weiny
  1 sibling, 1 reply; 47+ messages in thread
From: Darrick J. Wong @ 2020-04-08 15:37 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Tue, Apr 07, 2020 at 11:29:57AM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> We only support changing FS_XFLAG_DAX on directories.  Files get their
> flag from the parent directory on creation only.  So no data
> invalidation needs to happen.
> 
> Alter the xfs_ioctl_setattr_dax_invalidate() to be
> xfs_ioctl_dax_check().
> 
> This also allows use to remove the join_flags logic.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> ---
> Changes from v5:
> 	New patch
> ---
>  fs/xfs/xfs_ioctl.c | 91 +++++-----------------------------------------
>  1 file changed, 10 insertions(+), 81 deletions(-)
> 
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index c6cd92ef4a05..5472faab7c4f 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -1145,63 +1145,18 @@ xfs_ioctl_setattr_xflags(
>  }
>  
>  /*
> - * If we are changing DAX flags, we have to ensure the file is clean and any
> - * cached objects in the address space are invalidated and removed. This
> - * requires us to lock out other IO and page faults similar to a truncate
> - * operation. The locks need to be held until the transaction has been committed
> - * so that the cache invalidation is atomic with respect to the DAX flag
> - * manipulation.
> + * Only directories are allowed to change dax flags
>   */
>  static int
>  xfs_ioctl_setattr_dax_invalidate(
> -	struct xfs_inode	*ip,
> -	struct fsxattr		*fa,
> -	int			*join_flags)
> +	struct xfs_inode	*ip)
>  {
>  	struct inode		*inode = VFS_I(ip);
> -	struct super_block	*sb = inode->i_sb;
> -	int			error;
> -
> -	*join_flags = 0;
> -
> -	/*
> -	 * It is only valid to set the DAX flag on regular files and
> -	 * directories on filesystems where the block size is equal to the page
> -	 * size. On directories it serves as an inherited hint so we don't
> -	 * have to check the device for dax support or flush pagecache.
> -	 */
> -	if (fa->fsx_xflags & FS_XFLAG_DAX) {
> -		struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
> -
> -		if (!bdev_dax_supported(target->bt_bdev, sb->s_blocksize))
> -			return -EINVAL;
> -	}
> -
> -	/* If the DAX state is not changing, we have nothing to do here. */
> -	if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
> -		return 0;
> -	if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode))
> -		return 0;

Does the !S_ISDIR check below apply unconditionally even if we weren't
trying to change the DAX flag?

> -	if (S_ISDIR(inode->i_mode))
> -		return 0;
>  
> -	/* lock, flush and invalidate mapping in preparation for flag change */
> -	xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> -	error = filemap_write_and_wait(inode->i_mapping);
> -	if (error)
> -		goto out_unlock;
> -	error = invalidate_inode_pages2(inode->i_mapping);
> -	if (error)
> -		goto out_unlock;
> +	if (!S_ISDIR(inode->i_mode))
> +		return -EINVAL;

If this entire function collapses to an S_ISDIR check then you might
as well just hoist this one piece to the caller.  Also, where is
xfs_ioctl_dax_check?

<confused>

--D

>  
> -	*join_flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL;
>  	return 0;
> -
> -out_unlock:
> -	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> -	return error;
> -
>  }
>  
>  /*
> @@ -1209,17 +1164,10 @@ xfs_ioctl_setattr_dax_invalidate(
>   * have permission to do so. On success, return a clean transaction and the
>   * inode locked exclusively ready for further operation specific checks. On
>   * failure, return an error without modifying or locking the inode.
> - *
> - * The inode might already be IO locked on call. If this is the case, it is
> - * indicated in @join_flags and we take full responsibility for ensuring they
> - * are unlocked from now on. Hence if we have an error here, we still have to
> - * unlock them. Otherwise, once they are joined to the transaction, they will
> - * be unlocked on commit/cancel.
>   */
>  static struct xfs_trans *
>  xfs_ioctl_setattr_get_trans(
> -	struct xfs_inode	*ip,
> -	int			join_flags)
> +	struct xfs_inode	*ip)
>  {
>  	struct xfs_mount	*mp = ip->i_mount;
>  	struct xfs_trans	*tp;
> @@ -1236,8 +1184,7 @@ xfs_ioctl_setattr_get_trans(
>  		goto out_unlock;
>  
>  	xfs_ilock(ip, XFS_ILOCK_EXCL);
> -	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | join_flags);
> -	join_flags = 0;
> +	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
>  
>  	/*
>  	 * CAP_FOWNER overrides the following restrictions:
> @@ -1258,8 +1205,6 @@ xfs_ioctl_setattr_get_trans(
>  out_cancel:
>  	xfs_trans_cancel(tp);
>  out_unlock:
> -	if (join_flags)
> -		xfs_iunlock(ip, join_flags);
>  	return ERR_PTR(error);
>  }
>  
> @@ -1386,7 +1331,6 @@ xfs_ioctl_setattr(
>  	struct xfs_dquot	*pdqp = NULL;
>  	struct xfs_dquot	*olddquot = NULL;
>  	int			code;
> -	int			join_flags = 0;
>  
>  	trace_xfs_ioctl_setattr(ip);
>  
> @@ -1410,18 +1354,11 @@ xfs_ioctl_setattr(
>  			return code;
>  	}
>  
> -	/*
> -	 * Changing DAX config may require inode locking for mapping
> -	 * invalidation. These need to be held all the way to transaction commit
> -	 * or cancel time, so need to be passed through to
> -	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
> -	 * appropriately.
> -	 */
> -	code = xfs_ioctl_setattr_dax_invalidate(ip, fa, &join_flags);
> +	code = xfs_ioctl_setattr_dax_invalidate(ip);
>  	if (code)
>  		goto error_free_dquots;
>  
> -	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
> +	tp = xfs_ioctl_setattr_get_trans(ip);
>  	if (IS_ERR(tp)) {
>  		code = PTR_ERR(tp);
>  		goto error_free_dquots;
> @@ -1552,7 +1489,6 @@ xfs_ioc_setxflags(
>  	struct fsxattr		fa;
>  	struct fsxattr		old_fa;
>  	unsigned int		flags;
> -	int			join_flags = 0;
>  	int			error;
>  
>  	if (copy_from_user(&flags, arg, sizeof(flags)))
> @@ -1569,18 +1505,11 @@ xfs_ioc_setxflags(
>  	if (error)
>  		return error;
>  
> -	/*
> -	 * Changing DAX config may require inode locking for mapping
> -	 * invalidation. These need to be held all the way to transaction commit
> -	 * or cancel time, so need to be passed through to
> -	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
> -	 * appropriately.
> -	 */
> -	error = xfs_ioctl_setattr_dax_invalidate(ip, &fa, &join_flags);
> +	error = xfs_ioctl_setattr_dax_invalidate(ip);
>  	if (error)
>  		goto out_drop_write;
>  
> -	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
> +	tp = xfs_ioctl_setattr_get_trans(ip);
>  	if (IS_ERR(tp)) {
>  		error = PTR_ERR(tp);
>  		goto out_drop_write;
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  2020-04-08  2:08   ` Dave Chinner
@ 2020-04-08 17:09     ` Ira Weiny
  2020-04-08 21:02       ` Dave Chinner
  0 siblings, 1 reply; 47+ messages in thread
From: Ira Weiny @ 2020-04-08 17:09 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-kernel, Darrick J. Wong, Dan Williams, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Wed, Apr 08, 2020 at 12:08:27PM +1000, Dave Chinner wrote:
> On Tue, Apr 07, 2020 at 11:29:56AM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>

[snip]

> >  
> > -STATIC void
> > -xfs_diflags_to_linux(
> > -	struct xfs_inode	*ip)
> > -{
> > -	struct inode		*inode = VFS_I(ip);
> > -	unsigned int		xflags = xfs_ip2xflags(ip);
> > -
> > -	if (xflags & FS_XFLAG_IMMUTABLE)
> > -		inode->i_flags |= S_IMMUTABLE;
> > -	else
> > -		inode->i_flags &= ~S_IMMUTABLE;
> > -	if (xflags & FS_XFLAG_APPEND)
> > -		inode->i_flags |= S_APPEND;
> > -	else
> > -		inode->i_flags &= ~S_APPEND;
> > -	if (xflags & FS_XFLAG_SYNC)
> > -		inode->i_flags |= S_SYNC;
> > -	else
> > -		inode->i_flags &= ~S_SYNC;
> > -	if (xflags & FS_XFLAG_NOATIME)
> > -		inode->i_flags |= S_NOATIME;
> > -	else
> > -		inode->i_flags &= ~S_NOATIME;
> > -#if 0	/* disabled until the flag switching races are sorted out */
> > -	if (xflags & FS_XFLAG_DAX)
> > -		inode->i_flags |= S_DAX;
> > -	else
> > -		inode->i_flags &= ~S_DAX;
> > -#endif
> 
> So this variant will set the flag in the inode if the disk inode
> flag is set, otherwise it will clear it.  It does it with if/else
> branches.
> 
> 
> > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > index e07f7b641226..a4ac8568c8c7 100644
> > --- a/fs/xfs/xfs_iops.c
> > +++ b/fs/xfs/xfs_iops.c
> > @@ -1259,7 +1259,7 @@ xfs_inode_supports_dax(
> >  	return xfs_inode_buftarg(ip)->bt_daxdev != NULL;
> >  }
> >  
> > -STATIC bool
> > +static bool
> >  xfs_inode_enable_dax(
> >  	struct xfs_inode *ip)
> >  {
> 
> This belongs in the previous patch.

Ah yea...  Sorry.

Fixed in V7

> 
> > @@ -1272,26 +1272,38 @@ xfs_inode_enable_dax(
> >  	return false;
> >  }
> >  
> > -STATIC void
> > +void
> >  xfs_diflags_to_iflags(
> > -	struct inode		*inode,
> > -	struct xfs_inode	*ip)
> > +	struct xfs_inode	*ip,
> > +	bool init)
> >  {
> > -	uint16_t		flags = ip->i_d.di_flags;
> > -
> > -	inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC |
> > -			    S_NOATIME | S_DAX);
> 
> And this code cleared all the flags in the inode first, then
> set them if the disk inode flag is set. This does not require
> branches, resulting in more readable code and better code
> generation.
> 
> > +	struct inode		*inode = VFS_I(ip);
> > +	uint			diflags = xfs_ip2xflags(ip);
> >  
> > -	if (flags & XFS_DIFLAG_IMMUTABLE)
> > +	if (diflags & FS_XFLAG_IMMUTABLE)
> >  		inode->i_flags |= S_IMMUTABLE;
> > -	if (flags & XFS_DIFLAG_APPEND)
> > +	else
> > +		inode->i_flags &= ~S_IMMUTABLE;
> 
> > +	if (diflags & FS_XFLAG_APPEND)
> >  		inode->i_flags |= S_APPEND;
> > -	if (flags & XFS_DIFLAG_SYNC)
> > +	else
> > +		inode->i_flags &= ~S_APPEND;
> > +	if (diflags & FS_XFLAG_SYNC)
> >  		inode->i_flags |= S_SYNC;
> > -	if (flags & XFS_DIFLAG_NOATIME)
> > +	else
> > +		inode->i_flags &= ~S_SYNC;
> > +	if (diflags & FS_XFLAG_NOATIME)
> >  		inode->i_flags |= S_NOATIME;
> > -	if (xfs_inode_enable_dax(ip))
> > -		inode->i_flags |= S_DAX;
> > +	else
> > +		inode->i_flags &= ~S_NOATIME;
> > +
> > +	/* Only toggle the dax flag when initializing */
> > +	if (init) {
> > +		if (xfs_inode_enable_dax(ip))
> > +			inode->i_flags |= S_DAX;
> > +		else
> > +			inode->i_flags &= ~S_DAX;
> > +	}
> >  }
> 
> IOWs, this:
> 
>         struct inode            *inode = VFS_I(ip);
>         unsigned int            xflags = xfs_ip2xflags(ip);
>         unsigned int            flags = 0;
> 
>         if (xflags & FS_XFLAG_IMMUTABLE)
>                 flags |= S_IMMUTABLE;
>         if (xflags & FS_XFLAG_APPEND)
>                 flags |= S_APPEND;
>         if (xflags & FS_XFLAG_SYNC)
>                 flags |= S_SYNC;
>         if (xflags & FS_XFLAG_NOATIME)
>                 flags |= S_NOATIME;
> 	if ((xflags & FS_XFLAG_DAX) && init)
> 		flags |= S_DAX;
> 
>         inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC | S_NOATIME);
>         inode->i_flags |= flags;
> 
> ends up being much easier to read and results in better code
> generation. And we don't need to clear the S_DAX flag when "init" is
> set, because we are starting from an inode that has no flags set
> (because init!)...

This sounds good but I think we need a slight modification to make the function equivalent in functionality.

void
xfs_diflags_to_iflags(
        struct xfs_inode        *ip,
        bool init)
{
        struct inode            *inode = VFS_I(ip);
        unsigned int            xflags = xfs_ip2xflags(ip);
        unsigned int            flags = 0;

        inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC | S_NOATIME |
                            S_DAX);

        if (xflags & FS_XFLAG_IMMUTABLE)
                flags |= S_IMMUTABLE;
        if (xflags & FS_XFLAG_APPEND)
                flags |= S_APPEND;
        if (xflags & FS_XFLAG_SYNC)
                flags |= S_SYNC;
        if (xflags & FS_XFLAG_NOATIME)
                flags |= S_NOATIME;
        if (init && xfs_inode_enable_dax(ip))
                flags |= S_DAX;

        inode->i_flags |= flags;
}

I've queued this for v7.

Thanks,
Ira

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 7/8] fs/xfs: Change xfs_ioctl_setattr_dax_invalidate() to xfs_ioctl_dax_check()
  2020-04-08 15:37   ` Darrick J. Wong
@ 2020-04-08 18:13     ` Ira Weiny
  0 siblings, 0 replies; 47+ messages in thread
From: Ira Weiny @ 2020-04-08 18:13 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-kernel, Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Wed, Apr 08, 2020 at 08:37:17AM -0700, Darrick J. Wong wrote:
> On Tue, Apr 07, 2020 at 11:29:57AM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > We only support changing FS_XFLAG_DAX on directories.  Files get their
> > flag from the parent directory on creation only.  So no data
> > invalidation needs to happen.
> > 
> > Alter the xfs_ioctl_setattr_dax_invalidate() to be
> > xfs_ioctl_dax_check().
> > 
> > This also allows use to remove the join_flags logic.
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > 
> > ---
> > Changes from v5:
> > 	New patch
> > ---
> >  fs/xfs/xfs_ioctl.c | 91 +++++-----------------------------------------
> >  1 file changed, 10 insertions(+), 81 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > index c6cd92ef4a05..5472faab7c4f 100644
> > --- a/fs/xfs/xfs_ioctl.c
> > +++ b/fs/xfs/xfs_ioctl.c
> > @@ -1145,63 +1145,18 @@ xfs_ioctl_setattr_xflags(
> >  }
> >  
> >  /*
> > - * If we are changing DAX flags, we have to ensure the file is clean and any
> > - * cached objects in the address space are invalidated and removed. This
> > - * requires us to lock out other IO and page faults similar to a truncate
> > - * operation. The locks need to be held until the transaction has been committed
> > - * so that the cache invalidation is atomic with respect to the DAX flag
> > - * manipulation.
> > + * Only directories are allowed to change dax flags
> >   */
> >  static int
> >  xfs_ioctl_setattr_dax_invalidate(
> > -	struct xfs_inode	*ip,
> > -	struct fsxattr		*fa,
> > -	int			*join_flags)
> > +	struct xfs_inode	*ip)
> >  {
> >  	struct inode		*inode = VFS_I(ip);
> > -	struct super_block	*sb = inode->i_sb;
> > -	int			error;
> > -
> > -	*join_flags = 0;
> > -
> > -	/*
> > -	 * It is only valid to set the DAX flag on regular files and
> > -	 * directories on filesystems where the block size is equal to the page
> > -	 * size. On directories it serves as an inherited hint so we don't
> > -	 * have to check the device for dax support or flush pagecache.
> > -	 */
> > -	if (fa->fsx_xflags & FS_XFLAG_DAX) {
> > -		struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
> > -
> > -		if (!bdev_dax_supported(target->bt_bdev, sb->s_blocksize))
> > -			return -EINVAL;
> > -	}
> > -
> > -	/* If the DAX state is not changing, we have nothing to do here. */
> > -	if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
> > -		return 0;
> > -	if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode))
> > -		return 0;
> 
> Does the !S_ISDIR check below apply unconditionally even if we weren't
> trying to change the DAX flag?

:-/

shoot...  I really screwed this up...

> 
> > -	if (S_ISDIR(inode->i_mode))
> > -		return 0;
> >  
> > -	/* lock, flush and invalidate mapping in preparation for flag change */
> > -	xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> > -	error = filemap_write_and_wait(inode->i_mapping);
> > -	if (error)
> > -		goto out_unlock;
> > -	error = invalidate_inode_pages2(inode->i_mapping);
> > -	if (error)
> > -		goto out_unlock;
> > +	if (!S_ISDIR(inode->i_mode))
> > +		return -EINVAL;
> 
> If this entire function collapses to an S_ISDIR check then you might
> as well just hoist this one piece to the caller.

Oops...  yea this is broken...  All broken.

> Also, where is
> xfs_ioctl_dax_check?

I forgot to change the name.

> 
> <confused>

Much less so than me...  :-/

I'll clean it up.

Thanks, this was really bad on my part...
Ira

> 
> --D
> 
> >  
> > -	*join_flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL;
> >  	return 0;
> > -
> > -out_unlock:
> > -	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> > -	return error;
> > -
> >  }
> >  
> >  /*
> > @@ -1209,17 +1164,10 @@ xfs_ioctl_setattr_dax_invalidate(
> >   * have permission to do so. On success, return a clean transaction and the
> >   * inode locked exclusively ready for further operation specific checks. On
> >   * failure, return an error without modifying or locking the inode.
> > - *
> > - * The inode might already be IO locked on call. If this is the case, it is
> > - * indicated in @join_flags and we take full responsibility for ensuring they
> > - * are unlocked from now on. Hence if we have an error here, we still have to
> > - * unlock them. Otherwise, once they are joined to the transaction, they will
> > - * be unlocked on commit/cancel.
> >   */
> >  static struct xfs_trans *
> >  xfs_ioctl_setattr_get_trans(
> > -	struct xfs_inode	*ip,
> > -	int			join_flags)
> > +	struct xfs_inode	*ip)
> >  {
> >  	struct xfs_mount	*mp = ip->i_mount;
> >  	struct xfs_trans	*tp;
> > @@ -1236,8 +1184,7 @@ xfs_ioctl_setattr_get_trans(
> >  		goto out_unlock;
> >  
> >  	xfs_ilock(ip, XFS_ILOCK_EXCL);
> > -	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | join_flags);
> > -	join_flags = 0;
> > +	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> >  
> >  	/*
> >  	 * CAP_FOWNER overrides the following restrictions:
> > @@ -1258,8 +1205,6 @@ xfs_ioctl_setattr_get_trans(
> >  out_cancel:
> >  	xfs_trans_cancel(tp);
> >  out_unlock:
> > -	if (join_flags)
> > -		xfs_iunlock(ip, join_flags);
> >  	return ERR_PTR(error);
> >  }
> >  
> > @@ -1386,7 +1331,6 @@ xfs_ioctl_setattr(
> >  	struct xfs_dquot	*pdqp = NULL;
> >  	struct xfs_dquot	*olddquot = NULL;
> >  	int			code;
> > -	int			join_flags = 0;
> >  
> >  	trace_xfs_ioctl_setattr(ip);
> >  
> > @@ -1410,18 +1354,11 @@ xfs_ioctl_setattr(
> >  			return code;
> >  	}
> >  
> > -	/*
> > -	 * Changing DAX config may require inode locking for mapping
> > -	 * invalidation. These need to be held all the way to transaction commit
> > -	 * or cancel time, so need to be passed through to
> > -	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
> > -	 * appropriately.
> > -	 */
> > -	code = xfs_ioctl_setattr_dax_invalidate(ip, fa, &join_flags);
> > +	code = xfs_ioctl_setattr_dax_invalidate(ip);
> >  	if (code)
> >  		goto error_free_dquots;
> >  
> > -	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
> > +	tp = xfs_ioctl_setattr_get_trans(ip);
> >  	if (IS_ERR(tp)) {
> >  		code = PTR_ERR(tp);
> >  		goto error_free_dquots;
> > @@ -1552,7 +1489,6 @@ xfs_ioc_setxflags(
> >  	struct fsxattr		fa;
> >  	struct fsxattr		old_fa;
> >  	unsigned int		flags;
> > -	int			join_flags = 0;
> >  	int			error;
> >  
> >  	if (copy_from_user(&flags, arg, sizeof(flags)))
> > @@ -1569,18 +1505,11 @@ xfs_ioc_setxflags(
> >  	if (error)
> >  		return error;
> >  
> > -	/*
> > -	 * Changing DAX config may require inode locking for mapping
> > -	 * invalidation. These need to be held all the way to transaction commit
> > -	 * or cancel time, so need to be passed through to
> > -	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
> > -	 * appropriately.
> > -	 */
> > -	error = xfs_ioctl_setattr_dax_invalidate(ip, &fa, &join_flags);
> > +	error = xfs_ioctl_setattr_dax_invalidate(ip);
> >  	if (error)
> >  		goto out_drop_write;
> >  
> > -	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
> > +	tp = xfs_ioctl_setattr_get_trans(ip);
> >  	if (IS_ERR(tp)) {
> >  		error = PTR_ERR(tp);
> >  		goto out_drop_write;
> > -- 
> > 2.25.1
> > 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  2020-04-08 17:09     ` Ira Weiny
@ 2020-04-08 21:02       ` Dave Chinner
  2020-04-08 21:28         ` Dan Williams
  2020-04-08 22:07         ` Ira Weiny
  0 siblings, 2 replies; 47+ messages in thread
From: Dave Chinner @ 2020-04-08 21:02 UTC (permalink / raw)
  To: Ira Weiny
  Cc: linux-kernel, Darrick J. Wong, Dan Williams, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Wed, Apr 08, 2020 at 10:09:23AM -0700, Ira Weiny wrote:
> On Wed, Apr 08, 2020 at 12:08:27PM +1000, Dave Chinner wrote:
> > On Tue, Apr 07, 2020 at 11:29:56AM -0700, ira.weiny@intel.com wrote:
> > > From: Ira Weiny <ira.weiny@intel.com>
> 
> [snip]
> 
> > >  
> > > -STATIC void
> > > -xfs_diflags_to_linux(
> > > -	struct xfs_inode	*ip)
> > > -{
> > > -	struct inode		*inode = VFS_I(ip);
> > > -	unsigned int		xflags = xfs_ip2xflags(ip);
> > > -
> > > -	if (xflags & FS_XFLAG_IMMUTABLE)
> > > -		inode->i_flags |= S_IMMUTABLE;
> > > -	else
> > > -		inode->i_flags &= ~S_IMMUTABLE;
> > > -	if (xflags & FS_XFLAG_APPEND)
> > > -		inode->i_flags |= S_APPEND;
> > > -	else
> > > -		inode->i_flags &= ~S_APPEND;
> > > -	if (xflags & FS_XFLAG_SYNC)
> > > -		inode->i_flags |= S_SYNC;
> > > -	else
> > > -		inode->i_flags &= ~S_SYNC;
> > > -	if (xflags & FS_XFLAG_NOATIME)
> > > -		inode->i_flags |= S_NOATIME;
> > > -	else
> > > -		inode->i_flags &= ~S_NOATIME;
> > > -#if 0	/* disabled until the flag switching races are sorted out */
> > > -	if (xflags & FS_XFLAG_DAX)
> > > -		inode->i_flags |= S_DAX;
> > > -	else
> > > -		inode->i_flags &= ~S_DAX;
> > > -#endif
> > 
> > So this variant will set the flag in the inode if the disk inode
> > flag is set, otherwise it will clear it.  It does it with if/else
> > branches.
> > 
> > 
> > > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > > index e07f7b641226..a4ac8568c8c7 100644
> > > --- a/fs/xfs/xfs_iops.c
> > > +++ b/fs/xfs/xfs_iops.c
> > > @@ -1259,7 +1259,7 @@ xfs_inode_supports_dax(
> > >  	return xfs_inode_buftarg(ip)->bt_daxdev != NULL;
> > >  }
> > >  
> > > -STATIC bool
> > > +static bool
> > >  xfs_inode_enable_dax(
> > >  	struct xfs_inode *ip)
> > >  {
> > 
> > This belongs in the previous patch.
> 
> Ah yea...  Sorry.
> 
> Fixed in V7
> 
> > 
> > > @@ -1272,26 +1272,38 @@ xfs_inode_enable_dax(
> > >  	return false;
> > >  }
> > >  
> > > -STATIC void
> > > +void
> > >  xfs_diflags_to_iflags(
> > > -	struct inode		*inode,
> > > -	struct xfs_inode	*ip)
> > > +	struct xfs_inode	*ip,
> > > +	bool init)
> > >  {
> > > -	uint16_t		flags = ip->i_d.di_flags;
> > > -
> > > -	inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC |
> > > -			    S_NOATIME | S_DAX);
> > 
> > And this code cleared all the flags in the inode first, then
> > set them if the disk inode flag is set. This does not require
> > branches, resulting in more readable code and better code
> > generation.
> > 
> > > +	struct inode		*inode = VFS_I(ip);
> > > +	uint			diflags = xfs_ip2xflags(ip);
> > >  
> > > -	if (flags & XFS_DIFLAG_IMMUTABLE)
> > > +	if (diflags & FS_XFLAG_IMMUTABLE)
> > >  		inode->i_flags |= S_IMMUTABLE;
> > > -	if (flags & XFS_DIFLAG_APPEND)
> > > +	else
> > > +		inode->i_flags &= ~S_IMMUTABLE;
> > 
> > > +	if (diflags & FS_XFLAG_APPEND)
> > >  		inode->i_flags |= S_APPEND;
> > > -	if (flags & XFS_DIFLAG_SYNC)
> > > +	else
> > > +		inode->i_flags &= ~S_APPEND;
> > > +	if (diflags & FS_XFLAG_SYNC)
> > >  		inode->i_flags |= S_SYNC;
> > > -	if (flags & XFS_DIFLAG_NOATIME)
> > > +	else
> > > +		inode->i_flags &= ~S_SYNC;
> > > +	if (diflags & FS_XFLAG_NOATIME)
> > >  		inode->i_flags |= S_NOATIME;
> > > -	if (xfs_inode_enable_dax(ip))
> > > -		inode->i_flags |= S_DAX;
> > > +	else
> > > +		inode->i_flags &= ~S_NOATIME;
> > > +
> > > +	/* Only toggle the dax flag when initializing */
> > > +	if (init) {
> > > +		if (xfs_inode_enable_dax(ip))
> > > +			inode->i_flags |= S_DAX;
> > > +		else
> > > +			inode->i_flags &= ~S_DAX;
> > > +	}
> > >  }
> > 
> > IOWs, this:
> > 
> >         struct inode            *inode = VFS_I(ip);
> >         unsigned int            xflags = xfs_ip2xflags(ip);
> >         unsigned int            flags = 0;
> > 
> >         if (xflags & FS_XFLAG_IMMUTABLE)
> >                 flags |= S_IMMUTABLE;
> >         if (xflags & FS_XFLAG_APPEND)
> >                 flags |= S_APPEND;
> >         if (xflags & FS_XFLAG_SYNC)
> >                 flags |= S_SYNC;
> >         if (xflags & FS_XFLAG_NOATIME)
> >                 flags |= S_NOATIME;
> > 	if ((xflags & FS_XFLAG_DAX) && init)
> > 		flags |= S_DAX;
> > 
> >         inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC | S_NOATIME);
> >         inode->i_flags |= flags;
> > 
> > ends up being much easier to read and results in better code
> > generation. And we don't need to clear the S_DAX flag when "init" is
> > set, because we are starting from an inode that has no flags set
> > (because init!)...
> 
> This sounds good but I think we need a slight modification to make the function equivalent in functionality.
> 
> void
> xfs_diflags_to_iflags(
>         struct xfs_inode        *ip,
>         bool init)
> {
>         struct inode            *inode = VFS_I(ip);
>         unsigned int            xflags = xfs_ip2xflags(ip);
>         unsigned int            flags = 0;
> 
>         inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC | S_NOATIME |
>                             S_DAX);

We don't want to clear the dax flag here, ever, if it is already
set. That is an externally visible change and opens us up (again) to
races where IS_DAX() changes half way through a fault path. IOWs, avoiding
clearing the DAX flag was something I did explicitly in the above
code fragment.

And it makes the logic clearer by pre-calculating the new flags,
then clearing and setting the inode flags together, rather than
having the spearated at the top and bottom of the function.

THis leads to an obvious conclusion: if we never clear the in memory
S_DAX flag, we can actually clear the on-disk flag safely, so that
next time the inode cycles into memory it won't be using DAX. IOWs,
admins can stop the applications, clear the DAX flag and drop
caches. This should result in the inode being recycled and when the
app is restarted it will run without DAX. No ned for deleting files,
copying large data sets, etc just to turn off an inode flag.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 7/8] fs/xfs: Change xfs_ioctl_setattr_dax_invalidate() to xfs_ioctl_dax_check()
  2020-04-08  9:58     ` Jan Kara
@ 2020-04-08 21:09       ` Dave Chinner
  2020-04-08 22:26         ` Ira Weiny
  0 siblings, 1 reply; 47+ messages in thread
From: Dave Chinner @ 2020-04-08 21:09 UTC (permalink / raw)
  To: Jan Kara
  Cc: ira.weiny, linux-kernel, Darrick J. Wong, Dan Williams,
	Christoph Hellwig, Theodore Y. Ts'o, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Wed, Apr 08, 2020 at 11:58:03AM +0200, Jan Kara wrote:
> On Wed 08-04-20 12:23:18, Dave Chinner wrote:
> > On Tue, Apr 07, 2020 at 11:29:57AM -0700, ira.weiny@intel.com wrote:
> > > From: Ira Weiny <ira.weiny@intel.com>
> > > 
> > > We only support changing FS_XFLAG_DAX on directories.  Files get their
> > > flag from the parent directory on creation only.  So no data
> > > invalidation needs to happen.
> > 
> > Which leads me to ask: how are users and/or admins supposed to
> > remove the flag from regular files once it is set in the filesystem?
> > 
> > Only being able to override the flag via the "dax=never" mount
> > option means that once the flag is set, nobody can ever remove it
> > and they can only globally turn off dax if it gets set incorrectly.
> > It also means a global interrupt because all apps on the filesystem
> > need to be stopped so the filesystem can be unmounted and mounted
> > again with dax=never. This is highly unfriendly to admins and users.
> > 
> > IOWs, we _must_ be able to clear this inode flag on regular inodes
> > in some way. I don't care if it doesn't change the current in-memory
> > state, but we must be able to clear the flags so that the next time
> > the inodes are instantiated DAX is not enabled for those files...
> 
> Well, there's one way to clear the flag: delete the file. If you still care
> about the data, you can copy the data first. It isn't very convenient, I
> agree, and effectively means restarting whatever application that is using
> the file.

Restarting the application is fine. Having to backup/restore or copy
the entire data set just to turn off an inode flag? That's not a
viable management strategy. We could be talking about terabytes of
data here.

I explained how we can safely remove the flag in the other branch of
this thread...

> But it seems like more understandable API than letting user clear
> the on-disk flag but the inode will still use DAX until kernel decides to
> evict the inode

Certainly doesn't seem that way to me. "stop app, clear flags, drop
caches, restart app" is a pretty simple, easy thing to do for an
admin.

Especially compared to process that is effectively "stop app, backup
data set, delete data set, clear flags, restore data set, restart
app"

> - because that often means you need to restart the
> application using the file anyway for the flag change to have any effect.

That's a trivial requirement compared to the downtime and resource
cost of a data set backup/restore just to clear inode flags....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  2020-04-08 21:02       ` Dave Chinner
@ 2020-04-08 21:28         ` Dan Williams
  2020-04-08 22:10           ` Ira Weiny
  2020-04-08 23:58           ` Dave Chinner
  2020-04-08 22:07         ` Ira Weiny
  1 sibling, 2 replies; 47+ messages in thread
From: Dan Williams @ 2020-04-08 21:28 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Ira Weiny, Linux Kernel Mailing List, Darrick J. Wong,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

On Wed, Apr 8, 2020 at 2:02 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Wed, Apr 08, 2020 at 10:09:23AM -0700, Ira Weiny wrote:
> > On Wed, Apr 08, 2020 at 12:08:27PM +1000, Dave Chinner wrote:
> > > On Tue, Apr 07, 2020 at 11:29:56AM -0700, ira.weiny@intel.com wrote:
> > > > From: Ira Weiny <ira.weiny@intel.com>
> >
> > [snip]
> >
> > > >
> > > > -STATIC void
> > > > -xfs_diflags_to_linux(
> > > > - struct xfs_inode        *ip)
> > > > -{
> > > > - struct inode            *inode = VFS_I(ip);
> > > > - unsigned int            xflags = xfs_ip2xflags(ip);
> > > > -
> > > > - if (xflags & FS_XFLAG_IMMUTABLE)
> > > > -         inode->i_flags |= S_IMMUTABLE;
> > > > - else
> > > > -         inode->i_flags &= ~S_IMMUTABLE;
> > > > - if (xflags & FS_XFLAG_APPEND)
> > > > -         inode->i_flags |= S_APPEND;
> > > > - else
> > > > -         inode->i_flags &= ~S_APPEND;
> > > > - if (xflags & FS_XFLAG_SYNC)
> > > > -         inode->i_flags |= S_SYNC;
> > > > - else
> > > > -         inode->i_flags &= ~S_SYNC;
> > > > - if (xflags & FS_XFLAG_NOATIME)
> > > > -         inode->i_flags |= S_NOATIME;
> > > > - else
> > > > -         inode->i_flags &= ~S_NOATIME;
> > > > -#if 0    /* disabled until the flag switching races are sorted out */
> > > > - if (xflags & FS_XFLAG_DAX)
> > > > -         inode->i_flags |= S_DAX;
> > > > - else
> > > > -         inode->i_flags &= ~S_DAX;
> > > > -#endif
> > >
> > > So this variant will set the flag in the inode if the disk inode
> > > flag is set, otherwise it will clear it.  It does it with if/else
> > > branches.
> > >
> > >
> > > > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > > > index e07f7b641226..a4ac8568c8c7 100644
> > > > --- a/fs/xfs/xfs_iops.c
> > > > +++ b/fs/xfs/xfs_iops.c
> > > > @@ -1259,7 +1259,7 @@ xfs_inode_supports_dax(
> > > >   return xfs_inode_buftarg(ip)->bt_daxdev != NULL;
> > > >  }
> > > >
> > > > -STATIC bool
> > > > +static bool
> > > >  xfs_inode_enable_dax(
> > > >   struct xfs_inode *ip)
> > > >  {
> > >
> > > This belongs in the previous patch.
> >
> > Ah yea...  Sorry.
> >
> > Fixed in V7
> >
> > >
> > > > @@ -1272,26 +1272,38 @@ xfs_inode_enable_dax(
> > > >   return false;
> > > >  }
> > > >
> > > > -STATIC void
> > > > +void
> > > >  xfs_diflags_to_iflags(
> > > > - struct inode            *inode,
> > > > - struct xfs_inode        *ip)
> > > > + struct xfs_inode        *ip,
> > > > + bool init)
> > > >  {
> > > > - uint16_t                flags = ip->i_d.di_flags;
> > > > -
> > > > - inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC |
> > > > -                     S_NOATIME | S_DAX);
> > >
> > > And this code cleared all the flags in the inode first, then
> > > set them if the disk inode flag is set. This does not require
> > > branches, resulting in more readable code and better code
> > > generation.
> > >
> > > > + struct inode            *inode = VFS_I(ip);
> > > > + uint                    diflags = xfs_ip2xflags(ip);
> > > >
> > > > - if (flags & XFS_DIFLAG_IMMUTABLE)
> > > > + if (diflags & FS_XFLAG_IMMUTABLE)
> > > >           inode->i_flags |= S_IMMUTABLE;
> > > > - if (flags & XFS_DIFLAG_APPEND)
> > > > + else
> > > > +         inode->i_flags &= ~S_IMMUTABLE;
> > >
> > > > + if (diflags & FS_XFLAG_APPEND)
> > > >           inode->i_flags |= S_APPEND;
> > > > - if (flags & XFS_DIFLAG_SYNC)
> > > > + else
> > > > +         inode->i_flags &= ~S_APPEND;
> > > > + if (diflags & FS_XFLAG_SYNC)
> > > >           inode->i_flags |= S_SYNC;
> > > > - if (flags & XFS_DIFLAG_NOATIME)
> > > > + else
> > > > +         inode->i_flags &= ~S_SYNC;
> > > > + if (diflags & FS_XFLAG_NOATIME)
> > > >           inode->i_flags |= S_NOATIME;
> > > > - if (xfs_inode_enable_dax(ip))
> > > > -         inode->i_flags |= S_DAX;
> > > > + else
> > > > +         inode->i_flags &= ~S_NOATIME;
> > > > +
> > > > + /* Only toggle the dax flag when initializing */
> > > > + if (init) {
> > > > +         if (xfs_inode_enable_dax(ip))
> > > > +                 inode->i_flags |= S_DAX;
> > > > +         else
> > > > +                 inode->i_flags &= ~S_DAX;
> > > > + }
> > > >  }
> > >
> > > IOWs, this:
> > >
> > >         struct inode            *inode = VFS_I(ip);
> > >         unsigned int            xflags = xfs_ip2xflags(ip);
> > >         unsigned int            flags = 0;
> > >
> > >         if (xflags & FS_XFLAG_IMMUTABLE)
> > >                 flags |= S_IMMUTABLE;
> > >         if (xflags & FS_XFLAG_APPEND)
> > >                 flags |= S_APPEND;
> > >         if (xflags & FS_XFLAG_SYNC)
> > >                 flags |= S_SYNC;
> > >         if (xflags & FS_XFLAG_NOATIME)
> > >                 flags |= S_NOATIME;
> > >     if ((xflags & FS_XFLAG_DAX) && init)
> > >             flags |= S_DAX;
> > >
> > >         inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC | S_NOATIME);
> > >         inode->i_flags |= flags;
> > >
> > > ends up being much easier to read and results in better code
> > > generation. And we don't need to clear the S_DAX flag when "init" is
> > > set, because we are starting from an inode that has no flags set
> > > (because init!)...
> >
> > This sounds good but I think we need a slight modification to make the function equivalent in functionality.
> >
> > void
> > xfs_diflags_to_iflags(
> >         struct xfs_inode        *ip,
> >         bool init)
> > {
> >         struct inode            *inode = VFS_I(ip);
> >         unsigned int            xflags = xfs_ip2xflags(ip);
> >         unsigned int            flags = 0;
> >
> >         inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC | S_NOATIME |
> >                             S_DAX);
>
> We don't want to clear the dax flag here, ever, if it is already
> set. That is an externally visible change and opens us up (again) to
> races where IS_DAX() changes half way through a fault path. IOWs, avoiding
> clearing the DAX flag was something I did explicitly in the above
> code fragment.
>
> And it makes the logic clearer by pre-calculating the new flags,
> then clearing and setting the inode flags together, rather than
> having the spearated at the top and bottom of the function.
>
> THis leads to an obvious conclusion: if we never clear the in memory
> S_DAX flag, we can actually clear the on-disk flag safely, so that
> next time the inode cycles into memory it won't be using DAX. IOWs,
> admins can stop the applications, clear the DAX flag and drop
> caches. This should result in the inode being recycled and when the
> app is restarted it will run without DAX. No ned for deleting files,
> copying large data sets, etc just to turn off an inode flag.

Makes sense, but is that sufficient? I recall you saying there might
be a multitude of other reasons that the inode is not evicted, not the
least of which is races [1]. Does this need another flag, lets call it
"dax toggle" to track the "I requested the inode to clear the flag,
but on cache-flush + restart the inode never got evicted" case. S_DAX
almost plays this role, but it loses the ability to audit which files
are pending an inode eviction event. So the dax-toggle flag indicates
to the kernel to xor the toggle value with the inode flag on inode
instantiation and the dax inode flag is never directly manipulated by
the ioctl path.

[1]: http://lore.kernel.org/r/20191025003603.GE4614@dread.disaster.area

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  2020-04-08 21:02       ` Dave Chinner
  2020-04-08 21:28         ` Dan Williams
@ 2020-04-08 22:07         ` Ira Weiny
  2020-04-08 23:21           ` Dave Chinner
  1 sibling, 1 reply; 47+ messages in thread
From: Ira Weiny @ 2020-04-08 22:07 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-kernel, Darrick J. Wong, Dan Williams, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Thu, Apr 09, 2020 at 07:02:36AM +1000, Dave Chinner wrote:
> On Wed, Apr 08, 2020 at 10:09:23AM -0700, Ira Weiny wrote:

[snip]

> > 
> > This sounds good but I think we need a slight modification to make the function equivalent in functionality.
> > 
> > void
> > xfs_diflags_to_iflags(
> >         struct xfs_inode        *ip,
> >         bool init)
> > {
> >         struct inode            *inode = VFS_I(ip);
> >         unsigned int            xflags = xfs_ip2xflags(ip);
> >         unsigned int            flags = 0;
> > 
> >         inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC | S_NOATIME |
> >                             S_DAX);
> 
> We don't want to clear the dax flag here, ever, if it is already
> set. That is an externally visible change and opens us up (again) to
> races where IS_DAX() changes half way through a fault path. IOWs, avoiding
> clearing the DAX flag was something I did explicitly in the above
> code fragment.

<sigh> yes... you are correct.

But I don't like depending on the caller to clear the S_DAX flag if
xfs_inode_enable_dax() is false.  IMO this function should clear the flag in
that case for consistency...

This is part of the reason I used the if/else logic from xfs_diflags_to_linux()
originally.  It is very explicit.

> 
> And it makes the logic clearer by pre-calculating the new flags,
> then clearing and setting the inode flags together, rather than
> having the spearated at the top and bottom of the function.

But this will not clear the S_DAX flag even if init is true.  To me that is a
potential for confusion down the road.

> 
> THis leads to an obvious conclusion: if we never clear the in memory
> S_DAX flag, we can actually clear the on-disk flag safely, so that
> next time the inode cycles into memory it won't be using DAX. IOWs,
> admins can stop the applications, clear the DAX flag and drop
> caches. This should result in the inode being recycled and when the
> app is restarted it will run without DAX. No ned for deleting files,
> copying large data sets, etc just to turn off an inode flag.

We already discussed evicting the inode and it was determined to be too
confusing.[*]

Furthermore, if we did want an interface like that why not allow the on-disk
flag to be set as well as cleared?

IMO, this function should set all of the flags consistently including S_DAX.

Ira

[*] https://lore.kernel.org/lkml/20200403072731.GA24176@lst.de/

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  2020-04-08 21:28         ` Dan Williams
@ 2020-04-08 22:10           ` Ira Weiny
  2020-04-08 23:58           ` Dave Chinner
  1 sibling, 0 replies; 47+ messages in thread
From: Ira Weiny @ 2020-04-08 22:10 UTC (permalink / raw)
  To: Dan Williams
  Cc: Dave Chinner, Linux Kernel Mailing List, Darrick J. Wong,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

On Wed, Apr 08, 2020 at 02:28:30PM -0700, Dan Williams wrote:
> On Wed, Apr 8, 2020 at 2:02 PM Dave Chinner <david@fromorbit.com> wrote:
> >

[snip]

> > >
> > > void
> > > xfs_diflags_to_iflags(
> > >         struct xfs_inode        *ip,
> > >         bool init)
> > > {
> > >         struct inode            *inode = VFS_I(ip);
> > >         unsigned int            xflags = xfs_ip2xflags(ip);
> > >         unsigned int            flags = 0;
> > >
> > >         inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC | S_NOATIME |
> > >                             S_DAX);
> >
> > We don't want to clear the dax flag here, ever, if it is already
> > set. That is an externally visible change and opens us up (again) to
> > races where IS_DAX() changes half way through a fault path. IOWs, avoiding
> > clearing the DAX flag was something I did explicitly in the above
> > code fragment.
> >
> > And it makes the logic clearer by pre-calculating the new flags,
> > then clearing and setting the inode flags together, rather than
> > having the spearated at the top and bottom of the function.
> >
> > THis leads to an obvious conclusion: if we never clear the in memory
> > S_DAX flag, we can actually clear the on-disk flag safely, so that
> > next time the inode cycles into memory it won't be using DAX. IOWs,
> > admins can stop the applications, clear the DAX flag and drop
> > caches. This should result in the inode being recycled and when the
> > app is restarted it will run without DAX. No ned for deleting files,
> > copying large data sets, etc just to turn off an inode flag.
> 
> Makes sense, but is that sufficient? I recall you saying there might
> be a multitude of other reasons that the inode is not evicted, not the
> least of which is races [1]. Does this need another flag, lets call it
> "dax toggle" to track the "I requested the inode to clear the flag,
> but on cache-flush + restart the inode never got evicted" case. S_DAX
> almost plays this role, but it loses the ability to audit which files
> are pending an inode eviction event. So the dax-toggle flag indicates
> to the kernel to xor the toggle value with the inode flag on inode
> instantiation and the dax inode flag is never directly manipulated by
> the ioctl path.
> 
> [1]: http://lore.kernel.org/r/20191025003603.GE4614@dread.disaster.area

FWIW I think we should continue down this simplified interface and get this
done for 5.8.  If we can come up with a way for delayed mode change I'm all for
looking into that.  But there has been too much controversy/difficulty about
changing the bit on a file.

So let's table this idea until >= 5.9

Ira


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 7/8] fs/xfs: Change xfs_ioctl_setattr_dax_invalidate() to xfs_ioctl_dax_check()
  2020-04-08 21:09       ` Dave Chinner
@ 2020-04-08 22:26         ` Ira Weiny
  2020-04-08 23:48           ` Dave Chinner
  0 siblings, 1 reply; 47+ messages in thread
From: Ira Weiny @ 2020-04-08 22:26 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jan Kara, linux-kernel, Darrick J. Wong, Dan Williams,
	Christoph Hellwig, Theodore Y. Ts'o, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Thu, Apr 09, 2020 at 07:09:50AM +1000, Dave Chinner wrote:
> On Wed, Apr 08, 2020 at 11:58:03AM +0200, Jan Kara wrote:
> > On Wed 08-04-20 12:23:18, Dave Chinner wrote:
> > > On Tue, Apr 07, 2020 at 11:29:57AM -0700, ira.weiny@intel.com wrote:
> > > > From: Ira Weiny <ira.weiny@intel.com>
> > > > 
> > > > We only support changing FS_XFLAG_DAX on directories.  Files get their
> > > > flag from the parent directory on creation only.  So no data
> > > > invalidation needs to happen.
> > > 
> > > Which leads me to ask: how are users and/or admins supposed to
> > > remove the flag from regular files once it is set in the filesystem?
> > > 
> > > Only being able to override the flag via the "dax=never" mount
> > > option means that once the flag is set, nobody can ever remove it
> > > and they can only globally turn off dax if it gets set incorrectly.
> > > It also means a global interrupt because all apps on the filesystem
> > > need to be stopped so the filesystem can be unmounted and mounted
> > > again with dax=never. This is highly unfriendly to admins and users.
> > > 
> > > IOWs, we _must_ be able to clear this inode flag on regular inodes
> > > in some way. I don't care if it doesn't change the current in-memory
> > > state, but we must be able to clear the flags so that the next time
> > > the inodes are instantiated DAX is not enabled for those files...
> > 
> > Well, there's one way to clear the flag: delete the file. If you still care
> > about the data, you can copy the data first. It isn't very convenient, I
> > agree, and effectively means restarting whatever application that is using
> > the file.
> 
> Restarting the application is fine. Having to backup/restore or copy
> the entire data set just to turn off an inode flag? That's not a
> viable management strategy. We could be talking about terabytes of
> data here.
> 
> I explained how we can safely remove the flag in the other branch of
> this thread...
> 
> > But it seems like more understandable API than letting user clear
> > the on-disk flag but the inode will still use DAX until kernel decides to
> > evict the inode
> 
> Certainly doesn't seem that way to me. "stop app, clear flags, drop
> caches, restart app" is a pretty simple, easy thing to do for an
> admin.

I want to be clear here: I think this is reasonable.  However, I don't see
consensus for that interface.

Christoph in particular said that a 'lazy change' is: "... straight from
the playbook for arcane and confusing API designs."

	"But returning an error and doing a lazy change anyway is straight from
	the playbook for arcane and confusing API designs."

	-- https://lore.kernel.org/lkml/20200403072731.GA24176@lst.de/

Did I somehow misunderstand this?

Again for this patch set, 5.8, lets leave that alone for now.  I think if we
disable setting this on files right now we can still allow it in the future as
another step forward.

> 
> Especially compared to process that is effectively "stop app, backup
> data set, delete data set, clear flags, restore data set, restart
> app"
> 
> > - because that often means you need to restart the
> > application using the file anyway for the flag change to have any effect.
> 
> That's a trivial requirement compared to the downtime and resource
> cost of a data set backup/restore just to clear inode flags....
> 

I agree but others do not.  This still provides a baby step forward and some
granularity for those who plan out the creation of their files.

Ira


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  2020-04-08 22:07         ` Ira Weiny
@ 2020-04-08 23:21           ` Dave Chinner
  2020-04-09  0:12             ` Ira Weiny
  0 siblings, 1 reply; 47+ messages in thread
From: Dave Chinner @ 2020-04-08 23:21 UTC (permalink / raw)
  To: Ira Weiny
  Cc: linux-kernel, Darrick J. Wong, Dan Williams, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Wed, Apr 08, 2020 at 03:07:35PM -0700, Ira Weiny wrote:
> On Thu, Apr 09, 2020 at 07:02:36AM +1000, Dave Chinner wrote:
> > On Wed, Apr 08, 2020 at 10:09:23AM -0700, Ira Weiny wrote:
> 
> [snip]
> 
> > > 
> > > This sounds good but I think we need a slight modification to make the function equivalent in functionality.
> > > 
> > > void
> > > xfs_diflags_to_iflags(
> > >         struct xfs_inode        *ip,
> > >         bool init)
> > > {
> > >         struct inode            *inode = VFS_I(ip);
> > >         unsigned int            xflags = xfs_ip2xflags(ip);
> > >         unsigned int            flags = 0;
> > > 
> > >         inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC | S_NOATIME |
> > >                             S_DAX);
> > 
> > We don't want to clear the dax flag here, ever, if it is already
> > set. That is an externally visible change and opens us up (again) to
> > races where IS_DAX() changes half way through a fault path. IOWs, avoiding
> > clearing the DAX flag was something I did explicitly in the above
> > code fragment.
> 
> <sigh> yes... you are correct.
> 
> But I don't like depending on the caller to clear the S_DAX flag if
> xfs_inode_enable_dax() is false.  IMO this function should clear the flag in
> that case for consistency...

No. We simply cannot do that here except in the init case when the
inode is not yet visible to userspace. In which case, we know -for
certain- that the S_DAX is not set, and hence we do not need to
clear it. Initial conditions matter!

If you want to make sure of this, add this:

	ASSERT(!(IS_DAX(inode) && init));

And now we'll catch inodes that incorrectly have S_DAX set at init
time.

> > memory S_DAX flag, we can actually clear the on-disk flag
> > safely, so that next time the inode cycles into memory it won't
> > be using DAX. IOWs, admins can stop the applications, clear the
> > DAX flag and drop caches. This should result in the inode being
> > recycled and when the app is restarted it will run without DAX.
> > No ned for deleting files, copying large data sets, etc just to
> > turn off an inode flag.
> 
> We already discussed evicting the inode and it was determined to
> be too confusing.[*]

That discussion did not even consider how admins are supposed to
clear the inode flag once it is set on disk. It was entirely
focussed around "we can't change in memory S_DAX state" and how the
tri-state mount option to "override" the on-disk flag could be done.

Nobody noticed that being unable to rmeove the on-disk flag means
the admin's only option to turn off dax for an application is to
turn it off for everything, filesystem wide, which requires:

	1. stopping the app.
	2. stopping every other app using the filesystem
	3. unmounting the filesystem
	4. changing to dax=never mount option
	5. mounting the filesystem
	6. restarting all apps.

It's a hard stop for everything using the filesystem, and it changes
the runtime environment for all applications, not just the one that
needs DAX turned off.  Not to mention that if it's the root
filesystem that is using DAX, then it's a full system reboot needed
to change the mount options.

IMO, this is a non-starter from a production point of view - testing
and qualification of all applications rather than just the affected
app is required to make this sort of change.  It simply does not
follow the "minimal change to fix the problem" rules for managing
issues in production environments.

So, pLease explain to me how this process:

	1. stop the app
	2. remove inode flags via xfs_io
	3. run drop_caches
	4. start the app

is worse than requiring admins to unmount the filesystem to turn off
DAX for an application.

> Furthermore, if we did want an interface like that why not allow
> the on-disk flag to be set as well as cleared?

Well, why not - it's why I implemented the flag in the first place!
The only problem we have here is how to safely change the in-memory
DAX state, and that largely has nothing to do with setting/clearing
the on-disk flag....

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 7/8] fs/xfs: Change xfs_ioctl_setattr_dax_invalidate() to xfs_ioctl_dax_check()
  2020-04-08 22:26         ` Ira Weiny
@ 2020-04-08 23:48           ` Dave Chinner
  2020-04-09 12:28             ` Christoph Hellwig
  0 siblings, 1 reply; 47+ messages in thread
From: Dave Chinner @ 2020-04-08 23:48 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Jan Kara, linux-kernel, Darrick J. Wong, Dan Williams,
	Christoph Hellwig, Theodore Y. Ts'o, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Wed, Apr 08, 2020 at 03:26:36PM -0700, Ira Weiny wrote:
> On Thu, Apr 09, 2020 at 07:09:50AM +1000, Dave Chinner wrote:
> > On Wed, Apr 08, 2020 at 11:58:03AM +0200, Jan Kara wrote:
> > I explained how we can safely remove the flag in the other branch of
> > this thread...
> > 
> > > But it seems like more understandable API than letting user clear
> > > the on-disk flag but the inode will still use DAX until kernel decides to
> > > evict the inode
> > 
> > Certainly doesn't seem that way to me. "stop app, clear flags, drop
> > caches, restart app" is a pretty simple, easy thing to do for an
> > admin.
> 
> I want to be clear here: I think this is reasonable.  However, I don't see
> consensus for that interface.
> 
> Christoph in particular said that a 'lazy change' is: "... straight from
> the playbook for arcane and confusing API designs."
> 
> 	"But returning an error and doing a lazy change anyway is straight from
> 	the playbook for arcane and confusing API designs."
> 
> 	-- https://lore.kernel.org/lkml/20200403072731.GA24176@lst.de/
> 
> Did I somehow misunderstand this?

Yes. Clearing the on-disk flag successfully should not return an
error.

What is wrong is having it clear the flag successfully and returning
an error because the operation doesn't take immediate effect, then
having the change take effect later after telling the application
there was an error.

That's what Christoph was saying is "straight from the playbook for
arcane and confusing API designs."

There's absolutely nothing wrong with setting/clearing the on-disk
flag and having the change take effect some time later depending on
some external context. We've done this sort of thing for a -long
time- and it's not XFS specific at all.

e.g.  changing the on-disk APPEND flag doesn't change the write
behaviour of currently open files - it only affects the behaviour of
future file opens. IOWs, we can have the flag set on disk, but we
can still write randomly to the inode as long as we have a file
descriptor that was opened before the APPEND on disk flag was set.

That's exactly the same class of behaviour as we are talking about
here for the on-disk DAX flag.

> > Especially compared to process that is effectively "stop app, backup
> > data set, delete data set, clear flags, restore data set, restart
> > app"
> > 
> > > - because that often means you need to restart the
> > > application using the file anyway for the flag change to have any effect.
> > 
> > That's a trivial requirement compared to the downtime and resource
> > cost of a data set backup/restore just to clear inode flags....
> 
> I agree but others do not.  This still provides a baby step forward and some

It's not a baby step forward. We can't expose a behaviour to
userspace and then decide to change it completely at some later
date.  We have to think through the entire admin model before
setting it in concrete.

If an admin operation can set an optional persistent feature flags
on a file, then there *must* be admin operations that can remove
that persistent feature flag from said files. This has *nothing to
do with DAX* - it's a fundamental principle of balanced system
design.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  2020-04-08 21:28         ` Dan Williams
  2020-04-08 22:10           ` Ira Weiny
@ 2020-04-08 23:58           ` Dave Chinner
  2020-04-09  0:22             ` Ira Weiny
  1 sibling, 1 reply; 47+ messages in thread
From: Dave Chinner @ 2020-04-08 23:58 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, Linux Kernel Mailing List, Darrick J. Wong,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

On Wed, Apr 08, 2020 at 02:28:30PM -0700, Dan Williams wrote:
> On Wed, Apr 8, 2020 at 2:02 PM Dave Chinner <david@fromorbit.com> wrote:
> > THis leads to an obvious conclusion: if we never clear the in memory
> > S_DAX flag, we can actually clear the on-disk flag safely, so that
> > next time the inode cycles into memory it won't be using DAX. IOWs,
> > admins can stop the applications, clear the DAX flag and drop
> > caches. This should result in the inode being recycled and when the
> > app is restarted it will run without DAX. No ned for deleting files,
> > copying large data sets, etc just to turn off an inode flag.
> 
> Makes sense, but is that sufficient? I recall you saying there might
> be a multitude of other reasons that the inode is not evicted, not the
> least of which is races [1]. Does this need another flag, lets call it
> "dax toggle" to track the "I requested the inode to clear the flag,
> but on cache-flush + restart the inode never got evicted" case.

You mean something like XFS_IDONTCACHE?

i.e. the functionality already exists in XFS to selectively evict an
inode from cache when the last reference to it is dropped rather
than let it go to the LRUs and hang around in memory.

That flag can be set when changing the on disk DAX flag, and we can
tweak how it works so new cache hits don't clear it (as happens
now). Hence the only thing that can prevent eviction are active
references.

That means we'll still need to stop the application and drop_caches,
because we need to close all the files and purge the dentries that
hold references to the inode before it can be evicted.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  2020-04-08 23:21           ` Dave Chinner
@ 2020-04-09  0:12             ` Ira Weiny
  2020-04-09  0:30               ` Darrick J. Wong
  2020-04-09  0:49               ` Dave Chinner
  0 siblings, 2 replies; 47+ messages in thread
From: Ira Weiny @ 2020-04-09  0:12 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-kernel, Darrick J. Wong, Dan Williams, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Thu, Apr 09, 2020 at 09:21:06AM +1000, Dave Chinner wrote:
> On Wed, Apr 08, 2020 at 03:07:35PM -0700, Ira Weiny wrote:
> > On Thu, Apr 09, 2020 at 07:02:36AM +1000, Dave Chinner wrote:
> > > On Wed, Apr 08, 2020 at 10:09:23AM -0700, Ira Weiny wrote:
> > 
> > [snip]
> > 
> > > > 
> > > > This sounds good but I think we need a slight modification to make the function equivalent in functionality.
> > > > 
> > > > void
> > > > xfs_diflags_to_iflags(
> > > >         struct xfs_inode        *ip,
> > > >         bool init)
> > > > {
> > > >         struct inode            *inode = VFS_I(ip);
> > > >         unsigned int            xflags = xfs_ip2xflags(ip);
> > > >         unsigned int            flags = 0;
> > > > 
> > > >         inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC | S_NOATIME |
> > > >                             S_DAX);
> > > 
> > > We don't want to clear the dax flag here, ever, if it is already
> > > set. That is an externally visible change and opens us up (again) to
> > > races where IS_DAX() changes half way through a fault path. IOWs, avoiding
> > > clearing the DAX flag was something I did explicitly in the above
> > > code fragment.
> > 
> > <sigh> yes... you are correct.
> > 
> > But I don't like depending on the caller to clear the S_DAX flag if
> > xfs_inode_enable_dax() is false.  IMO this function should clear the flag in
> > that case for consistency...
> 
> No. We simply cannot do that here except in the init case when the
> inode is not yet visible to userspace. In which case, we know -for
> certain- that the S_DAX is not set, and hence we do not need to
> clear it. Initial conditions matter!
> 
> If you want to make sure of this, add this:
> 
> 	ASSERT(!(IS_DAX(inode) && init));
> 
> And now we'll catch inodes that incorrectly have S_DAX set at init
> time.

Ok, that will work.  Also documents that expected initial condition.

> 
> > > memory S_DAX flag, we can actually clear the on-disk flag
> > > safely, so that next time the inode cycles into memory it won't
> > > be using DAX. IOWs, admins can stop the applications, clear the
> > > DAX flag and drop caches. This should result in the inode being
> > > recycled and when the app is restarted it will run without DAX.
> > > No ned for deleting files, copying large data sets, etc just to
> > > turn off an inode flag.
> > 
> > We already discussed evicting the inode and it was determined to
> > be too confusing.[*]
> 
> That discussion did not even consider how admins are supposed to
> clear the inode flag once it is set on disk.

I think this is a bit unfair.  I think we were all considering it...  and I
still think this patch set is a step in the right direction.

> It was entirely
> focussed around "we can't change in memory S_DAX state"

Not true.  There were many ideas on how to change the FS_XFLAG_DAX with some
sort of delayed S_DAX state to avoid changing S_DAX on an in memory inode.

I made the comment:

	"... I want to clarify.  ...  we could have the flag change with an
	appropriate error which could let the user know the change has been
	delayed."

	-- https://lore.kernel.org/lkml/20200402205518.GF3952565@iweiny-DESK2.sc.intel.com/

Jan made multiple comments:

	"I generally like the proposal but I think the fact that toggling
	FS_XFLAG_DAX will not have immediate effect on S_DAX will cause quite
	some confusion and ultimately bug reports."

	-- https://lore.kernel.org/lkml/20200401102511.GC19466@quack2.suse.cz/


	"Just switch FS_XFLAG_DAX flag, S_DAX flag will magically switch when
	inode gets evicted and the inode gets reloaded from the disk again. Did
	I misunderstand anything?

	And my thinking was that this is surprising behavior for the user and
	so it will likely generate lots of bug reports along the lines of "DAX
	inode flag does not work!"."

	-- https://lore.kernel.org/lkml/20200403170338.GD29920@quack2.suse.cz/

Darrick also had similar ideas/comments.

Christoph did say:

	"A reasonably smart application can try to evict itself."

	-- https://lore.kernel.org/lkml/20200403072731.GA24176@lst.de/

Which I was unclear about???

Christoph does this mean you would be ok with changing the FS_XFLAG_DAX on disk
and letting S_DAX change later?

> and how the
> tri-state mount option to "override" the on-disk flag could be done.
> 
> Nobody noticed that being unable to rmeove the on-disk flag means
> the admin's only option to turn off dax for an application is to
> turn it off for everything, filesystem wide, which requires:

No. This is not entirely true.  While I don't like the idea of having to copy
data (and I agree with your points) it is possible to do.

> 
> 	1. stopping the app.
> 	2. stopping every other app using the filesystem
> 	3. unmounting the filesystem
> 	4. changing to dax=never mount option

I don't understand why we need to unmount and mount with dax=never?

> 	5. mounting the filesystem
> 	6. restarting all apps.
> 
> It's a hard stop for everything using the filesystem, and it changes
> the runtime environment for all applications, not just the one that
> needs DAX turned off.  Not to mention that if it's the root
> filesystem that is using DAX, then it's a full system reboot needed
> to change the mount options.
> 
> IMO, this is a non-starter from a production point of view - testing
> and qualification of all applications rather than just the affected
> app is required to make this sort of change.  It simply does not
> follow the "minimal change to fix the problem" rules for managing
> issues in production environments.
> 
> So, pLease explain to me how this process:
> 
> 	1. stop the app
> 	2. remove inode flags via xfs_io
> 	3. run drop_caches
> 	4. start the app
> 
> is worse than requiring admins to unmount the filesystem to turn off
> DAX for an application.

Jan?  Christoph?

> 
> > Furthermore, if we did want an interface like that why not allow
> > the on-disk flag to be set as well as cleared?
> 
> Well, why not - it's why I implemented the flag in the first place!
> The only problem we have here is how to safely change the in-memory
> DAX state, and that largely has nothing to do with setting/clearing
> the on-disk flag....

With the above change to xfs_diflags_to_iflags() I think we are ok here.

Ira


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  2020-04-08 23:58           ` Dave Chinner
@ 2020-04-09  0:22             ` Ira Weiny
  2020-04-09 12:41               ` Christoph Hellwig
  0 siblings, 1 reply; 47+ messages in thread
From: Ira Weiny @ 2020-04-09  0:22 UTC (permalink / raw)
  To: Dave Chinner, Jan Kara, Christoph Hellwig
  Cc: Dan Williams, Linux Kernel Mailing List, Darrick J. Wong,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

On Thu, Apr 09, 2020 at 09:58:36AM +1000, Dave Chinner wrote:
> On Wed, Apr 08, 2020 at 02:28:30PM -0700, Dan Williams wrote:
> > On Wed, Apr 8, 2020 at 2:02 PM Dave Chinner <david@fromorbit.com> wrote:
> > > THis leads to an obvious conclusion: if we never clear the in memory
> > > S_DAX flag, we can actually clear the on-disk flag safely, so that
> > > next time the inode cycles into memory it won't be using DAX. IOWs,
> > > admins can stop the applications, clear the DAX flag and drop
> > > caches. This should result in the inode being recycled and when the
> > > app is restarted it will run without DAX. No ned for deleting files,
> > > copying large data sets, etc just to turn off an inode flag.
> > 
> > Makes sense, but is that sufficient? I recall you saying there might
> > be a multitude of other reasons that the inode is not evicted, not the
> > least of which is races [1]. Does this need another flag, lets call it
> > "dax toggle" to track the "I requested the inode to clear the flag,
> > but on cache-flush + restart the inode never got evicted" case.
> 
> You mean something like XFS_IDONTCACHE?
> 
> i.e. the functionality already exists in XFS to selectively evict an
> inode from cache when the last reference to it is dropped rather
> than let it go to the LRUs and hang around in memory.
> 
> That flag can be set when changing the on disk DAX flag, and we can
> tweak how it works so new cache hits don't clear it (as happens
> now). Hence the only thing that can prevent eviction are active
> references.
> 
> That means we'll still need to stop the application and drop_caches,
> because we need to close all the files and purge the dentries that
> hold references to the inode before it can be evicted.

That sounds like a great idea...

Jan?  Christoph?

I will reiterate though that any delayed S_DAX change be done as a follow on
series to the one proposed here.

Ira

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  2020-04-09  0:12             ` Ira Weiny
@ 2020-04-09  0:30               ` Darrick J. Wong
  2020-04-09 15:29                 ` Ira Weiny
  2020-04-09  0:49               ` Dave Chinner
  1 sibling, 1 reply; 47+ messages in thread
From: Darrick J. Wong @ 2020-04-09  0:30 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dave Chinner, linux-kernel, Dan Williams, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Wed, Apr 08, 2020 at 05:12:06PM -0700, Ira Weiny wrote:
> On Thu, Apr 09, 2020 at 09:21:06AM +1000, Dave Chinner wrote:
> > On Wed, Apr 08, 2020 at 03:07:35PM -0700, Ira Weiny wrote:
> > > On Thu, Apr 09, 2020 at 07:02:36AM +1000, Dave Chinner wrote:
> > > > On Wed, Apr 08, 2020 at 10:09:23AM -0700, Ira Weiny wrote:
> > > 
> > > [snip]
> > > 
> > > > > 
> > > > > This sounds good but I think we need a slight modification to make the function equivalent in functionality.
> > > > > 
> > > > > void
> > > > > xfs_diflags_to_iflags(
> > > > >         struct xfs_inode        *ip,
> > > > >         bool init)
> > > > > {
> > > > >         struct inode            *inode = VFS_I(ip);
> > > > >         unsigned int            xflags = xfs_ip2xflags(ip);
> > > > >         unsigned int            flags = 0;
> > > > > 
> > > > >         inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC | S_NOATIME |
> > > > >                             S_DAX);
> > > > 
> > > > We don't want to clear the dax flag here, ever, if it is already
> > > > set. That is an externally visible change and opens us up (again) to
> > > > races where IS_DAX() changes half way through a fault path. IOWs, avoiding
> > > > clearing the DAX flag was something I did explicitly in the above
> > > > code fragment.
> > > 
> > > <sigh> yes... you are correct.
> > > 
> > > But I don't like depending on the caller to clear the S_DAX flag if
> > > xfs_inode_enable_dax() is false.  IMO this function should clear the flag in
> > > that case for consistency...
> > 
> > No. We simply cannot do that here except in the init case when the
> > inode is not yet visible to userspace. In which case, we know -for
> > certain- that the S_DAX is not set, and hence we do not need to
> > clear it. Initial conditions matter!
> > 
> > If you want to make sure of this, add this:
> > 
> > 	ASSERT(!(IS_DAX(inode) && init));
> > 
> > And now we'll catch inodes that incorrectly have S_DAX set at init
> > time.
> 
> Ok, that will work.  Also documents that expected initial condition.
> 
> > 
> > > > memory S_DAX flag, we can actually clear the on-disk flag
> > > > safely, so that next time the inode cycles into memory it won't
> > > > be using DAX. IOWs, admins can stop the applications, clear the
> > > > DAX flag and drop caches. This should result in the inode being
> > > > recycled and when the app is restarted it will run without DAX.
> > > > No ned for deleting files, copying large data sets, etc just to
> > > > turn off an inode flag.
> > > 
> > > We already discussed evicting the inode and it was determined to
> > > be too confusing.[*]
> > 
> > That discussion did not even consider how admins are supposed to
> > clear the inode flag once it is set on disk.
> 
> I think this is a bit unfair.  I think we were all considering it...  and I
> still think this patch set is a step in the right direction.
> 
> > It was entirely
> > focussed around "we can't change in memory S_DAX state"
> 
> Not true.  There were many ideas on how to change the FS_XFLAG_DAX with some
> sort of delayed S_DAX state to avoid changing S_DAX on an in memory inode.
> 
> I made the comment:
> 
> 	"... I want to clarify.  ...  we could have the flag change with an
> 	appropriate error which could let the user know the change has been
> 	delayed."
> 
> 	-- https://lore.kernel.org/lkml/20200402205518.GF3952565@iweiny-DESK2.sc.intel.com/
> 
> Jan made multiple comments:
> 
> 	"I generally like the proposal but I think the fact that toggling
> 	FS_XFLAG_DAX will not have immediate effect on S_DAX will cause quite
> 	some confusion and ultimately bug reports."
> 
> 	-- https://lore.kernel.org/lkml/20200401102511.GC19466@quack2.suse.cz/
> 
> 
> 	"Just switch FS_XFLAG_DAX flag, S_DAX flag will magically switch when
> 	inode gets evicted and the inode gets reloaded from the disk again. Did
> 	I misunderstand anything?
> 
> 	And my thinking was that this is surprising behavior for the user and
> 	so it will likely generate lots of bug reports along the lines of "DAX
> 	inode flag does not work!"."
> 
> 	-- https://lore.kernel.org/lkml/20200403170338.GD29920@quack2.suse.cz/
> 
> Darrick also had similar ideas/comments.
> 
> Christoph did say:
> 
> 	"A reasonably smart application can try to evict itself."
> 
> 	-- https://lore.kernel.org/lkml/20200403072731.GA24176@lst.de/
> 
> Which I was unclear about???
> 
> Christoph does this mean you would be ok with changing the FS_XFLAG_DAX on disk
> and letting S_DAX change later?
> 
> > and how the
> > tri-state mount option to "override" the on-disk flag could be done.
> > 
> > Nobody noticed that being unable to rmeove the on-disk flag means
> > the admin's only option to turn off dax for an application is to
> > turn it off for everything, filesystem wide, which requires:
> 
> No. This is not entirely true.  While I don't like the idea of having to copy
> data (and I agree with your points) it is possible to do.

But now that I think about it, that's really going to be a PITA, and
probably more of a pain than if the two DAX flags are only loosely
coupled.

> > 
> > 	1. stopping the app.
> > 	2. stopping every other app using the filesystem
> > 	3. unmounting the filesystem
> > 	4. changing to dax=never mount option
> 
> I don't understand why we need to unmount and mount with dax=never?

I've realized that if you can /never/ clear FS_XFLAG_DAX from a file,
then the only way to force it off is dax=never (so the kernel ignores
it) or move the fs to a non-pmem storage (so the kernel doesn't even
try).

> > 	5. mounting the filesystem
> > 	6. restarting all apps.
> > 
> > It's a hard stop for everything using the filesystem, and it changes
> > the runtime environment for all applications, not just the one that
> > needs DAX turned off.  Not to mention that if it's the root
> > filesystem that is using DAX, then it's a full system reboot needed
> > to change the mount options.
> > 
> > IMO, this is a non-starter from a production point of view - testing
> > and qualification of all applications rather than just the affected
> > app is required to make this sort of change.  It simply does not
> > follow the "minimal change to fix the problem" rules for managing
> > issues in production environments.
> > 
> > So, pLease explain to me how this process:
> > 
> > 	1. stop the app
> > 	2. remove inode flags via xfs_io
> > 	3. run drop_caches
> > 	4. start the app
> > 
> > is worse than requiring admins to unmount the filesystem to turn off
> > DAX for an application.
> 
> Jan?  Christoph?

But you're right, this thing keeps swirling around and around and around
because we can't ever get to agreement on this.  Maybe I'll just become
XFS BOFH MAINTAINER and make a decision like this:

 1 Applications must call statx to discover the current S_DAX state.

 2 There exists an advisory file inode flag FS_XFLAG_DAX that is set based on
   the parent directory FS_XFLAG_DAX inode flag.  This advisory flag can be
   changed after file creation, but it does not immediately affect the S_DAX
   state.

   If FS_XFLAG_DAX is set and the fs is on pmem then it will enable S_DAX at
   inode load time; if FS_XFLAG_DAX is not set, it will not enable S_DAX.
   Unless overridden...

 3 There exists a dax= mount option.

   "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX"
   "-o dax=always" means "always set S_DAX (at least on pmem), ignore FS_XFLAG_DAX"
        "-o dax" by itself means "dax=always"
   "-o dax=iflag" means "follow FS_XFLAG_DAX" and is the default

 4 There exists an advisory directory inode flag FS_XFLAG_DAX that can be
   changed at any time.  The flag state is copied into any files or
   subdirectories when they are created within that directory.  If programs
   require file access runs in S_DAX mode, they must create those files
   inside a directory with FS_XFLAG_DAX set, or mount the fs with an
   appropriate dax mount option.

 5 Programs that require a specific file access mode (DAX or not DAX) must
   do one of the following:

   (a) create files in directories with the FS_XFLAG_DAX flag set as needed;

   (b) have the administrator set an override via mount option;

   (c) if they need to change a file's FS_XFLAG_DAX flag so that it does not
       match the S_DAX state (as reported by statx), they must cause the
       kernel to evict the inode from memory.  This can be done by:

       i>   closing the file;
       ii>  re-opening the file and using statx to see if the fs has
            changed the S_DAX flag;
       iii> if not, either unmount and remount the filesystem, or
            closing the file and using drop_caches.

 6 I no longer think it's too wild to require that users who want to
   squeeze every last bit of performance out of the particular rough and
   tumble bits of their storage also be exposed to the difficulties of
   what happens when the operating system can't totally virtualize those
   hardware capabilities.  Your high performance sports car is not a
   Toyota minivan, as it were.

I think (like Dave said) that if you set XFS_IDONTCACHE on the inode
when you change the DAX flag, the VFS will kill the inode the instant
the last user close()s the file.  Then 5.c.ii will actually work.

--D

> > 
> > > Furthermore, if we did want an interface like that why not allow
> > > the on-disk flag to be set as well as cleared?
> > 
> > Well, why not - it's why I implemented the flag in the first place!
> > The only problem we have here is how to safely change the in-memory
> > DAX state, and that largely has nothing to do with setting/clearing
> > the on-disk flag....
> 
> With the above change to xfs_diflags_to_iflags() I think we are ok here.
> 
> Ira
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  2020-04-09  0:12             ` Ira Weiny
  2020-04-09  0:30               ` Darrick J. Wong
@ 2020-04-09  0:49               ` Dave Chinner
  2020-04-09 12:40                 ` Christoph Hellwig
  1 sibling, 1 reply; 47+ messages in thread
From: Dave Chinner @ 2020-04-09  0:49 UTC (permalink / raw)
  To: Ira Weiny
  Cc: linux-kernel, Darrick J. Wong, Dan Williams, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Wed, Apr 08, 2020 at 05:12:06PM -0700, Ira Weiny wrote:
> On Thu, Apr 09, 2020 at 09:21:06AM +1000, Dave Chinner wrote:
> > On Wed, Apr 08, 2020 at 03:07:35PM -0700, Ira Weiny wrote:
> > > On Thu, Apr 09, 2020 at 07:02:36AM +1000, Dave Chinner wrote:
> > > > On Wed, Apr 08, 2020 at 10:09:23AM -0700, Ira Weiny wrote:
> > > 
> > > [snip]
> > > 
> > > > > 
> > > > > This sounds good but I think we need a slight modification to make the function equivalent in functionality.
> > > > > 
> > > > > void
> > > > > xfs_diflags_to_iflags(
> > > > >         struct xfs_inode        *ip,
> > > > >         bool init)
> > > > > {
> > > > >         struct inode            *inode = VFS_I(ip);
> > > > >         unsigned int            xflags = xfs_ip2xflags(ip);
> > > > >         unsigned int            flags = 0;
> > > > > 
> > > > >         inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC | S_NOATIME |
> > > > >                             S_DAX);
> > > > 
> > > > We don't want to clear the dax flag here, ever, if it is already
> > > > set. That is an externally visible change and opens us up (again) to
> > > > races where IS_DAX() changes half way through a fault path. IOWs, avoiding
> > > > clearing the DAX flag was something I did explicitly in the above
> > > > code fragment.
> > > 
> > > <sigh> yes... you are correct.
> > > 
> > > But I don't like depending on the caller to clear the S_DAX flag if
> > > xfs_inode_enable_dax() is false.  IMO this function should clear the flag in
> > > that case for consistency...
> > 
> > No. We simply cannot do that here except in the init case when the
> > inode is not yet visible to userspace. In which case, we know -for
> > certain- that the S_DAX is not set, and hence we do not need to
> > clear it. Initial conditions matter!
> > 
> > If you want to make sure of this, add this:
> > 
> > 	ASSERT(!(IS_DAX(inode) && init));
> > 
> > And now we'll catch inodes that incorrectly have S_DAX set at init
> > time.
> 
> Ok, that will work.  Also documents that expected initial condition.
> 
> > 
> > > > memory S_DAX flag, we can actually clear the on-disk flag
> > > > safely, so that next time the inode cycles into memory it won't
> > > > be using DAX. IOWs, admins can stop the applications, clear the
> > > > DAX flag and drop caches. This should result in the inode being
> > > > recycled and when the app is restarted it will run without DAX.
> > > > No ned for deleting files, copying large data sets, etc just to
> > > > turn off an inode flag.
> > > 
> > > We already discussed evicting the inode and it was determined to
> > > be too confusing.[*]
> > 
> > That discussion did not even consider how admins are supposed to
> > clear the inode flag once it is set on disk.
> 
> I think this is a bit unfair.  I think we were all considering it...  and I
> still think this patch set is a step in the right direction.
> 
> > It was entirely
> > focussed around "we can't change in memory S_DAX state"
> 
> Not true.  There were many ideas on how to change the FS_XFLAG_DAX with some
> sort of delayed S_DAX state to avoid changing S_DAX on an in memory inode.
> 
> I made the comment:
> 
> 	"... I want to clarify.  ...  we could have the flag change with an
> 	appropriate error which could let the user know the change has been
> 	delayed."
> 
> 	-- https://lore.kernel.org/lkml/20200402205518.GF3952565@iweiny-DESK2.sc.intel.com/
> 
> Jan made multiple comments:
> 
> 	"I generally like the proposal but I think the fact that toggling
> 	FS_XFLAG_DAX will not have immediate effect on S_DAX will cause quite
> 	some confusion and ultimately bug reports."
> 
> 	-- https://lore.kernel.org/lkml/20200401102511.GC19466@quack2.suse.cz/
> 
> 
> 	"Just switch FS_XFLAG_DAX flag, S_DAX flag will magically switch when
> 	inode gets evicted and the inode gets reloaded from the disk again. Did
> 	I misunderstand anything?
> 
> 	And my thinking was that this is surprising behavior for the user and
> 	so it will likely generate lots of bug reports along the lines of "DAX
> 	inode flag does not work!"."
> 
> 	-- https://lore.kernel.org/lkml/20200403170338.GD29920@quack2.suse.cz/
> 
> Darrick also had similar ideas/comments.

None of these consider how the admin is supposed to remove
the flag once it is set. They all talk about issues that result
from setting/clearing the flag inside the kernel, and don't consider
the administration impacts of having an unclearable flag on disk
that the kernel uses for operational policy decisions.

Nobody said "hey, wait, if we don't allow the flag to be cleared
once the flag is set, how do we actually remove it so the admin can
stop overriding them globally with the mount option? If the kernel
can't clear the flag, then what mechanism are we going to provide to
let them clear it without interrupting production?"

> Christoph did say:
> 
> 	"A reasonably smart application can try to evict itself."
> 
> 	-- https://lore.kernel.org/lkml/20200403072731.GA24176@lst.de/

I'd love to know how an unprivileged application can force the
eviction of an inode from cache. If the application cannot evict
files it is using reliably, then it's no better solution than
drop_caches. And given that userspace has to take references to
files to access them, by definition a file that userspace is trying
to evict will have active references and hence cannot be evicted.

However, if userspace can reliably evict inodes that it can access,
then we have a timing attack vector that we need to address.  i.e.
by now everyone here should know that user controlled cache eviction
is the basic requirement for creating most OS and CPU level timing
attacks....

> > and how the
> > tri-state mount option to "override" the on-disk flag could be done.
> > 
> > Nobody noticed that being unable to rmeove the on-disk flag means
> > the admin's only option to turn off dax for an application is to
> > turn it off for everything, filesystem wide, which requires:
> 
> No. This is not entirely true.  While I don't like the idea of having to copy
> data (and I agree with your points) it is possible to do.
> 
> > 
> > 	1. stopping the app.
> > 	2. stopping every other app using the filesystem
> > 	3. unmounting the filesystem
> > 	4. changing to dax=never mount option
> 
> I don't understand why we need to unmount and mount with dax=never?

1. IIUC your patches correctly, that's exactly how you implemented
the dax=... mount option.

2. If you don't unmount the filesystem and only require a remount,
then changing the mount option has all the same "delayed effect"
issues that changing the on-disk flag has. i.e. We can't change the
in-memory inode state until the inode has been evicted from cache
and a remount doesn't guarantee that cache eviction will succeed.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 2/8] fs: Remove unneeded IS_DAX() check
  2020-04-07 18:29 ` [PATCH V6 2/8] fs: Remove unneeded IS_DAX() check ira.weiny
@ 2020-04-09  7:31   ` Christoph Hellwig
  2020-04-09 14:57     ` Ira Weiny
  0 siblings, 1 reply; 47+ messages in thread
From: Christoph Hellwig @ 2020-04-09  7:31 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, Dave Chinner, Jan Kara, Darrick J. Wong,
	Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jeff Moyer, linux-ext4, linux-xfs,
	linux-fsdevel

On Tue, Apr 07, 2020 at 11:29:52AM -0700, ira.weiny@intel.com wrote:
>  static inline bool io_is_direct(struct file *filp)
>  {
> -	return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping->host);
> +	return (filp->f_flags & O_DIRECT);
>  }

As requested last time: Can you please also just remove io_is_direct?

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 7/8] fs/xfs: Change xfs_ioctl_setattr_dax_invalidate() to xfs_ioctl_dax_check()
  2020-04-08 23:48           ` Dave Chinner
@ 2020-04-09 12:28             ` Christoph Hellwig
  0 siblings, 0 replies; 47+ messages in thread
From: Christoph Hellwig @ 2020-04-09 12:28 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Ira Weiny, Jan Kara, linux-kernel, Darrick J. Wong, Dan Williams,
	Christoph Hellwig, Theodore Y. Ts'o, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Thu, Apr 09, 2020 at 09:48:17AM +1000, Dave Chinner wrote:
> > Christoph in particular said that a 'lazy change' is: "... straight from
> > the playbook for arcane and confusing API designs."
> > 
> > 	"But returning an error and doing a lazy change anyway is straight from
> > 	the playbook for arcane and confusing API designs."
> > 
> > 	-- https://lore.kernel.org/lkml/20200403072731.GA24176@lst.de/
> > 
> > Did I somehow misunderstand this?
> 
> Yes. Clearing the on-disk flag successfully should not return an
> error.
> 
> What is wrong is having it clear the flag successfully and returning
> an error because the operation doesn't take immediate effect, then
> having the change take effect later after telling the application
> there was an error.
> 
> That's what Christoph was saying is "straight from the playbook for
> arcane and confusing API designs."

Yes.

> There's absolutely nothing wrong with setting/clearing the on-disk
> flag and having the change take effect some time later depending on
> some external context. We've done this sort of thing for a -long
> time- and it's not XFS specific at all.
> 
> e.g.  changing the on-disk APPEND flag doesn't change the write
> behaviour of currently open files - it only affects the behaviour of
> future file opens. IOWs, we can have the flag set on disk, but we
> can still write randomly to the inode as long as we have a file
> descriptor that was opened before the APPEND on disk flag was set.
> 
> That's exactly the same class of behaviour as we are talking about
> here for the on-disk DAX flag.

Some people consider that a bug, though.  But I don't think we can
change that now.  In general I don't think APIs that don't take
immediate effect are all that great, but in some cases we can live
with them if they are properly documented.  But APIs that return
an error, but actually take effect later anyway are just crazy.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  2020-04-09  0:49               ` Dave Chinner
@ 2020-04-09 12:40                 ` Christoph Hellwig
  2020-04-10  0:27                   ` Dave Chinner
  0 siblings, 1 reply; 47+ messages in thread
From: Christoph Hellwig @ 2020-04-09 12:40 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Ira Weiny, linux-kernel, Darrick J. Wong, Dan Williams,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

On Thu, Apr 09, 2020 at 10:49:21AM +1000, Dave Chinner wrote:
> > Christoph did say:
> > 
> > 	"A reasonably smart application can try to evict itself."
> > 
> > 	-- https://lore.kernel.org/lkml/20200403072731.GA24176@lst.de/
> 
> I'd love to know how an unprivileged application can force the
> eviction of an inode from cache.

Where did the "unprivileged" suddenly come from?

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  2020-04-09  0:22             ` Ira Weiny
@ 2020-04-09 12:41               ` Christoph Hellwig
  2020-04-09 20:49                 ` Ira Weiny
  0 siblings, 1 reply; 47+ messages in thread
From: Christoph Hellwig @ 2020-04-09 12:41 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dave Chinner, Jan Kara, Christoph Hellwig, Dan Williams,
	Linux Kernel Mailing List, Darrick J. Wong, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-xfs, linux-fsdevel

On Wed, Apr 08, 2020 at 05:22:03PM -0700, Ira Weiny wrote:
> > You mean something like XFS_IDONTCACHE?
> > 
> > i.e. the functionality already exists in XFS to selectively evict an
> > inode from cache when the last reference to it is dropped rather
> > than let it go to the LRUs and hang around in memory.
> > 
> > That flag can be set when changing the on disk DAX flag, and we can
> > tweak how it works so new cache hits don't clear it (as happens
> > now). Hence the only thing that can prevent eviction are active
> > references.
> > 
> > That means we'll still need to stop the application and drop_caches,
> > because we need to close all the files and purge the dentries that
> > hold references to the inode before it can be evicted.
> 
> That sounds like a great idea...
> 
> Jan?  Christoph?

Sounds ok.  Note that we could also pretty trivially lift
XFS_IDONTCACHE to the VFS if we need to apply the same scheme to
ext4.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 2/8] fs: Remove unneeded IS_DAX() check
  2020-04-09  7:31   ` Christoph Hellwig
@ 2020-04-09 14:57     ` Ira Weiny
  0 siblings, 0 replies; 47+ messages in thread
From: Ira Weiny @ 2020-04-09 14:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-kernel, Dave Chinner, Jan Kara, Darrick J. Wong,
	Dan Williams, Dave Chinner, Theodore Y. Ts'o, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

On Thu, Apr 09, 2020 at 09:31:34AM +0200, Christoph Hellwig wrote:
> On Tue, Apr 07, 2020 at 11:29:52AM -0700, ira.weiny@intel.com wrote:
> >  static inline bool io_is_direct(struct file *filp)
> >  {
> > -	return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping->host);
> > +	return (filp->f_flags & O_DIRECT);
> >  }
> 
> As requested last time: Can you please also just remove io_is_direct?

FWIW I just found this mail in my junk folder...  My fault I know... :-/

Regardless I did not see that request last time but I can do that,

Done for V7
Ira

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 4/8] fs/xfs: Make DAX mount option a tri-state
  2020-04-08  0:48       ` Dave Chinner
@ 2020-04-09 15:03         ` Ira Weiny
  0 siblings, 0 replies; 47+ messages in thread
From: Ira Weiny @ 2020-04-09 15:03 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-kernel, Darrick J. Wong, Dan Williams, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Wed, Apr 08, 2020 at 10:48:01AM +1000, Dave Chinner wrote:
> On Tue, Apr 07, 2020 at 05:09:04PM -0700, Ira Weiny wrote:
> > On Wed, Apr 08, 2020 at 09:59:09AM +1000, Dave Chinner wrote:
> > > 
> > > This is overly complex. Just use 2 flags:
> > 
> > LOL...  I was afraid someone would say that.  At first I used 2 flags with
> > fsparam_string, but then I realized Darrick suggested fsparam_enum:
> 
> Well, I'm not concerned about the fsparam enum, it's just that
> encoding an integer into a flags bit field is just ... messy.
> Especially when encoding that state can be done with just 2 flags.
> 
> If you want to keep the xfs_mount_dax_mode() wrapper, then:
> 
> static inline uint32_t xfs_mount_dax_mode(struct xfs_mount *mp)
> {
> 	if (mp->m_flags & XFS_MOUNT_DAX_NEVER)
> 		return XFS_DAX_NEVER;
> 	if (mp->m_flags & XFS_MOUNT_DAX_ALWAYS)
> 		return XFS_DAX_ALWAYS;
> 	return XFS_DAX_IFLAG;
> }
> 
> but once it's encoded in flags like this, the wrapper really isn't
> necessary...

Done for v7

> 
> Also, while I think of it, can we change "iflag" to "inode". i.e.
> the DAX state is held on the inode. Saying it comes from an "inode
> flag" encodes the implementation into the user interface. i.e. it
> could well be held in an xattr on the inode on another filesystem,
> so we shouldn't mention "flag" in the user API....

Sure "inode" is fine with me.  Easy change, done for v7

Ira


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  2020-04-09  0:30               ` Darrick J. Wong
@ 2020-04-09 15:29                 ` Ira Weiny
  2020-04-09 16:59                   ` Darrick J. Wong
  0 siblings, 1 reply; 47+ messages in thread
From: Ira Weiny @ 2020-04-09 15:29 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dave Chinner, linux-kernel, Dan Williams, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Wed, Apr 08, 2020 at 05:30:21PM -0700, Darrick J. Wong wrote:

[snip]

> 
> But you're right, this thing keeps swirling around and around and around
> because we can't ever get to agreement on this.  Maybe I'll just become
> XFS BOFH MAINTAINER and make a decision like this:
> 
>  1 Applications must call statx to discover the current S_DAX state.
> 
>  2 There exists an advisory file inode flag FS_XFLAG_DAX that is set based on
>    the parent directory FS_XFLAG_DAX inode flag.  This advisory flag can be
>    changed after file creation, but it does not immediately affect the S_DAX
>    state.
> 
>    If FS_XFLAG_DAX is set and the fs is on pmem then it will enable S_DAX at
>    inode load time; if FS_XFLAG_DAX is not set, it will not enable S_DAX.
>    Unless overridden...
> 
>  3 There exists a dax= mount option.
> 
>    "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX"
>    "-o dax=always" means "always set S_DAX (at least on pmem), ignore FS_XFLAG_DAX"
>         "-o dax" by itself means "dax=always"
>    "-o dax=iflag" means "follow FS_XFLAG_DAX" and is the default

per-Dave '-o dax=inode'

> 
>  4 There exists an advisory directory inode flag FS_XFLAG_DAX that can be
>    changed at any time.  The flag state is copied into any files or
>    subdirectories when they are created within that directory.

Good.

>    If programs
>    require file access runs in S_DAX mode, they must create those files
>    inside a directory with FS_XFLAG_DAX set, or mount the fs with an
>    appropriate dax mount option.

Why do we need this to be true?  If the FS_XFLAG_DAX flag can be cleared why
not set it and allow the S_DAX change to occur later just like clearing it?
The logic is exactly the same.

> 
>  5 Programs that require a specific file access mode (DAX or not DAX) must

s/must/can/

>    do one of the following:
> 
>    (a) create files in directories with the FS_XFLAG_DAX flag set as needed;

Again if we allow clearing the flag why not setting?  So this is 1 option they
'can' do.

> 
>    (b) have the administrator set an override via mount option;
> 
>    (c) if they need to change a file's FS_XFLAG_DAX flag so that it does not
>        match the S_DAX state (as reported by statx), they must cause the
>        kernel to evict the inode from memory.  This can be done by:
> 
>        i>   closing the file;
>        ii>  re-opening the file and using statx to see if the fs has
>             changed the S_DAX flag;

i and ii need to be 1 step the user must follow.

>        iii> if not, either unmount and remount the filesystem, or
>             closing the file and using drop_caches.
> 
>  6 I no longer think it's too wild to require that users who want to
>    squeeze every last bit of performance out of the particular rough and
>    tumble bits of their storage also be exposed to the difficulties of
>    what happens when the operating system can't totally virtualize those
>    hardware capabilities.  Your high performance sports car is not a
>    Toyota minivan, as it were.

I'm good with this statement.  But I think we need to clean up the verbiage for
the documentation...  ;-)

Thanks for the summary.  I like these to get everyone on the same page.  :-D
Ira

> 
> I think (like Dave said) that if you set XFS_IDONTCACHE on the inode
> when you change the DAX flag, the VFS will kill the inode the instant
> the last user close()s the file.  Then 5.c.ii will actually work.
> 
> --D
> 
> > > 
> > > > Furthermore, if we did want an interface like that why not allow
> > > > the on-disk flag to be set as well as cleared?
> > > 
> > > Well, why not - it's why I implemented the flag in the first place!
> > > The only problem we have here is how to safely change the in-memory
> > > DAX state, and that largely has nothing to do with setting/clearing
> > > the on-disk flag....
> > 
> > With the above change to xfs_diflags_to_iflags() I think we are ok here.
> > 
> > Ira
> > 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  2020-04-09 15:29                 ` Ira Weiny
@ 2020-04-09 16:59                   ` Darrick J. Wong
  2020-04-09 17:17                     ` Jan Kara
  2020-04-09 20:54                     ` Ira Weiny
  0 siblings, 2 replies; 47+ messages in thread
From: Darrick J. Wong @ 2020-04-09 16:59 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dave Chinner, linux-kernel, Dan Williams, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Thu, Apr 09, 2020 at 08:29:44AM -0700, Ira Weiny wrote:
> On Wed, Apr 08, 2020 at 05:30:21PM -0700, Darrick J. Wong wrote:
> 
> [snip]
> 
> > 
> > But you're right, this thing keeps swirling around and around and around
> > because we can't ever get to agreement on this.  Maybe I'll just become
> > XFS BOFH MAINTAINER and make a decision like this:
> > 
> >  1 Applications must call statx to discover the current S_DAX state.
> > 
> >  2 There exists an advisory file inode flag FS_XFLAG_DAX that is set based on
> >    the parent directory FS_XFLAG_DAX inode flag.  This advisory flag can be
> >    changed after file creation, but it does not immediately affect the S_DAX
> >    state.
> > 
> >    If FS_XFLAG_DAX is set and the fs is on pmem then it will enable S_DAX at
> >    inode load time; if FS_XFLAG_DAX is not set, it will not enable S_DAX.
> >    Unless overridden...
> > 
> >  3 There exists a dax= mount option.
> > 
> >    "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX"
> >    "-o dax=always" means "always set S_DAX (at least on pmem), ignore FS_XFLAG_DAX"
> >         "-o dax" by itself means "dax=always"
> >    "-o dax=iflag" means "follow FS_XFLAG_DAX" and is the default
> 
> per-Dave '-o dax=inode'

Ok.

> > 
> >  4 There exists an advisory directory inode flag FS_XFLAG_DAX that can be
> >    changed at any time.  The flag state is copied into any files or
> >    subdirectories when they are created within that directory.
> 
> Good.
> 
> >    If programs
> >    require file access runs in S_DAX mode, they must create those files
> >    inside a directory with FS_XFLAG_DAX set, or mount the fs with an
> >    appropriate dax mount option.
> 
> Why do we need this to be true?  If the FS_XFLAG_DAX flag can be cleared why
> not set it and allow the S_DAX change to occur later just like clearing it?
> The logic is exactly the same.

I think I'll just delete this sentence since I started pushing all that
verbiage towards (5).

To answer your question, yes, FS_XFLAG_DAX can be set on a file at any
time, the same as it can be cleared at any time.  Sorry that was
unclear, I'll fix that for the next draft (below).

> > 
> >  5 Programs that require a specific file access mode (DAX or not DAX) must
> 
> s/must/can/

Ok.

> >    do one of the following:
> > 
> >    (a) create files in directories with the FS_XFLAG_DAX flag set as needed;
> 
> Again if we allow clearing the flag why not setting?  So this is 1 option they
> 'can' do.
> 
> > 
> >    (b) have the administrator set an override via mount option;
> > 
> >    (c) if they need to change a file's FS_XFLAG_DAX flag so that it does not
> >        match the S_DAX state (as reported by statx), they must cause the
> >        kernel to evict the inode from memory.  This can be done by:
> > 
> >        i>   closing the file;
> >        ii>  re-opening the file and using statx to see if the fs has
> >             changed the S_DAX flag;
> 
> i and ii need to be 1 step the user must follow.

Ok, I'll combine these.

> >        iii> if not, either unmount and remount the filesystem, or
> >             closing the file and using drop_caches.
> > 
> >  6 I no longer think it's too wild to require that users who want to
> >    squeeze every last bit of performance out of the particular rough and
> >    tumble bits of their storage also be exposed to the difficulties of
> >    what happens when the operating system can't totally virtualize those
> >    hardware capabilities.  Your high performance sports car is not a
> >    Toyota minivan, as it were.
> 
> I'm good with this statement.  But I think we need to clean up the verbiage for
> the documentation...  ;-)

Heh. :)

> Thanks for the summary.  I like these to get everyone on the same page.  :-D

And today:

 1. There exists an in-kernel access mode flag S_DAX that is set when
    file accesses go directly to persistent memory, bypassing the page
    cache.  Applications must call statx to discover the current S_DAX
    state (STATX_ATTR_DAX).

 2. There exists an advisory file inode flag FS_XFLAG_DAX that is
    inherited from the parent directory FS_XFLAG_DAX inode flag at file
    creation time.  This advisory flag can be set or cleared at any
    time, but doing so does not immediately affect the S_DAX state.

    Unless overridden by mount options (see (3)), if FS_XFLAG_DAX is set
    and the fs is on pmem then it will enable S_DAX at inode load time;
    if FS_XFLAG_DAX is not set, it will not enable S_DAX.

 3. There exists a dax= mount option.

    "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."

    "-o dax=always" means "always set S_DAX (at least on pmem),
                    and ignore FS_XFLAG_DAX."

    "-o dax"        is an alias for "dax=always".

    "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.

 4. There exists an advisory directory inode flag FS_XFLAG_DAX that can
    be set or cleared at any time.  The flag state is copied into any
    files or subdirectories when they are created within that directory.

 5. Programs that require a specific file access mode (DAX or not DAX)
    can do one of the following:

    (a) Create files in directories that the FS_XFLAG_DAX flag set as
        needed; or

    (b) Have the administrator set an override via mount option; or

    (c) Set or clear the file's FS_XFLAG_DAX flag as needed.  Programs
        must then cause the kernel to evict the inode from memory.  This
        can be done by:

        i>  Closing the file and re-opening the file and using statx to
            see if the fs has changed the S_DAX flag; and

        ii> If the file still does not have the desired S_DAX access
            mode, either unmount and remount the filesystem, or close
            the file and use drop_caches.

 6. It's not unreasonable that users who want to squeeze every last bit
    of performance out of the particular rough and tumble bits of their
    storage also be exposed to the difficulties of what happens when the
    operating system can't totally virtualize those hardware
    capabilities.  Your high performance sports car is not a Toyota
    minivan, as it were.

Given our overnight discussions, I don't think it'll be difficult to
hoist XFS_IDONTCACHE to the VFS so that 5.c.i is enough to change the
S_DAX state if nobody else is using the file.

--D

> > 
> > > Furthermore, if we did want an interface like that why not allow
> > > the on-disk flag to be set as well as cleared?
> > 
> > Well, why not - it's why I implemented the flag in the first place!
> > The only problem we have here is how to safely change the in-memory
> > DAX state, and that largely has nothing to do with setting/clearing
> > the on-disk flag....
> 
> With the above change to xfs_diflags_to_iflags() I think we are ok here.
> 
> Ira
> 
--D

> Ira
> 
> > 
> > I think (like Dave said) that if you set XFS_IDONTCACHE on the inode
> > when you change the DAX flag, the VFS will kill the inode the instant
> > the last user close()s the file.  Then 5.c.ii will actually work.
> > 
> > --D
> > 
> > > > 
> > > > > Furthermore, if we did want an interface like that why not allow
> > > > > the on-disk flag to be set as well as cleared?
> > > > 
> > > > Well, why not - it's why I implemented the flag in the first place!
> > > > The only problem we have here is how to safely change the in-memory
> > > > DAX state, and that largely has nothing to do with setting/clearing
> > > > the on-disk flag....
> > > 
> > > With the above change to xfs_diflags_to_iflags() I think we are ok here.
> > > 
> > > Ira
> > > 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  2020-04-09 16:59                   ` Darrick J. Wong
@ 2020-04-09 17:17                     ` Jan Kara
  2020-04-09 20:54                     ` Ira Weiny
  1 sibling, 0 replies; 47+ messages in thread
From: Jan Kara @ 2020-04-09 17:17 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Ira Weiny, Dave Chinner, linux-kernel, Dan Williams,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

On Thu 09-04-20 09:59:27, Darrick J. Wong wrote:
> And today:
> 
>  1. There exists an in-kernel access mode flag S_DAX that is set when
>     file accesses go directly to persistent memory, bypassing the page
>     cache.  Applications must call statx to discover the current S_DAX
>     state (STATX_ATTR_DAX).
> 
>  2. There exists an advisory file inode flag FS_XFLAG_DAX that is
>     inherited from the parent directory FS_XFLAG_DAX inode flag at file
>     creation time.  This advisory flag can be set or cleared at any
>     time, but doing so does not immediately affect the S_DAX state.
> 
>     Unless overridden by mount options (see (3)), if FS_XFLAG_DAX is set
>     and the fs is on pmem then it will enable S_DAX at inode load time;
>     if FS_XFLAG_DAX is not set, it will not enable S_DAX.
> 
>  3. There exists a dax= mount option.
> 
>     "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."
> 
>     "-o dax=always" means "always set S_DAX (at least on pmem),
>                     and ignore FS_XFLAG_DAX."
> 
>     "-o dax"        is an alias for "dax=always".
> 
>     "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.
> 
>  4. There exists an advisory directory inode flag FS_XFLAG_DAX that can
>     be set or cleared at any time.  The flag state is copied into any
>     files or subdirectories when they are created within that directory.
> 
>  5. Programs that require a specific file access mode (DAX or not DAX)
>     can do one of the following:
> 
>     (a) Create files in directories that the FS_XFLAG_DAX flag set as
>         needed; or
> 
>     (b) Have the administrator set an override via mount option; or
> 
>     (c) Set or clear the file's FS_XFLAG_DAX flag as needed.  Programs
>         must then cause the kernel to evict the inode from memory.  This
>         can be done by:
> 
>         i>  Closing the file and re-opening the file and using statx to
>             see if the fs has changed the S_DAX flag; and
> 
>         ii> If the file still does not have the desired S_DAX access
>             mode, either unmount and remount the filesystem, or close
>             the file and use drop_caches.
> 
>  6. It's not unreasonable that users who want to squeeze every last bit
>     of performance out of the particular rough and tumble bits of their
>     storage also be exposed to the difficulties of what happens when the
>     operating system can't totally virtualize those hardware
>     capabilities.  Your high performance sports car is not a Toyota
>     minivan, as it were.
> 
> Given our overnight discussions, I don't think it'll be difficult to
> hoist XFS_IDONTCACHE to the VFS so that 5.c.i is enough to change the
> S_DAX state if nobody else is using the file.

I still find the "S_DAX changes on inode eviction" confusing but I
guess it's as good as it gets and with XFS_IDONTCACHE lifted to VFS, 
it seems acceptable so OK from me.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  2020-04-09 12:41               ` Christoph Hellwig
@ 2020-04-09 20:49                 ` Ira Weiny
  0 siblings, 0 replies; 47+ messages in thread
From: Ira Weiny @ 2020-04-09 20:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dave Chinner, Jan Kara, Dan Williams, Linux Kernel Mailing List,
	Darrick J. Wong, Theodore Y. Ts'o, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Thu, Apr 09, 2020 at 02:41:27PM +0200, Christoph Hellwig wrote:
> On Wed, Apr 08, 2020 at 05:22:03PM -0700, Ira Weiny wrote:
> > > You mean something like XFS_IDONTCACHE?
> > > 
> > > i.e. the functionality already exists in XFS to selectively evict an
> > > inode from cache when the last reference to it is dropped rather
> > > than let it go to the LRUs and hang around in memory.
> > > 
> > > That flag can be set when changing the on disk DAX flag, and we can
> > > tweak how it works so new cache hits don't clear it (as happens
> > > now). Hence the only thing that can prevent eviction are active
> > > references.
> > > 
> > > That means we'll still need to stop the application and drop_caches,
> > > because we need to close all the files and purge the dentries that
> > > hold references to the inode before it can be evicted.
> > 
> > That sounds like a great idea...
> > 
> > Jan?  Christoph?
> 
> Sounds ok.  Note that we could also pretty trivially lift
> XFS_IDONTCACHE to the VFS if we need to apply the same scheme to
> ext4.

Yes I have been slowing working on ext4 in the background.  So lifting
XXX_IDONTCACHE to VFS will be part of that series.

Thanks,
Ira

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  2020-04-09 16:59                   ` Darrick J. Wong
  2020-04-09 17:17                     ` Jan Kara
@ 2020-04-09 20:54                     ` Ira Weiny
  1 sibling, 0 replies; 47+ messages in thread
From: Ira Weiny @ 2020-04-09 20:54 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dave Chinner, linux-kernel, Dan Williams, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Thu, Apr 09, 2020 at 09:59:27AM -0700, Darrick J. Wong wrote:

[snip]
 
> And today:
> 
>  1. There exists an in-kernel access mode flag S_DAX that is set when
>     file accesses go directly to persistent memory, bypassing the page
>     cache.  Applications must call statx to discover the current S_DAX
>     state (STATX_ATTR_DAX).
> 
>  2. There exists an advisory file inode flag FS_XFLAG_DAX that is
>     inherited from the parent directory FS_XFLAG_DAX inode flag at file
>     creation time.  This advisory flag can be set or cleared at any
>     time, but doing so does not immediately affect the S_DAX state.
> 
>     Unless overridden by mount options (see (3)), if FS_XFLAG_DAX is set
>     and the fs is on pmem then it will enable S_DAX at inode load time;
>     if FS_XFLAG_DAX is not set, it will not enable S_DAX.
> 
>  3. There exists a dax= mount option.
> 
>     "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."
> 
>     "-o dax=always" means "always set S_DAX (at least on pmem),
>                     and ignore FS_XFLAG_DAX."
> 
>     "-o dax"        is an alias for "dax=always".
> 
>     "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.
> 
>  4. There exists an advisory directory inode flag FS_XFLAG_DAX that can
>     be set or cleared at any time.  The flag state is copied into any
>     files or subdirectories when they are created within that directory.
> 
>  5. Programs that require a specific file access mode (DAX or not DAX)
>     can do one of the following:
> 
>     (a) Create files in directories that the FS_XFLAG_DAX flag set as
>         needed; or
> 
>     (b) Have the administrator set an override via mount option; or
> 
>     (c) Set or clear the file's FS_XFLAG_DAX flag as needed.  Programs
>         must then cause the kernel to evict the inode from memory.  This
>         can be done by:
> 
>         i>  Closing the file and re-opening the file and using statx to
>             see if the fs has changed the S_DAX flag; and
> 
>         ii> If the file still does not have the desired S_DAX access
>             mode, either unmount and remount the filesystem, or close
>             the file and use drop_caches.
> 
>  6. It's not unreasonable that users who want to squeeze every last bit
>     of performance out of the particular rough and tumble bits of their
>     storage also be exposed to the difficulties of what happens when the
>     operating system can't totally virtualize those hardware
>     capabilities.  Your high performance sports car is not a Toyota
>     minivan, as it were.
> 
> Given our overnight discussions, I don't think it'll be difficult to
> hoist XFS_IDONTCACHE to the VFS so that 5.c.i is enough to change the
> S_DAX state if nobody else is using the file.

Agreed!

One note on implementation, I plan to get XFS_IDONTCACHE tested with XFS and
leave hoisting it to VFS for the series which enables ext4 as that is when we
would need such hoisting.

Thanks everyone for another round of discussions!  :-D

Ira

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  2020-04-09 12:40                 ` Christoph Hellwig
@ 2020-04-10  0:27                   ` Dave Chinner
  0 siblings, 0 replies; 47+ messages in thread
From: Dave Chinner @ 2020-04-10  0:27 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ira Weiny, linux-kernel, Darrick J. Wong, Dan Williams,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Thu, Apr 09, 2020 at 02:40:31PM +0200, Christoph Hellwig wrote:
> On Thu, Apr 09, 2020 at 10:49:21AM +1000, Dave Chinner wrote:
> > > Christoph did say:
> > > 
> > > 	"A reasonably smart application can try to evict itself."
> > > 
> > > 	-- https://lore.kernel.org/lkml/20200403072731.GA24176@lst.de/
> > 
> > I'd love to know how an unprivileged application can force the
> > eviction of an inode from cache.
> 
> Where did the "unprivileged" suddenly come from?

I'm assuming that applications are being run without the root
permissions needed to run drop_caches. i.e. the apps are
unprivileged, and therefore can't brute force inode cache eviction.
That's why I'm asking what mechanism these applications are using to
evict inodes on demand without requiring elevated privileges,
because I can't see how they'd acheive this...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2020-04-10  0:27 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-07 18:29 [PATCH V6 0/8] Enable per-file/per-directory DAX operations V6 ira.weiny
2020-04-07 18:29 ` [PATCH V6 1/8] fs/xfs: Remove unnecessary initialization of i_rwsem ira.weiny
2020-04-07 23:46   ` Dave Chinner
2020-04-07 18:29 ` [PATCH V6 2/8] fs: Remove unneeded IS_DAX() check ira.weiny
2020-04-09  7:31   ` Christoph Hellwig
2020-04-09 14:57     ` Ira Weiny
2020-04-07 18:29 ` [PATCH V6 3/8] fs/stat: Define DAX statx attribute ira.weiny
2020-04-07 23:47   ` Dave Chinner
2020-04-07 18:29 ` [PATCH V6 4/8] fs/xfs: Make DAX mount option a tri-state ira.weiny
2020-04-07 23:59   ` Dave Chinner
2020-04-08  0:09     ` Ira Weiny
2020-04-08  0:48       ` Dave Chinner
2020-04-09 15:03         ` Ira Weiny
2020-04-07 18:29 ` [PATCH V6 5/8] fs/xfs: Create function xfs_inode_enable_dax() ira.weiny
2020-04-08  0:05   ` Dave Chinner
2020-04-08  0:13     ` Ira Weiny
2020-04-07 18:29 ` [PATCH V6 6/8] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags() ira.weiny
2020-04-08  2:08   ` Dave Chinner
2020-04-08 17:09     ` Ira Weiny
2020-04-08 21:02       ` Dave Chinner
2020-04-08 21:28         ` Dan Williams
2020-04-08 22:10           ` Ira Weiny
2020-04-08 23:58           ` Dave Chinner
2020-04-09  0:22             ` Ira Weiny
2020-04-09 12:41               ` Christoph Hellwig
2020-04-09 20:49                 ` Ira Weiny
2020-04-08 22:07         ` Ira Weiny
2020-04-08 23:21           ` Dave Chinner
2020-04-09  0:12             ` Ira Weiny
2020-04-09  0:30               ` Darrick J. Wong
2020-04-09 15:29                 ` Ira Weiny
2020-04-09 16:59                   ` Darrick J. Wong
2020-04-09 17:17                     ` Jan Kara
2020-04-09 20:54                     ` Ira Weiny
2020-04-09  0:49               ` Dave Chinner
2020-04-09 12:40                 ` Christoph Hellwig
2020-04-10  0:27                   ` Dave Chinner
2020-04-07 18:29 ` [PATCH V6 7/8] fs/xfs: Change xfs_ioctl_setattr_dax_invalidate() to xfs_ioctl_dax_check() ira.weiny
2020-04-08  2:23   ` Dave Chinner
2020-04-08  9:58     ` Jan Kara
2020-04-08 21:09       ` Dave Chinner
2020-04-08 22:26         ` Ira Weiny
2020-04-08 23:48           ` Dave Chinner
2020-04-09 12:28             ` Christoph Hellwig
2020-04-08 15:37   ` Darrick J. Wong
2020-04-08 18:13     ` Ira Weiny
2020-04-07 18:29 ` [PATCH V6 8/8] Documentation/dax: Update Usage section ira.weiny

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).