linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V7 0/9] Enable per-file/per-directory DAX operations V7
@ 2020-04-13  5:40 ira.weiny
  2020-04-13  5:40 ` [PATCH V7 1/9] fs/xfs: Remove unnecessary initialization of i_rwsem ira.weiny
                   ` (8 more replies)
  0 siblings, 9 replies; 47+ messages in thread
From: ira.weiny @ 2020-04-13  5:40 UTC (permalink / raw)
  To: linux-kernel, Darrick J. Wong
  Cc: Ira Weiny, Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, linux-ext4, linux-xfs,
	linux-fsdevel, Jeff Moyer

From: Ira Weiny <ira.weiny@intel.com>

Changes from V6:
	Incorporate feedback on patches
	Add ability to change FS_XFLAG_DAX on files at any time.
		Add a don't cache flag to the VFS layer
		Preserve internal XFS IDONTCACHE behavior for bulkstat
		operations

At LSF/MM'19 [1] [2] we discussed applications that overestimate memory
consumption due to their inability to detect whether the kernel will
instantiate page cache for a file, and cases where a global dax enable via a
mount option is too coarse.

The following patch series enables the use of DAX on individual files and/or
directories on xfs, and lays some groundwork to do so in ext4.  It further
enhances the dax mount option to be a tri-state of 'always', 'never', or
'iflag' (default).  Furthermore, it maintians '-o dax' to be equivalent to '-o
dax=always'.

The insight at LSF/MM was to separate the per-mount or per-file "physical"
(FS_XFLAG_DAX) capability switch from an "effective" (S_DAX) attribute for the
file.

At LSF/MM we discussed the difficulties of switching the DAX state of a file
with active mappings / page cache.  It was thought the races could be avoided
by limiting DAX state flips to 0-length files.

However, this turns out to not be true.[3][5] This is because address space
operations (a_ops) may be in use at any time the inode is referenced.

For this reason direct manipulation of the FS_XFLAG_DAX is allowed but the
operation of the file (S_DAX) is not immediately changed.

Details of when and how DAX state can be changed on a file is included in a
documentation patch.

It should be noted that FS_XFLAG_DAX inheritance is not shown in this patch set
as it was maintained from previous work on XFS.  FS_XFLAG_DAX and it's
inheritance will need to be added to other file systems for user control. 


[1] https://lwn.net/Articles/787973/
[2] https://lwn.net/Articles/787233/
[3] https://lkml.org/lkml/2019/10/20/96
[4] https://patchwork.kernel.org/patch/11310511/
[5] https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/


To: linux-kernel@vger.kernel.org
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "Theodore Y. Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Cc: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org

Changes from V5:
	* make dax mount option a tri-state
	* Reject changes to FS_XFLAG_DAX for regular files
		- Allow only on directories
	* Update documentation

Changes from V4:
	* Open code the aops lock rather than add it to the xfs_ilock()
	  subsystem (Darrick's comments were obsoleted by this change)
	* Fix lkp build suggestions and bugs

Changes from V3:
	* Remove global locking...  :-D
	* put back per inode locking and remove pre-mature optimizations
	* Fix issues with Directories having IS_DAX() set
	* Fix kernel crash issues reported by Jeff
	* Add some clean up patches
	* Consolidate diflags to iflags functions
	* Update/add documentation
	* Reorder/rename patches quite a bit

Changes from V2:

	* Move i_dax_sem to be a global percpu_rw_sem rather than per inode
		Internal discussions with Dan determined this would be easier,
		just as performant, and slightly less overhead that having it
		in the SB as suggested by Jan
	* Fix locking order in comments and throughout code
	* Change "mode" to "state" throughout commits
	* Add CONFIG_FS_DAX wrapper to disable inode_[un]lock_state() when not
		configured
	* Add static branch for which is activated by a device which supports
		DAX in XFS
	* Change "lock/unlock" to up/down read/write as appropriate
		Previous names were over simplified
	* Update comments/documentation

	* Remove the xfs specific lock to the vfs (global) layer.
	* Fix i_dax_sem locking order and comments

	* Move 'i_mapped' count from struct inode to struct address_space and
		rename it to mmap_count
	* Add inode_has_mappings() call

	* Fix build issues
	* Clean up syntax spacing and minor issues
	* Update man page text for STATX_ATTR_DAX
	* Add reviewed-by's
	* Rebase to 5.6

	Rename patch:
		from: fs/xfs: Add lock/unlock state to xfs
		to: fs/xfs: Add write DAX lock to xfs layer
	Add patch:
		fs/xfs: Clarify lockdep dependency for xfs_isilocked()
	Drop patch:
		fs/xfs: Fix truncate up


Ira Weiny (9):
  fs/xfs: Remove unnecessary initialization of i_rwsem
  fs: Remove unneeded IS_DAX() check in io_is_direct()
  fs/stat: Define DAX statx attribute
  fs/xfs: Make DAX mount option a tri-state
  fs/xfs: Create function xfs_inode_enable_dax()
  fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  fs: Define I_DONTCACNE in VFS layer
  fs/xfs: Change xfs_ioctl_setattr_dax_invalidate()
  Documentation/dax: Update Usage section

 Documentation/filesystems/dax.txt | 166 +++++++++++++++++++++++++++++-
 drivers/block/loop.c              |   6 +-
 fs/stat.c                         |   3 +
 fs/xfs/xfs_icache.c               |   4 +-
 fs/xfs/xfs_inode.h                |   1 +
 fs/xfs/xfs_ioctl.c                | 135 +++---------------------
 fs/xfs/xfs_iops.c                 |  74 ++++++++-----
 fs/xfs/xfs_mount.h                |   3 +-
 fs/xfs/xfs_super.c                |  44 +++++++-
 include/linux/fs.h                |  13 ++-
 include/uapi/linux/stat.h         |   1 +
 11 files changed, 288 insertions(+), 162 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH V7 1/9] fs/xfs: Remove unnecessary initialization of i_rwsem
  2020-04-13  5:40 [PATCH V7 0/9] Enable per-file/per-directory DAX operations V7 ira.weiny
@ 2020-04-13  5:40 ` ira.weiny
  2020-04-13 15:41   ` Darrick J. Wong
  2020-04-13  5:40 ` [PATCH V7 2/9] fs: Remove unneeded IS_DAX() check in io_is_direct() ira.weiny
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 47+ messages in thread
From: ira.weiny @ 2020-04-13  5:40 UTC (permalink / raw)
  To: linux-kernel, Darrick J. Wong
  Cc: Ira Weiny, Dave Chinner, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

From: Ira Weiny <ira.weiny@intel.com>

An earlier call of xfs_reinit_inode() from xfs_iget_cache_hit() already
handles initialization of i_rwsem.

Doing so again is unneeded.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V4:
	Update commit message to make it clear the xfs_iget_cache_hit()
	is actually doing the initialization via xfs_reinit_inode()

New for V4:

NOTE: This was found while ensuring the new i_aops_sem was properly
handled.  It seems like this is a layering violation so I think it is
worth cleaning up so as to not confuse others.
---
 fs/xfs/xfs_icache.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 8dc2e5414276..836a1f09be03 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -419,6 +419,7 @@ xfs_iget_cache_hit(
 		spin_unlock(&ip->i_flags_lock);
 		rcu_read_unlock();
 
+		ASSERT(!rwsem_is_locked(&inode->i_rwsem));
 		error = xfs_reinit_inode(mp, inode);
 		if (error) {
 			bool wake;
@@ -452,9 +453,6 @@ xfs_iget_cache_hit(
 		ip->i_sick = 0;
 		ip->i_checked = 0;
 
-		ASSERT(!rwsem_is_locked(&inode->i_rwsem));
-		init_rwsem(&inode->i_rwsem);
-
 		spin_unlock(&ip->i_flags_lock);
 		spin_unlock(&pag->pag_ici_lock);
 	} else {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V7 2/9] fs: Remove unneeded IS_DAX() check in io_is_direct()
  2020-04-13  5:40 [PATCH V7 0/9] Enable per-file/per-directory DAX operations V7 ira.weiny
  2020-04-13  5:40 ` [PATCH V7 1/9] fs/xfs: Remove unnecessary initialization of i_rwsem ira.weiny
@ 2020-04-13  5:40 ` ira.weiny
  2020-04-14  6:22   ` Christoph Hellwig
  2020-04-13  5:40 ` [PATCH V7 3/9] fs/stat: Define DAX statx attribute ira.weiny
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 47+ messages in thread
From: ira.weiny @ 2020-04-13  5:40 UTC (permalink / raw)
  To: linux-kernel, Darrick J. Wong
  Cc: Ira Weiny, Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

From: Ira Weiny <ira.weiny@intel.com>

Remove the check because DAX now has it's own read/write methods and
file systems which support DAX check IS_DAX() prior to IOCB_DIRECT on
their own.  Therefore, it does not matter if the file state is DAX when
the iocb flags are created.

Also remove io_is_direct() as it is just a simple flag check.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from v6:
	remove io_is_direct() as well.
	Remove Reviews since this is quite a bit different.

Changes from v3:
	Reword commit message.
	Reordered to be a 'pre-cleanup' patch
---
 drivers/block/loop.c | 6 +++---
 include/linux/fs.h   | 7 +------
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 739b372a5112..9a9af78974ac 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -631,8 +631,8 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
 
 static inline void loop_update_dio(struct loop_device *lo)
 {
-	__loop_update_dio(lo, io_is_direct(lo->lo_backing_file) |
-			lo->use_dio);
+	__loop_update_dio(lo, (lo->lo_backing_file->f_flags & O_DIRECT) |
+				lo->use_dio);
 }
 
 static void loop_reread_partitions(struct loop_device *lo,
@@ -1006,7 +1006,7 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 	if (!(lo_flags & LO_FLAGS_READ_ONLY) && file->f_op->fsync)
 		blk_queue_write_cache(lo->lo_queue, true, false);
 
-	if (io_is_direct(lo->lo_backing_file) && inode->i_sb->s_bdev) {
+	if ((lo->lo_backing_file->f_flags & O_DIRECT) && inode->i_sb->s_bdev) {
 		/* In case of direct I/O, match underlying block size */
 		unsigned short bsize = bdev_logical_block_size(
 			inode->i_sb->s_bdev);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index abedbffe2c9e..a818ced22961 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3387,11 +3387,6 @@ extern void setattr_copy(struct inode *inode, const struct iattr *attr);
 
 extern int file_update_time(struct file *file);
 
-static inline bool io_is_direct(struct file *filp)
-{
-	return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping->host);
-}
-
 static inline bool vma_is_dax(struct vm_area_struct *vma)
 {
 	return vma->vm_file && IS_DAX(vma->vm_file->f_mapping->host);
@@ -3416,7 +3411,7 @@ static inline int iocb_flags(struct file *file)
 	int res = 0;
 	if (file->f_flags & O_APPEND)
 		res |= IOCB_APPEND;
-	if (io_is_direct(file))
+	if (file->f_flags & O_DIRECT)
 		res |= IOCB_DIRECT;
 	if ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host))
 		res |= IOCB_DSYNC;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V7 3/9] fs/stat: Define DAX statx attribute
  2020-04-13  5:40 [PATCH V7 0/9] Enable per-file/per-directory DAX operations V7 ira.weiny
  2020-04-13  5:40 ` [PATCH V7 1/9] fs/xfs: Remove unnecessary initialization of i_rwsem ira.weiny
  2020-04-13  5:40 ` [PATCH V7 2/9] fs: Remove unneeded IS_DAX() check in io_is_direct() ira.weiny
@ 2020-04-13  5:40 ` ira.weiny
  2020-04-14  6:23   ` Christoph Hellwig
  2020-04-13  5:40 ` [PATCH V7 4/9] fs/xfs: Make DAX mount option a tri-state ira.weiny
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 47+ messages in thread
From: ira.weiny @ 2020-04-13  5:40 UTC (permalink / raw)
  To: linux-kernel, Darrick J. Wong
  Cc: Ira Weiny, Dave Chinner, Jan Kara, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

From: Ira Weiny <ira.weiny@intel.com>

In order for users to determine if a file is currently operating in DAX
state (effective DAX).  Define a statx attribute value and set that
attribute if the effective DAX flag is set.

To go along with this we propose the following addition to the statx man
page:

STATX_ATTR_DAX

	The file is in the DAX (cpu direct access) state.  DAX state
	attempts to minimize software cache effects for both I/O and
	memory mappings of this file.  It requires a file system which
	has been configured to support DAX.

	DAX generally assumes all accesses are via cpu load / store
	instructions which can minimize overhead for small accesses, but
	may adversely affect cpu utilization for large transfers.

	File I/O is done directly to/from user-space buffers and memory
	mapped I/O may be performed with direct memory mappings that
	bypass kernel page cache.

	While the DAX property tends to result in data being transferred
	synchronously, it does not give the same guarantees of O_SYNC
	where data and the necessary metadata are transferred together.

	A DAX file may support being mapped with the MAP_SYNC flag,
	which enables a program to use CPU cache flush instructions to
	persist CPU store operations without an explicit fsync(2).  See
	mmap(2) for more information.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V2:
	Update man page text with comments from Darrick, Jan, Dan, and
	Dave.
---
 fs/stat.c                 | 3 +++
 include/uapi/linux/stat.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/fs/stat.c b/fs/stat.c
index 030008796479..894699c74dde 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -79,6 +79,9 @@ int vfs_getattr_nosec(const struct path *path, struct kstat *stat,
 	if (IS_AUTOMOUNT(inode))
 		stat->attributes |= STATX_ATTR_AUTOMOUNT;
 
+	if (IS_DAX(inode))
+		stat->attributes |= STATX_ATTR_DAX;
+
 	if (inode->i_op->getattr)
 		return inode->i_op->getattr(path, stat, request_mask,
 					    query_flags);
diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index ad80a5c885d5..e5f9d5517f6b 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -169,6 +169,7 @@ struct statx {
 #define STATX_ATTR_ENCRYPTED		0x00000800 /* [I] File requires key to decrypt in fs */
 #define STATX_ATTR_AUTOMOUNT		0x00001000 /* Dir: Automount trigger */
 #define STATX_ATTR_VERITY		0x00100000 /* [I] Verity protected file */
+#define STATX_ATTR_DAX			0x00002000 /* [I] File is DAX */
 
 
 #endif /* _UAPI_LINUX_STAT_H */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V7 4/9] fs/xfs: Make DAX mount option a tri-state
  2020-04-13  5:40 [PATCH V7 0/9] Enable per-file/per-directory DAX operations V7 ira.weiny
                   ` (2 preceding siblings ...)
  2020-04-13  5:40 ` [PATCH V7 3/9] fs/stat: Define DAX statx attribute ira.weiny
@ 2020-04-13  5:40 ` ira.weiny
  2020-04-13 15:46   ` Darrick J. Wong
  2020-04-14  6:25   ` Christoph Hellwig
  2020-04-13  5:40 ` [PATCH V7 5/9] fs/xfs: Create function xfs_inode_enable_dax() ira.weiny
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 47+ messages in thread
From: ira.weiny @ 2020-04-13  5:40 UTC (permalink / raw)
  To: linux-kernel, Darrick J. Wong
  Cc: Ira Weiny, Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

From: Ira Weiny <ira.weiny@intel.com>

As agreed upon[1].  We make the dax mount option a tri-state.  '-o dax'
continues to operate the same.  We add 'always', 'never', and 'inode'
(default).

[1] https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from v6:
	Use 2 flag bits rather than a field.
	change iflag to inode

Changes from v5:
	New Patch
---
 fs/xfs/xfs_mount.h |  3 ++-
 fs/xfs/xfs_super.c | 44 ++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 88ab09ed29e7..d581b990e59a 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -233,7 +233,8 @@ typedef struct xfs_mount {
 						   allocator */
 #define XFS_MOUNT_NOATTR2	(1ULL << 25)	/* disable use of attr2 format */
 
-#define XFS_MOUNT_DAX		(1ULL << 62)	/* TEST ONLY! */
+#define XFS_MOUNT_DAX		(1ULL << 62)
+#define XFS_MOUNT_NODAX		(1ULL << 63)
 
 /*
  * Max and min values for mount-option defined I/O
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 2094386af8ac..d7bd8f5e00c9 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -47,6 +47,32 @@ static struct kset *xfs_kset;		/* top-level xfs sysfs dir */
 static struct xfs_kobj xfs_dbg_kobj;	/* global debug sysfs attrs */
 #endif
 
+enum {
+	XFS_DAX_INODE = 0,
+	XFS_DAX_ALWAYS = 1,
+	XFS_DAX_NEVER = 2,
+};
+
+static void xfs_mount_set_dax_mode(struct xfs_mount *mp, u32 val)
+{
+	if (val == XFS_DAX_INODE) {
+		mp->m_flags &= ~(XFS_MOUNT_DAX | XFS_MOUNT_NODAX);
+	} else if (val == XFS_DAX_ALWAYS) {
+		mp->m_flags &= ~XFS_MOUNT_NODAX;
+		mp->m_flags |= XFS_MOUNT_DAX;
+	} else if (val == XFS_DAX_NEVER) {
+		mp->m_flags &= ~XFS_MOUNT_DAX;
+		mp->m_flags |= XFS_MOUNT_NODAX;
+	}
+}
+
+static const struct constant_table dax_param_enums[] = {
+	{"inode",	XFS_DAX_INODE },
+	{"always",	XFS_DAX_ALWAYS },
+	{"never",	XFS_DAX_NEVER },
+	{}
+};
+
 /*
  * Table driven mount option parser.
  */
@@ -59,7 +85,7 @@ enum {
 	Opt_filestreams, Opt_quota, Opt_noquota, Opt_usrquota, Opt_grpquota,
 	Opt_prjquota, Opt_uquota, Opt_gquota, Opt_pquota,
 	Opt_uqnoenforce, Opt_gqnoenforce, Opt_pqnoenforce, Opt_qnoenforce,
-	Opt_discard, Opt_nodiscard, Opt_dax,
+	Opt_discard, Opt_nodiscard, Opt_dax, Opt_dax_enum,
 };
 
 static const struct fs_parameter_spec xfs_fs_parameters[] = {
@@ -103,6 +129,7 @@ static const struct fs_parameter_spec xfs_fs_parameters[] = {
 	fsparam_flag("discard",		Opt_discard),
 	fsparam_flag("nodiscard",	Opt_nodiscard),
 	fsparam_flag("dax",		Opt_dax),
+	fsparam_enum("dax",		Opt_dax_enum, dax_param_enums),
 	{}
 };
 
@@ -129,7 +156,6 @@ xfs_fs_show_options(
 		{ XFS_MOUNT_GRPID,		",grpid" },
 		{ XFS_MOUNT_DISCARD,		",discard" },
 		{ XFS_MOUNT_LARGEIO,		",largeio" },
-		{ XFS_MOUNT_DAX,		",dax" },
 		{ 0, NULL }
 	};
 	struct xfs_mount	*mp = XFS_M(root->d_sb);
@@ -185,6 +211,13 @@ xfs_fs_show_options(
 	if (!(mp->m_qflags & XFS_ALL_QUOTA_ACCT))
 		seq_puts(m, ",noquota");
 
+	if (mp->m_flags & XFS_MOUNT_DAX)
+		seq_puts(m, ",dax=always");
+	else if (mp->m_flags & XFS_MOUNT_NODAX)
+		seq_puts(m, ",dax=never");
+	else
+		seq_puts(m, ",dax=inode");
+
 	return 0;
 }
 
@@ -1244,7 +1277,10 @@ xfs_fc_parse_param(
 		return 0;
 #ifdef CONFIG_FS_DAX
 	case Opt_dax:
-		mp->m_flags |= XFS_MOUNT_DAX;
+		xfs_mount_set_dax_mode(mp, XFS_DAX_ALWAYS);
+		return 0;
+	case Opt_dax_enum:
+		xfs_mount_set_dax_mode(mp, result.uint_32);
 		return 0;
 #endif
 	default:
@@ -1451,7 +1487,7 @@ xfs_fc_fill_super(
 		if (!rtdev_is_dax && !datadev_is_dax) {
 			xfs_alert(mp,
 			"DAX unsupported by block device. Turning off DAX.");
-			mp->m_flags &= ~XFS_MOUNT_DAX;
+			xfs_mount_set_dax_mode(mp, XFS_DAX_NEVER);
 		}
 		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
 			xfs_alert(mp,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V7 5/9] fs/xfs: Create function xfs_inode_enable_dax()
  2020-04-13  5:40 [PATCH V7 0/9] Enable per-file/per-directory DAX operations V7 ira.weiny
                   ` (3 preceding siblings ...)
  2020-04-13  5:40 ` [PATCH V7 4/9] fs/xfs: Make DAX mount option a tri-state ira.weiny
@ 2020-04-13  5:40 ` ira.weiny
  2020-04-13 15:52   ` Darrick J. Wong
  2020-04-14  6:27   ` Christoph Hellwig
  2020-04-13  5:40 ` [PATCH V7 6/9] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags() ira.weiny
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 47+ messages in thread
From: ira.weiny @ 2020-04-13  5:40 UTC (permalink / raw)
  To: linux-kernel, Darrick J. Wong
  Cc: Ira Weiny, Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

From: Ira Weiny <ira.weiny@intel.com>

xfs_inode_supports_dax() should reflect if the inode can support DAX not
that it is enabled for DAX.

Change the use of xfs_inode_supports_dax() to reflect only if the inode
and underlying storage support dax.

Add a new function xfs_inode_enable_dax() which reflects if the inode
should be enabled for DAX.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from v6:
	Change enable checks to be sequential logic.
	Update for 2 bit tri-state option.
	Make 'static' consistent.
	Don't set S_DAX if !CONFIG_FS_DAX

Changes from v5:
	Update to reflect the new tri-state mount option

Changes from v3:
	Update functions and names to be more clear
	Update commit message
	Merge with
		'fs/xfs: Clean up DAX support check'
		don't allow IS_DAX() on a directory
		use STATIC macro for static
		make xfs_inode_supports_dax() static
---
 fs/xfs/xfs_iops.c | 34 +++++++++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 81f2f93caec0..37bd15b55878 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1244,12 +1244,11 @@ xfs_inode_supports_dax(
 	struct xfs_mount	*mp = ip->i_mount;
 
 	/* Only supported on non-reflinked files. */
-	if (!S_ISREG(VFS_I(ip)->i_mode) || xfs_is_reflink_inode(ip))
+	if (xfs_is_reflink_inode(ip))
 		return false;
 
-	/* DAX mount option or DAX iflag must be set. */
-	if (!(mp->m_flags & XFS_MOUNT_DAX) &&
-	    !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
+	/* Only supported on regular files. */
+	if (!S_ISREG(VFS_I(ip)->i_mode))
 		return false;
 
 	/* Block size must match page size */
@@ -1260,6 +1259,31 @@ xfs_inode_supports_dax(
 	return xfs_inode_buftarg(ip)->bt_daxdev != NULL;
 }
 
+#ifdef CONFIG_FS_DAX
+static bool
+xfs_inode_enable_dax(
+	struct xfs_inode *ip)
+{
+	if (ip->i_mount->m_flags & XFS_MOUNT_NODAX)
+		return false;
+	if (!xfs_inode_supports_dax(ip))
+		return false;
+	if (ip->i_mount->m_flags & XFS_MOUNT_DAX)
+		return true;
+	if (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)
+		return true;
+	return false;
+}
+#else /* !CONFIG_FS_DAX */
+static bool
+xfs_inode_enable_dax(
+	struct xfs_inode *ip)
+{
+	return false;
+}
+#endif /* CONFIG_FS_DAX */
+
+
 STATIC void
 xfs_diflags_to_iflags(
 	struct inode		*inode,
@@ -1278,7 +1302,7 @@ xfs_diflags_to_iflags(
 		inode->i_flags |= S_SYNC;
 	if (flags & XFS_DIFLAG_NOATIME)
 		inode->i_flags |= S_NOATIME;
-	if (xfs_inode_supports_dax(ip))
+	if (xfs_inode_enable_dax(ip))
 		inode->i_flags |= S_DAX;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V7 6/9] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  2020-04-13  5:40 [PATCH V7 0/9] Enable per-file/per-directory DAX operations V7 ira.weiny
                   ` (4 preceding siblings ...)
  2020-04-13  5:40 ` [PATCH V7 5/9] fs/xfs: Create function xfs_inode_enable_dax() ira.weiny
@ 2020-04-13  5:40 ` ira.weiny
  2020-04-13 16:01   ` Darrick J. Wong
  2020-04-13  5:40 ` [PATCH V7 7/9] fs: Define I_DONTCACNE in VFS layer ira.weiny
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 47+ messages in thread
From: ira.weiny @ 2020-04-13  5:40 UTC (permalink / raw)
  To: linux-kernel, Darrick J. Wong
  Cc: Ira Weiny, Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

From: Ira Weiny <ira.weiny@intel.com>

The functionality in xfs_diflags_to_linux() and xfs_diflags_to_iflags() are
nearly identical.  The only difference is that *_to_linux() is called after
inode setup and disallows changing the DAX flag.

Combining them can be done with a flag which indicates if this is the initial
setup to allow the DAX flag to be properly set only at init time.

So remove xfs_diflags_to_linux() and call the modified xfs_diflags_to_iflags()
directly.

While we are here simplify xfs_diflags_to_iflags() to take struct xfs_inode and
use xfs_ip2xflags() to ensure future diflags are included correctly.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V6:
	Move unrelated hunk to previous patch.
	Change logic for better code generation.

Changes from V5:
	The functions are no longer identical so we can only combine
	them rather than deleting one completely.  This is reflected in
	the new init parameter.
---
 fs/xfs/xfs_inode.h |  1 +
 fs/xfs/xfs_ioctl.c | 33 +--------------------------------
 fs/xfs/xfs_iops.c  | 42 +++++++++++++++++++++++-------------------
 3 files changed, 25 insertions(+), 51 deletions(-)

diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 492e53992fa9..e76ed9ca17f7 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -466,6 +466,7 @@ int	xfs_break_layouts(struct inode *inode, uint *iolock,
 /* from xfs_iops.c */
 extern void xfs_setup_inode(struct xfs_inode *ip);
 extern void xfs_setup_iops(struct xfs_inode *ip);
+extern void xfs_diflags_to_iflags(struct xfs_inode *ip, bool init);
 
 /*
  * When setting up a newly allocated inode, we need to call
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index d42de92cb283..c6cd92ef4a05 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1100,37 +1100,6 @@ xfs_flags2diflags2(
 	return di_flags2;
 }
 
-STATIC void
-xfs_diflags_to_linux(
-	struct xfs_inode	*ip)
-{
-	struct inode		*inode = VFS_I(ip);
-	unsigned int		xflags = xfs_ip2xflags(ip);
-
-	if (xflags & FS_XFLAG_IMMUTABLE)
-		inode->i_flags |= S_IMMUTABLE;
-	else
-		inode->i_flags &= ~S_IMMUTABLE;
-	if (xflags & FS_XFLAG_APPEND)
-		inode->i_flags |= S_APPEND;
-	else
-		inode->i_flags &= ~S_APPEND;
-	if (xflags & FS_XFLAG_SYNC)
-		inode->i_flags |= S_SYNC;
-	else
-		inode->i_flags &= ~S_SYNC;
-	if (xflags & FS_XFLAG_NOATIME)
-		inode->i_flags |= S_NOATIME;
-	else
-		inode->i_flags &= ~S_NOATIME;
-#if 0	/* disabled until the flag switching races are sorted out */
-	if (xflags & FS_XFLAG_DAX)
-		inode->i_flags |= S_DAX;
-	else
-		inode->i_flags &= ~S_DAX;
-#endif
-}
-
 static int
 xfs_ioctl_setattr_xflags(
 	struct xfs_trans	*tp,
@@ -1168,7 +1137,7 @@ xfs_ioctl_setattr_xflags(
 	ip->i_d.di_flags = xfs_flags2diflags(ip, fa->fsx_xflags);
 	ip->i_d.di_flags2 = di_flags2;
 
-	xfs_diflags_to_linux(ip);
+	xfs_diflags_to_iflags(ip, false);
 	xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 	XFS_STATS_INC(mp, xs_ig_attrchg);
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 37bd15b55878..856ad823e754 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1284,26 +1284,30 @@ xfs_inode_enable_dax(
 #endif /* CONFIG_FS_DAX */
 
 
-STATIC void
+void
 xfs_diflags_to_iflags(
-	struct inode		*inode,
-	struct xfs_inode	*ip)
+	struct xfs_inode	*ip,
+	bool init)
 {
-	uint16_t		flags = ip->i_d.di_flags;
-
-	inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC |
-			    S_NOATIME | S_DAX);
-
-	if (flags & XFS_DIFLAG_IMMUTABLE)
-		inode->i_flags |= S_IMMUTABLE;
-	if (flags & XFS_DIFLAG_APPEND)
-		inode->i_flags |= S_APPEND;
-	if (flags & XFS_DIFLAG_SYNC)
-		inode->i_flags |= S_SYNC;
-	if (flags & XFS_DIFLAG_NOATIME)
-		inode->i_flags |= S_NOATIME;
-	if (xfs_inode_enable_dax(ip))
-		inode->i_flags |= S_DAX;
+	struct inode            *inode = VFS_I(ip);
+	unsigned int            xflags = xfs_ip2xflags(ip);
+	unsigned int            flags = 0;
+
+	ASSERT(!(IS_DAX(inode) && init));
+
+	if (xflags & FS_XFLAG_IMMUTABLE)
+		flags |= S_IMMUTABLE;
+	if (xflags & FS_XFLAG_APPEND)
+		flags |= S_APPEND;
+	if (xflags & FS_XFLAG_SYNC)
+		flags |= S_SYNC;
+	if (xflags & FS_XFLAG_NOATIME)
+		flags |= S_NOATIME;
+	if (init && xfs_inode_enable_dax(ip))
+		flags |= S_DAX;
+
+	inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC | S_NOATIME);
+	inode->i_flags |= flags;
 }
 
 /*
@@ -1332,7 +1336,7 @@ xfs_setup_inode(
 	inode->i_gid    = xfs_gid_to_kgid(ip->i_d.di_gid);
 
 	i_size_write(inode, ip->i_d.di_size);
-	xfs_diflags_to_iflags(inode, ip);
+	xfs_diflags_to_iflags(ip, true);
 
 	if (S_ISDIR(inode->i_mode)) {
 		/*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V7 7/9] fs: Define I_DONTCACNE in VFS layer
  2020-04-13  5:40 [PATCH V7 0/9] Enable per-file/per-directory DAX operations V7 ira.weiny
                   ` (5 preceding siblings ...)
  2020-04-13  5:40 ` [PATCH V7 6/9] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags() ira.weiny
@ 2020-04-13  5:40 ` ira.weiny
  2020-04-13 16:09   ` Darrick J. Wong
  2020-04-14 15:26   ` Jan Kara
  2020-04-13  5:40 ` [PATCH V7 8/9] fs/xfs: Change xfs_ioctl_setattr_dax_invalidate() ira.weiny
  2020-04-13  5:40 ` [PATCH V7 9/9] Documentation/dax: Update Usage section ira.weiny
  8 siblings, 2 replies; 47+ messages in thread
From: ira.weiny @ 2020-04-13  5:40 UTC (permalink / raw)
  To: linux-kernel, Darrick J. Wong
  Cc: Ira Weiny, Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

From: Ira Weiny <ira.weiny@intel.com>

DAX effective mode changes (setting of S_DAX) require inode eviction.

Define a flag which can be set to inform the VFS layer that inodes
should not be cached.  This will expedite the eviction of those nodes
requiring reload.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 include/linux/fs.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index a818ced22961..e2db71d150c3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2151,6 +2151,8 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
  *
  * I_CREATING		New object's inode in the middle of setting up.
  *
+ * I_DONTCACHE		Do not cache the inode
+ *
  * Q: What is the difference between I_WILL_FREE and I_FREEING?
  */
 #define I_DIRTY_SYNC		(1 << 0)
@@ -2173,6 +2175,7 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
 #define I_WB_SWITCH		(1 << 13)
 #define I_OVL_INUSE		(1 << 14)
 #define I_CREATING		(1 << 15)
+#define I_DONTCACHE		(1 << 16)
 
 #define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
 #define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
@@ -3042,7 +3045,8 @@ extern int inode_needs_sync(struct inode *inode);
 extern int generic_delete_inode(struct inode *inode);
 static inline int generic_drop_inode(struct inode *inode)
 {
-	return !inode->i_nlink || inode_unhashed(inode);
+	return !inode->i_nlink || inode_unhashed(inode) ||
+		(inode->i_state & I_DONTCACHE);
 }
 
 extern struct inode *ilookup5_nowait(struct super_block *sb,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V7 8/9] fs/xfs: Change xfs_ioctl_setattr_dax_invalidate()
  2020-04-13  5:40 [PATCH V7 0/9] Enable per-file/per-directory DAX operations V7 ira.weiny
                   ` (6 preceding siblings ...)
  2020-04-13  5:40 ` [PATCH V7 7/9] fs: Define I_DONTCACNE in VFS layer ira.weiny
@ 2020-04-13  5:40 ` ira.weiny
  2020-04-13 16:12   ` Darrick J. Wong
  2020-04-13  5:40 ` [PATCH V7 9/9] Documentation/dax: Update Usage section ira.weiny
  8 siblings, 1 reply; 47+ messages in thread
From: ira.weiny @ 2020-04-13  5:40 UTC (permalink / raw)
  To: linux-kernel, Darrick J. Wong
  Cc: Ira Weiny, Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

From: Ira Weiny <ira.weiny@intel.com>

We only support changing FS_XFLAG_DAX on directories.  Files get their
flag from the parent directory on creation only.  So no data
invalidation needs to happen.

Alter the xfs_ioctl_setattr_dax_invalidate() to be
xfs_ioctl_setattr_dax_validate().  xfs_ioctl_setattr_dax_validate() now
validates that any FS_XFLAG_DAX change is ok.

This also allows use to remove the join_flags logic.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from v6:
	Fix completely broken implementation and update commit message.
	Use the new VFS layer I_DONTCACHE to facilitate inode eviction
	and S_DAX changing on drop_caches

Changes from v5:
	New patch
---
 fs/xfs/xfs_ioctl.c | 102 +++++++--------------------------------------
 1 file changed, 16 insertions(+), 86 deletions(-)

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index c6cd92ef4a05..ba42a5fb5b05 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1145,63 +1145,23 @@ xfs_ioctl_setattr_xflags(
 }
 
 /*
- * If we are changing DAX flags, we have to ensure the file is clean and any
- * cached objects in the address space are invalidated and removed. This
- * requires us to lock out other IO and page faults similar to a truncate
- * operation. The locks need to be held until the transaction has been committed
- * so that the cache invalidation is atomic with respect to the DAX flag
- * manipulation.
+ * Mark inodes with a changing FS_XFLAG_DAX, I_DONTCACHE
  */
-static int
+static void
 xfs_ioctl_setattr_dax_invalidate(
 	struct xfs_inode	*ip,
-	struct fsxattr		*fa,
-	int			*join_flags)
+	struct fsxattr		*fa)
 {
-	struct inode		*inode = VFS_I(ip);
-	struct super_block	*sb = inode->i_sb;
-	int			error;
-
-	*join_flags = 0;
-
-	/*
-	 * It is only valid to set the DAX flag on regular files and
-	 * directories on filesystems where the block size is equal to the page
-	 * size. On directories it serves as an inherited hint so we don't
-	 * have to check the device for dax support or flush pagecache.
-	 */
-	if (fa->fsx_xflags & FS_XFLAG_DAX) {
-		struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
-
-		if (!bdev_dax_supported(target->bt_bdev, sb->s_blocksize))
-			return -EINVAL;
-	}
-
-	/* If the DAX state is not changing, we have nothing to do here. */
-	if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
-		return 0;
-	if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode))
-		return 0;
+	struct inode            *inode = VFS_I(ip);
 
 	if (S_ISDIR(inode->i_mode))
-		return 0;
-
-	/* lock, flush and invalidate mapping in preparation for flag change */
-	xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
-	error = filemap_write_and_wait(inode->i_mapping);
-	if (error)
-		goto out_unlock;
-	error = invalidate_inode_pages2(inode->i_mapping);
-	if (error)
-		goto out_unlock;
-
-	*join_flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL;
-	return 0;
-
-out_unlock:
-	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
-	return error;
+		return;
 
+	if (((fa->fsx_xflags & FS_XFLAG_DAX) &&
+	    !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) ||
+	    (!(fa->fsx_xflags & FS_XFLAG_DAX) &&
+	     (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)))
+		inode->i_state |= I_DONTCACHE;
 }
 
 /*
@@ -1209,17 +1169,10 @@ xfs_ioctl_setattr_dax_invalidate(
  * have permission to do so. On success, return a clean transaction and the
  * inode locked exclusively ready for further operation specific checks. On
  * failure, return an error without modifying or locking the inode.
- *
- * The inode might already be IO locked on call. If this is the case, it is
- * indicated in @join_flags and we take full responsibility for ensuring they
- * are unlocked from now on. Hence if we have an error here, we still have to
- * unlock them. Otherwise, once they are joined to the transaction, they will
- * be unlocked on commit/cancel.
  */
 static struct xfs_trans *
 xfs_ioctl_setattr_get_trans(
-	struct xfs_inode	*ip,
-	int			join_flags)
+	struct xfs_inode	*ip)
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_trans	*tp;
@@ -1236,8 +1189,7 @@ xfs_ioctl_setattr_get_trans(
 		goto out_unlock;
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | join_flags);
-	join_flags = 0;
+	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
 
 	/*
 	 * CAP_FOWNER overrides the following restrictions:
@@ -1258,8 +1210,6 @@ xfs_ioctl_setattr_get_trans(
 out_cancel:
 	xfs_trans_cancel(tp);
 out_unlock:
-	if (join_flags)
-		xfs_iunlock(ip, join_flags);
 	return ERR_PTR(error);
 }
 
@@ -1386,7 +1336,6 @@ xfs_ioctl_setattr(
 	struct xfs_dquot	*pdqp = NULL;
 	struct xfs_dquot	*olddquot = NULL;
 	int			code;
-	int			join_flags = 0;
 
 	trace_xfs_ioctl_setattr(ip);
 
@@ -1410,18 +1359,9 @@ xfs_ioctl_setattr(
 			return code;
 	}
 
-	/*
-	 * Changing DAX config may require inode locking for mapping
-	 * invalidation. These need to be held all the way to transaction commit
-	 * or cancel time, so need to be passed through to
-	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
-	 * appropriately.
-	 */
-	code = xfs_ioctl_setattr_dax_invalidate(ip, fa, &join_flags);
-	if (code)
-		goto error_free_dquots;
+	xfs_ioctl_setattr_dax_invalidate(ip, fa);
 
-	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
+	tp = xfs_ioctl_setattr_get_trans(ip);
 	if (IS_ERR(tp)) {
 		code = PTR_ERR(tp);
 		goto error_free_dquots;
@@ -1552,7 +1492,6 @@ xfs_ioc_setxflags(
 	struct fsxattr		fa;
 	struct fsxattr		old_fa;
 	unsigned int		flags;
-	int			join_flags = 0;
 	int			error;
 
 	if (copy_from_user(&flags, arg, sizeof(flags)))
@@ -1569,18 +1508,9 @@ xfs_ioc_setxflags(
 	if (error)
 		return error;
 
-	/*
-	 * Changing DAX config may require inode locking for mapping
-	 * invalidation. These need to be held all the way to transaction commit
-	 * or cancel time, so need to be passed through to
-	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
-	 * appropriately.
-	 */
-	error = xfs_ioctl_setattr_dax_invalidate(ip, &fa, &join_flags);
-	if (error)
-		goto out_drop_write;
+	xfs_ioctl_setattr_dax_invalidate(ip, &fa);
 
-	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
+	tp = xfs_ioctl_setattr_get_trans(ip);
 	if (IS_ERR(tp)) {
 		error = PTR_ERR(tp);
 		goto out_drop_write;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V7 9/9] Documentation/dax: Update Usage section
  2020-04-13  5:40 [PATCH V7 0/9] Enable per-file/per-directory DAX operations V7 ira.weiny
                   ` (7 preceding siblings ...)
  2020-04-13  5:40 ` [PATCH V7 8/9] fs/xfs: Change xfs_ioctl_setattr_dax_invalidate() ira.weiny
@ 2020-04-13  5:40 ` ira.weiny
  2020-04-13 16:19   ` Darrick J. Wong
  2020-04-14  5:21   ` Dan Williams
  8 siblings, 2 replies; 47+ messages in thread
From: ira.weiny @ 2020-04-13  5:40 UTC (permalink / raw)
  To: linux-kernel, Darrick J. Wong
  Cc: Ira Weiny, Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

From: Ira Weiny <ira.weiny@intel.com>

Update the Usage section to reflect the new individual dax selection
functionality.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V6:
	Update to allow setting FS_XFLAG_DAX any time.
	Update with list of behaviors from Darrick
	https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/

Changes from V5:
	Update to reflect the agreed upon semantics
	https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
---
 Documentation/filesystems/dax.txt | 166 +++++++++++++++++++++++++++++-
 1 file changed, 163 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
index 679729442fd2..af14c1b330a9 100644
--- a/Documentation/filesystems/dax.txt
+++ b/Documentation/filesystems/dax.txt
@@ -17,11 +17,171 @@ For file mappings, the storage device is mapped directly into userspace.
 Usage
 -----
 
-If you have a block device which supports DAX, you can make a filesystem
+If you have a block device which supports DAX, you can make a file system
 on it as usual.  The DAX code currently only supports files with a block
 size equal to your kernel's PAGE_SIZE, so you may need to specify a block
-size when creating the filesystem.  When mounting it, use the "-o dax"
-option on the command line or add 'dax' to the options in /etc/fstab.
+size when creating the file system.
+
+Currently 2 filesystems support DAX, ext4 and xfs.  Enabling DAX on them is
+different at this time.
+
+Enabling DAX on ext4
+--------------------
+
+When mounting the filesystem, use the "-o dax" option on the command line or
+add 'dax' to the options in /etc/fstab.
+
+
+Enabling DAX on xfs
+-------------------
+
+Summary
+-------
+
+ 1. There exists an in-kernel access mode flag S_DAX that is set when
+    file accesses go directly to persistent memory, bypassing the page
+    cache.  Applications must call statx to discover the current S_DAX
+    state (STATX_ATTR_DAX).
+
+ 2. There exists an advisory file inode flag FS_XFLAG_DAX that is
+    inherited from the parent directory FS_XFLAG_DAX inode flag at file
+    creation time.  This advisory flag can be set or cleared at any
+    time, but doing so does not immediately affect the S_DAX state.
+
+    Unless overridden by mount options (see (3)), if FS_XFLAG_DAX is set
+    and the fs is on pmem then it will enable S_DAX at inode load time;
+    if FS_XFLAG_DAX is not set, it will not enable S_DAX.
+
+ 3. There exists a dax= mount option.
+
+    "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."
+
+    "-o dax=always" means "always set S_DAX (at least on pmem),
+                    and ignore FS_XFLAG_DAX."
+
+    "-o dax"        is an alias for "dax=always".
+
+    "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.
+
+ 4. There exists an advisory directory inode flag FS_XFLAG_DAX that can
+    be set or cleared at any time.  The flag state is inherited by any files or
+    subdirectories when they are created within that directory.
+
+ 5. Programs that require a specific file access mode (DAX or not DAX)
+    can do one of the following:
+
+    (a) Create files in directories that the FS_XFLAG_DAX flag set as
+        needed; or
+
+    (b) Have the administrator set an override via mount option; or
+
+    (c) Set or clear the file's FS_XFLAG_DAX flag as needed.  Programs
+        must then cause the kernel to evict the inode from memory.  This
+        can be done by:
+
+        i>  Closing the file and re-opening the file and using statx to
+            see if the fs has changed the S_DAX flag; and
+
+        ii> If the file still does not have the desired S_DAX access
+            mode, either unmount and remount the filesystem, or close
+            the file and use drop_caches.
+
+ 6. It is expected that users who want to squeeze every last bit of performance
+    out of the particular rough and tumble bits of their storage will also be
+    exposed to the difficulties of what happens when the operating system can't
+    totally virtualize those hardware capabilities.  DAX is such a feature.
+    Basically, Formula-1 cars require a bit more care and feeding than your
+    averaged Toyota minivan, as it were.
+
+
+Details
+-------
+
+There are 2 per-file dax flags.  One is a physical inode setting (FS_XFLAG_DAX)
+and the other a currently enabled state (S_DAX).
+
+FS_XFLAG_DAX is maintained, on disk, on individual inodes.  It is preserved
+within the file system.  This 'physical' config setting can be set using an
+ioctl and/or an application such as "xfs_io -c 'chattr [-+]x'".  Files and
+directories automatically inherit FS_XFLAG_DAX from their parent directory
+_when_ _created_.  Therefore, setting FS_XFLAG_DAX at directory creation time
+can be used to set a default behavior for an entire sub-tree.  (Doing so on the
+root directory acts to set a default for the entire file system.)
+
+To clarify inheritance here are 3 examples:
+
+Example A:
+
+mkdir -p a/b/c
+xfs_io 'chattr +x' a
+mkdir a/b/c/d
+mkdir a/e
+
+	dax: a,e
+	no dax: b,c,d
+
+Example B:
+
+mkdir a
+xfs_io 'chattr +x' a
+mkdir -p a/b/c/d
+
+	dax: a,b,c,d
+	no dax:
+
+Example C:
+
+mkdir -p a/b/c
+xfs_io 'chattr +x' c
+mkdir a/b/c/d
+
+	dax: c,d
+	no dax: a,b
+
+
+The current enabled state (S_DAX) is set when a file inode is _loaded_ based on
+the underlying media support, the value of FS_XFLAG_DAX, and the file systems
+dax mount option setting.  See below.
+
+statx can be used to query S_DAX.  NOTE that a directory will never have S_DAX
+set and therefore statx will always return false on directories.
+
+NOTE: Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs
+even if the underlying media does not support dax and/or the file system is
+overridden with a mount option.
+
+
+Overriding FS_XFLAG_DAX (dax= mount option)
+-------------------------------------------
+
+There exists a dax mount option.  Using the mount option does not change the
+physical configured state of individual files but overrides the S_DAX operating
+state when inodes are loaded.
+
+Given underlying media support, the dax mount option is a tri-state option
+(never, always, inode) with the following meanings:
+
+   "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX"
+   "-o dax=always" means "always set S_DAX, ignore FS_XFLAG_DAX"
+        "-o dax" by itself means "dax=always" to remain compatible with older
+	         kernels
+   "-o dax=inode" means "follow FS_XFLAG_DAX"
+
+The default state is 'inode'.  Given underlying media support, the following
+algorithm is used to determine the effective mode of the file S_DAX on a
+capable device.
+
+	S_DAX = FS_XFLAG_DAX;
+
+	if (dax_mount == "always")
+		S_DAX = true;
+	else if (dax_mount == "off"
+		S_DAX = false;
+
+To reiterate: Setting, and inheritance, continues to affect FS_XFLAG_DAX even
+while the file system is mounted with a dax override.  However, file enabled
+state, S_DAX, will continue to be the overridden until the file system is
+remounted with dax=inode.
 
 
 Implementation Tips for Block Driver Writers
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 1/9] fs/xfs: Remove unnecessary initialization of i_rwsem
  2020-04-13  5:40 ` [PATCH V7 1/9] fs/xfs: Remove unnecessary initialization of i_rwsem ira.weiny
@ 2020-04-13 15:41   ` Darrick J. Wong
  0 siblings, 0 replies; 47+ messages in thread
From: Darrick J. Wong @ 2020-04-13 15:41 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, Dave Chinner, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

On Sun, Apr 12, 2020 at 10:40:38PM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> An earlier call of xfs_reinit_inode() from xfs_iget_cache_hit() already
> handles initialization of i_rwsem.
> 
> Doing so again is unneeded.
> 
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Seems reasonable to me,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> 
> ---
> Changes from V4:
> 	Update commit message to make it clear the xfs_iget_cache_hit()
> 	is actually doing the initialization via xfs_reinit_inode()
> 
> New for V4:
> 
> NOTE: This was found while ensuring the new i_aops_sem was properly
> handled.  It seems like this is a layering violation so I think it is
> worth cleaning up so as to not confuse others.
> ---
>  fs/xfs/xfs_icache.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 8dc2e5414276..836a1f09be03 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -419,6 +419,7 @@ xfs_iget_cache_hit(
>  		spin_unlock(&ip->i_flags_lock);
>  		rcu_read_unlock();
>  
> +		ASSERT(!rwsem_is_locked(&inode->i_rwsem));
>  		error = xfs_reinit_inode(mp, inode);
>  		if (error) {
>  			bool wake;
> @@ -452,9 +453,6 @@ xfs_iget_cache_hit(
>  		ip->i_sick = 0;
>  		ip->i_checked = 0;
>  
> -		ASSERT(!rwsem_is_locked(&inode->i_rwsem));
> -		init_rwsem(&inode->i_rwsem);
> -
>  		spin_unlock(&ip->i_flags_lock);
>  		spin_unlock(&pag->pag_ici_lock);
>  	} else {
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 4/9] fs/xfs: Make DAX mount option a tri-state
  2020-04-13  5:40 ` [PATCH V7 4/9] fs/xfs: Make DAX mount option a tri-state ira.weiny
@ 2020-04-13 15:46   ` Darrick J. Wong
  2020-04-13 19:28     ` Ira Weiny
  2020-04-14  6:25   ` Christoph Hellwig
  1 sibling, 1 reply; 47+ messages in thread
From: Darrick J. Wong @ 2020-04-13 15:46 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Sun, Apr 12, 2020 at 10:40:41PM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> As agreed upon[1].  We make the dax mount option a tri-state.  '-o dax'
> continues to operate the same.  We add 'always', 'never', and 'inode'
> (default).
> 
> [1] https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> ---
> Changes from v6:
> 	Use 2 flag bits rather than a field.
> 	change iflag to inode
> 
> Changes from v5:
> 	New Patch
> ---
>  fs/xfs/xfs_mount.h |  3 ++-
>  fs/xfs/xfs_super.c | 44 ++++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 42 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index 88ab09ed29e7..d581b990e59a 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -233,7 +233,8 @@ typedef struct xfs_mount {
>  						   allocator */
>  #define XFS_MOUNT_NOATTR2	(1ULL << 25)	/* disable use of attr2 format */
>  
> -#define XFS_MOUNT_DAX		(1ULL << 62)	/* TEST ONLY! */
> +#define XFS_MOUNT_DAX		(1ULL << 62)
> +#define XFS_MOUNT_NODAX		(1ULL << 63)
>  
>  /*
>   * Max and min values for mount-option defined I/O
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 2094386af8ac..d7bd8f5e00c9 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -47,6 +47,32 @@ static struct kset *xfs_kset;		/* top-level xfs sysfs dir */
>  static struct xfs_kobj xfs_dbg_kobj;	/* global debug sysfs attrs */
>  #endif
>  
> +enum {
> +	XFS_DAX_INODE = 0,
> +	XFS_DAX_ALWAYS = 1,
> +	XFS_DAX_NEVER = 2,
> +};
> +
> +static void xfs_mount_set_dax_mode(struct xfs_mount *mp, u32 val)
> +{
> +	if (val == XFS_DAX_INODE) {
> +		mp->m_flags &= ~(XFS_MOUNT_DAX | XFS_MOUNT_NODAX);
> +	} else if (val == XFS_DAX_ALWAYS) {
> +		mp->m_flags &= ~XFS_MOUNT_NODAX;
> +		mp->m_flags |= XFS_MOUNT_DAX;
> +	} else if (val == XFS_DAX_NEVER) {
> +		mp->m_flags &= ~XFS_MOUNT_DAX;
> +		mp->m_flags |= XFS_MOUNT_NODAX;
> +	}
> +}
> +
> +static const struct constant_table dax_param_enums[] = {
> +	{"inode",	XFS_DAX_INODE },
> +	{"always",	XFS_DAX_ALWAYS },
> +	{"never",	XFS_DAX_NEVER },
> +	{}

I think that the dax_param_enums table (and the unnamed enum defining
XFS_DAX_*) probably ought to be part of the VFS so that you don't have
to duplicate these two pieces whenever it's time to bring ext4 in line
with XFS.

That probably doesn't need to be done right away, though...

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D


> +};
> +
>  /*
>   * Table driven mount option parser.
>   */
> @@ -59,7 +85,7 @@ enum {
>  	Opt_filestreams, Opt_quota, Opt_noquota, Opt_usrquota, Opt_grpquota,
>  	Opt_prjquota, Opt_uquota, Opt_gquota, Opt_pquota,
>  	Opt_uqnoenforce, Opt_gqnoenforce, Opt_pqnoenforce, Opt_qnoenforce,
> -	Opt_discard, Opt_nodiscard, Opt_dax,
> +	Opt_discard, Opt_nodiscard, Opt_dax, Opt_dax_enum,
>  };
>  
>  static const struct fs_parameter_spec xfs_fs_parameters[] = {
> @@ -103,6 +129,7 @@ static const struct fs_parameter_spec xfs_fs_parameters[] = {
>  	fsparam_flag("discard",		Opt_discard),
>  	fsparam_flag("nodiscard",	Opt_nodiscard),
>  	fsparam_flag("dax",		Opt_dax),
> +	fsparam_enum("dax",		Opt_dax_enum, dax_param_enums),
>  	{}
>  };
>  
> @@ -129,7 +156,6 @@ xfs_fs_show_options(
>  		{ XFS_MOUNT_GRPID,		",grpid" },
>  		{ XFS_MOUNT_DISCARD,		",discard" },
>  		{ XFS_MOUNT_LARGEIO,		",largeio" },
> -		{ XFS_MOUNT_DAX,		",dax" },
>  		{ 0, NULL }
>  	};
>  	struct xfs_mount	*mp = XFS_M(root->d_sb);
> @@ -185,6 +211,13 @@ xfs_fs_show_options(
>  	if (!(mp->m_qflags & XFS_ALL_QUOTA_ACCT))
>  		seq_puts(m, ",noquota");
>  
> +	if (mp->m_flags & XFS_MOUNT_DAX)
> +		seq_puts(m, ",dax=always");
> +	else if (mp->m_flags & XFS_MOUNT_NODAX)
> +		seq_puts(m, ",dax=never");
> +	else
> +		seq_puts(m, ",dax=inode");
> +
>  	return 0;
>  }
>  
> @@ -1244,7 +1277,10 @@ xfs_fc_parse_param(
>  		return 0;
>  #ifdef CONFIG_FS_DAX
>  	case Opt_dax:
> -		mp->m_flags |= XFS_MOUNT_DAX;
> +		xfs_mount_set_dax_mode(mp, XFS_DAX_ALWAYS);
> +		return 0;
> +	case Opt_dax_enum:
> +		xfs_mount_set_dax_mode(mp, result.uint_32);
>  		return 0;
>  #endif
>  	default:
> @@ -1451,7 +1487,7 @@ xfs_fc_fill_super(
>  		if (!rtdev_is_dax && !datadev_is_dax) {
>  			xfs_alert(mp,
>  			"DAX unsupported by block device. Turning off DAX.");
> -			mp->m_flags &= ~XFS_MOUNT_DAX;
> +			xfs_mount_set_dax_mode(mp, XFS_DAX_NEVER);
>  		}
>  		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
>  			xfs_alert(mp,
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 5/9] fs/xfs: Create function xfs_inode_enable_dax()
  2020-04-13  5:40 ` [PATCH V7 5/9] fs/xfs: Create function xfs_inode_enable_dax() ira.weiny
@ 2020-04-13 15:52   ` Darrick J. Wong
  2020-04-13 19:39     ` Ira Weiny
  2020-04-14  6:27   ` Christoph Hellwig
  1 sibling, 1 reply; 47+ messages in thread
From: Darrick J. Wong @ 2020-04-13 15:52 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Sun, Apr 12, 2020 at 10:40:42PM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> xfs_inode_supports_dax() should reflect if the inode can support DAX not
> that it is enabled for DAX.
> 
> Change the use of xfs_inode_supports_dax() to reflect only if the inode
> and underlying storage support dax.
> 
> Add a new function xfs_inode_enable_dax() which reflects if the inode
> should be enabled for DAX.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> ---
> Changes from v6:
> 	Change enable checks to be sequential logic.
> 	Update for 2 bit tri-state option.
> 	Make 'static' consistent.
> 	Don't set S_DAX if !CONFIG_FS_DAX
> 
> Changes from v5:
> 	Update to reflect the new tri-state mount option
> 
> Changes from v3:
> 	Update functions and names to be more clear
> 	Update commit message
> 	Merge with
> 		'fs/xfs: Clean up DAX support check'
> 		don't allow IS_DAX() on a directory
> 		use STATIC macro for static
> 		make xfs_inode_supports_dax() static
> ---
>  fs/xfs/xfs_iops.c | 34 +++++++++++++++++++++++++++++-----
>  1 file changed, 29 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 81f2f93caec0..37bd15b55878 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -1244,12 +1244,11 @@ xfs_inode_supports_dax(
>  	struct xfs_mount	*mp = ip->i_mount;
>  
>  	/* Only supported on non-reflinked files. */
> -	if (!S_ISREG(VFS_I(ip)->i_mode) || xfs_is_reflink_inode(ip))
> +	if (xfs_is_reflink_inode(ip))
>  		return false;
>  
> -	/* DAX mount option or DAX iflag must be set. */
> -	if (!(mp->m_flags & XFS_MOUNT_DAX) &&
> -	    !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
> +	/* Only supported on regular files. */
> +	if (!S_ISREG(VFS_I(ip)->i_mode))
>  		return false;

Why separate the !S_ISREG and the is_reflink_inode checks?

The part about "Change the use of xfs_inode_supports_dax() to reflect
only if the inode and underlying storage support dax" would be a lot
more straightforward if this hunk only removed the DIFLAG2_DAX check.

The rest of the patch looks ok.

--D

>  
>  	/* Block size must match page size */
> @@ -1260,6 +1259,31 @@ xfs_inode_supports_dax(
>  	return xfs_inode_buftarg(ip)->bt_daxdev != NULL;
>  }
>  
> +#ifdef CONFIG_FS_DAX
> +static bool
> +xfs_inode_enable_dax(
> +	struct xfs_inode *ip)
> +{
> +	if (ip->i_mount->m_flags & XFS_MOUNT_NODAX)
> +		return false;
> +	if (!xfs_inode_supports_dax(ip))
> +		return false;
> +	if (ip->i_mount->m_flags & XFS_MOUNT_DAX)
> +		return true;
> +	if (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)
> +		return true;
> +	return false;
> +}
> +#else /* !CONFIG_FS_DAX */
> +static bool
> +xfs_inode_enable_dax(
> +	struct xfs_inode *ip)
> +{
> +	return false;
> +}
> +#endif /* CONFIG_FS_DAX */
> +
> +
>  STATIC void
>  xfs_diflags_to_iflags(
>  	struct inode		*inode,
> @@ -1278,7 +1302,7 @@ xfs_diflags_to_iflags(
>  		inode->i_flags |= S_SYNC;
>  	if (flags & XFS_DIFLAG_NOATIME)
>  		inode->i_flags |= S_NOATIME;
> -	if (xfs_inode_supports_dax(ip))
> +	if (xfs_inode_enable_dax(ip))
>  		inode->i_flags |= S_DAX;
>  }
>  
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 6/9] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  2020-04-13  5:40 ` [PATCH V7 6/9] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags() ira.weiny
@ 2020-04-13 16:01   ` Darrick J. Wong
  2020-04-14  4:07     ` Ira Weiny
  0 siblings, 1 reply; 47+ messages in thread
From: Darrick J. Wong @ 2020-04-13 16:01 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Sun, Apr 12, 2020 at 10:40:43PM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> The functionality in xfs_diflags_to_linux() and xfs_diflags_to_iflags() are
> nearly identical.  The only difference is that *_to_linux() is called after
> inode setup and disallows changing the DAX flag.
> 
> Combining them can be done with a flag which indicates if this is the initial
> setup to allow the DAX flag to be properly set only at init time.
> 
> So remove xfs_diflags_to_linux() and call the modified xfs_diflags_to_iflags()
> directly.
> 
> While we are here simplify xfs_diflags_to_iflags() to take struct xfs_inode and
> use xfs_ip2xflags() to ensure future diflags are included correctly.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> ---
> Changes from V6:
> 	Move unrelated hunk to previous patch.
> 	Change logic for better code generation.
> 
> Changes from V5:
> 	The functions are no longer identical so we can only combine
> 	them rather than deleting one completely.  This is reflected in
> 	the new init parameter.
> ---
>  fs/xfs/xfs_inode.h |  1 +
>  fs/xfs/xfs_ioctl.c | 33 +--------------------------------
>  fs/xfs/xfs_iops.c  | 42 +++++++++++++++++++++++-------------------
>  3 files changed, 25 insertions(+), 51 deletions(-)
> 
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 492e53992fa9..e76ed9ca17f7 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -466,6 +466,7 @@ int	xfs_break_layouts(struct inode *inode, uint *iolock,
>  /* from xfs_iops.c */
>  extern void xfs_setup_inode(struct xfs_inode *ip);
>  extern void xfs_setup_iops(struct xfs_inode *ip);
> +extern void xfs_diflags_to_iflags(struct xfs_inode *ip, bool init);
>  
>  /*
>   * When setting up a newly allocated inode, we need to call
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index d42de92cb283..c6cd92ef4a05 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -1100,37 +1100,6 @@ xfs_flags2diflags2(
>  	return di_flags2;
>  }
>  
> -STATIC void
> -xfs_diflags_to_linux(
> -	struct xfs_inode	*ip)
> -{
> -	struct inode		*inode = VFS_I(ip);
> -	unsigned int		xflags = xfs_ip2xflags(ip);
> -
> -	if (xflags & FS_XFLAG_IMMUTABLE)
> -		inode->i_flags |= S_IMMUTABLE;
> -	else
> -		inode->i_flags &= ~S_IMMUTABLE;
> -	if (xflags & FS_XFLAG_APPEND)
> -		inode->i_flags |= S_APPEND;
> -	else
> -		inode->i_flags &= ~S_APPEND;
> -	if (xflags & FS_XFLAG_SYNC)
> -		inode->i_flags |= S_SYNC;
> -	else
> -		inode->i_flags &= ~S_SYNC;
> -	if (xflags & FS_XFLAG_NOATIME)
> -		inode->i_flags |= S_NOATIME;
> -	else
> -		inode->i_flags &= ~S_NOATIME;
> -#if 0	/* disabled until the flag switching races are sorted out */
> -	if (xflags & FS_XFLAG_DAX)
> -		inode->i_flags |= S_DAX;
> -	else
> -		inode->i_flags &= ~S_DAX;
> -#endif
> -}
> -
>  static int
>  xfs_ioctl_setattr_xflags(
>  	struct xfs_trans	*tp,
> @@ -1168,7 +1137,7 @@ xfs_ioctl_setattr_xflags(
>  	ip->i_d.di_flags = xfs_flags2diflags(ip, fa->fsx_xflags);
>  	ip->i_d.di_flags2 = di_flags2;
>  
> -	xfs_diflags_to_linux(ip);
> +	xfs_diflags_to_iflags(ip, false);
>  	xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
>  	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
>  	XFS_STATS_INC(mp, xs_ig_attrchg);
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 37bd15b55878..856ad823e754 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -1284,26 +1284,30 @@ xfs_inode_enable_dax(
>  #endif /* CONFIG_FS_DAX */
>  
>  
> -STATIC void
> +void
>  xfs_diflags_to_iflags(
> -	struct inode		*inode,
> -	struct xfs_inode	*ip)
> +	struct xfs_inode	*ip,
> +	bool init)
>  {
> -	uint16_t		flags = ip->i_d.di_flags;
> -
> -	inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC |
> -			    S_NOATIME | S_DAX);
> -
> -	if (flags & XFS_DIFLAG_IMMUTABLE)
> -		inode->i_flags |= S_IMMUTABLE;
> -	if (flags & XFS_DIFLAG_APPEND)
> -		inode->i_flags |= S_APPEND;
> -	if (flags & XFS_DIFLAG_SYNC)
> -		inode->i_flags |= S_SYNC;
> -	if (flags & XFS_DIFLAG_NOATIME)
> -		inode->i_flags |= S_NOATIME;
> -	if (xfs_inode_enable_dax(ip))
> -		inode->i_flags |= S_DAX;
> +	struct inode            *inode = VFS_I(ip);
> +	unsigned int            xflags = xfs_ip2xflags(ip);
> +	unsigned int            flags = 0;
> +
> +	ASSERT(!(IS_DAX(inode) && init));
> +
> +	if (xflags & FS_XFLAG_IMMUTABLE)
> +		flags |= S_IMMUTABLE;
> +	if (xflags & FS_XFLAG_APPEND)
> +		flags |= S_APPEND;
> +	if (xflags & FS_XFLAG_SYNC)
> +		flags |= S_SYNC;
> +	if (xflags & FS_XFLAG_NOATIME)
> +		flags |= S_NOATIME;
> +	if (init && xfs_inode_enable_dax(ip))
> +		flags |= S_DAX;
> +
> +	inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC | S_NOATIME);

I noticed that S_DAX drops out of the mask out operation here, which of
course resulted in an eyebrow-raise because the other four flags are
always set to whatever we just computed. :)

Then I realized that yes, this is intentional since we can't change
S_DAX on the fly, and that S_DAX is never set i_flags on an inode that's
being initialized so we don't need to mask off S_DAX ever.

Could we add a comment here to remind the reader that S_DAX is a bit
special?

/*
 * S_DAX can only be set during inode initialization and is never set by
 * the VFS, so we cannot mask off S_DAX in i_flags.
 */

With that added,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> +	inode->i_flags |= flags;
>  }
>  
>  /*
> @@ -1332,7 +1336,7 @@ xfs_setup_inode(
>  	inode->i_gid    = xfs_gid_to_kgid(ip->i_d.di_gid);
>  
>  	i_size_write(inode, ip->i_d.di_size);
> -	xfs_diflags_to_iflags(inode, ip);
> +	xfs_diflags_to_iflags(ip, true);
>  
>  	if (S_ISDIR(inode->i_mode)) {
>  		/*
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 7/9] fs: Define I_DONTCACNE in VFS layer
  2020-04-13  5:40 ` [PATCH V7 7/9] fs: Define I_DONTCACNE in VFS layer ira.weiny
@ 2020-04-13 16:09   ` Darrick J. Wong
  2020-04-13 16:13     ` Darrick J. Wong
  2020-04-13 19:44     ` Ira Weiny
  2020-04-14 15:26   ` Jan Kara
  1 sibling, 2 replies; 47+ messages in thread
From: Darrick J. Wong @ 2020-04-13 16:09 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

> Subject: [PATCH V7 7/9] fs: Define I_DONTCACNE in VFS layer

CACNE -> CACHE.

On Sun, Apr 12, 2020 at 10:40:44PM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> DAX effective mode changes (setting of S_DAX) require inode eviction.
> 
> Define a flag which can be set to inform the VFS layer that inodes
> should not be cached.  This will expedite the eviction of those nodes
> requiring reload.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ---
>  include/linux/fs.h | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index a818ced22961..e2db71d150c3 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2151,6 +2151,8 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
>   *
>   * I_CREATING		New object's inode in the middle of setting up.
>   *
> + * I_DONTCACHE		Do not cache the inode

"Do not cache" is a bit vague, how about:

"Evict the inode when the last reference is dropped.
Do not put it on the LRU list."

Also, shouldn't xfs_ioctl_setattr be setting I_DONTCACHE if someone
changes FS_XFLAG_DAX (and there are no mount option overrides)?  I don't
see any user of I_DONTCACHE in this series.

(Also also, please convert XFS_IDONTCACHE, since it's a straightforward
conversion...)

--D

> + *
>   * Q: What is the difference between I_WILL_FREE and I_FREEING?
>   */
>  #define I_DIRTY_SYNC		(1 << 0)
> @@ -2173,6 +2175,7 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
>  #define I_WB_SWITCH		(1 << 13)
>  #define I_OVL_INUSE		(1 << 14)
>  #define I_CREATING		(1 << 15)
> +#define I_DONTCACHE		(1 << 16)
>  
>  #define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
>  #define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
> @@ -3042,7 +3045,8 @@ extern int inode_needs_sync(struct inode *inode);
>  extern int generic_delete_inode(struct inode *inode);
>  static inline int generic_drop_inode(struct inode *inode)
>  {
> -	return !inode->i_nlink || inode_unhashed(inode);
> +	return !inode->i_nlink || inode_unhashed(inode) ||
> +		(inode->i_state & I_DONTCACHE);
>  }
>  
>  extern struct inode *ilookup5_nowait(struct super_block *sb,
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 8/9] fs/xfs: Change xfs_ioctl_setattr_dax_invalidate()
  2020-04-13  5:40 ` [PATCH V7 8/9] fs/xfs: Change xfs_ioctl_setattr_dax_invalidate() ira.weiny
@ 2020-04-13 16:12   ` Darrick J. Wong
  2020-04-13 19:46     ` Ira Weiny
  0 siblings, 1 reply; 47+ messages in thread
From: Darrick J. Wong @ 2020-04-13 16:12 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Sun, Apr 12, 2020 at 10:40:45PM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> We only support changing FS_XFLAG_DAX on directories.  Files get their
> flag from the parent directory on creation only.  So no data
> invalidation needs to happen.
> 
> Alter the xfs_ioctl_setattr_dax_invalidate() to be
> xfs_ioctl_setattr_dax_validate().  xfs_ioctl_setattr_dax_validate() now
> validates that any FS_XFLAG_DAX change is ok.
> 
> This also allows use to remove the join_flags logic.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> ---
> Changes from v6:
> 	Fix completely broken implementation and update commit message.
> 	Use the new VFS layer I_DONTCACHE to facilitate inode eviction
> 	and S_DAX changing on drop_caches
> 
> Changes from v5:
> 	New patch
> ---
>  fs/xfs/xfs_ioctl.c | 102 +++++++--------------------------------------
>  1 file changed, 16 insertions(+), 86 deletions(-)
> 
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index c6cd92ef4a05..ba42a5fb5b05 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -1145,63 +1145,23 @@ xfs_ioctl_setattr_xflags(
>  }
>  
>  /*
> - * If we are changing DAX flags, we have to ensure the file is clean and any
> - * cached objects in the address space are invalidated and removed. This
> - * requires us to lock out other IO and page faults similar to a truncate
> - * operation. The locks need to be held until the transaction has been committed
> - * so that the cache invalidation is atomic with respect to the DAX flag
> - * manipulation.
> + * Mark inodes with a changing FS_XFLAG_DAX, I_DONTCACHE
>   */
> -static int
> +static void
>  xfs_ioctl_setattr_dax_invalidate(
>  	struct xfs_inode	*ip,
> -	struct fsxattr		*fa,
> -	int			*join_flags)
> +	struct fsxattr		*fa)
>  {
> -	struct inode		*inode = VFS_I(ip);
> -	struct super_block	*sb = inode->i_sb;
> -	int			error;
> -
> -	*join_flags = 0;
> -
> -	/*
> -	 * It is only valid to set the DAX flag on regular files and
> -	 * directories on filesystems where the block size is equal to the page
> -	 * size. On directories it serves as an inherited hint so we don't
> -	 * have to check the device for dax support or flush pagecache.
> -	 */
> -	if (fa->fsx_xflags & FS_XFLAG_DAX) {
> -		struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
> -
> -		if (!bdev_dax_supported(target->bt_bdev, sb->s_blocksize))
> -			return -EINVAL;
> -	}
> -
> -	/* If the DAX state is not changing, we have nothing to do here. */
> -	if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
> -		return 0;
> -	if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode))
> -		return 0;
> +	struct inode            *inode = VFS_I(ip);
>  
>  	if (S_ISDIR(inode->i_mode))
> -		return 0;
> -
> -	/* lock, flush and invalidate mapping in preparation for flag change */
> -	xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> -	error = filemap_write_and_wait(inode->i_mapping);
> -	if (error)
> -		goto out_unlock;
> -	error = invalidate_inode_pages2(inode->i_mapping);
> -	if (error)
> -		goto out_unlock;
> -
> -	*join_flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL;
> -	return 0;
> -
> -out_unlock:
> -	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> -	return error;
> +		return;

We also need a check up here to skip the I_DONTCACHE setting if the
admin has set a mount option to override the inode flag.

The rest looks good to me.

--D

> +	if (((fa->fsx_xflags & FS_XFLAG_DAX) &&
> +	    !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) ||
> +	    (!(fa->fsx_xflags & FS_XFLAG_DAX) &&
> +	     (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)))
> +		inode->i_state |= I_DONTCACHE;
>  }
>  
>  /*
> @@ -1209,17 +1169,10 @@ xfs_ioctl_setattr_dax_invalidate(
>   * have permission to do so. On success, return a clean transaction and the
>   * inode locked exclusively ready for further operation specific checks. On
>   * failure, return an error without modifying or locking the inode.
> - *
> - * The inode might already be IO locked on call. If this is the case, it is
> - * indicated in @join_flags and we take full responsibility for ensuring they
> - * are unlocked from now on. Hence if we have an error here, we still have to
> - * unlock them. Otherwise, once they are joined to the transaction, they will
> - * be unlocked on commit/cancel.
>   */
>  static struct xfs_trans *
>  xfs_ioctl_setattr_get_trans(
> -	struct xfs_inode	*ip,
> -	int			join_flags)
> +	struct xfs_inode	*ip)
>  {
>  	struct xfs_mount	*mp = ip->i_mount;
>  	struct xfs_trans	*tp;
> @@ -1236,8 +1189,7 @@ xfs_ioctl_setattr_get_trans(
>  		goto out_unlock;
>  
>  	xfs_ilock(ip, XFS_ILOCK_EXCL);
> -	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | join_flags);
> -	join_flags = 0;
> +	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
>  
>  	/*
>  	 * CAP_FOWNER overrides the following restrictions:
> @@ -1258,8 +1210,6 @@ xfs_ioctl_setattr_get_trans(
>  out_cancel:
>  	xfs_trans_cancel(tp);
>  out_unlock:
> -	if (join_flags)
> -		xfs_iunlock(ip, join_flags);
>  	return ERR_PTR(error);
>  }
>  
> @@ -1386,7 +1336,6 @@ xfs_ioctl_setattr(
>  	struct xfs_dquot	*pdqp = NULL;
>  	struct xfs_dquot	*olddquot = NULL;
>  	int			code;
> -	int			join_flags = 0;
>  
>  	trace_xfs_ioctl_setattr(ip);
>  
> @@ -1410,18 +1359,9 @@ xfs_ioctl_setattr(
>  			return code;
>  	}
>  
> -	/*
> -	 * Changing DAX config may require inode locking for mapping
> -	 * invalidation. These need to be held all the way to transaction commit
> -	 * or cancel time, so need to be passed through to
> -	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
> -	 * appropriately.
> -	 */
> -	code = xfs_ioctl_setattr_dax_invalidate(ip, fa, &join_flags);
> -	if (code)
> -		goto error_free_dquots;
> +	xfs_ioctl_setattr_dax_invalidate(ip, fa);
>  
> -	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
> +	tp = xfs_ioctl_setattr_get_trans(ip);
>  	if (IS_ERR(tp)) {
>  		code = PTR_ERR(tp);
>  		goto error_free_dquots;
> @@ -1552,7 +1492,6 @@ xfs_ioc_setxflags(
>  	struct fsxattr		fa;
>  	struct fsxattr		old_fa;
>  	unsigned int		flags;
> -	int			join_flags = 0;
>  	int			error;
>  
>  	if (copy_from_user(&flags, arg, sizeof(flags)))
> @@ -1569,18 +1508,9 @@ xfs_ioc_setxflags(
>  	if (error)
>  		return error;
>  
> -	/*
> -	 * Changing DAX config may require inode locking for mapping
> -	 * invalidation. These need to be held all the way to transaction commit
> -	 * or cancel time, so need to be passed through to
> -	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
> -	 * appropriately.
> -	 */
> -	error = xfs_ioctl_setattr_dax_invalidate(ip, &fa, &join_flags);
> -	if (error)
> -		goto out_drop_write;
> +	xfs_ioctl_setattr_dax_invalidate(ip, &fa);
>  
> -	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
> +	tp = xfs_ioctl_setattr_get_trans(ip);
>  	if (IS_ERR(tp)) {
>  		error = PTR_ERR(tp);
>  		goto out_drop_write;
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 7/9] fs: Define I_DONTCACNE in VFS layer
  2020-04-13 16:09   ` Darrick J. Wong
@ 2020-04-13 16:13     ` Darrick J. Wong
  2020-04-13 19:44     ` Ira Weiny
  1 sibling, 0 replies; 47+ messages in thread
From: Darrick J. Wong @ 2020-04-13 16:13 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Mon, Apr 13, 2020 at 09:09:29AM -0700, Darrick J. Wong wrote:
> > Subject: [PATCH V7 7/9] fs: Define I_DONTCACNE in VFS layer
> 
> CACNE -> CACHE.
> 
> On Sun, Apr 12, 2020 at 10:40:44PM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > DAX effective mode changes (setting of S_DAX) require inode eviction.
> > 
> > Define a flag which can be set to inform the VFS layer that inodes
> > should not be cached.  This will expedite the eviction of those nodes
> > requiring reload.
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > ---
> >  include/linux/fs.h | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index a818ced22961..e2db71d150c3 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -2151,6 +2151,8 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
> >   *
> >   * I_CREATING		New object's inode in the middle of setting up.
> >   *
> > + * I_DONTCACHE		Do not cache the inode
> 
> "Do not cache" is a bit vague, how about:
> 
> "Evict the inode when the last reference is dropped.
> Do not put it on the LRU list."
> 
> Also, shouldn't xfs_ioctl_setattr be setting I_DONTCACHE if someone
> changes FS_XFLAG_DAX (and there are no mount option overrides)?  I don't
> see any user of I_DONTCACHE in this series.

Oops, brain fart, ignore this question ^^^^.

--D

> (Also also, please convert XFS_IDONTCACHE, since it's a straightforward
> conversion...)
> 
> --D
> 
> > + *
> >   * Q: What is the difference between I_WILL_FREE and I_FREEING?
> >   */
> >  #define I_DIRTY_SYNC		(1 << 0)
> > @@ -2173,6 +2175,7 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
> >  #define I_WB_SWITCH		(1 << 13)
> >  #define I_OVL_INUSE		(1 << 14)
> >  #define I_CREATING		(1 << 15)
> > +#define I_DONTCACHE		(1 << 16)
> >  
> >  #define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
> >  #define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
> > @@ -3042,7 +3045,8 @@ extern int inode_needs_sync(struct inode *inode);
> >  extern int generic_delete_inode(struct inode *inode);
> >  static inline int generic_drop_inode(struct inode *inode)
> >  {
> > -	return !inode->i_nlink || inode_unhashed(inode);
> > +	return !inode->i_nlink || inode_unhashed(inode) ||
> > +		(inode->i_state & I_DONTCACHE);
> >  }
> >  
> >  extern struct inode *ilookup5_nowait(struct super_block *sb,
> > -- 
> > 2.25.1
> > 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 9/9] Documentation/dax: Update Usage section
  2020-04-13  5:40 ` [PATCH V7 9/9] Documentation/dax: Update Usage section ira.weiny
@ 2020-04-13 16:19   ` Darrick J. Wong
  2020-04-14  4:38     ` Ira Weiny
  2020-04-14  5:21   ` Dan Williams
  1 sibling, 1 reply; 47+ messages in thread
From: Darrick J. Wong @ 2020-04-13 16:19 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Sun, Apr 12, 2020 at 10:40:46PM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> Update the Usage section to reflect the new individual dax selection
> functionality.

Yum. :)

> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> ---
> Changes from V6:
> 	Update to allow setting FS_XFLAG_DAX any time.
> 	Update with list of behaviors from Darrick
> 	https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
> 
> Changes from V5:
> 	Update to reflect the agreed upon semantics
> 	https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> ---
>  Documentation/filesystems/dax.txt | 166 +++++++++++++++++++++++++++++-
>  1 file changed, 163 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> index 679729442fd2..af14c1b330a9 100644
> --- a/Documentation/filesystems/dax.txt
> +++ b/Documentation/filesystems/dax.txt
> @@ -17,11 +17,171 @@ For file mappings, the storage device is mapped directly into userspace.
>  Usage
>  -----
>  
> -If you have a block device which supports DAX, you can make a filesystem
> +If you have a block device which supports DAX, you can make a file system
>  on it as usual.  The DAX code currently only supports files with a block
>  size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> -size when creating the filesystem.  When mounting it, use the "-o dax"
> -option on the command line or add 'dax' to the options in /etc/fstab.
> +size when creating the file system.
> +
> +Currently 2 filesystems support DAX, ext4 and xfs.  Enabling DAX on them is
> +different at this time.

I thought ext2 supports DAX?

> +Enabling DAX on ext4
> +--------------------
> +
> +When mounting the filesystem, use the "-o dax" option on the command line or
> +add 'dax' to the options in /etc/fstab.
> +
> +
> +Enabling DAX on xfs
> +-------------------
> +
> +Summary
> +-------
> +
> + 1. There exists an in-kernel access mode flag S_DAX that is set when
> +    file accesses go directly to persistent memory, bypassing the page
> +    cache.  Applications must call statx to discover the current S_DAX
> +    state (STATX_ATTR_DAX).
> +
> + 2. There exists an advisory file inode flag FS_XFLAG_DAX that is
> +    inherited from the parent directory FS_XFLAG_DAX inode flag at file
> +    creation time.  This advisory flag can be set or cleared at any
> +    time, but doing so does not immediately affect the S_DAX state.
> +
> +    Unless overridden by mount options (see (3)), if FS_XFLAG_DAX is set
> +    and the fs is on pmem then it will enable S_DAX at inode load time;
> +    if FS_XFLAG_DAX is not set, it will not enable S_DAX.
> +
> + 3. There exists a dax= mount option.
> +
> +    "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."
> +
> +    "-o dax=always" means "always set S_DAX (at least on pmem),
> +                    and ignore FS_XFLAG_DAX."
> +
> +    "-o dax"        is an alias for "dax=always".
> +
> +    "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.
> +
> + 4. There exists an advisory directory inode flag FS_XFLAG_DAX that can
> +    be set or cleared at any time.  The flag state is inherited by any files or
> +    subdirectories when they are created within that directory.
> +
> + 5. Programs that require a specific file access mode (DAX or not DAX)
> +    can do one of the following:
> +
> +    (a) Create files in directories that the FS_XFLAG_DAX flag set as
> +        needed; or
> +
> +    (b) Have the administrator set an override via mount option; or
> +
> +    (c) Set or clear the file's FS_XFLAG_DAX flag as needed.  Programs
> +        must then cause the kernel to evict the inode from memory.  This
> +        can be done by:
> +
> +        i>  Closing the file and re-opening the file and using statx to
> +            see if the fs has changed the S_DAX flag; and
> +
> +        ii> If the file still does not have the desired S_DAX access
> +            mode, either unmount and remount the filesystem, or close
> +            the file and use drop_caches.
> +
> + 6. It is expected that users who want to squeeze every last bit of performance
> +    out of the particular rough and tumble bits of their storage will also be
> +    exposed to the difficulties of what happens when the operating system can't
> +    totally virtualize those hardware capabilities.  DAX is such a feature.
> +    Basically, Formula-1 cars require a bit more care and feeding than your
> +    averaged Toyota minivan, as it were.

I think we can omit this last sentence for the formal documentation...
:)

> +
> +
> +Details
> +-------
> +
> +There are 2 per-file dax flags.  One is a physical inode setting (FS_XFLAG_DAX)
> +and the other a currently enabled state (S_DAX).
> +
> +FS_XFLAG_DAX is maintained, on disk, on individual inodes.  It is preserved
> +within the file system.  This 'physical' config setting can be set using an
> +ioctl and/or an application such as "xfs_io -c 'chattr [-+]x'".  Files and
> +directories automatically inherit FS_XFLAG_DAX from their parent directory
> +_when_ _created_.  Therefore, setting FS_XFLAG_DAX at directory creation time
> +can be used to set a default behavior for an entire sub-tree.  (Doing so on the
> +root directory acts to set a default for the entire file system.)
> +
> +To clarify inheritance here are 3 examples:
> +
> +Example A:
> +
> +mkdir -p a/b/c
> +xfs_io 'chattr +x' a
> +mkdir a/b/c/d
> +mkdir a/e
> +
> +	dax: a,e
> +	no dax: b,c,d
> +
> +Example B:
> +
> +mkdir a
> +xfs_io 'chattr +x' a
> +mkdir -p a/b/c/d
> +
> +	dax: a,b,c,d
> +	no dax:
> +
> +Example C:
> +
> +mkdir -p a/b/c
> +xfs_io 'chattr +x' c
> +mkdir a/b/c/d
> +
> +	dax: c,d
> +	no dax: a,b
> +
> +
> +The current enabled state (S_DAX) is set when a file inode is _loaded_ based on
> +the underlying media support, the value of FS_XFLAG_DAX, and the file systems
> +dax mount option setting.  See below.
> +
> +statx can be used to query S_DAX.  NOTE that a directory will never have S_DAX
> +set and therefore statx will always return false on directories.

"statx will never indicate that S_DAX is set on directories."

> +
> +NOTE: Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs
> +even if the underlying media does not support dax and/or the file system is
> +overridden with a mount option.
> +
> +
> +Overriding FS_XFLAG_DAX (dax= mount option)
> +-------------------------------------------
> +
> +There exists a dax mount option.  Using the mount option does not change the
> +physical configured state of individual files but overrides the S_DAX operating
> +state when inodes are loaded.
> +
> +Given underlying media support, the dax mount option is a tri-state option
> +(never, always, inode) with the following meanings:
> +
> +   "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX"
> +   "-o dax=always" means "always set S_DAX, ignore FS_XFLAG_DAX"
> +        "-o dax" by itself means "dax=always" to remain compatible with older
> +	         kernels
> +   "-o dax=inode" means "follow FS_XFLAG_DAX"
> +
> +The default state is 'inode'.  Given underlying media support, the following
> +algorithm is used to determine the effective mode of the file S_DAX on a
> +capable device.
> +
> +	S_DAX = FS_XFLAG_DAX;
> +
> +	if (dax_mount == "always")
> +		S_DAX = true;
> +	else if (dax_mount == "off"
> +		S_DAX = false;
> +
> +To reiterate: Setting, and inheritance, continues to affect FS_XFLAG_DAX even
> +while the file system is mounted with a dax override.  However, file enabled
> +state, S_DAX, will continue to be the overridden until the file system is
> +remounted with dax=inode.

"However, in-core inode state (S_DAX) will continue to be overridden
until the filesystem is remounted with dax=inode and the inode is
evicted."

...since we don't currently evict inodes just because a remount occurred.
:)

--D

>  
>  
>  Implementation Tips for Block Driver Writers
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 4/9] fs/xfs: Make DAX mount option a tri-state
  2020-04-13 15:46   ` Darrick J. Wong
@ 2020-04-13 19:28     ` Ira Weiny
  2020-04-14  6:24       ` Christoph Hellwig
  0 siblings, 1 reply; 47+ messages in thread
From: Ira Weiny @ 2020-04-13 19:28 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-kernel, Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Mon, Apr 13, 2020 at 08:46:19AM -0700, Darrick J. Wong wrote:
> On Sun, Apr 12, 2020 at 10:40:41PM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > As agreed upon[1].  We make the dax mount option a tri-state.  '-o dax'
> > continues to operate the same.  We add 'always', 'never', and 'inode'
> > (default).
> > 
> > [1] https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > 
> > ---
> > Changes from v6:
> > 	Use 2 flag bits rather than a field.
> > 	change iflag to inode
> > 
> > Changes from v5:
> > 	New Patch
> > ---
> >  fs/xfs/xfs_mount.h |  3 ++-
> >  fs/xfs/xfs_super.c | 44 ++++++++++++++++++++++++++++++++++++++++----
> >  2 files changed, 42 insertions(+), 5 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> > index 88ab09ed29e7..d581b990e59a 100644
> > --- a/fs/xfs/xfs_mount.h
> > +++ b/fs/xfs/xfs_mount.h
> > @@ -233,7 +233,8 @@ typedef struct xfs_mount {
> >  						   allocator */
> >  #define XFS_MOUNT_NOATTR2	(1ULL << 25)	/* disable use of attr2 format */
> >  
> > -#define XFS_MOUNT_DAX		(1ULL << 62)	/* TEST ONLY! */
> > +#define XFS_MOUNT_DAX		(1ULL << 62)
> > +#define XFS_MOUNT_NODAX		(1ULL << 63)
> >  
> >  /*
> >   * Max and min values for mount-option defined I/O
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index 2094386af8ac..d7bd8f5e00c9 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -47,6 +47,32 @@ static struct kset *xfs_kset;		/* top-level xfs sysfs dir */
> >  static struct xfs_kobj xfs_dbg_kobj;	/* global debug sysfs attrs */
> >  #endif
> >  
> > +enum {
> > +	XFS_DAX_INODE = 0,
> > +	XFS_DAX_ALWAYS = 1,
> > +	XFS_DAX_NEVER = 2,
> > +};
> > +
> > +static void xfs_mount_set_dax_mode(struct xfs_mount *mp, u32 val)
> > +{
> > +	if (val == XFS_DAX_INODE) {
> > +		mp->m_flags &= ~(XFS_MOUNT_DAX | XFS_MOUNT_NODAX);
> > +	} else if (val == XFS_DAX_ALWAYS) {
> > +		mp->m_flags &= ~XFS_MOUNT_NODAX;
> > +		mp->m_flags |= XFS_MOUNT_DAX;
> > +	} else if (val == XFS_DAX_NEVER) {
> > +		mp->m_flags &= ~XFS_MOUNT_DAX;
> > +		mp->m_flags |= XFS_MOUNT_NODAX;
> > +	}
> > +}
> > +
> > +static const struct constant_table dax_param_enums[] = {
> > +	{"inode",	XFS_DAX_INODE },
> > +	{"always",	XFS_DAX_ALWAYS },
> > +	{"never",	XFS_DAX_NEVER },
> > +	{}
> 
> I think that the dax_param_enums table (and the unnamed enum defining
> XFS_DAX_*) probably ought to be part of the VFS so that you don't have
> to duplicate these two pieces whenever it's time to bring ext4 in line
> with XFS.
> 
> That probably doesn't need to be done right away, though...

Ext4 has a very different param parsing mechanism which I've barely learned.
I'm not really seeing how to use the enum strategy so I've just used a string
option.  But I'm open to being corrected.

I am close to having the series working and hope to have that set (which builds
on this one) out for review soon (today?).

> 
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks,
Ira

> 
> --D
> 
> 
> > +};
> > +
> >  /*
> >   * Table driven mount option parser.
> >   */
> > @@ -59,7 +85,7 @@ enum {
> >  	Opt_filestreams, Opt_quota, Opt_noquota, Opt_usrquota, Opt_grpquota,
> >  	Opt_prjquota, Opt_uquota, Opt_gquota, Opt_pquota,
> >  	Opt_uqnoenforce, Opt_gqnoenforce, Opt_pqnoenforce, Opt_qnoenforce,
> > -	Opt_discard, Opt_nodiscard, Opt_dax,
> > +	Opt_discard, Opt_nodiscard, Opt_dax, Opt_dax_enum,
> >  };
> >  
> >  static const struct fs_parameter_spec xfs_fs_parameters[] = {
> > @@ -103,6 +129,7 @@ static const struct fs_parameter_spec xfs_fs_parameters[] = {
> >  	fsparam_flag("discard",		Opt_discard),
> >  	fsparam_flag("nodiscard",	Opt_nodiscard),
> >  	fsparam_flag("dax",		Opt_dax),
> > +	fsparam_enum("dax",		Opt_dax_enum, dax_param_enums),
> >  	{}
> >  };
> >  
> > @@ -129,7 +156,6 @@ xfs_fs_show_options(
> >  		{ XFS_MOUNT_GRPID,		",grpid" },
> >  		{ XFS_MOUNT_DISCARD,		",discard" },
> >  		{ XFS_MOUNT_LARGEIO,		",largeio" },
> > -		{ XFS_MOUNT_DAX,		",dax" },
> >  		{ 0, NULL }
> >  	};
> >  	struct xfs_mount	*mp = XFS_M(root->d_sb);
> > @@ -185,6 +211,13 @@ xfs_fs_show_options(
> >  	if (!(mp->m_qflags & XFS_ALL_QUOTA_ACCT))
> >  		seq_puts(m, ",noquota");
> >  
> > +	if (mp->m_flags & XFS_MOUNT_DAX)
> > +		seq_puts(m, ",dax=always");
> > +	else if (mp->m_flags & XFS_MOUNT_NODAX)
> > +		seq_puts(m, ",dax=never");
> > +	else
> > +		seq_puts(m, ",dax=inode");
> > +
> >  	return 0;
> >  }
> >  
> > @@ -1244,7 +1277,10 @@ xfs_fc_parse_param(
> >  		return 0;
> >  #ifdef CONFIG_FS_DAX
> >  	case Opt_dax:
> > -		mp->m_flags |= XFS_MOUNT_DAX;
> > +		xfs_mount_set_dax_mode(mp, XFS_DAX_ALWAYS);
> > +		return 0;
> > +	case Opt_dax_enum:
> > +		xfs_mount_set_dax_mode(mp, result.uint_32);
> >  		return 0;
> >  #endif
> >  	default:
> > @@ -1451,7 +1487,7 @@ xfs_fc_fill_super(
> >  		if (!rtdev_is_dax && !datadev_is_dax) {
> >  			xfs_alert(mp,
> >  			"DAX unsupported by block device. Turning off DAX.");
> > -			mp->m_flags &= ~XFS_MOUNT_DAX;
> > +			xfs_mount_set_dax_mode(mp, XFS_DAX_NEVER);
> >  		}
> >  		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> >  			xfs_alert(mp,
> > -- 
> > 2.25.1
> > 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 5/9] fs/xfs: Create function xfs_inode_enable_dax()
  2020-04-13 15:52   ` Darrick J. Wong
@ 2020-04-13 19:39     ` Ira Weiny
  0 siblings, 0 replies; 47+ messages in thread
From: Ira Weiny @ 2020-04-13 19:39 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-kernel, Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Mon, Apr 13, 2020 at 08:52:51AM -0700, Darrick J. Wong wrote:
> On Sun, Apr 12, 2020 at 10:40:42PM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > xfs_inode_supports_dax() should reflect if the inode can support DAX not
> > that it is enabled for DAX.
> > 
> > Change the use of xfs_inode_supports_dax() to reflect only if the inode
> > and underlying storage support dax.
> > 
> > Add a new function xfs_inode_enable_dax() which reflects if the inode
> > should be enabled for DAX.
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > 
> > ---
> > Changes from v6:
> > 	Change enable checks to be sequential logic.
> > 	Update for 2 bit tri-state option.
> > 	Make 'static' consistent.
> > 	Don't set S_DAX if !CONFIG_FS_DAX
> > 
> > Changes from v5:
> > 	Update to reflect the new tri-state mount option
> > 
> > Changes from v3:
> > 	Update functions and names to be more clear
> > 	Update commit message
> > 	Merge with
> > 		'fs/xfs: Clean up DAX support check'
> > 		don't allow IS_DAX() on a directory
> > 		use STATIC macro for static
> > 		make xfs_inode_supports_dax() static
> > ---
> >  fs/xfs/xfs_iops.c | 34 +++++++++++++++++++++++++++++-----
> >  1 file changed, 29 insertions(+), 5 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > index 81f2f93caec0..37bd15b55878 100644
> > --- a/fs/xfs/xfs_iops.c
> > +++ b/fs/xfs/xfs_iops.c
> > @@ -1244,12 +1244,11 @@ xfs_inode_supports_dax(
> >  	struct xfs_mount	*mp = ip->i_mount;
> >  
> >  	/* Only supported on non-reflinked files. */
> > -	if (!S_ISREG(VFS_I(ip)->i_mode) || xfs_is_reflink_inode(ip))
> > +	if (xfs_is_reflink_inode(ip))
> >  		return false;
> >  
> > -	/* DAX mount option or DAX iflag must be set. */
> > -	if (!(mp->m_flags & XFS_MOUNT_DAX) &&
> > -	    !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
> > +	/* Only supported on regular files. */
> > +	if (!S_ISREG(VFS_I(ip)->i_mode))
> >  		return false;
> 
> Why separate the !S_ISREG and the is_reflink_inode checks?
> 
> The part about "Change the use of xfs_inode_supports_dax() to reflect
> only if the inode and underlying storage support dax" would be a lot
> more straightforward if this hunk only removed the DIFLAG2_DAX check.

Yes I could see that.  But for me the separate checks were more clear.  FWIW,
Dave requested a similar pattern for xfs_inode_enable_dax()[*] and I think I
agree with him.

So I'm inclined to keep the checks like this.

Thanks for the reviews!
Ira

[*] https://lore.kernel.org/lkml/20200408000533.GG24067@dread.disaster.area/

> 
> The rest of the patch looks ok.
> 
> --D
> 
> >  
> >  	/* Block size must match page size */
> > @@ -1260,6 +1259,31 @@ xfs_inode_supports_dax(
> >  	return xfs_inode_buftarg(ip)->bt_daxdev != NULL;
> >  }
> >  
> > +#ifdef CONFIG_FS_DAX
> > +static bool
> > +xfs_inode_enable_dax(
> > +	struct xfs_inode *ip)
> > +{
> > +	if (ip->i_mount->m_flags & XFS_MOUNT_NODAX)
> > +		return false;
> > +	if (!xfs_inode_supports_dax(ip))
> > +		return false;
> > +	if (ip->i_mount->m_flags & XFS_MOUNT_DAX)
> > +		return true;
> > +	if (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)
> > +		return true;
> > +	return false;
> > +}
> > +#else /* !CONFIG_FS_DAX */
> > +static bool
> > +xfs_inode_enable_dax(
> > +	struct xfs_inode *ip)
> > +{
> > +	return false;
> > +}
> > +#endif /* CONFIG_FS_DAX */
> > +
> > +
> >  STATIC void
> >  xfs_diflags_to_iflags(
> >  	struct inode		*inode,
> > @@ -1278,7 +1302,7 @@ xfs_diflags_to_iflags(
> >  		inode->i_flags |= S_SYNC;
> >  	if (flags & XFS_DIFLAG_NOATIME)
> >  		inode->i_flags |= S_NOATIME;
> > -	if (xfs_inode_supports_dax(ip))
> > +	if (xfs_inode_enable_dax(ip))
> >  		inode->i_flags |= S_DAX;
> >  }
> >  
> > -- 
> > 2.25.1
> > 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 7/9] fs: Define I_DONTCACNE in VFS layer
  2020-04-13 16:09   ` Darrick J. Wong
  2020-04-13 16:13     ` Darrick J. Wong
@ 2020-04-13 19:44     ` Ira Weiny
  1 sibling, 0 replies; 47+ messages in thread
From: Ira Weiny @ 2020-04-13 19:44 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-kernel, Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Mon, Apr 13, 2020 at 09:09:29AM -0700, Darrick J. Wong wrote:
> > Subject: [PATCH V7 7/9] fs: Define I_DONTCACNE in VFS layer
> 
> CACNE -> CACHE.
> 
> On Sun, Apr 12, 2020 at 10:40:44PM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > DAX effective mode changes (setting of S_DAX) require inode eviction.
> > 
> > Define a flag which can be set to inform the VFS layer that inodes
> > should not be cached.  This will expedite the eviction of those nodes
> > requiring reload.
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > ---
> >  include/linux/fs.h | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index a818ced22961..e2db71d150c3 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -2151,6 +2151,8 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
> >   *
> >   * I_CREATING		New object's inode in the middle of setting up.
> >   *
> > + * I_DONTCACHE		Do not cache the inode
> 
> "Do not cache" is a bit vague, how about:
> 
> "Evict the inode when the last reference is dropped.
> Do not put it on the LRU list."
> 
> Also, shouldn't xfs_ioctl_setattr be setting I_DONTCACHE if someone
> changes FS_XFLAG_DAX (and there are no mount option overrides)?  I don't
> see any user of I_DONTCACHE in this series.
> 
> (Also also, please convert XFS_IDONTCACHE, since it's a straightforward
> conversion...)

AFAICT XFS_IDONTCACHE is not exactly the same because it can be cleared if
someone access' the inode before it is evicted.  Dave mentioned that we could
probably do this but I was not 100% sure if that would change some other
behavior.

I'm happy to remove XFS_IDONTCACHE if we are sure that it will not regress
something in the bulkstat code?  (I don't know exactly what bulkstat does so
I'm not expert here...  Was just doing what seemed safest)

Ira

> 
> --D
> 
> > + *
> >   * Q: What is the difference between I_WILL_FREE and I_FREEING?
> >   */
> >  #define I_DIRTY_SYNC		(1 << 0)
> > @@ -2173,6 +2175,7 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
> >  #define I_WB_SWITCH		(1 << 13)
> >  #define I_OVL_INUSE		(1 << 14)
> >  #define I_CREATING		(1 << 15)
> > +#define I_DONTCACHE		(1 << 16)
> >  
> >  #define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
> >  #define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
> > @@ -3042,7 +3045,8 @@ extern int inode_needs_sync(struct inode *inode);
> >  extern int generic_delete_inode(struct inode *inode);
> >  static inline int generic_drop_inode(struct inode *inode)
> >  {
> > -	return !inode->i_nlink || inode_unhashed(inode);
> > +	return !inode->i_nlink || inode_unhashed(inode) ||
> > +		(inode->i_state & I_DONTCACHE);
> >  }
> >  
> >  extern struct inode *ilookup5_nowait(struct super_block *sb,
> > -- 
> > 2.25.1
> > 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 8/9] fs/xfs: Change xfs_ioctl_setattr_dax_invalidate()
  2020-04-13 16:12   ` Darrick J. Wong
@ 2020-04-13 19:46     ` Ira Weiny
  0 siblings, 0 replies; 47+ messages in thread
From: Ira Weiny @ 2020-04-13 19:46 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-kernel, Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Mon, Apr 13, 2020 at 09:12:40AM -0700, Darrick J. Wong wrote:
> On Sun, Apr 12, 2020 at 10:40:45PM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > We only support changing FS_XFLAG_DAX on directories.  Files get their
> > flag from the parent directory on creation only.  So no data
> > invalidation needs to happen.
> > 
> > Alter the xfs_ioctl_setattr_dax_invalidate() to be
> > xfs_ioctl_setattr_dax_validate().  xfs_ioctl_setattr_dax_validate() now
> > validates that any FS_XFLAG_DAX change is ok.
> > 
> > This also allows use to remove the join_flags logic.
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > 
> > ---
> > Changes from v6:
> > 	Fix completely broken implementation and update commit message.
> > 	Use the new VFS layer I_DONTCACHE to facilitate inode eviction
> > 	and S_DAX changing on drop_caches
> > 
> > Changes from v5:
> > 	New patch
> > ---
> >  fs/xfs/xfs_ioctl.c | 102 +++++++--------------------------------------
> >  1 file changed, 16 insertions(+), 86 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > index c6cd92ef4a05..ba42a5fb5b05 100644
> > --- a/fs/xfs/xfs_ioctl.c
> > +++ b/fs/xfs/xfs_ioctl.c
> > @@ -1145,63 +1145,23 @@ xfs_ioctl_setattr_xflags(
> >  }
> >  
> >  /*
> > - * If we are changing DAX flags, we have to ensure the file is clean and any
> > - * cached objects in the address space are invalidated and removed. This
> > - * requires us to lock out other IO and page faults similar to a truncate
> > - * operation. The locks need to be held until the transaction has been committed
> > - * so that the cache invalidation is atomic with respect to the DAX flag
> > - * manipulation.
> > + * Mark inodes with a changing FS_XFLAG_DAX, I_DONTCACHE
> >   */
> > -static int
> > +static void
> >  xfs_ioctl_setattr_dax_invalidate(
> >  	struct xfs_inode	*ip,
> > -	struct fsxattr		*fa,
> > -	int			*join_flags)
> > +	struct fsxattr		*fa)
> >  {
> > -	struct inode		*inode = VFS_I(ip);
> > -	struct super_block	*sb = inode->i_sb;
> > -	int			error;
> > -
> > -	*join_flags = 0;
> > -
> > -	/*
> > -	 * It is only valid to set the DAX flag on regular files and
> > -	 * directories on filesystems where the block size is equal to the page
> > -	 * size. On directories it serves as an inherited hint so we don't
> > -	 * have to check the device for dax support or flush pagecache.
> > -	 */
> > -	if (fa->fsx_xflags & FS_XFLAG_DAX) {
> > -		struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
> > -
> > -		if (!bdev_dax_supported(target->bt_bdev, sb->s_blocksize))
> > -			return -EINVAL;
> > -	}
> > -
> > -	/* If the DAX state is not changing, we have nothing to do here. */
> > -	if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
> > -		return 0;
> > -	if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode))
> > -		return 0;
> > +	struct inode            *inode = VFS_I(ip);
> >  
> >  	if (S_ISDIR(inode->i_mode))
> > -		return 0;
> > -
> > -	/* lock, flush and invalidate mapping in preparation for flag change */
> > -	xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> > -	error = filemap_write_and_wait(inode->i_mapping);
> > -	if (error)
> > -		goto out_unlock;
> > -	error = invalidate_inode_pages2(inode->i_mapping);
> > -	if (error)
> > -		goto out_unlock;
> > -
> > -	*join_flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL;
> > -	return 0;
> > -
> > -out_unlock:
> > -	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> > -	return error;
> > +		return;
> 
> We also need a check up here to skip the I_DONTCACHE setting if the
> admin has set a mount option to override the inode flag.

Yes that would be more optimal!

Thanks,
Ira

> 
> The rest looks good to me.
> 
> --D
> 
> > +	if (((fa->fsx_xflags & FS_XFLAG_DAX) &&
> > +	    !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) ||
> > +	    (!(fa->fsx_xflags & FS_XFLAG_DAX) &&
> > +	     (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)))
> > +		inode->i_state |= I_DONTCACHE;
> >  }
> >  
> >  /*
> > @@ -1209,17 +1169,10 @@ xfs_ioctl_setattr_dax_invalidate(
> >   * have permission to do so. On success, return a clean transaction and the
> >   * inode locked exclusively ready for further operation specific checks. On
> >   * failure, return an error without modifying or locking the inode.
> > - *
> > - * The inode might already be IO locked on call. If this is the case, it is
> > - * indicated in @join_flags and we take full responsibility for ensuring they
> > - * are unlocked from now on. Hence if we have an error here, we still have to
> > - * unlock them. Otherwise, once they are joined to the transaction, they will
> > - * be unlocked on commit/cancel.
> >   */
> >  static struct xfs_trans *
> >  xfs_ioctl_setattr_get_trans(
> > -	struct xfs_inode	*ip,
> > -	int			join_flags)
> > +	struct xfs_inode	*ip)
> >  {
> >  	struct xfs_mount	*mp = ip->i_mount;
> >  	struct xfs_trans	*tp;
> > @@ -1236,8 +1189,7 @@ xfs_ioctl_setattr_get_trans(
> >  		goto out_unlock;
> >  
> >  	xfs_ilock(ip, XFS_ILOCK_EXCL);
> > -	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | join_flags);
> > -	join_flags = 0;
> > +	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> >  
> >  	/*
> >  	 * CAP_FOWNER overrides the following restrictions:
> > @@ -1258,8 +1210,6 @@ xfs_ioctl_setattr_get_trans(
> >  out_cancel:
> >  	xfs_trans_cancel(tp);
> >  out_unlock:
> > -	if (join_flags)
> > -		xfs_iunlock(ip, join_flags);
> >  	return ERR_PTR(error);
> >  }
> >  
> > @@ -1386,7 +1336,6 @@ xfs_ioctl_setattr(
> >  	struct xfs_dquot	*pdqp = NULL;
> >  	struct xfs_dquot	*olddquot = NULL;
> >  	int			code;
> > -	int			join_flags = 0;
> >  
> >  	trace_xfs_ioctl_setattr(ip);
> >  
> > @@ -1410,18 +1359,9 @@ xfs_ioctl_setattr(
> >  			return code;
> >  	}
> >  
> > -	/*
> > -	 * Changing DAX config may require inode locking for mapping
> > -	 * invalidation. These need to be held all the way to transaction commit
> > -	 * or cancel time, so need to be passed through to
> > -	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
> > -	 * appropriately.
> > -	 */
> > -	code = xfs_ioctl_setattr_dax_invalidate(ip, fa, &join_flags);
> > -	if (code)
> > -		goto error_free_dquots;
> > +	xfs_ioctl_setattr_dax_invalidate(ip, fa);
> >  
> > -	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
> > +	tp = xfs_ioctl_setattr_get_trans(ip);
> >  	if (IS_ERR(tp)) {
> >  		code = PTR_ERR(tp);
> >  		goto error_free_dquots;
> > @@ -1552,7 +1492,6 @@ xfs_ioc_setxflags(
> >  	struct fsxattr		fa;
> >  	struct fsxattr		old_fa;
> >  	unsigned int		flags;
> > -	int			join_flags = 0;
> >  	int			error;
> >  
> >  	if (copy_from_user(&flags, arg, sizeof(flags)))
> > @@ -1569,18 +1508,9 @@ xfs_ioc_setxflags(
> >  	if (error)
> >  		return error;
> >  
> > -	/*
> > -	 * Changing DAX config may require inode locking for mapping
> > -	 * invalidation. These need to be held all the way to transaction commit
> > -	 * or cancel time, so need to be passed through to
> > -	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
> > -	 * appropriately.
> > -	 */
> > -	error = xfs_ioctl_setattr_dax_invalidate(ip, &fa, &join_flags);
> > -	if (error)
> > -		goto out_drop_write;
> > +	xfs_ioctl_setattr_dax_invalidate(ip, &fa);
> >  
> > -	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
> > +	tp = xfs_ioctl_setattr_get_trans(ip);
> >  	if (IS_ERR(tp)) {
> >  		error = PTR_ERR(tp);
> >  		goto out_drop_write;
> > -- 
> > 2.25.1
> > 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 6/9] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  2020-04-13 16:01   ` Darrick J. Wong
@ 2020-04-14  4:07     ` Ira Weiny
  0 siblings, 0 replies; 47+ messages in thread
From: Ira Weiny @ 2020-04-14  4:07 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-kernel, Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Mon, Apr 13, 2020 at 09:01:38AM -0700, Darrick J. Wong wrote:
> On Sun, Apr 12, 2020 at 10:40:43PM -0700, ira.weiny@intel.com wrote:

[snip]

> >  
> > -STATIC void
> > +void
> >  xfs_diflags_to_iflags(
> > -	struct inode		*inode,
> > -	struct xfs_inode	*ip)
> > +	struct xfs_inode	*ip,
> > +	bool init)
> >  {
> > -	uint16_t		flags = ip->i_d.di_flags;
> > -
> > -	inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC |
> > -			    S_NOATIME | S_DAX);
> > -
> > -	if (flags & XFS_DIFLAG_IMMUTABLE)
> > -		inode->i_flags |= S_IMMUTABLE;
> > -	if (flags & XFS_DIFLAG_APPEND)
> > -		inode->i_flags |= S_APPEND;
> > -	if (flags & XFS_DIFLAG_SYNC)
> > -		inode->i_flags |= S_SYNC;
> > -	if (flags & XFS_DIFLAG_NOATIME)
> > -		inode->i_flags |= S_NOATIME;
> > -	if (xfs_inode_enable_dax(ip))
> > -		inode->i_flags |= S_DAX;
> > +	struct inode            *inode = VFS_I(ip);
> > +	unsigned int            xflags = xfs_ip2xflags(ip);
> > +	unsigned int            flags = 0;
> > +
> > +	ASSERT(!(IS_DAX(inode) && init));
> > +
> > +	if (xflags & FS_XFLAG_IMMUTABLE)
> > +		flags |= S_IMMUTABLE;
> > +	if (xflags & FS_XFLAG_APPEND)
> > +		flags |= S_APPEND;
> > +	if (xflags & FS_XFLAG_SYNC)
> > +		flags |= S_SYNC;
> > +	if (xflags & FS_XFLAG_NOATIME)
> > +		flags |= S_NOATIME;
> > +	if (init && xfs_inode_enable_dax(ip))
> > +		flags |= S_DAX;
> > +
> > +	inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC | S_NOATIME);
> 
> I noticed that S_DAX drops out of the mask out operation here, which of
> course resulted in an eyebrow-raise because the other four flags are
> always set to whatever we just computed. :)
> 
> Then I realized that yes, this is intentional since we can't change
> S_DAX on the fly, and that S_DAX is never set i_flags on an inode that's
> being initialized so we don't need to mask off S_DAX ever.
> 
> Could we add a comment here to remind the reader that S_DAX is a bit
> special?
> 
> /*
>  * S_DAX can only be set during inode initialization and is never set by
>  * the VFS, so we cannot mask off S_DAX in i_flags.
>  */
> 
> With that added,
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

Added that comment.

Thanks for the review!
Ira

> 
> --D

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 9/9] Documentation/dax: Update Usage section
  2020-04-13 16:19   ` Darrick J. Wong
@ 2020-04-14  4:38     ` Ira Weiny
  2020-04-14  5:12       ` Dan Williams
  0 siblings, 1 reply; 47+ messages in thread
From: Ira Weiny @ 2020-04-14  4:38 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-kernel, Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Mon, Apr 13, 2020 at 09:19:12AM -0700, Darrick J. Wong wrote:
> On Sun, Apr 12, 2020 at 10:40:46PM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > Update the Usage section to reflect the new individual dax selection
> > functionality.
> 
> Yum. :)
> 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > 
> > ---
> > Changes from V6:
> > 	Update to allow setting FS_XFLAG_DAX any time.
> > 	Update with list of behaviors from Darrick
> > 	https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
> > 
> > Changes from V5:
> > 	Update to reflect the agreed upon semantics
> > 	https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> > ---
> >  Documentation/filesystems/dax.txt | 166 +++++++++++++++++++++++++++++-
> >  1 file changed, 163 insertions(+), 3 deletions(-)
> > 
> > diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> > index 679729442fd2..af14c1b330a9 100644
> > --- a/Documentation/filesystems/dax.txt
> > +++ b/Documentation/filesystems/dax.txt
> > @@ -17,11 +17,171 @@ For file mappings, the storage device is mapped directly into userspace.
> >  Usage
> >  -----
> >  
> > -If you have a block device which supports DAX, you can make a filesystem
> > +If you have a block device which supports DAX, you can make a file system
> >  on it as usual.  The DAX code currently only supports files with a block
> >  size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> > -size when creating the filesystem.  When mounting it, use the "-o dax"
> > -option on the command line or add 'dax' to the options in /etc/fstab.
> > +size when creating the file system.
> > +
> > +Currently 2 filesystems support DAX, ext4 and xfs.  Enabling DAX on them is
> > +different at this time.
> 
> I thought ext2 supports DAX?

Not that I know of?  Does it?

> 
> > +Enabling DAX on ext4
> > +--------------------
> > +
> > +When mounting the filesystem, use the "-o dax" option on the command line or
> > +add 'dax' to the options in /etc/fstab.
> > +
> > +
> > +Enabling DAX on xfs
> > +-------------------
> > +
> > +Summary
> > +-------
> > +
> > + 1. There exists an in-kernel access mode flag S_DAX that is set when
> > +    file accesses go directly to persistent memory, bypassing the page
> > +    cache.  Applications must call statx to discover the current S_DAX
> > +    state (STATX_ATTR_DAX).
> > +
> > + 2. There exists an advisory file inode flag FS_XFLAG_DAX that is
> > +    inherited from the parent directory FS_XFLAG_DAX inode flag at file
> > +    creation time.  This advisory flag can be set or cleared at any
> > +    time, but doing so does not immediately affect the S_DAX state.
> > +
> > +    Unless overridden by mount options (see (3)), if FS_XFLAG_DAX is set
> > +    and the fs is on pmem then it will enable S_DAX at inode load time;
> > +    if FS_XFLAG_DAX is not set, it will not enable S_DAX.
> > +
> > + 3. There exists a dax= mount option.
> > +
> > +    "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."
> > +
> > +    "-o dax=always" means "always set S_DAX (at least on pmem),
> > +                    and ignore FS_XFLAG_DAX."
> > +
> > +    "-o dax"        is an alias for "dax=always".
> > +
> > +    "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.
> > +
> > + 4. There exists an advisory directory inode flag FS_XFLAG_DAX that can
> > +    be set or cleared at any time.  The flag state is inherited by any files or
> > +    subdirectories when they are created within that directory.
> > +
> > + 5. Programs that require a specific file access mode (DAX or not DAX)
> > +    can do one of the following:
> > +
> > +    (a) Create files in directories that the FS_XFLAG_DAX flag set as
> > +        needed; or
> > +
> > +    (b) Have the administrator set an override via mount option; or
> > +
> > +    (c) Set or clear the file's FS_XFLAG_DAX flag as needed.  Programs
> > +        must then cause the kernel to evict the inode from memory.  This
> > +        can be done by:
> > +
> > +        i>  Closing the file and re-opening the file and using statx to
> > +            see if the fs has changed the S_DAX flag; and
> > +
> > +        ii> If the file still does not have the desired S_DAX access
> > +            mode, either unmount and remount the filesystem, or close
> > +            the file and use drop_caches.
> > +
> > + 6. It is expected that users who want to squeeze every last bit of performance
> > +    out of the particular rough and tumble bits of their storage will also be
> > +    exposed to the difficulties of what happens when the operating system can't
> > +    totally virtualize those hardware capabilities.  DAX is such a feature.
> > +    Basically, Formula-1 cars require a bit more care and feeding than your
> > +    averaged Toyota minivan, as it were.
> 
> I think we can omit this last sentence for the formal documentation...

Done.

> :)
> 
> > +
> > +
> > +Details
> > +-------
> > +
> > +There are 2 per-file dax flags.  One is a physical inode setting (FS_XFLAG_DAX)
> > +and the other a currently enabled state (S_DAX).
> > +
> > +FS_XFLAG_DAX is maintained, on disk, on individual inodes.  It is preserved
> > +within the file system.  This 'physical' config setting can be set using an
> > +ioctl and/or an application such as "xfs_io -c 'chattr [-+]x'".  Files and
> > +directories automatically inherit FS_XFLAG_DAX from their parent directory
> > +_when_ _created_.  Therefore, setting FS_XFLAG_DAX at directory creation time
> > +can be used to set a default behavior for an entire sub-tree.  (Doing so on the
> > +root directory acts to set a default for the entire file system.)
> > +
> > +To clarify inheritance here are 3 examples:
> > +
> > +Example A:
> > +
> > +mkdir -p a/b/c
> > +xfs_io 'chattr +x' a
> > +mkdir a/b/c/d
> > +mkdir a/e
> > +
> > +	dax: a,e
> > +	no dax: b,c,d
> > +
> > +Example B:
> > +
> > +mkdir a
> > +xfs_io 'chattr +x' a
> > +mkdir -p a/b/c/d
> > +
> > +	dax: a,b,c,d
> > +	no dax:
> > +
> > +Example C:
> > +
> > +mkdir -p a/b/c
> > +xfs_io 'chattr +x' c
> > +mkdir a/b/c/d
> > +
> > +	dax: c,d
> > +	no dax: a,b
> > +
> > +
> > +The current enabled state (S_DAX) is set when a file inode is _loaded_ based on
> > +the underlying media support, the value of FS_XFLAG_DAX, and the file systems
> > +dax mount option setting.  See below.
> > +
> > +statx can be used to query S_DAX.  NOTE that a directory will never have S_DAX
> > +set and therefore statx will always return false on directories.
> 
> "statx will never indicate that S_DAX is set on directories."

Done.

> 
> > +
> > +NOTE: Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs
> > +even if the underlying media does not support dax and/or the file system is
> > +overridden with a mount option.
> > +
> > +
> > +Overriding FS_XFLAG_DAX (dax= mount option)
> > +-------------------------------------------
> > +
> > +There exists a dax mount option.  Using the mount option does not change the
> > +physical configured state of individual files but overrides the S_DAX operating
> > +state when inodes are loaded.
> > +
> > +Given underlying media support, the dax mount option is a tri-state option
> > +(never, always, inode) with the following meanings:
> > +
> > +   "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX"
> > +   "-o dax=always" means "always set S_DAX, ignore FS_XFLAG_DAX"
> > +        "-o dax" by itself means "dax=always" to remain compatible with older
> > +	         kernels
> > +   "-o dax=inode" means "follow FS_XFLAG_DAX"
> > +
> > +The default state is 'inode'.  Given underlying media support, the following
> > +algorithm is used to determine the effective mode of the file S_DAX on a
> > +capable device.
> > +
> > +	S_DAX = FS_XFLAG_DAX;
> > +
> > +	if (dax_mount == "always")
> > +		S_DAX = true;
> > +	else if (dax_mount == "off"
> > +		S_DAX = false;
> > +
> > +To reiterate: Setting, and inheritance, continues to affect FS_XFLAG_DAX even
> > +while the file system is mounted with a dax override.  However, file enabled
> > +state, S_DAX, will continue to be the overridden until the file system is
> > +remounted with dax=inode.
> 
> "However, in-core inode state (S_DAX) will continue to be overridden
> until the filesystem is remounted with dax=inode and the inode is
> evicted."
> 
> ...since we don't currently evict inodes just because a remount occurred.
> :)

Done

Thanks again for the review!  :-D

Ira

> 
> --D
> 
> >  
> >  
> >  Implementation Tips for Block Driver Writers
> > -- 
> > 2.25.1
> > 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 9/9] Documentation/dax: Update Usage section
  2020-04-14  4:38     ` Ira Weiny
@ 2020-04-14  5:12       ` Dan Williams
  2020-04-14 19:48         ` Ira Weiny
  0 siblings, 1 reply; 47+ messages in thread
From: Dan Williams @ 2020-04-14  5:12 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Darrick J. Wong, Linux Kernel Mailing List, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

On Mon, Apr 13, 2020 at 9:38 PM Ira Weiny <ira.weiny@intel.com> wrote:
>
> On Mon, Apr 13, 2020 at 09:19:12AM -0700, Darrick J. Wong wrote:
> > On Sun, Apr 12, 2020 at 10:40:46PM -0700, ira.weiny@intel.com wrote:
> > > From: Ira Weiny <ira.weiny@intel.com>
> > >
> > > Update the Usage section to reflect the new individual dax selection
> > > functionality.
> >
> > Yum. :)
> >
> > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > >
> > > ---
> > > Changes from V6:
> > >     Update to allow setting FS_XFLAG_DAX any time.
> > >     Update with list of behaviors from Darrick
> > >     https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
> > >
> > > Changes from V5:
> > >     Update to reflect the agreed upon semantics
> > >     https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> > > ---
> > >  Documentation/filesystems/dax.txt | 166 +++++++++++++++++++++++++++++-
> > >  1 file changed, 163 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> > > index 679729442fd2..af14c1b330a9 100644
> > > --- a/Documentation/filesystems/dax.txt
> > > +++ b/Documentation/filesystems/dax.txt
> > > @@ -17,11 +17,171 @@ For file mappings, the storage device is mapped directly into userspace.
> > >  Usage
> > >  -----
> > >
> > > -If you have a block device which supports DAX, you can make a filesystem
> > > +If you have a block device which supports DAX, you can make a file system
> > >  on it as usual.  The DAX code currently only supports files with a block
> > >  size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> > > -size when creating the filesystem.  When mounting it, use the "-o dax"
> > > -option on the command line or add 'dax' to the options in /etc/fstab.
> > > +size when creating the file system.
> > > +
> > > +Currently 2 filesystems support DAX, ext4 and xfs.  Enabling DAX on them is
> > > +different at this time.
> >
> > I thought ext2 supports DAX?
>
> Not that I know of?  Does it?

Yes. Seemed like a good idea at the time, but in retrospect...

In fairness I believe this was also an olive branch to XIP users that
were transitioned to DAX, so they did not also need to transition
filesystems.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 9/9] Documentation/dax: Update Usage section
  2020-04-13  5:40 ` [PATCH V7 9/9] Documentation/dax: Update Usage section ira.weiny
  2020-04-13 16:19   ` Darrick J. Wong
@ 2020-04-14  5:21   ` Dan Williams
  2020-04-14 16:15     ` Darrick J. Wong
  1 sibling, 1 reply; 47+ messages in thread
From: Dan Williams @ 2020-04-14  5:21 UTC (permalink / raw)
  To: Weiny, Ira
  Cc: Linux Kernel Mailing List, Darrick J. Wong, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

On Sun, Apr 12, 2020 at 10:41 PM <ira.weiny@intel.com> wrote:
>
> From: Ira Weiny <ira.weiny@intel.com>
>
> Update the Usage section to reflect the new individual dax selection
> functionality.
>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
>
> ---
> Changes from V6:
>         Update to allow setting FS_XFLAG_DAX any time.
>         Update with list of behaviors from Darrick
>         https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
>
> Changes from V5:
>         Update to reflect the agreed upon semantics
>         https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> ---
>  Documentation/filesystems/dax.txt | 166 +++++++++++++++++++++++++++++-
>  1 file changed, 163 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> index 679729442fd2..af14c1b330a9 100644
> --- a/Documentation/filesystems/dax.txt
> +++ b/Documentation/filesystems/dax.txt
> @@ -17,11 +17,171 @@ For file mappings, the storage device is mapped directly into userspace.
>  Usage
>  -----
>
> -If you have a block device which supports DAX, you can make a filesystem
> +If you have a block device which supports DAX, you can make a file system
>  on it as usual.  The DAX code currently only supports files with a block
>  size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> -size when creating the filesystem.  When mounting it, use the "-o dax"
> -option on the command line or add 'dax' to the options in /etc/fstab.
> +size when creating the file system.
> +
> +Currently 2 filesystems support DAX, ext4 and xfs.  Enabling DAX on them is
> +different at this time.
> +
> +Enabling DAX on ext4
> +--------------------
> +
> +When mounting the filesystem, use the "-o dax" option on the command line or
> +add 'dax' to the options in /etc/fstab.
> +
> +
> +Enabling DAX on xfs
> +-------------------
> +
> +Summary
> +-------
> +
> + 1. There exists an in-kernel access mode flag S_DAX that is set when
> +    file accesses go directly to persistent memory, bypassing the page
> +    cache.

I had reserved some quibbling with this wording, but now that this is
being proposed as documentation I'll let my quibbling fly. "dax" may
imply, but does not require persistent memory nor does it necessarily
"bypass page cache". For example on configurations that support dax,
but turn off MAP_SYNC (like virtio-pmem), a software flush is
required. Instead, if we're going to define "dax" here I'd prefer it
be a #include of the man page definition that is careful (IIRC) to
only talk about semantics and not backend implementation details. In
other words, dax is to page-cache as direct-io is to page cache,
effectively not there, but dig a bit deeper and you may find it.

> Applications must call statx to discover the current S_DAX
> +    state (STATX_ATTR_DAX).
> +
> + 2. There exists an advisory file inode flag FS_XFLAG_DAX that is
> +    inherited from the parent directory FS_XFLAG_DAX inode flag at file
> +    creation time.  This advisory flag can be set or cleared at any
> +    time, but doing so does not immediately affect the S_DAX state.
> +
> +    Unless overridden by mount options (see (3)), if FS_XFLAG_DAX is set
> +    and the fs is on pmem then it will enable S_DAX at inode load time;
> +    if FS_XFLAG_DAX is not set, it will not enable S_DAX.
> +
> + 3. There exists a dax= mount option.
> +
> +    "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."
> +
> +    "-o dax=always" means "always set S_DAX (at least on pmem),
> +                    and ignore FS_XFLAG_DAX."
> +
> +    "-o dax"        is an alias for "dax=always".
> +
> +    "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.
> +
> + 4. There exists an advisory directory inode flag FS_XFLAG_DAX that can
> +    be set or cleared at any time.  The flag state is inherited by any files or
> +    subdirectories when they are created within that directory.
> +
> + 5. Programs that require a specific file access mode (DAX or not DAX)
> +    can do one of the following:
> +
> +    (a) Create files in directories that the FS_XFLAG_DAX flag set as
> +        needed; or
> +
> +    (b) Have the administrator set an override via mount option; or
> +
> +    (c) Set or clear the file's FS_XFLAG_DAX flag as needed.  Programs
> +        must then cause the kernel to evict the inode from memory.  This
> +        can be done by:
> +
> +        i>  Closing the file and re-opening the file and using statx to
> +            see if the fs has changed the S_DAX flag; and
> +
> +        ii> If the file still does not have the desired S_DAX access
> +            mode, either unmount and remount the filesystem, or close
> +            the file and use drop_caches.
> +
> + 6. It is expected that users who want to squeeze every last bit of performance
> +    out of the particular rough and tumble bits of their storage will also be
> +    exposed to the difficulties of what happens when the operating system can't
> +    totally virtualize those hardware capabilities.  DAX is such a feature.
> +    Basically, Formula-1 cars require a bit more care and feeding than your
> +    averaged Toyota minivan, as it were.
> +
> +
> +Details
> +-------
> +
> +There are 2 per-file dax flags.  One is a physical inode setting (FS_XFLAG_DAX)
> +and the other a currently enabled state (S_DAX).
> +
> +FS_XFLAG_DAX is maintained, on disk, on individual inodes.  It is preserved
> +within the file system.  This 'physical' config setting can be set using an
> +ioctl and/or an application such as "xfs_io -c 'chattr [-+]x'".  Files and
> +directories automatically inherit FS_XFLAG_DAX from their parent directory
> +_when_ _created_.  Therefore, setting FS_XFLAG_DAX at directory creation time
> +can be used to set a default behavior for an entire sub-tree.  (Doing so on the
> +root directory acts to set a default for the entire file system.)
> +
> +To clarify inheritance here are 3 examples:
> +
> +Example A:
> +
> +mkdir -p a/b/c
> +xfs_io 'chattr +x' a
> +mkdir a/b/c/d
> +mkdir a/e
> +
> +       dax: a,e
> +       no dax: b,c,d
> +
> +Example B:
> +
> +mkdir a
> +xfs_io 'chattr +x' a
> +mkdir -p a/b/c/d
> +
> +       dax: a,b,c,d
> +       no dax:
> +
> +Example C:
> +
> +mkdir -p a/b/c
> +xfs_io 'chattr +x' c
> +mkdir a/b/c/d
> +
> +       dax: c,d
> +       no dax: a,b
> +
> +
> +The current enabled state (S_DAX) is set when a file inode is _loaded_ based on
> +the underlying media support, the value of FS_XFLAG_DAX, and the file systems
> +dax mount option setting.  See below.
> +
> +statx can be used to query S_DAX.  NOTE that a directory will never have S_DAX
> +set and therefore statx will always return false on directories.
> +
> +NOTE: Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs
> +even if the underlying media does not support dax and/or the file system is
> +overridden with a mount option.
> +
> +
> +Overriding FS_XFLAG_DAX (dax= mount option)
> +-------------------------------------------
> +
> +There exists a dax mount option.  Using the mount option does not change the
> +physical configured state of individual files but overrides the S_DAX operating
> +state when inodes are loaded.
> +
> +Given underlying media support, the dax mount option is a tri-state option
> +(never, always, inode) with the following meanings:
> +
> +   "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX"
> +   "-o dax=always" means "always set S_DAX, ignore FS_XFLAG_DAX"
> +        "-o dax" by itself means "dax=always" to remain compatible with older
> +                kernels
> +   "-o dax=inode" means "follow FS_XFLAG_DAX"
> +
> +The default state is 'inode'.  Given underlying media support, the following
> +algorithm is used to determine the effective mode of the file S_DAX on a
> +capable device.
> +
> +       S_DAX = FS_XFLAG_DAX;
> +
> +       if (dax_mount == "always")
> +               S_DAX = true;
> +       else if (dax_mount == "off"
> +               S_DAX = false;
> +
> +To reiterate: Setting, and inheritance, continues to affect FS_XFLAG_DAX even
> +while the file system is mounted with a dax override.  However, file enabled
> +state, S_DAX, will continue to be the overridden until the file system is
> +remounted with dax=inode.
>
>
>  Implementation Tips for Block Driver Writers
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 2/9] fs: Remove unneeded IS_DAX() check in io_is_direct()
  2020-04-13  5:40 ` [PATCH V7 2/9] fs: Remove unneeded IS_DAX() check in io_is_direct() ira.weiny
@ 2020-04-14  6:22   ` Christoph Hellwig
  0 siblings, 0 replies; 47+ messages in thread
From: Christoph Hellwig @ 2020-04-14  6:22 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, Darrick J. Wong, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

On Sun, Apr 12, 2020 at 10:40:39PM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> Remove the check because DAX now has it's own read/write methods and
> file systems which support DAX check IS_DAX() prior to IOCB_DIRECT on
> their own.  Therefore, it does not matter if the file state is DAX when
> the iocb flags are created.
> 
> Also remove io_is_direct() as it is just a simple flag check.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 3/9] fs/stat: Define DAX statx attribute
  2020-04-13  5:40 ` [PATCH V7 3/9] fs/stat: Define DAX statx attribute ira.weiny
@ 2020-04-14  6:23   ` Christoph Hellwig
  2020-04-14 20:39     ` Ira Weiny
  0 siblings, 1 reply; 47+ messages in thread
From: Christoph Hellwig @ 2020-04-14  6:23 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, Darrick J. Wong, Dave Chinner, Jan Kara,
	Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jeff Moyer, linux-ext4, linux-xfs,
	linux-fsdevel

On Sun, Apr 12, 2020 at 10:40:40PM -0700, ira.weiny@intel.com wrote:
> STATX_ATTR_DAX
> 
> 	The file is in the DAX (cpu direct access) state.  DAX state
> 	attempts to minimize software cache effects for both I/O and
> 	memory mappings of this file.  It requires a file system which
> 	has been configured to support DAX.

Can we remove the misleading DAX name?  Something like
STATX_ATTR_DIRECT_LOAD_STORE?

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 4/9] fs/xfs: Make DAX mount option a tri-state
  2020-04-13 19:28     ` Ira Weiny
@ 2020-04-14  6:24       ` Christoph Hellwig
  0 siblings, 0 replies; 47+ messages in thread
From: Christoph Hellwig @ 2020-04-14  6:24 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Darrick J. Wong, linux-kernel, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

On Mon, Apr 13, 2020 at 12:28:11PM -0700, Ira Weiny wrote:
> > I think that the dax_param_enums table (and the unnamed enum defining
> > XFS_DAX_*) probably ought to be part of the VFS so that you don't have
> > to duplicate these two pieces whenever it's time to bring ext4 in line
> > with XFS.
> > 
> > That probably doesn't need to be done right away, though...
> 
> Ext4 has a very different param parsing mechanism which I've barely learned.
> I'm not really seeing how to use the enum strategy so I've just used a string
> option.  But I'm open to being corrected.
> 
> I am close to having the series working and hope to have that set (which builds
> on this one) out for review soon (today?).

ext4 still uses the legacy mount option parsing that XFS used until
recently.  It needs to be switched over to the new mount API anyway.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 4/9] fs/xfs: Make DAX mount option a tri-state
  2020-04-13  5:40 ` [PATCH V7 4/9] fs/xfs: Make DAX mount option a tri-state ira.weiny
  2020-04-13 15:46   ` Darrick J. Wong
@ 2020-04-14  6:25   ` Christoph Hellwig
  2020-04-14 20:34     ` Ira Weiny
  1 sibling, 1 reply; 47+ messages in thread
From: Christoph Hellwig @ 2020-04-14  6:25 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, Darrick J. Wong, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

On Sun, Apr 12, 2020 at 10:40:41PM -0700, ira.weiny@intel.com wrote:
> +#define XFS_MOUNT_DAX		(1ULL << 62)
> +#define XFS_MOUNT_NODAX		(1ULL << 63)

Replace this with XFS_MOUNT_DAX_ALWAYS and XFS_MOUNT_DAX_NEVER?

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 5/9] fs/xfs: Create function xfs_inode_enable_dax()
  2020-04-13  5:40 ` [PATCH V7 5/9] fs/xfs: Create function xfs_inode_enable_dax() ira.weiny
  2020-04-13 15:52   ` Darrick J. Wong
@ 2020-04-14  6:27   ` Christoph Hellwig
  2020-04-14 20:43     ` Ira Weiny
  1 sibling, 1 reply; 47+ messages in thread
From: Christoph Hellwig @ 2020-04-14  6:27 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, Darrick J. Wong, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

On Sun, Apr 12, 2020 at 10:40:42PM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> xfs_inode_supports_dax() should reflect if the inode can support DAX not
> that it is enabled for DAX.
> 
> Change the use of xfs_inode_supports_dax() to reflect only if the inode
> and underlying storage support dax.
> 
> Add a new function xfs_inode_enable_dax() which reflects if the inode
> should be enabled for DAX.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> ---
> Changes from v6:
> 	Change enable checks to be sequential logic.
> 	Update for 2 bit tri-state option.
> 	Make 'static' consistent.
> 	Don't set S_DAX if !CONFIG_FS_DAX
> 
> Changes from v5:
> 	Update to reflect the new tri-state mount option
> 
> Changes from v3:
> 	Update functions and names to be more clear
> 	Update commit message
> 	Merge with
> 		'fs/xfs: Clean up DAX support check'
> 		don't allow IS_DAX() on a directory
> 		use STATIC macro for static
> 		make xfs_inode_supports_dax() static
> ---
>  fs/xfs/xfs_iops.c | 34 +++++++++++++++++++++++++++++-----
>  1 file changed, 29 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 81f2f93caec0..37bd15b55878 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -1244,12 +1244,11 @@ xfs_inode_supports_dax(
>  	struct xfs_mount	*mp = ip->i_mount;
>  
>  	/* Only supported on non-reflinked files. */
> -	if (!S_ISREG(VFS_I(ip)->i_mode) || xfs_is_reflink_inode(ip))
> +	if (xfs_is_reflink_inode(ip))
>  		return false;
>  
> -	/* DAX mount option or DAX iflag must be set. */
> -	if (!(mp->m_flags & XFS_MOUNT_DAX) &&
> -	    !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
> +	/* Only supported on regular files. */
> +	if (!S_ISREG(VFS_I(ip)->i_mode))
>  		return false;

To me it would make sense to check S_ISREG before reflink, as it seems
more logical.

> +#ifdef CONFIG_FS_DAX
> +static bool
> +xfs_inode_enable_dax(
> +	struct xfs_inode *ip)
> +{
> +	if (ip->i_mount->m_flags & XFS_MOUNT_NODAX)
> +		return false;
> +	if (!xfs_inode_supports_dax(ip))
> +		return false;
> +	if (ip->i_mount->m_flags & XFS_MOUNT_DAX)
> +		return true;
> +	if (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)
> +		return true;
> +	return false;
> +}
> +#else /* !CONFIG_FS_DAX */
> +static bool
> +xfs_inode_enable_dax(
> +	struct xfs_inode *ip)
> +{
> +	return false;
> +}
> +#endif /* CONFIG_FS_DAX */

Just throw in a

	if (!IS_ENABLED(CONFIG_FS_DAX))
		return false;

as the first statement of the full version and avoid the stub entirely?

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 7/9] fs: Define I_DONTCACNE in VFS layer
  2020-04-13  5:40 ` [PATCH V7 7/9] fs: Define I_DONTCACNE in VFS layer ira.weiny
  2020-04-13 16:09   ` Darrick J. Wong
@ 2020-04-14 15:26   ` Jan Kara
  2020-04-14 15:45     ` Ira Weiny
  1 sibling, 1 reply; 47+ messages in thread
From: Jan Kara @ 2020-04-14 15:26 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, Darrick J. Wong, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

On Sun 12-04-20 22:40:44, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> DAX effective mode changes (setting of S_DAX) require inode eviction.
> 
> Define a flag which can be set to inform the VFS layer that inodes
> should not be cached.  This will expedite the eviction of those nodes
> requiring reload.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

This inode flag will have a limited impact because usually dentry will
still hold inode reference. So until dentry is evicted, inode stays as
well. So I think we'd need something like DCACHE_DONTCACHE flag as well to
discard a dentry whenever dentry usecount hits zero (which will be
generally on last file close). What do you think?

And I'd note that checking for I_DONTCACHE flag in dput() isn't
straightforward because of locking so that's why I suggest separate dentry
flag.

								Honza

> ---
>  include/linux/fs.h | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index a818ced22961..e2db71d150c3 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2151,6 +2151,8 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
>   *
>   * I_CREATING		New object's inode in the middle of setting up.
>   *
> + * I_DONTCACHE		Do not cache the inode
> + *
>   * Q: What is the difference between I_WILL_FREE and I_FREEING?
>   */
>  #define I_DIRTY_SYNC		(1 << 0)
> @@ -2173,6 +2175,7 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
>  #define I_WB_SWITCH		(1 << 13)
>  #define I_OVL_INUSE		(1 << 14)
>  #define I_CREATING		(1 << 15)
> +#define I_DONTCACHE		(1 << 16)
>  
>  #define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
>  #define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
> @@ -3042,7 +3045,8 @@ extern int inode_needs_sync(struct inode *inode);
>  extern int generic_delete_inode(struct inode *inode);
>  static inline int generic_drop_inode(struct inode *inode)
>  {
> -	return !inode->i_nlink || inode_unhashed(inode);
> +	return !inode->i_nlink || inode_unhashed(inode) ||
> +		(inode->i_state & I_DONTCACHE);
>  }
>  
>  extern struct inode *ilookup5_nowait(struct super_block *sb,
> -- 
> 2.25.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 7/9] fs: Define I_DONTCACNE in VFS layer
  2020-04-14 15:26   ` Jan Kara
@ 2020-04-14 15:45     ` Ira Weiny
  2020-04-14 15:59       ` Jan Kara
  0 siblings, 1 reply; 47+ messages in thread
From: Ira Weiny @ 2020-04-14 15:45 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-kernel, Darrick J. Wong, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Tue, Apr 14, 2020 at 05:26:30PM +0200, Jan Kara wrote:
> On Sun 12-04-20 22:40:44, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > DAX effective mode changes (setting of S_DAX) require inode eviction.
> > 
> > Define a flag which can be set to inform the VFS layer that inodes
> > should not be cached.  This will expedite the eviction of those nodes
> > requiring reload.
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> This inode flag will have a limited impact because usually dentry will
> still hold inode reference. So until dentry is evicted, inode stays as
> well.

Agreed but at least this keeps the inode from being cached until that time.

FWIW the ext4 patches seem to have a much longer delay when issuing drop_caches
and I'm not 100% sure why.  I've sent out those patches RFC to get the
discussions started.  I feel like I have missed something there but it does
eventually flip the S_DAX flag.

> So I think we'd need something like DCACHE_DONTCACHE flag as well to
> discard a dentry whenever dentry usecount hits zero (which will be
> generally on last file close). What do you think?

I wanted to do something like this but I was not sure how to trigger the
DCACHE_DONTCACHE on the correct 'parent' dentry.  Can't their be multiple
dentries pointing to the same inode?

In which case, would you need to flag them all?

Ira

> 
> And I'd note that checking for I_DONTCACHE flag in dput() isn't
> straightforward because of locking so that's why I suggest separate dentry
> flag.
> 
> 								Honza
> 
> > ---
> >  include/linux/fs.h | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index a818ced22961..e2db71d150c3 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -2151,6 +2151,8 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
> >   *
> >   * I_CREATING		New object's inode in the middle of setting up.
> >   *
> > + * I_DONTCACHE		Do not cache the inode
> > + *
> >   * Q: What is the difference between I_WILL_FREE and I_FREEING?
> >   */
> >  #define I_DIRTY_SYNC		(1 << 0)
> > @@ -2173,6 +2175,7 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
> >  #define I_WB_SWITCH		(1 << 13)
> >  #define I_OVL_INUSE		(1 << 14)
> >  #define I_CREATING		(1 << 15)
> > +#define I_DONTCACHE		(1 << 16)
> >  
> >  #define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
> >  #define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
> > @@ -3042,7 +3045,8 @@ extern int inode_needs_sync(struct inode *inode);
> >  extern int generic_delete_inode(struct inode *inode);
> >  static inline int generic_drop_inode(struct inode *inode)
> >  {
> > -	return !inode->i_nlink || inode_unhashed(inode);
> > +	return !inode->i_nlink || inode_unhashed(inode) ||
> > +		(inode->i_state & I_DONTCACHE);
> >  }
> >  
> >  extern struct inode *ilookup5_nowait(struct super_block *sb,
> > -- 
> > 2.25.1
> > 
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 7/9] fs: Define I_DONTCACNE in VFS layer
  2020-04-14 15:45     ` Ira Weiny
@ 2020-04-14 15:59       ` Jan Kara
  0 siblings, 0 replies; 47+ messages in thread
From: Jan Kara @ 2020-04-14 15:59 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Jan Kara, linux-kernel, Darrick J. Wong, Dan Williams,
	Dave Chinner, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-xfs, linux-fsdevel

On Tue 14-04-20 08:45:01, Ira Weiny wrote:
> On Tue, Apr 14, 2020 at 05:26:30PM +0200, Jan Kara wrote:
> > On Sun 12-04-20 22:40:44, ira.weiny@intel.com wrote:
> > > From: Ira Weiny <ira.weiny@intel.com>
> > > 
> > > DAX effective mode changes (setting of S_DAX) require inode eviction.
> > > 
> > > Define a flag which can be set to inform the VFS layer that inodes
> > > should not be cached.  This will expedite the eviction of those nodes
> > > requiring reload.
> > > 
> > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > 
> > This inode flag will have a limited impact because usually dentry will
> > still hold inode reference. So until dentry is evicted, inode stays as
> > well.
> 
> Agreed but at least this keeps the inode from being cached until that time.
> 
> FWIW the ext4 patches seem to have a much longer delay when issuing drop_caches
> and I'm not 100% sure why.  I've sent out those patches RFC to get the
> discussions started.  I feel like I have missed something there but it does
> eventually flip the S_DAX flag.
> 
> > So I think we'd need something like DCACHE_DONTCACHE flag as well to
> > discard a dentry whenever dentry usecount hits zero (which will be
> > generally on last file close). What do you think?
> 
> I wanted to do something like this but I was not sure how to trigger the
> DCACHE_DONTCACHE on the correct 'parent' dentry.  Can't their be multiple
> dentries pointing to the same inode?
> 
> In which case, would you need to flag them all?

There can be multiple dentries in case there are hardlinks. There can be
also multiple entries in case the filesystem is NFS-exported and there are
some disconnected dentries (those will however get discarded automatically
once they are unused). You could actually iterate the list of all dentries
(they are all part of inode->i_dentry list) and mark them all. This would
still miss the case if there are more hardlinks and a dentry for a new link
gets instantiated later but I guess I would not bother with this
cornercase.

								Honza

> > And I'd note that checking for I_DONTCACHE flag in dput() isn't
> > straightforward because of locking so that's why I suggest separate dentry
> > flag.
> > 
> > 								Honza
> > 
> > > ---
> > >  include/linux/fs.h | 6 +++++-
> > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > index a818ced22961..e2db71d150c3 100644
> > > --- a/include/linux/fs.h
> > > +++ b/include/linux/fs.h
> > > @@ -2151,6 +2151,8 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
> > >   *
> > >   * I_CREATING		New object's inode in the middle of setting up.
> > >   *
> > > + * I_DONTCACHE		Do not cache the inode
> > > + *
> > >   * Q: What is the difference between I_WILL_FREE and I_FREEING?
> > >   */
> > >  #define I_DIRTY_SYNC		(1 << 0)
> > > @@ -2173,6 +2175,7 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
> > >  #define I_WB_SWITCH		(1 << 13)
> > >  #define I_OVL_INUSE		(1 << 14)
> > >  #define I_CREATING		(1 << 15)
> > > +#define I_DONTCACHE		(1 << 16)
> > >  
> > >  #define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
> > >  #define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
> > > @@ -3042,7 +3045,8 @@ extern int inode_needs_sync(struct inode *inode);
> > >  extern int generic_delete_inode(struct inode *inode);
> > >  static inline int generic_drop_inode(struct inode *inode)
> > >  {
> > > -	return !inode->i_nlink || inode_unhashed(inode);
> > > +	return !inode->i_nlink || inode_unhashed(inode) ||
> > > +		(inode->i_state & I_DONTCACHE);
> > >  }
> > >  
> > >  extern struct inode *ilookup5_nowait(struct super_block *sb,
> > > -- 
> > > 2.25.1
> > > 
> > -- 
> > Jan Kara <jack@suse.com>
> > SUSE Labs, CR
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 9/9] Documentation/dax: Update Usage section
  2020-04-14  5:21   ` Dan Williams
@ 2020-04-14 16:15     ` Darrick J. Wong
  2020-04-14 19:04       ` Dan Williams
  0 siblings, 1 reply; 47+ messages in thread
From: Darrick J. Wong @ 2020-04-14 16:15 UTC (permalink / raw)
  To: Dan Williams
  Cc: Weiny, Ira, Linux Kernel Mailing List, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

On Mon, Apr 13, 2020 at 10:21:26PM -0700, Dan Williams wrote:
> On Sun, Apr 12, 2020 at 10:41 PM <ira.weiny@intel.com> wrote:
> >
> > From: Ira Weiny <ira.weiny@intel.com>
> >
> > Update the Usage section to reflect the new individual dax selection
> > functionality.
> >
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> >
> > ---
> > Changes from V6:
> >         Update to allow setting FS_XFLAG_DAX any time.
> >         Update with list of behaviors from Darrick
> >         https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
> >
> > Changes from V5:
> >         Update to reflect the agreed upon semantics
> >         https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> > ---
> >  Documentation/filesystems/dax.txt | 166 +++++++++++++++++++++++++++++-
> >  1 file changed, 163 insertions(+), 3 deletions(-)
> >
> > diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> > index 679729442fd2..af14c1b330a9 100644
> > --- a/Documentation/filesystems/dax.txt
> > +++ b/Documentation/filesystems/dax.txt
> > @@ -17,11 +17,171 @@ For file mappings, the storage device is mapped directly into userspace.
> >  Usage
> >  -----
> >
> > -If you have a block device which supports DAX, you can make a filesystem
> > +If you have a block device which supports DAX, you can make a file system
> >  on it as usual.  The DAX code currently only supports files with a block
> >  size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> > -size when creating the filesystem.  When mounting it, use the "-o dax"
> > -option on the command line or add 'dax' to the options in /etc/fstab.
> > +size when creating the file system.
> > +
> > +Currently 2 filesystems support DAX, ext4 and xfs.  Enabling DAX on them is
> > +different at this time.
> > +
> > +Enabling DAX on ext4
> > +--------------------
> > +
> > +When mounting the filesystem, use the "-o dax" option on the command line or
> > +add 'dax' to the options in /etc/fstab.
> > +
> > +
> > +Enabling DAX on xfs
> > +-------------------
> > +
> > +Summary
> > +-------
> > +
> > + 1. There exists an in-kernel access mode flag S_DAX that is set when
> > +    file accesses go directly to persistent memory, bypassing the page
> > +    cache.
> 
> I had reserved some quibbling with this wording, but now that this is
> being proposed as documentation I'll let my quibbling fly. "dax" may
> imply, but does not require persistent memory nor does it necessarily
> "bypass page cache". For example on configurations that support dax,
> but turn off MAP_SYNC (like virtio-pmem), a software flush is
> required. Instead, if we're going to define "dax" here I'd prefer it
> be a #include of the man page definition that is careful (IIRC) to
> only talk about semantics and not backend implementation details. In
> other words, dax is to page-cache as direct-io is to page cache,
> effectively not there, but dig a bit deeper and you may find it.

Uh, which manpage?  Are you talking about the MAP_SYNC documentation?

I don't rewording this to say "There exists an in-kernel access mode
flag S_DAX that, when set, enables MAP_SYNC semantics.  Refer to mmap(2)
for more details about what that means."

--D

> > Applications must call statx to discover the current S_DAX
> > +    state (STATX_ATTR_DAX).
> > +
> > + 2. There exists an advisory file inode flag FS_XFLAG_DAX that is
> > +    inherited from the parent directory FS_XFLAG_DAX inode flag at file
> > +    creation time.  This advisory flag can be set or cleared at any
> > +    time, but doing so does not immediately affect the S_DAX state.
> > +
> > +    Unless overridden by mount options (see (3)), if FS_XFLAG_DAX is set
> > +    and the fs is on pmem then it will enable S_DAX at inode load time;
> > +    if FS_XFLAG_DAX is not set, it will not enable S_DAX.
> > +
> > + 3. There exists a dax= mount option.
> > +
> > +    "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."
> > +
> > +    "-o dax=always" means "always set S_DAX (at least on pmem),
> > +                    and ignore FS_XFLAG_DAX."
> > +
> > +    "-o dax"        is an alias for "dax=always".
> > +
> > +    "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.
> > +
> > + 4. There exists an advisory directory inode flag FS_XFLAG_DAX that can
> > +    be set or cleared at any time.  The flag state is inherited by any files or
> > +    subdirectories when they are created within that directory.
> > +
> > + 5. Programs that require a specific file access mode (DAX or not DAX)
> > +    can do one of the following:
> > +
> > +    (a) Create files in directories that the FS_XFLAG_DAX flag set as
> > +        needed; or
> > +
> > +    (b) Have the administrator set an override via mount option; or
> > +
> > +    (c) Set or clear the file's FS_XFLAG_DAX flag as needed.  Programs
> > +        must then cause the kernel to evict the inode from memory.  This
> > +        can be done by:
> > +
> > +        i>  Closing the file and re-opening the file and using statx to
> > +            see if the fs has changed the S_DAX flag; and
> > +
> > +        ii> If the file still does not have the desired S_DAX access
> > +            mode, either unmount and remount the filesystem, or close
> > +            the file and use drop_caches.
> > +
> > + 6. It is expected that users who want to squeeze every last bit of performance
> > +    out of the particular rough and tumble bits of their storage will also be
> > +    exposed to the difficulties of what happens when the operating system can't
> > +    totally virtualize those hardware capabilities.  DAX is such a feature.
> > +    Basically, Formula-1 cars require a bit more care and feeding than your
> > +    averaged Toyota minivan, as it were.
> > +
> > +
> > +Details
> > +-------
> > +
> > +There are 2 per-file dax flags.  One is a physical inode setting (FS_XFLAG_DAX)
> > +and the other a currently enabled state (S_DAX).
> > +
> > +FS_XFLAG_DAX is maintained, on disk, on individual inodes.  It is preserved
> > +within the file system.  This 'physical' config setting can be set using an
> > +ioctl and/or an application such as "xfs_io -c 'chattr [-+]x'".  Files and
> > +directories automatically inherit FS_XFLAG_DAX from their parent directory
> > +_when_ _created_.  Therefore, setting FS_XFLAG_DAX at directory creation time
> > +can be used to set a default behavior for an entire sub-tree.  (Doing so on the
> > +root directory acts to set a default for the entire file system.)
> > +
> > +To clarify inheritance here are 3 examples:
> > +
> > +Example A:
> > +
> > +mkdir -p a/b/c
> > +xfs_io 'chattr +x' a
> > +mkdir a/b/c/d
> > +mkdir a/e
> > +
> > +       dax: a,e
> > +       no dax: b,c,d
> > +
> > +Example B:
> > +
> > +mkdir a
> > +xfs_io 'chattr +x' a
> > +mkdir -p a/b/c/d
> > +
> > +       dax: a,b,c,d
> > +       no dax:
> > +
> > +Example C:
> > +
> > +mkdir -p a/b/c
> > +xfs_io 'chattr +x' c
> > +mkdir a/b/c/d
> > +
> > +       dax: c,d
> > +       no dax: a,b
> > +
> > +
> > +The current enabled state (S_DAX) is set when a file inode is _loaded_ based on
> > +the underlying media support, the value of FS_XFLAG_DAX, and the file systems
> > +dax mount option setting.  See below.
> > +
> > +statx can be used to query S_DAX.  NOTE that a directory will never have S_DAX
> > +set and therefore statx will always return false on directories.
> > +
> > +NOTE: Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs
> > +even if the underlying media does not support dax and/or the file system is
> > +overridden with a mount option.
> > +
> > +
> > +Overriding FS_XFLAG_DAX (dax= mount option)
> > +-------------------------------------------
> > +
> > +There exists a dax mount option.  Using the mount option does not change the
> > +physical configured state of individual files but overrides the S_DAX operating
> > +state when inodes are loaded.
> > +
> > +Given underlying media support, the dax mount option is a tri-state option
> > +(never, always, inode) with the following meanings:
> > +
> > +   "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX"
> > +   "-o dax=always" means "always set S_DAX, ignore FS_XFLAG_DAX"
> > +        "-o dax" by itself means "dax=always" to remain compatible with older
> > +                kernels
> > +   "-o dax=inode" means "follow FS_XFLAG_DAX"
> > +
> > +The default state is 'inode'.  Given underlying media support, the following
> > +algorithm is used to determine the effective mode of the file S_DAX on a
> > +capable device.
> > +
> > +       S_DAX = FS_XFLAG_DAX;
> > +
> > +       if (dax_mount == "always")
> > +               S_DAX = true;
> > +       else if (dax_mount == "off"
> > +               S_DAX = false;
> > +
> > +To reiterate: Setting, and inheritance, continues to affect FS_XFLAG_DAX even
> > +while the file system is mounted with a dax override.  However, file enabled
> > +state, S_DAX, will continue to be the overridden until the file system is
> > +remounted with dax=inode.
> >
> >
> >  Implementation Tips for Block Driver Writers
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 9/9] Documentation/dax: Update Usage section
  2020-04-14 16:15     ` Darrick J. Wong
@ 2020-04-14 19:04       ` Dan Williams
  2020-04-14 19:57         ` Darrick J. Wong
  2020-04-14 19:58         ` Ira Weiny
  0 siblings, 2 replies; 47+ messages in thread
From: Dan Williams @ 2020-04-14 19:04 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Weiny, Ira, Linux Kernel Mailing List, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

On Tue, Apr 14, 2020 at 9:15 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> On Mon, Apr 13, 2020 at 10:21:26PM -0700, Dan Williams wrote:
> > On Sun, Apr 12, 2020 at 10:41 PM <ira.weiny@intel.com> wrote:
> > >
> > > From: Ira Weiny <ira.weiny@intel.com>
> > >
> > > Update the Usage section to reflect the new individual dax selection
> > > functionality.
> > >
> > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > >
> > > ---
> > > Changes from V6:
> > >         Update to allow setting FS_XFLAG_DAX any time.
> > >         Update with list of behaviors from Darrick
> > >         https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
> > >
> > > Changes from V5:
> > >         Update to reflect the agreed upon semantics
> > >         https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> > > ---
> > >  Documentation/filesystems/dax.txt | 166 +++++++++++++++++++++++++++++-
> > >  1 file changed, 163 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> > > index 679729442fd2..af14c1b330a9 100644
> > > --- a/Documentation/filesystems/dax.txt
> > > +++ b/Documentation/filesystems/dax.txt
> > > @@ -17,11 +17,171 @@ For file mappings, the storage device is mapped directly into userspace.
> > >  Usage
> > >  -----
> > >
> > > -If you have a block device which supports DAX, you can make a filesystem
> > > +If you have a block device which supports DAX, you can make a file system
> > >  on it as usual.  The DAX code currently only supports files with a block
> > >  size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> > > -size when creating the filesystem.  When mounting it, use the "-o dax"
> > > -option on the command line or add 'dax' to the options in /etc/fstab.
> > > +size when creating the file system.
> > > +
> > > +Currently 2 filesystems support DAX, ext4 and xfs.  Enabling DAX on them is
> > > +different at this time.
> > > +
> > > +Enabling DAX on ext4
> > > +--------------------
> > > +
> > > +When mounting the filesystem, use the "-o dax" option on the command line or
> > > +add 'dax' to the options in /etc/fstab.
> > > +
> > > +
> > > +Enabling DAX on xfs
> > > +-------------------
> > > +
> > > +Summary
> > > +-------
> > > +
> > > + 1. There exists an in-kernel access mode flag S_DAX that is set when
> > > +    file accesses go directly to persistent memory, bypassing the page
> > > +    cache.
> >
> > I had reserved some quibbling with this wording, but now that this is
> > being proposed as documentation I'll let my quibbling fly. "dax" may
> > imply, but does not require persistent memory nor does it necessarily
> > "bypass page cache". For example on configurations that support dax,
> > but turn off MAP_SYNC (like virtio-pmem), a software flush is
> > required. Instead, if we're going to define "dax" here I'd prefer it
> > be a #include of the man page definition that is careful (IIRC) to
> > only talk about semantics and not backend implementation details. In
> > other words, dax is to page-cache as direct-io is to page cache,
> > effectively not there, but dig a bit deeper and you may find it.
>
> Uh, which manpage?  Are you talking about the MAP_SYNC documentation?

No, I was referring to the proposed wording for STATX_ATTR_DAX.
There's no reason for this description to say anything divergent from
that description.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 9/9] Documentation/dax: Update Usage section
  2020-04-14  5:12       ` Dan Williams
@ 2020-04-14 19:48         ` Ira Weiny
  2020-04-15  8:23           ` Jan Kara
  0 siblings, 1 reply; 47+ messages in thread
From: Ira Weiny @ 2020-04-14 19:48 UTC (permalink / raw)
  To: Dan Williams
  Cc: Darrick J. Wong, Linux Kernel Mailing List, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

On Mon, Apr 13, 2020 at 10:12:22PM -0700, Dan Williams wrote:
> On Mon, Apr 13, 2020 at 9:38 PM Ira Weiny <ira.weiny@intel.com> wrote:
> >
> > On Mon, Apr 13, 2020 at 09:19:12AM -0700, Darrick J. Wong wrote:
> > > On Sun, Apr 12, 2020 at 10:40:46PM -0700, ira.weiny@intel.com wrote:
> > > > From: Ira Weiny <ira.weiny@intel.com>
> > > >
> > > > Update the Usage section to reflect the new individual dax selection
> > > > functionality.
> > >
> > > Yum. :)
> > >
> > > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > > >
> > > > ---
> > > > Changes from V6:
> > > >     Update to allow setting FS_XFLAG_DAX any time.
> > > >     Update with list of behaviors from Darrick
> > > >     https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
> > > >
> > > > Changes from V5:
> > > >     Update to reflect the agreed upon semantics
> > > >     https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> > > > ---
> > > >  Documentation/filesystems/dax.txt | 166 +++++++++++++++++++++++++++++-
> > > >  1 file changed, 163 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> > > > index 679729442fd2..af14c1b330a9 100644
> > > > --- a/Documentation/filesystems/dax.txt
> > > > +++ b/Documentation/filesystems/dax.txt
> > > > @@ -17,11 +17,171 @@ For file mappings, the storage device is mapped directly into userspace.
> > > >  Usage
> > > >  -----
> > > >
> > > > -If you have a block device which supports DAX, you can make a filesystem
> > > > +If you have a block device which supports DAX, you can make a file system
> > > >  on it as usual.  The DAX code currently only supports files with a block
> > > >  size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> > > > -size when creating the filesystem.  When mounting it, use the "-o dax"
> > > > -option on the command line or add 'dax' to the options in /etc/fstab.
> > > > +size when creating the file system.
> > > > +
> > > > +Currently 2 filesystems support DAX, ext4 and xfs.  Enabling DAX on them is
> > > > +different at this time.
> > >
> > > I thought ext2 supports DAX?
> >
> > Not that I know of?  Does it?
> 
> Yes. Seemed like a good idea at the time, but in retrospect...

Ah ok...   Is there an objection to leaving ext2 as a global mount option?
Updating the doc is easy enough.

Ira

> 
> In fairness I believe this was also an olive branch to XIP users that
> were transitioned to DAX, so they did not also need to transition
> filesystems.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 9/9] Documentation/dax: Update Usage section
  2020-04-14 19:04       ` Dan Williams
@ 2020-04-14 19:57         ` Darrick J. Wong
  2020-04-14 20:00           ` Ira Weiny
  2020-04-14 19:58         ` Ira Weiny
  1 sibling, 1 reply; 47+ messages in thread
From: Darrick J. Wong @ 2020-04-14 19:57 UTC (permalink / raw)
  To: Dan Williams
  Cc: Weiny, Ira, Linux Kernel Mailing List, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

On Tue, Apr 14, 2020 at 12:04:57PM -0700, Dan Williams wrote:
> On Tue, Apr 14, 2020 at 9:15 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> >
> > On Mon, Apr 13, 2020 at 10:21:26PM -0700, Dan Williams wrote:
> > > On Sun, Apr 12, 2020 at 10:41 PM <ira.weiny@intel.com> wrote:
> > > >
> > > > From: Ira Weiny <ira.weiny@intel.com>
> > > >
> > > > Update the Usage section to reflect the new individual dax selection
> > > > functionality.
> > > >
> > > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > > >
> > > > ---
> > > > Changes from V6:
> > > >         Update to allow setting FS_XFLAG_DAX any time.
> > > >         Update with list of behaviors from Darrick
> > > >         https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
> > > >
> > > > Changes from V5:
> > > >         Update to reflect the agreed upon semantics
> > > >         https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> > > > ---
> > > >  Documentation/filesystems/dax.txt | 166 +++++++++++++++++++++++++++++-
> > > >  1 file changed, 163 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> > > > index 679729442fd2..af14c1b330a9 100644
> > > > --- a/Documentation/filesystems/dax.txt
> > > > +++ b/Documentation/filesystems/dax.txt
> > > > @@ -17,11 +17,171 @@ For file mappings, the storage device is mapped directly into userspace.
> > > >  Usage
> > > >  -----
> > > >
> > > > -If you have a block device which supports DAX, you can make a filesystem
> > > > +If you have a block device which supports DAX, you can make a file system
> > > >  on it as usual.  The DAX code currently only supports files with a block
> > > >  size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> > > > -size when creating the filesystem.  When mounting it, use the "-o dax"
> > > > -option on the command line or add 'dax' to the options in /etc/fstab.
> > > > +size when creating the file system.
> > > > +
> > > > +Currently 2 filesystems support DAX, ext4 and xfs.  Enabling DAX on them is
> > > > +different at this time.
> > > > +
> > > > +Enabling DAX on ext4
> > > > +--------------------
> > > > +
> > > > +When mounting the filesystem, use the "-o dax" option on the command line or
> > > > +add 'dax' to the options in /etc/fstab.
> > > > +
> > > > +
> > > > +Enabling DAX on xfs
> > > > +-------------------
> > > > +
> > > > +Summary
> > > > +-------
> > > > +
> > > > + 1. There exists an in-kernel access mode flag S_DAX that is set when
> > > > +    file accesses go directly to persistent memory, bypassing the page
> > > > +    cache.
> > >
> > > I had reserved some quibbling with this wording, but now that this is
> > > being proposed as documentation I'll let my quibbling fly. "dax" may
> > > imply, but does not require persistent memory nor does it necessarily
> > > "bypass page cache". For example on configurations that support dax,
> > > but turn off MAP_SYNC (like virtio-pmem), a software flush is
> > > required. Instead, if we're going to define "dax" here I'd prefer it
> > > be a #include of the man page definition that is careful (IIRC) to
> > > only talk about semantics and not backend implementation details. In
> > > other words, dax is to page-cache as direct-io is to page cache,
> > > effectively not there, but dig a bit deeper and you may find it.
> >
> > Uh, which manpage?  Are you talking about the MAP_SYNC documentation?
> 
> No, I was referring to the proposed wording for STATX_ATTR_DAX.
> There's no reason for this description to say anything divergent from
> that description.

Ahh, ok.  Something like this, then:

 1. There exists an in-kernel access mode flag S_DAX.  When set, the
    file is in the DAX (cpu direct access) state.  DAX state attempts to
    minimize software cache effects for both I/O and memory mappings of
    this file.  The S_DAX state is exposed to userspace via the
    STATX_ATTR_DAX statx flag.

    See the STATX_ATTR_DAX in the statx(2) manpage for more information.

--D

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 9/9] Documentation/dax: Update Usage section
  2020-04-14 19:04       ` Dan Williams
  2020-04-14 19:57         ` Darrick J. Wong
@ 2020-04-14 19:58         ` Ira Weiny
  1 sibling, 0 replies; 47+ messages in thread
From: Ira Weiny @ 2020-04-14 19:58 UTC (permalink / raw)
  To: Dan Williams
  Cc: Darrick J. Wong, Linux Kernel Mailing List, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

On Tue, Apr 14, 2020 at 12:04:57PM -0700, Dan Williams wrote:
> On Tue, Apr 14, 2020 at 9:15 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> >
> > On Mon, Apr 13, 2020 at 10:21:26PM -0700, Dan Williams wrote:
> > > On Sun, Apr 12, 2020 at 10:41 PM <ira.weiny@intel.com> wrote:
> > > >
> > > > From: Ira Weiny <ira.weiny@intel.com>
> > > >
> > > > Update the Usage section to reflect the new individual dax selection
> > > > functionality.
> > > >
> > > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > > >
> > > > ---
> > > > Changes from V6:
> > > >         Update to allow setting FS_XFLAG_DAX any time.
> > > >         Update with list of behaviors from Darrick
> > > >         https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
> > > >
> > > > Changes from V5:
> > > >         Update to reflect the agreed upon semantics
> > > >         https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> > > > ---
> > > >  Documentation/filesystems/dax.txt | 166 +++++++++++++++++++++++++++++-
> > > >  1 file changed, 163 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> > > > index 679729442fd2..af14c1b330a9 100644
> > > > --- a/Documentation/filesystems/dax.txt
> > > > +++ b/Documentation/filesystems/dax.txt
> > > > @@ -17,11 +17,171 @@ For file mappings, the storage device is mapped directly into userspace.
> > > >  Usage
> > > >  -----
> > > >
> > > > -If you have a block device which supports DAX, you can make a filesystem
> > > > +If you have a block device which supports DAX, you can make a file system
> > > >  on it as usual.  The DAX code currently only supports files with a block
> > > >  size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> > > > -size when creating the filesystem.  When mounting it, use the "-o dax"
> > > > -option on the command line or add 'dax' to the options in /etc/fstab.
> > > > +size when creating the file system.
> > > > +
> > > > +Currently 2 filesystems support DAX, ext4 and xfs.  Enabling DAX on them is
> > > > +different at this time.
> > > > +
> > > > +Enabling DAX on ext4
> > > > +--------------------
> > > > +
> > > > +When mounting the filesystem, use the "-o dax" option on the command line or
> > > > +add 'dax' to the options in /etc/fstab.
> > > > +
> > > > +
> > > > +Enabling DAX on xfs
> > > > +-------------------
> > > > +
> > > > +Summary
> > > > +-------
> > > > +
> > > > + 1. There exists an in-kernel access mode flag S_DAX that is set when
> > > > +    file accesses go directly to persistent memory, bypassing the page
> > > > +    cache.
> > >
> > > I had reserved some quibbling with this wording, but now that this is
> > > being proposed as documentation I'll let my quibbling fly. "dax" may
> > > imply, but does not require persistent memory nor does it necessarily
> > > "bypass page cache". For example on configurations that support dax,
> > > but turn off MAP_SYNC (like virtio-pmem), a software flush is
> > > required. Instead, if we're going to define "dax" here I'd prefer it
> > > be a #include of the man page definition that is careful (IIRC) to
> > > only talk about semantics and not backend implementation details. In
> > > other words, dax is to page-cache as direct-io is to page cache,
> > > effectively not there, but dig a bit deeper and you may find it.
> >
> > Uh, which manpage?  Are you talking about the MAP_SYNC documentation?
> 
> No, I was referring to the proposed wording for STATX_ATTR_DAX.
> There's no reason for this description to say anything divergent from
> that description.

Ok I think the best text would be to simply refer to the STATX_ATTR_DAX man
page here.  Something like:

<quote>
 1. There exists an in-kernel access mode flag S_DAX that is set when file
    accesses is enabled for 'DAX'.  Applications must call statx to discover
    the current S_DAX state (STATX_ATTR_DAX).  See the man page for statx for
    more details.
</quote>

Ira


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 9/9] Documentation/dax: Update Usage section
  2020-04-14 19:57         ` Darrick J. Wong
@ 2020-04-14 20:00           ` Ira Weiny
  2020-04-14 20:18             ` Darrick J. Wong
  0 siblings, 1 reply; 47+ messages in thread
From: Ira Weiny @ 2020-04-14 20:00 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Linux Kernel Mailing List, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

On Tue, Apr 14, 2020 at 12:57:54PM -0700, Darrick J. Wong wrote:
> On Tue, Apr 14, 2020 at 12:04:57PM -0700, Dan Williams wrote:
> > On Tue, Apr 14, 2020 at 9:15 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:

[snip]

> > > > > +
> > > > > +Enabling DAX on xfs
> > > > > +-------------------
> > > > > +
> > > > > +Summary
> > > > > +-------
> > > > > +
> > > > > + 1. There exists an in-kernel access mode flag S_DAX that is set when
> > > > > +    file accesses go directly to persistent memory, bypassing the page
> > > > > +    cache.
> > > >
> > > > I had reserved some quibbling with this wording, but now that this is
> > > > being proposed as documentation I'll let my quibbling fly. "dax" may
> > > > imply, but does not require persistent memory nor does it necessarily
> > > > "bypass page cache". For example on configurations that support dax,
> > > > but turn off MAP_SYNC (like virtio-pmem), a software flush is
> > > > required. Instead, if we're going to define "dax" here I'd prefer it
> > > > be a #include of the man page definition that is careful (IIRC) to
> > > > only talk about semantics and not backend implementation details. In
> > > > other words, dax is to page-cache as direct-io is to page cache,
> > > > effectively not there, but dig a bit deeper and you may find it.
> > >
> > > Uh, which manpage?  Are you talking about the MAP_SYNC documentation?
> > 
> > No, I was referring to the proposed wording for STATX_ATTR_DAX.
> > There's no reason for this description to say anything divergent from
> > that description.
> 
> Ahh, ok.  Something like this, then:
> 
>  1. There exists an in-kernel access mode flag S_DAX.  When set, the
>     file is in the DAX (cpu direct access) state.  DAX state attempts to
>     minimize software cache effects for both I/O and memory mappings of
>     this file.  The S_DAX state is exposed to userspace via the
>     STATX_ATTR_DAX statx flag.
> 
>     See the STATX_ATTR_DAX in the statx(2) manpage for more information.

We crossed in the ether!!!  I propose even less details here...  Leave all the
details to the man page.

<quote>
1. There exists an in-kernel access mode flag S_DAX that is set when file
    accesses is enabled for 'DAX'.  Applications must call statx to discover
    the current S_DAX state (STATX_ATTR_DAX).  See the man page for statx for
    more details.
</quote>

Ira


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 9/9] Documentation/dax: Update Usage section
  2020-04-14 20:00           ` Ira Weiny
@ 2020-04-14 20:18             ` Darrick J. Wong
  2020-04-14 20:54               ` Ira Weiny
  0 siblings, 1 reply; 47+ messages in thread
From: Darrick J. Wong @ 2020-04-14 20:18 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dan Williams, Linux Kernel Mailing List, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

On Tue, Apr 14, 2020 at 01:00:15PM -0700, Ira Weiny wrote:
> On Tue, Apr 14, 2020 at 12:57:54PM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 14, 2020 at 12:04:57PM -0700, Dan Williams wrote:
> > > On Tue, Apr 14, 2020 at 9:15 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> 
> [snip]
> 
> > > > > > +
> > > > > > +Enabling DAX on xfs
> > > > > > +-------------------
> > > > > > +
> > > > > > +Summary
> > > > > > +-------
> > > > > > +
> > > > > > + 1. There exists an in-kernel access mode flag S_DAX that is set when
> > > > > > +    file accesses go directly to persistent memory, bypassing the page
> > > > > > +    cache.
> > > > >
> > > > > I had reserved some quibbling with this wording, but now that this is
> > > > > being proposed as documentation I'll let my quibbling fly. "dax" may
> > > > > imply, but does not require persistent memory nor does it necessarily
> > > > > "bypass page cache". For example on configurations that support dax,
> > > > > but turn off MAP_SYNC (like virtio-pmem), a software flush is
> > > > > required. Instead, if we're going to define "dax" here I'd prefer it
> > > > > be a #include of the man page definition that is careful (IIRC) to
> > > > > only talk about semantics and not backend implementation details. In
> > > > > other words, dax is to page-cache as direct-io is to page cache,
> > > > > effectively not there, but dig a bit deeper and you may find it.
> > > >
> > > > Uh, which manpage?  Are you talking about the MAP_SYNC documentation?
> > > 
> > > No, I was referring to the proposed wording for STATX_ATTR_DAX.
> > > There's no reason for this description to say anything divergent from
> > > that description.
> > 
> > Ahh, ok.  Something like this, then:
> > 
> >  1. There exists an in-kernel access mode flag S_DAX.  When set, the
> >     file is in the DAX (cpu direct access) state.  DAX state attempts to
> >     minimize software cache effects for both I/O and memory mappings of
> >     this file.  The S_DAX state is exposed to userspace via the
> >     STATX_ATTR_DAX statx flag.
> > 
> >     See the STATX_ATTR_DAX in the statx(2) manpage for more information.
> 
> We crossed in the ether!!!  I propose even less details here...  Leave all the
> details to the man page.
> 
> <quote>
> 1. There exists an in-kernel access mode flag S_DAX that is set when file
>     accesses is enabled for 'DAX'.  Applications must call statx to discover
>     the current S_DAX state (STATX_ATTR_DAX).  See the man page for statx for
>     more details.
> </quote>

Why stop cutting there? :)

 1. There exists an in-kernel file access mode flag S_DAX that
    corresponds to the statx flag STATX_ATTR_DIRECT_LOAD_STORE.  See the
    manpage for statx(2) for details about this access mode.

--D

> Ira
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 4/9] fs/xfs: Make DAX mount option a tri-state
  2020-04-14  6:25   ` Christoph Hellwig
@ 2020-04-14 20:34     ` Ira Weiny
  0 siblings, 0 replies; 47+ messages in thread
From: Ira Weiny @ 2020-04-14 20:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-kernel, Darrick J. Wong, Dan Williams, Dave Chinner,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Tue, Apr 14, 2020 at 08:25:30AM +0200, Christoph Hellwig wrote:
> On Sun, Apr 12, 2020 at 10:40:41PM -0700, ira.weiny@intel.com wrote:
> > +#define XFS_MOUNT_DAX		(1ULL << 62)
> > +#define XFS_MOUNT_NODAX		(1ULL << 63)
> 
> Replace this with XFS_MOUNT_DAX_ALWAYS and XFS_MOUNT_DAX_NEVER?

Sounds reasonable, Done for v8

Ira


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 3/9] fs/stat: Define DAX statx attribute
  2020-04-14  6:23   ` Christoph Hellwig
@ 2020-04-14 20:39     ` Ira Weiny
  0 siblings, 0 replies; 47+ messages in thread
From: Ira Weiny @ 2020-04-14 20:39 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-kernel, Darrick J. Wong, Dave Chinner, Jan Kara,
	Dan Williams, Dave Chinner, Theodore Y. Ts'o, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

On Tue, Apr 14, 2020 at 08:23:06AM +0200, Christoph Hellwig wrote:
> On Sun, Apr 12, 2020 at 10:40:40PM -0700, ira.weiny@intel.com wrote:
> > STATX_ATTR_DAX
> > 
> > 	The file is in the DAX (cpu direct access) state.  DAX state
> > 	attempts to minimize software cache effects for both I/O and
> > 	memory mappings of this file.  It requires a file system which
> > 	has been configured to support DAX.
> 
> Can we remove the misleading DAX name?  Something like
> STATX_ATTR_DIRECT_LOAD_STORE?

This is easy enough to change but...

Honestly I feel like this ship has already sailed.  We have so much out there
which uses the term "DAX".  Is it really better to introduce a new terminology
for the same thing?

Ira


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 5/9] fs/xfs: Create function xfs_inode_enable_dax()
  2020-04-14  6:27   ` Christoph Hellwig
@ 2020-04-14 20:43     ` Ira Weiny
  0 siblings, 0 replies; 47+ messages in thread
From: Ira Weiny @ 2020-04-14 20:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-kernel, Darrick J. Wong, Dan Williams, Dave Chinner,
	Theodore Y. Ts'o, Jan Kara, Jeff Moyer, linux-ext4,
	linux-xfs, linux-fsdevel

On Tue, Apr 14, 2020 at 08:27:18AM +0200, Christoph Hellwig wrote:
> On Sun, Apr 12, 2020 at 10:40:42PM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 

[snip]

> > index 81f2f93caec0..37bd15b55878 100644
> > --- a/fs/xfs/xfs_iops.c
> > +++ b/fs/xfs/xfs_iops.c
> > @@ -1244,12 +1244,11 @@ xfs_inode_supports_dax(
> >  	struct xfs_mount	*mp = ip->i_mount;
> >  
> >  	/* Only supported on non-reflinked files. */
> > -	if (!S_ISREG(VFS_I(ip)->i_mode) || xfs_is_reflink_inode(ip))
> > +	if (xfs_is_reflink_inode(ip))
> >  		return false;
> >  
> > -	/* DAX mount option or DAX iflag must be set. */
> > -	if (!(mp->m_flags & XFS_MOUNT_DAX) &&
> > -	    !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
> > +	/* Only supported on regular files. */
> > +	if (!S_ISREG(VFS_I(ip)->i_mode))
> >  		return false;
> 
> To me it would make sense to check S_ISREG before reflink, as it seems
> more logical.

Done.

> 
> > +#ifdef CONFIG_FS_DAX
> > +static bool
> > +xfs_inode_enable_dax(
> > +	struct xfs_inode *ip)
> > +{
> > +	if (ip->i_mount->m_flags & XFS_MOUNT_NODAX)
> > +		return false;
> > +	if (!xfs_inode_supports_dax(ip))
> > +		return false;
> > +	if (ip->i_mount->m_flags & XFS_MOUNT_DAX)
> > +		return true;
> > +	if (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)
> > +		return true;
> > +	return false;
> > +}
> > +#else /* !CONFIG_FS_DAX */
> > +static bool
> > +xfs_inode_enable_dax(
> > +	struct xfs_inode *ip)
> > +{
> > +	return false;
> > +}
> > +#endif /* CONFIG_FS_DAX */
> 
> Just throw in a
> 
> 	if (!IS_ENABLED(CONFIG_FS_DAX))
> 		return false;
> 
> as the first statement of the full version and avoid the stub entirely?

Sure, less code that way.  Done.

Ira


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 9/9] Documentation/dax: Update Usage section
  2020-04-14 20:18             ` Darrick J. Wong
@ 2020-04-14 20:54               ` Ira Weiny
  2020-04-14 21:02                 ` Darrick J. Wong
  0 siblings, 1 reply; 47+ messages in thread
From: Ira Weiny @ 2020-04-14 20:54 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Linux Kernel Mailing List, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

On Tue, Apr 14, 2020 at 01:18:08PM -0700, Darrick J. Wong wrote:
> On Tue, Apr 14, 2020 at 01:00:15PM -0700, Ira Weiny wrote:
> > On Tue, Apr 14, 2020 at 12:57:54PM -0700, Darrick J. Wong wrote:
> > > On Tue, Apr 14, 2020 at 12:04:57PM -0700, Dan Williams wrote:
> > > > On Tue, Apr 14, 2020 at 9:15 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > 
> > [snip]
> > 
> > > > > > > +
> > > > > > > +Enabling DAX on xfs
> > > > > > > +-------------------
> > > > > > > +
> > > > > > > +Summary
> > > > > > > +-------
> > > > > > > +
> > > > > > > + 1. There exists an in-kernel access mode flag S_DAX that is set when
> > > > > > > +    file accesses go directly to persistent memory, bypassing the page
> > > > > > > +    cache.
> > > > > >
> > > > > > I had reserved some quibbling with this wording, but now that this is
> > > > > > being proposed as documentation I'll let my quibbling fly. "dax" may
> > > > > > imply, but does not require persistent memory nor does it necessarily
> > > > > > "bypass page cache". For example on configurations that support dax,
> > > > > > but turn off MAP_SYNC (like virtio-pmem), a software flush is
> > > > > > required. Instead, if we're going to define "dax" here I'd prefer it
> > > > > > be a #include of the man page definition that is careful (IIRC) to
> > > > > > only talk about semantics and not backend implementation details. In
> > > > > > other words, dax is to page-cache as direct-io is to page cache,
> > > > > > effectively not there, but dig a bit deeper and you may find it.
> > > > >
> > > > > Uh, which manpage?  Are you talking about the MAP_SYNC documentation?
> > > > 
> > > > No, I was referring to the proposed wording for STATX_ATTR_DAX.
> > > > There's no reason for this description to say anything divergent from
> > > > that description.
> > > 
> > > Ahh, ok.  Something like this, then:
> > > 
> > >  1. There exists an in-kernel access mode flag S_DAX.  When set, the
> > >     file is in the DAX (cpu direct access) state.  DAX state attempts to
> > >     minimize software cache effects for both I/O and memory mappings of
> > >     this file.  The S_DAX state is exposed to userspace via the
> > >     STATX_ATTR_DAX statx flag.
> > > 
> > >     See the STATX_ATTR_DAX in the statx(2) manpage for more information.
> > 
> > We crossed in the ether!!!  I propose even less details here...  Leave all the
> > details to the man page.
> > 
> > <quote>
> > 1. There exists an in-kernel access mode flag S_DAX that is set when file
> >     accesses is enabled for 'DAX'.  Applications must call statx to discover
> >     the current S_DAX state (STATX_ATTR_DAX).  See the man page for statx for
> >     more details.
> > </quote>
> 
> Why stop cutting there? :)
> 
>  1. There exists an in-kernel file access mode flag S_DAX that
>     corresponds to the statx flag STATX_ATTR_DIRECT_LOAD_STORE.  See the
>     manpage for statx(2) for details about this access mode.

Sure!  But I'm holding to STATX_ATTR_DAX...  I don't like introducing another
alias for this stuff.  Why have '-o dax=x' and then have some other term here?

Keep the name the same for consistency.

Searching for 'DAX Linux'[*] results in 'About 877,000 results' on Google.

While "'direct load store' Linux" results in 'About 2,630 results'.

I'll update the rest of the text though!  :-D

Ira

[*] Because 'DAX' is some company index and or a rapper...  <sigh>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 9/9] Documentation/dax: Update Usage section
  2020-04-14 20:54               ` Ira Weiny
@ 2020-04-14 21:02                 ` Darrick J. Wong
  0 siblings, 0 replies; 47+ messages in thread
From: Darrick J. Wong @ 2020-04-14 21:02 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dan Williams, Linux Kernel Mailing List, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, Jeff Moyer,
	linux-ext4, linux-xfs, linux-fsdevel

On Tue, Apr 14, 2020 at 01:54:44PM -0700, Ira Weiny wrote:
> On Tue, Apr 14, 2020 at 01:18:08PM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 14, 2020 at 01:00:15PM -0700, Ira Weiny wrote:
> > > On Tue, Apr 14, 2020 at 12:57:54PM -0700, Darrick J. Wong wrote:
> > > > On Tue, Apr 14, 2020 at 12:04:57PM -0700, Dan Williams wrote:
> > > > > On Tue, Apr 14, 2020 at 9:15 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > > 
> > > [snip]
> > > 
> > > > > > > > +
> > > > > > > > +Enabling DAX on xfs
> > > > > > > > +-------------------
> > > > > > > > +
> > > > > > > > +Summary
> > > > > > > > +-------
> > > > > > > > +
> > > > > > > > + 1. There exists an in-kernel access mode flag S_DAX that is set when
> > > > > > > > +    file accesses go directly to persistent memory, bypassing the page
> > > > > > > > +    cache.
> > > > > > >
> > > > > > > I had reserved some quibbling with this wording, but now that this is
> > > > > > > being proposed as documentation I'll let my quibbling fly. "dax" may
> > > > > > > imply, but does not require persistent memory nor does it necessarily
> > > > > > > "bypass page cache". For example on configurations that support dax,
> > > > > > > but turn off MAP_SYNC (like virtio-pmem), a software flush is
> > > > > > > required. Instead, if we're going to define "dax" here I'd prefer it
> > > > > > > be a #include of the man page definition that is careful (IIRC) to
> > > > > > > only talk about semantics and not backend implementation details. In
> > > > > > > other words, dax is to page-cache as direct-io is to page cache,
> > > > > > > effectively not there, but dig a bit deeper and you may find it.
> > > > > >
> > > > > > Uh, which manpage?  Are you talking about the MAP_SYNC documentation?
> > > > > 
> > > > > No, I was referring to the proposed wording for STATX_ATTR_DAX.
> > > > > There's no reason for this description to say anything divergent from
> > > > > that description.
> > > > 
> > > > Ahh, ok.  Something like this, then:
> > > > 
> > > >  1. There exists an in-kernel access mode flag S_DAX.  When set, the
> > > >     file is in the DAX (cpu direct access) state.  DAX state attempts to
> > > >     minimize software cache effects for both I/O and memory mappings of
> > > >     this file.  The S_DAX state is exposed to userspace via the
> > > >     STATX_ATTR_DAX statx flag.
> > > > 
> > > >     See the STATX_ATTR_DAX in the statx(2) manpage for more information.
> > > 
> > > We crossed in the ether!!!  I propose even less details here...  Leave all the
> > > details to the man page.
> > > 
> > > <quote>
> > > 1. There exists an in-kernel access mode flag S_DAX that is set when file
> > >     accesses is enabled for 'DAX'.  Applications must call statx to discover
> > >     the current S_DAX state (STATX_ATTR_DAX).  See the man page for statx for
> > >     more details.
> > > </quote>
> > 
> > Why stop cutting there? :)
> > 
> >  1. There exists an in-kernel file access mode flag S_DAX that
> >     corresponds to the statx flag STATX_ATTR_DIRECT_LOAD_STORE.  See the
> >     manpage for statx(2) for details about this access mode.
> 
> Sure!  But I'm holding to STATX_ATTR_DAX...  I don't like introducing another
> alias for this stuff.  Why have '-o dax=x' and then have some other term here?

Ok, STATX_ATTR_DAX then.

> Keep the name the same for consistency.
> 
> Searching for 'DAX Linux'[*] results in 'About 877,000 results' on Google.
> 
> While "'direct load store' Linux" results in 'About 2,630 results'.
> 
> I'll update the rest of the text though!  :-D
> 
> Ira
> 
> [*] Because 'DAX' is some company index and or a rapper...  <sigh>

Don't forget Jadzia and Ezri. ;)

--D

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V7 9/9] Documentation/dax: Update Usage section
  2020-04-14 19:48         ` Ira Weiny
@ 2020-04-15  8:23           ` Jan Kara
  0 siblings, 0 replies; 47+ messages in thread
From: Jan Kara @ 2020-04-15  8:23 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dan Williams, Darrick J. Wong, Linux Kernel Mailing List,
	Dave Chinner, Christoph Hellwig, Theodore Y. Ts'o, Jan Kara,
	Jeff Moyer, linux-ext4, linux-xfs, linux-fsdevel

On Tue 14-04-20 12:48:48, Ira Weiny wrote:
> On Mon, Apr 13, 2020 at 10:12:22PM -0700, Dan Williams wrote:
> > On Mon, Apr 13, 2020 at 9:38 PM Ira Weiny <ira.weiny@intel.com> wrote:
> > >
> > > On Mon, Apr 13, 2020 at 09:19:12AM -0700, Darrick J. Wong wrote:
> > > > On Sun, Apr 12, 2020 at 10:40:46PM -0700, ira.weiny@intel.com wrote:
> > > > > From: Ira Weiny <ira.weiny@intel.com>
> > > > >
> > > > > Update the Usage section to reflect the new individual dax selection
> > > > > functionality.
> > > >
> > > > Yum. :)
> > > >
> > > > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > > > >
> > > > > ---
> > > > > Changes from V6:
> > > > >     Update to allow setting FS_XFLAG_DAX any time.
> > > > >     Update with list of behaviors from Darrick
> > > > >     https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
> > > > >
> > > > > Changes from V5:
> > > > >     Update to reflect the agreed upon semantics
> > > > >     https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> > > > > ---
> > > > >  Documentation/filesystems/dax.txt | 166 +++++++++++++++++++++++++++++-
> > > > >  1 file changed, 163 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> > > > > index 679729442fd2..af14c1b330a9 100644
> > > > > --- a/Documentation/filesystems/dax.txt
> > > > > +++ b/Documentation/filesystems/dax.txt
> > > > > @@ -17,11 +17,171 @@ For file mappings, the storage device is mapped directly into userspace.
> > > > >  Usage
> > > > >  -----
> > > > >
> > > > > -If you have a block device which supports DAX, you can make a filesystem
> > > > > +If you have a block device which supports DAX, you can make a file system
> > > > >  on it as usual.  The DAX code currently only supports files with a block
> > > > >  size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> > > > > -size when creating the filesystem.  When mounting it, use the "-o dax"
> > > > > -option on the command line or add 'dax' to the options in /etc/fstab.
> > > > > +size when creating the file system.
> > > > > +
> > > > > +Currently 2 filesystems support DAX, ext4 and xfs.  Enabling DAX on them is
> > > > > +different at this time.
> > > >
> > > > I thought ext2 supports DAX?
> > >
> > > Not that I know of?  Does it?
> > 
> > Yes. Seemed like a good idea at the time, but in retrospect...
> 
> Ah ok...   Is there an objection to leaving ext2 as a global mount option?
> Updating the doc is easy enough.

I'm fine with that. I wouldn't really bother with per-inode DAX flag for
ext2.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2020-04-15  8:24 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-13  5:40 [PATCH V7 0/9] Enable per-file/per-directory DAX operations V7 ira.weiny
2020-04-13  5:40 ` [PATCH V7 1/9] fs/xfs: Remove unnecessary initialization of i_rwsem ira.weiny
2020-04-13 15:41   ` Darrick J. Wong
2020-04-13  5:40 ` [PATCH V7 2/9] fs: Remove unneeded IS_DAX() check in io_is_direct() ira.weiny
2020-04-14  6:22   ` Christoph Hellwig
2020-04-13  5:40 ` [PATCH V7 3/9] fs/stat: Define DAX statx attribute ira.weiny
2020-04-14  6:23   ` Christoph Hellwig
2020-04-14 20:39     ` Ira Weiny
2020-04-13  5:40 ` [PATCH V7 4/9] fs/xfs: Make DAX mount option a tri-state ira.weiny
2020-04-13 15:46   ` Darrick J. Wong
2020-04-13 19:28     ` Ira Weiny
2020-04-14  6:24       ` Christoph Hellwig
2020-04-14  6:25   ` Christoph Hellwig
2020-04-14 20:34     ` Ira Weiny
2020-04-13  5:40 ` [PATCH V7 5/9] fs/xfs: Create function xfs_inode_enable_dax() ira.weiny
2020-04-13 15:52   ` Darrick J. Wong
2020-04-13 19:39     ` Ira Weiny
2020-04-14  6:27   ` Christoph Hellwig
2020-04-14 20:43     ` Ira Weiny
2020-04-13  5:40 ` [PATCH V7 6/9] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags() ira.weiny
2020-04-13 16:01   ` Darrick J. Wong
2020-04-14  4:07     ` Ira Weiny
2020-04-13  5:40 ` [PATCH V7 7/9] fs: Define I_DONTCACNE in VFS layer ira.weiny
2020-04-13 16:09   ` Darrick J. Wong
2020-04-13 16:13     ` Darrick J. Wong
2020-04-13 19:44     ` Ira Weiny
2020-04-14 15:26   ` Jan Kara
2020-04-14 15:45     ` Ira Weiny
2020-04-14 15:59       ` Jan Kara
2020-04-13  5:40 ` [PATCH V7 8/9] fs/xfs: Change xfs_ioctl_setattr_dax_invalidate() ira.weiny
2020-04-13 16:12   ` Darrick J. Wong
2020-04-13 19:46     ` Ira Weiny
2020-04-13  5:40 ` [PATCH V7 9/9] Documentation/dax: Update Usage section ira.weiny
2020-04-13 16:19   ` Darrick J. Wong
2020-04-14  4:38     ` Ira Weiny
2020-04-14  5:12       ` Dan Williams
2020-04-14 19:48         ` Ira Weiny
2020-04-15  8:23           ` Jan Kara
2020-04-14  5:21   ` Dan Williams
2020-04-14 16:15     ` Darrick J. Wong
2020-04-14 19:04       ` Dan Williams
2020-04-14 19:57         ` Darrick J. Wong
2020-04-14 20:00           ` Ira Weiny
2020-04-14 20:18             ` Darrick J. Wong
2020-04-14 20:54               ` Ira Weiny
2020-04-14 21:02                 ` Darrick J. Wong
2020-04-14 19:58         ` Ira Weiny

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).