linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V10 00/11] XFS - Enable per-file/per-directory DAX operations V10
@ 2020-04-22 21:20 ira.weiny
  2020-04-22 21:20 ` [PATCH V10 01/11] fs/xfs: Remove unnecessary initialization of i_rwsem ira.weiny
                   ` (10 more replies)
  0 siblings, 11 replies; 28+ messages in thread
From: ira.weiny @ 2020-04-22 21:20 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Ira Weiny, Al Viro, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, linux-ext4,
	linux-fsdevel, linux-api, Jeff Moyer

From: Ira Weiny <ira.weiny@intel.com>

Changes from V9:
	Slight reorder of series to put Documentation sooner
	modify i_state under i_lock
	Change name of xfs_ioctl_setattr_dax_invalidate() ->
		xfs_ioctl_setattr_prepare_dax()
	Do not report default dax=inode mount mode
	Move XFS_IEOFBLOCKS to '9'
	Fixup some doc typos
	Fix xfs style indentation
	Fix commit mispelling

At LSF/MM'19 [1] [2] we discussed applications that overestimate memory
consumption due to their inability to detect whether the kernel will
instantiate page cache for a file, and cases where a global dax enable via a
mount option is too coarse.

The following patch series enables the use of DAX on individual files and/or
directories on xfs, and lays some groundwork to do so in ext4.  It further
enhances the dax mount option to be a tri-state of 'always', 'never', or
'iflag' (default).  Furthermore, it maintians '-o dax' to be equivalent to '-o
dax=always'.

The insight at LSF/MM was to separate the per-mount or per-file "physical"
(FS_XFLAG_DAX) capability switch from an "effective" (S_DAX) attribute for the
file.

At LSF/MM we discussed the difficulties of switching the DAX state of a file
with active mappings / page cache.  It was thought the races could be avoided
by limiting DAX state flips to 0-length files.

However, this turns out to not be true.[3][5] This is because address space
operations (a_ops) may be in use at any time the inode is referenced.

For this reason direct manipulation of the FS_XFLAG_DAX is allowed but the
operation of the file (S_DAX) is not immediately changed.

Details of when and how DAX state can be changed on a file is included in a
documentation patch.

It should be noted that FS_XFLAG_DAX inheritance is not shown in this patch set
as it was maintained from previous work on XFS.  FS_XFLAG_DAX and it's
inheritance will need to be added to other file systems for user control. 


[1] https://lwn.net/Articles/787973/
[2] https://lwn.net/Articles/787233/
[3] https://lkml.org/lkml/2019/10/20/96
[4] https://patchwork.kernel.org/patch/11310511/
[5] https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/


To: linux-kernel@vger.kernel.org
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "Theodore Y. Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Cc: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-api@vger.kernel.org

Changes from V8:
	Rebase to 5.7-rc2
	Change ALWAYS/NEVER bits to be 26/27
	Remove XFS_IDONTCACHE -> lift to I_DONTCACHE
		use mark_inode_dontcache() in XFS
	create xfs_dax_mode enum
	use xfs signature styles
	Change xfs_inode_enabe_dax() -> xfs_inode_should_enable()
		Based on feedback to ext4 series
	Fix locking of DCACHE_DONTCACHE
	Change flag_inode_dontcache() -> mark_inode_dontcache()
	Change xfs_ioctl_setattr_dax_invalidate() -> xfs_ioctl_dax_check_set_cache()
	Documentation cleanups
	Clean up all commit messages

Changes from V7:
	Add DCACHE_DONTCACHE
	If mount override don't worry about inode caching
	Change mount flags to NEVER/ALWAYS
	Clean up xfs_inode_enable_dax()
	Clarify comments
	Update documentation

Changes from V6:
	Incorporate feedback on patches
	Add ability to change FS_XFLAG_DAX on files at any time.
		Add a don't cache flag to the VFS layer
		Preserve internal XFS IDONTCACHE behavior for bulkstat
		operations

Changes from V5:
	* make dax mount option a tri-state
	* Reject changes to FS_XFLAG_DAX for regular files
		- Allow only on directories
	* Update documentation

Changes from V4:
	* Open code the aops lock rather than add it to the xfs_ilock()
	  subsystem (Darrick's comments were obsoleted by this change)
	* Fix lkp build suggestions and bugs

Changes from V3:
	* Remove global locking...  :-D
	* put back per inode locking and remove pre-mature optimizations
	* Fix issues with Directories having IS_DAX() set
	* Fix kernel crash issues reported by Jeff
	* Add some clean up patches
	* Consolidate diflags to iflags functions
	* Update/add documentation
	* Reorder/rename patches quite a bit

Changes from V2:

	* Move i_dax_sem to be a global percpu_rw_sem rather than per inode
		Internal discussions with Dan determined this would be easier,
		just as performant, and slightly less overhead that having it
		in the SB as suggested by Jan
	* Fix locking order in comments and throughout code
	* Change "mode" to "state" throughout commits
	* Add CONFIG_FS_DAX wrapper to disable inode_[un]lock_state() when not
		configured
	* Add static branch for which is activated by a device which supports
		DAX in XFS
	* Change "lock/unlock" to up/down read/write as appropriate
		Previous names were over simplified
	* Update comments/documentation

	* Remove the xfs specific lock to the vfs (global) layer.
	* Fix i_dax_sem locking order and comments

	* Move 'i_mapped' count from struct inode to struct address_space and
		rename it to mmap_count
	* Add inode_has_mappings() call

	* Fix build issues
	* Clean up syntax spacing and minor issues
	* Update man page text for STATX_ATTR_DAX
	* Add reviewed-by's
	* Rebase to 5.6

	Rename patch:
		from: fs/xfs: Add lock/unlock state to xfs
		to: fs/xfs: Add write DAX lock to xfs layer
	Add patch:
		fs/xfs: Clarify lockdep dependency for xfs_isilocked()
	Drop patch:
		fs/xfs: Fix truncate up

Ira Weiny (11):
  fs/xfs: Remove unnecessary initialization of i_rwsem
  fs: Remove unneeded IS_DAX() check in io_is_direct()
  fs/stat: Define DAX statx attribute
  Documentation/dax: Update Usage section
  fs/xfs: Change XFS_MOUNT_DAX to XFS_MOUNT_DAX_ALWAYS
  fs/xfs: Make DAX mount option a tri-state
  fs/xfs: Create function xfs_inode_should_enable_dax()
  fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  fs: Lift XFS_IDONTCACHE to the VFS layer
  fs: Introduce DCACHE_DONTCACHE
  fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()

 Documentation/filesystems/dax.txt | 164 +++++++++++++++++++++++++++++-
 drivers/block/loop.c              |   6 +-
 fs/dcache.c                       |   4 +
 fs/inode.c                        |  15 +++
 fs/stat.c                         |   3 +
 fs/xfs/xfs_icache.c               |   8 +-
 fs/xfs/xfs_inode.h                |   4 +-
 fs/xfs/xfs_ioctl.c                | 141 ++++---------------------
 fs/xfs/xfs_iops.c                 |  72 ++++++++-----
 fs/xfs/xfs_mount.h                |   4 +-
 fs/xfs/xfs_super.c                |  53 ++++++++--
 include/linux/dcache.h            |   2 +
 include/linux/fs.h                |  14 +--
 include/uapi/linux/stat.h         |   1 +
 14 files changed, 319 insertions(+), 172 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH V10 01/11] fs/xfs: Remove unnecessary initialization of i_rwsem
  2020-04-22 21:20 [PATCH V10 00/11] XFS - Enable per-file/per-directory DAX operations V10 ira.weiny
@ 2020-04-22 21:20 ` ira.weiny
  2020-04-22 21:20 ` [PATCH V10 02/11] fs: Remove unneeded IS_DAX() check in io_is_direct() ira.weiny
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: ira.weiny @ 2020-04-22 21:20 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Ira Weiny, Dave Chinner, Al Viro, Jan Kara, Dan Williams,
	Dave Chinner, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

From: Ira Weiny <ira.weiny@intel.com>

An earlier call of xfs_reinit_inode() from xfs_iget_cache_hit() already
handles initialization of i_rwsem.

Doing so again is unneeded.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V4:
	Update commit message to make it clear the xfs_iget_cache_hit()
	is actually doing the initialization via xfs_reinit_inode()

New for V4:

NOTE: This was found while ensuring the new i_aops_sem was properly
handled.  It seems like this is a layering violation so I think it is
worth cleaning up so as to not confuse others.
---
 fs/xfs/xfs_icache.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 8bf1d15be3f6..17a0b86fe701 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -423,6 +423,7 @@ xfs_iget_cache_hit(
 		spin_unlock(&ip->i_flags_lock);
 		rcu_read_unlock();
 
+		ASSERT(!rwsem_is_locked(&inode->i_rwsem));
 		error = xfs_reinit_inode(mp, inode);
 		if (error) {
 			bool wake;
@@ -456,9 +457,6 @@ xfs_iget_cache_hit(
 		ip->i_sick = 0;
 		ip->i_checked = 0;
 
-		ASSERT(!rwsem_is_locked(&inode->i_rwsem));
-		init_rwsem(&inode->i_rwsem);
-
 		spin_unlock(&ip->i_flags_lock);
 		spin_unlock(&pag->pag_ici_lock);
 	} else {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH V10 02/11] fs: Remove unneeded IS_DAX() check in io_is_direct()
  2020-04-22 21:20 [PATCH V10 00/11] XFS - Enable per-file/per-directory DAX operations V10 ira.weiny
  2020-04-22 21:20 ` [PATCH V10 01/11] fs/xfs: Remove unnecessary initialization of i_rwsem ira.weiny
@ 2020-04-22 21:20 ` ira.weiny
  2020-04-23 21:41   ` Dave Chinner
  2020-04-22 21:20 ` [PATCH V10 03/11] fs/stat: Define DAX statx attribute ira.weiny
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: ira.weiny @ 2020-04-22 21:20 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Ira Weiny, Jan Kara, Christoph Hellwig, Al Viro, Dan Williams,
	Dave Chinner, Theodore Y. Ts'o, Jeff Moyer, linux-ext4,
	linux-fsdevel, linux-api

From: Ira Weiny <ira.weiny@intel.com>

Remove the check because DAX now has it's own read/write methods and
file systems which support DAX check IS_DAX() prior to IOCB_DIRECT on
their own.  Therefore, it does not matter if the file state is DAX when
the iocb flags are created.

Also remove io_is_direct() as it is just a simple flag check.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from v8:
	Rebase to latest Linus tree

Changes from v6:
	remove io_is_direct() as well.
	Remove Reviews since this is quite a bit different.

Changes from v3:
	Reword commit message.
	Reordered to be a 'pre-cleanup' patch
---
 drivers/block/loop.c | 6 +++---
 include/linux/fs.h   | 7 +------
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index da693e6a834e..14372df0f354 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -634,8 +634,8 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
 
 static inline void loop_update_dio(struct loop_device *lo)
 {
-	__loop_update_dio(lo, io_is_direct(lo->lo_backing_file) |
-			lo->use_dio);
+	__loop_update_dio(lo, (lo->lo_backing_file->f_flags & O_DIRECT) |
+				lo->use_dio);
 }
 
 static void loop_reread_partitions(struct loop_device *lo,
@@ -1028,7 +1028,7 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 	if (!(lo_flags & LO_FLAGS_READ_ONLY) && file->f_op->fsync)
 		blk_queue_write_cache(lo->lo_queue, true, false);
 
-	if (io_is_direct(lo->lo_backing_file) && inode->i_sb->s_bdev) {
+	if ((lo->lo_backing_file->f_flags & O_DIRECT) && inode->i_sb->s_bdev) {
 		/* In case of direct I/O, match underlying block size */
 		unsigned short bsize = bdev_logical_block_size(
 			inode->i_sb->s_bdev);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4f6f59b4f22a..a87cc5845a02 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3394,11 +3394,6 @@ extern void setattr_copy(struct inode *inode, const struct iattr *attr);
 
 extern int file_update_time(struct file *file);
 
-static inline bool io_is_direct(struct file *filp)
-{
-	return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping->host);
-}
-
 static inline bool vma_is_dax(const struct vm_area_struct *vma)
 {
 	return vma->vm_file && IS_DAX(vma->vm_file->f_mapping->host);
@@ -3423,7 +3418,7 @@ static inline int iocb_flags(struct file *file)
 	int res = 0;
 	if (file->f_flags & O_APPEND)
 		res |= IOCB_APPEND;
-	if (io_is_direct(file))
+	if (file->f_flags & O_DIRECT)
 		res |= IOCB_DIRECT;
 	if ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host))
 		res |= IOCB_DSYNC;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH V10 03/11] fs/stat: Define DAX statx attribute
  2020-04-22 21:20 [PATCH V10 00/11] XFS - Enable per-file/per-directory DAX operations V10 ira.weiny
  2020-04-22 21:20 ` [PATCH V10 01/11] fs/xfs: Remove unnecessary initialization of i_rwsem ira.weiny
  2020-04-22 21:20 ` [PATCH V10 02/11] fs: Remove unneeded IS_DAX() check in io_is_direct() ira.weiny
@ 2020-04-22 21:20 ` ira.weiny
  2020-04-22 21:20 ` [PATCH V10 04/11] Documentation/dax: Update Usage section ira.weiny
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: ira.weiny @ 2020-04-22 21:20 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Ira Weiny, Dave Chinner, Jan Kara, Al Viro, Dan Williams,
	Dave Chinner, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

From: Ira Weiny <ira.weiny@intel.com>

In order for users to determine if a file is currently operating in DAX
state (effective DAX).  Define a statx attribute value and set that
attribute if the effective DAX flag is set.

To go along with this we propose the following addition to the statx man
page:

STATX_ATTR_DAX

	The file is in the DAX (cpu direct access) state.  DAX state
	attempts to minimize software cache effects for both I/O and
	memory mappings of this file.  It requires a file system which
	has been configured to support DAX.

	DAX generally assumes all accesses are via cpu load / store
	instructions which can minimize overhead for small accesses, but
	may adversely affect cpu utilization for large transfers.

	File I/O is done directly to/from user-space buffers and memory
	mapped I/O may be performed with direct memory mappings that
	bypass kernel page cache.

	While the DAX property tends to result in data being transferred
	synchronously, it does not give the same guarantees of O_SYNC
	where data and the necessary metadata are transferred together.

	A DAX file may support being mapped with the MAP_SYNC flag,
	which enables a program to use CPU cache flush instructions to
	persist CPU store operations without an explicit fsync(2).  See
	mmap(2) for more information.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V2:
	Update man page text with comments from Darrick, Jan, Dan, and
	Dave.
---
 fs/stat.c                 | 3 +++
 include/uapi/linux/stat.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/fs/stat.c b/fs/stat.c
index 030008796479..894699c74dde 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -79,6 +79,9 @@ int vfs_getattr_nosec(const struct path *path, struct kstat *stat,
 	if (IS_AUTOMOUNT(inode))
 		stat->attributes |= STATX_ATTR_AUTOMOUNT;
 
+	if (IS_DAX(inode))
+		stat->attributes |= STATX_ATTR_DAX;
+
 	if (inode->i_op->getattr)
 		return inode->i_op->getattr(path, stat, request_mask,
 					    query_flags);
diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index ad80a5c885d5..e5f9d5517f6b 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -169,6 +169,7 @@ struct statx {
 #define STATX_ATTR_ENCRYPTED		0x00000800 /* [I] File requires key to decrypt in fs */
 #define STATX_ATTR_AUTOMOUNT		0x00001000 /* Dir: Automount trigger */
 #define STATX_ATTR_VERITY		0x00100000 /* [I] Verity protected file */
+#define STATX_ATTR_DAX			0x00002000 /* [I] File is DAX */
 
 
 #endif /* _UAPI_LINUX_STAT_H */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH V10 04/11] Documentation/dax: Update Usage section
  2020-04-22 21:20 [PATCH V10 00/11] XFS - Enable per-file/per-directory DAX operations V10 ira.weiny
                   ` (2 preceding siblings ...)
  2020-04-22 21:20 ` [PATCH V10 03/11] fs/stat: Define DAX statx attribute ira.weiny
@ 2020-04-22 21:20 ` ira.weiny
  2020-04-23  2:33   ` Yasunori Goto
  2020-04-23 22:27   ` Dave Chinner
  2020-04-22 21:20 ` [PATCH V10 05/11] fs/xfs: Change XFS_MOUNT_DAX to XFS_MOUNT_DAX_ALWAYS ira.weiny
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 28+ messages in thread
From: ira.weiny @ 2020-04-22 21:20 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Ira Weiny, Jan Kara, Al Viro, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jeff Moyer, linux-ext4,
	linux-fsdevel, linux-api

From: Ira Weiny <ira.weiny@intel.com>

Update the Usage section to reflect the new individual dax selection
functionality.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V9:
	Fix missing ')'
	Fix trialing '"'

Changes from V8:
	Updates from Darrick

Changes from V7:
	Cleanups/clarifications from Darrick and Dan

Changes from V6:
	Update to allow setting FS_XFLAG_DAX any time.
	Update with list of behaviors from Darrick
	https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/

Changes from V5:
	Update to reflect the agreed upon semantics
	https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
---
 Documentation/filesystems/dax.txt | 164 +++++++++++++++++++++++++++++-
 1 file changed, 161 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
index 679729442fd2..553712c5054e 100644
--- a/Documentation/filesystems/dax.txt
+++ b/Documentation/filesystems/dax.txt
@@ -17,11 +17,169 @@ For file mappings, the storage device is mapped directly into userspace.
 Usage
 -----
 
-If you have a block device which supports DAX, you can make a filesystem
+If you have a block device which supports DAX, you can make a file system
 on it as usual.  The DAX code currently only supports files with a block
 size equal to your kernel's PAGE_SIZE, so you may need to specify a block
-size when creating the filesystem.  When mounting it, use the "-o dax"
-option on the command line or add 'dax' to the options in /etc/fstab.
+size when creating the file system.
+
+Currently 3 filesystems support DAX: ext2, ext4 and xfs.  Enabling DAX on them
+is different.
+
+Enabling DAX on ext4 and ext2
+-----------------------------
+
+When mounting the filesystem, use the "-o dax" option on the command line or
+add 'dax' to the options in /etc/fstab.  This works to enable DAX on all files
+within the filesystem.  It is equivalent to the '-o dax=always' behavior below.
+
+
+Enabling DAX on xfs
+-------------------
+
+Summary
+-------
+
+ 1. There exists an in-kernel file access mode flag S_DAX that corresponds to
+    the statx flag STATX_ATTR_DAX.  See the manpage for statx(2) for details
+    about this access mode.
+
+ 2. There exists an advisory file inode flag FS_XFLAG_DAX that is
+    inherited from the parent directory FS_XFLAG_DAX inode flag at file
+    creation time.  This advisory flag can be set or cleared at any
+    time, but doing so does not immediately affect the S_DAX state.
+
+    Unless overridden by mount options (see (3)), if FS_XFLAG_DAX is set
+    and the fs is on pmem then it will enable S_DAX at inode load time;
+    if FS_XFLAG_DAX is not set, it will not enable S_DAX.
+
+ 3. There exists a dax= mount option.
+
+    "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."
+
+    "-o dax=always" means "always set S_DAX (at least on pmem),
+                    and ignore FS_XFLAG_DAX."
+
+    "-o dax"        is an alias for "dax=always".
+
+    "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.
+
+ 4. There exists an advisory directory inode flag FS_XFLAG_DAX that can
+    be set or cleared at any time.  The flag state is inherited by any files or
+    subdirectories when they are created within that directory.
+
+ 5. Programs that require a specific file access mode (DAX or not DAX)
+    can do one of the following:
+
+    (a) Create files in directories that the FS_XFLAG_DAX flag set as
+        needed; or
+
+    (b) Have the administrator set an override via mount option; or
+
+    (c) Set or clear the file's FS_XFLAG_DAX flag as needed.  Programs
+        must then cause the kernel to evict the inode from memory.  This
+        can be done by:
+
+        i>  Closing the file and re-opening the file and using statx to
+            see if the fs has changed the S_DAX flag; and
+
+        ii> If the file still does not have the desired S_DAX access
+            mode, either unmount and remount the filesystem, or close
+            the file and use drop_caches.
+
+ 6. It is expected that users who want to squeeze every last bit of performance
+    out of the particular rough and tumble bits of their storage will also be
+    exposed to the difficulties of what happens when the operating system can't
+    totally virtualize those hardware capabilities.  DAX is such a feature.
+
+
+Details
+-------
+
+There are 2 per-file dax flags.  One is a physical inode setting (FS_XFLAG_DAX)
+and the other a currently enabled state (S_DAX).
+
+FS_XFLAG_DAX is maintained, on disk, on individual inodes.  It is preserved
+within the file system.  This 'physical' config setting can be set using an
+ioctl and/or an application such as "xfs_io -c 'chattr [-+]x'".  Files and
+directories automatically inherit FS_XFLAG_DAX from their parent directory
+_when_ _created_.  Therefore, setting FS_XFLAG_DAX at directory creation time
+can be used to set a default behavior for an entire sub-tree.  (Doing so on the
+root directory acts to set a default for the entire file system.)
+
+To clarify inheritance here are 3 examples:
+
+Example A:
+
+mkdir -p a/b/c
+xfs_io 'chattr +x' a
+mkdir a/b/c/d
+mkdir a/e
+
+	dax: a,e
+	no dax: b,c,d
+
+Example B:
+
+mkdir a
+xfs_io 'chattr +x' a
+mkdir -p a/b/c/d
+
+	dax: a,b,c,d
+	no dax:
+
+Example C:
+
+mkdir -p a/b/c
+xfs_io 'chattr +x' c
+mkdir a/b/c/d
+
+	dax: c,d
+	no dax: a,b
+
+
+The current enabled state (S_DAX) is set when a file inode is _loaded_ based on
+the underlying media support, the value of FS_XFLAG_DAX, and the file systems
+dax mount option setting.  See below.
+
+statx can be used to query S_DAX.  NOTE that a directory will never have S_DAX
+set and therefore statx will never indicate that S_DAX is set on directories.
+
+NOTE: Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs
+even if the underlying media does not support dax and/or the file system is
+overridden with a mount option.
+
+
+Overriding FS_XFLAG_DAX (dax= mount option)
+-------------------------------------------
+
+There exists a dax mount option.  Using the mount option does not change the
+physical configured state of individual files but overrides the S_DAX operating
+state when inodes are loaded.
+
+Given underlying media support, the dax mount option is a tri-state option
+(never, always, inode) with the following meanings:
+
+   "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX"
+   "-o dax=always" means "always set S_DAX, ignore FS_XFLAG_DAX"
+        "-o dax" by itself means "dax=always" to remain compatible with older
+	         kernels
+   "-o dax=inode" means "follow FS_XFLAG_DAX"
+
+The default state is 'inode'.  Given underlying media support, the following
+algorithm is used to determine the effective mode of the file S_DAX on a
+capable device.
+
+	S_DAX = FS_XFLAG_DAX;
+
+	if (dax_mount == "always")
+		S_DAX = true;
+	else if (dax_mount == "off")
+		S_DAX = false;
+
+To reiterate: Setting, and inheritance, continues to affect FS_XFLAG_DAX even
+while the file system is mounted with a dax override.  However, in-core inode
+state (S_DAX) will continue to be overridden until the filesystem is remounted
+with dax=inode and the inode is evicted.
 
 
 Implementation Tips for Block Driver Writers
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH V10 05/11] fs/xfs: Change XFS_MOUNT_DAX to XFS_MOUNT_DAX_ALWAYS
  2020-04-22 21:20 [PATCH V10 00/11] XFS - Enable per-file/per-directory DAX operations V10 ira.weiny
                   ` (3 preceding siblings ...)
  2020-04-22 21:20 ` [PATCH V10 04/11] Documentation/dax: Update Usage section ira.weiny
@ 2020-04-22 21:20 ` ira.weiny
  2020-04-23 22:27   ` Dave Chinner
  2020-04-22 21:20 ` [PATCH V10 06/11] fs/xfs: Make DAX mount option a tri-state ira.weiny
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: ira.weiny @ 2020-04-22 21:20 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Ira Weiny, Al Viro, Jan Kara, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jeff Moyer, linux-ext4,
	linux-fsdevel, linux-api

From: Ira Weiny <ira.weiny@intel.com>

In prep for the new tri-state mount option which then introduces
XFS_MOUNT_DAX_NEVER.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from v8
	Move bit to 26
---
 fs/xfs/xfs_iops.c  | 2 +-
 fs/xfs/xfs_mount.h | 3 +--
 fs/xfs/xfs_super.c | 8 ++++----
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index f7a99b3bbcf7..462f89af479a 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1248,7 +1248,7 @@ xfs_inode_supports_dax(
 		return false;
 
 	/* DAX mount option or DAX iflag must be set. */
-	if (!(mp->m_flags & XFS_MOUNT_DAX) &&
+	if (!(mp->m_flags & XFS_MOUNT_DAX_ALWAYS) &&
 	    !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
 		return false;
 
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index b2e4598fdf7d..f6123fb0113c 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -237,8 +237,7 @@ typedef struct xfs_mount {
 #define XFS_MOUNT_FILESTREAMS	(1ULL << 24)	/* enable the filestreams
 						   allocator */
 #define XFS_MOUNT_NOATTR2	(1ULL << 25)	/* disable use of attr2 format */
-
-#define XFS_MOUNT_DAX		(1ULL << 62)	/* TEST ONLY! */
+#define XFS_MOUNT_DAX_ALWAYS	(1ULL << 26)
 
 /*
  * Max and min values for mount-option defined I/O
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 424bb9a2d532..ce169d1c7474 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -129,7 +129,7 @@ xfs_fs_show_options(
 		{ XFS_MOUNT_GRPID,		",grpid" },
 		{ XFS_MOUNT_DISCARD,		",discard" },
 		{ XFS_MOUNT_LARGEIO,		",largeio" },
-		{ XFS_MOUNT_DAX,		",dax" },
+		{ XFS_MOUNT_DAX_ALWAYS,		",dax" },
 		{ 0, NULL }
 	};
 	struct xfs_mount	*mp = XFS_M(root->d_sb);
@@ -1261,7 +1261,7 @@ xfs_fc_parse_param(
 		return 0;
 #ifdef CONFIG_FS_DAX
 	case Opt_dax:
-		mp->m_flags |= XFS_MOUNT_DAX;
+		mp->m_flags |= XFS_MOUNT_DAX_ALWAYS;
 		return 0;
 #endif
 	default:
@@ -1454,7 +1454,7 @@ xfs_fc_fill_super(
 	if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5)
 		sb->s_flags |= SB_I_VERSION;
 
-	if (mp->m_flags & XFS_MOUNT_DAX) {
+	if (mp->m_flags & XFS_MOUNT_DAX_ALWAYS) {
 		bool rtdev_is_dax = false, datadev_is_dax;
 
 		xfs_warn(mp,
@@ -1468,7 +1468,7 @@ xfs_fc_fill_super(
 		if (!rtdev_is_dax && !datadev_is_dax) {
 			xfs_alert(mp,
 			"DAX unsupported by block device. Turning off DAX.");
-			mp->m_flags &= ~XFS_MOUNT_DAX;
+			mp->m_flags &= ~XFS_MOUNT_DAX_ALWAYS;
 		}
 		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
 			xfs_alert(mp,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH V10 06/11] fs/xfs: Make DAX mount option a tri-state
  2020-04-22 21:20 [PATCH V10 00/11] XFS - Enable per-file/per-directory DAX operations V10 ira.weiny
                   ` (4 preceding siblings ...)
  2020-04-22 21:20 ` [PATCH V10 05/11] fs/xfs: Change XFS_MOUNT_DAX to XFS_MOUNT_DAX_ALWAYS ira.weiny
@ 2020-04-22 21:20 ` ira.weiny
  2020-04-23 22:35   ` Dave Chinner
  2020-04-22 21:20 ` [PATCH V10 07/11] fs/xfs: Create function xfs_inode_should_enable_dax() ira.weiny
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: ira.weiny @ 2020-04-22 21:20 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Ira Weiny, Al Viro, Jan Kara, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jeff Moyer, linux-ext4,
	linux-fsdevel, linux-api

From: Ira Weiny <ira.weiny@intel.com>

As agreed upon[1].  We make the dax mount option a tri-state.  '-o dax'
continues to operate the same.  We add 'always', 'never', and 'inode'
(default).

[1] https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V9:
	Fix indentation in xfs_mount_set_dax_mode()
	Do not report dax=inode

Changes from v8:
	Move NEVER bit to 27
	Use xfs signature style
	use xfs_dax_mode enum

Changes from v7:
	Change to XFS_MOUNT_DAX_NEVER

Changes from v6:
	Use 2 flag bits rather than a field.
	change iflag to inode

Changes from v5:
	New Patch
---
 fs/xfs/xfs_mount.h |  1 +
 fs/xfs/xfs_super.c | 49 ++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index f6123fb0113c..37bfb50db809 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -238,6 +238,7 @@ typedef struct xfs_mount {
 						   allocator */
 #define XFS_MOUNT_NOATTR2	(1ULL << 25)	/* disable use of attr2 format */
 #define XFS_MOUNT_DAX_ALWAYS	(1ULL << 26)
+#define XFS_MOUNT_DAX_NEVER	(1ULL << 27)
 
 /*
  * Max and min values for mount-option defined I/O
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index ce169d1c7474..a4028992b789 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -47,6 +47,39 @@ static struct kset *xfs_kset;		/* top-level xfs sysfs dir */
 static struct xfs_kobj xfs_dbg_kobj;	/* global debug sysfs attrs */
 #endif
 
+enum xfs_dax_mode {
+	XFS_DAX_INODE = 0,
+	XFS_DAX_ALWAYS = 1,
+	XFS_DAX_NEVER = 2,
+};
+
+static void
+xfs_mount_set_dax_mode(
+	struct xfs_mount	*mp,
+	enum xfs_dax_mode	mode)
+{
+	switch (mode) {
+	case XFS_DAX_INODE:
+		mp->m_flags &= ~(XFS_MOUNT_DAX_ALWAYS | XFS_MOUNT_DAX_NEVER);
+		break;
+	case XFS_DAX_ALWAYS:
+		mp->m_flags |= XFS_MOUNT_DAX_ALWAYS;
+		mp->m_flags &= ~XFS_MOUNT_DAX_NEVER;
+		break;
+	case XFS_DAX_NEVER:
+		mp->m_flags |= XFS_MOUNT_DAX_NEVER;
+		mp->m_flags &= ~XFS_MOUNT_DAX_ALWAYS;
+		break;
+	}
+}
+
+static const struct constant_table dax_param_enums[] = {
+	{"inode",	XFS_DAX_INODE },
+	{"always",	XFS_DAX_ALWAYS },
+	{"never",	XFS_DAX_NEVER },
+	{}
+};
+
 /*
  * Table driven mount option parser.
  */
@@ -59,7 +92,7 @@ enum {
 	Opt_filestreams, Opt_quota, Opt_noquota, Opt_usrquota, Opt_grpquota,
 	Opt_prjquota, Opt_uquota, Opt_gquota, Opt_pquota,
 	Opt_uqnoenforce, Opt_gqnoenforce, Opt_pqnoenforce, Opt_qnoenforce,
-	Opt_discard, Opt_nodiscard, Opt_dax,
+	Opt_discard, Opt_nodiscard, Opt_dax, Opt_dax_enum,
 };
 
 static const struct fs_parameter_spec xfs_fs_parameters[] = {
@@ -103,6 +136,7 @@ static const struct fs_parameter_spec xfs_fs_parameters[] = {
 	fsparam_flag("discard",		Opt_discard),
 	fsparam_flag("nodiscard",	Opt_nodiscard),
 	fsparam_flag("dax",		Opt_dax),
+	fsparam_enum("dax",		Opt_dax_enum, dax_param_enums),
 	{}
 };
 
@@ -129,7 +163,6 @@ xfs_fs_show_options(
 		{ XFS_MOUNT_GRPID,		",grpid" },
 		{ XFS_MOUNT_DISCARD,		",discard" },
 		{ XFS_MOUNT_LARGEIO,		",largeio" },
-		{ XFS_MOUNT_DAX_ALWAYS,		",dax" },
 		{ 0, NULL }
 	};
 	struct xfs_mount	*mp = XFS_M(root->d_sb);
@@ -185,6 +218,11 @@ xfs_fs_show_options(
 	if (!(mp->m_qflags & XFS_ALL_QUOTA_ACCT))
 		seq_puts(m, ",noquota");
 
+	if (mp->m_flags & XFS_MOUNT_DAX_ALWAYS)
+		seq_puts(m, ",dax=always");
+	else if (mp->m_flags & XFS_MOUNT_DAX_NEVER)
+		seq_puts(m, ",dax=never");
+
 	return 0;
 }
 
@@ -1261,7 +1299,10 @@ xfs_fc_parse_param(
 		return 0;
 #ifdef CONFIG_FS_DAX
 	case Opt_dax:
-		mp->m_flags |= XFS_MOUNT_DAX_ALWAYS;
+		xfs_mount_set_dax_mode(mp, XFS_DAX_ALWAYS);
+		return 0;
+	case Opt_dax_enum:
+		xfs_mount_set_dax_mode(mp, result.uint_32);
 		return 0;
 #endif
 	default:
@@ -1468,7 +1509,7 @@ xfs_fc_fill_super(
 		if (!rtdev_is_dax && !datadev_is_dax) {
 			xfs_alert(mp,
 			"DAX unsupported by block device. Turning off DAX.");
-			mp->m_flags &= ~XFS_MOUNT_DAX_ALWAYS;
+			xfs_mount_set_dax_mode(mp, XFS_DAX_NEVER);
 		}
 		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
 			xfs_alert(mp,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH V10 07/11] fs/xfs: Create function xfs_inode_should_enable_dax()
  2020-04-22 21:20 [PATCH V10 00/11] XFS - Enable per-file/per-directory DAX operations V10 ira.weiny
                   ` (5 preceding siblings ...)
  2020-04-22 21:20 ` [PATCH V10 06/11] fs/xfs: Make DAX mount option a tri-state ira.weiny
@ 2020-04-22 21:20 ` ira.weiny
  2020-04-23 22:36   ` Dave Chinner
  2020-04-22 21:20 ` [PATCH V10 08/11] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags() ira.weiny
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: ira.weiny @ 2020-04-22 21:20 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Ira Weiny, Al Viro, Jan Kara, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jeff Moyer, linux-ext4,
	linux-fsdevel, linux-api

From: Ira Weiny <ira.weiny@intel.com>

xfs_inode_supports_dax() should reflect if the inode can support DAX not
that it is enabled for DAX.

Change the use of xfs_inode_supports_dax() to reflect only if the inode
and underlying storage support dax.

Add a new function xfs_inode_should_enable_dax() which reflects if the
inode should be enabled for DAX.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V8
	Change to 'should enable' (feedback given by Jan in ext4 series)
		Darrick I've preserved your Reviewed-by for now LMK if
		that is an issue.

Changes from v7:
	Move S_ISREG check first
	use IS_ENABLED(CONFIG_FS_DAX) rather than duplicated function

Changes from v6:
	Change enable checks to be sequential logic.
	Update for 2 bit tri-state option.
	Make 'static' consistent.
	Don't set S_DAX if !CONFIG_FS_DAX

Changes from v5:
	Update to reflect the new tri-state mount option

Changes from v3:
	Update functions and names to be more clear
	Update commit message
	Merge with
		'fs/xfs: Clean up DAX support check'
		don't allow IS_DAX() on a directory
		use STATIC macro for static
		make xfs_inode_supports_dax() static
---
 fs/xfs/xfs_iops.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 462f89af479a..1814f10e43d3 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1243,13 +1243,12 @@ xfs_inode_supports_dax(
 {
 	struct xfs_mount	*mp = ip->i_mount;
 
-	/* Only supported on non-reflinked files. */
-	if (!S_ISREG(VFS_I(ip)->i_mode) || xfs_is_reflink_inode(ip))
+	/* Only supported on regular files. */
+	if (!S_ISREG(VFS_I(ip)->i_mode))
 		return false;
 
-	/* DAX mount option or DAX iflag must be set. */
-	if (!(mp->m_flags & XFS_MOUNT_DAX_ALWAYS) &&
-	    !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
+	/* Only supported on non-reflinked files. */
+	if (xfs_is_reflink_inode(ip))
 		return false;
 
 	/* Block size must match page size */
@@ -1260,6 +1259,23 @@ xfs_inode_supports_dax(
 	return xfs_inode_buftarg(ip)->bt_daxdev != NULL;
 }
 
+static bool
+xfs_inode_should_enable_dax(
+	struct xfs_inode *ip)
+{
+	if (!IS_ENABLED(CONFIG_FS_DAX))
+		return false;
+	if (ip->i_mount->m_flags & XFS_MOUNT_DAX_NEVER)
+		return false;
+	if (!xfs_inode_supports_dax(ip))
+		return false;
+	if (ip->i_mount->m_flags & XFS_MOUNT_DAX_ALWAYS)
+		return true;
+	if (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)
+		return true;
+	return false;
+}
+
 STATIC void
 xfs_diflags_to_iflags(
 	struct inode		*inode,
@@ -1278,7 +1294,7 @@ xfs_diflags_to_iflags(
 		inode->i_flags |= S_SYNC;
 	if (flags & XFS_DIFLAG_NOATIME)
 		inode->i_flags |= S_NOATIME;
-	if (xfs_inode_supports_dax(ip))
+	if (xfs_inode_should_enable_dax(ip))
 		inode->i_flags |= S_DAX;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH V10 08/11] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  2020-04-22 21:20 [PATCH V10 00/11] XFS - Enable per-file/per-directory DAX operations V10 ira.weiny
                   ` (6 preceding siblings ...)
  2020-04-22 21:20 ` [PATCH V10 07/11] fs/xfs: Create function xfs_inode_should_enable_dax() ira.weiny
@ 2020-04-22 21:20 ` ira.weiny
  2020-04-23 22:37   ` Dave Chinner
  2020-04-22 21:21 ` [PATCH V10 09/11] fs: Lift XFS_IDONTCACHE to the VFS layer ira.weiny
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: ira.weiny @ 2020-04-22 21:20 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Ira Weiny, Al Viro, Jan Kara, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jeff Moyer, linux-ext4,
	linux-fsdevel, linux-api

From: Ira Weiny <ira.weiny@intel.com>

The functionality in xfs_diflags_to_linux() and xfs_diflags_to_iflags() are
nearly identical.  The only difference is that *_to_linux() is called after
inode setup and disallows changing the DAX flag.

Combining them can be done with a flag which indicates if this is the initial
setup to allow the DAX flag to be properly set only at init time.

So remove xfs_diflags_to_linux() and call the modified xfs_diflags_to_iflags()
directly.

While we are here simplify xfs_diflags_to_iflags() to take struct xfs_inode and
use xfs_ip2xflags() to ensure future diflags are included correctly.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V7:
	Clarify with a comment the reason for leaving S_DAX out of the
	mask

Changes from V6:
	Move unrelated hunk to previous patch.
	Change logic for better code generation.

Changes from V5:
	The functions are no longer identical so we can only combine
	them rather than deleting one completely.  This is reflected in
	the new init parameter.
---
 fs/xfs/xfs_inode.h |  1 +
 fs/xfs/xfs_ioctl.c | 33 +--------------------------------
 fs/xfs/xfs_iops.c  | 46 +++++++++++++++++++++++++++-------------------
 3 files changed, 29 insertions(+), 51 deletions(-)

diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index c6a63f6764a6..83073c883fbf 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -467,6 +467,7 @@ int	xfs_break_layouts(struct inode *inode, uint *iolock,
 /* from xfs_iops.c */
 extern void xfs_setup_inode(struct xfs_inode *ip);
 extern void xfs_setup_iops(struct xfs_inode *ip);
+extern void xfs_diflags_to_iflags(struct xfs_inode *ip, bool init);
 
 /*
  * When setting up a newly allocated inode, we need to call
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 309958186d33..104495ac187c 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1201,37 +1201,6 @@ xfs_flags2diflags2(
 	return di_flags2;
 }
 
-STATIC void
-xfs_diflags_to_linux(
-	struct xfs_inode	*ip)
-{
-	struct inode		*inode = VFS_I(ip);
-	unsigned int		xflags = xfs_ip2xflags(ip);
-
-	if (xflags & FS_XFLAG_IMMUTABLE)
-		inode->i_flags |= S_IMMUTABLE;
-	else
-		inode->i_flags &= ~S_IMMUTABLE;
-	if (xflags & FS_XFLAG_APPEND)
-		inode->i_flags |= S_APPEND;
-	else
-		inode->i_flags &= ~S_APPEND;
-	if (xflags & FS_XFLAG_SYNC)
-		inode->i_flags |= S_SYNC;
-	else
-		inode->i_flags &= ~S_SYNC;
-	if (xflags & FS_XFLAG_NOATIME)
-		inode->i_flags |= S_NOATIME;
-	else
-		inode->i_flags &= ~S_NOATIME;
-#if 0	/* disabled until the flag switching races are sorted out */
-	if (xflags & FS_XFLAG_DAX)
-		inode->i_flags |= S_DAX;
-	else
-		inode->i_flags &= ~S_DAX;
-#endif
-}
-
 static int
 xfs_ioctl_setattr_xflags(
 	struct xfs_trans	*tp,
@@ -1269,7 +1238,7 @@ xfs_ioctl_setattr_xflags(
 	ip->i_d.di_flags = xfs_flags2diflags(ip, fa->fsx_xflags);
 	ip->i_d.di_flags2 = di_flags2;
 
-	xfs_diflags_to_linux(ip);
+	xfs_diflags_to_iflags(ip, false);
 	xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 	XFS_STATS_INC(mp, xs_ig_attrchg);
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 1814f10e43d3..b70b735fe4a4 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1276,26 +1276,34 @@ xfs_inode_should_enable_dax(
 	return false;
 }
 
-STATIC void
+void
 xfs_diflags_to_iflags(
-	struct inode		*inode,
-	struct xfs_inode	*ip)
+	struct xfs_inode	*ip,
+	bool init)
 {
-	uint16_t		flags = ip->i_d.di_flags;
-
-	inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC |
-			    S_NOATIME | S_DAX);
-
-	if (flags & XFS_DIFLAG_IMMUTABLE)
-		inode->i_flags |= S_IMMUTABLE;
-	if (flags & XFS_DIFLAG_APPEND)
-		inode->i_flags |= S_APPEND;
-	if (flags & XFS_DIFLAG_SYNC)
-		inode->i_flags |= S_SYNC;
-	if (flags & XFS_DIFLAG_NOATIME)
-		inode->i_flags |= S_NOATIME;
-	if (xfs_inode_should_enable_dax(ip))
-		inode->i_flags |= S_DAX;
+	struct inode            *inode = VFS_I(ip);
+	unsigned int            xflags = xfs_ip2xflags(ip);
+	unsigned int            flags = 0;
+
+	ASSERT(!(IS_DAX(inode) && init));
+
+	if (xflags & FS_XFLAG_IMMUTABLE)
+		flags |= S_IMMUTABLE;
+	if (xflags & FS_XFLAG_APPEND)
+		flags |= S_APPEND;
+	if (xflags & FS_XFLAG_SYNC)
+		flags |= S_SYNC;
+	if (xflags & FS_XFLAG_NOATIME)
+		flags |= S_NOATIME;
+	if (init && xfs_inode_should_enable_dax(ip))
+		flags |= S_DAX;
+
+	/*
+	 * S_DAX can only be set during inode initialization and is never set by
+	 * the VFS, so we cannot mask off S_DAX in i_flags.
+	 */
+	inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC | S_NOATIME);
+	inode->i_flags |= flags;
 }
 
 /*
@@ -1321,7 +1329,7 @@ xfs_setup_inode(
 	inode_fake_hash(inode);
 
 	i_size_write(inode, ip->i_d.di_size);
-	xfs_diflags_to_iflags(inode, ip);
+	xfs_diflags_to_iflags(ip, true);
 
 	if (S_ISDIR(inode->i_mode)) {
 		/*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH V10 09/11] fs: Lift XFS_IDONTCACHE to the VFS layer
  2020-04-22 21:20 [PATCH V10 00/11] XFS - Enable per-file/per-directory DAX operations V10 ira.weiny
                   ` (7 preceding siblings ...)
  2020-04-22 21:20 ` [PATCH V10 08/11] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags() ira.weiny
@ 2020-04-22 21:21 ` ira.weiny
  2020-04-23 22:38   ` Dave Chinner
  2020-04-22 21:21 ` [PATCH V10 10/11] fs: Introduce DCACHE_DONTCACHE ira.weiny
  2020-04-22 21:21 ` [PATCH V10 11/11] fs/xfs: Update xfs_ioctl_setattr_dax_invalidate() ira.weiny
  10 siblings, 1 reply; 28+ messages in thread
From: ira.weiny @ 2020-04-22 21:21 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Ira Weiny, Al Viro, Jan Kara, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jeff Moyer, linux-ext4,
	linux-fsdevel, linux-api

From: Ira Weiny <ira.weiny@intel.com>

DAX effective mode (S_DAX) changes requires inode eviction.

XFS has an advisory flag (XFS_IDONTCACHE) to prevent caching of the
inode if no other additional references are taken.  We lift this flag to
the VFS layer and change the behavior slightly by allowing the flag to
remain even if multiple references are taken.

This will expedite the eviction of inodes to change S_DAX.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V9:
	Fix misspelling in commit subject
	move XFS_IEOFBLOCKS to '9'

Changes from V8:
	Remove XFS_IDONTCACHE
---
 fs/xfs/xfs_icache.c | 4 ++--
 fs/xfs/xfs_inode.h  | 3 +--
 fs/xfs/xfs_super.c  | 2 +-
 include/linux/fs.h  | 6 +++++-
 4 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 17a0b86fe701..de76f7f60695 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -477,7 +477,7 @@ xfs_iget_cache_hit(
 		xfs_ilock(ip, lock_flags);
 
 	if (!(flags & XFS_IGET_INCORE))
-		xfs_iflags_clear(ip, XFS_ISTALE | XFS_IDONTCACHE);
+		xfs_iflags_clear(ip, XFS_ISTALE);
 	XFS_STATS_INC(mp, xs_ig_found);
 
 	return 0;
@@ -559,7 +559,7 @@ xfs_iget_cache_miss(
 	 */
 	iflags = XFS_INEW;
 	if (flags & XFS_IGET_DONTCACHE)
-		iflags |= XFS_IDONTCACHE;
+		VFS_I(ip)->i_state |= I_DONTCACHE;
 	ip->i_udquot = NULL;
 	ip->i_gdquot = NULL;
 	ip->i_pdquot = NULL;
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 83073c883fbf..d8ce3eaa246e 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -218,8 +218,7 @@ static inline bool xfs_inode_has_cow_data(struct xfs_inode *ip)
 #define XFS_IFLOCK		(1 << __XFS_IFLOCK_BIT)
 #define __XFS_IPINNED_BIT	8	 /* wakeup key for zero pin count */
 #define XFS_IPINNED		(1 << __XFS_IPINNED_BIT)
-#define XFS_IDONTCACHE		(1 << 9) /* don't cache the inode long term */
-#define XFS_IEOFBLOCKS		(1 << 10)/* has the preallocblocks tag set */
+#define XFS_IEOFBLOCKS		(1 << 9) /* has the preallocblocks tag set */
 /*
  * If this unlinked inode is in the middle of recovery, don't let drop_inode
  * truncate and free the inode.  This can happen if we iget the inode during
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index a4028992b789..284ab186798c 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -740,7 +740,7 @@ xfs_fs_drop_inode(
 		return 0;
 	}
 
-	return generic_drop_inode(inode) || (ip->i_flags & XFS_IDONTCACHE);
+	return generic_drop_inode(inode);
 }
 
 static void
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a87cc5845a02..44bd45af760f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2156,6 +2156,8 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
  *
  * I_CREATING		New object's inode in the middle of setting up.
  *
+ * I_DONTCACHE		Evict inode as soon as it is not used anymore.
+ *
  * Q: What is the difference between I_WILL_FREE and I_FREEING?
  */
 #define I_DIRTY_SYNC		(1 << 0)
@@ -2178,6 +2180,7 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
 #define I_WB_SWITCH		(1 << 13)
 #define I_OVL_INUSE		(1 << 14)
 #define I_CREATING		(1 << 15)
+#define I_DONTCACHE		(1 << 16)
 
 #define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
 #define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
@@ -3049,7 +3052,8 @@ extern int inode_needs_sync(struct inode *inode);
 extern int generic_delete_inode(struct inode *inode);
 static inline int generic_drop_inode(struct inode *inode)
 {
-	return !inode->i_nlink || inode_unhashed(inode);
+	return !inode->i_nlink || inode_unhashed(inode) ||
+		(inode->i_state & I_DONTCACHE);
 }
 
 extern struct inode *ilookup5_nowait(struct super_block *sb,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH V10 10/11] fs: Introduce DCACHE_DONTCACHE
  2020-04-22 21:20 [PATCH V10 00/11] XFS - Enable per-file/per-directory DAX operations V10 ira.weiny
                   ` (8 preceding siblings ...)
  2020-04-22 21:21 ` [PATCH V10 09/11] fs: Lift XFS_IDONTCACHE to the VFS layer ira.weiny
@ 2020-04-22 21:21 ` ira.weiny
  2020-04-23  8:40   ` Jan Kara
  2020-04-23 22:57   ` Dave Chinner
  2020-04-22 21:21 ` [PATCH V10 11/11] fs/xfs: Update xfs_ioctl_setattr_dax_invalidate() ira.weiny
  10 siblings, 2 replies; 28+ messages in thread
From: ira.weiny @ 2020-04-22 21:21 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Ira Weiny, Al Viro, Jan Kara, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jeff Moyer, linux-ext4,
	linux-fsdevel, linux-api

From: Ira Weiny <ira.weiny@intel.com>

DCACHE_DONTCACHE indicates a dentry should not be cached on final
dput().

Also add a helper function to mark DCACHE_DONTCACHE on all dentries
pointing to a specific inode when that inode is being set I_DONTCACHE.

This facilitates dropping dentry references to inodes sooner which
require eviction to swap S_DAX mode.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V9:
	modify i_state under i_lock
	Update comment
		"Purge from memory on final dput()"

Changes from V8:
	Update commit message
	Use mark_inode_dontcache in XFS
	Fix locking...  can't use rcu here.
	Change name to mark_inode_dontcache
---
 fs/dcache.c            |  4 ++++
 fs/inode.c             | 15 +++++++++++++++
 fs/xfs/xfs_icache.c    |  2 +-
 include/linux/dcache.h |  2 ++
 include/linux/fs.h     |  1 +
 5 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index b280e07e162b..0030fabab2c4 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -647,6 +647,10 @@ static inline bool retain_dentry(struct dentry *dentry)
 		if (dentry->d_op->d_delete(dentry))
 			return false;
 	}
+
+	if (unlikely(dentry->d_flags & DCACHE_DONTCACHE))
+		return false;
+
 	/* retain; LRU fodder */
 	dentry->d_lockref.count--;
 	if (unlikely(!(dentry->d_flags & DCACHE_LRU_LIST)))
diff --git a/fs/inode.c b/fs/inode.c
index 93d9252a00ab..316355433797 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1526,6 +1526,21 @@ int generic_delete_inode(struct inode *inode)
 }
 EXPORT_SYMBOL(generic_delete_inode);
 
+void mark_inode_dontcache(struct inode *inode)
+{
+	struct dentry *de;
+
+	spin_lock(&inode->i_lock);
+	hlist_for_each_entry(de, &inode->i_dentry, d_u.d_alias) {
+		spin_lock(&de->d_lock);
+		de->d_flags |= DCACHE_DONTCACHE;
+		spin_unlock(&de->d_lock);
+	}
+	inode->i_state |= I_DONTCACHE;
+	spin_unlock(&inode->i_lock);
+}
+EXPORT_SYMBOL(mark_inode_dontcache);
+
 /*
  * Called when we're dropping the last reference
  * to an inode.
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index de76f7f60695..3c8f44477804 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -559,7 +559,7 @@ xfs_iget_cache_miss(
 	 */
 	iflags = XFS_INEW;
 	if (flags & XFS_IGET_DONTCACHE)
-		VFS_I(ip)->i_state |= I_DONTCACHE;
+		mark_inode_dontcache(VFS_I(ip));
 	ip->i_udquot = NULL;
 	ip->i_gdquot = NULL;
 	ip->i_pdquot = NULL;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index c1488cc84fd9..a81f0c3cf352 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -177,6 +177,8 @@ struct dentry_operations {
 
 #define DCACHE_REFERENCED		0x00000040 /* Recently used, don't discard. */
 
+#define DCACHE_DONTCACHE		0x00000080 /* Purge from memory on final dput() */
+
 #define DCACHE_CANT_MOUNT		0x00000100
 #define DCACHE_GENOCIDE			0x00000200
 #define DCACHE_SHRINK_LIST		0x00000400
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 44bd45af760f..064168ec2e0b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3055,6 +3055,7 @@ static inline int generic_drop_inode(struct inode *inode)
 	return !inode->i_nlink || inode_unhashed(inode) ||
 		(inode->i_state & I_DONTCACHE);
 }
+extern void mark_inode_dontcache(struct inode *inode);
 
 extern struct inode *ilookup5_nowait(struct super_block *sb,
 		unsigned long hashval, int (*test)(struct inode *, void *),
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH V10 11/11] fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()
  2020-04-22 21:20 [PATCH V10 00/11] XFS - Enable per-file/per-directory DAX operations V10 ira.weiny
                   ` (9 preceding siblings ...)
  2020-04-22 21:21 ` [PATCH V10 10/11] fs: Introduce DCACHE_DONTCACHE ira.weiny
@ 2020-04-22 21:21 ` ira.weiny
  10 siblings, 0 replies; 28+ messages in thread
From: ira.weiny @ 2020-04-22 21:21 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Ira Weiny, Al Viro, Jan Kara, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jeff Moyer, linux-ext4,
	linux-fsdevel, linux-api

From: Ira Weiny <ira.weiny@intel.com>

Because of the separation of FS_XFLAG_DAX from S_DAX and the delayed
setting of S_DAX, data invalidation no longer needs to happen when
FS_XFLAG_DAX is changed.

Change xfs_ioctl_setattr_dax_invalidate() to be
xfs_ioctl_dax_check_set_cache() and alter the code to reflect the new
functionality.

Furthermore, we no longer need the locking so we remove the join_flags
logic.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V9:
	Change name of function to xfs_ioctl_setattr_prepare_dax()

Changes from V8:
	Change name of function to xfs_ioctl_dax_check_set_cache()
	Update commit message
	Fix bit manipulations

Changes from V7:
	Use new flag_inode_dontcache()
	Skip don't cache if mount over ride is active.

Changes from v6:
	Fix completely broken implementation and update commit message.
	Use the new VFS layer I_DONTCACHE to facilitate inode eviction
	and S_DAX changing on drop_caches

Changes from v5:
	New patch
---
 fs/xfs/xfs_ioctl.c | 108 +++++++++------------------------------------
 1 file changed, 20 insertions(+), 88 deletions(-)

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 104495ac187c..aa1a8bd7d68d 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1245,64 +1245,26 @@ xfs_ioctl_setattr_xflags(
 	return 0;
 }
 
-/*
- * If we are changing DAX flags, we have to ensure the file is clean and any
- * cached objects in the address space are invalidated and removed. This
- * requires us to lock out other IO and page faults similar to a truncate
- * operation. The locks need to be held until the transaction has been committed
- * so that the cache invalidation is atomic with respect to the DAX flag
- * manipulation.
- */
-static int
-xfs_ioctl_setattr_dax_invalidate(
+static void
+xfs_ioctl_setattr_prepare_dax(
 	struct xfs_inode	*ip,
-	struct fsxattr		*fa,
-	int			*join_flags)
+	struct fsxattr		*fa)
 {
-	struct inode		*inode = VFS_I(ip);
-	struct super_block	*sb = inode->i_sb;
-	int			error;
-
-	*join_flags = 0;
-
-	/*
-	 * It is only valid to set the DAX flag on regular files and
-	 * directories on filesystems where the block size is equal to the page
-	 * size. On directories it serves as an inherited hint so we don't
-	 * have to check the device for dax support or flush pagecache.
-	 */
-	if (fa->fsx_xflags & FS_XFLAG_DAX) {
-		struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
-
-		if (!bdev_dax_supported(target->bt_bdev, sb->s_blocksize))
-			return -EINVAL;
-	}
-
-	/* If the DAX state is not changing, we have nothing to do here. */
-	if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
-		return 0;
-	if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode))
-		return 0;
+	struct xfs_mount	*mp = ip->i_mount;
+	struct inode            *inode = VFS_I(ip);
 
 	if (S_ISDIR(inode->i_mode))
-		return 0;
-
-	/* lock, flush and invalidate mapping in preparation for flag change */
-	xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
-	error = filemap_write_and_wait(inode->i_mapping);
-	if (error)
-		goto out_unlock;
-	error = invalidate_inode_pages2(inode->i_mapping);
-	if (error)
-		goto out_unlock;
-
-	*join_flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL;
-	return 0;
+		return;
 
-out_unlock:
-	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
-	return error;
+	if ((mp->m_flags & XFS_MOUNT_DAX_ALWAYS) ||
+	    (mp->m_flags & XFS_MOUNT_DAX_NEVER))
+		return;
 
+	if (((fa->fsx_xflags & FS_XFLAG_DAX) &&
+	    !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) ||
+	    (!(fa->fsx_xflags & FS_XFLAG_DAX) &&
+	     (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)))
+		mark_inode_dontcache(inode);
 }
 
 /*
@@ -1310,17 +1272,10 @@ xfs_ioctl_setattr_dax_invalidate(
  * have permission to do so. On success, return a clean transaction and the
  * inode locked exclusively ready for further operation specific checks. On
  * failure, return an error without modifying or locking the inode.
- *
- * The inode might already be IO locked on call. If this is the case, it is
- * indicated in @join_flags and we take full responsibility for ensuring they
- * are unlocked from now on. Hence if we have an error here, we still have to
- * unlock them. Otherwise, once they are joined to the transaction, they will
- * be unlocked on commit/cancel.
  */
 static struct xfs_trans *
 xfs_ioctl_setattr_get_trans(
-	struct xfs_inode	*ip,
-	int			join_flags)
+	struct xfs_inode	*ip)
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_trans	*tp;
@@ -1337,8 +1292,7 @@ xfs_ioctl_setattr_get_trans(
 		goto out_unlock;
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | join_flags);
-	join_flags = 0;
+	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
 
 	/*
 	 * CAP_FOWNER overrides the following restrictions:
@@ -1359,8 +1313,6 @@ xfs_ioctl_setattr_get_trans(
 out_cancel:
 	xfs_trans_cancel(tp);
 out_unlock:
-	if (join_flags)
-		xfs_iunlock(ip, join_flags);
 	return ERR_PTR(error);
 }
 
@@ -1486,7 +1438,6 @@ xfs_ioctl_setattr(
 	struct xfs_dquot	*pdqp = NULL;
 	struct xfs_dquot	*olddquot = NULL;
 	int			code;
-	int			join_flags = 0;
 
 	trace_xfs_ioctl_setattr(ip);
 
@@ -1510,18 +1461,9 @@ xfs_ioctl_setattr(
 			return code;
 	}
 
-	/*
-	 * Changing DAX config may require inode locking for mapping
-	 * invalidation. These need to be held all the way to transaction commit
-	 * or cancel time, so need to be passed through to
-	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
-	 * appropriately.
-	 */
-	code = xfs_ioctl_setattr_dax_invalidate(ip, fa, &join_flags);
-	if (code)
-		goto error_free_dquots;
+	xfs_ioctl_setattr_prepare_dax(ip, fa);
 
-	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
+	tp = xfs_ioctl_setattr_get_trans(ip);
 	if (IS_ERR(tp)) {
 		code = PTR_ERR(tp);
 		goto error_free_dquots;
@@ -1651,7 +1593,6 @@ xfs_ioc_setxflags(
 	struct fsxattr		fa;
 	struct fsxattr		old_fa;
 	unsigned int		flags;
-	int			join_flags = 0;
 	int			error;
 
 	if (copy_from_user(&flags, arg, sizeof(flags)))
@@ -1668,18 +1609,9 @@ xfs_ioc_setxflags(
 	if (error)
 		return error;
 
-	/*
-	 * Changing DAX config may require inode locking for mapping
-	 * invalidation. These need to be held all the way to transaction commit
-	 * or cancel time, so need to be passed through to
-	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
-	 * appropriately.
-	 */
-	error = xfs_ioctl_setattr_dax_invalidate(ip, &fa, &join_flags);
-	if (error)
-		goto out_drop_write;
+	xfs_ioctl_setattr_prepare_dax(ip, &fa);
 
-	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
+	tp = xfs_ioctl_setattr_get_trans(ip);
 	if (IS_ERR(tp)) {
 		error = PTR_ERR(tp);
 		goto out_drop_write;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH V10 04/11] Documentation/dax: Update Usage section
  2020-04-22 21:20 ` [PATCH V10 04/11] Documentation/dax: Update Usage section ira.weiny
@ 2020-04-23  2:33   ` Yasunori Goto
  2020-04-23  5:05     ` Ira Weiny
  2020-04-23 22:27   ` Dave Chinner
  1 sibling, 1 reply; 28+ messages in thread
From: Yasunori Goto @ 2020-04-23  2:33 UTC (permalink / raw)
  To: ira.weiny, linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Jan Kara, Al Viro, Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jeff Moyer, linux-ext4, linux-fsdevel,
	linux-api

Hello,

I'm trying use your patch now, and I have a small comment in this document.

On 2020/04/23 6:20, ira.weiny@intel.com wrote:

> +To clarify inheritance here are 3 examples:
> +
> +Example A:
> +
> +mkdir -p a/b/c
> +xfs_io 'chattr +x' a

Probably, "-c" is necessary here.

xfs_io -c 'chattr +x' a


> +mkdir a/b/c/d
> +mkdir a/e
> +
> +	dax: a,e
> +	no dax: b,c,d
> +
> +Example B:
> +
> +mkdir a
> +xfs_io 'chattr +x' a
ditto
> +mkdir -p a/b/c/d
> +
> +	dax: a,b,c,d
> +	no dax:
> +
> +Example C:
> +
> +mkdir -p a/b/c
> +xfs_io 'chattr +x' c
ditto
> +mkdir a/b/c/d
> +
> +	dax: c,d
> +	no dax: a,b
> +
> +

---

Yasunori Goto


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V10 04/11] Documentation/dax: Update Usage section
  2020-04-23  2:33   ` Yasunori Goto
@ 2020-04-23  5:05     ` Ira Weiny
  0 siblings, 0 replies; 28+ messages in thread
From: Ira Weiny @ 2020-04-23  5:05 UTC (permalink / raw)
  To: Yasunori Goto
  Cc: linux-kernel, linux-xfs, Darrick J. Wong, Jan Kara, Al Viro,
	Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jeff Moyer, linux-ext4, linux-fsdevel,
	linux-api

On Thu, Apr 23, 2020 at 11:33:26AM +0900, Yasunori Goto wrote:
> Hello,
> 
> I'm trying use your patch now, and I have a small comment in this document.
> 
> On 2020/04/23 6:20, ira.weiny@intel.com wrote:
> 
> > +To clarify inheritance here are 3 examples:
> > +
> > +Example A:
> > +
> > +mkdir -p a/b/c
> > +xfs_io 'chattr +x' a
> 
> Probably, "-c" is necessary here.
> 
> xfs_io -c 'chattr +x' a

Yes! Thanks!
> 
> 
> > +mkdir a/b/c/d
> > +mkdir a/e
> > +
> > +	dax: a,e
> > +	no dax: b,c,d
> > +
> > +Example B:
> > +
> > +mkdir a
> > +xfs_io 'chattr +x' a
> ditto
> > +mkdir -p a/b/c/d
> > +
> > +	dax: a,b,c,d
> > +	no dax:
> > +
> > +Example C:
> > +
> > +mkdir -p a/b/c
> > +xfs_io 'chattr +x' c
> ditto

Thank you!  Updated.
Ira

> > +mkdir a/b/c/d
> > +
> > +	dax: c,d
> > +	no dax: a,b
> > +
> > +
> 
> ---
> 
> Yasunori Goto
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V10 10/11] fs: Introduce DCACHE_DONTCACHE
  2020-04-22 21:21 ` [PATCH V10 10/11] fs: Introduce DCACHE_DONTCACHE ira.weiny
@ 2020-04-23  8:40   ` Jan Kara
  2020-04-23 22:57   ` Dave Chinner
  1 sibling, 0 replies; 28+ messages in thread
From: Jan Kara @ 2020-04-23  8:40 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, linux-xfs, Darrick J. Wong, Al Viro, Jan Kara,
	Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jeff Moyer, linux-ext4, linux-fsdevel,
	linux-api

On Wed 22-04-20 14:21:01, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> DCACHE_DONTCACHE indicates a dentry should not be cached on final
> dput().
> 
> Also add a helper function to mark DCACHE_DONTCACHE on all dentries
> pointing to a specific inode when that inode is being set I_DONTCACHE.
> 
> This facilitates dropping dentry references to inodes sooner which
> require eviction to swap S_DAX mode.
> 
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

The patch looks good to me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> 
> ---
> Changes from V9:
> 	modify i_state under i_lock
> 	Update comment
> 		"Purge from memory on final dput()"
> 
> Changes from V8:
> 	Update commit message
> 	Use mark_inode_dontcache in XFS
> 	Fix locking...  can't use rcu here.
> 	Change name to mark_inode_dontcache
> ---
>  fs/dcache.c            |  4 ++++
>  fs/inode.c             | 15 +++++++++++++++
>  fs/xfs/xfs_icache.c    |  2 +-
>  include/linux/dcache.h |  2 ++
>  include/linux/fs.h     |  1 +
>  5 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/dcache.c b/fs/dcache.c
> index b280e07e162b..0030fabab2c4 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -647,6 +647,10 @@ static inline bool retain_dentry(struct dentry *dentry)
>  		if (dentry->d_op->d_delete(dentry))
>  			return false;
>  	}
> +
> +	if (unlikely(dentry->d_flags & DCACHE_DONTCACHE))
> +		return false;
> +
>  	/* retain; LRU fodder */
>  	dentry->d_lockref.count--;
>  	if (unlikely(!(dentry->d_flags & DCACHE_LRU_LIST)))
> diff --git a/fs/inode.c b/fs/inode.c
> index 93d9252a00ab..316355433797 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1526,6 +1526,21 @@ int generic_delete_inode(struct inode *inode)
>  }
>  EXPORT_SYMBOL(generic_delete_inode);
>  
> +void mark_inode_dontcache(struct inode *inode)
> +{
> +	struct dentry *de;
> +
> +	spin_lock(&inode->i_lock);
> +	hlist_for_each_entry(de, &inode->i_dentry, d_u.d_alias) {
> +		spin_lock(&de->d_lock);
> +		de->d_flags |= DCACHE_DONTCACHE;
> +		spin_unlock(&de->d_lock);
> +	}
> +	inode->i_state |= I_DONTCACHE;
> +	spin_unlock(&inode->i_lock);
> +}
> +EXPORT_SYMBOL(mark_inode_dontcache);
> +
>  /*
>   * Called when we're dropping the last reference
>   * to an inode.
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index de76f7f60695..3c8f44477804 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -559,7 +559,7 @@ xfs_iget_cache_miss(
>  	 */
>  	iflags = XFS_INEW;
>  	if (flags & XFS_IGET_DONTCACHE)
> -		VFS_I(ip)->i_state |= I_DONTCACHE;
> +		mark_inode_dontcache(VFS_I(ip));
>  	ip->i_udquot = NULL;
>  	ip->i_gdquot = NULL;
>  	ip->i_pdquot = NULL;
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index c1488cc84fd9..a81f0c3cf352 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -177,6 +177,8 @@ struct dentry_operations {
>  
>  #define DCACHE_REFERENCED		0x00000040 /* Recently used, don't discard. */
>  
> +#define DCACHE_DONTCACHE		0x00000080 /* Purge from memory on final dput() */
> +
>  #define DCACHE_CANT_MOUNT		0x00000100
>  #define DCACHE_GENOCIDE			0x00000200
>  #define DCACHE_SHRINK_LIST		0x00000400
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 44bd45af760f..064168ec2e0b 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3055,6 +3055,7 @@ static inline int generic_drop_inode(struct inode *inode)
>  	return !inode->i_nlink || inode_unhashed(inode) ||
>  		(inode->i_state & I_DONTCACHE);
>  }
> +extern void mark_inode_dontcache(struct inode *inode);
>  
>  extern struct inode *ilookup5_nowait(struct super_block *sb,
>  		unsigned long hashval, int (*test)(struct inode *, void *),
> -- 
> 2.25.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V10 02/11] fs: Remove unneeded IS_DAX() check in io_is_direct()
  2020-04-22 21:20 ` [PATCH V10 02/11] fs: Remove unneeded IS_DAX() check in io_is_direct() ira.weiny
@ 2020-04-23 21:41   ` Dave Chinner
  0 siblings, 0 replies; 28+ messages in thread
From: Dave Chinner @ 2020-04-23 21:41 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, linux-xfs, Darrick J. Wong, Jan Kara,
	Christoph Hellwig, Al Viro, Dan Williams, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

On Wed, Apr 22, 2020 at 02:20:53PM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> Remove the check because DAX now has it's own read/write methods and
> file systems which support DAX check IS_DAX() prior to IOCB_DIRECT on
> their own.  Therefore, it does not matter if the file state is DAX when
> the iocb flags are created.
> 
> Also remove io_is_direct() as it is just a simple flag check.
> 
> Reviewed-by: Jan Kara <jack@suse.cz>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V10 04/11] Documentation/dax: Update Usage section
  2020-04-22 21:20 ` [PATCH V10 04/11] Documentation/dax: Update Usage section ira.weiny
  2020-04-23  2:33   ` Yasunori Goto
@ 2020-04-23 22:27   ` Dave Chinner
  2020-04-23 23:25     ` Ira Weiny
  1 sibling, 1 reply; 28+ messages in thread
From: Dave Chinner @ 2020-04-23 22:27 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, linux-xfs, Darrick J. Wong, Jan Kara, Al Viro,
	Dan Williams, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

On Wed, Apr 22, 2020 at 02:20:55PM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> Update the Usage section to reflect the new individual dax selection
> functionality.
> 
> Reviewed-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> ---
> Changes from V9:
> 	Fix missing ')'
> 	Fix trialing '"'
> 
> Changes from V8:
> 	Updates from Darrick
> 
> Changes from V7:
> 	Cleanups/clarifications from Darrick and Dan
> 
> Changes from V6:
> 	Update to allow setting FS_XFLAG_DAX any time.
> 	Update with list of behaviors from Darrick
> 	https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
> 
> Changes from V5:
> 	Update to reflect the agreed upon semantics
> 	https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> ---
>  Documentation/filesystems/dax.txt | 164 +++++++++++++++++++++++++++++-
>  1 file changed, 161 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> index 679729442fd2..553712c5054e 100644
> --- a/Documentation/filesystems/dax.txt
> +++ b/Documentation/filesystems/dax.txt
> @@ -17,11 +17,169 @@ For file mappings, the storage device is mapped directly into userspace.
>  Usage
>  -----
>  
> -If you have a block device which supports DAX, you can make a filesystem
> +If you have a block device which supports DAX, you can make a file system
>  on it as usual.  The DAX code currently only supports files with a block
>  size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> -size when creating the filesystem.  When mounting it, use the "-o dax"
> -option on the command line or add 'dax' to the options in /etc/fstab.
> +size when creating the file system.
> +
> +Currently 3 filesystems support DAX: ext2, ext4 and xfs.  Enabling DAX on them
> +is different.
> +
> +Enabling DAX on ext4 and ext2
> +-----------------------------
> +
> +When mounting the filesystem, use the "-o dax" option on the command line or
> +add 'dax' to the options in /etc/fstab.  This works to enable DAX on all files
> +within the filesystem.  It is equivalent to the '-o dax=always' behavior below.
> +
> +
> +Enabling DAX on xfs
> +-------------------
> +
> +Summary
> +-------
> +
> + 1. There exists an in-kernel file access mode flag S_DAX that corresponds to
> +    the statx flag STATX_ATTR_DAX.  See the manpage for statx(2) for details
> +    about this access mode.
> +
> + 2. There exists an advisory file inode flag FS_XFLAG_DAX that is
> +    inherited from the parent directory FS_XFLAG_DAX inode flag at file
> +    creation time.  This advisory flag can be set or cleared at any
> +    time, but doing so does not immediately affect the S_DAX state.

This needs to make it clear that the inheritance behaviour of this
flag affects both newly created regular files and sub-directories,
but no other types of directory entries.

> +
> +    Unless overridden by mount options (see (3)), if FS_XFLAG_DAX is set
> +    and the fs is on pmem then it will enable S_DAX at inode load time;
> +    if FS_XFLAG_DAX is not set, it will not enable S_DAX.

This is item 3), and needs to state that it is specific to regular
files as DAX is not a method that can be used to access the
directory structure.

Also "at inode load time" doesn't really mean anything useful to
users. "when the inode is instantiated in memory by the kernel" is
what you really mean, and given that 5(c) talks about "the kernel
evicts the inode from memory", we really need to use consistent
terminology here so that it's clear to users that the behaviours are
related.

> +
> + 3. There exists a dax= mount option.
> +
> +    "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."
> +
> +    "-o dax=always" means "always set S_DAX (at least on pmem),
> +                    and ignore FS_XFLAG_DAX."
> +
> +    "-o dax"        is an alias for "dax=always".

"Legacy option that is an alias for "dax=always". This may be
deprecated and removed in future, so "dax=always" is the preferred
method for specifying this behaviour."

> +
> +    "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.
> +
> + 4. There exists an advisory directory inode flag FS_XFLAG_DAX that can
> +    be set or cleared at any time.  The flag state is inherited by any files or
> +    subdirectories when they are created within that directory.

This should be item 2, so that it is defined before it is referenced
by the current item 2 in the list.

> +
> + 5. Programs that require a specific file access mode (DAX or not DAX)
> +    can do one of the following:
> +
> +    (a) Create files in directories that the FS_XFLAG_DAX flag set as
> +        needed; or

	(a) Set the parent directory FS_XFLAG_DAX as needed before
	files are created; or

> +    (b) Have the administrator set an override via mount option; or

	(b) Have the administrator set the desired behaviour via
	mount option; or

> +    (c) Set or clear the file's FS_XFLAG_DAX flag as needed.  Programs
> +        must then cause the kernel to evict the inode from memory.  This
> +        can be done by:
> +
> +        i>  Closing the file and re-opening the file and using statx to
> +            see if the fs has changed the S_DAX flag; and

That will almost never work by itself as the cached dentry will
still pin the inode. Suggesting it will lead to people implementing
dumb loops where they open/check/close/sleep because they....

> +        ii> If the file still does not have the desired S_DAX access
> +            mode, either unmount and remount the filesystem, or close
> +            the file and use drop_caches.

.... don't have permissions to do either of these things...

Essentially, you may as well say "reboot the machine" at this point,
because it's effectively the same thing from a production workload
point of view...

Realistically, I'm not sure we should even say "programs must cause
eviction", because that's something they cannot do directly without
admin privileges nor is it something we want to occur randomly on
production machines during production. i.e. this is something that
should only be done in scheduled downtime by an administrator, not
attempted by applications because DAX isn't immediately available.
The admin is in charge here, not the "program".

> + 6. It is expected that users who want to squeeze every last bit of performance
> +    out of the particular rough and tumble bits of their storage will also be
> +    exposed to the difficulties of what happens when the operating system can't
> +    totally virtualize those hardware capabilities.  DAX is such a feature.

I don't think this adds any value. You may as well just say "caveat
empor", but that's kinda implied by the fact a computer is
involved...

> +
> +
> +Details
> +-------
> +
> +There are 2 per-file dax flags.  One is a physical inode setting (FS_XFLAG_DAX)

s/physical/persistent/

> +and the other a currently enabled state (S_DAX).

the other is a volatile flag indicating the active state of the
feature (S_DAX)

> +
> +FS_XFLAG_DAX is maintained, on disk, on individual inodes.

This is implementation detail, not a requirement. The only
requirement is that it is stored persistently by the filesystem.

> It is preserved
> +within the file system.  This 'physical' config setting can be set using an
> +ioctl and/or an application such as "xfs_io -c 'chattr [-+]x'".  Files and

s/physical/persistent/

... can be set, cleared and/or queried use the FS_IOC_FS[GS]ETXATTR
ioctl (see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'.
New files and ...

> +directories automatically inherit FS_XFLAG_DAX from their parent directory
> +_when_ _created_.  Therefore, setting FS_XFLAG_DAX at directory creation time
> +can be used to set a default behavior for an entire sub-tree.  (Doing so on the
> +root directory acts to set a default for the entire file system.)

No need for () around the example, but regardless, I don't think
this is a well thought out example. i.e.  setting it on an existing
filesystem which already contains data will not affect the default
behaviour of existing subdirectories or files. IOWs, it only sets the
default behaviour when set on an -empty- filesystem because of the
inheritance characteristics of the flag...

> +The current enabled state (S_DAX) is set when a file inode is _loaded_ based on

instantiated in memoy by the kernel.

> +the underlying media support, the value of FS_XFLAG_DAX, and the file systems

no comma before "and".

> +dax mount option setting.  See below.
> +
> +statx can be used to query S_DAX.  NOTE that a directory will never have S_DAX
> +set and therefore statx will never indicate that S_DAX is set on directories.
> +
> +NOTE: Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs
> +even if the underlying media does not support dax and/or the file system is
> +overridden with a mount option.
> +
> +
> +Overriding FS_XFLAG_DAX (dax= mount option)
> +-------------------------------------------
> +
> +There exists a dax mount option.  Using the mount option does not change the
> +physical configured state of individual files but overrides the S_DAX operating
> +state when inodes are loaded.
> +
> +Given underlying media support, the dax mount option is a tri-state option
> +(never, always, inode) with the following meanings:
> +
> +   "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX"
> +   "-o dax=always" means "always set S_DAX, ignore FS_XFLAG_DAX"
> +        "-o dax" by itself means "dax=always" to remain compatible with older
> +	         kernels
> +   "-o dax=inode" means "follow FS_XFLAG_DAX"

This is just repeating what is in the definition section.

> +The default state is 'inode'.  Given underlying media support, the following
> +algorithm is used to determine the effective mode of the file S_DAX on a
> +capable device.
> +
> +	S_DAX = FS_XFLAG_DAX;
> +
> +	if (dax_mount == "always")
> +		S_DAX = true;
> +	else if (dax_mount == "off")
> +		S_DAX = false;
> +
> +To reiterate: Setting, and inheritance, continues to affect FS_XFLAG_DAX even
> +while the file system is mounted with a dax override.  However, in-core inode
> +state (S_DAX) will continue to be overridden until the filesystem is remounted
> +with dax=inode and the inode is evicted.

Just put this last paragraph up in the "behavioural definitions" and
this whole section can go away.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V10 05/11] fs/xfs: Change XFS_MOUNT_DAX to XFS_MOUNT_DAX_ALWAYS
  2020-04-22 21:20 ` [PATCH V10 05/11] fs/xfs: Change XFS_MOUNT_DAX to XFS_MOUNT_DAX_ALWAYS ira.weiny
@ 2020-04-23 22:27   ` Dave Chinner
  0 siblings, 0 replies; 28+ messages in thread
From: Dave Chinner @ 2020-04-23 22:27 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, linux-xfs, Darrick J. Wong, Al Viro, Jan Kara,
	Dan Williams, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

On Wed, Apr 22, 2020 at 02:20:56PM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> In prep for the new tri-state mount option which then introduces
> XFS_MOUNT_DAX_NEVER.
> 
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V10 06/11] fs/xfs: Make DAX mount option a tri-state
  2020-04-22 21:20 ` [PATCH V10 06/11] fs/xfs: Make DAX mount option a tri-state ira.weiny
@ 2020-04-23 22:35   ` Dave Chinner
  2020-04-23 23:33     ` Ira Weiny
  0 siblings, 1 reply; 28+ messages in thread
From: Dave Chinner @ 2020-04-23 22:35 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, linux-xfs, Darrick J. Wong, Al Viro, Jan Kara,
	Dan Williams, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

On Wed, Apr 22, 2020 at 02:20:57PM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> As agreed upon[1].  We make the dax mount option a tri-state.  '-o dax'
> continues to operate the same.  We add 'always', 'never', and 'inode'
> (default).
> 
> [1] https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
....
> @@ -129,7 +163,6 @@ xfs_fs_show_options(
>  		{ XFS_MOUNT_GRPID,		",grpid" },
>  		{ XFS_MOUNT_DISCARD,		",discard" },
>  		{ XFS_MOUNT_LARGEIO,		",largeio" },
> -		{ XFS_MOUNT_DAX_ALWAYS,		",dax" },
>  		{ 0, NULL }
>  	};
>  	struct xfs_mount	*mp = XFS_M(root->d_sb);
> @@ -185,6 +218,11 @@ xfs_fs_show_options(
>  	if (!(mp->m_qflags & XFS_ALL_QUOTA_ACCT))
>  		seq_puts(m, ",noquota");
>  
> +	if (mp->m_flags & XFS_MOUNT_DAX_ALWAYS)
> +		seq_puts(m, ",dax=always");
> +	else if (mp->m_flags & XFS_MOUNT_DAX_NEVER)
> +		seq_puts(m, ",dax=never");

These can never be set at the same time, so please put these in the
m_flags options table as XFS_MOUNT_DAX_ALWAYS already is.  i.e.

@@ -129,7 +163,8 @@ xfs_fs_show_options(
 		{ XFS_MOUNT_GRPID,		",grpid" },
 		{ XFS_MOUNT_DISCARD,		",discard" },
 		{ XFS_MOUNT_LARGEIO,		",largeio" },
-		{ XFS_MOUNT_DAX_ALWAYS,		",dax" },
+		{ XFS_MOUNT_DAX_ALWAYS,		",dax=always" },
+		{ XFS_MOUNT_DAX_NEVER,		",dax=never" },
 		{ 0, NULL }
 	};
 	struct xfs_mount	*mp = XFS_M(root->d_sb);

Otherwise looks OK.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V10 07/11] fs/xfs: Create function xfs_inode_should_enable_dax()
  2020-04-22 21:20 ` [PATCH V10 07/11] fs/xfs: Create function xfs_inode_should_enable_dax() ira.weiny
@ 2020-04-23 22:36   ` Dave Chinner
  0 siblings, 0 replies; 28+ messages in thread
From: Dave Chinner @ 2020-04-23 22:36 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, linux-xfs, Darrick J. Wong, Al Viro, Jan Kara,
	Dan Williams, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

On Wed, Apr 22, 2020 at 02:20:58PM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> xfs_inode_supports_dax() should reflect if the inode can support DAX not
> that it is enabled for DAX.
> 
> Change the use of xfs_inode_supports_dax() to reflect only if the inode
> and underlying storage support dax.
> 
> Add a new function xfs_inode_should_enable_dax() which reflects if the
> inode should be enabled for DAX.
> 
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Looks fine.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V10 08/11] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  2020-04-22 21:20 ` [PATCH V10 08/11] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags() ira.weiny
@ 2020-04-23 22:37   ` Dave Chinner
  0 siblings, 0 replies; 28+ messages in thread
From: Dave Chinner @ 2020-04-23 22:37 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, linux-xfs, Darrick J. Wong, Al Viro, Jan Kara,
	Dan Williams, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

On Wed, Apr 22, 2020 at 02:20:59PM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> The functionality in xfs_diflags_to_linux() and xfs_diflags_to_iflags() are
> nearly identical.  The only difference is that *_to_linux() is called after
> inode setup and disallows changing the DAX flag.
> 
> Combining them can be done with a flag which indicates if this is the initial
> setup to allow the DAX flag to be properly set only at init time.
> 
> So remove xfs_diflags_to_linux() and call the modified xfs_diflags_to_iflags()
> directly.
> 
> While we are here simplify xfs_diflags_to_iflags() to take struct xfs_inode and
> use xfs_ip2xflags() to ensure future diflags are included correctly.
> 
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V10 09/11] fs: Lift XFS_IDONTCACHE to the VFS layer
  2020-04-22 21:21 ` [PATCH V10 09/11] fs: Lift XFS_IDONTCACHE to the VFS layer ira.weiny
@ 2020-04-23 22:38   ` Dave Chinner
  0 siblings, 0 replies; 28+ messages in thread
From: Dave Chinner @ 2020-04-23 22:38 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, linux-xfs, Darrick J. Wong, Al Viro, Jan Kara,
	Dan Williams, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

On Wed, Apr 22, 2020 at 02:21:00PM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> DAX effective mode (S_DAX) changes requires inode eviction.
> 
> XFS has an advisory flag (XFS_IDONTCACHE) to prevent caching of the
> inode if no other additional references are taken.  We lift this flag to
> the VFS layer and change the behavior slightly by allowing the flag to
> remain even if multiple references are taken.
> 
> This will expedite the eviction of inodes to change S_DAX.
> 
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V10 10/11] fs: Introduce DCACHE_DONTCACHE
  2020-04-22 21:21 ` [PATCH V10 10/11] fs: Introduce DCACHE_DONTCACHE ira.weiny
  2020-04-23  8:40   ` Jan Kara
@ 2020-04-23 22:57   ` Dave Chinner
  2020-04-23 23:38     ` Ira Weiny
  1 sibling, 1 reply; 28+ messages in thread
From: Dave Chinner @ 2020-04-23 22:57 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, linux-xfs, Darrick J. Wong, Al Viro, Jan Kara,
	Dan Williams, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

On Wed, Apr 22, 2020 at 02:21:01PM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> DCACHE_DONTCACHE indicates a dentry should not be cached on final
> dput().
> 
> Also add a helper function to mark DCACHE_DONTCACHE on all dentries
> pointing to a specific inode when that inode is being set I_DONTCACHE.
> 
> This facilitates dropping dentry references to inodes sooner which
> require eviction to swap S_DAX mode.
> 
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Code looks fine....

> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1526,6 +1526,21 @@ int generic_delete_inode(struct inode *inode)
>  }
>  EXPORT_SYMBOL(generic_delete_inode);
>  
> +void mark_inode_dontcache(struct inode *inode)
> +{
> +	struct dentry *de;
> +
> +	spin_lock(&inode->i_lock);
> +	hlist_for_each_entry(de, &inode->i_dentry, d_u.d_alias) {
> +		spin_lock(&de->d_lock);
> +		de->d_flags |= DCACHE_DONTCACHE;
> +		spin_unlock(&de->d_lock);
> +	}
> +	inode->i_state |= I_DONTCACHE;
> +	spin_unlock(&inode->i_lock);
> +}
> +EXPORT_SYMBOL(mark_inode_dontcache);

Though I suspect that this should be in fs/dcache.c and not
fs/inode.c. i.e. nothing in fs/inode.c does dentry list walks, but
there are several cases in the dcache code where inode dentry walks
are done under the inode lock (e.g. d_find_alias(inode)).

So perhaps this should be d_mark_dontcache(inode), which also marks
the inode as I_DONTCACHE so that everything is evicted on last
reference...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V10 04/11] Documentation/dax: Update Usage section
  2020-04-23 22:27   ` Dave Chinner
@ 2020-04-23 23:25     ` Ira Weiny
  2020-04-24  2:15       ` Dave Chinner
  0 siblings, 1 reply; 28+ messages in thread
From: Ira Weiny @ 2020-04-23 23:25 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-kernel, linux-xfs, Darrick J. Wong, Jan Kara, Al Viro,
	Dan Williams, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

On Fri, Apr 24, 2020 at 08:27:20AM +1000, Dave Chinner wrote:
> On Wed, Apr 22, 2020 at 02:20:55PM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > Update the Usage section to reflect the new individual dax selection
> > functionality.
> > 
> > Reviewed-by: Jan Kara <jack@suse.cz>
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > 
> > ---
> > Changes from V9:
> > 	Fix missing ')'
> > 	Fix trialing '"'
> > 
> > Changes from V8:
> > 	Updates from Darrick
> > 
> > Changes from V7:
> > 	Cleanups/clarifications from Darrick and Dan
> > 
> > Changes from V6:
> > 	Update to allow setting FS_XFLAG_DAX any time.
> > 	Update with list of behaviors from Darrick
> > 	https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
> > 
> > Changes from V5:
> > 	Update to reflect the agreed upon semantics
> > 	https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> > ---
> >  Documentation/filesystems/dax.txt | 164 +++++++++++++++++++++++++++++-
> >  1 file changed, 161 insertions(+), 3 deletions(-)
> > 
> > diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> > index 679729442fd2..553712c5054e 100644
> > --- a/Documentation/filesystems/dax.txt
> > +++ b/Documentation/filesystems/dax.txt
> > @@ -17,11 +17,169 @@ For file mappings, the storage device is mapped directly into userspace.
> >  Usage
> >  -----
> >  
> > -If you have a block device which supports DAX, you can make a filesystem
> > +If you have a block device which supports DAX, you can make a file system
> >  on it as usual.  The DAX code currently only supports files with a block
> >  size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> > -size when creating the filesystem.  When mounting it, use the "-o dax"
> > -option on the command line or add 'dax' to the options in /etc/fstab.
> > +size when creating the file system.
> > +
> > +Currently 3 filesystems support DAX: ext2, ext4 and xfs.  Enabling DAX on them
> > +is different.
> > +
> > +Enabling DAX on ext4 and ext2
> > +-----------------------------
> > +
> > +When mounting the filesystem, use the "-o dax" option on the command line or
> > +add 'dax' to the options in /etc/fstab.  This works to enable DAX on all files
> > +within the filesystem.  It is equivalent to the '-o dax=always' behavior below.
> > +
> > +
> > +Enabling DAX on xfs
> > +-------------------
> > +
> > +Summary
> > +-------
> > +
> > + 1. There exists an in-kernel file access mode flag S_DAX that corresponds to
> > +    the statx flag STATX_ATTR_DAX.  See the manpage for statx(2) for details
> > +    about this access mode.
> > +
> > + 2. There exists an advisory file inode flag FS_XFLAG_DAX that is
> > +    inherited from the parent directory FS_XFLAG_DAX inode flag at file
> > +    creation time.  This advisory flag can be set or cleared at any
> > +    time, but doing so does not immediately affect the S_DAX state.
> 
> This needs to make it clear that the inheritance behaviour of this
> flag affects both newly created regular files and sub-directories,
> but no other types of directory entries.

Done.

> 
> > +
> > +    Unless overridden by mount options (see (3)), if FS_XFLAG_DAX is set
> > +    and the fs is on pmem then it will enable S_DAX at inode load time;
> > +    if FS_XFLAG_DAX is not set, it will not enable S_DAX.
> 
> This is item 3), and needs to state that it is specific to regular
> files as DAX is not a method that can be used to access the
> directory structure.

In 1) S_DAX was defined as being a "file" access mode flag.  I could add
explicit verbiage in 1 to clarify that?

> 
> Also "at inode load time" doesn't really mean anything useful to
> users. "when the inode is instantiated in memory by the kernel" is
> what you really mean, and given that 5(c) talks about "the kernel
> evicts the inode from memory", we really need to use consistent
> terminology here so that it's clear to users that the behaviours are
> related.


> 
> > +
> > + 3. There exists a dax= mount option.
> > +
> > +    "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."
> > +
> > +    "-o dax=always" means "always set S_DAX (at least on pmem),
> > +                    and ignore FS_XFLAG_DAX."
> > +
> > +    "-o dax"        is an alias for "dax=always".
> 
> "Legacy option that is an alias for "dax=always". This may be
> deprecated and removed in future, so "dax=always" is the preferred
> method for specifying this behaviour."

Done.

> 
> > +
> > +    "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.
> > +
> > + 4. There exists an advisory directory inode flag FS_XFLAG_DAX that can
> > +    be set or cleared at any time.  The flag state is inherited by any files or
> > +    subdirectories when they are created within that directory.
> 
> This should be item 2, so that it is defined before it is referenced
> by the current item 2 in the list.

Done.

> 
> > +
> > + 5. Programs that require a specific file access mode (DAX or not DAX)
> > +    can do one of the following:
> > +
> > +    (a) Create files in directories that the FS_XFLAG_DAX flag set as
> > +        needed; or
> 
> 	(a) Set the parent directory FS_XFLAG_DAX as needed before
> 	files are created; or

Done.

> 
> > +    (b) Have the administrator set an override via mount option; or
> 
> 	(b) Have the administrator set the desired behaviour via
> 	mount option; or

Done.

> 
> > +    (c) Set or clear the file's FS_XFLAG_DAX flag as needed.  Programs
> > +        must then cause the kernel to evict the inode from memory.  This
> > +        can be done by:
> > +
> > +        i>  Closing the file and re-opening the file and using statx to
> > +            see if the fs has changed the S_DAX flag; and
> 
> That will almost never work by itself as the cached dentry will
> still pin the inode. Suggesting it will lead to people implementing
> dumb loops where they open/check/close/sleep because they....

loops which ...  will keep loading the inode into the cache...  yea...  :-(

> 
> > +        ii> If the file still does not have the desired S_DAX access
> > +            mode, either unmount and remount the filesystem, or close
> > +            the file and use drop_caches.
> 
> .... don't have permissions to do either of these things...
> 
> Essentially, you may as well say "reboot the machine" at this point,
> because it's effectively the same thing from a production workload
> point of view...
> 
> Realistically, I'm not sure we should even say "programs must cause
> eviction", because that's something they cannot do directly without
> admin privileges nor is it something we want to occur randomly on
> production machines during production. i.e. this is something that
> should only be done in scheduled downtime by an administrator, not
> attempted by applications because DAX isn't immediately available.
> The admin is in charge here, not the "program".

I agree with everything you say.

But I feel a bit stuck here.  Without some type of documentation we are not
allowing FS_XFLAG_DAX to be changed on a file by the user.  Which is what we
were proposing before and we all disliked.

So I feel like we need to say something about getting the inodes evicted.
perhaps by a 'drop cache' even requested of the admin???

Maybe this?


 4. Programs that require a specific file access mode (DAX or not DAX)
    can do one of the following:

    (a) Set the parent directory FS_XFLAG_DAX as needed before file are
        created; or

    (b) Have the administrator set the desired behaviour via mount option; or

    (c) Set or clear the file's FS_XFLAG_DAX flag as needed and wait for the
        inode to be evicted from memory.

        i> the only effective way of ensuring this is to request the admin drop
           the file system caches.


> 
> > + 6. It is expected that users who want to squeeze every last bit of performance
> > +    out of the particular rough and tumble bits of their storage will also be
> > +    exposed to the difficulties of what happens when the operating system can't
> > +    totally virtualize those hardware capabilities.  DAX is such a feature.
> 
> I don't think this adds any value. You may as well just say "caveat
> empor", but that's kinda implied by the fact a computer is
> involved..

Deleted.

> 
> > +
> > +
> > +Details
> > +-------
> > +
> > +There are 2 per-file dax flags.  One is a physical inode setting (FS_XFLAG_DAX)
> 
> s/physical/persistent/

Done.

> 
> > +and the other a currently enabled state (S_DAX).
> 
> the other is a volatile flag indicating the active state of the
> feature (S_DAX)

Done.

> 
> > +
> > +FS_XFLAG_DAX is maintained, on disk, on individual inodes.
> 
> This is implementation detail, not a requirement. The only
> requirement is that it is stored persistently by the filesystem.

removed.

> 
> > It is preserved
> > +within the file system.  This 'physical' config setting can be set using an
> > +ioctl and/or an application such as "xfs_io -c 'chattr [-+]x'".  Files and
> 
> s/physical/persistent/
> 
> ... can be set, cleared and/or queried use the FS_IOC_FS[GS]ETXATTR
> ioctl (see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'.
> New files and ...

Done.

> 
> > +directories automatically inherit FS_XFLAG_DAX from their parent directory
> > +_when_ _created_.  Therefore, setting FS_XFLAG_DAX at directory creation time
> > +can be used to set a default behavior for an entire sub-tree.  (Doing so on the
> > +root directory acts to set a default for the entire file system.)
> 
> No need for () around the example, but regardless, I don't think
> this is a well thought out example. i.e.  setting it on an existing
> filesystem which already contains data will not affect the default
> behaviour of existing subdirectories or files. IOWs, it only sets the
> default behaviour when set on an -empty- filesystem because of the
> inheritance characteristics of the flag...

I did mention "at directory creation time" which in my mind when extended to
the root directory means when the file system is empty...  But I'll just remove
this example as it provide very little additional information.

> 
> > +The current enabled state (S_DAX) is set when a file inode is _loaded_ based on
> 
> instantiated in memoy by the kernel.

Done.

> 
> > +the underlying media support, the value of FS_XFLAG_DAX, and the file systems
> 
> no comma before "and".

Done.

> 
> > +dax mount option setting.  See below.
> > +
> > +statx can be used to query S_DAX.  NOTE that a directory will never have S_DAX
> > +set and therefore statx will never indicate that S_DAX is set on directories.
> > +
> > +NOTE: Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs
> > +even if the underlying media does not support dax and/or the file system is
> > +overridden with a mount option.
> > +
> > +
> > +Overriding FS_XFLAG_DAX (dax= mount option)
> > +-------------------------------------------
> > +
> > +There exists a dax mount option.  Using the mount option does not change the
> > +physical configured state of individual files but overrides the S_DAX operating
> > +state when inodes are loaded.
> > +
> > +Given underlying media support, the dax mount option is a tri-state option
> > +(never, always, inode) with the following meanings:
> > +
> > +   "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX"
> > +   "-o dax=always" means "always set S_DAX, ignore FS_XFLAG_DAX"
> > +        "-o dax" by itself means "dax=always" to remain compatible with older
> > +	         kernels
> > +   "-o dax=inode" means "follow FS_XFLAG_DAX"
> 
> This is just repeating what is in the definition section.

Removed.

> 
> > +The default state is 'inode'.  Given underlying media support, the following
> > +algorithm is used to determine the effective mode of the file S_DAX on a
> > +capable device.
> > +
> > +	S_DAX = FS_XFLAG_DAX;
> > +
> > +	if (dax_mount == "always")
> > +		S_DAX = true;
> > +	else if (dax_mount == "off")
> > +		S_DAX = false;
> > +
> > +To reiterate: Setting, and inheritance, continues to affect FS_XFLAG_DAX even
> > +while the file system is mounted with a dax override.  However, in-core inode
> > +state (S_DAX) will continue to be overridden until the filesystem is remounted
> > +with dax=inode and the inode is evicted.
> 
> Just put this last paragraph up in the "behavioural definitions" and
> this whole section can go away.

Yep moved above.

To sum up the changes see the entire text below.
Ira


<quote>
Enabling DAX on xfs
-------------------

Summary
-------

 1. There exists an in-kernel file access mode flag S_DAX that corresponds to
    the statx flag STATX_ATTR_DAX.  See the manpage for statx(2) for details
    about this access mode.

 2. There exists a regular file and directory inode flag FS_XFLAG_DAX.  It is
    inherited from the parent directory FS_XFLAG_DAX inode flag at creation
    time.  This advisory flag can be set or cleared at any time, but doing so
    does not immediately affect the S_DAX state.

 3. There exists dax mount options which can override FS_XFLAG_DAX in the
    setting of the S_DAX flag.  Given underlying storage which supports DAX the
    following hold.

    "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.

    "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."

    "-o dax=always" means "always set S_DAX ignore FS_XFLAG_DAX."

    "-o dax"        is a legacy option which is an alias for "dax=always".
    		    This may be removed in the future so "-o dax=always" is
		    the preferred method for specifying this behavior.

    NOTE: Setting and inheritance affect FS_XFLAG_DAX at all times even when
    the file system is mounted with a dax option.  However, in-core inode
    state (S_DAX) will continue to be overridden until the file system is
    remounted with dax=inode and the inode is evicted.

 4. Programs that require a specific file access mode (DAX or not DAX)
    can do one of the following:

    (a) Set the parent directory FS_XFLAG_DAX as needed before file are
        created; or

    (b) Have the administrator set the desired behaviour via mount option; or

    (c) Set or clear the file's FS_XFLAG_DAX flag as needed and wait for the
        inode to be evicted from memory.

	i> the only effective way of ensuring this is to request the admin drop
	   the file system caches.


Details
-------

There are 2 per-file dax flags.  One is a persistent inode setting (FS_XFLAG_DAX)
and the other is a volatile flag indicating the active state of the feature
(S_DAX).

FS_XFLAG_DAX is preserved within the file system.  This persistent config
setting can be set, cleared and/or queried using the FS_IOC_FS[GS]ETXATTR ioctl
(see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'.
'chattr [-+]x'".

New files and directories automatically inherit FS_XFLAG_DAX from
their parent directory _when_ _created_.  Therefore, setting FS_XFLAG_DAX at
directory creation time can be used to set a default behavior for an entire
sub-tree.

To clarify inheritance here are 3 examples:

Example A:

mkdir -p a/b/c
xfs_io -c 'chattr +x' a
mkdir a/b/c/d
mkdir a/e

	dax: a,e
	no dax: b,c,d

Example B:

mkdir a
xfs_io -c 'chattr +x' a
mkdir -p a/b/c/d

	dax: a,b,c,d
	no dax:

Example C:

mkdir -p a/b/c
xfs_io -c 'chattr +x' c
mkdir a/b/c/d

	dax: c,d
	no dax: a,b


The current enabled state (S_DAX) is set when a file inode is instantiated in
memory by the kernel.  It is set based on the underlying media support, the
value of FS_XFLAG_DAX and the file systems dax mount option setting.

statx can be used to query S_DAX.  NOTE that a directory will never have S_DAX
set and therefore statx will never indicate that S_DAX is set on directories.

Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs even if
the underlying media does not support dax and/or the file system is overridden
with a mount option.

</quote>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V10 06/11] fs/xfs: Make DAX mount option a tri-state
  2020-04-23 22:35   ` Dave Chinner
@ 2020-04-23 23:33     ` Ira Weiny
  0 siblings, 0 replies; 28+ messages in thread
From: Ira Weiny @ 2020-04-23 23:33 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-kernel, linux-xfs, Darrick J. Wong, Al Viro, Jan Kara,
	Dan Williams, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

On Fri, Apr 24, 2020 at 08:35:31AM +1000, Dave Chinner wrote:
> On Wed, Apr 22, 2020 at 02:20:57PM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > As agreed upon[1].  We make the dax mount option a tri-state.  '-o dax'
> > continues to operate the same.  We add 'always', 'never', and 'inode'
> > (default).
> > 
> > [1] https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ....
> > @@ -129,7 +163,6 @@ xfs_fs_show_options(
> >  		{ XFS_MOUNT_GRPID,		",grpid" },
> >  		{ XFS_MOUNT_DISCARD,		",discard" },
> >  		{ XFS_MOUNT_LARGEIO,		",largeio" },
> > -		{ XFS_MOUNT_DAX_ALWAYS,		",dax" },
> >  		{ 0, NULL }
> >  	};
> >  	struct xfs_mount	*mp = XFS_M(root->d_sb);
> > @@ -185,6 +218,11 @@ xfs_fs_show_options(
> >  	if (!(mp->m_qflags & XFS_ALL_QUOTA_ACCT))
> >  		seq_puts(m, ",noquota");
> >  
> > +	if (mp->m_flags & XFS_MOUNT_DAX_ALWAYS)
> > +		seq_puts(m, ",dax=always");
> > +	else if (mp->m_flags & XFS_MOUNT_DAX_NEVER)
> > +		seq_puts(m, ",dax=never");
> 
> These can never be set at the same time, so please put these in the
> m_flags options table as XFS_MOUNT_DAX_ALWAYS already is.  i.e.
> 
> @@ -129,7 +163,8 @@ xfs_fs_show_options(
>  		{ XFS_MOUNT_GRPID,		",grpid" },
>  		{ XFS_MOUNT_DISCARD,		",discard" },
>  		{ XFS_MOUNT_LARGEIO,		",largeio" },
> -		{ XFS_MOUNT_DAX_ALWAYS,		",dax" },
> +		{ XFS_MOUNT_DAX_ALWAYS,		",dax=always" },
> +		{ XFS_MOUNT_DAX_NEVER,		",dax=never" },
>  		{ 0, NULL }
>  	};
>  	struct xfs_mount	*mp = XFS_M(root->d_sb);
> 
> Otherwise looks OK.

Done.

Ira

> 
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V10 10/11] fs: Introduce DCACHE_DONTCACHE
  2020-04-23 22:57   ` Dave Chinner
@ 2020-04-23 23:38     ` Ira Weiny
  0 siblings, 0 replies; 28+ messages in thread
From: Ira Weiny @ 2020-04-23 23:38 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-kernel, linux-xfs, Darrick J. Wong, Al Viro, Jan Kara,
	Dan Williams, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

On Fri, Apr 24, 2020 at 08:57:34AM +1000, Dave Chinner wrote:
> On Wed, Apr 22, 2020 at 02:21:01PM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > DCACHE_DONTCACHE indicates a dentry should not be cached on final
> > dput().
> > 
> > Also add a helper function to mark DCACHE_DONTCACHE on all dentries
> > pointing to a specific inode when that inode is being set I_DONTCACHE.
> > 
> > This facilitates dropping dentry references to inodes sooner which
> > require eviction to swap S_DAX mode.
> > 
> > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> Code looks fine....
> 
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -1526,6 +1526,21 @@ int generic_delete_inode(struct inode *inode)
> >  }
> >  EXPORT_SYMBOL(generic_delete_inode);
> >  
> > +void mark_inode_dontcache(struct inode *inode)
> > +{
> > +	struct dentry *de;
> > +
> > +	spin_lock(&inode->i_lock);
> > +	hlist_for_each_entry(de, &inode->i_dentry, d_u.d_alias) {
> > +		spin_lock(&de->d_lock);
> > +		de->d_flags |= DCACHE_DONTCACHE;
> > +		spin_unlock(&de->d_lock);
> > +	}
> > +	inode->i_state |= I_DONTCACHE;
> > +	spin_unlock(&inode->i_lock);
> > +}
> > +EXPORT_SYMBOL(mark_inode_dontcache);
> 
> Though I suspect that this should be in fs/dcache.c and not
> fs/inode.c. i.e. nothing in fs/inode.c does dentry list walks, but
> there are several cases in the dcache code where inode dentry walks
> are done under the inode lock (e.g. d_find_alias(inode)).
> 
> So perhaps this should be d_mark_dontcache(inode), which also marks
> the inode as I_DONTCACHE so that everything is evicted on last
> reference...

That does follow an existing pattern.

Al?  Any preference?

Ira

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V10 04/11] Documentation/dax: Update Usage section
  2020-04-23 23:25     ` Ira Weiny
@ 2020-04-24  2:15       ` Dave Chinner
  2020-04-24  5:55         ` Ira Weiny
  0 siblings, 1 reply; 28+ messages in thread
From: Dave Chinner @ 2020-04-24  2:15 UTC (permalink / raw)
  To: Ira Weiny
  Cc: linux-kernel, linux-xfs, Darrick J. Wong, Jan Kara, Al Viro,
	Dan Williams, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

On Thu, Apr 23, 2020 at 04:25:48PM -0700, Ira Weiny wrote:
> On Fri, Apr 24, 2020 at 08:27:20AM +1000, Dave Chinner wrote:
> > On Wed, Apr 22, 2020 at 02:20:55PM -0700, ira.weiny@intel.com wrote:
> > > +    Unless overridden by mount options (see (3)), if FS_XFLAG_DAX is set
> > > +    and the fs is on pmem then it will enable S_DAX at inode load time;
> > > +    if FS_XFLAG_DAX is not set, it will not enable S_DAX.
> > 
> > This is item 3), and needs to state that it is specific to regular
> > files as DAX is not a method that can be used to access the
> > directory structure.
> 
> In 1) S_DAX was defined as being a "file" access mode flag.  I could add
> explicit verbiage in 1 to clarify that?

A "file" can mean a block device, a symlink, a FIFO, etc. If you are
talking about files containing -user data- then it is a "regular
file" and not just a "file". Making sitinctions like this mean it is
clear that S_DAX will not be set on directories and other types of
files in directories...

> > > +        ii> If the file still does not have the desired S_DAX access
> > > +            mode, either unmount and remount the filesystem, or close
> > > +            the file and use drop_caches.
> > 
> > .... don't have permissions to do either of these things...
> > 
> > Essentially, you may as well say "reboot the machine" at this point,
> > because it's effectively the same thing from a production workload
> > point of view...
> > 
> > Realistically, I'm not sure we should even say "programs must cause
> > eviction", because that's something they cannot do directly without
> > admin privileges nor is it something we want to occur randomly on
> > production machines during production. i.e. this is something that
> > should only be done in scheduled downtime by an administrator, not
> > attempted by applications because DAX isn't immediately available.
> > The admin is in charge here, not the "program".
> 
> I agree with everything you say.
> 
> But I feel a bit stuck here.  Without some type of documentation we are not
> allowing FS_XFLAG_DAX to be changed on a file by the user.  Which is what we
> were proposing before and we all disliked.

For production systems, the admin is the "user" we are taking about.
The program itself shouldn't be choosing the method of file data
access; that's up to the administrator in charge of the system to
set the policy how they want it to be set.

i.e. there's a difference between the user/admin taking action to
change a data access policy, and the application taking actions to
override the policy that the admin has set.

What I'm trying to say is that setting/clearing the DAX flags is an
-admin operation-, and part of the consideration of that admin
operation is when the change should take effect.

i.e. refering to "programs" as if they control the access mode is
entirely the wrong way to be looking at persistent inode flags. They
are an administration policy mechanism that belongs to the data set,
not the application (or "program"). Managing data set storage and
access policy is something administrators do, not the application...

> So I feel like we need to say something about getting the inodes evicted.
> perhaps by a 'drop cache' even requested of the admin???
> 
> Maybe this?
> 
> 
>  4. Programs that require a specific file access mode (DAX or not DAX)
>     can do one of the following:
> 
>     (a) Set the parent directory FS_XFLAG_DAX as needed before file are
>         created; or
> 
>     (b) Have the administrator set the desired behaviour via mount option; or
> 
>     (c) Set or clear the file's FS_XFLAG_DAX flag as needed and wait for the
>         inode to be evicted from memory.
> 
>         i> the only effective way of ensuring this is to request the admin drop
>            the file system caches.

4. The DAX policy can be changed via:

	a) Set the parent directory FS_XFLAG_DAX as needed before
	   files are created

	b) Set the appropriate dax="foo" mount option

	c) Change the FS_XFLAG_DAX on existing regular files and
	   directories. This has runtime constraints and limitations
	   that are described in 5) below.

5. When changing the DAX policy via toggling the persistent
FS_XFLAG_DAX flag, the change in behaviour for existing regular
files may not occur immediately. If the change must take effect
immediately, the administrator needs to:

	1. stop the application so there are no active references to
	   the data set the policy change will affect
	2. evict the data set from kernel caches so it will be
	   re-instantiated when the application is restarted. This can
	   be acheived by:
		a. drop-caches
		b. a filesystem unmount and mount cycle
		c. a system reboot

Hence if DAX access policy changes are required to take immediate
effect, scheduled system-wide downtime will be required to guarantee
the new policy change takes effect when the application is
restarted.


> <quote>
> Enabling DAX on xfs
> -------------------
> 
> Summary
> -------
> 
>  1. There exists an in-kernel file access mode flag S_DAX that corresponds to
>     the statx flag STATX_ATTR_DAX.  See the manpage for statx(2) for details
>     about this access mode.
> 
>  2. There exists a regular file and directory inode flag FS_XFLAG_DAX.  It is
>     inherited from the parent directory FS_XFLAG_DAX inode flag at creation
>     time.  This advisory flag can be set or cleared at any time, but doing so
>     does not immediately affect the S_DAX state.

2. There exists a persistent flag FS_XFLAG_DAX that can be applied to
regular files and directories. This advisory flag can be set or
cleared at any time, but doing so does not immediately affect the
S_DAX state.

3. If the persistent FS_XFLAG_DAX flag is set on a directory, this
flag will be inherited by all regular files and sub directories that
are subsequently created in this directory. Files and subdirectories
that exist at the time this flag is set or cleared on the parent
directory are not modified by this modification of the parent
directory.


> 
>  3. There exists dax mount options which can override FS_XFLAG_DAX in the
>     setting of the S_DAX flag.  Given underlying storage which supports DAX the
>     following hold.
> 
>     "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.
> 
>     "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."
> 
>     "-o dax=always" means "always set S_DAX ignore FS_XFLAG_DAX."
> 
>     "-o dax"        is a legacy option which is an alias for "dax=always".
>     		    This may be removed in the future so "-o dax=always" is
> 		    the preferred method for specifying this behavior.
> 
>     NOTE: Setting and inheritance affect FS_XFLAG_DAX at all times even when
>     the file system is mounted with a dax option.  However, in-core inode
>     state (S_DAX) will continue to be overridden until the file system is

s/continue to//

>     remounted with dax=inode and the inode is evicted.

evicted from kernel memory.

> 
>  4. Programs that require a specific file access mode (DAX or not DAX)
>     can do one of the following:
> 
>     (a) Set the parent directory FS_XFLAG_DAX as needed before file are
>         created; or
> 
>     (b) Have the administrator set the desired behaviour via mount option; or
> 
>     (c) Set or clear the file's FS_XFLAG_DAX flag as needed and wait for the
>         inode to be evicted from memory.
> 
> 	i> the only effective way of ensuring this is to request the admin drop
> 	   the file system caches.

See my comments above.

> 
> 
> Details
> -------
> 
> There are 2 per-file dax flags.  One is a persistent inode setting (FS_XFLAG_DAX)
> and the other is a volatile flag indicating the active state of the feature
> (S_DAX).
> 
> FS_XFLAG_DAX is preserved within the file system.  This persistent config
> setting can be set, cleared and/or queried using the FS_IOC_FS[GS]ETXATTR ioctl
> (see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'.
> 'chattr [-+]x'".

Stray line.

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH V10 04/11] Documentation/dax: Update Usage section
  2020-04-24  2:15       ` Dave Chinner
@ 2020-04-24  5:55         ` Ira Weiny
  0 siblings, 0 replies; 28+ messages in thread
From: Ira Weiny @ 2020-04-24  5:55 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-kernel, linux-xfs, Darrick J. Wong, Jan Kara, Al Viro,
	Dan Williams, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

On Fri, Apr 24, 2020 at 12:15:16PM +1000, Dave Chinner wrote:
> On Thu, Apr 23, 2020 at 04:25:48PM -0700, Ira Weiny wrote:

[snap]

> > > > +        ii> If the file still does not have the desired S_DAX access
> > > > +            mode, either unmount and remount the filesystem, or close
> > > > +            the file and use drop_caches.
> > > 
> > > .... don't have permissions to do either of these things...
> > > 
> > > Essentially, you may as well say "reboot the machine" at this point,
> > > because it's effectively the same thing from a production workload
> > > point of view...
> > > 
> > > Realistically, I'm not sure we should even say "programs must cause
> > > eviction", because that's something they cannot do directly without
> > > admin privileges nor is it something we want to occur randomly on
> > > production machines during production. i.e. this is something that
> > > should only be done in scheduled downtime by an administrator, not
> > > attempted by applications because DAX isn't immediately available.
> > > The admin is in charge here, not the "program".
> > 
> > I agree with everything you say.
> > 
> > But I feel a bit stuck here.  Without some type of documentation we are not
> > allowing FS_XFLAG_DAX to be changed on a file by the user.  Which is what we
> > were proposing before and we all disliked.
> 
> For production systems, the admin is the "user" we are taking about.
> The program itself shouldn't be choosing the method of file data
> access; that's up to the administrator in charge of the system to
> set the policy how they want it to be set.
> 
> i.e. there's a difference between the user/admin taking action to
> change a data access policy, and the application taking actions to
> override the policy that the admin has set.
> 
> What I'm trying to say is that setting/clearing the DAX flags is an
> -admin operation-, and part of the consideration of that admin
> operation is when the change should take effect.
> 
> i.e. refering to "programs" as if they control the access mode is
> entirely the wrong way to be looking at persistent inode flags. They
> are an administration policy mechanism that belongs to the data set,
> not the application (or "program"). Managing data set storage and
> access policy is something administrators do, not the application...

Ok.

> 
> > So I feel like we need to say something about getting the inodes evicted.
> > perhaps by a 'drop cache' even requested of the admin???
> > 
> > Maybe this?
> > 
> > 
> >  4. Programs that require a specific file access mode (DAX or not DAX)
> >     can do one of the following:
> > 
> >     (a) Set the parent directory FS_XFLAG_DAX as needed before file are
> >         created; or
> > 
> >     (b) Have the administrator set the desired behaviour via mount option; or
> > 
> >     (c) Set or clear the file's FS_XFLAG_DAX flag as needed and wait for the
> >         inode to be evicted from memory.
> > 
> >         i> the only effective way of ensuring this is to request the admin drop
> >            the file system caches.
> 
> 4. The DAX policy can be changed via:
> 
> 	a) Set the parent directory FS_XFLAG_DAX as needed before
> 	   files are created
> 
> 	b) Set the appropriate dax="foo" mount option
> 
> 	c) Change the FS_XFLAG_DAX on existing regular files and
> 	   directories. This has runtime constraints and limitations
> 	   that are described in 5) below.
> 
> 5. When changing the DAX policy via toggling the persistent
> FS_XFLAG_DAX flag, the change in behaviour for existing regular
> files may not occur immediately. If the change must take effect
> immediately, the administrator needs to:
> 
> 	1. stop the application so there are no active references to
> 	   the data set the policy change will affect
> 	2. evict the data set from kernel caches so it will be
> 	   re-instantiated when the application is restarted. This can
> 	   be acheived by:
> 		a. drop-caches
> 		b. a filesystem unmount and mount cycle
> 		c. a system reboot
> 
> Hence if DAX access policy changes are required to take immediate
> effect, scheduled system-wide downtime will be required to guarantee
> the new policy change takes effect when the application is
> restarted.
> 
> 
> > <quote>
> > Enabling DAX on xfs
> > -------------------
> > 
> > Summary
> > -------
> > 
> >  1. There exists an in-kernel file access mode flag S_DAX that corresponds to
> >     the statx flag STATX_ATTR_DAX.  See the manpage for statx(2) for details
> >     about this access mode.
> > 
> >  2. There exists a regular file and directory inode flag FS_XFLAG_DAX.  It is
> >     inherited from the parent directory FS_XFLAG_DAX inode flag at creation
> >     time.  This advisory flag can be set or cleared at any time, but doing so
> >     does not immediately affect the S_DAX state.
> 
> 2. There exists a persistent flag FS_XFLAG_DAX that can be applied to
> regular files and directories. This advisory flag can be set or
> cleared at any time, but doing so does not immediately affect the
> S_DAX state.

Done.

> 
> 3. If the persistent FS_XFLAG_DAX flag is set on a directory, this
> flag will be inherited by all regular files and sub directories that
> are subsequently created in this directory. Files and subdirectories
> that exist at the time this flag is set or cleared on the parent
> directory are not modified by this modification of the parent
> directory.
> 

Done.

> 
> > 
> >  3. There exists dax mount options which can override FS_XFLAG_DAX in the
> >     setting of the S_DAX flag.  Given underlying storage which supports DAX the
> >     following hold.
> > 
> >     "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.
> > 
> >     "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."
> > 
> >     "-o dax=always" means "always set S_DAX ignore FS_XFLAG_DAX."
> > 
> >     "-o dax"        is a legacy option which is an alias for "dax=always".
> >     		    This may be removed in the future so "-o dax=always" is
> > 		    the preferred method for specifying this behavior.
> > 
> >     NOTE: Setting and inheritance affect FS_XFLAG_DAX at all times even when
> >     the file system is mounted with a dax option.  However, in-core inode
> >     state (S_DAX) will continue to be overridden until the file system is
> 
> s/continue to//

Done.

> 
> >     remounted with dax=inode and the inode is evicted.
> 
> evicted from kernel memory.

Done.

> 
> > 
> >  4. Programs that require a specific file access mode (DAX or not DAX)
> >     can do one of the following:
> > 
> >     (a) Set the parent directory FS_XFLAG_DAX as needed before file are
> >         created; or
> > 
> >     (b) Have the administrator set the desired behaviour via mount option; or
> > 
> >     (c) Set or clear the file's FS_XFLAG_DAX flag as needed and wait for the
> >         inode to be evicted from memory.
> > 
> > 	i> the only effective way of ensuring this is to request the admin drop
> > 	   the file system caches.
> 
> See my comments above.

Done. thanks!

> 
> > 
> > 
> > Details
> > -------
> > 
> > There are 2 per-file dax flags.  One is a persistent inode setting (FS_XFLAG_DAX)
> > and the other is a volatile flag indicating the active state of the feature
> > (S_DAX).
> > 
> > FS_XFLAG_DAX is preserved within the file system.  This persistent config
> > setting can be set, cleared and/or queried using the FS_IOC_FS[GS]ETXATTR ioctl
> > (see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'.
> > 'chattr [-+]x'".
> 
> Stray line.

Thanks for the review!  V11 should be out soon.

Ira

> 
> Cheers,
> 
> Dave.
> 
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2020-04-24  5:55 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-22 21:20 [PATCH V10 00/11] XFS - Enable per-file/per-directory DAX operations V10 ira.weiny
2020-04-22 21:20 ` [PATCH V10 01/11] fs/xfs: Remove unnecessary initialization of i_rwsem ira.weiny
2020-04-22 21:20 ` [PATCH V10 02/11] fs: Remove unneeded IS_DAX() check in io_is_direct() ira.weiny
2020-04-23 21:41   ` Dave Chinner
2020-04-22 21:20 ` [PATCH V10 03/11] fs/stat: Define DAX statx attribute ira.weiny
2020-04-22 21:20 ` [PATCH V10 04/11] Documentation/dax: Update Usage section ira.weiny
2020-04-23  2:33   ` Yasunori Goto
2020-04-23  5:05     ` Ira Weiny
2020-04-23 22:27   ` Dave Chinner
2020-04-23 23:25     ` Ira Weiny
2020-04-24  2:15       ` Dave Chinner
2020-04-24  5:55         ` Ira Weiny
2020-04-22 21:20 ` [PATCH V10 05/11] fs/xfs: Change XFS_MOUNT_DAX to XFS_MOUNT_DAX_ALWAYS ira.weiny
2020-04-23 22:27   ` Dave Chinner
2020-04-22 21:20 ` [PATCH V10 06/11] fs/xfs: Make DAX mount option a tri-state ira.weiny
2020-04-23 22:35   ` Dave Chinner
2020-04-23 23:33     ` Ira Weiny
2020-04-22 21:20 ` [PATCH V10 07/11] fs/xfs: Create function xfs_inode_should_enable_dax() ira.weiny
2020-04-23 22:36   ` Dave Chinner
2020-04-22 21:20 ` [PATCH V10 08/11] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags() ira.weiny
2020-04-23 22:37   ` Dave Chinner
2020-04-22 21:21 ` [PATCH V10 09/11] fs: Lift XFS_IDONTCACHE to the VFS layer ira.weiny
2020-04-23 22:38   ` Dave Chinner
2020-04-22 21:21 ` [PATCH V10 10/11] fs: Introduce DCACHE_DONTCACHE ira.weiny
2020-04-23  8:40   ` Jan Kara
2020-04-23 22:57   ` Dave Chinner
2020-04-23 23:38     ` Ira Weiny
2020-04-22 21:21 ` [PATCH V10 11/11] fs/xfs: Update xfs_ioctl_setattr_dax_invalidate() ira.weiny

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).