LKML Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH V11 00/11] XFS - Enable per-file/per-directory DAX operations V11
@ 2020-04-28  0:21 ira.weiny
  2020-04-28  0:21 ` [PATCH V11 01/11] fs/xfs: Remove unnecessary initialization of i_rwsem ira.weiny
                   ` (10 more replies)
  0 siblings, 11 replies; 29+ messages in thread
From: ira.weiny @ 2020-04-28  0:21 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Ira Weiny, Al Viro, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jan Kara, linux-ext4,
	linux-fsdevel, linux-api, Jeff Moyer

From: Ira Weiny <ira.weiny@intel.com>

Changes from V10:
	Update the documentaiton quite a bit
	Rename mark_inode_dontcache() to d_mark_dontcache()
	move d_mark_dontcache() to fs/dcache.c
	Move show options to xfs_info_set array
	Collect reviews.

At LSF/MM'19 [1] [2] we discussed applications that overestimate memory
consumption due to their inability to detect whether the kernel will
instantiate page cache for a file, and cases where a global dax enable via a
mount option is too coarse.

The following patch series enables the use of DAX on individual files and/or
directories on xfs, and lays some groundwork to do so in ext4.  It further
enhances the dax mount option to be a tri-state of 'always', 'never', or
'iflag' (default).  Furthermore, it maintians '-o dax' to be equivalent to '-o
dax=always'.

The insight at LSF/MM was to separate the per-mount or per-file "physical"
(FS_XFLAG_DAX) capability switch from an "effective" (S_DAX) attribute for the
file.

At LSF/MM we discussed the difficulties of switching the DAX state of a file
with active mappings / page cache.  It was thought the races could be avoided
by limiting DAX state flips to 0-length files.

However, this turns out to not be true.[3][5] This is because address space
operations (a_ops) may be in use at any time the inode is referenced.

For this reason direct manipulation of the FS_XFLAG_DAX is allowed but the
operation of the file (S_DAX) is not immediately changed.

Details of when and how DAX state can be changed on a file is included in a
documentation patch.

It should be noted that FS_XFLAG_DAX inheritance is not shown in this patch set
as it was maintained from previous work on XFS.  FS_XFLAG_DAX and it's
inheritance will need to be added to other file systems for user control. 


[1] https://lwn.net/Articles/787973/
[2] https://lwn.net/Articles/787233/
[3] https://lkml.org/lkml/2019/10/20/96
[4] https://patchwork.kernel.org/patch/11310511/
[5] https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/


To: linux-kernel@vger.kernel.org
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "Theodore Y. Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Cc: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-api@vger.kernel.org

Changes from V9:
	Slight reorder of series to put Documentation sooner
	modify i_state under i_lock
	Change name of xfs_ioctl_setattr_dax_invalidate() ->
		xfs_ioctl_setattr_prepare_dax()
	Do not report default dax=inode mount mode
	Move XFS_IEOFBLOCKS to '9'
	Fixup some doc typos
	Fix xfs style indentation
	Fix commit mispelling

Changes from V8:
	Rebase to 5.7-rc2
	Change ALWAYS/NEVER bits to be 26/27
	Remove XFS_IDONTCACHE -> lift to I_DONTCACHE
		use mark_inode_dontcache() in XFS
	create xfs_dax_mode enum
	use xfs signature styles
	Change xfs_inode_enabe_dax() -> xfs_inode_should_enable()
		Based on feedback to ext4 series
	Fix locking of DCACHE_DONTCACHE
	Change flag_inode_dontcache() -> mark_inode_dontcache()
	Change xfs_ioctl_setattr_dax_invalidate() -> xfs_ioctl_dax_check_set_cache()
	Documentation cleanups
	Clean up all commit messages

Changes from V7:
	Add DCACHE_DONTCACHE
	If mount override don't worry about inode caching
	Change mount flags to NEVER/ALWAYS
	Clean up xfs_inode_enable_dax()
	Clarify comments
	Update documentation

Changes from V6:
	Incorporate feedback on patches
	Add ability to change FS_XFLAG_DAX on files at any time.
		Add a don't cache flag to the VFS layer
		Preserve internal XFS IDONTCACHE behavior for bulkstat
		operations

Changes from V5:
	* make dax mount option a tri-state
	* Reject changes to FS_XFLAG_DAX for regular files
		- Allow only on directories
	* Update documentation

Changes from V4:
	* Open code the aops lock rather than add it to the xfs_ilock()
	  subsystem (Darrick's comments were obsoleted by this change)
	* Fix lkp build suggestions and bugs

Changes from V3:
	* Remove global locking...  :-D
	* put back per inode locking and remove pre-mature optimizations
	* Fix issues with Directories having IS_DAX() set
	* Fix kernel crash issues reported by Jeff
	* Add some clean up patches
	* Consolidate diflags to iflags functions
	* Update/add documentation
	* Reorder/rename patches quite a bit

Changes from V2:

	* Move i_dax_sem to be a global percpu_rw_sem rather than per inode
		Internal discussions with Dan determined this would be easier,
		just as performant, and slightly less overhead that having it
		in the SB as suggested by Jan
	* Fix locking order in comments and throughout code
	* Change "mode" to "state" throughout commits
	* Add CONFIG_FS_DAX wrapper to disable inode_[un]lock_state() when not
		configured
	* Add static branch for which is activated by a device which supports
		DAX in XFS
	* Change "lock/unlock" to up/down read/write as appropriate
		Previous names were over simplified
	* Update comments/documentation

	* Remove the xfs specific lock to the vfs (global) layer.
	* Fix i_dax_sem locking order and comments

	* Move 'i_mapped' count from struct inode to struct address_space and
		rename it to mmap_count
	* Add inode_has_mappings() call

	* Fix build issues
	* Clean up syntax spacing and minor issues
	* Update man page text for STATX_ATTR_DAX
	* Add reviewed-by's
	* Rebase to 5.6

	Rename patch:
		from: fs/xfs: Add lock/unlock state to xfs
		to: fs/xfs: Add write DAX lock to xfs layer
	Add patch:
		fs/xfs: Clarify lockdep dependency for xfs_isilocked()
	Drop patch:
		fs/xfs: Fix truncate up

Ira Weiny (11):
  fs/xfs: Remove unnecessary initialization of i_rwsem
  fs: Remove unneeded IS_DAX() check in io_is_direct()
  fs/stat: Define DAX statx attribute
  Documentation/dax: Update Usage section
  fs/xfs: Change XFS_MOUNT_DAX to XFS_MOUNT_DAX_ALWAYS
  fs/xfs: Make DAX mount option a tri-state
  fs/xfs: Create function xfs_inode_should_enable_dax()
  fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  fs: Lift XFS_IDONTCACHE to the VFS layer
  fs: Introduce DCACHE_DONTCACHE
  fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()

 Documentation/filesystems/dax.txt | 139 ++++++++++++++++++++++++++++-
 drivers/block/loop.c              |   6 +-
 fs/dcache.c                       |  19 ++++
 fs/stat.c                         |   3 +
 fs/xfs/xfs_icache.c               |   8 +-
 fs/xfs/xfs_inode.h                |   4 +-
 fs/xfs/xfs_ioctl.c                | 141 +++++-------------------------
 fs/xfs/xfs_iops.c                 |  72 ++++++++++-----
 fs/xfs/xfs_mount.h                |   4 +-
 fs/xfs/xfs_super.c                |  50 +++++++++--
 include/linux/dcache.h            |   2 +
 include/linux/fs.h                |  14 +--
 include/uapi/linux/stat.h         |   1 +
 13 files changed, 291 insertions(+), 172 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH V11 01/11] fs/xfs: Remove unnecessary initialization of i_rwsem
  2020-04-28  0:21 [PATCH V11 00/11] XFS - Enable per-file/per-directory DAX operations V11 ira.weiny
@ 2020-04-28  0:21 ` ira.weiny
  2020-04-28  0:21 ` [PATCH V11 02/11] fs: Remove unneeded IS_DAX() check in io_is_direct() ira.weiny
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: ira.weiny @ 2020-04-28  0:21 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Ira Weiny, Dave Chinner, Al Viro, Jan Kara, Dan Williams,
	Dave Chinner, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

From: Ira Weiny <ira.weiny@intel.com>

An earlier call of xfs_reinit_inode() from xfs_iget_cache_hit() already
handles initialization of i_rwsem.

Doing so again is unneeded.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V4:
	Update commit message to make it clear the xfs_iget_cache_hit()
	is actually doing the initialization via xfs_reinit_inode()

New for V4:

NOTE: This was found while ensuring the new i_aops_sem was properly
handled.  It seems like this is a layering violation so I think it is
worth cleaning up so as to not confuse others.
---
 fs/xfs/xfs_icache.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 8bf1d15be3f6..17a0b86fe701 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -423,6 +423,7 @@ xfs_iget_cache_hit(
 		spin_unlock(&ip->i_flags_lock);
 		rcu_read_unlock();
 
+		ASSERT(!rwsem_is_locked(&inode->i_rwsem));
 		error = xfs_reinit_inode(mp, inode);
 		if (error) {
 			bool wake;
@@ -456,9 +457,6 @@ xfs_iget_cache_hit(
 		ip->i_sick = 0;
 		ip->i_checked = 0;
 
-		ASSERT(!rwsem_is_locked(&inode->i_rwsem));
-		init_rwsem(&inode->i_rwsem);
-
 		spin_unlock(&ip->i_flags_lock);
 		spin_unlock(&pag->pag_ici_lock);
 	} else {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH V11 02/11] fs: Remove unneeded IS_DAX() check in io_is_direct()
  2020-04-28  0:21 [PATCH V11 00/11] XFS - Enable per-file/per-directory DAX operations V11 ira.weiny
  2020-04-28  0:21 ` [PATCH V11 01/11] fs/xfs: Remove unnecessary initialization of i_rwsem ira.weiny
@ 2020-04-28  0:21 ` ira.weiny
  2020-04-28  0:21 ` [PATCH V11 03/11] fs/stat: Define DAX statx attribute ira.weiny
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: ira.weiny @ 2020-04-28  0:21 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Ira Weiny, Dave Chinner, Jan Kara, Christoph Hellwig, Al Viro,
	Dan Williams, Dave Chinner, Theodore Y. Ts'o, Jeff Moyer,
	linux-ext4, linux-fsdevel, linux-api

From: Ira Weiny <ira.weiny@intel.com>

Remove the check because DAX now has it's own read/write methods and
file systems which support DAX check IS_DAX() prior to IOCB_DIRECT on
their own.  Therefore, it does not matter if the file state is DAX when
the iocb flags are created.

Also remove io_is_direct() as it is just a simple flag check.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from v8:
	Rebase to latest Linus tree

Changes from v6:
	remove io_is_direct() as well.
	Remove Reviews since this is quite a bit different.

Changes from v3:
	Reword commit message.
	Reordered to be a 'pre-cleanup' patch
---
 drivers/block/loop.c | 6 +++---
 include/linux/fs.h   | 7 +------
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index da693e6a834e..14372df0f354 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -634,8 +634,8 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
 
 static inline void loop_update_dio(struct loop_device *lo)
 {
-	__loop_update_dio(lo, io_is_direct(lo->lo_backing_file) |
-			lo->use_dio);
+	__loop_update_dio(lo, (lo->lo_backing_file->f_flags & O_DIRECT) |
+				lo->use_dio);
 }
 
 static void loop_reread_partitions(struct loop_device *lo,
@@ -1028,7 +1028,7 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 	if (!(lo_flags & LO_FLAGS_READ_ONLY) && file->f_op->fsync)
 		blk_queue_write_cache(lo->lo_queue, true, false);
 
-	if (io_is_direct(lo->lo_backing_file) && inode->i_sb->s_bdev) {
+	if ((lo->lo_backing_file->f_flags & O_DIRECT) && inode->i_sb->s_bdev) {
 		/* In case of direct I/O, match underlying block size */
 		unsigned short bsize = bdev_logical_block_size(
 			inode->i_sb->s_bdev);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4f6f59b4f22a..a87cc5845a02 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3394,11 +3394,6 @@ extern void setattr_copy(struct inode *inode, const struct iattr *attr);
 
 extern int file_update_time(struct file *file);
 
-static inline bool io_is_direct(struct file *filp)
-{
-	return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping->host);
-}
-
 static inline bool vma_is_dax(const struct vm_area_struct *vma)
 {
 	return vma->vm_file && IS_DAX(vma->vm_file->f_mapping->host);
@@ -3423,7 +3418,7 @@ static inline int iocb_flags(struct file *file)
 	int res = 0;
 	if (file->f_flags & O_APPEND)
 		res |= IOCB_APPEND;
-	if (io_is_direct(file))
+	if (file->f_flags & O_DIRECT)
 		res |= IOCB_DIRECT;
 	if ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host))
 		res |= IOCB_DSYNC;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH V11 03/11] fs/stat: Define DAX statx attribute
  2020-04-28  0:21 [PATCH V11 00/11] XFS - Enable per-file/per-directory DAX operations V11 ira.weiny
  2020-04-28  0:21 ` [PATCH V11 01/11] fs/xfs: Remove unnecessary initialization of i_rwsem ira.weiny
  2020-04-28  0:21 ` [PATCH V11 02/11] fs: Remove unneeded IS_DAX() check in io_is_direct() ira.weiny
@ 2020-04-28  0:21 ` ira.weiny
  2020-04-28  0:21 ` [PATCH V11 04/11] Documentation/dax: Update Usage section ira.weiny
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: ira.weiny @ 2020-04-28  0:21 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Ira Weiny, Dave Chinner, Jan Kara, Al Viro, Dan Williams,
	Dave Chinner, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

From: Ira Weiny <ira.weiny@intel.com>

In order for users to determine if a file is currently operating in DAX
state (effective DAX).  Define a statx attribute value and set that
attribute if the effective DAX flag is set.

To go along with this we propose the following addition to the statx man
page:

STATX_ATTR_DAX

	The file is in the DAX (cpu direct access) state.  DAX state
	attempts to minimize software cache effects for both I/O and
	memory mappings of this file.  It requires a file system which
	has been configured to support DAX.

	DAX generally assumes all accesses are via cpu load / store
	instructions which can minimize overhead for small accesses, but
	may adversely affect cpu utilization for large transfers.

	File I/O is done directly to/from user-space buffers and memory
	mapped I/O may be performed with direct memory mappings that
	bypass kernel page cache.

	While the DAX property tends to result in data being transferred
	synchronously, it does not give the same guarantees of O_SYNC
	where data and the necessary metadata are transferred together.

	A DAX file may support being mapped with the MAP_SYNC flag,
	which enables a program to use CPU cache flush instructions to
	persist CPU store operations without an explicit fsync(2).  See
	mmap(2) for more information.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V2:
	Update man page text with comments from Darrick, Jan, Dan, and
	Dave.
---
 fs/stat.c                 | 3 +++
 include/uapi/linux/stat.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/fs/stat.c b/fs/stat.c
index 030008796479..894699c74dde 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -79,6 +79,9 @@ int vfs_getattr_nosec(const struct path *path, struct kstat *stat,
 	if (IS_AUTOMOUNT(inode))
 		stat->attributes |= STATX_ATTR_AUTOMOUNT;
 
+	if (IS_DAX(inode))
+		stat->attributes |= STATX_ATTR_DAX;
+
 	if (inode->i_op->getattr)
 		return inode->i_op->getattr(path, stat, request_mask,
 					    query_flags);
diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index ad80a5c885d5..e5f9d5517f6b 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -169,6 +169,7 @@ struct statx {
 #define STATX_ATTR_ENCRYPTED		0x00000800 /* [I] File requires key to decrypt in fs */
 #define STATX_ATTR_AUTOMOUNT		0x00001000 /* Dir: Automount trigger */
 #define STATX_ATTR_VERITY		0x00100000 /* [I] Verity protected file */
+#define STATX_ATTR_DAX			0x00002000 /* [I] File is DAX */
 
 
 #endif /* _UAPI_LINUX_STAT_H */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH V11 04/11] Documentation/dax: Update Usage section
  2020-04-28  0:21 [PATCH V11 00/11] XFS - Enable per-file/per-directory DAX operations V11 ira.weiny
                   ` (2 preceding siblings ...)
  2020-04-28  0:21 ` [PATCH V11 03/11] fs/stat: Define DAX statx attribute ira.weiny
@ 2020-04-28  0:21 ` ira.weiny
  2020-04-28 20:27   ` Darrick J. Wong
  2020-04-28 22:21   ` [PATCH V11.1] " ira.weiny
  2020-04-28  0:21 ` [PATCH V11 05/11] fs/xfs: Change XFS_MOUNT_DAX to XFS_MOUNT_DAX_ALWAYS ira.weiny
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 29+ messages in thread
From: ira.weiny @ 2020-04-28  0:21 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Ira Weiny, Al Viro, Jan Kara, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jeff Moyer, linux-ext4,
	linux-fsdevel, linux-api

From: Ira Weiny <ira.weiny@intel.com>

Update the Usage section to reflect the new individual dax selection
functionality.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V10:
	Clarifications from Dave
	Add '-c' to xfs_io examples

Changes from V9:
	Fix missing ')'
	Fix trialing '"'

Changes from V8:
	Updates from Darrick

Changes from V7:
	Cleanups/clarifications from Darrick and Dan

Changes from V6:
	Update to allow setting FS_XFLAG_DAX any time.
	Update with list of behaviors from Darrick
	https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/

Changes from V5:
	Update to reflect the agreed upon semantics
	https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
---
 Documentation/filesystems/dax.txt | 139 +++++++++++++++++++++++++++++-
 1 file changed, 136 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
index 679729442fd2..409e4e83e46a 100644
--- a/Documentation/filesystems/dax.txt
+++ b/Documentation/filesystems/dax.txt
@@ -17,11 +17,144 @@ For file mappings, the storage device is mapped directly into userspace.
 Usage
 -----
 
-If you have a block device which supports DAX, you can make a filesystem
+If you have a block device which supports DAX, you can make a file system
 on it as usual.  The DAX code currently only supports files with a block
 size equal to your kernel's PAGE_SIZE, so you may need to specify a block
-size when creating the filesystem.  When mounting it, use the "-o dax"
-option on the command line or add 'dax' to the options in /etc/fstab.
+size when creating the file system.
+
+Currently 3 filesystems support DAX: ext2, ext4 and xfs.  Enabling DAX on them
+is different.
+
+Enabling DAX on ext4 and ext2
+-----------------------------
+
+When mounting the filesystem, use the "-o dax" option on the command line or
+add 'dax' to the options in /etc/fstab.  This works to enable DAX on all files
+within the filesystem.  It is equivalent to the '-o dax=always' behavior below.
+
+
+Enabling DAX on xfs
+-------------------
+
+Summary
+-------
+
+ 1. There exists an in-kernel file access mode flag S_DAX that corresponds to
+    the statx flag STATX_ATTR_DAX.  See the manpage for statx(2) for details
+    about this access mode.
+
+ 2. There exists a persistent flag FS_XFLAG_DAX that can be applied to regular
+    files and directories. This advisory flag can be set or cleared at any
+    time, but doing so does not immediately affect the S_DAX state.
+
+ 3. If the persistent FS_XFLAG_DAX flag is set on a directory, this flag will
+    be inherited by all regular files and sub directories that are subsequently
+    created in this directory. Files and subdirectories that exist at the time
+    this flag is set or cleared on the parent directory are not modified by
+    this modification of the parent directory.
+
+ 4. There exists dax mount options which can override FS_XFLAG_DAX in the
+    setting of the S_DAX flag.  Given underlying storage which supports DAX the
+    following hold.
+
+    "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.
+
+    "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."
+
+    "-o dax=always" means "always set S_DAX ignore FS_XFLAG_DAX."
+
+    "-o dax"        is a legacy option which is an alias for "dax=always".
+		    This may be removed in the future so "-o dax=always" is
+		    the preferred method for specifying this behavior.
+
+    NOTE: Setting and inheritance affect FS_XFLAG_DAX at all times even when
+    the file system is mounted with a dax option.  However, in-core inode state
+    (S_DAX) will be overridden until the file system is remounted with
+    dax=inode and the inode is evicted from kernel memory.
+
+ 5. The DAX policy can be changed via:
+
+    a) Set the parent directory FS_XFLAG_DAX as needed before files are created
+
+    b) Set the appropriate dax="foo" mount option
+
+    c) Change the FS_XFLAG_DAX on existing regular files and directories. This
+       has runtime constraints and limitations that are described in 6) below.
+
+ 6. When changing the DAX policy via toggling the persistent FS_XFLAG_DAX flag,
+    the change in behaviour for existing regular files may not occur
+    immediately.  If the change must take effect immediately, the administrator
+    needs to:
+
+    a) stop the application so there are no active references to the data set
+       the policy change will affect
+
+    b) evict the data set from kernel caches so it will be re-instantiated when
+       the application is restarted. This can be acheived by:
+
+       i. drop-caches
+       ii. a filesystem unmount and mount cycle
+       iii. a system reboot
+
+
+Details
+-------
+
+There are 2 per-file dax flags.  One is a persistent inode setting (FS_XFLAG_DAX)
+and the other is a volatile flag indicating the active state of the feature
+(S_DAX).
+
+FS_XFLAG_DAX is preserved within the file system.  This persistent config
+setting can be set, cleared and/or queried using the FS_IOC_FS[GS]ETXATTR ioctl
+(see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'.
+
+New files and directories automatically inherit FS_XFLAG_DAX from
+their parent directory _when_ _created_.  Therefore, setting FS_XFLAG_DAX at
+directory creation time can be used to set a default behavior for an entire
+sub-tree.
+
+To clarify inheritance here are 3 examples:
+
+Example A:
+
+mkdir -p a/b/c
+xfs_io -c 'chattr +x' a
+mkdir a/b/c/d
+mkdir a/e
+
+	dax: a,e
+	no dax: b,c,d
+
+Example B:
+
+mkdir a
+xfs_io -c 'chattr +x' a
+mkdir -p a/b/c/d
+
+	dax: a,b,c,d
+	no dax:
+
+Example C:
+
+mkdir -p a/b/c
+xfs_io -c 'chattr +x' c
+mkdir a/b/c/d
+
+	dax: c,d
+	no dax: a,b
+
+
+The current enabled state (S_DAX) is set when a file inode is instantiated in
+memory by the kernel.  It is set based on the underlying media support, the
+value of FS_XFLAG_DAX and the file systems dax mount option setting.
+
+statx can be used to query S_DAX.  NOTE that a directory will never have S_DAX
+set and therefore statx will never indicate that S_DAX is set on directories.
+
+Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs even if
+the underlying media does not support dax and/or the file system is overridden
+with a mount option.
+
 
 
 Implementation Tips for Block Driver Writers
-- 
2.25.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH V11 05/11] fs/xfs: Change XFS_MOUNT_DAX to XFS_MOUNT_DAX_ALWAYS
  2020-04-28  0:21 [PATCH V11 00/11] XFS - Enable per-file/per-directory DAX operations V11 ira.weiny
                   ` (3 preceding siblings ...)
  2020-04-28  0:21 ` [PATCH V11 04/11] Documentation/dax: Update Usage section ira.weiny
@ 2020-04-28  0:21 ` ira.weiny
  2020-04-28  0:21 ` [PATCH V11 06/11] fs/xfs: Make DAX mount option a tri-state ira.weiny
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: ira.weiny @ 2020-04-28  0:21 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Ira Weiny, Dave Chinner, Al Viro, Jan Kara, Dan Williams,
	Dave Chinner, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

From: Ira Weiny <ira.weiny@intel.com>

In prep for the new tri-state mount option which then introduces
XFS_MOUNT_DAX_NEVER.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from v8
	Move bit to 26
---
 fs/xfs/xfs_iops.c  | 2 +-
 fs/xfs/xfs_mount.h | 3 +--
 fs/xfs/xfs_super.c | 8 ++++----
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index f7a99b3bbcf7..462f89af479a 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1248,7 +1248,7 @@ xfs_inode_supports_dax(
 		return false;
 
 	/* DAX mount option or DAX iflag must be set. */
-	if (!(mp->m_flags & XFS_MOUNT_DAX) &&
+	if (!(mp->m_flags & XFS_MOUNT_DAX_ALWAYS) &&
 	    !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
 		return false;
 
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index b2e4598fdf7d..f6123fb0113c 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -237,8 +237,7 @@ typedef struct xfs_mount {
 #define XFS_MOUNT_FILESTREAMS	(1ULL << 24)	/* enable the filestreams
 						   allocator */
 #define XFS_MOUNT_NOATTR2	(1ULL << 25)	/* disable use of attr2 format */
-
-#define XFS_MOUNT_DAX		(1ULL << 62)	/* TEST ONLY! */
+#define XFS_MOUNT_DAX_ALWAYS	(1ULL << 26)
 
 /*
  * Max and min values for mount-option defined I/O
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 424bb9a2d532..ce169d1c7474 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -129,7 +129,7 @@ xfs_fs_show_options(
 		{ XFS_MOUNT_GRPID,		",grpid" },
 		{ XFS_MOUNT_DISCARD,		",discard" },
 		{ XFS_MOUNT_LARGEIO,		",largeio" },
-		{ XFS_MOUNT_DAX,		",dax" },
+		{ XFS_MOUNT_DAX_ALWAYS,		",dax" },
 		{ 0, NULL }
 	};
 	struct xfs_mount	*mp = XFS_M(root->d_sb);
@@ -1261,7 +1261,7 @@ xfs_fc_parse_param(
 		return 0;
 #ifdef CONFIG_FS_DAX
 	case Opt_dax:
-		mp->m_flags |= XFS_MOUNT_DAX;
+		mp->m_flags |= XFS_MOUNT_DAX_ALWAYS;
 		return 0;
 #endif
 	default:
@@ -1454,7 +1454,7 @@ xfs_fc_fill_super(
 	if (XFS_SB_VERSION_NUM(&mp->m_sb) == XFS_SB_VERSION_5)
 		sb->s_flags |= SB_I_VERSION;
 
-	if (mp->m_flags & XFS_MOUNT_DAX) {
+	if (mp->m_flags & XFS_MOUNT_DAX_ALWAYS) {
 		bool rtdev_is_dax = false, datadev_is_dax;
 
 		xfs_warn(mp,
@@ -1468,7 +1468,7 @@ xfs_fc_fill_super(
 		if (!rtdev_is_dax && !datadev_is_dax) {
 			xfs_alert(mp,
 			"DAX unsupported by block device. Turning off DAX.");
-			mp->m_flags &= ~XFS_MOUNT_DAX;
+			mp->m_flags &= ~XFS_MOUNT_DAX_ALWAYS;
 		}
 		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
 			xfs_alert(mp,
-- 
2.25.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH V11 06/11] fs/xfs: Make DAX mount option a tri-state
  2020-04-28  0:21 [PATCH V11 00/11] XFS - Enable per-file/per-directory DAX operations V11 ira.weiny
                   ` (4 preceding siblings ...)
  2020-04-28  0:21 ` [PATCH V11 05/11] fs/xfs: Change XFS_MOUNT_DAX to XFS_MOUNT_DAX_ALWAYS ira.weiny
@ 2020-04-28  0:21 ` ira.weiny
  2020-04-28 20:08   ` Darrick J. Wong
  2020-04-28  0:21 ` [PATCH V11 07/11] fs/xfs: Create function xfs_inode_should_enable_dax() ira.weiny
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 29+ messages in thread
From: ira.weiny @ 2020-04-28  0:21 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Ira Weiny, Al Viro, Jan Kara, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jeff Moyer, linux-ext4,
	linux-fsdevel, linux-api

From: Ira Weiny <ira.weiny@intel.com>

As agreed upon[1].  We make the dax mount option a tri-state.  '-o dax'
continues to operate the same.  We add 'always', 'never', and 'inode'
(default).

[1] https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V10:
	Move show options to xfs_info_set array

Changes from V9:
	Fix indentation in xfs_mount_set_dax_mode()
	Do not report dax=inode

Changes from v8:
	Move NEVER bit to 27
	Use xfs signature style
	use xfs_dax_mode enum

Changes from v7:
	Change to XFS_MOUNT_DAX_NEVER

Changes from v6:
	Use 2 flag bits rather than a field.
	change iflag to inode

Changes from v5:
	New Patch
---
 fs/xfs/xfs_mount.h |  1 +
 fs/xfs/xfs_super.c | 46 ++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index f6123fb0113c..37bfb50db809 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -238,6 +238,7 @@ typedef struct xfs_mount {
 						   allocator */
 #define XFS_MOUNT_NOATTR2	(1ULL << 25)	/* disable use of attr2 format */
 #define XFS_MOUNT_DAX_ALWAYS	(1ULL << 26)
+#define XFS_MOUNT_DAX_NEVER	(1ULL << 27)
 
 /*
  * Max and min values for mount-option defined I/O
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index ce169d1c7474..e80bd2c4c279 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -47,6 +47,39 @@ static struct kset *xfs_kset;		/* top-level xfs sysfs dir */
 static struct xfs_kobj xfs_dbg_kobj;	/* global debug sysfs attrs */
 #endif
 
+enum xfs_dax_mode {
+	XFS_DAX_INODE = 0,
+	XFS_DAX_ALWAYS = 1,
+	XFS_DAX_NEVER = 2,
+};
+
+static void
+xfs_mount_set_dax_mode(
+	struct xfs_mount	*mp,
+	enum xfs_dax_mode	mode)
+{
+	switch (mode) {
+	case XFS_DAX_INODE:
+		mp->m_flags &= ~(XFS_MOUNT_DAX_ALWAYS | XFS_MOUNT_DAX_NEVER);
+		break;
+	case XFS_DAX_ALWAYS:
+		mp->m_flags |= XFS_MOUNT_DAX_ALWAYS;
+		mp->m_flags &= ~XFS_MOUNT_DAX_NEVER;
+		break;
+	case XFS_DAX_NEVER:
+		mp->m_flags |= XFS_MOUNT_DAX_NEVER;
+		mp->m_flags &= ~XFS_MOUNT_DAX_ALWAYS;
+		break;
+	}
+}
+
+static const struct constant_table dax_param_enums[] = {
+	{"inode",	XFS_DAX_INODE },
+	{"always",	XFS_DAX_ALWAYS },
+	{"never",	XFS_DAX_NEVER },
+	{}
+};
+
 /*
  * Table driven mount option parser.
  */
@@ -59,7 +92,7 @@ enum {
 	Opt_filestreams, Opt_quota, Opt_noquota, Opt_usrquota, Opt_grpquota,
 	Opt_prjquota, Opt_uquota, Opt_gquota, Opt_pquota,
 	Opt_uqnoenforce, Opt_gqnoenforce, Opt_pqnoenforce, Opt_qnoenforce,
-	Opt_discard, Opt_nodiscard, Opt_dax,
+	Opt_discard, Opt_nodiscard, Opt_dax, Opt_dax_enum,
 };
 
 static const struct fs_parameter_spec xfs_fs_parameters[] = {
@@ -103,6 +136,7 @@ static const struct fs_parameter_spec xfs_fs_parameters[] = {
 	fsparam_flag("discard",		Opt_discard),
 	fsparam_flag("nodiscard",	Opt_nodiscard),
 	fsparam_flag("dax",		Opt_dax),
+	fsparam_enum("dax",		Opt_dax_enum, dax_param_enums),
 	{}
 };
 
@@ -129,7 +163,8 @@ xfs_fs_show_options(
 		{ XFS_MOUNT_GRPID,		",grpid" },
 		{ XFS_MOUNT_DISCARD,		",discard" },
 		{ XFS_MOUNT_LARGEIO,		",largeio" },
-		{ XFS_MOUNT_DAX_ALWAYS,		",dax" },
+		{ XFS_MOUNT_DAX_ALWAYS,		",dax=always" },
+		{ XFS_MOUNT_DAX_NEVER,		",dax=never" },
 		{ 0, NULL }
 	};
 	struct xfs_mount	*mp = XFS_M(root->d_sb);
@@ -1261,7 +1296,10 @@ xfs_fc_parse_param(
 		return 0;
 #ifdef CONFIG_FS_DAX
 	case Opt_dax:
-		mp->m_flags |= XFS_MOUNT_DAX_ALWAYS;
+		xfs_mount_set_dax_mode(mp, XFS_DAX_ALWAYS);
+		return 0;
+	case Opt_dax_enum:
+		xfs_mount_set_dax_mode(mp, result.uint_32);
 		return 0;
 #endif
 	default:
@@ -1468,7 +1506,7 @@ xfs_fc_fill_super(
 		if (!rtdev_is_dax && !datadev_is_dax) {
 			xfs_alert(mp,
 			"DAX unsupported by block device. Turning off DAX.");
-			mp->m_flags &= ~XFS_MOUNT_DAX_ALWAYS;
+			xfs_mount_set_dax_mode(mp, XFS_DAX_NEVER);
 		}
 		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
 			xfs_alert(mp,
-- 
2.25.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH V11 07/11] fs/xfs: Create function xfs_inode_should_enable_dax()
  2020-04-28  0:21 [PATCH V11 00/11] XFS - Enable per-file/per-directory DAX operations V11 ira.weiny
                   ` (5 preceding siblings ...)
  2020-04-28  0:21 ` [PATCH V11 06/11] fs/xfs: Make DAX mount option a tri-state ira.weiny
@ 2020-04-28  0:21 ` ira.weiny
  2020-04-28  0:21 ` [PATCH V11 08/11] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags() ira.weiny
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: ira.weiny @ 2020-04-28  0:21 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Ira Weiny, Dave Chinner, Al Viro, Jan Kara, Dan Williams,
	Dave Chinner, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

From: Ira Weiny <ira.weiny@intel.com>

xfs_inode_supports_dax() should reflect if the inode can support DAX not
that it is enabled for DAX.

Change the use of xfs_inode_supports_dax() to reflect only if the inode
and underlying storage support dax.

Add a new function xfs_inode_should_enable_dax() which reflects if the
inode should be enabled for DAX.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V8
	Change to 'should enable' (feedback given by Jan in ext4 series)
		Darrick I've preserved your Reviewed-by for now LMK if
		that is an issue.

Changes from v7:
	Move S_ISREG check first
	use IS_ENABLED(CONFIG_FS_DAX) rather than duplicated function

Changes from v6:
	Change enable checks to be sequential logic.
	Update for 2 bit tri-state option.
	Make 'static' consistent.
	Don't set S_DAX if !CONFIG_FS_DAX

Changes from v5:
	Update to reflect the new tri-state mount option

Changes from v3:
	Update functions and names to be more clear
	Update commit message
	Merge with
		'fs/xfs: Clean up DAX support check'
		don't allow IS_DAX() on a directory
		use STATIC macro for static
		make xfs_inode_supports_dax() static
---
 fs/xfs/xfs_iops.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 462f89af479a..1814f10e43d3 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1243,13 +1243,12 @@ xfs_inode_supports_dax(
 {
 	struct xfs_mount	*mp = ip->i_mount;
 
-	/* Only supported on non-reflinked files. */
-	if (!S_ISREG(VFS_I(ip)->i_mode) || xfs_is_reflink_inode(ip))
+	/* Only supported on regular files. */
+	if (!S_ISREG(VFS_I(ip)->i_mode))
 		return false;
 
-	/* DAX mount option or DAX iflag must be set. */
-	if (!(mp->m_flags & XFS_MOUNT_DAX_ALWAYS) &&
-	    !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
+	/* Only supported on non-reflinked files. */
+	if (xfs_is_reflink_inode(ip))
 		return false;
 
 	/* Block size must match page size */
@@ -1260,6 +1259,23 @@ xfs_inode_supports_dax(
 	return xfs_inode_buftarg(ip)->bt_daxdev != NULL;
 }
 
+static bool
+xfs_inode_should_enable_dax(
+	struct xfs_inode *ip)
+{
+	if (!IS_ENABLED(CONFIG_FS_DAX))
+		return false;
+	if (ip->i_mount->m_flags & XFS_MOUNT_DAX_NEVER)
+		return false;
+	if (!xfs_inode_supports_dax(ip))
+		return false;
+	if (ip->i_mount->m_flags & XFS_MOUNT_DAX_ALWAYS)
+		return true;
+	if (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)
+		return true;
+	return false;
+}
+
 STATIC void
 xfs_diflags_to_iflags(
 	struct inode		*inode,
@@ -1278,7 +1294,7 @@ xfs_diflags_to_iflags(
 		inode->i_flags |= S_SYNC;
 	if (flags & XFS_DIFLAG_NOATIME)
 		inode->i_flags |= S_NOATIME;
-	if (xfs_inode_supports_dax(ip))
+	if (xfs_inode_should_enable_dax(ip))
 		inode->i_flags |= S_DAX;
 }
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH V11 08/11] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
  2020-04-28  0:21 [PATCH V11 00/11] XFS - Enable per-file/per-directory DAX operations V11 ira.weiny
                   ` (6 preceding siblings ...)
  2020-04-28  0:21 ` [PATCH V11 07/11] fs/xfs: Create function xfs_inode_should_enable_dax() ira.weiny
@ 2020-04-28  0:21 ` ira.weiny
  2020-04-28  0:21 ` [PATCH V11 09/11] fs: Lift XFS_IDONTCACHE to the VFS layer ira.weiny
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 29+ messages in thread
From: ira.weiny @ 2020-04-28  0:21 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Ira Weiny, Dave Chinner, Al Viro, Jan Kara, Dan Williams,
	Dave Chinner, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

From: Ira Weiny <ira.weiny@intel.com>

The functionality in xfs_diflags_to_linux() and xfs_diflags_to_iflags() are
nearly identical.  The only difference is that *_to_linux() is called after
inode setup and disallows changing the DAX flag.

Combining them can be done with a flag which indicates if this is the initial
setup to allow the DAX flag to be properly set only at init time.

So remove xfs_diflags_to_linux() and call the modified xfs_diflags_to_iflags()
directly.

While we are here simplify xfs_diflags_to_iflags() to take struct xfs_inode and
use xfs_ip2xflags() to ensure future diflags are included correctly.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V7:
	Clarify with a comment the reason for leaving S_DAX out of the
	mask

Changes from V6:
	Move unrelated hunk to previous patch.
	Change logic for better code generation.

Changes from V5:
	The functions are no longer identical so we can only combine
	them rather than deleting one completely.  This is reflected in
	the new init parameter.
---
 fs/xfs/xfs_inode.h |  1 +
 fs/xfs/xfs_ioctl.c | 33 +--------------------------------
 fs/xfs/xfs_iops.c  | 46 +++++++++++++++++++++++++++-------------------
 3 files changed, 29 insertions(+), 51 deletions(-)

diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index c6a63f6764a6..83073c883fbf 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -467,6 +467,7 @@ int	xfs_break_layouts(struct inode *inode, uint *iolock,
 /* from xfs_iops.c */
 extern void xfs_setup_inode(struct xfs_inode *ip);
 extern void xfs_setup_iops(struct xfs_inode *ip);
+extern void xfs_diflags_to_iflags(struct xfs_inode *ip, bool init);
 
 /*
  * When setting up a newly allocated inode, we need to call
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 309958186d33..104495ac187c 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1201,37 +1201,6 @@ xfs_flags2diflags2(
 	return di_flags2;
 }
 
-STATIC void
-xfs_diflags_to_linux(
-	struct xfs_inode	*ip)
-{
-	struct inode		*inode = VFS_I(ip);
-	unsigned int		xflags = xfs_ip2xflags(ip);
-
-	if (xflags & FS_XFLAG_IMMUTABLE)
-		inode->i_flags |= S_IMMUTABLE;
-	else
-		inode->i_flags &= ~S_IMMUTABLE;
-	if (xflags & FS_XFLAG_APPEND)
-		inode->i_flags |= S_APPEND;
-	else
-		inode->i_flags &= ~S_APPEND;
-	if (xflags & FS_XFLAG_SYNC)
-		inode->i_flags |= S_SYNC;
-	else
-		inode->i_flags &= ~S_SYNC;
-	if (xflags & FS_XFLAG_NOATIME)
-		inode->i_flags |= S_NOATIME;
-	else
-		inode->i_flags &= ~S_NOATIME;
-#if 0	/* disabled until the flag switching races are sorted out */
-	if (xflags & FS_XFLAG_DAX)
-		inode->i_flags |= S_DAX;
-	else
-		inode->i_flags &= ~S_DAX;
-#endif
-}
-
 static int
 xfs_ioctl_setattr_xflags(
 	struct xfs_trans	*tp,
@@ -1269,7 +1238,7 @@ xfs_ioctl_setattr_xflags(
 	ip->i_d.di_flags = xfs_flags2diflags(ip, fa->fsx_xflags);
 	ip->i_d.di_flags2 = di_flags2;
 
-	xfs_diflags_to_linux(ip);
+	xfs_diflags_to_iflags(ip, false);
 	xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 	XFS_STATS_INC(mp, xs_ig_attrchg);
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 1814f10e43d3..b70b735fe4a4 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1276,26 +1276,34 @@ xfs_inode_should_enable_dax(
 	return false;
 }
 
-STATIC void
+void
 xfs_diflags_to_iflags(
-	struct inode		*inode,
-	struct xfs_inode	*ip)
+	struct xfs_inode	*ip,
+	bool init)
 {
-	uint16_t		flags = ip->i_d.di_flags;
-
-	inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC |
-			    S_NOATIME | S_DAX);
-
-	if (flags & XFS_DIFLAG_IMMUTABLE)
-		inode->i_flags |= S_IMMUTABLE;
-	if (flags & XFS_DIFLAG_APPEND)
-		inode->i_flags |= S_APPEND;
-	if (flags & XFS_DIFLAG_SYNC)
-		inode->i_flags |= S_SYNC;
-	if (flags & XFS_DIFLAG_NOATIME)
-		inode->i_flags |= S_NOATIME;
-	if (xfs_inode_should_enable_dax(ip))
-		inode->i_flags |= S_DAX;
+	struct inode            *inode = VFS_I(ip);
+	unsigned int            xflags = xfs_ip2xflags(ip);
+	unsigned int            flags = 0;
+
+	ASSERT(!(IS_DAX(inode) && init));
+
+	if (xflags & FS_XFLAG_IMMUTABLE)
+		flags |= S_IMMUTABLE;
+	if (xflags & FS_XFLAG_APPEND)
+		flags |= S_APPEND;
+	if (xflags & FS_XFLAG_SYNC)
+		flags |= S_SYNC;
+	if (xflags & FS_XFLAG_NOATIME)
+		flags |= S_NOATIME;
+	if (init && xfs_inode_should_enable_dax(ip))
+		flags |= S_DAX;
+
+	/*
+	 * S_DAX can only be set during inode initialization and is never set by
+	 * the VFS, so we cannot mask off S_DAX in i_flags.
+	 */
+	inode->i_flags &= ~(S_IMMUTABLE | S_APPEND | S_SYNC | S_NOATIME);
+	inode->i_flags |= flags;
 }
 
 /*
@@ -1321,7 +1329,7 @@ xfs_setup_inode(
 	inode_fake_hash(inode);
 
 	i_size_write(inode, ip->i_d.di_size);
-	xfs_diflags_to_iflags(inode, ip);
+	xfs_diflags_to_iflags(ip, true);
 
 	if (S_ISDIR(inode->i_mode)) {
 		/*
-- 
2.25.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH V11 09/11] fs: Lift XFS_IDONTCACHE to the VFS layer
  2020-04-28  0:21 [PATCH V11 00/11] XFS - Enable per-file/per-directory DAX operations V11 ira.weiny
                   ` (7 preceding siblings ...)
  2020-04-28  0:21 ` [PATCH V11 08/11] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags() ira.weiny
@ 2020-04-28  0:21 ` ira.weiny
  2020-04-28 20:06   ` Darrick J. Wong
  2020-04-28  0:21 ` [PATCH V11 10/11] fs: Introduce DCACHE_DONTCACHE ira.weiny
  2020-04-28  0:21 ` [PATCH V11 11/11] fs/xfs: Update xfs_ioctl_setattr_dax_invalidate() ira.weiny
  10 siblings, 1 reply; 29+ messages in thread
From: ira.weiny @ 2020-04-28  0:21 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Ira Weiny, Al Viro, Dave Chinner, Jan Kara, Dan Williams,
	Dave Chinner, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

From: Ira Weiny <ira.weiny@intel.com>

DAX effective mode (S_DAX) changes requires inode eviction.

XFS has an advisory flag (XFS_IDONTCACHE) to prevent caching of the
inode if no other additional references are taken.  We lift this flag to
the VFS layer and change the behavior slightly by allowing the flag to
remain even if multiple references are taken.

This will expedite the eviction of inodes to change S_DAX.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V9:
	Fix misspelling in commit subject
	move XFS_IEOFBLOCKS to '9'

Changes from V8:
	Remove XFS_IDONTCACHE
---
 fs/xfs/xfs_icache.c | 4 ++--
 fs/xfs/xfs_inode.h  | 3 +--
 fs/xfs/xfs_super.c  | 2 +-
 include/linux/fs.h  | 6 +++++-
 4 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 17a0b86fe701..de76f7f60695 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -477,7 +477,7 @@ xfs_iget_cache_hit(
 		xfs_ilock(ip, lock_flags);
 
 	if (!(flags & XFS_IGET_INCORE))
-		xfs_iflags_clear(ip, XFS_ISTALE | XFS_IDONTCACHE);
+		xfs_iflags_clear(ip, XFS_ISTALE);
 	XFS_STATS_INC(mp, xs_ig_found);
 
 	return 0;
@@ -559,7 +559,7 @@ xfs_iget_cache_miss(
 	 */
 	iflags = XFS_INEW;
 	if (flags & XFS_IGET_DONTCACHE)
-		iflags |= XFS_IDONTCACHE;
+		VFS_I(ip)->i_state |= I_DONTCACHE;
 	ip->i_udquot = NULL;
 	ip->i_gdquot = NULL;
 	ip->i_pdquot = NULL;
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 83073c883fbf..d8ce3eaa246e 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -218,8 +218,7 @@ static inline bool xfs_inode_has_cow_data(struct xfs_inode *ip)
 #define XFS_IFLOCK		(1 << __XFS_IFLOCK_BIT)
 #define __XFS_IPINNED_BIT	8	 /* wakeup key for zero pin count */
 #define XFS_IPINNED		(1 << __XFS_IPINNED_BIT)
-#define XFS_IDONTCACHE		(1 << 9) /* don't cache the inode long term */
-#define XFS_IEOFBLOCKS		(1 << 10)/* has the preallocblocks tag set */
+#define XFS_IEOFBLOCKS		(1 << 9) /* has the preallocblocks tag set */
 /*
  * If this unlinked inode is in the middle of recovery, don't let drop_inode
  * truncate and free the inode.  This can happen if we iget the inode during
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index e80bd2c4c279..6f91c13fb6ea 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -737,7 +737,7 @@ xfs_fs_drop_inode(
 		return 0;
 	}
 
-	return generic_drop_inode(inode) || (ip->i_flags & XFS_IDONTCACHE);
+	return generic_drop_inode(inode);
 }
 
 static void
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a87cc5845a02..44bd45af760f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2156,6 +2156,8 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
  *
  * I_CREATING		New object's inode in the middle of setting up.
  *
+ * I_DONTCACHE		Evict inode as soon as it is not used anymore.
+ *
  * Q: What is the difference between I_WILL_FREE and I_FREEING?
  */
 #define I_DIRTY_SYNC		(1 << 0)
@@ -2178,6 +2180,7 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
 #define I_WB_SWITCH		(1 << 13)
 #define I_OVL_INUSE		(1 << 14)
 #define I_CREATING		(1 << 15)
+#define I_DONTCACHE		(1 << 16)
 
 #define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
 #define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
@@ -3049,7 +3052,8 @@ extern int inode_needs_sync(struct inode *inode);
 extern int generic_delete_inode(struct inode *inode);
 static inline int generic_drop_inode(struct inode *inode)
 {
-	return !inode->i_nlink || inode_unhashed(inode);
+	return !inode->i_nlink || inode_unhashed(inode) ||
+		(inode->i_state & I_DONTCACHE);
 }
 
 extern struct inode *ilookup5_nowait(struct super_block *sb,
-- 
2.25.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH V11 10/11] fs: Introduce DCACHE_DONTCACHE
  2020-04-28  0:21 [PATCH V11 00/11] XFS - Enable per-file/per-directory DAX operations V11 ira.weiny
                   ` (8 preceding siblings ...)
  2020-04-28  0:21 ` [PATCH V11 09/11] fs: Lift XFS_IDONTCACHE to the VFS layer ira.weiny
@ 2020-04-28  0:21 ` ira.weiny
  2020-04-28 13:04   ` Jan Kara
  2020-04-28 20:09   ` Darrick J. Wong
  2020-04-28  0:21 ` [PATCH V11 11/11] fs/xfs: Update xfs_ioctl_setattr_dax_invalidate() ira.weiny
  10 siblings, 2 replies; 29+ messages in thread
From: ira.weiny @ 2020-04-28  0:21 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Ira Weiny, Al Viro, Jan Kara, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jeff Moyer, linux-ext4,
	linux-fsdevel, linux-api

From: Ira Weiny <ira.weiny@intel.com>

DCACHE_DONTCACHE indicates a dentry should not be cached on final
dput().

Also add a helper function to mark DCACHE_DONTCACHE on all dentries
pointing to a specific inode when that inode is being set I_DONTCACHE.

This facilitates dropping dentry references to inodes sooner which
require eviction to swap S_DAX mode.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V10:
	rename to d_mark_dontcache()
	Move function to fs/dcache.c

Changes from V9:
	modify i_state under i_lock
	Update comment
		"Purge from memory on final dput()"

Changes from V8:
	Update commit message
	Use mark_inode_dontcache in XFS
	Fix locking...  can't use rcu here.
	Change name to mark_inode_dontcache
---
 fs/dcache.c            | 19 +++++++++++++++++++
 fs/xfs/xfs_icache.c    |  2 +-
 include/linux/dcache.h |  2 ++
 include/linux/fs.h     |  1 +
 4 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index b280e07e162b..0d07fb335b78 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -647,6 +647,10 @@ static inline bool retain_dentry(struct dentry *dentry)
 		if (dentry->d_op->d_delete(dentry))
 			return false;
 	}
+
+	if (unlikely(dentry->d_flags & DCACHE_DONTCACHE))
+		return false;
+
 	/* retain; LRU fodder */
 	dentry->d_lockref.count--;
 	if (unlikely(!(dentry->d_flags & DCACHE_LRU_LIST)))
@@ -656,6 +660,21 @@ static inline bool retain_dentry(struct dentry *dentry)
 	return true;
 }
 
+void d_mark_dontcache(struct inode *inode)
+{
+	struct dentry *de;
+
+	spin_lock(&inode->i_lock);
+	hlist_for_each_entry(de, &inode->i_dentry, d_u.d_alias) {
+		spin_lock(&de->d_lock);
+		de->d_flags |= DCACHE_DONTCACHE;
+		spin_unlock(&de->d_lock);
+	}
+	inode->i_state |= I_DONTCACHE;
+	spin_unlock(&inode->i_lock);
+}
+EXPORT_SYMBOL(d_mark_dontcache);
+
 /*
  * Finish off a dentry we've decided to kill.
  * dentry->d_lock must be held, returns with it unlocked.
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index de76f7f60695..888646d74d7d 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -559,7 +559,7 @@ xfs_iget_cache_miss(
 	 */
 	iflags = XFS_INEW;
 	if (flags & XFS_IGET_DONTCACHE)
-		VFS_I(ip)->i_state |= I_DONTCACHE;
+		d_mark_dontcache(VFS_I(ip));
 	ip->i_udquot = NULL;
 	ip->i_gdquot = NULL;
 	ip->i_pdquot = NULL;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index c1488cc84fd9..a81f0c3cf352 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -177,6 +177,8 @@ struct dentry_operations {
 
 #define DCACHE_REFERENCED		0x00000040 /* Recently used, don't discard. */
 
+#define DCACHE_DONTCACHE		0x00000080 /* Purge from memory on final dput() */
+
 #define DCACHE_CANT_MOUNT		0x00000100
 #define DCACHE_GENOCIDE			0x00000200
 #define DCACHE_SHRINK_LIST		0x00000400
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 44bd45af760f..7c3e8c0306e0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3055,6 +3055,7 @@ static inline int generic_drop_inode(struct inode *inode)
 	return !inode->i_nlink || inode_unhashed(inode) ||
 		(inode->i_state & I_DONTCACHE);
 }
+extern void d_mark_dontcache(struct inode *inode);
 
 extern struct inode *ilookup5_nowait(struct super_block *sb,
 		unsigned long hashval, int (*test)(struct inode *, void *),
-- 
2.25.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH V11 11/11] fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()
  2020-04-28  0:21 [PATCH V11 00/11] XFS - Enable per-file/per-directory DAX operations V11 ira.weiny
                   ` (9 preceding siblings ...)
  2020-04-28  0:21 ` [PATCH V11 10/11] fs: Introduce DCACHE_DONTCACHE ira.weiny
@ 2020-04-28  0:21 ` ira.weiny
  2020-04-28 20:11   ` Darrick J. Wong
  10 siblings, 1 reply; 29+ messages in thread
From: ira.weiny @ 2020-04-28  0:21 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Ira Weiny, Al Viro, Jan Kara, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jeff Moyer, linux-ext4,
	linux-fsdevel, linux-api

From: Ira Weiny <ira.weiny@intel.com>

Because of the separation of FS_XFLAG_DAX from S_DAX and the delayed
setting of S_DAX, data invalidation no longer needs to happen when
FS_XFLAG_DAX is changed.

Change xfs_ioctl_setattr_dax_invalidate() to be
xfs_ioctl_dax_check_set_cache() and alter the code to reflect the new
functionality.

Furthermore, we no longer need the locking so we remove the join_flags
logic.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V10:
	adjust for renamed d_mark_dontcache() function

Changes from V9:
	Change name of function to xfs_ioctl_setattr_prepare_dax()

Changes from V8:
	Change name of function to xfs_ioctl_dax_check_set_cache()
	Update commit message
	Fix bit manipulations

Changes from V7:
	Use new flag_inode_dontcache()
	Skip don't cache if mount over ride is active.

Changes from v6:
	Fix completely broken implementation and update commit message.
	Use the new VFS layer I_DONTCACHE to facilitate inode eviction
	and S_DAX changing on drop_caches

Changes from v5:
	New patch
---
 fs/xfs/xfs_ioctl.c | 108 +++++++++------------------------------------
 1 file changed, 20 insertions(+), 88 deletions(-)

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 104495ac187c..ff474f2c9acf 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1245,64 +1245,26 @@ xfs_ioctl_setattr_xflags(
 	return 0;
 }
 
-/*
- * If we are changing DAX flags, we have to ensure the file is clean and any
- * cached objects in the address space are invalidated and removed. This
- * requires us to lock out other IO and page faults similar to a truncate
- * operation. The locks need to be held until the transaction has been committed
- * so that the cache invalidation is atomic with respect to the DAX flag
- * manipulation.
- */
-static int
-xfs_ioctl_setattr_dax_invalidate(
+static void
+xfs_ioctl_setattr_prepare_dax(
 	struct xfs_inode	*ip,
-	struct fsxattr		*fa,
-	int			*join_flags)
+	struct fsxattr		*fa)
 {
-	struct inode		*inode = VFS_I(ip);
-	struct super_block	*sb = inode->i_sb;
-	int			error;
-
-	*join_flags = 0;
-
-	/*
-	 * It is only valid to set the DAX flag on regular files and
-	 * directories on filesystems where the block size is equal to the page
-	 * size. On directories it serves as an inherited hint so we don't
-	 * have to check the device for dax support or flush pagecache.
-	 */
-	if (fa->fsx_xflags & FS_XFLAG_DAX) {
-		struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
-
-		if (!bdev_dax_supported(target->bt_bdev, sb->s_blocksize))
-			return -EINVAL;
-	}
-
-	/* If the DAX state is not changing, we have nothing to do here. */
-	if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
-		return 0;
-	if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode))
-		return 0;
+	struct xfs_mount	*mp = ip->i_mount;
+	struct inode            *inode = VFS_I(ip);
 
 	if (S_ISDIR(inode->i_mode))
-		return 0;
-
-	/* lock, flush and invalidate mapping in preparation for flag change */
-	xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
-	error = filemap_write_and_wait(inode->i_mapping);
-	if (error)
-		goto out_unlock;
-	error = invalidate_inode_pages2(inode->i_mapping);
-	if (error)
-		goto out_unlock;
-
-	*join_flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL;
-	return 0;
+		return;
 
-out_unlock:
-	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
-	return error;
+	if ((mp->m_flags & XFS_MOUNT_DAX_ALWAYS) ||
+	    (mp->m_flags & XFS_MOUNT_DAX_NEVER))
+		return;
 
+	if (((fa->fsx_xflags & FS_XFLAG_DAX) &&
+	    !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) ||
+	    (!(fa->fsx_xflags & FS_XFLAG_DAX) &&
+	     (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)))
+		d_mark_dontcache(inode);
 }
 
 /*
@@ -1310,17 +1272,10 @@ xfs_ioctl_setattr_dax_invalidate(
  * have permission to do so. On success, return a clean transaction and the
  * inode locked exclusively ready for further operation specific checks. On
  * failure, return an error without modifying or locking the inode.
- *
- * The inode might already be IO locked on call. If this is the case, it is
- * indicated in @join_flags and we take full responsibility for ensuring they
- * are unlocked from now on. Hence if we have an error here, we still have to
- * unlock them. Otherwise, once they are joined to the transaction, they will
- * be unlocked on commit/cancel.
  */
 static struct xfs_trans *
 xfs_ioctl_setattr_get_trans(
-	struct xfs_inode	*ip,
-	int			join_flags)
+	struct xfs_inode	*ip)
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_trans	*tp;
@@ -1337,8 +1292,7 @@ xfs_ioctl_setattr_get_trans(
 		goto out_unlock;
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | join_flags);
-	join_flags = 0;
+	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
 
 	/*
 	 * CAP_FOWNER overrides the following restrictions:
@@ -1359,8 +1313,6 @@ xfs_ioctl_setattr_get_trans(
 out_cancel:
 	xfs_trans_cancel(tp);
 out_unlock:
-	if (join_flags)
-		xfs_iunlock(ip, join_flags);
 	return ERR_PTR(error);
 }
 
@@ -1486,7 +1438,6 @@ xfs_ioctl_setattr(
 	struct xfs_dquot	*pdqp = NULL;
 	struct xfs_dquot	*olddquot = NULL;
 	int			code;
-	int			join_flags = 0;
 
 	trace_xfs_ioctl_setattr(ip);
 
@@ -1510,18 +1461,9 @@ xfs_ioctl_setattr(
 			return code;
 	}
 
-	/*
-	 * Changing DAX config may require inode locking for mapping
-	 * invalidation. These need to be held all the way to transaction commit
-	 * or cancel time, so need to be passed through to
-	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
-	 * appropriately.
-	 */
-	code = xfs_ioctl_setattr_dax_invalidate(ip, fa, &join_flags);
-	if (code)
-		goto error_free_dquots;
+	xfs_ioctl_setattr_prepare_dax(ip, fa);
 
-	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
+	tp = xfs_ioctl_setattr_get_trans(ip);
 	if (IS_ERR(tp)) {
 		code = PTR_ERR(tp);
 		goto error_free_dquots;
@@ -1651,7 +1593,6 @@ xfs_ioc_setxflags(
 	struct fsxattr		fa;
 	struct fsxattr		old_fa;
 	unsigned int		flags;
-	int			join_flags = 0;
 	int			error;
 
 	if (copy_from_user(&flags, arg, sizeof(flags)))
@@ -1668,18 +1609,9 @@ xfs_ioc_setxflags(
 	if (error)
 		return error;
 
-	/*
-	 * Changing DAX config may require inode locking for mapping
-	 * invalidation. These need to be held all the way to transaction commit
-	 * or cancel time, so need to be passed through to
-	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
-	 * appropriately.
-	 */
-	error = xfs_ioctl_setattr_dax_invalidate(ip, &fa, &join_flags);
-	if (error)
-		goto out_drop_write;
+	xfs_ioctl_setattr_prepare_dax(ip, &fa);
 
-	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
+	tp = xfs_ioctl_setattr_get_trans(ip);
 	if (IS_ERR(tp)) {
 		error = PTR_ERR(tp);
 		goto out_drop_write;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V11 10/11] fs: Introduce DCACHE_DONTCACHE
  2020-04-28  0:21 ` [PATCH V11 10/11] fs: Introduce DCACHE_DONTCACHE ira.weiny
@ 2020-04-28 13:04   ` Jan Kara
  2020-04-28 20:09   ` Darrick J. Wong
  1 sibling, 0 replies; 29+ messages in thread
From: Jan Kara @ 2020-04-28 13:04 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, linux-xfs, Darrick J. Wong, Al Viro, Jan Kara,
	Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jeff Moyer, linux-ext4, linux-fsdevel,
	linux-api

On Mon 27-04-20 17:21:41, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> DCACHE_DONTCACHE indicates a dentry should not be cached on final
> dput().
> 
> Also add a helper function to mark DCACHE_DONTCACHE on all dentries
> pointing to a specific inode when that inode is being set I_DONTCACHE.
> 
> This facilitates dropping dentry references to inodes sooner which
> require eviction to swap S_DAX mode.
> 
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

The patch looks good to me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> 
> ---
> Changes from V10:
> 	rename to d_mark_dontcache()
> 	Move function to fs/dcache.c
> 
> Changes from V9:
> 	modify i_state under i_lock
> 	Update comment
> 		"Purge from memory on final dput()"
> 
> Changes from V8:
> 	Update commit message
> 	Use mark_inode_dontcache in XFS
> 	Fix locking...  can't use rcu here.
> 	Change name to mark_inode_dontcache
> ---
>  fs/dcache.c            | 19 +++++++++++++++++++
>  fs/xfs/xfs_icache.c    |  2 +-
>  include/linux/dcache.h |  2 ++
>  include/linux/fs.h     |  1 +
>  4 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/dcache.c b/fs/dcache.c
> index b280e07e162b..0d07fb335b78 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -647,6 +647,10 @@ static inline bool retain_dentry(struct dentry *dentry)
>  		if (dentry->d_op->d_delete(dentry))
>  			return false;
>  	}
> +
> +	if (unlikely(dentry->d_flags & DCACHE_DONTCACHE))
> +		return false;
> +
>  	/* retain; LRU fodder */
>  	dentry->d_lockref.count--;
>  	if (unlikely(!(dentry->d_flags & DCACHE_LRU_LIST)))
> @@ -656,6 +660,21 @@ static inline bool retain_dentry(struct dentry *dentry)
>  	return true;
>  }
>  
> +void d_mark_dontcache(struct inode *inode)
> +{
> +	struct dentry *de;
> +
> +	spin_lock(&inode->i_lock);
> +	hlist_for_each_entry(de, &inode->i_dentry, d_u.d_alias) {
> +		spin_lock(&de->d_lock);
> +		de->d_flags |= DCACHE_DONTCACHE;
> +		spin_unlock(&de->d_lock);
> +	}
> +	inode->i_state |= I_DONTCACHE;
> +	spin_unlock(&inode->i_lock);
> +}
> +EXPORT_SYMBOL(d_mark_dontcache);
> +
>  /*
>   * Finish off a dentry we've decided to kill.
>   * dentry->d_lock must be held, returns with it unlocked.
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index de76f7f60695..888646d74d7d 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -559,7 +559,7 @@ xfs_iget_cache_miss(
>  	 */
>  	iflags = XFS_INEW;
>  	if (flags & XFS_IGET_DONTCACHE)
> -		VFS_I(ip)->i_state |= I_DONTCACHE;
> +		d_mark_dontcache(VFS_I(ip));
>  	ip->i_udquot = NULL;
>  	ip->i_gdquot = NULL;
>  	ip->i_pdquot = NULL;
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index c1488cc84fd9..a81f0c3cf352 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -177,6 +177,8 @@ struct dentry_operations {
>  
>  #define DCACHE_REFERENCED		0x00000040 /* Recently used, don't discard. */
>  
> +#define DCACHE_DONTCACHE		0x00000080 /* Purge from memory on final dput() */
> +
>  #define DCACHE_CANT_MOUNT		0x00000100
>  #define DCACHE_GENOCIDE			0x00000200
>  #define DCACHE_SHRINK_LIST		0x00000400
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 44bd45af760f..7c3e8c0306e0 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3055,6 +3055,7 @@ static inline int generic_drop_inode(struct inode *inode)
>  	return !inode->i_nlink || inode_unhashed(inode) ||
>  		(inode->i_state & I_DONTCACHE);
>  }
> +extern void d_mark_dontcache(struct inode *inode);
>  
>  extern struct inode *ilookup5_nowait(struct super_block *sb,
>  		unsigned long hashval, int (*test)(struct inode *, void *),
> -- 
> 2.25.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V11 09/11] fs: Lift XFS_IDONTCACHE to the VFS layer
  2020-04-28  0:21 ` [PATCH V11 09/11] fs: Lift XFS_IDONTCACHE to the VFS layer ira.weiny
@ 2020-04-28 20:06   ` Darrick J. Wong
  0 siblings, 0 replies; 29+ messages in thread
From: Darrick J. Wong @ 2020-04-28 20:06 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, linux-xfs, Al Viro, Dave Chinner, Jan Kara,
	Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jeff Moyer, linux-ext4, linux-fsdevel,
	linux-api

On Mon, Apr 27, 2020 at 05:21:40PM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> DAX effective mode (S_DAX) changes requires inode eviction.
> 
> XFS has an advisory flag (XFS_IDONTCACHE) to prevent caching of the
> inode if no other additional references are taken.  We lift this flag to
> the VFS layer and change the behavior slightly by allowing the flag to
> remain even if multiple references are taken.
> 
> This will expedite the eviction of inodes to change S_DAX.
> 
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Looks ok to me,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> 
> ---
> Changes from V9:
> 	Fix misspelling in commit subject
> 	move XFS_IEOFBLOCKS to '9'
> 
> Changes from V8:
> 	Remove XFS_IDONTCACHE
> ---
>  fs/xfs/xfs_icache.c | 4 ++--
>  fs/xfs/xfs_inode.h  | 3 +--
>  fs/xfs/xfs_super.c  | 2 +-
>  include/linux/fs.h  | 6 +++++-
>  4 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 17a0b86fe701..de76f7f60695 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -477,7 +477,7 @@ xfs_iget_cache_hit(
>  		xfs_ilock(ip, lock_flags);
>  
>  	if (!(flags & XFS_IGET_INCORE))
> -		xfs_iflags_clear(ip, XFS_ISTALE | XFS_IDONTCACHE);
> +		xfs_iflags_clear(ip, XFS_ISTALE);
>  	XFS_STATS_INC(mp, xs_ig_found);
>  
>  	return 0;
> @@ -559,7 +559,7 @@ xfs_iget_cache_miss(
>  	 */
>  	iflags = XFS_INEW;
>  	if (flags & XFS_IGET_DONTCACHE)
> -		iflags |= XFS_IDONTCACHE;
> +		VFS_I(ip)->i_state |= I_DONTCACHE;
>  	ip->i_udquot = NULL;
>  	ip->i_gdquot = NULL;
>  	ip->i_pdquot = NULL;
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 83073c883fbf..d8ce3eaa246e 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -218,8 +218,7 @@ static inline bool xfs_inode_has_cow_data(struct xfs_inode *ip)
>  #define XFS_IFLOCK		(1 << __XFS_IFLOCK_BIT)
>  #define __XFS_IPINNED_BIT	8	 /* wakeup key for zero pin count */
>  #define XFS_IPINNED		(1 << __XFS_IPINNED_BIT)
> -#define XFS_IDONTCACHE		(1 << 9) /* don't cache the inode long term */
> -#define XFS_IEOFBLOCKS		(1 << 10)/* has the preallocblocks tag set */
> +#define XFS_IEOFBLOCKS		(1 << 9) /* has the preallocblocks tag set */
>  /*
>   * If this unlinked inode is in the middle of recovery, don't let drop_inode
>   * truncate and free the inode.  This can happen if we iget the inode during
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index e80bd2c4c279..6f91c13fb6ea 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -737,7 +737,7 @@ xfs_fs_drop_inode(
>  		return 0;
>  	}
>  
> -	return generic_drop_inode(inode) || (ip->i_flags & XFS_IDONTCACHE);
> +	return generic_drop_inode(inode);
>  }
>  
>  static void
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index a87cc5845a02..44bd45af760f 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2156,6 +2156,8 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
>   *
>   * I_CREATING		New object's inode in the middle of setting up.
>   *
> + * I_DONTCACHE		Evict inode as soon as it is not used anymore.
> + *
>   * Q: What is the difference between I_WILL_FREE and I_FREEING?
>   */
>  #define I_DIRTY_SYNC		(1 << 0)
> @@ -2178,6 +2180,7 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
>  #define I_WB_SWITCH		(1 << 13)
>  #define I_OVL_INUSE		(1 << 14)
>  #define I_CREATING		(1 << 15)
> +#define I_DONTCACHE		(1 << 16)
>  
>  #define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
>  #define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
> @@ -3049,7 +3052,8 @@ extern int inode_needs_sync(struct inode *inode);
>  extern int generic_delete_inode(struct inode *inode);
>  static inline int generic_drop_inode(struct inode *inode)
>  {
> -	return !inode->i_nlink || inode_unhashed(inode);
> +	return !inode->i_nlink || inode_unhashed(inode) ||
> +		(inode->i_state & I_DONTCACHE);
>  }
>  
>  extern struct inode *ilookup5_nowait(struct super_block *sb,
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V11 06/11] fs/xfs: Make DAX mount option a tri-state
  2020-04-28  0:21 ` [PATCH V11 06/11] fs/xfs: Make DAX mount option a tri-state ira.weiny
@ 2020-04-28 20:08   ` Darrick J. Wong
  0 siblings, 0 replies; 29+ messages in thread
From: Darrick J. Wong @ 2020-04-28 20:08 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, linux-xfs, Al Viro, Jan Kara, Dan Williams,
	Dave Chinner, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

On Mon, Apr 27, 2020 at 05:21:37PM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> As agreed upon[1].  We make the dax mount option a tri-state.  '-o dax'
> continues to operate the same.  We add 'always', 'never', and 'inode'
> (default).
> 
> [1] https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Looks good to me,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> 
> ---
> Changes from V10:
> 	Move show options to xfs_info_set array
> 
> Changes from V9:
> 	Fix indentation in xfs_mount_set_dax_mode()
> 	Do not report dax=inode
> 
> Changes from v8:
> 	Move NEVER bit to 27
> 	Use xfs signature style
> 	use xfs_dax_mode enum
> 
> Changes from v7:
> 	Change to XFS_MOUNT_DAX_NEVER
> 
> Changes from v6:
> 	Use 2 flag bits rather than a field.
> 	change iflag to inode
> 
> Changes from v5:
> 	New Patch
> ---
>  fs/xfs/xfs_mount.h |  1 +
>  fs/xfs/xfs_super.c | 46 ++++++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 43 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index f6123fb0113c..37bfb50db809 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -238,6 +238,7 @@ typedef struct xfs_mount {
>  						   allocator */
>  #define XFS_MOUNT_NOATTR2	(1ULL << 25)	/* disable use of attr2 format */
>  #define XFS_MOUNT_DAX_ALWAYS	(1ULL << 26)
> +#define XFS_MOUNT_DAX_NEVER	(1ULL << 27)
>  
>  /*
>   * Max and min values for mount-option defined I/O
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index ce169d1c7474..e80bd2c4c279 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -47,6 +47,39 @@ static struct kset *xfs_kset;		/* top-level xfs sysfs dir */
>  static struct xfs_kobj xfs_dbg_kobj;	/* global debug sysfs attrs */
>  #endif
>  
> +enum xfs_dax_mode {
> +	XFS_DAX_INODE = 0,
> +	XFS_DAX_ALWAYS = 1,
> +	XFS_DAX_NEVER = 2,
> +};
> +
> +static void
> +xfs_mount_set_dax_mode(
> +	struct xfs_mount	*mp,
> +	enum xfs_dax_mode	mode)
> +{
> +	switch (mode) {
> +	case XFS_DAX_INODE:
> +		mp->m_flags &= ~(XFS_MOUNT_DAX_ALWAYS | XFS_MOUNT_DAX_NEVER);
> +		break;
> +	case XFS_DAX_ALWAYS:
> +		mp->m_flags |= XFS_MOUNT_DAX_ALWAYS;
> +		mp->m_flags &= ~XFS_MOUNT_DAX_NEVER;
> +		break;
> +	case XFS_DAX_NEVER:
> +		mp->m_flags |= XFS_MOUNT_DAX_NEVER;
> +		mp->m_flags &= ~XFS_MOUNT_DAX_ALWAYS;
> +		break;
> +	}
> +}
> +
> +static const struct constant_table dax_param_enums[] = {
> +	{"inode",	XFS_DAX_INODE },
> +	{"always",	XFS_DAX_ALWAYS },
> +	{"never",	XFS_DAX_NEVER },
> +	{}
> +};
> +
>  /*
>   * Table driven mount option parser.
>   */
> @@ -59,7 +92,7 @@ enum {
>  	Opt_filestreams, Opt_quota, Opt_noquota, Opt_usrquota, Opt_grpquota,
>  	Opt_prjquota, Opt_uquota, Opt_gquota, Opt_pquota,
>  	Opt_uqnoenforce, Opt_gqnoenforce, Opt_pqnoenforce, Opt_qnoenforce,
> -	Opt_discard, Opt_nodiscard, Opt_dax,
> +	Opt_discard, Opt_nodiscard, Opt_dax, Opt_dax_enum,
>  };
>  
>  static const struct fs_parameter_spec xfs_fs_parameters[] = {
> @@ -103,6 +136,7 @@ static const struct fs_parameter_spec xfs_fs_parameters[] = {
>  	fsparam_flag("discard",		Opt_discard),
>  	fsparam_flag("nodiscard",	Opt_nodiscard),
>  	fsparam_flag("dax",		Opt_dax),
> +	fsparam_enum("dax",		Opt_dax_enum, dax_param_enums),
>  	{}
>  };
>  
> @@ -129,7 +163,8 @@ xfs_fs_show_options(
>  		{ XFS_MOUNT_GRPID,		",grpid" },
>  		{ XFS_MOUNT_DISCARD,		",discard" },
>  		{ XFS_MOUNT_LARGEIO,		",largeio" },
> -		{ XFS_MOUNT_DAX_ALWAYS,		",dax" },
> +		{ XFS_MOUNT_DAX_ALWAYS,		",dax=always" },
> +		{ XFS_MOUNT_DAX_NEVER,		",dax=never" },
>  		{ 0, NULL }
>  	};
>  	struct xfs_mount	*mp = XFS_M(root->d_sb);
> @@ -1261,7 +1296,10 @@ xfs_fc_parse_param(
>  		return 0;
>  #ifdef CONFIG_FS_DAX
>  	case Opt_dax:
> -		mp->m_flags |= XFS_MOUNT_DAX_ALWAYS;
> +		xfs_mount_set_dax_mode(mp, XFS_DAX_ALWAYS);
> +		return 0;
> +	case Opt_dax_enum:
> +		xfs_mount_set_dax_mode(mp, result.uint_32);
>  		return 0;
>  #endif
>  	default:
> @@ -1468,7 +1506,7 @@ xfs_fc_fill_super(
>  		if (!rtdev_is_dax && !datadev_is_dax) {
>  			xfs_alert(mp,
>  			"DAX unsupported by block device. Turning off DAX.");
> -			mp->m_flags &= ~XFS_MOUNT_DAX_ALWAYS;
> +			xfs_mount_set_dax_mode(mp, XFS_DAX_NEVER);
>  		}
>  		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
>  			xfs_alert(mp,
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V11 10/11] fs: Introduce DCACHE_DONTCACHE
  2020-04-28  0:21 ` [PATCH V11 10/11] fs: Introduce DCACHE_DONTCACHE ira.weiny
  2020-04-28 13:04   ` Jan Kara
@ 2020-04-28 20:09   ` Darrick J. Wong
  1 sibling, 0 replies; 29+ messages in thread
From: Darrick J. Wong @ 2020-04-28 20:09 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, linux-xfs, Al Viro, Jan Kara, Dan Williams,
	Dave Chinner, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

On Mon, Apr 27, 2020 at 05:21:41PM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> DCACHE_DONTCACHE indicates a dentry should not be cached on final
> dput().
> 
> Also add a helper function to mark DCACHE_DONTCACHE on all dentries
> pointing to a specific inode when that inode is being set I_DONTCACHE.
> 
> This facilitates dropping dentry references to inodes sooner which
> require eviction to swap S_DAX mode.
> 
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Seems reasonable to me,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> 
> ---
> Changes from V10:
> 	rename to d_mark_dontcache()
> 	Move function to fs/dcache.c
> 
> Changes from V9:
> 	modify i_state under i_lock
> 	Update comment
> 		"Purge from memory on final dput()"
> 
> Changes from V8:
> 	Update commit message
> 	Use mark_inode_dontcache in XFS
> 	Fix locking...  can't use rcu here.
> 	Change name to mark_inode_dontcache
> ---
>  fs/dcache.c            | 19 +++++++++++++++++++
>  fs/xfs/xfs_icache.c    |  2 +-
>  include/linux/dcache.h |  2 ++
>  include/linux/fs.h     |  1 +
>  4 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/dcache.c b/fs/dcache.c
> index b280e07e162b..0d07fb335b78 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -647,6 +647,10 @@ static inline bool retain_dentry(struct dentry *dentry)
>  		if (dentry->d_op->d_delete(dentry))
>  			return false;
>  	}
> +
> +	if (unlikely(dentry->d_flags & DCACHE_DONTCACHE))
> +		return false;
> +
>  	/* retain; LRU fodder */
>  	dentry->d_lockref.count--;
>  	if (unlikely(!(dentry->d_flags & DCACHE_LRU_LIST)))
> @@ -656,6 +660,21 @@ static inline bool retain_dentry(struct dentry *dentry)
>  	return true;
>  }
>  
> +void d_mark_dontcache(struct inode *inode)
> +{
> +	struct dentry *de;
> +
> +	spin_lock(&inode->i_lock);
> +	hlist_for_each_entry(de, &inode->i_dentry, d_u.d_alias) {
> +		spin_lock(&de->d_lock);
> +		de->d_flags |= DCACHE_DONTCACHE;
> +		spin_unlock(&de->d_lock);
> +	}
> +	inode->i_state |= I_DONTCACHE;
> +	spin_unlock(&inode->i_lock);
> +}
> +EXPORT_SYMBOL(d_mark_dontcache);
> +
>  /*
>   * Finish off a dentry we've decided to kill.
>   * dentry->d_lock must be held, returns with it unlocked.
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index de76f7f60695..888646d74d7d 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -559,7 +559,7 @@ xfs_iget_cache_miss(
>  	 */
>  	iflags = XFS_INEW;
>  	if (flags & XFS_IGET_DONTCACHE)
> -		VFS_I(ip)->i_state |= I_DONTCACHE;
> +		d_mark_dontcache(VFS_I(ip));
>  	ip->i_udquot = NULL;
>  	ip->i_gdquot = NULL;
>  	ip->i_pdquot = NULL;
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index c1488cc84fd9..a81f0c3cf352 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -177,6 +177,8 @@ struct dentry_operations {
>  
>  #define DCACHE_REFERENCED		0x00000040 /* Recently used, don't discard. */
>  
> +#define DCACHE_DONTCACHE		0x00000080 /* Purge from memory on final dput() */
> +
>  #define DCACHE_CANT_MOUNT		0x00000100
>  #define DCACHE_GENOCIDE			0x00000200
>  #define DCACHE_SHRINK_LIST		0x00000400
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 44bd45af760f..7c3e8c0306e0 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3055,6 +3055,7 @@ static inline int generic_drop_inode(struct inode *inode)
>  	return !inode->i_nlink || inode_unhashed(inode) ||
>  		(inode->i_state & I_DONTCACHE);
>  }
> +extern void d_mark_dontcache(struct inode *inode);
>  
>  extern struct inode *ilookup5_nowait(struct super_block *sb,
>  		unsigned long hashval, int (*test)(struct inode *, void *),
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V11 11/11] fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()
  2020-04-28  0:21 ` [PATCH V11 11/11] fs/xfs: Update xfs_ioctl_setattr_dax_invalidate() ira.weiny
@ 2020-04-28 20:11   ` Darrick J. Wong
  2020-06-02 17:23     ` Darrick J. Wong
  0 siblings, 1 reply; 29+ messages in thread
From: Darrick J. Wong @ 2020-04-28 20:11 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, linux-xfs, Al Viro, Jan Kara, Dan Williams,
	Dave Chinner, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

On Mon, Apr 27, 2020 at 05:21:42PM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> Because of the separation of FS_XFLAG_DAX from S_DAX and the delayed
> setting of S_DAX, data invalidation no longer needs to happen when
> FS_XFLAG_DAX is changed.
> 
> Change xfs_ioctl_setattr_dax_invalidate() to be
> xfs_ioctl_dax_check_set_cache() and alter the code to reflect the new
> functionality.
> 
> Furthermore, we no longer need the locking so we remove the join_flags
> logic.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> 
> ---
> Changes from V10:
> 	adjust for renamed d_mark_dontcache() function
> 
> Changes from V9:
> 	Change name of function to xfs_ioctl_setattr_prepare_dax()
> 
> Changes from V8:
> 	Change name of function to xfs_ioctl_dax_check_set_cache()
> 	Update commit message
> 	Fix bit manipulations
> 
> Changes from V7:
> 	Use new flag_inode_dontcache()
> 	Skip don't cache if mount over ride is active.
> 
> Changes from v6:
> 	Fix completely broken implementation and update commit message.
> 	Use the new VFS layer I_DONTCACHE to facilitate inode eviction
> 	and S_DAX changing on drop_caches
> 
> Changes from v5:
> 	New patch
> ---
>  fs/xfs/xfs_ioctl.c | 108 +++++++++------------------------------------
>  1 file changed, 20 insertions(+), 88 deletions(-)
> 
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 104495ac187c..ff474f2c9acf 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -1245,64 +1245,26 @@ xfs_ioctl_setattr_xflags(
>  	return 0;
>  }
>  
> -/*
> - * If we are changing DAX flags, we have to ensure the file is clean and any
> - * cached objects in the address space are invalidated and removed. This
> - * requires us to lock out other IO and page faults similar to a truncate
> - * operation. The locks need to be held until the transaction has been committed
> - * so that the cache invalidation is atomic with respect to the DAX flag
> - * manipulation.
> - */
> -static int
> -xfs_ioctl_setattr_dax_invalidate(
> +static void
> +xfs_ioctl_setattr_prepare_dax(
>  	struct xfs_inode	*ip,
> -	struct fsxattr		*fa,
> -	int			*join_flags)
> +	struct fsxattr		*fa)
>  {
> -	struct inode		*inode = VFS_I(ip);
> -	struct super_block	*sb = inode->i_sb;
> -	int			error;
> -
> -	*join_flags = 0;
> -
> -	/*
> -	 * It is only valid to set the DAX flag on regular files and
> -	 * directories on filesystems where the block size is equal to the page
> -	 * size. On directories it serves as an inherited hint so we don't
> -	 * have to check the device for dax support or flush pagecache.
> -	 */
> -	if (fa->fsx_xflags & FS_XFLAG_DAX) {
> -		struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
> -
> -		if (!bdev_dax_supported(target->bt_bdev, sb->s_blocksize))
> -			return -EINVAL;
> -	}
> -
> -	/* If the DAX state is not changing, we have nothing to do here. */
> -	if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
> -		return 0;
> -	if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode))
> -		return 0;
> +	struct xfs_mount	*mp = ip->i_mount;
> +	struct inode            *inode = VFS_I(ip);
>  
>  	if (S_ISDIR(inode->i_mode))
> -		return 0;
> -
> -	/* lock, flush and invalidate mapping in preparation for flag change */
> -	xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> -	error = filemap_write_and_wait(inode->i_mapping);
> -	if (error)
> -		goto out_unlock;
> -	error = invalidate_inode_pages2(inode->i_mapping);
> -	if (error)
> -		goto out_unlock;
> -
> -	*join_flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL;
> -	return 0;
> +		return;
>  
> -out_unlock:
> -	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> -	return error;
> +	if ((mp->m_flags & XFS_MOUNT_DAX_ALWAYS) ||
> +	    (mp->m_flags & XFS_MOUNT_DAX_NEVER))
> +		return;
>  
> +	if (((fa->fsx_xflags & FS_XFLAG_DAX) &&
> +	    !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) ||
> +	    (!(fa->fsx_xflags & FS_XFLAG_DAX) &&
> +	     (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)))
> +		d_mark_dontcache(inode);
>  }
>  
>  /*
> @@ -1310,17 +1272,10 @@ xfs_ioctl_setattr_dax_invalidate(
>   * have permission to do so. On success, return a clean transaction and the
>   * inode locked exclusively ready for further operation specific checks. On
>   * failure, return an error without modifying or locking the inode.
> - *
> - * The inode might already be IO locked on call. If this is the case, it is
> - * indicated in @join_flags and we take full responsibility for ensuring they
> - * are unlocked from now on. Hence if we have an error here, we still have to
> - * unlock them. Otherwise, once they are joined to the transaction, they will
> - * be unlocked on commit/cancel.
>   */
>  static struct xfs_trans *
>  xfs_ioctl_setattr_get_trans(
> -	struct xfs_inode	*ip,
> -	int			join_flags)
> +	struct xfs_inode	*ip)
>  {
>  	struct xfs_mount	*mp = ip->i_mount;
>  	struct xfs_trans	*tp;
> @@ -1337,8 +1292,7 @@ xfs_ioctl_setattr_get_trans(
>  		goto out_unlock;
>  
>  	xfs_ilock(ip, XFS_ILOCK_EXCL);
> -	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | join_flags);
> -	join_flags = 0;
> +	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
>  
>  	/*
>  	 * CAP_FOWNER overrides the following restrictions:
> @@ -1359,8 +1313,6 @@ xfs_ioctl_setattr_get_trans(
>  out_cancel:
>  	xfs_trans_cancel(tp);
>  out_unlock:
> -	if (join_flags)
> -		xfs_iunlock(ip, join_flags);
>  	return ERR_PTR(error);
>  }
>  
> @@ -1486,7 +1438,6 @@ xfs_ioctl_setattr(
>  	struct xfs_dquot	*pdqp = NULL;
>  	struct xfs_dquot	*olddquot = NULL;
>  	int			code;
> -	int			join_flags = 0;
>  
>  	trace_xfs_ioctl_setattr(ip);
>  
> @@ -1510,18 +1461,9 @@ xfs_ioctl_setattr(
>  			return code;
>  	}
>  
> -	/*
> -	 * Changing DAX config may require inode locking for mapping
> -	 * invalidation. These need to be held all the way to transaction commit
> -	 * or cancel time, so need to be passed through to
> -	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
> -	 * appropriately.
> -	 */
> -	code = xfs_ioctl_setattr_dax_invalidate(ip, fa, &join_flags);
> -	if (code)
> -		goto error_free_dquots;
> +	xfs_ioctl_setattr_prepare_dax(ip, fa);
>  
> -	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
> +	tp = xfs_ioctl_setattr_get_trans(ip);
>  	if (IS_ERR(tp)) {
>  		code = PTR_ERR(tp);
>  		goto error_free_dquots;
> @@ -1651,7 +1593,6 @@ xfs_ioc_setxflags(
>  	struct fsxattr		fa;
>  	struct fsxattr		old_fa;
>  	unsigned int		flags;
> -	int			join_flags = 0;
>  	int			error;
>  
>  	if (copy_from_user(&flags, arg, sizeof(flags)))
> @@ -1668,18 +1609,9 @@ xfs_ioc_setxflags(
>  	if (error)
>  		return error;
>  
> -	/*
> -	 * Changing DAX config may require inode locking for mapping
> -	 * invalidation. These need to be held all the way to transaction commit
> -	 * or cancel time, so need to be passed through to
> -	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
> -	 * appropriately.
> -	 */
> -	error = xfs_ioctl_setattr_dax_invalidate(ip, &fa, &join_flags);
> -	if (error)
> -		goto out_drop_write;
> +	xfs_ioctl_setattr_prepare_dax(ip, &fa);
>  
> -	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
> +	tp = xfs_ioctl_setattr_get_trans(ip);
>  	if (IS_ERR(tp)) {
>  		error = PTR_ERR(tp);
>  		goto out_drop_write;
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V11 04/11] Documentation/dax: Update Usage section
  2020-04-28  0:21 ` [PATCH V11 04/11] Documentation/dax: Update Usage section ira.weiny
@ 2020-04-28 20:27   ` Darrick J. Wong
  2020-04-28 20:53     ` Ira Weiny
  2020-04-28 22:21   ` [PATCH V11.1] " ira.weiny
  1 sibling, 1 reply; 29+ messages in thread
From: Darrick J. Wong @ 2020-04-28 20:27 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, linux-xfs, Al Viro, Jan Kara, Dan Williams,
	Dave Chinner, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

On Mon, Apr 27, 2020 at 05:21:35PM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> Update the Usage section to reflect the new individual dax selection
> functionality.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> ---
> Changes from V10:
> 	Clarifications from Dave
> 	Add '-c' to xfs_io examples
> 
> Changes from V9:
> 	Fix missing ')'
> 	Fix trialing '"'
> 
> Changes from V8:
> 	Updates from Darrick
> 
> Changes from V7:
> 	Cleanups/clarifications from Darrick and Dan
> 
> Changes from V6:
> 	Update to allow setting FS_XFLAG_DAX any time.
> 	Update with list of behaviors from Darrick
> 	https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
> 
> Changes from V5:
> 	Update to reflect the agreed upon semantics
> 	https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> ---
>  Documentation/filesystems/dax.txt | 139 +++++++++++++++++++++++++++++-
>  1 file changed, 136 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> index 679729442fd2..409e4e83e46a 100644
> --- a/Documentation/filesystems/dax.txt
> +++ b/Documentation/filesystems/dax.txt
> @@ -17,11 +17,144 @@ For file mappings, the storage device is mapped directly into userspace.
>  Usage
>  -----
>  
> -If you have a block device which supports DAX, you can make a filesystem
> +If you have a block device which supports DAX, you can make a file system
>  on it as usual.  The DAX code currently only supports files with a block
>  size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> -size when creating the filesystem.  When mounting it, use the "-o dax"
> -option on the command line or add 'dax' to the options in /etc/fstab.
> +size when creating the file system.
> +
> +Currently 3 filesystems support DAX: ext2, ext4 and xfs.  Enabling DAX on them
> +is different.
> +
> +Enabling DAX on ext4 and ext2
> +-----------------------------
> +
> +When mounting the filesystem, use the "-o dax" option on the command line or
> +add 'dax' to the options in /etc/fstab.  This works to enable DAX on all files
> +within the filesystem.  It is equivalent to the '-o dax=always' behavior below.
> +
> +
> +Enabling DAX on xfs
> +-------------------
> +
> +Summary
> +-------
> +
> + 1. There exists an in-kernel file access mode flag S_DAX that corresponds to
> +    the statx flag STATX_ATTR_DAX.  See the manpage for statx(2) for details
> +    about this access mode.
> +
> + 2. There exists a persistent flag FS_XFLAG_DAX that can be applied to regular
> +    files and directories. This advisory flag can be set or cleared at any
> +    time, but doing so does not immediately affect the S_DAX state.
> +
> + 3. If the persistent FS_XFLAG_DAX flag is set on a directory, this flag will
> +    be inherited by all regular files and sub directories that are subsequently

Well, I'm at the level of minor edits: "...and subdirectories that..."

> +    created in this directory. Files and subdirectories that exist at the time
> +    this flag is set or cleared on the parent directory are not modified by
> +    this modification of the parent directory.
> +
> + 4. There exists dax mount options which can override FS_XFLAG_DAX in the
> +    setting of the S_DAX flag.  Given underlying storage which supports DAX the
> +    following hold.

"hold:"

> +
> +    "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.
> +
> +    "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."
> +
> +    "-o dax=always" means "always set S_DAX ignore FS_XFLAG_DAX."
> +
> +    "-o dax"        is a legacy option which is an alias for "dax=always".
> +		    This may be removed in the future so "-o dax=always" is
> +		    the preferred method for specifying this behavior.
> +
> +    NOTE: Setting and inheritance affect FS_XFLAG_DAX at all times even when
> +    the file system is mounted with a dax option.

We can also clear the flag at any time no matter the mount option state.
Perhaps:

"NOTE: Modifications to and inheritance behavior of FS_XFLAG_DAX remain
the same even when the filesystem is mounted with a dax option."

> +    However, in-core inode state
> +    (S_DAX) will be overridden until the file system is remounted with
> +    dax=inode and the inode is evicted from kernel memory.
> +
> + 5. The DAX policy can be changed via:

"The S_DAX policy".  I don't want people to get confused.

> +
> +    a) Set the parent directory FS_XFLAG_DAX as needed before files are created
> +
> +    b) Set the appropriate dax="foo" mount option
> +
> +    c) Change the FS_XFLAG_DAX on existing regular files and directories. This
> +       has runtime constraints and limitations that are described in 6) below.

"Setting", and "Changing" at the front of these three bullet points?

Were you to put these together as full sentences, you'd want them to
read "The DAX policy can be changed via setting the parent directory
FS_XFLAG_DAX..."

> +
> + 6. When changing the DAX policy via toggling the persistent FS_XFLAG_DAX flag,

"When changing the S_DAX policy..."

> +    the change in behaviour for existing regular files may not occur
> +    immediately.  If the change must take effect immediately, the administrator
> +    needs to:
> +
> +    a) stop the application so there are no active references to the data set
> +       the policy change will affect
> +
> +    b) evict the data set from kernel caches so it will be re-instantiated when
> +       the application is restarted. This can be acheived by:

"achieved"

> +
> +       i. drop-caches
> +       ii. a filesystem unmount and mount cycle
> +       iii. a system reboot
> +
> +
> +Details
> +-------
> +
> +There are 2 per-file dax flags.  One is a persistent inode setting (FS_XFLAG_DAX)
> +and the other is a volatile flag indicating the active state of the feature
> +(S_DAX).
> +
> +FS_XFLAG_DAX is preserved within the file system.  This persistent config
> +setting can be set, cleared and/or queried using the FS_IOC_FS[GS]ETXATTR ioctl
> +(see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'.
> +
> +New files and directories automatically inherit FS_XFLAG_DAX from
> +their parent directory _when_ _created_.  Therefore, setting FS_XFLAG_DAX at
> +directory creation time can be used to set a default behavior for an entire
> +sub-tree.
> +
> +To clarify inheritance here are 3 examples:

"...inheritance, here are..."

> +
> +Example A:
> +
> +mkdir -p a/b/c
> +xfs_io -c 'chattr +x' a
> +mkdir a/b/c/d
> +mkdir a/e
> +
> +	dax: a,e
> +	no dax: b,c,d
> +
> +Example B:
> +
> +mkdir a
> +xfs_io -c 'chattr +x' a
> +mkdir -p a/b/c/d
> +
> +	dax: a,b,c,d
> +	no dax:
> +
> +Example C:
> +
> +mkdir -p a/b/c
> +xfs_io -c 'chattr +x' c
> +mkdir a/b/c/d
> +
> +	dax: c,d
> +	no dax: a,b
> +
> +
> +The current enabled state (S_DAX) is set when a file inode is instantiated in
> +memory by the kernel.  It is set based on the underlying media support, the
> +value of FS_XFLAG_DAX and the file systems dax mount option setting.

"...and the file system's dax mount option string."

> +
> +statx can be used to query S_DAX.  NOTE that a directory will never have S_DAX

"Note that only regular files will ever have S_DAX set..."?

--D

> +set and therefore statx will never indicate that S_DAX is set on directories.
> +
> +Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs even if
> +the underlying media does not support dax and/or the file system is overridden
> +with a mount option.
> +
>  
>  
>  Implementation Tips for Block Driver Writers
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V11 04/11] Documentation/dax: Update Usage section
  2020-04-28 20:27   ` Darrick J. Wong
@ 2020-04-28 20:53     ` Ira Weiny
  2020-04-28 21:12       ` Darrick J. Wong
  0 siblings, 1 reply; 29+ messages in thread
From: Ira Weiny @ 2020-04-28 20:53 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-kernel, linux-xfs, Al Viro, Jan Kara, Dan Williams,
	Dave Chinner, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

On Tue, Apr 28, 2020 at 01:27:38PM -0700, Darrick J. Wong wrote:
> On Mon, Apr 27, 2020 at 05:21:35PM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 

[snip]

> > +
> > + 3. If the persistent FS_XFLAG_DAX flag is set on a directory, this flag will
> > +    be inherited by all regular files and sub directories that are subsequently
> 
> Well, I'm at the level of minor edits: "...and subdirectories that..."

Done.

> 
> > +    created in this directory. Files and subdirectories that exist at the time
> > +    this flag is set or cleared on the parent directory are not modified by
> > +    this modification of the parent directory.
> > +
> > + 4. There exists dax mount options which can override FS_XFLAG_DAX in the
> > +    setting of the S_DAX flag.  Given underlying storage which supports DAX the
> > +    following hold.
> 
> "hold:"

Dene.

> 
> > +
> > +    "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.
> > +
> > +    "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."
> > +
> > +    "-o dax=always" means "always set S_DAX ignore FS_XFLAG_DAX."
> > +
> > +    "-o dax"        is a legacy option which is an alias for "dax=always".
> > +		    This may be removed in the future so "-o dax=always" is
> > +		    the preferred method for specifying this behavior.
> > +
> > +    NOTE: Setting and inheritance affect FS_XFLAG_DAX at all times even when
> > +    the file system is mounted with a dax option.
> 
> We can also clear the flag at any time no matter the mount option state.
> Perhaps:
> 
> "NOTE: Modifications to and inheritance behavior of FS_XFLAG_DAX remain
> the same even when the filesystem is mounted with a dax option."

Done.

> 
> > +    However, in-core inode state
> > +    (S_DAX) will be overridden until the file system is remounted with
> > +    dax=inode and the inode is evicted from kernel memory.
> > +
> > + 5. The DAX policy can be changed via:
> 
> "The S_DAX policy".  I don't want people to get confused.

Done.

> 
> > +
> > +    a) Set the parent directory FS_XFLAG_DAX as needed before files are created
> > +
> > +    b) Set the appropriate dax="foo" mount option
> > +
> > +    c) Change the FS_XFLAG_DAX on existing regular files and directories. This
> > +       has runtime constraints and limitations that are described in 6) below.
> 
> "Setting", and "Changing" at the front of these three bullet points?
> 
> Were you to put these together as full sentences, you'd want them to
> read "The DAX policy can be changed via setting the parent directory
> FS_XFLAG_DAX..."
> 

Done.

> > +
> > + 6. When changing the DAX policy via toggling the persistent FS_XFLAG_DAX flag,
> 
> "When changing the S_DAX policy..."

Done.

> 
> > +    the change in behaviour for existing regular files may not occur
> > +    immediately.  If the change must take effect immediately, the administrator
> > +    needs to:
> > +
> > +    a) stop the application so there are no active references to the data set
> > +       the policy change will affect
> > +
> > +    b) evict the data set from kernel caches so it will be re-instantiated when
> > +       the application is restarted. This can be acheived by:
> 
> "achieved"

Done.

> 
> > +
> > +       i. drop-caches
> > +       ii. a filesystem unmount and mount cycle
> > +       iii. a system reboot
> > +
> > +
> > +Details
> > +-------
> > +
> > +There are 2 per-file dax flags.  One is a persistent inode setting (FS_XFLAG_DAX)
> > +and the other is a volatile flag indicating the active state of the feature
> > +(S_DAX).
> > +
> > +FS_XFLAG_DAX is preserved within the file system.  This persistent config
> > +setting can be set, cleared and/or queried using the FS_IOC_FS[GS]ETXATTR ioctl
> > +(see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'.
> > +
> > +New files and directories automatically inherit FS_XFLAG_DAX from
> > +their parent directory _when_ _created_.  Therefore, setting FS_XFLAG_DAX at
> > +directory creation time can be used to set a default behavior for an entire
> > +sub-tree.
> > +
> > +To clarify inheritance here are 3 examples:
> 
> "...inheritance, here are..."

Done.

> 
> > +
> > +Example A:
> > +
> > +mkdir -p a/b/c
> > +xfs_io -c 'chattr +x' a
> > +mkdir a/b/c/d
> > +mkdir a/e
> > +
> > +	dax: a,e
> > +	no dax: b,c,d
> > +
> > +Example B:
> > +
> > +mkdir a
> > +xfs_io -c 'chattr +x' a
> > +mkdir -p a/b/c/d
> > +
> > +	dax: a,b,c,d
> > +	no dax:
> > +
> > +Example C:
> > +
> > +mkdir -p a/b/c
> > +xfs_io -c 'chattr +x' c
> > +mkdir a/b/c/d
> > +
> > +	dax: c,d
> > +	no dax: a,b
> > +
> > +
> > +The current enabled state (S_DAX) is set when a file inode is instantiated in
> > +memory by the kernel.  It is set based on the underlying media support, the
> > +value of FS_XFLAG_DAX and the file systems dax mount option setting.
> 
> "...and the file system's dax mount option string."

No. I don't think string is the right word here.  It is the setting not the
string which is controlling the behavior.  How about:

"...and the file system's dax mount option."

> 
> > +
> > +statx can be used to query S_DAX.  NOTE that a directory will never have S_DAX
> 
> "Note that only regular files will ever have S_DAX set..."?

Done.

Ira

> 
> --D
> 
> > +set and therefore statx will never indicate that S_DAX is set on directories.
> > +
> > +Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs even if
> > +the underlying media does not support dax and/or the file system is overridden
> > +with a mount option.
> > +
> >  
> >  
> >  Implementation Tips for Block Driver Writers
> > -- 
> > 2.25.1
> > 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V11 04/11] Documentation/dax: Update Usage section
  2020-04-28 20:53     ` Ira Weiny
@ 2020-04-28 21:12       ` Darrick J. Wong
  0 siblings, 0 replies; 29+ messages in thread
From: Darrick J. Wong @ 2020-04-28 21:12 UTC (permalink / raw)
  To: Ira Weiny
  Cc: linux-kernel, linux-xfs, Al Viro, Jan Kara, Dan Williams,
	Dave Chinner, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

On Tue, Apr 28, 2020 at 01:53:10PM -0700, Ira Weiny wrote:
> On Tue, Apr 28, 2020 at 01:27:38PM -0700, Darrick J. Wong wrote:
> > On Mon, Apr 27, 2020 at 05:21:35PM -0700, ira.weiny@intel.com wrote:
> > > From: Ira Weiny <ira.weiny@intel.com>
> > > 
> 
> [snip]
> 
> > > +
> > > + 3. If the persistent FS_XFLAG_DAX flag is set on a directory, this flag will
> > > +    be inherited by all regular files and sub directories that are subsequently
> > 
> > Well, I'm at the level of minor edits: "...and subdirectories that..."
> 
> Done.
> 
> > 
> > > +    created in this directory. Files and subdirectories that exist at the time
> > > +    this flag is set or cleared on the parent directory are not modified by
> > > +    this modification of the parent directory.
> > > +
> > > + 4. There exists dax mount options which can override FS_XFLAG_DAX in the
> > > +    setting of the S_DAX flag.  Given underlying storage which supports DAX the
> > > +    following hold.
> > 
> > "hold:"
> 
> Dene.
> 
> > 
> > > +
> > > +    "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.
> > > +
> > > +    "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."
> > > +
> > > +    "-o dax=always" means "always set S_DAX ignore FS_XFLAG_DAX."
> > > +
> > > +    "-o dax"        is a legacy option which is an alias for "dax=always".
> > > +		    This may be removed in the future so "-o dax=always" is
> > > +		    the preferred method for specifying this behavior.
> > > +
> > > +    NOTE: Setting and inheritance affect FS_XFLAG_DAX at all times even when
> > > +    the file system is mounted with a dax option.
> > 
> > We can also clear the flag at any time no matter the mount option state.
> > Perhaps:
> > 
> > "NOTE: Modifications to and inheritance behavior of FS_XFLAG_DAX remain
> > the same even when the filesystem is mounted with a dax option."
> 
> Done.
> 
> > 
> > > +    However, in-core inode state
> > > +    (S_DAX) will be overridden until the file system is remounted with
> > > +    dax=inode and the inode is evicted from kernel memory.
> > > +
> > > + 5. The DAX policy can be changed via:
> > 
> > "The S_DAX policy".  I don't want people to get confused.
> 
> Done.
> 
> > 
> > > +
> > > +    a) Set the parent directory FS_XFLAG_DAX as needed before files are created
> > > +
> > > +    b) Set the appropriate dax="foo" mount option
> > > +
> > > +    c) Change the FS_XFLAG_DAX on existing regular files and directories. This
> > > +       has runtime constraints and limitations that are described in 6) below.
> > 
> > "Setting", and "Changing" at the front of these three bullet points?
> > 
> > Were you to put these together as full sentences, you'd want them to
> > read "The DAX policy can be changed via setting the parent directory
> > FS_XFLAG_DAX..."
> > 
> 
> Done.
> 
> > > +
> > > + 6. When changing the DAX policy via toggling the persistent FS_XFLAG_DAX flag,
> > 
> > "When changing the S_DAX policy..."
> 
> Done.
> 
> > 
> > > +    the change in behaviour for existing regular files may not occur
> > > +    immediately.  If the change must take effect immediately, the administrator
> > > +    needs to:
> > > +
> > > +    a) stop the application so there are no active references to the data set
> > > +       the policy change will affect
> > > +
> > > +    b) evict the data set from kernel caches so it will be re-instantiated when
> > > +       the application is restarted. This can be acheived by:
> > 
> > "achieved"
> 
> Done.
> 
> > 
> > > +
> > > +       i. drop-caches
> > > +       ii. a filesystem unmount and mount cycle
> > > +       iii. a system reboot
> > > +
> > > +
> > > +Details
> > > +-------
> > > +
> > > +There are 2 per-file dax flags.  One is a persistent inode setting (FS_XFLAG_DAX)
> > > +and the other is a volatile flag indicating the active state of the feature
> > > +(S_DAX).
> > > +
> > > +FS_XFLAG_DAX is preserved within the file system.  This persistent config
> > > +setting can be set, cleared and/or queried using the FS_IOC_FS[GS]ETXATTR ioctl
> > > +(see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'.
> > > +
> > > +New files and directories automatically inherit FS_XFLAG_DAX from
> > > +their parent directory _when_ _created_.  Therefore, setting FS_XFLAG_DAX at
> > > +directory creation time can be used to set a default behavior for an entire
> > > +sub-tree.
> > > +
> > > +To clarify inheritance here are 3 examples:
> > 
> > "...inheritance, here are..."
> 
> Done.
> 
> > 
> > > +
> > > +Example A:
> > > +
> > > +mkdir -p a/b/c
> > > +xfs_io -c 'chattr +x' a
> > > +mkdir a/b/c/d
> > > +mkdir a/e
> > > +
> > > +	dax: a,e
> > > +	no dax: b,c,d
> > > +
> > > +Example B:
> > > +
> > > +mkdir a
> > > +xfs_io -c 'chattr +x' a
> > > +mkdir -p a/b/c/d
> > > +
> > > +	dax: a,b,c,d
> > > +	no dax:
> > > +
> > > +Example C:
> > > +
> > > +mkdir -p a/b/c
> > > +xfs_io -c 'chattr +x' c
> > > +mkdir a/b/c/d
> > > +
> > > +	dax: c,d
> > > +	no dax: a,b
> > > +
> > > +
> > > +The current enabled state (S_DAX) is set when a file inode is instantiated in
> > > +memory by the kernel.  It is set based on the underlying media support, the
> > > +value of FS_XFLAG_DAX and the file systems dax mount option setting.
> > 
> > "...and the file system's dax mount option string."
> 
> No. I don't think string is the right word here.  It is the setting not the
> string which is controlling the behavior.  How about:
> 
> "...and the file system's dax mount option."

Oops.  I miscopied that; all I really wanted was the apostrophe-s after
"file system".

"...and the file system's dax mount option setting."

would have been fine.

--D

> 
> > 
> > > +
> > > +statx can be used to query S_DAX.  NOTE that a directory will never have S_DAX
> > 
> > "Note that only regular files will ever have S_DAX set..."?
> 
> Done.
> 
> Ira
> 
> > 
> > --D
> > 
> > > +set and therefore statx will never indicate that S_DAX is set on directories.
> > > +
> > > +Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs even if
> > > +the underlying media does not support dax and/or the file system is overridden
> > > +with a mount option.
> > > +
> > >  
> > >  
> > >  Implementation Tips for Block Driver Writers
> > > -- 
> > > 2.25.1
> > > 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH V11.1] Documentation/dax: Update Usage section
  2020-04-28  0:21 ` [PATCH V11 04/11] Documentation/dax: Update Usage section ira.weiny
  2020-04-28 20:27   ` Darrick J. Wong
@ 2020-04-28 22:21   ` ira.weiny
  2020-04-29  2:21     ` Randy Dunlap
  2020-04-29  4:33     ` [PATCH V11.2] " ira.weiny
  1 sibling, 2 replies; 29+ messages in thread
From: ira.weiny @ 2020-04-28 22:21 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Ira Weiny, Al Viro, Jan Kara, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jeff Moyer, linux-ext4,
	linux-fsdevel, linux-api

From: Ira Weiny <ira.weiny@intel.com>

Update the Usage section to reflect the new individual dax selection
functionality.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V11:
	Minor changes from Darrick

Changes from V10:
	Clarifications from Dave
	Add '-c' to xfs_io examples

Changes from V9:
	Fix missing ')'
	Fix trialing '"'

Changes from V8:
	Updates from Darrick

Changes from V7:
	Cleanups/clarifications from Darrick and Dan

Changes from V6:
	Update to allow setting FS_XFLAG_DAX any time.
	Update with list of behaviors from Darrick
	https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/

Changes from V5:
	Update to reflect the agreed upon semantics
	https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
---
 Documentation/filesystems/dax.txt | 142 +++++++++++++++++++++++++++++-
 1 file changed, 139 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
index 679729442fd2..dc1c1aa36cc2 100644
--- a/Documentation/filesystems/dax.txt
+++ b/Documentation/filesystems/dax.txt
@@ -17,11 +17,147 @@ For file mappings, the storage device is mapped directly into userspace.
 Usage
 -----
 
-If you have a block device which supports DAX, you can make a filesystem
+If you have a block device which supports DAX, you can make a file system
 on it as usual.  The DAX code currently only supports files with a block
 size equal to your kernel's PAGE_SIZE, so you may need to specify a block
-size when creating the filesystem.  When mounting it, use the "-o dax"
-option on the command line or add 'dax' to the options in /etc/fstab.
+size when creating the file system.
+
+Currently 3 filesystems support DAX: ext2, ext4 and xfs.  Enabling DAX on them
+is different.
+
+Enabling DAX on ext4 and ext2
+-----------------------------
+
+When mounting the filesystem, use the "-o dax" option on the command line or
+add 'dax' to the options in /etc/fstab.  This works to enable DAX on all files
+within the filesystem.  It is equivalent to the '-o dax=always' behavior below.
+
+
+Enabling DAX on xfs
+-------------------
+
+Summary
+-------
+
+ 1. There exists an in-kernel file access mode flag S_DAX that corresponds to
+    the statx flag STATX_ATTR_DAX.  See the manpage for statx(2) for details
+    about this access mode.
+
+ 2. There exists a persistent flag FS_XFLAG_DAX that can be applied to regular
+    files and directories. This advisory flag can be set or cleared at any
+    time, but doing so does not immediately affect the S_DAX state.
+
+ 3. If the persistent FS_XFLAG_DAX flag is set on a directory, this flag will
+    be inherited by all regular files and subdirectories that are subsequently
+    created in this directory. Files and subdirectories that exist at the time
+    this flag is set or cleared on the parent directory are not modified by
+    this modification of the parent directory.
+
+ 4. There exists dax mount options which can override FS_XFLAG_DAX in the
+    setting of the S_DAX flag.  Given underlying storage which supports DAX the
+    following hold:
+
+    "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.
+
+    "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."
+
+    "-o dax=always" means "always set S_DAX ignore FS_XFLAG_DAX."
+
+    "-o dax"        is a legacy option which is an alias for "dax=always".
+		    This may be removed in the future so "-o dax=always" is
+		    the preferred method for specifying this behavior.
+
+    NOTE: Modifications to and the inheritance behavior of FS_XFLAG_DAX remain
+    the same even when the file system is mounted with a dax option.  However,
+    in-core inode state (S_DAX) will be overridden until the file system is
+    remounted with dax=inode and the inode is evicted from kernel memory.
+
+ 5. The S_DAX policy can be changed via:
+
+    a) Setting the parent directory FS_XFLAG_DAX as needed before files are
+       created
+
+    b) Setting the appropriate dax="foo" mount option
+
+    c) Changing the FS_XFLAG_DAX on existing regular files and directories.
+       This has runtime constraints and limitations that are described in 6)
+       below.
+
+ 6. When changing the S_DAX policy via toggling the persistent FS_XFLAG_DAX flag,
+    the change in behaviour for existing regular files may not occur
+    immediately.  If the change must take effect immediately, the administrator
+    needs to:
+
+    a) stop the application so there are no active references to the data set
+       the policy change will affect
+
+    b) evict the data set from kernel caches so it will be re-instantiated when
+       the application is restarted. This can be achieved by:
+
+       i. drop-caches
+       ii. a filesystem unmount and mount cycle
+       iii. a system reboot
+
+
+Details
+-------
+
+There are 2 per-file dax flags.  One is a persistent inode setting (FS_XFLAG_DAX)
+and the other is a volatile flag indicating the active state of the feature
+(S_DAX).
+
+FS_XFLAG_DAX is preserved within the file system.  This persistent config
+setting can be set, cleared and/or queried using the FS_IOC_FS[GS]ETXATTR ioctl
+(see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'.
+
+New files and directories automatically inherit FS_XFLAG_DAX from
+their parent directory _when_ _created_.  Therefore, setting FS_XFLAG_DAX at
+directory creation time can be used to set a default behavior for an entire
+sub-tree.
+
+To clarify inheritance, here are 3 examples:
+
+Example A:
+
+mkdir -p a/b/c
+xfs_io -c 'chattr +x' a
+mkdir a/b/c/d
+mkdir a/e
+
+	dax: a,e
+	no dax: b,c,d
+
+Example B:
+
+mkdir a
+xfs_io -c 'chattr +x' a
+mkdir -p a/b/c/d
+
+	dax: a,b,c,d
+	no dax:
+
+Example C:
+
+mkdir -p a/b/c
+xfs_io -c 'chattr +x' c
+mkdir a/b/c/d
+
+	dax: c,d
+	no dax: a,b
+
+
+The current enabled state (S_DAX) is set when a file inode is instantiated in
+memory by the kernel.  It is set based on the underlying media support, the
+value of FS_XFLAG_DAX and the file system's dax mount option.
+
+statx can be used to query S_DAX.  NOTE that only regular files will ever have
+S_DAX set and therefore statx will never indicate that S_DAX is set on
+directories.
+
+Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs even if
+the underlying media does not support dax and/or the file system is overridden
+with a mount option.
+
 
 
 Implementation Tips for Block Driver Writers
-- 
2.25.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V11.1] Documentation/dax: Update Usage section
  2020-04-28 22:21   ` [PATCH V11.1] " ira.weiny
@ 2020-04-29  2:21     ` Randy Dunlap
  2020-04-29  4:30       ` Ira Weiny
  2020-04-29  4:33     ` [PATCH V11.2] " ira.weiny
  1 sibling, 1 reply; 29+ messages in thread
From: Randy Dunlap @ 2020-04-29  2:21 UTC (permalink / raw)
  To: ira.weiny, linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Al Viro, Jan Kara, Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jeff Moyer, linux-ext4, linux-fsdevel,
	linux-api

On 4/28/20 3:21 PM, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> Update the Usage section to reflect the new individual dax selection
> functionality.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> ---
> Changes from V11:
> 	Minor changes from Darrick
> 
> Changes from V10:
> 	Clarifications from Dave
> 	Add '-c' to xfs_io examples
> 
> Changes from V9:
> 	Fix missing ')'
> 	Fix trialing '"'

trailing

> 
> Changes from V8:
> 	Updates from Darrick
> 
> Changes from V7:
> 	Cleanups/clarifications from Darrick and Dan
> 
> Changes from V6:
> 	Update to allow setting FS_XFLAG_DAX any time.
> 	Update with list of behaviors from Darrick
> 	https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
> 
> Changes from V5:
> 	Update to reflect the agreed upon semantics
> 	https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> ---
>  Documentation/filesystems/dax.txt | 142 +++++++++++++++++++++++++++++-
>  1 file changed, 139 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> index 679729442fd2..dc1c1aa36cc2 100644
> --- a/Documentation/filesystems/dax.txt
> +++ b/Documentation/filesystems/dax.txt
> @@ -17,11 +17,147 @@ For file mappings, the storage device is mapped directly into userspace.
>  Usage
>  -----
>  
> -If you have a block device which supports DAX, you can make a filesystem
> +If you have a block device which supports DAX, you can make a file system
>  on it as usual.  The DAX code currently only supports files with a block
>  size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> -size when creating the filesystem.  When mounting it, use the "-o dax"
> -option on the command line or add 'dax' to the options in /etc/fstab.
> +size when creating the file system.
> +
> +Currently 3 filesystems support DAX: ext2, ext4 and xfs.  Enabling DAX on them

Why "file system" in the first paragraph when "filesystem" is used here and below?

> +is different.
> +
> +Enabling DAX on ext4 and ext2
> +-----------------------------
> +
> +When mounting the filesystem, use the "-o dax" option on the command line or
> +add 'dax' to the options in /etc/fstab.  This works to enable DAX on all files
> +within the filesystem.  It is equivalent to the '-o dax=always' behavior below.
> +
> +
> +Enabling DAX on xfs
> +-------------------
> +
> +Summary
> +-------
> +
> + 1. There exists an in-kernel file access mode flag S_DAX that corresponds to
> +    the statx flag STATX_ATTR_DAX.  See the manpage for statx(2) for details
> +    about this access mode.
> +
> + 2. There exists a persistent flag FS_XFLAG_DAX that can be applied to regular
> +    files and directories. This advisory flag can be set or cleared at any
> +    time, but doing so does not immediately affect the S_DAX state.
> +
> + 3. If the persistent FS_XFLAG_DAX flag is set on a directory, this flag will
> +    be inherited by all regular files and subdirectories that are subsequently
> +    created in this directory. Files and subdirectories that exist at the time
> +    this flag is set or cleared on the parent directory are not modified by
> +    this modification of the parent directory.
> +
> + 4. There exists dax mount options which can override FS_XFLAG_DAX in the

             exist

> +    setting of the S_DAX flag.  Given underlying storage which supports DAX the
> +    following hold:
> +
> +    "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.
> +
> +    "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."
> +
> +    "-o dax=always" means "always set S_DAX ignore FS_XFLAG_DAX."
> +
> +    "-o dax"        is a legacy option which is an alias for "dax=always".
> +		    This may be removed in the future so "-o dax=always" is
> +		    the preferred method for specifying this behavior.
> +
> +    NOTE: Modifications to and the inheritance behavior of FS_XFLAG_DAX remain
> +    the same even when the file system is mounted with a dax option.  However,
> +    in-core inode state (S_DAX) will be overridden until the file system is

                                     "file system" (2 times above)

> +    remounted with dax=inode and the inode is evicted from kernel memory.
> +
> + 5. The S_DAX policy can be changed via:
> +
> +    a) Setting the parent directory FS_XFLAG_DAX as needed before files are
> +       created
> +
> +    b) Setting the appropriate dax="foo" mount option
> +
> +    c) Changing the FS_XFLAG_DAX on existing regular files and directories.

                       FS_XFLAGS_DAX flag on

> +       This has runtime constraints and limitations that are described in 6)
> +       below.
> +
> + 6. When changing the S_DAX policy via toggling the persistent FS_XFLAG_DAX flag,
> +    the change in behaviour for existing regular files may not occur
> +    immediately.  If the change must take effect immediately, the administrator
> +    needs to:
> +
> +    a) stop the application so there are no active references to the data set
> +       the policy change will affect
> +
> +    b) evict the data set from kernel caches so it will be re-instantiated when
> +       the application is restarted. This can be achieved by:
> +
> +       i. drop-caches
> +       ii. a filesystem unmount and mount cycle

filesystem

> +       iii. a system reboot
> +
> +
> +Details
> +-------
> +
> +There are 2 per-file dax flags.  One is a persistent inode setting (FS_XFLAG_DAX)
> +and the other is a volatile flag indicating the active state of the feature
> +(S_DAX).
> +
> +FS_XFLAG_DAX is preserved within the file system.  This persistent config

file system

> +setting can be set, cleared and/or queried using the FS_IOC_FS[GS]ETXATTR ioctl
> +(see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'.
> +
> +New files and directories automatically inherit FS_XFLAG_DAX from
> +their parent directory _when_ _created_.  Therefore, setting FS_XFLAG_DAX at
> +directory creation time can be used to set a default behavior for an entire
> +sub-tree.
> +
> +To clarify inheritance, here are 3 examples:
> +
> +Example A:
> +
> +mkdir -p a/b/c
> +xfs_io -c 'chattr +x' a
> +mkdir a/b/c/d
> +mkdir a/e
> +
> +	dax: a,e
> +	no dax: b,c,d
> +
> +Example B:
> +
> +mkdir a
> +xfs_io -c 'chattr +x' a
> +mkdir -p a/b/c/d
> +
> +	dax: a,b,c,d
> +	no dax:
> +
> +Example C:
> +
> +mkdir -p a/b/c
> +xfs_io -c 'chattr +x' c
> +mkdir a/b/c/d
> +
> +	dax: c,d
> +	no dax: a,b
> +
> +
> +The current enabled state (S_DAX) is set when a file inode is instantiated in
> +memory by the kernel.  It is set based on the underlying media support, the
> +value of FS_XFLAG_DAX and the file system's dax mount option.
> +
> +statx can be used to query S_DAX.  NOTE that only regular files will ever have
> +S_DAX set and therefore statx will never indicate that S_DAX is set on
> +directories.
> +
> +Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs even if

           the FS_XFLAG_DAX flag

> +the underlying media does not support dax and/or the file system is overridden

file system

Just be consistent, please.

> +with a mount option.
> +
>  
>  
>  Implementation Tips for Block Driver Writers
> 

thanks.
-- 
~Randy


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V11.1] Documentation/dax: Update Usage section
  2020-04-29  2:21     ` Randy Dunlap
@ 2020-04-29  4:30       ` Ira Weiny
  0 siblings, 0 replies; 29+ messages in thread
From: Ira Weiny @ 2020-04-29  4:30 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: linux-kernel, linux-xfs, Darrick J. Wong, Al Viro, Jan Kara,
	Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jeff Moyer, linux-ext4, linux-fsdevel,
	linux-api

On Tue, Apr 28, 2020 at 07:21:18PM -0700, Randy Dunlap wrote:
> On 4/28/20 3:21 PM, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > Update the Usage section to reflect the new individual dax selection
> > functionality.
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > 
> > ---
> > Changes from V11:
> > 	Minor changes from Darrick
> > 
> > Changes from V10:
> > 	Clarifications from Dave
> > 	Add '-c' to xfs_io examples
> > 
> > Changes from V9:
> > 	Fix missing ')'
> > 	Fix trialing '"'
> 
> trailing

Thanks

> 
> > 
> > Changes from V8:
> > 	Updates from Darrick
> > 
> > Changes from V7:
> > 	Cleanups/clarifications from Darrick and Dan
> > 
> > Changes from V6:
> > 	Update to allow setting FS_XFLAG_DAX any time.
> > 	Update with list of behaviors from Darrick
> > 	https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/
> > 
> > Changes from V5:
> > 	Update to reflect the agreed upon semantics
> > 	https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
> > ---
> >  Documentation/filesystems/dax.txt | 142 +++++++++++++++++++++++++++++-
> >  1 file changed, 139 insertions(+), 3 deletions(-)
> > 
> > diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> > index 679729442fd2..dc1c1aa36cc2 100644
> > --- a/Documentation/filesystems/dax.txt
> > +++ b/Documentation/filesystems/dax.txt
> > @@ -17,11 +17,147 @@ For file mappings, the storage device is mapped directly into userspace.
> >  Usage
> >  -----
> >  
> > -If you have a block device which supports DAX, you can make a filesystem
> > +If you have a block device which supports DAX, you can make a file system
> >  on it as usual.  The DAX code currently only supports files with a block
> >  size equal to your kernel's PAGE_SIZE, so you may need to specify a block
> > -size when creating the filesystem.  When mounting it, use the "-o dax"
> > -option on the command line or add 'dax' to the options in /etc/fstab.
> > +size when creating the file system.
> > +
> > +Currently 3 filesystems support DAX: ext2, ext4 and xfs.  Enabling DAX on them
> 
> Why "file system" in the first paragraph when "filesystem" is used here and below?

AFAIK they are interchangeable.  https://en.wikipedia.org/wiki/File_system

But consistency is a reasonable request.

The rest of the doc (save 1 place) uses filesystem so I will adopt that here,
and clean up that 1 place.

> 
> > +is different.
> > +
> > +Enabling DAX on ext4 and ext2
> > +-----------------------------
> > +
> > +When mounting the filesystem, use the "-o dax" option on the command line or
> > +add 'dax' to the options in /etc/fstab.  This works to enable DAX on all files
> > +within the filesystem.  It is equivalent to the '-o dax=always' behavior below.
> > +
> > +
> > +Enabling DAX on xfs
> > +-------------------
> > +
> > +Summary
> > +-------
> > +
> > + 1. There exists an in-kernel file access mode flag S_DAX that corresponds to
> > +    the statx flag STATX_ATTR_DAX.  See the manpage for statx(2) for details
> > +    about this access mode.
> > +
> > + 2. There exists a persistent flag FS_XFLAG_DAX that can be applied to regular
> > +    files and directories. This advisory flag can be set or cleared at any
> > +    time, but doing so does not immediately affect the S_DAX state.
> > +
> > + 3. If the persistent FS_XFLAG_DAX flag is set on a directory, this flag will
> > +    be inherited by all regular files and subdirectories that are subsequently
> > +    created in this directory. Files and subdirectories that exist at the time
> > +    this flag is set or cleared on the parent directory are not modified by
> > +    this modification of the parent directory.
> > +
> > + 4. There exists dax mount options which can override FS_XFLAG_DAX in the
> 
>              exist

Ah yea,
Done.

> 
> > +    setting of the S_DAX flag.  Given underlying storage which supports DAX the
> > +    following hold:
> > +
> > +    "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.
> > +
> > +    "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."
> > +
> > +    "-o dax=always" means "always set S_DAX ignore FS_XFLAG_DAX."
> > +
> > +    "-o dax"        is a legacy option which is an alias for "dax=always".
> > +		    This may be removed in the future so "-o dax=always" is
> > +		    the preferred method for specifying this behavior.
> > +
> > +    NOTE: Modifications to and the inheritance behavior of FS_XFLAG_DAX remain
> > +    the same even when the file system is mounted with a dax option.  However,
> > +    in-core inode state (S_DAX) will be overridden until the file system is
> 
>                                      "file system" (2 times above)

Done.

> 
> > +    remounted with dax=inode and the inode is evicted from kernel memory.
> > +
> > + 5. The S_DAX policy can be changed via:
> > +
> > +    a) Setting the parent directory FS_XFLAG_DAX as needed before files are
> > +       created
> > +
> > +    b) Setting the appropriate dax="foo" mount option
> > +
> > +    c) Changing the FS_XFLAG_DAX on existing regular files and directories.
> 
>                        FS_XFLAGS_DAX flag on

Done.

> 
> > +       This has runtime constraints and limitations that are described in 6)
> > +       below.
> > +
> > + 6. When changing the S_DAX policy via toggling the persistent FS_XFLAG_DAX flag,
> > +    the change in behaviour for existing regular files may not occur
> > +    immediately.  If the change must take effect immediately, the administrator
> > +    needs to:
> > +
> > +    a) stop the application so there are no active references to the data set
> > +       the policy change will affect
> > +
> > +    b) evict the data set from kernel caches so it will be re-instantiated when
> > +       the application is restarted. This can be achieved by:
> > +
> > +       i. drop-caches
> > +       ii. a filesystem unmount and mount cycle
> 
> filesystem

Done.

> 
> > +       iii. a system reboot
> > +
> > +
> > +Details
> > +-------
> > +
> > +There are 2 per-file dax flags.  One is a persistent inode setting (FS_XFLAG_DAX)
> > +and the other is a volatile flag indicating the active state of the feature
> > +(S_DAX).
> > +
> > +FS_XFLAG_DAX is preserved within the file system.  This persistent config
> 
> file system

Done.

> 
> > +setting can be set, cleared and/or queried using the FS_IOC_FS[GS]ETXATTR ioctl
> > +(see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'.
> > +
> > +New files and directories automatically inherit FS_XFLAG_DAX from
> > +their parent directory _when_ _created_.  Therefore, setting FS_XFLAG_DAX at
> > +directory creation time can be used to set a default behavior for an entire
> > +sub-tree.
> > +
> > +To clarify inheritance, here are 3 examples:
> > +
> > +Example A:
> > +
> > +mkdir -p a/b/c
> > +xfs_io -c 'chattr +x' a
> > +mkdir a/b/c/d
> > +mkdir a/e
> > +
> > +	dax: a,e
> > +	no dax: b,c,d
> > +
> > +Example B:
> > +
> > +mkdir a
> > +xfs_io -c 'chattr +x' a
> > +mkdir -p a/b/c/d
> > +
> > +	dax: a,b,c,d
> > +	no dax:
> > +
> > +Example C:
> > +
> > +mkdir -p a/b/c
> > +xfs_io -c 'chattr +x' c
> > +mkdir a/b/c/d
> > +
> > +	dax: c,d
> > +	no dax: a,b
> > +
> > +
> > +The current enabled state (S_DAX) is set when a file inode is instantiated in
> > +memory by the kernel.  It is set based on the underlying media support, the
> > +value of FS_XFLAG_DAX and the file system's dax mount option.
> > +
> > +statx can be used to query S_DAX.  NOTE that only regular files will ever have
> > +S_DAX set and therefore statx will never indicate that S_DAX is set on
> > +directories.
> > +
> > +Setting the FS_XFLAG_DAX (specifically or through inheritance) occurs even if
> 
>            the FS_XFLAG_DAX flag

Done.

> 
> > +the underlying media does not support dax and/or the file system is overridden
> 
> file system
> 
> Just be consistent, please.

Fair enough,
Done.

> 
> > +with a mount option.
> > +
> >  
> >  
> >  Implementation Tips for Block Driver Writers
> > 
> 
> thanks.

NP, Thank you!
Ira

> -- 
> ~Randy
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH V11.2] Documentation/dax: Update Usage section
  2020-04-28 22:21   ` [PATCH V11.1] " ira.weiny
  2020-04-29  2:21     ` Randy Dunlap
@ 2020-04-29  4:33     ` ira.weiny
  2020-04-29  5:47       ` Randy Dunlap
  1 sibling, 1 reply; 29+ messages in thread
From: ira.weiny @ 2020-04-29  4:33 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Ira Weiny, Al Viro, Jan Kara, Dan Williams, Dave Chinner,
	Christoph Hellwig, Theodore Y. Ts'o, Jeff Moyer, linux-ext4,
	linux-fsdevel, linux-api

From: Ira Weiny <ira.weiny@intel.com>

Update the Usage section to reflect the new individual dax selection
functionality.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V11.1:
	Make filesystem/file system consistently filesystem
	grammatical fixes

Changes from V11:
	Minor changes from Darrick

Changes from V10:
	Clarifications from Dave
	Add '-c' to xfs_io examples

Changes from V9:
	Fix missing ')'
	Fix trailing '"' (fixed this!)

Changes from V8:
	Updates from Darrick

Changes from V7:
	Cleanups/clarifications from Darrick and Dan

Changes from V6:
	Update to allow setting FS_XFLAG_DAX any time.
	Update with list of behaviors from Darrick
	https://lore.kernel.org/lkml/20200409165927.GD6741@magnolia/

Changes from V5:
	Update to reflect the agreed upon semantics
	https://lore.kernel.org/lkml/20200405061945.GA94792@iweiny-DESK2.sc.intel.com/
---
 Documentation/filesystems/dax.txt | 142 +++++++++++++++++++++++++++++-
 1 file changed, 139 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
index 679729442fd2..735fb4b54117 100644
--- a/Documentation/filesystems/dax.txt
+++ b/Documentation/filesystems/dax.txt
@@ -20,8 +20,144 @@ Usage
 If you have a block device which supports DAX, you can make a filesystem
 on it as usual.  The DAX code currently only supports files with a block
 size equal to your kernel's PAGE_SIZE, so you may need to specify a block
-size when creating the filesystem.  When mounting it, use the "-o dax"
-option on the command line or add 'dax' to the options in /etc/fstab.
+size when creating the filesystem.
+
+Currently 3 filesystems support DAX: ext2, ext4 and xfs.  Enabling DAX on them
+is different.
+
+Enabling DAX on ext4 and ext2
+-----------------------------
+
+When mounting the filesystem, use the "-o dax" option on the command line or
+add 'dax' to the options in /etc/fstab.  This works to enable DAX on all files
+within the filesystem.  It is equivalent to the '-o dax=always' behavior below.
+
+
+Enabling DAX on xfs
+-------------------
+
+Summary
+-------
+
+ 1. There exists an in-kernel file access mode flag S_DAX that corresponds to
+    the statx flag STATX_ATTR_DAX.  See the manpage for statx(2) for details
+    about this access mode.
+
+ 2. There exists a persistent flag FS_XFLAG_DAX that can be applied to regular
+    files and directories. This advisory flag can be set or cleared at any
+    time, but doing so does not immediately affect the S_DAX state.
+
+ 3. If the persistent FS_XFLAG_DAX flag is set on a directory, this flag will
+    be inherited by all regular files and subdirectories that are subsequently
+    created in this directory. Files and subdirectories that exist at the time
+    this flag is set or cleared on the parent directory are not modified by
+    this modification of the parent directory.
+
+ 4. There exist dax mount options which can override FS_XFLAG_DAX in the
+    setting of the S_DAX flag.  Given underlying storage which supports DAX the
+    following hold:
+
+    "-o dax=inode"  means "follow FS_XFLAG_DAX" and is the default.
+
+    "-o dax=never"  means "never set S_DAX, ignore FS_XFLAG_DAX."
+
+    "-o dax=always" means "always set S_DAX ignore FS_XFLAG_DAX."
+
+    "-o dax"        is a legacy option which is an alias for "dax=always".
+		    This may be removed in the future so "-o dax=always" is
+		    the preferred method for specifying this behavior.
+
+    NOTE: Modifications to and the inheritance behavior of FS_XFLAG_DAX remain
+    the same even when the filesystem is mounted with a dax option.  However,
+    in-core inode state (S_DAX) will be overridden until the filesystem is
+    remounted with dax=inode and the inode is evicted from kernel memory.
+
+ 5. The S_DAX policy can be changed via:
+
+    a) Setting the parent directory FS_XFLAG_DAX as needed before files are
+       created
+
+    b) Setting the appropriate dax="foo" mount option
+
+    c) Changing the FS_XFLAG_DAX flag on existing regular files and
+       directories.  This has runtime constraints and limitations that are
+       described in 6) below.
+
+ 6. When changing the S_DAX policy via toggling the persistent FS_XFLAG_DAX flag,
+    the change in behaviour for existing regular files may not occur
+    immediately.  If the change must take effect immediately, the administrator
+    needs to:
+
+    a) stop the application so there are no active references to the data set
+       the policy change will affect
+
+    b) evict the data set from kernel caches so it will be re-instantiated when
+       the application is restarted. This can be achieved by:
+
+       i. drop-caches
+       ii. a filesystem unmount and mount cycle
+       iii. a system reboot
+
+
+Details
+-------
+
+There are 2 per-file dax flags.  One is a persistent inode setting (FS_XFLAG_DAX)
+and the other is a volatile flag indicating the active state of the feature
+(S_DAX).
+
+FS_XFLAG_DAX is preserved within the filesystem.  This persistent config
+setting can be set, cleared and/or queried using the FS_IOC_FS[GS]ETXATTR ioctl
+(see ioctl_xfs_fsgetxattr(2)) or an utility such as 'xfs_io'.
+
+New files and directories automatically inherit FS_XFLAG_DAX from
+their parent directory _when_ _created_.  Therefore, setting FS_XFLAG_DAX at
+directory creation time can be used to set a default behavior for an entire
+sub-tree.
+
+To clarify inheritance, here are 3 examples:
+
+Example A:
+
+mkdir -p a/b/c
+xfs_io -c 'chattr +x' a
+mkdir a/b/c/d
+mkdir a/e
+
+	dax: a,e
+	no dax: b,c,d
+
+Example B:
+
+mkdir a
+xfs_io -c 'chattr +x' a
+mkdir -p a/b/c/d
+
+	dax: a,b,c,d
+	no dax:
+
+Example C:
+
+mkdir -p a/b/c
+xfs_io -c 'chattr +x' c
+mkdir a/b/c/d
+
+	dax: c,d
+	no dax: a,b
+
+
+The current enabled state (S_DAX) is set when a file inode is instantiated in
+memory by the kernel.  It is set based on the underlying media support, the
+value of FS_XFLAG_DAX and the filesystem's dax mount option.
+
+statx can be used to query S_DAX.  NOTE that only regular files will ever have
+S_DAX set and therefore statx will never indicate that S_DAX is set on
+directories.
+
+Setting the FS_XFLAG_DAX flag (specifically or through inheritance) occurs even
+if the underlying media does not support dax and/or the filesystem is
+overridden with a mount option.
+
 
 
 Implementation Tips for Block Driver Writers
@@ -94,7 +230,7 @@ sysadmins have an option to restore the lost data from a prior backup/inbuilt
 redundancy in the following ways:
 
 1. Delete the affected file, and restore from a backup (sysadmin route):
-   This will free the file system blocks that were being used by the file,
+   This will free the filesystem blocks that were being used by the file,
    and the next time they're allocated, they will be zeroed first, which
    happens through the driver, and will clear bad sectors.
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V11.2] Documentation/dax: Update Usage section
  2020-04-29  4:33     ` [PATCH V11.2] " ira.weiny
@ 2020-04-29  5:47       ` Randy Dunlap
  0 siblings, 0 replies; 29+ messages in thread
From: Randy Dunlap @ 2020-04-29  5:47 UTC (permalink / raw)
  To: ira.weiny, linux-kernel, linux-xfs, Darrick J. Wong
  Cc: Al Viro, Jan Kara, Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jeff Moyer, linux-ext4, linux-fsdevel,
	linux-api

On 4/28/20 9:33 PM, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> Update the Usage section to reflect the new individual dax selection
> functionality.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> ---

Acked-by: Randy Dunlap <rdunlap@infradead.org>

Thanks.

> ---
>  Documentation/filesystems/dax.txt | 142 +++++++++++++++++++++++++++++-
>  1 file changed, 139 insertions(+), 3 deletions(-)


-- 
~Randy

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V11 11/11] fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()
  2020-04-28 20:11   ` Darrick J. Wong
@ 2020-06-02 17:23     ` Darrick J. Wong
  2020-06-02 23:15       ` Ira Weiny
  2020-06-03 10:10       ` Jan Kara
  0 siblings, 2 replies; 29+ messages in thread
From: Darrick J. Wong @ 2020-06-02 17:23 UTC (permalink / raw)
  To: ira.weiny
  Cc: linux-kernel, linux-xfs, Al Viro, Jan Kara, Dan Williams,
	Dave Chinner, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

On Tue, Apr 28, 2020 at 01:11:38PM -0700, Darrick J. Wong wrote:
> On Mon, Apr 27, 2020 at 05:21:42PM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > Because of the separation of FS_XFLAG_DAX from S_DAX and the delayed
> > setting of S_DAX, data invalidation no longer needs to happen when
> > FS_XFLAG_DAX is changed.
> > 
> > Change xfs_ioctl_setattr_dax_invalidate() to be
> > xfs_ioctl_dax_check_set_cache() and alter the code to reflect the new
> > functionality.
> > 
> > Furthermore, we no longer need the locking so we remove the join_flags
> > logic.
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> Looks ok,
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> --D
> 
> > 
> > ---
> > Changes from V10:
> > 	adjust for renamed d_mark_dontcache() function
> > 
> > Changes from V9:
> > 	Change name of function to xfs_ioctl_setattr_prepare_dax()
> > 
> > Changes from V8:
> > 	Change name of function to xfs_ioctl_dax_check_set_cache()
> > 	Update commit message
> > 	Fix bit manipulations
> > 
> > Changes from V7:
> > 	Use new flag_inode_dontcache()
> > 	Skip don't cache if mount over ride is active.
> > 
> > Changes from v6:
> > 	Fix completely broken implementation and update commit message.
> > 	Use the new VFS layer I_DONTCACHE to facilitate inode eviction
> > 	and S_DAX changing on drop_caches
> > 
> > Changes from v5:
> > 	New patch
> > ---
> >  fs/xfs/xfs_ioctl.c | 108 +++++++++------------------------------------
> >  1 file changed, 20 insertions(+), 88 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > index 104495ac187c..ff474f2c9acf 100644
> > --- a/fs/xfs/xfs_ioctl.c
> > +++ b/fs/xfs/xfs_ioctl.c
> > @@ -1245,64 +1245,26 @@ xfs_ioctl_setattr_xflags(
> >  	return 0;
> >  }
> >  
> > -/*
> > - * If we are changing DAX flags, we have to ensure the file is clean and any
> > - * cached objects in the address space are invalidated and removed. This
> > - * requires us to lock out other IO and page faults similar to a truncate
> > - * operation. The locks need to be held until the transaction has been committed
> > - * so that the cache invalidation is atomic with respect to the DAX flag
> > - * manipulation.
> > - */
> > -static int
> > -xfs_ioctl_setattr_dax_invalidate(
> > +static void
> > +xfs_ioctl_setattr_prepare_dax(
> >  	struct xfs_inode	*ip,
> > -	struct fsxattr		*fa,
> > -	int			*join_flags)
> > +	struct fsxattr		*fa)
> >  {
> > -	struct inode		*inode = VFS_I(ip);
> > -	struct super_block	*sb = inode->i_sb;
> > -	int			error;
> > -
> > -	*join_flags = 0;
> > -
> > -	/*
> > -	 * It is only valid to set the DAX flag on regular files and
> > -	 * directories on filesystems where the block size is equal to the page
> > -	 * size. On directories it serves as an inherited hint so we don't
> > -	 * have to check the device for dax support or flush pagecache.
> > -	 */
> > -	if (fa->fsx_xflags & FS_XFLAG_DAX) {
> > -		struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
> > -
> > -		if (!bdev_dax_supported(target->bt_bdev, sb->s_blocksize))
> > -			return -EINVAL;
> > -	}
> > -
> > -	/* If the DAX state is not changing, we have nothing to do here. */
> > -	if ((fa->fsx_xflags & FS_XFLAG_DAX) && IS_DAX(inode))
> > -		return 0;
> > -	if (!(fa->fsx_xflags & FS_XFLAG_DAX) && !IS_DAX(inode))
> > -		return 0;
> > +	struct xfs_mount	*mp = ip->i_mount;
> > +	struct inode            *inode = VFS_I(ip);
> >  
> >  	if (S_ISDIR(inode->i_mode))
> > -		return 0;
> > -
> > -	/* lock, flush and invalidate mapping in preparation for flag change */
> > -	xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> > -	error = filemap_write_and_wait(inode->i_mapping);
> > -	if (error)
> > -		goto out_unlock;
> > -	error = invalidate_inode_pages2(inode->i_mapping);
> > -	if (error)
> > -		goto out_unlock;
> > -
> > -	*join_flags = XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL;
> > -	return 0;
> > +		return;
> >  
> > -out_unlock:
> > -	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> > -	return error;
> > +	if ((mp->m_flags & XFS_MOUNT_DAX_ALWAYS) ||
> > +	    (mp->m_flags & XFS_MOUNT_DAX_NEVER))
> > +		return;
> >  
> > +	if (((fa->fsx_xflags & FS_XFLAG_DAX) &&
> > +	    !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) ||
> > +	    (!(fa->fsx_xflags & FS_XFLAG_DAX) &&
> > +	     (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)))
> > +		d_mark_dontcache(inode);

Now that I think about this further, are we /really/ sure that we want
to let unprivileged userspace cause inode evictions?

--D

> >  }
> >  
> >  /*
> > @@ -1310,17 +1272,10 @@ xfs_ioctl_setattr_dax_invalidate(
> >   * have permission to do so. On success, return a clean transaction and the
> >   * inode locked exclusively ready for further operation specific checks. On
> >   * failure, return an error without modifying or locking the inode.
> > - *
> > - * The inode might already be IO locked on call. If this is the case, it is
> > - * indicated in @join_flags and we take full responsibility for ensuring they
> > - * are unlocked from now on. Hence if we have an error here, we still have to
> > - * unlock them. Otherwise, once they are joined to the transaction, they will
> > - * be unlocked on commit/cancel.
> >   */
> >  static struct xfs_trans *
> >  xfs_ioctl_setattr_get_trans(
> > -	struct xfs_inode	*ip,
> > -	int			join_flags)
> > +	struct xfs_inode	*ip)
> >  {
> >  	struct xfs_mount	*mp = ip->i_mount;
> >  	struct xfs_trans	*tp;
> > @@ -1337,8 +1292,7 @@ xfs_ioctl_setattr_get_trans(
> >  		goto out_unlock;
> >  
> >  	xfs_ilock(ip, XFS_ILOCK_EXCL);
> > -	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL | join_flags);
> > -	join_flags = 0;
> > +	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> >  
> >  	/*
> >  	 * CAP_FOWNER overrides the following restrictions:
> > @@ -1359,8 +1313,6 @@ xfs_ioctl_setattr_get_trans(
> >  out_cancel:
> >  	xfs_trans_cancel(tp);
> >  out_unlock:
> > -	if (join_flags)
> > -		xfs_iunlock(ip, join_flags);
> >  	return ERR_PTR(error);
> >  }
> >  
> > @@ -1486,7 +1438,6 @@ xfs_ioctl_setattr(
> >  	struct xfs_dquot	*pdqp = NULL;
> >  	struct xfs_dquot	*olddquot = NULL;
> >  	int			code;
> > -	int			join_flags = 0;
> >  
> >  	trace_xfs_ioctl_setattr(ip);
> >  
> > @@ -1510,18 +1461,9 @@ xfs_ioctl_setattr(
> >  			return code;
> >  	}
> >  
> > -	/*
> > -	 * Changing DAX config may require inode locking for mapping
> > -	 * invalidation. These need to be held all the way to transaction commit
> > -	 * or cancel time, so need to be passed through to
> > -	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
> > -	 * appropriately.
> > -	 */
> > -	code = xfs_ioctl_setattr_dax_invalidate(ip, fa, &join_flags);
> > -	if (code)
> > -		goto error_free_dquots;
> > +	xfs_ioctl_setattr_prepare_dax(ip, fa);
> >  
> > -	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
> > +	tp = xfs_ioctl_setattr_get_trans(ip);
> >  	if (IS_ERR(tp)) {
> >  		code = PTR_ERR(tp);
> >  		goto error_free_dquots;
> > @@ -1651,7 +1593,6 @@ xfs_ioc_setxflags(
> >  	struct fsxattr		fa;
> >  	struct fsxattr		old_fa;
> >  	unsigned int		flags;
> > -	int			join_flags = 0;
> >  	int			error;
> >  
> >  	if (copy_from_user(&flags, arg, sizeof(flags)))
> > @@ -1668,18 +1609,9 @@ xfs_ioc_setxflags(
> >  	if (error)
> >  		return error;
> >  
> > -	/*
> > -	 * Changing DAX config may require inode locking for mapping
> > -	 * invalidation. These need to be held all the way to transaction commit
> > -	 * or cancel time, so need to be passed through to
> > -	 * xfs_ioctl_setattr_get_trans() so it can apply them to the join call
> > -	 * appropriately.
> > -	 */
> > -	error = xfs_ioctl_setattr_dax_invalidate(ip, &fa, &join_flags);
> > -	if (error)
> > -		goto out_drop_write;
> > +	xfs_ioctl_setattr_prepare_dax(ip, &fa);
> >  
> > -	tp = xfs_ioctl_setattr_get_trans(ip, join_flags);
> > +	tp = xfs_ioctl_setattr_get_trans(ip);
> >  	if (IS_ERR(tp)) {
> >  		error = PTR_ERR(tp);
> >  		goto out_drop_write;
> > -- 
> > 2.25.1
> > 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V11 11/11] fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()
  2020-06-02 17:23     ` Darrick J. Wong
@ 2020-06-02 23:15       ` Ira Weiny
  2020-06-03 10:10       ` Jan Kara
  1 sibling, 0 replies; 29+ messages in thread
From: Ira Weiny @ 2020-06-02 23:15 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-kernel, linux-xfs, Al Viro, Jan Kara, Dan Williams,
	Dave Chinner, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

On Tue, Jun 02, 2020 at 10:23:53AM -0700, Darrick J. Wong wrote:
> On Tue, Apr 28, 2020 at 01:11:38PM -0700, Darrick J. Wong wrote:
> > On Mon, Apr 27, 2020 at 05:21:42PM -0700, ira.weiny@intel.com wrote:
> > > From: Ira Weiny <ira.weiny@intel.com>
> > > 

...

> > > -out_unlock:
> > > -	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> > > -	return error;
> > > +	if ((mp->m_flags & XFS_MOUNT_DAX_ALWAYS) ||
> > > +	    (mp->m_flags & XFS_MOUNT_DAX_NEVER))
> > > +		return;
> > >  
> > > +	if (((fa->fsx_xflags & FS_XFLAG_DAX) &&
> > > +	    !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) ||
> > > +	    (!(fa->fsx_xflags & FS_XFLAG_DAX) &&
> > > +	     (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)))
> > > +		d_mark_dontcache(inode);
> 
> Now that I think about this further, are we /really/ sure that we want
> to let unprivileged userspace cause inode evictions?

This code only applies to files they have access to.  And it does not directly
cause an eviction.  It only hints that those inodes (for which they have access
to) will not be cached thus causing them to be reloaded sooner than they might
otherwise be.

So I think we are fine here.

Ira

> 
> --D
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V11 11/11] fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()
  2020-06-02 17:23     ` Darrick J. Wong
  2020-06-02 23:15       ` Ira Weiny
@ 2020-06-03 10:10       ` Jan Kara
  2020-06-03 17:03         ` Darrick J. Wong
  1 sibling, 1 reply; 29+ messages in thread
From: Jan Kara @ 2020-06-03 10:10 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: ira.weiny, linux-kernel, linux-xfs, Al Viro, Jan Kara,
	Dan Williams, Dave Chinner, Christoph Hellwig,
	Theodore Y. Ts'o, Jeff Moyer, linux-ext4, linux-fsdevel,
	linux-api

On Tue 02-06-20 10:23:53, Darrick J. Wong wrote:
> On Tue, Apr 28, 2020 at 01:11:38PM -0700, Darrick J. Wong wrote:
> > > -out_unlock:
> > > -	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> > > -	return error;
> > > +	if ((mp->m_flags & XFS_MOUNT_DAX_ALWAYS) ||
> > > +	    (mp->m_flags & XFS_MOUNT_DAX_NEVER))
> > > +		return;
> > >  
> > > +	if (((fa->fsx_xflags & FS_XFLAG_DAX) &&
> > > +	    !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) ||
> > > +	    (!(fa->fsx_xflags & FS_XFLAG_DAX) &&
> > > +	     (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)))
> > > +		d_mark_dontcache(inode);
> 
> Now that I think about this further, are we /really/ sure that we want
> to let unprivileged userspace cause inode evictions?

You have to have an equivalent of write access to the file to be able to
trigger d_mark_dontcache(). So you can e.g. delete it.  Or you could
fadvise / madvise regarding its page cache. I don't see the ability to push
inode out of cache as stronger than the abilities you already have...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V11 11/11] fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()
  2020-06-03 10:10       ` Jan Kara
@ 2020-06-03 17:03         ` Darrick J. Wong
  0 siblings, 0 replies; 29+ messages in thread
From: Darrick J. Wong @ 2020-06-03 17:03 UTC (permalink / raw)
  To: Jan Kara
  Cc: ira.weiny, linux-kernel, linux-xfs, Al Viro, Dan Williams,
	Dave Chinner, Christoph Hellwig, Theodore Y. Ts'o,
	Jeff Moyer, linux-ext4, linux-fsdevel, linux-api

On Wed, Jun 03, 2020 at 12:10:24PM +0200, Jan Kara wrote:
> On Tue 02-06-20 10:23:53, Darrick J. Wong wrote:
> > On Tue, Apr 28, 2020 at 01:11:38PM -0700, Darrick J. Wong wrote:
> > > > -out_unlock:
> > > > -	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
> > > > -	return error;
> > > > +	if ((mp->m_flags & XFS_MOUNT_DAX_ALWAYS) ||
> > > > +	    (mp->m_flags & XFS_MOUNT_DAX_NEVER))
> > > > +		return;
> > > >  
> > > > +	if (((fa->fsx_xflags & FS_XFLAG_DAX) &&
> > > > +	    !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) ||
> > > > +	    (!(fa->fsx_xflags & FS_XFLAG_DAX) &&
> > > > +	     (ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)))
> > > > +		d_mark_dontcache(inode);
> > 
> > Now that I think about this further, are we /really/ sure that we want
> > to let unprivileged userspace cause inode evictions?
> 
> You have to have an equivalent of write access to the file to be able to
> trigger d_mark_dontcache(). So you can e.g. delete it.  Or you could
> fadvise / madvise regarding its page cache. I don't see the ability to push
> inode out of cache as stronger than the abilities you already have...

<nod> Ok.  I just had one last bout of paranoia, but I think it'll be
fine. :)

--D

> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, back to index

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-28  0:21 [PATCH V11 00/11] XFS - Enable per-file/per-directory DAX operations V11 ira.weiny
2020-04-28  0:21 ` [PATCH V11 01/11] fs/xfs: Remove unnecessary initialization of i_rwsem ira.weiny
2020-04-28  0:21 ` [PATCH V11 02/11] fs: Remove unneeded IS_DAX() check in io_is_direct() ira.weiny
2020-04-28  0:21 ` [PATCH V11 03/11] fs/stat: Define DAX statx attribute ira.weiny
2020-04-28  0:21 ` [PATCH V11 04/11] Documentation/dax: Update Usage section ira.weiny
2020-04-28 20:27   ` Darrick J. Wong
2020-04-28 20:53     ` Ira Weiny
2020-04-28 21:12       ` Darrick J. Wong
2020-04-28 22:21   ` [PATCH V11.1] " ira.weiny
2020-04-29  2:21     ` Randy Dunlap
2020-04-29  4:30       ` Ira Weiny
2020-04-29  4:33     ` [PATCH V11.2] " ira.weiny
2020-04-29  5:47       ` Randy Dunlap
2020-04-28  0:21 ` [PATCH V11 05/11] fs/xfs: Change XFS_MOUNT_DAX to XFS_MOUNT_DAX_ALWAYS ira.weiny
2020-04-28  0:21 ` [PATCH V11 06/11] fs/xfs: Make DAX mount option a tri-state ira.weiny
2020-04-28 20:08   ` Darrick J. Wong
2020-04-28  0:21 ` [PATCH V11 07/11] fs/xfs: Create function xfs_inode_should_enable_dax() ira.weiny
2020-04-28  0:21 ` [PATCH V11 08/11] fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags() ira.weiny
2020-04-28  0:21 ` [PATCH V11 09/11] fs: Lift XFS_IDONTCACHE to the VFS layer ira.weiny
2020-04-28 20:06   ` Darrick J. Wong
2020-04-28  0:21 ` [PATCH V11 10/11] fs: Introduce DCACHE_DONTCACHE ira.weiny
2020-04-28 13:04   ` Jan Kara
2020-04-28 20:09   ` Darrick J. Wong
2020-04-28  0:21 ` [PATCH V11 11/11] fs/xfs: Update xfs_ioctl_setattr_dax_invalidate() ira.weiny
2020-04-28 20:11   ` Darrick J. Wong
2020-06-02 17:23     ` Darrick J. Wong
2020-06-02 23:15       ` Ira Weiny
2020-06-03 10:10       ` Jan Kara
2020-06-03 17:03         ` Darrick J. Wong

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git