* [RFC PATCH v2 0/7] make statx() return I/O alignment information
From: Eric Biggers @ 2022-05-18 23:50 UTC
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, linux-api,
	linux-fscrypt, linux-block, linux-kernel, Keith Busch

This patchset makes the statx() system call return I/O alignment
information, roughly following the design that was suggested at
https://lore.kernel.org/linux-fsdevel/20220120071215.123274-1-ebiggers@kernel.org/T/#u

This feature solves two problems: (a) it allows userspace to determine
when a file supports direct I/O, and with what alignment restrictions;
and (b) it allows userspace to determine the optimum I/O alignment for a
file.  For more details, see patch 1.

This is an RFC.  I'd greatly appreciate any feedback on the UAPI, as
that obviously needs to be gotten right from the beginning.  E.g., does
the proposed set of fields make sense?  Am I including the right
information in stx_offset_align_optimal?

Patch 1 adds the VFS support for STATX_IOALIGN.  The remaining patches
wire it up to ext4 and f2fs.  Support for other filesystems can be added
later.  We could also support this on block device files; however, since
block device nodes have different inodes from the block devices
themselves, it wouldn't apply to statx("/dev/$foo") but rather just to
'fd = open("/dev/$foo"); statx(fd)'.  I'm unsure how useful that would be.

Note, f2fs has one corner case where DIO reads are allowed but not DIO
writes.  The proposed statx fields can't represent this.  My proposal
(patch 5) is to just eliminate this case, as it seems much too weird.
But I'd appreciate any feedback on that part.

This patchset applies to v5.18-rc7.

No changes since v1, which I sent a few months ago; I'm resending this
because people seem interested in it again
(https://lore.kernel.org/r/20220518171131.3525293-1-kbusch@fb.com).

Eric Biggers (7):
  statx: add I/O alignment information
  fscrypt: change fscrypt_dio_supported() to prepare for STATX_IOALIGN
  ext4: support STATX_IOALIGN
  f2fs: move f2fs_force_buffered_io() into file.c
  f2fs: don't allow DIO reads but not DIO writes
  f2fs: simplify f2fs_force_buffered_io()
  f2fs: support STATX_IOALIGN

 fs/crypto/inline_crypt.c  | 48 +++++++++++++++---------------
 fs/ext4/ext4.h            |  1 +
 fs/ext4/file.c            | 10 +++----
 fs/ext4/inode.c           | 31 ++++++++++++++++++++
 fs/f2fs/f2fs.h            | 45 -----------------------------
 fs/f2fs/file.c            | 61 ++++++++++++++++++++++++++++++++++++++-
 fs/stat.c                 |  3 ++
 include/linux/fscrypt.h   |  7 ++---
 include/linux/stat.h      |  3 ++
 include/uapi/linux/stat.h |  9 ++++--
 10 files changed, 136 insertions(+), 82 deletions(-)


base-commit: 42226c989789d8da4af1de0c31070c96726d990c
-- 
2.36.1


* [RFC PATCH v2 1/7] statx: add I/O alignment information
From: Eric Biggers @ 2022-05-18 23:50 UTC
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, linux-api,
	linux-fscrypt, linux-block, linux-kernel, Keith Busch

From: Eric Biggers <ebiggers@google.com>

Traditionally, the conditions for when DIO (direct I/O) is supported
were fairly simple: filesystems either supported DIO aligned to the
block device's logical block size, or didn't support DIO at all.

However, due to filesystem features that have been added over time (e.g.,
data journalling, inline data, encryption, verity, compression,
checkpoint disabling, log-structured mode), the conditions for when DIO
is allowed on a file have gotten increasingly complex.  Whether a
particular file supports DIO, and with what alignment, can depend on
various file attributes and filesystem mount options, as well as which
block device(s) the file's data is located on.

XFS has an ioctl XFS_IOC_DIOINFO which exposes this information to
applications.  However, as discussed
(https://lore.kernel.org/linux-fsdevel/20220120071215.123274-1-ebiggers@kernel.org/T/#u),
this ioctl is rarely used and not known to be used outside of
XFS-specific code.  It also was never intended to indicate when a file
doesn't support DIO at all, and it only exposes the minimum I/O
alignment, not the optimal I/O alignment, which has also been requested.

Therefore, let's expose this information via statx().  Add the
STATX_IOALIGN flag and three fields associated with it:

* stx_mem_align_dio: the alignment (in bytes) required for user memory
  buffers for DIO, or 0 if DIO is not supported on the file.

* stx_offset_align_dio: the alignment (in bytes) required for file
  offsets and I/O segment lengths for DIO, or 0 if DIO is not supported
  on the file.  This will only be nonzero if stx_mem_align_dio is
  nonzero, and vice versa.

* stx_offset_align_optimal: the alignment (in bytes) suggested for file
  offsets and I/O segment lengths to get optimal performance.  This
  applies to both DIO and buffered I/O.  It differs from stx_blocksize
  in that stx_offset_align_optimal will contain the real optimum I/O
  size, which may be a large value.  In contrast, for compatibility
  reasons stx_blocksize is the minimum size needed to avoid page cache
  read/modify/write cycles, which may be much smaller than the optimum
  I/O size.  For more details about the motivation for this field, see
  https://lore.kernel.org/r/20220210040304.GM59729@dread.disaster.area

Note that as with other statx() extensions, if STATX_IOALIGN isn't set
in the returned statx struct, then these new fields won't be filled in.
This will happen if the filesystem doesn't support STATX_IOALIGN, or if
the file isn't a regular file.  (It might be supported on block device
files in the future.)  It might also happen if the caller didn't include
STATX_IOALIGN in the request mask, since statx() isn't required to
return information that wasn't requested.

This commit adds the VFS-level plumbing for STATX_IOALIGN.  Individual
filesystems will still need to add code to support it.
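
As an illustration of the field semantics described above, here is a
minimal userspace sketch of how a caller might interpret the results.
It is not part of the patch.  The struct ioalign_info type, the
PROPOSED_STATX_IOALIGN name, and both helper functions are hypothetical;
the caller is assumed to have copied stx_mask and the two DIO fields out
of the statx() result.  The mask value is the one proposed in this patch
and is not taken from any released header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mask bit proposed by this patch; an assumption, not a released constant. */
#define PROPOSED_STATX_IOALIGN 0x00002000U

/* Hypothetical copy of the relevant statx() results. */
struct ioalign_info {
	uint32_t mask;			/* stx_mask as returned by statx() */
	uint32_t mem_align_dio;		/* stx_mem_align_dio */
	uint32_t offset_align_dio;	/* stx_offset_align_dio */
};

enum dio_status { DIO_UNKNOWN, DIO_UNSUPPORTED, DIO_SUPPORTED };

static enum dio_status classify_dio(const struct ioalign_info *info)
{
	assert(info != NULL);
	/* The fields are only meaningful if the bit is set in stx_mask. */
	if (!(info->mask & PROPOSED_STATX_IOALIGN))
		return DIO_UNKNOWN;
	/* Both DIO fields are zero together when DIO is unsupported. */
	if (info->mem_align_dio == 0 || info->offset_align_dio == 0)
		return DIO_UNSUPPORTED;
	return DIO_SUPPORTED;
}

/* Check one buffer address, file offset, and length against the fields. */
static bool dio_request_aligned(const struct ioalign_info *info,
				uintptr_t buf_addr, uint64_t offset,
				uint64_t len)
{
	if (classify_dio(info) != DIO_SUPPORTED)
		return false;
	return buf_addr % info->mem_align_dio == 0 &&
	       offset % info->offset_align_dio == 0 &&
	       len % info->offset_align_dio == 0;
}
```

The three-way result matters: a missing STATX_IOALIGN bit means "no
information", which is different from the fields being reported as 0.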

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/stat.c                 | 3 +++
 include/linux/stat.h      | 3 +++
 include/uapi/linux/stat.h | 9 +++++++--
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/fs/stat.c b/fs/stat.c
index 5c2c94464e8b0..9d477218545b8 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -611,6 +611,9 @@ cp_statx(const struct kstat *stat, struct statx __user *buffer)
 	tmp.stx_dev_major = MAJOR(stat->dev);
 	tmp.stx_dev_minor = MINOR(stat->dev);
 	tmp.stx_mnt_id = stat->mnt_id;
+	tmp.stx_mem_align_dio = stat->mem_align_dio;
+	tmp.stx_offset_align_dio = stat->offset_align_dio;
+	tmp.stx_offset_align_optimal = stat->offset_align_optimal;
 
 	return copy_to_user(buffer, &tmp, sizeof(tmp)) ? -EFAULT : 0;
 }
diff --git a/include/linux/stat.h b/include/linux/stat.h
index 7df06931f25d8..48b8b1ad1567c 100644
--- a/include/linux/stat.h
+++ b/include/linux/stat.h
@@ -50,6 +50,9 @@ struct kstat {
 	struct timespec64 btime;			/* File creation time */
 	u64		blocks;
 	u64		mnt_id;
+	u32		mem_align_dio;
+	u32		offset_align_dio;
+	u32		offset_align_optimal;
 };
 
 #endif
diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index 1500a0f58041a..f822b23e81091 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -124,9 +124,13 @@ struct statx {
 	__u32	stx_dev_minor;
 	/* 0x90 */
 	__u64	stx_mnt_id;
-	__u64	__spare2;
+	__u32	stx_mem_align_dio;	/* Memory buffer alignment for direct I/O */
+	__u32	stx_offset_align_dio;	/* File offset alignment for direct I/O */
 	/* 0xa0 */
-	__u64	__spare3[12];	/* Spare space for future expansion */
+	__u32	stx_offset_align_optimal; /* Optimal file offset alignment for I/O */
+	__u32	__spare2;
+	/* 0xa8 */
+	__u64	__spare3[11];	/* Spare space for future expansion */
 	/* 0x100 */
 };
 
@@ -152,6 +156,7 @@ struct statx {
 #define STATX_BASIC_STATS	0x000007ffU	/* The stuff in the normal stat struct */
 #define STATX_BTIME		0x00000800U	/* Want/got stx_btime */
 #define STATX_MNT_ID		0x00001000U	/* Got stx_mnt_id */
+#define STATX_IOALIGN		0x00002000U	/* Want/got IO alignment info */
 
 #define STATX__RESERVED		0x80000000U	/* Reserved for future struct statx expansion */
 
-- 
2.36.1
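
The offset comments in the include/uapi/linux/stat.h hunk above imply
that the new fields fit into the existing spare space without changing
the size of struct statx.  The following sketch checks that arithmetic
in userspace.  struct statx_tail_model is a hypothetical mirror of the
tail of the structure only: everything before stx_mnt_id is collapsed
into an opaque 0x90-byte array, and the spare fields are renamed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the end of struct statx after this patch. */
struct statx_tail_model {
	uint8_t  before_mnt_id[0x90];		/* fields this patch leaves alone */
	uint64_t stx_mnt_id;			/* 0x90 */
	uint32_t stx_mem_align_dio;		/* 0x98 */
	uint32_t stx_offset_align_dio;		/* 0x9c */
	uint32_t stx_offset_align_optimal;	/* 0xa0 */
	uint32_t spare2;			/* 0xa4 */
	uint64_t spare3[11];			/* 0xa8 */
};

/* The two DIO fields occupy the 8 bytes of the old __u64 __spare2. */
static_assert(offsetof(struct statx_tail_model, stx_mem_align_dio) == 0x98,
	      "stx_mem_align_dio offset");
static_assert(offsetof(struct statx_tail_model, stx_offset_align_dio) == 0x9c,
	      "stx_offset_align_dio offset");
/* The third field and new spare word take one slot of the old __spare3[12]. */
static_assert(offsetof(struct statx_tail_model, stx_offset_align_optimal) == 0xa0,
	      "stx_offset_align_optimal offset");
static_assert(offsetof(struct statx_tail_model, spare3) == 0xa8,
	      "spare3 offset");
/* Total size is unchanged at 0x100 bytes. */
static_assert(sizeof(struct statx_tail_model) == 0x100,
	      "struct size");
```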


* [RFC PATCH v2 2/7] fscrypt: change fscrypt_dio_supported() to prepare for STATX_IOALIGN
From: Eric Biggers @ 2022-05-18 23:50 UTC
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, linux-api,
	linux-fscrypt, linux-block, linux-kernel, Keith Busch

From: Eric Biggers <ebiggers@google.com>

To prepare for STATX_IOALIGN support, make two changes to
fscrypt_dio_supported().

First, remove the filesystem-block-alignment check and make the
filesystems handle it instead.  It previously made sense to have it in
fs/crypto/; however, to support STATX_IOALIGN the alignment requirement
would have to be returned to filesystems.  It ends up being simpler if
filesystems handle this part themselves, especially for f2fs which only
allows fs-block-aligned DIO in the first place.

Second, make fscrypt_dio_supported() work on inodes whose encryption key
hasn't been set up yet, by making it set up the key if needed.  This is
required for statx(), since statx() doesn't require a file descriptor.
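
For readers unfamiliar with the alignment check that moves into the
filesystems, here is a small userspace sketch of the idiom
IS_ALIGNED(pos | alignment_bits, blocksize).  It is an illustration
only, not kernel code.  MODEL_IS_ALIGNED, model_iov_alignment(), and
model_fs_block_aligned() are hypothetical stand-ins, and the sketch
assumes that iov_iter_alignment() ORs together the address and length
of every segment.  Because the block size is a power of two, OR-ing all
the values first lets a single mask test cover the file position, every
buffer address, and every segment length at once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Userspace stand-in for IS_ALIGNED(); 'a' must be a power of two. */
#define MODEL_IS_ALIGNED(x, a) ((((uint64_t)(x)) & ((uint64_t)(a) - 1)) == 0)

/*
 * Hypothetical stand-in for iov_iter_alignment(): OR together every
 * segment's address and length, so a low bit set anywhere survives.
 */
static uint64_t model_iov_alignment(const struct iovec *iov, size_t nr)
{
	uint64_t bits = 0;

	for (size_t i = 0; i < nr; i++)
		bits |= (uint64_t)(uintptr_t)iov[i].iov_base | iov[i].iov_len;
	return bits;
}

/* True if the position and all segments are filesystem-block-aligned. */
static bool model_fs_block_aligned(uint64_t pos, const struct iovec *iov,
				   size_t nr, uint32_t blocksize)
{
	/* The mask trick is only valid for power-of-two block sizes. */
	assert(blocksize != 0 && (blocksize & (blocksize - 1)) == 0);
	return MODEL_IS_ALIGNED(pos | model_iov_alignment(iov, nr), blocksize);
}
```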

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/crypto/inline_crypt.c | 48 +++++++++++++++++++++-------------------
 fs/ext4/file.c           |  9 ++++++--
 fs/f2fs/f2fs.h           |  2 +-
 include/linux/fscrypt.h  |  7 ++----
 4 files changed, 35 insertions(+), 31 deletions(-)

diff --git a/fs/crypto/inline_crypt.c b/fs/crypto/inline_crypt.c
index 93c2ca8580923..82df4c0b9903c 100644
--- a/fs/crypto/inline_crypt.c
+++ b/fs/crypto/inline_crypt.c
@@ -370,43 +370,45 @@ bool fscrypt_mergeable_bio_bh(struct bio *bio,
 EXPORT_SYMBOL_GPL(fscrypt_mergeable_bio_bh);
 
 /**
- * fscrypt_dio_supported() - check whether a DIO (direct I/O) request is
- *			     supported as far as encryption is concerned
- * @iocb: the file and position the I/O is targeting
- * @iter: the I/O data segment(s)
+ * fscrypt_dio_supported() - check whether DIO (direct I/O) is supported on an
+ *			     inode, as far as encryption is concerned
+ * @inode: the inode in question
  *
  * Return: %true if there are no encryption constraints that prevent DIO from
  *	   being supported; %false if DIO is unsupported.  (Note that in the
  *	   %true case, the filesystem might have other, non-encryption-related
- *	   constraints that prevent DIO from actually being supported.)
+ *	   constraints that prevent DIO from actually being supported.  Also, on
+ *	   encrypted files the filesystem is still responsible for only allowing
+ *	   DIO when requests are filesystem-block-aligned.)
  */
-bool fscrypt_dio_supported(struct kiocb *iocb, struct iov_iter *iter)
+bool fscrypt_dio_supported(struct inode *inode)
 {
-	const struct inode *inode = file_inode(iocb->ki_filp);
-	const unsigned int blocksize = i_blocksize(inode);
+	int err;
 
 	/* If the file is unencrypted, no veto from us. */
 	if (!fscrypt_needs_contents_encryption(inode))
 		return true;
 
-	/* We only support DIO with inline crypto, not fs-layer crypto. */
-	if (!fscrypt_inode_uses_inline_crypto(inode))
-		return false;
-
 	/*
-	 * Since the granularity of encryption is filesystem blocks, the file
-	 * position and total I/O length must be aligned to the filesystem block
-	 * size -- not just to the block device's logical block size as is
-	 * traditionally the case for DIO on many filesystems.
+	 * We only support DIO with inline crypto, not fs-layer crypto.
 	 *
-	 * We require that the user-provided memory buffers be filesystem block
-	 * aligned too.  It is simpler to have a single alignment value required
-	 * for all properties of the I/O, as is normally the case for DIO.
-	 * Also, allowing less aligned buffers would imply that data units could
-	 * cross bvecs, which would greatly complicate the I/O stack, which
-	 * assumes that bios can be split at any bvec boundary.
+	 * To determine whether the inode is using inline crypto, we have to set
+	 * up the key if it wasn't already done.  This is because in the current
+	 * design of fscrypt, the decision of whether to use inline crypto or
+	 * not isn't made until the inode's encryption key is being set up.  In
+	 * the DIO read/write case, the key will always be set up already, since
+	 * the file will be open.  But in the case of statx(), the key might not
+	 * be set up yet, as the file might not have been opened yet.
 	 */
-	if (!IS_ALIGNED(iocb->ki_pos | iov_iter_alignment(iter), blocksize))
+	err = fscrypt_require_key(inode);
+	if (err) {
+		/*
+		 * Key unavailable or couldn't be set up.  This edge case isn't
+		 * worth worrying about; just report that DIO is unsupported.
+		 */
+		return false;
+	}
+	if (!fscrypt_inode_uses_inline_crypto(inode))
 		return false;
 
 	return true;
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 6feb07e3e1eb5..de153b508b20a 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -40,8 +40,13 @@ static bool ext4_dio_supported(struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct inode *inode = file_inode(iocb->ki_filp);
 
-	if (!fscrypt_dio_supported(iocb, iter))
-		return false;
+	if (IS_ENCRYPTED(inode)) {
+		if (!fscrypt_dio_supported(inode))
+			return false;
+		if (!IS_ALIGNED(iocb->ki_pos | iov_iter_alignment(iter),
+				i_blocksize(inode)))
+			return false;
+	}
 	if (fsverity_active(inode))
 		return false;
 	if (ext4_should_journal_data(inode))
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 8c570de21ed5a..271509b1c7928 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -4469,7 +4469,7 @@ static inline bool f2fs_force_buffered_io(struct inode *inode,
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 	int rw = iov_iter_rw(iter);
 
-	if (!fscrypt_dio_supported(iocb, iter))
+	if (!fscrypt_dio_supported(inode))
 		return true;
 	if (fsverity_active(inode))
 		return true;
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index 50d92d805bd8c..6ca89461d48dc 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -714,7 +714,7 @@ bool fscrypt_mergeable_bio(struct bio *bio, const struct inode *inode,
 bool fscrypt_mergeable_bio_bh(struct bio *bio,
 			      const struct buffer_head *next_bh);
 
-bool fscrypt_dio_supported(struct kiocb *iocb, struct iov_iter *iter);
+bool fscrypt_dio_supported(struct inode *inode);
 
 u64 fscrypt_limit_io_blocks(const struct inode *inode, u64 lblk, u64 nr_blocks);
 
@@ -747,11 +747,8 @@ static inline bool fscrypt_mergeable_bio_bh(struct bio *bio,
 	return true;
 }
 
-static inline bool fscrypt_dio_supported(struct kiocb *iocb,
-					 struct iov_iter *iter)
+static inline bool fscrypt_dio_supported(struct inode *inode)
 {
-	const struct inode *inode = file_inode(iocb->ki_filp);
-
 	return !fscrypt_needs_contents_encryption(inode);
 }
 
-- 
2.36.1


* [RFC PATCH v2 3/7] ext4: support STATX_IOALIGN
From: Eric Biggers @ 2022-05-18 23:50 UTC
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, linux-api,
	linux-fscrypt, linux-block, linux-kernel, Keith Busch

From: Eric Biggers <ebiggers@google.com>

Add support for STATX_IOALIGN to ext4, so that I/O alignment information
is exposed to userspace in a consistent and easy-to-use way.
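
To make the resulting request check in ext4_dio_supported() easier to
follow, here is a userspace model of its logic as it appears in the
fs/ext4/file.c hunk of this patch.  It is a sketch, not kernel code.
model_dio_supported() and its parameters are hypothetical: dio_align
stands in for the return value of ext4_dio_alignment(), lbs for
bdev_logical_block_size(), and pos_and_iter_bits for
'iocb->ki_pos | iov_iter_alignment(iter)'.  The explicit alignment test
only runs when the file's requirement is stricter than the logical
block size; the assumption here is that the weaker case is left to the
lower layers of the DIO path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of the per-request DIO check. */
static bool model_dio_supported(uint32_t dio_align, uint32_t lbs,
				uint64_t pos_and_iter_bits)
{
	/* An alignment of 0 means DIO is not supported on the file at all. */
	if (dio_align == 0)
		return false;
	/* The mask test below assumes a power-of-two alignment. */
	assert((dio_align & (dio_align - 1)) == 0);
	/* Only enforce alignment here when it exceeds the logical block size. */
	if (dio_align > lbs &&
	    (pos_and_iter_bits & ((uint64_t)dio_align - 1)) != 0)
		return false;
	return true;
}
```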

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/ext4/ext4.h  |  1 +
 fs/ext4/file.c  | 15 ++++-----------
 fs/ext4/inode.c | 31 +++++++++++++++++++++++++++++++
 3 files changed, 36 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index a743b1e3b89ec..7c43428901632 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3020,6 +3020,7 @@ extern struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
 extern int  ext4_write_inode(struct inode *, struct writeback_control *);
 extern int  ext4_setattr(struct user_namespace *, struct dentry *,
 			 struct iattr *);
+extern u32  ext4_dio_alignment(struct inode *inode);
 extern int  ext4_getattr(struct user_namespace *, const struct path *,
 			 struct kstat *, u32, unsigned int);
 extern void ext4_evict_inode(struct inode *);
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index de153b508b20a..ba2271e5287b2 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -39,19 +39,12 @@
 static bool ext4_dio_supported(struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct inode *inode = file_inode(iocb->ki_filp);
+	u32 dio_align = ext4_dio_alignment(inode);
 
-	if (IS_ENCRYPTED(inode)) {
-		if (!fscrypt_dio_supported(inode))
-			return false;
-		if (!IS_ALIGNED(iocb->ki_pos | iov_iter_alignment(iter),
-				i_blocksize(inode)))
-			return false;
-	}
-	if (fsverity_active(inode))
-		return false;
-	if (ext4_should_journal_data(inode))
+	if (!dio_align)
 		return false;
-	if (ext4_has_inline_data(inode))
+	if (dio_align > bdev_logical_block_size(inode->i_sb->s_bdev) &&
+	    !IS_ALIGNED(iocb->ki_pos | iov_iter_alignment(iter), dio_align))
 		return false;
 	return true;
 }
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 646ece9b3455f..5af2598aa170d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5533,6 +5533,22 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
 	return error;
 }
 
+u32 ext4_dio_alignment(struct inode *inode)
+{
+	if (fsverity_active(inode))
+		return 0;
+	if (ext4_should_journal_data(inode))
+		return 0;
+	if (ext4_has_inline_data(inode))
+		return 0;
+	if (IS_ENCRYPTED(inode)) {
+		if (!fscrypt_dio_supported(inode))
+			return 0;
+		return i_blocksize(inode);
+	}
+	return bdev_logical_block_size(inode->i_sb->s_bdev);
+}
+
 int ext4_getattr(struct user_namespace *mnt_userns, const struct path *path,
 		 struct kstat *stat, u32 request_mask, unsigned int query_flags)
 {
@@ -5548,6 +5564,21 @@ int ext4_getattr(struct user_namespace *mnt_userns, const struct path *path,
 		stat->btime.tv_nsec = ei->i_crtime.tv_nsec;
 	}
 
+	/*
+	 * Return the I/O alignment information if requested.  We only return
+	 * this information when requested, since on encrypted files it might
+	 * take a fair bit of work to get if the file wasn't opened recently.
+	 */
+	if ((request_mask & STATX_IOALIGN) && S_ISREG(inode->i_mode)) {
+		u32 dio_align = ext4_dio_alignment(inode);
+		unsigned int io_opt = bdev_io_opt(inode->i_sb->s_bdev);
+
+		stat->result_mask |= STATX_IOALIGN;
+		stat->mem_align_dio = dio_align;
+		stat->offset_align_dio = dio_align;
+		stat->offset_align_optimal = max(io_opt, i_blocksize(inode));
+	}
+
 	flags = ei->i_flags & EXT4_FL_USER_VISIBLE;
 	if (flags & EXT4_APPEND_FL)
 		stat->attributes |= STATX_ATTR_APPEND;
-- 
2.36.1
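
The decision made by ext4_dio_alignment() and the reworked
ext4_dio_supported() in the patch above can be modeled in plain C outside
the kernel tree.  This is only a sketch under stated assumptions: struct
file_model and the model_*() names are hypothetical, each field stands in
for the kernel helper named in its comment, block sizes are passed in
directly instead of being read from a block device, and all sizes are
assumed to be powers of two.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the inode state the kernel helpers inspect. */
struct file_model {
	bool verity;                 /* models fsverity_active() */
	bool journal_data;           /* models ext4_should_journal_data() */
	bool inline_data;            /* models ext4_has_inline_data() */
	bool encrypted;              /* models IS_ENCRYPTED() */
	bool fscrypt_dio_ok;         /* models fscrypt_dio_supported() */
	uint32_t fs_block_size;      /* models i_blocksize() */
	uint32_t logical_block_size; /* models bdev_logical_block_size() */
};

/* Mirrors ext4_dio_alignment(): 0 means DIO is not supported. */
static uint32_t model_dio_alignment(const struct file_model *f)
{
	if (f->verity || f->journal_data || f->inline_data)
		return 0;
	if (f->encrypted)
		return f->fscrypt_dio_ok ? f->fs_block_size : 0;
	return f->logical_block_size;
}

/*
 * Mirrors the reworked ext4_dio_supported().  align_bits is the OR of the
 * file position, buffer addresses and segment lengths, like
 * iocb->ki_pos | iov_iter_alignment(iter) in the patch.
 */
static bool model_dio_supported(const struct file_model *f, uint64_t align_bits)
{
	uint32_t dio_align = model_dio_alignment(f);

	if (!dio_align)
		return false;
	/* Only an alignment stricter than the device's is checked here. */
	if (dio_align > f->logical_block_size &&
	    (align_bits & (dio_align - 1)) != 0)
		return false;
	return true;
}
```

With a 4096-byte filesystem block and a 512-byte logical block, an
unencrypted file reports 512 and an encrypted one reports 4096.  Note that
when the alignment equals the logical block size the model, like the patch,
performs no check of its own; presumably that case is left to the generic
DIO code.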


* [RFC PATCH v2 4/7] f2fs: move f2fs_force_buffered_io() into file.c
  2022-05-18 23:50 ` [f2fs-dev] " Eric Biggers
@ 2022-05-18 23:50   ` Eric Biggers
  -1 siblings, 0 replies; 42+ messages in thread
From: Eric Biggers @ 2022-05-18 23:50 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, linux-api,
	linux-fscrypt, linux-block, linux-kernel, Keith Busch

From: Eric Biggers <ebiggers@google.com>

f2fs_force_buffered_io() and its helper block_unaligned_IO() are only
used in file.c, so move them there.  No behavior change; this makes the
later patches easier to review.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/f2fs/f2fs.h | 45 ---------------------------------------------
 fs/f2fs/file.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 271509b1c7928..2d6492c016ad6 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -4442,17 +4442,6 @@ static inline void f2fs_i_compr_blocks_update(struct inode *inode,
 	f2fs_mark_inode_dirty_sync(inode, true);
 }
 
-static inline int block_unaligned_IO(struct inode *inode,
-				struct kiocb *iocb, struct iov_iter *iter)
-{
-	unsigned int i_blkbits = READ_ONCE(inode->i_blkbits);
-	unsigned int blocksize_mask = (1 << i_blkbits) - 1;
-	loff_t offset = iocb->ki_pos;
-	unsigned long align = offset | iov_iter_alignment(iter);
-
-	return align & blocksize_mask;
-}
-
 static inline bool f2fs_allow_multi_device_dio(struct f2fs_sb_info *sbi,
 								int flag)
 {
@@ -4463,40 +4452,6 @@ static inline bool f2fs_allow_multi_device_dio(struct f2fs_sb_info *sbi,
 	return sbi->aligned_blksize;
 }
 
-static inline bool f2fs_force_buffered_io(struct inode *inode,
-				struct kiocb *iocb, struct iov_iter *iter)
-{
-	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
-	int rw = iov_iter_rw(iter);
-
-	if (!fscrypt_dio_supported(inode))
-		return true;
-	if (fsverity_active(inode))
-		return true;
-	if (f2fs_compressed_file(inode))
-		return true;
-
-	/* disallow direct IO if any of devices has unaligned blksize */
-	if (f2fs_is_multi_device(sbi) && !sbi->aligned_blksize)
-		return true;
-	/*
-	 * for blkzoned device, fallback direct IO to buffered IO, so
-	 * all IOs can be serialized by log-structured write.
-	 */
-	if (f2fs_sb_has_blkzoned(sbi))
-		return true;
-	if (f2fs_lfs_mode(sbi) && (rw == WRITE)) {
-		if (block_unaligned_IO(inode, iocb, iter))
-			return true;
-		if (F2FS_IO_ALIGNED(sbi))
-			return true;
-	}
-	if (is_sbi_flag_set(F2FS_I_SB(inode), SBI_CP_DISABLED))
-		return true;
-
-	return false;
-}
-
 static inline bool f2fs_need_verity(const struct inode *inode, pgoff_t idx)
 {
 	return fsverity_active(inode) &&
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 5b89af0f27f05..67f2e21ffbd67 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -807,6 +807,51 @@ int f2fs_truncate(struct inode *inode)
 	return 0;
 }
 
+static int block_unaligned_IO(struct inode *inode, struct kiocb *iocb,
+			      struct iov_iter *iter)
+{
+	unsigned int i_blkbits = READ_ONCE(inode->i_blkbits);
+	unsigned int blocksize_mask = (1 << i_blkbits) - 1;
+	loff_t offset = iocb->ki_pos;
+	unsigned long align = offset | iov_iter_alignment(iter);
+
+	return align & blocksize_mask;
+}
+
+static inline bool f2fs_force_buffered_io(struct inode *inode,
+				struct kiocb *iocb, struct iov_iter *iter)
+{
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	int rw = iov_iter_rw(iter);
+
+	if (!fscrypt_dio_supported(inode))
+		return true;
+	if (fsverity_active(inode))
+		return true;
+	if (f2fs_compressed_file(inode))
+		return true;
+
+	/* disallow direct IO if any of devices has unaligned blksize */
+	if (f2fs_is_multi_device(sbi) && !sbi->aligned_blksize)
+		return true;
+	/*
+	 * for blkzoned device, fallback direct IO to buffered IO, so
+	 * all IOs can be serialized by log-structured write.
+	 */
+	if (f2fs_sb_has_blkzoned(sbi))
+		return true;
+	if (f2fs_lfs_mode(sbi) && (rw == WRITE)) {
+		if (block_unaligned_IO(inode, iocb, iter))
+			return true;
+		if (F2FS_IO_ALIGNED(sbi))
+			return true;
+	}
+	if (is_sbi_flag_set(F2FS_I_SB(inode), SBI_CP_DISABLED))
+		return true;
+
+	return false;
+}
+
 int f2fs_getattr(struct user_namespace *mnt_userns, const struct path *path,
 		 struct kstat *stat, u32 request_mask, unsigned int query_flags)
 {
-- 
2.36.1


* [RFC PATCH v2 5/7] f2fs: don't allow DIO reads but not DIO writes
  2022-05-18 23:50 ` [f2fs-dev] " Eric Biggers
@ 2022-05-18 23:50   ` Eric Biggers
  -1 siblings, 0 replies; 42+ messages in thread
From: Eric Biggers @ 2022-05-18 23:50 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, linux-api,
	linux-fscrypt, linux-block, linux-kernel, Keith Busch

From: Eric Biggers <ebiggers@google.com>

Currently, if an f2fs filesystem is mounted with the mode=lfs and
io_bits mount options, DIO reads are allowed but DIO writes are not.
This asymmetry is unusual and is likely to surprise any application
that both reads from and writes to a file using O_DIRECT.  Given this,
let's drop the support for DIO reads in this configuration.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/f2fs/file.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 67f2e21ffbd67..68947fe16ea35 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -822,7 +822,6 @@ static inline bool f2fs_force_buffered_io(struct inode *inode,
 				struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
-	int rw = iov_iter_rw(iter);
 
 	if (!fscrypt_dio_supported(inode))
 		return true;
@@ -840,7 +839,7 @@ static inline bool f2fs_force_buffered_io(struct inode *inode,
 	 */
 	if (f2fs_sb_has_blkzoned(sbi))
 		return true;
-	if (f2fs_lfs_mode(sbi) && (rw == WRITE)) {
+	if (f2fs_lfs_mode(sbi)) {
 		if (block_unaligned_IO(inode, iocb, iter))
 			return true;
 		if (F2FS_IO_ALIGNED(sbi))
-- 
2.36.1
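
The effect of the one-line change above can be illustrated with a small
userspace model of just the lfs branch of f2fs_force_buffered_io().  This
is a sketch, not the kernel code: struct lfs_case and both function names
are hypothetical, and each field simply stands in for the kernel predicate
named in its comment.

```c
#include <stdbool.h>

/* Hypothetical inputs; each field models one kernel predicate. */
struct lfs_case {
	bool lfs_mode;   /* models f2fs_lfs_mode(sbi) */
	bool io_aligned; /* models F2FS_IO_ALIGNED(sbi) */
	bool unaligned;  /* models block_unaligned_IO() returning nonzero */
};

/* Before the patch: the lfs branch applied to writes only. */
static bool force_buffered_before(const struct lfs_case *c, bool is_write)
{
	if (c->lfs_mode && is_write)
		return c->unaligned || c->io_aligned;
	return false;
}

/* After the patch: reads and writes take the same branch. */
static bool force_buffered_after(const struct lfs_case *c, bool is_write)
{
	(void)is_write; /* the I/O direction no longer matters */
	if (c->lfs_mode)
		return c->unaligned || c->io_aligned;
	return false;
}
```

For a block-aligned request with lfs_mode and io_aligned both set, the
"before" model forces buffered I/O for a write but not for a read, while
the "after" model forces it for both, which is the symmetry the commit
message asks for.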


* [RFC PATCH v2 6/7] f2fs: simplify f2fs_force_buffered_io()
  2022-05-18 23:50 ` [f2fs-dev] " Eric Biggers
@ 2022-05-18 23:50   ` Eric Biggers
  -1 siblings, 0 replies; 42+ messages in thread
From: Eric Biggers @ 2022-05-18 23:50 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, linux-api,
	linux-fscrypt, linux-block, linux-kernel, Keith Busch

From: Eric Biggers <ebiggers@google.com>

f2fs only allows direct I/O that is aligned to the filesystem block
size.  Given that fact, simplify f2fs_force_buffered_io() by removing
the redundant call to block_unaligned_IO().

This makes it easier to reuse this code for STATX_IOALIGN.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/f2fs/file.c | 24 ++++--------------------
 1 file changed, 4 insertions(+), 20 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 68947fe16ea35..c32f7722ba6b0 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -807,19 +807,7 @@ int f2fs_truncate(struct inode *inode)
 	return 0;
 }
 
-static int block_unaligned_IO(struct inode *inode, struct kiocb *iocb,
-			      struct iov_iter *iter)
-{
-	unsigned int i_blkbits = READ_ONCE(inode->i_blkbits);
-	unsigned int blocksize_mask = (1 << i_blkbits) - 1;
-	loff_t offset = iocb->ki_pos;
-	unsigned long align = offset | iov_iter_alignment(iter);
-
-	return align & blocksize_mask;
-}
-
-static inline bool f2fs_force_buffered_io(struct inode *inode,
-				struct kiocb *iocb, struct iov_iter *iter)
+static bool f2fs_force_buffered_io(struct inode *inode)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 
@@ -839,12 +827,8 @@ static inline bool f2fs_force_buffered_io(struct inode *inode,
 	 */
 	if (f2fs_sb_has_blkzoned(sbi))
 		return true;
-	if (f2fs_lfs_mode(sbi)) {
-		if (block_unaligned_IO(inode, iocb, iter))
-			return true;
-		if (F2FS_IO_ALIGNED(sbi))
-			return true;
-	}
+	if (f2fs_lfs_mode(sbi) && F2FS_IO_ALIGNED(sbi))
+		return true;
 	if (is_sbi_flag_set(F2FS_I_SB(inode), SBI_CP_DISABLED))
 		return true;
 
@@ -4283,7 +4267,7 @@ static bool f2fs_should_use_dio(struct inode *inode, struct kiocb *iocb,
 	if (!(iocb->ki_flags & IOCB_DIRECT))
 		return false;
 
-	if (f2fs_force_buffered_io(inode, iocb, iter))
+	if (f2fs_force_buffered_io(inode))
 		return false;
 
 	/*
-- 
2.36.1
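
For reference, the mask arithmetic performed by the removed
block_unaligned_IO() helper can be reproduced in userspace as follows.
This is a minimal sketch: model_block_unaligned() is a hypothetical name,
blkbits models inode->i_blkbits, and iov_alignment models the result of
iov_iter_alignment(iter), i.e. the OR of every buffer address and segment
length in the request.

```c
#include <stdint.h>

/*
 * Returns nonzero if the file offset, any buffer address or any segment
 * length is not a multiple of the block size (1 << blkbits).
 */
static uint64_t model_block_unaligned(unsigned int blkbits, uint64_t offset,
				      uint64_t iov_alignment)
{
	uint64_t blocksize_mask = (UINT64_C(1) << blkbits) - 1;

	/* ORing the values lets one mask test cover all of them at once. */
	return (offset | iov_alignment) & blocksize_mask;
}
```

With 4096-byte blocks (blkbits = 12), offset 8192 with 4096-aligned buffers
yields 0, whereas a 512-aligned buffer or an offset of 4608 leaves bit 9
set and the request counts as unaligned.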


* [RFC PATCH v2 7/7] f2fs: support STATX_IOALIGN
  2022-05-18 23:50 ` [f2fs-dev] " Eric Biggers
@ 2022-05-18 23:50   ` Eric Biggers
  -1 siblings, 0 replies; 42+ messages in thread
From: Eric Biggers @ 2022-05-18 23:50 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: linux-ext4, linux-f2fs-devel, linux-xfs, linux-api,
	linux-fscrypt, linux-block, linux-kernel, Keith Busch

From: Eric Biggers <ebiggers@google.com>

Add support for STATX_IOALIGN to f2fs, so that I/O alignment information
is exposed to userspace in a consistent and easy-to-use way.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/f2fs/file.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index c32f7722ba6b0..f89a190949c59 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -835,6 +835,21 @@ static bool f2fs_force_buffered_io(struct inode *inode)
 	return false;
 }
 
+/* Return the maximum value of io_opt across all the filesystem's devices. */
+static unsigned int f2fs_max_io_opt(struct inode *inode)
+{
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	int io_opt = 0;
+	int i;
+
+	if (!f2fs_is_multi_device(sbi))
+		return bdev_io_opt(sbi->sb->s_bdev);
+
+	for (i = 0; i < sbi->s_ndevs; i++)
+		io_opt = max(io_opt, bdev_io_opt(FDEV(i).bdev));
+	return io_opt;
+}
+
 int f2fs_getattr(struct user_namespace *mnt_userns, const struct path *path,
 		 struct kstat *stat, u32 request_mask, unsigned int query_flags)
 {
@@ -851,6 +866,22 @@ int f2fs_getattr(struct user_namespace *mnt_userns, const struct path *path,
 		stat->btime.tv_nsec = fi->i_crtime.tv_nsec;
 	}
 
+	/*
+	 * Return the I/O alignment information if requested.  We only return
+	 * this information when requested, since on encrypted files it might
+	 * take a fair bit of work to get if the file wasn't opened recently.
+	 */
+	if ((request_mask & STATX_IOALIGN) && S_ISREG(inode->i_mode)) {
+		unsigned int bsize = i_blocksize(inode);
+
+		stat->result_mask |= STATX_IOALIGN;
+		if (!f2fs_force_buffered_io(inode)) {
+			stat->mem_align_dio = bsize;
+			stat->offset_align_dio = bsize;
+		}
+		stat->offset_align_optimal = max(f2fs_max_io_opt(inode), bsize);
+	}
+
 	flags = fi->i_flags;
 	if (flags & F2FS_COMPR_FL)
 		stat->attributes |= STATX_ATTR_COMPRESSED;
-- 
2.36.1
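
The value that the patch above stores in stat->offset_align_optimal can be
worked through with a small userspace model.  This is a sketch under
stated assumptions: both function names are hypothetical, the per-device
io_opt values are passed in as an array rather than read with
bdev_io_opt(), and a device that reports no optimal I/O size is assumed to
contribute 0.

```c
#include <stddef.h>

/* Models f2fs_max_io_opt(): the largest io_opt among the devices. */
static unsigned int model_max_io_opt(const unsigned int *io_opt, size_t ndevs)
{
	unsigned int best = 0;

	for (size_t i = 0; i < ndevs; i++)
		if (io_opt[i] > best)
			best = io_opt[i];
	return best;
}

/* Models max(f2fs_max_io_opt(inode), bsize) from f2fs_getattr(). */
static unsigned int model_offset_align_optimal(const unsigned int *io_opt,
					       size_t ndevs, unsigned int bsize)
{
	unsigned int opt = model_max_io_opt(io_opt, ndevs);

	/* Never report less than the filesystem block size. */
	return opt > bsize ? opt : bsize;
}
```

For three devices with io_opt values of 0, 524288 and 65536 and a
4096-byte block size, the model reports 524288; if every device reports 0,
it falls back to the block size, 4096.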


* Re: [RFC PATCH v2 1/7] statx: add I/O alignment information
  2022-05-18 23:50   ` [f2fs-dev] " Eric Biggers
@ 2022-05-19  7:05     ` Christoph Hellwig
  -1 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2022-05-19  7:05 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-xfs,
	linux-api, linux-fscrypt, linux-block, linux-kernel, Keith Busch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [RFC PATCH v2 1/7] statx: add I/O alignment information
  2022-05-18 23:50   ` [f2fs-dev] " Eric Biggers
@ 2022-05-19 23:06     ` Darrick J. Wong
  -1 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2022-05-19 23:06 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-xfs,
	linux-api, linux-fscrypt, linux-block, linux-kernel, Keith Busch

On Wed, May 18, 2022 at 04:50:05PM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Traditionally, the conditions for when DIO (direct I/O) is supported
> were fairly simple: filesystems either supported DIO aligned to the
> block device's logical block size, or didn't support DIO at all.
> 
> However, due to filesystem features that have been added over time (e.g.,
> data journalling, inline data, encryption, verity, compression,
> checkpoint disabling, log-structured mode), the conditions for when DIO
> is allowed on a file have gotten increasingly complex.  Whether a
> particular file supports DIO, and with what alignment, can depend on
> various file attributes and filesystem mount options, as well as which
> block device(s) the file's data is located on.
> 
> XFS has an ioctl XFS_IOC_DIOINFO which exposes this information to
> applications.  However, as discussed
> (https://lore.kernel.org/linux-fsdevel/20220120071215.123274-1-ebiggers@kernel.org/T/#u),
> this ioctl is rarely used and not known to be used outside of
> XFS-specific code.  It also was never intended to indicate when a file
> doesn't support DIO at all, and it only exposes the minimum I/O
> alignment, not the optimal I/O alignment which has been requested too.
> 
> Therefore, let's expose this information via statx().  Add the
> STATX_IOALIGN flag and three fields associated with it:
> 
> * stx_mem_align_dio: the alignment (in bytes) required for user memory
>   buffers for DIO, or 0 if DIO is not supported on the file.
> 
> * stx_offset_align_dio: the alignment (in bytes) required for file
>   offsets and I/O segment lengths for DIO, or 0 if DIO is not supported
>   on the file.  This will only be nonzero if stx_mem_align_dio is
>   nonzero, and vice versa.
> 
> * stx_offset_align_optimal: the alignment (in bytes) suggested for file
>   offsets and I/O segment lengths to get optimal performance.  This
>   applies to both DIO and buffered I/O.  It differs from stx_blocksize
>   in that stx_offset_align_optimal will contain the real optimum I/O
>   size, which may be a large value.  In contrast, for compatibility
>   reasons stx_blocksize is the minimum size needed to avoid page cache
>   read/write/modify cycles, which may be much smaller than the optimum
>   I/O size.  For more details about the motivation for this field, see
>   https://lore.kernel.org/r/20220210040304.GM59729@dread.disaster.area

Hmm.  So I guess this is supposed to be the filesystem's best guess at
the IO size that will minimize RMW cycles in the entire stack?  i.e. if
the user does not want RMW of pagecache pages, of file allocation units
(if COW is enabled), of RAID stripes, or in the storage itself, then it
should ensure that all IOs are aligned to this value?

I guess that means for XFS it's effectively max(pagesize, i_blocksize,
bdev io_opt, sb_width, and (pretend XFS can reflink the realtime volume)
the rt extent size)?  I didn't see a manpage update for statx(2) but
that's mostly what I'm interested in. :)

Looking ahead, it looks like the ext4/f2fs implementations only seem to
be returning max(i_blocksize, bdev io_opt)?  But not the pagesize?  Did
I misunderstand this, then?

(The plumbing changes in this patch look ok.)

--D
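
As a rough illustration of the question above (keeping every I/O aligned
to the reported value), here is a minimal sketch in C.  The helper names
are hypothetical and not part of the patchset; the alignment is simply
passed in, e.g. a value read from stx_offset_align_optimal.  Modulo
arithmetic is used rather than bit masks because a value such as a RAID
stripe width need not be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Round x down to a multiple of align. */
uint64_t align_down(uint64_t x, uint64_t align)
{
	assert(align != 0);
	return x - (x % align);
}

/* Round x up to a multiple of align (overflow is not handled here). */
uint64_t align_up(uint64_t x, uint64_t align)
{
	uint64_t rem;

	assert(align != 0);
	rem = x % align;
	return rem ? x + (align - rem) : x;
}

/*
 * Hypothetical helper: widen the byte range [*off, *off + *len) so that
 * both ends fall on multiples of align.
 */
void widen_to_alignment(uint64_t *off, uint64_t *len, uint64_t align)
{
	uint64_t start = align_down(*off, align);
	uint64_t end = align_up(*off + *len, align);

	*off = start;
	*len = end - start;
}
```

For example, a request at offset 5000 with length 3000 and an alignment
of 4096 would be widened to offset 4096, length 4096.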

> Note that as with other statx() extensions, if STATX_IOALIGN isn't set
> in the returned statx struct, then these new fields won't be filled in.
> This will happen if the filesystem doesn't support STATX_IOALIGN, or if
> the file isn't a regular file.  (It might be supported on block device
> files in the future.)  It might also happen if the caller didn't include
> STATX_IOALIGN in the request mask, since statx() isn't required to
> return information that wasn't requested.
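
The rules quoted above (the fields are only meaningful when
STATX_IOALIGN is set in the returned mask, and a value of 0 means DIO is
not supported) could be applied in userspace roughly as follows.  This
is only a sketch: the mask value is the one proposed in this RFC, the
field names follow the RFC and may not exist in installed headers, and
struct ioalign_info and both functions are hypothetical stand-ins for
values copied out of struct statx.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask bit as proposed in this RFC; an assumption, not a released UAPI. */
#ifndef STATX_IOALIGN
#define STATX_IOALIGN 0x00002000U
#endif

/* Hypothetical container for the values returned by statx(). */
struct ioalign_info {
	uint32_t stx_mask;		/* result mask from statx() */
	uint32_t stx_mem_align_dio;
	uint32_t stx_offset_align_dio;
	uint32_t stx_offset_align_optimal;
};

enum dio_status {
	DIO_UNKNOWN,		/* STATX_IOALIGN not set; fields not filled in */
	DIO_UNSUPPORTED,	/* fields valid, alignments reported as 0 */
	DIO_SUPPORTED,		/* fields valid, both alignments nonzero */
};

enum dio_status classify_dio(const struct ioalign_info *info)
{
	if (!(info->stx_mask & STATX_IOALIGN))
		return DIO_UNKNOWN;
	if (info->stx_mem_align_dio == 0 || info->stx_offset_align_dio == 0)
		return DIO_UNSUPPORTED;
	return DIO_SUPPORTED;
}

/* Check one DIO request against the reported restrictions. */
bool dio_request_ok(const struct ioalign_info *info, uintptr_t buf,
		    uint64_t offset, uint64_t len)
{
	if (classify_dio(info) != DIO_SUPPORTED)
		return false;
	return buf % info->stx_mem_align_dio == 0 &&
	       offset % info->stx_offset_align_dio == 0 &&
	       len % info->stx_offset_align_dio == 0;
}
```

A caller that gets DIO_UNKNOWN cannot conclude anything about DIO
support from these fields and would have to fall back to whatever it
did before.
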
> 
> This commit adds the VFS-level plumbing for STATX_IOALIGN.  Individual
> filesystems will still need to add code to support it.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
>  fs/stat.c                 | 3 +++
>  include/linux/stat.h      | 3 +++
>  include/uapi/linux/stat.h | 9 +++++++--
>  3 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/stat.c b/fs/stat.c
> index 5c2c94464e8b0..9d477218545b8 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -611,6 +611,9 @@ cp_statx(const struct kstat *stat, struct statx __user *buffer)
>  	tmp.stx_dev_major = MAJOR(stat->dev);
>  	tmp.stx_dev_minor = MINOR(stat->dev);
>  	tmp.stx_mnt_id = stat->mnt_id;
> +	tmp.stx_mem_align_dio = stat->mem_align_dio;
> +	tmp.stx_offset_align_dio = stat->offset_align_dio;
> +	tmp.stx_offset_align_optimal = stat->offset_align_optimal;
>  
>  	return copy_to_user(buffer, &tmp, sizeof(tmp)) ? -EFAULT : 0;
>  }
> diff --git a/include/linux/stat.h b/include/linux/stat.h
> index 7df06931f25d8..48b8b1ad1567c 100644
> --- a/include/linux/stat.h
> +++ b/include/linux/stat.h
> @@ -50,6 +50,9 @@ struct kstat {
>  	struct timespec64 btime;			/* File creation time */
>  	u64		blocks;
>  	u64		mnt_id;
> +	u32		mem_align_dio;
> +	u32		offset_align_dio;
> +	u32		offset_align_optimal;
>  };
>  
>  #endif
> diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
> index 1500a0f58041a..f822b23e81091 100644
> --- a/include/uapi/linux/stat.h
> +++ b/include/uapi/linux/stat.h
> @@ -124,9 +124,13 @@ struct statx {
>  	__u32	stx_dev_minor;
>  	/* 0x90 */
>  	__u64	stx_mnt_id;
> -	__u64	__spare2;
> +	__u32	stx_mem_align_dio;	/* Memory buffer alignment for direct I/O */
> +	__u32	stx_offset_align_dio;	/* File offset alignment for direct I/O */
>  	/* 0xa0 */
> -	__u64	__spare3[12];	/* Spare space for future expansion */
> +	__u32	stx_offset_align_optimal; /* Optimal file offset alignment for I/O */
> +	__u32	__spare2;
> +	/* 0xa8 */
> +	__u64	__spare3[11];	/* Spare space for future expansion */
>  	/* 0x100 */
>  };
>  
> @@ -152,6 +156,7 @@ struct statx {
>  #define STATX_BASIC_STATS	0x000007ffU	/* The stuff in the normal stat struct */
>  #define STATX_BTIME		0x00000800U	/* Want/got stx_btime */
>  #define STATX_MNT_ID		0x00001000U	/* Got stx_mnt_id */
> +#define STATX_IOALIGN		0x00002000U	/* Want/got IO alignment info */
>  
>  #define STATX__RESERVED		0x80000000U	/* Reserved for future struct statx expansion */
>  
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH v2 1/7] statx: add I/O alignment information
  2022-05-19 23:06     ` [f2fs-dev] " Darrick J. Wong
@ 2022-05-20  3:27       ` Dave Chinner
  -1 siblings, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2022-05-20  3:27 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Eric Biggers, linux-fsdevel, linux-ext4, linux-f2fs-devel,
	linux-xfs, linux-api, linux-fscrypt, linux-block, linux-kernel,
	Keith Busch

On Thu, May 19, 2022 at 04:06:05PM -0700, Darrick J. Wong wrote:
> On Wed, May 18, 2022 at 04:50:05PM -0700, Eric Biggers wrote:
> > From: Eric Biggers <ebiggers@google.com>
> > 
> > Traditionally, the conditions for when DIO (direct I/O) is supported
> > were fairly simple: filesystems either supported DIO aligned to the
> > block device's logical block size, or didn't support DIO at all.
> > 
> > However, due to filesystem features that have been added over time (e.g.,
> > data journalling, inline data, encryption, verity, compression,
> > checkpoint disabling, log-structured mode), the conditions for when DIO
> > is allowed on a file have gotten increasingly complex.  Whether a
> > particular file supports DIO, and with what alignment, can depend on
> > various file attributes and filesystem mount options, as well as which
> > block device(s) the file's data is located on.
> > 
> > XFS has an ioctl XFS_IOC_DIOINFO which exposes this information to
> > applications.  However, as discussed
> > (https://lore.kernel.org/linux-fsdevel/20220120071215.123274-1-ebiggers@kernel.org/T/#u),
> > this ioctl is rarely used and not known to be used outside of
> > XFS-specific code.  It also was never intended to indicate when a file
> > doesn't support DIO at all, and it only exposes the minimum I/O
> > alignment, not the optimal I/O alignment which has been requested too.
> > 
> > Therefore, let's expose this information via statx().  Add the
> > STATX_IOALIGN flag and three fields associated with it:
> > 
> > * stx_mem_align_dio: the alignment (in bytes) required for user memory
> >   buffers for DIO, or 0 if DIO is not supported on the file.
> > 
> > * stx_offset_align_dio: the alignment (in bytes) required for file
> >   offsets and I/O segment lengths for DIO, or 0 if DIO is not supported
> >   on the file.  This will only be nonzero if stx_mem_align_dio is
> >   nonzero, and vice versa.
> > 
> > * stx_offset_align_optimal: the alignment (in bytes) suggested for file
> >   offsets and I/O segment lengths to get optimal performance.  This
> >   applies to both DIO and buffered I/O.  It differs from stx_blocksize
> >   in that stx_offset_align_optimal will contain the real optimum I/O
> >   size, which may be a large value.  In contrast, for compatibility
> >   reasons stx_blocksize is the minimum size needed to avoid page cache
> >   read/write/modify cycles, which may be much smaller than the optimum
> >   I/O size.  For more details about the motivation for this field, see
> >   https://lore.kernel.org/r/20220210040304.GM59729@dread.disaster.area
> 
> Hmm.  So I guess this is supposed to be the filesystem's best guess at
> the IO size that will minimize RMW cycles in the entire stack?  i.e. if
> the user does not want RMW of pagecache pages, of file allocation units
> (if COW is enabled), of RAID stripes, or in the storage itself, then it
> should ensure that all IOs are aligned to this value?
> 
> I guess that means for XFS it's effectively max(pagesize, i_blocksize,
> bdev io_opt, sb_width, and (pretend XFS can reflink the realtime volume)
> the rt extent size)?  I didn't see a manpage update for statx(2) but
> that's mostly what I'm interested in. :)

Yup, xfs_stat_blksize() should give a good idea of what we should
do. It will end up being pretty much that, except without the need
for a mount option to turn on the sunit/swidth return, and always
taking into consideration extent size hints rather than just doing
that for RT inodes...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH v2 1/7] statx: add I/O alignment information
  2022-05-19 23:06     ` [f2fs-dev] " Darrick J. Wong
@ 2022-05-20  6:30       ` Eric Biggers
  -1 siblings, 0 replies; 42+ messages in thread
From: Eric Biggers @ 2022-05-20  6:30 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-xfs,
	linux-api, linux-fscrypt, linux-block, linux-kernel, Keith Busch

On Thu, May 19, 2022 at 04:06:05PM -0700, Darrick J. Wong wrote:
> I guess that means for XFS it's effectively max(pagesize, i_blocksize,
> bdev io_opt, sb_width, and (pretend XFS can reflink the realtime volume)
> the rt extent size)?  I didn't see a manpage update for statx(2) but
> that's mostly what I'm interested in. :)

I'll send out a man page update with the next version.  I don't think there will
be much new information that isn't already included in this patchset, though.

> Looking ahead, it looks like the ext4/f2fs implementations only seem to
> be returning max(i_blocksize, bdev io_opt)?  But not the pagesize?

I think that's just an oversight.  ext4 and f2fs should round the value up to
PAGE_SIZE.

- Eric
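
One way to read the fix described above is to take the maximum of the
page size, the filesystem block size, and the block device's optimal
I/O size.  The sketch below assumes that reading; the function and
parameter names are hypothetical, and the actual ext4/f2fs code may
combine the values differently.

```c
#include <assert.h>
#include <stdint.h>

/* Return the larger of two values. */
uint32_t max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/*
 * Hypothetical helper: max(page_size, blocksize, io_opt).
 * An io_opt of 0 (no optimal I/O size reported by the device) is
 * handled naturally, since it never wins the comparison.
 */
uint32_t optimal_offset_align(uint32_t page_size, uint32_t blocksize,
			      uint32_t io_opt)
{
	assert(page_size != 0 && blocksize != 0);
	return max_u32(page_size, max_u32(blocksize, io_opt));
}
```

With a 4096-byte page, a 1024-byte block size, and no io_opt reported,
this yields 4096 rather than 1024.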

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH v2 1/7] statx: add I/O alignment information
  2022-05-18 23:50   ` [f2fs-dev] " Eric Biggers
@ 2022-05-20 11:52     ` Christian Brauner
  -1 siblings, 0 replies; 42+ messages in thread
From: Christian Brauner @ 2022-05-20 11:52 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-fsdevel, linux-ext4, linux-f2fs-devel, linux-xfs,
	linux-api, linux-fscrypt, linux-block, linux-kernel, Keith Busch

On Wed, May 18, 2022 at 04:50:05PM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Traditionally, the conditions for when DIO (direct I/O) is supported
> were fairly simple: filesystems either supported DIO aligned to the
> block device's logical block size, or didn't support DIO at all.
> 
> However, due to filesystem features that have been added over time (e.g.,
> data journalling, inline data, encryption, verity, compression,
> checkpoint disabling, log-structured mode), the conditions for when DIO
> is allowed on a file have gotten increasingly complex.  Whether a
> particular file supports DIO, and with what alignment, can depend on
> various file attributes and filesystem mount options, as well as which
> block device(s) the file's data is located on.
> 
> XFS has an ioctl XFS_IOC_DIOINFO which exposes this information to
> applications.  However, as discussed
> (https://lore.kernel.org/linux-fsdevel/20220120071215.123274-1-ebiggers@kernel.org/T/#u),
> this ioctl is rarely used and not known to be used outside of
> XFS-specific code.  It also was never intended to indicate when a file
> doesn't support DIO at all, and it only exposes the minimum I/O
> alignment, not the optimal I/O alignment which has been requested too.
> 
> Therefore, let's expose this information via statx().  Add the
> STATX_IOALIGN flag and three fields associated with it:
> 
> * stx_mem_align_dio: the alignment (in bytes) required for user memory
>   buffers for DIO, or 0 if DIO is not supported on the file.
> 
> * stx_offset_align_dio: the alignment (in bytes) required for file
>   offsets and I/O segment lengths for DIO, or 0 if DIO is not supported
>   on the file.  This will only be nonzero if stx_mem_align_dio is
>   nonzero, and vice versa.
> 
> * stx_offset_align_optimal: the alignment (in bytes) suggested for file
>   offsets and I/O segment lengths to get optimal performance.  This
>   applies to both DIO and buffered I/O.  It differs from stx_blocksize
>   in that stx_offset_align_optimal will contain the real optimum I/O
>   size, which may be a large value.  In contrast, for compatibility
>   reasons stx_blocksize is the minimum size needed to avoid page cache
>   read/write/modify cycles, which may be much smaller than the optimum
>   I/O size.  For more details about the motivation for this field, see
>   https://lore.kernel.org/r/20220210040304.GM59729@dread.disaster.area
> 
> Note that as with other statx() extensions, if STATX_IOALIGN isn't set
> in the returned statx struct, then these new fields won't be filled in.
> This will happen if the filesystem doesn't support STATX_IOALIGN, or if
> the file isn't a regular file.  (It might be supported on block device
> files in the future.)  It might also happen if the caller didn't include
> STATX_IOALIGN in the request mask, since statx() isn't required to
> return information that wasn't requested.
> 
> This commit adds the VFS-level plumbing for STATX_IOALIGN.  Individual
> filesystems will still need to add code to support it.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---

Looks good to me,
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [f2fs-dev] [RFC PATCH v2 1/7] statx: add I/O alignment information
  2022-05-18 23:50   ` [f2fs-dev] " Eric Biggers
@ 2022-05-27  9:02     ` Florian Weimer
  -1 siblings, 0 replies; 42+ messages in thread
From: Florian Weimer @ 2022-05-27  9:02 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-block, linux-api, linux-kernel, linux-f2fs-devel,
	linux-xfs, Keith Busch, linux-fscrypt, linux-fsdevel, linux-ext4

* Eric Biggers:

> diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
> index 1500a0f58041a..f822b23e81091 100644
> --- a/include/uapi/linux/stat.h
> +++ b/include/uapi/linux/stat.h
> @@ -124,9 +124,13 @@ struct statx {
>  	__u32	stx_dev_minor;
>  	/* 0x90 */
>  	__u64	stx_mnt_id;
> -	__u64	__spare2;
> +	__u32	stx_mem_align_dio;	/* Memory buffer alignment for direct I/O */
> +	__u32	stx_offset_align_dio;	/* File offset alignment for direct I/O */
>  	/* 0xa0 */
> -	__u64	__spare3[12];	/* Spare space for future expansion */
> +	__u32	stx_offset_align_optimal; /* Optimal file offset alignment for I/O */
> +	__u32	__spare2;
> +	/* 0xa8 */
> +	__u64	__spare3[11];	/* Spare space for future expansion */
>  	/* 0x100 */
>  };

Are 32 bits enough?  Would it make sense to store the base-2 logarithm
instead?

Thanks,
Florian
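
To make the trade-off in this question concrete: a __u32 byte count can
express power-of-two alignments up to 2^31 bytes, while a base-2
logarithm fits much larger values in a few bits.  The sketch below uses
hypothetical helper names and is not proposed code.  It also shows two
costs of the logarithmic form: a value that is not a power of two
cannot be encoded, and 0 can no longer mean "not supported" because
shift 0 already means an alignment of 1 byte.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Encode a power-of-two alignment as its base-2 logarithm.
 * Returns -1 if align is zero or not a power of two.
 */
int align_to_log2(uint64_t align)
{
	int shift = 0;

	if (align == 0 || (align & (align - 1)) != 0)
		return -1;
	while ((align >>= 1) != 0)
		shift++;
	return shift;
}

/* Decode a shift back into a byte count; shift must be below 64. */
uint64_t log2_to_align(unsigned int shift)
{
	assert(shift < 64);
	return (uint64_t)1 << shift;
}
```

For instance, 4096 encodes as 12 and 2^32 encodes as 32, whereas a
value of 3 * 65536 bytes has no encoding.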



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH v2 1/7] statx: add I/O alignment information
  2022-05-27  9:02     ` Florian Weimer
@ 2022-05-27 16:22       ` Darrick J. Wong
  -1 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2022-05-27 16:22 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Eric Biggers, linux-fsdevel, linux-ext4, linux-f2fs-devel,
	linux-xfs, linux-api, linux-fscrypt, linux-block, linux-kernel,
	Keith Busch

On Fri, May 27, 2022 at 11:02:46AM +0200, Florian Weimer wrote:
> * Eric Biggers:
> 
> > diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
> > index 1500a0f58041a..f822b23e81091 100644
> > --- a/include/uapi/linux/stat.h
> > +++ b/include/uapi/linux/stat.h
> > @@ -124,9 +124,13 @@ struct statx {
> >  	__u32	stx_dev_minor;
> >  	/* 0x90 */
> >  	__u64	stx_mnt_id;
> > -	__u64	__spare2;
> > +	__u32	stx_mem_align_dio;	/* Memory buffer alignment for direct I/O */
> > +	__u32	stx_offset_align_dio;	/* File offset alignment for direct I/O */
> >  	/* 0xa0 */
> > -	__u64	__spare3[12];	/* Spare space for future expansion */
> > +	__u32	stx_offset_align_optimal; /* Optimal file offset alignment for I/O */
> > +	__u32	__spare2;
> > +	/* 0xa8 */
> > +	__u64	__spare3[11];	/* Spare space for future expansion */
> >  	/* 0x100 */
> >  };
> 
> Are 32 bits enough?  Would it make sense to store the base-2 logarithm
> instead?

I don't think a log2 will work here; XFS will want to report things like
RAID stripe sizes, which can be any multiple of the fs blocksize.

32 bits is probably enough, seeing as the kernel won't do an IO larger
than 2GB anyway.

--D

> Thanks,
> Florian
> 
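
To make the log2 objection concrete: a stripe width built from a
non-power-of-2 number of blocks has no exact base-2 logarithm, while the raw
byte count still fits comfortably in 32 bits.  A small sketch follows; the
helper names and the 3 x 64 KiB geometry are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if 'align' could be stored losslessly as a base-2 logarithm. */
static bool log2_encodable(uint32_t align)
{
	return align != 0 && (align & (align - 1)) == 0;
}

/* Hypothetical geometry: stripe width as a multiple of the fs block size. */
static uint32_t stripe_width_bytes(uint32_t blocksize, uint32_t nblocks)
{
	return blocksize * nblocks;
}
```

Here stripe_width_bytes(65536, 3) is 196608, which log2_encodable() rejects,
whereas a 4096-byte block size is accepted.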

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH v2 1/7] statx: add I/O alignment information
  2022-05-20  3:27       ` [f2fs-dev] " Dave Chinner
@ 2022-06-14  5:25         ` Eric Biggers
  -1 siblings, 0 replies; 42+ messages in thread
From: Eric Biggers @ 2022-06-14  5:25 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Darrick J. Wong, linux-fsdevel, linux-ext4, linux-f2fs-devel,
	linux-xfs, linux-api, linux-fscrypt, linux-block, linux-kernel,
	Keith Busch

On Fri, May 20, 2022 at 01:27:39PM +1000, Dave Chinner wrote:
> > > * stx_offset_align_optimal: the alignment (in bytes) suggested for file
> > >   offsets and I/O segment lengths to get optimal performance.  This
> > >   applies to both DIO and buffered I/O.  It differs from stx_blocksize
> > >   in that stx_offset_align_optimal will contain the real optimum I/O
> > >   size, which may be a large value.  In contrast, for compatibility
> > >   reasons stx_blocksize is the minimum size needed to avoid page cache
> > >   read/write/modify cycles, which may be much smaller than the optimum
> > >   I/O size.  For more details about the motivation for this field, see
> > >   https://lore.kernel.org/r/20220210040304.GM59729@dread.disaster.area
> > 
> > Hmm.  So I guess this is supposed to be the filesystem's best guess at
> > the IO size that will minimize RMW cycles in the entire stack?  i.e. if
> > the user does not want RMW of pagecache pages, of file allocation units
> > (if COW is enabled), of RAID stripes, or in the storage itself, then it
> > should ensure that all IOs are aligned to this value?
> > 
> > I guess that means for XFS it's effectively max(pagesize, i_blocksize,
> > bdev io_opt, sb_width, and (pretend XFS can reflink the realtime volume)
> > the rt extent size)?  I didn't see a manpage update for statx(2) but
> > that's mostly what I'm interested in. :)
> 
> Yup, xfs_stat_blksize() should give a good idea of what we should
> do. It will end up being pretty much that, except without the need
> for a mount option to turn on the sunit/swidth return, and always
> taking into consideration extent size hints rather than just doing
> that for RT inodes...

While working on the man-pages update, I'm having second thoughts about the
stx_offset_align_optimal field.  Does any filesystem other than XFS actually
want stx_offset_align_optimal, when st[x]_blksize already exists?  Many network
filesystems, as well as tmpfs when hugepages are enabled, already report large
(megabytes) sizes in st[x]_blksize.  And all documentation I looked at (man
pages for Linux, POSIX, FreeBSD, NetBSD, macOS) documents st_blksize as
something like "the preferred blocksize for efficient I/O".  It's never
documented as being limited to PAGE_SIZE, which makes sense because it's not.

So stx_offset_align_optimal seems redundant, and it is going to confuse
application developers who will have to decide when to use st[x]_blksize and
when to use stx_offset_align_optimal.

Also, applications that don't work well with huge reported optimal I/O sizes
would still continue to exist, as it will remain possible for applications to
only be tested on filesystems that report a small optimal I/O size.

Perhaps for now we should just add STATX_DIOALIGN instead of STATX_IOALIGN,
leaving out the stx_offset_align_optimal field?  What do people think?

- Eric
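
For reference, the existing value is easy to inspect from userspace.  The
sketch below simply returns whatever st_blksize stat(2) reports for a path;
the helper name preferred_io_size() is hypothetical, and the value returned
depends entirely on the filesystem.

```c
#include <sys/stat.h>

/*
 * Return the st_blksize that stat(2) reports for 'path', or -1 on error.
 * The value is filesystem-dependent and may be far larger than a page.
 */
static long preferred_io_size(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	return (long)st.st_blksize;
}
```

Running this against files on different filesystems is a quick way to see how
much the reported size varies.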

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH v2 1/7] statx: add I/O alignment information
  2022-06-14  5:25         ` [f2fs-dev] " Eric Biggers
@ 2022-06-15 13:12           ` Christoph Hellwig
  -1 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2022-06-15 13:12 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Dave Chinner, Darrick J. Wong, linux-fsdevel, linux-ext4,
	linux-f2fs-devel, linux-xfs, linux-api, linux-fscrypt,
	linux-block, linux-kernel, Keith Busch

On Mon, Jun 13, 2022 at 10:25:12PM -0700, Eric Biggers wrote:
> While working on the man-pages update, I'm having second thoughts about the
> stx_offset_align_optimal field.  Does any filesystem other than XFS actually
> want stx_offset_align_optimal, when st[x]_blksize already exists?  Many network
> filesystems, as well as tmpfs when hugepages are enabled, already report large
> (megabytes) sizes in st[x]_blksize.  And all documentation I looked at (man
> pages for Linux, POSIX, FreeBSD, NetBSD, macOS) documents st_blksize as
> something like "the preferred blocksize for efficient I/O".  It's never
> documented as being limited to PAGE_SIZE, which makes sense because it's not.

Yes.  While st_blksize is utterly misnamed, it has always been
the optimal I/O size.

> Perhaps for now we should just add STATX_DIOALIGN instead of STATX_IOALIGN,
> leaving out the stx_offset_align_optimal field?  What do people think?

Yes, this sounds like a good plan.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH v2 1/7] statx: add I/O alignment information
  2022-06-15 13:12           ` [f2fs-dev] " Christoph Hellwig
@ 2022-06-16  0:04             ` Eric Biggers
  -1 siblings, 0 replies; 42+ messages in thread
From: Eric Biggers @ 2022-06-16  0:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dave Chinner, Darrick J. Wong, linux-fsdevel, linux-ext4,
	linux-f2fs-devel, linux-xfs, linux-api, linux-fscrypt,
	linux-block, linux-kernel, Keith Busch

On Wed, Jun 15, 2022 at 06:12:04AM -0700, Christoph Hellwig wrote:
> On Mon, Jun 13, 2022 at 10:25:12PM -0700, Eric Biggers wrote:
> > While working on the man-pages update, I'm having second thoughts about the
> > stx_offset_align_optimal field.  Does any filesystem other than XFS actually
> > want stx_offset_align_optimal, when st[x]_blksize already exists?  Many network
> > filesystems, as well as tmpfs when hugepages are enabled, already report large
> > (megabytes) sizes in st[x]_blksize.  And all documentation I looked at (man
> > pages for Linux, POSIX, FreeBSD, NetBSD, macOS) documents st_blksize as
> > something like "the preferred blocksize for efficient I/O".  It's never
> > documented as being limited to PAGE_SIZE, which makes sense because it's not.
> 
> Yes.  While st_blksize is utterly misnamed, it has always been
> the optimal I/O size.
> 
> > Perhaps for now we should just add STATX_DIOALIGN instead of STATX_IOALIGN,
> > leaving out the stx_offset_align_optimal field?  What do people think?
> 
> Yes, this sounds like a good plan.

One more thing.  I'm trying to add support for STATX_DIOALIGN on block devices.
Unfortunately I don't think it is going to work, at all, since the inode is for
the device node and not the block device itself.  This is true even after the
file is opened (I previously thought that at least that case would work).

Were you expecting that this would work on block devices?  It seems they will
need a different API -- a new BLK* ioctl, or files in /sys/block/$dev/queue.

- Eric
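
For comparison, the block layer already exposes some device limits through
ioctls of this kind.  The sketch below queries the logical block size with
the existing BLKSSZGET ioctl.  It only illustrates the style of interface and
is not the new DIO alignment API being discussed; the helper name is
hypothetical.

```c
#include <fcntl.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Return the logical block size of the block device at 'path' using the
 * existing BLKSSZGET ioctl, or -1 if 'path' cannot be opened or is not a
 * block device.
 */
static int blkdev_logical_block_size(const char *path)
{
	struct stat st;
	int size = -1;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fstat(fd, &st) != 0 || !S_ISBLK(st.st_mode) ||
	    ioctl(fd, BLKSSZGET, &size) != 0)
		size = -1;
	close(fd);
	return size;
}
```

Note that this needs an open file descriptor and permission to open the
device, which a path-based statx() call would not.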

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH v2 1/7] statx: add I/O alignment information
  2022-06-16  0:04             ` [f2fs-dev] " Eric Biggers
@ 2022-06-16  6:07               ` Christoph Hellwig
  -1 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2022-06-16  6:07 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Christoph Hellwig, Dave Chinner, Darrick J. Wong, linux-fsdevel,
	linux-ext4, linux-f2fs-devel, linux-xfs, linux-api,
	linux-fscrypt, linux-block, linux-kernel, Keith Busch

On Wed, Jun 15, 2022 at 05:04:57PM -0700, Eric Biggers wrote:
> One more thing.  I'm trying to add support for STATX_DIOALIGN on block devices.
> Unfortunately I don't think it is going to work, at all, since the inode is for
> the device node and not the block device itself.  This is true even after the
> file is opened (I previously thought that at least that case would work).

For an open file the block device inode is pointed to by
file->f_mapping->host.

> Were you expecting that this would work on block devices?  It seems they will
> need a different API -- a new BLK* ioctl, or files in /sys/block/$dev/queue.

blkdev_get_no_open on inode->i_rdev gets you the block device, which
then has bdev->bd_inode point to the underlying block device, although
for a block device those limits would probably be retrieved not from
the inode but the gendisk / request_queue anyway.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH v2 1/7] statx: add I/O alignment information
  2022-06-16  6:07               ` [f2fs-dev] " Christoph Hellwig
@ 2022-06-16  6:19                 ` Eric Biggers
  -1 siblings, 0 replies; 42+ messages in thread
From: Eric Biggers @ 2022-06-16  6:19 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dave Chinner, Darrick J. Wong, linux-fsdevel, linux-ext4,
	linux-f2fs-devel, linux-xfs, linux-api, linux-fscrypt,
	linux-block, linux-kernel, Keith Busch

On Wed, Jun 15, 2022 at 11:07:17PM -0700, Christoph Hellwig wrote:
> On Wed, Jun 15, 2022 at 05:04:57PM -0700, Eric Biggers wrote:
> > One more thing.  I'm trying to add support for STATX_DIOALIGN on block devices.
> > Unfortunately I don't think it is going to work, at all, since the inode is for
> > the device node and not the block device itself.  This is true even after the
> > file is opened (I previously thought that at least that case would work).
> 
> For an open file the block device inode is pointed to by
> file->f_mapping->host.
> 
> > Were you expecting that this would work on block devices?  It seems they will
> > need a different API -- a new BLK* ioctl, or files in /sys/block/$dev/queue.
> 
> blkdev_get_no_open on inode->i_rdev gets you the block device, which
> then has bdev->bd_inode point to the underlying block device, although
> for a block device those limits would probably be retrieved not from
> the inode but the gendisk / request_queue anyway.

Yes, I know that.  The issue is that the inode that statx() is operating on is
the device node, so *all* the other statx fields come from that inode.  Size,
nlink, uid, gid, mode, timestamps (including btime if the filesystem supports
it), inode number, device number of the containing filesystem, mount ID, etc.
If we were to randomly grab one field from the underlying block device instead,
that would be inconsistent with everything else.

- Eric
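
The distinction is visible from userspace with plain stat(2): for a device
node, st_dev identifies the filesystem that contains the node, while st_rdev
identifies the device the node refers to.  A minimal sketch, with a
hypothetical struct and helper name:

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

struct node_ids {
	bool	is_blkdev;	/* path is a block device node */
	dev_t	containing_fs;	/* st_dev: filesystem holding the node */
	dev_t	target_dev;	/* st_rdev: device the node refers to */
};

/* Fill 'out' from stat(2); returns 0 on success, -1 on error. */
static int get_node_ids(const char *path, struct node_ids *out)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	out->is_blkdev = S_ISBLK(st.st_mode);
	out->containing_fs = st.st_dev;
	out->target_dev = st.st_rdev;
	return 0;
}
```

Everything else stat(2) returns for the node (mode, ownership, timestamps,
inode number) likewise describes the node rather than the device.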

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH v2 1/7] statx: add I/O alignment information
  2022-06-16  6:19                 ` [f2fs-dev] " Eric Biggers
@ 2022-06-16  6:29                   ` Christoph Hellwig
  -1 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2022-06-16  6:29 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Christoph Hellwig, Dave Chinner, Darrick J. Wong, linux-fsdevel,
	linux-ext4, linux-f2fs-devel, linux-xfs, linux-api,
	linux-fscrypt, linux-block, linux-kernel, Keith Busch

On Wed, Jun 15, 2022 at 11:19:32PM -0700, Eric Biggers wrote:
> Yes, I know that.  The issue is that the inode that statx() is operating on is
> the device node, so *all* the other statx fields come from that inode.  Size,
> nlink, uid, gid, mode, timestamps (including btime if the filesystem supports
> it), inode number, device number of the containing filesystem, mount ID, etc.
> If we were to randomly grab one field from the underlying block device instead,
> that would be inconsistent with everything else.

At least on XFS we have a magic hardcoded st_blksize for block devices,
but it seems like the generic code doesn't do that.

But I'm really much more worried about an inconsistency where we get
useful information for some special files rather than where we acquire
this information from.  So I think going to the block device inode, and
also going to it for stx_blksize is the right thing as it actually
makes the interface useful.  We just need a good helper that all
getattr implementations can use to be consistent and/or override these
fields after the call to ->getattr.

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2022-06-16  6:29 UTC | newest]

Thread overview: 42+ messages
2022-05-18 23:50 [RFC PATCH v2 0/7] make statx() return I/O alignment information Eric Biggers
2022-05-18 23:50 ` [f2fs-dev] " Eric Biggers
2022-05-18 23:50 ` [RFC PATCH v2 1/7] statx: add " Eric Biggers
2022-05-18 23:50   ` [f2fs-dev] " Eric Biggers
2022-05-19  7:05   ` Christoph Hellwig
2022-05-19  7:05     ` [f2fs-dev] " Christoph Hellwig
2022-05-19 23:06   ` Darrick J. Wong
2022-05-19 23:06     ` [f2fs-dev] " Darrick J. Wong
2022-05-20  3:27     ` Dave Chinner
2022-05-20  3:27       ` [f2fs-dev] " Dave Chinner
2022-06-14  5:25       ` Eric Biggers
2022-06-14  5:25         ` [f2fs-dev] " Eric Biggers
2022-06-15 13:12         ` Christoph Hellwig
2022-06-15 13:12           ` [f2fs-dev] " Christoph Hellwig
2022-06-16  0:04           ` Eric Biggers
2022-06-16  0:04             ` [f2fs-dev] " Eric Biggers
2022-06-16  6:07             ` Christoph Hellwig
2022-06-16  6:07               ` [f2fs-dev] " Christoph Hellwig
2022-06-16  6:19               ` Eric Biggers
2022-06-16  6:19                 ` [f2fs-dev] " Eric Biggers
2022-06-16  6:29                 ` Christoph Hellwig
2022-06-16  6:29                   ` [f2fs-dev] " Christoph Hellwig
2022-05-20  6:30     ` Eric Biggers
2022-05-20  6:30       ` [f2fs-dev] " Eric Biggers
2022-05-20 11:52   ` Christian Brauner
2022-05-20 11:52     ` [f2fs-dev] " Christian Brauner
2022-05-27  9:02   ` Florian Weimer
2022-05-27  9:02     ` Florian Weimer
2022-05-27 16:22     ` Darrick J. Wong
2022-05-27 16:22       ` [f2fs-dev] " Darrick J. Wong
2022-05-18 23:50 ` [RFC PATCH v2 2/7] fscrypt: change fscrypt_dio_supported() to prepare for STATX_IOALIGN Eric Biggers
2022-05-18 23:50   ` [f2fs-dev] " Eric Biggers
2022-05-18 23:50 ` [RFC PATCH v2 3/7] ext4: support STATX_IOALIGN Eric Biggers
2022-05-18 23:50   ` [f2fs-dev] " Eric Biggers
2022-05-18 23:50 ` [RFC PATCH v2 4/7] f2fs: move f2fs_force_buffered_io() into file.c Eric Biggers
2022-05-18 23:50   ` [f2fs-dev] " Eric Biggers
2022-05-18 23:50 ` [RFC PATCH v2 5/7] f2fs: don't allow DIO reads but not DIO writes Eric Biggers
2022-05-18 23:50   ` [f2fs-dev] " Eric Biggers
2022-05-18 23:50 ` [RFC PATCH v2 6/7] f2fs: simplify f2fs_force_buffered_io() Eric Biggers
2022-05-18 23:50   ` [f2fs-dev] " Eric Biggers
2022-05-18 23:50 ` [RFC PATCH v2 7/7] f2fs: support STATX_IOALIGN Eric Biggers
2022-05-18 23:50   ` [f2fs-dev] " Eric Biggers
