All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHv6 00/11] direct-io dma alignment
@ 2022-06-10 19:58 Keith Busch
  2022-06-10 19:58 ` [PATCHv6 01/11] block: fix infinite loop for invalid zone append Keith Busch
                   ` (11 more replies)
  0 siblings, 12 replies; 51+ messages in thread
From: Keith Busch @ 2022-06-10 19:58 UTC (permalink / raw)
  To: linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch

From: Keith Busch <kbusch@kernel.org>

The previous version is available here:

  https://lore.kernel.org/linux-block/Yp4qQRI5awiycml1@kbusch-mbp.dhcp.thefacebook.com/T/#m0a93b6392038aad6144e066fb5ada2cbf316f78e 

Changes from the previous are all trivial changes:

  Fixed a typo, s/bvev_/bdev_/

  Updated commit messages and reviews

  Updated code comments

  Added a new comment for request_queue dma_alignement expressly
  documenting the consequences of setting this. All existing users of
  this attribute were checked to ensure they are on the correct side of
  those consequences.

Keith Busch (11):
  block: fix infinite loop for invalid zone append
  block/bio: remove duplicate append pages code
  block: export dma_alignment attribute
  block: introduce bdev_dma_alignment helper
  block: add a helper function for dio alignment
  block/merge: count bytes instead of sectors
  block/bounce: count bytes instead of sectors
  iov: introduce iov_iter_aligned
  block: introduce bdev_iter_is_aligned helper
  block: relax direct io memory alignment
  iomap: add support for dma aligned direct-io

 Documentation/ABI/stable/sysfs-block |   9 +++
 block/bio.c                          | 114 ++++++++++++---------------
 block/blk-merge.c                    |  41 ++++++----
 block/blk-sysfs.c                    |   7 ++
 block/bounce.c                       |  13 ++-
 block/fops.c                         |  16 ++--
 fs/iomap/direct-io.c                 |   4 +-
 include/linux/blkdev.h               |  17 ++++
 include/linux/uio.h                  |   2 +
 lib/iov_iter.c                       |  92 +++++++++++++++++++++
 10 files changed, 224 insertions(+), 91 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCHv6 01/11] block: fix infinite loop for invalid zone append
  2022-06-10 19:58 [PATCHv6 00/11] direct-io dma alignment Keith Busch
@ 2022-06-10 19:58 ` Keith Busch
  2022-06-10 19:58 ` [PATCHv6 02/11] block/bio: remove duplicate append pages code Keith Busch
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 51+ messages in thread
From: Keith Busch @ 2022-06-10 19:58 UTC (permalink / raw)
  To: linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch, Johannes Thumshirn

From: Keith Busch <kbusch@kernel.org>

Returning 0 early from __bio_iov_append_get_pages() for the
max_append_sectors warning just creates an infinite loop since 0 means
success, and the bio will never fill from the unadvancing iov_iter. We
could turn the return into an error value, but it will already be turned
into an error value later on, so just remove the warning. Clearly no one
ever hit it anyway.

Fixes: 0512a75b98f84 ("block: Introduce REQ_OP_ZONE_APPEND")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 block/bio.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index f92d0223247b..d481d5e4fe47 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1229,9 +1229,6 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
 	size_t offset;
 	int ret = 0;
 
-	if (WARN_ON_ONCE(!max_append_sectors))
-		return 0;
-
 	/*
 	 * Move page array up in the allocated memory for the bio vecs as far as
 	 * possible so that we can start filling biovecs from the beginning
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCHv6 02/11] block/bio: remove duplicate append pages code
  2022-06-10 19:58 [PATCHv6 00/11] direct-io dma alignment Keith Busch
  2022-06-10 19:58 ` [PATCHv6 01/11] block: fix infinite loop for invalid zone append Keith Busch
@ 2022-06-10 19:58 ` Keith Busch
  2022-06-10 19:58 ` [PATCHv6 03/11] block: export dma_alignment attribute Keith Busch
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 51+ messages in thread
From: Keith Busch @ 2022-06-10 19:58 UTC (permalink / raw)
  To: linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch, Johannes Thumshirn

From: Keith Busch <kbusch@kernel.org>

The getting pages setup for zone append and normal IO are identical. Use
common code for each.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 block/bio.c | 102 ++++++++++++++++++++++------------------------------
 1 file changed, 42 insertions(+), 60 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index d481d5e4fe47..5618c6a4b3a3 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1159,6 +1159,37 @@ static void bio_put_pages(struct page **pages, size_t size, size_t off)
 		put_page(pages[i]);
 }
 
+static int bio_iov_add_page(struct bio *bio, struct page *page,
+		unsigned int len, unsigned int offset)
+{
+	bool same_page = false;
+
+	if (!__bio_try_merge_page(bio, page, len, offset, &same_page)) {
+		if (WARN_ON_ONCE(bio_full(bio, len)))
+			return -EINVAL;
+		__bio_add_page(bio, page, len, offset);
+		return 0;
+	}
+
+	if (same_page)
+		put_page(page);
+	return 0;
+}
+
+static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
+		unsigned int len, unsigned int offset)
+{
+	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
+	bool same_page = false;
+
+	if (bio_add_hw_page(q, bio, page, len, offset,
+			queue_max_zone_append_sectors(q), &same_page) != len)
+		return -EINVAL;
+	if (same_page)
+		put_page(page);
+	return 0;
+}
+
 #define PAGE_PTRS_PER_BVEC     (sizeof(struct bio_vec) / sizeof(struct page *))
 
 /**
@@ -1177,7 +1208,6 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
 	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
 	struct page **pages = (struct page **)bv;
-	bool same_page = false;
 	ssize_t size, left;
 	unsigned len, i;
 	size_t offset;
@@ -1186,7 +1216,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	 * Move page array up in the allocated memory for the bio vecs as far as
 	 * possible so that we can start filling biovecs from the beginning
 	 * without overwriting the temporary page array.
-	*/
+	 */
 	BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
 	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
 
@@ -1196,18 +1226,18 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 
 	for (left = size, i = 0; left > 0; left -= len, i++) {
 		struct page *page = pages[i];
+		int ret;
 
 		len = min_t(size_t, PAGE_SIZE - offset, left);
+		if (bio_op(bio) == REQ_OP_ZONE_APPEND)
+			ret = bio_iov_add_zone_append_page(bio, page, len,
+					offset);
+		else
+			ret = bio_iov_add_page(bio, page, len, offset);
 
-		if (__bio_try_merge_page(bio, page, len, offset, &same_page)) {
-			if (same_page)
-				put_page(page);
-		} else {
-			if (WARN_ON_ONCE(bio_full(bio, len))) {
-				bio_put_pages(pages + i, left, offset);
-				return -EINVAL;
-			}
-			__bio_add_page(bio, page, len, offset);
+		if (ret) {
+			bio_put_pages(pages + i, left, offset);
+			return ret;
 		}
 		offset = 0;
 	}
@@ -1216,51 +1246,6 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	return 0;
 }
 
-static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
-{
-	unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
-	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
-	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
-	unsigned int max_append_sectors = queue_max_zone_append_sectors(q);
-	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
-	struct page **pages = (struct page **)bv;
-	ssize_t size, left;
-	unsigned len, i;
-	size_t offset;
-	int ret = 0;
-
-	/*
-	 * Move page array up in the allocated memory for the bio vecs as far as
-	 * possible so that we can start filling biovecs from the beginning
-	 * without overwriting the temporary page array.
-	 */
-	BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
-	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
-
-	size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset);
-	if (unlikely(size <= 0))
-		return size ? size : -EFAULT;
-
-	for (left = size, i = 0; left > 0; left -= len, i++) {
-		struct page *page = pages[i];
-		bool same_page = false;
-
-		len = min_t(size_t, PAGE_SIZE - offset, left);
-		if (bio_add_hw_page(q, bio, page, len, offset,
-				max_append_sectors, &same_page) != len) {
-			bio_put_pages(pages + i, left, offset);
-			ret = -EINVAL;
-			break;
-		}
-		if (same_page)
-			put_page(page);
-		offset = 0;
-	}
-
-	iov_iter_advance(iter, size - left);
-	return ret;
-}
-
 /**
  * bio_iov_iter_get_pages - add user or kernel pages to a bio
  * @bio: bio to add pages to
@@ -1295,10 +1280,7 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	}
 
 	do {
-		if (bio_op(bio) == REQ_OP_ZONE_APPEND)
-			ret = __bio_iov_append_get_pages(bio, iter);
-		else
-			ret = __bio_iov_iter_get_pages(bio, iter);
+		ret = __bio_iov_iter_get_pages(bio, iter);
 	} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
 
 	/* don't account direct I/O as memory stall */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCHv6 03/11] block: export dma_alignment attribute
  2022-06-10 19:58 [PATCHv6 00/11] direct-io dma alignment Keith Busch
  2022-06-10 19:58 ` [PATCHv6 01/11] block: fix infinite loop for invalid zone append Keith Busch
  2022-06-10 19:58 ` [PATCHv6 02/11] block/bio: remove duplicate append pages code Keith Busch
@ 2022-06-10 19:58 ` Keith Busch
  2022-06-10 19:58 ` [PATCHv6 04/11] block: introduce bdev_dma_alignment helper Keith Busch
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 51+ messages in thread
From: Keith Busch @ 2022-06-10 19:58 UTC (permalink / raw)
  To: linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch, Johannes Thumshirn

From: Keith Busch <kbusch@kernel.org>

User space may want to know how to align their buffers to avoid
bouncing. Export the queue attribute.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 Documentation/ABI/stable/sysfs-block | 9 +++++++++
 block/blk-sysfs.c                    | 7 +++++++
 2 files changed, 16 insertions(+)

diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
index e8797cd09aff..cd14ecb3c9a5 100644
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@@ -260,6 +260,15 @@ Description:
 		for discards, and don't read this file.
 
 
+What:		/sys/block/<disk>/queue/dma_alignment
+Date:		May 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		Reports the alignment that user space addresses must have to be
+		used for raw block device access with O_DIRECT and other driver
+		specific passthrough mechanisms.
+
+
 What:		/sys/block/<disk>/queue/fua
 Date:		May 2018
 Contact:	linux-block@vger.kernel.org
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 88bd41d4cb59..14607565d781 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -274,6 +274,11 @@ static ssize_t queue_virt_boundary_mask_show(struct request_queue *q, char *page
 	return queue_var_show(q->limits.virt_boundary_mask, page);
 }
 
+static ssize_t queue_dma_alignment_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(queue_dma_alignment(q), page);
+}
+
 #define QUEUE_SYSFS_BIT_FNS(name, flag, neg)				\
 static ssize_t								\
 queue_##name##_show(struct request_queue *q, char *page)		\
@@ -606,6 +611,7 @@ QUEUE_RO_ENTRY(queue_dax, "dax");
 QUEUE_RW_ENTRY(queue_io_timeout, "io_timeout");
 QUEUE_RW_ENTRY(queue_wb_lat, "wbt_lat_usec");
 QUEUE_RO_ENTRY(queue_virt_boundary_mask, "virt_boundary_mask");
+QUEUE_RO_ENTRY(queue_dma_alignment, "dma_alignment");
 
 #ifdef CONFIG_BLK_DEV_THROTTLING_LOW
 QUEUE_RW_ENTRY(blk_throtl_sample_time, "throttle_sample_time");
@@ -667,6 +673,7 @@ static struct attribute *queue_attrs[] = {
 	&blk_throtl_sample_time_entry.attr,
 #endif
 	&queue_virt_boundary_mask_entry.attr,
+	&queue_dma_alignment_entry.attr,
 	NULL,
 };
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCHv6 04/11] block: introduce bdev_dma_alignment helper
  2022-06-10 19:58 [PATCHv6 00/11] direct-io dma alignment Keith Busch
                   ` (2 preceding siblings ...)
  2022-06-10 19:58 ` [PATCHv6 03/11] block: export dma_alignment attribute Keith Busch
@ 2022-06-10 19:58 ` Keith Busch
  2022-06-10 19:58 ` [PATCHv6 05/11] block: add a helper function for dio alignment Keith Busch
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 51+ messages in thread
From: Keith Busch @ 2022-06-10 19:58 UTC (permalink / raw)
  To: linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch, Johannes Thumshirn

From: Keith Busch <kbusch@kernel.org>

Preparing for upcoming dma_alignment users that have a block_device, but
don't need the request_queue.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/blkdev.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 608d577734c2..ab7e6aa17954 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1366,6 +1366,11 @@ static inline int queue_dma_alignment(const struct request_queue *q)
 	return q ? q->dma_alignment : 511;
 }
 
+static inline unsigned int bdev_dma_alignment(struct block_device *bdev)
+{
+	return queue_dma_alignment(bdev_get_queue(bdev));
+}
+
 static inline int blk_rq_aligned(struct request_queue *q, unsigned long addr,
 				 unsigned int len)
 {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCHv6 05/11] block: add a helper function for dio alignment
  2022-06-10 19:58 [PATCHv6 00/11] direct-io dma alignment Keith Busch
                   ` (3 preceding siblings ...)
  2022-06-10 19:58 ` [PATCHv6 04/11] block: introduce bdev_dma_alignment helper Keith Busch
@ 2022-06-10 19:58 ` Keith Busch
  2022-07-22 21:53   ` Bart Van Assche
  2022-06-10 19:58 ` [PATCHv6 06/11] block/merge: count bytes instead of sectors Keith Busch
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 51+ messages in thread
From: Keith Busch @ 2022-06-10 19:58 UTC (permalink / raw)
  To: linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch, Johannes Thumshirn

From: Keith Busch <kbusch@kernel.org>

This will make it easier to add more complex acceptable alignment
criteria in the future.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 block/fops.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index d6b3276a6c68..9d32df6fc315 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -42,6 +42,13 @@ static unsigned int dio_bio_write_op(struct kiocb *iocb)
 	return op;
 }
 
+static bool blkdev_dio_unaligned(struct block_device *bdev, loff_t pos,
+			      struct iov_iter *iter)
+{
+	return ((pos | iov_iter_alignment(iter)) &
+	    (bdev_logical_block_size(bdev) - 1));
+}
+
 #define DIO_INLINE_BIO_VECS 4
 
 static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
@@ -54,8 +61,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
 	struct bio bio;
 	ssize_t ret;
 
-	if ((pos | iov_iter_alignment(iter)) &
-	    (bdev_logical_block_size(bdev) - 1))
+	if (blkdev_dio_unaligned(bdev, pos, iter))
 		return -EINVAL;
 
 	if (nr_pages <= DIO_INLINE_BIO_VECS)
@@ -173,8 +179,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 	loff_t pos = iocb->ki_pos;
 	int ret = 0;
 
-	if ((pos | iov_iter_alignment(iter)) &
-	    (bdev_logical_block_size(bdev) - 1))
+	if (blkdev_dio_unaligned(bdev, pos, iter))
 		return -EINVAL;
 
 	if (iocb->ki_flags & IOCB_ALLOC_CACHE)
@@ -298,8 +303,7 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
 	loff_t pos = iocb->ki_pos;
 	int ret = 0;
 
-	if ((pos | iov_iter_alignment(iter)) &
-	    (bdev_logical_block_size(bdev) - 1))
+	if (blkdev_dio_unaligned(bdev, pos, iter))
 		return -EINVAL;
 
 	if (iocb->ki_flags & IOCB_ALLOC_CACHE)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCHv6 06/11] block/merge: count bytes instead of sectors
  2022-06-10 19:58 [PATCHv6 00/11] direct-io dma alignment Keith Busch
                   ` (4 preceding siblings ...)
  2022-06-10 19:58 ` [PATCHv6 05/11] block: add a helper function for dio alignment Keith Busch
@ 2022-06-10 19:58 ` Keith Busch
  2022-07-22 21:57   ` Bart Van Assche
  2022-06-10 19:58 ` [PATCHv6 07/11] block/bounce: " Keith Busch
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 51+ messages in thread
From: Keith Busch @ 2022-06-10 19:58 UTC (permalink / raw)
  To: linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch, Johannes Thumshirn

From: Keith Busch <kbusch@kernel.org>

Individual bv_len's may not be a sector size.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 block/blk-merge.c | 41 ++++++++++++++++++++++++-----------------
 1 file changed, 24 insertions(+), 17 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 7771dacc99cb..3874619ba136 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -201,11 +201,11 @@ static inline unsigned get_max_segment_size(const struct request_queue *q,
  * @nsegs:    [in,out] Number of segments in the bio being built. Incremented
  *            by the number of segments from @bv that may be appended to that
  *            bio without exceeding @max_segs
- * @sectors:  [in,out] Number of sectors in the bio being built. Incremented
- *            by the number of sectors from @bv that may be appended to that
- *            bio without exceeding @max_sectors
+ * @bytes:    [in,out] Number of bytes in the bio being built. Incremented
+ *            by the number of bytes from @bv that may be appended to that
+ *            bio without exceeding @max_bytes
  * @max_segs: [in] upper bound for *@nsegs
- * @max_sectors: [in] upper bound for *@sectors
+ * @max_bytes: [in] upper bound for *@bytes
  *
  * When splitting a bio, it can happen that a bvec is encountered that is too
  * big to fit in a single segment and hence that it has to be split in the
@@ -216,10 +216,10 @@ static inline unsigned get_max_segment_size(const struct request_queue *q,
  */
 static bool bvec_split_segs(const struct request_queue *q,
 			    const struct bio_vec *bv, unsigned *nsegs,
-			    unsigned *sectors, unsigned max_segs,
-			    unsigned max_sectors)
+			    unsigned *bytes, unsigned max_segs,
+			    unsigned max_bytes)
 {
-	unsigned max_len = (min(max_sectors, UINT_MAX >> 9) - *sectors) << 9;
+	unsigned max_len = min(max_bytes, UINT_MAX) - *bytes;
 	unsigned len = min(bv->bv_len, max_len);
 	unsigned total_len = 0;
 	unsigned seg_size = 0;
@@ -237,7 +237,7 @@ static bool bvec_split_segs(const struct request_queue *q,
 			break;
 	}
 
-	*sectors += total_len >> 9;
+	*bytes += total_len;
 
 	/* tell the caller to split the bvec if it is too big to fit */
 	return len > 0 || bv->bv_len > max_len;
@@ -269,8 +269,8 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 {
 	struct bio_vec bv, bvprv, *bvprvp = NULL;
 	struct bvec_iter iter;
-	unsigned nsegs = 0, sectors = 0;
-	const unsigned max_sectors = get_max_io_size(q, bio);
+	unsigned nsegs = 0, bytes = 0;
+	const unsigned max_bytes = get_max_io_size(q, bio) << 9;
 	const unsigned max_segs = queue_max_segments(q);
 
 	bio_for_each_bvec(bv, bio, iter) {
@@ -282,12 +282,12 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 			goto split;
 
 		if (nsegs < max_segs &&
-		    sectors + (bv.bv_len >> 9) <= max_sectors &&
+		    bytes + bv.bv_len <= max_bytes &&
 		    bv.bv_offset + bv.bv_len <= PAGE_SIZE) {
 			nsegs++;
-			sectors += bv.bv_len >> 9;
-		} else if (bvec_split_segs(q, &bv, &nsegs, &sectors, max_segs,
-					 max_sectors)) {
+			bytes += bv.bv_len;
+		} else if (bvec_split_segs(q, &bv, &nsegs, &bytes, max_segs,
+					   max_bytes)) {
 			goto split;
 		}
 
@@ -300,13 +300,20 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 split:
 	*segs = nsegs;
 
+	/*
+	 * Individual bvecs might not be logical block aligned. Round down the
+	 * split size so that each bio is properly block size aligned, even if
+	 * we do not use the full hardware limits.
+	 */
+	bytes = ALIGN_DOWN(bytes, queue_logical_block_size(q));
+
 	/*
 	 * Bio splitting may cause subtle trouble such as hang when doing sync
 	 * iopoll in direct IO routine. Given performance gain of iopoll for
 	 * big IO can be trival, disable iopoll when split needed.
 	 */
 	bio_clear_polled(bio);
-	return bio_split(bio, sectors, GFP_NOIO, bs);
+	return bio_split(bio, bytes >> SECTOR_SHIFT, GFP_NOIO, bs);
 }
 
 /**
@@ -375,7 +382,7 @@ EXPORT_SYMBOL(blk_queue_split);
 unsigned int blk_recalc_rq_segments(struct request *rq)
 {
 	unsigned int nr_phys_segs = 0;
-	unsigned int nr_sectors = 0;
+	unsigned int bytes = 0;
 	struct req_iterator iter;
 	struct bio_vec bv;
 
@@ -398,7 +405,7 @@ unsigned int blk_recalc_rq_segments(struct request *rq)
 	}
 
 	rq_for_each_bvec(bv, rq, iter)
-		bvec_split_segs(rq->q, &bv, &nr_phys_segs, &nr_sectors,
+		bvec_split_segs(rq->q, &bv, &nr_phys_segs, &bytes,
 				UINT_MAX, UINT_MAX);
 	return nr_phys_segs;
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCHv6 07/11] block/bounce: count bytes instead of sectors
  2022-06-10 19:58 [PATCHv6 00/11] direct-io dma alignment Keith Busch
                   ` (5 preceding siblings ...)
  2022-06-10 19:58 ` [PATCHv6 06/11] block/merge: count bytes instead of sectors Keith Busch
@ 2022-06-10 19:58 ` Keith Busch
  2022-06-13 14:22   ` Christoph Hellwig
  2022-07-22 22:01   ` Bart Van Assche
  2022-06-10 19:58 ` [PATCHv6 08/11] iov: introduce iov_iter_aligned Keith Busch
                   ` (4 subsequent siblings)
  11 siblings, 2 replies; 51+ messages in thread
From: Keith Busch @ 2022-06-10 19:58 UTC (permalink / raw)
  To: linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch, Pankaj Raghav

From: Keith Busch <kbusch@kernel.org>

Individual bv_len's may not be a sector size.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
---
 block/bounce.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/block/bounce.c b/block/bounce.c
index 8f7b6fe3b4db..fbadf179601f 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -205,19 +205,26 @@ void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig)
 	int rw = bio_data_dir(*bio_orig);
 	struct bio_vec *to, from;
 	struct bvec_iter iter;
-	unsigned i = 0;
+	unsigned i = 0, bytes = 0;
 	bool bounce = false;
-	int sectors = 0;
+	int sectors;
 
 	bio_for_each_segment(from, *bio_orig, iter) {
 		if (i++ < BIO_MAX_VECS)
-			sectors += from.bv_len >> 9;
+			bytes += from.bv_len;
 		if (PageHighMem(from.bv_page))
 			bounce = true;
 	}
 	if (!bounce)
 		return;
 
+	/*
+	 * Individual bvecs might not be logical block aligned. Round down
+	 * the split size so that each bio is properly block size aligned,
+	 * even if we do not use the full hardware limits.
+	 */
+	sectors = ALIGN_DOWN(bytes, queue_logical_block_size(q)) >>
+			SECTOR_SHIFT;
 	if (sectors < bio_sectors(*bio_orig)) {
 		bio = bio_split(*bio_orig, sectors, GFP_NOIO, &bounce_bio_split);
 		bio_chain(bio, *bio_orig);
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCHv6 08/11] iov: introduce iov_iter_aligned
  2022-06-10 19:58 [PATCHv6 00/11] direct-io dma alignment Keith Busch
                   ` (6 preceding siblings ...)
  2022-06-10 19:58 ` [PATCHv6 07/11] block/bounce: " Keith Busch
@ 2022-06-10 19:58 ` Keith Busch
  2022-06-10 19:58 ` [PATCHv6 09/11] block: introduce bdev_iter_is_aligned helper Keith Busch
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 51+ messages in thread
From: Keith Busch @ 2022-06-10 19:58 UTC (permalink / raw)
  To: linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch, Alexander Viro

From: Keith Busch <kbusch@kernel.org>

The existing iov_iter_alignment() function returns the logical OR of
address and length. For cases where address and length need to be
considered separately, introduce a helper function that a caller can
specificy length and address masks that indicate if the iov is
unaligned.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/uio.h |  2 +
 lib/iov_iter.c      | 92 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 94 insertions(+)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 739285fe5a2f..34ba4a731179 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -219,6 +219,8 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
 #endif
 
 size_t iov_iter_zero(size_t bytes, struct iov_iter *);
+bool iov_iter_is_aligned(const struct iov_iter *i, unsigned addr_mask,
+			unsigned len_mask);
 unsigned long iov_iter_alignment(const struct iov_iter *i);
 unsigned long iov_iter_gap_alignment(const struct iov_iter *i);
 void iov_iter_init(struct iov_iter *i, unsigned int direction, const struct iovec *iov,
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 6dd5330f7a99..a39b24496878 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1268,6 +1268,98 @@ void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count)
 }
 EXPORT_SYMBOL(iov_iter_discard);
 
+static bool iov_iter_aligned_iovec(const struct iov_iter *i, unsigned addr_mask,
+				   unsigned len_mask)
+{
+	size_t size = i->count;
+	size_t skip = i->iov_offset;
+	unsigned k;
+
+	for (k = 0; k < i->nr_segs; k++, skip = 0) {
+		size_t len = i->iov[k].iov_len - skip;
+
+		if (len > size)
+			len = size;
+		if (len & len_mask)
+			return false;
+		if ((unsigned long)(i->iov[k].iov_base + skip) & addr_mask)
+			return false;
+
+		size -= len;
+		if (!size)
+			break;
+	}
+	return true;
+}
+
+static bool iov_iter_aligned_bvec(const struct iov_iter *i, unsigned addr_mask,
+				  unsigned len_mask)
+{
+	size_t size = i->count;
+	unsigned skip = i->iov_offset;
+	unsigned k;
+
+	for (k = 0; k < i->nr_segs; k++, skip = 0) {
+		size_t len = i->bvec[k].bv_len - skip;
+
+		if (len > size)
+			len = size;
+		if (len & len_mask)
+			return false;
+		if ((unsigned long)(i->bvec[k].bv_offset + skip) & addr_mask)
+			return false;
+
+		size -= len;
+		if (!size)
+			break;
+	}
+	return true;
+}
+
+/**
+ * iov_iter_is_aligned() - Check if the addresses and lengths of each segments
+ * 	are aligned to the parameters.
+ *
+ * @i: &struct iov_iter to restore
+ * @addr_mask: bit mask to check against the iov element's addresses
+ * @len_mask: bit mask to check against the iov element's lengths
+ *
+ * Return: false if any addresses or lengths intersect with the provided masks
+ */
+bool iov_iter_is_aligned(const struct iov_iter *i, unsigned addr_mask,
+			 unsigned len_mask)
+{
+	if (likely(iter_is_iovec(i) || iov_iter_is_kvec(i)))
+		return iov_iter_aligned_iovec(i, addr_mask, len_mask);
+
+	if (iov_iter_is_bvec(i))
+		return iov_iter_aligned_bvec(i, addr_mask, len_mask);
+
+	if (iov_iter_is_pipe(i)) {
+		unsigned int p_mask = i->pipe->ring_size - 1;
+		size_t size = i->count;
+
+		if (size & len_mask)
+			return false;
+		if (size && allocated(&i->pipe->bufs[i->head & p_mask])) {
+			if (i->iov_offset & addr_mask)
+				return false;
+		}
+
+		return true;
+	}
+
+	if (iov_iter_is_xarray(i)) {
+		if (i->count & len_mask)
+			return false;
+		if ((i->xarray_start + i->iov_offset) & addr_mask)
+			return false;
+	}
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(iov_iter_is_aligned);
+
 static unsigned long iov_iter_alignment_iovec(const struct iov_iter *i)
 {
 	unsigned long res = 0;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCHv6 09/11] block: introduce bdev_iter_is_aligned helper
  2022-06-10 19:58 [PATCHv6 00/11] direct-io dma alignment Keith Busch
                   ` (7 preceding siblings ...)
  2022-06-10 19:58 ` [PATCHv6 08/11] iov: introduce iov_iter_aligned Keith Busch
@ 2022-06-10 19:58 ` Keith Busch
  2022-06-10 19:58 ` [PATCHv6 10/11] block: relax direct io memory alignment Keith Busch
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 51+ messages in thread
From: Keith Busch @ 2022-06-10 19:58 UTC (permalink / raw)
  To: linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch

From: Keith Busch <kbusch@kernel.org>

Provide a convenient function for this repeatable coding pattern.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/blkdev.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ab7e6aa17954..fb5c177708d5 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1371,6 +1371,13 @@ static inline unsigned int bdev_dma_alignment(struct block_device *bdev)
 	return queue_dma_alignment(bdev_get_queue(bdev));
 }
 
+static inline bool bdev_iter_is_aligned(struct block_device *bdev,
+					struct iov_iter *iter)
+{
+	return iov_iter_is_aligned(iter, bdev_dma_alignment(bdev),
+				   bdev_logical_block_size(bdev) - 1);
+}
+
 static inline int blk_rq_aligned(struct request_queue *q, unsigned long addr,
 				 unsigned int len)
 {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCHv6 10/11] block: relax direct io memory alignment
  2022-06-10 19:58 [PATCHv6 00/11] direct-io dma alignment Keith Busch
                   ` (8 preceding siblings ...)
  2022-06-10 19:58 ` [PATCHv6 09/11] block: introduce bdev_iter_is_aligned helper Keith Busch
@ 2022-06-10 19:58 ` Keith Busch
  2022-06-10 19:58 ` [PATCHv6 11/11] iomap: add support for dma aligned direct-io Keith Busch
  2022-06-13 21:22 ` [PATCHv6 00/11] direct-io dma alignment Jens Axboe
  11 siblings, 0 replies; 51+ messages in thread
From: Keith Busch @ 2022-06-10 19:58 UTC (permalink / raw)
  To: linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch

From: Keith Busch <kbusch@kernel.org>

Use the address alignment requirements from the block_device for direct
io instead of requiring addresses be aligned to the block size. User
space can discover the alignment requirements from the dma_alignment
queue attribute.

User space can specify any hardware compatible DMA offset for each
segment, but every segment length is still required to be a multiple of
the block size.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 block/bio.c            | 9 +++++++++
 block/fops.c           | 4 ++--
 include/linux/blkdev.h | 5 +++++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 5618c6a4b3a3..551f1d12208b 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1220,7 +1220,16 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
 	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
 
+	/*
+	 * Each segment in the iov is required to be a block size multiple.
+	 * However, we may not be able to get the entire segment if it spans
+	 * more pages than bi_max_vecs allows, so we have to ALIGN_DOWN the
+	 * result to ensure the bio's total size is correct. The remainder of
+	 * the iov data will be picked up in the next bio iteration.
+	 */
 	size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset);
+	if (size > 0)
+		size = ALIGN_DOWN(size, bdev_logical_block_size(bio->bi_bdev));
 	if (unlikely(size <= 0))
 		return size ? size : -EFAULT;
 
diff --git a/block/fops.c b/block/fops.c
index 9d32df6fc315..86d3cab9bf93 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -45,8 +45,8 @@ static unsigned int dio_bio_write_op(struct kiocb *iocb)
 static bool blkdev_dio_unaligned(struct block_device *bdev, loff_t pos,
 			      struct iov_iter *iter)
 {
-	return ((pos | iov_iter_alignment(iter)) &
-	    (bdev_logical_block_size(bdev) - 1));
+	return pos & (bdev_logical_block_size(bdev) - 1) ||
+		!bdev_iter_is_aligned(bdev, iter);
 }
 
 #define DIO_INLINE_BIO_VECS 4
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index fb5c177708d5..914c613d81da 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -425,6 +425,11 @@ struct request_queue {
 	unsigned long		nr_requests;	/* Max # of requests */
 
 	unsigned int		dma_pad_mask;
+	/*
+	 * Drivers that set dma_alignment to less than 511 must be prepared to
+	 * handle individual bvec's that are not a multiple of a SECTOR_SIZE
+	 * due to possible offsets.
+	 */
 	unsigned int		dma_alignment;
 
 #ifdef CONFIG_BLK_INLINE_ENCRYPTION
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCHv6 11/11] iomap: add support for dma aligned direct-io
  2022-06-10 19:58 [PATCHv6 00/11] direct-io dma alignment Keith Busch
                   ` (9 preceding siblings ...)
  2022-06-10 19:58 ` [PATCHv6 10/11] block: relax direct io memory alignment Keith Busch
@ 2022-06-10 19:58 ` Keith Busch
  2022-06-23 18:29   ` Eric Farman
  2022-07-22  7:36     ` [f2fs-dev] " Eric Biggers
  2022-06-13 21:22 ` [PATCHv6 00/11] direct-io dma alignment Jens Axboe
  11 siblings, 2 replies; 51+ messages in thread
From: Keith Busch @ 2022-06-10 19:58 UTC (permalink / raw)
  To: linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch

From: Keith Busch <kbusch@kernel.org>

Use the address alignment requirements from the block_device for direct
io instead of requiring addresses be aligned to the block size.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/direct-io.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 370c3241618a..5d098adba443 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 	struct inode *inode = iter->inode;
 	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
 	unsigned int fs_block_size = i_blocksize(inode), pad;
-	unsigned int align = iov_iter_alignment(dio->submit.iter);
 	loff_t length = iomap_length(iter);
 	loff_t pos = iter->pos;
 	unsigned int bio_opf;
@@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 	size_t copied = 0;
 	size_t orig_count;
 
-	if ((pos | length | align) & ((1 << blkbits) - 1))
+	if ((pos | length) & ((1 << blkbits) - 1) ||
+	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
 		return -EINVAL;
 
 	if (iomap->type == IOMAP_UNWRITTEN) {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 07/11] block/bounce: count bytes instead of sectors
  2022-06-10 19:58 ` [PATCHv6 07/11] block/bounce: " Keith Busch
@ 2022-06-13 14:22   ` Christoph Hellwig
  2022-07-22 22:01   ` Bart Van Assche
  1 sibling, 0 replies; 51+ messages in thread
From: Christoph Hellwig @ 2022-06-13 14:22 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-fsdevel, linux-block, linux-nvme, axboe, Kernel Team, hch,
	bvanassche, damien.lemoal, ebiggers, pankydev8, Keith Busch,
	Pankaj Raghav

On Fri, Jun 10, 2022 at 12:58:26PM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Individual bv_len's may not be a sector size.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 00/11] direct-io dma alignment
  2022-06-10 19:58 [PATCHv6 00/11] direct-io dma alignment Keith Busch
                   ` (10 preceding siblings ...)
  2022-06-10 19:58 ` [PATCHv6 11/11] iomap: add support for dma aligned direct-io Keith Busch
@ 2022-06-13 21:22 ` Jens Axboe
  11 siblings, 0 replies; 51+ messages in thread
From: Jens Axboe @ 2022-06-13 21:22 UTC (permalink / raw)
  To: kbusch, linux-block, linux-nvme, linux-fsdevel
  Cc: bvanassche, ebiggers, damien.lemoal, Kernel-team, pankydev8,
	kbusch, Christoph Hellwig

On Fri, 10 Jun 2022 12:58:19 -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> The previous version is available here:
> 
>   https://lore.kernel.org/linux-block/Yp4qQRI5awiycml1@kbusch-mbp.dhcp.thefacebook.com/T/#m0a93b6392038aad6144e066fb5ada2cbf316f78e
> 
> Changes from the previous are all trivial changes:
> 
> [...]

Applied, thanks!

[01/11] block: fix infinite loop for invalid zone append
        commit: 1180b55c93f6b060ad930db151fe6d2b425f9215
[02/11] block/bio: remove duplicate append pages code
        commit: 7a2b81b95a89e578343b1c944ddd64d1b14ee49a
[03/11] block: export dma_alignment attribute
        commit: 5f507439f051daaa1e3273ff536afda3ad1f1505
[04/11] block: introduce bdev_dma_alignment helper
        commit: 24b10a6e0bc22619535b0ed982b7735910981661
[05/11] block: add a helper function for dio alignment
        commit: 8a39418810a65f0bcbe559261ef011fe0e298eeb
[06/11] block/merge: count bytes instead of sectors
        commit: 4ff782f24a4cad4b033d0f4f6e38cd50e0d463b0
[07/11] block/bounce: count bytes instead of sectors
        commit: 4b5310470e72d77c9b52f8544b98aa8cf77d956f
[08/11] iov: introduce iov_iter_aligned
        commit: ab7c0c3abb2e5fa15655e4b87bb7b937ca7e18c3
[09/11] block: introduce bdev_iter_is_aligned helper
        commit: 72230944b7a53280c1f351a0d5cafed12732ec21
[10/11] block: relax direct io memory alignment
        commit: 84f970d415ef4d048e664ac308792eb93d0152fc
[11/11] iomap: add support for dma aligned direct-io
        commit: 40e11e7a6cc74f11b5ca23ceefec7c84af5c4c73

Best regards,
-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
  2022-06-10 19:58 ` [PATCHv6 11/11] iomap: add support for dma aligned direct-io Keith Busch
@ 2022-06-23 18:29   ` Eric Farman
  2022-06-23 18:51     ` Keith Busch
  2022-07-22  7:36     ` [f2fs-dev] " Eric Biggers
  1 sibling, 1 reply; 51+ messages in thread
From: Eric Farman @ 2022-06-23 18:29 UTC (permalink / raw)
  To: Keith Busch, linux-fsdevel, linux-block, linux-nvme,
	Christian Borntraeger
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch

On Fri, 2022-06-10 at 12:58 -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Use the address alignment requirements from the block_device for
> direct
> io instead of requiring addresses be aligned to the block size.

Hi Keith,

Our s390 PV guests recently started failing to boot from a -next host,
and git blame brought me here.

As near as I have been able to tell, we start tripping up on this code
from patch 9 [1] that gets invoked with this patch:

>	for (k = 0; k < i->nr_segs; k++, skip = 0) {
>		size_t len = i->iov[k].iov_len - skip;
>
>		if (len > size)
>			len = size;
>		if (len & len_mask)
>			return false;

The iovec we're failing on has two segments, one with a len of x200
(and base of x...000) and another with a len of xe00 (and a base of
x...200), while len_mask is of course xfff.

So before I go any further on what we might have broken, do you happen
to have any suggestions what might be going on here, or something I
should try?

Thanks,
Eric

[1] https://lore.kernel.org/r/20220610195830.3574005-9-kbusch@fb.com/

> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/direct-io.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 370c3241618a..5d098adba443 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct
> iomap_iter *iter,
>  	struct inode *inode = iter->inode;
>  	unsigned int blkbits =
> blksize_bits(bdev_logical_block_size(iomap->bdev));
>  	unsigned int fs_block_size = i_blocksize(inode), pad;
> -	unsigned int align = iov_iter_alignment(dio->submit.iter);
>  	loff_t length = iomap_length(iter);
>  	loff_t pos = iter->pos;
>  	unsigned int bio_opf;
> @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct
> iomap_iter *iter,
>  	size_t copied = 0;
>  	size_t orig_count;
>  
> -	if ((pos | length | align) & ((1 << blkbits) - 1))
> +	if ((pos | length) & ((1 << blkbits) - 1) ||
> +	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
>  		return -EINVAL;
>  
>  	if (iomap->type == IOMAP_UNWRITTEN) {


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
  2022-06-23 18:29   ` Eric Farman
@ 2022-06-23 18:51     ` Keith Busch
  2022-06-23 19:11       ` Keith Busch
  0 siblings, 1 reply; 51+ messages in thread
From: Keith Busch @ 2022-06-23 18:51 UTC (permalink / raw)
  To: Eric Farman
  Cc: Keith Busch, linux-fsdevel, linux-block, linux-nvme,
	Christian Borntraeger, axboe, Kernel Team, hch, bvanassche,
	damien.lemoal, ebiggers, pankydev8

On Thu, Jun 23, 2022 at 02:29:13PM -0400, Eric Farman wrote:
> On Fri, 2022-06-10 at 12:58 -0700, Keith Busch wrote:
> > From: Keith Busch <kbusch@kernel.org>
> > 
> > Use the address alignment requirements from the block_device for
> > direct
> > io instead of requiring addresses be aligned to the block size.
> 
> Hi Keith,
> 
> Our s390 PV guests recently started failing to boot from a -next host,
> and git blame brought me here.
> 
> As near as I have been able to tell, we start tripping up on this code
> from patch 9 [1] that gets invoked with this patch:
> 
> >	for (k = 0; k < i->nr_segs; k++, skip = 0) {
> >		size_t len = i->iov[k].iov_len - skip;
> >
> >		if (len > size)
> >			len = size;
> >		if (len & len_mask)
> >			return false;
> 
> The iovec we're failing on has two segments, one with a len of x200
> (and base of x...000) and another with a len of xe00 (and a base of
> x...200), while len_mask is of course xfff.
> 
> So before I go any further on what we might have broken, do you happen
> to have any suggestions what might be going on here, or something I
> should try?

Thanks for the notice, sorry for the trouble. This check wasn't intended to
have any difference from the previous code with respect to the vector lengths.

Could you tell me if you're accessing this through the block device direct-io,
or through iomap filesystem?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
  2022-06-23 18:51     ` Keith Busch
@ 2022-06-23 19:11       ` Keith Busch
  2022-06-23 20:32         ` Eric Farman
  0 siblings, 1 reply; 51+ messages in thread
From: Keith Busch @ 2022-06-23 19:11 UTC (permalink / raw)
  To: Eric Farman
  Cc: Keith Busch, linux-fsdevel, linux-block, linux-nvme,
	Christian Borntraeger, axboe, Kernel Team, hch, bvanassche,
	damien.lemoal, ebiggers, pankydev8

On Thu, Jun 23, 2022 at 12:51:08PM -0600, Keith Busch wrote:
> On Thu, Jun 23, 2022 at 02:29:13PM -0400, Eric Farman wrote:
> > On Fri, 2022-06-10 at 12:58 -0700, Keith Busch wrote:
> > > From: Keith Busch <kbusch@kernel.org>
> > > 
> > > Use the address alignment requirements from the block_device for
> > > direct
> > > io instead of requiring addresses be aligned to the block size.
> > 
> > Hi Keith,
> > 
> > Our s390 PV guests recently started failing to boot from a -next host,
> > and git blame brought me here.
> > 
> > As near as I have been able to tell, we start tripping up on this code
> > from patch 9 [1] that gets invoked with this patch:
> > 
> > >	for (k = 0; k < i->nr_segs; k++, skip = 0) {
> > >		size_t len = i->iov[k].iov_len - skip;
> > >
> > >		if (len > size)
> > >			len = size;
> > >		if (len & len_mask)
> > >			return false;
> > 
> > The iovec we're failing on has two segments, one with a len of x200
> > (and base of x...000) and another with a len of xe00 (and a base of
> > x...200), while len_mask is of course xfff.
> > 
> > So before I go any further on what we might have broken, do you happen
> > to have any suggestions what might be going on here, or something I
> > should try?
> 
> Thanks for the notice, sorry for the trouble. This check wasn't intended to
> have any difference from the previous code with respect to the vector lengths.
> 
> Could you tell me if you're accessing this through the block device direct-io,
> or through iomap filesystem?

If using iomap, the previous check was this:

	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
	unsigned int align = iov_iter_alignment(dio->submit.iter);
	...
	if ((pos | length | align) & ((1 << blkbits) - 1))
		return -EINVAL;

And if raw block device, it was this:

	if ((pos | iov_iter_alignment(iter)) &
	    (bdev_logical_block_size(bdev) - 1))
		return -EINVAL;

The result of "iov_iter_alignment()" would include "0xe00 | 0x200" in your
example, and checked against 0xfff should have been failing prior to this
patch. Unless I'm missing something...

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
  2022-06-23 19:11       ` Keith Busch
@ 2022-06-23 20:32         ` Eric Farman
  2022-06-23 21:34           ` Eric Farman
  0 siblings, 1 reply; 51+ messages in thread
From: Eric Farman @ 2022-06-23 20:32 UTC (permalink / raw)
  To: Keith Busch
  Cc: Keith Busch, linux-fsdevel, linux-block, linux-nvme,
	Christian Borntraeger, axboe, Kernel Team, hch, bvanassche,
	damien.lemoal, ebiggers, pankydev8

On Thu, 2022-06-23 at 13:11 -0600, Keith Busch wrote:
> On Thu, Jun 23, 2022 at 12:51:08PM -0600, Keith Busch wrote:
> > On Thu, Jun 23, 2022 at 02:29:13PM -0400, Eric Farman wrote:
> > > On Fri, 2022-06-10 at 12:58 -0700, Keith Busch wrote:
> > > > From: Keith Busch <kbusch@kernel.org>
> > > > 
> > > > Use the address alignment requirements from the block_device
> > > > for
> > > > direct
> > > > io instead of requiring addresses be aligned to the block size.
> > > 
> > > Hi Keith,
> > > 
> > > Our s390 PV guests recently started failing to boot from a -next
> > > host,
> > > and git blame brought me here.
> > > 
> > > As near as I have been able to tell, we start tripping up on this
> > > code
> > > from patch 9 [1] that gets invoked with this patch:
> > > 
> > > > 	for (k = 0; k < i->nr_segs; k++, skip = 0) {
> > > > 		size_t len = i->iov[k].iov_len - skip;
> > > > 
> > > > 		if (len > size)
> > > > 			len = size;
> > > > 		if (len & len_mask)
> > > > 			return false;
> > > 
> > > The iovec we're failing on has two segments, one with a len of
> > > x200
> > > (and base of x...000) and another with a len of xe00 (and a base
> > > of
> > > x...200), while len_mask is of course xfff.
> > > 
> > > So before I go any further on what we might have broken, do you
> > > happen
> > > to have any suggestions what might be going on here, or something
> > > I
> > > should try?
> > 
> > Thanks for the notice, sorry for the trouble. This check wasn't
> > intended to
> > have any difference from the previous code with respect to the
> > vector lengths.
> > 
> > Could you tell me if you're accessing this through the block device
> > direct-io,
> > or through iomap filesystem?

Reasonably certain the failure's on iomap. I'd reverted the subject
patch from next-20220622 and got things in working order.

> 
> If using iomap, the previous check was this:
> 
> 	unsigned int blkbits =
> blksize_bits(bdev_logical_block_size(iomap->bdev));
> 	unsigned int align = iov_iter_alignment(dio->submit.iter);
> 	...
> 	if ((pos | length | align) & ((1 << blkbits) - 1))
> 		return -EINVAL;
> 
> 
...
> The result of "iov_iter_alignment()" would include "0xe00 | 0x200" in
> your
> example, and checked against 0xfff should have been failing prior to
> this
> patch. Unless I'm missing something...

Nope, you're not. I didn't look back at what the old check was doing,
just saw "0xe00 and 0x200" and thought "oh there's one page" instead of
noting the code was or'ing them. My bad.

That was the last entry in my trace before the guest gave up, as
everything else through this code up to that point seemed okay. I'll
pick up the working case and see if I can get a clearer picture between
the two.



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
  2022-06-23 20:32         ` Eric Farman
@ 2022-06-23 21:34           ` Eric Farman
  2022-06-27 15:21             ` Eric Farman
  0 siblings, 1 reply; 51+ messages in thread
From: Eric Farman @ 2022-06-23 21:34 UTC (permalink / raw)
  To: Keith Busch
  Cc: Keith Busch, linux-fsdevel, linux-block, linux-nvme,
	Christian Borntraeger, axboe, Kernel Team, hch, bvanassche,
	damien.lemoal, ebiggers, pankydev8

On Thu, 2022-06-23 at 16:32 -0400, Eric Farman wrote:
> On Thu, 2022-06-23 at 13:11 -0600, Keith Busch wrote:
> > On Thu, Jun 23, 2022 at 12:51:08PM -0600, Keith Busch wrote:
> > > On Thu, Jun 23, 2022 at 02:29:13PM -0400, Eric Farman wrote:
> > > > On Fri, 2022-06-10 at 12:58 -0700, Keith Busch wrote:
> > > > > From: Keith Busch <kbusch@kernel.org>
> > > > > 
> > > > > Use the address alignment requirements from the block_device
> > > > > for
> > > > > direct
> > > > > io instead of requiring addresses be aligned to the block
> > > > > size.
> > > > 
> > > > Hi Keith,
> > > > 
> > > > Our s390 PV guests recently started failing to boot from a
> > > > -next
> > > > host,
> > > > and git blame brought me here.
> > > > 
> > > > As near as I have been able to tell, we start tripping up on
> > > > this
> > > > code
> > > > from patch 9 [1] that gets invoked with this patch:
> > > > 
> > > > > 	for (k = 0; k < i->nr_segs; k++, skip = 0) {
> > > > > 		size_t len = i->iov[k].iov_len - skip;
> > > > > 
> > > > > 		if (len > size)
> > > > > 			len = size;
> > > > > 		if (len & len_mask)
> > > > > 			return false;
> > > > 
> > > > The iovec we're failing on has two segments, one with a len of
> > > > x200
> > > > (and base of x...000) and another with a len of xe00 (and a
> > > > base
> > > > of
> > > > x...200), while len_mask is of course xfff.
> > > > 
> > > > So before I go any further on what we might have broken, do you
> > > > happen
> > > > to have any suggestions what might be going on here, or
> > > > something
> > > > I
> > > > should try?
> > > 
> > > Thanks for the notice, sorry for the trouble. This check wasn't
> > > intended to
> > > have any difference from the previous code with respect to the
> > > vector lengths.
> > > 
> > > Could you tell me if you're accessing this through the block
> > > device
> > > direct-io,
> > > or through iomap filesystem?
> 
> Reasonably certain the failure's on iomap. I'd reverted the subject
> patch from next-20220622 and got things in working order.
> 
> > If using iomap, the previous check was this:
> > 
> > 	unsigned int blkbits =
> > blksize_bits(bdev_logical_block_size(iomap->bdev));
> > 	unsigned int align = iov_iter_alignment(dio->submit.iter);
> > 	...
> > 	if ((pos | length | align) & ((1 << blkbits) - 1))
> > 		return -EINVAL;
> > 
> > 
> ...
> > The result of "iov_iter_alignment()" would include "0xe00 | 0x200"
> > in
> > your
> > example, and checked against 0xfff should have been failing prior
> > to
> > this
> > patch. Unless I'm missing something...
> 
> Nope, you're not. I didn't look back at what the old check was doing,
> just saw "0xe00 and 0x200" and thought "oh there's one page" instead
> of
> noting the code was or'ing them. My bad.
> 
> That was the last entry in my trace before the guest gave up, as
> everything else through this code up to that point seemed okay. I'll
> pick up the working case and see if I can get a clearer picture
> between
> the two.

Looking over the trace again, I realize I did dump iov_iter_alignment()
as a comparator, and I see one pass through that had a non-zero
response but bdev_iter_is_aligned() returned true...

count = x1000
iov_offset = x0
nr_segs = 1
iov_len = x1000	(len_mask = xfff)
iov_base = x...200 (addr_mask = x1ff)

That particular pass through is in the middle of the stuff it tried to
do, so I don't know if that's the cause or not but it strikes me as
unusual. Will look into that tomorrow and report back.

> 


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
  2022-06-23 21:34           ` Eric Farman
@ 2022-06-27 15:21             ` Eric Farman
  2022-06-27 15:36               ` Keith Busch
  0 siblings, 1 reply; 51+ messages in thread
From: Eric Farman @ 2022-06-27 15:21 UTC (permalink / raw)
  To: Keith Busch
  Cc: Keith Busch, linux-fsdevel, linux-block, linux-nvme,
	Christian Borntraeger, axboe, Kernel Team, hch, bvanassche,
	damien.lemoal, ebiggers, pankydev8, Halil Pasic

On Thu, 2022-06-23 at 17:34 -0400, Eric Farman wrote:
> On Thu, 2022-06-23 at 16:32 -0400, Eric Farman wrote:
> > On Thu, 2022-06-23 at 13:11 -0600, Keith Busch wrote:
> > > On Thu, Jun 23, 2022 at 12:51:08PM -0600, Keith Busch wrote:
> > > > On Thu, Jun 23, 2022 at 02:29:13PM -0400, Eric Farman wrote:
> > > > > On Fri, 2022-06-10 at 12:58 -0700, Keith Busch wrote:
> > > > > > From: Keith Busch <kbusch@kernel.org>
> > > > > > 
> > > > > > Use the address alignment requirements from the
> > > > > > block_device
> > > > > > for
> > > > > > direct
> > > > > > io instead of requiring addresses be aligned to the block
> > > > > > size.
> > > > > 
> > > > > Hi Keith,
> > > > > 
> > > > > Our s390 PV guests recently started failing to boot from a
> > > > > -next
> > > > > host,
> > > > > and git blame brought me here.
> > > > > 
> > > > > As near as I have been able to tell, we start tripping up on
> > > > > this
> > > > > code
> > > > > from patch 9 [1] that gets invoked with this patch:
> > > > > 
> > > > > > 	for (k = 0; k < i->nr_segs; k++, skip = 0) {
> > > > > > 		size_t len = i->iov[k].iov_len - skip;
> > > > > > 
> > > > > > 		if (len > size)
> > > > > > 			len = size;
> > > > > > 		if (len & len_mask)
> > > > > > 			return false;
> > > > > 
> > > > > The iovec we're failing on has two segments, one with a len
> > > > > of
> > > > > x200
> > > > > (and base of x...000) and another with a len of xe00 (and a
> > > > > base
> > > > > of
> > > > > x...200), while len_mask is of course xfff.
> > > > > 
> > > > > So before I go any further on what we might have broken, do
> > > > > you
> > > > > happen
> > > > > to have any suggestions what might be going on here, or
> > > > > something
> > > > > I
> > > > > should try?
> > > > 
> > > > Thanks for the notice, sorry for the trouble. This check wasn't
> > > > intended to
> > > > have any difference from the previous code with respect to the
> > > > vector lengths.
> > > > 
> > > > Could you tell me if you're accessing this through the block
> > > > device
> > > > direct-io,
> > > > or through iomap filesystem?
> > 
> > Reasonably certain the failure's on iomap. I'd reverted the subject
> > patch from next-20220622 and got things in working order.
> > 
> > > If using iomap, the previous check was this:
> > > 
> > > 	unsigned int blkbits =
> > > blksize_bits(bdev_logical_block_size(iomap->bdev));
> > > 	unsigned int align = iov_iter_alignment(dio->submit.iter);
> > > 	...
> > > 	if ((pos | length | align) & ((1 << blkbits) - 1))
> > > 		return -EINVAL;
> > > 
> > > 
> > ...
> > > The result of "iov_iter_alignment()" would include "0xe00 |
> > > 0x200"
> > > in
> > > your
> > > example, and checked against 0xfff should have been failing prior
> > > to
> > > this
> > > patch. Unless I'm missing something...
> > 
> > Nope, you're not. I didn't look back at what the old check was
> > doing,
> > just saw "0xe00 and 0x200" and thought "oh there's one page"
> > instead
> > of
> > noting the code was or'ing them. My bad.
> > 
> > That was the last entry in my trace before the guest gave up, as
> > everything else through this code up to that point seemed okay.
> > I'll
> > pick up the working case and see if I can get a clearer picture
> > between
> > the two.
> 
> Looking over the trace again, I realize I did dump
> iov_iter_alignment()
> as a comparator, and I see one pass through that had a non-zero
> response but bdev_iter_is_aligned() returned true...
> 
> count = x1000
> iov_offset = x0
> nr_segs = 1
> iov_len = x1000	(len_mask = xfff)
> iov_base = x...200 (addr_mask = x1ff)
> 
> That particular pass through is in the middle of the stuff it tried
> to
> do, so I don't know if that's the cause or not but it strikes me as
> unusual. Will look into that tomorrow and report back.
> 

Apologies, it took me an extra day to get back to this, but it is
indeed this pass through that's causing our boot failures. I note that
the old code (in iomap_dio_bio_iter), did:

        if ((pos | length | align) & ((1 << blkbits) - 1))
                return -EINVAL;

With blkbits equal to 12, the resulting mask was 0x0fff against an
align value (from iov_iter_alignment) of x200 kicks us out.

The new code (in iov_iter_aligned_iovec), meanwhile, compares this:

                if ((unsigned long)(i->iov[k].iov_base + skip) &
addr_mask)
                        return false;

iov_base (and the output of the old iov_iter_aligned_iovec() routine)
is x200, but since addr_mask is x1ff this check provides a different
response than it used to.

To check this, I changed the comparator to len_mask (almost certainly
not the right answer since addr_mask is then unused, but it was good
for a quick test), and our PV guests are able to boot again with -next
running in the host.

Thanks,
Eric


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
  2022-06-27 15:21             ` Eric Farman
@ 2022-06-27 15:36               ` Keith Busch
  2022-06-28  9:00                 ` Halil Pasic
  0 siblings, 1 reply; 51+ messages in thread
From: Keith Busch @ 2022-06-27 15:36 UTC (permalink / raw)
  To: Eric Farman
  Cc: Keith Busch, linux-fsdevel, linux-block, linux-nvme,
	Christian Borntraeger, axboe, Kernel Team, hch, bvanassche,
	damien.lemoal, ebiggers, pankydev8, Halil Pasic

On Mon, Jun 27, 2022 at 11:21:20AM -0400, Eric Farman wrote:
> 
> Apologies, it took me an extra day to get back to this, but it is
> indeed this pass through that's causing our boot failures. I note that
> the old code (in iomap_dio_bio_iter), did:
> 
>         if ((pos | length | align) & ((1 << blkbits) - 1))
>                 return -EINVAL;
> 
> With blkbits equal to 12, the resulting mask was 0x0fff against an
> align value (from iov_iter_alignment) of x200 kicks us out.
> 
> The new code (in iov_iter_aligned_iovec), meanwhile, compares this:
> 
>                 if ((unsigned long)(i->iov[k].iov_base + skip) &
> addr_mask)
>                         return false;
> 
> iov_base (and the output of the old iov_iter_aligned_iovec() routine)
> is x200, but since addr_mask is x1ff this check provides a different
> response than it used to.
> 
> To check this, I changed the comparator to len_mask (almost certainly
> not the right answer since addr_mask is then unused, but it was good
> for a quick test), and our PV guests are able to boot again with -next
> running in the host.

This raises more questions for me. It sounds like your process used to get an
EINVAL error, and it wants to continue getting an EINVAL error instead of
letting the direct-io request proceed. Is that correct? If so, could you
provide more details on what issue occurs with dispatching this request?

If you really need to restrict address' alignment to the storage's logical
block size, I think your storage driver needs to set the dma_alignment queue
limit to that value.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
  2022-06-27 15:36               ` Keith Busch
@ 2022-06-28  9:00                 ` Halil Pasic
  2022-06-28 15:20                   ` Eric Farman
  0 siblings, 1 reply; 51+ messages in thread
From: Halil Pasic @ 2022-06-28  9:00 UTC (permalink / raw)
  To: Keith Busch
  Cc: Eric Farman, Keith Busch, linux-fsdevel, linux-block, linux-nvme,
	Christian Borntraeger, axboe, Kernel Team, hch, bvanassche,
	damien.lemoal, ebiggers, pankydev8, Halil Pasic

On Mon, 27 Jun 2022 09:36:56 -0600
Keith Busch <kbusch@kernel.org> wrote:

> On Mon, Jun 27, 2022 at 11:21:20AM -0400, Eric Farman wrote:
> > 
> > Apologies, it took me an extra day to get back to this, but it is
> > indeed this pass through that's causing our boot failures. I note that
> > the old code (in iomap_dio_bio_iter), did:
> > 
> >         if ((pos | length | align) & ((1 << blkbits) - 1))
> >                 return -EINVAL;
> > 
> > With blkbits equal to 12, the resulting mask was 0x0fff against an
> > align value (from iov_iter_alignment) of x200 kicks us out.
> > 
> > The new code (in iov_iter_aligned_iovec), meanwhile, compares this:
> > 
> >                 if ((unsigned long)(i->iov[k].iov_base + skip) &
> > addr_mask)
> >                         return false;
> > 
> > iov_base (and the output of the old iov_iter_aligned_iovec() routine)
> > is x200, but since addr_mask is x1ff this check provides a different
> > response than it used to.
> > 
> > To check this, I changed the comparator to len_mask (almost certainly
> > not the right answer since addr_mask is then unused, but it was good
> > for a quick test), and our PV guests are able to boot again with -next
> > running in the host.  
> 
> This raises more questions for me. It sounds like your process used to get an
> EINVAL error, and it wants to continue getting an EINVAL error instead of
> letting the direct-io request proceed. Is that correct? 

Is my understanding as well. But I'm not familiar enough with the code to
tell where and how that -EINVAL gets handled.

BTW let me just point out that the bounce buffering via swiotlb needed
for PV is not unlikely to mess up the alignment of things. But I'm not
sure if that is relevant here.

Regards,
Halil

> If so, could you
> provide more details on what issue occurs with dispatching this request?
> 
> If you really need to restrict address' alignment to the storage's logical
> block size, I think your storage driver needs to set the dma_alignment queue
> limit to that value.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
  2022-06-28  9:00                 ` Halil Pasic
@ 2022-06-28 15:20                   ` Eric Farman
  2022-06-29  3:18                     ` Eric Farman
  0 siblings, 1 reply; 51+ messages in thread
From: Eric Farman @ 2022-06-28 15:20 UTC (permalink / raw)
  To: Halil Pasic, Keith Busch
  Cc: Keith Busch, linux-fsdevel, linux-block, linux-nvme,
	Christian Borntraeger, axboe, Kernel Team, hch, bvanassche,
	damien.lemoal, ebiggers, pankydev8

On Tue, 2022-06-28 at 11:00 +0200, Halil Pasic wrote:
> On Mon, 27 Jun 2022 09:36:56 -0600
> Keith Busch <kbusch@kernel.org> wrote:
> 
> > On Mon, Jun 27, 2022 at 11:21:20AM -0400, Eric Farman wrote:
> > > Apologies, it took me an extra day to get back to this, but it is
> > > indeed this pass through that's causing our boot failures. I note
> > > that
> > > the old code (in iomap_dio_bio_iter), did:
> > > 
> > >         if ((pos | length | align) & ((1 << blkbits) - 1))
> > >                 return -EINVAL;
> > > 
> > > With blkbits equal to 12, the resulting mask was 0x0fff against
> > > an
> > > align value (from iov_iter_alignment) of x200 kicks us out.
> > > 
> > > The new code (in iov_iter_aligned_iovec), meanwhile, compares
> > > this:
> > > 
> > >                 if ((unsigned long)(i->iov[k].iov_base + skip) &
> > > addr_mask)
> > >                         return false;
> > > 
> > > iov_base (and the output of the old iov_iter_aligned_iovec()
> > > routine)
> > > is x200, but since addr_mask is x1ff this check provides a
> > > different
> > > response than it used to.
> > > 
> > > To check this, I changed the comparator to len_mask (almost
> > > certainly
> > > not the right answer since addr_mask is then unused, but it was
> > > good
> > > for a quick test), and our PV guests are able to boot again with
> > > -next
> > > running in the host.  
> > 
> > This raises more questions for me. It sounds like your process used
> > to get an
> > EINVAL error, and it wants to continue getting an EINVAL error
> > instead of
> > letting the direct-io request proceed. Is that correct? 
> 
> Is my understanding as well. But I'm not familiar enough with the
> code to
> tell where and how that -EINVAL gets handled.
> 
> BTW let me just point out that the bounce buffering via swiotlb
> needed
> for PV is not unlikely to mess up the alignment of things. But I'm
> not
> sure if that is relevant here.
> 
> Regards,
> Halil
> 
> > If so, could you
> > provide more details on what issue occurs with dispatching this
> > request?

This error occurs reading the initial boot record for a guest, stating
QEMU was unable to read block zero from the device. The code that
complains doesn't appear to have anything that says "oh, got EINVAL,
try it this other way" but I haven't chased down if/where something in
between is expecting that and handling it in some unique way. I -think-
 I have an easier reproducer now, so maybe I'd be able to get a better
answer to this question.

> > 
> > If you really need to restrict address' alignment to the storage's
> > logical
> > block size, I think your storage driver needs to set the
> > dma_alignment queue
> > limit to that value.

It's possible that there's a problem in the virtio stack here, but the
failing configuration is a qcow image on the host rootfs, so it's not
using any distinct driver. The bdev request queue that ends up being
used is the same allocated out of blk_alloc_queue, so changing
dma_alignment there wouldn't work.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
  2022-06-28 15:20                   ` Eric Farman
@ 2022-06-29  3:18                     ` Eric Farman
  2022-06-29  3:52                       ` Keith Busch
  0 siblings, 1 reply; 51+ messages in thread
From: Eric Farman @ 2022-06-29  3:18 UTC (permalink / raw)
  To: Halil Pasic, Keith Busch
  Cc: Keith Busch, linux-fsdevel, linux-block, linux-nvme,
	Christian Borntraeger, axboe, Kernel Team, hch, bvanassche,
	damien.lemoal, ebiggers, pankydev8

On Tue, 2022-06-28 at 11:20 -0400, Eric Farman wrote:
> On Tue, 2022-06-28 at 11:00 +0200, Halil Pasic wrote:
> > On Mon, 27 Jun 2022 09:36:56 -0600
> > Keith Busch <kbusch@kernel.org> wrote:
> > 
> > > On Mon, Jun 27, 2022 at 11:21:20AM -0400, Eric Farman wrote:
> > > > Apologies, it took me an extra day to get back to this, but it
> > > > is
> > > > indeed this pass through that's causing our boot failures. I
> > > > note
> > > > that
> > > > the old code (in iomap_dio_bio_iter), did:
> > > > 
> > > >         if ((pos | length | align) & ((1 << blkbits) - 1))
> > > >                 return -EINVAL;
> > > > 
> > > > With blkbits equal to 12, the resulting mask was 0x0fff against
> > > > an
> > > > align value (from iov_iter_alignment) of x200 kicks us out.
> > > > 
> > > > The new code (in iov_iter_aligned_iovec), meanwhile, compares
> > > > this:
> > > > 
> > > >                 if ((unsigned long)(i->iov[k].iov_base + skip)
> > > > &
> > > > addr_mask)
> > > >                         return false;
> > > > 
> > > > iov_base (and the output of the old iov_iter_aligned_iovec()
> > > > routine)
> > > > is x200, but since addr_mask is x1ff this check provides a
> > > > different
> > > > response than it used to.
> > > > 
> > > > To check this, I changed the comparator to len_mask (almost
> > > > certainly
> > > > not the right answer since addr_mask is then unused, but it was
> > > > good
> > > > for a quick test), and our PV guests are able to boot again
> > > > with
> > > > -next
> > > > running in the host.  
> > > 
> > > This raises more questions for me. It sounds like your process
> > > used
> > > to get an
> > > EINVAL error, and it wants to continue getting an EINVAL error
> > > instead of
> > > letting the direct-io request proceed. Is that correct? 

Sort of. In the working case, I see a set of iovecs come through with
different counts:

base	count
0000	0001
0000	0200
0000	0400
0000	0800
0000	1000
0001	1000
0200	1000 << Change occurs here
0400	1000
0800	1000
1000	1000

EINVAL was being returned for any of these iovecs except the page-
aligned ones. Once the x200 request returns 0, the remainder of the
above list was skipped and the requests continue elsewhere on the file.

Still not sure how our request is getting us into this process. We're
simply asking to read a single block, but that's somewhere within an
image file.

> > 
> > Is my understanding as well. But I'm not familiar enough with the
> > code to
> > tell where and how that -EINVAL gets handled.
> > 
> > BTW let me just point out that the bounce buffering via swiotlb
> > needed
> > for PV is not unlikely to mess up the alignment of things. But I'm
> > not
> > sure if that is relevant here.

It's true that PV guests were the first to trip over this, but I've
since been able to reproduce this with a normal guest. So long as the
image file is connected with cache.direct=true, it's unbootable. That
should absolve the swiotlb bits from being at fault here.

> > 
> > Regards,
> > Halil
> > 
> > > If so, could you
> > > provide more details on what issue occurs with dispatching this
> > > request?
> 
> This error occurs reading the initial boot record for a guest,
> stating
> QEMU was unable to read block zero from the device. The code that
> complains doesn't appear to have anything that says "oh, got EINVAL,
> try it this other way" but I haven't chased down if/where something
> in
> between is expecting that and handling it in some unique way. I
> -think-
>  I have an easier reproducer now, so maybe I'd be able to get a
> better
> answer to this question.
> 
> > > If you really need to restrict address' alignment to the
> > > storage's
> > > logical
> > > block size, I think your storage driver needs to set the
> > > dma_alignment queue
> > > limit to that value.
> 
> It's possible that there's a problem in the virtio stack here, but
> the
> failing configuration is a qcow image on the host rootfs

(on an ext4 filesystem)

> , so it's not
> using any distinct driver. The bdev request queue that ends up being
> used is the same allocated out of blk_alloc_queue, so changing
> dma_alignment there wouldn't work.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
  2022-06-29  3:18                     ` Eric Farman
@ 2022-06-29  3:52                       ` Keith Busch
  2022-06-29 18:04                         ` Eric Farman
  0 siblings, 1 reply; 51+ messages in thread
From: Keith Busch @ 2022-06-29  3:52 UTC (permalink / raw)
  To: Eric Farman
  Cc: Halil Pasic, Keith Busch, linux-fsdevel, linux-block, linux-nvme,
	Christian Borntraeger, axboe, Kernel Team, hch, bvanassche,
	damien.lemoal, ebiggers, pankydev8

On Tue, Jun 28, 2022 at 11:18:34PM -0400, Eric Farman wrote:
> Sort of. In the working case, I see a set of iovecs come through with
> different counts:
> 
> base	count
> 0000	0001
> 0000	0200
> 0000	0400
> 0000	0800
> 0000	1000
> 0001	1000
> 0200	1000 << Change occurs here
> 0400	1000
> 0800	1000
> 1000	1000
> 
> EINVAL was being returned for any of these iovecs except the page-
> aligned ones. Once the x200 request returns 0, the remainder of the
> above list was skipped and the requests continue elsewhere on the file.
> 
> Still not sure how our request is getting us into this process. We're
> simply asking to read a single block, but that's somewhere within an
> image file.

I thought this was sounding like some kind of corruption. I tested ext4 on
various qemu devices with 4k logical block sizes, and it all looks okay there.

What block driver are you observing this with?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
  2022-06-29  3:52                       ` Keith Busch
@ 2022-06-29 18:04                         ` Eric Farman
  2022-06-29 19:07                           ` Keith Busch
  0 siblings, 1 reply; 51+ messages in thread
From: Eric Farman @ 2022-06-29 18:04 UTC (permalink / raw)
  To: Keith Busch
  Cc: Halil Pasic, Keith Busch, linux-fsdevel, linux-block, linux-nvme,
	Christian Borntraeger, axboe, Kernel Team, hch, bvanassche,
	damien.lemoal, ebiggers, pankydev8

On Tue, 2022-06-28 at 21:52 -0600, Keith Busch wrote:
> On Tue, Jun 28, 2022 at 11:18:34PM -0400, Eric Farman wrote:
> > Sort of. In the working case, I see a set of iovecs come through
> > with
> > different counts:
> > 
> > base	count
> > 0000	0001
> > 0000	0200
> > 0000	0400
> > 0000	0800
> > 0000	1000
> > 0001	1000
> > 0200	1000 << Change occurs here
> > 0400	1000
> > 0800	1000
> > 1000	1000
> > 
> > EINVAL was being returned for any of these iovecs except the page-
> > aligned ones. Once the x200 request returns 0, the remainder of the
> > above list was skipped and the requests continue elsewhere on the
> > file.
> > 
> > Still not sure how our request is getting us into this process.
> > We're
> > simply asking to read a single block, but that's somewhere within
> > an
> > image file.
> 
> I thought this was sounding like some kind of corruption. I tested
> ext4 on
> various qemu devices with 4k logical block sizes, and it all looks
> okay there.
> 
> What block driver are you observing this with?

s390 dasd

This made me think to change my rootfs, and of course the problem goes
away once on something like a SCSI volume.

So crawling through the dasd (instead of virtio) driver and I finally
find the point where a change to dma_alignment (which you mentioned
earlier) would actually fit.

Such a change fixes this for me, so I'll run it by our DASD guys.
Thanks for your help and patience.

Eric


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
  2022-06-29 18:04                         ` Eric Farman
@ 2022-06-29 19:07                           ` Keith Busch
  2022-06-29 19:28                             ` Eric Farman
  2022-06-30  5:45                             ` Christian Borntraeger
  0 siblings, 2 replies; 51+ messages in thread
From: Keith Busch @ 2022-06-29 19:07 UTC (permalink / raw)
  To: Eric Farman
  Cc: Halil Pasic, Keith Busch, linux-fsdevel, linux-block, linux-nvme,
	Christian Borntraeger, axboe, Kernel Team, hch, bvanassche,
	damien.lemoal, ebiggers, pankydev8

On Wed, Jun 29, 2022 at 02:04:47PM -0400, Eric Farman wrote:
> s390 dasd
> 
> This made me think to change my rootfs, and of course the problem goes
> away once on something like a SCSI volume.
> 
> So crawling through the dasd (instead of virtio) driver and I finally
> find the point where a change to dma_alignment (which you mentioned
> earlier) would actually fit.
> 
> Such a change fixes this for me, so I'll run it by our DASD guys.
> Thanks for your help and patience.

I'm assuming there's some driver or device requirement that's making this
necessary. Is the below driver change what you're looking for? If so, I think
you might want this regardless of this direct-io patch just because other
interfaces like blk_rq_map_user_iov() and blk_rq_aligned() align to it.

---
diff --git a/drivers/s390/block/dasd_fba.c b/drivers/s390/block/dasd_fba.c
index 60be7f7bf2d1..5c79fb02cded 100644
--- a/drivers/s390/block/dasd_fba.c
+++ b/drivers/s390/block/dasd_fba.c
@@ -780,6 +780,7 @@ static void dasd_fba_setup_blk_queue(struct dasd_block *block)
 	/* With page sized segments each segment can be translated into one idaw/tidaw */
 	blk_queue_max_segment_size(q, PAGE_SIZE);
 	blk_queue_segment_boundary(q, PAGE_SIZE - 1);
+	blk_queue_dma_alignment(q, PAGE_SIZE - 1);
 
 	q->limits.discard_granularity = logical_block_size;
 
--

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
  2022-06-29 19:07                           ` Keith Busch
@ 2022-06-29 19:28                             ` Eric Farman
  2022-06-30  5:45                             ` Christian Borntraeger
  1 sibling, 0 replies; 51+ messages in thread
From: Eric Farman @ 2022-06-29 19:28 UTC (permalink / raw)
  To: Keith Busch
  Cc: Halil Pasic, Keith Busch, linux-fsdevel, linux-block, linux-nvme,
	Christian Borntraeger, axboe, Kernel Team, hch, bvanassche,
	damien.lemoal, ebiggers, pankydev8

On Wed, 2022-06-29 at 13:07 -0600, Keith Busch wrote:
> On Wed, Jun 29, 2022 at 02:04:47PM -0400, Eric Farman wrote:
> > s390 dasd
> > 
> > This made me think to change my rootfs, and of course the problem
> > goes
> > away once on something like a SCSI volume.
> > 
> > So crawling through the dasd (instead of virtio) driver and I
> > finally
> > find the point where a change to dma_alignment (which you mentioned
> > earlier) would actually fit.
> > 
> > Such a change fixes this for me, so I'll run it by our DASD guys.
> > Thanks for your help and patience.
> 
> I'm assuming there's some driver or device requirement that's making
> this
> necessary. Is the below driver change what you're looking for? 

Yup, that's exactly what I have (in dasd_eckd.c) and indeed gets things
working again. Need to scrounge up some FBA volumes to test that
configuration and the change there.

> If so, I think
> you might want this regardless of this direct-io patch just because
> other
> interfaces like blk_rq_map_user_iov() and blk_rq_aligned() align to
> it.

Good point.

> 
> ---
> diff --git a/drivers/s390/block/dasd_fba.c
> b/drivers/s390/block/dasd_fba.c
> index 60be7f7bf2d1..5c79fb02cded 100644
> --- a/drivers/s390/block/dasd_fba.c
> +++ b/drivers/s390/block/dasd_fba.c
> @@ -780,6 +780,7 @@ static void dasd_fba_setup_blk_queue(struct
> dasd_block *block)
>  	/* With page sized segments each segment can be translated into
> one idaw/tidaw */
>  	blk_queue_max_segment_size(q, PAGE_SIZE);
>  	blk_queue_segment_boundary(q, PAGE_SIZE - 1);
> +	blk_queue_dma_alignment(q, PAGE_SIZE - 1);
>  
>  	q->limits.discard_granularity = logical_block_size;
>  
> --


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
  2022-06-29 19:07                           ` Keith Busch
  2022-06-29 19:28                             ` Eric Farman
@ 2022-06-30  5:45                             ` Christian Borntraeger
  1 sibling, 0 replies; 51+ messages in thread
From: Christian Borntraeger @ 2022-06-30  5:45 UTC (permalink / raw)
  To: Keith Busch, Eric Farman
  Cc: Halil Pasic, Keith Busch, linux-fsdevel, linux-block, linux-nvme,
	axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Stefan Haberland, Jan Höppner

CCing Stefan, Jan.

Below is necessary (in dasd-eckd.c) to make kvm boot working again (and this seems to be the right thing anyway).

Am 29.06.22 um 21:07 schrieb Keith Busch:
> On Wed, Jun 29, 2022 at 02:04:47PM -0400, Eric Farman wrote:
>> s390 dasd
>>
>> This made me think to change my rootfs, and of course the problem goes
>> away once on something like a SCSI volume.
>>
>> So crawling through the dasd (instead of virtio) driver and I finally
>> find the point where a change to dma_alignment (which you mentioned
>> earlier) would actually fit.
>>
>> Such a change fixes this for me, so I'll run it by our DASD guys.
>> Thanks for your help and patience.
> 
> I'm assuming there's some driver or device requirement that's making this
> necessary. Is the below driver change what you're looking for? If so, I think
> you might want this regardless of this direct-io patch just because other
> interfaces like blk_rq_map_user_iov() and blk_rq_aligned() align to it.
> 
> ---
> diff --git a/drivers/s390/block/dasd_fba.c b/drivers/s390/block/dasd_fba.c
> index 60be7f7bf2d1..5c79fb02cded 100644
> --- a/drivers/s390/block/dasd_fba.c
> +++ b/drivers/s390/block/dasd_fba.c
> @@ -780,6 +780,7 @@ static void dasd_fba_setup_blk_queue(struct dasd_block *block)
>   	/* With page sized segments each segment can be translated into one idaw/tidaw */
>   	blk_queue_max_segment_size(q, PAGE_SIZE);
>   	blk_queue_segment_boundary(q, PAGE_SIZE - 1);
> +	blk_queue_dma_alignment(q, PAGE_SIZE - 1);
>   
>   	q->limits.discard_granularity = logical_block_size;
>   
> --

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
  2022-06-10 19:58 ` [PATCHv6 11/11] iomap: add support for dma aligned direct-io Keith Busch
@ 2022-07-22  7:36     ` Eric Biggers
  2022-07-22  7:36     ` [f2fs-dev] " Eric Biggers
  1 sibling, 0 replies; 51+ messages in thread
From: Eric Biggers @ 2022-07-22  7:36 UTC (permalink / raw)
  To: Keith Busch, Jaegeuk Kim, Chao Yu
  Cc: linux-fsdevel, linux-block, linux-nvme, axboe, Kernel Team, hch,
	bvanassche, damien.lemoal, pankydev8, Keith Busch,
	linux-f2fs-devel

[+f2fs list and maintainers]

On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Use the address alignment requirements from the block_device for direct
> io instead of requiring addresses be aligned to the block size.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/direct-io.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 370c3241618a..5d098adba443 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
>  	struct inode *inode = iter->inode;
>  	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
>  	unsigned int fs_block_size = i_blocksize(inode), pad;
> -	unsigned int align = iov_iter_alignment(dio->submit.iter);
>  	loff_t length = iomap_length(iter);
>  	loff_t pos = iter->pos;
>  	unsigned int bio_opf;
> @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
>  	size_t copied = 0;
>  	size_t orig_count;
>  
> -	if ((pos | length | align) & ((1 << blkbits) - 1))
> +	if ((pos | length) & ((1 << blkbits) - 1) ||
> +	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
>  		return -EINVAL;
>  
>  	if (iomap->type == IOMAP_UNWRITTEN) {

I noticed that this patch is going to break the following logic in
f2fs_should_use_dio() in fs/f2fs/file.c:

	/*
	 * Direct I/O not aligned to the disk's logical_block_size will be
	 * attempted, but will fail with -EINVAL.
	 *
	 * f2fs additionally requires that direct I/O be aligned to the
	 * filesystem block size, which is often a stricter requirement.
	 * However, f2fs traditionally falls back to buffered I/O on requests
	 * that are logical_block_size-aligned but not fs-block aligned.
	 *
	 * The below logic implements this behavior.
	 */
	align = iocb->ki_pos | iov_iter_alignment(iter);
	if (!IS_ALIGNED(align, i_blocksize(inode)) &&
	    IS_ALIGNED(align, bdev_logical_block_size(inode->i_sb->s_bdev)))
		return false;

	return true;

So, f2fs assumes that __iomap_dio_rw() returns an error if the I/O isn't logical
block aligned.  This patch changes that.  The result is that DIO will sometimes
proceed in cases where the I/O doesn't have the fs block alignment required by
f2fs for all DIO.

Does anyone have any thoughts about what f2fs should be doing here?  I think
it's weird that f2fs has different behaviors for different degrees of
misalignment: fail with EINVAL if not logical block aligned, else fallback to
buffered I/O if not fs block aligned.  I think it should be one convention or
the other.  Any opinions about which one it should be?

(Note: if you blame the above code, it was written by me.  But I was just
preserving the existing behavior; I don't know the original motivation.)

- Eric

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [f2fs-dev] [PATCHv6 11/11] iomap: add support for dma aligned direct-io
@ 2022-07-22  7:36     ` Eric Biggers
  0 siblings, 0 replies; 51+ messages in thread
From: Eric Biggers @ 2022-07-22  7:36 UTC (permalink / raw)
  To: Keith Busch, Jaegeuk Kim, Chao Yu
  Cc: axboe, bvanassche, pankydev8, damien.lemoal, linux-nvme,
	linux-f2fs-devel, linux-block, Keith Busch, linux-fsdevel,
	Kernel Team, hch

[+f2fs list and maintainers]

On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Use the address alignment requirements from the block_device for direct
> io instead of requiring addresses be aligned to the block size.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/direct-io.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 370c3241618a..5d098adba443 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
>  	struct inode *inode = iter->inode;
>  	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
>  	unsigned int fs_block_size = i_blocksize(inode), pad;
> -	unsigned int align = iov_iter_alignment(dio->submit.iter);
>  	loff_t length = iomap_length(iter);
>  	loff_t pos = iter->pos;
>  	unsigned int bio_opf;
> @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
>  	size_t copied = 0;
>  	size_t orig_count;
>  
> -	if ((pos | length | align) & ((1 << blkbits) - 1))
> +	if ((pos | length) & ((1 << blkbits) - 1) ||
> +	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
>  		return -EINVAL;
>  
>  	if (iomap->type == IOMAP_UNWRITTEN) {

I noticed that this patch is going to break the following logic in
f2fs_should_use_dio() in fs/f2fs/file.c:

	/*
	 * Direct I/O not aligned to the disk's logical_block_size will be
	 * attempted, but will fail with -EINVAL.
	 *
	 * f2fs additionally requires that direct I/O be aligned to the
	 * filesystem block size, which is often a stricter requirement.
	 * However, f2fs traditionally falls back to buffered I/O on requests
	 * that are logical_block_size-aligned but not fs-block aligned.
	 *
	 * The below logic implements this behavior.
	 */
	align = iocb->ki_pos | iov_iter_alignment(iter);
	if (!IS_ALIGNED(align, i_blocksize(inode)) &&
	    IS_ALIGNED(align, bdev_logical_block_size(inode->i_sb->s_bdev)))
		return false;

	return true;

So, f2fs assumes that __iomap_dio_rw() returns an error if the I/O isn't logical
block aligned.  This patch changes that.  The result is that DIO will sometimes
proceed in cases where the I/O doesn't have the fs block alignment required by
f2fs for all DIO.

Does anyone have any thoughts about what f2fs should be doing here?  I think
it's weird that f2fs has different behaviors for different degrees of
misalignment: fail with EINVAL if not logical block aligned, else fallback to
buffered I/O if not fs block aligned.  I think it should be one convention or
the other.  Any opinions about which one it should be?

(Note: if you blame the above code, it was written by me.  But I was just
preserving the existing behavior; I don't know the original motivation.)

- Eric


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
  2022-07-22  7:36     ` [f2fs-dev] " Eric Biggers
@ 2022-07-22 14:43       ` Keith Busch
  -1 siblings, 0 replies; 51+ messages in thread
From: Keith Busch @ 2022-07-22 14:43 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Keith Busch, Jaegeuk Kim, Chao Yu, linux-fsdevel, linux-block,
	linux-nvme, axboe, Kernel Team, hch, bvanassche, damien.lemoal,
	pankydev8, linux-f2fs-devel

On Fri, Jul 22, 2022 at 12:36:01AM -0700, Eric Biggers wrote:
> [+f2fs list and maintainers]
> 
> On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote:
> > From: Keith Busch <kbusch@kernel.org>
> > 
> > Use the address alignment requirements from the block_device for direct
> > io instead of requiring addresses be aligned to the block size.
> > 
> > Signed-off-by: Keith Busch <kbusch@kernel.org>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/iomap/direct-io.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > index 370c3241618a..5d098adba443 100644
> > --- a/fs/iomap/direct-io.c
> > +++ b/fs/iomap/direct-io.c
> > @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> >  	struct inode *inode = iter->inode;
> >  	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
> >  	unsigned int fs_block_size = i_blocksize(inode), pad;
> > -	unsigned int align = iov_iter_alignment(dio->submit.iter);
> >  	loff_t length = iomap_length(iter);
> >  	loff_t pos = iter->pos;
> >  	unsigned int bio_opf;
> > @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> >  	size_t copied = 0;
> >  	size_t orig_count;
> >  
> > -	if ((pos | length | align) & ((1 << blkbits) - 1))
> > +	if ((pos | length) & ((1 << blkbits) - 1) ||
> > +	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
> >  		return -EINVAL;
> >  
> >  	if (iomap->type == IOMAP_UNWRITTEN) {
> 
> I noticed that this patch is going to break the following logic in
> f2fs_should_use_dio() in fs/f2fs/file.c:
> 
> 	/*
> 	 * Direct I/O not aligned to the disk's logical_block_size will be
> 	 * attempted, but will fail with -EINVAL.
> 	 *
> 	 * f2fs additionally requires that direct I/O be aligned to the
> 	 * filesystem block size, which is often a stricter requirement.
> 	 * However, f2fs traditionally falls back to buffered I/O on requests
> 	 * that are logical_block_size-aligned but not fs-block aligned.
> 	 *
> 	 * The below logic implements this behavior.
> 	 */
> 	align = iocb->ki_pos | iov_iter_alignment(iter);
> 	if (!IS_ALIGNED(align, i_blocksize(inode)) &&
> 	    IS_ALIGNED(align, bdev_logical_block_size(inode->i_sb->s_bdev)))
> 		return false;
> 
> 	return true;
> 
> So, f2fs assumes that __iomap_dio_rw() returns an error if the I/O isn't logical
> block aligned.  This patch changes that.  The result is that DIO will sometimes
> proceed in cases where the I/O doesn't have the fs block alignment required by
> f2fs for all DIO.
> 
> Does anyone have any thoughts about what f2fs should be doing here?  I think
> it's weird that f2fs has different behaviors for different degrees of
> misalignment: fail with EINVAL if not logical block aligned, else fallback to
> buffered I/O if not fs block aligned.  I think it should be one convention or
> the other.  Any opinions about which one it should be?

It looks like f2fs just falls back to buffered IO for this condition without
reaching the new code in iomap_dio_bio_iter(). btrfs does the same thing
(check_direct_IO()).  Is that a problem?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [f2fs-dev] [PATCHv6 11/11] iomap: add support for dma aligned direct-io
@ 2022-07-22 14:43       ` Keith Busch
  0 siblings, 0 replies; 51+ messages in thread
From: Keith Busch @ 2022-07-22 14:43 UTC (permalink / raw)
  To: Eric Biggers
  Cc: axboe, bvanassche, pankydev8, damien.lemoal, linux-nvme,
	linux-f2fs-devel, linux-block, Keith Busch, linux-fsdevel,
	Jaegeuk Kim, Kernel Team, hch

On Fri, Jul 22, 2022 at 12:36:01AM -0700, Eric Biggers wrote:
> [+f2fs list and maintainers]
> 
> On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote:
> > From: Keith Busch <kbusch@kernel.org>
> > 
> > Use the address alignment requirements from the block_device for direct
> > io instead of requiring addresses be aligned to the block size.
> > 
> > Signed-off-by: Keith Busch <kbusch@kernel.org>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/iomap/direct-io.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > index 370c3241618a..5d098adba443 100644
> > --- a/fs/iomap/direct-io.c
> > +++ b/fs/iomap/direct-io.c
> > @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> >  	struct inode *inode = iter->inode;
> >  	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
> >  	unsigned int fs_block_size = i_blocksize(inode), pad;
> > -	unsigned int align = iov_iter_alignment(dio->submit.iter);
> >  	loff_t length = iomap_length(iter);
> >  	loff_t pos = iter->pos;
> >  	unsigned int bio_opf;
> > @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> >  	size_t copied = 0;
> >  	size_t orig_count;
> >  
> > -	if ((pos | length | align) & ((1 << blkbits) - 1))
> > +	if ((pos | length) & ((1 << blkbits) - 1) ||
> > +	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
> >  		return -EINVAL;
> >  
> >  	if (iomap->type == IOMAP_UNWRITTEN) {
> 
> I noticed that this patch is going to break the following logic in
> f2fs_should_use_dio() in fs/f2fs/file.c:
> 
> 	/*
> 	 * Direct I/O not aligned to the disk's logical_block_size will be
> 	 * attempted, but will fail with -EINVAL.
> 	 *
> 	 * f2fs additionally requires that direct I/O be aligned to the
> 	 * filesystem block size, which is often a stricter requirement.
> 	 * However, f2fs traditionally falls back to buffered I/O on requests
> 	 * that are logical_block_size-aligned but not fs-block aligned.
> 	 *
> 	 * The below logic implements this behavior.
> 	 */
> 	align = iocb->ki_pos | iov_iter_alignment(iter);
> 	if (!IS_ALIGNED(align, i_blocksize(inode)) &&
> 	    IS_ALIGNED(align, bdev_logical_block_size(inode->i_sb->s_bdev)))
> 		return false;
> 
> 	return true;
> 
> So, f2fs assumes that __iomap_dio_rw() returns an error if the I/O isn't logical
> block aligned.  This patch changes that.  The result is that DIO will sometimes
> proceed in cases where the I/O doesn't have the fs block alignment required by
> f2fs for all DIO.
> 
> Does anyone have any thoughts about what f2fs should be doing here?  I think
> it's weird that f2fs has different behaviors for different degrees of
> misalignment: fail with EINVAL if not logical block aligned, else fallback to
> buffered I/O if not fs block aligned.  I think it should be one convention or
> the other.  Any opinions about which one it should be?

It looks like f2fs just falls back to buffered IO for this condition without
reaching the new code in iomap_dio_bio_iter(). btrfs does the same thing
(check_direct_IO()).  Is that a problem?


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
  2022-07-22  7:36     ` [f2fs-dev] " Eric Biggers
@ 2022-07-22 17:53       ` Darrick J. Wong
  -1 siblings, 0 replies; 51+ messages in thread
From: Darrick J. Wong @ 2022-07-22 17:53 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Keith Busch, Jaegeuk Kim, Chao Yu, linux-fsdevel, linux-block,
	linux-nvme, axboe, Kernel Team, hch, bvanassche, damien.lemoal,
	pankydev8, Keith Busch, linux-f2fs-devel

On Fri, Jul 22, 2022 at 12:36:01AM -0700, Eric Biggers wrote:
> [+f2fs list and maintainers]
> 
> On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote:
> > From: Keith Busch <kbusch@kernel.org>
> > 
> > Use the address alignment requirements from the block_device for direct
> > io instead of requiring addresses be aligned to the block size.
> > 
> > Signed-off-by: Keith Busch <kbusch@kernel.org>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/iomap/direct-io.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > index 370c3241618a..5d098adba443 100644
> > --- a/fs/iomap/direct-io.c
> > +++ b/fs/iomap/direct-io.c
> > @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> >  	struct inode *inode = iter->inode;
> >  	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
> >  	unsigned int fs_block_size = i_blocksize(inode), pad;
> > -	unsigned int align = iov_iter_alignment(dio->submit.iter);
> >  	loff_t length = iomap_length(iter);
> >  	loff_t pos = iter->pos;
> >  	unsigned int bio_opf;
> > @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> >  	size_t copied = 0;
> >  	size_t orig_count;
> >  
> > -	if ((pos | length | align) & ((1 << blkbits) - 1))
> > +	if ((pos | length) & ((1 << blkbits) - 1) ||
> > +	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))

How does this change intersect with "make statx() return DIO alignment
information" ?  Will the new STATX_DIOALIGN implementations have to be
adjusted to set stx_dio_mem_align = bdev_dma_alignment(...)?

I'm guessing the answer is yes, but I haven't seen any patches on the
list to do that, but more and more these days email behaves like a flood
of UDP traffic... :(

--D

> >  		return -EINVAL;
> >  
> >  	if (iomap->type == IOMAP_UNWRITTEN) {
> 
> I noticed that this patch is going to break the following logic in
> f2fs_should_use_dio() in fs/f2fs/file.c:
> 
> 	/*
> 	 * Direct I/O not aligned to the disk's logical_block_size will be
> 	 * attempted, but will fail with -EINVAL.
> 	 *
> 	 * f2fs additionally requires that direct I/O be aligned to the
> 	 * filesystem block size, which is often a stricter requirement.
> 	 * However, f2fs traditionally falls back to buffered I/O on requests
> 	 * that are logical_block_size-aligned but not fs-block aligned.
> 	 *
> 	 * The below logic implements this behavior.
> 	 */
> 	align = iocb->ki_pos | iov_iter_alignment(iter);
> 	if (!IS_ALIGNED(align, i_blocksize(inode)) &&
> 	    IS_ALIGNED(align, bdev_logical_block_size(inode->i_sb->s_bdev)))
> 		return false;
> 
> 	return true;
> 
> So, f2fs assumes that __iomap_dio_rw() returns an error if the I/O isn't logical
> block aligned.  This patch changes that.  The result is that DIO will sometimes
> proceed in cases where the I/O doesn't have the fs block alignment required by
> f2fs for all DIO.
> 
> Does anyone have any thoughts about what f2fs should be doing here?  I think
> it's weird that f2fs has different behaviors for different degrees of
> misalignment: fail with EINVAL if not logical block aligned, else fallback to
> buffered I/O if not fs block aligned.  I think it should be one convention or
> the other.  Any opinions about which one it should be?
> 
> (Note: if you blame the above code, it was written by me.  But I was just
> preserving the existing behavior; I don't know the original motivation.)
> 
> - Eric

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [f2fs-dev] [PATCHv6 11/11] iomap: add support for dma aligned direct-io
@ 2022-07-22 17:53       ` Darrick J. Wong
  0 siblings, 0 replies; 51+ messages in thread
From: Darrick J. Wong @ 2022-07-22 17:53 UTC (permalink / raw)
  To: Eric Biggers
  Cc: axboe, bvanassche, pankydev8, damien.lemoal, linux-nvme,
	linux-f2fs-devel, linux-block, Keith Busch, Keith Busch,
	linux-fsdevel, Jaegeuk Kim, Kernel Team, hch

On Fri, Jul 22, 2022 at 12:36:01AM -0700, Eric Biggers wrote:
> [+f2fs list and maintainers]
> 
> On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote:
> > From: Keith Busch <kbusch@kernel.org>
> > 
> > Use the address alignment requirements from the block_device for direct
> > io instead of requiring addresses be aligned to the block size.
> > 
> > Signed-off-by: Keith Busch <kbusch@kernel.org>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/iomap/direct-io.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > index 370c3241618a..5d098adba443 100644
> > --- a/fs/iomap/direct-io.c
> > +++ b/fs/iomap/direct-io.c
> > @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> >  	struct inode *inode = iter->inode;
> >  	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
> >  	unsigned int fs_block_size = i_blocksize(inode), pad;
> > -	unsigned int align = iov_iter_alignment(dio->submit.iter);
> >  	loff_t length = iomap_length(iter);
> >  	loff_t pos = iter->pos;
> >  	unsigned int bio_opf;
> > @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> >  	size_t copied = 0;
> >  	size_t orig_count;
> >  
> > -	if ((pos | length | align) & ((1 << blkbits) - 1))
> > +	if ((pos | length) & ((1 << blkbits) - 1) ||
> > +	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))

How does this change intersect with "make statx() return DIO alignment
information" ?  Will the new STATX_DIOALIGN implementations have to be
adjusted to set stx_dio_mem_align = bdev_dma_alignment(...)?

I'm guessing the answer is yes, but I haven't seen any patches on the
list to do that, but more and more these days email behaves like a flood
of UDP traffic... :(

--D

> >  		return -EINVAL;
> >  
> >  	if (iomap->type == IOMAP_UNWRITTEN) {
> 
> I noticed that this patch is going to break the following logic in
> f2fs_should_use_dio() in fs/f2fs/file.c:
> 
> 	/*
> 	 * Direct I/O not aligned to the disk's logical_block_size will be
> 	 * attempted, but will fail with -EINVAL.
> 	 *
> 	 * f2fs additionally requires that direct I/O be aligned to the
> 	 * filesystem block size, which is often a stricter requirement.
> 	 * However, f2fs traditionally falls back to buffered I/O on requests
> 	 * that are logical_block_size-aligned but not fs-block aligned.
> 	 *
> 	 * The below logic implements this behavior.
> 	 */
> 	align = iocb->ki_pos | iov_iter_alignment(iter);
> 	if (!IS_ALIGNED(align, i_blocksize(inode)) &&
> 	    IS_ALIGNED(align, bdev_logical_block_size(inode->i_sb->s_bdev)))
> 		return false;
> 
> 	return true;
> 
> So, f2fs assumes that __iomap_dio_rw() returns an error if the I/O isn't logical
> block aligned.  This patch changes that.  The result is that DIO will sometimes
> proceed in cases where the I/O doesn't have the fs block alignment required by
> f2fs for all DIO.
> 
> Does anyone have any thoughts about what f2fs should be doing here?  I think
> it's weird that f2fs has different behaviors for different degrees of
> misalignment: fail with EINVAL if not logical block aligned, else fallback to
> buffered I/O if not fs block aligned.  I think it should be one convention or
> the other.  Any opinions about which one it should be?
> 
> (Note: if you blame the above code, it was written by me.  But I was just
> preserving the existing behavior; I don't know the original motivation.)
> 
> - Eric


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
  2022-07-22 14:43       ` [f2fs-dev] " Keith Busch
@ 2022-07-22 18:01         ` Eric Biggers
  -1 siblings, 0 replies; 51+ messages in thread
From: Eric Biggers @ 2022-07-22 18:01 UTC (permalink / raw)
  To: Keith Busch
  Cc: Keith Busch, Jaegeuk Kim, Chao Yu, linux-fsdevel, linux-block,
	linux-nvme, axboe, Kernel Team, hch, bvanassche, damien.lemoal,
	pankydev8, linux-f2fs-devel

On Fri, Jul 22, 2022 at 08:43:55AM -0600, Keith Busch wrote:
> On Fri, Jul 22, 2022 at 12:36:01AM -0700, Eric Biggers wrote:
> > [+f2fs list and maintainers]
> > 
> > On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote:
> > > From: Keith Busch <kbusch@kernel.org>
> > > 
> > > Use the address alignment requirements from the block_device for direct
> > > io instead of requiring addresses be aligned to the block size.
> > > 
> > > Signed-off-by: Keith Busch <kbusch@kernel.org>
> > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > ---
> > >  fs/iomap/direct-io.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > > index 370c3241618a..5d098adba443 100644
> > > --- a/fs/iomap/direct-io.c
> > > +++ b/fs/iomap/direct-io.c
> > > @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> > >  	struct inode *inode = iter->inode;
> > >  	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
> > >  	unsigned int fs_block_size = i_blocksize(inode), pad;
> > > -	unsigned int align = iov_iter_alignment(dio->submit.iter);
> > >  	loff_t length = iomap_length(iter);
> > >  	loff_t pos = iter->pos;
> > >  	unsigned int bio_opf;
> > > @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> > >  	size_t copied = 0;
> > >  	size_t orig_count;
> > >  
> > > -	if ((pos | length | align) & ((1 << blkbits) - 1))
> > > +	if ((pos | length) & ((1 << blkbits) - 1) ||
> > > +	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
> > >  		return -EINVAL;
> > >  
> > >  	if (iomap->type == IOMAP_UNWRITTEN) {
> > 
> > I noticed that this patch is going to break the following logic in
> > f2fs_should_use_dio() in fs/f2fs/file.c:
> > 
> > 	/*
> > 	 * Direct I/O not aligned to the disk's logical_block_size will be
> > 	 * attempted, but will fail with -EINVAL.
> > 	 *
> > 	 * f2fs additionally requires that direct I/O be aligned to the
> > 	 * filesystem block size, which is often a stricter requirement.
> > 	 * However, f2fs traditionally falls back to buffered I/O on requests
> > 	 * that are logical_block_size-aligned but not fs-block aligned.
> > 	 *
> > 	 * The below logic implements this behavior.
> > 	 */
> > 	align = iocb->ki_pos | iov_iter_alignment(iter);
> > 	if (!IS_ALIGNED(align, i_blocksize(inode)) &&
> > 	    IS_ALIGNED(align, bdev_logical_block_size(inode->i_sb->s_bdev)))
> > 		return false;
> > 
> > 	return true;
> > 
> > So, f2fs assumes that __iomap_dio_rw() returns an error if the I/O isn't logical
> > block aligned.  This patch changes that.  The result is that DIO will sometimes
> > proceed in cases where the I/O doesn't have the fs block alignment required by
> > f2fs for all DIO.
> > 
> > Does anyone have any thoughts about what f2fs should be doing here?  I think
> > it's weird that f2fs has different behaviors for different degrees of
> > misalignment: fail with EINVAL if not logical block aligned, else fallback to
> > buffered I/O if not fs block aligned.  I think it should be one convention or
> > the other.  Any opinions about which one it should be?
> 
> It looks like f2fs just falls back to buffered IO for this condition without
> reaching the new code in iomap_dio_bio_iter().

No.  It's a bit subtle, so read the code and what I'm saying carefully.  f2fs
only supports 4K aligned DIO and normally falls back to buffered I/O; however,
for DIO that is *very* misaligned (not even LBS aligned) it returns EINVAL
instead.  And it relies on __iomap_dio_rw() returning that EINVAL.

Relying on __iomap_dio_rw() in that way is definitely a bad design on f2fs's
part (and I messed that up when switching f2fs from fs/direct-io.c to iomap).
The obvious fix is to just have f2fs do the LBS alignment check itself.

But I think that f2fs shouldn't have different behavior for different levels of
misalignment in the first place, so I was wondering if anyone had any thoughts
on which behavior (EINVAL or fallback to buffered I/O) should be standardized on
in all cases, at least for f2fs.  There was some discussion about this sort of
thing for ext4 several years ago on the thread
https://lore.kernel.org/linux-ext4/1461472078-20104-1-git-send-email-tytso@mit.edu/T/#u,
but it didn't really reach a conclusion.  I'm wondering if the f2fs maintainers
have any thoughts about why the f2fs behavior is as it is.  I.e. is it just
accidental, or are there specific reasons...

- Eric

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [f2fs-dev] [PATCHv6 11/11] iomap: add support for dma aligned direct-io
@ 2022-07-22 18:01         ` Eric Biggers
  0 siblings, 0 replies; 51+ messages in thread
From: Eric Biggers @ 2022-07-22 18:01 UTC (permalink / raw)
  To: Keith Busch
  Cc: axboe, bvanassche, pankydev8, damien.lemoal, linux-nvme,
	linux-f2fs-devel, linux-block, Keith Busch, linux-fsdevel,
	Jaegeuk Kim, Kernel Team, hch

On Fri, Jul 22, 2022 at 08:43:55AM -0600, Keith Busch wrote:
> On Fri, Jul 22, 2022 at 12:36:01AM -0700, Eric Biggers wrote:
> > [+f2fs list and maintainers]
> > 
> > On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote:
> > > From: Keith Busch <kbusch@kernel.org>
> > > 
> > > Use the address alignment requirements from the block_device for direct
> > > io instead of requiring addresses be aligned to the block size.
> > > 
> > > Signed-off-by: Keith Busch <kbusch@kernel.org>
> > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > ---
> > >  fs/iomap/direct-io.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > > index 370c3241618a..5d098adba443 100644
> > > --- a/fs/iomap/direct-io.c
> > > +++ b/fs/iomap/direct-io.c
> > > @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> > >  	struct inode *inode = iter->inode;
> > >  	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
> > >  	unsigned int fs_block_size = i_blocksize(inode), pad;
> > > -	unsigned int align = iov_iter_alignment(dio->submit.iter);
> > >  	loff_t length = iomap_length(iter);
> > >  	loff_t pos = iter->pos;
> > >  	unsigned int bio_opf;
> > > @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> > >  	size_t copied = 0;
> > >  	size_t orig_count;
> > >  
> > > -	if ((pos | length | align) & ((1 << blkbits) - 1))
> > > +	if ((pos | length) & ((1 << blkbits) - 1) ||
> > > +	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
> > >  		return -EINVAL;
> > >  
> > >  	if (iomap->type == IOMAP_UNWRITTEN) {
> > 
> > I noticed that this patch is going to break the following logic in
> > f2fs_should_use_dio() in fs/f2fs/file.c:
> > 
> > 	/*
> > 	 * Direct I/O not aligned to the disk's logical_block_size will be
> > 	 * attempted, but will fail with -EINVAL.
> > 	 *
> > 	 * f2fs additionally requires that direct I/O be aligned to the
> > 	 * filesystem block size, which is often a stricter requirement.
> > 	 * However, f2fs traditionally falls back to buffered I/O on requests
> > 	 * that are logical_block_size-aligned but not fs-block aligned.
> > 	 *
> > 	 * The below logic implements this behavior.
> > 	 */
> > 	align = iocb->ki_pos | iov_iter_alignment(iter);
> > 	if (!IS_ALIGNED(align, i_blocksize(inode)) &&
> > 	    IS_ALIGNED(align, bdev_logical_block_size(inode->i_sb->s_bdev)))
> > 		return false;
> > 
> > 	return true;
> > 
> > So, f2fs assumes that __iomap_dio_rw() returns an error if the I/O isn't logical
> > block aligned.  This patch changes that.  The result is that DIO will sometimes
> > proceed in cases where the I/O doesn't have the fs block alignment required by
> > f2fs for all DIO.
> > 
> > Does anyone have any thoughts about what f2fs should be doing here?  I think
> > it's weird that f2fs has different behaviors for different degrees of
> > misalignment: fail with EINVAL if not logical block aligned, else fallback to
> > buffered I/O if not fs block aligned.  I think it should be one convention or
> > the other.  Any opinions about which one it should be?
> 
> It looks like f2fs just falls back to buffered IO for this condition without
> reaching the new code in iomap_dio_bio_iter().

No.  It's a bit subtle, so read the code and what I'm saying carefully.  f2fs
only supports 4K aligned DIO and normally falls back to buffered I/O; however,
for DIO that is *very* misaligned (not even LBS aligned) it returns EINVAL
instead.  And it relies on __iomap_dio_rw() returning that EINVAL.

Relying on __iomap_dio_rw() in that way is definitely a bad design on f2fs's
part (and I messed that up when switching f2fs from fs/direct-io.c to iomap).
The obvious fix is to just have f2fs do the LBS alignment check itself.

But I think that f2fs shouldn't have different behavior for different levels of
misalignment in the first place, so I was wondering if anyone had any thoughts
on which behavior (EINVAL or fallback to buffered I/O) should be standardized on
in all cases, at least for f2fs.  There was some discussion about this sort of
thing for ext4 several years ago on the thread
https://lore.kernel.org/linux-ext4/1461472078-20104-1-git-send-email-tytso@mit.edu/T/#u,
but it didn't really reach a conclusion.  I'm wondering if the f2fs maintainers
have any thoughts about why the f2fs behavior is as it is.  I.e. is it just
accidental, or are there specific reasons...

- Eric


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
  2022-07-22 17:53       ` [f2fs-dev] " Darrick J. Wong
@ 2022-07-22 18:12         ` Eric Biggers
  -1 siblings, 0 replies; 51+ messages in thread
From: Eric Biggers @ 2022-07-22 18:12 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Keith Busch, Jaegeuk Kim, Chao Yu, linux-fsdevel, linux-block,
	linux-nvme, axboe, Kernel Team, hch, bvanassche, damien.lemoal,
	pankydev8, Keith Busch, linux-f2fs-devel

On Fri, Jul 22, 2022 at 10:53:42AM -0700, Darrick J. Wong wrote:
> On Fri, Jul 22, 2022 at 12:36:01AM -0700, Eric Biggers wrote:
> > [+f2fs list and maintainers]
> > 
> > On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote:
> > > From: Keith Busch <kbusch@kernel.org>
> > > 
> > > Use the address alignment requirements from the block_device for direct
> > > io instead of requiring addresses be aligned to the block size.
> > > 
> > > Signed-off-by: Keith Busch <kbusch@kernel.org>
> > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > ---
> > >  fs/iomap/direct-io.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > > index 370c3241618a..5d098adba443 100644
> > > --- a/fs/iomap/direct-io.c
> > > +++ b/fs/iomap/direct-io.c
> > > @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> > >  	struct inode *inode = iter->inode;
> > >  	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
> > >  	unsigned int fs_block_size = i_blocksize(inode), pad;
> > > -	unsigned int align = iov_iter_alignment(dio->submit.iter);
> > >  	loff_t length = iomap_length(iter);
> > >  	loff_t pos = iter->pos;
> > >  	unsigned int bio_opf;
> > > @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> > >  	size_t copied = 0;
> > >  	size_t orig_count;
> > >  
> > > -	if ((pos | length | align) & ((1 << blkbits) - 1))
> > > +	if ((pos | length) & ((1 << blkbits) - 1) ||
> > > +	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
> 
> How does this change intersect with "make statx() return DIO alignment
> information" ?  Will the new STATX_DIOALIGN implementations have to be
> adjusted to set stx_dio_mem_align = bdev_dma_alignment(...)?
> 
> I'm guessing the answer is yes, but I haven't seen any patches on the
> list to do that, but more and more these days email behaves like a flood
> of UDP traffic... :(
> 

Yes.  I haven't done that in the STATX_DIOALIGN patchset yet because I've been
basing it on upstream, which doesn't yet have this iomap patch.  I haven't been
expecting STATX_DIOALIGN to make 5.20, given that it's a new UAPI that needs
time to be properly reviewed, plus I've just been busy with other things.  So
I've been planning to make the above change after this patch lands upstream.

- Eric

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [f2fs-dev] [PATCHv6 11/11] iomap: add support for dma aligned direct-io
@ 2022-07-22 18:12         ` Eric Biggers
  0 siblings, 0 replies; 51+ messages in thread
From: Eric Biggers @ 2022-07-22 18:12 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: axboe, bvanassche, pankydev8, damien.lemoal, linux-nvme,
	linux-f2fs-devel, linux-block, Keith Busch, Keith Busch,
	linux-fsdevel, Jaegeuk Kim, Kernel Team, hch

On Fri, Jul 22, 2022 at 10:53:42AM -0700, Darrick J. Wong wrote:
> On Fri, Jul 22, 2022 at 12:36:01AM -0700, Eric Biggers wrote:
> > [+f2fs list and maintainers]
> > 
> > On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote:
> > > From: Keith Busch <kbusch@kernel.org>
> > > 
> > > Use the address alignment requirements from the block_device for direct
> > > io instead of requiring addresses be aligned to the block size.
> > > 
> > > Signed-off-by: Keith Busch <kbusch@kernel.org>
> > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > ---
> > >  fs/iomap/direct-io.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > > index 370c3241618a..5d098adba443 100644
> > > --- a/fs/iomap/direct-io.c
> > > +++ b/fs/iomap/direct-io.c
> > > @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> > >  	struct inode *inode = iter->inode;
> > >  	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
> > >  	unsigned int fs_block_size = i_blocksize(inode), pad;
> > > -	unsigned int align = iov_iter_alignment(dio->submit.iter);
> > >  	loff_t length = iomap_length(iter);
> > >  	loff_t pos = iter->pos;
> > >  	unsigned int bio_opf;
> > > @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> > >  	size_t copied = 0;
> > >  	size_t orig_count;
> > >  
> > > -	if ((pos | length | align) & ((1 << blkbits) - 1))
> > > +	if ((pos | length) & ((1 << blkbits) - 1) ||
> > > +	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
> 
> How does this change intersect with "make statx() return DIO alignment
> information" ?  Will the new STATX_DIOALIGN implementations have to be
> adjusted to set stx_dio_mem_align = bdev_dma_alignment(...)?
> 
> I'm guessing the answer is yes, but I haven't seen any patches on the
> list to do that, but more and more these days email behaves like a flood
> of UDP traffic... :(
> 

Yes.  I haven't done that in the STATX_DIOALIGN patchset yet because I've been
basing it on upstream, which doesn't yet have this iomap patch.  I haven't been
expecting STATX_DIOALIGN to make 5.20, given that it's a new UAPI that needs
time to be properly reviewed, plus I've just been busy with other things.  So
I've been planning to make the above change after this patch lands upstream.

- Eric


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
  2022-07-22 18:01         ` [f2fs-dev] " Eric Biggers
@ 2022-07-22 20:26           ` Keith Busch
  -1 siblings, 0 replies; 51+ messages in thread
From: Keith Busch @ 2022-07-22 20:26 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Keith Busch, Jaegeuk Kim, Chao Yu, linux-fsdevel, linux-block,
	linux-nvme, axboe, Kernel Team, hch, bvanassche, damien.lemoal,
	pankydev8, linux-f2fs-devel

On Fri, Jul 22, 2022 at 06:01:35PM +0000, Eric Biggers wrote:
> On Fri, Jul 22, 2022 at 08:43:55AM -0600, Keith Busch wrote:
> > On Fri, Jul 22, 2022 at 12:36:01AM -0700, Eric Biggers wrote:
> > > [+f2fs list and maintainers]
> > > 
> > > On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote:
> > > > From: Keith Busch <kbusch@kernel.org>
> > > > 
> > > > Use the address alignment requirements from the block_device for direct
> > > > io instead of requiring addresses be aligned to the block size.
> > > > 
> > > > Signed-off-by: Keith Busch <kbusch@kernel.org>
> > > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > > ---
> > > >  fs/iomap/direct-io.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > > > index 370c3241618a..5d098adba443 100644
> > > > --- a/fs/iomap/direct-io.c
> > > > +++ b/fs/iomap/direct-io.c
> > > > @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> > > >  	struct inode *inode = iter->inode;
> > > >  	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
> > > >  	unsigned int fs_block_size = i_blocksize(inode), pad;
> > > > -	unsigned int align = iov_iter_alignment(dio->submit.iter);
> > > >  	loff_t length = iomap_length(iter);
> > > >  	loff_t pos = iter->pos;
> > > >  	unsigned int bio_opf;
> > > > @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> > > >  	size_t copied = 0;
> > > >  	size_t orig_count;
> > > >  
> > > > -	if ((pos | length | align) & ((1 << blkbits) - 1))
> > > > +	if ((pos | length) & ((1 << blkbits) - 1) ||
> > > > +	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
> > > >  		return -EINVAL;
> > > >  
> > > >  	if (iomap->type == IOMAP_UNWRITTEN) {
> > > 
> > > I noticed that this patch is going to break the following logic in
> > > f2fs_should_use_dio() in fs/f2fs/file.c:
> > > 
> > > 	/*
> > > 	 * Direct I/O not aligned to the disk's logical_block_size will be
> > > 	 * attempted, but will fail with -EINVAL.
> > > 	 *
> > > 	 * f2fs additionally requires that direct I/O be aligned to the
> > > 	 * filesystem block size, which is often a stricter requirement.
> > > 	 * However, f2fs traditionally falls back to buffered I/O on requests
> > > 	 * that are logical_block_size-aligned but not fs-block aligned.
> > > 	 *
> > > 	 * The below logic implements this behavior.
> > > 	 */
> > > 	align = iocb->ki_pos | iov_iter_alignment(iter);
> > > 	if (!IS_ALIGNED(align, i_blocksize(inode)) &&
> > > 	    IS_ALIGNED(align, bdev_logical_block_size(inode->i_sb->s_bdev)))
> > > 		return false;
> > > 
> > > 	return true;
> > > 
> > > So, f2fs assumes that __iomap_dio_rw() returns an error if the I/O isn't logical
> > > block aligned.  This patch changes that.  The result is that DIO will sometimes
> > > proceed in cases where the I/O doesn't have the fs block alignment required by
> > > f2fs for all DIO.
> > > 
> > > Does anyone have any thoughts about what f2fs should be doing here?  I think
> > > it's weird that f2fs has different behaviors for different degrees of
> > > misalignment: fail with EINVAL if not logical block aligned, else fallback to
> > > buffered I/O if not fs block aligned.  I think it should be one convention or
> > > the other.  Any opinions about which one it should be?
> > 
> > It looks like f2fs just falls back to buffered IO for this condition without
> > reaching the new code in iomap_dio_bio_iter().
> 
> No.  It's a bit subtle, so read the code and what I'm saying carefully.  f2fs
> only supports 4K aligned DIO and normally falls back to buffered I/O; however,
> for DIO that is *very* misaligned (not even LBS aligned) it returns EINVAL
> instead.  And it relies on __iomap_dio_rw() returning that EINVAL.

Okay, I understand the code flow now.

I tested f2fs direct io with every possible alignment, and it is successful for
all hardware dma supported alignments and continues to return EINVAL for
unaligned. Is the concern that it's now returning success in scenarios that
used to fail? Or is there some other 4k f2fs constraint that I haven't found
yet?
 
> Relying on __iomap_dio_rw() in that way is definitely a bad design on f2fs's
> part (and I messed that up when switching f2fs from fs/direct-io.c to iomap).
> The obvious fix is to just have f2fs do the LBS alignment check itself.
>
> But I think that f2fs shouldn't have different behavior for different levels of
> misalignment in the first place, so I was wondering if anyone had any thoughts
> on which behavior (EINVAL or fallback to buffered I/O) should be standardized on
> in all cases, at least for f2fs.  There was some discussion about this sort of
> thing for ext4 several years ago on the thread
> https://lore.kernel.org/linux-ext4/1461472078-20104-1-git-send-email-tytso@mit.edu/T/#u,
> but it didn't really reach a conclusion.  I'm wondering if the f2fs maintainers
> have any thoughts about why the f2fs behavior is as it is.  I.e. is it just
> accidental, or are there specific reasons...
> 
> - Eric

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [f2fs-dev] [PATCHv6 11/11] iomap: add support for dma aligned direct-io
@ 2022-07-22 20:26           ` Keith Busch
  0 siblings, 0 replies; 51+ messages in thread
From: Keith Busch @ 2022-07-22 20:26 UTC (permalink / raw)
  To: Eric Biggers
  Cc: axboe, bvanassche, pankydev8, damien.lemoal, linux-nvme,
	linux-f2fs-devel, linux-block, Keith Busch, linux-fsdevel,
	Jaegeuk Kim, Kernel Team, hch

On Fri, Jul 22, 2022 at 06:01:35PM +0000, Eric Biggers wrote:
> On Fri, Jul 22, 2022 at 08:43:55AM -0600, Keith Busch wrote:
> > On Fri, Jul 22, 2022 at 12:36:01AM -0700, Eric Biggers wrote:
> > > [+f2fs list and maintainers]
> > > 
> > > On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote:
> > > > From: Keith Busch <kbusch@kernel.org>
> > > > 
> > > > Use the address alignment requirements from the block_device for direct
> > > > io instead of requiring addresses be aligned to the block size.
> > > > 
> > > > Signed-off-by: Keith Busch <kbusch@kernel.org>
> > > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > > ---
> > > >  fs/iomap/direct-io.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > > > index 370c3241618a..5d098adba443 100644
> > > > --- a/fs/iomap/direct-io.c
> > > > +++ b/fs/iomap/direct-io.c
> > > > @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> > > >  	struct inode *inode = iter->inode;
> > > >  	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
> > > >  	unsigned int fs_block_size = i_blocksize(inode), pad;
> > > > -	unsigned int align = iov_iter_alignment(dio->submit.iter);
> > > >  	loff_t length = iomap_length(iter);
> > > >  	loff_t pos = iter->pos;
> > > >  	unsigned int bio_opf;
> > > > @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> > > >  	size_t copied = 0;
> > > >  	size_t orig_count;
> > > >  
> > > > -	if ((pos | length | align) & ((1 << blkbits) - 1))
> > > > +	if ((pos | length) & ((1 << blkbits) - 1) ||
> > > > +	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
> > > >  		return -EINVAL;
> > > >  
> > > >  	if (iomap->type == IOMAP_UNWRITTEN) {
> > > 
> > > I noticed that this patch is going to break the following logic in
> > > f2fs_should_use_dio() in fs/f2fs/file.c:
> > > 
> > > 	/*
> > > 	 * Direct I/O not aligned to the disk's logical_block_size will be
> > > 	 * attempted, but will fail with -EINVAL.
> > > 	 *
> > > 	 * f2fs additionally requires that direct I/O be aligned to the
> > > 	 * filesystem block size, which is often a stricter requirement.
> > > 	 * However, f2fs traditionally falls back to buffered I/O on requests
> > > 	 * that are logical_block_size-aligned but not fs-block aligned.
> > > 	 *
> > > 	 * The below logic implements this behavior.
> > > 	 */
> > > 	align = iocb->ki_pos | iov_iter_alignment(iter);
> > > 	if (!IS_ALIGNED(align, i_blocksize(inode)) &&
> > > 	    IS_ALIGNED(align, bdev_logical_block_size(inode->i_sb->s_bdev)))
> > > 		return false;
> > > 
> > > 	return true;
> > > 
> > > So, f2fs assumes that __iomap_dio_rw() returns an error if the I/O isn't logical
> > > block aligned.  This patch changes that.  The result is that DIO will sometimes
> > > proceed in cases where the I/O doesn't have the fs block alignment required by
> > > f2fs for all DIO.
> > > 
> > > Does anyone have any thoughts about what f2fs should be doing here?  I think
> > > it's weird that f2fs has different behaviors for different degrees of
> > > misalignment: fail with EINVAL if not logical block aligned, else fallback to
> > > buffered I/O if not fs block aligned.  I think it should be one convention or
> > > the other.  Any opinions about which one it should be?
> > 
> > It looks like f2fs just falls back to buffered IO for this condition without
> > reaching the new code in iomap_dio_bio_iter().
> 
> No.  It's a bit subtle, so read the code and what I'm saying carefully.  f2fs
> only supports 4K aligned DIO and normally falls back to buffered I/O; however,
> for DIO that is *very* misaligned (not even LBS aligned) it returns EINVAL
> instead.  And it relies on __iomap_dio_rw() returning that EINVAL.

Okay, I understand the code flow now.

I tested f2fs direct io with every possible alignment, and it is successful for
all hardware dma supported alignments and continues to return EINVAL for
unaligned. Is the concern that it's now returning success in scenarios that
used to fail? Or is there some other 4k f2fs constraint that I haven't found
yet?
 
> Relying on __iomap_dio_rw() in that way is definitely a bad design on f2fs's
> part (and I messed that up when switching f2fs from fs/direct-io.c to iomap).
> The obvious fix is to just have f2fs do the LBS alignment check itself.
>
> But I think that f2fs shouldn't have different behavior for different levels of
> misalignment in the first place, so I was wondering if anyone had any thoughts
> on which behavior (EINVAL or fallback to buffered I/O) should be standardized on
> in all cases, at least for f2fs.  There was some discussion about this sort of
> thing for ext4 several years ago on the thread
> https://lore.kernel.org/linux-ext4/1461472078-20104-1-git-send-email-tytso@mit.edu/T/#u,
> but it didn't really reach a conclusion.  I'm wondering if the f2fs maintainers
> have any thoughts about why the f2fs behavior is as it is.  I.e. is it just
> accidental, or are there specific reasons...
> 
> - Eric


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 05/11] block: add a helper function for dio alignment
  2022-06-10 19:58 ` [PATCHv6 05/11] block: add a helper function for dio alignment Keith Busch
@ 2022-07-22 21:53   ` Bart Van Assche
  0 siblings, 0 replies; 51+ messages in thread
From: Bart Van Assche @ 2022-07-22 21:53 UTC (permalink / raw)
  To: Keith Busch, linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, damien.lemoal, ebiggers, pankydev8,
	Keith Busch, Johannes Thumshirn

On 6/10/22 12:58, Keith Busch wrote:
> +static bool blkdev_dio_unaligned(struct block_device *bdev, loff_t pos,
> +			      struct iov_iter *iter)
> +{
> +	return ((pos | iov_iter_alignment(iter)) &
> +	    (bdev_logical_block_size(bdev) - 1));
> +}

A nit: if you have to repost this patch series, please leave out the 
outer parentheses from the return statement. Otherwise this patch looks 
good to me.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 06/11] block/merge: count bytes instead of sectors
  2022-06-10 19:58 ` [PATCHv6 06/11] block/merge: count bytes instead of sectors Keith Busch
@ 2022-07-22 21:57   ` Bart Van Assche
  0 siblings, 0 replies; 51+ messages in thread
From: Bart Van Assche @ 2022-07-22 21:57 UTC (permalink / raw)
  To: Keith Busch, linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, damien.lemoal, ebiggers, pankydev8,
	Keith Busch, Johannes Thumshirn

On 6/10/22 12:58, Keith Busch wrote:
> @@ -269,8 +269,8 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>   {
>   	struct bio_vec bv, bvprv, *bvprvp = NULL;
>   	struct bvec_iter iter;
> -	unsigned nsegs = 0, sectors = 0;
> -	const unsigned max_sectors = get_max_io_size(q, bio);
> +	unsigned nsegs = 0, bytes = 0;
> +	const unsigned max_bytes = get_max_io_size(q, bio) << 9;

How about using SECTOR_SHIFT instead of 9?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 07/11] block/bounce: count bytes instead of sectors
  2022-06-10 19:58 ` [PATCHv6 07/11] block/bounce: " Keith Busch
  2022-06-13 14:22   ` Christoph Hellwig
@ 2022-07-22 22:01   ` Bart Van Assche
  2022-07-25 14:46     ` Keith Busch
  1 sibling, 1 reply; 51+ messages in thread
From: Bart Van Assche @ 2022-07-22 22:01 UTC (permalink / raw)
  To: Keith Busch, linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, damien.lemoal, ebiggers, pankydev8,
	Keith Busch, Pankaj Raghav

On 6/10/22 12:58, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Individual bv_len's may not be a sector size.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
> ---
>   block/bounce.c | 13 ++++++++++---
>   1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/block/bounce.c b/block/bounce.c
> index 8f7b6fe3b4db..fbadf179601f 100644
> --- a/block/bounce.c
> +++ b/block/bounce.c
> @@ -205,19 +205,26 @@ void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig)
>   	int rw = bio_data_dir(*bio_orig);
>   	struct bio_vec *to, from;
>   	struct bvec_iter iter;
> -	unsigned i = 0;
> +	unsigned i = 0, bytes = 0;
>   	bool bounce = false;
> -	int sectors = 0;
> +	int sectors;
>   
>   	bio_for_each_segment(from, *bio_orig, iter) {
>   		if (i++ < BIO_MAX_VECS)
> -			sectors += from.bv_len >> 9;
> +			bytes += from.bv_len;
>   		if (PageHighMem(from.bv_page))
>   			bounce = true;
>   	}
>   	if (!bounce)
>   		return;
>   
> +	/*
> +	 * Individual bvecs might not be logical block aligned. Round down
> +	 * the split size so that each bio is properly block size aligned,
> +	 * even if we do not use the full hardware limits.
> +	 */
> +	sectors = ALIGN_DOWN(bytes, queue_logical_block_size(q)) >>
> +			SECTOR_SHIFT;
>   	if (sectors < bio_sectors(*bio_orig)) {
>   		bio = bio_split(*bio_orig, sectors, GFP_NOIO, &bounce_bio_split);
>   		bio_chain(bio, *bio_orig);

Do I see correctly that there are two changes in this patch: counting 
bytes instead of sectors and also splitting at logical block boundaries 
instead of a 512-byte boundary? Should this patch perhaps be split?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
  2022-07-22 18:12         ` [f2fs-dev] " Eric Biggers
@ 2022-07-23  5:03           ` Darrick J. Wong
  -1 siblings, 0 replies; 51+ messages in thread
From: Darrick J. Wong @ 2022-07-23  5:03 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Keith Busch, Jaegeuk Kim, Chao Yu, linux-fsdevel, linux-block,
	linux-nvme, axboe, Kernel Team, hch, bvanassche, damien.lemoal,
	pankydev8, Keith Busch, linux-f2fs-devel

On Fri, Jul 22, 2022 at 06:12:40PM +0000, Eric Biggers wrote:
> On Fri, Jul 22, 2022 at 10:53:42AM -0700, Darrick J. Wong wrote:
> > On Fri, Jul 22, 2022 at 12:36:01AM -0700, Eric Biggers wrote:
> > > [+f2fs list and maintainers]
> > > 
> > > On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote:
> > > > From: Keith Busch <kbusch@kernel.org>
> > > > 
> > > > Use the address alignment requirements from the block_device for direct
> > > > io instead of requiring addresses be aligned to the block size.
> > > > 
> > > > Signed-off-by: Keith Busch <kbusch@kernel.org>
> > > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > > ---
> > > >  fs/iomap/direct-io.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > > > index 370c3241618a..5d098adba443 100644
> > > > --- a/fs/iomap/direct-io.c
> > > > +++ b/fs/iomap/direct-io.c
> > > > @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> > > >  	struct inode *inode = iter->inode;
> > > >  	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
> > > >  	unsigned int fs_block_size = i_blocksize(inode), pad;
> > > > -	unsigned int align = iov_iter_alignment(dio->submit.iter);
> > > >  	loff_t length = iomap_length(iter);
> > > >  	loff_t pos = iter->pos;
> > > >  	unsigned int bio_opf;
> > > > @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> > > >  	size_t copied = 0;
> > > >  	size_t orig_count;
> > > >  
> > > > -	if ((pos | length | align) & ((1 << blkbits) - 1))
> > > > +	if ((pos | length) & ((1 << blkbits) - 1) ||
> > > > +	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
> > 
> > How does this change intersect with "make statx() return DIO alignment
> > information" ?  Will the new STATX_DIOALIGN implementations have to be
> > adjusted to set stx_dio_mem_align = bdev_dma_alignment(...)?
> > 
> > I'm guessing the answer is yes, but I haven't seen any patches on the
> > list to do that, but more and more these days email behaves like a flood
> > of UDP traffic... :(
> > 
> 
> Yes.  I haven't done that in the STATX_DIOALIGN patchset yet because I've been
> basing it on upstream, which doesn't yet have this iomap patch.  I haven't been
> expecting STATX_DIOALIGN to make 5.20, given that it's a new UAPI that needs
> time to be properly reviewed, plus I've just been busy with other things.  So
> I've been planning to make the above change after this patch lands upstream.

<nod> Ok, I'm looking forward to it.  Thank you for your work on statx! :)

--D

> - Eric

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [f2fs-dev] [PATCHv6 11/11] iomap: add support for dma aligned direct-io
@ 2022-07-23  5:03           ` Darrick J. Wong
  0 siblings, 0 replies; 51+ messages in thread
From: Darrick J. Wong @ 2022-07-23  5:03 UTC (permalink / raw)
  To: Eric Biggers
  Cc: axboe, bvanassche, pankydev8, damien.lemoal, linux-nvme,
	linux-f2fs-devel, linux-block, Keith Busch, Keith Busch,
	linux-fsdevel, Jaegeuk Kim, Kernel Team, hch

On Fri, Jul 22, 2022 at 06:12:40PM +0000, Eric Biggers wrote:
> On Fri, Jul 22, 2022 at 10:53:42AM -0700, Darrick J. Wong wrote:
> > On Fri, Jul 22, 2022 at 12:36:01AM -0700, Eric Biggers wrote:
> > > [+f2fs list and maintainers]
> > > 
> > > On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote:
> > > > From: Keith Busch <kbusch@kernel.org>
> > > > 
> > > > Use the address alignment requirements from the block_device for direct
> > > > io instead of requiring addresses be aligned to the block size.
> > > > 
> > > > Signed-off-by: Keith Busch <kbusch@kernel.org>
> > > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > > ---
> > > >  fs/iomap/direct-io.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > > > index 370c3241618a..5d098adba443 100644
> > > > --- a/fs/iomap/direct-io.c
> > > > +++ b/fs/iomap/direct-io.c
> > > > @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> > > >  	struct inode *inode = iter->inode;
> > > >  	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
> > > >  	unsigned int fs_block_size = i_blocksize(inode), pad;
> > > > -	unsigned int align = iov_iter_alignment(dio->submit.iter);
> > > >  	loff_t length = iomap_length(iter);
> > > >  	loff_t pos = iter->pos;
> > > >  	unsigned int bio_opf;
> > > > @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> > > >  	size_t copied = 0;
> > > >  	size_t orig_count;
> > > >  
> > > > -	if ((pos | length | align) & ((1 << blkbits) - 1))
> > > > +	if ((pos | length) & ((1 << blkbits) - 1) ||
> > > > +	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
> > 
> > How does this change intersect with "make statx() return DIO alignment
> > information" ?  Will the new STATX_DIOALIGN implementations have to be
> > adjusted to set stx_dio_mem_align = bdev_dma_alignment(...)?
> > 
> > I'm guessing the answer is yes, but I haven't seen any patches on the
> > list to do that, but more and more these days email behaves like a flood
> > of UDP traffic... :(
> > 
> 
> Yes.  I haven't done that in the STATX_DIOALIGN patchset yet because I've been
> basing it on upstream, which doesn't yet have this iomap patch.  I haven't been
> expecting STATX_DIOALIGN to make 5.20, given that it's a new UAPI that needs
> time to be properly reviewed, plus I've just been busy with other things.  So
> I've been planning to make the above change after this patch lands upstream.

<nod> Ok, I'm looking forward to it.  Thank you for your work on statx! :)

--D

> - Eric


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
  2022-07-22 18:01         ` [f2fs-dev] " Eric Biggers
@ 2022-07-24  2:13           ` Jaegeuk Kim
  -1 siblings, 0 replies; 51+ messages in thread
From: Jaegeuk Kim @ 2022-07-24  2:13 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Keith Busch, Keith Busch, Chao Yu, linux-fsdevel, linux-block,
	linux-nvme, axboe, Kernel Team, hch, bvanassche, damien.lemoal,
	pankydev8, linux-f2fs-devel

On 07/22, Eric Biggers wrote:
> On Fri, Jul 22, 2022 at 08:43:55AM -0600, Keith Busch wrote:
> > On Fri, Jul 22, 2022 at 12:36:01AM -0700, Eric Biggers wrote:
> > > [+f2fs list and maintainers]
> > > 
> > > On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote:
> > > > From: Keith Busch <kbusch@kernel.org>
> > > > 
> > > > Use the address alignment requirements from the block_device for direct
> > > > io instead of requiring addresses be aligned to the block size.
> > > > 
> > > > Signed-off-by: Keith Busch <kbusch@kernel.org>
> > > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > > ---
> > > >  fs/iomap/direct-io.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > > > index 370c3241618a..5d098adba443 100644
> > > > --- a/fs/iomap/direct-io.c
> > > > +++ b/fs/iomap/direct-io.c
> > > > @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> > > >  	struct inode *inode = iter->inode;
> > > >  	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
> > > >  	unsigned int fs_block_size = i_blocksize(inode), pad;
> > > > -	unsigned int align = iov_iter_alignment(dio->submit.iter);
> > > >  	loff_t length = iomap_length(iter);
> > > >  	loff_t pos = iter->pos;
> > > >  	unsigned int bio_opf;
> > > > @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> > > >  	size_t copied = 0;
> > > >  	size_t orig_count;
> > > >  
> > > > -	if ((pos | length | align) & ((1 << blkbits) - 1))
> > > > +	if ((pos | length) & ((1 << blkbits) - 1) ||
> > > > +	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
> > > >  		return -EINVAL;
> > > >  
> > > >  	if (iomap->type == IOMAP_UNWRITTEN) {
> > > 
> > > I noticed that this patch is going to break the following logic in
> > > f2fs_should_use_dio() in fs/f2fs/file.c:
> > > 
> > > 	/*
> > > 	 * Direct I/O not aligned to the disk's logical_block_size will be
> > > 	 * attempted, but will fail with -EINVAL.
> > > 	 *
> > > 	 * f2fs additionally requires that direct I/O be aligned to the
> > > 	 * filesystem block size, which is often a stricter requirement.
> > > 	 * However, f2fs traditionally falls back to buffered I/O on requests
> > > 	 * that are logical_block_size-aligned but not fs-block aligned.
> > > 	 *
> > > 	 * The below logic implements this behavior.
> > > 	 */
> > > 	align = iocb->ki_pos | iov_iter_alignment(iter);
> > > 	if (!IS_ALIGNED(align, i_blocksize(inode)) &&
> > > 	    IS_ALIGNED(align, bdev_logical_block_size(inode->i_sb->s_bdev)))
> > > 		return false;
> > > 
> > > 	return true;
> > > 
> > > So, f2fs assumes that __iomap_dio_rw() returns an error if the I/O isn't logical
> > > block aligned.  This patch changes that.  The result is that DIO will sometimes
> > > proceed in cases where the I/O doesn't have the fs block alignment required by
> > > f2fs for all DIO.
> > > 
> > > Does anyone have any thoughts about what f2fs should be doing here?  I think
> > > it's weird that f2fs has different behaviors for different degrees of
> > > misalignment: fail with EINVAL if not logical block aligned, else fallback to
> > > buffered I/O if not fs block aligned.  I think it should be one convention or
> > > the other.  Any opinions about which one it should be?
> > 
> > It looks like f2fs just falls back to buffered IO for this condition without
> > reaching the new code in iomap_dio_bio_iter().
> 
> No.  It's a bit subtle, so read the code and what I'm saying carefully.  f2fs
> only supports 4K aligned DIO and normally falls back to buffered I/O; however,
> for DIO that is *very* misaligned (not even LBS aligned) it returns EINVAL
> instead.  And it relies on __iomap_dio_rw() returning that EINVAL.
> 
> Relying on __iomap_dio_rw() in that way is definitely a bad design on f2fs's
> part (and I messed that up when switching f2fs from fs/direct-io.c to iomap).
> The obvious fix is to just have f2fs do the LBS alignment check itself.
> 
> But I think that f2fs shouldn't have different behavior for different levels of
> misalignment in the first place, so I was wondering if anyone had any thoughts
> on which behavior (EINVAL or fallback to buffered I/O) should be standardized on
> in all cases, at least for f2fs.  There was some discussion about this sort of
> thing for ext4 several years ago on the thread
> https://lore.kernel.org/linux-ext4/1461472078-20104-1-git-send-email-tytso@mit.edu/T/#u,
> but it didn't really reach a conclusion.  I'm wondering if the f2fs maintainers
> have any thoughts about why the f2fs behavior is as it is.  I.e. is it just
> accidental, or are there specific reasons...

If there's a generic way to deal with this, I have no objection to
follow it. Initially, I remember I was trying to match the ext4 rule,
but at some point, I lost the track.

> 
> - Eric

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [f2fs-dev] [PATCHv6 11/11] iomap: add support for dma aligned direct-io
@ 2022-07-24  2:13           ` Jaegeuk Kim
  0 siblings, 0 replies; 51+ messages in thread
From: Jaegeuk Kim @ 2022-07-24  2:13 UTC (permalink / raw)
  To: Eric Biggers
  Cc: axboe, bvanassche, pankydev8, damien.lemoal, linux-nvme,
	linux-f2fs-devel, linux-block, linux-fsdevel, Keith Busch,
	Keith Busch, Kernel Team, hch

On 07/22, Eric Biggers wrote:
> On Fri, Jul 22, 2022 at 08:43:55AM -0600, Keith Busch wrote:
> > On Fri, Jul 22, 2022 at 12:36:01AM -0700, Eric Biggers wrote:
> > > [+f2fs list and maintainers]
> > > 
> > > On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote:
> > > > From: Keith Busch <kbusch@kernel.org>
> > > > 
> > > > Use the address alignment requirements from the block_device for direct
> > > > io instead of requiring addresses be aligned to the block size.
> > > > 
> > > > Signed-off-by: Keith Busch <kbusch@kernel.org>
> > > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > > ---
> > > >  fs/iomap/direct-io.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > > > index 370c3241618a..5d098adba443 100644
> > > > --- a/fs/iomap/direct-io.c
> > > > +++ b/fs/iomap/direct-io.c
> > > > @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> > > >  	struct inode *inode = iter->inode;
> > > >  	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
> > > >  	unsigned int fs_block_size = i_blocksize(inode), pad;
> > > > -	unsigned int align = iov_iter_alignment(dio->submit.iter);
> > > >  	loff_t length = iomap_length(iter);
> > > >  	loff_t pos = iter->pos;
> > > >  	unsigned int bio_opf;
> > > > @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> > > >  	size_t copied = 0;
> > > >  	size_t orig_count;
> > > >  
> > > > -	if ((pos | length | align) & ((1 << blkbits) - 1))
> > > > +	if ((pos | length) & ((1 << blkbits) - 1) ||
> > > > +	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
> > > >  		return -EINVAL;
> > > >  
> > > >  	if (iomap->type == IOMAP_UNWRITTEN) {
> > > 
> > > I noticed that this patch is going to break the following logic in
> > > f2fs_should_use_dio() in fs/f2fs/file.c:
> > > 
> > > 	/*
> > > 	 * Direct I/O not aligned to the disk's logical_block_size will be
> > > 	 * attempted, but will fail with -EINVAL.
> > > 	 *
> > > 	 * f2fs additionally requires that direct I/O be aligned to the
> > > 	 * filesystem block size, which is often a stricter requirement.
> > > 	 * However, f2fs traditionally falls back to buffered I/O on requests
> > > 	 * that are logical_block_size-aligned but not fs-block aligned.
> > > 	 *
> > > 	 * The below logic implements this behavior.
> > > 	 */
> > > 	align = iocb->ki_pos | iov_iter_alignment(iter);
> > > 	if (!IS_ALIGNED(align, i_blocksize(inode)) &&
> > > 	    IS_ALIGNED(align, bdev_logical_block_size(inode->i_sb->s_bdev)))
> > > 		return false;
> > > 
> > > 	return true;
> > > 
> > > So, f2fs assumes that __iomap_dio_rw() returns an error if the I/O isn't logical
> > > block aligned.  This patch changes that.  The result is that DIO will sometimes
> > > proceed in cases where the I/O doesn't have the fs block alignment required by
> > > f2fs for all DIO.
> > > 
> > > Does anyone have any thoughts about what f2fs should be doing here?  I think
> > > it's weird that f2fs has different behaviors for different degrees of
> > > misalignment: fail with EINVAL if not logical block aligned, else fallback to
> > > buffered I/O if not fs block aligned.  I think it should be one convention or
> > > the other.  Any opinions about which one it should be?
> > 
> > It looks like f2fs just falls back to buffered IO for this condition without
> > reaching the new code in iomap_dio_bio_iter().
> 
> No.  It's a bit subtle, so read the code and what I'm saying carefully.  f2fs
> only supports 4K aligned DIO and normally falls back to buffered I/O; however,
> for DIO that is *very* misaligned (not even LBS aligned) it returns EINVAL
> instead.  And it relies on __iomap_dio_rw() returning that EINVAL.
> 
> Relying on __iomap_dio_rw() in that way is definitely a bad design on f2fs's
> part (and I messed that up when switching f2fs from fs/direct-io.c to iomap).
> The obvious fix is to just have f2fs do the LBS alignment check itself.
> 
> But I think that f2fs shouldn't have different behavior for different levels of
> misalignment in the first place, so I was wondering if anyone had any thoughts
> on which behavior (EINVAL or fallback to buffered I/O) should be standardized on
> in all cases, at least for f2fs.  There was some discussion about this sort of
> thing for ext4 several years ago on the thread
> https://lore.kernel.org/linux-ext4/1461472078-20104-1-git-send-email-tytso@mit.edu/T/#u,
> but it didn't really reach a conclusion.  I'm wondering if the f2fs maintainers
> have any thoughts about why the f2fs behavior is as it is.  I.e. is it just
> accidental, or are there specific reasons...

If there's a generic way to deal with this, I have no objection to
follow it. Initially, I remember I was trying to match the ext4 rule,
but at some point, I lost the track.

> 
> - Eric


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 07/11] block/bounce: count bytes instead of sectors
  2022-07-22 22:01   ` Bart Van Assche
@ 2022-07-25 14:46     ` Keith Busch
  0 siblings, 0 replies; 51+ messages in thread
From: Keith Busch @ 2022-07-25 14:46 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Keith Busch, linux-fsdevel, linux-block, linux-nvme, axboe,
	Kernel Team, hch, damien.lemoal, ebiggers, pankydev8,
	Pankaj Raghav

On Fri, Jul 22, 2022 at 03:01:06PM -0700, Bart Van Assche wrote:
> On 6/10/22 12:58, Keith Busch wrote:
> > From: Keith Busch <kbusch@kernel.org>
> > 
> > Individual bv_len's may not be a sector size.
> > 
> > Signed-off-by: Keith Busch <kbusch@kernel.org>
> > Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> > Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
> > ---
> >   block/bounce.c | 13 ++++++++++---
> >   1 file changed, 10 insertions(+), 3 deletions(-)
> > 
> > diff --git a/block/bounce.c b/block/bounce.c
> > index 8f7b6fe3b4db..fbadf179601f 100644
> > --- a/block/bounce.c
> > +++ b/block/bounce.c
> > @@ -205,19 +205,26 @@ void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig)
> >   	int rw = bio_data_dir(*bio_orig);
> >   	struct bio_vec *to, from;
> >   	struct bvec_iter iter;
> > -	unsigned i = 0;
> > +	unsigned i = 0, bytes = 0;
> >   	bool bounce = false;
> > -	int sectors = 0;
> > +	int sectors;
> >   	bio_for_each_segment(from, *bio_orig, iter) {
> >   		if (i++ < BIO_MAX_VECS)
> > -			sectors += from.bv_len >> 9;
> > +			bytes += from.bv_len;
> >   		if (PageHighMem(from.bv_page))
> >   			bounce = true;
> >   	}
> >   	if (!bounce)
> >   		return;
> > +	/*
> > +	 * Individual bvecs might not be logical block aligned. Round down
> > +	 * the split size so that each bio is properly block size aligned,
> > +	 * even if we do not use the full hardware limits.
> > +	 */
> > +	sectors = ALIGN_DOWN(bytes, queue_logical_block_size(q)) >>
> > +			SECTOR_SHIFT;
> >   	if (sectors < bio_sectors(*bio_orig)) {
> >   		bio = bio_split(*bio_orig, sectors, GFP_NOIO, &bounce_bio_split);
> >   		bio_chain(bio, *bio_orig);
> 
> Do I see correctly that there are two changes in this patch: counting bytes
> instead of sectors and also splitting at logical block boundaries instead of
> a 512-byte boundary? Should this patch perhaps be split?

The code previously would only split on a bvec boundary. All bvecs were logical
block sized, so that part is not changing. We just needed to be able to split
mid-bvec since the series enables unaligned offsets to match hardware
capabilities.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
  2022-07-22 20:26           ` [f2fs-dev] " Keith Busch
@ 2022-07-25 18:19             ` Eric Biggers
  -1 siblings, 0 replies; 51+ messages in thread
From: Eric Biggers @ 2022-07-25 18:19 UTC (permalink / raw)
  To: Keith Busch
  Cc: Keith Busch, Jaegeuk Kim, Chao Yu, linux-fsdevel, linux-block,
	linux-nvme, axboe, Kernel Team, hch, bvanassche, damien.lemoal,
	pankydev8, linux-f2fs-devel

On Fri, Jul 22, 2022 at 02:26:05PM -0600, Keith Busch wrote:
> On Fri, Jul 22, 2022 at 06:01:35PM +0000, Eric Biggers wrote:
> > On Fri, Jul 22, 2022 at 08:43:55AM -0600, Keith Busch wrote:
> > > On Fri, Jul 22, 2022 at 12:36:01AM -0700, Eric Biggers wrote:
> > > > [+f2fs list and maintainers]
> > > > 
> > > > On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote:
> > > > > From: Keith Busch <kbusch@kernel.org>
> > > > > 
> > > > > Use the address alignment requirements from the block_device for direct
> > > > > io instead of requiring addresses be aligned to the block size.
> > > > > 
> > > > > Signed-off-by: Keith Busch <kbusch@kernel.org>
> > > > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > > > ---
> > > > >  fs/iomap/direct-io.c | 4 ++--
> > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > > > > index 370c3241618a..5d098adba443 100644
> > > > > --- a/fs/iomap/direct-io.c
> > > > > +++ b/fs/iomap/direct-io.c
> > > > > @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> > > > >  	struct inode *inode = iter->inode;
> > > > >  	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
> > > > >  	unsigned int fs_block_size = i_blocksize(inode), pad;
> > > > > -	unsigned int align = iov_iter_alignment(dio->submit.iter);
> > > > >  	loff_t length = iomap_length(iter);
> > > > >  	loff_t pos = iter->pos;
> > > > >  	unsigned int bio_opf;
> > > > > @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> > > > >  	size_t copied = 0;
> > > > >  	size_t orig_count;
> > > > >  
> > > > > -	if ((pos | length | align) & ((1 << blkbits) - 1))
> > > > > +	if ((pos | length) & ((1 << blkbits) - 1) ||
> > > > > +	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
> > > > >  		return -EINVAL;
> > > > >  
> > > > >  	if (iomap->type == IOMAP_UNWRITTEN) {
> > > > 
> > > > I noticed that this patch is going to break the following logic in
> > > > f2fs_should_use_dio() in fs/f2fs/file.c:
> > > > 
> > > > 	/*
> > > > 	 * Direct I/O not aligned to the disk's logical_block_size will be
> > > > 	 * attempted, but will fail with -EINVAL.
> > > > 	 *
> > > > 	 * f2fs additionally requires that direct I/O be aligned to the
> > > > 	 * filesystem block size, which is often a stricter requirement.
> > > > 	 * However, f2fs traditionally falls back to buffered I/O on requests
> > > > 	 * that are logical_block_size-aligned but not fs-block aligned.
> > > > 	 *
> > > > 	 * The below logic implements this behavior.
> > > > 	 */
> > > > 	align = iocb->ki_pos | iov_iter_alignment(iter);
> > > > 	if (!IS_ALIGNED(align, i_blocksize(inode)) &&
> > > > 	    IS_ALIGNED(align, bdev_logical_block_size(inode->i_sb->s_bdev)))
> > > > 		return false;
> > > > 
> > > > 	return true;
> > > > 
> > > > So, f2fs assumes that __iomap_dio_rw() returns an error if the I/O isn't logical
> > > > block aligned.  This patch changes that.  The result is that DIO will sometimes
> > > > proceed in cases where the I/O doesn't have the fs block alignment required by
> > > > f2fs for all DIO.
> > > > 
> > > > Does anyone have any thoughts about what f2fs should be doing here?  I think
> > > > it's weird that f2fs has different behaviors for different degrees of
> > > > misalignment: fail with EINVAL if not logical block aligned, else fallback to
> > > > buffered I/O if not fs block aligned.  I think it should be one convention or
> > > > the other.  Any opinions about which one it should be?
> > > 
> > > It looks like f2fs just falls back to buffered IO for this condition without
> > > reaching the new code in iomap_dio_bio_iter().
> > 
> > No.  It's a bit subtle, so read the code and what I'm saying carefully.  f2fs
> > only supports 4K aligned DIO and normally falls back to buffered I/O; however,
> > for DIO that is *very* misaligned (not even LBS aligned) it returns EINVAL
> > instead.  And it relies on __iomap_dio_rw() returning that EINVAL.
> 
> Okay, I understand the code flow now.
> 
> I tested f2fs direct io with every possible alignment, and it is successful for
> all hardware dma supported alignments and continues to return EINVAL for
> unaligned. Is the concern that it's now returning success in scenarios that
> used to fail? Or is there some other 4k f2fs constraint that I haven't found
> yet?

f2fs has a lot of different options, and edge cases that only occur occasionally
such as data blocks being moved by garbage collection.  So just because
misaligned DIO appears to work doesn't necessarily mean that it actually works.
Maybe the f2fs developers can elaborate on the reason that f2fs always requires
4K alignment for DIO.

- Eric

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [f2fs-dev] [PATCHv6 11/11] iomap: add support for dma aligned direct-io
@ 2022-07-25 18:19             ` Eric Biggers
  0 siblings, 0 replies; 51+ messages in thread
From: Eric Biggers @ 2022-07-25 18:19 UTC (permalink / raw)
  To: Keith Busch
  Cc: axboe, bvanassche, pankydev8, damien.lemoal, linux-nvme,
	linux-f2fs-devel, linux-block, Keith Busch, linux-fsdevel,
	Jaegeuk Kim, Kernel Team, hch

On Fri, Jul 22, 2022 at 02:26:05PM -0600, Keith Busch wrote:
> On Fri, Jul 22, 2022 at 06:01:35PM +0000, Eric Biggers wrote:
> > On Fri, Jul 22, 2022 at 08:43:55AM -0600, Keith Busch wrote:
> > > On Fri, Jul 22, 2022 at 12:36:01AM -0700, Eric Biggers wrote:
> > > > [+f2fs list and maintainers]
> > > > 
> > > > On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote:
> > > > > From: Keith Busch <kbusch@kernel.org>
> > > > > 
> > > > > Use the address alignment requirements from the block_device for direct
> > > > > io instead of requiring addresses be aligned to the block size.
> > > > > 
> > > > > Signed-off-by: Keith Busch <kbusch@kernel.org>
> > > > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > > > ---
> > > > >  fs/iomap/direct-io.c | 4 ++--
> > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > > > > index 370c3241618a..5d098adba443 100644
> > > > > --- a/fs/iomap/direct-io.c
> > > > > +++ b/fs/iomap/direct-io.c
> > > > > @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> > > > >  	struct inode *inode = iter->inode;
> > > > >  	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
> > > > >  	unsigned int fs_block_size = i_blocksize(inode), pad;
> > > > > -	unsigned int align = iov_iter_alignment(dio->submit.iter);
> > > > >  	loff_t length = iomap_length(iter);
> > > > >  	loff_t pos = iter->pos;
> > > > >  	unsigned int bio_opf;
> > > > > @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> > > > >  	size_t copied = 0;
> > > > >  	size_t orig_count;
> > > > >  
> > > > > -	if ((pos | length | align) & ((1 << blkbits) - 1))
> > > > > +	if ((pos | length) & ((1 << blkbits) - 1) ||
> > > > > +	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
> > > > >  		return -EINVAL;
> > > > >  
> > > > >  	if (iomap->type == IOMAP_UNWRITTEN) {
> > > > 
> > > > I noticed that this patch is going to break the following logic in
> > > > f2fs_should_use_dio() in fs/f2fs/file.c:
> > > > 
> > > > 	/*
> > > > 	 * Direct I/O not aligned to the disk's logical_block_size will be
> > > > 	 * attempted, but will fail with -EINVAL.
> > > > 	 *
> > > > 	 * f2fs additionally requires that direct I/O be aligned to the
> > > > 	 * filesystem block size, which is often a stricter requirement.
> > > > 	 * However, f2fs traditionally falls back to buffered I/O on requests
> > > > 	 * that are logical_block_size-aligned but not fs-block aligned.
> > > > 	 *
> > > > 	 * The below logic implements this behavior.
> > > > 	 */
> > > > 	align = iocb->ki_pos | iov_iter_alignment(iter);
> > > > 	if (!IS_ALIGNED(align, i_blocksize(inode)) &&
> > > > 	    IS_ALIGNED(align, bdev_logical_block_size(inode->i_sb->s_bdev)))
> > > > 		return false;
> > > > 
> > > > 	return true;
> > > > 
> > > > So, f2fs assumes that __iomap_dio_rw() returns an error if the I/O isn't logical
> > > > block aligned.  This patch changes that.  The result is that DIO will sometimes
> > > > proceed in cases where the I/O doesn't have the fs block alignment required by
> > > > f2fs for all DIO.
> > > > 
> > > > Does anyone have any thoughts about what f2fs should be doing here?  I think
> > > > it's weird that f2fs has different behaviors for different degrees of
> > > > misalignment: fail with EINVAL if not logical block aligned, else fallback to
> > > > buffered I/O if not fs block aligned.  I think it should be one convention or
> > > > the other.  Any opinions about which one it should be?
> > > 
> > > It looks like f2fs just falls back to buffered IO for this condition without
> > > reaching the new code in iomap_dio_bio_iter().
> > 
> > No.  It's a bit subtle, so read the code and what I'm saying carefully.  f2fs
> > only supports 4K aligned DIO and normally falls back to buffered I/O; however,
> > for DIO that is *very* misaligned (not even LBS aligned) it returns EINVAL
> > instead.  And it relies on __iomap_dio_rw() returning that EINVAL.
> 
> Okay, I understand the code flow now.
> 
> I tested f2fs direct io with every possible alignment, and it is successful for
> all hardware dma supported alignments and continues to return EINVAL for
> unaligned. Is the concern that it's now returning success in scenarios that
> used to fail? Or is there some other 4k f2fs constraint that I haven't found
> yet?

f2fs has a lot of different options, and edge cases that only occur occasionally
such as data blocks being moved by garbage collection.  So just because
misaligned DIO appears to work doesn't necessarily mean that it actually works.
Maybe the f2fs developers can elaborate on the reason that f2fs always requires
4K alignment for DIO.

- Eric


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2022-07-25 18:19 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-10 19:58 [PATCHv6 00/11] direct-io dma alignment Keith Busch
2022-06-10 19:58 ` [PATCHv6 01/11] block: fix infinite loop for invalid zone append Keith Busch
2022-06-10 19:58 ` [PATCHv6 02/11] block/bio: remove duplicate append pages code Keith Busch
2022-06-10 19:58 ` [PATCHv6 03/11] block: export dma_alignment attribute Keith Busch
2022-06-10 19:58 ` [PATCHv6 04/11] block: introduce bdev_dma_alignment helper Keith Busch
2022-06-10 19:58 ` [PATCHv6 05/11] block: add a helper function for dio alignment Keith Busch
2022-07-22 21:53   ` Bart Van Assche
2022-06-10 19:58 ` [PATCHv6 06/11] block/merge: count bytes instead of sectors Keith Busch
2022-07-22 21:57   ` Bart Van Assche
2022-06-10 19:58 ` [PATCHv6 07/11] block/bounce: " Keith Busch
2022-06-13 14:22   ` Christoph Hellwig
2022-07-22 22:01   ` Bart Van Assche
2022-07-25 14:46     ` Keith Busch
2022-06-10 19:58 ` [PATCHv6 08/11] iov: introduce iov_iter_aligned Keith Busch
2022-06-10 19:58 ` [PATCHv6 09/11] block: introduce bdev_iter_is_aligned helper Keith Busch
2022-06-10 19:58 ` [PATCHv6 10/11] block: relax direct io memory alignment Keith Busch
2022-06-10 19:58 ` [PATCHv6 11/11] iomap: add support for dma aligned direct-io Keith Busch
2022-06-23 18:29   ` Eric Farman
2022-06-23 18:51     ` Keith Busch
2022-06-23 19:11       ` Keith Busch
2022-06-23 20:32         ` Eric Farman
2022-06-23 21:34           ` Eric Farman
2022-06-27 15:21             ` Eric Farman
2022-06-27 15:36               ` Keith Busch
2022-06-28  9:00                 ` Halil Pasic
2022-06-28 15:20                   ` Eric Farman
2022-06-29  3:18                     ` Eric Farman
2022-06-29  3:52                       ` Keith Busch
2022-06-29 18:04                         ` Eric Farman
2022-06-29 19:07                           ` Keith Busch
2022-06-29 19:28                             ` Eric Farman
2022-06-30  5:45                             ` Christian Borntraeger
2022-07-22  7:36   ` Eric Biggers
2022-07-22  7:36     ` [f2fs-dev] " Eric Biggers
2022-07-22 14:43     ` Keith Busch
2022-07-22 14:43       ` [f2fs-dev] " Keith Busch
2022-07-22 18:01       ` Eric Biggers
2022-07-22 18:01         ` [f2fs-dev] " Eric Biggers
2022-07-22 20:26         ` Keith Busch
2022-07-22 20:26           ` [f2fs-dev] " Keith Busch
2022-07-25 18:19           ` Eric Biggers
2022-07-25 18:19             ` [f2fs-dev] " Eric Biggers
2022-07-24  2:13         ` Jaegeuk Kim
2022-07-24  2:13           ` [f2fs-dev] " Jaegeuk Kim
2022-07-22 17:53     ` Darrick J. Wong
2022-07-22 17:53       ` [f2fs-dev] " Darrick J. Wong
2022-07-22 18:12       ` Eric Biggers
2022-07-22 18:12         ` [f2fs-dev] " Eric Biggers
2022-07-23  5:03         ` Darrick J. Wong
2022-07-23  5:03           ` [f2fs-dev] " Darrick J. Wong
2022-06-13 21:22 ` [PATCHv6 00/11] direct-io dma alignment Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.