linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv5 00/11] direct-io dma alignment
@ 2022-05-31 19:11 Keith Busch
  2022-05-31 19:11 ` [PATCHv5 01/11] block: fix infinite loop for invalid zone append Keith Busch
                   ` (11 more replies)
  0 siblings, 12 replies; 25+ messages in thread
From: Keith Busch @ 2022-05-31 19:11 UTC (permalink / raw)
  To: linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch

From: Keith Busch <kbusch@kernel.org>

The most significant change from v4 is the alignment is now checked
prior to building the bio. This gets the expected EINVAL error for
misaligned userspace iovecs in all cases now (Eric Biggers).

I've removed the legacy fs change, so only iomap filesystems get to use
this alignement capability (Christoph Hellwig).

The block fops check for alignment returns a bool now (Damien).

Adjusted some comments, docs, and other minor style issues.

Reviews added for unchanged or trivially changed patches, removed
reviews for ones that changed more significantly.

As before, I tested using 'fio' with forced misaligned user buffers on
raw block, xfs, and ext4 (example raw block profile below).

  [global]
  filename=/dev/nvme0n1
  ioengine=io_uring
  verify=crc32c
  rw=randwrite
  iodepth=64
  direct=1

  [small]
  stonewall
  bsrange=4k-64k
  iomem_align=4

  [large]
  stonewall
  bsrange=512k-4M
  iomem_align=100

Keith Busch (11):
  block: fix infinite loop for invalid zone append
  block/bio: remove duplicate append pages code
  block: export dma_alignment attribute
  block: introduce bdev_dma_alignment helper
  block: add a helper function for dio alignment
  block/merge: count bytes instead of sectors
  block/bounce: count bytes instead of sectors
  iov: introduce iov_iter_aligned
  block: introduce bdev_iter_is_aligned helper
  block: relax direct io memory alignment
  fs: add support for dma aligned direct-io

 Documentation/ABI/stable/sysfs-block |   9 +++
 block/bio.c                          | 114 ++++++++++++---------------
 block/blk-merge.c                    |  41 ++++++----
 block/blk-sysfs.c                    |   7 ++
 block/bounce.c                       |  13 ++-
 block/fops.c                         |  16 ++--
 fs/iomap/direct-io.c                 |   4 +-
 include/linux/blkdev.h               |  12 +++
 include/linux/uio.h                  |   2 +
 lib/iov_iter.c                       |  92 +++++++++++++++++++++
 10 files changed, 219 insertions(+), 91 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCHv5 01/11] block: fix infinite loop for invalid zone append
  2022-05-31 19:11 [PATCHv5 00/11] direct-io dma alignment Keith Busch
@ 2022-05-31 19:11 ` Keith Busch
  2022-06-01  8:03   ` Johannes Thumshirn
  2022-05-31 19:11 ` [PATCHv5 02/11] block/bio: remove duplicate append pages code Keith Busch
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Keith Busch @ 2022-05-31 19:11 UTC (permalink / raw)
  To: linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch

From: Keith Busch <kbusch@kernel.org>

Returning 0 early from __bio_iov_append_get_pages() for the
max_append_sectors warning just creates an infinite loop since 0 means
success, and the bio will never fill from the unadvancing iov_iter. We
could turn the return into an error value, but it will already be turned
into an error value later on, so just remove the warning. Clearly no one
ever hit it anyway.

Fixes: 0512a75b98f84 ("block: Introduce REQ_OP_ZONE_APPEND")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 block/bio.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index a3893d80dccc..e249f6414fd5 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1228,9 +1228,6 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
 	size_t offset;
 	int ret = 0;
 
-	if (WARN_ON_ONCE(!max_append_sectors))
-		return 0;
-
 	/*
 	 * Move page array up in the allocated memory for the bio vecs as far as
 	 * possible so that we can start filling biovecs from the beginning
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCHv5 02/11] block/bio: remove duplicate append pages code
  2022-05-31 19:11 [PATCHv5 00/11] direct-io dma alignment Keith Busch
  2022-05-31 19:11 ` [PATCHv5 01/11] block: fix infinite loop for invalid zone append Keith Busch
@ 2022-05-31 19:11 ` Keith Busch
  2022-05-31 19:11 ` [PATCHv5 03/11] block: export dma_alignment attribute Keith Busch
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Keith Busch @ 2022-05-31 19:11 UTC (permalink / raw)
  To: linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch, Johannes Thumshirn

From: Keith Busch <kbusch@kernel.org>

The getting pages setup for zone append and normal IO are identical. Use
common code for each.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 block/bio.c | 102 ++++++++++++++++++++++------------------------------
 1 file changed, 42 insertions(+), 60 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index e249f6414fd5..55d2a9c4e312 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1158,6 +1158,37 @@ static void bio_put_pages(struct page **pages, size_t size, size_t off)
 		put_page(pages[i]);
 }
 
+static int bio_iov_add_page(struct bio *bio, struct page *page,
+		unsigned int len, unsigned int offset)
+{
+	bool same_page = false;
+
+	if (!__bio_try_merge_page(bio, page, len, offset, &same_page)) {
+		if (WARN_ON_ONCE(bio_full(bio, len)))
+			return -EINVAL;
+		__bio_add_page(bio, page, len, offset);
+		return 0;
+	}
+
+	if (same_page)
+		put_page(page);
+	return 0;
+}
+
+static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
+		unsigned int len, unsigned int offset)
+{
+	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
+	bool same_page = false;
+
+	if (bio_add_hw_page(q, bio, page, len, offset,
+			queue_max_zone_append_sectors(q), &same_page) != len)
+		return -EINVAL;
+	if (same_page)
+		put_page(page);
+	return 0;
+}
+
 #define PAGE_PTRS_PER_BVEC     (sizeof(struct bio_vec) / sizeof(struct page *))
 
 /**
@@ -1176,7 +1207,6 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
 	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
 	struct page **pages = (struct page **)bv;
-	bool same_page = false;
 	ssize_t size, left;
 	unsigned len, i;
 	size_t offset;
@@ -1185,7 +1215,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	 * Move page array up in the allocated memory for the bio vecs as far as
 	 * possible so that we can start filling biovecs from the beginning
 	 * without overwriting the temporary page array.
-	*/
+	 */
 	BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
 	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
 
@@ -1195,18 +1225,18 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 
 	for (left = size, i = 0; left > 0; left -= len, i++) {
 		struct page *page = pages[i];
+		int ret;
 
 		len = min_t(size_t, PAGE_SIZE - offset, left);
+		if (bio_op(bio) == REQ_OP_ZONE_APPEND)
+			ret = bio_iov_add_zone_append_page(bio, page, len,
+					offset);
+		else
+			ret = bio_iov_add_page(bio, page, len, offset);
 
-		if (__bio_try_merge_page(bio, page, len, offset, &same_page)) {
-			if (same_page)
-				put_page(page);
-		} else {
-			if (WARN_ON_ONCE(bio_full(bio, len))) {
-				bio_put_pages(pages + i, left, offset);
-				return -EINVAL;
-			}
-			__bio_add_page(bio, page, len, offset);
+		if (ret) {
+			bio_put_pages(pages + i, left, offset);
+			return ret;
 		}
 		offset = 0;
 	}
@@ -1215,51 +1245,6 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	return 0;
 }
 
-static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
-{
-	unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
-	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
-	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
-	unsigned int max_append_sectors = queue_max_zone_append_sectors(q);
-	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
-	struct page **pages = (struct page **)bv;
-	ssize_t size, left;
-	unsigned len, i;
-	size_t offset;
-	int ret = 0;
-
-	/*
-	 * Move page array up in the allocated memory for the bio vecs as far as
-	 * possible so that we can start filling biovecs from the beginning
-	 * without overwriting the temporary page array.
-	 */
-	BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
-	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
-
-	size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset);
-	if (unlikely(size <= 0))
-		return size ? size : -EFAULT;
-
-	for (left = size, i = 0; left > 0; left -= len, i++) {
-		struct page *page = pages[i];
-		bool same_page = false;
-
-		len = min_t(size_t, PAGE_SIZE - offset, left);
-		if (bio_add_hw_page(q, bio, page, len, offset,
-				max_append_sectors, &same_page) != len) {
-			bio_put_pages(pages + i, left, offset);
-			ret = -EINVAL;
-			break;
-		}
-		if (same_page)
-			put_page(page);
-		offset = 0;
-	}
-
-	iov_iter_advance(iter, size - left);
-	return ret;
-}
-
 /**
  * bio_iov_iter_get_pages - add user or kernel pages to a bio
  * @bio: bio to add pages to
@@ -1294,10 +1279,7 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	}
 
 	do {
-		if (bio_op(bio) == REQ_OP_ZONE_APPEND)
-			ret = __bio_iov_append_get_pages(bio, iter);
-		else
-			ret = __bio_iov_iter_get_pages(bio, iter);
+		ret = __bio_iov_iter_get_pages(bio, iter);
 	} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
 
 	/* don't account direct I/O as memory stall */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCHv5 03/11] block: export dma_alignment attribute
  2022-05-31 19:11 [PATCHv5 00/11] direct-io dma alignment Keith Busch
  2022-05-31 19:11 ` [PATCHv5 01/11] block: fix infinite loop for invalid zone append Keith Busch
  2022-05-31 19:11 ` [PATCHv5 02/11] block/bio: remove duplicate append pages code Keith Busch
@ 2022-05-31 19:11 ` Keith Busch
  2022-05-31 19:11 ` [PATCHv5 04/11] block: introduce bdev_dma_alignment helper Keith Busch
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Keith Busch @ 2022-05-31 19:11 UTC (permalink / raw)
  To: linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch, Johannes Thumshirn

From: Keith Busch <kbusch@kernel.org>

User space may want to know how to align their buffers to avoid
bouncing. Export the queue attribute.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
v4->v5:

  Fixed alphabetical documentation order (Eric)

 Documentation/ABI/stable/sysfs-block | 9 +++++++++
 block/blk-sysfs.c                    | 7 +++++++
 2 files changed, 16 insertions(+)

diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
index e8797cd09aff..cd14ecb3c9a5 100644
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@@ -260,6 +260,15 @@ Description:
 		for discards, and don't read this file.
 
 
+What:		/sys/block/<disk>/queue/dma_alignment
+Date:		May 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		Reports the alignment that user space addresses must have to be
+		used for raw block device access with O_DIRECT and other driver
+		specific passthrough mechanisms.
+
+
 What:		/sys/block/<disk>/queue/fua
 Date:		May 2018
 Contact:	linux-block@vger.kernel.org
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 88bd41d4cb59..14607565d781 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -274,6 +274,11 @@ static ssize_t queue_virt_boundary_mask_show(struct request_queue *q, char *page
 	return queue_var_show(q->limits.virt_boundary_mask, page);
 }
 
+static ssize_t queue_dma_alignment_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(queue_dma_alignment(q), page);
+}
+
 #define QUEUE_SYSFS_BIT_FNS(name, flag, neg)				\
 static ssize_t								\
 queue_##name##_show(struct request_queue *q, char *page)		\
@@ -606,6 +611,7 @@ QUEUE_RO_ENTRY(queue_dax, "dax");
 QUEUE_RW_ENTRY(queue_io_timeout, "io_timeout");
 QUEUE_RW_ENTRY(queue_wb_lat, "wbt_lat_usec");
 QUEUE_RO_ENTRY(queue_virt_boundary_mask, "virt_boundary_mask");
+QUEUE_RO_ENTRY(queue_dma_alignment, "dma_alignment");
 
 #ifdef CONFIG_BLK_DEV_THROTTLING_LOW
 QUEUE_RW_ENTRY(blk_throtl_sample_time, "throttle_sample_time");
@@ -667,6 +673,7 @@ static struct attribute *queue_attrs[] = {
 	&blk_throtl_sample_time_entry.attr,
 #endif
 	&queue_virt_boundary_mask_entry.attr,
+	&queue_dma_alignment_entry.attr,
 	NULL,
 };
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCHv5 04/11] block: introduce bdev_dma_alignment helper
  2022-05-31 19:11 [PATCHv5 00/11] direct-io dma alignment Keith Busch
                   ` (2 preceding siblings ...)
  2022-05-31 19:11 ` [PATCHv5 03/11] block: export dma_alignment attribute Keith Busch
@ 2022-05-31 19:11 ` Keith Busch
  2022-05-31 19:11 ` [PATCHv5 05/11] block: add a helper function for dio alignment Keith Busch
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Keith Busch @ 2022-05-31 19:11 UTC (permalink / raw)
  To: linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch, Johannes Thumshirn

From: Keith Busch <kbusch@kernel.org>

Preparing for upcoming dma_alignment users that have a block_device, but
don't need the request_queue.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/blkdev.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5bdf2ac9142c..834b981ef01b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1365,6 +1365,11 @@ static inline int queue_dma_alignment(const struct request_queue *q)
 	return q ? q->dma_alignment : 511;
 }
 
+static inline unsigned int bdev_dma_alignment(struct block_device *bdev)
+{
+	return queue_dma_alignment(bdev_get_queue(bdev));
+}
+
 static inline int blk_rq_aligned(struct request_queue *q, unsigned long addr,
 				 unsigned int len)
 {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCHv5 05/11] block: add a helper function for dio alignment
  2022-05-31 19:11 [PATCHv5 00/11] direct-io dma alignment Keith Busch
                   ` (3 preceding siblings ...)
  2022-05-31 19:11 ` [PATCHv5 04/11] block: introduce bdev_dma_alignment helper Keith Busch
@ 2022-05-31 19:11 ` Keith Busch
  2022-06-01  5:29   ` Christoph Hellwig
  2022-06-01  8:04   ` Johannes Thumshirn
  2022-05-31 19:11 ` [PATCHv5 06/11] block/merge: count bytes instead of sectors Keith Busch
                   ` (6 subsequent siblings)
  11 siblings, 2 replies; 25+ messages in thread
From: Keith Busch @ 2022-05-31 19:11 UTC (permalink / raw)
  To: linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch

From: Keith Busch <kbusch@kernel.org>

This will make it easier to add more complex acceptable alignment
criteria in the future.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
v4->v5:

  Return a bool (Damien)

 block/fops.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index b9b83030e0df..5aec9a130812 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -42,6 +42,13 @@ static unsigned int dio_bio_write_op(struct kiocb *iocb)
 	return op;
 }
 
+static bool blkdev_dio_unaligned(struct block_device *bdev, loff_t pos,
+			      struct iov_iter *iter)
+{
+	return ((pos | iov_iter_alignment(iter)) &
+	    (bdev_logical_block_size(bdev) - 1));
+}
+
 #define DIO_INLINE_BIO_VECS 4
 
 static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
@@ -54,8 +61,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
 	struct bio bio;
 	ssize_t ret;
 
-	if ((pos | iov_iter_alignment(iter)) &
-	    (bdev_logical_block_size(bdev) - 1))
+	if (blkdev_dio_unaligned(bdev, pos, iter))
 		return -EINVAL;
 
 	if (nr_pages <= DIO_INLINE_BIO_VECS)
@@ -173,8 +179,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 	loff_t pos = iocb->ki_pos;
 	int ret = 0;
 
-	if ((pos | iov_iter_alignment(iter)) &
-	    (bdev_logical_block_size(bdev) - 1))
+	if (blkdev_dio_unaligned(bdev, pos, iter))
 		return -EINVAL;
 
 	if (iocb->ki_flags & IOCB_ALLOC_CACHE)
@@ -298,8 +303,7 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
 	loff_t pos = iocb->ki_pos;
 	int ret = 0;
 
-	if ((pos | iov_iter_alignment(iter)) &
-	    (bdev_logical_block_size(bdev) - 1))
+	if (blkdev_dio_unaligned(bdev, pos, iter))
 		return -EINVAL;
 
 	if (iocb->ki_flags & IOCB_ALLOC_CACHE)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCHv5 06/11] block/merge: count bytes instead of sectors
  2022-05-31 19:11 [PATCHv5 00/11] direct-io dma alignment Keith Busch
                   ` (4 preceding siblings ...)
  2022-05-31 19:11 ` [PATCHv5 05/11] block: add a helper function for dio alignment Keith Busch
@ 2022-05-31 19:11 ` Keith Busch
  2022-06-01  8:05   ` Johannes Thumshirn
  2022-05-31 19:11 ` [PATCHv5 08/11] iov: introduce iov_iter_aligned Keith Busch
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Keith Busch @ 2022-05-31 19:11 UTC (permalink / raw)
  To: linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch

From: Keith Busch <kbusch@kernel.org>

Individual bv_len's may not be a sector size.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
v4->v5:

  Updated comment (Christoph)

 block/blk-merge.c | 41 ++++++++++++++++++++++++-----------------
 1 file changed, 24 insertions(+), 17 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 7771dacc99cb..21fb4d14ca3e 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -201,11 +201,11 @@ static inline unsigned get_max_segment_size(const struct request_queue *q,
  * @nsegs:    [in,out] Number of segments in the bio being built. Incremented
  *            by the number of segments from @bv that may be appended to that
  *            bio without exceeding @max_segs
- * @sectors:  [in,out] Number of sectors in the bio being built. Incremented
- *            by the number of sectors from @bv that may be appended to that
- *            bio without exceeding @max_sectors
+ * @bytes:    [in,out] Number of bytes in the bio being built. Incremented
+ *            by the number of bytes from @bv that may be appended to that
+ *            bio without exceeding @max_bytes
  * @max_segs: [in] upper bound for *@nsegs
- * @max_sectors: [in] upper bound for *@sectors
+ * @max_bytes: [in] upper bound for *@bytes
  *
  * When splitting a bio, it can happen that a bvec is encountered that is too
  * big to fit in a single segment and hence that it has to be split in the
@@ -216,10 +216,10 @@ static inline unsigned get_max_segment_size(const struct request_queue *q,
  */
 static bool bvec_split_segs(const struct request_queue *q,
 			    const struct bio_vec *bv, unsigned *nsegs,
-			    unsigned *sectors, unsigned max_segs,
-			    unsigned max_sectors)
+			    unsigned *bytes, unsigned max_segs,
+			    unsigned max_bytes)
 {
-	unsigned max_len = (min(max_sectors, UINT_MAX >> 9) - *sectors) << 9;
+	unsigned max_len = min(max_bytes, UINT_MAX) - *bytes;
 	unsigned len = min(bv->bv_len, max_len);
 	unsigned total_len = 0;
 	unsigned seg_size = 0;
@@ -237,7 +237,7 @@ static bool bvec_split_segs(const struct request_queue *q,
 			break;
 	}
 
-	*sectors += total_len >> 9;
+	*bytes += total_len;
 
 	/* tell the caller to split the bvec if it is too big to fit */
 	return len > 0 || bv->bv_len > max_len;
@@ -269,8 +269,8 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 {
 	struct bio_vec bv, bvprv, *bvprvp = NULL;
 	struct bvec_iter iter;
-	unsigned nsegs = 0, sectors = 0;
-	const unsigned max_sectors = get_max_io_size(q, bio);
+	unsigned nsegs = 0, bytes = 0;
+	const unsigned max_bytes = get_max_io_size(q, bio) << 9;
 	const unsigned max_segs = queue_max_segments(q);
 
 	bio_for_each_bvec(bv, bio, iter) {
@@ -282,12 +282,12 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 			goto split;
 
 		if (nsegs < max_segs &&
-		    sectors + (bv.bv_len >> 9) <= max_sectors &&
+		    bytes + bv.bv_len <= max_bytes &&
 		    bv.bv_offset + bv.bv_len <= PAGE_SIZE) {
 			nsegs++;
-			sectors += bv.bv_len >> 9;
-		} else if (bvec_split_segs(q, &bv, &nsegs, &sectors, max_segs,
-					 max_sectors)) {
+			bytes += bv.bv_len;
+		} else if (bvec_split_segs(q, &bv, &nsegs, &bytes, max_segs,
+					   max_bytes)) {
 			goto split;
 		}
 
@@ -300,13 +300,20 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 split:
 	*segs = nsegs;
 
+	/*
+	 * Individual bvecs may not be logical block aligned. Round down the
+	 * split size so that each bio is properly sector size aligned, even if
+	 * we do not use the full hardware limits.
+	 */
+	bytes = ALIGN_DOWN(bytes, queue_logical_block_size(q));
+
 	/*
 	 * Bio splitting may cause subtle trouble such as hang when doing sync
 	 * iopoll in direct IO routine. Given performance gain of iopoll for
 	 * big IO can be trival, disable iopoll when split needed.
 	 */
 	bio_clear_polled(bio);
-	return bio_split(bio, sectors, GFP_NOIO, bs);
+	return bio_split(bio, bytes >> SECTOR_SHIFT, GFP_NOIO, bs);
 }
 
 /**
@@ -375,7 +382,7 @@ EXPORT_SYMBOL(blk_queue_split);
 unsigned int blk_recalc_rq_segments(struct request *rq)
 {
 	unsigned int nr_phys_segs = 0;
-	unsigned int nr_sectors = 0;
+	unsigned int bytes = 0;
 	struct req_iterator iter;
 	struct bio_vec bv;
 
@@ -398,7 +405,7 @@ unsigned int blk_recalc_rq_segments(struct request *rq)
 	}
 
 	rq_for_each_bvec(bv, rq, iter)
-		bvec_split_segs(rq->q, &bv, &nr_phys_segs, &nr_sectors,
+		bvec_split_segs(rq->q, &bv, &nr_phys_segs, &bytes,
 				UINT_MAX, UINT_MAX);
 	return nr_phys_segs;
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCHv5 08/11] iov: introduce iov_iter_aligned
  2022-05-31 19:11 [PATCHv5 00/11] direct-io dma alignment Keith Busch
                   ` (5 preceding siblings ...)
  2022-05-31 19:11 ` [PATCHv5 06/11] block/merge: count bytes instead of sectors Keith Busch
@ 2022-05-31 19:11 ` Keith Busch
  2022-06-01  5:30   ` Christoph Hellwig
  2022-05-31 19:11 ` [PATCHv5 09/11] block: introduce bdev_iter_is_aligned helper Keith Busch
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Keith Busch @ 2022-05-31 19:11 UTC (permalink / raw)
  To: linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch, Alexander Viro

From: Keith Busch <kbusch@kernel.org>

The existing iov_iter_alignment() function returns the logical OR of
address and length. For cases where address and length need to be
considered separately, introduce a helper function that a caller can
specificy length and address masks that indicate if the iov is
unaligned.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 include/linux/uio.h |  2 +
 lib/iov_iter.c      | 92 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 94 insertions(+)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 739285fe5a2f..34ba4a731179 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -219,6 +219,8 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
 #endif
 
 size_t iov_iter_zero(size_t bytes, struct iov_iter *);
+bool iov_iter_is_aligned(const struct iov_iter *i, unsigned addr_mask,
+			unsigned len_mask);
 unsigned long iov_iter_alignment(const struct iov_iter *i);
 unsigned long iov_iter_gap_alignment(const struct iov_iter *i);
 void iov_iter_init(struct iov_iter *i, unsigned int direction, const struct iovec *iov,
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 6dd5330f7a99..a39b24496878 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1268,6 +1268,98 @@ void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count)
 }
 EXPORT_SYMBOL(iov_iter_discard);
 
+static bool iov_iter_aligned_iovec(const struct iov_iter *i, unsigned addr_mask,
+				   unsigned len_mask)
+{
+	size_t size = i->count;
+	size_t skip = i->iov_offset;
+	unsigned k;
+
+	for (k = 0; k < i->nr_segs; k++, skip = 0) {
+		size_t len = i->iov[k].iov_len - skip;
+
+		if (len > size)
+			len = size;
+		if (len & len_mask)
+			return false;
+		if ((unsigned long)(i->iov[k].iov_base + skip) & addr_mask)
+			return false;
+
+		size -= len;
+		if (!size)
+			break;
+	}
+	return true;
+}
+
+static bool iov_iter_aligned_bvec(const struct iov_iter *i, unsigned addr_mask,
+				  unsigned len_mask)
+{
+	size_t size = i->count;
+	unsigned skip = i->iov_offset;
+	unsigned k;
+
+	for (k = 0; k < i->nr_segs; k++, skip = 0) {
+		size_t len = i->bvec[k].bv_len - skip;
+
+		if (len > size)
+			len = size;
+		if (len & len_mask)
+			return false;
+		if ((unsigned long)(i->bvec[k].bv_offset + skip) & addr_mask)
+			return false;
+
+		size -= len;
+		if (!size)
+			break;
+	}
+	return true;
+}
+
+/**
+ * iov_iter_is_aligned() - Check if the addresses and lengths of each segments
+ * 	are aligned to the parameters.
+ *
+ * @i: &struct iov_iter to restore
+ * @addr_mask: bit mask to check against the iov element's addresses
+ * @len_mask: bit mask to check against the iov element's lengths
+ *
+ * Return: false if any addresses or lengths intersect with the provided masks
+ */
+bool iov_iter_is_aligned(const struct iov_iter *i, unsigned addr_mask,
+			 unsigned len_mask)
+{
+	if (likely(iter_is_iovec(i) || iov_iter_is_kvec(i)))
+		return iov_iter_aligned_iovec(i, addr_mask, len_mask);
+
+	if (iov_iter_is_bvec(i))
+		return iov_iter_aligned_bvec(i, addr_mask, len_mask);
+
+	if (iov_iter_is_pipe(i)) {
+		unsigned int p_mask = i->pipe->ring_size - 1;
+		size_t size = i->count;
+
+		if (size & len_mask)
+			return false;
+		if (size && allocated(&i->pipe->bufs[i->head & p_mask])) {
+			if (i->iov_offset & addr_mask)
+				return false;
+		}
+
+		return true;
+	}
+
+	if (iov_iter_is_xarray(i)) {
+		if (i->count & len_mask)
+			return false;
+		if ((i->xarray_start + i->iov_offset) & addr_mask)
+			return false;
+	}
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(iov_iter_is_aligned);
+
 static unsigned long iov_iter_alignment_iovec(const struct iov_iter *i)
 {
 	unsigned long res = 0;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCHv5 09/11] block: introduce bdev_iter_is_aligned helper
  2022-05-31 19:11 [PATCHv5 00/11] direct-io dma alignment Keith Busch
                   ` (6 preceding siblings ...)
  2022-05-31 19:11 ` [PATCHv5 08/11] iov: introduce iov_iter_aligned Keith Busch
@ 2022-05-31 19:11 ` Keith Busch
  2022-05-31 21:46   ` Eric Biggers
  2022-06-01  5:31   ` Christoph Hellwig
  2022-05-31 19:11 ` [PATCHv5 10/11] block: relax direct io memory alignment Keith Busch
                   ` (3 subsequent siblings)
  11 siblings, 2 replies; 25+ messages in thread
From: Keith Busch @ 2022-05-31 19:11 UTC (permalink / raw)
  To: linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch

From: Keith Busch <kbusch@kernel.org>

Provide a convenient function for this repeatable coding pattern.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 include/linux/blkdev.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 834b981ef01b..583cdeb8895d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1370,6 +1370,13 @@ static inline unsigned int bdev_dma_alignment(struct block_device *bdev)
 	return queue_dma_alignment(bdev_get_queue(bdev));
 }
 
+static inline bool bvev_iter_is_aligned(struct block_device *bdev,
+					struct iov_iter *iter)
+{
+	return iov_iter_is_aligned(iter, bdev_dma_alignment(bdev),
+				   bdev_logical_block_size(bdev) - 1);
+}
+
 static inline int blk_rq_aligned(struct request_queue *q, unsigned long addr,
 				 unsigned int len)
 {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCHv5 10/11] block: relax direct io memory alignment
  2022-05-31 19:11 [PATCHv5 00/11] direct-io dma alignment Keith Busch
                   ` (7 preceding siblings ...)
  2022-05-31 19:11 ` [PATCHv5 09/11] block: introduce bdev_iter_is_aligned helper Keith Busch
@ 2022-05-31 19:11 ` Keith Busch
  2022-06-01  5:31   ` Christoph Hellwig
  2022-05-31 19:11 ` [PATCHv5 11/11] fs: add support for dma aligned direct-io Keith Busch
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Keith Busch @ 2022-05-31 19:11 UTC (permalink / raw)
  To: linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch

From: Keith Busch <kbusch@kernel.org>

Use the address alignment requirements from the block_device for direct
io instead of requiring addresses be aligned to the block size. User
space can discover the alignment requirements from the dma_alignment
queue attribute.

User space can specify any hardware compatible DMA offset for each
segment, but every segment length is still required to be a multiple of
the block size.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 block/bio.c  | 9 +++++++++
 block/fops.c | 4 ++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 55d2a9c4e312..44658aa57784 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1219,7 +1219,16 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
 	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
 
+	/*
+	 * Each segment in the iov is required to be a block size multiple.
+	 * However, we may not be able to get the entire segment if it spans
+	 * more pages than bi_max_vecs allows, so we have to ALIGN_DOWN the
+	 * result to ensure the bio's total size is correct. The remainder of
+	 * the iov data will be picked up in the next bio iteration.
+	 */
 	size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset);
+	if (size > 0)
+		size = ALIGN_DOWN(size, bdev_logical_block_size(bio->bi_bdev));
 	if (unlikely(size <= 0))
 		return size ? size : -EFAULT;
 
diff --git a/block/fops.c b/block/fops.c
index 5aec9a130812..7a02b75009bb 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -45,8 +45,8 @@ static unsigned int dio_bio_write_op(struct kiocb *iocb)
 static bool blkdev_dio_unaligned(struct block_device *bdev, loff_t pos,
 			      struct iov_iter *iter)
 {
-	return ((pos | iov_iter_alignment(iter)) &
-	    (bdev_logical_block_size(bdev) - 1));
+	return pos & (bdev_logical_block_size(bdev) - 1) ||
+		!bvev_iter_is_aligned(bdev, iter);
 }
 
 #define DIO_INLINE_BIO_VECS 4
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCHv5 11/11] fs: add support for dma aligned direct-io
  2022-05-31 19:11 [PATCHv5 00/11] direct-io dma alignment Keith Busch
                   ` (8 preceding siblings ...)
  2022-05-31 19:11 ` [PATCHv5 10/11] block: relax direct io memory alignment Keith Busch
@ 2022-05-31 19:11 ` Keith Busch
  2022-06-01  5:32   ` Christoph Hellwig
       [not found] ` <20220531191137.2291467-8-kbusch@fb.com>
  2022-06-01  7:11 ` [PATCHv5 00/11] direct-io dma alignment Eric Biggers
  11 siblings, 1 reply; 25+ messages in thread
From: Keith Busch @ 2022-05-31 19:11 UTC (permalink / raw)
  To: linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch

From: Keith Busch <kbusch@kernel.org>

Use the address alignment requirements from the block_device for direct
io instead of requiring addresses be aligned to the block size.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 fs/iomap/direct-io.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 80f9b047aa1b..dd78549d9fd6 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -233,7 +233,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 	struct inode *inode = iter->inode;
 	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
 	unsigned int fs_block_size = i_blocksize(inode), pad;
-	unsigned int align = iov_iter_alignment(dio->submit.iter);
 	loff_t length = iomap_length(iter);
 	loff_t pos = iter->pos;
 	unsigned int bio_opf;
@@ -244,7 +243,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 	size_t copied = 0;
 	size_t orig_count;
 
-	if ((pos | length | align) & ((1 << blkbits) - 1))
+	if ((pos | length) & ((1 << blkbits) - 1) ||
+	    !bvev_iter_is_aligned(iomap->bdev, dio->submit.iter))
 		return -EINVAL;
 
 	if (iomap->type == IOMAP_UNWRITTEN) {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCHv5 09/11] block: introduce bdev_iter_is_aligned helper
  2022-05-31 19:11 ` [PATCHv5 09/11] block: introduce bdev_iter_is_aligned helper Keith Busch
@ 2022-05-31 21:46   ` Eric Biggers
  2022-06-01  5:31   ` Christoph Hellwig
  1 sibling, 0 replies; 25+ messages in thread
From: Eric Biggers @ 2022-05-31 21:46 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-fsdevel, linux-block, linux-nvme, axboe, Kernel Team, hch,
	bvanassche, damien.lemoal, pankydev8, Keith Busch

On Tue, May 31, 2022 at 12:11:35PM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Provide a convenient function for this repeatable coding pattern.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  include/linux/blkdev.h | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 834b981ef01b..583cdeb8895d 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1370,6 +1370,13 @@ static inline unsigned int bdev_dma_alignment(struct block_device *bdev)
>  	return queue_dma_alignment(bdev_get_queue(bdev));
>  }
>  
> +static inline bool bvev_iter_is_aligned(struct block_device *bdev,
> +					struct iov_iter *iter)
> +{
> +	return iov_iter_is_aligned(iter, bdev_dma_alignment(bdev),
> +				   bdev_logical_block_size(bdev) - 1);
> +}

"bdev", not "bvev".

- Eric

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv5 05/11] block: add a helper function for dio alignment
  2022-05-31 19:11 ` [PATCHv5 05/11] block: add a helper function for dio alignment Keith Busch
@ 2022-06-01  5:29   ` Christoph Hellwig
  2022-06-01  8:04   ` Johannes Thumshirn
  1 sibling, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-06-01  5:29 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-fsdevel, linux-block, linux-nvme, axboe, Kernel Team, hch,
	bvanassche, damien.lemoal, ebiggers, pankydev8, Keith Busch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv5 08/11] iov: introduce iov_iter_aligned
  2022-05-31 19:11 ` [PATCHv5 08/11] iov: introduce iov_iter_aligned Keith Busch
@ 2022-06-01  5:30   ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-06-01  5:30 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-fsdevel, linux-block, linux-nvme, axboe, Kernel Team, hch,
	bvanassche, damien.lemoal, ebiggers, pankydev8, Keith Busch,
	Alexander Viro

On Tue, May 31, 2022 at 12:11:34PM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> The existing iov_iter_alignment() function returns the logical OR of
> address and length. For cases where address and length need to be
> considered separately, introduce a helper function that a caller can
> specificy length and address masks that indicate if the iov is
> unaligned.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv5 09/11] block: introduce bdev_iter_is_aligned helper
  2022-05-31 19:11 ` [PATCHv5 09/11] block: introduce bdev_iter_is_aligned helper Keith Busch
  2022-05-31 21:46   ` Eric Biggers
@ 2022-06-01  5:31   ` Christoph Hellwig
  1 sibling, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-06-01  5:31 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-fsdevel, linux-block, linux-nvme, axboe, Kernel Team, hch,
	bvanassche, damien.lemoal, ebiggers, pankydev8, Keith Busch

Modulo the naming typo:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv5 10/11] block: relax direct io memory alignment
  2022-05-31 19:11 ` [PATCHv5 10/11] block: relax direct io memory alignment Keith Busch
@ 2022-06-01  5:31   ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-06-01  5:31 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-fsdevel, linux-block, linux-nvme, axboe, Kernel Team, hch,
	bvanassche, damien.lemoal, ebiggers, pankydev8, Keith Busch

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv5 11/11] fs: add support for dma aligned direct-io
  2022-05-31 19:11 ` [PATCHv5 11/11] fs: add support for dma aligned direct-io Keith Busch
@ 2022-06-01  5:32   ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-06-01  5:32 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-fsdevel, linux-block, linux-nvme, axboe, Kernel Team, hch,
	bvanassche, damien.lemoal, ebiggers, pankydev8, Keith Busch

Looks good with a s/fs/iomap/ in the subject:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv5 07/11] block/bounce: count bytes instead of sectors
       [not found] ` <20220531191137.2291467-8-kbusch@fb.com>
@ 2022-06-01  7:04   ` Eric Biggers
  0 siblings, 0 replies; 25+ messages in thread
From: Eric Biggers @ 2022-06-01  7:04 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-fsdevel, linux-block, linux-nvme, axboe, Kernel Team, hch,
	bvanassche, damien.lemoal, pankydev8, Keith Busch, Pankaj Raghav

On Tue, May 31, 2022 at 12:11:33PM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Individual bv_len's may not be a sector size.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
> ---
> v4->v5:
> 
>   Updated comment (Christoph)
> 
>  block/bounce.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/block/bounce.c b/block/bounce.c
> index 8f7b6fe3b4db..fbadf179601f 100644
> --- a/block/bounce.c
> +++ b/block/bounce.c
> @@ -205,19 +205,26 @@ void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig)
>  	int rw = bio_data_dir(*bio_orig);
>  	struct bio_vec *to, from;
>  	struct bvec_iter iter;
> -	unsigned i = 0;
> +	unsigned i = 0, bytes = 0;
>  	bool bounce = false;
> -	int sectors = 0;
> +	int sectors;
>  
>  	bio_for_each_segment(from, *bio_orig, iter) {
>  		if (i++ < BIO_MAX_VECS)
> -			sectors += from.bv_len >> 9;
> +			bytes += from.bv_len;
>  		if (PageHighMem(from.bv_page))
>  			bounce = true;
>  	}
>  	if (!bounce)
>  		return;
>  
> +	/*
> +	 * Individual bvecs may not be logical block aligned. Round down
> +	 * the split size so that each bio is properly sector size aligned,
> +	 * even if we do not use the full hardware limits.
> +	 */

Please write "might not" instead of "may not", since "may not" is ambiguous; it
sometimes means "are not allowed to".  Likewise in other patches.

"Sector size" is ambiguous as well.  I think you mean "logical block size"?

- Eric

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv5 00/11] direct-io dma alignment
  2022-05-31 19:11 [PATCHv5 00/11] direct-io dma alignment Keith Busch
                   ` (10 preceding siblings ...)
       [not found] ` <20220531191137.2291467-8-kbusch@fb.com>
@ 2022-06-01  7:11 ` Eric Biggers
  2022-06-01 14:28   ` Keith Busch
  11 siblings, 1 reply; 25+ messages in thread
From: Eric Biggers @ 2022-06-01  7:11 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-fsdevel, linux-block, linux-nvme, axboe, Kernel Team, hch,
	bvanassche, damien.lemoal, pankydev8, Keith Busch

On Tue, May 31, 2022 at 12:11:26PM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> The most significant change from v4 is the alignment is now checked
> prior to building the bio. This gets the expected EINVAL error for
> misaligned userspace iovecs in all cases now (Eric Biggers).
> 
> I've removed the legacy fs change, so only iomap filesystems get to use
> this alignement capability (Christoph Hellwig).
> 
> The block fops check for alignment returns a bool now (Damien).
> 
> Adjusted some comments, docs, and other minor style issues.
> 
> Reviews added for unchanged or trivially changed patches, removed
> reviews for ones that changed more significantly.
> 
> As before, I tested using 'fio' with forced misaligned user buffers on
> raw block, xfs, and ext4 (example raw block profile below).
> 

I still don't think you've taken care of all the assumptions that bv_len is a
multiple of logical block size, or at least SECTOR_SIZE.  Try this:

	git grep -E 'bv_len (>>|/)'

Also:

	git grep '<.*bv_len;'

Also take a look at bio_for_each_segment(), specifically how iter->bi_sector is
updated.

- Eric

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv5 01/11] block: fix infinite loop for invalid zone append
  2022-05-31 19:11 ` [PATCHv5 01/11] block: fix infinite loop for invalid zone append Keith Busch
@ 2022-06-01  8:03   ` Johannes Thumshirn
  0 siblings, 0 replies; 25+ messages in thread
From: Johannes Thumshirn @ 2022-06-01  8:03 UTC (permalink / raw)
  To: Keith Busch, linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv5 05/11] block: add a helper function for dio alignment
  2022-05-31 19:11 ` [PATCHv5 05/11] block: add a helper function for dio alignment Keith Busch
  2022-06-01  5:29   ` Christoph Hellwig
@ 2022-06-01  8:04   ` Johannes Thumshirn
  1 sibling, 0 replies; 25+ messages in thread
From: Johannes Thumshirn @ 2022-06-01  8:04 UTC (permalink / raw)
  To: Keith Busch, linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv5 06/11] block/merge: count bytes instead of sectors
  2022-05-31 19:11 ` [PATCHv5 06/11] block/merge: count bytes instead of sectors Keith Busch
@ 2022-06-01  8:05   ` Johannes Thumshirn
  0 siblings, 0 replies; 25+ messages in thread
From: Johannes Thumshirn @ 2022-06-01  8:05 UTC (permalink / raw)
  To: Keith Busch, linux-fsdevel, linux-block, linux-nvme
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv5 00/11] direct-io dma alignment
  2022-06-01  7:11 ` [PATCHv5 00/11] direct-io dma alignment Eric Biggers
@ 2022-06-01 14:28   ` Keith Busch
  2022-06-01 16:12     ` Keith Busch
  0 siblings, 1 reply; 25+ messages in thread
From: Keith Busch @ 2022-06-01 14:28 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Keith Busch, linux-fsdevel, linux-block, linux-nvme, axboe,
	Kernel Team, hch, bvanassche, damien.lemoal, pankydev8

On Wed, Jun 01, 2022 at 12:11:40AM -0700, Eric Biggers wrote:
> On Tue, May 31, 2022 at 12:11:26PM -0700, Keith Busch wrote:
> > From: Keith Busch <kbusch@kernel.org>
> > 
> > The most significant change from v4 is the alignment is now checked
> > prior to building the bio. This gets the expected EINVAL error for
> > misaligned userspace iovecs in all cases now (Eric Biggers).
> > 
> > I've removed the legacy fs change, so only iomap filesystems get to use
> > this alignement capability (Christoph Hellwig).
> > 
> > The block fops check for alignment returns a bool now (Damien).
> > 
> > Adjusted some comments, docs, and other minor style issues.
> > 
> > Reviews added for unchanged or trivially changed patches, removed
> > reviews for ones that changed more significantly.
> > 
> > As before, I tested using 'fio' with forced misaligned user buffers on
> > raw block, xfs, and ext4 (example raw block profile below).
> > 
> 
> I still don't think you've taken care of all the assumptions that bv_len is a
> multiple of logical block size, or at least SECTOR_SIZE.  Try this:
> 
> 	git grep -E 'bv_len (>>|/)'

There are only 8 drivers that set the request_queue's dma alignment, which are
the only ones that could be affected from this patch series. The drivers
returned from the above don't set dma alignment, so they're fine to assume
those lengths.

I don't think the above query captures enough since it misses things like
nfhd_submit_bio() that shifts 9 on the following line. Not that that example
matters either for the same reason.
 
> Also:
> 
> 	git grep '<.*bv_len;'
>
> Also take a look at bio_for_each_segment(), specifically how iter->bi_sector is
> updated.

I'm not finding any driver user of this macro that's set the dma alignment
where this would break. They either never set it, or they're stacking drivers
that always get the safe default.

Outside drivers, blk-integrity doesn't operate on sector lengths, so that's
fine, and blk-crypto would prevent such unalignment much earlier. And btrfs
bounces this type of direct IO, so that's also fine.

Even if we assume all the existing users are safe, I suppose we could say this
type of assumption is potentially fragile. For example, I think drivers like
pmem or null_blk could readily reduce their queue's dma alignment limit, but
that may break their current usage.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv5 00/11] direct-io dma alignment
  2022-06-01 14:28   ` Keith Busch
@ 2022-06-01 16:12     ` Keith Busch
  2022-06-06 16:24       ` Keith Busch
  0 siblings, 1 reply; 25+ messages in thread
From: Keith Busch @ 2022-06-01 16:12 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Keith Busch, linux-fsdevel, linux-block, linux-nvme, axboe,
	Kernel Team, hch, bvanassche, damien.lemoal, pankydev8

On Wed, Jun 01, 2022 at 08:28:31AM -0600, Keith Busch wrote:
> On Wed, Jun 01, 2022 at 12:11:40AM -0700, Eric Biggers wrote:
> > I still don't think you've taken care of all the assumptions that bv_len is a
> > multiple of logical block size, or at least SECTOR_SIZE.  Try this:
> > 
> > 	git grep -E 'bv_len (>>|/)'
> 
> There are only 8 drivers that set the request_queue's dma alignment, which are
> the only ones that could be affected from this patch series.

It's actually even simpler to audit than that. Of the 8 drivers that explicitly
set dma alignment, only 3 set it to something smaller than a sector size. None
of them assume any particular bv_len, so I think we're fine.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv5 00/11] direct-io dma alignment
  2022-06-01 16:12     ` Keith Busch
@ 2022-06-06 16:24       ` Keith Busch
  0 siblings, 0 replies; 25+ messages in thread
From: Keith Busch @ 2022-06-06 16:24 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Keith Busch, linux-fsdevel, linux-block, linux-nvme, axboe,
	Kernel Team, hch, bvanassche, damien.lemoal, pankydev8

On Wed, Jun 01, 2022 at 10:12:26AM -0600, Keith Busch wrote:
> On Wed, Jun 01, 2022 at 08:28:31AM -0600, Keith Busch wrote:
> > On Wed, Jun 01, 2022 at 12:11:40AM -0700, Eric Biggers wrote:
> > > I still don't think you've taken care of all the assumptions that bv_len is a
> > > multiple of logical block size, or at least SECTOR_SIZE.  Try this:
> > > 
> > > 	git grep -E 'bv_len (>>|/)'
> > 
> > There are only 8 drivers that set the request_queue's dma alignment, which are
> > the only ones that could be affected from this patch series.
> 
> It's actually even simpler to audit than that. Of the 8 drivers that explicitly
> set dma alignment, only 3 set it to something smaller than a sector size. None
> of them assume any particular bv_len, so I think we're fine.

Eric,

Do you have any more concerns on this area? I'm reasonably confident this is
safe with all the existing users, and I have a new series ready to go with the
trivial fix-ups from the last round. I don't want to post it, though, if you
think I've missed something.

Thanks,
Keith

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2022-06-06 16:24 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-31 19:11 [PATCHv5 00/11] direct-io dma alignment Keith Busch
2022-05-31 19:11 ` [PATCHv5 01/11] block: fix infinite loop for invalid zone append Keith Busch
2022-06-01  8:03   ` Johannes Thumshirn
2022-05-31 19:11 ` [PATCHv5 02/11] block/bio: remove duplicate append pages code Keith Busch
2022-05-31 19:11 ` [PATCHv5 03/11] block: export dma_alignment attribute Keith Busch
2022-05-31 19:11 ` [PATCHv5 04/11] block: introduce bdev_dma_alignment helper Keith Busch
2022-05-31 19:11 ` [PATCHv5 05/11] block: add a helper function for dio alignment Keith Busch
2022-06-01  5:29   ` Christoph Hellwig
2022-06-01  8:04   ` Johannes Thumshirn
2022-05-31 19:11 ` [PATCHv5 06/11] block/merge: count bytes instead of sectors Keith Busch
2022-06-01  8:05   ` Johannes Thumshirn
2022-05-31 19:11 ` [PATCHv5 08/11] iov: introduce iov_iter_aligned Keith Busch
2022-06-01  5:30   ` Christoph Hellwig
2022-05-31 19:11 ` [PATCHv5 09/11] block: introduce bdev_iter_is_aligned helper Keith Busch
2022-05-31 21:46   ` Eric Biggers
2022-06-01  5:31   ` Christoph Hellwig
2022-05-31 19:11 ` [PATCHv5 10/11] block: relax direct io memory alignment Keith Busch
2022-06-01  5:31   ` Christoph Hellwig
2022-05-31 19:11 ` [PATCHv5 11/11] fs: add support for dma aligned direct-io Keith Busch
2022-06-01  5:32   ` Christoph Hellwig
     [not found] ` <20220531191137.2291467-8-kbusch@fb.com>
2022-06-01  7:04   ` [PATCHv5 07/11] block/bounce: count bytes instead of sectors Eric Biggers
2022-06-01  7:11 ` [PATCHv5 00/11] direct-io dma alignment Eric Biggers
2022-06-01 14:28   ` Keith Busch
2022-06-01 16:12     ` Keith Busch
2022-06-06 16:24       ` Keith Busch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).