All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHv4 0/9] direct io dma alignment
@ 2022-05-26  1:06 Keith Busch
  2022-05-26  1:06 ` [PATCHv4 1/9] block: fix infiniate loop for invalid zone append Keith Busch
                   ` (8 more replies)
  0 siblings, 9 replies; 29+ messages in thread
From: Keith Busch @ 2022-05-26  1:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-block
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch

From: Keith Busch <kbusch@kernel.org>

This is mostly the same as v3, but with more code comments, prep
patches, replace '9' with 'SECTOR_SHIFT', and added reviews.

The biggest thing is I brought fs support back in the last patch.

For testing, I used 'fio', which has an 'iomem_align' parameter that can
force arbitrary memory offsets, even with direct-io. I created two data
integrity verifying profiles: one for raw block, the other for
filesystems, and ran each of them on two nvme namespaces. One namespace
was formatted 512b, the other was 4k.

The profile has different jobs: one smaller transfers, and one larger.
This to exercise both the simple and normal direct io cases, as well as
bio_split() conditions.

For filesystems testing, I used xfs, ext4 and btrfs to represent iomap,
and just ext2 for the older direct io. Note, btrfs falls back to
buffered anyway (see check_direct_IO()), so btrfs was essentially not
testing the direct io path.

Here is an example of one of the profiles for the raw block test:

  [global]
  filename=/dev/nvme0n1
  ioengine=io_uring
  verify=crc32c
  rw=randwrite
  iodepth=64
  direct=1

  [small]
  stonewall
  bsrange=4k-64k
  iomem_align=4

  [large]
  stonewall
  bsrange=512k-4M
  iomem_align=100

Keith Busch (9):
  block: fix infiniate loop for invalid zone append
  block/bio: remove duplicate append pages code
  block: export dma_alignment attribute
  block: introduce bdev_dma_alignment helper
  block: add a helper function for dio alignment
  block/merge: count bytes instead of sectors
  block/bounce: count bytes instead of sectors
  block: relax direct io memory alignment
  fs: add support for dma aligned direct-io

 Documentation/ABI/stable/sysfs-block |   9 +++
 block/bio.c                          | 117 +++++++++++++--------------
 block/blk-merge.c                    |  41 ++++++----
 block/blk-sysfs.c                    |   7 ++
 block/bounce.c                       |  12 ++-
 block/fops.c                         |  40 ++++++---
 fs/direct-io.c                       |  11 ++-
 fs/iomap/direct-io.c                 |   3 +-
 include/linux/blkdev.h               |   5 ++
 9 files changed, 146 insertions(+), 99 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCHv4 1/9] block: fix infiniate loop for invalid zone append
  2022-05-26  1:06 [PATCHv4 0/9] direct io dma alignment Keith Busch
@ 2022-05-26  1:06 ` Keith Busch
  2022-05-26  1:48   ` Damien Le Moal
  2022-05-31  6:19   ` Christoph Hellwig
  2022-05-26  1:06 ` [PATCHv4 2/9] block/bio: remove duplicate append pages code Keith Busch
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 29+ messages in thread
From: Keith Busch @ 2022-05-26  1:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-block
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch

From: Keith Busch <kbusch@kernel.org>

Returning 0 early from __bio_iov_append_get_pages() for the
max_append_sectors warning just creates an infinite loop since 0 means
success, and the bio will never fill from the unadvancing iov_iter. We
could turn the return into an error value, but it will already be turned
into an error value later on, so just remove the warning. Clearly no one
ever hit it anyway.

Fixes: 0512a75b98f84 ("block: Introduce REQ_OP_ZONE_APPEND")
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 block/bio.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index a3893d80dccc..e249f6414fd5 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1228,9 +1228,6 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
 	size_t offset;
 	int ret = 0;
 
-	if (WARN_ON_ONCE(!max_append_sectors))
-		return 0;
-
 	/*
 	 * Move page array up in the allocated memory for the bio vecs as far as
 	 * possible so that we can start filling biovecs from the beginning
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCHv4 2/9] block/bio: remove duplicate append pages code
  2022-05-26  1:06 [PATCHv4 0/9] direct io dma alignment Keith Busch
  2022-05-26  1:06 ` [PATCHv4 1/9] block: fix infiniate loop for invalid zone append Keith Busch
@ 2022-05-26  1:06 ` Keith Busch
  2022-05-26  1:06 ` [PATCHv4 3/9] block: export dma_alignment attribute Keith Busch
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 29+ messages in thread
From: Keith Busch @ 2022-05-26  1:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-block
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch, Johannes Thumshirn

From: Keith Busch <kbusch@kernel.org>

The getting pages setup for zone append and normal IO are identical. Use
common code for each.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 block/bio.c | 102 ++++++++++++++++++++++------------------------------
 1 file changed, 42 insertions(+), 60 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index e249f6414fd5..55d2a9c4e312 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1158,6 +1158,37 @@ static void bio_put_pages(struct page **pages, size_t size, size_t off)
 		put_page(pages[i]);
 }
 
+static int bio_iov_add_page(struct bio *bio, struct page *page,
+		unsigned int len, unsigned int offset)
+{
+	bool same_page = false;
+
+	if (!__bio_try_merge_page(bio, page, len, offset, &same_page)) {
+		if (WARN_ON_ONCE(bio_full(bio, len)))
+			return -EINVAL;
+		__bio_add_page(bio, page, len, offset);
+		return 0;
+	}
+
+	if (same_page)
+		put_page(page);
+	return 0;
+}
+
+static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
+		unsigned int len, unsigned int offset)
+{
+	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
+	bool same_page = false;
+
+	if (bio_add_hw_page(q, bio, page, len, offset,
+			queue_max_zone_append_sectors(q), &same_page) != len)
+		return -EINVAL;
+	if (same_page)
+		put_page(page);
+	return 0;
+}
+
 #define PAGE_PTRS_PER_BVEC     (sizeof(struct bio_vec) / sizeof(struct page *))
 
 /**
@@ -1176,7 +1207,6 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
 	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
 	struct page **pages = (struct page **)bv;
-	bool same_page = false;
 	ssize_t size, left;
 	unsigned len, i;
 	size_t offset;
@@ -1185,7 +1215,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	 * Move page array up in the allocated memory for the bio vecs as far as
 	 * possible so that we can start filling biovecs from the beginning
 	 * without overwriting the temporary page array.
-	*/
+	 */
 	BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
 	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
 
@@ -1195,18 +1225,18 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 
 	for (left = size, i = 0; left > 0; left -= len, i++) {
 		struct page *page = pages[i];
+		int ret;
 
 		len = min_t(size_t, PAGE_SIZE - offset, left);
+		if (bio_op(bio) == REQ_OP_ZONE_APPEND)
+			ret = bio_iov_add_zone_append_page(bio, page, len,
+					offset);
+		else
+			ret = bio_iov_add_page(bio, page, len, offset);
 
-		if (__bio_try_merge_page(bio, page, len, offset, &same_page)) {
-			if (same_page)
-				put_page(page);
-		} else {
-			if (WARN_ON_ONCE(bio_full(bio, len))) {
-				bio_put_pages(pages + i, left, offset);
-				return -EINVAL;
-			}
-			__bio_add_page(bio, page, len, offset);
+		if (ret) {
+			bio_put_pages(pages + i, left, offset);
+			return ret;
 		}
 		offset = 0;
 	}
@@ -1215,51 +1245,6 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	return 0;
 }
 
-static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
-{
-	unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
-	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
-	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
-	unsigned int max_append_sectors = queue_max_zone_append_sectors(q);
-	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
-	struct page **pages = (struct page **)bv;
-	ssize_t size, left;
-	unsigned len, i;
-	size_t offset;
-	int ret = 0;
-
-	/*
-	 * Move page array up in the allocated memory for the bio vecs as far as
-	 * possible so that we can start filling biovecs from the beginning
-	 * without overwriting the temporary page array.
-	 */
-	BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
-	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
-
-	size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset);
-	if (unlikely(size <= 0))
-		return size ? size : -EFAULT;
-
-	for (left = size, i = 0; left > 0; left -= len, i++) {
-		struct page *page = pages[i];
-		bool same_page = false;
-
-		len = min_t(size_t, PAGE_SIZE - offset, left);
-		if (bio_add_hw_page(q, bio, page, len, offset,
-				max_append_sectors, &same_page) != len) {
-			bio_put_pages(pages + i, left, offset);
-			ret = -EINVAL;
-			break;
-		}
-		if (same_page)
-			put_page(page);
-		offset = 0;
-	}
-
-	iov_iter_advance(iter, size - left);
-	return ret;
-}
-
 /**
  * bio_iov_iter_get_pages - add user or kernel pages to a bio
  * @bio: bio to add pages to
@@ -1294,10 +1279,7 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	}
 
 	do {
-		if (bio_op(bio) == REQ_OP_ZONE_APPEND)
-			ret = __bio_iov_append_get_pages(bio, iter);
-		else
-			ret = __bio_iov_iter_get_pages(bio, iter);
+		ret = __bio_iov_iter_get_pages(bio, iter);
 	} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
 
 	/* don't account direct I/O as memory stall */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCHv4 3/9] block: export dma_alignment attribute
  2022-05-26  1:06 [PATCHv4 0/9] direct io dma alignment Keith Busch
  2022-05-26  1:06 ` [PATCHv4 1/9] block: fix infiniate loop for invalid zone append Keith Busch
  2022-05-26  1:06 ` [PATCHv4 2/9] block/bio: remove duplicate append pages code Keith Busch
@ 2022-05-26  1:06 ` Keith Busch
  2022-05-26  7:30   ` Eric Biggers
  2022-05-26  1:06 ` [PATCHv4 4/9] block: introduce bdev_dma_alignment helper Keith Busch
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 29+ messages in thread
From: Keith Busch @ 2022-05-26  1:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-block
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch, Johannes Thumshirn

From: Keith Busch <kbusch@kernel.org>

User space may want to know how to align their buffers to avoid
bouncing. Export the queue attribute.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 Documentation/ABI/stable/sysfs-block | 9 +++++++++
 block/blk-sysfs.c                    | 7 +++++++
 2 files changed, 16 insertions(+)

diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
index e8797cd09aff..08f6d00df138 100644
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@@ -142,6 +142,15 @@ Description:
 		Default value of this file is '1'(on).
 
 
+What:		/sys/block/<disk>/queue/dma_alignment
+Date:		May 2022
+Contact:	linux-block@vger.kernel.org
+Description:
+		Reports the alignment that user space addresses must have to be
+		used for raw block device access with O_DIRECT and other driver
+		specific passthrough mechanisms.
+
+
 What:		/sys/block/<disk>/queue/chunk_sectors
 Date:		September 2016
 Contact:	Hannes Reinecke <hare@suse.com>
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 88bd41d4cb59..14607565d781 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -274,6 +274,11 @@ static ssize_t queue_virt_boundary_mask_show(struct request_queue *q, char *page
 	return queue_var_show(q->limits.virt_boundary_mask, page);
 }
 
+static ssize_t queue_dma_alignment_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(queue_dma_alignment(q), page);
+}
+
 #define QUEUE_SYSFS_BIT_FNS(name, flag, neg)				\
 static ssize_t								\
 queue_##name##_show(struct request_queue *q, char *page)		\
@@ -606,6 +611,7 @@ QUEUE_RO_ENTRY(queue_dax, "dax");
 QUEUE_RW_ENTRY(queue_io_timeout, "io_timeout");
 QUEUE_RW_ENTRY(queue_wb_lat, "wbt_lat_usec");
 QUEUE_RO_ENTRY(queue_virt_boundary_mask, "virt_boundary_mask");
+QUEUE_RO_ENTRY(queue_dma_alignment, "dma_alignment");
 
 #ifdef CONFIG_BLK_DEV_THROTTLING_LOW
 QUEUE_RW_ENTRY(blk_throtl_sample_time, "throttle_sample_time");
@@ -667,6 +673,7 @@ static struct attribute *queue_attrs[] = {
 	&blk_throtl_sample_time_entry.attr,
 #endif
 	&queue_virt_boundary_mask_entry.attr,
+	&queue_dma_alignment_entry.attr,
 	NULL,
 };
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCHv4 4/9] block: introduce bdev_dma_alignment helper
  2022-05-26  1:06 [PATCHv4 0/9] direct io dma alignment Keith Busch
                   ` (2 preceding siblings ...)
  2022-05-26  1:06 ` [PATCHv4 3/9] block: export dma_alignment attribute Keith Busch
@ 2022-05-26  1:06 ` Keith Busch
  2022-05-26  1:06 ` [PATCHv4 5/9] block: add a helper function for dio alignment Keith Busch
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 29+ messages in thread
From: Keith Busch @ 2022-05-26  1:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-block
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch, Johannes Thumshirn

From: Keith Busch <kbusch@kernel.org>

Preparing for upcoming dma_alignment users that have a block_device, but
don't need the request_queue.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/blkdev.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5bdf2ac9142c..834b981ef01b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1365,6 +1365,11 @@ static inline int queue_dma_alignment(const struct request_queue *q)
 	return q ? q->dma_alignment : 511;
 }
 
+static inline unsigned int bdev_dma_alignment(struct block_device *bdev)
+{
+	return queue_dma_alignment(bdev_get_queue(bdev));
+}
+
 static inline int blk_rq_aligned(struct request_queue *q, unsigned long addr,
 				 unsigned int len)
 {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCHv4 5/9] block: add a helper function for dio alignment
  2022-05-26  1:06 [PATCHv4 0/9] direct io dma alignment Keith Busch
                   ` (3 preceding siblings ...)
  2022-05-26  1:06 ` [PATCHv4 4/9] block: introduce bdev_dma_alignment helper Keith Busch
@ 2022-05-26  1:06 ` Keith Busch
  2022-05-26  1:54   ` Damien Le Moal
  2022-05-26  1:06 ` [PATCHv4 6/9] block/merge: count bytes instead of sectors Keith Busch
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 29+ messages in thread
From: Keith Busch @ 2022-05-26  1:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-block
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch

From: Keith Busch <kbusch@kernel.org>

This will make it easier to add more complex acceptable alignment
criteria in the future.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 block/fops.c | 32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index b9b83030e0df..bd6c2e13a4e3 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -42,6 +42,16 @@ static unsigned int dio_bio_write_op(struct kiocb *iocb)
 	return op;
 }
 
+static int blkdev_dio_aligned(struct block_device *bdev, loff_t pos,
+			      struct iov_iter *iter)
+{
+	if ((pos | iov_iter_alignment(iter)) &
+	    (bdev_logical_block_size(bdev) - 1))
+		return -EINVAL;
+
+	return 0;
+}
+
 #define DIO_INLINE_BIO_VECS 4
 
 static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
@@ -54,9 +64,9 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
 	struct bio bio;
 	ssize_t ret;
 
-	if ((pos | iov_iter_alignment(iter)) &
-	    (bdev_logical_block_size(bdev) - 1))
-		return -EINVAL;
+	ret = blkdev_dio_aligned(bdev, pos, iter);
+	if (ret)
+		return ret;
 
 	if (nr_pages <= DIO_INLINE_BIO_VECS)
 		vecs = inline_vecs;
@@ -171,11 +181,11 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 	bool is_read = (iov_iter_rw(iter) == READ), is_sync;
 	unsigned int opf = is_read ? REQ_OP_READ : dio_bio_write_op(iocb);
 	loff_t pos = iocb->ki_pos;
-	int ret = 0;
+	int ret;
 
-	if ((pos | iov_iter_alignment(iter)) &
-	    (bdev_logical_block_size(bdev) - 1))
-		return -EINVAL;
+	ret = blkdev_dio_aligned(bdev, pos, iter);
+	if (ret)
+		return ret;
 
 	if (iocb->ki_flags & IOCB_ALLOC_CACHE)
 		opf |= REQ_ALLOC_CACHE;
@@ -296,11 +306,11 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
 	struct blkdev_dio *dio;
 	struct bio *bio;
 	loff_t pos = iocb->ki_pos;
-	int ret = 0;
+	int ret;
 
-	if ((pos | iov_iter_alignment(iter)) &
-	    (bdev_logical_block_size(bdev) - 1))
-		return -EINVAL;
+	ret = blkdev_dio_aligned(bdev, pos, iter);
+	if (ret)
+		return ret;
 
 	if (iocb->ki_flags & IOCB_ALLOC_CACHE)
 		opf |= REQ_ALLOC_CACHE;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCHv4 6/9] block/merge: count bytes instead of sectors
  2022-05-26  1:06 [PATCHv4 0/9] direct io dma alignment Keith Busch
                   ` (4 preceding siblings ...)
  2022-05-26  1:06 ` [PATCHv4 5/9] block: add a helper function for dio alignment Keith Busch
@ 2022-05-26  1:06 ` Keith Busch
  2022-05-26  1:57   ` Damien Le Moal
  2022-05-31  6:25   ` Christoph Hellwig
  2022-05-26  1:06 ` [PATCHv4 7/9] block/bounce: " Keith Busch
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 29+ messages in thread
From: Keith Busch @ 2022-05-26  1:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-block
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch

From: Keith Busch <kbusch@kernel.org>

Individual bv_len's may not be a sector size.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
v3->v4:

  Use sector shift

  Add comment explaing the ALIGN_DOWN

 block/blk-merge.c | 41 ++++++++++++++++++++++++-----------------
 1 file changed, 24 insertions(+), 17 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 7771dacc99cb..9ff0cb9e4840 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -201,11 +201,11 @@ static inline unsigned get_max_segment_size(const struct request_queue *q,
  * @nsegs:    [in,out] Number of segments in the bio being built. Incremented
  *            by the number of segments from @bv that may be appended to that
  *            bio without exceeding @max_segs
- * @sectors:  [in,out] Number of sectors in the bio being built. Incremented
- *            by the number of sectors from @bv that may be appended to that
- *            bio without exceeding @max_sectors
+ * @bytes:    [in,out] Number of bytes in the bio being built. Incremented
+ *            by the number of bytes from @bv that may be appended to that
+ *            bio without exceeding @max_bytes
  * @max_segs: [in] upper bound for *@nsegs
- * @max_sectors: [in] upper bound for *@sectors
+ * @max_bytes: [in] upper bound for *@bytes
  *
  * When splitting a bio, it can happen that a bvec is encountered that is too
  * big to fit in a single segment and hence that it has to be split in the
@@ -216,10 +216,10 @@ static inline unsigned get_max_segment_size(const struct request_queue *q,
  */
 static bool bvec_split_segs(const struct request_queue *q,
 			    const struct bio_vec *bv, unsigned *nsegs,
-			    unsigned *sectors, unsigned max_segs,
-			    unsigned max_sectors)
+			    unsigned *bytes, unsigned max_segs,
+			    unsigned max_bytes)
 {
-	unsigned max_len = (min(max_sectors, UINT_MAX >> 9) - *sectors) << 9;
+	unsigned max_len = min(max_bytes, UINT_MAX) - *bytes;
 	unsigned len = min(bv->bv_len, max_len);
 	unsigned total_len = 0;
 	unsigned seg_size = 0;
@@ -237,7 +237,7 @@ static bool bvec_split_segs(const struct request_queue *q,
 			break;
 	}
 
-	*sectors += total_len >> 9;
+	*bytes += total_len;
 
 	/* tell the caller to split the bvec if it is too big to fit */
 	return len > 0 || bv->bv_len > max_len;
@@ -269,8 +269,8 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 {
 	struct bio_vec bv, bvprv, *bvprvp = NULL;
 	struct bvec_iter iter;
-	unsigned nsegs = 0, sectors = 0;
-	const unsigned max_sectors = get_max_io_size(q, bio);
+	unsigned nsegs = 0, bytes = 0;
+	const unsigned max_bytes = get_max_io_size(q, bio) << 9;
 	const unsigned max_segs = queue_max_segments(q);
 
 	bio_for_each_bvec(bv, bio, iter) {
@@ -282,12 +282,12 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 			goto split;
 
 		if (nsegs < max_segs &&
-		    sectors + (bv.bv_len >> 9) <= max_sectors &&
+		    bytes + bv.bv_len <= max_bytes &&
 		    bv.bv_offset + bv.bv_len <= PAGE_SIZE) {
 			nsegs++;
-			sectors += bv.bv_len >> 9;
-		} else if (bvec_split_segs(q, &bv, &nsegs, &sectors, max_segs,
-					 max_sectors)) {
+			bytes += bv.bv_len;
+		} else if (bvec_split_segs(q, &bv, &nsegs, &bytes, max_segs,
+					   max_bytes)) {
 			goto split;
 		}
 
@@ -300,13 +300,20 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 split:
 	*segs = nsegs;
 
+	/*
+	 * Individual bvecs may not be logical block aligned, so round down to
+	 * that size to ensure both sides of the split bio are appropriately
+	 * sized.
+	 */
+	bytes = ALIGN_DOWN(bytes, queue_logical_block_size(q));
+
 	/*
 	 * Bio splitting may cause subtle trouble such as hang when doing sync
 	 * iopoll in direct IO routine. Given performance gain of iopoll for
 	 * big IO can be trival, disable iopoll when split needed.
 	 */
 	bio_clear_polled(bio);
-	return bio_split(bio, sectors, GFP_NOIO, bs);
+	return bio_split(bio, bytes >> SECTOR_SHIFT, GFP_NOIO, bs);
 }
 
 /**
@@ -375,7 +382,7 @@ EXPORT_SYMBOL(blk_queue_split);
 unsigned int blk_recalc_rq_segments(struct request *rq)
 {
 	unsigned int nr_phys_segs = 0;
-	unsigned int nr_sectors = 0;
+	unsigned int bytes = 0;
 	struct req_iterator iter;
 	struct bio_vec bv;
 
@@ -398,7 +405,7 @@ unsigned int blk_recalc_rq_segments(struct request *rq)
 	}
 
 	rq_for_each_bvec(bv, rq, iter)
-		bvec_split_segs(rq->q, &bv, &nr_phys_segs, &nr_sectors,
+		bvec_split_segs(rq->q, &bv, &nr_phys_segs, &bytes,
 				UINT_MAX, UINT_MAX);
 	return nr_phys_segs;
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCHv4 7/9] block/bounce: count bytes instead of sectors
  2022-05-26  1:06 [PATCHv4 0/9] direct io dma alignment Keith Busch
                   ` (5 preceding siblings ...)
  2022-05-26  1:06 ` [PATCHv4 6/9] block/merge: count bytes instead of sectors Keith Busch
@ 2022-05-26  1:06 ` Keith Busch
  2022-05-26  1:58   ` Damien Le Moal
                     ` (2 more replies)
  2022-05-26  1:06 ` [PATCHv4 8/9] block: relax direct io memory alignment Keith Busch
  2022-05-26  1:06 ` [PATCHv4 9/9] fs: add support for dma aligned direct-io Keith Busch
  8 siblings, 3 replies; 29+ messages in thread
From: Keith Busch @ 2022-05-26  1:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-block
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch

From: Keith Busch <kbusch@kernel.org>

Individual bv_len's may not be a sector size.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
v3->v4:

  Use sector shift

  Add comment explaing the ALIGN_DOWN

  Use unsigned int type for counting bytes

 block/bounce.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/block/bounce.c b/block/bounce.c
index 8f7b6fe3b4db..f6ae21ec2a70 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -205,19 +205,25 @@ void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig)
 	int rw = bio_data_dir(*bio_orig);
 	struct bio_vec *to, from;
 	struct bvec_iter iter;
-	unsigned i = 0;
+	unsigned i = 0, bytes = 0;
 	bool bounce = false;
-	int sectors = 0;
+	int sectors;
 
 	bio_for_each_segment(from, *bio_orig, iter) {
 		if (i++ < BIO_MAX_VECS)
-			sectors += from.bv_len >> 9;
+			bytes += from.bv_len;
 		if (PageHighMem(from.bv_page))
 			bounce = true;
 	}
 	if (!bounce)
 		return;
 
+	/*
+	 * If the original has more than BIO_MAX_VECS biovecs, the total bytes
+	 * may not be block size aligned. Align down to ensure both sides of
+	 * the split bio are appropriately sized.
+	 */
+	sectors = ALIGN_DOWN(bytes, queue_logical_block_size(q)) >> SECTOR_SHIFT;
 	if (sectors < bio_sectors(*bio_orig)) {
 		bio = bio_split(*bio_orig, sectors, GFP_NOIO, &bounce_bio_split);
 		bio_chain(bio, *bio_orig);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCHv4 8/9] block: relax direct io memory alignment
  2022-05-26  1:06 [PATCHv4 0/9] direct io dma alignment Keith Busch
                   ` (6 preceding siblings ...)
  2022-05-26  1:06 ` [PATCHv4 7/9] block/bounce: " Keith Busch
@ 2022-05-26  1:06 ` Keith Busch
  2022-05-26  2:03   ` Damien Le Moal
  2022-05-26  7:29   ` Eric Biggers
  2022-05-26  1:06 ` [PATCHv4 9/9] fs: add support for dma aligned direct-io Keith Busch
  8 siblings, 2 replies; 29+ messages in thread
From: Keith Busch @ 2022-05-26  1:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-block
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch

From: Keith Busch <kbusch@kernel.org>

Use the address alignment requirements from the hardware for direct io
instead of requiring addresses be aligned to the block size. User space
can discover the alignment requirements from the dma_alignment queue
attribute.

User space can specify any hardware compatible DMA offset for each
segment, but every segment length is still required to be a multiple of
the block size.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 block/bio.c  | 12 ++++++++++++
 block/fops.c | 14 +++++++++++---
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 55d2a9c4e312..c492881959d1 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1219,7 +1219,19 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
 	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
 
+	/*
+	 * Each segment in the iov is required to be a block size multiple.
+	 * However, we may not be able to get the entire segment if it spans
+	 * more pages than bi_max_vecs allows, so we have to ALIGN_DOWN the
+	 * result to ensure the bio's total size is correct. The remainder of
+	 * the iov data will be picked up in the next bio iteration.
+	 *
+	 * If the result is ever 0, that indicates the iov fails the segment
+	 * size requirement and is an error.
+	 */
 	size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset);
+	if (size > 0)
+		size = ALIGN_DOWN(size, bdev_logical_block_size(bio->bi_bdev));
 	if (unlikely(size <= 0))
 		return size ? size : -EFAULT;
 
diff --git a/block/fops.c b/block/fops.c
index bd6c2e13a4e3..6ecbccc552b9 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -45,10 +45,10 @@ static unsigned int dio_bio_write_op(struct kiocb *iocb)
 static int blkdev_dio_aligned(struct block_device *bdev, loff_t pos,
 			      struct iov_iter *iter)
 {
-	if ((pos | iov_iter_alignment(iter)) &
-	    (bdev_logical_block_size(bdev) - 1))
+	if ((pos | iov_iter_count(iter)) & (bdev_logical_block_size(bdev) - 1))
+		return -EINVAL;
+	if (iov_iter_alignment(iter) & bdev_dma_alignment(bdev))
 		return -EINVAL;
-
 	return 0;
 }
 
@@ -88,6 +88,10 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
 	bio.bi_ioprio = iocb->ki_ioprio;
 
 	ret = bio_iov_iter_get_pages(&bio, iter);
+
+	/* check if iov is not aligned */
+	if (unlikely(!ret && iov_iter_count(iter)))
+		ret = -EINVAL;
 	if (unlikely(ret))
 		goto out;
 	ret = bio.bi_iter.bi_size;
@@ -333,6 +337,10 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
 		bio_iov_bvec_set(bio, iter);
 	} else {
 		ret = bio_iov_iter_get_pages(bio, iter);
+
+		/* check if iov is not aligned */
+		if (unlikely(!ret && iov_iter_count(iter)))
+			ret = -EINVAL;
 		if (unlikely(ret)) {
 			bio_put(bio);
 			return ret;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCHv4 9/9] fs: add support for dma aligned direct-io
  2022-05-26  1:06 [PATCHv4 0/9] direct io dma alignment Keith Busch
                   ` (7 preceding siblings ...)
  2022-05-26  1:06 ` [PATCHv4 8/9] block: relax direct io memory alignment Keith Busch
@ 2022-05-26  1:06 ` Keith Busch
  2022-05-26  2:01   ` Damien Le Moal
  2022-05-31  6:29   ` Christoph Hellwig
  8 siblings, 2 replies; 29+ messages in thread
From: Keith Busch @ 2022-05-26  1:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-block
  Cc: axboe, Kernel Team, hch, bvanassche, damien.lemoal, ebiggers,
	pankydev8, Keith Busch

From: Keith Busch <kbusch@kernel.org>

Use the address alignment requirements from the hardware for direct io
instead of requiring addresses be aligned to the block size.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 fs/direct-io.c       | 11 +++++++----
 fs/iomap/direct-io.c |  3 ++-
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 840752006f60..64cc176be60c 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -1131,7 +1131,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 	struct dio_submit sdio = { 0, };
 	struct buffer_head map_bh = { 0, };
 	struct blk_plug plug;
-	unsigned long align = offset | iov_iter_alignment(iter);
+	unsigned long align = iov_iter_alignment(iter);
 
 	/*
 	 * Avoid references to bdev if not absolutely needed to give
@@ -1165,11 +1165,14 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 		goto fail_dio;
 	}
 
-	if (align & blocksize_mask) {
-		if (bdev)
+	if ((offset | align) & blocksize_mask) {
+		if (bdev) {
 			blkbits = blksize_bits(bdev_logical_block_size(bdev));
+			if (align & bdev_dma_alignment(bdev))
+				goto fail_dio;
+		}
 		blocksize_mask = (1 << blkbits) - 1;
-		if (align & blocksize_mask)
+		if ((offset | count) & blocksize_mask)
 			goto fail_dio;
 	}
 
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 80f9b047aa1b..0256d28baa8e 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -244,7 +244,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 	size_t copied = 0;
 	size_t orig_count;
 
-	if ((pos | length | align) & ((1 << blkbits) - 1))
+	if ((pos | length) & ((1 << blkbits) - 1) ||
+	    align & bdev_dma_alignment(iomap->bdev))
 		return -EINVAL;
 
 	if (iomap->type == IOMAP_UNWRITTEN) {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCHv4 1/9] block: fix infiniate loop for invalid zone append
  2022-05-26  1:06 ` [PATCHv4 1/9] block: fix infiniate loop for invalid zone append Keith Busch
@ 2022-05-26  1:48   ` Damien Le Moal
  2022-05-31  6:19   ` Christoph Hellwig
  1 sibling, 0 replies; 29+ messages in thread
From: Damien Le Moal @ 2022-05-26  1:48 UTC (permalink / raw)
  To: Keith Busch, linux-fsdevel, linux-block
  Cc: axboe, Kernel Team, hch, bvanassche, ebiggers, pankydev8, Keith Busch

On 2022/05/26 10:06, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 

s/infiniate/infinite in the patch title.

> Returning 0 early from __bio_iov_append_get_pages() for the
> max_append_sectors warning just creates an infinite loop since 0 means
> success, and the bio will never fill from the unadvancing iov_iter. We
> could turn the return into an error value, but it will already be turned
> into an error value later on, so just remove the warning. Clearly no one
> ever hit it anyway.
> 
> Fixes: 0512a75b98f84 ("block: Introduce REQ_OP_ZONE_APPEND")
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  block/bio.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index a3893d80dccc..e249f6414fd5 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1228,9 +1228,6 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
>  	size_t offset;
>  	int ret = 0;
>  
> -	if (WARN_ON_ONCE(!max_append_sectors))
> -		return 0;
> -
>  	/*
>  	 * Move page array up in the allocated memory for the bio vecs as far as
>  	 * possible so that we can start filling biovecs from the beginning

Otherwise looks good.

Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv4 5/9] block: add a helper function for dio alignment
  2022-05-26  1:06 ` [PATCHv4 5/9] block: add a helper function for dio alignment Keith Busch
@ 2022-05-26  1:54   ` Damien Le Moal
  2022-05-31  6:22     ` Christoph Hellwig
  0 siblings, 1 reply; 29+ messages in thread
From: Damien Le Moal @ 2022-05-26  1:54 UTC (permalink / raw)
  To: Keith Busch, linux-fsdevel, linux-block
  Cc: axboe, Kernel Team, hch, bvanassche, ebiggers, pankydev8, Keith Busch

On 2022/05/26 10:06, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> This will make it easier to add more complex acceptable alignment
> criteria in the future.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  block/fops.c | 32 +++++++++++++++++++++-----------
>  1 file changed, 21 insertions(+), 11 deletions(-)
> 
> diff --git a/block/fops.c b/block/fops.c
> index b9b83030e0df..bd6c2e13a4e3 100644
> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -42,6 +42,16 @@ static unsigned int dio_bio_write_op(struct kiocb *iocb)
>  	return op;
>  }
>  
> +static int blkdev_dio_aligned(struct block_device *bdev, loff_t pos,
> +			      struct iov_iter *iter)
> +{
> +	if ((pos | iov_iter_alignment(iter)) &
> +	    (bdev_logical_block_size(bdev) - 1))
> +		return -EINVAL;
> +
> +	return 0;
> +}

Nit: The name of this helper really suggest a bool return, which would be a more
flexible interface I think:

	if (!blkdev_dio_aligned(bdev, pos, iter))
		return -EINVAL; <-- or whatever error code the caller wants.

And that will allow you to get rid of the ret variable in a least
__blkdev_direct_IO_async.

> +
>  #define DIO_INLINE_BIO_VECS 4
>  
>  static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
> @@ -54,9 +64,9 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
>  	struct bio bio;
>  	ssize_t ret;
>  
> -	if ((pos | iov_iter_alignment(iter)) &
> -	    (bdev_logical_block_size(bdev) - 1))
> -		return -EINVAL;
> +	ret = blkdev_dio_aligned(bdev, pos, iter);
> +	if (ret)
> +		return ret;
>  
>  	if (nr_pages <= DIO_INLINE_BIO_VECS)
>  		vecs = inline_vecs;
> @@ -171,11 +181,11 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
>  	bool is_read = (iov_iter_rw(iter) == READ), is_sync;
>  	unsigned int opf = is_read ? REQ_OP_READ : dio_bio_write_op(iocb);
>  	loff_t pos = iocb->ki_pos;
> -	int ret = 0;
> +	int ret;
>  
> -	if ((pos | iov_iter_alignment(iter)) &
> -	    (bdev_logical_block_size(bdev) - 1))
> -		return -EINVAL;
> +	ret = blkdev_dio_aligned(bdev, pos, iter);
> +	if (ret)
> +		return ret;
>  
>  	if (iocb->ki_flags & IOCB_ALLOC_CACHE)
>  		opf |= REQ_ALLOC_CACHE;
> @@ -296,11 +306,11 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
>  	struct blkdev_dio *dio;
>  	struct bio *bio;
>  	loff_t pos = iocb->ki_pos;
> -	int ret = 0;
> +	int ret;
>  
> -	if ((pos | iov_iter_alignment(iter)) &
> -	    (bdev_logical_block_size(bdev) - 1))
> -		return -EINVAL;
> +	ret = blkdev_dio_aligned(bdev, pos, iter);
> +	if (ret)
> +		return ret;
>  
>  	if (iocb->ki_flags & IOCB_ALLOC_CACHE)
>  		opf |= REQ_ALLOC_CACHE;


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv4 6/9] block/merge: count bytes instead of sectors
  2022-05-26  1:06 ` [PATCHv4 6/9] block/merge: count bytes instead of sectors Keith Busch
@ 2022-05-26  1:57   ` Damien Le Moal
  2022-05-31  6:25   ` Christoph Hellwig
  1 sibling, 0 replies; 29+ messages in thread
From: Damien Le Moal @ 2022-05-26  1:57 UTC (permalink / raw)
  To: Keith Busch, linux-fsdevel, linux-block
  Cc: axboe, Kernel Team, hch, bvanassche, ebiggers, pankydev8, Keith Busch

On 2022/05/26 10:06, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Individual bv_len's may not be a sector size.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
> v3->v4:
> 
>   Use sector shift
> 
>   Add comment explaing the ALIGN_DOWN
> 
>  block/blk-merge.c | 41 ++++++++++++++++++++++++-----------------
>  1 file changed, 24 insertions(+), 17 deletions(-)
> 
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 7771dacc99cb..9ff0cb9e4840 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -201,11 +201,11 @@ static inline unsigned get_max_segment_size(const struct request_queue *q,
>   * @nsegs:    [in,out] Number of segments in the bio being built. Incremented
>   *            by the number of segments from @bv that may be appended to that
>   *            bio without exceeding @max_segs
> - * @sectors:  [in,out] Number of sectors in the bio being built. Incremented
> - *            by the number of sectors from @bv that may be appended to that
> - *            bio without exceeding @max_sectors
> + * @bytes:    [in,out] Number of bytes in the bio being built. Incremented
> + *            by the number of bytes from @bv that may be appended to that
> + *            bio without exceeding @max_bytes
>   * @max_segs: [in] upper bound for *@nsegs
> - * @max_sectors: [in] upper bound for *@sectors
> + * @max_bytes: [in] upper bound for *@bytes
>   *
>   * When splitting a bio, it can happen that a bvec is encountered that is too
>   * big to fit in a single segment and hence that it has to be split in the
> @@ -216,10 +216,10 @@ static inline unsigned get_max_segment_size(const struct request_queue *q,
>   */
>  static bool bvec_split_segs(const struct request_queue *q,
>  			    const struct bio_vec *bv, unsigned *nsegs,
> -			    unsigned *sectors, unsigned max_segs,
> -			    unsigned max_sectors)
> +			    unsigned *bytes, unsigned max_segs,
> +			    unsigned max_bytes)
>  {
> -	unsigned max_len = (min(max_sectors, UINT_MAX >> 9) - *sectors) << 9;
> +	unsigned max_len = min(max_bytes, UINT_MAX) - *bytes;
>  	unsigned len = min(bv->bv_len, max_len);
>  	unsigned total_len = 0;
>  	unsigned seg_size = 0;
> @@ -237,7 +237,7 @@ static bool bvec_split_segs(const struct request_queue *q,
>  			break;
>  	}
>  
> -	*sectors += total_len >> 9;
> +	*bytes += total_len;
>  
>  	/* tell the caller to split the bvec if it is too big to fit */
>  	return len > 0 || bv->bv_len > max_len;
> @@ -269,8 +269,8 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>  {
>  	struct bio_vec bv, bvprv, *bvprvp = NULL;
>  	struct bvec_iter iter;
> -	unsigned nsegs = 0, sectors = 0;
> -	const unsigned max_sectors = get_max_io_size(q, bio);
> +	unsigned nsegs = 0, bytes = 0;
> +	const unsigned max_bytes = get_max_io_size(q, bio) << 9;

You missed one SECTOR_SHIFT here :)

Also, this get_max_io_size() function is now super confusing. It really should
be get_max_io_sectors() and we could add also get_max_io_bytes(), no ?

>  	const unsigned max_segs = queue_max_segments(q);
>  
>  	bio_for_each_bvec(bv, bio, iter) {
> @@ -282,12 +282,12 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>  			goto split;
>  
>  		if (nsegs < max_segs &&
> -		    sectors + (bv.bv_len >> 9) <= max_sectors &&
> +		    bytes + bv.bv_len <= max_bytes &&
>  		    bv.bv_offset + bv.bv_len <= PAGE_SIZE) {
>  			nsegs++;
> -			sectors += bv.bv_len >> 9;
> -		} else if (bvec_split_segs(q, &bv, &nsegs, &sectors, max_segs,
> -					 max_sectors)) {
> +			bytes += bv.bv_len;
> +		} else if (bvec_split_segs(q, &bv, &nsegs, &bytes, max_segs,
> +					   max_bytes)) {
>  			goto split;
>  		}
>  
> @@ -300,13 +300,20 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>  split:
>  	*segs = nsegs;
>  
> +	/*
> +	 * Individual bvecs may not be logical block aligned, so round down to
> +	 * that size to ensure both sides of the split bio are appropriately
> +	 * sized.
> +	 */
> +	bytes = ALIGN_DOWN(bytes, queue_logical_block_size(q));
> +
>  	/*
>  	 * Bio splitting may cause subtle trouble such as hang when doing sync
>  	 * iopoll in direct IO routine. Given performance gain of iopoll for
>  	 * big IO can be trival, disable iopoll when split needed.
>  	 */
>  	bio_clear_polled(bio);
> -	return bio_split(bio, sectors, GFP_NOIO, bs);
> +	return bio_split(bio, bytes >> SECTOR_SHIFT, GFP_NOIO, bs);
>  }
>  
>  /**
> @@ -375,7 +382,7 @@ EXPORT_SYMBOL(blk_queue_split);
>  unsigned int blk_recalc_rq_segments(struct request *rq)
>  {
>  	unsigned int nr_phys_segs = 0;
> -	unsigned int nr_sectors = 0;
> +	unsigned int bytes = 0;
>  	struct req_iterator iter;
>  	struct bio_vec bv;
>  
> @@ -398,7 +405,7 @@ unsigned int blk_recalc_rq_segments(struct request *rq)
>  	}
>  
>  	rq_for_each_bvec(bv, rq, iter)
> -		bvec_split_segs(rq->q, &bv, &nr_phys_segs, &nr_sectors,
> +		bvec_split_segs(rq->q, &bv, &nr_phys_segs, &bytes,
>  				UINT_MAX, UINT_MAX);
>  	return nr_phys_segs;
>  }


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv4 7/9] block/bounce: count bytes instead of sectors
  2022-05-26  1:06 ` [PATCHv4 7/9] block/bounce: " Keith Busch
@ 2022-05-26  1:58   ` Damien Le Moal
  2022-05-30 15:08   ` Pankaj Raghav
  2022-05-31  6:26   ` Christoph Hellwig
  2 siblings, 0 replies; 29+ messages in thread
From: Damien Le Moal @ 2022-05-26  1:58 UTC (permalink / raw)
  To: Keith Busch, linux-fsdevel, linux-block
  Cc: axboe, Kernel Team, hch, bvanassche, ebiggers, pankydev8, Keith Busch

On 2022/05/26 10:06, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Individual bv_len's may not be a sector size.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
> v3->v4:
> 
>   Use sector shift
> 
>   Add comment explaing the ALIGN_DOWN
> 
>   Use unsigned int type for counting bytes
> 
>  block/bounce.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/block/bounce.c b/block/bounce.c
> index 8f7b6fe3b4db..f6ae21ec2a70 100644
> --- a/block/bounce.c
> +++ b/block/bounce.c
> @@ -205,19 +205,25 @@ void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig)
>  	int rw = bio_data_dir(*bio_orig);
>  	struct bio_vec *to, from;
>  	struct bvec_iter iter;
> -	unsigned i = 0;
> +	unsigned i = 0, bytes = 0;
>  	bool bounce = false;
> -	int sectors = 0;
> +	int sectors;
>  
>  	bio_for_each_segment(from, *bio_orig, iter) {
>  		if (i++ < BIO_MAX_VECS)
> -			sectors += from.bv_len >> 9;
> +			bytes += from.bv_len;
>  		if (PageHighMem(from.bv_page))
>  			bounce = true;
>  	}
>  	if (!bounce)
>  		return;
>  
> +	/*
> +	 * If the original has more than BIO_MAX_VECS biovecs, the total bytes
> +	 * may not be block size aligned. Align down to ensure both sides of
> +	 * the split bio are appropriately sized.
> +	 */
> +	sectors = ALIGN_DOWN(bytes, queue_logical_block_size(q)) >> SECTOR_SHIFT;
>  	if (sectors < bio_sectors(*bio_orig)) {
>  		bio = bio_split(*bio_orig, sectors, GFP_NOIO, &bounce_bio_split);
>  		bio_chain(bio, *bio_orig);

Looks good.

Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv4 9/9] fs: add support for dma aligned direct-io
  2022-05-26  1:06 ` [PATCHv4 9/9] fs: add support for dma aligned direct-io Keith Busch
@ 2022-05-26  2:01   ` Damien Le Moal
  2022-05-31  6:29   ` Christoph Hellwig
  1 sibling, 0 replies; 29+ messages in thread
From: Damien Le Moal @ 2022-05-26  2:01 UTC (permalink / raw)
  To: Keith Busch, linux-fsdevel, linux-block
  Cc: axboe, Kernel Team, hch, bvanassche, ebiggers, pankydev8, Keith Busch

On 2022/05/26 10:06, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Use the address alignment requirements from the hardware for direct io
> instead of requiring addresses be aligned to the block size.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  fs/direct-io.c       | 11 +++++++----
>  fs/iomap/direct-io.c |  3 ++-
>  2 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index 840752006f60..64cc176be60c 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -1131,7 +1131,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
>  	struct dio_submit sdio = { 0, };
>  	struct buffer_head map_bh = { 0, };
>  	struct blk_plug plug;
> -	unsigned long align = offset | iov_iter_alignment(iter);
> +	unsigned long align = iov_iter_alignment(iter);
>  
>  	/*
>  	 * Avoid references to bdev if not absolutely needed to give
> @@ -1165,11 +1165,14 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
>  		goto fail_dio;
>  	}
>  
> -	if (align & blocksize_mask) {
> -		if (bdev)
> +	if ((offset | align) & blocksize_mask) {
> +		if (bdev) {
>  			blkbits = blksize_bits(bdev_logical_block_size(bdev));
> +			if (align & bdev_dma_alignment(bdev))
> +				goto fail_dio;
> +		}
>  		blocksize_mask = (1 << blkbits) - 1;
> -		if (align & blocksize_mask)
> +		if ((offset | count) & blocksize_mask)
>  			goto fail_dio;
>  	}
>  
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 80f9b047aa1b..0256d28baa8e 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -244,7 +244,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
>  	size_t copied = 0;
>  	size_t orig_count;
>  
> -	if ((pos | length | align) & ((1 << blkbits) - 1))
> +	if ((pos | length) & ((1 << blkbits) - 1) ||
> +	    align & bdev_dma_alignment(iomap->bdev))
>  		return -EINVAL;
>  
>  	if (iomap->type == IOMAP_UNWRITTEN) {

Looks good to me.

Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv4 8/9] block: relax direct io memory alignment
  2022-05-26  1:06 ` [PATCHv4 8/9] block: relax direct io memory alignment Keith Busch
@ 2022-05-26  2:03   ` Damien Le Moal
  2022-05-26  7:29   ` Eric Biggers
  1 sibling, 0 replies; 29+ messages in thread
From: Damien Le Moal @ 2022-05-26  2:03 UTC (permalink / raw)
  To: Keith Busch, linux-fsdevel, linux-block
  Cc: axboe, Kernel Team, hch, bvanassche, ebiggers, pankydev8, Keith Busch

On 2022/05/26 10:06, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Use the address alignment requirements from the hardware for direct io
> instead of requiring addresses be aligned to the block size. User space
> can discover the alignment requirements from the dma_alignment queue
> attribute.
> 
> User space can specify any hardware compatible DMA offset for each
> segment, but every segment length is still required to be a multiple of
> the block size.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  block/bio.c  | 12 ++++++++++++
>  block/fops.c | 14 +++++++++++---
>  2 files changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 55d2a9c4e312..c492881959d1 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1219,7 +1219,19 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>  	BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
>  	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
>  
> +	/*
> +	 * Each segment in the iov is required to be a block size multiple.
> +	 * However, we may not be able to get the entire segment if it spans
> +	 * more pages than bi_max_vecs allows, so we have to ALIGN_DOWN the
> +	 * result to ensure the bio's total size is correct. The remainder of
> +	 * the iov data will be picked up in the next bio iteration.
> +	 *
> +	 * If the result is ever 0, that indicates the iov fails the segment
> +	 * size requirement and is an error.
> +	 */
>  	size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset);
> +	if (size > 0)
> +		size = ALIGN_DOWN(size, bdev_logical_block_size(bio->bi_bdev));
>  	if (unlikely(size <= 0))
>  		return size ? size : -EFAULT;
>  
> diff --git a/block/fops.c b/block/fops.c
> index bd6c2e13a4e3..6ecbccc552b9 100644
> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -45,10 +45,10 @@ static unsigned int dio_bio_write_op(struct kiocb *iocb)
>  static int blkdev_dio_aligned(struct block_device *bdev, loff_t pos,
>  			      struct iov_iter *iter)
>  {
> -	if ((pos | iov_iter_alignment(iter)) &
> -	    (bdev_logical_block_size(bdev) - 1))
> +	if ((pos | iov_iter_count(iter)) & (bdev_logical_block_size(bdev) - 1))
> +		return -EINVAL;
> +	if (iov_iter_alignment(iter) & bdev_dma_alignment(bdev))
>  		return -EINVAL;
> -
>  	return 0;
>  }
>  
> @@ -88,6 +88,10 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
>  	bio.bi_ioprio = iocb->ki_ioprio;
>  
>  	ret = bio_iov_iter_get_pages(&bio, iter);
> +
> +	/* check if iov is not aligned */
> +	if (unlikely(!ret && iov_iter_count(iter)))
> +		ret = -EINVAL;
>  	if (unlikely(ret))
>  		goto out;
>  	ret = bio.bi_iter.bi_size;
> @@ -333,6 +337,10 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
>  		bio_iov_bvec_set(bio, iter);
>  	} else {
>  		ret = bio_iov_iter_get_pages(bio, iter);
> +
> +		/* check if iov is not aligned */
> +		if (unlikely(!ret && iov_iter_count(iter)))
> +			ret = -EINVAL;
>  		if (unlikely(ret)) {
>  			bio_put(bio);
>  			return ret;

Looks OK to me.

Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv4 8/9] block: relax direct io memory alignment
  2022-05-26  1:06 ` [PATCHv4 8/9] block: relax direct io memory alignment Keith Busch
  2022-05-26  2:03   ` Damien Le Moal
@ 2022-05-26  7:29   ` Eric Biggers
  2022-05-26 13:50     ` Keith Busch
  1 sibling, 1 reply; 29+ messages in thread
From: Eric Biggers @ 2022-05-26  7:29 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-fsdevel, linux-block, axboe, Kernel Team, hch, bvanassche,
	damien.lemoal, pankydev8, Keith Busch

On Wed, May 25, 2022 at 06:06:12PM -0700, Keith Busch wrote:
> +	/*
> +	 * Each segment in the iov is required to be a block size multiple.

Where is this enforced?

- Eric

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv4 3/9] block: export dma_alignment attribute
  2022-05-26  1:06 ` [PATCHv4 3/9] block: export dma_alignment attribute Keith Busch
@ 2022-05-26  7:30   ` Eric Biggers
  0 siblings, 0 replies; 29+ messages in thread
From: Eric Biggers @ 2022-05-26  7:30 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-fsdevel, linux-block, axboe, Kernel Team, hch, bvanassche,
	damien.lemoal, pankydev8, Keith Busch, Johannes Thumshirn

On Wed, May 25, 2022 at 06:06:07PM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> User space may want to know how to align their buffers to avoid
> bouncing. Export the queue attribute.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  Documentation/ABI/stable/sysfs-block | 9 +++++++++
>  block/blk-sysfs.c                    | 7 +++++++
>  2 files changed, 16 insertions(+)
> 
> diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
> index e8797cd09aff..08f6d00df138 100644
> --- a/Documentation/ABI/stable/sysfs-block
> +++ b/Documentation/ABI/stable/sysfs-block
> @@ -142,6 +142,15 @@ Description:
>  		Default value of this file is '1'(on).
>  
>  
> +What:		/sys/block/<disk>/queue/dma_alignment
> +Date:		May 2022
> +Contact:	linux-block@vger.kernel.org
> +Description:
> +		Reports the alignment that user space addresses must have to be
> +		used for raw block device access with O_DIRECT and other driver
> +		specific passthrough mechanisms.
> +

Please keep this file in alphabetical order.

- Eric

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv4 8/9] block: relax direct io memory alignment
  2022-05-26  7:29   ` Eric Biggers
@ 2022-05-26 13:50     ` Keith Busch
  2022-05-26 18:13       ` Eric Biggers
  0 siblings, 1 reply; 29+ messages in thread
From: Keith Busch @ 2022-05-26 13:50 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Keith Busch, linux-fsdevel, linux-block, axboe, Kernel Team, hch,
	bvanassche, damien.lemoal, pankydev8

On Thu, May 26, 2022 at 12:29:41AM -0700, Eric Biggers wrote:
> On Wed, May 25, 2022 at 06:06:12PM -0700, Keith Busch wrote:
> > +	/*
> > +	 * Each segment in the iov is required to be a block size multiple.
> 
> Where is this enforced?

Right below the comment. If it isn't a block size multiple, then ALIGN_DOWN
will eventually result in 0 and -EFAULT is returned. 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv4 8/9] block: relax direct io memory alignment
  2022-05-26 13:50     ` Keith Busch
@ 2022-05-26 18:13       ` Eric Biggers
  2022-05-26 18:55         ` Keith Busch
  0 siblings, 1 reply; 29+ messages in thread
From: Eric Biggers @ 2022-05-26 18:13 UTC (permalink / raw)
  To: Keith Busch
  Cc: Keith Busch, linux-fsdevel, linux-block, axboe, Kernel Team, hch,
	bvanassche, damien.lemoal, pankydev8

On Thu, May 26, 2022 at 07:50:45AM -0600, Keith Busch wrote:
> On Thu, May 26, 2022 at 12:29:41AM -0700, Eric Biggers wrote:
> > On Wed, May 25, 2022 at 06:06:12PM -0700, Keith Busch wrote:
> > > +	/*
> > > +	 * Each segment in the iov is required to be a block size multiple.
> > 
> > Where is this enforced?
> 
> Right below the comment. If it isn't a block size multiple, then ALIGN_DOWN
> will eventually result in 0 and -EFAULT is returned. 

That's interesting, I would have expected it to be checked in
blkdev_dio_aligned().

EFAULT isn't the correct error code for this case; it should be EINVAL as is
normally the case for bad alignment.  See the man pages for read and write.

- Eric

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv4 8/9] block: relax direct io memory alignment
  2022-05-26 18:13       ` Eric Biggers
@ 2022-05-26 18:55         ` Keith Busch
  2022-05-26 20:32           ` Keith Busch
  0 siblings, 1 reply; 29+ messages in thread
From: Keith Busch @ 2022-05-26 18:55 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Keith Busch, linux-fsdevel, linux-block, axboe, Kernel Team, hch,
	bvanassche, damien.lemoal, pankydev8

On Thu, May 26, 2022 at 11:13:52AM -0700, Eric Biggers wrote:
> On Thu, May 26, 2022 at 07:50:45AM -0600, Keith Busch wrote:
> > On Thu, May 26, 2022 at 12:29:41AM -0700, Eric Biggers wrote:
> > > On Wed, May 25, 2022 at 06:06:12PM -0700, Keith Busch wrote:
> > > > +	/*
> > > > +	 * Each segment in the iov is required to be a block size multiple.
> > > 
> > > Where is this enforced?
> > 
> > Right below the comment. If it isn't a block size multiple, then ALIGN_DOWN
> > will eventually result in 0 and -EFAULT is returned. 
> 
> That's interesting, I would have expected it to be checked in
> blkdev_dio_aligned().

That would require a change to the iov_iter_alignment API, or a new function
entirely.
 
> EFAULT isn't the correct error code for this case; it should be EINVAL as is
> normally the case for bad alignment.  See the man pages for read and write.

The EFAULT was just to get the do-while loop to break out. The callers in this
patch still return EINVAL when it sees the iov_iter hasn't advanced to the end.

But there are some cases where the EFAULT would be returned to the user. Let me
see how difficult it would be catch it early in blkdev_dio_aligned().

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv4 8/9] block: relax direct io memory alignment
  2022-05-26 18:55         ` Keith Busch
@ 2022-05-26 20:32           ` Keith Busch
  2022-05-31  6:30             ` Christoph Hellwig
  0 siblings, 1 reply; 29+ messages in thread
From: Keith Busch @ 2022-05-26 20:32 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Keith Busch, linux-fsdevel, linux-block, axboe, Kernel Team, hch,
	bvanassche, damien.lemoal, pankydev8

On Thu, May 26, 2022 at 12:55:50PM -0600, Keith Busch wrote:
> 
> Let me see how difficult it would be catch it early in blkdev_dio_aligned().

Something like this appears to work:

---
diff --git a/block/fops.c b/block/fops.c
index 6ecbccc552b9..20763a143991 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -47,7 +47,8 @@ static int blkdev_dio_aligned(struct block_device *bdev, loff_t pos,
 {
 	if ((pos | iov_iter_count(iter)) & (bdev_logical_block_size(bdev) - 1))
 		return -EINVAL;
-	if (iov_iter_alignment(iter) & bdev_dma_alignment(bdev))
+	if (!iov_iter_aligned(iter, bdev_dma_alignment(bdev),
+			      bdev_logical_block_size(bdev) - 1))
 		return -EINVAL;
 	return 0;
 }
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 739285fe5a2f..a9cbf90d8dcd 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -219,6 +219,8 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
 #endif
 
 size_t iov_iter_zero(size_t bytes, struct iov_iter *);
+bool iov_iter_aligned(const struct iov_iter *i, unsigned addr_mask,
+			unsigned len_mask);
 unsigned long iov_iter_alignment(const struct iov_iter *i);
 unsigned long iov_iter_gap_alignment(const struct iov_iter *i);
 void iov_iter_init(struct iov_iter *i, unsigned int direction, const struct iovec *iov,
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 6dd5330f7a99..6a98bbd56408 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1268,6 +1268,89 @@ void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count)
 }
 EXPORT_SYMBOL(iov_iter_discard);
 
+static bool iov_iter_aligned_iovec(const struct iov_iter *i, unsigned addr_mask,
+				   unsigned len_mask)
+{
+	size_t size = i->count;
+	size_t skip = i->iov_offset;
+	unsigned k;
+
+	for (k = 0; k < i->nr_segs; k++, skip = 0) {
+		size_t len = i->iov[k].iov_len - skip;
+
+		if (len > size)
+			len = size;
+		if (len & len_mask)
+			return false;
+		if ((unsigned long)(i->iov[k].iov_base + skip) & addr_mask)
+			return false;
+
+		size -= len;
+		if (!size)
+			break;
+	}
+	return true;
+}
+
+static bool iov_iter_aligned_bvec(const struct iov_iter *i, unsigned addr_mask,
+				  unsigned len_mask)
+{
+	size_t size = i->count;
+	unsigned skip = i->iov_offset;
+	unsigned k;
+
+	for (k = 0; k < i->nr_segs; k++, skip = 0) {
+		size_t len = i->bvec[k].bv_len - skip;
+
+		if (len > size)
+			len = size;
+		if (len & len_mask)
+			return false;
+		if ((unsigned long)(i->bvec[k].bv_offset + skip) & addr_mask)
+			return false;
+
+		size -= len;
+		if (!size)
+			break;
+	}
+	return true;
+}
+
+bool iov_iter_aligned(const struct iov_iter *i, unsigned addr_mask,
+		      unsigned len_mask)
+{
+	if (likely(iter_is_iovec(i) || iov_iter_is_kvec(i)))
+		return iov_iter_aligned_iovec(i, addr_mask, len_mask);
+
+	if (iov_iter_is_bvec(i))
+		return iov_iter_aligned_bvec(i, addr_mask, len_mask);
+
+	if (iov_iter_is_pipe(i)) {
+		unsigned int p_mask = i->pipe->ring_size - 1;
+		size_t size = i->count;
+
+		if (size & len_mask)
+			return false;
+		if (size && allocated(&i->pipe->bufs[i->head & p_mask])) {
+			if (i->iov_offset & addr_mask)
+				return false;
+		}
+
+		return true;
+	}
+
+	if (iov_iter_is_xarray(i)) {
+		if (i->count & len_mask)
+			return false;
+		if ((i->xarray_start + i->iov_offset) & addr_mask)
+			return false;
+	}
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(iov_iter_aligned);
+
 static unsigned long iov_iter_alignment_iovec(const struct iov_iter *i)
 {
 	unsigned long res = 0;
--

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCHv4 7/9] block/bounce: count bytes instead of sectors
  2022-05-26  1:06 ` [PATCHv4 7/9] block/bounce: " Keith Busch
  2022-05-26  1:58   ` Damien Le Moal
@ 2022-05-30 15:08   ` Pankaj Raghav
  2022-05-31  6:26   ` Christoph Hellwig
  2 siblings, 0 replies; 29+ messages in thread
From: Pankaj Raghav @ 2022-05-30 15:08 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-fsdevel, linux-block, axboe, Kernel Team, hch, bvanassche,
	damien.lemoal, ebiggers, Keith Busch

On Wed, May 25, 2022 at 06:06:11PM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Individual bv_len's may not be a sector size.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
> v3->v4:
> 
>   Use sector shift
> 
>   Add comment explaing the ALIGN_DOWN
> 
>   Use unsigned int type for counting bytes
> 
>  block/bounce.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/block/bounce.c b/block/bounce.c
> index 8f7b6fe3b4db..f6ae21ec2a70 100644
> --- a/block/bounce.c
> +++ b/block/bounce.c
> @@ -205,19 +205,25 @@ void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig)
>  	int rw = bio_data_dir(*bio_orig);
>  	struct bio_vec *to, from;
>  	struct bvec_iter iter;
> -	unsigned i = 0;
> +	unsigned i = 0, bytes = 0;
>  	bool bounce = false;
> -	int sectors = 0;
> +	int sectors;
>  
>  	bio_for_each_segment(from, *bio_orig, iter) {
>  		if (i++ < BIO_MAX_VECS)
> -			sectors += from.bv_len >> 9;
> +			bytes += from.bv_len;
>  		if (PageHighMem(from.bv_page))
>  			bounce = true;
>  	}
>  	if (!bounce)
>  		return;
>  
> +	/*
> +	 * If the original has more than BIO_MAX_VECS biovecs, the total bytes
> +	 * may not be block size aligned. Align down to ensure both sides of
> +	 * the split bio are appropriately sized.
> +	 */
> +	sectors = ALIGN_DOWN(bytes, queue_logical_block_size(q)) >> SECTOR_SHIFT;
>  	if (sectors < bio_sectors(*bio_orig)) {
>  		bio = bio_split(*bio_orig, sectors, GFP_NOIO, &bounce_bio_split);
>  		bio_chain(bio, *bio_orig);
> -- 
> 2.30.2
> 

Looks good,
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
-- 
Pankaj Raghav

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv4 1/9] block: fix infiniate loop for invalid zone append
  2022-05-26  1:06 ` [PATCHv4 1/9] block: fix infiniate loop for invalid zone append Keith Busch
  2022-05-26  1:48   ` Damien Le Moal
@ 2022-05-31  6:19   ` Christoph Hellwig
  1 sibling, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2022-05-31  6:19 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-fsdevel, linux-block, axboe, Kernel Team, hch, bvanassche,
	damien.lemoal, ebiggers, pankydev8, Keith Busch

On Wed, May 25, 2022 at 06:06:05PM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Returning 0 early from __bio_iov_append_get_pages() for the
> max_append_sectors warning just creates an infinite loop since 0 means
> success, and the bio will never fill from the unadvancing iov_iter. We
> could turn the return into an error value, but it will already be turned
> into an error value later on, so just remove the warning. Clearly no one
> ever hit it anyway.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv4 5/9] block: add a helper function for dio alignment
  2022-05-26  1:54   ` Damien Le Moal
@ 2022-05-31  6:22     ` Christoph Hellwig
  0 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2022-05-31  6:22 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Keith Busch, linux-fsdevel, linux-block, axboe, Kernel Team, hch,
	bvanassche, ebiggers, pankydev8, Keith Busch

On Thu, May 26, 2022 at 10:54:02AM +0900, Damien Le Moal wrote:
> Nit: The name of this helper really suggest a bool return, which would be a more
> flexible interface I think:
> 
> 	if (!blkdev_dio_aligned(bdev, pos, iter))
> 		return -EINVAL; <-- or whatever error code the caller wants.
> 
> And that will allow you to get rid of the ret variable in a least
> __blkdev_direct_IO_async.

I agree, a bool return looks more natural here.

Otherwise this looks good to me.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv4 6/9] block/merge: count bytes instead of sectors
  2022-05-26  1:06 ` [PATCHv4 6/9] block/merge: count bytes instead of sectors Keith Busch
  2022-05-26  1:57   ` Damien Le Moal
@ 2022-05-31  6:25   ` Christoph Hellwig
  1 sibling, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2022-05-31  6:25 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-fsdevel, linux-block, axboe, Kernel Team, hch, bvanassche,
	damien.lemoal, ebiggers, pankydev8, Keith Busch

On Wed, May 25, 2022 at 06:06:10PM -0700, Keith Busch wrote:
> +	/*
> +	 * Individual bvecs may not be logical block aligned, so round down to
> +	 * that size to ensure both sides of the split bio are appropriately
> +	 * sized.
> +	 */

Maybe I'd word this as:

	/*
	 * Individual bvecs may not be logical block aligned.  Round down
	 * the split size so that each bio is properly sector size aligned,
	 * even if we do not use the full hardware limits.
	 */

Otherwise this looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv4 7/9] block/bounce: count bytes instead of sectors
  2022-05-26  1:06 ` [PATCHv4 7/9] block/bounce: " Keith Busch
  2022-05-26  1:58   ` Damien Le Moal
  2022-05-30 15:08   ` Pankaj Raghav
@ 2022-05-31  6:26   ` Christoph Hellwig
  2 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2022-05-31  6:26 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-fsdevel, linux-block, axboe, Kernel Team, hch, bvanassche,
	damien.lemoal, ebiggers, pankydev8, Keith Busch

On Wed, May 25, 2022 at 06:06:11PM -0700, Keith Busch wrote:
> +	/*
> +	 * If the original has more than BIO_MAX_VECS biovecs, the total bytes
> +	 * may not be block size aligned. Align down to ensure both sides of
> +	 * the split bio are appropriately sized.
> +	 */

Same comment on the comment as fo the previous patch.

> +	sectors = ALIGN_DOWN(bytes, queue_logical_block_size(q)) >> SECTOR_SHIFT;

Overly long line here.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv4 9/9] fs: add support for dma aligned direct-io
  2022-05-26  1:06 ` [PATCHv4 9/9] fs: add support for dma aligned direct-io Keith Busch
  2022-05-26  2:01   ` Damien Le Moal
@ 2022-05-31  6:29   ` Christoph Hellwig
  1 sibling, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2022-05-31  6:29 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-fsdevel, linux-block, axboe, Kernel Team, hch, bvanassche,
	damien.lemoal, ebiggers, pankydev8, Keith Busch

On Wed, May 25, 2022 at 06:06:13PM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Use the address alignment requirements from the hardware for direct io
> instead of requiring addresses be aligned to the block size.

I'd suggest dropping the legacy direct-io.c change.  Anyone who wants
new features really should be using the iomap code, which also
significantly reduced the amount of testing needed here and the chance
of regressions.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv4 8/9] block: relax direct io memory alignment
  2022-05-26 20:32           ` Keith Busch
@ 2022-05-31  6:30             ` Christoph Hellwig
  0 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2022-05-31  6:30 UTC (permalink / raw)
  To: Keith Busch
  Cc: Eric Biggers, Keith Busch, linux-fsdevel, linux-block, axboe,
	Kernel Team, hch, bvanassche, damien.lemoal, pankydev8

On Thu, May 26, 2022 at 02:32:12PM -0600, Keith Busch wrote:
> On Thu, May 26, 2022 at 12:55:50PM -0600, Keith Busch wrote:
> > 
> > Let me see how difficult it would be catch it early in blkdev_dio_aligned().
> 
> Something like this appears to work:

This looks reasonable to me.  Nits below:

> +	if (!iov_iter_aligned(iter, bdev_dma_alignment(bdev),
> +			      bdev_logical_block_size(bdev) - 1))

This probably wants a helper so that it can also be reused by e.g.
iomap.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2022-05-31  6:30 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-26  1:06 [PATCHv4 0/9] direct io dma alignment Keith Busch
2022-05-26  1:06 ` [PATCHv4 1/9] block: fix infiniate loop for invalid zone append Keith Busch
2022-05-26  1:48   ` Damien Le Moal
2022-05-31  6:19   ` Christoph Hellwig
2022-05-26  1:06 ` [PATCHv4 2/9] block/bio: remove duplicate append pages code Keith Busch
2022-05-26  1:06 ` [PATCHv4 3/9] block: export dma_alignment attribute Keith Busch
2022-05-26  7:30   ` Eric Biggers
2022-05-26  1:06 ` [PATCHv4 4/9] block: introduce bdev_dma_alignment helper Keith Busch
2022-05-26  1:06 ` [PATCHv4 5/9] block: add a helper function for dio alignment Keith Busch
2022-05-26  1:54   ` Damien Le Moal
2022-05-31  6:22     ` Christoph Hellwig
2022-05-26  1:06 ` [PATCHv4 6/9] block/merge: count bytes instead of sectors Keith Busch
2022-05-26  1:57   ` Damien Le Moal
2022-05-31  6:25   ` Christoph Hellwig
2022-05-26  1:06 ` [PATCHv4 7/9] block/bounce: " Keith Busch
2022-05-26  1:58   ` Damien Le Moal
2022-05-30 15:08   ` Pankaj Raghav
2022-05-31  6:26   ` Christoph Hellwig
2022-05-26  1:06 ` [PATCHv4 8/9] block: relax direct io memory alignment Keith Busch
2022-05-26  2:03   ` Damien Le Moal
2022-05-26  7:29   ` Eric Biggers
2022-05-26 13:50     ` Keith Busch
2022-05-26 18:13       ` Eric Biggers
2022-05-26 18:55         ` Keith Busch
2022-05-26 20:32           ` Keith Busch
2022-05-31  6:30             ` Christoph Hellwig
2022-05-26  1:06 ` [PATCHv4 9/9] fs: add support for dma aligned direct-io Keith Busch
2022-05-26  2:01   ` Damien Le Moal
2022-05-31  6:29   ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.