* [PATCH 00/10] Support DMA segments smaller than the page size
@ 2022-10-19 22:23 Bart Van Assche
  2022-10-19 22:23 ` [PATCH 01/10] block: Remove request.write_hint Bart Van Assche
                   ` (9 more replies)
  0 siblings, 10 replies; 21+ messages in thread
From: Bart Van Assche @ 2022-10-19 22:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Bart Van Assche

Hi Jens,

We (the Android team) would like to support storage controllers with a segment
size of 4 KiB in combination with a 16 KiB page size. Hence this patch series,
which adds support for DMA segments smaller than the page size. In addition, a
few cleanup patches for the block layer are included. Please consider this
patch series for the kernel v6.2 merge window.
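
As a concrete example of the arithmetic involved: with a 16 KiB page size and
a 4 KiB maximum segment size, a single page spans up to four DMA segments, so
the block layer can no longer assume that a page maps onto at most one segment
when counting or splitting segments.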

Thanks,

Bart.

Bart Van Assche (10):
  block: Remove request.write_hint
  block: Constify most queue limits pointers
  block: Micro-optimize get_max_segment_size()
  block: Add support for small segments in blk_rq_map_user_iov()
  block: Introduce QUEUE_FLAG_SUB_PAGE_SEGMENTS
  block: Fix the number of segment calculations
  block: Add support for segments smaller than the page size
  scsi: core: Set the SUB_PAGE_SEGMENTS request queue flag
  scsi_debug: Support configuring the maximum segment size
  null_blk: Support configuring the maximum segment size

 block/blk-map.c                   | 43 +++++++++++++++++++++-----
 block/blk-merge.c                 | 50 +++++++++++++++++++------------
 block/blk-mq.c                    |  2 ++
 block/blk-settings.c              | 16 +++++-----
 block/blk.h                       | 17 +++++++----
 drivers/block/null_blk/main.c     | 20 +++++++++++--
 drivers/block/null_blk/null_blk.h |  1 +
 drivers/scsi/scsi_debug.c         |  3 ++
 drivers/scsi/scsi_lib.c           |  2 ++
 include/linux/blk-mq.h            |  1 -
 include/linux/blkdev.h            |  3 ++
 11 files changed, 115 insertions(+), 43 deletions(-)



* [PATCH 01/10] block: Remove request.write_hint
  2022-10-19 22:23 [PATCH 00/10] Support DMA segments smaller than the page size Bart Van Assche
@ 2022-10-19 22:23 ` Bart Van Assche
  2022-10-21  1:59   ` Ming Lei
  2022-10-19 22:23 ` [PATCH 02/10] block: Constify most queue limits pointers Bart Van Assche
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2022-10-19 22:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Bart Van Assche, Ming Lei

Commit c75e707fe1aa ("block: remove the per-bio/request write hint")
removed all code that uses the struct request write_hint member. Hence
also remove 'write_hint' itself.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 include/linux/blk-mq.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index ba18e9bdb799..569053ed959d 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -140,7 +140,6 @@ struct request {
 	struct blk_crypto_keyslot *crypt_keyslot;
 #endif
 
-	unsigned short write_hint;
 	unsigned short ioprio;
 
 	enum mq_rq_state state;


* [PATCH 02/10] block: Constify most queue limits pointers
  2022-10-19 22:23 [PATCH 00/10] Support DMA segments smaller than the page size Bart Van Assche
  2022-10-19 22:23 ` [PATCH 01/10] block: Remove request.write_hint Bart Van Assche
@ 2022-10-19 22:23 ` Bart Van Assche
  2022-10-21  2:01   ` Ming Lei
  2022-10-19 22:23 ` [PATCH 03/10] block: Micro-optimize get_max_segment_size() Bart Van Assche
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2022-10-19 22:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Ming Lei, Keith Busch

Document which functions do not modify the queue limits.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Keith Busch <kbusch@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-map.c      |  2 +-
 block/blk-merge.c    | 29 ++++++++++++++++-------------
 block/blk-settings.c |  6 +++---
 block/blk.h          | 11 ++++++-----
 4 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/block/blk-map.c b/block/blk-map.c
index 34735626b00f..46688e70b141 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -555,7 +555,7 @@ static int blk_rq_map_user_bvec(struct request *rq, const struct iov_iter *iter)
 	size_t nr_iter = iov_iter_count(iter);
 	size_t nr_segs = iter->nr_segs;
 	struct bio_vec *bvecs, *bvprvp = NULL;
-	struct queue_limits *lim = &q->limits;
+	const struct queue_limits *lim = &q->limits;
 	unsigned int nsegs = 0, bytes = 0;
 	struct bio *bio;
 	size_t i;
diff --git a/block/blk-merge.c b/block/blk-merge.c
index ff04e9290715..58fdc3f8905b 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -100,13 +100,14 @@ static inline bool req_gap_front_merge(struct request *req, struct bio *bio)
  * is defined as 'unsigned int', meantime it has to be aligned to with the
  * logical block size, which is the minimum accepted unit by hardware.
  */
-static unsigned int bio_allowed_max_sectors(struct queue_limits *lim)
+static unsigned int bio_allowed_max_sectors(const struct queue_limits *lim)
 {
 	return round_down(UINT_MAX, lim->logical_block_size) >> SECTOR_SHIFT;
 }
 
-static struct bio *bio_split_discard(struct bio *bio, struct queue_limits *lim,
-		unsigned *nsegs, struct bio_set *bs)
+static struct bio *bio_split_discard(struct bio *bio,
+				     const struct queue_limits *lim,
+				     unsigned *nsegs, struct bio_set *bs)
 {
 	unsigned int max_discard_sectors, granularity;
 	sector_t tmp;
@@ -146,7 +147,8 @@ static struct bio *bio_split_discard(struct bio *bio, struct queue_limits *lim,
 }
 
 static struct bio *bio_split_write_zeroes(struct bio *bio,
-		struct queue_limits *lim, unsigned *nsegs, struct bio_set *bs)
+					  const struct queue_limits *lim,
+					  unsigned *nsegs, struct bio_set *bs)
 {
 	*nsegs = 0;
 	if (!lim->max_write_zeroes_sectors)
@@ -165,7 +167,7 @@ static struct bio *bio_split_write_zeroes(struct bio *bio,
  * aligned to a physical block boundary.
  */
 static inline unsigned get_max_io_size(struct bio *bio,
-		struct queue_limits *lim)
+				       const struct queue_limits *lim)
 {
 	unsigned pbs = lim->physical_block_size >> SECTOR_SHIFT;
 	unsigned lbs = lim->logical_block_size >> SECTOR_SHIFT;
@@ -184,7 +186,7 @@ static inline unsigned get_max_io_size(struct bio *bio,
 	return max_sectors & ~(lbs - 1);
 }
 
-static inline unsigned get_max_segment_size(struct queue_limits *lim,
+static inline unsigned get_max_segment_size(const struct queue_limits *lim,
 		struct page *start_page, unsigned long offset)
 {
 	unsigned long mask = lim->seg_boundary_mask;
@@ -219,9 +221,9 @@ static inline unsigned get_max_segment_size(struct queue_limits *lim,
  * *@nsegs segments and *@sectors sectors would make that bio unacceptable for
  * the block driver.
  */
-static bool bvec_split_segs(struct queue_limits *lim, const struct bio_vec *bv,
-		unsigned *nsegs, unsigned *bytes, unsigned max_segs,
-		unsigned max_bytes)
+static bool bvec_split_segs(const struct queue_limits *lim,
+		const struct bio_vec *bv, unsigned *nsegs, unsigned *bytes,
+		unsigned max_segs, unsigned max_bytes)
 {
 	unsigned max_len = min(max_bytes, UINT_MAX) - *bytes;
 	unsigned len = min(bv->bv_len, max_len);
@@ -267,7 +269,7 @@ static bool bvec_split_segs(struct queue_limits *lim, const struct bio_vec *bv,
  * responsible for ensuring that @bs is only destroyed after processing of the
  * split bio has finished.
  */
-static struct bio *bio_split_rw(struct bio *bio, struct queue_limits *lim,
+static struct bio *bio_split_rw(struct bio *bio, const struct queue_limits *lim,
 		unsigned *segs, struct bio_set *bs, unsigned max_bytes)
 {
 	struct bio_vec bv, bvprv, *bvprvp = NULL;
@@ -331,8 +333,9 @@ static struct bio *bio_split_rw(struct bio *bio, struct queue_limits *lim,
  * The split bio is allocated from @q->bio_split, which is provided by the
  * block layer.
  */
-struct bio *__bio_split_to_limits(struct bio *bio, struct queue_limits *lim,
-		       unsigned int *nr_segs)
+struct bio *__bio_split_to_limits(struct bio *bio,
+				  const struct queue_limits *lim,
+				  unsigned int *nr_segs)
 {
 	struct bio_set *bs = &bio->bi_bdev->bd_disk->bio_split;
 	struct bio *split;
@@ -377,7 +380,7 @@ struct bio *__bio_split_to_limits(struct bio *bio, struct queue_limits *lim,
  */
 struct bio *bio_split_to_limits(struct bio *bio)
 {
-	struct queue_limits *lim = &bdev_get_queue(bio->bi_bdev)->limits;
+	const struct queue_limits *lim = &bdev_get_queue(bio->bi_bdev)->limits;
 	unsigned int nr_segs;
 
 	if (bio_may_exceed_limits(bio, lim))
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 8bb9eef5310e..1cba5c2a2796 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -481,7 +481,7 @@ void blk_queue_io_opt(struct request_queue *q, unsigned int opt)
 }
 EXPORT_SYMBOL(blk_queue_io_opt);
 
-static int queue_limit_alignment_offset(struct queue_limits *lim,
+static int queue_limit_alignment_offset(const struct queue_limits *lim,
 		sector_t sector)
 {
 	unsigned int granularity = max(lim->physical_block_size, lim->io_min);
@@ -491,8 +491,8 @@ static int queue_limit_alignment_offset(struct queue_limits *lim,
 	return (granularity + lim->alignment_offset - alignment) % granularity;
 }
 
-static unsigned int queue_limit_discard_alignment(struct queue_limits *lim,
-		sector_t sector)
+static unsigned int queue_limit_discard_alignment(
+		const struct queue_limits *lim, sector_t sector)
 {
 	unsigned int alignment, granularity, offset;
 
diff --git a/block/blk.h b/block/blk.h
index d6ea0d1a6db0..7f9e089ab1f7 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -104,7 +104,7 @@ static inline bool biovec_phys_mergeable(struct request_queue *q,
 	return true;
 }
 
-static inline bool __bvec_gap_to_prev(struct queue_limits *lim,
+static inline bool __bvec_gap_to_prev(const struct queue_limits *lim,
 		struct bio_vec *bprv, unsigned int offset)
 {
 	return (offset & lim->virt_boundary_mask) ||
@@ -115,7 +115,7 @@ static inline bool __bvec_gap_to_prev(struct queue_limits *lim,
  * Check if adding a bio_vec after bprv with offset would create a gap in
  * the SG list. Most drivers don't care about this, but some do.
  */
-static inline bool bvec_gap_to_prev(struct queue_limits *lim,
+static inline bool bvec_gap_to_prev(const struct queue_limits *lim,
 		struct bio_vec *bprv, unsigned int offset)
 {
 	if (!lim->virt_boundary_mask)
@@ -297,7 +297,7 @@ ssize_t part_timeout_store(struct device *, struct device_attribute *,
 				const char *, size_t);
 
 static inline bool bio_may_exceed_limits(struct bio *bio,
-		struct queue_limits *lim)
+					 const struct queue_limits *lim)
 {
 	switch (bio_op(bio)) {
 	case REQ_OP_DISCARD:
@@ -320,8 +320,9 @@ static inline bool bio_may_exceed_limits(struct bio *bio,
 		bio->bi_io_vec->bv_len + bio->bi_io_vec->bv_offset > PAGE_SIZE;
 }
 
-struct bio *__bio_split_to_limits(struct bio *bio, struct queue_limits *lim,
-		       unsigned int *nr_segs);
+struct bio *__bio_split_to_limits(struct bio *bio,
+				  const struct queue_limits *lim,
+				  unsigned int *nr_segs);
 int ll_back_merge_fn(struct request *req, struct bio *bio,
 		unsigned int nr_segs);
 bool blk_attempt_req_merge(struct request_queue *q, struct request *rq,


* [PATCH 03/10] block: Micro-optimize get_max_segment_size()
  2022-10-19 22:23 [PATCH 00/10] Support DMA segments smaller than the page size Bart Van Assche
  2022-10-19 22:23 ` [PATCH 01/10] block: Remove request.write_hint Bart Van Assche
  2022-10-19 22:23 ` [PATCH 02/10] block: Constify most queue limits pointers Bart Van Assche
@ 2022-10-19 22:23 ` Bart Van Assche
  2022-10-21  2:17   ` Ming Lei
  2022-10-19 22:23 ` [PATCH 04/10] block: Add support for small segments in blk_rq_map_user_iov() Bart Van Assche
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2022-10-19 22:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Ming Lei,
	Keith Busch, Steven Rostedt, Guenter Roeck

This patch removes a conditional jump from get_max_segment_size(). The
x86-64 assembler code for this function without this patch is as follows:

206             return min_not_zero(mask - offset + 1,
   0x0000000000000118 <+72>:    not    %rax
   0x000000000000011b <+75>:    and    0x8(%r10),%rax
   0x000000000000011f <+79>:    add    $0x1,%rax
   0x0000000000000123 <+83>:    je     0x138 <bvec_split_segs+104>
   0x0000000000000125 <+85>:    cmp    %rdx,%rax
   0x0000000000000128 <+88>:    mov    %rdx,%r12
   0x000000000000012b <+91>:    cmovbe %rax,%r12
   0x000000000000012f <+95>:    test   %rdx,%rdx
   0x0000000000000132 <+98>:    mov    %eax,%edx
   0x0000000000000134 <+100>:   cmovne %r12d,%edx

With this patch applied:

206             return min(mask - offset, (unsigned long)lim->max_segment_size - 1) + 1;
   0x000000000000003f <+63>:    mov    0x28(%rdi),%ebp
   0x0000000000000042 <+66>:    not    %rax
   0x0000000000000045 <+69>:    and    0x8(%rdi),%rax
   0x0000000000000049 <+73>:    sub    $0x1,%rbp
   0x000000000000004d <+77>:    cmp    %rbp,%rax
   0x0000000000000050 <+80>:    cmova  %rbp,%rax
   0x0000000000000054 <+84>:    add    $0x1,%eax

Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-merge.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 58fdc3f8905b..35a8f75cc45d 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -186,6 +186,14 @@ static inline unsigned get_max_io_size(struct bio *bio,
 	return max_sectors & ~(lbs - 1);
 }
 
+/**
+ * get_max_segment_size() - maximum number of bytes to add as a single segment
+ * @lim: Request queue limits.
+ * @start_page: See below.
+ * @offset: Offset from @start_page where to add a segment.
+ *
+ * Returns the maximum number of bytes that can be added as a single segment.
+ */
 static inline unsigned get_max_segment_size(const struct queue_limits *lim,
 		struct page *start_page, unsigned long offset)
 {
@@ -194,11 +202,10 @@ static inline unsigned get_max_segment_size(const struct queue_limits *lim,
 	offset = mask & (page_to_phys(start_page) + offset);
 
 	/*
-	 * overflow may be triggered in case of zero page physical address
-	 * on 32bit arch, use queue's max segment size when that happens.
+	 * Prevent an overflow if mask = ULONG_MAX and offset = 0 by adding 1
+	 * after having calculated the minimum.
 	 */
-	return min_not_zero(mask - offset + 1,
-			(unsigned long)lim->max_segment_size);
+	return min(mask - offset, (unsigned long)lim->max_segment_size - 1) + 1;
 }
 
 /**
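
To illustrate the overflow that the new expression avoids, here is a minimal
stand-alone sketch in plain C (not part of the patch; the function names and
constants are made up for demonstration):

  #include <limits.h>
  #include <stdio.h>

  /*
   * Model of the old expression: mask - offset + 1 wraps to 0 when
   * mask == ULONG_MAX and offset == 0, which is why min_not_zero() and
   * hence a conditional jump were needed.
   */
  static unsigned long old_expr(unsigned long mask, unsigned long offset,
                                unsigned long max_segment_size)
  {
          unsigned long a = mask - offset + 1;

          if (a == 0)
                  return max_segment_size;
          return a < max_segment_size ? a : max_segment_size;
  }

  /*
   * Model of the new expression: subtracting 1 from max_segment_size and
   * adding 1 after taking the minimum yields the same result without the
   * wrap and hence without the conditional jump.
   */
  static unsigned long new_expr(unsigned long mask, unsigned long offset,
                                unsigned long max_segment_size)
  {
          unsigned long a = mask - offset;
          unsigned long b = max_segment_size - 1;

          return (a < b ? a : b) + 1;
  }

  int main(void)
  {
          /* Worst case: all-ones boundary mask and zero offset. */
          printf("%lu %lu\n", old_expr(ULONG_MAX, 0, 65536),
                 new_expr(ULONG_MAX, 0, 65536)); /* prints "65536 65536" */
          return 0;
  }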


* [PATCH 04/10] block: Add support for small segments in blk_rq_map_user_iov()
  2022-10-19 22:23 [PATCH 00/10] Support DMA segments smaller than the page size Bart Van Assche
                   ` (2 preceding siblings ...)
  2022-10-19 22:23 ` [PATCH 03/10] block: Micro-optimize get_max_segment_size() Bart Van Assche
@ 2022-10-19 22:23 ` Bart Van Assche
  2022-10-19 22:23 ` [PATCH 05/10] block: Introduce QUEUE_FLAG_SUB_PAGE_SEGMENTS Bart Van Assche
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2022-10-19 22:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Ming Lei, Keith Busch

In preparation for changing the return value of bio_add_hw_page() to a value
in the range [0, len], make blk_rq_map_user_iov() fall back to copying data if
mapping the data is not possible due to the segment limit.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Keith Busch <kbusch@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-map.c | 27 ++++++++++++++++++++++-----
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/block/blk-map.c b/block/blk-map.c
index 46688e70b141..4505307d758e 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -304,17 +304,26 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 		else {
 			for (j = 0; j < npages; j++) {
 				struct page *page = pages[j];
-				unsigned int n = PAGE_SIZE - offs;
+				unsigned int n = PAGE_SIZE - offs, added;
 				bool same_page = false;
 
 				if (n > bytes)
 					n = bytes;
 
-				if (!bio_add_hw_page(rq->q, bio, page, n, offs,
-						     max_sectors, &same_page)) {
+				added = bio_add_hw_page(rq->q, bio, page, n,
+						offs, max_sectors, &same_page);
+				if (added == 0) {
 					if (same_page)
 						put_page(page);
 					break;
+				} else if (added != n) {
+					/*
+					 * The segment size is smaller than the
+					 * page size and an iov exceeds the
+					 * segment size. Give up.
+					 */
+					ret = -EREMOTEIO;
+					goto out_unmap;
 				}
 
 				bytes -= n;
@@ -654,10 +663,18 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
 
 	i = *iter;
 	do {
-		if (copy)
+		if (copy) {
 			ret = bio_copy_user_iov(rq, map_data, &i, gfp_mask);
-		else
+		} else {
 			ret = bio_map_user_iov(rq, &i, gfp_mask);
+			/*
+			 * Fall back to copying the data if bio_map_user_iov()
+			 * returns -EREMOTEIO.
+			 */
+			if (ret == -EREMOTEIO)
+				ret = bio_copy_user_iov(rq, map_data, &i,
+							gfp_mask);
+		}
 		if (ret)
 			goto unmap_rq;
 		if (!bio)
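
A note on the design choice, as far as it can be read from the diff:
-EREMOTEIO acts as an in-kernel sentinel here. bio_map_user_iov() returns it
only when an iov element exceeds the segment size limit, and
blk_rq_map_user_iov() catches exactly that value and retries with the
bounce-buffer path (bio_copy_user_iov()), so callers see their data being
copied instead of a hard mapping failure.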


* [PATCH 05/10] block: Introduce QUEUE_FLAG_SUB_PAGE_SEGMENTS
  2022-10-19 22:23 [PATCH 00/10] Support DMA segments smaller than the page size Bart Van Assche
                   ` (3 preceding siblings ...)
  2022-10-19 22:23 ` [PATCH 04/10] block: Add support for small segments in blk_rq_map_user_iov() Bart Van Assche
@ 2022-10-19 22:23 ` Bart Van Assche
  2022-10-19 22:23 ` [PATCH 06/10] block: Fix the number of segment calculations Bart Van Assche
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2022-10-19 22:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Ming Lei, Keith Busch

Prepare for introducing support for segments smaller than the page size, one
driver at a time, by introducing the request queue flag
QUEUE_FLAG_SUB_PAGE_SEGMENTS. Although I am not aware of any storage
controller that restricts the segment size to 512 bytes, supporting 512 bytes
as the minimum segment size makes it easy to test small-segment support on
systems with PAGE_SIZE = 4096.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Keith Busch <kbusch@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-settings.c   | 10 ++++++----
 include/linux/blkdev.h |  3 +++
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 1cba5c2a2796..1b7687d0ece2 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -277,10 +277,12 @@ EXPORT_SYMBOL_GPL(blk_queue_max_discard_segments);
  **/
 void blk_queue_max_segment_size(struct request_queue *q, unsigned int max_size)
 {
-	if (max_size < PAGE_SIZE) {
-		max_size = PAGE_SIZE;
-		printk(KERN_INFO "%s: set to minimum %d\n",
-		       __func__, max_size);
+	unsigned int min_segment_size = blk_queue_sub_page_segments(q) ?
+		SECTOR_SIZE : PAGE_SIZE;
+
+	if (max_size < min_segment_size) {
+		max_size = min_segment_size;
+		printk(KERN_INFO "%s: set to minimum %d\n", __func__, max_size);
 	}
 
 	/* see blk_queue_virt_boundary() for the explanation */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 50e358a19d98..6757f836fd57 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -579,6 +579,7 @@ struct request_queue {
 #define QUEUE_FLAG_HCTX_ACTIVE	28	/* at least one blk-mq hctx is active */
 #define QUEUE_FLAG_NOWAIT       29	/* device supports NOWAIT */
 #define QUEUE_FLAG_SQ_SCHED     30	/* single queue style io dispatch */
+#define QUEUE_FLAG_SUB_PAGE_SEGMENTS 31	/* segments smaller than one page */
 
 #define QUEUE_FLAG_MQ_DEFAULT	((1UL << QUEUE_FLAG_IO_STAT) |		\
 				 (1UL << QUEUE_FLAG_SAME_COMP) |	\
@@ -619,6 +620,8 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
 #define blk_queue_pm_only(q)	atomic_read(&(q)->pm_only)
 #define blk_queue_registered(q)	test_bit(QUEUE_FLAG_REGISTERED, &(q)->queue_flags)
 #define blk_queue_sq_sched(q)	test_bit(QUEUE_FLAG_SQ_SCHED, &(q)->queue_flags)
+#define blk_queue_sub_page_segments(q)				\
+	test_bit(QUEUE_FLAG_SUB_PAGE_SEGMENTS, &(q)->queue_flags)
 
 extern void blk_set_pm_only(struct request_queue *q);
 extern void blk_clear_pm_only(struct request_queue *q);
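
As a sketch of how a driver is expected to opt in (hypothetical driver code;
the function name is made up, while the two block layer calls and the flag are
the ones used by this series):

  /*
   * Opt in to sub-page segments before requesting a segment size below
   * PAGE_SIZE. Without the flag, blk_queue_max_segment_size() clamps the
   * requested value to PAGE_SIZE.
   */
  static void example_init_queue(struct request_queue *q)
  {
          blk_queue_flag_set(QUEUE_FLAG_SUB_PAGE_SEGMENTS, q);
          blk_queue_max_segment_size(q, 512); /* now accepted as-is */
  }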


* [PATCH 06/10] block: Fix the number of segment calculations
  2022-10-19 22:23 [PATCH 00/10] Support DMA segments smaller than the page size Bart Van Assche
                   ` (4 preceding siblings ...)
  2022-10-19 22:23 ` [PATCH 05/10] block: Introduce QUEUE_FLAG_SUB_PAGE_SEGMENTS Bart Van Assche
@ 2022-10-19 22:23 ` Bart Van Assche
  2022-11-01 17:23   ` Bart Van Assche
  2022-10-19 22:23 ` [PATCH 07/10] block: Add support for segments smaller than the page size Bart Van Assche
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2022-10-19 22:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Ming Lei, Keith Busch

Since multi-page bvecs are supported, there can be multiple segments per
bvec (see also bvec_split_segs()). Hence this patch, which calculates the
number of segments per bvec instead of assuming that there is only one
segment per bvec.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Keith Busch <kbusch@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-map.c | 14 +++++++++++++-
 block/blk-mq.c  |  2 ++
 block/blk.h     |  3 +++
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/block/blk-map.c b/block/blk-map.c
index 4505307d758e..2976a8d68992 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -529,6 +529,18 @@ static struct bio *bio_copy_kern(struct request_queue *q, void *data,
 	return ERR_PTR(-ENOMEM);
 }
 
+/* Number of DMA segments required to transfer @bytes data. */
+unsigned int blk_segments(const struct queue_limits *limits, unsigned int bytes)
+{
+	const unsigned int mss = limits->max_segment_size;
+
+	if (bytes <= mss)
+		return 1;
+	if (is_power_of_2(mss))
+		return round_up(bytes, mss) >> ilog2(mss);
+	return (bytes + mss - 1) / mss;
+}
+
 /*
  * Append a bio to a passthrough request.  Only works if the bio can be merged
  * into the request based on the driver constraints.
@@ -540,7 +552,7 @@ int blk_rq_append_bio(struct request *rq, struct bio *bio)
 	unsigned int nr_segs = 0;
 
 	bio_for_each_bvec(bv, bio, iter)
-		nr_segs++;
+		nr_segs += blk_segments(&rq->q->limits, bv.bv_len);
 
 	if (!rq->bio) {
 		blk_rq_bio_prep(rq, bio, nr_segs);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8070b6c10e8d..82ae8099afdd 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2883,6 +2883,8 @@ void blk_mq_submit_bio(struct bio *bio)
 	bio = blk_queue_bounce(bio, q);
 	if (bio_may_exceed_limits(bio, &q->limits))
 		bio = __bio_split_to_limits(bio, &q->limits, &nr_segs);
+	else if (bio->bi_vcnt == 1)
+		nr_segs = blk_segments(&q->limits, bio->bi_io_vec[0].bv_len);
 
 	if (!bio_integrity_prep(bio))
 		return;
diff --git a/block/blk.h b/block/blk.h
index 7f9e089ab1f7..342ba7d86e87 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -81,6 +81,9 @@ struct bio_vec *bvec_alloc(mempool_t *pool, unsigned short *nr_vecs,
 		gfp_t gfp_mask);
 void bvec_free(mempool_t *pool, struct bio_vec *bv, unsigned short nr_vecs);
 
+unsigned int blk_segments(const struct queue_limits *limits,
+			  unsigned int bytes);
+
 static inline bool biovec_phys_mergeable(struct request_queue *q,
 		struct bio_vec *vec1, struct bio_vec *vec2)
 {
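
As a worked example of the arithmetic in blk_segments(), here is a stand-alone
model of the same computation (omitting the power-of-two fast path, which
yields the same result; the numbers are made up):

  /*
   * DMA segments needed to transfer @bytes given segment size limit @mss.
   * With mss = 4096: 4096 bytes -> 1 segment, 5000 bytes -> 2 segments and
   * 16384 bytes -> 4 segments, instead of the single segment per bvec that
   * was assumed before this patch.
   */
  static unsigned int segments_needed(unsigned int bytes, unsigned int mss)
  {
          if (bytes <= mss)
                  return 1;
          return (bytes + mss - 1) / mss; /* ceiling division */
  }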


* [PATCH 07/10] block: Add support for segments smaller than the page size
  2022-10-19 22:23 [PATCH 00/10] Support DMA segments smaller than the page size Bart Van Assche
                   ` (5 preceding siblings ...)
  2022-10-19 22:23 ` [PATCH 06/10] block: Fix the number of segment calculations Bart Van Assche
@ 2022-10-19 22:23 ` Bart Van Assche
  2022-10-19 22:23 ` [PATCH 08/10] scsi: core: Set the SUB_PAGE_SEGMENTS request queue flag Bart Van Assche
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2022-10-19 22:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Ming Lei, Keith Busch

Add the necessary checks in the bio splitting code for splitting bios
into segments smaller than the page size.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Keith Busch <kbusch@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-merge.c | 6 ++++--
 block/blk.h       | 3 ++-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 35a8f75cc45d..7badfbed09fc 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -294,7 +294,8 @@ static struct bio *bio_split_rw(struct bio *bio, const struct queue_limits *lim,
 		if (nsegs < lim->max_segments &&
 		    bytes + bv.bv_len <= max_bytes &&
 		    bv.bv_offset + bv.bv_len <= PAGE_SIZE) {
-			nsegs++;
+			/* single-page bvec optimization */
+			nsegs += blk_segments(lim, bv.bv_len);
 			bytes += bv.bv_len;
 		} else {
 			if (bvec_split_segs(lim, &bv, &nsegs, &bytes,
@@ -531,7 +532,8 @@ static int __blk_bios_map_sg(struct request_queue *q, struct bio *bio,
 			    __blk_segment_map_sg_merge(q, &bvec, &bvprv, sg))
 				goto next_bvec;
 
-			if (bvec.bv_offset + bvec.bv_len <= PAGE_SIZE)
+			if (bvec.bv_offset + bvec.bv_len <= PAGE_SIZE &&
+			    bvec.bv_len <= q->limits.max_segment_size)
 				nsegs += __blk_bvec_map_sg(bvec, sglist, sg);
 			else
 				nsegs += blk_bvec_map_sg(q, &bvec, sglist, sg);
diff --git a/block/blk.h b/block/blk.h
index 342ba7d86e87..04328211ae3b 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -312,7 +312,7 @@ static inline bool bio_may_exceed_limits(struct bio *bio,
 	}
 
 	/*
-	 * All drivers must accept single-segments bios that are <= PAGE_SIZE.
+	 * Check whether bio splitting should be performed.
 	 * This is a quick and dirty check that relies on the fact that
 	 * bi_io_vec[0] is always valid if a bio has data.  The check might
 	 * lead to occasional false negatives when bios are cloned, but compared
@@ -320,6 +320,7 @@ static inline bool bio_may_exceed_limits(struct bio *bio,
 	 * doesn't matter anyway.
 	 */
 	return lim->chunk_sectors || bio->bi_vcnt != 1 ||
+		bio->bi_io_vec->bv_len > lim->max_segment_size ||
 		bio->bi_io_vec->bv_len + bio->bi_io_vec->bv_offset > PAGE_SIZE;
 }
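
As a concrete example of the new check (hypothetical numbers): on a queue with
max_segment_size = 2048 and the SUB_PAGE_SEGMENTS flag set, a bio with a
single 4 KiB bvec previously passed bio_may_exceed_limits() and was counted as
one segment; with this patch the bv_len > max_segment_size test makes it take
the splitting path, where blk_segments() counts it as two segments.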
 


* [PATCH 08/10] scsi: core: Set the SUB_PAGE_SEGMENTS request queue flag
  2022-10-19 22:23 [PATCH 00/10] Support DMA segments smaller than the page size Bart Van Assche
                   ` (6 preceding siblings ...)
  2022-10-19 22:23 ` [PATCH 07/10] block: Add support for segments smaller than the page size Bart Van Assche
@ 2022-10-19 22:23 ` Bart Van Assche
  2022-10-19 22:23 ` [PATCH 09/10] scsi_debug: Support configuring the maximum segment size Bart Van Assche
  2022-10-19 22:23 ` [PATCH 10/10] null_blk: " Bart Van Assche
  9 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2022-10-19 22:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche,
	Martin K . Petersen, James E.J. Bottomley

This patch enables support for segments smaller than the page size.
No changes other than setting the SUB_PAGE_SEGMENTS flag are required
since the SCSI code uses blk_rq_map_sg().

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_lib.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 8b89fab7c420..821149a57fbf 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1877,6 +1877,8 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
 {
 	struct device *dev = shost->dma_dev;
 
+	blk_queue_flag_set(QUEUE_FLAG_SUB_PAGE_SEGMENTS, q);
+
 	/*
 	 * this limit is imposed by hardware restrictions
 	 */


* [PATCH 09/10] scsi_debug: Support configuring the maximum segment size
  2022-10-19 22:23 [PATCH 00/10] Support DMA segments smaller than the page size Bart Van Assche
                   ` (7 preceding siblings ...)
  2022-10-19 22:23 ` [PATCH 08/10] scsi: core: Set the SUB_PAGE_SEGMENTS request queue flag Bart Van Assche
@ 2022-10-19 22:23 ` Bart Van Assche
  2022-10-19 22:23 ` [PATCH 10/10] null_blk: " Bart Van Assche
  9 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2022-10-19 22:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Doug Gilbert,
	Martin K . Petersen, James E.J. Bottomley

Add a kernel module parameter for configuring the maximum segment size.
This patch enables testing SCSI support for segments smaller than the
page size.

Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_debug.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 697fc57bc711..1c7575440519 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -770,6 +770,7 @@ static int sdebug_sector_size = DEF_SECTOR_SIZE;
 static int sdeb_tur_ms_to_ready = DEF_TUR_MS_TO_READY;
 static int sdebug_virtual_gb = DEF_VIRTUAL_GB;
 static int sdebug_vpd_use_hostno = DEF_VPD_USE_HOSTNO;
+static unsigned int sdebug_max_segment_size = -1U;
 static unsigned int sdebug_lbpu = DEF_LBPU;
 static unsigned int sdebug_lbpws = DEF_LBPWS;
 static unsigned int sdebug_lbpws10 = DEF_LBPWS10;
@@ -5844,6 +5845,7 @@ module_param_named(ndelay, sdebug_ndelay, int, S_IRUGO | S_IWUSR);
 module_param_named(no_lun_0, sdebug_no_lun_0, int, S_IRUGO | S_IWUSR);
 module_param_named(no_rwlock, sdebug_no_rwlock, bool, S_IRUGO | S_IWUSR);
 module_param_named(no_uld, sdebug_no_uld, int, S_IRUGO);
+module_param_named(max_segment_size, sdebug_max_segment_size, uint, S_IRUGO);
 module_param_named(num_parts, sdebug_num_parts, int, S_IRUGO);
 module_param_named(num_tgts, sdebug_num_tgts, int, S_IRUGO | S_IWUSR);
 module_param_named(opt_blks, sdebug_opt_blks, int, S_IRUGO);
@@ -7804,6 +7806,7 @@ static int sdebug_driver_probe(struct device *dev)
 
 	sdebug_driver_template.can_queue = sdebug_max_queue;
 	sdebug_driver_template.cmd_per_lun = sdebug_max_queue;
+	sdebug_driver_template.max_segment_size = sdebug_max_segment_size;
 	if (!sdebug_clustering)
 		sdebug_driver_template.dma_boundary = PAGE_SIZE - 1;
 

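As a usage note, based only on the module parameter added above: loading the
driver with "modprobe scsi_debug max_segment_size=512" limits every DMA
segment to 512 bytes, which is enough to exercise the sub-page segment code
paths on a system with a 4 KiB page size.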

* [PATCH 10/10] null_blk: Support configuring the maximum segment size
  2022-10-19 22:23 [PATCH 00/10] Support DMA segments smaller than the page size Bart Van Assche
                   ` (8 preceding siblings ...)
  2022-10-19 22:23 ` [PATCH 09/10] scsi_debug: Support configuring the maximum segment size Bart Van Assche
@ 2022-10-19 22:23 ` Bart Van Assche
  2022-10-20 23:13   ` Damien Le Moal
  9 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2022-10-19 22:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Ming Lei,
	Damien Le Moal, Chaitanya Kulkarni, Johannes Thumshirn,
	Vincent Fu, Shin'ichiro Kawasaki, Yu Kuai

Add support for configuring the maximum segment size.

Add support for segments smaller than the page size.

This patch enables testing segments smaller than the page size with a
driver that does not call blk_rq_map_sg().

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/block/null_blk/main.c     | 20 +++++++++++++++++---
 drivers/block/null_blk/null_blk.h |  1 +
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
index 1f154f92f4c2..bc811ab52c4a 100644
--- a/drivers/block/null_blk/main.c
+++ b/drivers/block/null_blk/main.c
@@ -157,6 +157,10 @@ static int g_max_sectors;
 module_param_named(max_sectors, g_max_sectors, int, 0444);
 MODULE_PARM_DESC(max_sectors, "Maximum size of a command (in 512B sectors)");
 
+static unsigned int g_max_segment_size = 1UL << 31;
+module_param_named(max_segment_size, g_max_segment_size, int, 0444);
+MODULE_PARM_DESC(max_segment_size, "Maximum size of a segment in bytes");
+
 static unsigned int nr_devices = 1;
 module_param(nr_devices, uint, 0444);
 MODULE_PARM_DESC(nr_devices, "Number of devices to register");
@@ -409,6 +413,7 @@ NULLB_DEVICE_ATTR(home_node, uint, NULL);
 NULLB_DEVICE_ATTR(queue_mode, uint, NULL);
 NULLB_DEVICE_ATTR(blocksize, uint, NULL);
 NULLB_DEVICE_ATTR(max_sectors, uint, NULL);
+NULLB_DEVICE_ATTR(max_segment_size, uint, NULL);
 NULLB_DEVICE_ATTR(irqmode, uint, NULL);
 NULLB_DEVICE_ATTR(hw_queue_depth, uint, NULL);
 NULLB_DEVICE_ATTR(index, uint, NULL);
@@ -532,6 +537,7 @@ static struct configfs_attribute *nullb_device_attrs[] = {
 	&nullb_device_attr_queue_mode,
 	&nullb_device_attr_blocksize,
 	&nullb_device_attr_max_sectors,
+	&nullb_device_attr_max_segment_size,
 	&nullb_device_attr_irqmode,
 	&nullb_device_attr_hw_queue_depth,
 	&nullb_device_attr_index,
@@ -610,7 +616,8 @@ static ssize_t memb_group_features_show(struct config_item *item, char *page)
 	return snprintf(page, PAGE_SIZE,
 			"badblocks,blocking,blocksize,cache_size,"
 			"completion_nsec,discard,home_node,hw_queue_depth,"
-			"irqmode,max_sectors,mbps,memory_backed,no_sched,"
+			"irqmode,max_sectors,max_segment_size,mbps,"
+			"memory_backed,no_sched,"
 			"poll_queues,power,queue_mode,shared_tag_bitmap,size,"
 			"submit_queues,use_per_node_hctx,virt_boundary,zoned,"
 			"zone_capacity,zone_max_active,zone_max_open,"
@@ -673,6 +680,7 @@ static struct nullb_device *null_alloc_dev(void)
 	dev->queue_mode = g_queue_mode;
 	dev->blocksize = g_bs;
 	dev->max_sectors = g_max_sectors;
+	dev->max_segment_size = g_max_segment_size;
 	dev->irqmode = g_irqmode;
 	dev->hw_queue_depth = g_hw_queue_depth;
 	dev->blocking = g_blocking;
@@ -1214,6 +1222,8 @@ static int null_transfer(struct nullb *nullb, struct page *page,
 	unsigned int valid_len = len;
 	int err = 0;
 
+	WARN_ONCE(len > dev->max_segment_size, "%u > %u\n", len,
+		  dev->max_segment_size);
 	if (!is_write) {
 		if (dev->zoned)
 			valid_len = null_zone_valid_read_len(nullb,
@@ -1249,7 +1259,8 @@ static int null_handle_rq(struct nullb_cmd *cmd)
 
 	spin_lock_irq(&nullb->lock);
 	rq_for_each_segment(bvec, rq, iter) {
-		len = bvec.bv_len;
+		len = min(bvec.bv_len, nullb->dev->max_segment_size);
+		bvec.bv_len = len;
 		err = null_transfer(nullb, bvec.bv_page, len, bvec.bv_offset,
 				     op_is_write(req_op(rq)), sector,
 				     rq->cmd_flags & REQ_FUA);
@@ -1276,7 +1287,8 @@ static int null_handle_bio(struct nullb_cmd *cmd)
 
 	spin_lock_irq(&nullb->lock);
 	bio_for_each_segment(bvec, bio, iter) {
-		len = bvec.bv_len;
+		len = min(bvec.bv_len, nullb->dev->max_segment_size);
+		bvec.bv_len = len;
 		err = null_transfer(nullb, bvec.bv_page, len, bvec.bv_offset,
 				     op_is_write(bio_op(bio)), sector,
 				     bio->bi_opf & REQ_FUA);
@@ -2088,6 +2100,7 @@ static int null_add_dev(struct nullb_device *dev)
 	nullb->q->queuedata = nullb;
 	blk_queue_flag_set(QUEUE_FLAG_NONROT, nullb->q);
 	blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, nullb->q);
+	blk_queue_flag_set(QUEUE_FLAG_SUB_PAGE_SEGMENTS, nullb->q);
 
 	mutex_lock(&lock);
 	rv = ida_simple_get(&nullb_indexes, 0, 0, GFP_KERNEL);
@@ -2106,6 +2119,7 @@ static int null_add_dev(struct nullb_device *dev)
 	dev->max_sectors = min_t(unsigned int, dev->max_sectors,
 				 BLK_DEF_MAX_SECTORS);
 	blk_queue_max_hw_sectors(nullb->q, dev->max_sectors);
+	blk_queue_max_segment_size(nullb->q, dev->max_segment_size);
 
 	if (dev->virt_boundary)
 		blk_queue_virt_boundary(nullb->q, PAGE_SIZE - 1);
diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h
index 94ff68052b1e..6784ee9f5fda 100644
--- a/drivers/block/null_blk/null_blk.h
+++ b/drivers/block/null_blk/null_blk.h
@@ -102,6 +102,7 @@ struct nullb_device {
 	unsigned int queue_mode; /* block interface */
 	unsigned int blocksize; /* block size */
 	unsigned int max_sectors; /* Max sectors per command */
+	unsigned int max_segment_size; /* Max size of a single DMA segment. */
 	unsigned int irqmode; /* IRQ completion handler */
 	unsigned int hw_queue_depth; /* queue depth */
 	unsigned int index; /* index of the disk, only valid with a disk */
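
As a usage note, based only on the interfaces added above: loading the driver
with "modprobe null_blk max_segment_size=512" applies the limit to all
devices, and the per-device configfs attribute of the same name allows
configuring it for individual null_blk instances.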


* Re: [PATCH 10/10] null_blk: Support configuring the maximum segment size
  2022-10-19 22:23 ` [PATCH 10/10] null_blk: " Bart Van Assche
@ 2022-10-20 23:13   ` Damien Le Moal
  2022-10-20 23:34     ` Bart Van Assche
  0 siblings, 1 reply; 21+ messages in thread
From: Damien Le Moal @ 2022-10-20 23:13 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Ming Lei, Chaitanya Kulkarni,
	Johannes Thumshirn, Vincent Fu, Shin'ichiro Kawasaki,
	Yu Kuai

On 10/20/22 07:23, Bart Van Assche wrote:
> Add support for configuring the maximum segment size.
> 
> Add support for segments smaller than the page size.
> 
> This patch enables testing segments smaller than the page size with a
> driver that does not call blk_rq_map_sg().
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/block/null_blk/main.c     | 20 +++++++++++++++++---
>  drivers/block/null_blk/null_blk.h |  1 +
>  2 files changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
> index 1f154f92f4c2..bc811ab52c4a 100644
> --- a/drivers/block/null_blk/main.c
> +++ b/drivers/block/null_blk/main.c
> @@ -157,6 +157,10 @@ static int g_max_sectors;
>  module_param_named(max_sectors, g_max_sectors, int, 0444);
>  MODULE_PARM_DESC(max_sectors, "Maximum size of a command (in 512B sectors)");
>  
> +static unsigned int g_max_segment_size = 1UL << 31;

Nit: UINT_MAX?

> +module_param_named(max_segment_size, g_max_segment_size, int, 0444);
> +MODULE_PARM_DESC(max_segment_size, "Maximum size of a segment in bytes");
> +
>  static unsigned int nr_devices = 1;
>  module_param(nr_devices, uint, 0444);
>  MODULE_PARM_DESC(nr_devices, "Number of devices to register");
> @@ -409,6 +413,7 @@ NULLB_DEVICE_ATTR(home_node, uint, NULL);
>  NULLB_DEVICE_ATTR(queue_mode, uint, NULL);
>  NULLB_DEVICE_ATTR(blocksize, uint, NULL);
>  NULLB_DEVICE_ATTR(max_sectors, uint, NULL);
> +NULLB_DEVICE_ATTR(max_segment_size, uint, NULL);
>  NULLB_DEVICE_ATTR(irqmode, uint, NULL);
>  NULLB_DEVICE_ATTR(hw_queue_depth, uint, NULL);
>  NULLB_DEVICE_ATTR(index, uint, NULL);
> @@ -532,6 +537,7 @@ static struct configfs_attribute *nullb_device_attrs[] = {
>  	&nullb_device_attr_queue_mode,
>  	&nullb_device_attr_blocksize,
>  	&nullb_device_attr_max_sectors,
> +	&nullb_device_attr_max_segment_size,
>  	&nullb_device_attr_irqmode,
>  	&nullb_device_attr_hw_queue_depth,
>  	&nullb_device_attr_index,
> @@ -610,7 +616,8 @@ static ssize_t memb_group_features_show(struct config_item *item, char *page)
>  	return snprintf(page, PAGE_SIZE,
>  			"badblocks,blocking,blocksize,cache_size,"
>  			"completion_nsec,discard,home_node,hw_queue_depth,"
> -			"irqmode,max_sectors,mbps,memory_backed,no_sched,"
> +			"irqmode,max_sectors,max_segment_size,mbps,"
> +			"memory_backed,no_sched,"
>  			"poll_queues,power,queue_mode,shared_tag_bitmap,size,"
>  			"submit_queues,use_per_node_hctx,virt_boundary,zoned,"
>  			"zone_capacity,zone_max_active,zone_max_open,"
> @@ -673,6 +680,7 @@ static struct nullb_device *null_alloc_dev(void)
>  	dev->queue_mode = g_queue_mode;
>  	dev->blocksize = g_bs;
>  	dev->max_sectors = g_max_sectors;
> +	dev->max_segment_size = g_max_segment_size;
>  	dev->irqmode = g_irqmode;
>  	dev->hw_queue_depth = g_hw_queue_depth;
>  	dev->blocking = g_blocking;
> @@ -1214,6 +1222,8 @@ static int null_transfer(struct nullb *nullb, struct page *page,
>  	unsigned int valid_len = len;
>  	int err = 0;
>  
> +	WARN_ONCE(len > dev->max_segment_size, "%u > %u\n", len,
> +		  dev->max_segment_size);
>  	if (!is_write) {
>  		if (dev->zoned)
>  			valid_len = null_zone_valid_read_len(nullb,
> @@ -1249,7 +1259,8 @@ static int null_handle_rq(struct nullb_cmd *cmd)
>  
>  	spin_lock_irq(&nullb->lock);
>  	rq_for_each_segment(bvec, rq, iter) {
> -		len = bvec.bv_len;
> +		len = min(bvec.bv_len, nullb->dev->max_segment_size);
> +		bvec.bv_len = len;
>  		err = null_transfer(nullb, bvec.bv_page, len, bvec.bv_offset,
>  				     op_is_write(req_op(rq)), sector,
>  				     rq->cmd_flags & REQ_FUA);
> @@ -1276,7 +1287,8 @@ static int null_handle_bio(struct nullb_cmd *cmd)
>  
>  	spin_lock_irq(&nullb->lock);
>  	bio_for_each_segment(bvec, bio, iter) {
> -		len = bvec.bv_len;
> +		len = min(bvec.bv_len, nullb->dev->max_segment_size);
> +		bvec.bv_len = len;
>  		err = null_transfer(nullb, bvec.bv_page, len, bvec.bv_offset,
>  				     op_is_write(bio_op(bio)), sector,
>  				     bio->bi_opf & REQ_FUA);
> @@ -2088,6 +2100,7 @@ static int null_add_dev(struct nullb_device *dev)
>  	nullb->q->queuedata = nullb;
>  	blk_queue_flag_set(QUEUE_FLAG_NONROT, nullb->q);
>  	blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, nullb->q);
> +	blk_queue_flag_set(QUEUE_FLAG_SUB_PAGE_SEGMENTS, nullb->q);

Where is this defined? I do not see this flag defined anywhere in Linus' tree
nor in Jens' for-next...

>  
>  	mutex_lock(&lock);
>  	rv = ida_simple_get(&nullb_indexes, 0, 0, GFP_KERNEL);
> @@ -2106,6 +2119,7 @@ static int null_add_dev(struct nullb_device *dev)
>  	dev->max_sectors = min_t(unsigned int, dev->max_sectors,
>  				 BLK_DEF_MAX_SECTORS);
>  	blk_queue_max_hw_sectors(nullb->q, dev->max_sectors);
> +	blk_queue_max_segment_size(nullb->q, dev->max_segment_size);
>  
>  	if (dev->virt_boundary)
>  		blk_queue_virt_boundary(nullb->q, PAGE_SIZE - 1);
> diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h
> index 94ff68052b1e..6784ee9f5fda 100644
> --- a/drivers/block/null_blk/null_blk.h
> +++ b/drivers/block/null_blk/null_blk.h
> @@ -102,6 +102,7 @@ struct nullb_device {
>  	unsigned int queue_mode; /* block interface */
>  	unsigned int blocksize; /* block size */
>  	unsigned int max_sectors; /* Max sectors per command */
> +	unsigned int max_segment_size; /* Max size of a single DMA segment. */
>  	unsigned int irqmode; /* IRQ completion handler */
>  	unsigned int hw_queue_depth; /* queue depth */
>  	unsigned int index; /* index of the disk, only valid with a disk */

-- 
Damien Le Moal
Western Digital Research



* Re: [PATCH 10/10] null_blk: Support configuring the maximum segment size
  2022-10-20 23:13   ` Damien Le Moal
@ 2022-10-20 23:34     ` Bart Van Assche
  2022-10-20 23:39       ` Damien Le Moal
  0 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2022-10-20 23:34 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Ming Lei, Chaitanya Kulkarni,
	Johannes Thumshirn, Vincent Fu, Shin'ichiro Kawasaki,
	Yu Kuai

On 10/20/22 16:13, Damien Le Moal wrote:
> On 10/20/22 07:23, Bart Van Assche wrote:
>> diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
>> index 1f154f92f4c2..bc811ab52c4a 100644
>> --- a/drivers/block/null_blk/main.c
>> +++ b/drivers/block/null_blk/main.c
>> @@ -157,6 +157,10 @@ static int g_max_sectors;
>>   module_param_named(max_sectors, g_max_sectors, int, 0444);
>>   MODULE_PARM_DESC(max_sectors, "Maximum size of a command (in 512B sectors)");
>>   
>> +static unsigned int g_max_segment_size = 1UL << 31;
> 
> Nit: UINT_MAX?

Hi Damien,

That would be a valid alternative. I will consider changing the value 
into UINT_MAX.

>> @@ -2088,6 +2100,7 @@ static int null_add_dev(struct nullb_device *dev)
>>   	nullb->q->queuedata = nullb;
>>   	blk_queue_flag_set(QUEUE_FLAG_NONROT, nullb->q);
>>   	blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, nullb->q);
>> +	blk_queue_flag_set(QUEUE_FLAG_SUB_PAGE_SEGMENTS, nullb->q);
> 
> Where is this defined? I do not see this flag defined anywhere in Linus'
> tree nor in Jens' for-next...

That flag has been defined in patch 05/10 of this series.

In case you would like to take a look, the code I used to test this 
series is available here: 
https://github.com/bvanassche/blktests/commits/master

Thanks,

Bart.


* Re: [PATCH 10/10] null_blk: Support configuring the maximum segment size
  2022-10-20 23:34     ` Bart Van Assche
@ 2022-10-20 23:39       ` Damien Le Moal
  2022-10-21  0:39         ` Bart Van Assche
  0 siblings, 1 reply; 21+ messages in thread
From: Damien Le Moal @ 2022-10-20 23:39 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Ming Lei, Chaitanya Kulkarni,
	Johannes Thumshirn, Vincent Fu, Shin'ichiro Kawasaki,
	Yu Kuai

On 10/21/22 08:34, Bart Van Assche wrote:
> On 10/20/22 16:13, Damien Le Moal wrote:
>> On 10/20/22 07:23, Bart Van Assche wrote:
>>> diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
>>> index 1f154f92f4c2..bc811ab52c4a 100644
>>> --- a/drivers/block/null_blk/main.c
>>> +++ b/drivers/block/null_blk/main.c
>>> @@ -157,6 +157,10 @@ static int g_max_sectors;
>>>   module_param_named(max_sectors, g_max_sectors, int, 0444);
>>>   MODULE_PARM_DESC(max_sectors, "Maximum size of a command (in 512B sectors)");
>>>   
>>> +static unsigned int g_max_segment_size = 1UL << 31;
>>
>> Nit: UINT_MAX?
> 
> Hi Damien,
> 
> That would be a valid alternative. I will consider changing the value 
> into UINT_MAX.
> 
>>> @@ -2088,6 +2100,7 @@ static int null_add_dev(struct nullb_device *dev)
>>>   	nullb->q->queuedata = nullb;
>>>   	blk_queue_flag_set(QUEUE_FLAG_NONROT, nullb->q);
>>>   	blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, nullb->q);
>>> +	blk_queue_flag_set(QUEUE_FLAG_SUB_PAGE_SEGMENTS, nullb->q);
>>
>> Where is this defined? I do not see this flag defined anywhere in Linus'
>> tree nor in Jens' for-next...
> 
> That flag has been defined in patch 05/10 of this series.
> 
> In case you would like to take a look, the code I used to test this 
> series is available here: 
> https://github.com/bvanassche/blktests/commits/master

Please always send full patch series. Hard to review a patch without the
context for it :)

> 
> Thanks,
> 
> Bart.

-- 
Damien Le Moal
Western Digital Research



* Re: [PATCH 10/10] null_blk: Support configuring the maximum segment size
  2022-10-20 23:39       ` Damien Le Moal
@ 2022-10-21  0:39         ` Bart Van Assche
  0 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2022-10-21  0:39 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Ming Lei, Chaitanya Kulkarni,
	Johannes Thumshirn, Vincent Fu, Shin'ichiro Kawasaki,
	Yu Kuai

On 10/20/22 16:39, Damien Le Moal wrote:
> Please always send full patch series. Hard to review a patch without the
> context for it :)

Hi Damien,

Does this link help? 
https://lore.kernel.org/linux-block/20221019222324.362705-1-bvanassche@acm.org/

Thanks,

Bart.


* Re: [PATCH 01/10] block: Remove request.write_hint
  2022-10-19 22:23 ` [PATCH 01/10] block: Remove request.write_hint Bart Van Assche
@ 2022-10-21  1:59   ` Ming Lei
  0 siblings, 0 replies; 21+ messages in thread
From: Ming Lei @ 2022-10-21  1:59 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Jens Axboe, linux-block, Christoph Hellwig

On Wed, Oct 19, 2022 at 03:23:15PM -0700, Bart Van Assche wrote:
> Commit c75e707fe1aa ("block: remove the per-bio/request write hint")
> removed all code that uses the struct request write_hint member. Hence
> also remove 'write_hint' itself.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Reviewed-by: Ming Lei <ming.lei@redhat.com>


Thanks,
Ming



* Re: [PATCH 02/10] block: Constify most queue limits pointers
  2022-10-19 22:23 ` [PATCH 02/10] block: Constify most queue limits pointers Bart Van Assche
@ 2022-10-21  2:01   ` Ming Lei
  0 siblings, 0 replies; 21+ messages in thread
From: Ming Lei @ 2022-10-21  2:01 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Jens Axboe, linux-block, Christoph Hellwig, Keith Busch

On Wed, Oct 19, 2022 at 03:23:16PM -0700, Bart Van Assche wrote:
> Document which functions do not modify the queue limits.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Keith Busch <kbusch@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-map.c      |  2 +-
>  block/blk-merge.c    | 29 ++++++++++++++++-------------
>  block/blk-settings.c |  6 +++---
>  block/blk.h          | 11 ++++++-----
>  4 files changed, 26 insertions(+), 22 deletions(-)

Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thanks,
Ming



* Re: [PATCH 03/10] block: Micro-optimize get_max_segment_size()
  2022-10-19 22:23 ` [PATCH 03/10] block: Micro-optimize get_max_segment_size() Bart Van Assche
@ 2022-10-21  2:17   ` Ming Lei
  0 siblings, 0 replies; 21+ messages in thread
From: Ming Lei @ 2022-10-21  2:17 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Keith Busch,
	Steven Rostedt, Guenter Roeck

On Wed, Oct 19, 2022 at 03:23:17PM -0700, Bart Van Assche wrote:
> This patch removes a conditional jump from get_max_segment_size(). The
> x86-64 assembler code for this function without this patch is as follows:
> 
> 206             return min_not_zero(mask - offset + 1,
>    0x0000000000000118 <+72>:    not    %rax
>    0x000000000000011b <+75>:    and    0x8(%r10),%rax
>    0x000000000000011f <+79>:    add    $0x1,%rax
>    0x0000000000000123 <+83>:    je     0x138 <bvec_split_segs+104>
>    0x0000000000000125 <+85>:    cmp    %rdx,%rax
>    0x0000000000000128 <+88>:    mov    %rdx,%r12
>    0x000000000000012b <+91>:    cmovbe %rax,%r12
>    0x000000000000012f <+95>:    test   %rdx,%rdx
>    0x0000000000000132 <+98>:    mov    %eax,%edx
>    0x0000000000000134 <+100>:   cmovne %r12d,%edx
> 
> With this patch applied:
> 
> 206             return min(mask - offset, (unsigned long)lim->max_segment_size - 1) + 1;
>    0x000000000000003f <+63>:    mov    0x28(%rdi),%ebp
>    0x0000000000000042 <+66>:    not    %rax
>    0x0000000000000045 <+69>:    and    0x8(%rdi),%rax
>    0x0000000000000049 <+73>:    sub    $0x1,%rbp
>    0x000000000000004d <+77>:    cmp    %rbp,%rax
>    0x0000000000000050 <+80>:    cmova  %rbp,%rax
>    0x0000000000000054 <+84>:    add    $0x1,%eax
> 
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Keith Busch <kbusch@kernel.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Guenter Roeck <linux@roeck-us.net>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-merge.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 58fdc3f8905b..35a8f75cc45d 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -186,6 +186,14 @@ static inline unsigned get_max_io_size(struct bio *bio,
>  	return max_sectors & ~(lbs - 1);
>  }
>  
> +/**
> + * get_max_segment_size() - maximum number of bytes to add as a single segment
> + * @lim: Request queue limits.
> + * @start_page: See below.
> + * @offset: Offset from @start_page where to add a segment.
> + *
> + * Returns the maximum number of bytes that can be added as a single segment.
> + */
>  static inline unsigned get_max_segment_size(const struct queue_limits *lim,
>  		struct page *start_page, unsigned long offset)
>  {
> @@ -194,11 +202,10 @@ static inline unsigned get_max_segment_size(const struct queue_limits *lim,
>  	offset = mask & (page_to_phys(start_page) + offset);
>  
>  	/*
> -	 * overflow may be triggered in case of zero page physical address
> -	 * on 32bit arch, use queue's max segment size when that happens.
> +	 * Prevent an overflow if mask = ULONG_MAX and offset = 0 by adding 1
> +	 * after having calculated the minimum.
>  	 */
> -	return min_not_zero(mask - offset + 1,
> -			(unsigned long)lim->max_segment_size);
> +	return min(mask - offset, (unsigned long)lim->max_segment_size - 1) + 1;
>  }

Looks fine,

Reviewed-by: Ming Lei <ming.lei@redhat.com>


Thanks,
Ming



* Re: [PATCH 06/10] block: Fix the number of segment calculations
  2022-10-19 22:23 ` [PATCH 06/10] block: Fix the number of segment calculations Bart Van Assche
@ 2022-11-01 17:23   ` Bart Van Assche
  2022-11-02  1:22     ` Ming Lei
  0 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2022-11-01 17:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Ming Lei, Keith Busch

On 10/19/22 15:23, Bart Van Assche wrote:
> Since multi-page bvecs are supported, there can be multiple segments per
> bvec (see also bvec_split_segs()). Hence this patch, which calculates the
> number of segments per bvec instead of assuming that there is only one
> segment per bvec.

(replying to my own email)

Hi Ming,

Do you agree that this patch fixes a bug introduced by the multi-page 
bvec code and also that it is required even if the segment size is 
larger than the page size?

Thanks,

Bart.


* Re: [PATCH 06/10] block: Fix the number of segment calculations
  2022-11-01 17:23   ` Bart Van Assche
@ 2022-11-02  1:22     ` Ming Lei
  2022-11-03 19:32       ` Bart Van Assche
  0 siblings, 1 reply; 21+ messages in thread
From: Ming Lei @ 2022-11-02  1:22 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Jens Axboe, linux-block, Christoph Hellwig, Keith Busch

Hi Bart,

On Tue, Nov 01, 2022 at 10:23:32AM -0700, Bart Van Assche wrote:
> On 10/19/22 15:23, Bart Van Assche wrote:
> > Since multi-page bvecs are supported, there can be multiple segments per
> > bvec (see also bvec_split_segs()). Hence this patch, which calculates the
> > number of segments per bvec instead of assuming that there is only one
> > segment per bvec.
> 
> (replying to my own email)
> 
> Hi Ming,
> 
> Do you agree that this patch fixes a bug introduced by the multi-page bvec
> code and also that it is required even if the segment size is larger than
> the page size?

No, multi-page bvecs are only applied to normal I/O bvecs, and they are not
used in bio_add_pc_page(), so there is no such issue in blk_rq_append_bio().
That is why we need not split passthrough bios.

thanks, 
Ming



* Re: [PATCH 06/10] block: Fix the number of segment calculations
  2022-11-02  1:22     ` Ming Lei
@ 2022-11-03 19:32       ` Bart Van Assche
  0 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2022-11-03 19:32 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Christoph Hellwig, Keith Busch

On 11/1/22 18:22, Ming Lei wrote:
> On Tue, Nov 01, 2022 at 10:23:32AM -0700, Bart Van Assche wrote:
>> On 10/19/22 15:23, Bart Van Assche wrote:
>>> Since multi-page bvecs are supported, there can be multiple segments per
>>> bvec (see also bvec_split_segs()). Hence this patch, which calculates the
>>> number of segments per bvec instead of assuming that there is only one
>>> segment per bvec.
>>
>> (replying to my own email)
>>
>> Hi Ming,
>>
>> Do you agree that this patch fixes a bug introduced by the multi-page bvec
>> code and also that it is required even if the segment size is larger than
>> the page size?
> 
> No, multi-page bvecs are only applied to normal I/O bvecs, and they are not
> used in bio_add_pc_page(), so there is no such issue in blk_rq_append_bio().
> That is why we need not split passthrough bios.

Hi Ming,

Thanks for the feedback. I will fix the description of this patch.

Bart.


