* [PATCH v4 0/7] Add support for limits below the page size
@ 2023-01-30 21:26 Bart Van Assche
  2023-01-30 21:26 ` [PATCH v4 1/7] block: Introduce blk_mq_debugfs_init() Bart Van Assche
                   ` (6 more replies)
  0 siblings, 7 replies; 21+ messages in thread
From: Bart Van Assche @ 2023-01-30 21:26 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Jaegeuk Kim, Avri Altman, Adrian Hunter,
	Christoph Hellwig, Ming Lei, Bart Van Assche

Hi Jens,

We want to improve Android performance by increasing the page size from 4 KiB
to 16 KiB. However, some of the storage controllers we care about do not support
DMA segments larger than 4 KiB. Hence the need to support DMA segments that are
smaller than the size of one virtual memory page. This patch series implements
that support. Please consider this patch series for the next merge window.

Thanks,

Bart.

Changes compared to v3:
- Removed CONFIG_BLK_SUB_PAGE_SEGMENTS and QUEUE_FLAG_SUB_PAGE_SEGMENTS.
  Replaced these by a new member in struct queue_limits and a static branch.
- The static branch that controls whether or not sub-page limits are enabled
  is set by the block layer core instead of by block drivers.
- Dropped the patches that are no longer needed (SCSI core and UFS Exynos
  driver).

Changes compared to v2:
- For SCSI drivers, only set flag QUEUE_FLAG_SUB_PAGE_SEGMENTS if necessary.
- In the scsi_debug patch, sorted kernel module parameters alphabetically.
  Only set flag QUEUE_FLAG_SUB_PAGE_SEGMENTS if necessary.
- Added a patch for the UFS Exynos driver that enables
  CONFIG_BLK_SUB_PAGE_SEGMENTS if the page size exceeds 4 KiB.

Changes compared to v1:
- Added a CONFIG variable that controls whether or not small segment support
  is enabled.
- Improved patch descriptions.

Bart Van Assche (7):
  block: Introduce blk_mq_debugfs_init()
  block: Support configuring limits below the page size
  block: Support submitting passthrough requests with small segments
  block: Add support for filesystem requests and small segments
  block: Add support for small segments in blk_rq_map_user_iov()
  scsi_debug: Support configuring the maximum segment size
  null_blk: Support configuring the maximum segment size

 block/blk-core.c                  |  4 +-
 block/blk-map.c                   | 29 ++++++++---
 block/blk-merge.c                 |  7 ++-
 block/blk-mq-debugfs.c            | 10 ++++
 block/blk-mq-debugfs.h            |  6 +++
 block/blk-mq.c                    |  2 +
 block/blk-settings.c              | 82 ++++++++++++++++++++++++++++---
 block/blk.h                       | 39 ++++++++++++---
 drivers/block/null_blk/main.c     | 19 +++++--
 drivers/block/null_blk/null_blk.h |  1 +
 drivers/scsi/scsi_debug.c         |  4 ++
 include/linux/blkdev.h            |  2 +
 12 files changed, 179 insertions(+), 26 deletions(-)


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v4 1/7] block: Introduce blk_mq_debugfs_init()
  2023-01-30 21:26 [PATCH v4 0/7] Add support for limits below the page size Bart Van Assche
@ 2023-01-30 21:26 ` Bart Van Assche
  2023-02-01 20:58   ` Luis Chamberlain
  2023-01-30 21:26 ` [PATCH v4 2/7] block: Support configuring limits below the page size Bart Van Assche
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2023-01-30 21:26 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Jaegeuk Kim, Avri Altman, Adrian Hunter,
	Christoph Hellwig, Ming Lei, Bart Van Assche, Keith Busch

Move the code for creating the block layer debugfs root directory into
blk-mq-debugfs.c. This patch prepares for adding more debugfs
initialization code by introducing the function blk_mq_debugfs_init().

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Keith Busch <kbusch@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-core.c       | 3 ++-
 block/blk-mq-debugfs.c | 5 +++++
 block/blk-mq-debugfs.h | 6 ++++++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index ccf9a7683a3c..0dacc2df9588 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -45,6 +45,7 @@
 #include <trace/events/block.h>
 
 #include "blk.h"
+#include "blk-mq-debugfs.h"
 #include "blk-mq-sched.h"
 #include "blk-pm.h"
 #include "blk-cgroup.h"
@@ -1202,7 +1203,7 @@ int __init blk_dev_init(void)
 	blk_requestq_cachep = kmem_cache_create("request_queue",
 			sizeof(struct request_queue), 0, SLAB_PANIC, NULL);
 
-	blk_debugfs_root = debugfs_create_dir("block", NULL);
+	blk_mq_debugfs_init();
 
 	return 0;
 }
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index bd942341b638..60d1de0ce624 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -874,3 +874,8 @@ void blk_mq_debugfs_unregister_sched_hctx(struct blk_mq_hw_ctx *hctx)
 	debugfs_remove_recursive(hctx->sched_debugfs_dir);
 	hctx->sched_debugfs_dir = NULL;
 }
+
+void blk_mq_debugfs_init(void)
+{
+	blk_debugfs_root = debugfs_create_dir("block", NULL);
+}
diff --git a/block/blk-mq-debugfs.h b/block/blk-mq-debugfs.h
index 9c7d4b6117d4..7942119051f5 100644
--- a/block/blk-mq-debugfs.h
+++ b/block/blk-mq-debugfs.h
@@ -17,6 +17,8 @@ struct blk_mq_debugfs_attr {
 	const struct seq_operations *seq_ops;
 };
 
+void blk_mq_debugfs_init(void);
+
 int __blk_mq_debugfs_rq_show(struct seq_file *m, struct request *rq);
 int blk_mq_debugfs_rq_show(struct seq_file *m, void *v);
 
@@ -36,6 +38,10 @@ void blk_mq_debugfs_unregister_sched_hctx(struct blk_mq_hw_ctx *hctx);
 void blk_mq_debugfs_register_rqos(struct rq_qos *rqos);
 void blk_mq_debugfs_unregister_rqos(struct rq_qos *rqos);
 #else
+static inline void blk_mq_debugfs_init(void)
+{
+}
+
 static inline void blk_mq_debugfs_register(struct request_queue *q)
 {
 }


* [PATCH v4 2/7] block: Support configuring limits below the page size
  2023-01-30 21:26 [PATCH v4 0/7] Add support for limits below the page size Bart Van Assche
  2023-01-30 21:26 ` [PATCH v4 1/7] block: Introduce blk_mq_debugfs_init() Bart Van Assche
@ 2023-01-30 21:26 ` Bart Van Assche
  2023-02-01 23:50   ` Luis Chamberlain
  2023-01-30 21:26 ` [PATCH v4 3/7] block: Support submitting passthrough requests with small segments Bart Van Assche
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2023-01-30 21:26 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Jaegeuk Kim, Avri Altman, Adrian Hunter,
	Christoph Hellwig, Ming Lei, Bart Van Assche, Keith Busch

Allow block drivers to configure the following:
* max_hw_sectors values smaller than PAGE_SIZE >> SECTOR_SHIFT. With
  PAGE_SIZE = 4096 this means that values below 8 are supported.
* A maximum segment size smaller than the page size. This is most useful
  for page sizes above 4096 bytes.

The blk_sub_page_limits static branch will be used in later patches to
avoid affecting the performance of block drivers that support segment sizes
>= PAGE_SIZE and max_hw_sectors values >= PAGE_SIZE >> SECTOR_SHIFT.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Keith Busch <kbusch@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-core.c       |  1 +
 block/blk-mq-debugfs.c |  5 +++
 block/blk-settings.c   | 82 +++++++++++++++++++++++++++++++++++++-----
 block/blk.h            | 10 ++++++
 include/linux/blkdev.h |  2 ++
 5 files changed, 92 insertions(+), 8 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 0dacc2df9588..b193040c7c73 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -270,6 +270,7 @@ static void blk_free_queue(struct request_queue *q)
 
 	blk_free_queue_stats(q->stats);
 	kfree(q->poll_stat);
+	blk_disable_sub_page_limits(&q->limits);
 
 	if (queue_is_mq(q))
 		blk_mq_release(q);
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 60d1de0ce624..4f06e02961f3 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -875,7 +875,12 @@ void blk_mq_debugfs_unregister_sched_hctx(struct blk_mq_hw_ctx *hctx)
 	hctx->sched_debugfs_dir = NULL;
 }
 
+DEFINE_DEBUGFS_ATTRIBUTE(blk_sub_page_limit_queues_fops,
+			blk_sub_page_limit_queues_get, NULL, "%llu\n");
+
 void blk_mq_debugfs_init(void)
 {
 	blk_debugfs_root = debugfs_create_dir("block", NULL);
+	debugfs_create_file("sub_page_limit_queues", 0400, blk_debugfs_root,
+			    NULL, &blk_sub_page_limit_queues_fops);
 }
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 9c9713c9269c..46d43cef8377 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -18,6 +18,11 @@
 #include "blk.h"
 #include "blk-wbt.h"
 
+/* Protects blk_nr_sub_page_limit_queues and blk_sub_page_limits changes. */
+static DEFINE_MUTEX(blk_sub_page_limit_lock);
+static uint32_t blk_nr_sub_page_limit_queues;
+DEFINE_STATIC_KEY_FALSE(blk_sub_page_limits);
+
 void blk_queue_rq_timeout(struct request_queue *q, unsigned int timeout)
 {
 	q->rq_timeout = timeout;
@@ -58,6 +63,7 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->zoned = BLK_ZONED_NONE;
 	lim->zone_write_granularity = 0;
 	lim->dma_alignment = 511;
+	lim->sub_page_limits = false;
 }
 
 /**
@@ -100,6 +106,55 @@ void blk_queue_bounce_limit(struct request_queue *q, enum blk_bounce bounce)
 }
 EXPORT_SYMBOL(blk_queue_bounce_limit);
 
+/* For debugfs. */
+int blk_sub_page_limit_queues_get(void *data, u64 *val)
+{
+	*val = READ_ONCE(blk_nr_sub_page_limit_queues);
+
+	return 0;
+}
+
+/**
+ * blk_enable_sub_page_limits - enable support for max_segment_size values smaller than PAGE_SIZE and for max_hw_sectors values below PAGE_SIZE >> SECTOR_SHIFT
+ * @lim: request queue limits for which to enable support of these features.
+ *
+ * Support for these features is not enabled all the time because of the
+ * runtime overhead of these features.
+ */
+static void blk_enable_sub_page_limits(struct queue_limits *lim)
+{
+	if (lim->sub_page_limits)
+		return;
+
+	lim->sub_page_limits = true;
+
+	mutex_lock(&blk_sub_page_limit_lock);
+	if (++blk_nr_sub_page_limit_queues == 1)
+		static_branch_enable(&blk_sub_page_limits);
+	mutex_unlock(&blk_sub_page_limit_lock);
+}
+
+/**
+ * blk_disable_sub_page_limits - disable support for max_segment_size values smaller than PAGE_SIZE and for max_hw_sectors values below PAGE_SIZE >> SECTOR_SHIFT
+ * @lim: request queue limits for which to disable support of these features.
+ *
+ * Support for these features is not enabled all the time because of the
+ * runtime overhead of these features.
+ */
+void blk_disable_sub_page_limits(struct queue_limits *lim)
+{
+	if (!lim->sub_page_limits)
+		return;
+
+	lim->sub_page_limits = false;
+
+	mutex_lock(&blk_sub_page_limit_lock);
+	WARN_ON_ONCE(blk_nr_sub_page_limit_queues <= 0);
+	if (--blk_nr_sub_page_limit_queues == 0)
+		static_branch_disable(&blk_sub_page_limits);
+	mutex_unlock(&blk_sub_page_limit_lock);
+}
+
 /**
  * blk_queue_max_hw_sectors - set max sectors for a request for this queue
  * @q:  the request queue for the device
@@ -122,12 +177,17 @@ EXPORT_SYMBOL(blk_queue_bounce_limit);
 void blk_queue_max_hw_sectors(struct request_queue *q, unsigned int max_hw_sectors)
 {
 	struct queue_limits *limits = &q->limits;
+	unsigned int min_max_hw_sectors = PAGE_SIZE >> SECTOR_SHIFT;
 	unsigned int max_sectors;
 
-	if ((max_hw_sectors << 9) < PAGE_SIZE) {
-		max_hw_sectors = 1 << (PAGE_SHIFT - 9);
-		printk(KERN_INFO "%s: set to minimum %d\n",
-		       __func__, max_hw_sectors);
+	if (max_hw_sectors < min_max_hw_sectors) {
+		blk_enable_sub_page_limits(limits);
+		min_max_hw_sectors = 1;
+	}
+
+	if (max_hw_sectors < min_max_hw_sectors) {
+		max_hw_sectors = min_max_hw_sectors;
+		pr_info("%s: set to minimum %u\n", __func__, max_hw_sectors);
 	}
 
 	max_hw_sectors = round_down(max_hw_sectors,
@@ -282,10 +342,16 @@ EXPORT_SYMBOL_GPL(blk_queue_max_discard_segments);
  **/
 void blk_queue_max_segment_size(struct request_queue *q, unsigned int max_size)
 {
-	if (max_size < PAGE_SIZE) {
-		max_size = PAGE_SIZE;
-		printk(KERN_INFO "%s: set to minimum %d\n",
-		       __func__, max_size);
+	unsigned int min_max_segment_size = PAGE_SIZE;
+
+	if (max_size < min_max_segment_size) {
+		blk_enable_sub_page_limits(&q->limits);
+		min_max_segment_size = SECTOR_SIZE;
+	}
+
+	if (max_size < min_max_segment_size) {
+		max_size = min_max_segment_size;
+		pr_info("%s: set to minimum %u\n", __func__, max_size);
 	}
 
 	/* see blk_queue_virt_boundary() for the explanation */
diff --git a/block/blk.h b/block/blk.h
index 4c3b3325219a..9a56d7002efc 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -13,6 +13,7 @@ struct elevator_type;
 #define BLK_MAX_TIMEOUT		(5 * HZ)
 
 extern struct dentry *blk_debugfs_root;
+DECLARE_STATIC_KEY_FALSE(blk_sub_page_limits);
 
 struct blk_flush_queue {
 	unsigned int		flush_pending_idx:1;
@@ -32,6 +33,15 @@ struct blk_flush_queue *blk_alloc_flush_queue(int node, int cmd_size,
 					      gfp_t flags);
 void blk_free_flush_queue(struct blk_flush_queue *q);
 
+static inline bool blk_queue_sub_page_limits(const struct queue_limits *lim)
+{
+	return static_branch_unlikely(&blk_sub_page_limits) &&
+		lim->sub_page_limits;
+}
+
+int blk_sub_page_limit_queues_get(void *data, u64 *val);
+void blk_disable_sub_page_limits(struct queue_limits *q);
+
 void blk_freeze_queue(struct request_queue *q);
 void __blk_mq_unfreeze_queue(struct request_queue *q, bool force_atomic);
 void blk_queue_start_drain(struct request_queue *q);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index b9637d63e6f0..af04bf241714 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -319,6 +319,8 @@ struct queue_limits {
 	 * due to possible offsets.
 	 */
 	unsigned int		dma_alignment;
+
+	bool			sub_page_limits;
 };
 
 typedef int (*report_zones_cb)(struct blk_zone *zone, unsigned int idx,


* [PATCH v4 3/7] block: Support submitting passthrough requests with small segments
  2023-01-30 21:26 [PATCH v4 0/7] Add support for limits below the page size Bart Van Assche
  2023-01-30 21:26 ` [PATCH v4 1/7] block: Introduce blk_mq_debugfs_init() Bart Van Assche
  2023-01-30 21:26 ` [PATCH v4 2/7] block: Support configuring limits below the page size Bart Van Assche
@ 2023-01-30 21:26 ` Bart Van Assche
  2023-01-30 21:26 ` [PATCH v4 4/7] block: Add support for filesystem requests and " Bart Van Assche
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2023-01-30 21:26 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Jaegeuk Kim, Avri Altman, Adrian Hunter,
	Christoph Hellwig, Ming Lei, Bart Van Assche, Keith Busch

If the segment size is smaller than the page size there may be multiple
segments per bvec, even if a bvec only contains a single page. Hence make
blk_rq_append_bio() count the number of segments per bvec instead of
assuming one segment per bvec.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Keith Busch <kbusch@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-map.c |  2 +-
 block/blk.h     | 18 ++++++++++++++++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/block/blk-map.c b/block/blk-map.c
index 9ee4be4ba2f1..eb059d3a1be2 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -534,7 +534,7 @@ int blk_rq_append_bio(struct request *rq, struct bio *bio)
 	unsigned int nr_segs = 0;
 
 	bio_for_each_bvec(bv, bio, iter)
-		nr_segs++;
+		nr_segs += blk_segments(&rq->q->limits, bv.bv_len);
 
 	if (!rq->bio) {
 		blk_rq_bio_prep(rq, bio, nr_segs);
diff --git a/block/blk.h b/block/blk.h
index 9a56d7002efc..b39938255d13 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -86,6 +86,24 @@ struct bio_vec *bvec_alloc(mempool_t *pool, unsigned short *nr_vecs,
 		gfp_t gfp_mask);
 void bvec_free(mempool_t *pool, struct bio_vec *bv, unsigned short nr_vecs);
 
+/* Number of DMA segments required to transfer @bytes data. */
+static inline unsigned int blk_segments(const struct queue_limits *limits,
+					unsigned int bytes)
+{
+	if (!blk_queue_sub_page_limits(limits))
+		return 1;
+
+	{
+		const unsigned int mss = limits->max_segment_size;
+
+		if (bytes <= mss)
+			return 1;
+		if (is_power_of_2(mss))
+			return round_up(bytes, mss) >> ilog2(mss);
+		return (bytes + mss - 1) / mss;
+	}
+}
+
 static inline bool biovec_phys_mergeable(struct request_queue *q,
 		struct bio_vec *vec1, struct bio_vec *vec2)
 {


* [PATCH v4 4/7] block: Add support for filesystem requests and small segments
  2023-01-30 21:26 [PATCH v4 0/7] Add support for limits below the page size Bart Van Assche
                   ` (2 preceding siblings ...)
  2023-01-30 21:26 ` [PATCH v4 3/7] block: Support submitting passthrough requests with small segments Bart Van Assche
@ 2023-01-30 21:26 ` Bart Van Assche
  2023-01-30 21:26 ` [PATCH v4 5/7] block: Add support for small segments in blk_rq_map_user_iov() Bart Van Assche
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2023-01-30 21:26 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Jaegeuk Kim, Avri Altman, Adrian Hunter,
	Christoph Hellwig, Ming Lei, Bart Van Assche, Keith Busch

Add support in the bio splitting code and also in the bio submission code
for bios with segments smaller than the page size.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Keith Busch <kbusch@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-merge.c |  7 +++++--
 block/blk-mq.c    |  2 ++
 block/blk.h       | 11 +++++------
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index b7c193d67185..bf21475e8a13 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -294,7 +294,8 @@ static struct bio *bio_split_rw(struct bio *bio, const struct queue_limits *lim,
 		if (nsegs < lim->max_segments &&
 		    bytes + bv.bv_len <= max_bytes &&
 		    bv.bv_offset + bv.bv_len <= PAGE_SIZE) {
-			nsegs++;
+			/* single-page bvec optimization */
+			nsegs += blk_segments(lim, bv.bv_len);
 			bytes += bv.bv_len;
 		} else {
 			if (bvec_split_segs(lim, &bv, &nsegs, &bytes,
@@ -543,7 +544,9 @@ static int __blk_bios_map_sg(struct request_queue *q, struct bio *bio,
 			    __blk_segment_map_sg_merge(q, &bvec, &bvprv, sg))
 				goto next_bvec;
 
-			if (bvec.bv_offset + bvec.bv_len <= PAGE_SIZE)
+			if (bvec.bv_offset + bvec.bv_len <= PAGE_SIZE &&
+			    (!blk_queue_sub_page_limits(&q->limits) ||
+			     bvec.bv_len <= q->limits.max_segment_size))
 				nsegs += __blk_bvec_map_sg(bvec, sglist, sg);
 			else
 				nsegs += blk_bvec_map_sg(q, &bvec, sglist, sg);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9c8dc70020bc..a62b79e97a30 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2959,6 +2959,8 @@ void blk_mq_submit_bio(struct bio *bio)
 		bio = __bio_split_to_limits(bio, &q->limits, &nr_segs);
 		if (!bio)
 			return;
+	} else if (bio->bi_vcnt == 1) {
+		nr_segs = blk_segments(&q->limits, bio->bi_io_vec[0].bv_len);
 	}
 
 	if (!bio_integrity_prep(bio))
diff --git a/block/blk.h b/block/blk.h
index b39938255d13..5c248b80a218 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -333,13 +333,12 @@ static inline bool bio_may_exceed_limits(struct bio *bio,
 	}
 
 	/*
-	 * All drivers must accept single-segments bios that are <= PAGE_SIZE.
-	 * This is a quick and dirty check that relies on the fact that
-	 * bi_io_vec[0] is always valid if a bio has data.  The check might
-	 * lead to occasional false negatives when bios are cloned, but compared
-	 * to the performance impact of cloned bios themselves the loop below
-	 * doesn't matter anyway.
+	 * Check whether bio splitting should be performed. This check may
+	 * trigger the bio splitting code even if splitting is not necessary.
 	 */
+	if (blk_queue_sub_page_limits(lim) &&
+	    bio->bi_io_vec && bio->bi_io_vec->bv_len > lim->max_segment_size)
+		return true;
 	return lim->chunk_sectors || bio->bi_vcnt != 1 ||
 		bio->bi_io_vec->bv_len + bio->bi_io_vec->bv_offset > PAGE_SIZE;
 }


* [PATCH v4 5/7] block: Add support for small segments in blk_rq_map_user_iov()
  2023-01-30 21:26 [PATCH v4 0/7] Add support for limits below the page size Bart Van Assche
                   ` (3 preceding siblings ...)
  2023-01-30 21:26 ` [PATCH v4 4/7] block: Add support for filesystem requests and " Bart Van Assche
@ 2023-01-30 21:26 ` Bart Van Assche
  2023-01-30 21:26 ` [PATCH v4 6/7] scsi_debug: Support configuring the maximum segment size Bart Van Assche
  2023-01-30 21:26 ` [PATCH v4 7/7] null_blk: " Bart Van Assche
  6 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2023-01-30 21:26 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Jaegeuk Kim, Avri Altman, Adrian Hunter,
	Christoph Hellwig, Ming Lei, Bart Van Assche, Keith Busch

Before changing the return value of bio_add_hw_page() into a value in
the range [0, len], make blk_rq_map_user_iov() fall back to copying data
if mapping the data is not possible due to the segment limit.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Keith Busch <kbusch@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-map.c | 27 ++++++++++++++++++++++-----
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/block/blk-map.c b/block/blk-map.c
index eb059d3a1be2..b1dad4690472 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -307,17 +307,26 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 		else {
 			for (j = 0; j < npages; j++) {
 				struct page *page = pages[j];
-				unsigned int n = PAGE_SIZE - offs;
+				unsigned int n = PAGE_SIZE - offs, added;
 				bool same_page = false;
 
 				if (n > bytes)
 					n = bytes;
 
-				if (!bio_add_hw_page(rq->q, bio, page, n, offs,
-						     max_sectors, &same_page)) {
+				added = bio_add_hw_page(rq->q, bio, page, n,
+						offs, max_sectors, &same_page);
+				if (added == 0) {
 					if (same_page)
 						put_page(page);
 					break;
+				} else if (added != n) {
+					/*
+					 * The segment size is smaller than the
+					 * page size and an iov exceeds the
+					 * segment size. Give up.
+					 */
+					ret = -EREMOTEIO;
+					goto out_unmap;
 				}
 
 				bytes -= n;
@@ -657,10 +666,18 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
 
 	i = *iter;
 	do {
-		if (copy)
+		if (copy) {
 			ret = bio_copy_user_iov(rq, map_data, &i, gfp_mask);
-		else
+		} else {
 			ret = bio_map_user_iov(rq, &i, gfp_mask);
+			/*
+			 * Fall back to copying the data if bio_map_user_iov()
+			 * returns -EREMOTEIO.
+			 */
+			if (ret == -EREMOTEIO)
+				ret = bio_copy_user_iov(rq, map_data, &i,
+							gfp_mask);
+		}
 		if (ret)
 			goto unmap_rq;
 		if (!bio)


* [PATCH v4 6/7] scsi_debug: Support configuring the maximum segment size
  2023-01-30 21:26 [PATCH v4 0/7] Add support for limits below the page size Bart Van Assche
                   ` (4 preceding siblings ...)
  2023-01-30 21:26 ` [PATCH v4 5/7] block: Add support for small segments in blk_rq_map_user_iov() Bart Van Assche
@ 2023-01-30 21:26 ` Bart Van Assche
  2023-01-30 21:26 ` [PATCH v4 7/7] null_blk: " Bart Van Assche
  6 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2023-01-30 21:26 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Jaegeuk Kim, Avri Altman, Adrian Hunter,
	Christoph Hellwig, Ming Lei, Bart Van Assche, Doug Gilbert,
	Martin K . Petersen

Add a kernel module parameter for configuring the maximum segment size.
This patch enables testing SCSI support for segments smaller than the
page size.
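Assuming a kernel with this series applied, the new parameter could be
exercised as follows. The 512-byte value is illustrative; the sysfs path is
the standard location for module parameters.

```shell
# Load scsi_debug with a 512-byte maximum segment size (requires root
# and a kernel that includes this series).
modprobe scsi_debug max_segment_size=512

# The parameter is read-only (S_IRUGO), so verify it via sysfs.
cat /sys/module/scsi_debug/parameters/max_segment_size
```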

Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_debug.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 8553277effb3..603c9faac56f 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -752,6 +752,7 @@ static int sdebug_host_max_queue;	/* per host */
 static int sdebug_lowest_aligned = DEF_LOWEST_ALIGNED;
 static int sdebug_max_luns = DEF_MAX_LUNS;
 static int sdebug_max_queue = SDEBUG_CANQUEUE;	/* per submit queue */
+static unsigned int sdebug_max_segment_size = BLK_MAX_SEGMENT_SIZE;
 static unsigned int sdebug_medium_error_start = OPT_MEDIUM_ERR_ADDR;
 static int sdebug_medium_error_count = OPT_MEDIUM_ERR_NUM;
 static atomic_t retired_max_queue;	/* if > 0 then was prior max_queue */
@@ -5841,6 +5842,7 @@ module_param_named(lowest_aligned, sdebug_lowest_aligned, int, S_IRUGO);
 module_param_named(lun_format, sdebug_lun_am_i, int, S_IRUGO | S_IWUSR);
 module_param_named(max_luns, sdebug_max_luns, int, S_IRUGO | S_IWUSR);
 module_param_named(max_queue, sdebug_max_queue, int, S_IRUGO | S_IWUSR);
+module_param_named(max_segment_size, sdebug_max_segment_size, uint, S_IRUGO);
 module_param_named(medium_error_count, sdebug_medium_error_count, int,
 		   S_IRUGO | S_IWUSR);
 module_param_named(medium_error_start, sdebug_medium_error_start, int,
@@ -5917,6 +5919,7 @@ MODULE_PARM_DESC(lowest_aligned, "lowest aligned lba (def=0)");
 MODULE_PARM_DESC(lun_format, "LUN format: 0->peripheral (def); 1 --> flat address method");
 MODULE_PARM_DESC(max_luns, "number of LUNs per target to simulate(def=1)");
 MODULE_PARM_DESC(max_queue, "max number of queued commands (1 to max(def))");
+MODULE_PARM_DESC(max_segment_size, "max bytes in a single segment");
 MODULE_PARM_DESC(medium_error_count, "count of sectors to return follow on MEDIUM error");
 MODULE_PARM_DESC(medium_error_start, "starting sector number to return MEDIUM error");
 MODULE_PARM_DESC(ndelay, "response delay in nanoseconds (def=0 -> ignore)");
@@ -7816,6 +7819,7 @@ static int sdebug_driver_probe(struct device *dev)
 
 	sdebug_driver_template.can_queue = sdebug_max_queue;
 	sdebug_driver_template.cmd_per_lun = sdebug_max_queue;
+	sdebug_driver_template.max_segment_size = sdebug_max_segment_size;
 	if (!sdebug_clustering)
 		sdebug_driver_template.dma_boundary = PAGE_SIZE - 1;
 


* [PATCH v4 7/7] null_blk: Support configuring the maximum segment size
  2023-01-30 21:26 [PATCH v4 0/7] Add support for limits below the page size Bart Van Assche
                   ` (5 preceding siblings ...)
  2023-01-30 21:26 ` [PATCH v4 6/7] scsi_debug: Support configuring the maximum segment size Bart Van Assche
@ 2023-01-30 21:26 ` Bart Van Assche
  6 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2023-01-30 21:26 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Jaegeuk Kim, Avri Altman, Adrian Hunter,
	Christoph Hellwig, Ming Lei, Bart Van Assche, Damien Le Moal,
	Chaitanya Kulkarni

Add support for configuring the maximum segment size and hence for
segments smaller than the page size. This makes it possible to test
segments smaller than the page size with a driver that does not call
blk_rq_map_sg().
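A possible test invocation, assuming a kernel with this series applied (the
1 KiB value and the dd parameters are illustrative):

```shell
# Create a null_blk instance with 1 KiB segments (requires root and a
# kernel that includes this series).
modprobe null_blk nr_devices=1 max_segment_size=1024
cat /sys/module/null_blk/parameters/max_segment_size

# Exercise I/O with buffers larger than the segment size.
dd if=/dev/nullb0 of=/dev/null bs=64k count=16 iflag=direct
```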

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/block/null_blk/main.c     | 19 ++++++++++++++++---
 drivers/block/null_blk/null_blk.h |  1 +
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
index 4c601ca9552a..06eaa7ff4a86 100644
--- a/drivers/block/null_blk/main.c
+++ b/drivers/block/null_blk/main.c
@@ -157,6 +157,10 @@ static int g_max_sectors;
 module_param_named(max_sectors, g_max_sectors, int, 0444);
 MODULE_PARM_DESC(max_sectors, "Maximum size of a command (in 512B sectors)");
 
+static unsigned int g_max_segment_size = BLK_MAX_SEGMENT_SIZE;
+module_param_named(max_segment_size, g_max_segment_size, int, 0444);
+MODULE_PARM_DESC(max_segment_size, "Maximum size of a segment in bytes");
+
 static unsigned int nr_devices = 1;
 module_param(nr_devices, uint, 0444);
 MODULE_PARM_DESC(nr_devices, "Number of devices to register");
@@ -409,6 +413,7 @@ NULLB_DEVICE_ATTR(home_node, uint, NULL);
 NULLB_DEVICE_ATTR(queue_mode, uint, NULL);
 NULLB_DEVICE_ATTR(blocksize, uint, NULL);
 NULLB_DEVICE_ATTR(max_sectors, uint, NULL);
+NULLB_DEVICE_ATTR(max_segment_size, uint, NULL);
 NULLB_DEVICE_ATTR(irqmode, uint, NULL);
 NULLB_DEVICE_ATTR(hw_queue_depth, uint, NULL);
 NULLB_DEVICE_ATTR(index, uint, NULL);
@@ -550,6 +555,7 @@ static struct configfs_attribute *nullb_device_attrs[] = {
 	&nullb_device_attr_queue_mode,
 	&nullb_device_attr_blocksize,
 	&nullb_device_attr_max_sectors,
+	&nullb_device_attr_max_segment_size,
 	&nullb_device_attr_irqmode,
 	&nullb_device_attr_hw_queue_depth,
 	&nullb_device_attr_index,
@@ -630,7 +636,8 @@ static ssize_t memb_group_features_show(struct config_item *item, char *page)
 	return snprintf(page, PAGE_SIZE,
 			"badblocks,blocking,blocksize,cache_size,"
 			"completion_nsec,discard,home_node,hw_queue_depth,"
-			"irqmode,max_sectors,mbps,memory_backed,no_sched,"
+			"irqmode,max_sectors,max_segment_size,mbps,"
+			"memory_backed,no_sched,"
 			"poll_queues,power,queue_mode,shared_tag_bitmap,size,"
 			"submit_queues,use_per_node_hctx,virt_boundary,zoned,"
 			"zone_capacity,zone_max_active,zone_max_open,"
@@ -693,6 +700,7 @@ static struct nullb_device *null_alloc_dev(void)
 	dev->queue_mode = g_queue_mode;
 	dev->blocksize = g_bs;
 	dev->max_sectors = g_max_sectors;
+	dev->max_segment_size = g_max_segment_size;
 	dev->irqmode = g_irqmode;
 	dev->hw_queue_depth = g_hw_queue_depth;
 	dev->blocking = g_blocking;
@@ -1234,6 +1242,8 @@ static int null_transfer(struct nullb *nullb, struct page *page,
 	unsigned int valid_len = len;
 	int err = 0;
 
+	WARN_ONCE(len > dev->max_segment_size, "%u > %u\n", len,
+		  dev->max_segment_size);
 	if (!is_write) {
 		if (dev->zoned)
 			valid_len = null_zone_valid_read_len(nullb,
@@ -1269,7 +1279,8 @@ static int null_handle_rq(struct nullb_cmd *cmd)
 
 	spin_lock_irq(&nullb->lock);
 	rq_for_each_segment(bvec, rq, iter) {
-		len = bvec.bv_len;
+		len = min(bvec.bv_len, nullb->dev->max_segment_size);
+		bvec.bv_len = len;
 		err = null_transfer(nullb, bvec.bv_page, len, bvec.bv_offset,
 				     op_is_write(req_op(rq)), sector,
 				     rq->cmd_flags & REQ_FUA);
@@ -1296,7 +1307,8 @@ static int null_handle_bio(struct nullb_cmd *cmd)
 
 	spin_lock_irq(&nullb->lock);
 	bio_for_each_segment(bvec, bio, iter) {
-		len = bvec.bv_len;
+		len = min(bvec.bv_len, nullb->dev->max_segment_size);
+		bvec.bv_len = len;
 		err = null_transfer(nullb, bvec.bv_page, len, bvec.bv_offset,
 				     op_is_write(bio_op(bio)), sector,
 				     bio->bi_opf & REQ_FUA);
@@ -2125,6 +2137,7 @@ static int null_add_dev(struct nullb_device *dev)
 		dev->max_sectors = queue_max_hw_sectors(nullb->q);
 	dev->max_sectors = min(dev->max_sectors, BLK_DEF_MAX_SECTORS);
 	blk_queue_max_hw_sectors(nullb->q, dev->max_sectors);
+	blk_queue_max_segment_size(nullb->q, dev->max_segment_size);
 
 	if (dev->virt_boundary)
 		blk_queue_virt_boundary(nullb->q, PAGE_SIZE - 1);
diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h
index eb5972c50be8..8cb73fe05fa3 100644
--- a/drivers/block/null_blk/null_blk.h
+++ b/drivers/block/null_blk/null_blk.h
@@ -102,6 +102,7 @@ struct nullb_device {
 	unsigned int queue_mode; /* block interface */
 	unsigned int blocksize; /* block size */
 	unsigned int max_sectors; /* Max sectors per command */
+	unsigned int max_segment_size; /* Max size of a single DMA segment. */
 	unsigned int irqmode; /* IRQ completion handler */
 	unsigned int hw_queue_depth; /* queue depth */
 	unsigned int index; /* index of the disk, only valid with a disk */

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v4 1/7] block: Introduce blk_mq_debugfs_init()
  2023-01-30 21:26 ` [PATCH v4 1/7] block: Introduce blk_mq_debugfs_init() Bart Van Assche
@ 2023-02-01 20:58   ` Luis Chamberlain
  2023-02-01 21:23     ` Luis Chamberlain
  0 siblings, 1 reply; 21+ messages in thread
From: Luis Chamberlain @ 2023-02-01 20:58 UTC (permalink / raw)
  To: Bart Van Assche, Pankaj Raghav
  Cc: Jens Axboe, linux-block, linux-scsi, Jaegeuk Kim, Avri Altman,
	Adrian Hunter, Christoph Hellwig, Ming Lei, Keith Busch

On Mon, Jan 30, 2023 at 01:26:50PM -0800, Bart Van Assche wrote:
> Move the code for creating the block layer debugfs root directory into
> blk-mq-debugfs.c. This patch prepares for adding more debugfs
> initialization code by introducing the function blk_mq_debugfs_init().
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Keith Busch <kbusch@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>

  Luis


* Re: [PATCH v4 1/7] block: Introduce blk_mq_debugfs_init()
  2023-02-01 20:58   ` Luis Chamberlain
@ 2023-02-01 21:23     ` Luis Chamberlain
  2023-02-01 22:01       ` Bart Van Assche
  0 siblings, 1 reply; 21+ messages in thread
From: Luis Chamberlain @ 2023-02-01 21:23 UTC (permalink / raw)
  To: Bart Van Assche, Pankaj Raghav, mcgrof
  Cc: Jens Axboe, linux-block, linux-scsi, Jaegeuk Kim, Avri Altman,
	Adrian Hunter, Christoph Hellwig, Ming Lei, Keith Busch

On Wed, Feb 01, 2023 at 12:58:00PM -0800, Luis Chamberlain wrote:
> On Mon, Jan 30, 2023 at 01:26:50PM -0800, Bart Van Assche wrote:
> > Move the code for creating the block layer debugfs root directory into
> > blk-mq-debugfs.c. This patch prepares for adding more debugfs
> > initialization code by introducing the function blk_mq_debugfs_init().
> > 
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Ming Lei <ming.lei@redhat.com>
> > Cc: Keith Busch <kbusch@kernel.org>
> > Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> 
> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>

Sorry but actually a neuron triggered after this to remind me of commit
85e0cbbb8a ("block: create the request_queue debugfs_dir on
registration"), and so, using the terminology of that commit, wouldn't
this now fail to create the root block debugfs dir for request-based block
drivers?

  Luis


* Re: [PATCH v4 1/7] block: Introduce blk_mq_debugfs_init()
  2023-02-01 21:23     ` Luis Chamberlain
@ 2023-02-01 22:01       ` Bart Van Assche
  2023-02-01 23:59         ` Luis Chamberlain
  0 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2023-02-01 22:01 UTC (permalink / raw)
  To: Luis Chamberlain, Pankaj Raghav
  Cc: Jens Axboe, linux-block, linux-scsi, Jaegeuk Kim, Avri Altman,
	Adrian Hunter, Christoph Hellwig, Ming Lei, Keith Busch

On 2/1/23 13:23, Luis Chamberlain wrote:
> On Wed, Feb 01, 2023 at 12:58:00PM -0800, Luis Chamberlain wrote:
>> On Mon, Jan 30, 2023 at 01:26:50PM -0800, Bart Van Assche wrote:
>>> Move the code for creating the block layer debugfs root directory into
>>> blk-mq-debugfs.c. This patch prepares for adding more debugfs
>>> initialization code by introducing the function blk_mq_debugfs_init().
>>>
>>> Cc: Christoph Hellwig <hch@lst.de>
>>> Cc: Ming Lei <ming.lei@redhat.com>
>>> Cc: Keith Busch <kbusch@kernel.org>
>>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>>
>> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
> 
> Sorry but actually a neuron triggered after this to remind me of commit
> 85e0cbbb8a ("block: create the request_queue debugfs_dir on
> registration"), and so, using the terminology of that commit, wouldn't
> this now fail to create the root block debugfs dir for request-based block
> drivers?

Hi Luis,

This patch should not change any behavior with CONFIG_DEBUG_FS=y.

As one can see in include/linux/debugfs.h, debugfs_create_dir() does not 
create a directory with CONFIG_DEBUG_FS=n:

static inline struct dentry *debugfs_create_dir(const char *name,
						struct dentry *parent)
{
	return ERR_PTR(-ENODEV);
}

I think the only behavior change introduced by this patch is that 
blk_debugfs_root remains NULL with CONFIG_DEBUG_FS=n instead of being 
set to ERR_PTR(-ENODEV).

Thanks,

Bart.



* Re: [PATCH v4 2/7] block: Support configuring limits below the page size
  2023-01-30 21:26 ` [PATCH v4 2/7] block: Support configuring limits below the page size Bart Van Assche
@ 2023-02-01 23:50   ` Luis Chamberlain
  2023-02-07  0:02     ` Bart Van Assche
  0 siblings, 1 reply; 21+ messages in thread
From: Luis Chamberlain @ 2023-02-01 23:50 UTC (permalink / raw)
  To: Bart Van Assche, Pankaj Raghav
  Cc: Jens Axboe, linux-block, linux-scsi, Jaegeuk Kim, Avri Altman,
	Adrian Hunter, Christoph Hellwig, Ming Lei, Keith Busch

On Mon, Jan 30, 2023 at 01:26:51PM -0800, Bart Van Assche wrote:
> Allow block drivers to configure the following:
> * Maximum number of hardware sectors values smaller than
>   PAGE_SIZE >> SECTOR_SHIFT. With PAGE_SIZE = 4096 this means that values
>   below 8 are supported.
> * A maximum segment size below the page size. This is most useful
>   for page sizes above 4096 bytes.
> 
> The blk_sub_page_segments static branch will be used in later patches to
> prevent that performance of block drivers that support segments >=
> PAGE_SIZE and max_hw_sectors >= PAGE_SIZE >> SECTOR_SHIFT would be affected.

This commit log does not make it clear whether or not there is a
functional change.

> @@ -100,6 +106,55 @@ void blk_queue_bounce_limit(struct request_queue *q, enum blk_bounce bounce)
>  }
>  EXPORT_SYMBOL(blk_queue_bounce_limit);
>  
> +/* For debugfs. */
> +int blk_sub_page_limit_queues_get(void *data, u64 *val)
> +{
> +	*val = READ_ONCE(blk_nr_sub_page_limit_queues);
> +
> +	return 0;
> +}
> +
> +/**
> + * blk_enable_sub_page_limits - enable support for max_segment_size values smaller than PAGE_SIZE and for max_hw_sectors values below PAGE_SIZE >> SECTOR_SHIFT

That's a really long line for kdoc, can't we just trim it?

And why not update the docs to reflect the change?

> @@ -122,12 +177,17 @@ EXPORT_SYMBOL(blk_queue_bounce_limit);
>  void blk_queue_max_hw_sectors(struct request_queue *q, unsigned int max_hw_sectors)
>  {
>  	struct queue_limits *limits = &q->limits;
> +	unsigned int min_max_hw_sectors = PAGE_SIZE >> SECTOR_SHIFT;
>  	unsigned int max_sectors;
>  
> -	if ((max_hw_sectors << 9) < PAGE_SIZE) {
> -		max_hw_sectors = 1 << (PAGE_SHIFT - 9);
> -		printk(KERN_INFO "%s: set to minimum %d\n",
> -		       __func__, max_hw_sectors);
> +	if (max_hw_sectors < min_max_hw_sectors) {
> +		blk_enable_sub_page_limits(limits);
> +		min_max_hw_sectors = 1;
> +	}

Up to this part this is a non-functional update, and so why not a
separate patch to clarify that.

> +
> +	if (max_hw_sectors < min_max_hw_sectors) {
> +		max_hw_sectors = min_max_hw_sectors;
> +		pr_info("%s: set to minimum %u\n", __func__, max_hw_sectors);

But if I understand correctly here we're now changing
max_hw_sectors from 1 to whatever the driver set on
blk_queue_max_hw_sectors() if it's smaller than PAGE_SIZE.

To determine if this is a functional change it begs the
question as to how many block drivers have a max hw sector
smaller than the equivalent PAGE_SIZE and wonder if that
could regress.

>  	}
>  
>  	max_hw_sectors = round_down(max_hw_sectors,
> @@ -282,10 +342,16 @@ EXPORT_SYMBOL_GPL(blk_queue_max_discard_segments);
>   **/
>  void blk_queue_max_segment_size(struct request_queue *q, unsigned int max_size)
>  {
> -	if (max_size < PAGE_SIZE) {
> -		max_size = PAGE_SIZE;
> -		printk(KERN_INFO "%s: set to minimum %d\n",
> -		       __func__, max_size);
> +	unsigned int min_max_segment_size = PAGE_SIZE;
> +
> +	if (max_size < min_max_segment_size) {
> +		blk_enable_sub_page_limits(&q->limits);
> +		min_max_segment_size = SECTOR_SIZE;
> +	}
> +
> +	if (max_size < min_max_segment_size) {
> +		max_size = min_max_segment_size;
> +		pr_info("%s: set to minimum %u\n", __func__, max_size);

And similar thing here.

  Luis


* Re: [PATCH v4 1/7] block: Introduce blk_mq_debugfs_init()
  2023-02-01 22:01       ` Bart Van Assche
@ 2023-02-01 23:59         ` Luis Chamberlain
  2023-02-02  1:06           ` Bart Van Assche
  0 siblings, 1 reply; 21+ messages in thread
From: Luis Chamberlain @ 2023-02-01 23:59 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Pankaj Raghav, Jens Axboe, linux-block, linux-scsi, Jaegeuk Kim,
	Avri Altman, Adrian Hunter, Christoph Hellwig, Ming Lei,
	Keith Busch

On Wed, Feb 01, 2023 at 02:01:18PM -0800, Bart Van Assche wrote:
> On 2/1/23 13:23, Luis Chamberlain wrote:
> > On Wed, Feb 01, 2023 at 12:58:00PM -0800, Luis Chamberlain wrote:
> > > On Mon, Jan 30, 2023 at 01:26:50PM -0800, Bart Van Assche wrote:
> > > > Move the code for creating the block layer debugfs root directory into
> > > > blk-mq-debugfs.c. This patch prepares for adding more debugfs
> > > > initialization code by introducing the function blk_mq_debugfs_init().
> > > > 
> > > > Cc: Christoph Hellwig <hch@lst.de>
> > > > Cc: Ming Lei <ming.lei@redhat.com>
> > > > Cc: Keith Busch <kbusch@kernel.org>
> > > > Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> > > 
> > > Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
> > 
> > Sorry but actually a neuron triggered after this to remind me of commit
> > 85e0cbbb8a ("block: create the request_queue debugfs_dir on
> > registration"), and so, using the terminology of that commit, wouldn't
> > this now fail to create the root block debugfs dir for request-based block
> > drivers?
> 
> Hi Luis,
> 
> This patch should not change any behavior with CONFIG_DEBUG_FS=y.

Ignore CONFIG_DEBUG_FS=y. This is about blktrace which needs it for
both types of drivers.

> As one can see in include/linux/debugfs.h, debugfs_create_dir() does not
> create a directory with CONFIG_DEBUG_FS=n:
> 
> static inline struct dentry *debugfs_create_dir(const char *name,
> 						struct dentry *parent)
> {
> 	return ERR_PTR(-ENODEV);
> }
> 
> I think the only behavior change introduced by this patch is that
> blk_debugfs_root remains NULL with CONFIG_DEBUG_FS=n instead of being set to
> ERR_PTR(-ENODEV).

My point was commit 85e0cbbb8a made blk_debugfs_root non-null always now
when debugfs is enabled for both request-based block drivers and for
make_request block drivers (multiqueue). My reading is that with your
patch blk_debugfs_root will not be created for request-based block
drivers.

  Luis


* Re: [PATCH v4 1/7] block: Introduce blk_mq_debugfs_init()
  2023-02-01 23:59         ` Luis Chamberlain
@ 2023-02-02  1:06           ` Bart Van Assche
  2023-02-06 16:03             ` Luis Chamberlain
  0 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2023-02-02  1:06 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Pankaj Raghav, Jens Axboe, linux-block, linux-scsi, Jaegeuk Kim,
	Avri Altman, Adrian Hunter, Christoph Hellwig, Ming Lei,
	Keith Busch

On 2/1/23 15:59, Luis Chamberlain wrote:
> My point was commit 85e0cbbb8a made blk_debugfs_root non-null always now
> when debugfs is enabled for both request-based block drivers and for
> make_request block drivers (multiqueue). My reading is that with your
> patch blk_debugfs_root will not be created for request-based block
> drivers.

Hi Luis,

The empty version of blk_mq_debugfs_init() in my patch is only selected
if CONFIG_BLK_DEBUG_FS=n. I think that my patch preserves the behavior
that /sys/kernel/debug/block/ is created independently of the type of
request queue that is created. Am I perhaps misunderstanding your feedback?

Thanks,

Bart.


* Re: [PATCH v4 1/7] block: Introduce blk_mq_debugfs_init()
  2023-02-02  1:06           ` Bart Van Assche
@ 2023-02-06 16:03             ` Luis Chamberlain
  2023-02-06 21:11               ` Bart Van Assche
  0 siblings, 1 reply; 21+ messages in thread
From: Luis Chamberlain @ 2023-02-06 16:03 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Pankaj Raghav, Jens Axboe, linux-block, linux-scsi, Jaegeuk Kim,
	Avri Altman, Adrian Hunter, Christoph Hellwig, Ming Lei,
	Keith Busch

On Wed, Feb 01, 2023 at 05:06:22PM -0800, Bart Van Assche wrote:
> On 2/1/23 15:59, Luis Chamberlain wrote:
> > My point was commit 85e0cbbb8a made blk_debugfs_root non-null always now
> > when debugfs is enabled for both request-based block drivers and for
> > make_request block drivers (multiqueue). My reading is that with your
> > patch blk_debugfs_root will not be created for request-based block
> > drivers.
> 
> Hi Luis,
> 
> The empty version of blk_mq_debugfs_init() in my patch is only selected if
> CONFIG_BLK_DEBUG_FS=n. I think that my patch preserves the behavior that
> /sys/kernel/debug/block/ is created independently of the type of request
> queue that is created. Am I perhaps misunderstanding your feedback?

Yes, my point is that prior to this patch the debugfs directory is created
even if you have CONFIG_BLK_DEBUG_FS=n, and I've mentioned the commit
which describes why, but the skinny is blktrace.

  Luis


* Re: [PATCH v4 1/7] block: Introduce blk_mq_debugfs_init()
  2023-02-06 16:03             ` Luis Chamberlain
@ 2023-02-06 21:11               ` Bart Van Assche
  0 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2023-02-06 21:11 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Pankaj Raghav, Jens Axboe, linux-block, linux-scsi, Jaegeuk Kim,
	Avri Altman, Adrian Hunter, Christoph Hellwig, Ming Lei,
	Keith Busch

On 2/6/23 08:03, Luis Chamberlain wrote:
> Yes, my point is that prior to this patch the debugfs directory is created
> even if you have CONFIG_BLK_DEBUG_FS=n, and I've mentioned the commit
> which describes why, but the skinny is blktrace.

I overlooked that the blk-mq-debugfs code is guarded by 
CONFIG_BLK_DEBUG_FS instead of CONFIG_DEBUG_FS. I will fix the issue 
that you reported.

Thanks,

Bart.



* Re: [PATCH v4 2/7] block: Support configuring limits below the page size
  2023-02-01 23:50   ` Luis Chamberlain
@ 2023-02-07  0:02     ` Bart Van Assche
  2023-02-07  0:19       ` Luis Chamberlain
  0 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2023-02-07  0:02 UTC (permalink / raw)
  To: Luis Chamberlain, Pankaj Raghav
  Cc: Jens Axboe, linux-block, linux-scsi, Jaegeuk Kim, Avri Altman,
	Adrian Hunter, Christoph Hellwig, Ming Lei, Keith Busch

On 2/1/23 15:50, Luis Chamberlain wrote:
> On Mon, Jan 30, 2023 at 01:26:51PM -0800, Bart Van Assche wrote:
>> @@ -122,12 +177,17 @@ EXPORT_SYMBOL(blk_queue_bounce_limit);
>>   void blk_queue_max_hw_sectors(struct request_queue *q, unsigned int max_hw_sectors)
>>   {
>>   	struct queue_limits *limits = &q->limits;
>> +	unsigned int min_max_hw_sectors = PAGE_SIZE >> SECTOR_SHIFT;
>>   	unsigned int max_sectors;
>>   
>> -	if ((max_hw_sectors << 9) < PAGE_SIZE) {
>> -		max_hw_sectors = 1 << (PAGE_SHIFT - 9);
>> -		printk(KERN_INFO "%s: set to minimum %d\n",
>> -		       __func__, max_hw_sectors);
>> +	if (max_hw_sectors < min_max_hw_sectors) {
>> +		blk_enable_sub_page_limits(limits);
>> +		min_max_hw_sectors = 1;
>> +	}
> 
> Up to this part this is a non-functional update, and so why not a
> separate patch to clarify that.

Will do.

>> +
>> +	if (max_hw_sectors < min_max_hw_sectors) {
>> +		max_hw_sectors = min_max_hw_sectors;
>> +		pr_info("%s: set to minimum %u\n", __func__, max_hw_sectors);
> 
> But if I understand correctly here we're now changing
> max_hw_sectors from 1 to whatever the driver set on
> > blk_queue_max_hw_sectors() if it's smaller than PAGE_SIZE.
> 
> To determine if this is a functional change it begs the
> question as to how many block drivers have a max hw sector
> smaller than the equivalent PAGE_SIZE and wonder if that
> could regress.

If a block driver passes an argument to blk_queue_max_hw_sectors() or 
blk_queue_max_segment_size() that is smaller than what is supported by 
the block layer, data corruption will be triggered if the block driver 
or the hardware supported by the block driver does not support the 
larger values chosen by the block layer. Changing limits that will 
trigger data corruption into limits that may work seems like an 
improvement to me?

Thanks,

Bart.



* Re: [PATCH v4 2/7] block: Support configuring limits below the page size
  2023-02-07  0:02     ` Bart Van Assche
@ 2023-02-07  0:19       ` Luis Chamberlain
  2023-02-07  0:31         ` Bart Van Assche
  0 siblings, 1 reply; 21+ messages in thread
From: Luis Chamberlain @ 2023-02-07  0:19 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Pankaj Raghav, Jens Axboe, linux-block, linux-scsi, Jaegeuk Kim,
	Avri Altman, Adrian Hunter, Christoph Hellwig, Ming Lei,
	Keith Busch

On Mon, Feb 06, 2023 at 04:02:46PM -0800, Bart Van Assche wrote:
> On 2/1/23 15:50, Luis Chamberlain wrote:
> > On Mon, Jan 30, 2023 at 01:26:51PM -0800, Bart Van Assche wrote:
> > > @@ -122,12 +177,17 @@ EXPORT_SYMBOL(blk_queue_bounce_limit);
> > >   void blk_queue_max_hw_sectors(struct request_queue *q, unsigned int max_hw_sectors)
> > >   {
> > >   	struct queue_limits *limits = &q->limits;
> > > +	unsigned int min_max_hw_sectors = PAGE_SIZE >> SECTOR_SHIFT;
> > >   	unsigned int max_sectors;
> > > -	if ((max_hw_sectors << 9) < PAGE_SIZE) {
> > > -		max_hw_sectors = 1 << (PAGE_SHIFT - 9);
> > > -		printk(KERN_INFO "%s: set to minimum %d\n",
> > > -		       __func__, max_hw_sectors);
> > > +	if (max_hw_sectors < min_max_hw_sectors) {
> > > +		blk_enable_sub_page_limits(limits);
> > > +		min_max_hw_sectors = 1;
> > > +	}
> > 
> > Up to this part this is a non-functional update, and so why not a
> > separate patch to clarify that.
> 
> Will do.
> 
> > > +
> > > +	if (max_hw_sectors < min_max_hw_sectors) {
> > > +		max_hw_sectors = min_max_hw_sectors;
> > > +		pr_info("%s: set to minimum %u\n", __func__, max_hw_sectors);
> > 
> > But if I understand correctly here we're now changing
> > max_hw_sectors from 1 to whatever the driver set on
> > blk_queue_max_hw_sectors() if it's smaller than PAGE_SIZE.
> > 
> > To determine if this is a functional change it begs the
> > question as to how many block drivers have a max hw sector
> > smaller than the equivalent PAGE_SIZE and wonder if that
> > could regress.
> 
> If a block driver passes an argument to blk_queue_max_hw_sectors() or
> blk_queue_max_segment_size() that is smaller than what is supported by the
> block layer, data corruption will be triggered if the block driver or the
> hardware supported by the block driver does not support the larger values
> chosen by the block layer. Changing limits that will trigger data corruption
> into limits that may work seems like an improvement to me?

Wow, clearly! Without a doubt!

But I'm trying to do a careful review here.

The commit log did not describe what *does* happen in these situations today,
and you seem to now be suggesting in the worst case corruption can happen.
That changes the patch context quite a bit!

My question above still stands though, how many block drivers have a max
hw sector smaller than the equivalent PAGE_SIZE. If you make your
change, even if it fixes some new use case where corruption is seen, can
it regress some old use cases for some old controllers?

  Luis


* Re: [PATCH v4 2/7] block: Support configuring limits below the page size
  2023-02-07  0:19       ` Luis Chamberlain
@ 2023-02-07  0:31         ` Bart Van Assche
  2023-02-07  2:08           ` Luis Chamberlain
  0 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2023-02-07  0:31 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Pankaj Raghav, Jens Axboe, linux-block, linux-scsi, Jaegeuk Kim,
	Avri Altman, Adrian Hunter, Christoph Hellwig, Ming Lei,
	Keith Busch

On 2/6/23 16:19, Luis Chamberlain wrote:
> But I'm trying to do a careful review here.

That's appreciated :-)

> The commit log did not describe what *does* happen in these situations today,
> and you seem to now be suggesting in the worst case corruption can happen.
> That changes the patch context quite a bit!
> 
> My question above still stands though, how many block drivers have a max
> hw sector smaller than the equivalent PAGE_SIZE. If you make your
> change, even if it fixes some new use case where corruption is seen, can
> it regress some old use cases for some old controllers?

The blk_queue_max_hw_sectors() change has been requested by a 
contributor to the MMC driver (I'm not familiar with the MMC driver).

I'm not aware of any storage controllers for which the maximum segment 
size is below 4 KiB.

For some storage controllers, e.g. the UFS Exynos controller, the 
maximum supported segment size is 4 KiB. This patch series makes such 
storage controllers compatible with larger page sizes, e.g. 16 KiB.

Does this answer your question?

Thanks,

Bart.



* Re: [PATCH v4 2/7] block: Support configuring limits below the page size
  2023-02-07  0:31         ` Bart Van Assche
@ 2023-02-07  2:08           ` Luis Chamberlain
  2023-02-07 18:26             ` Bart Van Assche
  0 siblings, 1 reply; 21+ messages in thread
From: Luis Chamberlain @ 2023-02-07  2:08 UTC (permalink / raw)
  To: Bart Van Assche, Javier González
  Cc: Pankaj Raghav, Jens Axboe, linux-block, linux-scsi, Jaegeuk Kim,
	Avri Altman, Adrian Hunter, Christoph Hellwig, Ming Lei,
	Keith Busch

On Mon, Feb 06, 2023 at 04:31:58PM -0800, Bart Van Assche wrote:
> On 2/6/23 16:19, Luis Chamberlain wrote:
> > But I'm trying to do a careful review here.
> 
> That's appreciated :-)
> 
> > The commit log did not describe what *does* happen in these situations today,
> > and you seem to now be suggesting in the worst case corruption can happen.
> > That changes the patch context quite a bit!
> > 
> > My question above still stands though, how many block drivers have a max
> > hw sector smaller than the equivalent PAGE_SIZE. If you make your
> > change, even if it fixes some new use case where corruption is seen, can
> > it regress some old use cases for some old controllers?
> 
> The blk_queue_max_hw_sectors() change has been requested by a contributor to
> the MMC driver (I'm not familiar with the MMC driver).
> 
> I'm not aware of any storage controllers for which the maximum segment size
> is below 4 KiB.

Then the commit log should mention that. Do you admit that it
could possibly change their behaviour?

> For some storage controllers, e.g. the UFS Exynos controller, the maximum
> supported segment size is 4 KiB. This patch series makes such storage
> controllers compatible with larger page sizes, e.g. 16 KiB.
> 
> Does this answer your question?

Does mine answer the reason why I am asking it? If we are sure these
don't exist then please mention it in the commit log. And, more
importantly, mention the possible corruption issue you describe which could
happen! Was corruption actually observed in real life or reported?

  Luis


* Re: [PATCH v4 2/7] block: Support configuring limits below the page size
  2023-02-07  2:08           ` Luis Chamberlain
@ 2023-02-07 18:26             ` Bart Van Assche
  0 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2023-02-07 18:26 UTC (permalink / raw)
  To: Luis Chamberlain, Javier González
  Cc: Pankaj Raghav, Jens Axboe, linux-block, linux-scsi, Jaegeuk Kim,
	Avri Altman, Adrian Hunter, Christoph Hellwig, Ming Lei,
	Keith Busch

On 2/6/23 18:08, Luis Chamberlain wrote:
> On Mon, Feb 06, 2023 at 04:31:58PM -0800, Bart Van Assche wrote:
>> On 2/6/23 16:19, Luis Chamberlain wrote:
>>> But I'm trying to do a careful review here.
>>
>> That's appreciated :-)
>>
>>> The commit log did not describe what *does* happen in these situations today,
>>> and you seem to now be suggesting in the worst case corruption can happen.
>>> That changes the patch context quite a bit!
>>>
>>> My question above still stands though, how many block drivers have a max
>>> hw sector smaller than the equivalent PAGE_SIZE. If you make your
>>> change, even if it fixes some new use case where corruption is seen, can
>>> it regress some old use cases for some old controllers?
>>
>> The blk_queue_max_hw_sectors() change has been requested by a contributor to
>> the MMC driver (I'm not familiar with the MMC driver).
>>
>> I'm not aware of any storage controllers for which the maximum segment size
>> is below 4 KiB.
> 
> Then the commit log should mention that. Because do you admit that it
> could possible change their behaviour?

I will make the commit message more detailed.

>> For some storage controllers, e.g. the UFS Exynos controller, the maximum
>> supported segment size is 4 KiB. This patch series makes such storage
>> controllers compatible with larger page sizes, e.g. 16 KiB.
>>
>> Does this answer your question?
> 
> Does mine answer the reason to why I am asking it? If we are sure these
> don't exist then please mention it in the commit log. And also more
> importantly the possible corruption issue you describe which could
> happen! Was a corruption actually observed in real life or reported!?

Incorrect data transfers have been observed in our tests. With the
Exynos controller and PAGE_SIZE = 16 KiB, we noticed that booting
fails without this patch series. I think booting failed because the DMA
engine in this controller was asked to perform transfers that fall
outside the supported limits. I expect similar behavior for all other
storage controllers with a DMA engine.

Bart.




Thread overview: 21+ messages
2023-01-30 21:26 [PATCH v4 0/7] Add support for limits below the page size Bart Van Assche
2023-01-30 21:26 ` [PATCH v4 1/7] block: Introduce blk_mq_debugfs_init() Bart Van Assche
2023-02-01 20:58   ` Luis Chamberlain
2023-02-01 21:23     ` Luis Chamberlain
2023-02-01 22:01       ` Bart Van Assche
2023-02-01 23:59         ` Luis Chamberlain
2023-02-02  1:06           ` Bart Van Assche
2023-02-06 16:03             ` Luis Chamberlain
2023-02-06 21:11               ` Bart Van Assche
2023-01-30 21:26 ` [PATCH v4 2/7] block: Support configuring limits below the page size Bart Van Assche
2023-02-01 23:50   ` Luis Chamberlain
2023-02-07  0:02     ` Bart Van Assche
2023-02-07  0:19       ` Luis Chamberlain
2023-02-07  0:31         ` Bart Van Assche
2023-02-07  2:08           ` Luis Chamberlain
2023-02-07 18:26             ` Bart Van Assche
2023-01-30 21:26 ` [PATCH v4 3/7] block: Support submitting passthrough requests with small segments Bart Van Assche
2023-01-30 21:26 ` [PATCH v4 4/7] block: Add support for filesystem requests and " Bart Van Assche
2023-01-30 21:26 ` [PATCH v4 5/7] block: Add support for small segments in blk_rq_map_user_iov() Bart Van Assche
2023-01-30 21:26 ` [PATCH v4 6/7] scsi_debug: Support configuring the maximum segment size Bart Van Assche
2023-01-30 21:26 ` [PATCH v4 7/7] null_blk: " Bart Van Assche
