linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/7] Elevator cleanups and improvements
@ 2019-08-23  0:15 Damien Le Moal
  2019-08-23  0:15 ` [PATCH 1/7] block: Cleanup elevator_init_mq() use Damien Le Moal
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Damien Le Moal @ 2019-08-23  0:15 UTC (permalink / raw)
  To: linux-block, Jens Axboe

This patch series implements some cleanup of the elevator initialization
code and introduces elevator features identification and device matching
to enhance checks for elevator/device compatibility and fitness.

The first 3 patches of the series are simple cleanups which simplify and
streamline elevator initialization for newly allocated device queues.

The following two patches introduce elevator features control which
allow defining features and function that an elevator supports and match
these against features required by a block device driver. With this, the
sysfs elevator list for a device always only shows elevators matching
the features that a particular device requires, with the exception of
the none elevator which has no features but is always available for use
with any device.

The first feature defined is for zoned block device write order control
through zone write locking which prevents the use of any scheduler that
does not support this feature with zoned devices.

The last 2 patches of this series rework the default elevator selection
and initialization to allow for the feature matching to work,
simultaneously addressing cases not currently well supported, namely,
multi-queue zoned block devices.

Damien Le Moal (7):
  block: Cleanup elevator_init_mq() use
  block: Change elevator_init_mq() to always succeed
  block: Remove sysfs lock from elevator_init_rq()
  block: Introduce elevator features
  block: Introduce zoned block device elevator feature
  block: Improve default elevator selection
  block: Delay default elevator initialization

 block/blk-mq.c                |  10 ---
 block/blk-settings.c          |  15 ++++
 block/blk.h                   |   2 +-
 block/elevator.c              | 144 +++++++++++++++++++++++++---------
 block/genhd.c                 |   3 +
 block/mq-deadline.c           |   1 +
 drivers/block/null_blk_main.c |   5 ++
 drivers/scsi/sd_zbc.c         |   2 +
 include/linux/blkdev.h        |   4 +
 include/linux/elevator.h      |   8 ++
 10 files changed, 147 insertions(+), 47 deletions(-)

-- 
2.21.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/7] block: Cleanup elevator_init_mq() use
  2019-08-23  0:15 [PATCH 0/7] Elevator cleanups and improvements Damien Le Moal
@ 2019-08-23  0:15 ` Damien Le Moal
  2019-08-23  8:08   ` Johannes Thumshirn
  2019-08-23  0:15 ` [PATCH 2/7] block: Change elevator_init_mq() to always succeed Damien Le Moal
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Damien Le Moal @ 2019-08-23  0:15 UTC (permalink / raw)
  To: linux-block, Jens Axboe

Instead of checking a queue tag_set BLK_MQ_F_NO_SCHED flag before
calling elevator_init_mq() to make sure that the queue supports IO
scheduling, use the elevator.c function elv_support_iosched() in
elevator_init_mq(). This does not introduce any functional change but
ensure that elevator_init_mq() does the right thing based on the queue
settings.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/blk-mq.c   |  8 +++-----
 block/elevator.c | 23 +++++++++++++----------
 2 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 509f69fdfcf2..556c774a0f0d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2908,11 +2908,9 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	blk_mq_add_queue_tag_set(set, q);
 	blk_mq_map_swqueue(q);
 
-	if (!(set->flags & BLK_MQ_F_NO_SCHED)) {
-		ret = elevator_init_mq(q);
-		if (ret)
-			goto err_tag_set;
-	}
+	ret = elevator_init_mq(q);
+	if (ret)
+		goto err_tag_set;
 
 	return q;
 
diff --git a/block/elevator.c b/block/elevator.c
index 2f17d66d0e61..1ed2710f1950 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -594,16 +594,26 @@ int elevator_switch_mq(struct request_queue *q,
 	return ret;
 }
 
+static inline bool elv_support_iosched(struct request_queue *q)
+{
+	if (q->tag_set && (q->tag_set->flags & BLK_MQ_F_NO_SCHED))
+		return false;
+	return true;
+}
+
 /*
- * For blk-mq devices, we default to using mq-deadline, if available, for single
- * queue devices.  If deadline isn't available OR we have multiple queues,
- * default to "none".
+ * For blk-mq devices supporting IO scheduling, we default to using mq-deadline,
+ * if available, for single queue devices. If deadline isn't available OR we
+ * have multiple queues, default to "none".
  */
 int elevator_init_mq(struct request_queue *q)
 {
 	struct elevator_type *e;
 	int err = 0;
 
+	if (!elv_support_iosched(q))
+		return 0;
+
 	if (q->nr_hw_queues != 1)
 		return 0;
 
@@ -685,13 +695,6 @@ static int __elevator_change(struct request_queue *q, const char *name)
 	return elevator_switch(q, e);
 }
 
-static inline bool elv_support_iosched(struct request_queue *q)
-{
-	if (q->tag_set && (q->tag_set->flags & BLK_MQ_F_NO_SCHED))
-		return false;
-	return true;
-}
-
 ssize_t elv_iosched_store(struct request_queue *q, const char *name,
 			  size_t count)
 {
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/7] block: Change elevator_init_mq() to always succeed
  2019-08-23  0:15 [PATCH 0/7] Elevator cleanups and improvements Damien Le Moal
  2019-08-23  0:15 ` [PATCH 1/7] block: Cleanup elevator_init_mq() use Damien Le Moal
@ 2019-08-23  0:15 ` Damien Le Moal
  2019-08-23  8:54   ` Johannes Thumshirn
  2019-08-23  0:15 ` [PATCH 3/7] block: Remove sysfs lock from elevator_init_rq() Damien Le Moal
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Damien Le Moal @ 2019-08-23  0:15 UTC (permalink / raw)
  To: linux-block, Jens Axboe

If the default elevator chosen is mq-deadline, elevator_init_mq() may
return an error if the mq-deadline initialization fails, leading to
blk_mq_init_allocated_queue() returning an error, which in turn will
cause the block device initialization to fail.

Instead of taking such extreme measure, handle mq-deadline
initialization failures in the same manner as if mq-deadline being not
available (no module to load), that is, default to the "none" scheduler.
With this change, elevator_init_mq() return type can be changed to void.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/blk-mq.c   |  8 +-------
 block/blk.h      |  2 +-
 block/elevator.c | 17 ++++++++++-------
 3 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 556c774a0f0d..274e168c8535 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2846,8 +2846,6 @@ static unsigned int nr_hw_queues(struct blk_mq_tag_set *set)
 struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 						  struct request_queue *q)
 {
-	int ret = -ENOMEM;
-
 	/* mark the queue as mq asap */
 	q->mq_ops = set->ops;
 
@@ -2908,14 +2906,10 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	blk_mq_add_queue_tag_set(set, q);
 	blk_mq_map_swqueue(q);
 
-	ret = elevator_init_mq(q);
-	if (ret)
-		goto err_tag_set;
+	elevator_init_mq(q);
 
 	return q;
 
-err_tag_set:
-	blk_mq_del_queue_tag_set(q);
 err_hctxs:
 	kfree(q->queue_hw_ctx);
 	q->nr_hw_queues = 0;
diff --git a/block/blk.h b/block/blk.h
index de6b2e146d6e..ddb292bb6caf 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -184,7 +184,7 @@ void blk_account_io_done(struct request *req, u64 now);
 
 void blk_insert_flush(struct request *rq);
 
-int elevator_init_mq(struct request_queue *q);
+void elevator_init_mq(struct request_queue *q);
 int elevator_switch_mq(struct request_queue *q,
 			      struct elevator_type *new_e);
 void __elevator_exit(struct request_queue *, struct elevator_queue *);
diff --git a/block/elevator.c b/block/elevator.c
index 1ed2710f1950..7fff06751633 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -603,19 +603,19 @@ static inline bool elv_support_iosched(struct request_queue *q)
 
 /*
  * For blk-mq devices supporting IO scheduling, we default to using mq-deadline,
- * if available, for single queue devices. If deadline isn't available OR we
- * have multiple queues, default to "none".
+ * if available, for single queue devices. If deadline isn't available OR
+ * deadline initialization fails OR we have multiple queues, default to "none".
  */
-int elevator_init_mq(struct request_queue *q)
+void elevator_init_mq(struct request_queue *q)
 {
 	struct elevator_type *e;
 	int err = 0;
 
 	if (!elv_support_iosched(q))
-		return 0;
+		return;
 
 	if (q->nr_hw_queues != 1)
-		return 0;
+		return;
 
 	/*
 	 * q->sysfs_lock must be held to provide mutual exclusion between
@@ -630,11 +630,14 @@ int elevator_init_mq(struct request_queue *q)
 		goto out_unlock;
 
 	err = blk_mq_init_sched(q, e);
-	if (err)
+	if (err) {
+		pr_warn("\"%s\" elevator initialization failed, "
+			"falling back to \"none\"\n", e->elevator_name);
 		elevator_put(e);
+	}
+
 out_unlock:
 	mutex_unlock(&q->sysfs_lock);
-	return err;
 }
 
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/7] block: Remove sysfs lock from elevator_init_rq()
  2019-08-23  0:15 [PATCH 0/7] Elevator cleanups and improvements Damien Le Moal
  2019-08-23  0:15 ` [PATCH 1/7] block: Cleanup elevator_init_mq() use Damien Le Moal
  2019-08-23  0:15 ` [PATCH 2/7] block: Change elevator_init_mq() to always succeed Damien Le Moal
@ 2019-08-23  0:15 ` Damien Le Moal
  2019-08-23  9:00   ` Johannes Thumshirn
  2019-08-23 20:32   ` Bart Van Assche
  2019-08-23  0:15 ` [PATCH 4/7] block: Introduce elevator features Damien Le Moal
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 14+ messages in thread
From: Damien Le Moal @ 2019-08-23  0:15 UTC (permalink / raw)
  To: linux-block, Jens Axboe

Since elevator_init_rq() is called before the device queue is registered
in sysfs, there is no possible conflict with elevator_switch(). Remove
the unnecessary locking of q->sysfs_lock mutex.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/elevator.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/block/elevator.c b/block/elevator.c
index 7fff06751633..6208ddc334ef 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -617,17 +617,12 @@ void elevator_init_mq(struct request_queue *q)
 	if (q->nr_hw_queues != 1)
 		return;
 
-	/*
-	 * q->sysfs_lock must be held to provide mutual exclusion between
-	 * elevator_switch() and here.
-	 */
-	mutex_lock(&q->sysfs_lock);
 	if (unlikely(q->elevator))
-		goto out_unlock;
+		return;
 
 	e = elevator_get(q, "mq-deadline", false);
 	if (!e)
-		goto out_unlock;
+		return;
 
 	err = blk_mq_init_sched(q, e);
 	if (err) {
@@ -635,9 +630,6 @@ void elevator_init_mq(struct request_queue *q)
 			"falling back to \"none\"\n", e->elevator_name);
 		elevator_put(e);
 	}
-
-out_unlock:
-	mutex_unlock(&q->sysfs_lock);
 }
 
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/7] block: Introduce elevator features
  2019-08-23  0:15 [PATCH 0/7] Elevator cleanups and improvements Damien Le Moal
                   ` (2 preceding siblings ...)
  2019-08-23  0:15 ` [PATCH 3/7] block: Remove sysfs lock from elevator_init_rq() Damien Le Moal
@ 2019-08-23  0:15 ` Damien Le Moal
  2019-08-23 11:19   ` Johannes Thumshirn
  2019-08-23  0:15 ` [PATCH 5/7] block: Introduce zoned block device elevator feature Damien Le Moal
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Damien Le Moal @ 2019-08-23  0:15 UTC (permalink / raw)
  To: linux-block, Jens Axboe

Introduce the elevator_features bitfield in the elevator_type structure
to allow an elevator to advertize special features that it supports.
Examples of such features are IO priorities, write hints, and zoned
block devices sequential write constraint support (aka zone write
locking).

The required_elevator_features field is added to the request_queue
structure to allow a device driver to specify features that an elevator
must support for the correct operation of the device. An example of a
typical required elevator feature is zone write locking for zoned block
devices. The helper function blk_queue_required_elevator_features() is
defined for setting this new new field.

With these two new information in place, the elevator functions
elevator_match() and elevator_find() are modified to allow a user to set
only an elevator with a set of features that satisfies the device
requirements. Elevators not matching the device requirements are not
listed in the device sysfs queue/scheduler file to prevent their use.

The "none" elevator can always be selected as before.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/blk-settings.c     | 15 ++++++++++++
 block/elevator.c         | 49 +++++++++++++++++++++++++++++++---------
 include/linux/blkdev.h   |  4 ++++
 include/linux/elevator.h |  1 +
 4 files changed, 58 insertions(+), 11 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 2c1831207a8f..16471bf18810 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -832,6 +832,21 @@ void blk_queue_write_cache(struct request_queue *q, bool wc, bool fua)
 }
 EXPORT_SYMBOL_GPL(blk_queue_write_cache);
 
+/**
+ * blk_queue_required_elevator_features - Set queue's elevator required features
+ * @q:		the request queue for the device
+ * @features:	Required elevator features OR'ed together
+ *
+ * Tell the block layer that the device controlled through @q that the
+ * queue elevator must support all the features specified by @features.
+ */
+void blk_queue_required_elevator_features(struct request_queue *q,
+					  unsigned long features)
+{
+	q->required_elevator_features = features;
+}
+EXPORT_SYMBOL_GPL(blk_queue_required_elevator_features);
+
 static int __init blk_settings_init(void)
 {
 	blk_max_low_pfn = max_low_pfn - 1;
diff --git a/block/elevator.c b/block/elevator.c
index 6208ddc334ef..de11beb32893 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -83,8 +83,26 @@ bool elv_bio_merge_ok(struct request *rq, struct bio *bio)
 }
 EXPORT_SYMBOL(elv_bio_merge_ok);
 
-static bool elevator_match(const struct elevator_type *e, const char *name)
+static inline bool elv_support_features(unsigned long elv_features,
+					unsigned long required_features)
 {
+	return (required_features & elv_features) == required_features;
+}
+
+/**
+ * elevator_match - Test an elevator name and features
+ * @e: Scheduler to test
+ * @name: Elevator name to test
+ * @required_features: Features that the elevator must provide
+ *
+ * Return true is the elevator @e name matches @name and if @e provides all the
+ * the feratures spcified by @required_features.
+ */
+static bool elevator_match(const struct elevator_type *e, const char *name,
+			   unsigned long required_features)
+{
+	if (!elv_support_features(e->elevator_features, required_features))
+		return false;
 	if (!strcmp(e->elevator_name, name))
 		return true;
 	if (e->elevator_alias && !strcmp(e->elevator_alias, name))
@@ -93,15 +111,21 @@ static bool elevator_match(const struct elevator_type *e, const char *name)
 	return false;
 }
 
-/*
- * Return scheduler with name 'name'
+/**
+ * elevator_find - Find an elevator
+ * @name: Name of the elevator to find
+ * @required_features: Features that the elevator must provide
+ *
+ * Return the first registered scheduler with name @name and supporting the
+ * features @required_features and NULL otherwise.
  */
-static struct elevator_type *elevator_find(const char *name)
+static struct elevator_type *elevator_find(const char *name,
+					   unsigned long required_features)
 {
 	struct elevator_type *e;
 
 	list_for_each_entry(e, &elv_list, list) {
-		if (elevator_match(e, name))
+		if (elevator_match(e, name, required_features))
 			return e;
 	}
 
@@ -120,12 +144,12 @@ static struct elevator_type *elevator_get(struct request_queue *q,
 
 	spin_lock(&elv_list_lock);
 
-	e = elevator_find(name);
+	e = elevator_find(name, q->required_elevator_features);
 	if (!e && try_loading) {
 		spin_unlock(&elv_list_lock);
 		request_module("%s-iosched", name);
 		spin_lock(&elv_list_lock);
-		e = elevator_find(name);
+		e = elevator_find(name, q->required_elevator_features);
 	}
 
 	if (e && !try_module_get(e->elevator_owner))
@@ -526,7 +550,7 @@ int elv_register(struct elevator_type *e)
 
 	/* register, don't allow duplicate names */
 	spin_lock(&elv_list_lock);
-	if (elevator_find(e->elevator_name)) {
+	if (elevator_find(e->elevator_name, 0)) {
 		spin_unlock(&elv_list_lock);
 		kmem_cache_destroy(e->icq_cache);
 		return -EBUSY;
@@ -682,7 +706,8 @@ static int __elevator_change(struct request_queue *q, const char *name)
 	if (!e)
 		return -EINVAL;
 
-	if (q->elevator && elevator_match(q->elevator->type, elevator_name)) {
+	if (q->elevator &&
+	    elevator_match(q->elevator->type, elevator_name, 0)) {
 		elevator_put(e);
 		return 0;
 	}
@@ -722,11 +747,13 @@ ssize_t elv_iosched_show(struct request_queue *q, char *name)
 
 	spin_lock(&elv_list_lock);
 	list_for_each_entry(__e, &elv_list, list) {
-		if (elv && elevator_match(elv, __e->elevator_name)) {
+		if (elv && elevator_match(elv, __e->elevator_name, 0)) {
 			len += sprintf(name+len, "[%s] ", elv->elevator_name);
 			continue;
 		}
-		if (elv_support_iosched(q))
+		if (elv_support_iosched(q) &&
+		    elevator_match(__e, __e->elevator_name,
+				   q->required_elevator_features))
 			len += sprintf(name+len, "%s ", __e->elevator_name);
 	}
 	spin_unlock(&elv_list_lock);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 4798bb25f1ee..1e173273511e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -492,6 +492,8 @@ struct request_queue {
 
 	struct queue_limits	limits;
 
+	unsigned long		required_elevator_features;
+
 #ifdef CONFIG_BLK_DEV_ZONED
 	/*
 	 * Zoned block device information for request dispatch control.
@@ -1084,6 +1086,8 @@ extern void blk_queue_dma_alignment(struct request_queue *, int);
 extern void blk_queue_update_dma_alignment(struct request_queue *, int);
 extern void blk_queue_rq_timeout(struct request_queue *, unsigned int);
 extern void blk_queue_write_cache(struct request_queue *q, bool enabled, bool fua);
+extern void blk_queue_required_elevator_features(struct request_queue *q,
+						 unsigned long features);
 
 /*
  * Number of physical segments as sent to the device.
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index 1dd014c9c87b..a99ca9979d71 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -76,6 +76,7 @@ struct elevator_type
 	struct elv_fs_entry *elevator_attrs;
 	const char *elevator_name;
 	const char *elevator_alias;
+	const unsigned long elevator_features;
 	struct module *elevator_owner;
 #ifdef CONFIG_BLK_DEBUG_FS
 	const struct blk_mq_debugfs_attr *queue_debugfs_attrs;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/7] block: Introduce zoned block device elevator feature
  2019-08-23  0:15 [PATCH 0/7] Elevator cleanups and improvements Damien Le Moal
                   ` (3 preceding siblings ...)
  2019-08-23  0:15 ` [PATCH 4/7] block: Introduce elevator features Damien Le Moal
@ 2019-08-23  0:15 ` Damien Le Moal
  2019-08-23 11:41   ` Johannes Thumshirn
  2019-08-23  0:15 ` [PATCH 6/7] block: Improve default elevator selection Damien Le Moal
  2019-08-23  0:15 ` [PATCH 7/7] block: Delay default elevator initialization Damien Le Moal
  6 siblings, 1 reply; 14+ messages in thread
From: Damien Le Moal @ 2019-08-23  0:15 UTC (permalink / raw)
  To: linux-block, Jens Axboe

Introduce the elevator feature ELEVATOR_F_ZONED_BLOCK_DEV to indicate
that an elevator supports zoned block device write ordering control.

Mark the mq-deadline as supporting this feature which is implemented
using zone write locking. SCSI zoned block device scan and null_blk
device creation with zoned mode enabled are also modified to require
this feature using the helper blk_queue_required_elevator_features().

This requirement can always be satisfied as the mq-deadline scheduler is
always selected for in-kernel compilation when CONFIG_BLK_DEV_ZONED
(zoned block device support) is enabled.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/mq-deadline.c           | 1 +
 drivers/block/null_blk_main.c | 5 +++++
 drivers/scsi/sd_zbc.c         | 2 ++
 include/linux/elevator.h      | 7 +++++++
 4 files changed, 15 insertions(+)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 2a2a2e82832e..95e03408f2ac 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -795,6 +795,7 @@ static struct elevator_type mq_deadline = {
 	.elevator_attrs = deadline_attrs,
 	.elevator_name = "mq-deadline",
 	.elevator_alias = "deadline",
+	.elevator_features = ELEVATOR_F_ZONED_BLOCK_DEV,
 	.elevator_owner = THIS_MODULE,
 };
 MODULE_ALIAS("mq-deadline-iosched");
diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c
index 99c56d72ff78..253bb7b4443e 100644
--- a/drivers/block/null_blk_main.c
+++ b/drivers/block/null_blk_main.c
@@ -1538,6 +1538,9 @@ static int null_gendisk_register(struct nullb *nullb)
 
 		if (ret != 0)
 			return ret;
+
+		blk_queue_required_elevator_features(disk->queue,
+						ELEVATOR_F_ZONED_BLOCK_DEV);
 	}
 
 	add_disk(disk);
@@ -1691,6 +1694,8 @@ static int null_add_dev(struct nullb_device *dev)
 		blk_queue_chunk_sectors(nullb->q, dev->zone_size_sects);
 		nullb->q->limits.zoned = BLK_ZONED_HM;
 		blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, nullb->q);
+		blk_queue_required_elevator_features(nullb->q,
+						ELEVATOR_F_ZONED_BLOCK_DEV);
 	}
 
 	nullb->q->queuedata = nullb;
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 367614f0e34f..983f5d0be902 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -490,6 +490,8 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
 		goto err;
 
 	/* The drive satisfies the kernel restrictions: set it up */
+	blk_queue_required_elevator_features(sdkp->disk->queue,
+					     ELEVATOR_F_ZONED_BLOCK_DEV);
 	blk_queue_chunk_sectors(sdkp->disk->queue,
 			logical_to_sectors(sdkp->device, zone_blocks));
 	blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, sdkp->disk->queue);
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index a99ca9979d71..2b667cb23fc0 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -166,5 +166,12 @@ extern struct request *elv_rb_find(struct rb_root *, sector_t);
 #define rq_entry_fifo(ptr)	list_entry((ptr), struct request, queuelist)
 #define rq_fifo_clear(rq)	list_del_init(&(rq)->queuelist)
 
+/*
+ * Elevator features.
+ */
+
+/* Supports zoned block devices sequential write */
+#define ELEVATOR_F_ZONED_BLOCK_DEV	(1UL << 0)
+
 #endif /* CONFIG_BLOCK */
 #endif
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 6/7] block: Improve default elevator selection
  2019-08-23  0:15 [PATCH 0/7] Elevator cleanups and improvements Damien Le Moal
                   ` (4 preceding siblings ...)
  2019-08-23  0:15 ` [PATCH 5/7] block: Introduce zoned block device elevator feature Damien Le Moal
@ 2019-08-23  0:15 ` Damien Le Moal
  2019-08-23  0:15 ` [PATCH 7/7] block: Delay default elevator initialization Damien Le Moal
  6 siblings, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2019-08-23  0:15 UTC (permalink / raw)
  To: linux-block, Jens Axboe

For block devices that do not specify required features, preserve the
current default elevator selection (mq-deadline for single queue
devices, none for multi-queue devices). For devices specifying a
required feature (e.g. zoned block devices ELEVATOR_F_ZONED_BLOCK_DEV
feature), select the first available elevator providing the required
features. In all cases, default to "none" if no elevator is available or
if the initialization of the default elevator fails.

This change allows in particular the selection of the correct
mq-deadline elevator for zoned block devices that are also multi-queue,
such as null_blk devices with zoned mode enabled or any SMR disk
connected to a smartpqi HBA (and exposed as multi-queue devices by the
HBA).

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/elevator.c | 50 +++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 45 insertions(+), 5 deletions(-)

diff --git a/block/elevator.c b/block/elevator.c
index de11beb32893..ec75dfee7e96 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -626,14 +626,51 @@ static inline bool elv_support_iosched(struct request_queue *q)
 }
 
 /*
- * For blk-mq devices supporting IO scheduling, we default to using mq-deadline,
- * if available, for single queue devices. If deadline isn't available OR
- * deadline initialization fails OR we have multiple queues, default to "none".
+ * For single queue devices, default to using mq-deadline. If we have multiple
+ * queues or mq-deadline is not available, default to "none".
+ */
+static struct elevator_type *elevator_get_default(struct request_queue *q)
+{
+	if (q->nr_hw_queues != 1)
+		return NULL;
+
+	return elevator_get(q, "mq-deadline", false);
+}
+
+/*
+ * Get the first elevator providing the features required by the request queue.
+ * Default to "none" if no matching elevator is found.
+ */
+static struct elevator_type *elevator_get_by_features(struct request_queue *q)
+{
+	struct elevator_type *e;
+
+	spin_lock(&elv_list_lock);
+
+	list_for_each_entry(e, &elv_list, list) {
+		if (elv_support_features(e->elevator_features,
+					 q->required_elevator_features))
+			break;
+	}
+
+	if (e && !try_module_get(e->elevator_owner))
+		e = NULL;
+
+	spin_unlock(&elv_list_lock);
+
+	return e;
+}
+
+/*
+ * For a device queue that has no required features, use the default elevator
+ * settings. Otherwise, use the first elevator available matching the required
+ * features. If no suitable elevator is find or if the chosen elevator
+ * initialization fails, fall back to the "none" elevator (no elevator).
  */
 void elevator_init_mq(struct request_queue *q)
 {
 	struct elevator_type *e;
-	int err = 0;
+	int err;
 
 	if (!elv_support_iosched(q))
 		return;
@@ -644,7 +681,10 @@ void elevator_init_mq(struct request_queue *q)
 	if (unlikely(q->elevator))
 		return;
 
-	e = elevator_get(q, "mq-deadline", false);
+	if (!q->required_elevator_features)
+		e = elevator_get_default(q);
+	else
+		e = elevator_get_by_features(q);
 	if (!e)
 		return;
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 7/7] block: Delay default elevator initialization
  2019-08-23  0:15 [PATCH 0/7] Elevator cleanups and improvements Damien Le Moal
                   ` (5 preceding siblings ...)
  2019-08-23  0:15 ` [PATCH 6/7] block: Improve default elevator selection Damien Le Moal
@ 2019-08-23  0:15 ` Damien Le Moal
  6 siblings, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2019-08-23  0:15 UTC (permalink / raw)
  To: linux-block, Jens Axboe

When elevator_init_mq() is called from blk_mq_init_allocated_queue(),
the only information known about the device is the number of hardware
queues as the block device scan by the device driver is not completed
yet. The device type and the device required features are not set yet,
preventing to correctly choose the default elevator most suitable for
the device.

This currently affects all multi-queue zoned block devices which default
to the "none" elevator instead of the required "mq-deadline" elevator.
These drives currently include host-managed SMR disks connected to a
smartpqi HBA and null_blk block devices with zoned mode enabled.
Upcoming NVMe Zoned Namespace devices will also be affected.

Fix this by moving the execution of elevator_init_mq() from
blk_mq_init_allocated_queue() into __device_add_disk() to allow for the
device driver to probe the device characteristics and set attributes
of the device request queue prior to the elevator initialization.

Also to make sure that the elevator initialization is never done while
requests are in-flight (there should be none when the device driver
calls device_add_disk()), freeze and quiesce the device request queue
before executing blk_mq_init_sched().

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/blk-mq.c   | 2 --
 block/elevator.c | 7 +++++++
 block/genhd.c    | 3 +++
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 274e168c8535..34e9541945dc 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2906,8 +2906,6 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	blk_mq_add_queue_tag_set(set, q);
 	blk_mq_map_swqueue(q);
 
-	elevator_init_mq(q);
-
 	return q;
 
 err_hctxs:
diff --git a/block/elevator.c b/block/elevator.c
index ec75dfee7e96..9218bc86845f 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -688,7 +688,14 @@ void elevator_init_mq(struct request_queue *q)
 	if (!e)
 		return;
 
+	blk_mq_freeze_queue(q);
+	blk_mq_quiesce_queue(q);
+
 	err = blk_mq_init_sched(q, e);
+
+	blk_mq_unquiesce_queue(q);
+	blk_mq_unfreeze_queue(q);
+
 	if (err) {
 		pr_warn("\"%s\" elevator initialization failed, "
 			"falling back to \"none\"\n", e->elevator_name);
diff --git a/block/genhd.c b/block/genhd.c
index 54f1f0d381f4..d2114c25dccd 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -734,6 +734,9 @@ static void __device_add_disk(struct device *parent, struct gendisk *disk,
 				    exact_match, exact_lock, disk);
 	}
 	register_disk(parent, disk, groups);
+
+	elevator_init_mq(disk->queue);
+
 	if (register_queue)
 		blk_register_queue(disk);
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/7] block: Cleanup elevator_init_mq() use
  2019-08-23  0:15 ` [PATCH 1/7] block: Cleanup elevator_init_mq() use Damien Le Moal
@ 2019-08-23  8:08   ` Johannes Thumshirn
  0 siblings, 0 replies; 14+ messages in thread
From: Johannes Thumshirn @ 2019-08-23  8:08 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/7] block: Change elevator_init_mq() to always succeed
  2019-08-23  0:15 ` [PATCH 2/7] block: Change elevator_init_mq() to always succeed Damien Le Moal
@ 2019-08-23  8:54   ` Johannes Thumshirn
  0 siblings, 0 replies; 14+ messages in thread
From: Johannes Thumshirn @ 2019-08-23  8:54 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/7] block: Remove sysfs lock from elevator_init_rq()
  2019-08-23  0:15 ` [PATCH 3/7] block: Remove sysfs lock from elevator_init_rq() Damien Le Moal
@ 2019-08-23  9:00   ` Johannes Thumshirn
  2019-08-23 20:32   ` Bart Van Assche
  1 sibling, 0 replies; 14+ messages in thread
From: Johannes Thumshirn @ 2019-08-23  9:00 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe

On 23/08/2019 02:15, Damien Le Moal wrote:
> Since elevator_init_rq() is called before the device queue is registered
elevator_init_mq() ~^ (and in Subject)

Otherwise:
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/7] block: Introduce elevator features
  2019-08-23  0:15 ` [PATCH 4/7] block: Introduce elevator features Damien Le Moal
@ 2019-08-23 11:19   ` Johannes Thumshirn
  0 siblings, 0 replies; 14+ messages in thread
From: Johannes Thumshirn @ 2019-08-23 11:19 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe

On 23/08/2019 02:15, Damien Le Moal wrote:
> -static struct elevator_type *elevator_find(const char *name)
> +static struct elevator_type *elevator_find(const char *name,
> +					   unsigned long required_features)

[...]
> +	const unsigned long elevator_features;


Wouldn't it make more sens to have a defined number of bits for the
features 'unsigned long' means 64 possible features on 64bit and 32 on
32 bit? Better make it explicit (u64 or u32 depending on how many
features you think you'll need).

Byte,
	Johannes
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 5/7] block: Introduce zoned block device elevator feature
  2019-08-23  0:15 ` [PATCH 5/7] block: Introduce zoned block device elevator feature Damien Le Moal
@ 2019-08-23 11:41   ` Johannes Thumshirn
  0 siblings, 0 replies; 14+ messages in thread
From: Johannes Thumshirn @ 2019-08-23 11:41 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe

On 23/08/2019 02:15, Damien Le Moal wrote:
> Introduce the elevator feature ELEVATOR_F_ZONED_BLOCK_DEV to indicate
> that an elevator supports zoned block device write ordering control.
> 
> Mark the mq-deadline as supporting this feature which is implemented
> using zone write locking. SCSI zoned block device scan and null_blk
> device creation with zoned mode enabled are also modified to require
> this feature using the helper blk_queue_required_elevator_features().
> 
> This requirement can always be satisfied as the mq-deadline scheduler is
> always selected for in-kernel compilation when CONFIG_BLK_DEV_ZONED
> (zoned block device support) is enabled.

Hmm I would have preferred this to be split into two patches, one
introducing the features (not just ELEVATOR_F_ZONED_BLOCK_DEV but also
the features you describe in 4/7) and mark all schedulers with the
feature bits and then one pulling the requirement of
ELEVATOR_F_ZONED_BLOCK_DEV in null_blk and sd.


-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/7] block: Remove sysfs lock from elevator_init_rq()
  2019-08-23  0:15 ` [PATCH 3/7] block: Remove sysfs lock from elevator_init_rq() Damien Le Moal
  2019-08-23  9:00   ` Johannes Thumshirn
@ 2019-08-23 20:32   ` Bart Van Assche
  1 sibling, 0 replies; 14+ messages in thread
From: Bart Van Assche @ 2019-08-23 20:32 UTC (permalink / raw)
  To: Damien Le Moal, linux-block, Jens Axboe

On 8/22/19 5:15 PM, Damien Le Moal wrote:
> Since elevator_init_rq() is called before the device queue is registered
> in sysfs, there is no possible conflict with elevator_switch(). Remove
> the unnecessary locking of q->sysfs_lock mutex.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>  block/elevator.c | 12 ++----------
>  1 file changed, 2 insertions(+), 10 deletions(-)
> 
> diff --git a/block/elevator.c b/block/elevator.c
> index 7fff06751633..6208ddc334ef 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -617,17 +617,12 @@ void elevator_init_mq(struct request_queue *q)
>  	if (q->nr_hw_queues != 1)
>  		return;
>  
> -	/*
> -	 * q->sysfs_lock must be held to provide mutual exclusion between
> -	 * elevator_switch() and here.
> -	 */
> -	mutex_lock(&q->sysfs_lock);
>  	if (unlikely(q->elevator))
> -		goto out_unlock;
> +		return;
>  
>  	e = elevator_get(q, "mq-deadline", false);
>  	if (!e)
> -		goto out_unlock;
> +		return;
>  
>  	err = blk_mq_init_sched(q, e);
>  	if (err) {
> @@ -635,9 +630,6 @@ void elevator_init_mq(struct request_queue *q)
>  			"falling back to \"none\"\n", e->elevator_name);
>  		elevator_put(e);
>  	}
> -
> -out_unlock:
> -	mutex_unlock(&q->sysfs_lock);
>  }

Please consider to add a WARN_ON_ONCE() statement that triggers a
warning if this function is called for a request queue that has already
been registered.

Thanks,

Bart.



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-08-23 20:32 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-23  0:15 [PATCH 0/7] Elevator cleanups and improvements Damien Le Moal
2019-08-23  0:15 ` [PATCH 1/7] block: Cleanup elevator_init_mq() use Damien Le Moal
2019-08-23  8:08   ` Johannes Thumshirn
2019-08-23  0:15 ` [PATCH 2/7] block: Change elevator_init_mq() to always succeed Damien Le Moal
2019-08-23  8:54   ` Johannes Thumshirn
2019-08-23  0:15 ` [PATCH 3/7] block: Remove sysfs lock from elevator_init_rq() Damien Le Moal
2019-08-23  9:00   ` Johannes Thumshirn
2019-08-23 20:32   ` Bart Van Assche
2019-08-23  0:15 ` [PATCH 4/7] block: Introduce elevator features Damien Le Moal
2019-08-23 11:19   ` Johannes Thumshirn
2019-08-23  0:15 ` [PATCH 5/7] block: Introduce zoned block device elevator feature Damien Le Moal
2019-08-23 11:41   ` Johannes Thumshirn
2019-08-23  0:15 ` [PATCH 6/7] block: Improve default elevator selection Damien Le Moal
2019-08-23  0:15 ` [PATCH 7/7] block: Delay default elevator initialization Damien Le Moal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).