* [PATCH 0/5] blk-mq: support nested queue quiescing
@ 2021-09-29  4:15 Ming Lei
  2021-09-29  4:15 ` [PATCH 1/5] nvme: add APIs for stopping/starting admin queue Ming Lei
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Ming Lei @ 2021-09-29  4:15 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-block
  Cc: Sagi Grimberg, linux-nvme, Keith Busch, Ming Lei

Hello,

Request queue quiescing is used by many block drivers and by the block
core, from different and unrelated code paths. So far, both quiesce and
unquiesce change the queue state unconditionally, and this has caused
trouble: for example, while a driver is quiescing a queue for its own
purposes, the block core may quiesce the same queue because of an
elevator switch, an nr_requests update, or a change to another queue
attribute, and the core's unquiesce can then restart the queue
unexpectedly early.

A kernel panic has been observed when running a stress test that
combines dm-mpath suspend with updating nr_requests.

Fix the issue by supporting nested queue quiescing. NVMe has very
complicated uses of quiesce/unquiesce, and the two are not always
called in pairs, so patches 1-4 convert NVMe to strictly paired calls,
and patch 5 adds nested queue quiescing.
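
To make the failure mode concrete, below is a toy userspace model of
the two schemes; it is purely illustrative (not kernel code), putting
the old unconditional flag next to the depth counter that patch 5
introduces:

#include <stdbool.h>
#include <stdio.h>

static bool quiesced;		/* old model: one unconditional flag */
static int  quiesce_depth;	/* new model: nesting depth */

static void flag_quiesce(void)    { quiesced = true; }
static void flag_unquiesce(void)  { quiesced = false; }

static void depth_quiesce(void)   { quiesce_depth++; }
static void depth_unquiesce(void) { if (quiesce_depth > 0) quiesce_depth--; }

int main(void)
{
	/* a driver quiesces the queue for its own purpose */
	flag_quiesce();
	depth_quiesce();

	/* block core quiesces the same queue, e.g. for an elevator switch */
	flag_quiesce();
	depth_quiesce();
	/* ... and unquiesces once the switch is done */
	flag_unquiesce();
	depth_unquiesce();

	/* the driver still expects the queue to be quiesced here */
	printf("flag model:  quiesced=%d (restarted too early)\n", quiesced);
	printf("depth model: quiesced=%d (still quiesced)\n",
	       quiesce_depth > 0);
	return 0;
}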


Ming Lei (5):
  nvme: add APIs for stopping/starting admin queue
  nvme: apply nvme API to quiesce/unquiesce admin queue
  nvme: prepare for pairing quiescing and unquiescing
  nvme: pairing quiesce/unquiesce
  blk-mq: support nested blk_mq_quiesce_queue()

 block/blk-mq.c             |  20 +++++--
 drivers/nvme/host/core.c   | 107 +++++++++++++++++++++++++++++--------
 drivers/nvme/host/fc.c     |   8 +--
 drivers/nvme/host/nvme.h   |   6 +++
 drivers/nvme/host/pci.c    |   8 +--
 drivers/nvme/host/rdma.c   |  14 ++---
 drivers/nvme/host/tcp.c    |  16 +++---
 drivers/nvme/target/loop.c |   4 +-
 include/linux/blkdev.h     |   2 +
 9 files changed, 135 insertions(+), 50 deletions(-)

-- 
2.31.1



* [PATCH 1/5] nvme: add APIs for stopping/starting admin queue
  2021-09-29  4:15 [PATCH 0/5] blk-mq: support nested queue quiescing Ming Lei
@ 2021-09-29  4:15 ` Ming Lei
  2021-09-29  4:15 ` [PATCH 2/5] nvme: apply nvme API to quiesce/unquiesce " Ming Lei
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2021-09-29  4:15 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-block
  Cc: Sagi Grimberg, linux-nvme, Keith Busch, Ming Lei

Add two APIs for stopping and starting the admin queue, so that the
admin-queue quiesce/unquiesce call sites can be converted to them.
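
For context, the intended call pattern in a transport driver's
teardown/re-setup path looks roughly like the sketch below
(my_ctrl_teardown is a hypothetical function; the real call-site
conversions follow in the next patch):

static void my_ctrl_teardown(struct nvme_ctrl *ctrl)
{
	/* stop admin queue dispatch while the controller is torn down */
	nvme_stop_admin_queue(ctrl);
	blk_sync_queue(ctrl->admin_q);

	/* ... disable and re-initialize the controller ... */

	/* resume admin dispatch once the queue is usable again */
	nvme_start_admin_queue(ctrl);
}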

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/core.c | 12 ++++++++++++
 drivers/nvme/host/nvme.h |  2 ++
 2 files changed, 14 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 7fa75433c036..c675eef70a63 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4563,6 +4563,18 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
 }
 EXPORT_SYMBOL_GPL(nvme_start_queues);
 
+void nvme_stop_admin_queue(struct nvme_ctrl *ctrl)
+{
+	blk_mq_quiesce_queue(ctrl->admin_q);
+}
+EXPORT_SYMBOL_GPL(nvme_stop_admin_queue);
+
+void nvme_start_admin_queue(struct nvme_ctrl *ctrl)
+{
+	blk_mq_unquiesce_queue(ctrl->admin_q);
+}
+EXPORT_SYMBOL_GPL(nvme_start_admin_queue);
+
 void nvme_sync_io_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 9871c0c9374c..47877a5f1515 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -659,6 +659,8 @@ void nvme_complete_async_event(struct nvme_ctrl *ctrl, __le16 status,
 
 void nvme_stop_queues(struct nvme_ctrl *ctrl);
 void nvme_start_queues(struct nvme_ctrl *ctrl);
+void nvme_stop_admin_queue(struct nvme_ctrl *ctrl);
+void nvme_start_admin_queue(struct nvme_ctrl *ctrl);
 void nvme_kill_queues(struct nvme_ctrl *ctrl);
 void nvme_sync_queues(struct nvme_ctrl *ctrl);
 void nvme_sync_io_queues(struct nvme_ctrl *ctrl);
-- 
2.31.1



* [PATCH 2/5] nvme: apply nvme API to quiesce/unquiesce admin queue
  2021-09-29  4:15 [PATCH 0/5] blk-mq: support nested queue quiescing Ming Lei
  2021-09-29  4:15 ` [PATCH 1/5] nvme: add APIs for stopping/starting admin queue Ming Lei
@ 2021-09-29  4:15 ` Ming Lei
  2021-09-29  4:15 ` [PATCH 3/5] nvme: prepare for pairing quiescing and unquiescing Ming Lei
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2021-09-29  4:15 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-block
  Cc: Sagi Grimberg, linux-nvme, Keith Busch, Ming Lei

Apply the two newly added APIs to quiesce/unquiesce the admin queue.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/core.c   |  2 +-
 drivers/nvme/host/fc.c     |  8 ++++----
 drivers/nvme/host/pci.c    |  8 ++++----
 drivers/nvme/host/rdma.c   | 14 +++++++-------
 drivers/nvme/host/tcp.c    | 16 ++++++++--------
 drivers/nvme/target/loop.c |  4 ++--
 6 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c675eef70a63..91d91d4f76d7 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4484,7 +4484,7 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
 
 	/* Forcibly unquiesce queues to avoid blocking dispatch */
 	if (ctrl->admin_q && !blk_queue_dying(ctrl->admin_q))
-		blk_mq_unquiesce_queue(ctrl->admin_q);
+		nvme_start_admin_queue(ctrl);
 
 	list_for_each_entry(ns, &ctrl->namespaces, list)
 		nvme_set_queue_dying(ns);
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index aa14ad963d91..580a216da619 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2382,7 +2382,7 @@ nvme_fc_ctrl_free(struct kref *ref)
 	list_del(&ctrl->ctrl_list);
 	spin_unlock_irqrestore(&ctrl->rport->lock, flags);
 
-	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
+	nvme_start_admin_queue(&ctrl->ctrl);
 	blk_cleanup_queue(ctrl->ctrl.admin_q);
 	blk_cleanup_queue(ctrl->ctrl.fabrics_q);
 	blk_mq_free_tag_set(&ctrl->admin_tag_set);
@@ -2510,7 +2510,7 @@ __nvme_fc_abort_outstanding_ios(struct nvme_fc_ctrl *ctrl, bool start_queues)
 	/*
 	 * clean up the admin queue. Same thing as above.
 	 */
-	blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+	nvme_stop_admin_queue(&ctrl->ctrl);
 	blk_sync_queue(ctrl->ctrl.admin_q);
 	blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
 				nvme_fc_terminate_exchange, &ctrl->ctrl);
@@ -3095,7 +3095,7 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
 	ctrl->ctrl.max_hw_sectors = ctrl->ctrl.max_segments <<
 						(ilog2(SZ_4K) - 9);
 
-	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
+	nvme_start_admin_queue(&ctrl->ctrl);
 
 	ret = nvme_init_ctrl_finish(&ctrl->ctrl);
 	if (ret || test_bit(ASSOC_FAILED, &ctrl->flags))
@@ -3249,7 +3249,7 @@ nvme_fc_delete_association(struct nvme_fc_ctrl *ctrl)
 	nvme_fc_free_queue(&ctrl->queues[0]);
 
 	/* re-enable the admin_q so anything new can fast fail */
-	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
+	nvme_start_admin_queue(&ctrl->ctrl);
 
 	/* resume the io queues so that things will fast fail */
 	nvme_start_queues(&ctrl->ctrl);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index ca5bda26226a..fa3efb003aa6 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1396,7 +1396,7 @@ static int nvme_suspend_queue(struct nvme_queue *nvmeq)
 
 	nvmeq->dev->online_queues--;
 	if (!nvmeq->qid && nvmeq->dev->ctrl.admin_q)
-		blk_mq_quiesce_queue(nvmeq->dev->ctrl.admin_q);
+		nvme_stop_admin_queue(&nvmeq->dev->ctrl);
 	if (!test_and_clear_bit(NVMEQ_POLLED, &nvmeq->flags))
 		pci_free_irq(to_pci_dev(nvmeq->dev->dev), nvmeq->cq_vector, nvmeq);
 	return 0;
@@ -1655,7 +1655,7 @@ static void nvme_dev_remove_admin(struct nvme_dev *dev)
 		 * user requests may be waiting on a stopped queue. Start the
 		 * queue to flush these to completion.
 		 */
-		blk_mq_unquiesce_queue(dev->ctrl.admin_q);
+		nvme_start_admin_queue(&dev->ctrl);
 		blk_cleanup_queue(dev->ctrl.admin_q);
 		blk_mq_free_tag_set(&dev->admin_tagset);
 	}
@@ -1689,7 +1689,7 @@ static int nvme_alloc_admin_tags(struct nvme_dev *dev)
 			return -ENODEV;
 		}
 	} else
-		blk_mq_unquiesce_queue(dev->ctrl.admin_q);
+		nvme_start_admin_queue(&dev->ctrl);
 
 	return 0;
 }
@@ -2624,7 +2624,7 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 	if (shutdown) {
 		nvme_start_queues(&dev->ctrl);
 		if (dev->ctrl.admin_q && !blk_queue_dying(dev->ctrl.admin_q))
-			blk_mq_unquiesce_queue(dev->ctrl.admin_q);
+			nvme_start_admin_queue(&dev->ctrl);
 	}
 	mutex_unlock(&dev->shutdown_lock);
 }
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 40317e1b9183..799a35558986 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -919,7 +919,7 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
 	else
 		ctrl->ctrl.max_integrity_segments = 0;
 
-	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
+	nvme_start_admin_queue(&ctrl->ctrl);
 
 	error = nvme_init_ctrl_finish(&ctrl->ctrl);
 	if (error)
@@ -928,7 +928,7 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
 	return 0;
 
 out_quiesce_queue:
-	blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+	nvme_stop_admin_queue(&ctrl->ctrl);
 	blk_sync_queue(ctrl->ctrl.admin_q);
 out_stop_queue:
 	nvme_rdma_stop_queue(&ctrl->queues[0]);
@@ -1026,12 +1026,12 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
 static void nvme_rdma_teardown_admin_queue(struct nvme_rdma_ctrl *ctrl,
 		bool remove)
 {
-	blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+	nvme_stop_admin_queue(&ctrl->ctrl);
 	blk_sync_queue(ctrl->ctrl.admin_q);
 	nvme_rdma_stop_queue(&ctrl->queues[0]);
 	nvme_cancel_admin_tagset(&ctrl->ctrl);
 	if (remove)
-		blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
+		nvme_start_admin_queue(&ctrl->ctrl);
 	nvme_rdma_destroy_admin_queue(ctrl, remove);
 }
 
@@ -1154,7 +1154,7 @@ static int nvme_rdma_setup_ctrl(struct nvme_rdma_ctrl *ctrl, bool new)
 		nvme_rdma_destroy_io_queues(ctrl, new);
 	}
 destroy_admin:
-	blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+	nvme_stop_admin_queue(&ctrl->ctrl);
 	blk_sync_queue(ctrl->ctrl.admin_q);
 	nvme_rdma_stop_queue(&ctrl->queues[0]);
 	nvme_cancel_admin_tagset(&ctrl->ctrl);
@@ -1194,7 +1194,7 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
 	nvme_rdma_teardown_io_queues(ctrl, false);
 	nvme_start_queues(&ctrl->ctrl);
 	nvme_rdma_teardown_admin_queue(ctrl, false);
-	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
+	nvme_start_admin_queue(&ctrl->ctrl);
 
 	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
 		/* state change failure is ok if we started ctrl delete */
@@ -2232,7 +2232,7 @@ static void nvme_rdma_shutdown_ctrl(struct nvme_rdma_ctrl *ctrl, bool shutdown)
 	cancel_delayed_work_sync(&ctrl->reconnect_work);
 
 	nvme_rdma_teardown_io_queues(ctrl, shutdown);
-	blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+	nvme_stop_admin_queue(&ctrl->ctrl);
 	if (shutdown)
 		nvme_shutdown_ctrl(&ctrl->ctrl);
 	else
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 3c1c29dd3020..4109b726e052 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1915,7 +1915,7 @@ static int nvme_tcp_configure_admin_queue(struct nvme_ctrl *ctrl, bool new)
 	if (error)
 		goto out_stop_queue;
 
-	blk_mq_unquiesce_queue(ctrl->admin_q);
+	nvme_start_admin_queue(ctrl);
 
 	error = nvme_init_ctrl_finish(ctrl);
 	if (error)
@@ -1924,7 +1924,7 @@ static int nvme_tcp_configure_admin_queue(struct nvme_ctrl *ctrl, bool new)
 	return 0;
 
 out_quiesce_queue:
-	blk_mq_quiesce_queue(ctrl->admin_q);
+	nvme_stop_admin_queue(ctrl);
 	blk_sync_queue(ctrl->admin_q);
 out_stop_queue:
 	nvme_tcp_stop_queue(ctrl, 0);
@@ -1946,12 +1946,12 @@ static int nvme_tcp_configure_admin_queue(struct nvme_ctrl *ctrl, bool new)
 static void nvme_tcp_teardown_admin_queue(struct nvme_ctrl *ctrl,
 		bool remove)
 {
-	blk_mq_quiesce_queue(ctrl->admin_q);
+	nvme_stop_admin_queue(ctrl);
 	blk_sync_queue(ctrl->admin_q);
 	nvme_tcp_stop_queue(ctrl, 0);
 	nvme_cancel_admin_tagset(ctrl);
 	if (remove)
-		blk_mq_unquiesce_queue(ctrl->admin_q);
+		nvme_start_admin_queue(ctrl);
 	nvme_tcp_destroy_admin_queue(ctrl, remove);
 }
 
@@ -1960,7 +1960,7 @@ static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
 {
 	if (ctrl->queue_count <= 1)
 		return;
-	blk_mq_quiesce_queue(ctrl->admin_q);
+	nvme_stop_admin_queue(ctrl);
 	nvme_start_freeze(ctrl);
 	nvme_stop_queues(ctrl);
 	nvme_sync_io_queues(ctrl);
@@ -2055,7 +2055,7 @@ static int nvme_tcp_setup_ctrl(struct nvme_ctrl *ctrl, bool new)
 		nvme_tcp_destroy_io_queues(ctrl, new);
 	}
 destroy_admin:
-	blk_mq_quiesce_queue(ctrl->admin_q);
+	nvme_stop_admin_queue(ctrl);
 	blk_sync_queue(ctrl->admin_q);
 	nvme_tcp_stop_queue(ctrl, 0);
 	nvme_cancel_admin_tagset(ctrl);
@@ -2098,7 +2098,7 @@ static void nvme_tcp_error_recovery_work(struct work_struct *work)
 	/* unquiesce to fail fast pending requests */
 	nvme_start_queues(ctrl);
 	nvme_tcp_teardown_admin_queue(ctrl, false);
-	blk_mq_unquiesce_queue(ctrl->admin_q);
+	nvme_start_admin_queue(ctrl);
 
 	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_CONNECTING)) {
 		/* state change failure is ok if we started ctrl delete */
@@ -2116,7 +2116,7 @@ static void nvme_tcp_teardown_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
 	cancel_delayed_work_sync(&to_tcp_ctrl(ctrl)->connect_work);
 
 	nvme_tcp_teardown_io_queues(ctrl, shutdown);
-	blk_mq_quiesce_queue(ctrl->admin_q);
+	nvme_stop_admin_queue(ctrl);
 	if (shutdown)
 		nvme_shutdown_ctrl(ctrl);
 	else
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index 0285ccc7541f..440e1544033b 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -398,7 +398,7 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
 	ctrl->ctrl.max_hw_sectors =
 		(NVME_LOOP_MAX_SEGMENTS - 1) << (PAGE_SHIFT - 9);
 
-	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
+	nvme_start_admin_queue(&ctrl->ctrl);
 
 	error = nvme_init_ctrl_finish(&ctrl->ctrl);
 	if (error)
@@ -428,7 +428,7 @@ static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl)
 		nvme_loop_destroy_io_queues(ctrl);
 	}
 
-	blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+	nvme_stop_admin_queue(&ctrl->ctrl);
 	if (ctrl->ctrl.state == NVME_CTRL_LIVE)
 		nvme_shutdown_ctrl(&ctrl->ctrl);
 
-- 
2.31.1



* [PATCH 3/5] nvme: prepare for pairing quiescing and unquiescing
  2021-09-29  4:15 [PATCH 0/5] blk-mq: support nested queue quiescing Ming Lei
  2021-09-29  4:15 ` [PATCH 1/5] nvme: add APIs for stopping/starting admin queue Ming Lei
  2021-09-29  4:15 ` [PATCH 2/5] nvme: apply nvme API to quiesce/unquiesce " Ming Lei
@ 2021-09-29  4:15 ` Ming Lei
  2021-09-29  4:15 ` [PATCH 4/5] nvme: pairing quiesce/unquiesce Ming Lei
  2021-09-29  4:15 ` [PATCH 5/5] blk-mq: support nested blk_mq_quiesce_queue() Ming Lei
  4 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2021-09-29  4:15 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-block
  Cc: Sagi Grimberg, linux-nvme, Keith Busch, Ming Lei

Add two helpers to prepare for pairing quiescing and unquiescing, which
will be done in the next patch.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/core.c | 52 ++++++++++++++++++++++++----------------
 1 file changed, 31 insertions(+), 21 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 91d91d4f76d7..23fb746a8970 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -119,25 +119,6 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
 static void nvme_update_keep_alive(struct nvme_ctrl *ctrl,
 				   struct nvme_command *cmd);
 
-/*
- * Prepare a queue for teardown.
- *
- * This must forcibly unquiesce queues to avoid blocking dispatch, and only set
- * the capacity to 0 after that to avoid blocking dispatchers that may be
- * holding bd_butex.  This will end buffered writers dirtying pages that can't
- * be synced.
- */
-static void nvme_set_queue_dying(struct nvme_ns *ns)
-{
-	if (test_and_set_bit(NVME_NS_DEAD, &ns->flags))
-		return;
-
-	blk_set_queue_dying(ns->queue);
-	blk_mq_unquiesce_queue(ns->queue);
-
-	set_capacity_and_notify(ns->disk, 0);
-}
-
 void nvme_queue_scan(struct nvme_ctrl *ctrl)
 {
 	/*
@@ -4469,6 +4450,35 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 }
 EXPORT_SYMBOL_GPL(nvme_init_ctrl);
 
+static void nvme_start_ns_queue(struct nvme_ns *ns)
+{
+	blk_mq_unquiesce_queue(ns->queue);
+}
+
+static void nvme_stop_ns_queue(struct nvme_ns *ns)
+{
+	blk_mq_quiesce_queue(ns->queue);
+}
+
+/*
+ * Prepare a queue for teardown.
+ *
+ * This must forcibly unquiesce queues to avoid blocking dispatch, and only set
+ * the capacity to 0 after that to avoid blocking dispatchers that may be
+ * holding bd_butex.  This will end buffered writers dirtying pages that can't
+ * be synced.
+ */
+static void nvme_set_queue_dying(struct nvme_ns *ns)
+{
+	if (test_and_set_bit(NVME_NS_DEAD, &ns->flags))
+		return;
+
+	blk_set_queue_dying(ns->queue);
+	nvme_start_ns_queue(ns);
+
+	set_capacity_and_notify(ns->disk, 0);
+}
+
 /**
  * nvme_kill_queues(): Ends all namespace queues
  * @ctrl: the dead controller that needs to end
@@ -4547,7 +4557,7 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 
 	down_read(&ctrl->namespaces_rwsem);
 	list_for_each_entry(ns, &ctrl->namespaces, list)
-		blk_mq_quiesce_queue(ns->queue);
+		nvme_stop_ns_queue(ns);
 	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_stop_queues);
@@ -4558,7 +4568,7 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
 
 	down_read(&ctrl->namespaces_rwsem);
 	list_for_each_entry(ns, &ctrl->namespaces, list)
-		blk_mq_unquiesce_queue(ns->queue);
+		nvme_start_ns_queue(ns);
 	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_start_queues);
-- 
2.31.1



* [PATCH 4/5] nvme: pairing quiesce/unquiesce
  2021-09-29  4:15 [PATCH 0/5] blk-mq: support nested queue quiescing Ming Lei
                   ` (2 preceding siblings ...)
  2021-09-29  4:15 ` [PATCH 3/5] nvme: prepare for pairing quiescing and unquiescing Ming Lei
@ 2021-09-29  4:15 ` Ming Lei
  2021-09-29 11:49   ` Sagi Grimberg
  2021-09-29  4:15 ` [PATCH 5/5] blk-mq: support nested blk_mq_quiesce_queue() Ming Lei
  4 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2021-09-29  4:15 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-block
  Cc: Sagi Grimberg, linux-nvme, Keith Busch, Ming Lei

The current blk_mq_quiesce_queue() and blk_mq_unquiesce_queue() always
stop and start the queue unconditionally, and concurrent
quiesce/unquiesce calls can come from different, unrelated code paths,
so an unquiesce may arrive unexpectedly and start the queue too early.

Prepare for supporting nested / concurrent quiesce/unquiesce, so that
we can address the above issue.

NVMe has a very complicated quiesce/unquiesce use pattern, so add one
mutex and queue stopped state to nvme_ctrl and nvme_ns, making sure
that quiesce and unquiesce are called in pairs.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/core.c | 51 ++++++++++++++++++++++++++++++++++++----
 drivers/nvme/host/nvme.h |  4 ++++
 2 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 23fb746a8970..5d0b2eb38e43 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4375,6 +4375,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 	clear_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
 	spin_lock_init(&ctrl->lock);
 	mutex_init(&ctrl->scan_lock);
+	mutex_init(&ctrl->queues_stop_lock);
 	INIT_LIST_HEAD(&ctrl->namespaces);
 	xa_init(&ctrl->cels);
 	init_rwsem(&ctrl->namespaces_rwsem);
@@ -4450,14 +4451,44 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 }
 EXPORT_SYMBOL_GPL(nvme_init_ctrl);
 
+static void __nvme_stop_admin_queue(struct nvme_ctrl *ctrl)
+{
+	lockdep_assert_held(&ctrl->queues_stop_lock);
+
+	if (!ctrl->admin_queue_stopped) {
+		blk_mq_quiesce_queue(ctrl->admin_q);
+		ctrl->admin_queue_stopped = true;
+	}
+}
+
+static void __nvme_start_admin_queue(struct nvme_ctrl *ctrl)
+{
+	lockdep_assert_held(&ctrl->queues_stop_lock);
+
+	if (ctrl->admin_queue_stopped) {
+		blk_mq_unquiesce_queue(ctrl->admin_q);
+		ctrl->admin_queue_stopped = false;
+	}
+}
+
 static void nvme_start_ns_queue(struct nvme_ns *ns)
 {
-	blk_mq_unquiesce_queue(ns->queue);
+	lockdep_assert_held(&ns->ctrl->queues_stop_lock);
+
+	if (test_bit(NVME_NS_STOPPED, &ns->flags)) {
+		blk_mq_unquiesce_queue(ns->queue);
+		clear_bit(NVME_NS_STOPPED, &ns->flags);
+	}
 }
 
 static void nvme_stop_ns_queue(struct nvme_ns *ns)
 {
-	blk_mq_quiesce_queue(ns->queue);
+	lockdep_assert_held(&ns->ctrl->queues_stop_lock);
+
+	if (!test_bit(NVME_NS_STOPPED, &ns->flags)) {
+		blk_mq_quiesce_queue(ns->queue);
+		set_bit(NVME_NS_STOPPED, &ns->flags);
+	}
 }
 
 /*
@@ -4490,16 +4521,18 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
 
+	mutex_lock(&ctrl->queues_stop_lock);
 	down_read(&ctrl->namespaces_rwsem);
 
 	/* Forcibly unquiesce queues to avoid blocking dispatch */
 	if (ctrl->admin_q && !blk_queue_dying(ctrl->admin_q))
-		nvme_start_admin_queue(ctrl);
+		__nvme_start_admin_queue(ctrl);
 
 	list_for_each_entry(ns, &ctrl->namespaces, list)
 		nvme_set_queue_dying(ns);
 
 	up_read(&ctrl->namespaces_rwsem);
+	mutex_unlock(&ctrl->queues_stop_lock);
 }
 EXPORT_SYMBOL_GPL(nvme_kill_queues);
 
@@ -4555,10 +4588,12 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
 
+	mutex_lock(&ctrl->queues_stop_lock);
 	down_read(&ctrl->namespaces_rwsem);
 	list_for_each_entry(ns, &ctrl->namespaces, list)
 		nvme_stop_ns_queue(ns);
 	up_read(&ctrl->namespaces_rwsem);
+	mutex_unlock(&ctrl->queues_stop_lock);
 }
 EXPORT_SYMBOL_GPL(nvme_stop_queues);
 
@@ -4566,22 +4601,28 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
 
+	mutex_lock(&ctrl->queues_stop_lock);
 	down_read(&ctrl->namespaces_rwsem);
 	list_for_each_entry(ns, &ctrl->namespaces, list)
 		nvme_start_ns_queue(ns);
 	up_read(&ctrl->namespaces_rwsem);
+	mutex_unlock(&ctrl->queues_stop_lock);
 }
 EXPORT_SYMBOL_GPL(nvme_start_queues);
 
 void nvme_stop_admin_queue(struct nvme_ctrl *ctrl)
 {
-	blk_mq_quiesce_queue(ctrl->admin_q);
+	mutex_lock(&ctrl->queues_stop_lock);
+	__nvme_stop_admin_queue(ctrl);
+	mutex_unlock(&ctrl->queues_stop_lock);
 }
 EXPORT_SYMBOL_GPL(nvme_stop_admin_queue);
 
 void nvme_start_admin_queue(struct nvme_ctrl *ctrl)
 {
-	blk_mq_unquiesce_queue(ctrl->admin_q);
+	mutex_lock(&ctrl->queues_stop_lock);
+	__nvme_start_admin_queue(ctrl);
+	mutex_unlock(&ctrl->queues_stop_lock);
 }
 EXPORT_SYMBOL_GPL(nvme_start_admin_queue);
 
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 47877a5f1515..3a02a370f025 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -341,6 +341,9 @@ struct nvme_ctrl {
 	struct page *discard_page;
 	unsigned long discard_page_busy;
 
+	bool	admin_queue_stopped;
+	struct mutex	queues_stop_lock;
+
 	struct nvme_fault_inject fault_inject;
 };
 
@@ -457,6 +460,7 @@ struct nvme_ns {
 #define NVME_NS_ANA_PENDING	2
 #define NVME_NS_FORCE_RO	3
 #define NVME_NS_READY		4
+#define NVME_NS_STOPPED		5
 
 	struct cdev		cdev;
 	struct device		cdev_device;
-- 
2.31.1



* [PATCH 5/5] blk-mq: support nested blk_mq_quiesce_queue()
  2021-09-29  4:15 [PATCH 0/5] blk-mq: support nested queue quiescing Ming Lei
                   ` (3 preceding siblings ...)
  2021-09-29  4:15 ` [PATCH 4/5] nvme: pairing quiesce/unquiesce Ming Lei
@ 2021-09-29  4:15 ` Ming Lei
  2021-09-29 11:53   ` Sagi Grimberg
  4 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2021-09-29  4:15 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-block
  Cc: Sagi Grimberg, linux-nvme, Keith Busch, Ming Lei

It turns out that blk_mq_freeze_queue() isn't stronger[1] than
blk_mq_quiesce_queue(), because dispatch may still be in progress after
the queue is frozen, and in several cases, such as switching the io
scheduler or updating nr_requests & wbt latency, we still need to
quiesce the queue as a supplement to freezing it.

As we need to extend the uses of blk_mq_quiesce_queue(), supporting
nested quiesce is inevitable; in particular, we can't let an unquiesce
take effect while a quiesce originated from another context is still
outstanding.

This patch introduces q->quiesce_depth to handle concurrent quiesce,
and the queue is only unquiesced by the last unquiesce among all
contexts.

A kernel panic has been reported[2] when running a stress test that
combines updating dm-mpath's nr_requests with suspending the queue,
and a similar issue should exist in almost all drivers that use
quiesce/unquiesce.

[1] https://marc.info/?l=linux-block&m=150993988115872&w=2
[2] https://listman.redhat.com/archives/dm-devel/2021-September/msg00189.html

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c         | 20 +++++++++++++++++---
 include/linux/blkdev.h |  2 ++
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 21bf4c3f0825..10f8a3d4e3a1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -209,7 +209,12 @@ EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
  */
 void blk_mq_quiesce_queue_nowait(struct request_queue *q)
 {
-	blk_queue_flag_set(QUEUE_FLAG_QUIESCED, q);
+	unsigned long flags;
+
+	spin_lock_irqsave(&q->queue_lock, flags);
+	if (!q->quiesce_depth++)
+		blk_queue_flag_set(QUEUE_FLAG_QUIESCED, q);
+	spin_unlock_irqrestore(&q->queue_lock, flags);
 }
 EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);
 
@@ -250,10 +255,19 @@ EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
  */
 void blk_mq_unquiesce_queue(struct request_queue *q)
 {
-	blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
+	unsigned long flags;
+	bool run_queue = false;
+
+	spin_lock_irqsave(&q->queue_lock, flags);
+	if (q->quiesce_depth > 0 && !--q->quiesce_depth) {
+		blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
+		run_queue = true;
+	}
+	spin_unlock_irqrestore(&q->queue_lock, flags);
 
 	/* dispatch requests which are inserted during quiescing */
-	blk_mq_run_hw_queues(q, true);
+	if (run_queue)
+		blk_mq_run_hw_queues(q, true);
 }
 EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 0e960d74615e..74c60e2d61f9 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -315,6 +315,8 @@ struct request_queue {
 	 */
 	struct mutex		mq_freeze_lock;
 
+	int			quiesce_depth;
+
 	struct blk_mq_tag_set	*tag_set;
 	struct list_head	tag_set_list;
 	struct bio_set		bio_split;
-- 
2.31.1



* Re: [PATCH 4/5] nvme: pairing quiesce/unquiesce
  2021-09-29  4:15 ` [PATCH 4/5] nvme: pairing quiesce/unquiesce Ming Lei
@ 2021-09-29 11:49   ` Sagi Grimberg
  2021-09-29 15:28     ` Ming Lei
  0 siblings, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2021-09-29 11:49 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, Christoph Hellwig, linux-block
  Cc: linux-nvme, Keith Busch



On 9/29/21 7:15 AM, Ming Lei wrote:
> The current blk_mq_quiesce_queue() and blk_mq_unquiesce_queue() always
> stop and start the queue unconditionally, and concurrent
> quiesce/unquiesce calls can come from different, unrelated code paths,
> so an unquiesce may arrive unexpectedly and start the queue too early.
> 
> Prepare for supporting nested / concurrent quiesce/unquiesce, so that
> we can address the above issue.
> 
> NVMe has a very complicated quiesce/unquiesce use pattern, so add one
> mutex and queue stopped state to nvme_ctrl and nvme_ns, making sure
> that quiesce and unquiesce are called in pairs.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>   drivers/nvme/host/core.c | 51 ++++++++++++++++++++++++++++++++++++----
>   drivers/nvme/host/nvme.h |  4 ++++
>   2 files changed, 50 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 23fb746a8970..5d0b2eb38e43 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4375,6 +4375,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
>   	clear_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
>   	spin_lock_init(&ctrl->lock);
>   	mutex_init(&ctrl->scan_lock);
> +	mutex_init(&ctrl->queues_stop_lock);
>   	INIT_LIST_HEAD(&ctrl->namespaces);
>   	xa_init(&ctrl->cels);
>   	init_rwsem(&ctrl->namespaces_rwsem);
> @@ -4450,14 +4451,44 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
>   }
>   EXPORT_SYMBOL_GPL(nvme_init_ctrl);
>   
> +static void __nvme_stop_admin_queue(struct nvme_ctrl *ctrl)
> +{
> +	lockdep_assert_held(&ctrl->queues_stop_lock);
> +
> +	if (!ctrl->admin_queue_stopped) {
> +		blk_mq_quiesce_queue(ctrl->admin_q);
> +		ctrl->admin_queue_stopped = true;
> +	}
> +}
> +
> +static void __nvme_start_admin_queue(struct nvme_ctrl *ctrl)
> +{
> +	lockdep_assert_held(&ctrl->queues_stop_lock);
> +
> +	if (ctrl->admin_queue_stopped) {
> +		blk_mq_unquiesce_queue(ctrl->admin_q);
> +		ctrl->admin_queue_stopped = false;
> +	}
> +}

I'd make this a bit that we can flip atomically.

> +
>   static void nvme_start_ns_queue(struct nvme_ns *ns)
>   {
> -	blk_mq_unquiesce_queue(ns->queue);
> +	lockdep_assert_held(&ns->ctrl->queues_stop_lock);
> +
> +	if (test_bit(NVME_NS_STOPPED, &ns->flags)) {
> +		blk_mq_unquiesce_queue(ns->queue);
> +		clear_bit(NVME_NS_STOPPED, &ns->flags);
> +	}
>   }
>   
>   static void nvme_stop_ns_queue(struct nvme_ns *ns)
>   {
> -	blk_mq_quiesce_queue(ns->queue);
> +	lockdep_assert_held(&ns->ctrl->queues_stop_lock);
> +
> +	if (!test_bit(NVME_NS_STOPPED, &ns->flags)) {
> +		blk_mq_quiesce_queue(ns->queue);
> +		set_bit(NVME_NS_STOPPED, &ns->flags);
> +	}
>   }

Why not use test_and_set_bit/test_and_clear_bit for serialization?

>   
>   /*
> @@ -4490,16 +4521,18 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
>   {
>   	struct nvme_ns *ns;
>   
> +	mutex_lock(&ctrl->queues_stop_lock);
>   	down_read(&ctrl->namespaces_rwsem);
>   
>   	/* Forcibly unquiesce queues to avoid blocking dispatch */
>   	if (ctrl->admin_q && !blk_queue_dying(ctrl->admin_q))
> -		nvme_start_admin_queue(ctrl);
> +		__nvme_start_admin_queue(ctrl);
>   
>   	list_for_each_entry(ns, &ctrl->namespaces, list)
>   		nvme_set_queue_dying(ns);
>   
>   	up_read(&ctrl->namespaces_rwsem);
> +	mutex_unlock(&ctrl->queues_stop_lock);

This extra lock wrapping the namespaces_rwsem is scary. The
ordering rules are not clear to me.


* Re: [PATCH 5/5] blk-mq: support nested blk_mq_quiesce_queue()
  2021-09-29  4:15 ` [PATCH 5/5] blk-mq: support nested blk_mq_quiesce_queue() Ming Lei
@ 2021-09-29 11:53   ` Sagi Grimberg
  2021-09-29 15:44     ` Ming Lei
  0 siblings, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2021-09-29 11:53 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, Christoph Hellwig, linux-block
  Cc: linux-nvme, Keith Busch



On 9/29/21 7:15 AM, Ming Lei wrote:
> It turns out that blk_mq_freeze_queue() isn't stronger[1] than
> blk_mq_quiesce_queue(), because dispatch may still be in progress after
> the queue is frozen, and in several cases, such as switching the io
> scheduler or updating nr_requests & wbt latency, we still need to
> quiesce the queue as a supplement to freezing it.
> 
> As we need to extend the uses of blk_mq_quiesce_queue(), supporting
> nested quiesce is inevitable; in particular, we can't let an unquiesce
> take effect while a quiesce originated from another context is still
> outstanding.

The serialization need is clear, but why is the nesting required?
In other words, what is the harm in running the hw queue every time
we unquiesce?


* Re: [PATCH 4/5] nvme: pairing quiesce/unquiesce
  2021-09-29 11:49   ` Sagi Grimberg
@ 2021-09-29 15:28     ` Ming Lei
  0 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2021-09-29 15:28 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Jens Axboe, Christoph Hellwig, linux-block, linux-nvme, Keith Busch

On Wed, Sep 29, 2021 at 02:49:39PM +0300, Sagi Grimberg wrote:
> 
> 
> On 9/29/21 7:15 AM, Ming Lei wrote:
> > The current blk_mq_quiesce_queue() and blk_mq_unquiesce_queue() always
> > stop and start the queue unconditionally, and concurrent
> > quiesce/unquiesce calls can come from different, unrelated code paths,
> > so an unquiesce may arrive unexpectedly and start the queue too early.
> > 
> > Prepare for supporting nested / concurrent quiesce/unquiesce, so that
> > we can address the above issue.
> > 
> > NVMe has a very complicated quiesce/unquiesce use pattern, so add one
> > mutex and queue stopped state to nvme_ctrl and nvme_ns, making sure
> > that quiesce and unquiesce are called in pairs.
> > 
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >   drivers/nvme/host/core.c | 51 ++++++++++++++++++++++++++++++++++++----
> >   drivers/nvme/host/nvme.h |  4 ++++
> >   2 files changed, 50 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index 23fb746a8970..5d0b2eb38e43 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -4375,6 +4375,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
> >   	clear_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
> >   	spin_lock_init(&ctrl->lock);
> >   	mutex_init(&ctrl->scan_lock);
> > +	mutex_init(&ctrl->queues_stop_lock);
> >   	INIT_LIST_HEAD(&ctrl->namespaces);
> >   	xa_init(&ctrl->cels);
> >   	init_rwsem(&ctrl->namespaces_rwsem);
> > @@ -4450,14 +4451,44 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
> >   }
> >   EXPORT_SYMBOL_GPL(nvme_init_ctrl);
> > +static void __nvme_stop_admin_queue(struct nvme_ctrl *ctrl)
> > +{
> > +	lockdep_assert_held(&ctrl->queues_stop_lock);
> > +
> > +	if (!ctrl->admin_queue_stopped) {
> > +		blk_mq_quiesce_queue(ctrl->admin_q);
> > +		ctrl->admin_queue_stopped = true;
> > +	}
> > +}
> > +
> > +static void __nvme_start_admin_queue(struct nvme_ctrl *ctrl)
> > +{
> > +	lockdep_assert_held(&ctrl->queues_stop_lock);
> > +
> > +	if (ctrl->admin_queue_stopped) {
> > +		blk_mq_unquiesce_queue(ctrl->admin_q);
> > +		ctrl->admin_queue_stopped = false;
> > +	}
> > +}
> 
> I'd make this a bit we can flip atomically.
> 
> > +
> >   static void nvme_start_ns_queue(struct nvme_ns *ns)
> >   {
> > -	blk_mq_unquiesce_queue(ns->queue);
> > +	lockdep_assert_held(&ns->ctrl->queues_stop_lock);
> > +
> > +	if (test_bit(NVME_NS_STOPPED, &ns->flags)) {
> > +		blk_mq_unquiesce_queue(ns->queue);
> > +		clear_bit(NVME_NS_STOPPED, &ns->flags);
> > +	}
> >   }
> >   static void nvme_stop_ns_queue(struct nvme_ns *ns)
> >   {
> > -	blk_mq_quiesce_queue(ns->queue);
> > +	lockdep_assert_held(&ns->ctrl->queues_stop_lock);
> > +
> > +	if (!test_bit(NVME_NS_STOPPED, &ns->flags)) {
> > +		blk_mq_quiesce_queue(ns->queue);
> > +		set_bit(NVME_NS_STOPPED, &ns->flags);
> > +	}
> >   }
> 
> Why not use test_and_set_bit/test_and_clear_bit for serialization?
> 
> >   /*
> > @@ -4490,16 +4521,18 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
> >   {
> >   	struct nvme_ns *ns;
> > +	mutex_lock(&ctrl->queues_stop_lock);
> >   	down_read(&ctrl->namespaces_rwsem);
> >   	/* Forcibly unquiesce queues to avoid blocking dispatch */
> >   	if (ctrl->admin_q && !blk_queue_dying(ctrl->admin_q))
> > -		nvme_start_admin_queue(ctrl);
> > +		__nvme_start_admin_queue(ctrl);
> >   	list_for_each_entry(ns, &ctrl->namespaces, list)
> >   		nvme_set_queue_dying(ns);
> >   	up_read(&ctrl->namespaces_rwsem);
> > +	mutex_unlock(&ctrl->queues_stop_lock);
> 
> This extra lock wrapping the namespaces_rwsem is scary. The
> ordering rules are not clear to me.

The rule is clear: queues_stop_lock has to be acquired before locking
->namespaces_rwsem.

Using test_and_set_bit/test_and_clear_bit could be enough for pairing
quiesce and unquiesce; I will try to remove ->queues_stop_lock in the
next version.
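
Something like the following, as a rough and untested sketch of what
the next version could do for the ns queues (assuming the atomic bit
alone is enough to keep quiesce/unquiesce paired, so the mutex can be
dropped):

static void nvme_stop_ns_queue(struct nvme_ns *ns)
{
	/* only the first stop actually quiesces */
	if (!test_and_set_bit(NVME_NS_STOPPED, &ns->flags))
		blk_mq_quiesce_queue(ns->queue);
}

static void nvme_start_ns_queue(struct nvme_ns *ns)
{
	/* only start a queue that was actually stopped */
	if (test_and_clear_bit(NVME_NS_STOPPED, &ns->flags))
		blk_mq_unquiesce_queue(ns->queue);
}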


Thanks,
Ming



* Re: [PATCH 5/5] blk-mq: support nested blk_mq_quiesce_queue()
  2021-09-29 11:53   ` Sagi Grimberg
@ 2021-09-29 15:44     ` Ming Lei
  0 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2021-09-29 15:44 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Jens Axboe, Christoph Hellwig, linux-block, linux-nvme, Keith Busch

On Wed, Sep 29, 2021 at 02:53:27PM +0300, Sagi Grimberg wrote:
> 
> 
> On 9/29/21 7:15 AM, Ming Lei wrote:
> > It turns out that blk_mq_freeze_queue() isn't stronger[1] than
> > blk_mq_quiesce_queue(), because dispatch may still be in progress after
> > the queue is frozen, and in several cases, such as switching the io
> > scheduler or updating nr_requests & wbt latency, we still need to
> > quiesce the queue as a supplement to freezing it.
> > 
> > As we need to extend the uses of blk_mq_quiesce_queue(), supporting
> > nested quiesce is inevitable; in particular, we can't let an unquiesce
> > take effect while a quiesce originated from another context is still
> > outstanding.
> 
> The serialization need is clear, but why is the nesting required?

I guess the serialization is what my 'nesting' meant:

1) code path 1:

- quiesce
- do something
- unquiesce

2) code path 2:

- quiesce
- do something
- unquiesce

What the patch implements is that the actual unquiesce action is only
done by the last, or outermost, unquiesce of the two code paths.

I am not sure 'serialization' is a good term here; maybe 'concurrent
quiesce' is better. Nesting really is supported by this patch, e.g.
code path 2 may be part of the 'do something' in code path 1.
Meanwhile, serialization among quiesce and unquiesce is supported too.

> In other words, what is the harm in running the hw queue every time
> we unquiesce?

Running the hw queue in each unquiesce doesn't matter; what matters is
that the QUIESCE flag must only be cleared by the outermost, or last,
unquiesce. Running the hw queue is only useful once QUIESCE has
actually been cleared.
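
Concretely, with q->quiesce_depth the two code paths compose like this
(illustrative sequence only):

blk_mq_quiesce_queue(q);	/* path 1: depth 0 -> 1, QUIESCED set */
blk_mq_quiesce_queue(q);	/* path 2: depth 1 -> 2 */
/* path 2 does something */
blk_mq_unquiesce_queue(q);	/* path 2: depth 2 -> 1, flag stays set */
/* the queue is still quiesced here, as path 1 expects */
blk_mq_unquiesce_queue(q);	/* path 1: depth 1 -> 0, QUIESCED cleared,
				 * hw queues re-run exactly once */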

 

Thanks,
Ming


