* [PATCH V4 0/6] blk-mq: support concurrent queue quiescing
@ 2021-10-14  8:17 Ming Lei
  2021-10-14  8:17 ` [PATCH V4 1/6] nvme: add APIs for stopping/starting admin queue Ming Lei
                   ` (7 more replies)
  0 siblings, 8 replies; 13+ messages in thread
From: Ming Lei @ 2021-10-14  8:17 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-block, linux-nvme,
	Chaitanya Kulkarni
  Cc: Sagi Grimberg, Keith Busch, Bart Van Assche, Yu Kuai, Ming Lei

Hello,

Request queue quiescing is used by many block drivers and by the block
core itself, from different and unrelated code paths. So far both
quiesce and unquiesce change the queue state unconditionally. This has
caused trouble: while a driver is quiescing a queue for its own
purposes, the block core may also quiesce it for an elevator switch, an
nr_requests update or another queue attribute change, and the matching
unquiesce may then resume the queue earlier than the driver expects.

Kernel panics have been observed when running a stress test that mixes
dm-mpath suspend with nr_requests updates.
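
To make the race concrete, here is a minimal userspace model of the
current unconditional behaviour (illustrative names only, not kernel
code); the flag plays the role of QUEUE_FLAG_QUIESCED:

/*
 * Two unrelated contexts toggle a single "quiesced" flag
 * unconditionally, so the block core's unquiesce resumes the queue
 * while the driver still relies on it being quiesced.
 */
#include <stdbool.h>
#include <stdio.h>

static bool quiesced;			/* models QUEUE_FLAG_QUIESCED */

static void quiesce(const char *who)
{
	quiesced = true;
	printf("%-16s quiesce,   quiesced=%d\n", who, quiesced);
}

static void unquiesce(const char *who)
{
	quiesced = false;
	printf("%-16s unquiesce, quiesced=%d\n", who, quiesced);
}

int main(void)
{
	quiesce("dm-mpath");		/* driver quiesces for its own purpose */
	quiesce("elevator switch");	/* block core quiesces independently... */
	unquiesce("elevator switch");	/* ...and unquiesces when it is done */

	/* the driver still assumes the queue is quiesced, but it is not */
	printf("driver expects quiesced=1, actual quiesced=%d\n", quiesced);
	return 0;
}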

Fix the issue by supporting concurrent (nested) queue quiescing. nvme
has a very complicated quiesce/unquiesce usage in which the two are not
always called in pairs, so patches 1~4 switch nvme to paired calls
(patch 5 handles nvme-loop's reallocated admin queue), and patch 6 adds
nested queue quiescing to blk-mq.

V4:
	- one small patch style change as suggested by Christoph, only patch 6/6
	is touched

V3:
	- add patch 5/6 to clear NVME_CTRL_ADMIN_Q_STOPPED for nvme-loop
	  after reallocating admin queue
	- take Bart's suggestion to add warning in blk_mq_unquiesce_queue()
	& update commit log

V2:
	- replace mutex with atomic ops for supporting pairing quiesce &
	  unquiesce


Ming Lei (6):
  nvme: add APIs for stopping/starting admin queue
  nvme: apply nvme API to quiesce/unquiesce admin queue
  nvme: prepare for pairing quiescing and unquiescing
  nvme: paring quiesce/unquiesce
  nvme: loop: clear NVME_CTRL_ADMIN_Q_STOPPED after admin queue is
    reallocated
  blk-mq: support concurrent queue quiesce/unquiesce

 block/blk-mq.c             | 22 ++++++++++--
 drivers/nvme/host/core.c   | 70 ++++++++++++++++++++++++++------------
 drivers/nvme/host/fc.c     |  8 ++---
 drivers/nvme/host/nvme.h   |  4 +++
 drivers/nvme/host/pci.c    |  8 ++---
 drivers/nvme/host/rdma.c   | 14 ++++----
 drivers/nvme/host/tcp.c    | 16 ++++-----
 drivers/nvme/target/loop.c |  6 ++--
 include/linux/blkdev.h     |  2 ++
 9 files changed, 100 insertions(+), 50 deletions(-)

-- 
2.31.1



* [PATCH V4 1/6] nvme: add APIs for stopping/starting admin queue
  2021-10-14  8:17 [PATCH V4 0/6] blk-mq: support concurrent queue quiescing Ming Lei
@ 2021-10-14  8:17 ` Ming Lei
  2021-10-14  8:17 ` [PATCH V4 2/6] nvme: apply nvme API to quiesce/unquiesce " Ming Lei
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2021-10-14  8:17 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-block, linux-nvme,
	Chaitanya Kulkarni
  Cc: Sagi Grimberg, Keith Busch, Bart Van Assche, Yu Kuai, Ming Lei

Add two APIs for stopping and starting the admin queue.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/core.c | 12 ++++++++++++
 drivers/nvme/host/nvme.h |  2 ++
 2 files changed, 14 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 7fa75433c036..c675eef70a63 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4563,6 +4563,18 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
 }
 EXPORT_SYMBOL_GPL(nvme_start_queues);
 
+void nvme_stop_admin_queue(struct nvme_ctrl *ctrl)
+{
+	blk_mq_quiesce_queue(ctrl->admin_q);
+}
+EXPORT_SYMBOL_GPL(nvme_stop_admin_queue);
+
+void nvme_start_admin_queue(struct nvme_ctrl *ctrl)
+{
+	blk_mq_unquiesce_queue(ctrl->admin_q);
+}
+EXPORT_SYMBOL_GPL(nvme_start_admin_queue);
+
 void nvme_sync_io_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 9871c0c9374c..47877a5f1515 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -659,6 +659,8 @@ void nvme_complete_async_event(struct nvme_ctrl *ctrl, __le16 status,
 
 void nvme_stop_queues(struct nvme_ctrl *ctrl);
 void nvme_start_queues(struct nvme_ctrl *ctrl);
+void nvme_stop_admin_queue(struct nvme_ctrl *ctrl);
+void nvme_start_admin_queue(struct nvme_ctrl *ctrl);
 void nvme_kill_queues(struct nvme_ctrl *ctrl);
 void nvme_sync_queues(struct nvme_ctrl *ctrl);
 void nvme_sync_io_queues(struct nvme_ctrl *ctrl);
-- 
2.31.1



* [PATCH V4 2/6] nvme: apply nvme API to quiesce/unquiesce admin queue
  2021-10-14  8:17 [PATCH V4 0/6] blk-mq: support concurrent queue quiescing Ming Lei
  2021-10-14  8:17 ` [PATCH V4 1/6] nvme: add APIs for stopping/starting admin queue Ming Lei
@ 2021-10-14  8:17 ` Ming Lei
  2021-10-14  8:17 ` [PATCH V4 3/6] nvme: prepare for pairing quiescing and unquiescing Ming Lei
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2021-10-14  8:17 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-block, linux-nvme,
	Chaitanya Kulkarni
  Cc: Sagi Grimberg, Keith Busch, Bart Van Assche, Yu Kuai, Ming Lei

Apply the two newly added APIs to quiesce/unquiesce the admin queue.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/core.c   |  2 +-
 drivers/nvme/host/fc.c     |  8 ++++----
 drivers/nvme/host/pci.c    |  8 ++++----
 drivers/nvme/host/rdma.c   | 14 +++++++-------
 drivers/nvme/host/tcp.c    | 16 ++++++++--------
 drivers/nvme/target/loop.c |  4 ++--
 6 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c675eef70a63..91d91d4f76d7 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4484,7 +4484,7 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
 
 	/* Forcibly unquiesce queues to avoid blocking dispatch */
 	if (ctrl->admin_q && !blk_queue_dying(ctrl->admin_q))
-		blk_mq_unquiesce_queue(ctrl->admin_q);
+		nvme_start_admin_queue(ctrl);
 
 	list_for_each_entry(ns, &ctrl->namespaces, list)
 		nvme_set_queue_dying(ns);
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index aa14ad963d91..580a216da619 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2382,7 +2382,7 @@ nvme_fc_ctrl_free(struct kref *ref)
 	list_del(&ctrl->ctrl_list);
 	spin_unlock_irqrestore(&ctrl->rport->lock, flags);
 
-	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
+	nvme_start_admin_queue(&ctrl->ctrl);
 	blk_cleanup_queue(ctrl->ctrl.admin_q);
 	blk_cleanup_queue(ctrl->ctrl.fabrics_q);
 	blk_mq_free_tag_set(&ctrl->admin_tag_set);
@@ -2510,7 +2510,7 @@ __nvme_fc_abort_outstanding_ios(struct nvme_fc_ctrl *ctrl, bool start_queues)
 	/*
 	 * clean up the admin queue. Same thing as above.
 	 */
-	blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+	nvme_stop_admin_queue(&ctrl->ctrl);
 	blk_sync_queue(ctrl->ctrl.admin_q);
 	blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
 				nvme_fc_terminate_exchange, &ctrl->ctrl);
@@ -3095,7 +3095,7 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
 	ctrl->ctrl.max_hw_sectors = ctrl->ctrl.max_segments <<
 						(ilog2(SZ_4K) - 9);
 
-	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
+	nvme_start_admin_queue(&ctrl->ctrl);
 
 	ret = nvme_init_ctrl_finish(&ctrl->ctrl);
 	if (ret || test_bit(ASSOC_FAILED, &ctrl->flags))
@@ -3249,7 +3249,7 @@ nvme_fc_delete_association(struct nvme_fc_ctrl *ctrl)
 	nvme_fc_free_queue(&ctrl->queues[0]);
 
 	/* re-enable the admin_q so anything new can fast fail */
-	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
+	nvme_start_admin_queue(&ctrl->ctrl);
 
 	/* resume the io queues so that things will fast fail */
 	nvme_start_queues(&ctrl->ctrl);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index ca5bda26226a..fa3efb003aa6 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1396,7 +1396,7 @@ static int nvme_suspend_queue(struct nvme_queue *nvmeq)
 
 	nvmeq->dev->online_queues--;
 	if (!nvmeq->qid && nvmeq->dev->ctrl.admin_q)
-		blk_mq_quiesce_queue(nvmeq->dev->ctrl.admin_q);
+		nvme_stop_admin_queue(&nvmeq->dev->ctrl);
 	if (!test_and_clear_bit(NVMEQ_POLLED, &nvmeq->flags))
 		pci_free_irq(to_pci_dev(nvmeq->dev->dev), nvmeq->cq_vector, nvmeq);
 	return 0;
@@ -1655,7 +1655,7 @@ static void nvme_dev_remove_admin(struct nvme_dev *dev)
 		 * user requests may be waiting on a stopped queue. Start the
 		 * queue to flush these to completion.
 		 */
-		blk_mq_unquiesce_queue(dev->ctrl.admin_q);
+		nvme_start_admin_queue(&dev->ctrl);
 		blk_cleanup_queue(dev->ctrl.admin_q);
 		blk_mq_free_tag_set(&dev->admin_tagset);
 	}
@@ -1689,7 +1689,7 @@ static int nvme_alloc_admin_tags(struct nvme_dev *dev)
 			return -ENODEV;
 		}
 	} else
-		blk_mq_unquiesce_queue(dev->ctrl.admin_q);
+		nvme_start_admin_queue(&dev->ctrl);
 
 	return 0;
 }
@@ -2624,7 +2624,7 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 	if (shutdown) {
 		nvme_start_queues(&dev->ctrl);
 		if (dev->ctrl.admin_q && !blk_queue_dying(dev->ctrl.admin_q))
-			blk_mq_unquiesce_queue(dev->ctrl.admin_q);
+			nvme_start_admin_queue(&dev->ctrl);
 	}
 	mutex_unlock(&dev->shutdown_lock);
 }
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 40317e1b9183..799a35558986 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -919,7 +919,7 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
 	else
 		ctrl->ctrl.max_integrity_segments = 0;
 
-	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
+	nvme_start_admin_queue(&ctrl->ctrl);
 
 	error = nvme_init_ctrl_finish(&ctrl->ctrl);
 	if (error)
@@ -928,7 +928,7 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
 	return 0;
 
 out_quiesce_queue:
-	blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+	nvme_stop_admin_queue(&ctrl->ctrl);
 	blk_sync_queue(ctrl->ctrl.admin_q);
 out_stop_queue:
 	nvme_rdma_stop_queue(&ctrl->queues[0]);
@@ -1026,12 +1026,12 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
 static void nvme_rdma_teardown_admin_queue(struct nvme_rdma_ctrl *ctrl,
 		bool remove)
 {
-	blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+	nvme_stop_admin_queue(&ctrl->ctrl);
 	blk_sync_queue(ctrl->ctrl.admin_q);
 	nvme_rdma_stop_queue(&ctrl->queues[0]);
 	nvme_cancel_admin_tagset(&ctrl->ctrl);
 	if (remove)
-		blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
+		nvme_start_admin_queue(&ctrl->ctrl);
 	nvme_rdma_destroy_admin_queue(ctrl, remove);
 }
 
@@ -1154,7 +1154,7 @@ static int nvme_rdma_setup_ctrl(struct nvme_rdma_ctrl *ctrl, bool new)
 		nvme_rdma_destroy_io_queues(ctrl, new);
 	}
 destroy_admin:
-	blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+	nvme_stop_admin_queue(&ctrl->ctrl);
 	blk_sync_queue(ctrl->ctrl.admin_q);
 	nvme_rdma_stop_queue(&ctrl->queues[0]);
 	nvme_cancel_admin_tagset(&ctrl->ctrl);
@@ -1194,7 +1194,7 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
 	nvme_rdma_teardown_io_queues(ctrl, false);
 	nvme_start_queues(&ctrl->ctrl);
 	nvme_rdma_teardown_admin_queue(ctrl, false);
-	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
+	nvme_start_admin_queue(&ctrl->ctrl);
 
 	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
 		/* state change failure is ok if we started ctrl delete */
@@ -2232,7 +2232,7 @@ static void nvme_rdma_shutdown_ctrl(struct nvme_rdma_ctrl *ctrl, bool shutdown)
 	cancel_delayed_work_sync(&ctrl->reconnect_work);
 
 	nvme_rdma_teardown_io_queues(ctrl, shutdown);
-	blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+	nvme_stop_admin_queue(&ctrl->ctrl);
 	if (shutdown)
 		nvme_shutdown_ctrl(&ctrl->ctrl);
 	else
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 3c1c29dd3020..4109b726e052 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1915,7 +1915,7 @@ static int nvme_tcp_configure_admin_queue(struct nvme_ctrl *ctrl, bool new)
 	if (error)
 		goto out_stop_queue;
 
-	blk_mq_unquiesce_queue(ctrl->admin_q);
+	nvme_start_admin_queue(ctrl);
 
 	error = nvme_init_ctrl_finish(ctrl);
 	if (error)
@@ -1924,7 +1924,7 @@ static int nvme_tcp_configure_admin_queue(struct nvme_ctrl *ctrl, bool new)
 	return 0;
 
 out_quiesce_queue:
-	blk_mq_quiesce_queue(ctrl->admin_q);
+	nvme_stop_admin_queue(ctrl);
 	blk_sync_queue(ctrl->admin_q);
 out_stop_queue:
 	nvme_tcp_stop_queue(ctrl, 0);
@@ -1946,12 +1946,12 @@ static int nvme_tcp_configure_admin_queue(struct nvme_ctrl *ctrl, bool new)
 static void nvme_tcp_teardown_admin_queue(struct nvme_ctrl *ctrl,
 		bool remove)
 {
-	blk_mq_quiesce_queue(ctrl->admin_q);
+	nvme_stop_admin_queue(ctrl);
 	blk_sync_queue(ctrl->admin_q);
 	nvme_tcp_stop_queue(ctrl, 0);
 	nvme_cancel_admin_tagset(ctrl);
 	if (remove)
-		blk_mq_unquiesce_queue(ctrl->admin_q);
+		nvme_start_admin_queue(ctrl);
 	nvme_tcp_destroy_admin_queue(ctrl, remove);
 }
 
@@ -1960,7 +1960,7 @@ static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
 {
 	if (ctrl->queue_count <= 1)
 		return;
-	blk_mq_quiesce_queue(ctrl->admin_q);
+	nvme_stop_admin_queue(ctrl);
 	nvme_start_freeze(ctrl);
 	nvme_stop_queues(ctrl);
 	nvme_sync_io_queues(ctrl);
@@ -2055,7 +2055,7 @@ static int nvme_tcp_setup_ctrl(struct nvme_ctrl *ctrl, bool new)
 		nvme_tcp_destroy_io_queues(ctrl, new);
 	}
 destroy_admin:
-	blk_mq_quiesce_queue(ctrl->admin_q);
+	nvme_stop_admin_queue(ctrl);
 	blk_sync_queue(ctrl->admin_q);
 	nvme_tcp_stop_queue(ctrl, 0);
 	nvme_cancel_admin_tagset(ctrl);
@@ -2098,7 +2098,7 @@ static void nvme_tcp_error_recovery_work(struct work_struct *work)
 	/* unquiesce to fail fast pending requests */
 	nvme_start_queues(ctrl);
 	nvme_tcp_teardown_admin_queue(ctrl, false);
-	blk_mq_unquiesce_queue(ctrl->admin_q);
+	nvme_start_admin_queue(ctrl);
 
 	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_CONNECTING)) {
 		/* state change failure is ok if we started ctrl delete */
@@ -2116,7 +2116,7 @@ static void nvme_tcp_teardown_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
 	cancel_delayed_work_sync(&to_tcp_ctrl(ctrl)->connect_work);
 
 	nvme_tcp_teardown_io_queues(ctrl, shutdown);
-	blk_mq_quiesce_queue(ctrl->admin_q);
+	nvme_stop_admin_queue(ctrl);
 	if (shutdown)
 		nvme_shutdown_ctrl(ctrl);
 	else
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index 0285ccc7541f..440e1544033b 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -398,7 +398,7 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
 	ctrl->ctrl.max_hw_sectors =
 		(NVME_LOOP_MAX_SEGMENTS - 1) << (PAGE_SHIFT - 9);
 
-	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
+	nvme_start_admin_queue(&ctrl->ctrl);
 
 	error = nvme_init_ctrl_finish(&ctrl->ctrl);
 	if (error)
@@ -428,7 +428,7 @@ static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl)
 		nvme_loop_destroy_io_queues(ctrl);
 	}
 
-	blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+	nvme_stop_admin_queue(&ctrl->ctrl);
 	if (ctrl->ctrl.state == NVME_CTRL_LIVE)
 		nvme_shutdown_ctrl(&ctrl->ctrl);
 
-- 
2.31.1



* [PATCH V4 3/6] nvme: prepare for pairing quiescing and unquiescing
  2021-10-14  8:17 [PATCH V4 0/6] blk-mq: support concurrent queue quiescing Ming Lei
  2021-10-14  8:17 ` [PATCH V4 1/6] nvme: add APIs for stopping/starting admin queue Ming Lei
  2021-10-14  8:17 ` [PATCH V4 2/6] nvme: apply nvme API to quiesce/unquiesce " Ming Lei
@ 2021-10-14  8:17 ` Ming Lei
  2021-10-14 12:10   ` Christoph Hellwig
  2021-10-14  8:17 ` [PATCH V4 4/6] nvme: paring quiesce/unquiesce Ming Lei
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2021-10-14  8:17 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-block, linux-nvme,
	Chaitanya Kulkarni
  Cc: Sagi Grimberg, Keith Busch, Bart Van Assche, Yu Kuai, Ming Lei

Add two helpers to prepare for pairing quiescing and unquiescing, which
will be done in the next patch.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/core.c | 52 ++++++++++++++++++++++++----------------
 1 file changed, 31 insertions(+), 21 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 91d91d4f76d7..23fb746a8970 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -119,25 +119,6 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
 static void nvme_update_keep_alive(struct nvme_ctrl *ctrl,
 				   struct nvme_command *cmd);
 
-/*
- * Prepare a queue for teardown.
- *
- * This must forcibly unquiesce queues to avoid blocking dispatch, and only set
- * the capacity to 0 after that to avoid blocking dispatchers that may be
- * holding bd_butex.  This will end buffered writers dirtying pages that can't
- * be synced.
- */
-static void nvme_set_queue_dying(struct nvme_ns *ns)
-{
-	if (test_and_set_bit(NVME_NS_DEAD, &ns->flags))
-		return;
-
-	blk_set_queue_dying(ns->queue);
-	blk_mq_unquiesce_queue(ns->queue);
-
-	set_capacity_and_notify(ns->disk, 0);
-}
-
 void nvme_queue_scan(struct nvme_ctrl *ctrl)
 {
 	/*
@@ -4469,6 +4450,35 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 }
 EXPORT_SYMBOL_GPL(nvme_init_ctrl);
 
+static void nvme_start_ns_queue(struct nvme_ns *ns)
+{
+	blk_mq_unquiesce_queue(ns->queue);
+}
+
+static void nvme_stop_ns_queue(struct nvme_ns *ns)
+{
+	blk_mq_quiesce_queue(ns->queue);
+}
+
+/*
+ * Prepare a queue for teardown.
+ *
+ * This must forcibly unquiesce queues to avoid blocking dispatch, and only set
+ * the capacity to 0 after that to avoid blocking dispatchers that may be
+ * holding bd_butex.  This will end buffered writers dirtying pages that can't
+ * be synced.
+ */
+static void nvme_set_queue_dying(struct nvme_ns *ns)
+{
+	if (test_and_set_bit(NVME_NS_DEAD, &ns->flags))
+		return;
+
+	blk_set_queue_dying(ns->queue);
+	nvme_start_ns_queue(ns);
+
+	set_capacity_and_notify(ns->disk, 0);
+}
+
 /**
  * nvme_kill_queues(): Ends all namespace queues
  * @ctrl: the dead controller that needs to end
@@ -4547,7 +4557,7 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 
 	down_read(&ctrl->namespaces_rwsem);
 	list_for_each_entry(ns, &ctrl->namespaces, list)
-		blk_mq_quiesce_queue(ns->queue);
+		nvme_stop_ns_queue(ns);
 	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_stop_queues);
@@ -4558,7 +4568,7 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
 
 	down_read(&ctrl->namespaces_rwsem);
 	list_for_each_entry(ns, &ctrl->namespaces, list)
-		blk_mq_unquiesce_queue(ns->queue);
+		nvme_start_ns_queue(ns);
 	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_start_queues);
-- 
2.31.1



* [PATCH V4 4/6] nvme: paring quiesce/unquiesce
  2021-10-14  8:17 [PATCH V4 0/6] blk-mq: support concurrent queue quiescing Ming Lei
                   ` (2 preceding siblings ...)
  2021-10-14  8:17 ` [PATCH V4 3/6] nvme: prepare for pairing quiescing and unquiescing Ming Lei
@ 2021-10-14  8:17 ` Ming Lei
  2021-10-14  8:17 ` [PATCH V4 5/6] nvme: loop: clear NVME_CTRL_ADMIN_Q_STOPPED after admin queue is reallocated Ming Lei
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2021-10-14  8:17 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-block, linux-nvme,
	Chaitanya Kulkarni
  Cc: Sagi Grimberg, Keith Busch, Bart Van Assche, Yu Kuai, Ming Lei

The current blk_mq_quiesce_queue() and blk_mq_unquiesce_queue() always
stop and start the queue unconditionally, and concurrent
quiesce/unquiesce calls can come from different, unrelated code paths,
so an unexpected unquiesce may start the queue too early.

Prepare for supporting concurrent quiesce/unquiesce from multiple
contexts so that the above issue can be addressed.

NVMe has a very complicated quiesce/unquiesce use pattern, so add one
atomic bit to make sure that blk-mq quiesce/unquiesce is always called
in pairs.
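
As an illustration of the intended semantics, below is a small
single-threaded userspace model (assumed names, not kernel code) of the
pairing bit: repeated stop calls collapse into a single blk-mq quiesce
and repeated start calls into a single unquiesce.

/*
 * The per-queue "stopped" bit makes stop/start idempotent, so the
 * blk-mq quiesce depth added later in this series stays balanced.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_bool stopped;	/* models NVME_CTRL_ADMIN_Q_STOPPED */
static int blk_mq_calls;	/* how often blk-mq quiesce/unquiesce ran */

static void stop_queue(void)
{
	if (!atomic_exchange(&stopped, true))	/* like !test_and_set_bit() */
		blk_mq_calls++;			/* blk_mq_quiesce_queue() */
}

static void start_queue(void)
{
	if (atomic_exchange(&stopped, false))	/* like test_and_clear_bit() */
		blk_mq_calls++;			/* blk_mq_unquiesce_queue() */
}

int main(void)
{
	stop_queue();
	stop_queue();		/* second stop is a no-op */
	start_queue();
	start_queue();		/* second start is a no-op */
	printf("blk-mq quiesce/unquiesce called %d times\n", blk_mq_calls);
	return 0;
}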

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/core.c | 12 ++++++++----
 drivers/nvme/host/nvme.h |  2 ++
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 23fb746a8970..4391760e2e9a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4452,12 +4452,14 @@ EXPORT_SYMBOL_GPL(nvme_init_ctrl);
 
 static void nvme_start_ns_queue(struct nvme_ns *ns)
 {
-	blk_mq_unquiesce_queue(ns->queue);
+	if (test_and_clear_bit(NVME_NS_STOPPED, &ns->flags))
+		blk_mq_unquiesce_queue(ns->queue);
 }
 
 static void nvme_stop_ns_queue(struct nvme_ns *ns)
 {
-	blk_mq_quiesce_queue(ns->queue);
+	if (!test_and_set_bit(NVME_NS_STOPPED, &ns->flags))
+		blk_mq_quiesce_queue(ns->queue);
 }
 
 /*
@@ -4575,13 +4577,15 @@ EXPORT_SYMBOL_GPL(nvme_start_queues);
 
 void nvme_stop_admin_queue(struct nvme_ctrl *ctrl)
 {
-	blk_mq_quiesce_queue(ctrl->admin_q);
+	if (!test_and_set_bit(NVME_CTRL_ADMIN_Q_STOPPED, &ctrl->flags))
+		blk_mq_quiesce_queue(ctrl->admin_q);
 }
 EXPORT_SYMBOL_GPL(nvme_stop_admin_queue);
 
 void nvme_start_admin_queue(struct nvme_ctrl *ctrl)
 {
-	blk_mq_unquiesce_queue(ctrl->admin_q);
+	if (test_and_clear_bit(NVME_CTRL_ADMIN_Q_STOPPED, &ctrl->flags))
+		blk_mq_unquiesce_queue(ctrl->admin_q);
 }
 EXPORT_SYMBOL_GPL(nvme_start_admin_queue);
 
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 47877a5f1515..0fa6fa22f088 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -336,6 +336,7 @@ struct nvme_ctrl {
 	int nr_reconnects;
 	unsigned long flags;
 #define NVME_CTRL_FAILFAST_EXPIRED	0
+#define NVME_CTRL_ADMIN_Q_STOPPED	1
 	struct nvmf_ctrl_options *opts;
 
 	struct page *discard_page;
@@ -457,6 +458,7 @@ struct nvme_ns {
 #define NVME_NS_ANA_PENDING	2
 #define NVME_NS_FORCE_RO	3
 #define NVME_NS_READY		4
+#define NVME_NS_STOPPED		5
 
 	struct cdev		cdev;
 	struct device		cdev_device;
-- 
2.31.1



* [PATCH V4 5/6] nvme: loop: clear NVME_CTRL_ADMIN_Q_STOPPED after admin queue is reallocated
  2021-10-14  8:17 [PATCH V4 0/6] blk-mq: support concurrent queue quiescing Ming Lei
                   ` (3 preceding siblings ...)
  2021-10-14  8:17 ` [PATCH V4 4/6] nvme: paring quiesce/unquiesce Ming Lei
@ 2021-10-14  8:17 ` Ming Lei
  2021-10-14  8:17 ` [PATCH V4 6/6] blk-mq: support concurrent queue quiesce/unquiesce Ming Lei
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2021-10-14  8:17 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-block, linux-nvme,
	Chaitanya Kulkarni
  Cc: Sagi Grimberg, Keith Busch, Bart Van Assche, Yu Kuai, Ming Lei

nvme-loop's admin queue may be freed and reallocated, so the
NVME_CTRL_ADMIN_Q_STOPPED flag has to be cleared so that it matches the
quiesce state of the fresh admin queue.

nvme-loop is the only driver that reallocates its request queue; no such
usage is seen in the other nvme drivers.
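
For illustration, here is a tiny userspace model (assumed names, not
kernel code) of why a stale flag is harmful: if the bit survives the
queue reallocation, the next stop becomes a no-op and the fresh queue is
never actually quiesced.

/*
 * The per-controller "stopped" bit must be cleared when the admin queue
 * is reallocated, otherwise the next stop is skipped and the new
 * (unquiesced) queue is never quiesced.
 */
#include <stdbool.h>
#include <stdio.h>

static bool ctrl_stopped;	/* models NVME_CTRL_ADMIN_Q_STOPPED */
static bool queue_quiesced;	/* state of the current admin queue */

static void stop_admin_queue(void)
{
	if (!ctrl_stopped) {
		ctrl_stopped = true;
		queue_quiesced = true;	/* blk_mq_quiesce_queue() */
	}
}

static void realloc_admin_queue(bool clear_flag)
{
	queue_quiesced = false;		/* a fresh queue starts unquiesced */
	if (clear_flag)
		ctrl_stopped = false;	/* what this patch adds */
}

int main(void)
{
	stop_admin_queue();		/* old queue quiesced, flag set */
	realloc_admin_queue(false);	/* without this patch */
	stop_admin_queue();		/* skipped: flag is still set */
	printf("without fix: quiesced=%d (expected 1)\n", queue_quiesced);

	ctrl_stopped = false;		/* restart the scenario */
	stop_admin_queue();
	realloc_admin_queue(true);	/* with this patch */
	stop_admin_queue();		/* really quiesces the new queue */
	printf("with fix:    quiesced=%d\n", queue_quiesced);
	return 0;
}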

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/target/loop.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index 440e1544033b..eb1094254c82 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -384,6 +384,8 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
 		error = PTR_ERR(ctrl->ctrl.admin_q);
 		goto out_cleanup_fabrics_q;
 	}
+	/* reset stopped state for the fresh admin queue */
+	clear_bit(NVME_CTRL_ADMIN_Q_STOPPED, &ctrl->ctrl.flags);
 
 	error = nvmf_connect_admin_queue(&ctrl->ctrl);
 	if (error)
-- 
2.31.1



* [PATCH V4 6/6] blk-mq: support concurrent queue quiesce/unquiesce
  2021-10-14  8:17 [PATCH V4 0/6] blk-mq: support concurrent queue quiescing Ming Lei
                   ` (4 preceding siblings ...)
  2021-10-14  8:17 ` [PATCH V4 5/6] nvme: loop: clear NVME_CTRL_ADMIN_Q_STOPPED after admin queue is reallocated Ming Lei
@ 2021-10-14  8:17 ` Ming Lei
  2021-10-14 12:11 ` [PATCH V4 0/6] blk-mq: support concurrent queue quiescing Christoph Hellwig
  2021-10-20  0:28 ` Jens Axboe
  7 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2021-10-14  8:17 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-block, linux-nvme,
	Chaitanya Kulkarni
  Cc: Sagi Grimberg, Keith Busch, Bart Van Assche, Yu Kuai, Ming Lei

blk_mq_quiesce_queue() is now used quite widely, but so far
concurrent/nested quiescing is not supported. The biggest issue is that
an unquiesce can happen unexpectedly when quiesce/unquiesce are run
concurrently from more than one context.

This patch introduces q->quiesce_depth to handle concurrent quiescing:
the queue is only unquiesced when the last/outermost context unquiesces
it.

Several kernel panic issues have been reported[1][2][3] when running
stress quiesce tests, and this patch has been verified against those
reports.

[1] https://lore.kernel.org/linux-block/9b21c797-e505-3821-4f5b-df7bf9380328@huawei.com/T/#m1fc52431fad7f33b1ffc3f12c4450e4238540787
[2] https://lore.kernel.org/linux-block/9b21c797-e505-3821-4f5b-df7bf9380328@huawei.com/T/#m10ad90afeb9c8cc318334190a7c24c8b5c5e0722
[3] https://listman.redhat.com/archives/dm-devel/2021-September/msg00189.html
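
For illustration only, here is a userspace sketch of the depth-counting
semantics (a simplified model, not the kernel implementation; the
queue_lock and the hw queue run are omitted): the queue is only marked
quiesced by the first quiesce and only resumed when the outermost
unquiesce drops the depth back to zero.

#include <stdbool.h>
#include <stdio.h>

static int quiesce_depth;	/* models q->quiesce_depth */
static bool quiesced;		/* models QUEUE_FLAG_QUIESCED */

static void quiesce(void)
{
	if (!quiesce_depth++)
		quiesced = true;
}

static void unquiesce(void)
{
	if (quiesce_depth <= 0) {
		fprintf(stderr, "unbalanced unquiesce\n");	/* WARN_ON_ONCE() */
		return;
	}
	if (!--quiesce_depth)
		quiesced = false;	/* blk_mq_run_hw_queues() would run here */
}

int main(void)
{
	quiesce();			/* e.g. the driver */
	quiesce();			/* e.g. block core, elevator switch */
	unquiesce();			/* block core is done */
	printf("after inner unquiesce: quiesced=%d\n", quiesced);	/* still 1 */
	unquiesce();			/* driver is done */
	printf("after outer unquiesce: quiesced=%d\n", quiesced);	/* now 0 */
	return 0;
}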

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c         | 22 +++++++++++++++++++---
 include/linux/blkdev.h |  2 ++
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 68cccb2d1f8c..360e8eaee4dc 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -209,7 +209,12 @@ EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
  */
 void blk_mq_quiesce_queue_nowait(struct request_queue *q)
 {
-	blk_queue_flag_set(QUEUE_FLAG_QUIESCED, q);
+	unsigned long flags;
+
+	spin_lock_irqsave(&q->queue_lock, flags);
+	if (!q->quiesce_depth++)
+		blk_queue_flag_set(QUEUE_FLAG_QUIESCED, q);
+	spin_unlock_irqrestore(&q->queue_lock, flags);
 }
 EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);
 
@@ -250,10 +255,21 @@ EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
  */
 void blk_mq_unquiesce_queue(struct request_queue *q)
 {
-	blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
+	unsigned long flags;
+	bool run_queue = false;
+
+	spin_lock_irqsave(&q->queue_lock, flags);
+	if (WARN_ON_ONCE(q->quiesce_depth <= 0)) {
+		;
+	} else if (!--q->quiesce_depth) {
+		blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
+		run_queue = true;
+	}
+	spin_unlock_irqrestore(&q->queue_lock, flags);
 
 	/* dispatch requests which are inserted during quiescing */
-	blk_mq_run_hw_queues(q, true);
+	if (run_queue)
+		blk_mq_run_hw_queues(q, true);
 }
 EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 17705c970d7e..52020fae665c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -314,6 +314,8 @@ struct request_queue {
 	 */
 	struct mutex		mq_freeze_lock;
 
+	int			quiesce_depth;
+
 	struct blk_mq_tag_set	*tag_set;
 	struct list_head	tag_set_list;
 	struct bio_set		bio_split;
-- 
2.31.1



* Re: [PATCH V4 3/6] nvme: prepare for pairing quiescing and unquiescing
  2021-10-14  8:17 ` [PATCH V4 3/6] nvme: prepare for pairing quiescing and unquiescing Ming Lei
@ 2021-10-14 12:10   ` Christoph Hellwig
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2021-10-14 12:10 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, linux-block, linux-nvme,
	Chaitanya Kulkarni, Sagi Grimberg, Keith Busch, Bart Van Assche,
	Yu Kuai

On Thu, Oct 14, 2021 at 04:17:07PM +0800, Ming Lei wrote:
> Add two helpers to prepare for pairing quiescing and unquiescing, which
> will be done in the next patch.

I'd prefer to not add these wrappers with only one or two callers, all
in the local file.


* Re: [PATCH V4 0/6] blk-mq: support concurrent queue quiescing
  2021-10-14  8:17 [PATCH V4 0/6] blk-mq: support concurrent queue quiescing Ming Lei
                   ` (5 preceding siblings ...)
  2021-10-14  8:17 ` [PATCH V4 6/6] blk-mq: support concurrent queue quiesce/unquiesce Ming Lei
@ 2021-10-14 12:11 ` Christoph Hellwig
  2021-10-18  1:05   ` Ming Lei
  2021-10-20  0:28 ` Jens Axboe
  7 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2021-10-14 12:11 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, linux-block, linux-nvme,
	Chaitanya Kulkarni, Sagi Grimberg, Keith Busch, Bart Van Assche,
	Yu Kuai

Except for the nitpick just noted this looks fine to me:

Reviewed-by: Christoph Hellwig <hch@lst.de>

I plan to apply at least 1-5 to the nvme tree with that nitpick fixed
up locally.

Jens: do you want me to take the last patch through the nvme tree as
well?


* Re: [PATCH V4 0/6] blk-mq: support concurrent queue quiescing
  2021-10-14 12:11 ` [PATCH V4 0/6] blk-mq: support concurrent queue quiescing Christoph Hellwig
@ 2021-10-18  1:05   ` Ming Lei
  2021-10-18  1:19     ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2021-10-18  1:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, linux-nvme, Chaitanya Kulkarni,
	Sagi Grimberg, Keith Busch, Bart Van Assche, Yu Kuai

On Thu, Oct 14, 2021 at 02:11:45PM +0200, Christoph Hellwig wrote:
> Except for the nitpick just noted this looks fine to me:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> I plan to apply at least 1-5 to the nvme tree with that nitpick fixed
> up locally.
> 
> Jens: do you want me to take the last patch through the nvme tree as
> well?

Hello Jens & Christoph,

Any chance to make it in v5.16?


Thanks,
Ming



* Re: [PATCH V4 0/6] blk-mq: support concurrent queue quiescing
  2021-10-18  1:05   ` Ming Lei
@ 2021-10-18  1:19     ` Jens Axboe
  2021-10-20  0:23       ` Ming Lei
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2021-10-18  1:19 UTC (permalink / raw)
  To: Ming Lei, Christoph Hellwig
  Cc: linux-block, linux-nvme, Chaitanya Kulkarni, Sagi Grimberg,
	Keith Busch, Bart Van Assche, Yu Kuai

On 10/17/21 7:05 PM, Ming Lei wrote:
> On Thu, Oct 14, 2021 at 02:11:45PM +0200, Christoph Hellwig wrote:
>> Except for the nitpick just noted this looks fine to me:
>>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>>
>> I plan to apply at least 1-5 to the nvme tree with that nitpick fixed
>> up locally.
>>
>> Jens: do you want me to take the last patch through the nvme tree as
>> well?
> 
> Hello Jens & Christoph,
> 
> Any chance to make it in v5.16?

Yep, I'll take a look at this in the morning. Rebasing the block tree
anyway, I'll let you know if we need any changes here.

-- 
Jens Axboe



* Re: [PATCH V4 0/6] blk-mq: support concurrent queue quiescing
  2021-10-18  1:19     ` Jens Axboe
@ 2021-10-20  0:23       ` Ming Lei
  0 siblings, 0 replies; 13+ messages in thread
From: Ming Lei @ 2021-10-20  0:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, linux-block, linux-nvme, Chaitanya Kulkarni,
	Sagi Grimberg, Keith Busch, Bart Van Assche, Yu Kuai

On Sun, Oct 17, 2021 at 07:19:13PM -0600, Jens Axboe wrote:
> On 10/17/21 7:05 PM, Ming Lei wrote:
> > On Thu, Oct 14, 2021 at 02:11:45PM +0200, Christoph Hellwig wrote:
> >> Except for the nitpick just noted this looks fine to me:
> >>
> >> Reviewed-by: Christoph Hellwig <hch@lst.de>
> >>
> >> I plan to apply at least 1-5 to the nvme tree with that nitpick fixed
> >> up locally.
> >>
> >> Jens: do you want me to take the last patch through the nvme tree as
> >> well?
> > 
> > Hello Jens & Christoph,
> > 
> > Any chance to make it in v5.16?
> 
> Yep, I'll take a look at this in the morning. Rebasing the block tree
> anyway, I'll let you know if we need any changes here.

Hello Jens,

Looks the 6 patches can still be applied cleanly against both
for-5.16/block and for-5.16/drivers today, do you need an resend?

thanks,
Ming



* Re: [PATCH V4 0/6] blk-mq: support concurrent queue quiescing
  2021-10-14  8:17 [PATCH V4 0/6] blk-mq: support concurrent queue quiescing Ming Lei
                   ` (6 preceding siblings ...)
  2021-10-14 12:11 ` [PATCH V4 0/6] blk-mq: support concurrent queue quiescing Christoph Hellwig
@ 2021-10-20  0:28 ` Jens Axboe
  7 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2021-10-20  0:28 UTC (permalink / raw)
  To: Ming Lei, linux-block, Chaitanya Kulkarni, Christoph Hellwig, linux-nvme
  Cc: Jens Axboe, Yu Kuai, Keith Busch, Bart Van Assche, Sagi Grimberg

On Thu, 14 Oct 2021 16:17:04 +0800, Ming Lei wrote:
> Request queue quiescing is used by many block drivers and by the block
> core itself, from different and unrelated code paths. So far both
> quiesce and unquiesce change the queue state unconditionally. This has
> caused trouble: while a driver is quiescing a queue for its own
> purposes, the block core may also quiesce it for an elevator switch, an
> nr_requests update or another queue attribute change, and the matching
> unquiesce may then resume the queue earlier than the driver expects.
> 
> [...]

Applied, thanks!

[1/6] nvme: add APIs for stopping/starting admin queue
      commit: a277654bafb51fb8b4cf23550f15926bb02536f4
[2/6] nvme: apply nvme API to quiesce/unquiesce admin queue
      commit: 6ca1d9027e0d9ce5604a3e28de89456a76138034
[3/6] nvme: prepare for pairing quiescing and unquiescing
      commit: ebc9b95260151d966728cf0063b3b4e465f934d9
[4/6] nvme: paring quiesce/unquiesce
      commit: 9e6a6b1212100148c109675e003369e3e219dbd9
[5/6] nvme: loop: clear NVME_CTRL_ADMIN_Q_STOPPED after admin queue is reallocated
      commit: 1d35d519d8bf224ccdb43f9a235b8bda2d6d453c
[6/6] blk-mq: support concurrent queue quiesce/unquiesce
      commit: e70feb8b3e6886c525c88943b5f1508d02f5a683

Best regards,
-- 
Jens Axboe


