* [PATCH V2 0/5] blk-mq: support concurrent queue quiescing
@ 2021-09-30 12:56 ` Ming Lei
  0 siblings, 0 replies; 40+ messages in thread
From: Ming Lei @ 2021-09-30 12:56 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-block, linux-nvme
  Cc: Sagi Grimberg, Keith Busch, Ming Lei

Hello,

Request queue quiescing is used by many block drivers and by the block
core, from different, unrelated code paths. So far, both quiesce and
unquiesce change the queue state unconditionally. This has caused
trouble: for example, while a driver is quiescing a queue for its own
purposes, the block core may also quiesce it (because of an elevator
switch, an nr_requests update or another queue attribute change), and
then an unexpected unquiesce may arrive too early.

A kernel panic has been observed when running a stress test that
combines dm-mpath suspend with nr_requests updates.

Fix the issue by supporting concurrent queue quiescing. NVMe has very
complicated quiesce/unquiesce uses, and the two are not always called
in pairs, so patches 1-4 first switch NVMe to paired quiesce/unquiesce,
and patch 5 then adds nested queue quiesce.
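
To illustrate, here is a minimal sketch (the two contexts are
hypothetical) of the interleaving that breaks today, before this
series:

	/* context A: driver quiesces the queue for its own purpose */
	blk_mq_quiesce_queue(q);

	/* context B: block core, e.g. during an elevator switch */
	blk_mq_quiesce_queue(q);
	...
	blk_mq_unquiesce_queue(q);	/* clears QUEUE_FLAG_QUIESCED */

	/*
	 * The queue dispatches again from here on, although context A
	 * still expects it to stay quiesced.
	 */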

V2:
	- replace mutex with atomic ops to support pairing quiesce &
	  unquiesce


Ming Lei (5):
  nvme: add APIs for stopping/starting admin queue
  nvme: apply nvme API to quiesce/unquiesce admin queue
  nvme: prepare for pairing quiescing and unquiescing
  nvme: pairing quiesce/unquiesce
  blk-mq: support concurrent queue quiesce/unquiesce

 block/blk-mq.c             | 20 +++++++++--
 drivers/nvme/host/core.c   | 70 ++++++++++++++++++++++++++------------
 drivers/nvme/host/fc.c     |  8 ++---
 drivers/nvme/host/nvme.h   |  4 +++
 drivers/nvme/host/pci.c    |  8 ++---
 drivers/nvme/host/rdma.c   | 14 ++++----
 drivers/nvme/host/tcp.c    | 16 ++++-----
 drivers/nvme/target/loop.c |  4 +--
 include/linux/blkdev.h     |  2 ++
 9 files changed, 96 insertions(+), 50 deletions(-)

-- 
2.31.1



* [PATCH V2 1/5] nvme: add APIs for stopping/starting admin queue
  2021-09-30 12:56 ` Ming Lei
@ 2021-09-30 12:56   ` Ming Lei
  -1 siblings, 0 replies; 40+ messages in thread
From: Ming Lei @ 2021-09-30 12:56 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-block, linux-nvme
  Cc: Sagi Grimberg, Keith Busch, Ming Lei

Add two APIs for stopping and starting the admin queue.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/core.c | 12 ++++++++++++
 drivers/nvme/host/nvme.h |  2 ++
 2 files changed, 14 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 7fa75433c036..c675eef70a63 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4563,6 +4563,18 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
 }
 EXPORT_SYMBOL_GPL(nvme_start_queues);
 
+void nvme_stop_admin_queue(struct nvme_ctrl *ctrl)
+{
+	blk_mq_quiesce_queue(ctrl->admin_q);
+}
+EXPORT_SYMBOL_GPL(nvme_stop_admin_queue);
+
+void nvme_start_admin_queue(struct nvme_ctrl *ctrl)
+{
+	blk_mq_unquiesce_queue(ctrl->admin_q);
+}
+EXPORT_SYMBOL_GPL(nvme_start_admin_queue);
+
 void nvme_sync_io_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 9871c0c9374c..47877a5f1515 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -659,6 +659,8 @@ void nvme_complete_async_event(struct nvme_ctrl *ctrl, __le16 status,
 
 void nvme_stop_queues(struct nvme_ctrl *ctrl);
 void nvme_start_queues(struct nvme_ctrl *ctrl);
+void nvme_stop_admin_queue(struct nvme_ctrl *ctrl);
+void nvme_start_admin_queue(struct nvme_ctrl *ctrl);
 void nvme_kill_queues(struct nvme_ctrl *ctrl);
 void nvme_sync_queues(struct nvme_ctrl *ctrl);
 void nvme_sync_io_queues(struct nvme_ctrl *ctrl);
-- 
2.31.1



* [PATCH V2 2/5] nvme: apply nvme API to quiesce/unquiesce admin queue
  2021-09-30 12:56 ` Ming Lei
@ 2021-09-30 12:56   ` Ming Lei
  -1 siblings, 0 replies; 40+ messages in thread
From: Ming Lei @ 2021-09-30 12:56 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-block, linux-nvme
  Cc: Sagi Grimberg, Keith Busch, Ming Lei

Apply the two newly added APIs to quiesce/unquiesce the admin queue.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/core.c   |  2 +-
 drivers/nvme/host/fc.c     |  8 ++++----
 drivers/nvme/host/pci.c    |  8 ++++----
 drivers/nvme/host/rdma.c   | 14 +++++++-------
 drivers/nvme/host/tcp.c    | 16 ++++++++--------
 drivers/nvme/target/loop.c |  4 ++--
 6 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c675eef70a63..91d91d4f76d7 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4484,7 +4484,7 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
 
 	/* Forcibly unquiesce queues to avoid blocking dispatch */
 	if (ctrl->admin_q && !blk_queue_dying(ctrl->admin_q))
-		blk_mq_unquiesce_queue(ctrl->admin_q);
+		nvme_start_admin_queue(ctrl);
 
 	list_for_each_entry(ns, &ctrl->namespaces, list)
 		nvme_set_queue_dying(ns);
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index aa14ad963d91..580a216da619 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2382,7 +2382,7 @@ nvme_fc_ctrl_free(struct kref *ref)
 	list_del(&ctrl->ctrl_list);
 	spin_unlock_irqrestore(&ctrl->rport->lock, flags);
 
-	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
+	nvme_start_admin_queue(&ctrl->ctrl);
 	blk_cleanup_queue(ctrl->ctrl.admin_q);
 	blk_cleanup_queue(ctrl->ctrl.fabrics_q);
 	blk_mq_free_tag_set(&ctrl->admin_tag_set);
@@ -2510,7 +2510,7 @@ __nvme_fc_abort_outstanding_ios(struct nvme_fc_ctrl *ctrl, bool start_queues)
 	/*
 	 * clean up the admin queue. Same thing as above.
 	 */
-	blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+	nvme_stop_admin_queue(&ctrl->ctrl);
 	blk_sync_queue(ctrl->ctrl.admin_q);
 	blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
 				nvme_fc_terminate_exchange, &ctrl->ctrl);
@@ -3095,7 +3095,7 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
 	ctrl->ctrl.max_hw_sectors = ctrl->ctrl.max_segments <<
 						(ilog2(SZ_4K) - 9);
 
-	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
+	nvme_start_admin_queue(&ctrl->ctrl);
 
 	ret = nvme_init_ctrl_finish(&ctrl->ctrl);
 	if (ret || test_bit(ASSOC_FAILED, &ctrl->flags))
@@ -3249,7 +3249,7 @@ nvme_fc_delete_association(struct nvme_fc_ctrl *ctrl)
 	nvme_fc_free_queue(&ctrl->queues[0]);
 
 	/* re-enable the admin_q so anything new can fast fail */
-	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
+	nvme_start_admin_queue(&ctrl->ctrl);
 
 	/* resume the io queues so that things will fast fail */
 	nvme_start_queues(&ctrl->ctrl);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index ca5bda26226a..fa3efb003aa6 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1396,7 +1396,7 @@ static int nvme_suspend_queue(struct nvme_queue *nvmeq)
 
 	nvmeq->dev->online_queues--;
 	if (!nvmeq->qid && nvmeq->dev->ctrl.admin_q)
-		blk_mq_quiesce_queue(nvmeq->dev->ctrl.admin_q);
+		nvme_stop_admin_queue(&nvmeq->dev->ctrl);
 	if (!test_and_clear_bit(NVMEQ_POLLED, &nvmeq->flags))
 		pci_free_irq(to_pci_dev(nvmeq->dev->dev), nvmeq->cq_vector, nvmeq);
 	return 0;
@@ -1655,7 +1655,7 @@ static void nvme_dev_remove_admin(struct nvme_dev *dev)
 		 * user requests may be waiting on a stopped queue. Start the
 		 * queue to flush these to completion.
 		 */
-		blk_mq_unquiesce_queue(dev->ctrl.admin_q);
+		nvme_start_admin_queue(&dev->ctrl);
 		blk_cleanup_queue(dev->ctrl.admin_q);
 		blk_mq_free_tag_set(&dev->admin_tagset);
 	}
@@ -1689,7 +1689,7 @@ static int nvme_alloc_admin_tags(struct nvme_dev *dev)
 			return -ENODEV;
 		}
 	} else
-		blk_mq_unquiesce_queue(dev->ctrl.admin_q);
+		nvme_start_admin_queue(&dev->ctrl);
 
 	return 0;
 }
@@ -2624,7 +2624,7 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 	if (shutdown) {
 		nvme_start_queues(&dev->ctrl);
 		if (dev->ctrl.admin_q && !blk_queue_dying(dev->ctrl.admin_q))
-			blk_mq_unquiesce_queue(dev->ctrl.admin_q);
+			nvme_start_admin_queue(&dev->ctrl);
 	}
 	mutex_unlock(&dev->shutdown_lock);
 }
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 40317e1b9183..799a35558986 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -919,7 +919,7 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
 	else
 		ctrl->ctrl.max_integrity_segments = 0;
 
-	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
+	nvme_start_admin_queue(&ctrl->ctrl);
 
 	error = nvme_init_ctrl_finish(&ctrl->ctrl);
 	if (error)
@@ -928,7 +928,7 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
 	return 0;
 
 out_quiesce_queue:
-	blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+	nvme_stop_admin_queue(&ctrl->ctrl);
 	blk_sync_queue(ctrl->ctrl.admin_q);
 out_stop_queue:
 	nvme_rdma_stop_queue(&ctrl->queues[0]);
@@ -1026,12 +1026,12 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
 static void nvme_rdma_teardown_admin_queue(struct nvme_rdma_ctrl *ctrl,
 		bool remove)
 {
-	blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+	nvme_stop_admin_queue(&ctrl->ctrl);
 	blk_sync_queue(ctrl->ctrl.admin_q);
 	nvme_rdma_stop_queue(&ctrl->queues[0]);
 	nvme_cancel_admin_tagset(&ctrl->ctrl);
 	if (remove)
-		blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
+		nvme_start_admin_queue(&ctrl->ctrl);
 	nvme_rdma_destroy_admin_queue(ctrl, remove);
 }
 
@@ -1154,7 +1154,7 @@ static int nvme_rdma_setup_ctrl(struct nvme_rdma_ctrl *ctrl, bool new)
 		nvme_rdma_destroy_io_queues(ctrl, new);
 	}
 destroy_admin:
-	blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+	nvme_stop_admin_queue(&ctrl->ctrl);
 	blk_sync_queue(ctrl->ctrl.admin_q);
 	nvme_rdma_stop_queue(&ctrl->queues[0]);
 	nvme_cancel_admin_tagset(&ctrl->ctrl);
@@ -1194,7 +1194,7 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
 	nvme_rdma_teardown_io_queues(ctrl, false);
 	nvme_start_queues(&ctrl->ctrl);
 	nvme_rdma_teardown_admin_queue(ctrl, false);
-	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
+	nvme_start_admin_queue(&ctrl->ctrl);
 
 	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
 		/* state change failure is ok if we started ctrl delete */
@@ -2232,7 +2232,7 @@ static void nvme_rdma_shutdown_ctrl(struct nvme_rdma_ctrl *ctrl, bool shutdown)
 	cancel_delayed_work_sync(&ctrl->reconnect_work);
 
 	nvme_rdma_teardown_io_queues(ctrl, shutdown);
-	blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+	nvme_stop_admin_queue(&ctrl->ctrl);
 	if (shutdown)
 		nvme_shutdown_ctrl(&ctrl->ctrl);
 	else
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 3c1c29dd3020..4109b726e052 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1915,7 +1915,7 @@ static int nvme_tcp_configure_admin_queue(struct nvme_ctrl *ctrl, bool new)
 	if (error)
 		goto out_stop_queue;
 
-	blk_mq_unquiesce_queue(ctrl->admin_q);
+	nvme_start_admin_queue(ctrl);
 
 	error = nvme_init_ctrl_finish(ctrl);
 	if (error)
@@ -1924,7 +1924,7 @@ static int nvme_tcp_configure_admin_queue(struct nvme_ctrl *ctrl, bool new)
 	return 0;
 
 out_quiesce_queue:
-	blk_mq_quiesce_queue(ctrl->admin_q);
+	nvme_stop_admin_queue(ctrl);
 	blk_sync_queue(ctrl->admin_q);
 out_stop_queue:
 	nvme_tcp_stop_queue(ctrl, 0);
@@ -1946,12 +1946,12 @@ static int nvme_tcp_configure_admin_queue(struct nvme_ctrl *ctrl, bool new)
 static void nvme_tcp_teardown_admin_queue(struct nvme_ctrl *ctrl,
 		bool remove)
 {
-	blk_mq_quiesce_queue(ctrl->admin_q);
+	nvme_stop_admin_queue(ctrl);
 	blk_sync_queue(ctrl->admin_q);
 	nvme_tcp_stop_queue(ctrl, 0);
 	nvme_cancel_admin_tagset(ctrl);
 	if (remove)
-		blk_mq_unquiesce_queue(ctrl->admin_q);
+		nvme_start_admin_queue(ctrl);
 	nvme_tcp_destroy_admin_queue(ctrl, remove);
 }
 
@@ -1960,7 +1960,7 @@ static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
 {
 	if (ctrl->queue_count <= 1)
 		return;
-	blk_mq_quiesce_queue(ctrl->admin_q);
+	nvme_stop_admin_queue(ctrl);
 	nvme_start_freeze(ctrl);
 	nvme_stop_queues(ctrl);
 	nvme_sync_io_queues(ctrl);
@@ -2055,7 +2055,7 @@ static int nvme_tcp_setup_ctrl(struct nvme_ctrl *ctrl, bool new)
 		nvme_tcp_destroy_io_queues(ctrl, new);
 	}
 destroy_admin:
-	blk_mq_quiesce_queue(ctrl->admin_q);
+	nvme_stop_admin_queue(ctrl);
 	blk_sync_queue(ctrl->admin_q);
 	nvme_tcp_stop_queue(ctrl, 0);
 	nvme_cancel_admin_tagset(ctrl);
@@ -2098,7 +2098,7 @@ static void nvme_tcp_error_recovery_work(struct work_struct *work)
 	/* unquiesce to fail fast pending requests */
 	nvme_start_queues(ctrl);
 	nvme_tcp_teardown_admin_queue(ctrl, false);
-	blk_mq_unquiesce_queue(ctrl->admin_q);
+	nvme_start_admin_queue(ctrl);
 
 	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_CONNECTING)) {
 		/* state change failure is ok if we started ctrl delete */
@@ -2116,7 +2116,7 @@ static void nvme_tcp_teardown_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
 	cancel_delayed_work_sync(&to_tcp_ctrl(ctrl)->connect_work);
 
 	nvme_tcp_teardown_io_queues(ctrl, shutdown);
-	blk_mq_quiesce_queue(ctrl->admin_q);
+	nvme_stop_admin_queue(ctrl);
 	if (shutdown)
 		nvme_shutdown_ctrl(ctrl);
 	else
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index 0285ccc7541f..440e1544033b 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -398,7 +398,7 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
 	ctrl->ctrl.max_hw_sectors =
 		(NVME_LOOP_MAX_SEGMENTS - 1) << (PAGE_SHIFT - 9);
 
-	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
+	nvme_start_admin_queue(&ctrl->ctrl);
 
 	error = nvme_init_ctrl_finish(&ctrl->ctrl);
 	if (error)
@@ -428,7 +428,7 @@ static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl)
 		nvme_loop_destroy_io_queues(ctrl);
 	}
 
-	blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+	nvme_stop_admin_queue(&ctrl->ctrl);
 	if (ctrl->ctrl.state == NVME_CTRL_LIVE)
 		nvme_shutdown_ctrl(&ctrl->ctrl);
 
-- 
2.31.1



* [PATCH V2 3/5] nvme: prepare for pairing quiescing and unquiescing
  2021-09-30 12:56 ` Ming Lei
@ 2021-09-30 12:56   ` Ming Lei
  -1 siblings, 0 replies; 40+ messages in thread
From: Ming Lei @ 2021-09-30 12:56 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-block, linux-nvme
  Cc: Sagi Grimberg, Keith Busch, Ming Lei

Add two helpers to prepare for pairing quiescing and unquiescing,
which will be done in the next patch.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/core.c | 52 ++++++++++++++++++++++++----------------
 1 file changed, 31 insertions(+), 21 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 91d91d4f76d7..23fb746a8970 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -119,25 +119,6 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
 static void nvme_update_keep_alive(struct nvme_ctrl *ctrl,
 				   struct nvme_command *cmd);
 
-/*
- * Prepare a queue for teardown.
- *
- * This must forcibly unquiesce queues to avoid blocking dispatch, and only set
- * the capacity to 0 after that to avoid blocking dispatchers that may be
- * holding bd_butex.  This will end buffered writers dirtying pages that can't
- * be synced.
- */
-static void nvme_set_queue_dying(struct nvme_ns *ns)
-{
-	if (test_and_set_bit(NVME_NS_DEAD, &ns->flags))
-		return;
-
-	blk_set_queue_dying(ns->queue);
-	blk_mq_unquiesce_queue(ns->queue);
-
-	set_capacity_and_notify(ns->disk, 0);
-}
-
 void nvme_queue_scan(struct nvme_ctrl *ctrl)
 {
 	/*
@@ -4469,6 +4450,35 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 }
 EXPORT_SYMBOL_GPL(nvme_init_ctrl);
 
+static void nvme_start_ns_queue(struct nvme_ns *ns)
+{
+	blk_mq_unquiesce_queue(ns->queue);
+}
+
+static void nvme_stop_ns_queue(struct nvme_ns *ns)
+{
+	blk_mq_quiesce_queue(ns->queue);
+}
+
+/*
+ * Prepare a queue for teardown.
+ *
+ * This must forcibly unquiesce queues to avoid blocking dispatch, and only set
+ * the capacity to 0 after that to avoid blocking dispatchers that may be
+ * holding bd_butex.  This will end buffered writers dirtying pages that can't
+ * be synced.
+ */
+static void nvme_set_queue_dying(struct nvme_ns *ns)
+{
+	if (test_and_set_bit(NVME_NS_DEAD, &ns->flags))
+		return;
+
+	blk_set_queue_dying(ns->queue);
+	nvme_start_ns_queue(ns);
+
+	set_capacity_and_notify(ns->disk, 0);
+}
+
 /**
  * nvme_kill_queues(): Ends all namespace queues
  * @ctrl: the dead controller that needs to end
@@ -4547,7 +4557,7 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 
 	down_read(&ctrl->namespaces_rwsem);
 	list_for_each_entry(ns, &ctrl->namespaces, list)
-		blk_mq_quiesce_queue(ns->queue);
+		nvme_stop_ns_queue(ns);
 	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_stop_queues);
@@ -4558,7 +4568,7 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
 
 	down_read(&ctrl->namespaces_rwsem);
 	list_for_each_entry(ns, &ctrl->namespaces, list)
-		blk_mq_unquiesce_queue(ns->queue);
+		nvme_start_ns_queue(ns);
 	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_start_queues);
-- 
2.31.1



* [PATCH V2 4/5] nvme: pairing quiesce/unquiesce
  2021-09-30 12:56 ` Ming Lei
@ 2021-09-30 12:56   ` Ming Lei
  -1 siblings, 0 replies; 40+ messages in thread
From: Ming Lei @ 2021-09-30 12:56 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-block, linux-nvme
  Cc: Sagi Grimberg, Keith Busch, Ming Lei

The current blk_mq_quiesce_queue() and blk_mq_unquiesce_queue() always
stop and start the queue unconditionally, and concurrent
quiesce/unquiesce calls can come from different, unrelated code paths,
so an unquiesce may arrive unexpectedly and start the queue too early.

Prepare for supporting concurrent quiesce/unquiesce from multiple
contexts, so that we can address the above issue.

NVMe has a very complicated quiesce/unquiesce use pattern, so add one
atomic bit to make sure that blk-mq quiesce/unquiesce is always called
in pairs.
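
With the bit, redundant stops and starts collapse into a single state
transition; a hypothetical double stop, for example, no longer needs a
matching double start:

	nvme_stop_admin_queue(ctrl);	/* quiesces, sets ADMIN_Q_STOPPED */
	nvme_stop_admin_queue(ctrl);	/* bit already set: no-op */
	nvme_start_admin_queue(ctrl);	/* clears the bit, unquiesces */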

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/core.c | 12 ++++++++----
 drivers/nvme/host/nvme.h |  2 ++
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 23fb746a8970..4391760e2e9a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4452,12 +4452,14 @@ EXPORT_SYMBOL_GPL(nvme_init_ctrl);
 
 static void nvme_start_ns_queue(struct nvme_ns *ns)
 {
-	blk_mq_unquiesce_queue(ns->queue);
+	if (test_and_clear_bit(NVME_NS_STOPPED, &ns->flags))
+		blk_mq_unquiesce_queue(ns->queue);
 }
 
 static void nvme_stop_ns_queue(struct nvme_ns *ns)
 {
-	blk_mq_quiesce_queue(ns->queue);
+	if (!test_and_set_bit(NVME_NS_STOPPED, &ns->flags))
+		blk_mq_quiesce_queue(ns->queue);
 }
 
 /*
@@ -4575,13 +4577,15 @@ EXPORT_SYMBOL_GPL(nvme_start_queues);
 
 void nvme_stop_admin_queue(struct nvme_ctrl *ctrl)
 {
-	blk_mq_quiesce_queue(ctrl->admin_q);
+	if (!test_and_set_bit(NVME_CTRL_ADMIN_Q_STOPPED, &ctrl->flags))
+		blk_mq_quiesce_queue(ctrl->admin_q);
 }
 EXPORT_SYMBOL_GPL(nvme_stop_admin_queue);
 
 void nvme_start_admin_queue(struct nvme_ctrl *ctrl)
 {
-	blk_mq_unquiesce_queue(ctrl->admin_q);
+	if (test_and_clear_bit(NVME_CTRL_ADMIN_Q_STOPPED, &ctrl->flags))
+		blk_mq_unquiesce_queue(ctrl->admin_q);
 }
 EXPORT_SYMBOL_GPL(nvme_start_admin_queue);
 
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 47877a5f1515..0fa6fa22f088 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -336,6 +336,7 @@ struct nvme_ctrl {
 	int nr_reconnects;
 	unsigned long flags;
 #define NVME_CTRL_FAILFAST_EXPIRED	0
+#define NVME_CTRL_ADMIN_Q_STOPPED	1
 	struct nvmf_ctrl_options *opts;
 
 	struct page *discard_page;
@@ -457,6 +458,7 @@ struct nvme_ns {
 #define NVME_NS_ANA_PENDING	2
 #define NVME_NS_FORCE_RO	3
 #define NVME_NS_READY		4
+#define NVME_NS_STOPPED		5
 
 	struct cdev		cdev;
 	struct device		cdev_device;
-- 
2.31.1



* [PATCH V2 5/5] blk-mq: support concurrent queue quiesce/unquiesce
  2021-09-30 12:56 ` Ming Lei
@ 2021-09-30 12:56   ` Ming Lei
  -1 siblings, 0 replies; 40+ messages in thread
From: Ming Lei @ 2021-09-30 12:56 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-block, linux-nvme
  Cc: Sagi Grimberg, Keith Busch, Ming Lei

It turns out that blk_mq_freeze_queue() isn't stronger[1] than
blk_mq_quiesce_queue(): dispatch may still be in progress after the
queue is frozen, and in several cases, such as switching the I/O
scheduler or updating nr_requests and the wbt latency, we still need
to quiesce the queue as a supplement to freezing it.

As we need to extend the uses of blk_mq_quiesce_queue(), supporting
nested quiesce becomes inevitable; in particular, an unquiesce must
not take effect while a quiesce originating from another context is
still pending.

This patch introduces q->quiesce_depth to handle concurrent quiesce,
and the queue is only unquiesced by the last/outermost unquiesce of
all contexts.
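
With the depth counter, quiesce/unquiesce pairs from unrelated contexts
nest as expected; a minimal sketch (the two contexts are hypothetical)
of the interleaving fixed here:

	blk_mq_quiesce_queue(q);	/* driver: depth 0 -> 1, flag set */
	blk_mq_quiesce_queue(q);	/* core:   depth 1 -> 2 */
	blk_mq_unquiesce_queue(q);	/* core:   depth 2 -> 1, stays quiesced */
	blk_mq_unquiesce_queue(q);	/* driver: depth 1 -> 0, flag cleared,
					 * hw queues run again */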

One kernel panic has been reported[2] when running a stress test that
updates dm-mpath's nr_requests while suspending the queue; a similar
issue should exist in almost all drivers that use quiesce/unquiesce.

[1] https://marc.info/?l=linux-block&m=150993988115872&w=2
[2] https://listman.redhat.com/archives/dm-devel/2021-September/msg00189.html

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c         | 20 +++++++++++++++++---
 include/linux/blkdev.h |  2 ++
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 21bf4c3f0825..10f8a3d4e3a1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -209,7 +209,12 @@ EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
  */
 void blk_mq_quiesce_queue_nowait(struct request_queue *q)
 {
-	blk_queue_flag_set(QUEUE_FLAG_QUIESCED, q);
+	unsigned long flags;
+
+	spin_lock_irqsave(&q->queue_lock, flags);
+	if (!q->quiesce_depth++)
+		blk_queue_flag_set(QUEUE_FLAG_QUIESCED, q);
+	spin_unlock_irqrestore(&q->queue_lock, flags);
 }
 EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);
 
@@ -250,10 +255,19 @@ EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
  */
 void blk_mq_unquiesce_queue(struct request_queue *q)
 {
-	blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
+	unsigned long flags;
+	bool run_queue = false;
+
+	spin_lock_irqsave(&q->queue_lock, flags);
+	if (q->quiesce_depth > 0 && !--q->quiesce_depth) {
+		blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
+		run_queue = true;
+	}
+	spin_unlock_irqrestore(&q->queue_lock, flags);
 
 	/* dispatch requests which are inserted during quiescing */
-	blk_mq_run_hw_queues(q, true);
+	if (run_queue)
+		blk_mq_run_hw_queues(q, true);
 }
 EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 0e960d74615e..74c60e2d61f9 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -315,6 +315,8 @@ struct request_queue {
 	 */
 	struct mutex		mq_freeze_lock;
 
+	int			quiesce_depth;
+
 	struct blk_mq_tag_set	*tag_set;
 	struct list_head	tag_set_list;
 	struct bio_set		bio_split;
-- 
2.31.1



* Re: [PATCH V2 5/5] blk-mq: support concurrent queue quiesce/unquiesce
  2021-09-30 12:56   ` Ming Lei
@ 2021-09-30 15:56     ` Bart Van Assche
  -1 siblings, 0 replies; 40+ messages in thread
From: Bart Van Assche @ 2021-09-30 15:56 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, Christoph Hellwig, linux-block, linux-nvme
  Cc: Sagi Grimberg, Keith Busch

On 9/30/21 5:56 AM, Ming Lei wrote:
> It turns out that blk_mq_freeze_queue() isn't stronger[1] than
> blk_mq_quiesce_queue(): dispatch may still be in progress after the
> queue is frozen, and in several cases, such as switching the I/O
> scheduler or updating nr_requests and the wbt latency, we still need
> to quiesce the queue as a supplement to freezing it.

Is there agreement about this? If not, how about leaving out the above from the
patch description?

> As we need to extend the uses of blk_mq_quiesce_queue(), supporting
> nested quiesce becomes inevitable; in particular, an unquiesce must
> not take effect while a quiesce originating from another context is
> still pending.
> 
> This patch introduces q->quiesce_depth to handle concurrent quiesce,
> and the queue is only unquiesced by the last/outermost unquiesce of
> all contexts.
> 
> One kernel panic has been reported[2] when running a stress test that
> updates dm-mpath's nr_requests while suspending the queue; a similar
> issue should exist in almost all drivers that use quiesce/unquiesce.
> 
> [1] https://marc.info/?l=linux-block&m=150993988115872&w=2
> [2] https://listman.redhat.com/archives/dm-devel/2021-September/msg00189.html

Please share the call stack of the kernel oops fixed by [2] since that
call stack is not in the patch description.

> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 21bf4c3f0825..10f8a3d4e3a1 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -209,7 +209,12 @@ EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
>    */
>   void blk_mq_quiesce_queue_nowait(struct request_queue *q)
>   {
> -	blk_queue_flag_set(QUEUE_FLAG_QUIESCED, q);
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&q->queue_lock, flags);
> +	if (!q->quiesce_depth++)
> +		blk_queue_flag_set(QUEUE_FLAG_QUIESCED, q);
> +	spin_unlock_irqrestore(&q->queue_lock, flags);
>   }
>   EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);

Consider using == 0 instead of ! to check whether or not quiesce_depth is
zero to improve code readability.
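
I.e. something like:

	if (q->quiesce_depth++ == 0)
		blk_queue_flag_set(QUEUE_FLAG_QUIESCED, q);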

> @@ -250,10 +255,19 @@ EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
>    */
>   void blk_mq_unquiesce_queue(struct request_queue *q)
>   {
> -	blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
> +	unsigned long flags;
> +	bool run_queue = false;
> +
> +	spin_lock_irqsave(&q->queue_lock, flags);
> +	if (q->quiesce_depth > 0 && !--q->quiesce_depth) {
> +		blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
> +		run_queue = true;
> +	}
> +	spin_unlock_irqrestore(&q->queue_lock, flags);
>   
>   	/* dispatch requests which are inserted during quiescing */
> -	blk_mq_run_hw_queues(q, true);
> +	if (run_queue)
> +		blk_mq_run_hw_queues(q, true);
>   }

So calling blk_mq_unquiesce_queue() with q->quiesce_depth <= 0 is
ignored quietly? How about triggering a kernel warning for that
condition?
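
Something like this (an untested sketch) would make such misuse
visible:

	spin_lock_irqsave(&q->queue_lock, flags);
	if (WARN_ON_ONCE(q->quiesce_depth <= 0)) {
		/* unbalanced unquiesce: warn, don't underflow the depth */
	} else if (!--q->quiesce_depth) {
		blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
		run_queue = true;
	}
	spin_unlock_irqrestore(&q->queue_lock, flags);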

Otherwise the code changes look good to me.

Thanks,

Bart.


* Re: [PATCH V2 1/5] nvme: add APIs for stopping/starting admin queue
  2021-09-30 12:56   ` Ming Lei
@ 2021-10-01  5:56     ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 40+ messages in thread
From: Chaitanya Kulkarni @ 2021-10-01  5:56 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, Christoph Hellwig, linux-block, linux-nvme
  Cc: Sagi Grimberg, Keith Busch

On 9/30/2021 5:56 AM, Ming Lei wrote:
> Add two APIs for stopping and starting the admin queue.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>


This patch looks good to me, but based on the feedback I've received
in the past, we need to add new functions in the patch where they are
actually used rather than adding them in a separate patch.




* Re: [PATCH V2 2/5] nvme: apply nvme API to quiesce/unquiesce admin queue
  2021-09-30 12:56   ` Ming Lei
@ 2021-10-01  5:57     ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 40+ messages in thread
From: Chaitanya Kulkarni @ 2021-10-01  5:57 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, Christoph Hellwig, linux-block, linux-nvme
  Cc: Sagi Grimberg, Keith Busch

On 9/30/2021 5:56 AM, Ming Lei wrote:
> Apply the two newly added APIs to quiesce/unquiesce the admin queue.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>

Perhaps merge the previous patch with this one?




* Re: [PATCH V2 1/5] nvme: add APIs for stopping/starting admin queue
From: Ming Lei @ 2021-10-05  2:23 UTC
  To: Chaitanya Kulkarni
  Cc: Jens Axboe, Christoph Hellwig, linux-block, linux-nvme,
	Sagi Grimberg, Keith Busch

Hello Chaitanya,

On Fri, Oct 01, 2021 at 05:56:04AM +0000, Chaitanya Kulkarni wrote:
> On 9/30/2021 5:56 AM, Ming Lei wrote:
> > Add two APIs for stopping and starting admin queue.
> > 
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> 
> 
> This patch looks good to me, but based on the feedback I've received in
> the past, we need to add new functions in the patch where they are
> actually used rather than adding them in a separate patch.

The two added APIs are exported via EXPORT_SYMBOL_GPL(), so they won't
cause any build warning. I have seen lots of such practice too.

It is easier to review this way, since the 1st patch focuses on the API
implementation and the 2nd patch focuses on using the API, especially as
there are lots of users in patch 2.

But if you really don't like this way, I am fine with merging the two,
since merging is always easier than splitting. :-)


Thanks,
Ming



* Re: [PATCH V2 5/5] blk-mq: support concurrent queue quiesce/unquiesce
From: Ming Lei @ 2021-10-05  2:31 UTC
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, linux-block, linux-nvme,
	Sagi Grimberg, Keith Busch

On Thu, Sep 30, 2021 at 08:56:29AM -0700, Bart Van Assche wrote:
> On 9/30/21 5:56 AM, Ming Lei wrote:
> > Turns out that blk_mq_freeze_queue() isn't stronger[1] than
> > blk_mq_quiesce_queue(), because dispatch may still be in progress after
> > the queue is frozen; in several cases, such as switching the io
> > scheduler or updating nr_requests & wbt latency, we still need to
> > quiesce the queue as a supplement to freezing it.
> 
> Is there agreement about this? If not, how about leaving out the above from the
> patch description?

Yeah, actually the code has been merged, please see the related
functions: elevator_switch(), queue_wb_lat_store() and
blk_mq_update_nr_requests().
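
The pattern in those callers is roughly the following (a simplified
sketch, not the exact merged code):

	/*
	 * Freezing waits for in-flight requests to complete, quiescing
	 * stops future dispatching; both are needed before touching
	 * scheduler / queue state.
	 */
	blk_mq_freeze_queue(q);
	blk_mq_quiesce_queue(q);

	/* ... switch elevator / update nr_requests / wbt latency ... */

	blk_mq_unquiesce_queue(q);
	blk_mq_unfreeze_queue(q);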

> 
> > As we need to extend the uses of blk_mq_quiesce_queue(), it is
> > inevitable that we need to support nested quiesce, especially since we
> > can't let unquiesce happen while there is a quiesce originated from
> > another context.
> > 
> > This patch introduces q->quiesce_depth to deal with concurrent quiesce,
> > and we only unquiesce the queue when it is the last/outer-most one of
> > all contexts.
> > 
> > One kernel panic issue has been reported[2] when running a stress test
> > on dm-mpath's updating nr_requests and suspending the queue, and a
> > similar issue should exist in almost all drivers which use
> > quiesce/unquiesce.
> > 
> > [1] https://marc.info/?l=linux-block&m=150993988115872&w=2
> > [2] https://listman.redhat.com/archives/dm-devel/2021-September/msg00189.html
> 
> Please share the call stack of the kernel oops fixed by [2] since that
> call stack is not in the patch description.

OK, it is something like the following:

[  145.453672] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014
[  145.454104] RIP: 0010:dm_softirq_done+0x46/0x220 [dm_mod]
[  145.454536] Code: 85 ed 0f 84 40 01 00 00 44 0f b6 b7 70 01 00 00 4c 8b a5 18 01 00 00 45 89 f5 f6 47 1d 04 75 57 49 8b 7c 24 08 48 85 ff 74 4d <48> 8b 47 08 48 8b 40 58 48 85 c0 74 40 49 8d 4c 24 50 44 89 f2 48
[  145.455423] RSP: 0000:ffffa88600003ef8 EFLAGS: 00010282
[  145.455865] RAX: ffffffffc03fbd10 RBX: ffff979144c00010 RCX: dead000000000200
[  145.456321] RDX: ffffa88600003f30 RSI: ffff979144c00068 RDI: ffffa88600d01040
[  145.456764] RBP: ffff979150eb7990 R08: ffff9791bbc27de0 R09: 0000000000000100
[  145.457205] R10: 0000000000000068 R11: 000000000000004c R12: ffff979144c00138
[  145.457647] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000010
[  145.458080] FS:  00007f57e5d13180(0000) GS:ffff9791bbc00000(0000) knlGS:0000000000000000
[  145.458516] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  145.458945] CR2: ffffa88600d01048 CR3: 0000000106cf8003 CR4: 0000000000370ef0
[  145.459382] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  145.459815] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  145.460250] Call Trace:
[  145.460779]  <IRQ>
[  145.461453]  blk_done_softirq+0xa1/0xd0
[  145.462138]  __do_softirq+0xd7/0x2d6
[  145.462814]  irq_exit+0xf7/0x100
[  145.463480]  do_IRQ+0x7f/0xd0
[  145.464131]  common_interrupt+0xf/0xf
[  145.464797]  </IRQ>

> 
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 21bf4c3f0825..10f8a3d4e3a1 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -209,7 +209,12 @@ EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
> >    */
> >   void blk_mq_quiesce_queue_nowait(struct request_queue *q)
> >   {
> > -	blk_queue_flag_set(QUEUE_FLAG_QUIESCED, q);
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&q->queue_lock, flags);
> > +	if (!q->quiesce_depth++)
> > +		blk_queue_flag_set(QUEUE_FLAG_QUIESCED, q);
> > +	spin_unlock_irqrestore(&q->queue_lock, flags);
> >   }
> >   EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);
> 
> Consider using == 0 instead of ! to check whether or not quiesce_depth is
> zero to improve code readability.

Fine.
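
I.e. the quiesce side would read something like (untested):

	void blk_mq_quiesce_queue_nowait(struct request_queue *q)
	{
		unsigned long flags;

		spin_lock_irqsave(&q->queue_lock, flags);
		/* only the first quiescer needs to set the flag */
		if (q->quiesce_depth == 0)
			blk_queue_flag_set(QUEUE_FLAG_QUIESCED, q);
		q->quiesce_depth++;
		spin_unlock_irqrestore(&q->queue_lock, flags);
	}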

> 
> > @@ -250,10 +255,19 @@ EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
> >    */
> >   void blk_mq_unquiesce_queue(struct request_queue *q)
> >   {
> > -	blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
> > +	unsigned long flags;
> > +	bool run_queue = false;
> > +
> > +	spin_lock_irqsave(&q->queue_lock, flags);
> > +	if (q->quiesce_depth > 0 && !--q->quiesce_depth) {
> > +		blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
> > +		run_queue = true;
> > +	}
> > +	spin_unlock_irqrestore(&q->queue_lock, flags);
> >   	/* dispatch requests which are inserted during quiescing */
> > -	blk_mq_run_hw_queues(q, true);
> > +	if (run_queue)
> > +		blk_mq_run_hw_queues(q, true);
> >   }
> 
> So a call to blk_mq_unquiesce_queue() with q->quiesce_depth <= 0 is ignored
> quietly? How about triggering a kernel warning for that condition?

OK.


Thanks, 
Ming



* Re: [PATCH V2 1/5] nvme: add APIs for stopping/starting admin queue
From: Chaitanya Kulkarni @ 2021-10-05  3:38 UTC
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, linux-block, linux-nvme,
	Sagi Grimberg, Keith Busch

On 10/4/21 19:23, Ming Lei wrote:
> Hello Chaitanya,
> 
> On Fri, Oct 01, 2021 at 05:56:04AM +0000, Chaitanya Kulkarni wrote:
>> On 9/30/2021 5:56 AM, Ming Lei wrote:
>>> Add two APIs for stopping and starting admin queue.
>>>
>>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>>
>>
>> This patch looks good to me, but based on the feedback I've received in
>> the past, we need to add new functions in the patch where they are
>> actually used rather than adding them in a separate patch.
> 
> The two added APIs are exported via EXPORT_SYMBOL_GPL(), so they won't
> cause any build warning. I have seen lots of such practice too.
> 

The comment was not related to any build error or warning.

> It is easier to review this way, since the 1st patch focuses on the API
> implementation and the 2nd patch focuses on using the API, especially as
> there are lots of users in patch 2.
> 

I am aware of that; I was just sharing the comments I got in the past.

> But if you really don't like this way, I am fine with merging the two,
> since merging is always easier than splitting. :-)
> 

It will be good if we can keep the consistency ... nothing else.

> 
> Thanks,
> Ming
> 


* Re: [PATCH V2 1/5] nvme: add APIs for stopping/starting admin queue
From: Ming Lei @ 2021-10-05  8:04 UTC
  To: Chaitanya Kulkarni
  Cc: Jens Axboe, Christoph Hellwig, linux-block, linux-nvme,
	Sagi Grimberg, Keith Busch

On Tue, Oct 05, 2021 at 03:38:18AM +0000, Chaitanya Kulkarni wrote:
> On 10/4/21 19:23, Ming Lei wrote:
> > Hello Chaitanya,
> > 
> > On Fri, Oct 01, 2021 at 05:56:04AM +0000, Chaitanya Kulkarni wrote:
> >> On 9/30/2021 5:56 AM, Ming Lei wrote:
> >>> Add two APIs for stopping and starting admin queue.
> >>>
> >>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> >>
> >>
> >> This patch looks good to me, but based on the feedback I've received
> >> in the past, we need to add new functions in the patch where they are
> >> actually used rather than adding them in a separate patch.
> > 
> > The two added APIs are exported via EXPORT_SYMBOL_GPL(), so they won't
> > cause any build warning. I have seen lots of such practice too.
> > 
> 
> The comment was not related to any build error or warning.
> 
> > It is easier to review this way, since the 1st patch focuses on the API
> > implementation and the 2nd patch focuses on using the API, especially
> > as there are lots of users in patch 2.
> > 
> 
> I am aware of that; I was just sharing the comments I got in the past.
> 
> > But if you really don't like this way, I am fine with merging the two,
> > since merging is always easier than splitting. :-)
> > 
> 
> It will be good if we can keep the consistency ... nothing else.

OK, got it. The latest such nvme patch is the following one, which
introduced an API in a standalone patch.

commit dda3248e7fc306e0ce3612ae96bdd9a36e2ab04f
Author: Chao Leng <lengchao@huawei.com>
Date:   Thu Feb 4 08:55:11 2021 +0100

    nvme: introduce a nvme_host_path_error helper



Thanks 
Ming



* Re: [PATCH V2 5/5] blk-mq: support concurrent queue quiesce/unquiesce
From: yukuai (C) @ 2021-10-08  3:22 UTC
  To: Ming Lei, Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, linux-block, linux-nvme,
	Sagi Grimberg, Keith Busch

On 2021/10/05 10:31, Ming Lei wrote:
> On Thu, Sep 30, 2021 at 08:56:29AM -0700, Bart Van Assche wrote:
>> On 9/30/21 5:56 AM, Ming Lei wrote:
>>> Turns out that blk_mq_freeze_queue() isn't stronger[1] than
>>> blk_mq_quiesce_queue(), because dispatch may still be in progress after
>>> the queue is frozen; in several cases, such as switching the io
>>> scheduler or updating nr_requests & wbt latency, we still need to
>>> quiesce the queue as a supplement to freezing it.
>>
>> Is there agreement about this? If not, how about leaving out the above from the
>> patch description?
> 
> Yeah, actually the code has been merged, please see the related
> functions: elevator_switch(), queue_wb_lat_store() and
> blk_mq_update_nr_requests().
> 
>>
>>> As we need to extend the uses of blk_mq_quiesce_queue(), it is
>>> inevitable that we need to support nested quiesce, especially since we
>>> can't let unquiesce happen while there is a quiesce originated from
>>> another context.
>>>
>>> This patch introduces q->quiesce_depth to deal with concurrent quiesce,
>>> and we only unquiesce the queue when it is the last/outer-most one of
>>> all contexts.
>>>
>>> One kernel panic issue has been reported[2] when running a stress test
>>> on dm-mpath's updating nr_requests and suspending the queue, and a
>>> similar issue should exist in almost all drivers which use
>>> quiesce/unquiesce.
>>>
>>> [1] https://marc.info/?l=linux-block&m=150993988115872&w=2
>>> [2] https://listman.redhat.com/archives/dm-devel/2021-September/msg00189.html
>>
>> Please share the call stack of the kernel oops fixed by [2] since that
>> call stack is not in the patch description.
> 
> OK, it is something like the following:
> 
> [  145.453672] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014
> [  145.454104] RIP: 0010:dm_softirq_done+0x46/0x220 [dm_mod]
> [  145.454536] Code: 85 ed 0f 84 40 01 00 00 44 0f b6 b7 70 01 00 00 4c 8b a5 18 01 00 00 45 89 f5 f6 47 1d 04 75 57 49 8b 7c 24 08 48 85 ff 74 4d <48> 8b 47 08 48 8b 40 58 48 85 c0 74 40 49 8d 4c 24 50 44 89 f2 48
> [  145.455423] RSP: 0000:ffffa88600003ef8 EFLAGS: 00010282
> [  145.455865] RAX: ffffffffc03fbd10 RBX: ffff979144c00010 RCX: dead000000000200
> [  145.456321] RDX: ffffa88600003f30 RSI: ffff979144c00068 RDI: ffffa88600d01040
> [  145.456764] RBP: ffff979150eb7990 R08: ffff9791bbc27de0 R09: 0000000000000100
> [  145.457205] R10: 0000000000000068 R11: 000000000000004c R12: ffff979144c00138
> [  145.457647] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000010
> [  145.458080] FS:  00007f57e5d13180(0000) GS:ffff9791bbc00000(0000) knlGS:0000000000000000
> [  145.458516] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  145.458945] CR2: ffffa88600d01048 CR3: 0000000106cf8003 CR4: 0000000000370ef0
> [  145.459382] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  145.459815] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  145.460250] Call Trace:
> [  145.460779]  <IRQ>
> [  145.461453]  blk_done_softirq+0xa1/0xd0
> [  145.462138]  __do_softirq+0xd7/0x2d6
> [  145.462814]  irq_exit+0xf7/0x100
> [  145.463480]  do_IRQ+0x7f/0xd0
> [  145.464131]  common_interrupt+0xf/0xf
> [  145.464797]  </IRQ>

Hi, our test can reproduce this problem:

[  139.158093] BUG: kernel NULL pointer dereference, address: 0000000000000008
[  139.160285] #PF: supervisor read access in kernel mode
[  139.161905] #PF: error_code(0x0000) - not-present page
[  139.163513] PGD 172745067 P4D 172745067 PUD 17fa88067 PMD 0
[  139.164506] Oops: 0000 [#1] PREEMPT SMP
[  139.165034] CPU: 17 PID: 1083 Comm: nbd-client Not tainted 5.15.0-rc4-next-20211007-dirty #94
[  139.166179] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-p4
[  139.167962] RIP: 0010:kyber_has_work+0x31/0xb0
[  139.168571] Code: 41 bd 48 00 00 00 41 54 45 31 e4 55 53 48 8b 9f b0 00 00 00 48 8d 6b 08 49 63 c4 d
[  139.171039] RSP: 0018:ffffc90000f479c8 EFLAGS: 00010246
[  139.171740] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff888176218f40
[  139.172680] RDX: ffffffffffffffff RSI: ffffc90000f479f4 RDI: ffff888175310000
[  139.173611] RBP: 0000000000000008 R08: 0000000000000000 R09: ffff88882fa6c0a8
[  139.174541] R10: 000000000000030e R11: ffff88817fbcfa10 R12: 0000000000000000
[  139.175482] R13: 0000000000000048 R14: ffffffff99b7e340 R15: ffff8881783edc00
[  139.176402] FS:  00007fa8e62e4b40(0000) GS:ffff88882fa40000(0000) knlGS:0000000000000000
[  139.177434] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  139.178190] CR2: 0000000000000008 CR3: 00000001796ac000 CR4: 00000000000006e0
[  139.179127] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  139.180066] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  139.181000] Call Trace:
[  139.182338]  <TASK>
[  139.182638]  blk_mq_run_hw_queue+0x135/0x180
[  139.183207]  blk_mq_run_hw_queues+0x80/0x150
[  139.183766]  blk_mq_unquiesce_queue+0x33/0x40
[  139.184329]  nbd_clear_que+0x52/0xb0 [nbd]
[  139.184869]  nbd_disconnect_and_put+0x6b/0xe0 [nbd]
[  139.185499]  nbd_genl_disconnect+0x125/0x290 [nbd]
[  139.186123]  genl_family_rcv_msg_doit.isra.0+0x102/0x1b0
[  139.186821]  genl_rcv_msg+0xfc/0x2b0
[  139.187300]  ? nbd_ioctl+0x500/0x500 [nbd]
[  139.187847]  ? genl_family_rcv_msg_doit.isra.0+0x1b0/0x1b0
[  139.188564]  netlink_rcv_skb+0x62/0x180
[  139.189075]  genl_rcv+0x34/0x60
[  139.189490]  netlink_unicast+0x26d/0x590
[  139.190006]  netlink_sendmsg+0x3a1/0x6d0
[  139.190513]  ? netlink_rcv_skb+0x180/0x180
[  139.191039]  ____sys_sendmsg+0x1da/0x320
[  139.191556]  ? ____sys_recvmsg+0x130/0x220
[  139.192095]  ___sys_sendmsg+0x8e/0xf0
[  139.192591]  ? ___sys_recvmsg+0xa2/0xf0
[  139.193102]  ? __wake_up_common_lock+0xac/0xe0
[  139.193699]  __sys_sendmsg+0x6d/0xe0
[  139.194167]  __x64_sys_sendmsg+0x23/0x30
[  139.194675]  do_syscall_64+0x35/0x80
[  139.195145]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  139.195806] RIP: 0033:0x7fa8e59ebb87
[  139.196281] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 8
[  139.198715] RSP: 002b:00007ffd50573c38 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[  139.199710] RAX: ffffffffffffffda RBX: 0000000001318120 RCX: 00007fa8e59ebb87
[  139.200643] RDX: 0000000000000000 RSI: 00007ffd50573c70 RDI: 0000000000000003
[  139.201583] RBP: 00000000013181f0 R08: 0000000000000014 R09: 0000000000000002
[  139.202512] R10: 0000000000000006 R11: 0000000000000246 R12: 0000000001318030
[  139.203434] R13: 00007ffd50573c70 R14: 0000000000000001 R15: 00000000ffffffff
[  139.204364]  </TASK>
[  139.204652] Modules linked in: nbd
[  139.205101] CR2: 0000000000000008
[  139.205580] ---[ end trace 0248c57101a02431 ]---

Hope the call stack can be helpful.

Thanks,
Kuai



* Re: [PATCH V2 5/5] blk-mq: support concurrent queue quiesce/unquiesce
From: Ming Lei @ 2021-10-08  5:10 UTC
  To: yukuai (C)
  Cc: Bart Van Assche, Jens Axboe, Christoph Hellwig, linux-block,
	linux-nvme, Sagi Grimberg, Keith Busch

Hello yukuai,

On Fri, Oct 08, 2021 at 11:22:38AM +0800, yukuai (C) wrote:
> On 2021/10/05 10:31, Ming Lei wrote:
> > On Thu, Sep 30, 2021 at 08:56:29AM -0700, Bart Van Assche wrote:
> > > On 9/30/21 5:56 AM, Ming Lei wrote:
> > > > Turns out that blk_mq_freeze_queue() isn't stronger[1] than
> > > > blk_mq_quiesce_queue(), because dispatch may still be in progress
> > > > after the queue is frozen; in several cases, such as switching the
> > > > io scheduler or updating nr_requests & wbt latency, we still need
> > > > to quiesce the queue as a supplement to freezing it.
> > > 
> > > Is there agreement about this? If not, how about leaving out the above from the
> > > patch description?
> > 
> > Yeah, actually the code has been merged, please see the related
> > functions: elevator_switch(), queue_wb_lat_store() and
> > blk_mq_update_nr_requests().
> > 
> > > 
> > > > As we need to extend the uses of blk_mq_quiesce_queue(), it is
> > > > inevitable that we need to support nested quiesce, especially since
> > > > we can't let unquiesce happen while there is a quiesce originated
> > > > from another context.
> > > > 
> > > > This patch introduces q->quiesce_depth to deal with concurrent
> > > > quiesce, and we only unquiesce the queue when it is the
> > > > last/outer-most one of all contexts.
> > > > 
> > > > One kernel panic issue has been reported[2] when running a stress
> > > > test on dm-mpath's updating nr_requests and suspending the queue,
> > > > and a similar issue should exist in almost all drivers which use
> > > > quiesce/unquiesce.
> > > > 
> > > > [1] https://marc.info/?l=linux-block&m=150993988115872&w=2
> > > > [2] https://listman.redhat.com/archives/dm-devel/2021-September/msg00189.html
> > > 
> > > Please share the call stack of the kernel oops fixed by [2] since that
> > > call stack is not in the patch description.
> > 
> > OK, it is something like the following:
> > 
> > [  145.453672] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014
> > [  145.454104] RIP: 0010:dm_softirq_done+0x46/0x220 [dm_mod]
> > [  145.454536] Code: 85 ed 0f 84 40 01 00 00 44 0f b6 b7 70 01 00 00 4c 8b a5 18 01 00 00 45 89 f5 f6 47 1d 04 75 57 49 8b 7c 24 08 48 85 ff 74 4d <48> 8b 47 08 48 8b 40 58 48 85 c0 74 40 49 8d 4c 24 50 44 89 f2 48
> > [  145.455423] RSP: 0000:ffffa88600003ef8 EFLAGS: 00010282
> > [  145.455865] RAX: ffffffffc03fbd10 RBX: ffff979144c00010 RCX: dead000000000200
> > [  145.456321] RDX: ffffa88600003f30 RSI: ffff979144c00068 RDI: ffffa88600d01040
> > [  145.456764] RBP: ffff979150eb7990 R08: ffff9791bbc27de0 R09: 0000000000000100
> > [  145.457205] R10: 0000000000000068 R11: 000000000000004c R12: ffff979144c00138
> > [  145.457647] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000010
> > [  145.458080] FS:  00007f57e5d13180(0000) GS:ffff9791bbc00000(0000) knlGS:0000000000000000
> > [  145.458516] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  145.458945] CR2: ffffa88600d01048 CR3: 0000000106cf8003 CR4: 0000000000370ef0
> > [  145.459382] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [  145.459815] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [  145.460250] Call Trace:
> > [  145.460779]  <IRQ>
> > [  145.461453]  blk_done_softirq+0xa1/0xd0
> > [  145.462138]  __do_softirq+0xd7/0x2d6
> > [  145.462814]  irq_exit+0xf7/0x100
> > [  145.463480]  do_IRQ+0x7f/0xd0
> > [  145.464131]  common_interrupt+0xf/0xf
> > [  145.464797]  </IRQ>
> 
> Hi, our test can reproduce this problem:
> 
> [  139.158093] BUG: kernel NULL pointer dereference, address:
> 0000000000000008
> [  139.160285] #PF: supervisor read access in kernel mode
> [  139.161905] #PF: error_code(0x0000) - not-present page
> [  139.163513] PGD 172745067 P4D 172745067 PUD 17fa88067 PMD 0
> [  139.164506] Oops: 0000 [#1] PREEMPT SMP
> [  139.165034] CPU: 17 PID: 1083 Comm: nbd-client Not tainted
> 5.15.0-rc4-next-20211007-dirty #94
> [  139.166179] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> ?-20190727_073836-buildvm-p4
> [  139.167962] RIP: 0010:kyber_has_work+0x31/0xb0
> [  139.168571] Code: 41 bd 48 00 00 00 41 54 45 31 e4 55 53 48 8b 9f b0 00
> 00 00 48 8d 6b 08 49 63 c4 d
> [  139.171039] RSP: 0018:ffffc90000f479c8 EFLAGS: 00010246
> [  139.171740] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> ffff888176218f40
> [  139.172680] RDX: ffffffffffffffff RSI: ffffc90000f479f4 RDI:
> ffff888175310000
> [  139.173611] RBP: 0000000000000008 R08: 0000000000000000 R09:
> ffff88882fa6c0a8
> [  139.174541] R10: 000000000000030e R11: ffff88817fbcfa10 R12:
> 0000000000000000
> [  139.175482] R13: 0000000000000048 R14: ffffffff99b7e340 R15:
> ffff8881783edc00
> [  139.176402] FS:  00007fa8e62e4b40(0000) GS:ffff88882fa40000(0000)
> knlGS:0000000000000000
> [  139.177434] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  139.178190] CR2: 0000000000000008 CR3: 00000001796ac000 CR4:
> 00000000000006e0
> [  139.179127] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [  139.180066] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [  139.181000] Call Trace:
> [  139.182338]  <TASK>
> [  139.182638]  blk_mq_run_hw_queue+0x135/0x180
> [  139.183207]  blk_mq_run_hw_queues+0x80/0x150
> [  139.183766]  blk_mq_unquiesce_queue+0x33/0x40
> [  139.184329]  nbd_clear_que+0x52/0xb0 [nbd]
> [  139.184869]  nbd_disconnect_and_put+0x6b/0xe0 [nbd]
> [  139.185499]  nbd_genl_disconnect+0x125/0x290 [nbd]
> [  139.186123]  genl_family_rcv_msg_doit.isra.0+0x102/0x1b0
> [  139.186821]  genl_rcv_msg+0xfc/0x2b0
> [  139.187300]  ? nbd_ioctl+0x500/0x500 [nbd]
> [  139.187847]  ? genl_family_rcv_msg_doit.isra.0+0x1b0/0x1b0
> [  139.188564]  netlink_rcv_skb+0x62/0x180
> [  139.189075]  genl_rcv+0x34/0x60
> [  139.189490]  netlink_unicast+0x26d/0x590
> [  139.190006]  netlink_sendmsg+0x3a1/0x6d0
> [  139.190513]  ? netlink_rcv_skb+0x180/0x180
> [  139.191039]  ____sys_sendmsg+0x1da/0x320
> [  139.191556]  ? ____sys_recvmsg+0x130/0x220
> [  139.192095]  ___sys_sendmsg+0x8e/0xf0
> [  139.192591]  ? ___sys_recvmsg+0xa2/0xf0
> [  139.193102]  ? __wake_up_common_lock+0xac/0xe0
> [  139.193699]  __sys_sendmsg+0x6d/0xe0
> [  139.194167]  __x64_sys_sendmsg+0x23/0x30
> [  139.194675]  do_syscall_64+0x35/0x80
> [  139.195145]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [  139.195806] RIP: 0033:0x7fa8e59ebb87
> [  139.196281] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00
> 00 8b 05 6a 2b 2c 00 48 63 8
> [  139.198715] RSP: 002b:00007ffd50573c38 EFLAGS: 00000246 ORIG_RAX:
> 000000000000002e
> [  139.199710] RAX: ffffffffffffffda RBX: 0000000001318120 RCX:
> 00007fa8e59ebb87
> [  139.200643] RDX: 0000000000000000 RSI: 00007ffd50573c70 RDI:
> 0000000000000003
> [  139.201583] RBP: 00000000013181f0 R08: 0000000000000014 R09:
> 0000000000000002
> [  139.202512] R10: 0000000000000006 R11: 0000000000000246 R12:
> 0000000001318030
> [  139.203434] R13: 00007ffd50573c70 R14: 0000000000000001 R15:
> 00000000ffffffff
> [  139.204364]  </TASK>
> [  139.204652] Modules linked in: nbd
> [  139.205101] CR2: 0000000000000008
> [  139.205580] ---[ end trace 0248c57101a02431 ]---
> 
> Hope the call stack can be helpful.

Can you share the following info?

1) Is the above panic triggered with this quiesce patchset or without
it?

2) What is your test like? For example, what are the tasks running?


Thanks,
Ming



* Re: [PATCH V2 5/5] blk-mq: support concurrent queue quiesce/unquiesce
From: yukuai (C) @ 2021-10-08  6:22 UTC
  To: Ming Lei
  Cc: Bart Van Assche, Jens Axboe, Christoph Hellwig, linux-block,
	linux-nvme, Sagi Grimberg, Keith Busch

On 2021/10/08 13:10, Ming Lei wrote:
> Hello yukuai,
> 
> On Fri, Oct 08, 2021 at 11:22:38AM +0800, yukuai (C) wrote:
>> On 2021/10/05 10:31, Ming Lei wrote:
>>> On Thu, Sep 30, 2021 at 08:56:29AM -0700, Bart Van Assche wrote:
>>>> On 9/30/21 5:56 AM, Ming Lei wrote:
>>>>> Turns out that blk_mq_freeze_queue() isn't stronger[1] than
>>>>> blk_mq_quiesce_queue(), because dispatch may still be in progress
>>>>> after the queue is frozen; in several cases, such as switching the
>>>>> io scheduler or updating nr_requests & wbt latency, we still need to
>>>>> quiesce the queue as a supplement to freezing it.
>>>>
>>>> Is there agreement about this? If not, how about leaving out the above from the
>>>> patch description?
>>>
>>> Yeah, actually the code has been merged, please see the related
>>> functions: elevator_switch(), queue_wb_lat_store() and
>>> blk_mq_update_nr_requests().
>>>
>>>>
>>>>> As we need to extend the uses of blk_mq_quiesce_queue(), it is
>>>>> inevitable that we need to support nested quiesce, especially since
>>>>> we can't let unquiesce happen while there is a quiesce originated
>>>>> from another context.
>>>>>
>>>>> This patch introduces q->quiesce_depth to deal with concurrent
>>>>> quiesce, and we only unquiesce the queue when it is the
>>>>> last/outer-most one of all contexts.
>>>>>
>>>>> One kernel panic issue has been reported[2] when running a stress
>>>>> test on dm-mpath's updating nr_requests and suspending the queue,
>>>>> and a similar issue should exist in almost all drivers which use
>>>>> quiesce/unquiesce.
>>>>>
>>>>> [1] https://marc.info/?l=linux-block&m=150993988115872&w=2
>>>>> [2] https://listman.redhat.com/archives/dm-devel/2021-September/msg00189.html
>>>>
>>>> Please share the call stack of the kernel oops fixed by [2] since that
>>>> call stack is not in the patch description.
>>>
>>> OK, it is something like the following:
>>>
>>> [  145.453672] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014
>>> [  145.454104] RIP: 0010:dm_softirq_done+0x46/0x220 [dm_mod]
>>> [  145.454536] Code: 85 ed 0f 84 40 01 00 00 44 0f b6 b7 70 01 00 00 4c 8b a5 18 01 00 00 45 89 f5 f6 47 1d 04 75 57 49 8b 7c 24 08 48 85 ff 74 4d <48> 8b 47 08 48 8b 40 58 48 85 c0 74 40 49 8d 4c 24 50 44 89 f2 48
>>> [  145.455423] RSP: 0000:ffffa88600003ef8 EFLAGS: 00010282
>>> [  145.455865] RAX: ffffffffc03fbd10 RBX: ffff979144c00010 RCX: dead000000000200
>>> [  145.456321] RDX: ffffa88600003f30 RSI: ffff979144c00068 RDI: ffffa88600d01040
>>> [  145.456764] RBP: ffff979150eb7990 R08: ffff9791bbc27de0 R09: 0000000000000100
>>> [  145.457205] R10: 0000000000000068 R11: 000000000000004c R12: ffff979144c00138
>>> [  145.457647] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000010
>>> [  145.458080] FS:  00007f57e5d13180(0000) GS:ffff9791bbc00000(0000) knlGS:0000000000000000
>>> [  145.458516] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [  145.458945] CR2: ffffa88600d01048 CR3: 0000000106cf8003 CR4: 0000000000370ef0
>>> [  145.459382] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [  145.459815] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> [  145.460250] Call Trace:
>>> [  145.460779]  <IRQ>
>>> [  145.461453]  blk_done_softirq+0xa1/0xd0
>>> [  145.462138]  __do_softirq+0xd7/0x2d6
>>> [  145.462814]  irq_exit+0xf7/0x100
>>> [  145.463480]  do_IRQ+0x7f/0xd0
>>> [  145.464131]  common_interrupt+0xf/0xf
>>> [  145.464797]  </IRQ>
>>
>> Hi, our test can reproduce this problem:
>>
>> [  139.158093] BUG: kernel NULL pointer dereference, address:
>> 0000000000000008
>> [  139.160285] #PF: supervisor read access in kernel mode
>> [  139.161905] #PF: error_code(0x0000) - not-present page
>> [  139.163513] PGD 172745067 P4D 172745067 PUD 17fa88067 PMD 0
>> [  139.164506] Oops: 0000 [#1] PREEMPT SMP
>> [  139.165034] CPU: 17 PID: 1083 Comm: nbd-client Not tainted
>> 5.15.0-rc4-next-20211007-dirty #94
>> [  139.166179] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>> ?-20190727_073836-buildvm-p4
>> [  139.167962] RIP: 0010:kyber_has_work+0x31/0xb0
>> [  139.168571] Code: 41 bd 48 00 00 00 41 54 45 31 e4 55 53 48 8b 9f b0 00
>> 00 00 48 8d 6b 08 49 63 c4 d
>> [  139.171039] RSP: 0018:ffffc90000f479c8 EFLAGS: 00010246
>> [  139.171740] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
>> ffff888176218f40
>> [  139.172680] RDX: ffffffffffffffff RSI: ffffc90000f479f4 RDI:
>> ffff888175310000
>> [  139.173611] RBP: 0000000000000008 R08: 0000000000000000 R09:
>> ffff88882fa6c0a8
>> [  139.174541] R10: 000000000000030e R11: ffff88817fbcfa10 R12:
>> 0000000000000000
>> [  139.175482] R13: 0000000000000048 R14: ffffffff99b7e340 R15:
>> ffff8881783edc00
>> [  139.176402] FS:  00007fa8e62e4b40(0000) GS:ffff88882fa40000(0000)
>> knlGS:0000000000000000
>> [  139.177434] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  139.178190] CR2: 0000000000000008 CR3: 00000001796ac000 CR4:
>> 00000000000006e0
>> [  139.179127] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>> 0000000000000000
>> [  139.180066] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
>> 0000000000000400
>> [  139.181000] Call Trace:
>> [  139.182338]  <TASK>
>> [  139.182638]  blk_mq_run_hw_queue+0x135/0x180
>> [  139.183207]  blk_mq_run_hw_queues+0x80/0x150
>> [  139.183766]  blk_mq_unquiesce_queue+0x33/0x40
>> [  139.184329]  nbd_clear_que+0x52/0xb0 [nbd]
>> [  139.184869]  nbd_disconnect_and_put+0x6b/0xe0 [nbd]
>> [  139.185499]  nbd_genl_disconnect+0x125/0x290 [nbd]
>> [  139.186123]  genl_family_rcv_msg_doit.isra.0+0x102/0x1b0
>> [  139.186821]  genl_rcv_msg+0xfc/0x2b0
>> [  139.187300]  ? nbd_ioctl+0x500/0x500 [nbd]
>> [  139.187847]  ? genl_family_rcv_msg_doit.isra.0+0x1b0/0x1b0
>> [  139.188564]  netlink_rcv_skb+0x62/0x180
>> [  139.189075]  genl_rcv+0x34/0x60
>> [  139.189490]  netlink_unicast+0x26d/0x590
>> [  139.190006]  netlink_sendmsg+0x3a1/0x6d0
>> [  139.190513]  ? netlink_rcv_skb+0x180/0x180
>> [  139.191039]  ____sys_sendmsg+0x1da/0x320
>> [  139.191556]  ? ____sys_recvmsg+0x130/0x220
>> [  139.192095]  ___sys_sendmsg+0x8e/0xf0
>> [  139.192591]  ? ___sys_recvmsg+0xa2/0xf0
>> [  139.193102]  ? __wake_up_common_lock+0xac/0xe0
>> [  139.193699]  __sys_sendmsg+0x6d/0xe0
>> [  139.194167]  __x64_sys_sendmsg+0x23/0x30
>> [  139.194675]  do_syscall_64+0x35/0x80
>> [  139.195145]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>> [  139.195806] RIP: 0033:0x7fa8e59ebb87
>> [  139.196281] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00
>> 00 8b 05 6a 2b 2c 00 48 63 8
>> [  139.198715] RSP: 002b:00007ffd50573c38 EFLAGS: 00000246 ORIG_RAX:
>> 000000000000002e
>> [  139.199710] RAX: ffffffffffffffda RBX: 0000000001318120 RCX:
>> 00007fa8e59ebb87
>> [  139.200643] RDX: 0000000000000000 RSI: 00007ffd50573c70 RDI:
>> 0000000000000003
>> [  139.201583] RBP: 00000000013181f0 R08: 0000000000000014 R09:
>> 0000000000000002
>> [  139.202512] R10: 0000000000000006 R11: 0000000000000246 R12:
>> 0000000001318030
>> [  139.203434] R13: 00007ffd50573c70 R14: 0000000000000001 R15:
>> 00000000ffffffff
>> [  139.204364]  </TASK>
>> [  139.204652] Modules linked in: nbd
>> [  139.205101] CR2: 0000000000000008
>> [  139.205580] ---[ end trace 0248c57101a02431 ]---
>>
>> Hope the call stack can be helpful.
> 
> Can you share the following info?
> 
> 1) is the above panic triggered with this quiesce patchset or without
> it?

Without it, of course.
> 
> 2) what is your test like? Such as, what are the tasks running?
> 
The test runs two threads concurrently: the first thread sets up an nbd
device and then shuts down the connection (nbd-client -d); the second
thread switches the elevator.
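
To see why that combination crashes: before this series, unquiesce cleared
the queue state unconditionally, roughly like this (a simplified sketch of
the old behaviour, not the exact source):

	void blk_mq_unquiesce_queue(struct request_queue *q)
	{
		blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);

		/* dispatch requests which were inserted while quiesced */
		blk_mq_run_hw_queues(q, true);
	}

so the nbd disconnect path can run the hardware queues while the elevator
switch still expects the queue to stay quiesced, which matches the
kyber_has_work() dereference of scheduler data in the stack above.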

By the way, the same problem can be reproduced with nvme: the test
injects an io timeout error and also switches the elevator. The call
stack below is from v4.19:

[89429.539045] nvme nvme0: I/O 378 QID 1 timeout, aborting
[89429.539850] nvme nvme0: Abort status: 0x4001
[89459.746631] nvme nvme0: I/O 378 QID 1 timeout, reset controller
[89470.197004] kasan: CONFIG_KASAN_INLINE enabled
[89470.197765] kasan: GPF could be caused by NULL-ptr deref or user 
memory access
[89470.198896] general protection fault: 0000 [#1] SMP KASAN
[89470.199706] CPU: 2 PID: 17722 Comm: kworker/u8:3 Not tainted 
4.19.195-01446-gb2a62977d3e4 #1
[89470.200935] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.10.2-1ubuntu1 04/01/2014
[89470.202247] Workqueue: nvme-reset-wq nvme_reset_work
[89470.203009] RIP: 0010:kyber_has_work+0x60/0xf0
[89470.203665] Code: 8b a3 18 01 00 00 49 bd 00 00 00 00 00 fc ff df 49 
8d 5c 24 08 49 8d 6c 24 48 49 83 c4 38 e8 27 1c 1a ff 48 89 d8 48 c1 e8 
03 <42> 80 3c 28 00 75 68 48 3b 1b 75 50 e8 0f 1c 1a ff 48 8d 7b 08 48
[89470.206331] RSP: 0018:ffff888004c379c8 EFLAGS: 00010202
[89470.207096] RAX: 0000000000000001 RBX: 0000000000000008 RCX: 
ffffffffb5238989
[89470.208138] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 
ffff888101ff0be0
[89470.209206] RBP: 0000000000000048 R08: ffffed1020653380 R09: 
ffffed1020653380
[89470.210250] R10: 0000000000000001 R11: ffffed102065337f R12: 
0000000000000038
[89470.211276] R13: dffffc0000000000 R14: 0000000000000000 R15: 
ffff888101ff0be8
[89470.220344] FS:  0000000000000000(0000) GS:ffff88810ed00000(0000) 
knlGS:0000000000000000
[89470.221184] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[89470.221775] CR2: 00007f70134c2770 CR3: 00000000b6e0e000 CR4: 
00000000000006e0
[89470.222521] Call Trace:
[89470.222811]  ? kyber_read_lat_show+0x80/0x80
[89470.223223]  blk_mq_run_hw_queue+0x26b/0x2f0
[89470.223631]  blk_mq_run_hw_queues+0xe8/0x160
[89470.224064]  nvme_start_queues+0x6c/0xb0
[89470.224460]  nvme_start_ctrl+0x155/0x300
[89470.224864]  ? down_read+0xf/0x80
[89470.225186]  ? nvme_set_queue_count+0x1f0/0x1f0
[89470.225803]  ? nvme_change_ctrl_state+0x83/0x2e0
[89470.226285]  nvme_reset_work+0x2fd2/0x4ee0
[89470.226699]  ? rb_erase_cached+0x8f8/0x19d0
[89470.227103]  ? nvme_alloc_queue+0xbf0/0xbf0
[89470.227758]  ? set_next_entity+0x248/0x730
[89470.228174]  ? pick_next_entity+0x199/0x400
[89470.228592]  ? put_prev_entity+0x4f/0x350
[89470.228976]  ? pick_next_task_fair+0x7c9/0x1500
[89470.229424]  ? account_entity_dequeue+0x30b/0x580
[89470.229874]  ? dequeue_entity+0x29b/0xf70
[89470.230352]  ? __switch_to+0x17b/0xb60
[89470.230936]  ? compat_start_thread+0x80/0x80
[89470.231447]  ? dequeue_task_fair+0xe4/0x1d60
[89470.231871]  ? tty_ldisc_receive_buf+0xaa/0x170
[89470.232501]  ? read_word_at_a_time+0xe/0x20
[89470.233195]  ? strscpy+0x96/0x300
[89470.233761]  process_one_work+0x706/0x1270
[89470.234438]  worker_thread+0x91/0xc80
[89470.235038]  ? process_one_work+0x1270/0x1270
[89470.235749]  kthread+0x305/0x3c0
[89470.236292]  ? kthread_park+0x1c0/0x1c0
[89470.236944]  ret_from_fork+0x1f/0x30
[89470.237543] Modules linked in:
[89470.238179] ---[ end trace 4ed415cdb7eafdd4 ]---

Thanks,
Kuai

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH V2 5/5] blk-mq: support concurrent queue quiesce/unquiesce
  2021-10-08  6:22             ` yukuai (C)
@ 2021-10-08  6:35               ` Ming Lei
  -1 siblings, 0 replies; 40+ messages in thread
From: Ming Lei @ 2021-10-08  6:35 UTC (permalink / raw)
  To: yukuai (C)
  Cc: Bart Van Assche, Jens Axboe, Christoph Hellwig, linux-block,
	linux-nvme, Sagi Grimberg, Keith Busch

On Fri, Oct 08, 2021 at 02:22:39PM +0800, yukuai (C) wrote:
> On 2021/10/08 13:10, Ming Lei wrote:
> > Hello yukuai,
> > 
> > On Fri, Oct 08, 2021 at 11:22:38AM +0800, yukuai (C) wrote:
> > > On 2021/10/05 10:31, Ming Lei wrote:
> > > > On Thu, Sep 30, 2021 at 08:56:29AM -0700, Bart Van Assche wrote:
> > > > > On 9/30/21 5:56 AM, Ming Lei wrote:
> > > > > > Turns out that blk_mq_freeze_queue() isn't stronger[1] than
> > > > > > blk_mq_quiesce_queue(), because dispatch may still be in progress
> > > > > > after the queue is frozen; in several cases, such as switching the io
> > > > > > scheduler or updating nr_requests & wbt latency, we still need to
> > > > > > quiesce the queue as a supplement to freezing it.
> > > > > 
> > > > > Is there agreement about this? If not, how about leaving out the above from the
> > > > > patch description?
> > > > 
> > > > Yeah, actually the code has been merged, please see the related
> > > > functions: elevator_switch(), queue_wb_lat_store() and
> > > > blk_mq_update_nr_requests().
> > > > 
> > > > > 
> > > > > > As we need to extend the uses of blk_mq_quiesce_queue(), it is
> > > > > > inevitable that we need to support nested quiesce; in particular, we
> > > > > > can't let unquiesce happen while there is a quiesce originated from
> > > > > > another context.
> > > > > > 
> > > > > > This patch introduces q->mq_quiesce_depth to handle concurrent
> > > > > > quiesce, and we only unquiesce the queue when it is the
> > > > > > last/outer-most one of all contexts.
> > > > > > 
> > > > > > One kernel panic issue has been reported[2] when running a stress test
> > > > > > on dm-mpath that updates nr_requests while suspending the queue, and a
> > > > > > similar issue should exist in almost all drivers which use
> > > > > > quiesce/unquiesce.
> > > > > > 
> > > > > > [1] https://marc.info/?l=linux-block&m=150993988115872&w=2
> > > > > > [2] https://listman.redhat.com/archives/dm-devel/2021-September/msg00189.html
> > > > > 
> > > > > Please share the call stack of the kernel oops fixed by [2] since that
> > > > > call stack is not in the patch description.
> > > > 
> > > > OK, it is something like the following:
> > > > 
> > > > [  145.453672] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014
> > > > [  145.454104] RIP: 0010:dm_softirq_done+0x46/0x220 [dm_mod]
> > > > [  145.454536] Code: 85 ed 0f 84 40 01 00 00 44 0f b6 b7 70 01 00 00 4c 8b a5 18 01 00 00 45 89 f5 f6 47 1d 04 75 57 49 8b 7c 24 08 48 85 ff 74 4d <48> 8b 47 08 48 8b 40 58 48 85 c0 74 40 49 8d 4c 24 50 44 89 f2 48
> > > > [  145.455423] RSP: 0000:ffffa88600003ef8 EFLAGS: 00010282
> > > > [  145.455865] RAX: ffffffffc03fbd10 RBX: ffff979144c00010 RCX: dead000000000200
> > > > [  145.456321] RDX: ffffa88600003f30 RSI: ffff979144c00068 RDI: ffffa88600d01040
> > > > [  145.456764] RBP: ffff979150eb7990 R08: ffff9791bbc27de0 R09: 0000000000000100
> > > > [  145.457205] R10: 0000000000000068 R11: 000000000000004c R12: ffff979144c00138
> > > > [  145.457647] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000010
> > > > [  145.458080] FS:  00007f57e5d13180(0000) GS:ffff9791bbc00000(0000) knlGS:0000000000000000
> > > > [  145.458516] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [  145.458945] CR2: ffffa88600d01048 CR3: 0000000106cf8003 CR4: 0000000000370ef0
> > > > [  145.459382] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > [  145.459815] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > [  145.460250] Call Trace:
> > > > [  145.460779]  <IRQ>
> > > > [  145.461453]  blk_done_softirq+0xa1/0xd0
> > > > [  145.462138]  __do_softirq+0xd7/0x2d6
> > > > [  145.462814]  irq_exit+0xf7/0x100
> > > > [  145.463480]  do_IRQ+0x7f/0xd0
> > > > [  145.464131]  common_interrupt+0xf/0xf
> > > > [  145.464797]  </IRQ>
> > > 
> > > Hi, our test can reproduce this problem:
> > > 
> > > [  139.158093] BUG: kernel NULL pointer dereference, address:
> > > 0000000000000008
> > > [  139.160285] #PF: supervisor read access in kernel mode
> > > [  139.161905] #PF: error_code(0x0000) - not-present page
> > > [  139.163513] PGD 172745067 P4D 172745067 PUD 17fa88067 PMD 0
> > > [  139.164506] Oops: 0000 [#1] PREEMPT SMP
> > > [  139.165034] CPU: 17 PID: 1083 Comm: nbd-client Not tainted
> > > 5.15.0-rc4-next-20211007-dirty #94
> > > [  139.166179] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> > > ?-20190727_073836-buildvm-p4
> > > [  139.167962] RIP: 0010:kyber_has_work+0x31/0xb0
> > > [  139.168571] Code: 41 bd 48 00 00 00 41 54 45 31 e4 55 53 48 8b 9f b0 00
> > > 00 00 48 8d 6b 08 49 63 c4 d
> > > [  139.171039] RSP: 0018:ffffc90000f479c8 EFLAGS: 00010246
> > > [  139.171740] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> > > ffff888176218f40
> > > [  139.172680] RDX: ffffffffffffffff RSI: ffffc90000f479f4 RDI:
> > > ffff888175310000
> > > [  139.173611] RBP: 0000000000000008 R08: 0000000000000000 R09:
> > > ffff88882fa6c0a8
> > > [  139.174541] R10: 000000000000030e R11: ffff88817fbcfa10 R12:
> > > 0000000000000000
> > > [  139.175482] R13: 0000000000000048 R14: ffffffff99b7e340 R15:
> > > ffff8881783edc00
> > > [  139.176402] FS:  00007fa8e62e4b40(0000) GS:ffff88882fa40000(0000)
> > > knlGS:0000000000000000
> > > [  139.177434] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [  139.178190] CR2: 0000000000000008 CR3: 00000001796ac000 CR4:
> > > 00000000000006e0
> > > [  139.179127] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > 0000000000000000
> > > [  139.180066] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > > 0000000000000400
> > > [  139.181000] Call Trace:
> > > [  139.182338]  <TASK>
> > > [  139.182638]  blk_mq_run_hw_queue+0x135/0x180
> > > [  139.183207]  blk_mq_run_hw_queues+0x80/0x150
> > > [  139.183766]  blk_mq_unquiesce_queue+0x33/0x40
> > > [  139.184329]  nbd_clear_que+0x52/0xb0 [nbd]
> > > [  139.184869]  nbd_disconnect_and_put+0x6b/0xe0 [nbd]
> > > [  139.185499]  nbd_genl_disconnect+0x125/0x290 [nbd]
> > > [  139.186123]  genl_family_rcv_msg_doit.isra.0+0x102/0x1b0
> > > [  139.186821]  genl_rcv_msg+0xfc/0x2b0
> > > [  139.187300]  ? nbd_ioctl+0x500/0x500 [nbd]
> > > [  139.187847]  ? genl_family_rcv_msg_doit.isra.0+0x1b0/0x1b0
> > > [  139.188564]  netlink_rcv_skb+0x62/0x180
> > > [  139.189075]  genl_rcv+0x34/0x60
> > > [  139.189490]  netlink_unicast+0x26d/0x590
> > > [  139.190006]  netlink_sendmsg+0x3a1/0x6d0
> > > [  139.190513]  ? netlink_rcv_skb+0x180/0x180
> > > [  139.191039]  ____sys_sendmsg+0x1da/0x320
> > > [  139.191556]  ? ____sys_recvmsg+0x130/0x220
> > > [  139.192095]  ___sys_sendmsg+0x8e/0xf0
> > > [  139.192591]  ? ___sys_recvmsg+0xa2/0xf0
> > > [  139.193102]  ? __wake_up_common_lock+0xac/0xe0
> > > [  139.193699]  __sys_sendmsg+0x6d/0xe0
> > > [  139.194167]  __x64_sys_sendmsg+0x23/0x30
> > > [  139.194675]  do_syscall_64+0x35/0x80
> > > [  139.195145]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > [  139.195806] RIP: 0033:0x7fa8e59ebb87
> > > [  139.196281] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00
> > > 00 8b 05 6a 2b 2c 00 48 63 8
> > > [  139.198715] RSP: 002b:00007ffd50573c38 EFLAGS: 00000246 ORIG_RAX:
> > > 000000000000002e
> > > [  139.199710] RAX: ffffffffffffffda RBX: 0000000001318120 RCX:
> > > 00007fa8e59ebb87
> > > [  139.200643] RDX: 0000000000000000 RSI: 00007ffd50573c70 RDI:
> > > 0000000000000003
> > > [  139.201583] RBP: 00000000013181f0 R08: 0000000000000014 R09:
> > > 0000000000000002
> > > [  139.202512] R10: 0000000000000006 R11: 0000000000000246 R12:
> > > 0000000001318030
> > > [  139.203434] R13: 00007ffd50573c70 R14: 0000000000000001 R15:
> > > 00000000ffffffff
> > > [  139.204364]  </TASK>
> > > [  139.204652] Modules linked in: nbd
> > > [  139.205101] CR2: 0000000000000008
> > > [  139.205580] ---[ end trace 0248c57101a02431 ]---
> > > 
> > > Hope the call stack can be helpful.
> > 
> > Can you share the following info?
> > 
> > 1) is the above panic triggered with this quiesce patchset or without
> > it?
> 
> Without it, of course.

Got it. Any chance you could test this V2 patchset and see if it fixes
your nbd issue?



Thanks,
Ming


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH V2 5/5] blk-mq: support concurrent queue quiesce/unquiesce
  2021-10-08  6:35               ` Ming Lei
@ 2021-10-08  7:13                 ` yukuai (C)
  -1 siblings, 0 replies; 40+ messages in thread
From: yukuai (C) @ 2021-10-08  7:13 UTC (permalink / raw)
  To: Ming Lei
  Cc: Bart Van Assche, Jens Axboe, Christoph Hellwig, linux-block,
	linux-nvme, Sagi Grimberg, Keith Busch

On 2021/10/08 14:35, Ming Lei wrote:
> On Fri, Oct 08, 2021 at 02:22:39PM +0800, yukuai (C) wrote:
>> On 2021/10/08 13:10, Ming Lei wrote:
>>> Hello yukuai,
>>>
>>> On Fri, Oct 08, 2021 at 11:22:38AM +0800, yukuai (C) wrote:
>>>> On 2021/10/05 10:31, Ming Lei wrote:
>>>>> On Thu, Sep 30, 2021 at 08:56:29AM -0700, Bart Van Assche wrote:
>>>>>> On 9/30/21 5:56 AM, Ming Lei wrote:
>>>>>>> Turns out that blk_mq_freeze_queue() isn't stronger[1] than
>>>>>>> blk_mq_quiesce_queue(), because dispatch may still be in progress
>>>>>>> after the queue is frozen; in several cases, such as switching the io
>>>>>>> scheduler or updating nr_requests & wbt latency, we still need to
>>>>>>> quiesce the queue as a supplement to freezing it.
>>>>>>
>>>>>> Is there agreement about this? If not, how about leaving out the above from the
>>>>>> patch description?
>>>>>
>>>>> Yeah, actually the code has been merged, please see the related
>>>>> functions: elevator_switch(), queue_wb_lat_store() and
>>>>> blk_mq_update_nr_requests().
>>>>>
>>>>>>
>>>>>>> As we need to extend the uses of blk_mq_quiesce_queue(), it is
>>>>>>> inevitable that we need to support nested quiesce; in particular, we
>>>>>>> can't let unquiesce happen while there is a quiesce originated from
>>>>>>> another context.
>>>>>>>
>>>>>>> This patch introduces q->mq_quiesce_depth to handle concurrent
>>>>>>> quiesce, and we only unquiesce the queue when it is the
>>>>>>> last/outer-most one of all contexts.
>>>>>>>
>>>>>>> One kernel panic issue has been reported[2] when running a stress test
>>>>>>> on dm-mpath that updates nr_requests while suspending the queue, and a
>>>>>>> similar issue should exist in almost all drivers which use
>>>>>>> quiesce/unquiesce.
>>>>>>>
>>>>>>> [1] https://marc.info/?l=linux-block&m=150993988115872&w=2
>>>>>>> [2] https://listman.redhat.com/archives/dm-devel/2021-September/msg00189.html
>>>>>>
>>>>>> Please share the call stack of the kernel oops fixed by [2] since that
>>>>>> call stack is not in the patch description.
>>>>>
>>>>> OK, it is something like the following:
>>>>>
>>>>> [  145.453672] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014
>>>>> [  145.454104] RIP: 0010:dm_softirq_done+0x46/0x220 [dm_mod]
>>>>> [  145.454536] Code: 85 ed 0f 84 40 01 00 00 44 0f b6 b7 70 01 00 00 4c 8b a5 18 01 00 00 45 89 f5 f6 47 1d 04 75 57 49 8b 7c 24 08 48 85 ff 74 4d <48> 8b 47 08 48 8b 40 58 48 85 c0 74 40 49 8d 4c 24 50 44 89 f2 48
>>>>> [  145.455423] RSP: 0000:ffffa88600003ef8 EFLAGS: 00010282
>>>>> [  145.455865] RAX: ffffffffc03fbd10 RBX: ffff979144c00010 RCX: dead000000000200
>>>>> [  145.456321] RDX: ffffa88600003f30 RSI: ffff979144c00068 RDI: ffffa88600d01040
>>>>> [  145.456764] RBP: ffff979150eb7990 R08: ffff9791bbc27de0 R09: 0000000000000100
>>>>> [  145.457205] R10: 0000000000000068 R11: 000000000000004c R12: ffff979144c00138
>>>>> [  145.457647] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000010
>>>>> [  145.458080] FS:  00007f57e5d13180(0000) GS:ffff9791bbc00000(0000) knlGS:0000000000000000
>>>>> [  145.458516] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [  145.458945] CR2: ffffa88600d01048 CR3: 0000000106cf8003 CR4: 0000000000370ef0
>>>>> [  145.459382] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> [  145.459815] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>> [  145.460250] Call Trace:
>>>>> [  145.460779]  <IRQ>
>>>>> [  145.461453]  blk_done_softirq+0xa1/0xd0
>>>>> [  145.462138]  __do_softirq+0xd7/0x2d6
>>>>> [  145.462814]  irq_exit+0xf7/0x100
>>>>> [  145.463480]  do_IRQ+0x7f/0xd0
>>>>> [  145.464131]  common_interrupt+0xf/0xf
>>>>> [  145.464797]  </IRQ>
>>>>
>>>> Hi, our test can reproduce this problem:
>>>>
>>>> [  139.158093] BUG: kernel NULL pointer dereference, address:
>>>> 0000000000000008
>>>> [  139.160285] #PF: supervisor read access in kernel mode
>>>> [  139.161905] #PF: error_code(0x0000) - not-present page
>>>> [  139.163513] PGD 172745067 P4D 172745067 PUD 17fa88067 PMD 0
>>>> [  139.164506] Oops: 0000 [#1] PREEMPT SMP
>>>> [  139.165034] CPU: 17 PID: 1083 Comm: nbd-client Not tainted
>>>> 5.15.0-rc4-next-20211007-dirty #94
>>>> [  139.166179] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>>>> ?-20190727_073836-buildvm-p4
>>>> [  139.167962] RIP: 0010:kyber_has_work+0x31/0xb0
>>>> [  139.168571] Code: 41 bd 48 00 00 00 41 54 45 31 e4 55 53 48 8b 9f b0 00
>>>> 00 00 48 8d 6b 08 49 63 c4 d
>>>> [  139.171039] RSP: 0018:ffffc90000f479c8 EFLAGS: 00010246
>>>> [  139.171740] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
>>>> ffff888176218f40
>>>> [  139.172680] RDX: ffffffffffffffff RSI: ffffc90000f479f4 RDI:
>>>> ffff888175310000
>>>> [  139.173611] RBP: 0000000000000008 R08: 0000000000000000 R09:
>>>> ffff88882fa6c0a8
>>>> [  139.174541] R10: 000000000000030e R11: ffff88817fbcfa10 R12:
>>>> 0000000000000000
>>>> [  139.175482] R13: 0000000000000048 R14: ffffffff99b7e340 R15:
>>>> ffff8881783edc00
>>>> [  139.176402] FS:  00007fa8e62e4b40(0000) GS:ffff88882fa40000(0000)
>>>> knlGS:0000000000000000
>>>> [  139.177434] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [  139.178190] CR2: 0000000000000008 CR3: 00000001796ac000 CR4:
>>>> 00000000000006e0
>>>> [  139.179127] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>>>> 0000000000000000
>>>> [  139.180066] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
>>>> 0000000000000400
>>>> [  139.181000] Call Trace:
>>>> [  139.182338]  <TASK>
>>>> [  139.182638]  blk_mq_run_hw_queue+0x135/0x180
>>>> [  139.183207]  blk_mq_run_hw_queues+0x80/0x150
>>>> [  139.183766]  blk_mq_unquiesce_queue+0x33/0x40
>>>> [  139.184329]  nbd_clear_que+0x52/0xb0 [nbd]
>>>> [  139.184869]  nbd_disconnect_and_put+0x6b/0xe0 [nbd]
>>>> [  139.185499]  nbd_genl_disconnect+0x125/0x290 [nbd]
>>>> [  139.186123]  genl_family_rcv_msg_doit.isra.0+0x102/0x1b0
>>>> [  139.186821]  genl_rcv_msg+0xfc/0x2b0
>>>> [  139.187300]  ? nbd_ioctl+0x500/0x500 [nbd]
>>>> [  139.187847]  ? genl_family_rcv_msg_doit.isra.0+0x1b0/0x1b0
>>>> [  139.188564]  netlink_rcv_skb+0x62/0x180
>>>> [  139.189075]  genl_rcv+0x34/0x60
>>>> [  139.189490]  netlink_unicast+0x26d/0x590
>>>> [  139.190006]  netlink_sendmsg+0x3a1/0x6d0
>>>> [  139.190513]  ? netlink_rcv_skb+0x180/0x180
>>>> [  139.191039]  ____sys_sendmsg+0x1da/0x320
>>>> [  139.191556]  ? ____sys_recvmsg+0x130/0x220
>>>> [  139.192095]  ___sys_sendmsg+0x8e/0xf0
>>>> [  139.192591]  ? ___sys_recvmsg+0xa2/0xf0
>>>> [  139.193102]  ? __wake_up_common_lock+0xac/0xe0
>>>> [  139.193699]  __sys_sendmsg+0x6d/0xe0
>>>> [  139.194167]  __x64_sys_sendmsg+0x23/0x30
>>>> [  139.194675]  do_syscall_64+0x35/0x80
>>>> [  139.195145]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>> [  139.195806] RIP: 0033:0x7fa8e59ebb87
>>>> [  139.196281] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00
>>>> 00 8b 05 6a 2b 2c 00 48 63 8
>>>> [  139.198715] RSP: 002b:00007ffd50573c38 EFLAGS: 00000246 ORIG_RAX:
>>>> 000000000000002e
>>>> [  139.199710] RAX: ffffffffffffffda RBX: 0000000001318120 RCX:
>>>> 00007fa8e59ebb87
>>>> [  139.200643] RDX: 0000000000000000 RSI: 00007ffd50573c70 RDI:
>>>> 0000000000000003
>>>> [  139.201583] RBP: 00000000013181f0 R08: 0000000000000014 R09:
>>>> 0000000000000002
>>>> [  139.202512] R10: 0000000000000006 R11: 0000000000000246 R12:
>>>> 0000000001318030
>>>> [  139.203434] R13: 00007ffd50573c70 R14: 0000000000000001 R15:
>>>> 00000000ffffffff
>>>> [  139.204364]  </TASK>
>>>> [  139.204652] Modules linked in: nbd
>>>> [  139.205101] CR2: 0000000000000008
>>>> [  139.205580] ---[ end trace 0248c57101a02431 ]---
>>>>
>>>> Hope the call stack can be helpful.
>>>
>>> Can you share the following info?
>>>
>>> 1) is the above panic triggered with this quiesce patchset or without
>>> it?
>>
>> Without it, of course.
> 
> Got it. Any chance you could test this V2 patchset and see if it fixes
> your nbd issue?
> 

I already tested it, and the problem is fixed by your patch 5.

In fact, I was going to send a patch similar to your patch 5 today, and
then found your patch set.

Thanks,
Kuai


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH V2 1/5] nvme: add APIs for stopping/starting admin queue
  2021-10-05  2:23       ` Ming Lei
@ 2021-10-11 11:39         ` Christoph Hellwig
  -1 siblings, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2021-10-11 11:39 UTC (permalink / raw)
  To: Ming Lei
  Cc: Chaitanya Kulkarni, Jens Axboe, Christoph Hellwig, linux-block,
	linux-nvme, Sagi Grimberg, Keith Busch

On Tue, Oct 05, 2021 at 10:23:14AM +0800, Ming Lei wrote:
> On Fri, Oct 01, 2021 at 05:56:04AM +0000, Chaitanya Kulkarni wrote:
> > On 9/30/2021 5:56 AM, Ming Lei wrote:
> > > 
> > > Add two APIs for stopping and starting admin queue.
> > > 
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > 
> > 
> > This patch looks good to me, but based on the feedback I've received in
> > the past, we need to add new functions in the patch where they are
> > actually used rather than adding them in a separate patch.

The above is for use inside the same driver/subsystem.  Cross-subsystem
we usually do what Ming did here.
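
For the record, the two helpers under discussion are thin wrappers around
the blk-mq quiesce calls, along these lines (a sketch inferred from the
patch title; the series itself is authoritative):

	void nvme_stop_admin_queue(struct nvme_ctrl *ctrl)
	{
		blk_mq_quiesce_queue(ctrl->admin_q);
	}
	EXPORT_SYMBOL_GPL(nvme_stop_admin_queue);

	void nvme_start_admin_queue(struct nvme_ctrl *ctrl)
	{
		blk_mq_unquiesce_queue(ctrl->admin_q);
	}
	EXPORT_SYMBOL_GPL(nvme_start_admin_queue);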

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH V2 1/5] nvme: add APIs for stopping/starting admin queue
  2021-10-11 11:39         ` Christoph Hellwig
@ 2021-10-13  4:25           ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 40+ messages in thread
From: Chaitanya Kulkarni @ 2021-10-13  4:25 UTC (permalink / raw)
  To: Christoph Hellwig, Ming Lei
  Cc: Jens Axboe, linux-block, linux-nvme, Sagi Grimberg, Keith Busch


> The above is for use inside the same driver/subsystem.  Cross-subsystem
> we usually do what Ming did here.
> 

I'll keep that in mind for future reviews and patches.

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2021-10-13  4:25 UTC | newest]

Thread overview: 20 messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-30 12:56 [PATCH V2 0/5] blk-mq: support concurrent queue quiescing Ming Lei
2021-09-30 12:56 ` [PATCH V2 1/5] nvme: add APIs for stopping/starting admin queue Ming Lei
2021-10-01  5:56   ` Chaitanya Kulkarni
2021-10-05  2:23     ` Ming Lei
2021-10-05  3:38       ` Chaitanya Kulkarni
2021-10-05  8:04         ` Ming Lei
2021-10-11 11:39       ` Christoph Hellwig
2021-10-13  4:25         ` Chaitanya Kulkarni
2021-09-30 12:56 ` [PATCH V2 2/5] nvme: apply nvme API to quiesce/unquiesce " Ming Lei
2021-10-01  5:57   ` Chaitanya Kulkarni
2021-09-30 12:56 ` [PATCH V2 3/5] nvme: prepare for pairing quiescing and unquiescing Ming Lei
2021-09-30 12:56 ` [PATCH V2 4/5] nvme: paring quiesce/unquiesce Ming Lei
2021-09-30 12:56 ` [PATCH V2 5/5] blk-mq: support concurrent queue quiesce/unquiesce Ming Lei
2021-09-30 15:56   ` Bart Van Assche
2021-10-05  2:31     ` Ming Lei
2021-10-08  3:22       ` yukuai (C)
2021-10-08  5:10         ` Ming Lei
2021-10-08  6:22           ` yukuai (C)
2021-10-08  6:35             ` Ming Lei
2021-10-08  7:13               ` yukuai (C)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.