* per-tagset SRCU struct and quiesce v3
@ 2022-11-01 15:00 Christoph Hellwig
  2022-11-01 15:00 ` [PATCH 01/14] block: set the disk capacity to 0 in blk_mark_disk_dead Christoph Hellwig
                   ` (16 more replies)
  0 siblings, 17 replies; 38+ messages in thread
From: Christoph Hellwig @ 2022-11-01 15:00 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

Hi all,

this series moves the SRCU struct used for quiescing to the tag_set,
as the SRCU critical sections for dispatch take about the same time
on all queues anyway, and then adopts the series from Chao that provides
tagset-wide quiesce so that it uses this new infrastructure.

To do so, this series also cleans up a lot of code in nvme related to
quiescing queues.

Changes since v2:
 - check blk_queue_skip_tagset_quiesce in blk_mq_unquiesce_tagset
 - drop "nvme-pci: don't warn about the lack of I/O queues for admin
   controllers"
 - drop "nvme: don't call nvme_kill_queues from nvme_remove_namespaces"
 - drop "nvme-pci: mark the namespaces dead earlier in nvme_remove"
 - minor fixups for the above dropped patches
 - fix up a few commit messages

Changes since v1:
 - a whole bunch of new nvme cleanups
 - use the notify version of set_capacity in blk_mark_disk_dead
 - improve the documentation for blk_mq_wait_quiesce_done
 - dynamically allocate the srcu_struct in struct blk_mq_tag_set
 - minor spelling fixes

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 01/14] block: set the disk capacity to 0 in blk_mark_disk_dead
  2022-11-01 15:00 per-tagset SRCU struct and quiesce v3 Christoph Hellwig
@ 2022-11-01 15:00 ` Christoph Hellwig
  2022-11-02  1:18   ` Chaitanya Kulkarni
  2022-11-02 14:35   ` Jens Axboe
  2022-11-01 15:00 ` [PATCH 02/14] nvme-pci: refactor the tagset handling in nvme_reset_work Christoph Hellwig
                   ` (15 subsequent siblings)
  16 siblings, 2 replies; 38+ messages in thread
From: Christoph Hellwig @ 2022-11-01 15:00 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

nvme and xen-blkfront are already doing this to stop buffered writes from
creating dirty pages that can't be written out later.  Move it to the
common code.

This also removes the comment about the ordering from nvme: not only is
bd_mutex gone entirely, it also had not been used to lock updates to the
disk size for a long time before that, so the ordering requirement
documented there no longer applies.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chao Leng <lengchao@huawei.com>
---
 block/genhd.c                | 5 +++++
 drivers/block/xen-blkfront.c | 1 -
 drivers/nvme/host/core.c     | 7 +------
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 493b93faee9c8..e7bd036024fab 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -555,6 +555,11 @@ void blk_mark_disk_dead(struct gendisk *disk)
 {
 	set_bit(GD_DEAD, &disk->state);
 	blk_queue_start_drain(disk->queue);
+
+	/*
+	 * Stop buffered writers from dirtying pages that can't be written out.
+	 */
+	set_capacity_and_notify(disk, 0);
 }
 EXPORT_SYMBOL_GPL(blk_mark_disk_dead);
 
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 35b9bcad9db90..b28489290323f 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -2129,7 +2129,6 @@ static void blkfront_closing(struct blkfront_info *info)
 	if (info->rq && info->gd) {
 		blk_mq_stop_hw_queues(info->rq);
 		blk_mark_disk_dead(info->gd);
-		set_capacity(info->gd, 0);
 	}
 
 	for_each_rinfo(info, rinfo, i) {
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0090dc0b3ae6f..aea0f89acf409 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -5116,10 +5116,7 @@ static void nvme_stop_ns_queue(struct nvme_ns *ns)
 /*
  * Prepare a queue for teardown.
  *
- * This must forcibly unquiesce queues to avoid blocking dispatch, and only set
- * the capacity to 0 after that to avoid blocking dispatchers that may be
- * holding bd_butex.  This will end buffered writers dirtying pages that can't
- * be synced.
+ * This must forcibly unquiesce queues to avoid blocking dispatch.
  */
 static void nvme_set_queue_dying(struct nvme_ns *ns)
 {
@@ -5128,8 +5125,6 @@ static void nvme_set_queue_dying(struct nvme_ns *ns)
 
 	blk_mark_disk_dead(ns->disk);
 	nvme_start_ns_queue(ns);
-
-	set_capacity_and_notify(ns->disk, 0);
 }
 
 /**
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 02/14] nvme-pci: refactor the tagset handling in nvme_reset_work
  2022-11-01 15:00 per-tagset SRCU struct and quiesce v3 Christoph Hellwig
  2022-11-01 15:00 ` [PATCH 01/14] block: set the disk capacity to 0 in blk_mark_disk_dead Christoph Hellwig
@ 2022-11-01 15:00 ` Christoph Hellwig
  2022-11-02  1:19   ` Chaitanya Kulkarni
  2022-11-01 15:00 ` [PATCH 03/14] nvme: don't remove namespaces in nvme_passthru_end Christoph Hellwig
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-11-01 15:00 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

The code to create, update or delete a tagset and namespaces in
nvme_reset_work is a bit convoluted.  Refactor it with two high-level
conditionals, one for first probe vs reset and one for I/O queues vs no
I/O queues, to make the code flow clearer.
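
For orientation, the resulting structure is roughly the following (a
condensed sketch of the control flow only; the full hunks are below, and
details and error handling are omitted):

	if (dev->ctrl.tagset) {
		/* controller reset: a tagset already exists */
		if (dev->online_queues > 1) {
			/* freeze, update the number of I/O queues, unfreeze */
		} else {
			/* keep the controller, remove namespaces, free the tagset */
		}
	} else {
		/* first probe */
		if (dev->online_queues > 1)
			nvme_pci_alloc_tag_set(dev);
		/* else: the controller still shows up, just without namespaces */
	}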

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 46 ++++++++++++++++++++++++++---------------
 1 file changed, 29 insertions(+), 17 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 5f1d71ac00865..0e8410edb41bf 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2897,25 +2897,37 @@ static void nvme_reset_work(struct work_struct *work)
 	result = nvme_setup_io_queues(dev);
 	if (result)
 		goto out;
-
-	/*
-	 * Keep the controller around but remove all namespaces if we don't have
-	 * any working I/O queue.
-	 */
-	if (dev->online_queues < 2) {
-		dev_warn(dev->ctrl.device, "IO queues not created\n");
-		nvme_kill_queues(&dev->ctrl);
-		nvme_remove_namespaces(&dev->ctrl);
-		nvme_free_tagset(dev);
+		
+	if (dev->ctrl.tagset) {
+		/*
+		 * This is a controller reset and we already have a tagset.
+		 * Freeze and update the number of I/O queues as thos might have
+		 * changed.  If there are no I/O queues left after this reset,
+		 * keep the controller around but remove all namespaces.
+		 */
+		if (dev->online_queues > 1) {
+			nvme_start_queues(&dev->ctrl);
+			nvme_wait_freeze(&dev->ctrl);
+			nvme_pci_update_nr_queues(dev);
+			nvme_dbbuf_set(dev);
+			nvme_unfreeze(&dev->ctrl);
+		} else {
+			dev_warn(dev->ctrl.device, "IO queues lost\n");
+			nvme_kill_queues(&dev->ctrl);
+			nvme_remove_namespaces(&dev->ctrl);
+			nvme_free_tagset(dev);
+		}
 	} else {
-		nvme_start_queues(&dev->ctrl);
-		nvme_wait_freeze(&dev->ctrl);
-		if (!dev->ctrl.tagset)
+		/*
+		 * First probe.  Still allow the controller to show up even if
+		 * there are no namespaces.
+		 */
+		if (dev->online_queues > 1) {
 			nvme_pci_alloc_tag_set(dev);
-		else
-			nvme_pci_update_nr_queues(dev);
-		nvme_dbbuf_set(dev);
-		nvme_unfreeze(&dev->ctrl);
+			nvme_dbbuf_set(dev);
+		} else {
+			dev_warn(dev->ctrl.device, "IO queues not created\n");
+		}
 	}
 
 	/*
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 03/14] nvme: don't remove namespaces in nvme_passthru_end
  2022-11-01 15:00 per-tagset SRCU struct and quiesce v3 Christoph Hellwig
  2022-11-01 15:00 ` [PATCH 01/14] block: set the disk capacity to 0 in blk_mark_disk_dead Christoph Hellwig
  2022-11-01 15:00 ` [PATCH 02/14] nvme-pci: refactor the tagset handling in nvme_reset_work Christoph Hellwig
@ 2022-11-01 15:00 ` Christoph Hellwig
  2022-11-02  1:20   ` Chaitanya Kulkarni
  2022-11-01 15:00 ` [PATCH 04/14] nvme: remove the NVME_NS_DEAD check in nvme_remove_invalid_namespaces Christoph Hellwig
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-11-01 15:00 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

The call to nvme_remove_invalid_namespaces made sense when
nvme_passthru_end revalidated all namespaces and had to remove those that
didn't exist any more.  Since we don't revalidate from nvme_passthru_end
now, this call is entirely spurious.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index aea0f89acf409..35386aa24f589 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1118,7 +1118,6 @@ void nvme_passthru_end(struct nvme_ctrl *ctrl, u32 effects,
 		nvme_unfreeze(ctrl);
 		nvme_mpath_unfreeze(ctrl->subsys);
 		mutex_unlock(&ctrl->subsys->lock);
-		nvme_remove_invalid_namespaces(ctrl, NVME_NSID_ALL);
 		mutex_unlock(&ctrl->scan_lock);
 	}
 	if (effects & NVME_CMD_EFFECTS_CCC)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 04/14] nvme: remove the NVME_NS_DEAD check in nvme_remove_invalid_namespaces
  2022-11-01 15:00 per-tagset SRCU struct and quiesce v3 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2022-11-01 15:00 ` [PATCH 03/14] nvme: don't remove namespaces in nvme_passthru_end Christoph Hellwig
@ 2022-11-01 15:00 ` Christoph Hellwig
  2022-11-02  1:21   ` Chaitanya Kulkarni
  2022-11-01 15:00 ` [PATCH 05/14] nvme: remove the NVME_NS_DEAD check in nvme_validate_ns Christoph Hellwig
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-11-01 15:00 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

The NVME_NS_DEAD check only made sense when we revalidated namespaces
in nvme_passthru_end for commands that affected the namespace inventory.
These days NVME_NS_DEAD is only set during reset or when tearing down
namespaces, and we always remove all namespaces right after that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 35386aa24f589..390f2a2db3238 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4407,7 +4407,7 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
 
 	down_write(&ctrl->namespaces_rwsem);
 	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
-		if (ns->head->ns_id > nsid || test_bit(NVME_NS_DEAD, &ns->flags))
+		if (ns->head->ns_id > nsid)
 			list_move_tail(&ns->list, &rm_list);
 	}
 	up_write(&ctrl->namespaces_rwsem);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 05/14] nvme: remove the NVME_NS_DEAD check in nvme_validate_ns
  2022-11-01 15:00 per-tagset SRCU struct and quiesce v3 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2022-11-01 15:00 ` [PATCH 04/14] nvme: remove the NVME_NS_DEAD check in nvme_remove_invalid_namespaces Christoph Hellwig
@ 2022-11-01 15:00 ` Christoph Hellwig
  2022-11-02  1:21   ` Chaitanya Kulkarni
  2022-11-01 15:00 ` [PATCH 06/14] nvme: don't unquiesce the admin queue in nvme_kill_queues Christoph Hellwig
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-11-01 15:00 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

At the point where namespaces are marked dead, the controller is in a
non-live state and we won't get past the identify commands.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/core.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 390f2a2db3238..871a8ab7ec199 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4333,10 +4333,6 @@ static void nvme_validate_ns(struct nvme_ns *ns, struct nvme_ns_info *info)
 {
 	int ret = NVME_SC_INVALID_NS | NVME_SC_DNR;
 
-	if (test_bit(NVME_NS_DEAD, &ns->flags))
-		goto out;
-
-	ret = NVME_SC_INVALID_NS | NVME_SC_DNR;
 	if (!nvme_ns_ids_equal(&ns->head->ids, &info->ids)) {
 		dev_err(ns->ctrl->device,
 			"identifiers changed for nsid %d\n", ns->head->ns_id);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 06/14] nvme: don't unquiesce the admin queue in nvme_kill_queues
  2022-11-01 15:00 per-tagset SRCU struct and quiesce v3 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2022-11-01 15:00 ` [PATCH 05/14] nvme: remove the NVME_NS_DEAD check in nvme_validate_ns Christoph Hellwig
@ 2022-11-01 15:00 ` Christoph Hellwig
  2022-11-02  1:24   ` Chaitanya Kulkarni
  2022-11-01 15:00 ` [PATCH 07/14] nvme: split nvme_kill_queues Christoph Hellwig
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-11-01 15:00 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

None of the callers of nvme_kill_queues needs it to unquiesce the
admin queues, as all of them already do it themselves:

 1) nvme_reset_work explicitly calls nvme_start_admin_queue toward the
    beginning of the function, so the extra call to nvme_start_admin_queue
    won't do anything as NVME_CTRL_ADMIN_Q_STOPPED will already be
    cleared.
 2) nvme_remove calls nvme_dev_disable with shutdown flag set to true at
    the very beginning of the function if the PCIe device was not present,
    which is the precondition for the call to nvme_kill_queues.
    nvme_dev_disable already calls nvme_start_admin_queue toward the
    end of the function when the shutdown flag is set to true, so the
    admin queue is already enabled at this point.
 3) nvme_remove_dead_ctrl schedules a workqueue to unbind the driver,
    which will end up in nvme_remove, which calls nvme_dev_disable with
    the shutdown flag.  This case will call nvme_start_admin_queue a bit
    later than before.
 4) apple_nvme_remove uses the same sequence as nvme_remove_dead_ctrl
    above.
 5) nvme_remove_namespaces only calls nvme_kill_queues when the
    controller is in the DEAD state.  That can only happen in the PCIe
    driver, and only from nvme_remove. See item 2) above for the
    conditions there.

So it is safe to just remove the call to nvme_start_admin_queue in
nvme_kill_queues without replacement.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/core.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 871a8ab7ec199..bb62803de5b21 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -5135,10 +5135,6 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
 
 	down_read(&ctrl->namespaces_rwsem);
 
-	/* Forcibly unquiesce queues to avoid blocking dispatch */
-	if (ctrl->admin_q && !blk_queue_dying(ctrl->admin_q))
-		nvme_start_admin_queue(ctrl);
-
 	list_for_each_entry(ns, &ctrl->namespaces, list)
 		nvme_set_queue_dying(ns);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 07/14] nvme: split nvme_kill_queues
  2022-11-01 15:00 per-tagset SRCU struct and quiesce v3 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2022-11-01 15:00 ` [PATCH 06/14] nvme: don't unquiesce the admin queue in nvme_kill_queues Christoph Hellwig
@ 2022-11-01 15:00 ` Christoph Hellwig
  2022-11-02  1:25   ` Chaitanya Kulkarni
  2022-11-01 15:00 ` [PATCH 08/14] nvme-pci: don't unquiesce the I/O queues in nvme_remove_dead_ctrl Christoph Hellwig
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-11-01 15:00 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

nvme_kill_queues does two things:

 1) mark the gendisk of all namespaces dead
 2) unquiesce all I/O queues

These used to be intertwined due to block layer issues, but aren't any
more.  So move the unquiescing of the I/O queues into the callers, and
rename the rest of the function to the now more descriptive
nvme_mark_namespaces_dead.
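
As a sketch, the caller-side change follows this pattern (taken from the
hunks below):

	/* before */
	nvme_kill_queues(&dev->ctrl);

	/* after: mark the disks dead, then unquiesce so queued I/O can fail */
	nvme_mark_namespaces_dead(&dev->ctrl);
	nvme_start_queues(&dev->ctrl);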

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/apple.c |  3 ++-
 drivers/nvme/host/core.c  | 36 ++++++++----------------------------
 drivers/nvme/host/nvme.h  |  3 +--
 drivers/nvme/host/pci.c   |  6 ++++--
 4 files changed, 15 insertions(+), 33 deletions(-)

diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
index a89b06a7099f1..6c09703ffe922 100644
--- a/drivers/nvme/host/apple.c
+++ b/drivers/nvme/host/apple.c
@@ -1153,7 +1153,8 @@ static void apple_nvme_reset_work(struct work_struct *work)
 	nvme_change_ctrl_state(&anv->ctrl, NVME_CTRL_DELETING);
 	nvme_get_ctrl(&anv->ctrl);
 	apple_nvme_disable(anv, false);
-	nvme_kill_queues(&anv->ctrl);
+	nvme_mark_namespaces_dead(&anv->ctrl);
+	nvme_start_queues(&anv->ctrl);
 	if (!queue_work(nvme_wq, &anv->remove_work))
 		nvme_put_ctrl(&anv->ctrl);
 }
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index bb62803de5b21..ed06fcb87f93a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4561,8 +4561,10 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
 	 * removing the namespaces' disks; fail all the queues now to avoid
 	 * potentially having to clean up the failed sync later.
 	 */
-	if (ctrl->state == NVME_CTRL_DEAD)
-		nvme_kill_queues(ctrl);
+	if (ctrl->state == NVME_CTRL_DEAD) {
+		nvme_mark_namespaces_dead(ctrl);
+		nvme_start_queues(ctrl);
+	}
 
 	/* this is a no-op when called from the controller reset handler */
 	nvme_change_ctrl_state(ctrl, NVME_CTRL_DELETING_NOIO);
@@ -5108,39 +5110,17 @@ static void nvme_stop_ns_queue(struct nvme_ns *ns)
 		blk_mq_wait_quiesce_done(ns->queue);
 }
 
-/*
- * Prepare a queue for teardown.
- *
- * This must forcibly unquiesce queues to avoid blocking dispatch.
- */
-static void nvme_set_queue_dying(struct nvme_ns *ns)
-{
-	if (test_and_set_bit(NVME_NS_DEAD, &ns->flags))
-		return;
-
-	blk_mark_disk_dead(ns->disk);
-	nvme_start_ns_queue(ns);
-}
-
-/**
- * nvme_kill_queues(): Ends all namespace queues
- * @ctrl: the dead controller that needs to end
- *
- * Call this function when the driver determines it is unable to get the
- * controller in a state capable of servicing IO.
- */
-void nvme_kill_queues(struct nvme_ctrl *ctrl)
+/* let I/O to all namespaces fail in preparation for surprise removal */
+void nvme_mark_namespaces_dead(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
 
 	down_read(&ctrl->namespaces_rwsem);
-
 	list_for_each_entry(ns, &ctrl->namespaces, list)
-		nvme_set_queue_dying(ns);
-
+		blk_mark_disk_dead(ns->disk);
 	up_read(&ctrl->namespaces_rwsem);
 }
-EXPORT_SYMBOL_GPL(nvme_kill_queues);
+EXPORT_SYMBOL_GPL(nvme_mark_namespaces_dead);
 
 void nvme_unfreeze(struct nvme_ctrl *ctrl)
 {
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a29877217ee65..1cd7168002438 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -483,7 +483,6 @@ struct nvme_ns {
 	unsigned long features;
 	unsigned long flags;
 #define NVME_NS_REMOVING	0
-#define NVME_NS_DEAD     	1
 #define NVME_NS_ANA_PENDING	2
 #define NVME_NS_FORCE_RO	3
 #define NVME_NS_READY		4
@@ -758,7 +757,7 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl);
 void nvme_start_queues(struct nvme_ctrl *ctrl);
 void nvme_stop_admin_queue(struct nvme_ctrl *ctrl);
 void nvme_start_admin_queue(struct nvme_ctrl *ctrl);
-void nvme_kill_queues(struct nvme_ctrl *ctrl);
+void nvme_mark_namespaces_dead(struct nvme_ctrl *ctrl);
 void nvme_sync_queues(struct nvme_ctrl *ctrl);
 void nvme_sync_io_queues(struct nvme_ctrl *ctrl);
 void nvme_unfreeze(struct nvme_ctrl *ctrl);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 0e8410edb41bf..046fc02914a5a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2788,7 +2788,8 @@ static void nvme_remove_dead_ctrl(struct nvme_dev *dev)
 	nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
 	nvme_get_ctrl(&dev->ctrl);
 	nvme_dev_disable(dev, false);
-	nvme_kill_queues(&dev->ctrl);
+	nvme_mark_namespaces_dead(&dev->ctrl);
+	nvme_start_queues(&dev->ctrl);
 	if (!queue_work(nvme_wq, &dev->remove_work))
 		nvme_put_ctrl(&dev->ctrl);
 }
@@ -2913,7 +2914,8 @@ static void nvme_reset_work(struct work_struct *work)
 			nvme_unfreeze(&dev->ctrl);
 		} else {
 			dev_warn(dev->ctrl.device, "IO queues lost\n");
-			nvme_kill_queues(&dev->ctrl);
+			nvme_mark_namespaces_dead(&dev->ctrl);
+			nvme_start_queues(&dev->ctrl);
 			nvme_remove_namespaces(&dev->ctrl);
 			nvme_free_tagset(dev);
 		}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 08/14] nvme-pci: don't unquiesce the I/O queues in nvme_remove_dead_ctrl
  2022-11-01 15:00 per-tagset SRCU struct and quiesce v3 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2022-11-01 15:00 ` [PATCH 07/14] nvme: split nvme_kill_queues Christoph Hellwig
@ 2022-11-01 15:00 ` Christoph Hellwig
  2022-11-02  1:26   ` Chaitanya Kulkarni
  2022-11-01 15:00 ` [PATCH 09/14] nvme-apple: don't unquiesce the I/O queues in apple_nvme_reset_work Christoph Hellwig
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-11-01 15:00 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

nvme_remove_dead_ctrl schedules nvme_remove to be called, which will
call nvme_dev_disable and unquiesce the I/O queues.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 046fc02914a5a..159bc8918bbf0 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2789,7 +2789,6 @@ static void nvme_remove_dead_ctrl(struct nvme_dev *dev)
 	nvme_get_ctrl(&dev->ctrl);
 	nvme_dev_disable(dev, false);
 	nvme_mark_namespaces_dead(&dev->ctrl);
-	nvme_start_queues(&dev->ctrl);
 	if (!queue_work(nvme_wq, &dev->remove_work))
 		nvme_put_ctrl(&dev->ctrl);
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 09/14] nvme-apple: don't unquiesce the I/O queues in apple_nvme_reset_work
  2022-11-01 15:00 per-tagset SRCU struct and quiesce v3 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2022-11-01 15:00 ` [PATCH 08/14] nvme-pci: don't unquiesce the I/O queues in nvme_remove_dead_ctrl Christoph Hellwig
@ 2022-11-01 15:00 ` Christoph Hellwig
  2022-11-02  5:47   ` Chaitanya Kulkarni
  2022-11-01 15:00 ` [PATCH 10/14] blk-mq: skip non-mq queues in blk_mq_quiesce_queue Christoph Hellwig
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-11-01 15:00 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

apple_nvme_reset_work schedules apple_nvme_remove to be called, which
will call apple_nvme_disable and unquiesce the I/O queues.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/apple.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
index 6c09703ffe922..24e224c279a41 100644
--- a/drivers/nvme/host/apple.c
+++ b/drivers/nvme/host/apple.c
@@ -1154,7 +1154,6 @@ static void apple_nvme_reset_work(struct work_struct *work)
 	nvme_get_ctrl(&anv->ctrl);
 	apple_nvme_disable(anv, false);
 	nvme_mark_namespaces_dead(&anv->ctrl);
-	nvme_start_queues(&anv->ctrl);
 	if (!queue_work(nvme_wq, &anv->remove_work))
 		nvme_put_ctrl(&anv->ctrl);
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 10/14] blk-mq: skip non-mq queues in blk_mq_quiesce_queue
  2022-11-01 15:00 per-tagset SRCU struct and quiesce v3 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2022-11-01 15:00 ` [PATCH 09/14] nvme-apple: don't unquiesce the I/O queues in apple_nvme_reset_work Christoph Hellwig
@ 2022-11-01 15:00 ` Christoph Hellwig
  2022-11-02  5:47   ` Chaitanya Kulkarni
  2022-11-01 15:00 ` [PATCH 11/14] blk-mq: move the srcu_struct used for quiescing to the tagset Christoph Hellwig
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-11-01 15:00 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block, Hannes Reinecke

For submit_bio based queues there is no (S)RCU critical section during
I/O submission and thus nothing to wait for in blk_mq_wait_quiesce_done,
so skip doing any synchronization.  No non-mq driver should be calling
this, but for now we have core callers that unconditionally call into it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-mq.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 623e8a506539c..b8f37cfd3ae22 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -280,7 +280,9 @@ EXPORT_SYMBOL_GPL(blk_mq_wait_quiesce_done);
 void blk_mq_quiesce_queue(struct request_queue *q)
 {
 	blk_mq_quiesce_queue_nowait(q);
-	blk_mq_wait_quiesce_done(q);
+	/* nothing to wait for non-mq queues */
+	if (queue_is_mq(q))
+		blk_mq_wait_quiesce_done(q);
 }
 EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 11/14] blk-mq: move the srcu_struct used for quiescing to the tagset
  2022-11-01 15:00 per-tagset SRCU struct and quiesce v3 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2022-11-01 15:00 ` [PATCH 10/14] blk-mq: skip non-mq queues in blk_mq_quiesce_queue Christoph Hellwig
@ 2022-11-01 15:00 ` Christoph Hellwig
  2022-11-02  5:49   ` Chaitanya Kulkarni
  2022-11-01 15:00 ` [PATCH 12/14] blk-mq: pass a tagset to blk_mq_wait_quiesce_done Christoph Hellwig
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-11-01 15:00 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block, Hannes Reinecke

All I/O submissions have fairly similar latencies, and a tagset-wide
quiesce is a fairly common operation.
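
Conceptually, the dispatch read side and the quiesce wait now pair up on
the tag_set instead of on each request_queue.  A minimal sketch of that
pairing, simplified from the hunks below:

	/* dispatch side (per request_queue, but using the shared tag_set SRCU) */
	srcu_idx = srcu_read_lock(q->tag_set->srcu);
	/* ... run the dispatch code ... */
	srcu_read_unlock(q->tag_set->srcu, srcu_idx);

	/* quiesce side: one grace period covers all queues of the tag_set */
	if (q->tag_set->flags & BLK_MQ_F_BLOCKING)
		synchronize_srcu(q->tag_set->srcu);
	else
		synchronize_rcu();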

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chao Leng <lengchao@huawei.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-core.c       | 27 +++++----------------------
 block/blk-mq.c         | 33 +++++++++++++++++++++++++--------
 block/blk-mq.h         | 14 +++++++-------
 block/blk-sysfs.c      |  9 ++-------
 block/blk.h            |  9 +--------
 block/genhd.c          |  2 +-
 include/linux/blk-mq.h |  4 ++++
 include/linux/blkdev.h |  9 ---------
 8 files changed, 45 insertions(+), 62 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 5d50dd16e2a59..e9e2bf15cd909 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -65,7 +65,6 @@ DEFINE_IDA(blk_queue_ida);
  * For queue allocation
  */
 struct kmem_cache *blk_requestq_cachep;
-struct kmem_cache *blk_requestq_srcu_cachep;
 
 /*
  * Controlling structure to kblockd
@@ -373,26 +372,20 @@ static void blk_timeout_work(struct work_struct *work)
 {
 }
 
-struct request_queue *blk_alloc_queue(int node_id, bool alloc_srcu)
+struct request_queue *blk_alloc_queue(int node_id)
 {
 	struct request_queue *q;
 
-	q = kmem_cache_alloc_node(blk_get_queue_kmem_cache(alloc_srcu),
-			GFP_KERNEL | __GFP_ZERO, node_id);
+	q = kmem_cache_alloc_node(blk_requestq_cachep, GFP_KERNEL | __GFP_ZERO,
+				  node_id);
 	if (!q)
 		return NULL;
 
-	if (alloc_srcu) {
-		blk_queue_flag_set(QUEUE_FLAG_HAS_SRCU, q);
-		if (init_srcu_struct(q->srcu) != 0)
-			goto fail_q;
-	}
-
 	q->last_merge = NULL;
 
 	q->id = ida_alloc(&blk_queue_ida, GFP_KERNEL);
 	if (q->id < 0)
-		goto fail_srcu;
+		goto fail_q;
 
 	q->stats = blk_alloc_queue_stats();
 	if (!q->stats)
@@ -435,11 +428,8 @@ struct request_queue *blk_alloc_queue(int node_id, bool alloc_srcu)
 	blk_free_queue_stats(q->stats);
 fail_id:
 	ida_free(&blk_queue_ida, q->id);
-fail_srcu:
-	if (alloc_srcu)
-		cleanup_srcu_struct(q->srcu);
 fail_q:
-	kmem_cache_free(blk_get_queue_kmem_cache(alloc_srcu), q);
+	kmem_cache_free(blk_requestq_cachep, q);
 	return NULL;
 }
 
@@ -1172,9 +1162,6 @@ int __init blk_dev_init(void)
 			sizeof_field(struct request, cmd_flags));
 	BUILD_BUG_ON(REQ_OP_BITS + REQ_FLAG_BITS > 8 *
 			sizeof_field(struct bio, bi_opf));
-	BUILD_BUG_ON(ALIGN(offsetof(struct request_queue, srcu),
-			   __alignof__(struct request_queue)) !=
-		     sizeof(struct request_queue));
 
 	/* used for unplugging and affects IO latency/throughput - HIGHPRI */
 	kblockd_workqueue = alloc_workqueue("kblockd",
@@ -1185,10 +1172,6 @@ int __init blk_dev_init(void)
 	blk_requestq_cachep = kmem_cache_create("request_queue",
 			sizeof(struct request_queue), 0, SLAB_PANIC, NULL);
 
-	blk_requestq_srcu_cachep = kmem_cache_create("request_queue_srcu",
-			sizeof(struct request_queue) +
-			sizeof(struct srcu_struct), 0, SLAB_PANIC, NULL);
-
 	blk_debugfs_root = debugfs_create_dir("block", NULL);
 
 	return 0;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index b8f37cfd3ae22..3479325fc86c3 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -261,8 +261,8 @@ EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);
  */
 void blk_mq_wait_quiesce_done(struct request_queue *q)
 {
-	if (blk_queue_has_srcu(q))
-		synchronize_srcu(q->srcu);
+	if (q->tag_set->flags & BLK_MQ_F_BLOCKING)
+		synchronize_srcu(q->tag_set->srcu);
 	else
 		synchronize_rcu();
 }
@@ -4003,7 +4003,7 @@ static struct request_queue *blk_mq_init_queue_data(struct blk_mq_tag_set *set,
 	struct request_queue *q;
 	int ret;
 
-	q = blk_alloc_queue(set->numa_node, set->flags & BLK_MQ_F_BLOCKING);
+	q = blk_alloc_queue(set->numa_node);
 	if (!q)
 		return ERR_PTR(-ENOMEM);
 	q->queuedata = queuedata;
@@ -4168,9 +4168,6 @@ static void blk_mq_update_poll_flag(struct request_queue *q)
 int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 		struct request_queue *q)
 {
-	WARN_ON_ONCE(blk_queue_has_srcu(q) !=
-			!!(set->flags & BLK_MQ_F_BLOCKING));
-
 	/* mark the queue as mq asap */
 	q->mq_ops = set->ops;
 
@@ -4428,9 +4425,19 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 	 */
 	if (set->nr_maps == 1 && set->nr_hw_queues > nr_cpu_ids)
 		set->nr_hw_queues = nr_cpu_ids;
+ 
+	if (set->flags & BLK_MQ_F_BLOCKING) {
+		set->srcu = kmalloc(sizeof(*set->srcu), GFP_KERNEL);
+		if (!set->srcu)
+			return -ENOMEM;
+		ret = init_srcu_struct(set->srcu);
+		if (ret)
+			goto out_free_srcu;
+	}
 
-	if (blk_mq_alloc_tag_set_tags(set, set->nr_hw_queues) < 0)
-		return -ENOMEM;
+	ret = blk_mq_alloc_tag_set_tags(set, set->nr_hw_queues);
+	if (ret)
+		goto out_cleanup_srcu;
 
 	ret = -ENOMEM;
 	for (i = 0; i < set->nr_maps; i++) {
@@ -4460,6 +4467,12 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 	}
 	kfree(set->tags);
 	set->tags = NULL;
+out_cleanup_srcu:
+	if (set->flags & BLK_MQ_F_BLOCKING)
+		cleanup_srcu_struct(set->srcu);
+out_free_srcu:
+	if (set->flags & BLK_MQ_F_BLOCKING)
+		kfree(set->srcu);
 	return ret;
 }
 EXPORT_SYMBOL(blk_mq_alloc_tag_set);
@@ -4499,6 +4512,10 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
 
 	kfree(set->tags);
 	set->tags = NULL;
+	if (set->flags & BLK_MQ_F_BLOCKING) {
+		cleanup_srcu_struct(set->srcu);
+		kfree(set->srcu);
+	}
 }
 EXPORT_SYMBOL(blk_mq_free_tag_set);
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 0b2870839cdd6..ef59fee62780d 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -377,17 +377,17 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
 /* run the code block in @dispatch_ops with rcu/srcu read lock held */
 #define __blk_mq_run_dispatch_ops(q, check_sleep, dispatch_ops)	\
 do {								\
-	if (!blk_queue_has_srcu(q)) {				\
-		rcu_read_lock();				\
-		(dispatch_ops);					\
-		rcu_read_unlock();				\
-	} else {						\
+	if ((q)->tag_set->flags & BLK_MQ_F_BLOCKING) {		\
 		int srcu_idx;					\
 								\
 		might_sleep_if(check_sleep);			\
-		srcu_idx = srcu_read_lock((q)->srcu);		\
+		srcu_idx = srcu_read_lock((q)->tag_set->srcu);	\
 		(dispatch_ops);					\
-		srcu_read_unlock((q)->srcu, srcu_idx);		\
+		srcu_read_unlock((q)->tag_set->srcu, srcu_idx);	\
+	} else {						\
+		rcu_read_lock();				\
+		(dispatch_ops);					\
+		rcu_read_unlock();				\
 	}							\
 } while (0)
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 7b98c7074771d..02e94c4beff17 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -742,10 +742,8 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr,
 
 static void blk_free_queue_rcu(struct rcu_head *rcu_head)
 {
-	struct request_queue *q = container_of(rcu_head, struct request_queue,
-					       rcu_head);
-
-	kmem_cache_free(blk_get_queue_kmem_cache(blk_queue_has_srcu(q)), q);
+	kmem_cache_free(blk_requestq_cachep,
+			container_of(rcu_head, struct request_queue, rcu_head));
 }
 
 /**
@@ -782,9 +780,6 @@ static void blk_release_queue(struct kobject *kobj)
 	if (queue_is_mq(q))
 		blk_mq_release(q);
 
-	if (blk_queue_has_srcu(q))
-		cleanup_srcu_struct(q->srcu);
-
 	ida_free(&blk_queue_ida, q->id);
 	call_rcu(&q->rcu_head, blk_free_queue_rcu);
 }
diff --git a/block/blk.h b/block/blk.h
index 7f9e089ab1f75..37faec6e558b0 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -27,7 +27,6 @@ struct blk_flush_queue {
 };
 
 extern struct kmem_cache *blk_requestq_cachep;
-extern struct kmem_cache *blk_requestq_srcu_cachep;
 extern struct kobj_type blk_queue_ktype;
 extern struct ida blk_queue_ida;
 
@@ -428,13 +427,7 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio,
 		struct page *page, unsigned int len, unsigned int offset,
 		unsigned int max_sectors, bool *same_page);
 
-static inline struct kmem_cache *blk_get_queue_kmem_cache(bool srcu)
-{
-	if (srcu)
-		return blk_requestq_srcu_cachep;
-	return blk_requestq_cachep;
-}
-struct request_queue *blk_alloc_queue(int node_id, bool alloc_srcu);
+struct request_queue *blk_alloc_queue(int node_id);
 
 int disk_scan_partitions(struct gendisk *disk, fmode_t mode);
 
diff --git a/block/genhd.c b/block/genhd.c
index e7bd036024fab..09cde914e0548 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1414,7 +1414,7 @@ struct gendisk *__blk_alloc_disk(int node, struct lock_class_key *lkclass)
 	struct request_queue *q;
 	struct gendisk *disk;
 
-	q = blk_alloc_queue(node, false);
+	q = blk_alloc_queue(node);
 	if (!q)
 		return NULL;
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 569053ed959d5..f059edebb11d8 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -7,6 +7,7 @@
 #include <linux/lockdep.h>
 #include <linux/scatterlist.h>
 #include <linux/prefetch.h>
+#include <linux/srcu.h>
 
 struct blk_mq_tags;
 struct blk_flush_queue;
@@ -500,6 +501,8 @@ enum hctx_type {
  * @tag_list_lock: Serializes tag_list accesses.
  * @tag_list:	   List of the request queues that use this tag set. See also
  *		   request_queue.tag_set_list.
+ * @srcu:	   Use as lock when type of the request queue is blocking
+ *		   (BLK_MQ_F_BLOCKING).
  */
 struct blk_mq_tag_set {
 	struct blk_mq_queue_map	map[HCTX_MAX_TYPES];
@@ -520,6 +523,7 @@ struct blk_mq_tag_set {
 
 	struct mutex		tag_list_lock;
 	struct list_head	tag_list;
+	struct srcu_struct	*srcu;
 };
 
 /**
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 32137d85c9ad5..6a6fa167fc828 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -22,7 +22,6 @@
 #include <linux/blkzoned.h>
 #include <linux/sched.h>
 #include <linux/sbitmap.h>
-#include <linux/srcu.h>
 #include <linux/uuid.h>
 #include <linux/xarray.h>
 
@@ -543,18 +542,11 @@ struct request_queue {
 	struct mutex		debugfs_mutex;
 
 	bool			mq_sysfs_init_done;
-
-	/**
-	 * @srcu: Sleepable RCU. Use as lock when type of the request queue
-	 * is blocking (BLK_MQ_F_BLOCKING). Must be the last member
-	 */
-	struct srcu_struct	srcu[];
 };
 
 /* Keep blk_queue_flag_name[] in sync with the definitions below */
 #define QUEUE_FLAG_STOPPED	0	/* queue is stopped */
 #define QUEUE_FLAG_DYING	1	/* queue being torn down */
-#define QUEUE_FLAG_HAS_SRCU	2	/* SRCU is allocated */
 #define QUEUE_FLAG_NOMERGES     3	/* disable merge attempts */
 #define QUEUE_FLAG_SAME_COMP	4	/* complete on same CPU-group */
 #define QUEUE_FLAG_FAIL_IO	5	/* fake timeout */
@@ -590,7 +582,6 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
 
 #define blk_queue_stopped(q)	test_bit(QUEUE_FLAG_STOPPED, &(q)->queue_flags)
 #define blk_queue_dying(q)	test_bit(QUEUE_FLAG_DYING, &(q)->queue_flags)
-#define blk_queue_has_srcu(q)	test_bit(QUEUE_FLAG_HAS_SRCU, &(q)->queue_flags)
 #define blk_queue_init_done(q)	test_bit(QUEUE_FLAG_INIT_DONE, &(q)->queue_flags)
 #define blk_queue_nomerges(q)	test_bit(QUEUE_FLAG_NOMERGES, &(q)->queue_flags)
 #define blk_queue_noxmerges(q)	\
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 12/14] blk-mq: pass a tagset to blk_mq_wait_quiesce_done
  2022-11-01 15:00 per-tagset SRCU struct and quiesce v3 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2022-11-01 15:00 ` [PATCH 11/14] blk-mq: move the srcu_struct used for quiescing to the tagset Christoph Hellwig
@ 2022-11-01 15:00 ` Christoph Hellwig
  2022-11-02  5:50   ` Chaitanya Kulkarni
  2022-11-01 15:00 ` [PATCH 13/14] blk-mq: add tagset quiesce interface Christoph Hellwig
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-11-01 15:00 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block, Hannes Reinecke

Nothing in blk_mq_wait_quiesce_done needs the request_queue now, so just
pass the tagset, and move the non-mq check into the only caller that
needs it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chao Leng <lengchao@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-mq.c           | 16 +++++++++-------
 drivers/nvme/host/core.c |  4 ++--
 drivers/scsi/scsi_lib.c  |  2 +-
 include/linux/blk-mq.h   |  2 +-
 4 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3479325fc86c3..b358328a1dce3 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -254,15 +254,17 @@ EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);
 
 /**
  * blk_mq_wait_quiesce_done() - wait until in-progress quiesce is done
- * @q: request queue.
+ * @set: tag_set to wait on
  *
  * Note: it is driver's responsibility for making sure that quiesce has
- * been started.
+ * been started on one or more of the request_queues of the tag_set.  This
+ * function only waits for the quiesce on those request_queues that had
+ * the quiesce flag set using blk_mq_quiesce_queue_nowait.
  */
-void blk_mq_wait_quiesce_done(struct request_queue *q)
+void blk_mq_wait_quiesce_done(struct blk_mq_tag_set *set)
 {
-	if (q->tag_set->flags & BLK_MQ_F_BLOCKING)
-		synchronize_srcu(q->tag_set->srcu);
+	if (set->flags & BLK_MQ_F_BLOCKING)
+		synchronize_srcu(set->srcu);
 	else
 		synchronize_rcu();
 }
@@ -282,7 +284,7 @@ void blk_mq_quiesce_queue(struct request_queue *q)
 	blk_mq_quiesce_queue_nowait(q);
 	/* nothing to wait for non-mq queues */
 	if (queue_is_mq(q))
-		blk_mq_wait_quiesce_done(q);
+		blk_mq_wait_quiesce_done(q->tag_set);
 }
 EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
 
@@ -1623,7 +1625,7 @@ static void blk_mq_timeout_work(struct work_struct *work)
 		 * uses srcu or rcu, wait for a synchronization point to
 		 * ensure all running submits have finished
 		 */
-		blk_mq_wait_quiesce_done(q);
+		blk_mq_wait_quiesce_done(q->tag_set);
 
 		expired.next = 0;
 		blk_mq_queue_tag_busy_iter(q, blk_mq_handle_expired, &expired);
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ed06fcb87f93a..66b0b6e110024 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -5107,7 +5107,7 @@ static void nvme_stop_ns_queue(struct nvme_ns *ns)
 	if (!test_and_set_bit(NVME_NS_STOPPED, &ns->flags))
 		blk_mq_quiesce_queue(ns->queue);
 	else
-		blk_mq_wait_quiesce_done(ns->queue);
+		blk_mq_wait_quiesce_done(ns->queue->tag_set);
 }
 
 /* let I/O to all namespaces fail in preparation for surprise removal */
@@ -5197,7 +5197,7 @@ void nvme_stop_admin_queue(struct nvme_ctrl *ctrl)
 	if (!test_and_set_bit(NVME_CTRL_ADMIN_Q_STOPPED, &ctrl->flags))
 		blk_mq_quiesce_queue(ctrl->admin_q);
 	else
-		blk_mq_wait_quiesce_done(ctrl->admin_q);
+		blk_mq_wait_quiesce_done(ctrl->admin_q->tag_set);
 }
 EXPORT_SYMBOL_GPL(nvme_stop_admin_queue);
 
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 8b89fab7c4206..249757ddd8fea 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2735,7 +2735,7 @@ static void scsi_stop_queue(struct scsi_device *sdev, bool nowait)
 			blk_mq_quiesce_queue(sdev->request_queue);
 	} else {
 		if (!nowait)
-			blk_mq_wait_quiesce_done(sdev->request_queue);
+			blk_mq_wait_quiesce_done(sdev->request_queue->tag_set);
 	}
 }
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index f059edebb11d8..061ea6e7af017 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -880,7 +880,7 @@ void blk_mq_start_hw_queues(struct request_queue *q);
 void blk_mq_start_stopped_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
 void blk_mq_start_stopped_hw_queues(struct request_queue *q, bool async);
 void blk_mq_quiesce_queue(struct request_queue *q);
-void blk_mq_wait_quiesce_done(struct request_queue *q);
+void blk_mq_wait_quiesce_done(struct blk_mq_tag_set *set);
 void blk_mq_unquiesce_queue(struct request_queue *q);
 void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);
 void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 13/14] blk-mq: add tagset quiesce interface
  2022-11-01 15:00 per-tagset SRCU struct and quiesce v3 Christoph Hellwig
                   ` (11 preceding siblings ...)
  2022-11-01 15:00 ` [PATCH 12/14] blk-mq: pass a tagset to blk_mq_wait_quiesce_done Christoph Hellwig
@ 2022-11-01 15:00 ` Christoph Hellwig
  2022-11-02  5:51   ` Chaitanya Kulkarni
  2022-11-01 15:00 ` [PATCH 14/14] nvme: use blk_mq_[un]quiesce_tagset Christoph Hellwig
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-11-01 15:00 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block, Hannes Reinecke

From: Chao Leng <lengchao@huawei.com>

Drivers that have shared tagsets may need to quiesce potentially a lot
of request queues that all share a single tagset (e.g. nvme). Add an
interface to quiesce all the queues on a given tagset. This interface is
useful because it can speedup the quiesce by doing it in parallel.

Because some queues (e.g. the nvme connect_q) do not need to be quiesced
when quiescing the tagset, introduce a QUEUE_FLAG_SKIP_TAGSET_QUIESCE
flag to allow this new interface to skip quiescing a particular queue.
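
A minimal usage sketch for a driver with a shared tagset (assuming any
queue that must not be quiesced already has QUEUE_FLAG_SKIP_TAGSET_QUIESCE
set, and with 'set' standing for the driver's blk_mq_tag_set):

	/* quiesce every request_queue sharing the tagset with a single wait */
	blk_mq_quiesce_tagset(set);

	/* ... reconfigure or tear down the controller ... */

	blk_mq_unquiesce_tagset(set);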

Signed-off-by: Chao Leng <lengchao@huawei.com>
[hch: simplify for the per-tag_set srcu_struct]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chao Leng <lengchao@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-mq.c         | 27 +++++++++++++++++++++++++++
 include/linux/blk-mq.h |  2 ++
 include/linux/blkdev.h |  3 +++
 3 files changed, 32 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index b358328a1dce3..f0dd07bbf4c16 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -315,6 +315,33 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
 
+void blk_mq_quiesce_tagset(struct blk_mq_tag_set *set)
+{
+	struct request_queue *q;
+
+	mutex_lock(&set->tag_list_lock);
+	list_for_each_entry(q, &set->tag_list, tag_set_list) {
+		if (!blk_queue_skip_tagset_quiesce(q))
+			blk_mq_quiesce_queue_nowait(q);
+	}
+	blk_mq_wait_quiesce_done(set);
+	mutex_unlock(&set->tag_list_lock);
+}
+EXPORT_SYMBOL_GPL(blk_mq_quiesce_tagset);
+
+void blk_mq_unquiesce_tagset(struct blk_mq_tag_set *set)
+{
+	struct request_queue *q;
+
+	mutex_lock(&set->tag_list_lock);
+	list_for_each_entry(q, &set->tag_list, tag_set_list) {
+		if (!blk_queue_skip_tagset_quiesce(q))
+			blk_mq_unquiesce_queue(q);
+	}
+	mutex_unlock(&set->tag_list_lock);
+}
+EXPORT_SYMBOL_GPL(blk_mq_unquiesce_tagset);
+
 void blk_mq_wake_waiters(struct request_queue *q)
 {
 	struct blk_mq_hw_ctx *hctx;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 061ea6e7af017..109a0e30c4704 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -881,6 +881,8 @@ void blk_mq_start_stopped_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
 void blk_mq_start_stopped_hw_queues(struct request_queue *q, bool async);
 void blk_mq_quiesce_queue(struct request_queue *q);
 void blk_mq_wait_quiesce_done(struct blk_mq_tag_set *set);
+void blk_mq_quiesce_tagset(struct blk_mq_tag_set *set);
+void blk_mq_unquiesce_tagset(struct blk_mq_tag_set *set);
 void blk_mq_unquiesce_queue(struct request_queue *q);
 void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);
 void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 6a6fa167fc828..9188aa3f62595 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -571,6 +571,7 @@ struct request_queue {
 #define QUEUE_FLAG_HCTX_ACTIVE	28	/* at least one blk-mq hctx is active */
 #define QUEUE_FLAG_NOWAIT       29	/* device supports NOWAIT */
 #define QUEUE_FLAG_SQ_SCHED     30	/* single queue style io dispatch */
+#define QUEUE_FLAG_SKIP_TAGSET_QUIESCE	31 /* quiesce_tagset skips the queue */
 
 #define QUEUE_FLAG_MQ_DEFAULT	((1UL << QUEUE_FLAG_IO_STAT) |		\
 				 (1UL << QUEUE_FLAG_SAME_COMP) |	\
@@ -610,6 +611,8 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
 #define blk_queue_pm_only(q)	atomic_read(&(q)->pm_only)
 #define blk_queue_registered(q)	test_bit(QUEUE_FLAG_REGISTERED, &(q)->queue_flags)
 #define blk_queue_sq_sched(q)	test_bit(QUEUE_FLAG_SQ_SCHED, &(q)->queue_flags)
+#define blk_queue_skip_tagset_quiesce(q) \
+	test_bit(QUEUE_FLAG_SKIP_TAGSET_QUIESCE, &(q)->queue_flags)
 
 extern void blk_set_pm_only(struct request_queue *q);
 extern void blk_clear_pm_only(struct request_queue *q);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 14/14] nvme: use blk_mq_[un]quiesce_tagset
  2022-11-01 15:00 per-tagset SRCU struct and quiesce v3 Christoph Hellwig
                   ` (12 preceding siblings ...)
  2022-11-01 15:00 ` [PATCH 13/14] blk-mq: add tagset quiesce interface Christoph Hellwig
@ 2022-11-01 15:00 ` Christoph Hellwig
  2022-11-02  5:51   ` Chaitanya Kulkarni
  2022-11-21 20:44   ` Shakeel Butt
  2022-11-01 15:25 ` per-tagset SRCU struct and quiesce v3 Keith Busch
                   ` (2 subsequent siblings)
  16 siblings, 2 replies; 38+ messages in thread
From: Christoph Hellwig @ 2022-11-01 15:00 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block, Hannes Reinecke

From: Chao Leng <lengchao@huawei.com>

All controller namespaces share the same tagset, so we can use this
interface, which does the optimal operation for parallel quiesce based on
the tagset type (e.g. blocking vs. non-blocking tagsets).

The nvme connect_q should not be quiesced when quiescing the tagset, so
set QUEUE_FLAG_SKIP_TAGSET_QUIESCE on it when initializing the connect_q.

Currently we use NVME_NS_STOPPED to keep quiescing and unquiescing
paired.  With blk_mq_[un]quiesce_tagset, NVME_NS_STOPPED no longer
works, so introduce NVME_CTRL_STOPPED to replace it.  In addition, we
never really quiesce a single namespace, so it is a better choice to
move the flag from the namespace to the controller.
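
In caller terms the semantics stay as before, roughly (a sketch, error
handling omitted):

	nvme_stop_queues(ctrl);		/* quiesces all namespace queues, skips connect_q */
	/* ... disable / reset the transport ... */
	nvme_start_queues(ctrl);	/* unquiesces them again, paired via NVME_CTRL_STOPPED */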

Signed-off-by: Chao Leng <lengchao@huawei.com>
[hch: rebased on top of prep patches]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chao Leng <lengchao@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/core.c | 34 ++++++++--------------------------
 drivers/nvme/host/nvme.h |  2 +-
 2 files changed, 9 insertions(+), 27 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 66b0b6e110024..f94b05c585cbc 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4895,6 +4895,8 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
 			ret = PTR_ERR(ctrl->connect_q);
 			goto out_free_tag_set;
 		}
+		blk_queue_flag_set(QUEUE_FLAG_SKIP_TAGSET_QUIESCE,
+				   ctrl->connect_q);
 	}
 
 	ctrl->tagset = set;
@@ -5096,20 +5098,6 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 }
 EXPORT_SYMBOL_GPL(nvme_init_ctrl);
 
-static void nvme_start_ns_queue(struct nvme_ns *ns)
-{
-	if (test_and_clear_bit(NVME_NS_STOPPED, &ns->flags))
-		blk_mq_unquiesce_queue(ns->queue);
-}
-
-static void nvme_stop_ns_queue(struct nvme_ns *ns)
-{
-	if (!test_and_set_bit(NVME_NS_STOPPED, &ns->flags))
-		blk_mq_quiesce_queue(ns->queue);
-	else
-		blk_mq_wait_quiesce_done(ns->queue->tag_set);
-}
-
 /* let I/O to all namespaces fail in preparation for surprise removal */
 void nvme_mark_namespaces_dead(struct nvme_ctrl *ctrl)
 {
@@ -5172,23 +5160,17 @@ EXPORT_SYMBOL_GPL(nvme_start_freeze);
 
 void nvme_stop_queues(struct nvme_ctrl *ctrl)
 {
-	struct nvme_ns *ns;
-
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
-		nvme_stop_ns_queue(ns);
-	up_read(&ctrl->namespaces_rwsem);
+	if (!test_and_set_bit(NVME_CTRL_STOPPED, &ctrl->flags))
+		blk_mq_quiesce_tagset(ctrl->tagset);
+	else
+		blk_mq_wait_quiesce_done(ctrl->tagset);
 }
 EXPORT_SYMBOL_GPL(nvme_stop_queues);
 
 void nvme_start_queues(struct nvme_ctrl *ctrl)
 {
-	struct nvme_ns *ns;
-
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
-		nvme_start_ns_queue(ns);
-	up_read(&ctrl->namespaces_rwsem);
+	if (test_and_clear_bit(NVME_CTRL_STOPPED, &ctrl->flags))
+		blk_mq_unquiesce_tagset(ctrl->tagset);
 }
 EXPORT_SYMBOL_GPL(nvme_start_queues);
 
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 1cd7168002438..f9df10653f3c5 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -237,6 +237,7 @@ enum nvme_ctrl_flags {
 	NVME_CTRL_FAILFAST_EXPIRED	= 0,
 	NVME_CTRL_ADMIN_Q_STOPPED	= 1,
 	NVME_CTRL_STARTED_ONCE		= 2,
+	NVME_CTRL_STOPPED		= 3,
 };
 
 struct nvme_ctrl {
@@ -486,7 +487,6 @@ struct nvme_ns {
 #define NVME_NS_ANA_PENDING	2
 #define NVME_NS_FORCE_RO	3
 #define NVME_NS_READY		4
-#define NVME_NS_STOPPED		5
 
 	struct cdev		cdev;
 	struct device		cdev_device;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: per-tagset SRCU struct and quiesce v3
  2022-11-01 15:00 per-tagset SRCU struct and quiesce v3 Christoph Hellwig
                   ` (13 preceding siblings ...)
  2022-11-01 15:00 ` [PATCH 14/14] nvme: use blk_mq_[un]quiesce_tagset Christoph Hellwig
@ 2022-11-01 15:25 ` Keith Busch
  2022-11-02  1:21 ` Chao Leng
  2022-11-02  6:49 ` Christoph Hellwig
  16 siblings, 0 replies; 38+ messages in thread
From: Keith Busch @ 2022-11-01 15:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Chao Leng, Ming Lei, linux-nvme, linux-block

The series looks good!

Reviewed-by: Keith Busch <kbusch@kernel.org>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 01/14] block: set the disk capacity to 0 in blk_mark_disk_dead
  2022-11-01 15:00 ` [PATCH 01/14] block: set the disk capacity to 0 in blk_mark_disk_dead Christoph Hellwig
@ 2022-11-02  1:18   ` Chaitanya Kulkarni
  2022-11-02 14:35   ` Jens Axboe
  1 sibling, 0 replies; 38+ messages in thread
From: Chaitanya Kulkarni @ 2022-11-02  1:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ming Lei, Chao Leng, Jens Axboe, linux-nvme, linux-block,
	Keith Busch, Sagi Grimberg

On 11/1/22 08:00, Christoph Hellwig wrote:
> nvme and xen-blkfront are already doing this to stop buffered writes from
> creating dirty pages that can't be written out later.  Move it to the
> common code.
> 
> This also removes the comment about the ordering from nvme: not only is
> bd_mutex gone entirely, it also had not been used to lock updates to the
> disk size for a long time before that, so the ordering requirement
> documented there no longer applies.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Keith Busch <kbusch@kernel.org>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> Reviewed-by: Chao Leng <lengchao@huawei.com>
> ---

Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 02/14] nvme-pci: refactor the tagset handling in nvme_reset_work
  2022-11-01 15:00 ` [PATCH 02/14] nvme-pci: refactor the tagset handling in nvme_reset_work Christoph Hellwig
@ 2022-11-02  1:19   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 38+ messages in thread
From: Chaitanya Kulkarni @ 2022-11-02  1:19 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ming Lei, Keith Busch, Jens Axboe, Chao Leng, Sagi Grimberg,
	linux-nvme, linux-block

On 11/1/22 08:00, Christoph Hellwig wrote:
> The code to create, update or delete a tagset and namespaces in
> nvme_reset_work is a bit convoluted.  Refactor it with two high-level
> conditionals, one for first probe vs reset and one for I/O queues vs no
> I/O queues, to make the code flow clearer.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 03/14] nvme: don't remove namespaces in nvme_passthru_end
  2022-11-01 15:00 ` [PATCH 03/14] nvme: don't remove namespaces in nvme_passthru_end Christoph Hellwig
@ 2022-11-02  1:20   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 38+ messages in thread
From: Chaitanya Kulkarni @ 2022-11-02  1:20 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

On 11/1/22 08:00, Christoph Hellwig wrote:
> The call to nvme_remove_invalid_namespaces made sense when
> nvme_passthru_end revalidated all namespaces and had to remove those that
> didn't exist any more.  Since we don't revalidate from nvme_passthru_end
> now, this call is entirely spurious.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> ---

Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 04/14] nvme: remove the NVME_NS_DEAD check in nvme_remove_invalid_namespaces
  2022-11-01 15:00 ` [PATCH 04/14] nvme: remove the NVME_NS_DEAD check in nvme_remove_invalid_namespaces Christoph Hellwig
@ 2022-11-02  1:21   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 38+ messages in thread
From: Chaitanya Kulkarni @ 2022-11-02  1:21 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

On 11/1/22 08:00, Christoph Hellwig wrote:
> The NVME_NS_DEAD check only made sense when we revalidated namespaces
> in nvme_passthru_end for commands that affected the namespace inventory.
> These days NVME_NS_DEAD is only set during reset or when tearing down
> namespaces, and we always remove all namespaces right after that.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> ---

Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck




^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 05/14] nvme: remove the NVME_NS_DEAD check in nvme_validate_ns
  2022-11-01 15:00 ` [PATCH 05/14] nvme: remove the NVME_NS_DEAD check in nvme_validate_ns Christoph Hellwig
@ 2022-11-02  1:21   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 38+ messages in thread
From: Chaitanya Kulkarni @ 2022-11-02  1:21 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

On 11/1/22 08:00, Christoph Hellwig wrote:
> At the point where namespaces are marked dead, the controller is in a
> non-live state and we won't get past the identify commands.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck




^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: per-tagset SRCU struct and quiesce v3
  2022-11-01 15:00 per-tagset SRCU struct and quiesce v3 Christoph Hellwig
                   ` (14 preceding siblings ...)
  2022-11-01 15:25 ` per-tagset SRCU struct and quiesce v3 Keith Busch
@ 2022-11-02  1:21 ` Chao Leng
  2022-11-02  6:49 ` Christoph Hellwig
  16 siblings, 0 replies; 38+ messages in thread
From: Chao Leng @ 2022-11-02  1:21 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: Ming Lei, linux-nvme, linux-block

The series looks good!

Reviewed-by: Chao Leng <lengchao@huawei.com>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 06/14] nvme: don't unquiesce the admin queue in nvme_kill_queues
  2022-11-01 15:00 ` [PATCH 06/14] nvme: don't unquiesce the admin queue in nvme_kill_queues Christoph Hellwig
@ 2022-11-02  1:24   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 38+ messages in thread
From: Chaitanya Kulkarni @ 2022-11-02  1:24 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

On 11/1/22 08:00, Christoph Hellwig wrote:
> None of the callers of nvme_kill_queues needs it to unquiesce the
> admin queues, as all of them already do it themselves:
> 
>   1) nvme_reset_work explicitly calls nvme_start_admin_queue toward the
>      beginning of the function.  The extra call to nvme_start_admin_queue
>      in nvme_reset_work thus won't do anything as
>      NVME_CTRL_ADMIN_Q_STOPPED will already be cleared.
>   2) nvme_remove calls nvme_dev_disable with shutdown flag set to true at
>      the very beginning of the function if the PCIe device was not present,
>      which is the precondition for the call to nvme_kill_queues.
>      nvme_dev_disable already calls nvme_start_admin_queue toward the
>      end of the function when the shutdown flag is set to true, so the
>      admin queue is already enabled at this point.
>   3) nvme_remove_dead_ctrl schedules a workqueue to unbind the driver,
>      which will end up in nvme_remove, which calls nvme_dev_disable with
>      the shutdown flag.  This case will call nvme_start_admin_queue a bit
>      later than before.
>   4) apple_nvme_remove uses the same sequence as nvme_remove_dead_ctrl
>      above.
>   5) nvme_remove_namespaces only calls nvme_kill_queues when the
>      controller is in the DEAD state.  That can only happen in the PCIe
>      driver, and only from nvme_remove. See item 2) above for the
>      conditions there.
> 
> So it is safe to just remove the call to nvme_start_admin_queue in
> nvme_kill_queues without replacement.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> ---
>   drivers/nvme/host/core.c | 4 ----


Thanks a lot for the detailed explanation.
Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 07/14] nvme: split nvme_kill_queues
  2022-11-01 15:00 ` [PATCH 07/14] nvme: split nvme_kill_queues Christoph Hellwig
@ 2022-11-02  1:25   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 38+ messages in thread
From: Chaitanya Kulkarni @ 2022-11-02  1:25 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

On 11/1/22 08:00, Christoph Hellwig wrote:
> nvme_kill_queues does two things:
> 
>   1) mark the gendisk of all namespaces dead
>   2) unquiesce all I/O queues
> 
> These used to be intertwined due to block layer issues, but aren't
> any more.  So move the unquiescing of the I/O queues into the callers,
> and rename the rest of the function to the now more descriptive
> nvme_mark_namespaces_dead.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> ---
>   drivers/nvme/host/apple.c |  3 ++-
>   drivers/nvme/host/core.c  | 36 ++++++++----------------------------
>   drivers/nvme/host/nvme.h  |  3 +--
>   drivers/nvme/host/pci.c   |  6 ++++--
>   4 files changed, 15 insertions(+), 33 deletions(-)
> 

Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
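
(As a rough illustration of the split described above, the renamed helper
could look something like the sketch below.  The namespace list and rwsem
names are assumptions based on the existing nvme core code, not a quote of
the committed patch.)

	void nvme_mark_namespaces_dead(struct nvme_ctrl *ctrl)
	{
		struct nvme_ns *ns;

		/* only mark the gendisks dead; callers now unquiesce the queues */
		down_read(&ctrl->namespaces_rwsem);
		list_for_each_entry(ns, &ctrl->namespaces, list)
			blk_mark_disk_dead(ns->disk);
		up_read(&ctrl->namespaces_rwsem);
	}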





^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 08/14] nvme-pci: don't unquiesce the I/O queues in nvme_remove_dead_ctrl
  2022-11-01 15:00 ` [PATCH 08/14] nvme-pci: don't unquiesce the I/O queues in nvme_remove_dead_ctrl Christoph Hellwig
@ 2022-11-02  1:26   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 38+ messages in thread
From: Chaitanya Kulkarni @ 2022-11-02  1:26 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

On 11/1/22 08:00, Christoph Hellwig wrote:
> nvme_remove_dead_ctrl schedules nvme_remove to be called, which will
> call nvme_dev_disable and unquiesce the I/O queues.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 09/14] nvme-apple: don't unquiesce the I/O queues in apple_nvme_reset_work
  2022-11-01 15:00 ` [PATCH 09/14] nvme-apple: don't unquiesce the I/O queues in apple_nvme_reset_work Christoph Hellwig
@ 2022-11-02  5:47   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 38+ messages in thread
From: Chaitanya Kulkarni @ 2022-11-02  5:47 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

On 11/1/22 08:00, Christoph Hellwig wrote:
> apple_nvme_reset_work schedules apple_nvme_remove to be called, which
> will call apple_nvme_disable and unquiesce the I/O queues.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> ---
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 10/14] blk-mq: skip non-mq queues in blk_mq_quiesce_queue
  2022-11-01 15:00 ` [PATCH 10/14] blk-mq: skip non-mq queues in blk_mq_quiesce_queue Christoph Hellwig
@ 2022-11-02  5:47   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 38+ messages in thread
From: Chaitanya Kulkarni @ 2022-11-02  5:47 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block, Hannes Reinecke

On 11/1/22 08:00, Christoph Hellwig wrote:
> For submit_bio based queues there is no (S)RCU critical section during
> I/O submission and thus nothing to wait for in blk_mq_wait_quiesce_done,
> so skip doing any synchronization.  No non-mq driver should be calling
> this, but for now we have core callers that unconditionally call into it.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Keith Busch <kbusch@kernel.org>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> ---
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 11/14] blk-mq: move the srcu_struct used for quiescing to the tagset
  2022-11-01 15:00 ` [PATCH 11/14] blk-mq: move the srcu_struct used for quiescing to the tagset Christoph Hellwig
@ 2022-11-02  5:49   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 38+ messages in thread
From: Chaitanya Kulkarni @ 2022-11-02  5:49 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block, Hannes Reinecke

On 11/1/22 08:00, Christoph Hellwig wrote:
> All I/O submissions have fairly similar latencies, and a tagset-wide
> quiesce is a fairly common operation.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Keith Busch <kbusch@kernel.org>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> Reviewed-by: Chao Leng <lengchao@huawei.com>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> ---

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 12/14] blk-mq: pass a tagset to blk_mq_wait_quiesce_done
  2022-11-01 15:00 ` [PATCH 12/14] blk-mq: pass a tagset to blk_mq_wait_quiesce_done Christoph Hellwig
@ 2022-11-02  5:50   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 38+ messages in thread
From: Chaitanya Kulkarni @ 2022-11-02  5:50 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block, Hannes Reinecke

On 11/1/22 08:00, Christoph Hellwig wrote:
> Nothing in blk_mq_wait_quiesce_done needs the request_queue now, so just
> pass the tagset, and move the non-mq check into the only caller that
> needs it.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Keith Busch <kbusch@kernel.org>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> Reviewed-by: Chao Leng <lengchao@huawei.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> ---

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 13/14] blk-mq: add tagset quiesce interface
  2022-11-01 15:00 ` [PATCH 13/14] blk-mq: add tagset quiesce interface Christoph Hellwig
@ 2022-11-02  5:51   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 38+ messages in thread
From: Chaitanya Kulkarni @ 2022-11-02  5:51 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block, Hannes Reinecke

On 11/1/22 08:00, Christoph Hellwig wrote:
> From: Chao Leng <lengchao@huawei.com>
> 
> Drivers that have shared tagsets may need to quiesce potentially a lot
> of request queues that all share a single tagset (e.g. nvme). Add an
> interface to quiesce all the queues on a given tagset. This interface is
> useful because it can speed up the quiesce by doing it in parallel.
> 
> Because some queues should not need to be quiesced (e.g. the nvme
> connect_q) when quiescing the tagset, introduce a
> QUEUE_FLAG_SKIP_TAGSET_QUIESCE flag to allow this new interface to
> skip quiescing a particular queue.
> 
> Signed-off-by: Chao Leng <lengchao@huawei.com>
> [hch: simplify for the per-tag_set srcu_struct]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Keith Busch <kbusch@kernel.org>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> Reviewed-by: Chao Leng <lengchao@huawei.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> ---
>

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck
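
(For readers following the interface description above, here is a minimal
sketch of what the tagset-wide quiesce helper could look like.  It assumes
the tag_set keeps its queues on a tag_list protected by tag_list_lock and
that blk_queue_skip_tagset_quiesce() tests the new queue flag; treat this
as illustrative rather than the committed code.)

	void blk_mq_quiesce_tagset(struct blk_mq_tag_set *set)
	{
		struct request_queue *q;

		mutex_lock(&set->tag_list_lock);
		list_for_each_entry(q, &set->tag_list, tag_set_list) {
			/* e.g. the nvme connect_q opts out via the new flag */
			if (!blk_queue_skip_tagset_quiesce(q))
				blk_mq_quiesce_queue_nowait(q);
		}
		/* a single (S)RCU grace period covers all queues in the set */
		blk_mq_wait_quiesce_done(set);
		mutex_unlock(&set->tag_list_lock);
	}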



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 14/14] nvme: use blk_mq_[un]quiesce_tagset
  2022-11-01 15:00 ` [PATCH 14/14] nvme: use blk_mq_[un]quiesce_tagset Christoph Hellwig
@ 2022-11-02  5:51   ` Chaitanya Kulkarni
  2022-11-21 20:44   ` Shakeel Butt
  1 sibling, 0 replies; 38+ messages in thread
From: Chaitanya Kulkarni @ 2022-11-02  5:51 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block, Hannes Reinecke

On 11/1/22 08:00, Christoph Hellwig wrote:
> From: Chao Leng <lengchao@huawei.com>
> 
> All controller namespaces share the same tagset, so we can use this
> interface, which does the optimal operation for parallel quiesce based on
> the tagset type (e.g. blocking tagsets and non-blocking tagsets).
> 
> nvme connect_q should not be quiesced when quiescing the tagset, so set
> the QUEUE_FLAG_SKIP_TAGSET_QUIESCE flag to skip it when initializing
> connect_q.
> 
> Currently we use NVME_NS_STOPPED to ensure pairing of quiescing and
> unquiescing. If we use blk_mq_[un]quiesce_tagset, NVME_NS_STOPPED becomes
> invalid, so introduce NVME_CTRL_STOPPED to replace NVME_NS_STOPPED.
> In addition, we never really quiesce a single namespace. It is a better
> choice to move the flag from the namespace to the controller.
> 
> Signed-off-by: Chao Leng <lengchao@huawei.com>
> [hch: rebased on top of prep patches]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Keith Busch <kbusch@kernel.org>
> Reviewed-by: Chao Leng <lengchao@huawei.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> ---

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck
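
(A minimal sketch of how the NVME_CTRL_STOPPED pairing described above could
be wired up, assuming a ctrl->flags bitfield and the tagset helpers from the
previous patch; illustrative only, not a quote of the committed patch.)

	void nvme_stop_queues(struct nvme_ctrl *ctrl)
	{
		/* only the first caller quiesces; later callers just wait */
		if (!test_and_set_bit(NVME_CTRL_STOPPED, &ctrl->flags))
			blk_mq_quiesce_tagset(ctrl->tagset);
		else
			blk_mq_wait_quiesce_done(ctrl->tagset);
	}

	void nvme_start_queues(struct nvme_ctrl *ctrl)
	{
		/* unquiesce only if we actually quiesced earlier */
		if (test_and_clear_bit(NVME_CTRL_STOPPED, &ctrl->flags))
			blk_mq_unquiesce_tagset(ctrl->tagset);
	}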



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: per-tagset SRCU struct and quiesce v3
  2022-11-01 15:00 per-tagset SRCU struct and quiesce v3 Christoph Hellwig
                   ` (15 preceding siblings ...)
  2022-11-02  1:21 ` Chao Leng
@ 2022-11-02  6:49 ` Christoph Hellwig
  16 siblings, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2022-11-02  6:49 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

Looks like this version is acceptable now, which brings up the question
of how to merge it.  Most changes are in nvme, but it adds a new block
layer interface that we'd also like to use in scsi.  Maybe the best
thing is if Jens picks it up directly?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 01/14] block: set the disk capacity to 0 in blk_mark_disk_dead
  2022-11-01 15:00 ` [PATCH 01/14] block: set the disk capacity to 0 in blk_mark_disk_dead Christoph Hellwig
  2022-11-02  1:18   ` Chaitanya Kulkarni
@ 2022-11-02 14:35   ` Jens Axboe
  1 sibling, 0 replies; 38+ messages in thread
From: Jens Axboe @ 2022-11-02 14:35 UTC (permalink / raw)
  To: Chao Leng, Sagi Grimberg, Keith Busch, Christoph Hellwig
  Cc: linux-nvme, Ming Lei, linux-block

On Tue, 1 Nov 2022 16:00:37 +0100, Christoph Hellwig wrote:
> nvme and xen-blkfront are already doing this to stop buffered writes from
> creating dirty pages that can't be written out later.  Move it to the
> common code.
> 
> This also removes the comment about the ordering from nvme, as bd_mutex
> not only is gone entirely, but also hasn't been used for locking updates
> to the disk size long before that, and thus the ordering requirement
> documented there doesn't apply any more.
> 
> [...]

Applied, thanks!

[01/14] block: set the disk capacity to 0 in blk_mark_disk_dead
        commit: 71b26083d59cd4ab22489829ffe7d4ead93f5546
[02/14] nvme-pci: refactor the tagset handling in nvme_reset_work
        commit: 0ffc7e98bfaa45380b800deeb9b65ce0371c652d
[03/14] nvme: don't remove namespaces in nvme_passthru_end
        commit: 23a908647efade186576c9628dd7bb560f6e759b
[04/14] nvme: remove the NVME_NS_DEAD check in nvme_remove_invalid_namespaces
        commit: 4f17344e9daeb6e9f89976d811a5373710ed1f04
[05/14] nvme: remove the NVME_NS_DEAD check in nvme_validate_ns
        commit: fde776afdd8467a09395a7aebdb2499f86315945
[06/14] nvme: don't unquiesce the admin queue in nvme_kill_queues
        commit: 6bcd5089ee1302e9ad7072ca0866f0c5a1158359
[07/14] nvme: split nvme_kill_queues
        commit: cd50f9b24726e9e195a0682c8d8d952396d57aef
[08/14] nvme-pci: don't unquiesce the I/O queues in nvme_remove_dead_ctrl
        commit: bad3e021ae2bb5ac9d650c9a04788efe753367f3
[09/14] nvme-apple: don't unquiesce the I/O queues in apple_nvme_reset_work
        commit: 2b4c2355c5e155cdf341d9ce2c2355b4b26c32c9
[10/14] blk-mq: skip non-mq queues in blk_mq_quiesce_queue
        commit: 8537380bb9882c201db60a1eb201aac6e74083e8
[11/14] blk-mq: move the srcu_struct used for quiescing to the tagset
        commit: 80bd4a7aab4c9ce59bf5e35fdf52aa23d8a3c9f5
[12/14] blk-mq: pass a tagset to blk_mq_wait_quiesce_done
        commit: 483239c75ba768e0e2c0e0c503e5fc13c3d5773a
[13/14] blk-mq: add tagset quiesce interface
        commit: 414dd48e882c5a39e7bd01b096ee6497eb3314b0
[14/14] nvme: use blk_mq_[un]quiesce_tagset
        commit: 98d81f0df70ce6fc48517d938026e3c684b9051a

Best regards,
-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 14/14] nvme: use blk_mq_[un]quiesce_tagset
  2022-11-01 15:00 ` [PATCH 14/14] nvme: use blk_mq_[un]quiesce_tagset Christoph Hellwig
  2022-11-02  5:51   ` Chaitanya Kulkarni
@ 2022-11-21 20:44   ` Shakeel Butt
  2022-11-22  2:53     ` Chao Leng
  1 sibling, 1 reply; 38+ messages in thread
From: Shakeel Butt @ 2022-11-21 20:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng, Ming Lei,
	linux-nvme, linux-block, Hannes Reinecke

On Tue, Nov 01, 2022 at 04:00:50PM +0100, Christoph Hellwig wrote:
> From: Chao Leng <lengchao@huawei.com>
> 
> All controller namespaces share the same tagset, so we can use this
> interface, which does the optimal operation for parallel quiesce based on
> the tagset type (e.g. blocking tagsets and non-blocking tagsets).
> 
> nvme connect_q should not be quiesced when quiescing the tagset, so set
> the QUEUE_FLAG_SKIP_TAGSET_QUIESCE flag to skip it when initializing
> connect_q.
> 
> Currently we use NVME_NS_STOPPED to ensure pairing of quiescing and
> unquiescing. If we use blk_mq_[un]quiesce_tagset, NVME_NS_STOPPED becomes
> invalid, so introduce NVME_CTRL_STOPPED to replace NVME_NS_STOPPED.
> In addition, we never really quiesce a single namespace. It is a better
> choice to move the flag from the namespace to the controller.
> 
> Signed-off-by: Chao Leng <lengchao@huawei.com>
> [hch: rebased on top of prep patches]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Keith Busch <kbusch@kernel.org>
> Reviewed-by: Chao Leng <lengchao@huawei.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

This patch is causing the following crash at boot, and reverting it
fixes the issue. This is the next-20221121 kernel.

[   19.781883] BUG: kernel NULL pointer dereference, address: 0000000000000078
[   19.788761] #PF: supervisor write access in kernel mode
[   19.793924] #PF: error_code(0x0002) - not-present page
[   19.799006] PGD 0 P4D 0
[   19.801510] Oops: 0002 [#1] SMP PTI
[   19.804955] CPU: 54 PID: 110 Comm: kworker/u146:0 Tainted: G          I        6.1.0-next-20221121-smp-DEV #1
[   19.814753] Hardware name: ...
[   19.822662] Workqueue: nvme-reset-wq nvme_reset_work
[   19.827571] RIP: 0010:mutex_lock+0x13/0x30
[   19.831617] Code: 07 00 00 f7 c1 00 01 00 00 75 02 31 c0 5b 5d c3 00 00 cc cc 00 00 cc 0f 1f 44 00 00 55 48 89 e5 65 48 8b 0d 7f 9a c2 52 31 c0 <f0> 48 0f b1 0f 74 05 e8 11 00 00 00 5d c3 66 66 66 66 66 66 2e 0f
[   19.850172] RSP: 0000:ffff96624651bcb0 EFLAGS: 00010246
[   19.855339] RAX: 0000000000000000 RBX: 0000000000000003 RCX: ffff966246530000
[   19.862388] RDX: 0000000000000000 RSI: 0000000000000286 RDI: 0000000000000078
[   19.869442] RBP: ffff96624651bcb0 R08: 0000000000000004 R09: ffff96624651bc74
[   19.876495] R10: 0000000000000000 R11: 0000000000000000 R12: ffff966246570000
[   19.883547] R13: ffff966246570000 R14: ffff966249f330b0 R15: 0000000000000000
[   19.890599] FS:  0000000000000000(0000) GS:ffff9681bf880000(0000) knlGS:0000000000000000
[   19.898594] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   19.904273] CR2: 0000000000000078 CR3: 0000003c40e0a001 CR4: 00000000001706e0
[   19.911323] Call Trace:
[   19.913743]  <TASK>
[   19.915818]  blk_mq_quiesce_tagset+0x23/0xc0
[   19.920039]  nvme_stop_queues+0x1e/0x30
[   19.923829]  nvme_dev_disable+0xe1/0x3a0
[   19.927702]  nvme_reset_work+0x14cb/0x1600
[   19.931751]  ? ttwu_queue_wakelist+0xc4/0xd0
[   19.935970]  ? try_to_wake_up+0x1b4/0x2d0
[   19.939933]  process_one_work+0x1a4/0x320
[   19.943895]  worker_thread+0x241/0x3f0
[   19.947597]  ? worker_clr_flags+0x50/0x50
[   19.951560]  kthread+0x10d/0x120
[   19.954748]  ? kthread_blkcg+0x30/0x30
[   19.958453]  ret_from_fork+0x1f/0x30
[   19.961987]  </TASK>
[   19.964145] Modules linked in:
[   19.967655] gsmi: Log Shutdown Reason 0x03
[   19.971700] CR2: 0000000000000078
[   19.974978] ---[ end trace 0000000000000000 ]---
[   20.513708] RIP: 0010:mutex_lock+0x13/0x30
[   20.517757] Code: 07 00 00 f7 c1 00 01 00 00 75 02 31 c0 5b 5d c3 00 00 cc cc 00 00 cc 0f 1f 44 00 00 55 48 89 e5 65 48 8b 0d 7f 9a c2 52 31 c0 <f0> 48 0f b1 0f 74 05 e8 11 00 00 00 5d c3 66 66 66 66 66 66 2e 0f
[   20.536308] RSP: 0000:ffff96624651bcb0 EFLAGS: 00010246
[   20.541469] RAX: 0000000000000000 RBX: 0000000000000003 RCX: ffff966246530000
[   20.548519] RDX: 0000000000000000 RSI: 0000000000000286 RDI: 0000000000000078
[   20.555568] RBP: ffff96624651bcb0 R08: 0000000000000004 R09: ffff96624651bc74
[   20.562614] R10: 0000000000000000 R11: 0000000000000000 R12: ffff966246570000
[   20.569663] R13: ffff966246570000 R14: ffff966249f330b0 R15: 0000000000000000
[   20.576713] FS:  0000000000000000(0000) GS:ffff9681bf880000(0000) knlGS:0000000000000000
[   20.584704] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   20.590380] CR2: 0000000000000078 CR3: 0000003c40e0a001 CR4: 00000000001706e0
[   20.597432] Kernel panic - not syncing: Fatal exception
[   20.602654] Kernel Offset: 0x2b800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[   20.613747] gsmi: Log Shutdown Reason 0x02


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 14/14] nvme: use blk_mq_[un]quiesce_tagset
  2022-11-21 20:44   ` Shakeel Butt
@ 2022-11-22  2:53     ` Chao Leng
  2022-11-22  6:08       ` Christoph Hellwig
  0 siblings, 1 reply; 38+ messages in thread
From: Chao Leng @ 2022-11-22  2:53 UTC (permalink / raw)
  To: Shakeel Butt, Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, Sagi Grimberg, Ming Lei, linux-nvme,
	linux-block, Hannes Reinecke



On 2022/11/22 4:44, Shakeel Butt wrote:
> On Tue, Nov 01, 2022 at 04:00:50PM +0100, Christoph Hellwig wrote:
>> From: Chao Leng <lengchao@huawei.com>
>>
>> All controller namespaces share the same tagset, so we can use this
>> interface, which does the optimal operation for parallel quiesce based on
>> the tagset type (e.g. blocking tagsets and non-blocking tagsets).
>>
>> nvme connect_q should not be quiesced when quiescing the tagset, so set
>> the QUEUE_FLAG_SKIP_TAGSET_QUIESCE flag to skip it when initializing
>> connect_q.
>>
>> Currently we use NVME_NS_STOPPED to ensure pairing of quiescing and
>> unquiescing. If we use blk_mq_[un]quiesce_tagset, NVME_NS_STOPPED becomes
>> invalid, so introduce NVME_CTRL_STOPPED to replace NVME_NS_STOPPED.
>> In addition, we never really quiesce a single namespace. It is a better
>> choice to move the flag from the namespace to the controller.
>>
>> Signed-off-by: Chao Leng <lengchao@huawei.com>
>> [hch: rebased on top of prep patches]
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> Reviewed-by: Keith Busch <kbusch@kernel.org>
>> Reviewed-by: Chao Leng <lengchao@huawei.com>
>> Reviewed-by: Hannes Reinecke <hare@suse.de>
>> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> 
> This patch is causing the following crash at boot, and reverting it
> fixes the issue. This is the next-20221121 kernel.
This patch can fix it.
https://lore.kernel.org/linux-nvme/20221116072711.1903536-1-hch@lst.de/

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 14/14] nvme: use blk_mq_[un]quiesce_tagset
  2022-11-22  2:53     ` Chao Leng
@ 2022-11-22  6:08       ` Christoph Hellwig
  2022-11-22  9:44         ` Sagi Grimberg
  2022-11-22 16:40         ` Shakeel Butt
  0 siblings, 2 replies; 38+ messages in thread
From: Christoph Hellwig @ 2022-11-22  6:08 UTC (permalink / raw)
  To: Chao Leng
  Cc: Shakeel Butt, Christoph Hellwig, Jens Axboe, Keith Busch,
	Sagi Grimberg, Ming Lei, linux-nvme, linux-block,
	Hannes Reinecke

On Tue, Nov 22, 2022 at 10:53:17AM +0800, Chao Leng wrote:
>> This patch is causing the following crash at boot, and reverting it
>> fixes the issue. This is the next-20221121 kernel.
> This patch can fix it.
> https://lore.kernel.org/linux-nvme/20221116072711.1903536-1-hch@lst.de/

Yes, but I'm a little curious how it happened.  Shakeel, do you
have a genuine admin controller that does not have any I/O queues
to start with, or did we have other setup time errors?  Can you post the
full dmesg?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 14/14] nvme: use blk_mq_[un]quiesce_tagset
  2022-11-22  6:08       ` Christoph Hellwig
@ 2022-11-22  9:44         ` Sagi Grimberg
  2022-11-22 16:40         ` Shakeel Butt
  1 sibling, 0 replies; 38+ messages in thread
From: Sagi Grimberg @ 2022-11-22  9:44 UTC (permalink / raw)
  To: Christoph Hellwig, Chao Leng
  Cc: Shakeel Butt, Jens Axboe, Keith Busch, Ming Lei, linux-nvme,
	linux-block, Hannes Reinecke


>>> This patch is causing the following crash at boot, and reverting it
>>> fixes the issue. This is the next-20221121 kernel.
>> This patch can fix it.
>> https://lore.kernel.org/linux-nvme/20221116072711.1903536-1-hch@lst.de/
> 
> Yes, but I'm a little curious how it happened.  Shakeel, do you
> have a genuine admin controller that does not have any I/O queues
> to start with, or did we have other setup time errors?  Can you post the
> full dmesg?

Is this after splitting reset from probe?

If not, I think that probe schedules nvme_reset, which first
disables the controller, calling nvme_stop_queues unconditionally,
i.e. before the tagset was allocated.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 14/14] nvme: use blk_mq_[un]quiesce_tagset
  2022-11-22  6:08       ` Christoph Hellwig
  2022-11-22  9:44         ` Sagi Grimberg
@ 2022-11-22 16:40         ` Shakeel Butt
  1 sibling, 0 replies; 38+ messages in thread
From: Shakeel Butt @ 2022-11-22 16:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chao Leng, Jens Axboe, Keith Busch, Sagi Grimberg, Ming Lei,
	linux-nvme, linux-block, Hannes Reinecke

On Mon, Nov 21, 2022 at 10:08 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Tue, Nov 22, 2022 at 10:53:17AM +0800, Chao Leng wrote:
> >> This patch is causing the following crash at boot, and reverting it
> >> fixes the issue. This is the next-20221121 kernel.
> > This patch can fix it.
> > https://lore.kernel.org/linux-nvme/20221116072711.1903536-1-hch@lst.de/
>
> Yes, but I'm a little curious how it happened.  Shakeel, do you
> have a genuine admin controller that does not have any I/O queues
> to start with, or did we have other setup time errors?  Can you post the
> full dmesg?

Sorry, I don't know about the admin controller and its I/O queues. For
dmesg, it was an early crash, so not in /var/log/message. I was able
to get the crash dump through the remote serial console but there is
no easy way to get it to give full dmesg. I will see if I can get more
info.

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2022-11-22 16:40 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-01 15:00 per-tagset SRCU struct and quiesce v3 Christoph Hellwig
2022-11-01 15:00 ` [PATCH 01/14] block: set the disk capacity to 0 in blk_mark_disk_dead Christoph Hellwig
2022-11-02  1:18   ` Chaitanya Kulkarni
2022-11-02 14:35   ` Jens Axboe
2022-11-01 15:00 ` [PATCH 02/14] nvme-pci: refactor the tagset handling in nvme_reset_work Christoph Hellwig
2022-11-02  1:19   ` Chaitanya Kulkarni
2022-11-01 15:00 ` [PATCH 03/14] nvme: don't remove namespaces in nvme_passthru_end Christoph Hellwig
2022-11-02  1:20   ` Chaitanya Kulkarni
2022-11-01 15:00 ` [PATCH 04/14] nvme: remove the NVME_NS_DEAD check in nvme_remove_invalid_namespaces Christoph Hellwig
2022-11-02  1:21   ` Chaitanya Kulkarni
2022-11-01 15:00 ` [PATCH 05/14] nvme: remove the NVME_NS_DEAD check in nvme_validate_ns Christoph Hellwig
2022-11-02  1:21   ` Chaitanya Kulkarni
2022-11-01 15:00 ` [PATCH 06/14] nvme: don't unquiesce the admin queue in nvme_kill_queues Christoph Hellwig
2022-11-02  1:24   ` Chaitanya Kulkarni
2022-11-01 15:00 ` [PATCH 07/14] nvme: split nvme_kill_queues Christoph Hellwig
2022-11-02  1:25   ` Chaitanya Kulkarni
2022-11-01 15:00 ` [PATCH 08/14] nvme-pci: don't unquiesce the I/O queues in nvme_remove_dead_ctrl Christoph Hellwig
2022-11-02  1:26   ` Chaitanya Kulkarni
2022-11-01 15:00 ` [PATCH 09/14] nvme-apple: don't unquiesce the I/O queues in apple_nvme_reset_work Christoph Hellwig
2022-11-02  5:47   ` Chaitanya Kulkarni
2022-11-01 15:00 ` [PATCH 10/14] blk-mq: skip non-mq queues in blk_mq_quiesce_queue Christoph Hellwig
2022-11-02  5:47   ` Chaitanya Kulkarni
2022-11-01 15:00 ` [PATCH 11/14] blk-mq: move the srcu_struct used for quiescing to the tagset Christoph Hellwig
2022-11-02  5:49   ` Chaitanya Kulkarni
2022-11-01 15:00 ` [PATCH 12/14] blk-mq: pass a tagset to blk_mq_wait_quiesce_done Christoph Hellwig
2022-11-02  5:50   ` Chaitanya Kulkarni
2022-11-01 15:00 ` [PATCH 13/14] blk-mq: add tagset quiesce interface Christoph Hellwig
2022-11-02  5:51   ` Chaitanya Kulkarni
2022-11-01 15:00 ` [PATCH 14/14] nvme: use blk_mq_[un]quiesce_tagset Christoph Hellwig
2022-11-02  5:51   ` Chaitanya Kulkarni
2022-11-21 20:44   ` Shakeel Butt
2022-11-22  2:53     ` Chao Leng
2022-11-22  6:08       ` Christoph Hellwig
2022-11-22  9:44         ` Sagi Grimberg
2022-11-22 16:40         ` Shakeel Butt
2022-11-01 15:25 ` per-tagset SRCU struct and quiesce v3 Keith Busch
2022-11-02  1:21 ` Chao Leng
2022-11-02  6:49 ` Christoph Hellwig
