All of lore.kernel.org
* RFC: nvme-pci: split the probe and reset handlers
@ 2022-11-08 15:02 ` Christoph Hellwig
  0 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2022-11-08 15:02 UTC (permalink / raw)
  To: Keith Busch
  Cc: Sagi Grimberg, Chaitanya Kulkarni, Gerd Bayer, asahi,
	linux-arm-kernel, linux-nvme

Hi all,

this series splits the nvme-pci probe handler off from the reset
handler.  I've been wanting to do that for a while, but the bug report
from Gerd, which was caused by confusion about the controller state
during reset, made it urgent.

Diffstat:
 host/apple.c  |    2 
 host/core.c   |   39 ++++-
 host/fc.c     |    2 
 host/nvme.h   |    7 
 host/pci.c    |  415 +++++++++++++++++++++++++++-------------------------------
 host/rdma.c   |    2 
 host/tcp.c    |    2 
 target/loop.c |    2 
 8 files changed, 237 insertions(+), 234 deletions(-)

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH 01/12] nvme-pci: don't call nvme_init_ctrl_finish from nvme_passthru_end
  2022-11-08 15:02 ` Christoph Hellwig
@ 2022-11-08 15:02   ` Christoph Hellwig
  -1 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2022-11-08 15:02 UTC (permalink / raw)
  To: Keith Busch
  Cc: Sagi Grimberg, Chaitanya Kulkarni, Gerd Bayer, asahi,
	linux-arm-kernel, linux-nvme

nvme_passthru_end can race with a reset, so we should not call
nvme_init_ctrl_finish from here.  Instead just log that the controller
capabilities might have changed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/core.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f94b05c585cbc..706499d4bfefb 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1120,8 +1120,10 @@ void nvme_passthru_end(struct nvme_ctrl *ctrl, u32 effects,
 		mutex_unlock(&ctrl->subsys->lock);
 		mutex_unlock(&ctrl->scan_lock);
 	}
-	if (effects & NVME_CMD_EFFECTS_CCC)
-		nvme_init_ctrl_finish(ctrl);
+	if (effects & NVME_CMD_EFFECTS_CCC) {
+		dev_info(ctrl->device,
+"controller capabilities changed, reset may be required to take effect.\n");
+	}
 	if (effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC)) {
 		nvme_queue_scan(ctrl);
 		flush_work(&ctrl->scan_work);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 02/12] nvme: move OPAL setup from PCIe to core
  2022-11-08 15:02 ` Christoph Hellwig
@ 2022-11-08 15:02   ` Christoph Hellwig
  -1 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2022-11-08 15:02 UTC (permalink / raw)
  To: Keith Busch
  Cc: Sagi Grimberg, Chaitanya Kulkarni, Gerd Bayer, asahi,
	linux-arm-kernel, linux-nvme

Nothing about the TCG Opal support is PCIe transport specific, so move it
to the core code.  For this, nvme_init_ctrl_finish grows a new
was_suspended argument that allows the transport driver to tell the OPAL
code whether the controller came out of a suspend cycle.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/apple.c  |  2 +-
 drivers/nvme/host/core.c   | 25 ++++++++++++++++++++-----
 drivers/nvme/host/fc.c     |  2 +-
 drivers/nvme/host/nvme.h   |  5 +----
 drivers/nvme/host/pci.c    | 14 +-------------
 drivers/nvme/host/rdma.c   |  2 +-
 drivers/nvme/host/tcp.c    |  2 +-
 drivers/nvme/target/loop.c |  2 +-
 8 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
index 24e224c279a41..a85349a7e938c 100644
--- a/drivers/nvme/host/apple.c
+++ b/drivers/nvme/host/apple.c
@@ -1102,7 +1102,7 @@ static void apple_nvme_reset_work(struct work_struct *work)
 		goto out;
 	}
 
-	ret = nvme_init_ctrl_finish(&anv->ctrl);
+	ret = nvme_init_ctrl_finish(&anv->ctrl, false);
 	if (ret)
 		goto out;
 
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 706499d4bfefb..887d1aefbc49f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2180,8 +2180,7 @@ const struct pr_ops nvme_pr_ops = {
 	.pr_clear	= nvme_pr_clear,
 };
 
-#ifdef CONFIG_BLK_SED_OPAL
-int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len,
+static int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len,
 		bool send)
 {
 	struct nvme_ctrl *ctrl = data;
@@ -2198,8 +2197,21 @@ int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len,
 	return __nvme_submit_sync_cmd(ctrl->admin_q, &cmd, NULL, buffer, len,
 			NVME_QID_ANY, 1, 0);
 }
-EXPORT_SYMBOL_GPL(nvme_sec_submit);
-#endif /* CONFIG_BLK_SED_OPAL */
+
+static void nvme_configure_opal(struct nvme_ctrl *ctrl, bool was_suspended)
+{
+	if (!IS_ENABLED(CONFIG_BLK_SED_OPAL))
+		return;
+	if (ctrl->oacs & NVME_CTRL_OACS_SEC_SUPP) {
+		if (!ctrl->opal_dev)
+			ctrl->opal_dev = init_opal_dev(ctrl, &nvme_sec_submit);
+		else if (was_suspended)
+			opal_unlock_from_suspend(ctrl->opal_dev);
+	} else {
+		free_opal_dev(ctrl->opal_dev);
+		ctrl->opal_dev = NULL;
+	}
+}
 
 #ifdef CONFIG_BLK_DEV_ZONED
 static int nvme_report_zones(struct gendisk *disk, sector_t sector,
@@ -3231,7 +3243,7 @@ static int nvme_init_identify(struct nvme_ctrl *ctrl)
  * register in our nvme_ctrl structure.  This should be called as soon as
  * the admin queue is fully up and running.
  */
-int nvme_init_ctrl_finish(struct nvme_ctrl *ctrl)
+int nvme_init_ctrl_finish(struct nvme_ctrl *ctrl, bool was_suspended)
 {
 	int ret;
 
@@ -3262,6 +3274,8 @@ int nvme_init_ctrl_finish(struct nvme_ctrl *ctrl)
 	if (ret < 0)
 		return ret;
 
+	nvme_configure_opal(ctrl, was_suspended);
+
 	if (!ctrl->identified && !nvme_discovery_ctrl(ctrl)) {
 		/*
 		 * Do not return errors unless we are in a controller reset,
@@ -4996,6 +5010,7 @@ static void nvme_free_ctrl(struct device *dev)
 	nvme_auth_stop(ctrl);
 	nvme_auth_free(ctrl);
 	__free_page(ctrl->discard_page);
+	free_opal_dev(ctrl->opal_dev);
 
 	if (subsys) {
 		mutex_lock(&nvme_subsystems_lock);
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 5d57a042dbcad..c42ce4a65e652 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -3106,7 +3106,7 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
 
 	nvme_start_admin_queue(&ctrl->ctrl);
 
-	ret = nvme_init_ctrl_finish(&ctrl->ctrl);
+	ret = nvme_init_ctrl_finish(&ctrl->ctrl, false);
 	if (ret || test_bit(ASSOC_FAILED, &ctrl->flags))
 		goto out_disconnect_admin_queue;
 
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index f9df10653f3c5..124a6412cc5c3 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -735,7 +735,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 void nvme_uninit_ctrl(struct nvme_ctrl *ctrl);
 void nvme_start_ctrl(struct nvme_ctrl *ctrl);
 void nvme_stop_ctrl(struct nvme_ctrl *ctrl);
-int nvme_init_ctrl_finish(struct nvme_ctrl *ctrl);
+int nvme_init_ctrl_finish(struct nvme_ctrl *ctrl, bool was_suspended);
 int nvme_alloc_admin_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
 		const struct blk_mq_ops *ops, unsigned int flags,
 		unsigned int cmd_size);
@@ -747,9 +747,6 @@ void nvme_remove_io_tag_set(struct nvme_ctrl *ctrl);
 
 void nvme_remove_namespaces(struct nvme_ctrl *ctrl);
 
-int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len,
-		bool send);
-
 void nvme_complete_async_event(struct nvme_ctrl *ctrl, __le16 status,
 		volatile union nvme_result *res);
 
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 208c387f1558d..e4f084e12b966 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2772,7 +2772,6 @@ static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
 	nvme_free_tagset(dev);
 	if (dev->ctrl.admin_q)
 		blk_put_queue(dev->ctrl.admin_q);
-	free_opal_dev(dev->ctrl.opal_dev);
 	mempool_destroy(dev->iod_mempool);
 	put_device(dev->dev);
 	kfree(dev->queues);
@@ -2866,21 +2865,10 @@ static void nvme_reset_work(struct work_struct *work)
 	 */
 	dev->ctrl.max_integrity_segments = 1;
 
-	result = nvme_init_ctrl_finish(&dev->ctrl);
+	result = nvme_init_ctrl_finish(&dev->ctrl, was_suspend);
 	if (result)
 		goto out;
 
-	if (dev->ctrl.oacs & NVME_CTRL_OACS_SEC_SUPP) {
-		if (!dev->ctrl.opal_dev)
-			dev->ctrl.opal_dev =
-				init_opal_dev(&dev->ctrl, &nvme_sec_submit);
-		else if (was_suspend)
-			opal_unlock_from_suspend(dev->ctrl.opal_dev);
-	} else {
-		free_opal_dev(dev->ctrl.opal_dev);
-		dev->ctrl.opal_dev = NULL;
-	}
-
 	if (dev->ctrl.oacs & NVME_CTRL_OACS_DBBUF_SUPP) {
 		result = nvme_dbbuf_dma_alloc(dev);
 		if (result)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 6e079abb22ee9..ccd45e5b32986 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -871,7 +871,7 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
 
 	nvme_start_admin_queue(&ctrl->ctrl);
 
-	error = nvme_init_ctrl_finish(&ctrl->ctrl);
+	error = nvme_init_ctrl_finish(&ctrl->ctrl, false);
 	if (error)
 		goto out_quiesce_queue;
 
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 1eed0fc26b3ae..4f8584657bb75 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1949,7 +1949,7 @@ static int nvme_tcp_configure_admin_queue(struct nvme_ctrl *ctrl, bool new)
 
 	nvme_start_admin_queue(ctrl);
 
-	error = nvme_init_ctrl_finish(ctrl);
+	error = nvme_init_ctrl_finish(ctrl, false);
 	if (error)
 		goto out_quiesce_queue;
 
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index b45fe3adf015f..893c50f365c4d 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -377,7 +377,7 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
 
 	nvme_start_admin_queue(&ctrl->ctrl);
 
-	error = nvme_init_ctrl_finish(&ctrl->ctrl);
+	error = nvme_init_ctrl_finish(&ctrl->ctrl, false);
 	if (error)
 		goto out_cleanup_tagset;
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 03/12] nvme: simplify transport specific device attribute handling
  2022-11-08 15:02 ` Christoph Hellwig
@ 2022-11-08 15:02   ` Christoph Hellwig
  -1 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2022-11-08 15:02 UTC (permalink / raw)
  To: Keith Busch
  Cc: Sagi Grimberg, Chaitanya Kulkarni, Gerd Bayer, asahi,
	linux-arm-kernel, linux-nvme

Allow the transport driver to override the attribute groups for the
control device, so that the PCIe driver doesn't have to manually add a
group after device creation and keep track of it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/core.c |  8 ++++++--
 drivers/nvme/host/nvme.h |  2 ++
 drivers/nvme/host/pci.c  | 23 ++++++++---------------
 3 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 887d1aefbc49f..d356f84c1bb91 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3891,10 +3891,11 @@ static umode_t nvme_dev_attrs_are_visible(struct kobject *kobj,
 	return a->mode;
 }
 
-static const struct attribute_group nvme_dev_attrs_group = {
+const struct attribute_group nvme_dev_attrs_group = {
 	.attrs		= nvme_dev_attrs,
 	.is_visible	= nvme_dev_attrs_are_visible,
 };
+EXPORT_SYMBOL_GPL(nvme_dev_attrs_group);
 
 static const struct attribute_group *nvme_dev_attr_groups[] = {
 	&nvme_dev_attrs_group,
@@ -5076,7 +5077,10 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 			ctrl->instance);
 	ctrl->device->class = nvme_class;
 	ctrl->device->parent = ctrl->dev;
-	ctrl->device->groups = nvme_dev_attr_groups;
+	if (ops->dev_attr_groups)
+		ctrl->device->groups = ops->dev_attr_groups;
+	else
+		ctrl->device->groups = nvme_dev_attr_groups;
 	ctrl->device->release = nvme_free_ctrl;
 	dev_set_drvdata(ctrl->device, ctrl);
 	ret = dev_set_name(ctrl->device, "nvme%d", ctrl->instance);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 124a6412cc5c3..ac60f28349be0 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -507,6 +507,7 @@ struct nvme_ctrl_ops {
 	unsigned int flags;
 #define NVME_F_FABRICS			(1 << 0)
 #define NVME_F_METADATA_SUPPORTED	(1 << 1)
+	const struct attribute_group **dev_attr_groups;
 	int (*reg_read32)(struct nvme_ctrl *ctrl, u32 off, u32 *val);
 	int (*reg_write32)(struct nvme_ctrl *ctrl, u32 off, u32 val);
 	int (*reg_read64)(struct nvme_ctrl *ctrl, u32 off, u64 *val);
@@ -853,6 +854,7 @@ int nvme_dev_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags);
 extern const struct attribute_group *nvme_ns_id_attr_groups[];
 extern const struct pr_ops nvme_pr_ops;
 extern const struct block_device_operations nvme_ns_head_ops;
+extern const struct attribute_group nvme_dev_attrs_group;
 
 struct nvme_ns *nvme_find_path(struct nvme_ns_head *head);
 #ifdef CONFIG_NVME_MULTIPATH
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index e4f084e12b966..c8f6ce5eee1c2 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -158,8 +158,6 @@ struct nvme_dev {
 	unsigned int nr_allocated_queues;
 	unsigned int nr_write_queues;
 	unsigned int nr_poll_queues;
-
-	bool attrs_added;
 };
 
 static int io_queue_depth_set(const char *val, const struct kernel_param *kp)
@@ -2234,11 +2232,17 @@ static struct attribute *nvme_pci_attrs[] = {
 	NULL,
 };
 
-static const struct attribute_group nvme_pci_attr_group = {
+static const struct attribute_group nvme_pci_dev_attrs_group = {
 	.attrs		= nvme_pci_attrs,
 	.is_visible	= nvme_pci_attrs_are_visible,
 };
 
+static const struct attribute_group *nvme_pci_dev_attr_groups[] = {
+	&nvme_dev_attrs_group,
+	&nvme_pci_dev_attrs_group,
+	NULL,
+};
+
 /*
  * nirqs is the number of interrupts available for write and read
  * queues. The core already reserved an interrupt for the admin queue.
@@ -2930,10 +2934,6 @@ static void nvme_reset_work(struct work_struct *work)
 		goto out;
 	}
 
-	if (!dev->attrs_added && !sysfs_create_group(&dev->ctrl.device->kobj,
-			&nvme_pci_attr_group))
-		dev->attrs_added = true;
-
 	nvme_start_ctrl(&dev->ctrl);
 	return;
 
@@ -3006,6 +3006,7 @@ static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = {
 	.name			= "pcie",
 	.module			= THIS_MODULE,
 	.flags			= NVME_F_METADATA_SUPPORTED,
+	.dev_attr_groups	= nvme_pci_dev_attr_groups,
 	.reg_read32		= nvme_pci_reg_read32,
 	.reg_write32		= nvme_pci_reg_write32,
 	.reg_read64		= nvme_pci_reg_read64,
@@ -3204,13 +3205,6 @@ static void nvme_shutdown(struct pci_dev *pdev)
 	nvme_disable_prepare_reset(dev, true);
 }
 
-static void nvme_remove_attrs(struct nvme_dev *dev)
-{
-	if (dev->attrs_added)
-		sysfs_remove_group(&dev->ctrl.device->kobj,
-				   &nvme_pci_attr_group);
-}
-
 /*
  * The driver's remove may be called on a device in a partially initialized
  * state. This function must not have any dependencies on the device state in
@@ -3232,7 +3226,6 @@ static void nvme_remove(struct pci_dev *pdev)
 	nvme_stop_ctrl(&dev->ctrl);
 	nvme_remove_namespaces(&dev->ctrl);
 	nvme_dev_disable(dev, true);
-	nvme_remove_attrs(dev);
 	nvme_free_host_mem(dev);
 	nvme_dev_remove_admin(dev);
 	nvme_free_queues(dev, 0);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 04/12] nvme-pci: put the admin queue in nvme_dev_remove_admin
  2022-11-08 15:02 ` Christoph Hellwig
@ 2022-11-08 15:02   ` Christoph Hellwig
  -1 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2022-11-08 15:02 UTC (permalink / raw)
  To: Keith Busch
  Cc: Sagi Grimberg, Chaitanya Kulkarni, Gerd Bayer, asahi,
	linux-arm-kernel, linux-nvme

Once the controller is shut down, no one can access the admin queue.  Tear
it down in nvme_dev_remove_admin, which matches the flow in the other
drivers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index c8f6ce5eee1c2..f526ad578088a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1747,6 +1747,7 @@ static void nvme_dev_remove_admin(struct nvme_dev *dev)
 		 */
 		nvme_start_admin_queue(&dev->ctrl);
 		blk_mq_destroy_queue(dev->ctrl.admin_q);
+		blk_put_queue(dev->ctrl.admin_q);
 		blk_mq_free_tag_set(&dev->admin_tagset);
 	}
 }
@@ -2774,8 +2775,6 @@ static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
 
 	nvme_dbbuf_dma_free(dev);
 	nvme_free_tagset(dev);
-	if (dev->ctrl.admin_q)
-		blk_put_queue(dev->ctrl.admin_q);
 	mempool_destroy(dev->iod_mempool);
 	put_device(dev->dev);
 	kfree(dev->queues);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 05/12] nvme-pci: move more teardown work to nvme_remove
  2022-11-08 15:02 ` Christoph Hellwig
@ 2022-11-08 15:02   ` Christoph Hellwig
  -1 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2022-11-08 15:02 UTC (permalink / raw)
  To: Keith Busch
  Cc: Sagi Grimberg, Chaitanya Kulkarni, Gerd Bayer, asahi,
	linux-arm-kernel, linux-nvme

nvme_dbbuf_dma_free frees DMA coherent memory, so it must not be called
after ->remove has returned.  Fortunately there is no way to use it
after shutdown, as no more I/O is possible, so it can be moved.  Similarly
the iod_mempool can't be used for a device kept alive after shutdown, so
move it next to freeing the PRP pools.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index f526ad578088a..b638f43f2df26 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2773,9 +2773,7 @@ static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
 {
 	struct nvme_dev *dev = to_nvme_dev(ctrl);
 
-	nvme_dbbuf_dma_free(dev);
 	nvme_free_tagset(dev);
-	mempool_destroy(dev->iod_mempool);
 	put_device(dev->dev);
 	kfree(dev->queues);
 	kfree(dev);
@@ -3227,7 +3225,9 @@ static void nvme_remove(struct pci_dev *pdev)
 	nvme_dev_disable(dev, true);
 	nvme_free_host_mem(dev);
 	nvme_dev_remove_admin(dev);
+	nvme_dbbuf_dma_free(dev);
 	nvme_free_queues(dev, 0);
+	mempool_destroy(dev->iod_mempool);
 	nvme_release_prp_pools(dev);
 	nvme_dev_unmap(dev);
 	nvme_uninit_ctrl(&dev->ctrl);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 06/12] nvme-pci: factor the iod mempool creation into a helper
  2022-11-08 15:02 ` Christoph Hellwig
@ 2022-11-08 15:02   ` Christoph Hellwig
  -1 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2022-11-08 15:02 UTC (permalink / raw)
  To: Keith Busch
  Cc: Sagi Grimberg, Chaitanya Kulkarni, Gerd Bayer, asahi,
	linux-arm-kernel, linux-nvme

Add a helper to create the iod mempool.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 41 ++++++++++++++++++-----------------------
 1 file changed, 18 insertions(+), 23 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index b638f43f2df26..f7dab65bf5042 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -390,14 +390,6 @@ static int nvme_pci_npages_sgl(void)
 			PAGE_SIZE);
 }
 
-static size_t nvme_pci_iod_alloc_size(void)
-{
-	size_t npages = max(nvme_pci_npages_prp(), nvme_pci_npages_sgl());
-
-	return sizeof(__le64 *) * npages +
-		sizeof(struct scatterlist) * NVME_MAX_SEGS;
-}
-
 static int nvme_admin_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
 				unsigned int hctx_idx)
 {
@@ -2762,6 +2754,22 @@ static void nvme_release_prp_pools(struct nvme_dev *dev)
 	dma_pool_destroy(dev->prp_small_pool);
 }
 
+static int nvme_pci_alloc_iod_mempool(struct nvme_dev *dev)
+{
+	size_t npages = max(nvme_pci_npages_prp(), nvme_pci_npages_sgl());
+	size_t alloc_size = sizeof(__le64 *) * npages +
+			    sizeof(struct scatterlist) * NVME_MAX_SEGS;
+
+	WARN_ON_ONCE(alloc_size > PAGE_SIZE);
+	dev->iod_mempool = mempool_create_node(1,
+			mempool_kmalloc, mempool_kfree,
+			(void *)alloc_size, GFP_KERNEL,
+			dev_to_node(dev->dev));
+	if (!dev->iod_mempool)
+		return -ENOMEM;
+	return 0;
+}
+
 static void nvme_free_tagset(struct nvme_dev *dev)
 {
 	if (dev->tagset.tags)
@@ -3087,7 +3095,6 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	int node, result = -ENOMEM;
 	struct nvme_dev *dev;
 	unsigned long quirks = id->driver_data;
-	size_t alloc_size;
 
 	node = dev_to_node(&pdev->dev);
 	if (node == NUMA_NO_NODE)
@@ -3132,21 +3139,9 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		quirks |= NVME_QUIRK_SIMPLE_SUSPEND;
 	}
 
-	/*
-	 * Double check that our mempool alloc size will cover the biggest
-	 * command we support.
-	 */
-	alloc_size = nvme_pci_iod_alloc_size();
-	WARN_ON_ONCE(alloc_size > PAGE_SIZE);
-
-	dev->iod_mempool = mempool_create_node(1, mempool_kmalloc,
-						mempool_kfree,
-						(void *) alloc_size,
-						GFP_KERNEL, node);
-	if (!dev->iod_mempool) {
-		result = -ENOMEM;
+	result = nvme_pci_alloc_iod_mempool(dev);
+	if (result)
 		goto release_pools;
-	}
 
 	result = nvme_init_ctrl(&dev->ctrl, &pdev->dev, &nvme_pci_ctrl_ops,
 			quirks);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 07/12] nvme-pci: factor out a nvme_pci_alloc_ctrl helper
  2022-11-08 15:02 ` Christoph Hellwig
@ 2022-11-08 15:02   ` Christoph Hellwig
  -1 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2022-11-08 15:02 UTC (permalink / raw)
  To: Keith Busch
  Cc: Sagi Grimberg, Chaitanya Kulkarni, Gerd Bayer, asahi,
	linux-arm-kernel, linux-nvme

Add a helper that allocates the nvme_dev structure up to the point where
we can call nvme_init_ctrl.  This pairs with the free_ctrl method and can
thus be used to clean up the teardown path and make it more symmetric.

Note that this now calls nvme_init_ctrl a lot earlier during probing,
which also means the per-controller character device shows up earlier.
Due to the controller state, no commands can be sent on it, but it might
make sense to delay the cdev registration until nvme_init_ctrl_finish.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 81 +++++++++++++++++++++++------------------
 1 file changed, 46 insertions(+), 35 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index f7dab65bf5042..e1f60b1c6918f 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2777,6 +2777,7 @@ static void nvme_free_tagset(struct nvme_dev *dev)
 	dev->ctrl.tagset = NULL;
 }
 
+/* pairs with nvme_pci_alloc_ctrl */
 static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
 {
 	struct nvme_dev *dev = to_nvme_dev(ctrl);
@@ -3090,19 +3091,23 @@ static void nvme_async_probe(void *data, async_cookie_t cookie)
 	nvme_put_ctrl(&dev->ctrl);
 }
 
-static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+static struct nvme_dev *nvme_pci_alloc_ctrl(struct pci_dev *pdev,
+		const struct pci_device_id *id)
 {
-	int node, result = -ENOMEM;
-	struct nvme_dev *dev;
 	unsigned long quirks = id->driver_data;
+	int node = dev_to_node(&pdev->dev);
+	struct nvme_dev *dev;
+	int ret = -ENOMEM;
 
-	node = dev_to_node(&pdev->dev);
 	if (node == NUMA_NO_NODE)
 		set_dev_node(&pdev->dev, first_memory_node);
 
 	dev = kzalloc_node(sizeof(*dev), GFP_KERNEL, node);
 	if (!dev)
-		return -ENOMEM;
+		return NULL;
+	INIT_WORK(&dev->ctrl.reset_work, nvme_reset_work);
+	INIT_WORK(&dev->remove_work, nvme_remove_dead_ctrl_work);
+	mutex_init(&dev->shutdown_lock);
 
 	dev->nr_write_queues = write_queues;
 	dev->nr_poll_queues = poll_queues;
@@ -3110,25 +3115,11 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	dev->queues = kcalloc_node(dev->nr_allocated_queues,
 			sizeof(struct nvme_queue), GFP_KERNEL, node);
 	if (!dev->queues)
-		goto free;
+		goto out_free_dev;
 
 	dev->dev = get_device(&pdev->dev);
-	pci_set_drvdata(pdev, dev);
-
-	result = nvme_dev_map(dev);
-	if (result)
-		goto put_pci;
-
-	INIT_WORK(&dev->ctrl.reset_work, nvme_reset_work);
-	INIT_WORK(&dev->remove_work, nvme_remove_dead_ctrl_work);
-	mutex_init(&dev->shutdown_lock);
-
-	result = nvme_setup_prp_pools(dev);
-	if (result)
-		goto unmap;
 
 	quirks |= check_vendor_combination_bug(pdev);
-
 	if (!noacpi && acpi_storage_d3(&pdev->dev)) {
 		/*
 		 * Some systems use a bios work around to ask for D3 on
@@ -3138,34 +3129,54 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 			 "platform quirk: setting simple suspend\n");
 		quirks |= NVME_QUIRK_SIMPLE_SUSPEND;
 	}
+	ret = nvme_init_ctrl(&dev->ctrl, &pdev->dev, &nvme_pci_ctrl_ops,
+			     quirks);
+	if (ret)
+		goto out_put_device;
+	return dev;
 
-	result = nvme_pci_alloc_iod_mempool(dev);
+out_put_device:
+	put_device(dev->dev);
+	kfree(dev->queues);
+out_free_dev:
+	kfree(dev);
+	return ERR_PTR(ret);
+}
+
+static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+	struct nvme_dev *dev;
+	int result = -ENOMEM;
+
+	dev = nvme_pci_alloc_ctrl(pdev, id);
+	if (!dev)
+		return -ENOMEM;
+
+	result = nvme_dev_map(dev);
 	if (result)
-		goto release_pools;
+		goto out_uninit_ctrl;
 
-	result = nvme_init_ctrl(&dev->ctrl, &pdev->dev, &nvme_pci_ctrl_ops,
-			quirks);
+	result = nvme_setup_prp_pools(dev);
+	if (result)
+		goto out_dev_unmap;
+
+	result = nvme_pci_alloc_iod_mempool(dev);
 	if (result)
-		goto release_mempool;
+		goto out_release_prp_pools;
 
 	dev_info(dev->ctrl.device, "pci function %s\n", dev_name(&pdev->dev));
+	pci_set_drvdata(pdev, dev);
 
 	nvme_reset_ctrl(&dev->ctrl);
 	async_schedule(nvme_async_probe, dev);
-
 	return 0;
 
- release_mempool:
-	mempool_destroy(dev->iod_mempool);
- release_pools:
+out_release_prp_pools:
 	nvme_release_prp_pools(dev);
- unmap:
+out_dev_unmap:
 	nvme_dev_unmap(dev);
- put_pci:
-	put_device(dev->dev);
- free:
-	kfree(dev->queues);
-	kfree(dev);
+out_uninit_ctrl:
+	nvme_uninit_ctrl(&dev->ctrl);
 	return result;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 08/12] nvme-pci: set constant parameters in nvme_pci_alloc_ctrl
  2022-11-08 15:02 ` Christoph Hellwig
@ 2022-11-08 15:02   ` Christoph Hellwig
  -1 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2022-11-08 15:02 UTC (permalink / raw)
  To: Keith Busch
  Cc: Sagi Grimberg, Chaitanya Kulkarni, Gerd Bayer, asahi,
	linux-arm-kernel, linux-nvme

Move setting of low-level constant parameters from nvme_reset_work to
nvme_pci_alloc_ctrl.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 38 +++++++++++++++++---------------------
 1 file changed, 17 insertions(+), 21 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index e1f60b1c6918f..87f7c8f5699ec 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2841,21 +2841,6 @@ static void nvme_reset_work(struct work_struct *work)
 		nvme_start_admin_queue(&dev->ctrl);
 	}
 
-	dma_set_min_align_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1);
-
-	/*
-	 * Limit the max command size to prevent iod->sg allocations going
-	 * over a single page.
-	 */
-	dev->ctrl.max_hw_sectors = min_t(u32,
-		NVME_MAX_KB_SZ << 1, dma_max_mapping_size(dev->dev) >> 9);
-	dev->ctrl.max_segments = NVME_MAX_SEGS;
-
-	/*
-	 * Don't limit the IOMMU merged segment size.
-	 */
-	dma_set_max_seg_size(dev->dev, 0xffffffff);
-
 	mutex_unlock(&dev->shutdown_lock);
 
 	/*
@@ -2869,12 +2854,6 @@ static void nvme_reset_work(struct work_struct *work)
 		goto out;
 	}
 
-	/*
-	 * We do not support an SGL for metadata (yet), so we are limited to a
-	 * single integrity segment for the separate metadata pointer.
-	 */
-	dev->ctrl.max_integrity_segments = 1;
-
 	result = nvme_init_ctrl_finish(&dev->ctrl, was_suspend);
 	if (result)
 		goto out;
@@ -3133,6 +3112,23 @@ static struct nvme_dev *nvme_pci_alloc_ctrl(struct pci_dev *pdev,
 			     quirks);
 	if (ret)
 		goto out_put_device;
+
+	dma_set_min_align_mask(&pdev->dev, NVME_CTRL_PAGE_SIZE - 1);
+	dma_set_max_seg_size(&pdev->dev, 0xffffffff);
+
+	/*
+	 * Limit the max command size to prevent iod->sg allocations going
+	 * over a single page.
+	 */
+	dev->ctrl.max_hw_sectors = min_t(u32,
+		NVME_MAX_KB_SZ << 1, dma_max_mapping_size(&pdev->dev) >> 9);
+	dev->ctrl.max_segments = NVME_MAX_SEGS;
+
+	/*
+	 * There is no support for SGLs for metadata (yet), so we are limited to
+	 * a single integrity segment for the separate metadata pointer.
+	 */
+	dev->ctrl.max_integrity_segments = 1;
 	return dev;
 
 out_put_device:
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 09/12] nvme-pci: call nvme_pci_configure_admin_queue from nvme_pci_enable
  2022-11-08 15:02 ` Christoph Hellwig
@ 2022-11-08 15:02   ` Christoph Hellwig
  -1 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2022-11-08 15:02 UTC (permalink / raw)
  To: Keith Busch
  Cc: Sagi Grimberg, Chaitanya Kulkarni, Gerd Bayer, asahi,
	linux-arm-kernel, linux-nvme

nvme_pci_configure_admin_queue is called right after nvme_pci_enable, and
its work is undone by nvme_dev_disable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 87f7c8f5699ec..64e2bd2d4f61e 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2639,7 +2639,8 @@ static int nvme_pci_enable(struct nvme_dev *dev)
 
 	pci_enable_pcie_error_reporting(pdev);
 	pci_save_state(pdev);
-	return 0;
+
+	return nvme_pci_configure_admin_queue(dev);
 
  disable:
 	pci_disable_device(pdev);
@@ -2829,10 +2830,6 @@ static void nvme_reset_work(struct work_struct *work)
 	if (result)
 		goto out_unlock;
 
-	result = nvme_pci_configure_admin_queue(dev);
-	if (result)
-		goto out_unlock;
-
 	if (!dev->ctrl.admin_q) {
 		result = nvme_pci_alloc_admin_tag_set(dev);
 		if (result)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 10/12] nvme-pci: split nvme_dbbuf_dma_alloc
  2022-11-08 15:02 ` Christoph Hellwig
@ 2022-11-08 15:02   ` Christoph Hellwig
  -1 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2022-11-08 15:02 UTC (permalink / raw)
  To: Keith Busch
  Cc: Sagi Grimberg, Chaitanya Kulkarni, Gerd Bayer, asahi,
	linux-arm-kernel, linux-nvme

Move the clearing of the already allocated doorbell buffer memory to
the caller, and instead move printing the error for the allocation
failure into the helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 45 +++++++++++++++++++----------------------
 1 file changed, 21 insertions(+), 24 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 64e2bd2d4f61e..1b3c96a4b7c90 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -239,36 +239,28 @@ static inline unsigned int nvme_dbbuf_size(struct nvme_dev *dev)
 	return dev->nr_allocated_queues * 8 * dev->db_stride;
 }
 
-static int nvme_dbbuf_dma_alloc(struct nvme_dev *dev)
+static void nvme_dbbuf_dma_alloc(struct nvme_dev *dev)
 {
 	unsigned int mem_size = nvme_dbbuf_size(dev);
 
-	if (dev->dbbuf_dbs) {
-		/*
-		 * Clear the dbbuf memory so the driver doesn't observe stale
-		 * values from the previous instantiation.
-		 */
-		memset(dev->dbbuf_dbs, 0, mem_size);
-		memset(dev->dbbuf_eis, 0, mem_size);
-		return 0;
-	}
-
 	dev->dbbuf_dbs = dma_alloc_coherent(dev->dev, mem_size,
 					    &dev->dbbuf_dbs_dma_addr,
 					    GFP_KERNEL);
 	if (!dev->dbbuf_dbs)
-		return -ENOMEM;
+		goto fail;
 	dev->dbbuf_eis = dma_alloc_coherent(dev->dev, mem_size,
 					    &dev->dbbuf_eis_dma_addr,
 					    GFP_KERNEL);
-	if (!dev->dbbuf_eis) {
-		dma_free_coherent(dev->dev, mem_size,
-				  dev->dbbuf_dbs, dev->dbbuf_dbs_dma_addr);
-		dev->dbbuf_dbs = NULL;
-		return -ENOMEM;
-	}
+	if (!dev->dbbuf_eis)
+		goto fail_free_dbbuf_dbs;
+	return;
 
-	return 0;
+fail_free_dbbuf_dbs:
+	dma_free_coherent(dev->dev, mem_size, dev->dbbuf_dbs,
+			  dev->dbbuf_dbs_dma_addr);
+	dev->dbbuf_dbs = NULL;
+fail:
+	dev_warn(dev->dev, "unable to allocate dma for dbbuf\n");
 }
 
 static void nvme_dbbuf_dma_free(struct nvme_dev *dev)
@@ -2855,11 +2847,16 @@ static void nvme_reset_work(struct work_struct *work)
 	if (result)
 		goto out;
 
-	if (dev->ctrl.oacs & NVME_CTRL_OACS_DBBUF_SUPP) {
-		result = nvme_dbbuf_dma_alloc(dev);
-		if (result)
-			dev_warn(dev->dev,
-				 "unable to allocate dma for dbbuf\n");
+	if (dev->dbbuf_dbs) {
+		/*
+		 * Clear the dbbuf memory so the driver doesn't observe stale
+		 * values from the previous instantiation.
+		 */
+		memset(dev->dbbuf_dbs, 0, nvme_dbbuf_size(dev));
+		memset(dev->dbbuf_eis, 0, nvme_dbbuf_size(dev));
+	} else {
+		if (dev->ctrl.oacs & NVME_CTRL_OACS_DBBUF_SUPP)
+			nvme_dbbuf_dma_alloc(dev);
 	}
 
 	if (dev->ctrl.hmpre) {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread
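For readers following the patch outside the kernel tree, the goto-based unwind adopted in the new nvme_dbbuf_dma_alloc() can be sketched in plain userspace C. This is a hypothetical illustration, not the driver code: calloc()/free() stand in for dma_alloc_coherent()/dma_free_coherent(), struct dbbuf models the two buffer pointers in struct nvme_dev, and the fail_second knob exists only to exercise the unwind path.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the two doorbell buffers in struct nvme_dev. */
struct dbbuf {
	unsigned char *dbs;	/* models dev->dbbuf_dbs */
	unsigned char *eis;	/* models dev->dbbuf_eis */
};

/*
 * Sketch of the reworked allocator: any failure unwinds whatever was
 * already allocated, logs once at the shared "fail" label, and leaves
 * both pointers NULL so the caller can simply test b->dbs afterwards,
 * mirroring how the helper now returns void instead of -ENOMEM.
 */
static void dbbuf_alloc(struct dbbuf *b, size_t size, int fail_second)
{
	b->dbs = calloc(1, size);
	if (!b->dbs)
		goto fail;
	b->eis = fail_second ? NULL : calloc(1, size);
	if (!b->eis)
		goto fail_free_dbs;
	return;

fail_free_dbs:
	free(b->dbs);
	b->dbs = NULL;
fail:
	fprintf(stderr, "unable to allocate dma for dbbuf\n");
}
```

A caller only has to check one pointer to know whether the optional feature is available, which is what lets nvme_reset_work key the memset-vs-allocate decision off dev->dbbuf_dbs alone.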

* [PATCH 11/12] nvme-pci: split the initial probe from the rest path
  2022-11-08 15:02 ` Christoph Hellwig
@ 2022-11-08 15:02   ` Christoph Hellwig
  -1 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2022-11-08 15:02 UTC (permalink / raw)
  To: Keith Busch
  Cc: Sagi Grimberg, Chaitanya Kulkarni, Gerd Bayer, asahi,
	linux-arm-kernel, linux-nvme

nvme_reset_work is a little fragile as it needs to handle both resetting
a live controller and initializing one during probe.  Split out the initial
probe and open code it in nvme_probe and leave nvme_reset_work to just do
the live controller reset.

This fixes a recently introduced bug where nvme_dev_disable causes a NULL
pointer dereference in blk_mq_quiesce_tagset because the tagset pointer
is not set when the reset state is entered directly from the new state.
The separate probe code skips the reset state entirely and thus avoids
the problem.

To make sure the system isn't single threaded on enabling nvme
controllers, set the PROBE_PREFER_ASYNCHRONOUS flag in the device_driver
structure so that the driver core probes in parallel.

Fixes: 98d81f0df70c ("nvme: use blk_mq_[un]quiesce_tagset")
Reported-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 139 ++++++++++++++++++++++++----------------
 1 file changed, 83 insertions(+), 56 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 1b3c96a4b7c90..1c8c70767cb8a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2821,15 +2821,7 @@ static void nvme_reset_work(struct work_struct *work)
 	result = nvme_pci_enable(dev);
 	if (result)
 		goto out_unlock;
-
-	if (!dev->ctrl.admin_q) {
-		result = nvme_pci_alloc_admin_tag_set(dev);
-		if (result)
-			goto out_unlock;
-	} else {
-		nvme_start_admin_queue(&dev->ctrl);
-	}
-
+	nvme_start_admin_queue(&dev->ctrl);
 	mutex_unlock(&dev->shutdown_lock);
 
 	/*
@@ -2854,9 +2846,6 @@ static void nvme_reset_work(struct work_struct *work)
 		 */
 		memset(dev->dbbuf_dbs, 0, nvme_dbbuf_size(dev));
 		memset(dev->dbbuf_eis, 0, nvme_dbbuf_size(dev));
-	} else {
-		if (dev->ctrl.oacs & NVME_CTRL_OACS_DBBUF_SUPP)
-			nvme_dbbuf_dma_alloc(dev);
 	}
 
 	if (dev->ctrl.hmpre) {
@@ -2869,37 +2858,23 @@ static void nvme_reset_work(struct work_struct *work)
 	if (result)
 		goto out;
 
-	if (dev->ctrl.tagset) {
-		/*
-		 * This is a controller reset and we already have a tagset.
-		 * Freeze and update the number of I/O queues as thos might have
-		 * changed.  If there are no I/O queues left after this reset,
-		 * keep the controller around but remove all namespaces.
-		 */
-		if (dev->online_queues > 1) {
-			nvme_start_queues(&dev->ctrl);
-			nvme_wait_freeze(&dev->ctrl);
-			nvme_pci_update_nr_queues(dev);
-			nvme_dbbuf_set(dev);
-			nvme_unfreeze(&dev->ctrl);
-		} else {
-			dev_warn(dev->ctrl.device, "IO queues lost\n");
-			nvme_mark_namespaces_dead(&dev->ctrl);
-			nvme_start_queues(&dev->ctrl);
-			nvme_remove_namespaces(&dev->ctrl);
-			nvme_free_tagset(dev);
-		}
+	/*
+	 * Freeze and update the number of I/O queues as those might have
+	 * changed.  If there are no I/O queues left after this reset, keep the
+	 * controller around but remove all namespaces.
+	 */
+	if (dev->online_queues > 1) {
+		nvme_start_queues(&dev->ctrl);
+		nvme_wait_freeze(&dev->ctrl);
+		nvme_pci_update_nr_queues(dev);
+		nvme_dbbuf_set(dev);
+		nvme_unfreeze(&dev->ctrl);
 	} else {
-		/*
-		 * First probe.  Still allow the controller to show up even if
-		 * there are no namespaces.
-		 */
-		if (dev->online_queues > 1) {
-			nvme_pci_alloc_tag_set(dev);
-			nvme_dbbuf_set(dev);
-		} else {
-			dev_warn(dev->ctrl.device, "IO queues not created\n");
-		}
+		dev_warn(dev->ctrl.device, "IO queues lost\n");
+		nvme_mark_namespaces_dead(&dev->ctrl);
+		nvme_start_queues(&dev->ctrl);
+		nvme_remove_namespaces(&dev->ctrl);
+		nvme_free_tagset(dev);
 	}
 
 	/*
@@ -3055,15 +3030,6 @@ static unsigned long check_vendor_combination_bug(struct pci_dev *pdev)
 	return 0;
 }
 
-static void nvme_async_probe(void *data, async_cookie_t cookie)
-{
-	struct nvme_dev *dev = data;
-
-	flush_work(&dev->ctrl.reset_work);
-	flush_work(&dev->ctrl.scan_work);
-	nvme_put_ctrl(&dev->ctrl);
-}
-
 static struct nvme_dev *nvme_pci_alloc_ctrl(struct pci_dev *pdev,
 		const struct pci_device_id *id)
 {
@@ -3155,12 +3121,72 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		goto out_release_prp_pools;
 
 	dev_info(dev->ctrl.device, "pci function %s\n", dev_name(&pdev->dev));
+
+	result = nvme_pci_enable(dev);
+	if (result)
+		goto out_release_iod_mempool;
+
+	result = nvme_pci_alloc_admin_tag_set(dev);
+	if (result)
+		goto out_disable;
+
+	/*
+	 * Mark the controller as connecting before sending admin commands to
+	 * allow the timeout handler to do the right thing.
+	 */
+	if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_CONNECTING)) {
+		dev_warn(dev->ctrl.device,
+			"failed to mark controller CONNECTING\n");
+		result = -EBUSY;
+		goto out_disable;
+	}
+
+	result = nvme_init_ctrl_finish(&dev->ctrl, false);
+	if (result)
+		goto out_disable;
+
+	if (dev->ctrl.oacs & NVME_CTRL_OACS_DBBUF_SUPP)
+		nvme_dbbuf_dma_alloc(dev);
+
+	if (dev->ctrl.hmpre) {
+		result = nvme_setup_host_mem(dev);
+		if (result < 0)
+			goto out_disable;
+	}
+
+	result = nvme_setup_io_queues(dev);
+	if (result)
+		goto out_disable;
+
+	if (dev->online_queues > 1) {
+		nvme_pci_alloc_tag_set(dev);
+		nvme_dbbuf_set(dev);
+	} else {
+		dev_warn(dev->ctrl.device, "IO queues not created\n");
+	}
+
+	if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_LIVE)) {
+		dev_warn(dev->ctrl.device,
+			"failed to mark controller live state\n");
+		result = -ENODEV;
+		goto out_disable;
+	}
+
 	pci_set_drvdata(pdev, dev);
 
-	nvme_reset_ctrl(&dev->ctrl);
-	async_schedule(nvme_async_probe, dev);
+	nvme_start_ctrl(&dev->ctrl);
+	nvme_put_ctrl(&dev->ctrl);
 	return 0;
 
+out_disable:
+	nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
+	nvme_dev_disable(dev, true);
+	nvme_free_host_mem(dev);
+	nvme_dev_remove_admin(dev);
+	nvme_dbbuf_dma_free(dev);
+	nvme_free_queues(dev, 0);
+out_release_iod_mempool:
+	mempool_destroy(dev->iod_mempool);
 out_release_prp_pools:
 	nvme_release_prp_pools(dev);
 out_dev_unmap:
@@ -3556,11 +3582,12 @@ static struct pci_driver nvme_driver = {
 	.probe		= nvme_probe,
 	.remove		= nvme_remove,
 	.shutdown	= nvme_shutdown,
-#ifdef CONFIG_PM_SLEEP
 	.driver		= {
-		.pm	= &nvme_dev_pm_ops,
-	},
+		.probe_type	= PROBE_PREFER_ASYNCHRONOUS,
+#ifdef CONFIG_PM_SLEEP
+		.pm		= &nvme_dev_pm_ops,
 #endif
+	},
 	.sriov_configure = pci_sriov_configure_simple,
 	.err_handler	= &nvme_err_handler,
 };
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread
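The probe path in this patch works because the controller state machine permits NEW -> CONNECTING -> LIVE directly, without passing through RESETTING; entering RESETTING from NEW was exactly the case that tripped over the unset tagset. The authoritative transition table is nvme_change_ctrl_state() in drivers/nvme/host/core.c; the userspace model below is a deliberately simplified approximation covering only the states this series touches, with state names and rules chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the nvme controller states used in this series. */
enum ctrl_state {
	ST_NEW,
	ST_CONNECTING,
	ST_LIVE,
	ST_RESETTING,
	ST_DELETING,
	ST_DEAD,
};

/*
 * Commit the transition and return true when it is legal, mirroring the
 * bool return of nvme_change_ctrl_state().  The table here is an
 * approximation for illustration, not a copy of the kernel's.
 */
static bool change_state(enum ctrl_state *cur, enum ctrl_state next)
{
	bool ok = false;

	switch (next) {
	case ST_CONNECTING:
		ok = (*cur == ST_NEW || *cur == ST_RESETTING);
		break;
	case ST_LIVE:
		ok = (*cur == ST_CONNECTING);
		break;
	case ST_RESETTING:
		ok = (*cur == ST_LIVE);
		break;
	case ST_DELETING:
		ok = (*cur != ST_DEAD);
		break;
	case ST_DEAD:
		ok = (*cur == ST_DELETING);
		break;
	default:
		break;
	}
	if (ok)
		*cur = next;
	return ok;
}
```

In this model the new nvme_probe goes NEW -> CONNECTING -> LIVE, while nvme_reset_work only ever runs LIVE -> RESETTING -> CONNECTING -> LIVE, which is why the two paths can now be separated cleanly.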

* [PATCH 12/12] nvme-pci: don't unbind the driver on reset failure
  2022-11-08 15:02 ` Christoph Hellwig
@ 2022-11-08 15:02   ` Christoph Hellwig
  -1 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2022-11-08 15:02 UTC (permalink / raw)
  To: Keith Busch
  Cc: Sagi Grimberg, Chaitanya Kulkarni, Gerd Bayer, asahi,
	linux-arm-kernel, linux-nvme

Unbinding the device driver when a reset fails is very unusual behavior.
Just disable the controller and leave it in a dead state if we fail to
reset it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/pci.c | 40 ++++++++++------------------------------
 1 file changed, 10 insertions(+), 30 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 1c8c70767cb8a..b1b659db682fa 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -130,7 +130,6 @@ struct nvme_dev {
 	u32 db_stride;
 	void __iomem *bar;
 	unsigned long bar_mapped_size;
-	struct work_struct remove_work;
 	struct mutex shutdown_lock;
 	bool subsystem;
 	u64 cmb_size;
@@ -2781,20 +2780,6 @@ static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
 	kfree(dev);
 }
 
-static void nvme_remove_dead_ctrl(struct nvme_dev *dev)
-{
-	/*
-	 * Set state to deleting now to avoid blocking nvme_wait_reset(), which
-	 * may be holding this pci_dev's device lock.
-	 */
-	nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
-	nvme_get_ctrl(&dev->ctrl);
-	nvme_dev_disable(dev, false);
-	nvme_mark_namespaces_dead(&dev->ctrl);
-	if (!queue_work(nvme_wq, &dev->remove_work))
-		nvme_put_ctrl(&dev->ctrl);
-}
-
 static void nvme_reset_work(struct work_struct *work)
 {
 	struct nvme_dev *dev =
@@ -2894,20 +2879,16 @@ static void nvme_reset_work(struct work_struct *work)
  out_unlock:
 	mutex_unlock(&dev->shutdown_lock);
  out:
-	if (result)
-		dev_warn(dev->ctrl.device,
-			 "Removing after probe failure status: %d\n", result);
-	nvme_remove_dead_ctrl(dev);
-}
-
-static void nvme_remove_dead_ctrl_work(struct work_struct *work)
-{
-	struct nvme_dev *dev = container_of(work, struct nvme_dev, remove_work);
-	struct pci_dev *pdev = to_pci_dev(dev->dev);
-
-	if (pci_get_drvdata(pdev))
-		device_release_driver(&pdev->dev);
-	nvme_put_ctrl(&dev->ctrl);
+	/*
+	 * Set state to deleting now to avoid blocking nvme_wait_reset(), which
+	 * may be holding this pci_dev's device lock.
+	 */
+	dev_warn(dev->ctrl.device, "Disabling device after reset failure: %d\n",
+		 result);
+	nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
+	nvme_dev_disable(dev, false);
+	nvme_mark_namespaces_dead(&dev->ctrl);
+	nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DEAD);
 }
 
 static int nvme_pci_reg_read32(struct nvme_ctrl *ctrl, u32 off, u32 *val)
@@ -3045,7 +3026,6 @@ static struct nvme_dev *nvme_pci_alloc_ctrl(struct pci_dev *pdev,
 	if (!dev)
 		return NULL;
 	INIT_WORK(&dev->ctrl.reset_work, nvme_reset_work);
-	INIT_WORK(&dev->remove_work, nvme_remove_dead_ctrl_work);
 	mutex_init(&dev->shutdown_lock);
 
 	dev->nr_write_queues = write_queues;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH 01/12] nvme-pci: don't call nvme_init_ctrl_finish from nvme_passthru_end
  2022-11-08 15:02   ` Christoph Hellwig
@ 2022-11-09  2:55     ` Sagi Grimberg
  -1 siblings, 0 replies; 80+ messages in thread
From: Sagi Grimberg @ 2022-11-09  2:55 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch
  Cc: Chaitanya Kulkarni, Gerd Bayer, asahi, linux-arm-kernel, linux-nvme


> nvme_passthrough_end can race with a reset, so we should not call
> nvme_init_ctrl_finish from here.  Instead just log that the controller
> capabilities might have changed.

Can you explain here what is the problem caused by calling this from
here?

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 02/12] nvme: move OPAL setup from PCIe to core
  2022-11-08 15:02   ` Christoph Hellwig
@ 2022-11-09  2:55     ` Sagi Grimberg
  -1 siblings, 0 replies; 80+ messages in thread
From: Sagi Grimberg @ 2022-11-09  2:55 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch
  Cc: Chaitanya Kulkarni, Gerd Bayer, asahi, linux-arm-kernel, linux-nvme

Nice

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/12] nvme: simplify transport specific device attribute handling
  2022-11-08 15:02   ` Christoph Hellwig
@ 2022-11-09  2:57     ` Sagi Grimberg
  -1 siblings, 0 replies; 80+ messages in thread
From: Sagi Grimberg @ 2022-11-09  2:57 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch
  Cc: Chaitanya Kulkarni, Gerd Bayer, asahi, linux-arm-kernel, linux-nvme

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 04/12] nvme-pci: put the admin queue in nvme_dev_remove_admin
  2022-11-08 15:02   ` Christoph Hellwig
@ 2022-11-09  2:58     ` Sagi Grimberg
  -1 siblings, 0 replies; 80+ messages in thread
From: Sagi Grimberg @ 2022-11-09  2:58 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch
  Cc: Chaitanya Kulkarni, Gerd Bayer, asahi, linux-arm-kernel, linux-nvme


> Once the controller is shutdown no one can access the admin queue.  Tear
> it down in nvme_dev_remove_admin, which matches the flow in the other
> drivers.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/pci.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index c8f6ce5eee1c2..f526ad578088a 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1747,6 +1747,7 @@ static void nvme_dev_remove_admin(struct nvme_dev *dev)
>   		 */
>   		nvme_start_admin_queue(&dev->ctrl);
>   		blk_mq_destroy_queue(dev->ctrl.admin_q);
> +		blk_put_queue(dev->ctrl.admin_q);
>   		blk_mq_free_tag_set(&dev->admin_tagset);
>   	}
>   }
> @@ -2774,8 +2775,6 @@ static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
>   
>   	nvme_dbbuf_dma_free(dev);
>   	nvme_free_tagset(dev);
> -	if (dev->ctrl.admin_q)
> -		blk_put_queue(dev->ctrl.admin_q);

Is the check here valid? meaning can we get here with
a NULL admin_q?

In any event, looks good
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 05/12] nvme-pci: move more teardown work to nvme_remove
  2022-11-08 15:02   ` Christoph Hellwig
@ 2022-11-09  3:00     ` Sagi Grimberg
  -1 siblings, 0 replies; 80+ messages in thread
From: Sagi Grimberg @ 2022-11-09  3:00 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch
  Cc: Chaitanya Kulkarni, Gerd Bayer, asahi, linux-arm-kernel, linux-nvme

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 06/12] nvme-pci: factor the iod mempool creation into a helper
  2022-11-08 15:02   ` Christoph Hellwig
@ 2022-11-09  3:00     ` Sagi Grimberg
  -1 siblings, 0 replies; 80+ messages in thread
From: Sagi Grimberg @ 2022-11-09  3:00 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch
  Cc: Chaitanya Kulkarni, Gerd Bayer, asahi, linux-arm-kernel, linux-nvme

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 07/12] nvme-pci: factor out a nvme_pci_alloc_ctrl helper
  2022-11-08 15:02   ` Christoph Hellwig
@ 2022-11-09  3:03     ` Sagi Grimberg
  -1 siblings, 0 replies; 80+ messages in thread
From: Sagi Grimberg @ 2022-11-09  3:03 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch
  Cc: Chaitanya Kulkarni, Gerd Bayer, asahi, linux-arm-kernel, linux-nvme



On 11/8/22 17:02, Christoph Hellwig wrote:
> Add a helper that allocates the nvme_dev structure up to the point where
> we can call nvme_init_ctrl.  This pairs with the free_ctrl method and can
> thus be used to cleanup the teardown path and make it more symmetric.
> 
> Note that this now calls nvme_init_ctrl a lot earlier during probing,
> which also means the per-controller character device shows up earlier.
> Due to the controller state no commands can be sent on it, but it might
> make sense to delay the cdev registration until nvme_init_ctrl_finish.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/pci.c | 81 +++++++++++++++++++++++------------------
>   1 file changed, 46 insertions(+), 35 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index f7dab65bf5042..e1f60b1c6918f 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2777,6 +2777,7 @@ static void nvme_free_tagset(struct nvme_dev *dev)
>   	dev->ctrl.tagset = NULL;
>   }
>   
> +/* pairs with nvme_pci_alloc_ctrl */
>   static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
>   {
>   	struct nvme_dev *dev = to_nvme_dev(ctrl);
> @@ -3090,19 +3091,23 @@ static void nvme_async_probe(void *data, async_cookie_t cookie)
>   	nvme_put_ctrl(&dev->ctrl);
>   }
>   
> -static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> +static struct nvme_dev *nvme_pci_alloc_ctrl(struct pci_dev *pdev,
> +		const struct pci_device_id *id)

kinda weird that nvme_pci_alloc_ctrl returns struct nvme_dev. Patch looks
fine though...

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 08/12] nvme-pci: set constant parameters in nvme_pci_alloc_ctrl
  2022-11-08 15:02   ` Christoph Hellwig
@ 2022-11-09  3:03     ` Sagi Grimberg
  -1 siblings, 0 replies; 80+ messages in thread
From: Sagi Grimberg @ 2022-11-09  3:03 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch
  Cc: Chaitanya Kulkarni, Gerd Bayer, asahi, linux-arm-kernel, linux-nvme

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 09/12] nvme-pci: call nvme_pci_configure_admin_queue from nvme_pci_enable
  2022-11-08 15:02   ` Christoph Hellwig
@ 2022-11-09  3:04     ` Sagi Grimberg
  -1 siblings, 0 replies; 80+ messages in thread
From: Sagi Grimberg @ 2022-11-09  3:04 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch
  Cc: Chaitanya Kulkarni, Gerd Bayer, asahi, linux-arm-kernel, linux-nvme

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 10/12] nvme-pci: split nvme_dbbuf_dma_alloc
  2022-11-08 15:02   ` Christoph Hellwig
@ 2022-11-09  3:05     ` Sagi Grimberg
  -1 siblings, 0 replies; 80+ messages in thread
From: Sagi Grimberg @ 2022-11-09  3:05 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch
  Cc: Chaitanya Kulkarni, Gerd Bayer, asahi, linux-arm-kernel, linux-nvme

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 11/12] nvme-pci: split the initial probe from the rest path
  2022-11-08 15:02   ` Christoph Hellwig
@ 2022-11-09  3:14     ` Sagi Grimberg
  -1 siblings, 0 replies; 80+ messages in thread
From: Sagi Grimberg @ 2022-11-09  3:14 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch
  Cc: Chaitanya Kulkarni, Gerd Bayer, asahi, linux-arm-kernel, linux-nvme

Nice

On 11/8/22 17:02, Christoph Hellwig wrote:
> nvme_reset_work is a little fragile as it needs to handle both resetting
> a live controller and initializing one during probe.  Split out the initial
> probe and open code it in nvme_probe and leave nvme_reset_work to just do
> the live controller reset.
> 
> This fixes a recently introduced bug where nvme_dev_disable causes a NULL
> pointer dereference in blk_mq_quiesce_tagset because the tagset pointer
> is not set when the reset state is entered directly from the new state.
> The separate probe code can skip the reset state and probe directly,
> which fixes this.
> 
> To make sure the system isn't single threaded on enabling nvme
> controllers, set the PROBE_PREFER_ASYNCHRONOUS flag in the device_driver
> structure so that the driver core probes in parallel.
> 
> Fixes: 98d81f0df70c ("nvme: use blk_mq_[un]quiesce_tagset")
> Reported-by: Gerd Bayer <gbayer@linux.ibm.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/pci.c | 139 ++++++++++++++++++++++++----------------
>   1 file changed, 83 insertions(+), 56 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 1b3c96a4b7c90..1c8c70767cb8a 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2821,15 +2821,7 @@ static void nvme_reset_work(struct work_struct *work)
>   	result = nvme_pci_enable(dev);
>   	if (result)
>   		goto out_unlock;
> -
> -	if (!dev->ctrl.admin_q) {
> -		result = nvme_pci_alloc_admin_tag_set(dev);
> -		if (result)
> -			goto out_unlock;
> -	} else {
> -		nvme_start_admin_queue(&dev->ctrl);
> -	}
> -
> +	nvme_start_admin_queue(&dev->ctrl);
>   	mutex_unlock(&dev->shutdown_lock);
>   
>   	/*
> @@ -2854,9 +2846,6 @@ static void nvme_reset_work(struct work_struct *work)
>   		 */
>   		memset(dev->dbbuf_dbs, 0, nvme_dbbuf_size(dev));
>   		memset(dev->dbbuf_eis, 0, nvme_dbbuf_size(dev));
> -	} else {
> -		if (dev->ctrl.oacs & NVME_CTRL_OACS_DBBUF_SUPP)
> -			nvme_dbbuf_dma_alloc(dev);
>   	}
>   
>   	if (dev->ctrl.hmpre) {
> @@ -2869,37 +2858,23 @@ static void nvme_reset_work(struct work_struct *work)
>   	if (result)
>   		goto out;
>   
> -	if (dev->ctrl.tagset) {
> -		/*
> -		 * This is a controller reset and we already have a tagset.
> -		 * Freeze and update the number of I/O queues as thos might have
> -		 * changed.  If there are no I/O queues left after this reset,
> -		 * keep the controller around but remove all namespaces.
> -		 */
> -		if (dev->online_queues > 1) {
> -			nvme_start_queues(&dev->ctrl);
> -			nvme_wait_freeze(&dev->ctrl);
> -			nvme_pci_update_nr_queues(dev);
> -			nvme_dbbuf_set(dev);
> -			nvme_unfreeze(&dev->ctrl);
> -		} else {
> -			dev_warn(dev->ctrl.device, "IO queues lost\n");
> -			nvme_mark_namespaces_dead(&dev->ctrl);
> -			nvme_start_queues(&dev->ctrl);
> -			nvme_remove_namespaces(&dev->ctrl);
> -			nvme_free_tagset(dev);
> -		}
> +	/*
> +	 * Freeze and update the number of I/O queues as those might have
> +	 * changed.  If there are no I/O queues left after this reset, keep the
> +	 * controller around but remove all namespaces.
> +	 */
> +	if (dev->online_queues > 1) {
> +		nvme_start_queues(&dev->ctrl);
> +		nvme_wait_freeze(&dev->ctrl);
> +		nvme_pci_update_nr_queues(dev);
> +		nvme_dbbuf_set(dev);
> +		nvme_unfreeze(&dev->ctrl);
>   	} else {
> -		/*
> -		 * First probe.  Still allow the controller to show up even if
> -		 * there are no namespaces.
> -		 */
> -		if (dev->online_queues > 1) {
> -			nvme_pci_alloc_tag_set(dev);
> -			nvme_dbbuf_set(dev);
> -		} else {
> -			dev_warn(dev->ctrl.device, "IO queues not created\n");
> -		}
> +		dev_warn(dev->ctrl.device, "IO queues lost\n");
> +		nvme_mark_namespaces_dead(&dev->ctrl);
> +		nvme_start_queues(&dev->ctrl);
> +		nvme_remove_namespaces(&dev->ctrl);

Is this needed? isn't nvme_remove coming soon?
In fact, shouldn't all these calls be in nvme_remove?

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 12/12] nvme-pci: don't unbind the driver on reset failure
  2022-11-08 15:02   ` Christoph Hellwig
@ 2022-11-09  3:15     ` Sagi Grimberg
  -1 siblings, 0 replies; 80+ messages in thread
From: Sagi Grimberg @ 2022-11-09  3:15 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch
  Cc: Chaitanya Kulkarni, Gerd Bayer, asahi, linux-arm-kernel, linux-nvme


> Unbinding a device driver when a reset fails is very unusual behavior.
> Just disable the controller and leave it in dead state if we fail to
> reset it.

OK, that is fine by me.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 01/12] nvme-pci: don't call nvme_init_ctrl_finish from nvme_passthru_end
  2022-11-09  2:55     ` Sagi Grimberg
@ 2022-11-09  6:26       ` Christoph Hellwig
  -1 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2022-11-09  6:26 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, Chaitanya Kulkarni, Gerd Bayer,
	asahi, linux-arm-kernel, linux-nvme

[note to self: the subject should say nvme instead of nvme-pci]

On Wed, Nov 09, 2022 at 04:55:47AM +0200, Sagi Grimberg wrote:
>> nvme_passthrough_end can race with a reset, so we should not call
>> nvme_init_ctrl_finish from here.  Instead just log that the controller
>> capabilities might have changed.
>
> Can you explain here what is the problem caused by calling this from
> here?

I'll add it to the next version:

 - stores to the cels xarray can race
 - the new opal initialization can race

and in general it just updates random controller fields without
a reset, which I worry could hit completely untested code.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 04/12] nvme-pci: put the admin queue in nvme_dev_remove_admin
  2022-11-09  2:58     ` Sagi Grimberg
@ 2022-11-09  6:28       ` Christoph Hellwig
  -1 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2022-11-09  6:28 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, Chaitanya Kulkarni, Gerd Bayer,
	asahi, linux-arm-kernel, linux-nvme

On Wed, Nov 09, 2022 at 04:58:46AM +0200, Sagi Grimberg wrote:
>> @@ -2774,8 +2775,6 @@ static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
>>     	nvme_dbbuf_dma_free(dev);
>>   	nvme_free_tagset(dev);
>> -	if (dev->ctrl.admin_q)
>> -		blk_put_queue(dev->ctrl.admin_q);
>
> Is the check here valid? meaning can we get here with
> a NULL admin_q?

Yes, the admin queue is only set in the middle of nvme_reset_work,
and for errors before that we can get here.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 07/12] nvme-pci: factor out a nvme_pci_alloc_ctrl helper
  2022-11-09  3:03     ` Sagi Grimberg
@ 2022-11-09  6:28       ` Christoph Hellwig
  -1 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2022-11-09  6:28 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, Chaitanya Kulkarni, Gerd Bayer,
	asahi, linux-arm-kernel, linux-nvme

On Wed, Nov 09, 2022 at 05:03:19AM +0200, Sagi Grimberg wrote:
>>   -static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id 
>> *id)
>> +static struct nvme_dev *nvme_pci_alloc_ctrl(struct pci_dev *pdev,
>> +		const struct pci_device_id *id)
>
> kinda weird that nvme_pci_alloc_ctrl returns struct nvme_dev. Patch looks
> fine though...

We can call it alloc_dev instead.  But that whole nvme_dev naming is weird,
it really should be nvme_pci_ctrl.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 11/12] nvme-pci: split the initial probe from the rest path
  2022-11-09  3:14     ` Sagi Grimberg
@ 2022-11-09  6:31       ` Christoph Hellwig
  -1 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2022-11-09  6:31 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, Chaitanya Kulkarni, Gerd Bayer,
	asahi, linux-arm-kernel, linux-nvme

On Wed, Nov 09, 2022 at 05:14:02AM +0200, Sagi Grimberg wrote:
>> -		if (dev->online_queues > 1) {
>> -			nvme_pci_alloc_tag_set(dev);
>> -			nvme_dbbuf_set(dev);
>> -		} else {
>> -			dev_warn(dev->ctrl.device, "IO queues not created\n");
>> -		}
>> +		dev_warn(dev->ctrl.device, "IO queues lost\n");
>> +		nvme_mark_namespaces_dead(&dev->ctrl);
>> +		nvme_start_queues(&dev->ctrl);
>> +		nvme_remove_namespaces(&dev->ctrl);
>
> Is this needed? isn't nvme_remove coming soon?
> In fact, shouldn't all these calls be in nvme_remove?

This handles the case where a controller does not have I/O
queues. For controllers that never had them (e.g. admin controllers)
none of the three calls above is needed, but they deal with the
case where a controller had queues, but they are going away.  I'm
not sure if that can happen, but it keeps the behavior of the existing
code.  Keith wrote it to deal with Intel controllers that can be in
a degraded state where having the admin queue live even without I/O
queues allows updating the firmware, so he might know more.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 11/12] nvme-pci: split the initial probe from the rest path
  2022-11-08 15:02   ` Christoph Hellwig
@ 2022-11-09 15:18     ` Gerd Bayer
  -1 siblings, 0 replies; 80+ messages in thread
From: Gerd Bayer @ 2022-11-09 15:18 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch
  Cc: Sagi Grimberg, Chaitanya Kulkarni, asahi, linux-arm-kernel,
	linux-nvme, Niklas Schnelle

Hi Christoph,

On Tue, 2022-11-08 at 16:02 +0100, Christoph Hellwig wrote:
> nvme_reset_work is a little fragile as it needs to handle both resetting
> a live controller and initializing one during probe.  Split out the initial
> probe and open code it in nvme_probe and leave nvme_reset_work to just do
> the live controller reset.
> 
> This fixes a recently introduced bug where nvme_dev_disable causes a NULL
> pointer dereference in blk_mq_quiesce_tagset because the tagset pointer
> is not set when the reset state is entered directly from the new state.
> The separate probe code can skip the reset state and probe directly,
> which fixes this.
> 
> To make sure the system isn't single threaded on enabling nvme
> controllers, set the PROBE_PREFER_ASYNCHRONOUS flag in the device_driver
> structure so that the driver core probes in parallel.
> 
> Fixes: 98d81f0df70c ("nvme: use blk_mq_[un]quiesce_tagset")
> Reported-by: Gerd Bayer <gbayer@linux.ibm.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/nvme/host/pci.c | 139 ++++++++++++++++++++++++----------------
>  1 file changed, 83 insertions(+), 56 deletions(-)

I have successfully tested the patch series as proposed here on top of next-20220908.
The small test script that I used to expose the race condition in my initial bug report
https://lore.kernel.org/linux-nvme/20221108091609.1020-1-hdanton@sina.com/T/#t
no longer reproduced the kernel panic.
Even repeated unbind/bind/remove/rescan cycles worked without any issue.

Out of curiosity I did run a little traffic to the NVMe drive after the repeated
cycles, too. No issues.

While the patch series did apply fine on next-20220909, I was unable to do any
testing with that tree, as it had other severe issues that prevented booting.

So feel free to add my
Tested-by: Gerd Bayer <gbayer@linux.ibm.com> for the whole series

Thank you,
Gerd Bayer



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 11/12] nvme-pci: split the initial probe from the rest path
@ 2022-11-09 15:18     ` Gerd Bayer
  0 siblings, 0 replies; 80+ messages in thread
From: Gerd Bayer @ 2022-11-09 15:18 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch
  Cc: Sagi Grimberg, Chaitanya Kulkarni, asahi, linux-arm-kernel,
	linux-nvme, Niklas Schnelle

Hi Christoph,

On Tue, 2022-11-08 at 16:02 +0100, Christoph Hellwig wrote:
> nvme_reset_work is a little fragile as it needs to handle both resetting
> a live controller and initializing one during probe.  Split out the initial
> probe and open code it in nvme_probe and leave nvme_reset_work to just do
> the live controller reset.
> 
> This fixes a recently introduced bug where nvme_dev_disable causes a NULL
> pointer dereferences in blk_mq_quiesce_tagset because the tagset pointer
> is not set when the reset state is entered directly from the new state.
> The separate probe code can skip the reset state and probe directly and
> fixes this.
> 
> To make sure the system isn't single threaded on enabling nvme
> controllers, set the PROBE_PREFER_ASYNCHRONOUS flag in the device_driver
> structure so that the driver core probes in parallel.
> 
> Fixes: 98d81f0df70c ("nvme: use blk_mq_[un]quiesce_tagset")
> Reported-by: Gerd Bayer <gbayer@linux.ibm.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/nvme/host/pci.c | 139 ++++++++++++++++++++++++----------------
>  1 file changed, 83 insertions(+), 56 deletions(-)

I have successfully tested the patch series as proposed here on top of next-20220908.
The small test script that I used to expose the race condition in my initial bug report
https://lore.kernel.org/linux-nvme/20221108091609.1020-1-hdanton@sina.com/T/#t
no longer reproduced the kernel panic.
Even repeated unbind/bind/remove/rescan cycles worked without any issue.

Out of curiosity I also ran a little traffic to the NVMe drive after the repeated
cycles. No issues.

While the patch series did apply fine on next-20220909, I was unable to do any testing
with it, as that tree had other severe issues that prevented booting.

So feel free to add my
Tested-by: Gerd Bayer <gbayer@linux.ibm.com>
for the whole series.

Thank you,
Gerd Bayer




^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 11/12] nvme-pci: split the initial probe from the rest path
  2022-11-08 15:02   ` Christoph Hellwig
@ 2022-11-09 15:51     ` Keith Busch
  -1 siblings, 0 replies; 80+ messages in thread
From: Keith Busch @ 2022-11-09 15:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Chaitanya Kulkarni, Gerd Bayer, asahi,
	linux-arm-kernel, linux-nvme

On Tue, Nov 08, 2022 at 04:02:51PM +0100, Christoph Hellwig wrote:
> nvme_reset_work is a little fragile as it needs to handle both resetting
> a live controller and initializing one during probe.  Split out the initial
> probe and open code it in nvme_probe and leave nvme_reset_work to just do
> the live controller reset.

By enabling the controller in probe, you are blocking subsequent
controllers from probing in parallel. Some devices take a very long time
to complete enable, so serializing this process may significantly
increase system reset time on machines with even a modest number of NVMe
drives.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 11/12] nvme-pci: split the initial probe from the rest path
  2022-11-08 15:02   ` Christoph Hellwig
@ 2022-11-09 15:56     ` Keith Busch
  -1 siblings, 0 replies; 80+ messages in thread
From: Keith Busch @ 2022-11-09 15:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Chaitanya Kulkarni, Gerd Bayer, asahi,
	linux-arm-kernel, linux-nvme

On Tue, Nov 08, 2022 at 04:02:51PM +0100, Christoph Hellwig wrote:
> To make sure the system isn't single threaded on enabling nvme
> controllers, set the PROBE_PREFER_ASYNCHRONOUS flag in the device_driver
> structure so that the driver core probes in parallel.

Please ignore my previous message; I replied before getting to this
part. :)
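For readers unfamiliar with the flag: a driver opts into parallel probing
through the struct device_driver embedded in its bus-specific driver
definition. A minimal sketch of the idiom, not the actual hunk from this
series:

```c
/* Sketch only: all fields other than the ones shown are elided, and
 * this is not the exact change merged in the series. */
static struct pci_driver nvme_driver = {
	.name	= "nvme",
	/* ... .id_table, .probe, .remove, ... */
	.driver	= {
		/* Ask the driver core to probe devices asynchronously,
		 * so slow controller enables do not serialize boot. */
		.probe_type = PROBE_PREFER_ASYNCHRONOUS,
	},
};
```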

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 11/12] nvme-pci: split the initial probe from the rest path
  2022-11-09  6:31       ` Christoph Hellwig
@ 2022-11-09 17:00         ` Keith Busch
  -1 siblings, 0 replies; 80+ messages in thread
From: Keith Busch @ 2022-11-09 17:00 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Chaitanya Kulkarni, Gerd Bayer, asahi,
	linux-arm-kernel, linux-nvme

On Wed, Nov 09, 2022 at 07:31:19AM +0100, Christoph Hellwig wrote:
> On Wed, Nov 09, 2022 at 05:14:02AM +0200, Sagi Grimberg wrote:
> >> -		if (dev->online_queues > 1) {
> >> -			nvme_pci_alloc_tag_set(dev);
> >> -			nvme_dbbuf_set(dev);
> >> -		} else {
> >> -			dev_warn(dev->ctrl.device, "IO queues not created\n");
> >> -		}
> >> +		dev_warn(dev->ctrl.device, "IO queues lost\n");
> >> +		nvme_mark_namespaces_dead(&dev->ctrl);
> >> +		nvme_start_queues(&dev->ctrl);
> >> +		nvme_remove_namespaces(&dev->ctrl);
> >
> > Is this needed? isn't nvme_remove coming soon?
> > In fact, shouldn't all these calls be in nvme_remove?
> 
> This handles the case where a controller does not have I/O
> queues. For controllers that never had them (e.g. admin controllers)
> none of the three calls above is needed, but they deal with the
> case where a controller had queues, but they are going away.  I'm
> not sure if that can happen, but it keeps the behavior of the existing
> code.  Keith wrote it to deal with Intel controllers that can be in
> a degraded state where having the admin queue live even without I/O
> queues allows updating the firmware, so he might know more.

Right, firmware assert and other types of errors can put the controller
into a degraded state with only an admin-queue when it previously had
working IO queues. Keeping the admin queue active allows an admin to
pull logs for their bug reports.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 12/12] nvme-pci: don't unbind the driver on reset failure
  2022-11-08 15:02   ` Christoph Hellwig
@ 2022-11-09 17:10     ` Keith Busch
  -1 siblings, 0 replies; 80+ messages in thread
From: Keith Busch @ 2022-11-09 17:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Chaitanya Kulkarni, Gerd Bayer, asahi,
	linux-arm-kernel, linux-nvme

On Tue, Nov 08, 2022 at 04:02:52PM +0100, Christoph Hellwig wrote:
> -static void nvme_remove_dead_ctrl_work(struct work_struct *work)
> -{
> -	struct nvme_dev *dev = container_of(work, struct nvme_dev, remove_work);
> -	struct pci_dev *pdev = to_pci_dev(dev->dev);
> -
> -	if (pci_get_drvdata(pdev))
> -		device_release_driver(&pdev->dev);
> -	nvme_put_ctrl(&dev->ctrl);
> +	/*
> +	 * Set state to deleting now to avoid blocking nvme_wait_reset(), which
> +	 * may be holding this pci_dev's device lock.
> +	 */
> +	dev_warn(dev->ctrl.device, "Disabling device after reset failure: %d\n",
> +		 result);
> +	nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
> +	nvme_dev_disable(dev, false);

I think the shutdown can be set to 'true' now that you're not
immediately unbinding the controller. That will unblock any queued IO
that would have been flushed out during the unbind.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: RFC: nvme-pci: split the probe and reset handlers
  2022-11-08 15:02 ` Christoph Hellwig
@ 2022-11-09 17:12   ` Keith Busch
  -1 siblings, 0 replies; 80+ messages in thread
From: Keith Busch @ 2022-11-09 17:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Chaitanya Kulkarni, Gerd Bayer, asahi,
	linux-arm-kernel, linux-nvme

This was easier to review than I initially thought. This series looks
great! I just had the one comment on patch 12, but I think it's probably
fine as-is.

Reviewed-by: Keith Busch <kbusch@kernel.org>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 02/12] nvme: move OPAL setup from PCIe to core
  2022-11-08 15:02   ` Christoph Hellwig
@ 2022-11-09 20:44     ` Keith Busch
  -1 siblings, 0 replies; 80+ messages in thread
From: Keith Busch @ 2022-11-09 20:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Chaitanya Kulkarni, Gerd Bayer, asahi,
	linux-arm-kernel, linux-nvme

On Tue, Nov 08, 2022 at 04:02:42PM +0100, Christoph Hellwig wrote:
> -#ifdef CONFIG_BLK_SED_OPAL
> -int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len,
> +static int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len,
>  		bool send)
>  {
>  	struct nvme_ctrl *ctrl = data;
> @@ -2198,8 +2197,21 @@ int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len,
>  	return __nvme_submit_sync_cmd(ctrl->admin_q, &cmd, NULL, buffer, len,
>  			NVME_QID_ANY, 1, 0);
>  }
> -EXPORT_SYMBOL_GPL(nvme_sec_submit);
> -#endif /* CONFIG_BLK_SED_OPAL */

It looks like you need to keep the #ifdef. The compiler knows it's not
used without it:

  drivers/nvme/host/core.c:2183:12: warning: 'nvme_sec_submit' defined but not used [-Wunused-function]
   static int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len,

> +static void nvme_configure_opal(struct nvme_ctrl *ctrl, bool was_suspended)
> +{
> +	if (!IS_ENABLED(CONFIG_BLK_SED_OPAL))
> +		return;
> +	if (ctrl->oacs & NVME_CTRL_OACS_SEC_SUPP) {
> +		if (!ctrl->opal_dev)
> +			ctrl->opal_dev = init_opal_dev(ctrl, &nvme_sec_submit);
> +		else if (was_suspended)
> +			opal_unlock_from_suspend(ctrl->opal_dev);
> +	} else {
> +		free_opal_dev(ctrl->opal_dev);
> +		ctrl->opal_dev = NULL;
> +	}
> +}

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 02/12] nvme: move OPAL setup from PCIe to core
  2022-11-09 20:44     ` Keith Busch
@ 2022-11-09 23:22       ` Chaitanya Kulkarni
  -1 siblings, 0 replies; 80+ messages in thread
From: Chaitanya Kulkarni @ 2022-11-09 23:22 UTC (permalink / raw)
  To: Keith Busch, Christoph Hellwig
  Cc: Sagi Grimberg, Chaitanya Kulkarni, Gerd Bayer, asahi,
	linux-arm-kernel, linux-nvme

On 11/9/22 12:44, Keith Busch wrote:
> On Tue, Nov 08, 2022 at 04:02:42PM +0100, Christoph Hellwig wrote:
>> -#ifdef CONFIG_BLK_SED_OPAL
>> -int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len,
>> +static int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len,
>>   		bool send)
>>   {
>>   	struct nvme_ctrl *ctrl = data;
>> @@ -2198,8 +2197,21 @@ int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len,
>>   	return __nvme_submit_sync_cmd(ctrl->admin_q, &cmd, NULL, buffer, len,
>>   			NVME_QID_ANY, 1, 0);
>>   }
>> -EXPORT_SYMBOL_GPL(nvme_sec_submit);
>> -#endif /* CONFIG_BLK_SED_OPAL */
> 
> It looks like you need to keep the #ifdef. The compiler knows it's not
> used without it:
> 
>    drivers/nvme/host/core.c:2183:12: warning: 'nvme_sec_submit' defined but not used [-Wunused-function]
>     static int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len,
> 

why not move the sed-opal-code into its own file ?
and remove that #ifdefs altogether and conditionally compile the
file like zns?

-ck
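For reference, the idiom ck is suggesting mirrors how zns.c is wired up
in drivers/nvme/host/Makefile: a hypothetical opal.c would be compiled
only when the option is set (the opal.o file name is assumed here, not
part of the series):

```
# drivers/nvme/host/Makefile (sketch; opal.o is a hypothetical file)
nvme-core-y				+= core.o ioctl.o
nvme-core-$(CONFIG_NVME_MULTIPATH)	+= multipath.o
nvme-core-$(CONFIG_BLK_DEV_ZONED)	+= zns.o
nvme-core-$(CONFIG_BLK_SED_OPAL)	+= opal.o
```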


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 11/12] nvme-pci: split the initial probe from the rest path
  2022-11-08 15:02   ` Christoph Hellwig
@ 2022-11-10  3:17     ` Chao Leng
  -1 siblings, 0 replies; 80+ messages in thread
From: Chao Leng @ 2022-11-10  3:17 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch
  Cc: Sagi Grimberg, Chaitanya Kulkarni, Gerd Bayer, asahi,
	linux-arm-kernel, linux-nvme



On 2022/11/8 23:02, Christoph Hellwig wrote:
> nvme_reset_work is a little fragile as it needs to handle both resetting
> a live controller and initializing one during probe.  Split out the initial
> probe and open code it in nvme_probe and leave nvme_reset_work to just do
> the live controller reset.
> 
> This fixes a recently introduced bug where nvme_dev_disable causes a NULL
> pointer dereferences in blk_mq_quiesce_tagset because the tagset pointer
> is not set when the reset state is entered directly from the new state.
> The separate probe code can skip the reset state and probe directly and
> fixes this.
> 
> To make sure the system isn't single threaded on enabling nvme
> controllers, set the PROBE_PREFER_ASYNCHRONOUS flag in the device_driver
> structure so that the driver core probes in parallel.
> 
> Fixes: 98d81f0df70c ("nvme: use blk_mq_[un]quiesce_tagset")
> Reported-by: Gerd Bayer <gbayer@linux.ibm.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/pci.c | 139 ++++++++++++++++++++++++----------------
>   1 file changed, 83 insertions(+), 56 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 1b3c96a4b7c90..1c8c70767cb8a 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2821,15 +2821,7 @@ static void nvme_reset_work(struct work_struct *work)
>   	result = nvme_pci_enable(dev);
>   	if (result)
>   		goto out_unlock;
> -
> -	if (!dev->ctrl.admin_q) {
> -		result = nvme_pci_alloc_admin_tag_set(dev);
> -		if (result)
> -			goto out_unlock;
> -	} else {
> -		nvme_start_admin_queue(&dev->ctrl);
> -	}
> -
> +	nvme_start_admin_queue(&dev->ctrl);
>   	mutex_unlock(&dev->shutdown_lock);
>   
>   	/*
> @@ -2854,9 +2846,6 @@ static void nvme_reset_work(struct work_struct *work)
>   		 */
>   		memset(dev->dbbuf_dbs, 0, nvme_dbbuf_size(dev));
>   		memset(dev->dbbuf_eis, 0, nvme_dbbuf_size(dev));
> -	} else {
> -		if (dev->ctrl.oacs & NVME_CTRL_OACS_DBBUF_SUPP)
> -			nvme_dbbuf_dma_alloc(dev);
>   	}
>   
>   	if (dev->ctrl.hmpre) {
> @@ -2869,37 +2858,23 @@ static void nvme_reset_work(struct work_struct *work)
>   	if (result)
>   		goto out;
>   
> -	if (dev->ctrl.tagset) {
> -		/*
> -		 * This is a controller reset and we already have a tagset.
> -		 * Freeze and update the number of I/O queues as thos might have
> -		 * changed.  If there are no I/O queues left after this reset,
> -		 * keep the controller around but remove all namespaces.
> -		 */
> -		if (dev->online_queues > 1) {
> -			nvme_start_queues(&dev->ctrl);
> -			nvme_wait_freeze(&dev->ctrl);
> -			nvme_pci_update_nr_queues(dev);
> -			nvme_dbbuf_set(dev);
> -			nvme_unfreeze(&dev->ctrl);
> -		} else {
> -			dev_warn(dev->ctrl.device, "IO queues lost\n");
> -			nvme_mark_namespaces_dead(&dev->ctrl);
> -			nvme_start_queues(&dev->ctrl);
> -			nvme_remove_namespaces(&dev->ctrl);
> -			nvme_free_tagset(dev);
> -		}
> +	/*
> +	 * Freeze and update the number of I/O queues as thos might have
> +	 * changed.  If there are no I/O queues left after this reset, keep the
> +	 * controller around but remove all namespaces.
> +	 */
> +	if (dev->online_queues > 1) {
> +		nvme_start_queues(&dev->ctrl);
> +		nvme_wait_freeze(&dev->ctrl);
> +		nvme_pci_update_nr_queues(dev);
> +		nvme_dbbuf_set(dev);
> +		nvme_unfreeze(&dev->ctrl);
>   	} else {
> -		/*
> -		 * First probe.  Still allow the controller to show up even if
> -		 * there are no namespaces.
> -		 */
> -		if (dev->online_queues > 1) {
> -			nvme_pci_alloc_tag_set(dev);
> -			nvme_dbbuf_set(dev);
> -		} else {
> -			dev_warn(dev->ctrl.device, "IO queues not created\n");
> -		}
> +		dev_warn(dev->ctrl.device, "IO queues lost\n");
> +		nvme_mark_namespaces_dead(&dev->ctrl);
> +		nvme_start_queues(&dev->ctrl);
> +		nvme_remove_namespaces(&dev->ctrl);
> +		nvme_free_tagset(dev);
nvme_free_tagset is not necessary when the IO queues are lost;
it can be called from nvme_pci_free_ctrl instead.
If we call nvme_free_tagset here, nvme_dev_disable will still cause a NULL pointer
dereference in blk_mq_quiesce_tagset.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 02/12] nvme: move OPAL setup from PCIe to core
  2022-11-09 23:22       ` Chaitanya Kulkarni
@ 2022-11-13 16:15         ` Christoph Hellwig
  -1 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2022-11-13 16:15 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Keith Busch, Christoph Hellwig, Sagi Grimberg, Gerd Bayer, asahi,
	linux-arm-kernel, linux-nvme

On Wed, Nov 09, 2022 at 11:22:27PM +0000, Chaitanya Kulkarni wrote:
> why not move the sed-opal-code into its own file ?

It's literally just two functions.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 11/12] nvme-pci: split the initial probe from the rest path
  2022-11-10  3:17     ` Chao Leng
@ 2022-11-13 16:19       ` Christoph Hellwig
  -1 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2022-11-13 16:19 UTC (permalink / raw)
  To: Chao Leng
  Cc: Christoph Hellwig, Keith Busch, Sagi Grimberg,
	Chaitanya Kulkarni, Gerd Bayer, asahi, linux-arm-kernel,
	linux-nvme

On Thu, Nov 10, 2022 at 11:17:22AM +0800, Chao Leng wrote:
> nvme_free_tagset is not necessary when IO queues lost.
> nvme_free_tagset can be called in nvme_pci_free_ctrl.

Yes, it could, but that isn't really required here.

> If we call nvme_free_tagset here, nvme_dev_disable will still cause a NULL pointer references
> in blk_mq_quiesce_tagset.

True.  I have another series as a follow up to sort out some more
of the tagset mess.

^ permalink raw reply	[flat|nested] 80+ messages in thread

end of thread, other threads:[~2022-11-13 16:20 UTC | newest]

Thread overview: 80+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-08 15:02 RFC: nvme-pci: split the probe and reset handlers Christoph Hellwig
2022-11-08 15:02 ` [PATCH 01/12] nvme-pci: don't call nvme_init_ctrl_finish from nvme_passthru_end Christoph Hellwig
2022-11-09  2:55   ` Sagi Grimberg
2022-11-09  6:26     ` Christoph Hellwig
2022-11-08 15:02 ` [PATCH 02/12] nvme: move OPAL setup from PCIe to core Christoph Hellwig
2022-11-09  2:55   ` Sagi Grimberg
2022-11-09 20:44   ` Keith Busch
2022-11-09 23:22     ` Chaitanya Kulkarni
2022-11-13 16:15       ` Christoph Hellwig
2022-11-08 15:02 ` [PATCH 03/12] nvme: simplify transport specific device attribute handling Christoph Hellwig
2022-11-09  2:57   ` Sagi Grimberg
2022-11-08 15:02 ` [PATCH 04/12] nvme-pci: put the admin queue in nvme_dev_remove_admin Christoph Hellwig
2022-11-09  2:58   ` Sagi Grimberg
2022-11-09  6:28     ` Christoph Hellwig
2022-11-08 15:02 ` [PATCH 05/12] nvme-pci: move more teardown work to nvme_remove Christoph Hellwig
2022-11-09  3:00   ` Sagi Grimberg
2022-11-08 15:02 ` [PATCH 06/12] nvme-pci: factor the iod mempool creation into a helper Christoph Hellwig
2022-11-09  3:00   ` Sagi Grimberg
2022-11-08 15:02 ` [PATCH 07/12] nvme-pci: factor out a nvme_pci_alloc_ctrl helper Christoph Hellwig
2022-11-09  3:03   ` Sagi Grimberg
2022-11-09  6:28     ` Christoph Hellwig
2022-11-08 15:02 ` [PATCH 08/12] nvme-pci: set constant paramters in nvme_pci_alloc_ctrl Christoph Hellwig
2022-11-09  3:03   ` Sagi Grimberg
2022-11-08 15:02 ` [PATCH 09/12] nvme-pci: call nvme_pci_configure_admin_queue from nvme_pci_enable Christoph Hellwig
2022-11-09  3:04   ` Sagi Grimberg
2022-11-08 15:02 ` [PATCH 10/12] nvme-pci: split nvme_dbbuf_dma_alloc Christoph Hellwig
2022-11-09  3:05   ` Sagi Grimberg
2022-11-08 15:02 ` [PATCH 11/12] nvme-pci: split the initial probe from the rest path Christoph Hellwig
2022-11-09  3:14   ` Sagi Grimberg
2022-11-09  6:31     ` Christoph Hellwig
2022-11-09 17:00       ` Keith Busch
2022-11-09 15:18   ` Gerd Bayer
2022-11-09 15:51   ` Keith Busch
2022-11-09 15:56   ` Keith Busch
2022-11-10  3:17   ` Chao Leng
2022-11-13 16:19     ` Christoph Hellwig
2022-11-08 15:02 ` [PATCH 12/12] nvme-pci: don't unbind the driver on reset failure Christoph Hellwig
2022-11-09  3:15   ` Sagi Grimberg
2022-11-09 17:10   ` Keith Busch
2022-11-09 17:12 ` RFC: nvme-pci: split the probe and reset handlers Keith Busch
