* Make NVME shutdown two-pass - Version 3
@ 2023-12-21 17:22 Jeremy Allison
  2023-12-21 17:22 ` [PATCH 1/4] driver core: Support two-pass driver shutdown Jeremy Allison
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Jeremy Allison @ 2023-12-21 17:22 UTC (permalink / raw)
  To: jallison, jra, tansuresh, hch, gregkh, rafael, bhelgaas; +Cc: linux-nvme

This is version 3 of a patchset originally written by
Tanjore Suresh <tansuresh@google.com> to make shutdown
of nvme devices two-pass.

Changes from version 2:

1). As requested by Christoph Hellwig, don't add shutdown_pre()
and shutdown_post() methods, only add a shutdown_wait() method,
to avoid cluttering the interfaces. Two-pass shutdown
is only done if both the shutdown() and shutdown_wait() methods
are defined.

2). As requested by Christoph Hellwig, split out the addition
of the 'enum shutdown_type' into a separate patch to make the
subsequent changes clearer.

3). Use 'two-pass' instead of async in the interface
documentation. This isn't truly async as there's no
notification on completion.

-------------------------------------------------------------
Currently the Linux nvme driver shutdown code steps
through each connected drive in turn: it sets the
NVME_CC_SHN_NORMAL (normal shutdown) flag and then polls
that drive until it reports the NVME_CSTS_SHST_CMPLT flag
(shutdown complete).

Each drive takes around 13 seconds to respond to this.

The customer has 20+ drives in the box, so this time adds
up every time the nvme driver is shut down.

This patchset changes shutdown to proceed in parallel:
the NVME_CC_SHN_NORMAL (normal shutdown) flag is sent
to all drives first, and only then does the driver poll
each drive for the NVME_CSTS_SHST_CMPLT flag (shutdown
complete), so the waits overlap.

In the specific customer case it reduces the NVME
shutdown time from over 300 seconds to around 15
seconds.
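
In outline (illustrative pseudocode only, names simplified; not the
actual patch code):

    /* Pass 1: kick off shutdown on every device, don't wait. */
    list_for_each_entry(dev, &devices, entry)
            dev->bus->shutdown(dev);        /* sets NVME_CC_SHN_NORMAL, returns */

    /* Pass 2: wait for each device in turn; the 13-second waits overlap. */
    list_for_each_entry(dev, &devices, entry) {
            if (dev->bus->shutdown_wait)
                    dev->bus->shutdown_wait(dev);   /* polls NVME_CSTS_SHST_CMPLT */
    }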
-------------------------------------------------------------

Thanks for your consideration,

Jeremy Allison.
CIQ / Samba Team.





* [PATCH 1/4] driver core: Support two-pass driver shutdown
  2023-12-21 17:22 Make NVME shutdown two-pass - Version 3 Jeremy Allison
@ 2023-12-21 17:22 ` Jeremy Allison
  2023-12-21 17:29   ` Greg KH
  2023-12-27 20:33   ` Bjorn Helgaas
  2023-12-21 17:22 ` [PATCH 2/4] PCI: Support two-pass shutdown Jeremy Allison
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 22+ messages in thread
From: Jeremy Allison @ 2023-12-21 17:22 UTC (permalink / raw)
  To: jallison, jra, tansuresh, hch, gregkh, rafael, bhelgaas; +Cc: linux-nvme

From: Tanjore Suresh <tansuresh@google.com>

This changes the bus driver interface with an additional entry point
to enable devices to implement two-pass shutdown. The existing
synchronous interface to shutdown is called, and if a shutdown_wait
method is defined the device is moved to an alternate list.

Once the shutdown method is called for all devices, the
shutdown_wait method is then called synchronously for
all devices on the alternate list.

Signed-off-by: Tanjore Suresh <tansuresh@google.com>
Signed-off-by: Jeremy Allison <jallison@ciq.com>
---
 drivers/base/core.c        | 35 +++++++++++++++++++++++++++++++++++
 include/linux/device/bus.h |  6 +++++-
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 67ba592afc77..a66988cbaad1 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -4725,6 +4725,7 @@ EXPORT_SYMBOL_GPL(device_change_owner);
 void device_shutdown(void)
 {
 	struct device *dev, *parent;
+	LIST_HEAD(shutdown_wait_list);
 
 	wait_for_device_probe();
 	device_block_probing();
@@ -4769,10 +4770,15 @@ void device_shutdown(void)
 				dev_info(dev, "shutdown_pre\n");
 			dev->class->shutdown_pre(dev);
 		}
+
 		if (dev->bus && dev->bus->shutdown) {
 			if (initcall_debug)
 				dev_info(dev, "shutdown\n");
 			dev->bus->shutdown(dev);
+			/* Only put the device on the shutdown_wait_list
+			 * if a shutdown_wait() method is also defined. */
+			if (dev->bus->shutdown_wait)
+				list_add(&dev->kobj.entry, &shutdown_wait_list);
 		} else if (dev->driver && dev->driver->shutdown) {
 			if (initcall_debug)
 				dev_info(dev, "shutdown\n");
@@ -4789,6 +4795,35 @@ void device_shutdown(void)
 		spin_lock(&devices_kset->list_lock);
 	}
 	spin_unlock(&devices_kset->list_lock);
+
+	/*
+	 * Second pass only for devices that have configured
+	 * a shutdown_wait() method.
+	 */
+	while (!list_empty(&shutdown_wait_list)) {
+		dev = list_entry(shutdown_wait_list.next, struct device,
+				kobj.entry);
+		parent = get_device(dev->parent);
+		get_device(dev);
+		/*
+		 * Make sure the device is off the  list
+		 */
+		list_del_init(&dev->kobj.entry);
+		if (parent)
+			device_lock(parent);
+		device_lock(dev);
+		if (dev->bus && dev->bus->shutdown_wait) {
+			if (initcall_debug)
+				dev_info(dev,
+				"shutdown_wait called\n");
+			dev->bus->shutdown_wait(dev);
+		}
+		device_unlock(dev);
+		if (parent)
+			device_unlock(parent);
+		put_device(dev);
+		put_device(parent);
+	}
 }
 
 /*
diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h
index ae10c4322754..6f192221b551 100644
--- a/include/linux/device/bus.h
+++ b/include/linux/device/bus.h
@@ -48,7 +48,10 @@ struct fwnode_handle;
  *		will never get called until they do.
  * @remove:	Called when a device removed from this bus.
  * @shutdown:	Called at shut-down time to quiesce the device.
- *
+ * @shutdown_wait:	If this method exists, devices are stored on a separate
+ * 			list after shutdown() has been called and
+ * 			shutdown_wait() is called synchronously on each device
+ * 			in this list in turn.
  * @online:	Called to put the device back online (after offlining it).
  * @offline:	Called to put the device offline for hot-removal. May fail.
  *
@@ -90,6 +93,7 @@ struct bus_type {
 	void (*sync_state)(struct device *dev);
 	void (*remove)(struct device *dev);
 	void (*shutdown)(struct device *dev);
+	void (*shutdown_wait)(struct device *dev);
 
 	int (*online)(struct device *dev);
 	int (*offline)(struct device *dev);
-- 
2.39.3




* [PATCH 2/4] PCI: Support two-pass shutdown
  2023-12-21 17:22 Make NVME shutdown two-pass - Version 3 Jeremy Allison
  2023-12-21 17:22 ` [PATCH 1/4] driver core: Support two-pass driver shutdown Jeremy Allison
@ 2023-12-21 17:22 ` Jeremy Allison
  2023-12-21 17:22 ` [PATCH 3/4] Change 'bool shutdown' into an enum shutdown_type { NVME_DISABLE_RESET = 0, NVME_DISABLE_SHUTDOWN_SYNC = 1 } Jeremy Allison
  2023-12-21 17:22 ` [PATCH 4/4] nvme: Add two-pass shutdown support Jeremy Allison
  3 siblings, 0 replies; 22+ messages in thread
From: Jeremy Allison @ 2023-12-21 17:22 UTC (permalink / raw)
  To: jallison, jra, tansuresh, hch, gregkh, rafael, bhelgaas; +Cc: linux-nvme

From: Tanjore Suresh <tansuresh@google.com>

Enhance the PCI bus driver to support two-pass shutdown by
adding a shutdown_wait() method.

Assume a device takes n secs to shut down. If a machine is
populated with M such devices, the total time spent shutting down
all the devices will be M * n secs if the shutdown is done
synchronously. For example, if NVMe PCI controllers take 5 secs
to shut down and there are 16 such NVMe controllers in a system,
the system will spend a total of 80 secs shutting down all
NVMe devices in that system.

In order to speed up the shutdown time, a two-pass interface to
shutdown has been implemented. The caller calls the shutdown method
for each device in turn to allow a shutdown request to be sent,
then the caller walks the list of devices and calls shutdown_wait()
to synchronously wait for each shutdown to complete.

In the NVMe case above, all 16 devices will process the shutdown
in parallel, reducing the total shutdown time to roughly that of
a single NVMe PCI controller.

This will significantly reduce the machine reboot time.
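
For a PCI driver, opting in looks roughly like this (hypothetical
example driver, not part of this patch; the foo_* helpers are made up):

    static void foo_shutdown(struct pci_dev *pdev)
    {
            /* First pass: issue the (slow) shutdown request and return. */
            foo_send_shutdown_cmd(pci_get_drvdata(pdev));
    }

    static void foo_shutdown_wait(struct pci_dev *pdev)
    {
            /* Second pass: poll until the hardware reports completion. */
            foo_poll_shutdown_complete(pci_get_drvdata(pdev));
    }

    static struct pci_driver foo_driver = {
            .name           = "foo",
            .shutdown       = foo_shutdown,
            .shutdown_wait  = foo_shutdown_wait,    /* new optional method */
    };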

Signed-off-by: Tanjore Suresh <tansuresh@google.com>
Signed-off-by: Jeremy Allison <jallison@ciq.com>
---
 drivers/pci/pci-driver.c | 9 +++++++++
 include/linux/pci.h      | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 51ec9e7e784f..257bbb04c806 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -547,6 +547,14 @@ static int pci_restore_standard_config(struct pci_dev *pci_dev)
 }
 #endif /* CONFIG_PM_SLEEP */
 
+static void pci_device_shutdown_wait(struct device *dev)
+{
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+	struct pci_driver *drv = pci_dev->driver;
+
+	if (drv && drv->shutdown_wait)
+		drv->shutdown_wait(pci_dev);
+}
 #ifdef CONFIG_PM
 
 /* Auxiliary functions used for system resume and run-time resume */
@@ -1682,6 +1690,7 @@ struct bus_type pci_bus_type = {
 	.probe		= pci_device_probe,
 	.remove		= pci_device_remove,
 	.shutdown	= pci_device_shutdown,
+	.shutdown_wait	= pci_device_shutdown_wait,
 	.dev_groups	= pci_dev_groups,
 	.bus_groups	= pci_bus_groups,
 	.drv_groups	= pci_drv_groups,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 60ca768bc867..e3ba7043c7de 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -917,6 +917,7 @@ struct module;
  *		Useful for enabling wake-on-lan (NIC) or changing
  *		the power state of a device before reboot.
  *		e.g. drivers/net/e100.c.
+ * @shutdown_wait: Optional driver callback to allow two-pass shutdown.
  * @sriov_configure: Optional driver callback to allow configuration of
  *		number of VFs to enable via sysfs "sriov_numvfs" file.
  * @sriov_set_msix_vec_count: PF Driver callback to change number of MSI-X
@@ -948,6 +949,7 @@ struct pci_driver {
 	int  (*suspend)(struct pci_dev *dev, pm_message_t state);	/* Device suspended */
 	int  (*resume)(struct pci_dev *dev);	/* Device woken up */
 	void (*shutdown)(struct pci_dev *dev);
+	void (*shutdown_wait)(struct pci_dev *dev);
 	int  (*sriov_configure)(struct pci_dev *dev, int num_vfs); /* On PF */
 	int  (*sriov_set_msix_vec_count)(struct pci_dev *vf, int msix_vec_count); /* On PF */
 	u32  (*sriov_get_vf_total_msix)(struct pci_dev *pf);
-- 
2.39.3




* [PATCH 3/4] Change 'bool shutdown' into an enum shutdown_type { NVME_DISABLE_RESET = 0, NVME_DISABLE_SHUTDOWN_SYNC = 1 }.
  2023-12-21 17:22 Make NVME shutdown two-pass - Version 3 Jeremy Allison
  2023-12-21 17:22 ` [PATCH 1/4] driver core: Support two-pass driver shutdown Jeremy Allison
  2023-12-21 17:22 ` [PATCH 2/4] PCI: Support two-pass shutdown Jeremy Allison
@ 2023-12-21 17:22 ` Jeremy Allison
  2023-12-27 19:33   ` Bjorn Helgaas
  2023-12-21 17:22 ` [PATCH 4/4] nvme: Add two-pass shutdown support Jeremy Allison
  3 siblings, 1 reply; 22+ messages in thread
From: Jeremy Allison @ 2023-12-21 17:22 UTC (permalink / raw)
  To: jallison, jra, tansuresh, hch, gregkh, rafael, bhelgaas; +Cc: linux-nvme

Convert nvme_disable_ctrl() and nvme_dev_disable()
inside drivers/nvme/host/pci.c to use this

bool shutdown = false == NVME_DISABLE_RESET
bool shutdown = true == NVME_DISABLE_SHUTDOWN_SYNC.

This will make it easier to add a third request, NVME_DISABLE_SHUTDOWN_ASYNC,
later.

As nvme_disable_ctrl() is used outside of drivers/nvme/host/pci.c,
convert the callers of nvme_disable_ctrl() to this convention too.

Signed-off-by: Jeremy Allison <jallison@ciq.com>
---
 drivers/nvme/host/apple.c  |  4 ++--
 drivers/nvme/host/core.c   |  6 +++---
 drivers/nvme/host/nvme.h   |  7 +++++-
 drivers/nvme/host/pci.c    | 44 +++++++++++++++++++-------------------
 drivers/nvme/host/rdma.c   |  3 ++-
 drivers/nvme/host/tcp.c    |  3 ++-
 drivers/nvme/target/loop.c |  2 +-
 7 files changed, 38 insertions(+), 31 deletions(-)

diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
index 596bb11eeba5..764639ede41d 100644
--- a/drivers/nvme/host/apple.c
+++ b/drivers/nvme/host/apple.c
@@ -844,8 +844,8 @@ static void apple_nvme_disable(struct apple_nvme *anv, bool shutdown)
 		 * NVMe disabled state, after a clean shutdown).
 		 */
 		if (shutdown)
-			nvme_disable_ctrl(&anv->ctrl, shutdown);
-		nvme_disable_ctrl(&anv->ctrl, false);
+			nvme_disable_ctrl(&anv->ctrl, NVME_DISABLE_SHUTDOWN_SYNC);
+		nvme_disable_ctrl(&anv->ctrl, NVME_DISABLE_RESET);
 	}
 
 	WRITE_ONCE(anv->ioq.enabled, false);
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d144d1acb09a..bc7040da8e74 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2211,12 +2211,12 @@ static int nvme_wait_ready(struct nvme_ctrl *ctrl, u32 mask, u32 val,
 	return ret;
 }
 
-int nvme_disable_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
+int nvme_disable_ctrl(struct nvme_ctrl *ctrl, enum shutdown_type shutdown_type)
 {
 	int ret;
 
 	ctrl->ctrl_config &= ~NVME_CC_SHN_MASK;
-	if (shutdown)
+	if (shutdown_type == NVME_DISABLE_SHUTDOWN_SYNC)
 		ctrl->ctrl_config |= NVME_CC_SHN_NORMAL;
 	else
 		ctrl->ctrl_config &= ~NVME_CC_ENABLE;
@@ -2225,7 +2225,7 @@ int nvme_disable_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
 	if (ret)
 		return ret;
 
-	if (shutdown) {
+	if (shutdown_type == NVME_DISABLE_SHUTDOWN_SYNC) {
 		return nvme_wait_ready(ctrl, NVME_CSTS_SHST_MASK,
 				       NVME_CSTS_SHST_CMPLT,
 				       ctrl->shutdown_timeout, "shutdown");
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 3dbd187896d8..d880f1ee08d4 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -187,6 +187,11 @@ enum {
 	NVME_MPATH_IO_STATS		= (1 << 2),
 };
 
+enum shutdown_type {
+	NVME_DISABLE_RESET = 0,
+	NVME_DISABLE_SHUTDOWN_SYNC = 1
+};
+
 static inline struct nvme_request *nvme_req(struct request *req)
 {
 	return blk_mq_rq_to_pdu(req);
@@ -750,7 +755,7 @@ void nvme_cancel_tagset(struct nvme_ctrl *ctrl);
 void nvme_cancel_admin_tagset(struct nvme_ctrl *ctrl);
 bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 		enum nvme_ctrl_state new_state);
-int nvme_disable_ctrl(struct nvme_ctrl *ctrl, bool shutdown);
+int nvme_disable_ctrl(struct nvme_ctrl *ctrl, enum shutdown_type shutdown_type);
 int nvme_enable_ctrl(struct nvme_ctrl *ctrl);
 int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 		const struct nvme_ctrl_ops *ops, unsigned long quirks);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 507bc149046d..77b015affb0b 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -108,7 +108,7 @@ MODULE_PARM_DESC(noacpi, "disable acpi bios quirks");
 struct nvme_dev;
 struct nvme_queue;
 
-static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
+static void nvme_dev_disable(struct nvme_dev *dev, enum shutdown_type shutdown_type);
 static void nvme_delete_io_queues(struct nvme_dev *dev);
 static void nvme_update_attrs(struct nvme_dev *dev);
 
@@ -1330,7 +1330,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
 			 "I/O %d QID %d timeout, disable controller\n",
 			 req->tag, nvmeq->qid);
 		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
-		nvme_dev_disable(dev, true);
+		nvme_dev_disable(dev, NVME_DISABLE_SHUTDOWN_SYNC);
 		return BLK_EH_DONE;
 	case NVME_CTRL_RESETTING:
 		return BLK_EH_RESET_TIMER;
@@ -1390,7 +1390,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
 	if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING))
 		return BLK_EH_DONE;
 
-	nvme_dev_disable(dev, false);
+	nvme_dev_disable(dev, NVME_DISABLE_RESET);
 	if (nvme_try_sched_reset(&dev->ctrl))
 		nvme_unquiesce_io_queues(&dev->ctrl);
 	return BLK_EH_DONE;
@@ -1736,7 +1736,7 @@ static int nvme_pci_configure_admin_queue(struct nvme_dev *dev)
 	 * commands to the admin queue ... and we don't know what memory that
 	 * might be pointing at!
 	 */
-	result = nvme_disable_ctrl(&dev->ctrl, false);
+	result = nvme_disable_ctrl(&dev->ctrl, NVME_DISABLE_RESET);
 	if (result < 0)
 		return result;
 
@@ -2571,7 +2571,7 @@ static bool nvme_pci_ctrl_is_dead(struct nvme_dev *dev)
 	return (csts & NVME_CSTS_CFS) || !(csts & NVME_CSTS_RDY);
 }
 
-static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
+static void nvme_dev_disable(struct nvme_dev *dev, enum shutdown_type shutdown_type)
 {
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
 	bool dead;
@@ -2586,7 +2586,7 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 		 * Give the controller a chance to complete all entered requests
 		 * if doing a safe shutdown.
 		 */
-		if (!dead && shutdown)
+		if (!dead && (shutdown_type == NVME_DISABLE_SHUTDOWN_SYNC))
 			nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
 	}
 
@@ -2594,7 +2594,7 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 
 	if (!dead && dev->ctrl.queue_count > 0) {
 		nvme_delete_io_queues(dev);
-		nvme_disable_ctrl(&dev->ctrl, shutdown);
+		nvme_disable_ctrl(&dev->ctrl, shutdown_type);
 		nvme_poll_irqdisable(&dev->queues[0]);
 	}
 	nvme_suspend_io_queues(dev);
@@ -2612,7 +2612,7 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 	 * must flush all entered requests to their failed completion to avoid
 	 * deadlocking blk-mq hot-cpu notifier.
 	 */
-	if (shutdown) {
+	if (shutdown_type == NVME_DISABLE_SHUTDOWN_SYNC) {
 		nvme_unquiesce_io_queues(&dev->ctrl);
 		if (dev->ctrl.admin_q && !blk_queue_dying(dev->ctrl.admin_q))
 			nvme_unquiesce_admin_queue(&dev->ctrl);
@@ -2620,11 +2620,11 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 	mutex_unlock(&dev->shutdown_lock);
 }
 
-static int nvme_disable_prepare_reset(struct nvme_dev *dev, bool shutdown)
+static int nvme_disable_prepare_reset(struct nvme_dev *dev, enum shutdown_type shutdown_type)
 {
 	if (!nvme_wait_reset(&dev->ctrl))
 		return -EBUSY;
-	nvme_dev_disable(dev, shutdown);
+	nvme_dev_disable(dev, shutdown_type);
 	return 0;
 }
 
@@ -2702,7 +2702,7 @@ static void nvme_reset_work(struct work_struct *work)
 	 * moving on.
 	 */
 	if (dev->ctrl.ctrl_config & NVME_CC_ENABLE)
-		nvme_dev_disable(dev, false);
+		nvme_dev_disable(dev, NVME_DISABLE_RESET);
 	nvme_sync_queues(&dev->ctrl);
 
 	mutex_lock(&dev->shutdown_lock);
@@ -2780,7 +2780,7 @@ static void nvme_reset_work(struct work_struct *work)
 	dev_warn(dev->ctrl.device, "Disabling device after reset failure: %d\n",
 		 result);
 	nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
-	nvme_dev_disable(dev, true);
+	nvme_dev_disable(dev, NVME_DISABLE_SHUTDOWN_SYNC);
 	nvme_sync_queues(&dev->ctrl);
 	nvme_mark_namespaces_dead(&dev->ctrl);
 	nvme_unquiesce_io_queues(&dev->ctrl);
@@ -3058,7 +3058,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
 out_disable:
 	nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
-	nvme_dev_disable(dev, true);
+	nvme_dev_disable(dev, NVME_DISABLE_SHUTDOWN_SYNC);
 	nvme_free_host_mem(dev);
 	nvme_dev_remove_admin(dev);
 	nvme_dbbuf_dma_free(dev);
@@ -3084,7 +3084,7 @@ static void nvme_reset_prepare(struct pci_dev *pdev)
 	 * state as pci_dev device lock is held, making it impossible to race
 	 * with ->remove().
 	 */
-	nvme_disable_prepare_reset(dev, false);
+	nvme_disable_prepare_reset(dev, NVME_DISABLE_RESET);
 	nvme_sync_queues(&dev->ctrl);
 }
 
@@ -3100,7 +3100,7 @@ static void nvme_shutdown(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
 
-	nvme_disable_prepare_reset(dev, true);
+	nvme_disable_prepare_reset(dev, NVME_DISABLE_SHUTDOWN_SYNC);
 }
 
 /*
@@ -3117,13 +3117,13 @@ static void nvme_remove(struct pci_dev *pdev)
 
 	if (!pci_device_is_present(pdev)) {
 		nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DEAD);
-		nvme_dev_disable(dev, true);
+		nvme_dev_disable(dev, NVME_DISABLE_SHUTDOWN_SYNC);
 	}
 
 	flush_work(&dev->ctrl.reset_work);
 	nvme_stop_ctrl(&dev->ctrl);
 	nvme_remove_namespaces(&dev->ctrl);
-	nvme_dev_disable(dev, true);
+	nvme_dev_disable(dev, NVME_DISABLE_SHUTDOWN_SYNC);
 	nvme_free_host_mem(dev);
 	nvme_dev_remove_admin(dev);
 	nvme_dbbuf_dma_free(dev);
@@ -3186,7 +3186,7 @@ static int nvme_suspend(struct device *dev)
 	if (pm_suspend_via_firmware() || !ctrl->npss ||
 	    !pcie_aspm_enabled(pdev) ||
 	    (ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND))
-		return nvme_disable_prepare_reset(ndev, true);
+		return nvme_disable_prepare_reset(ndev, NVME_DISABLE_SHUTDOWN_SYNC);
 
 	nvme_start_freeze(ctrl);
 	nvme_wait_freeze(ctrl);
@@ -3229,7 +3229,7 @@ static int nvme_suspend(struct device *dev)
 		 * Clearing npss forces a controller reset on resume. The
 		 * correct value will be rediscovered then.
 		 */
-		ret = nvme_disable_prepare_reset(ndev, true);
+		ret = nvme_disable_prepare_reset(ndev, NVME_DISABLE_SHUTDOWN_SYNC);
 		ctrl->npss = 0;
 	}
 unfreeze:
@@ -3241,7 +3241,7 @@ static int nvme_simple_suspend(struct device *dev)
 {
 	struct nvme_dev *ndev = pci_get_drvdata(to_pci_dev(dev));
 
-	return nvme_disable_prepare_reset(ndev, true);
+	return nvme_disable_prepare_reset(ndev, NVME_DISABLE_SHUTDOWN_SYNC);
 }
 
 static int nvme_simple_resume(struct device *dev)
@@ -3279,10 +3279,10 @@ static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev,
 		dev_warn(dev->ctrl.device,
 			"frozen state error detected, reset controller\n");
 		if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING)) {
-			nvme_dev_disable(dev, true);
+			nvme_dev_disable(dev, NVME_DISABLE_SHUTDOWN_SYNC);
 			return PCI_ERS_RESULT_DISCONNECT;
 		}
-		nvme_dev_disable(dev, false);
+		nvme_dev_disable(dev, NVME_DISABLE_RESET);
 		return PCI_ERS_RESULT_NEED_RESET;
 	case pci_channel_io_perm_failure:
 		dev_warn(dev->ctrl.device,
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index bc90ec3c51b0..b969ab23a55b 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -2136,7 +2136,8 @@ static void nvme_rdma_shutdown_ctrl(struct nvme_rdma_ctrl *ctrl, bool shutdown)
 {
 	nvme_rdma_teardown_io_queues(ctrl, shutdown);
 	nvme_quiesce_admin_queue(&ctrl->ctrl);
-	nvme_disable_ctrl(&ctrl->ctrl, shutdown);
+	nvme_disable_ctrl(&ctrl->ctrl, shutdown ?
+		NVME_DISABLE_SHUTDOWN_SYNC : NVME_DISABLE_RESET);
 	nvme_rdma_teardown_admin_queue(ctrl, shutdown);
 }
 
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index d79811cfa0ce..e3f4541abccd 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2292,7 +2292,8 @@ static void nvme_tcp_teardown_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
 {
 	nvme_tcp_teardown_io_queues(ctrl, shutdown);
 	nvme_quiesce_admin_queue(ctrl);
-	nvme_disable_ctrl(ctrl, shutdown);
+	nvme_disable_ctrl(ctrl, shutdown ?
+		NVME_DISABLE_SHUTDOWN_SYNC : NVME_DISABLE_RESET);
 	nvme_tcp_teardown_admin_queue(ctrl, shutdown);
 }
 
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index 9cb434c58075..6cb6e7c6bdd1 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -401,7 +401,7 @@ static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl)
 
 	nvme_quiesce_admin_queue(&ctrl->ctrl);
 	if (ctrl->ctrl.state == NVME_CTRL_LIVE)
-		nvme_disable_ctrl(&ctrl->ctrl, true);
+		nvme_disable_ctrl(&ctrl->ctrl, NVME_DISABLE_SHUTDOWN_SYNC);
 
 	nvme_cancel_admin_tagset(&ctrl->ctrl);
 	nvme_loop_destroy_admin_queue(ctrl);
-- 
2.39.3




* [PATCH 4/4] nvme: Add two-pass shutdown support
  2023-12-21 17:22 Make NVME shutdown two-pass - Version 3 Jeremy Allison
                   ` (2 preceding siblings ...)
  2023-12-21 17:22 ` [PATCH 3/4] Change 'bool shutdown' into an enum shutdown_type { NVME_DISABLE_RESET = 0, NVME_DISABLE_SHUTDOWN_SYNC = 1 } Jeremy Allison
@ 2023-12-21 17:22 ` Jeremy Allison
  2023-12-25  9:58   ` Sagi Grimberg
  3 siblings, 1 reply; 22+ messages in thread
From: Jeremy Allison @ 2023-12-21 17:22 UTC (permalink / raw)
  To: jallison, jra, tansuresh, hch, gregkh, rafael, bhelgaas; +Cc: linux-nvme

This hooks into the two-pass shutdown mechanism set up for the PCI
drivers by providing the shutdown_wait method at the
pci_driver structure level.

Adds the new NVME_DISABLE_SHUTDOWN_ASYNC value to enum shutdown_type.
Changes the nvme shutdown() method to set the
NVME_CC_SHN_NORMAL bit and then return to the caller when
requested by NVME_DISABLE_SHUTDOWN_ASYNC.

nvme_shutdown_wait() is added to call an internal
nvme_wait_for_shutdown_cmpl() function to synchronously
wait for the device to set the NVME_CSTS_SHST_CMPLT bit.

This change speeds up shutdown in a system which hosts
many controllers.

Signed-off-by: Jeremy Allison <jallison@ciq.com>
Signed-off-by: Tanjore Suresh <tansuresh@google.com>
---
 drivers/nvme/host/core.c | 29 +++++++++++++++++++++++++++--
 drivers/nvme/host/nvme.h |  4 +++-
 drivers/nvme/host/pci.c  | 24 ++++++++++++++++++++++--
 3 files changed, 52 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index bc7040da8e74..2ebcd40106b7 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2216,7 +2216,7 @@ int nvme_disable_ctrl(struct nvme_ctrl *ctrl, enum shutdown_type shutdown_type)
 	int ret;
 
 	ctrl->ctrl_config &= ~NVME_CC_SHN_MASK;
-	if (shutdown_type == NVME_DISABLE_SHUTDOWN_SYNC)
+	if (shutdown_type != NVME_DISABLE_RESET)
 		ctrl->ctrl_config |= NVME_CC_SHN_NORMAL;
 	else
 		ctrl->ctrl_config &= ~NVME_CC_ENABLE;
@@ -2225,10 +2225,24 @@ int nvme_disable_ctrl(struct nvme_ctrl *ctrl, enum shutdown_type shutdown_type)
 	if (ret)
 		return ret;
 
-	if (shutdown_type == NVME_DISABLE_SHUTDOWN_SYNC) {
+	switch (shutdown_type) {
+	case NVME_DISABLE_SHUTDOWN_ASYNC:
+		/*
+		 * nvme_wait_for_shutdown_cmpl() will read the reply for this.
+		*/
+		return ret;
+	case NVME_DISABLE_SHUTDOWN_SYNC:
+		/*
+		 * Spin on the read of the control register.
+		 */
 		return nvme_wait_ready(ctrl, NVME_CSTS_SHST_MASK,
 				       NVME_CSTS_SHST_CMPLT,
 				       ctrl->shutdown_timeout, "shutdown");
+	case NVME_DISABLE_RESET:
+		/*
+		 * Doing a reset here. Handle below.
+		 */
+		break;
 	}
 	if (ctrl->quirks & NVME_QUIRK_DELAY_BEFORE_CHK_RDY)
 		msleep(NVME_QUIRK_DELAY_AMOUNT);
@@ -2237,6 +2251,17 @@ int nvme_disable_ctrl(struct nvme_ctrl *ctrl, enum shutdown_type shutdown_type)
 }
 EXPORT_SYMBOL_GPL(nvme_disable_ctrl);
 
+int nvme_wait_for_shutdown_cmpl(struct nvme_ctrl *ctrl)
+{
+	ctrl->ctrl_config &= ~NVME_CC_SHN_MASK;
+	ctrl->ctrl_config |= NVME_CC_SHN_NORMAL;
+
+	return nvme_wait_ready(ctrl, NVME_CSTS_SHST_MASK,
+			       NVME_CSTS_SHST_CMPLT,
+			       ctrl->shutdown_timeout, "shutdown");
+}
+EXPORT_SYMBOL_GPL(nvme_wait_for_shutdown_cmpl);
+
 int nvme_enable_ctrl(struct nvme_ctrl *ctrl)
 {
 	unsigned dev_page_min;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index d880f1ee08d4..adbff23532de 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -189,7 +189,8 @@ enum {
 
 enum shutdown_type {
 	NVME_DISABLE_RESET = 0,
-	NVME_DISABLE_SHUTDOWN_SYNC = 1
+	NVME_DISABLE_SHUTDOWN_SYNC = 1,
+	NVME_DISABLE_SHUTDOWN_ASYNC = 2
 };
 
 static inline struct nvme_request *nvme_req(struct request *req)
@@ -756,6 +757,7 @@ void nvme_cancel_admin_tagset(struct nvme_ctrl *ctrl);
 bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 		enum nvme_ctrl_state new_state);
 int nvme_disable_ctrl(struct nvme_ctrl *ctrl, enum shutdown_type shutdown_type);
+int nvme_wait_for_shutdown_cmpl(struct nvme_ctrl *ctrl);
 int nvme_enable_ctrl(struct nvme_ctrl *ctrl);
 int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 		const struct nvme_ctrl_ops *ops, unsigned long quirks);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 77b015affb0b..9cb4436710dd 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2586,7 +2586,7 @@ static void nvme_dev_disable(struct nvme_dev *dev, enum shutdown_type shutdown_t
 		 * Give the controller a chance to complete all entered requests
 		 * if doing a safe shutdown.
 		 */
-		if (!dead && (shutdown_type == NVME_DISABLE_SHUTDOWN_SYNC))
+		if (!dead && (shutdown_type != NVME_DISABLE_RESET))
 			nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
 	}
 
@@ -3100,7 +3100,26 @@ static void nvme_shutdown(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
 
-	nvme_disable_prepare_reset(dev, NVME_DISABLE_SHUTDOWN_SYNC);
+	nvme_disable_prepare_reset(dev, NVME_DISABLE_SHUTDOWN_ASYNC);
+}
+
+static void nvme_shutdown_wait(struct pci_dev *pdev)
+{
+	struct nvme_dev *dev = pci_get_drvdata(pdev);
+
+	mutex_lock(&dev->shutdown_lock);
+	nvme_wait_for_shutdown_cmpl(&dev->ctrl);
+
+	/*
+	 * The driver will not be starting up queues again if shutting down so
+	 * must flush all entered requests to their failed completion to avoid
+	 * deadlocking blk-mq hot-cpu notifier.
+	 */
+	nvme_unquiesce_io_queues(&dev->ctrl);
+	if (dev->ctrl.admin_q && !blk_queue_dying(dev->ctrl.admin_q))
+		nvme_unquiesce_admin_queue(&dev->ctrl);
+
+	mutex_unlock(&dev->shutdown_lock);
 }
 
 /*
@@ -3492,6 +3511,7 @@ static struct pci_driver nvme_driver = {
 	.probe		= nvme_probe,
 	.remove		= nvme_remove,
 	.shutdown	= nvme_shutdown,
+	.shutdown_wait  = nvme_shutdown_wait,
 	.driver		= {
 		.probe_type	= PROBE_PREFER_ASYNCHRONOUS,
 #ifdef CONFIG_PM_SLEEP
-- 
2.39.3




* Re: [PATCH 1/4] driver core: Support two-pass driver shutdown
  2023-12-21 17:22 ` [PATCH 1/4] driver core: Support two-pass driver shutdown Jeremy Allison
@ 2023-12-21 17:29   ` Greg KH
  2023-12-27 20:33   ` Bjorn Helgaas
  1 sibling, 0 replies; 22+ messages in thread
From: Greg KH @ 2023-12-21 17:29 UTC (permalink / raw)
  To: Jeremy Allison; +Cc: jra, tansuresh, hch, rafael, bhelgaas, linux-nvme

On Thu, Dec 21, 2023 at 09:22:54AM -0800, Jeremy Allison wrote:
> From: Tanjore Suresh <tansuresh@google.com>
> 
> This changes the bus driver interface with an additional entry point
> to enable devices to implement two-pass shutdown. The existing
> synchronous interface to shutdown is called, and if a shutdown_wait
> method is defined the device is moved to an alternate list.
> 
> Once the shutdown method is called for all devices, the
> shutdown_wait method is then called synchronously for
> all devices on the alternate list.
> 
> Signed-off-by: Tanjore Suresh <tansuresh@google.com>
> Signed-off-by: Jeremy Allison <jallison@ciq.com>

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>



* Re: [PATCH 4/4] nvme: Add two-pass shutdown support
  2023-12-21 17:22 ` [PATCH 4/4] nvme: Add two-pass shutdown support Jeremy Allison
@ 2023-12-25  9:58   ` Sagi Grimberg
  2023-12-27  0:53     ` Jeremy Allison
  0 siblings, 1 reply; 22+ messages in thread
From: Sagi Grimberg @ 2023-12-25  9:58 UTC (permalink / raw)
  To: Jeremy Allison, jra, tansuresh, hch, gregkh, rafael, bhelgaas; +Cc: linux-nvme



On 12/21/23 19:22, Jeremy Allison wrote:
> This hooks into the two-pass shutdown mechanism set up for the PCI
> drivers by providing the shutdown_wait method at the
> pci_driver structure level.
> 
> Adds the new NVME_DISABLE_SHUTDOWN_ASYNC value to enum shutdown_type.
> Changes the nvme shutdown() method to set the
> NVME_CC_SHN_NORMAL bit and then return to the caller when
> requested by NVME_DISABLE_SHUTDOWN_ASYNC.
> 
> nvme_shutdown_wait() is added to call an internal
> nvme_wait_for_shutdown_cmpl() function to synchronously
> wait for the device to set the NVME_CSTS_SHST_CMPLT bit.
> 
> This change speeds up shutdown in a system which hosts
> many controllers.
> 
> Signed-off-by: Jeremy Allison <jallison@ciq.com>
> Signed-off-by: Tanjore Suresh <tansuresh@google.com>
> ---
>   drivers/nvme/host/core.c | 29 +++++++++++++++++++++++++++--
>   drivers/nvme/host/nvme.h |  4 +++-
>   drivers/nvme/host/pci.c  | 24 ++++++++++++++++++++++--
>   3 files changed, 52 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index bc7040da8e74..2ebcd40106b7 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2216,7 +2216,7 @@ int nvme_disable_ctrl(struct nvme_ctrl *ctrl, enum shutdown_type shutdown_type)
>   	int ret;
>   
>   	ctrl->ctrl_config &= ~NVME_CC_SHN_MASK;
> -	if (shutdown_type == NVME_DISABLE_SHUTDOWN_SYNC)
> +	if (shutdown_type != NVME_DISABLE_RESET)
>   		ctrl->ctrl_config |= NVME_CC_SHN_NORMAL;
>   	else
>   		ctrl->ctrl_config &= ~NVME_CC_ENABLE;
> @@ -2225,10 +2225,24 @@ int nvme_disable_ctrl(struct nvme_ctrl *ctrl, enum shutdown_type shutdown_type)
>   	if (ret)
>   		return ret;
>   
> -	if (shutdown_type == NVME_DISABLE_SHUTDOWN_SYNC) {
> +	switch (shutdown_type) {
> +	case NVME_DISABLE_SHUTDOWN_ASYNC:
> +		/*
> +		 * nvme_wait_for_shutdown_cmpl() will read the reply for this.
> +		*/
> +		return ret;
> +	case NVME_DISABLE_SHUTDOWN_SYNC:
> +		/*
> +		 * Spin on the read of the control register.
> +		 */
>   		return nvme_wait_ready(ctrl, NVME_CSTS_SHST_MASK,
>   				       NVME_CSTS_SHST_CMPLT,
>   				       ctrl->shutdown_timeout, "shutdown");
> +	case NVME_DISABLE_RESET:
> +		/*
> +		 * Doing a reset here. Handle below.
> +		 */
> +		break;
>   	}
>   	if (ctrl->quirks & NVME_QUIRK_DELAY_BEFORE_CHK_RDY)
>   		msleep(NVME_QUIRK_DELAY_AMOUNT);
> @@ -2237,6 +2251,17 @@ int nvme_disable_ctrl(struct nvme_ctrl *ctrl, enum shutdown_type shutdown_type)
>   }
>   EXPORT_SYMBOL_GPL(nvme_disable_ctrl);
>   
> +int nvme_wait_for_shutdown_cmpl(struct nvme_ctrl *ctrl)
> +{
> +	ctrl->ctrl_config &= ~NVME_CC_SHN_MASK;
> +	ctrl->ctrl_config |= NVME_CC_SHN_NORMAL;

Why is ctrl_config being set again?

> +
> +	return nvme_wait_ready(ctrl, NVME_CSTS_SHST_MASK,
> +			       NVME_CSTS_SHST_CMPLT,
> +			       ctrl->shutdown_timeout, "shutdown");
> +}
> +EXPORT_SYMBOL_GPL(nvme_wait_for_shutdown_cmpl);

Why export the symbol?

> +
>   int nvme_enable_ctrl(struct nvme_ctrl *ctrl)
>   {
>   	unsigned dev_page_min;
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index d880f1ee08d4..adbff23532de 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -189,7 +189,8 @@ enum {
>   
>   enum shutdown_type {
>   	NVME_DISABLE_RESET = 0,
> -	NVME_DISABLE_SHUTDOWN_SYNC = 1
> +	NVME_DISABLE_SHUTDOWN_SYNC = 1,
> +	NVME_DISABLE_SHUTDOWN_ASYNC = 2
>   };
>   
>   static inline struct nvme_request *nvme_req(struct request *req)
> @@ -756,6 +757,7 @@ void nvme_cancel_admin_tagset(struct nvme_ctrl *ctrl);
>   bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
>   		enum nvme_ctrl_state new_state);
>   int nvme_disable_ctrl(struct nvme_ctrl *ctrl, enum shutdown_type shutdown_type);
> +int nvme_wait_for_shutdown_cmpl(struct nvme_ctrl *ctrl);
>   int nvme_enable_ctrl(struct nvme_ctrl *ctrl);
>   int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
>   		const struct nvme_ctrl_ops *ops, unsigned long quirks);
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 77b015affb0b..9cb4436710dd 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2586,7 +2586,7 @@ static void nvme_dev_disable(struct nvme_dev *dev, enum shutdown_type shutdown_t
>   		 * Give the controller a chance to complete all entered requests
>   		 * if doing a safe shutdown.
>   		 */
> -		if (!dead && (shutdown_type == NVME_DISABLE_SHUTDOWN_SYNC))
> +		if (!dead && (shutdown_type != NVME_DISABLE_RESET))
>   			nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
>   	}
>   
> @@ -3100,7 +3100,26 @@ static void nvme_shutdown(struct pci_dev *pdev)
>   {
>   	struct nvme_dev *dev = pci_get_drvdata(pdev);
>   
> -	nvme_disable_prepare_reset(dev, NVME_DISABLE_SHUTDOWN_SYNC);
> +	nvme_disable_prepare_reset(dev, NVME_DISABLE_SHUTDOWN_ASYNC);
> +}
> +
> +static void nvme_shutdown_wait(struct pci_dev *pdev)
> +{
> +	struct nvme_dev *dev = pci_get_drvdata(pdev);
> +
> +	mutex_lock(&dev->shutdown_lock);
> +	nvme_wait_for_shutdown_cmpl(&dev->ctrl);

If this is the only call-site? why not just open-code it here?

> +
> +	/*
> +	 * The driver will not be starting up queues again if shutting down so
> +	 * must flush all entered requests to their failed completion to avoid
> +	 * deadlocking blk-mq hot-cpu notifier.
> +	 */
> +	nvme_unquiesce_io_queues(&dev->ctrl);
> +	if (dev->ctrl.admin_q && !blk_queue_dying(dev->ctrl.admin_q))
> +		nvme_unquiesce_admin_queue(&dev->ctrl);
> +
> +	mutex_unlock(&dev->shutdown_lock);
>   }
>   
>   /*
> @@ -3492,6 +3511,7 @@ static struct pci_driver nvme_driver = {
>   	.probe		= nvme_probe,
>   	.remove		= nvme_remove,
>   	.shutdown	= nvme_shutdown,
> +	.shutdown_wait  = nvme_shutdown_wait,
>   	.driver		= {
>   		.probe_type	= PROBE_PREFER_ASYNCHRONOUS,
>   #ifdef CONFIG_PM_SLEEP



* Re: [PATCH 4/4] nvme: Add two-pass shutdown support
  2023-12-25  9:58   ` Sagi Grimberg
@ 2023-12-27  0:53     ` Jeremy Allison
  2024-01-01  9:21       ` Sagi Grimberg
  0 siblings, 1 reply; 22+ messages in thread
From: Jeremy Allison @ 2023-12-27  0:53 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Jeremy Allison, tansuresh, hch, gregkh, rafael, bhelgaas,
	linux-nvme, jra

On Mon, Dec 25, 2023 at 11:58:04AM +0200, Sagi Grimberg wrote:
>On 12/21/23 19:22, Jeremy Allison wrote:
>>  EXPORT_SYMBOL_GPL(nvme_disable_ctrl);
>>+int nvme_wait_for_shutdown_cmpl(struct nvme_ctrl *ctrl)
>>+{
>>+	ctrl->ctrl_config &= ~NVME_CC_SHN_MASK;
>>+	ctrl->ctrl_config |= NVME_CC_SHN_NORMAL;
>
>Why is ctrl_config being set again?

Good catch. There's no need for that. I'll fix
in the next iteration, thanks !

>>+EXPORT_SYMBOL_GPL(nvme_wait_for_shutdown_cmpl);
>
>Why export the symbol?

nvme_wait_for_shutdown_cmpl() calls nvme_wait_ready(),
which is a static function defined in drivers/nvme/host/core.c.

The only way to inline the functionality of nvme_wait_for_shutdown_cmpl()
into nvme_shutdown_wait() is to export nvme_wait_ready() instead.

If you're OK with that I can remove the intermediate
function nvme_wait_for_shutdown_cmpl() (and indeed as
you've pointed out there's no need to set ctrl->ctrl_config
again).

>>+static void nvme_shutdown_wait(struct pci_dev *pdev)
>>+{
>>+	struct nvme_dev *dev = pci_get_drvdata(pdev);
>>+
>>+	mutex_lock(&dev->shutdown_lock);
>>+	nvme_wait_for_shutdown_cmpl(&dev->ctrl);
>
>If this is the only call-site? why not just open-code it here?

See above comment.

Thanks for the advice. I'll fix these up in the next
iteration (which probably won't be until after January
2nd due to the Christmas break).



* Re: [PATCH 3/4] Change 'bool shutdown' into an enum shutdown_type { NVME_DISABLE_RESET = 0, NVME_DISABLE_SHUTDOWN_SYNC = 1 }.
  2023-12-21 17:22 ` [PATCH 3/4] Change 'bool shutdown' into an enum shutdown_type { NVME_DISABLE_RESET = 0, NVME_DISABLE_SHUTDOWN_SYNC = 1 } Jeremy Allison
@ 2023-12-27 19:33   ` Bjorn Helgaas
  2024-01-02 18:06     ` Jeremy Allison
  0 siblings, 1 reply; 22+ messages in thread
From: Bjorn Helgaas @ 2023-12-27 19:33 UTC (permalink / raw)
  To: Jeremy Allison; +Cc: jra, tansuresh, hch, gregkh, rafael, bhelgaas, linux-nvme

The subject might be overly detailed and doesn't match the "nvme: ..."
prefix convention of drivers/nvme/.  Maybe it was intended as part of
the commit log?

On Thu, Dec 21, 2023 at 09:22:56AM -0800, Jeremy Allison wrote:
> Convert nvme_disable_ctrl() and nvme_dev_disable()
> inside drivers/nvme/host/pci.c to use this
> 
> bool shutdown = false == NVME_DISABLE_RESET
> bool shutdown = true == NVME_DISABLE_SHUTDOWN_SYNC.
> 
> This will make it easier to add a third request, NVME_DISABLE_SHUTDOWN_ASYNC,
> later.
> 
> As nvme_disable_ctrl() is used outside of drivers/nvme/host/pci.c,
> convert the callers of nvme_disable_ctrl() to this convention too.
> 
> Signed-off-by: Jeremy Allison <jallison@ciq.com>
> ---
>  drivers/nvme/host/apple.c  |  4 ++--
>  drivers/nvme/host/core.c   |  6 +++---
>  drivers/nvme/host/nvme.h   |  7 +++++-
>  drivers/nvme/host/pci.c    | 44 +++++++++++++++++++-------------------
>  drivers/nvme/host/rdma.c   |  3 ++-
>  drivers/nvme/host/tcp.c    |  3 ++-
>  drivers/nvme/target/loop.c |  2 +-
>  7 files changed, 38 insertions(+), 31 deletions(-)
> ...



* Re: [PATCH 1/4] driver core: Support two-pass driver shutdown
  2023-12-21 17:22 ` [PATCH 1/4] driver core: Support two-pass driver shutdown Jeremy Allison
  2023-12-21 17:29   ` Greg KH
@ 2023-12-27 20:33   ` Bjorn Helgaas
  2024-01-01  9:23     ` Sagi Grimberg
  2024-01-02 18:07     ` Jeremy Allison
  1 sibling, 2 replies; 22+ messages in thread
From: Bjorn Helgaas @ 2023-12-27 20:33 UTC (permalink / raw)
  To: Jeremy Allison; +Cc: jra, tansuresh, hch, gregkh, rafael, bhelgaas, linux-nvme

On Thu, Dec 21, 2023 at 09:22:54AM -0800, Jeremy Allison wrote:
> From: Tanjore Suresh <tansuresh@google.com>
> 
> This changes the bus driver interface with an additional entry point
> to enable devices to implement two-pass shutdown. The existing
> synchronous interface to shutdown is called, and if a shutdown_wait
> method is defined the device is moved to an alternate list.
> 
> Once the shutdown method is called for all devices, the
> shutdown_wait method is then called synchronously for
> all devices on the alternate list.

Christoph and Sagi noted that the objective is similar to "fire now
and wait for completion later."

Would it be practical to actually *do* that, e.g., add a global
shutdown workqueue, have the .shutdown() method of nvme and similar
drivers add items to it, and have the driver core wait until
everything on that workqueue has completed?

But I'm not a workqueue expert and I do see the scary warning at
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/workqueue.h?id=v6.6#n619,
about not flushing system-wide workqueues, so maybe this wouldn't be
workable.

Bjorn



* Re: [PATCH 4/4] nvme: Add two-pass shutdown support
  2023-12-27  0:53     ` Jeremy Allison
@ 2024-01-01  9:21       ` Sagi Grimberg
  2024-01-02 18:03         ` Jeremy Allison
  0 siblings, 1 reply; 22+ messages in thread
From: Sagi Grimberg @ 2024-01-01  9:21 UTC (permalink / raw)
  To: Jeremy Allison
  Cc: Jeremy Allison, tansuresh, hch, gregkh, rafael, bhelgaas, linux-nvme



On 12/27/23 02:53, Jeremy Allison wrote:
> On Mon, Dec 25, 2023 at 11:58:04AM +0200, Sagi Grimberg wrote:
>> On 12/21/23 19:22, Jeremy Allison wrote:
>>>  EXPORT_SYMBOL_GPL(nvme_disable_ctrl);
>>> +int nvme_wait_for_shutdown_cmpl(struct nvme_ctrl *ctrl)
>>> +{
>>> +    ctrl->ctrl_config &= ~NVME_CC_SHN_MASK;
>>> +    ctrl->ctrl_config |= NVME_CC_SHN_NORMAL;
>>
>> Why is ctrl_config being set again?
> 
> Good catch. There's no need for that. I'll fix
> in the next iteration, thanks !
> 
>>> +EXPORT_SYMBOL_GPL(nvme_wait_for_shutdown_cmpl);
>>
>> Why export the symbol?
> 
> nvme_wait_for_shutdown_cmpl() calls nvme_wait_ready(),
> which is a static function defined in drivers/nvme/host/core.c.
> 
> The only way to inline the functionality of nvme_wait_for_shutdown_cmpl()
> into nvme_shutdown_wait() is to export nvme_wait_ready() instead.
> 
> If you're OK with that I can remove the intermediate
> function nvme_wait_for_shutdown_cmpl() (and indeed as
> you've pointed out there's no need to set ctrl->ctrl_config
> again).

I think it is small enough to make it static inline in nvme.h



* Re: [PATCH 1/4] driver core: Support two-pass driver shutdown
  2023-12-27 20:33   ` Bjorn Helgaas
@ 2024-01-01  9:23     ` Sagi Grimberg
  2024-01-02 23:12       ` Jeremy Allison
  2024-01-02 18:07     ` Jeremy Allison
  1 sibling, 1 reply; 22+ messages in thread
From: Sagi Grimberg @ 2024-01-01  9:23 UTC (permalink / raw)
  To: Bjorn Helgaas, Jeremy Allison
  Cc: jra, tansuresh, hch, gregkh, rafael, bhelgaas, linux-nvme


>> From: Tanjore Suresh <tansuresh@google.com>
>>
>> This changes the bus driver interface with an additional entry point
>> to enable devices to implement two-pass shutdown. The existing
>> synchronous interface to shutdown is called, and if a shutdown_wait
>> method is defined the device is moved to an alternate list.
>>
>> Once the shutdown method is called for all devices, the
>> shutdown_wait method is then called synchronously for
>> all devices on the alternate list.
> 
> Christoph and Sagi noted that the objective is similar to "fire now
> and wait for completion later."
> 
> Would it be practical to actually *do* that, e.g., add a global
> shutdown workqueue, have the .shutdown() method of nvme and similar
> drivers add items to it, and have the driver core wait until
> everything on that workqueue has completed?
> 
> But I'm not a workqueue expert and I do see the scary warning at
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/workqueue.h?id=v6.6#n619,
> about not flushing system-wide workqueues, so maybe this wouldn't be
> workable.

I don't think that a workqueue is necessary. What I do think is that we
should document somewhere that .shutdown() can be synchronous or
asynchronous when paired with .shutdown_wait().
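
For instance, rough wording along these lines in the bus_type kerneldoc
(just a sketch, not final text):

     * @shutdown:      Called at shut-down time to quiesce the device. May
     *                 only start the shutdown and return early if the bus
     *                 also provides @shutdown_wait.
     * @shutdown_wait: Called after @shutdown has been called for all
     *                 devices; waits for the shutdown started by @shutdown
     *                 to complete.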



* Re: [PATCH 4/4] nvme: Add two-pass shutdown support
  2024-01-01  9:21       ` Sagi Grimberg
@ 2024-01-02 18:03         ` Jeremy Allison
  2024-01-03  8:36           ` Sagi Grimberg
  0 siblings, 1 reply; 22+ messages in thread
From: Jeremy Allison @ 2024-01-02 18:03 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Jeremy Allison, tansuresh, hch, gregkh, rafael, bhelgaas,
	linux-nvme, jra

On Mon, Jan 01, 2024 at 11:21:14AM +0200, Sagi Grimberg wrote:
>
>
>On 12/27/23 02:53, Jeremy Allison wrote:
>>On Mon, Dec 25, 2023 at 11:58:04AM +0200, Sagi Grimberg wrote:
>>>On 12/21/23 19:22, Jeremy Allison wrote:
>>>> EXPORT_SYMBOL_GPL(nvme_disable_ctrl);
>>>>+int nvme_wait_for_shutdown_cmpl(struct nvme_ctrl *ctrl)
>>>>+{
>>>>+    ctrl->ctrl_config &= ~NVME_CC_SHN_MASK;
>>>>+    ctrl->ctrl_config |= NVME_CC_SHN_NORMAL;
>>>
>>>Why is ctrl_config being set again?
>>
>>Good catch. There's no need for that. I'll fix
>>in the next iteration, thanks !
>>
>>>>+EXPORT_SYMBOL_GPL(nvme_wait_for_shutdown_cmpl);
>>>
>>>Why export the symbol?
>>
>>nvme_wait_for_shutdown_cmpl() calls nvme_wait_ready(),
>>which is a static function defined in drivers/nvme/host/core.c.
>>
>>The only way to inline the functionality of nvme_wait_for_shutdown_cmpl()
>>into nvme_shutdown_wait() is to export nvme_wait_ready() instead.
>>
>>If you're OK with that I can remove the intermediate
>>function nvme_wait_for_shutdown_cmpl() (and indeed as
>>you've pointed out there's no need to set ctrl->ctrl_config
>>again).
>
>I think it is small enough to make it static inline in nvme.h

Sorry, just looking for some clarification here. Are
you talking about inlining nvme_wait_ready() ? There
are indeed larger functions already inlined in nvme.h.



* Re: [PATCH 3/4] Change 'bool shutdown' into an enum shutdown_type { NVME_DISABLE_RESET = 0, NVME_DISABLE_SHUTDOWN_SYNC = 1 }.
  2023-12-27 19:33   ` Bjorn Helgaas
@ 2024-01-02 18:06     ` Jeremy Allison
  0 siblings, 0 replies; 22+ messages in thread
From: Jeremy Allison @ 2024-01-02 18:06 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jeremy Allison, tansuresh, hch, gregkh, rafael, bhelgaas,
	linux-nvme, jra

On Wed, Dec 27, 2023 at 01:33:44PM -0600, Bjorn Helgaas wrote:
>The subject might be overly detailed and doesn't match the "nvme: ..."
>prefix convention of drivers/nvme/.  Maybe it was intended as part of
>the commit log?

I'll fix that, thanks !



* Re: [PATCH 1/4] driver core: Support two-pass driver shutdown
  2023-12-27 20:33   ` Bjorn Helgaas
  2024-01-01  9:23     ` Sagi Grimberg
@ 2024-01-02 18:07     ` Jeremy Allison
  2024-01-05  4:29       ` Christoph Hellwig
  1 sibling, 1 reply; 22+ messages in thread
From: Jeremy Allison @ 2024-01-02 18:07 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jeremy Allison, tansuresh, hch, gregkh, rafael, bhelgaas,
	linux-nvme, jra

On Wed, Dec 27, 2023 at 02:33:37PM -0600, Bjorn Helgaas wrote:
>On Thu, Dec 21, 2023 at 09:22:54AM -0800, Jeremy Allison wrote:
>> From: Tanjore Suresh <tansuresh@google.com>
>>
>> This changes the bus driver interface with an additional entry point
>> to enable devices to implement two-pass shutdown. The existing
>> synchronous interface to shutdown is called, and if a shutdown_wait
>> method is defined the device is moved to an alternate list.
>>
>> Once the shutdown method is called for all devices, the
>> shutdown_wait method is then called synchronously for
>> all devices on the alternate list.
>
>Christoph and Sagi noted that the objective is similar to "fire now
>and wait for completion later."
>
>Would it be practical to actually *do* that, e.g., add a global
>shutdown workqueue, have the .shutdown() method of nvme and similar
>drivers add items to it, and have the driver core wait until
>everything on that workqueue has completed?
>
>But I'm not a workqueue expert and I do see the scary warning at
>https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/workqueue.h?id=v6.6#n619,
>about not flushing system-wide workqueues, so maybe this wouldn't be
>workable.

This is a bigger change than I'm comfortable with
(or indeed understand :-) right now. Can we fix
the immediate problem first please and then look
for improvements later ?



* Re: [PATCH 1/4] driver core: Support two-pass driver shutdown
  2024-01-01  9:23     ` Sagi Grimberg
@ 2024-01-02 23:12       ` Jeremy Allison
  2024-01-05  4:28         ` Christoph Hellwig
  0 siblings, 1 reply; 22+ messages in thread
From: Jeremy Allison @ 2024-01-02 23:12 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Bjorn Helgaas, Jeremy Allison, tansuresh, hch, gregkh, rafael,
	bhelgaas, linux-nvme, jra

On Mon, Jan 01, 2024 at 11:23:05AM +0200, Sagi Grimberg wrote:
>
>>>From: Tanjore Suresh <tansuresh@google.com>
>>>
>>>This changes the bus driver interface with an additional entry point
>>>to enable devices to implement two-pass shutdown. The existing
>>>synchronous interface to shutdown is called, and if a shutdown_wait
>>>method is defined the device is moved to an alternate list.
>>>
>>>Once the shutdown method is called for all devices, the
>>>shutdown_wait method is then called synchronously for
>>>all devices on the alternate list.
>>
>>Christoph and Sagi noted that the objective is similar to "fire now
>>and wait for completion later."
>>
>>Would it be practical to actually *do* that, e.g., add a global
>>shutdown workqueue, have the .shutdown() method of nvme and similar
>>drivers add items to it, and have the driver core wait until
>>everything on that workqueue has completed?
>>
>>But I'm not a workqueue expert and I do see the scary warning at
>>https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/workqueue.h?id=v6.6#n619,
>>about not flushing system-wide workqueues, so maybe this wouldn't be
>>workable.
>
>I don't think that a workqueue is necessary. What I do think is that we
>should document somewhere that .shutdown() can be synchronous or
>asynchronous when paired with .shutdown_wait().

It's not exactly asynchronous when paired with .shutdown_wait().
That's why I used the phrase "two-pass" instead of asynchronous.

Also, I already added this documentation:

+ * @shutdown_wait: Optional driver callback to allow two-pass shutdown.

in the struct pci_driver comment definition in one of the
previous patches.




* Re: [PATCH 4/4] nvme: Add two-pass shutdown support
  2024-01-02 18:03         ` Jeremy Allison
@ 2024-01-03  8:36           ` Sagi Grimberg
  2024-01-03 17:41             ` Jeremy Allison
  0 siblings, 1 reply; 22+ messages in thread
From: Sagi Grimberg @ 2024-01-03  8:36 UTC (permalink / raw)
  To: Jeremy Allison
  Cc: Jeremy Allison, tansuresh, hch, gregkh, rafael, bhelgaas, linux-nvme


>>>> On 12/21/23 19:22, Jeremy Allison wrote:
>>>>>  EXPORT_SYMBOL_GPL(nvme_disable_ctrl);
>>>>> +int nvme_wait_for_shutdown_cmpl(struct nvme_ctrl *ctrl)
>>>>> +{
>>>>> +    ctrl->ctrl_config &= ~NVME_CC_SHN_MASK;
>>>>> +    ctrl->ctrl_config |= NVME_CC_SHN_NORMAL;
>>>>
>>>> Why is ctrl_config being set again?
>>>
>>> Good catch. There's no need for that. I'll fix
>>> in the next iteration, thanks !
>>>
>>>>> +EXPORT_SYMBOL_GPL(nvme_wait_for_shutdown_cmpl);
>>>>
>>>> Why export the symbol?
>>>
>>> nvme_wait_for_shutdown_cmpl() calls nvme_wait_ready(),
>>> which is a static function defined in drivers/nvme/host/core.c.
>>>
>>> The only way to inline the functionality of 
>>> nvme_wait_for_shutdown_cmpl()
>>> into nvme_shutdown_wait() is to export nvme_wait_ready() instead.
>>>
>>> If you're OK with that I can remove the intermediate
>>> function nvme_wait_for_shutdown_cmpl() (and indeed as
>>> you've pointed out there's no need to set ctrl->ctrl_config
>>> again).
>>
>> I think it is small enough to make it static inline in nvme.h
> 
> Sorry, just looking for some clarification here. Are
> you talking about inlining nvme_wait_ready() ? There
> are indeed larger functions already inlined in nvme.h.

No, these wrappers are getting confusing.

Exporting nvme_wait_ready and calling it from a static
nvme_shutdown_wait in pci.c is fine I think.
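
i.e. roughly, assuming nvme_wait_ready() keeps its current signature
(untested sketch):

    static void nvme_shutdown_wait(struct pci_dev *pdev)
    {
            struct nvme_dev *dev = pci_get_drvdata(pdev);

            mutex_lock(&dev->shutdown_lock);
            nvme_wait_ready(&dev->ctrl, NVME_CSTS_SHST_MASK,
                            NVME_CSTS_SHST_CMPLT,
                            dev->ctrl.shutdown_timeout, "shutdown");

            /* As in v3: flush entered requests to their failed completion. */
            nvme_unquiesce_io_queues(&dev->ctrl);
            if (dev->ctrl.admin_q && !blk_queue_dying(dev->ctrl.admin_q))
                    nvme_unquiesce_admin_queue(&dev->ctrl);
            mutex_unlock(&dev->shutdown_lock);
    }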



* Re: [PATCH 4/4] nvme: Add two-pass shutdown support
  2024-01-03  8:36           ` Sagi Grimberg
@ 2024-01-03 17:41             ` Jeremy Allison
  0 siblings, 0 replies; 22+ messages in thread
From: Jeremy Allison @ 2024-01-03 17:41 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Jeremy Allison, tansuresh, hch, gregkh, rafael, bhelgaas,
	linux-nvme, jra

On Wed, Jan 03, 2024 at 10:36:46AM +0200, Sagi Grimberg wrote:
>
>No, these wrappers are getting confusing.
>
>Exporting nvme_wait_ready and calling it from a static
>nvme_shutdown_wait in pci.c is fine I think.

Got it. I'll re-submit v4 with these changes once
I've tested it. Thanks for the help and advice.



* Re: [PATCH 1/4] driver core: Support two-pass driver shutdown
  2024-01-02 23:12       ` Jeremy Allison
@ 2024-01-05  4:28         ` Christoph Hellwig
  0 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2024-01-05  4:28 UTC (permalink / raw)
  To: Jeremy Allison
  Cc: Sagi Grimberg, Bjorn Helgaas, Jeremy Allison, tansuresh, hch,
	gregkh, rafael, bhelgaas, linux-nvme

On Tue, Jan 02, 2024 at 03:12:13PM -0800, Jeremy Allison wrote:
> It's not exactly asynchronous when paired with .shutdown_wait().
> That's why I used the phrase "two-pass" instead of asynchronous.

Yes, that's actually a much more descriptive term.




* Re: [PATCH 1/4] driver core: Support two-pass driver shutdown
  2024-01-02 18:07     ` Jeremy Allison
@ 2024-01-05  4:29       ` Christoph Hellwig
  2024-01-05 18:15         ` Bjorn Helgaas
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2024-01-05  4:29 UTC (permalink / raw)
  To: Jeremy Allison
  Cc: Bjorn Helgaas, Jeremy Allison, tansuresh, hch, gregkh, rafael,
	bhelgaas, linux-nvme

On Tue, Jan 02, 2024 at 10:07:30AM -0800, Jeremy Allison wrote:
>> But I'm not a workqueue expert and I do see the scary warning at
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/workqueue.h?id=v6.6#n619,
>> about not flushing system-wide workqueues, so maybe this wouldn't be
>> workable.
>
> This is a bigger change than I'm comfortable with
> (or indeed understand :-) right now. Can we fix
> the immediate problem first please and then look
> for improvements later ?

It's also a lot less efficient.  Assuming NVMe isn't alone and other
hardware interfaces also have a shutdown/disable busy wait (and I've seen
quite a few that do) as their limiting factor the two-pass shutdown seems
much nicer than spawning tons of work items.



* Re: [PATCH 1/4] driver core: Support two-pass driver shutdown
  2024-01-05  4:29       ` Christoph Hellwig
@ 2024-01-05 18:15         ` Bjorn Helgaas
  2024-01-08  8:28           ` Christoph Hellwig
  0 siblings, 1 reply; 22+ messages in thread
From: Bjorn Helgaas @ 2024-01-05 18:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jeremy Allison, Jeremy Allison, tansuresh, gregkh, rafael,
	bhelgaas, linux-nvme

On Fri, Jan 05, 2024 at 05:29:55AM +0100, Christoph Hellwig wrote:
> On Tue, Jan 02, 2024 at 10:07:30AM -0800, Jeremy Allison wrote:
> >> But I'm not a workqueue expert and I do see the scary warning at
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/workqueue.h?id=v6.6#n619,
> >> about not flushing system-wide workqueues, so maybe this wouldn't be
> >> workable.
> >
> > This is a bigger change than I'm comfortable with
> > (or indeed understand :-) right now. Can we fix
> > the immediate problem first please and then look
> > for improvements later ?
> 
> It's also a lot less efficient.  Assuming NVMe isn't alone and other
> hardware interfaces also have a shutdown/disable busy wait (and I've seen
> quite a few that do) as their limiting factor the two-pass shutdown seems
> much nicer than spawning tons of work items.

Here's the essence of the two-pass proposal, adding .shutdown_wait():

    device_shutdown
      for_each_device
        dev->bus->shutdown()
          ...
            nvme_shutdown
              nvme_disable_prepare_reset
                nvme_dev_disable
                  nvme_disable_ctrl
  +                 case NVME_DISABLE_SHUTDOWN_ASYNC:
  +                   return;
  +                 case NVME_DISABLE_SHUTDOWN_SYNC:
                      nvme_wait_ready(...);
  +     if (dev->bus->shutdown_wait)
  +       list_add(&shutdown_wait_list)
  +
  +   for_each(&shutdown_wait_list)
  +     dev->bus->shutdown_wait()
  +       pci_device_shutdown_wait
  +         nvme_shutdown_wait
  +           nvme_wait_ready(...)

FWIW, what I imagined was something like this, where we don't add
.shutdown_wait(), but any .shutdown() method could become asynchronous
by queuing a work item to do the wait:

  + nvme_wait_ready_work
  +   nvme_wait_ready(...)

    device_shutdown
  +   async_shutdown_wq = create_workqueue("async_shutdown");
      for_each_device
        dev->bus->shutdown()
          ...
            nvme_shutdown
              nvme_disable_prepare_reset
                nvme_dev_disable
                  nvme_disable_ctrl
  +                 case NVME_DISABLE_SHUTDOWN_ASYNC:
  +                   queue_work(async_shutdown_wq, &nvme_wait_ready_work)
  +                   return;
  +                 case NVME_DISABLE_SHUTDOWN_SYNC:
                      nvme_wait_ready(...);
  +
  +   destroy_workqueue(async_shutdown_wq);

This is a bit of hand waving because I'm not a workqueue expert, but
it seems like basically the same amount of overhead without having to
add .shutdown_wait() to the bus_type and the pci_driver.
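
In code, very roughly (untested, just to illustrate the shape; the
shutdown_work field on nvme_ctrl is hypothetical):

    /* Somewhere in core.c, the work item that does the busy-wait: */
    static void nvme_wait_ready_work(struct work_struct *work)
    {
            struct nvme_ctrl *ctrl = container_of(work, struct nvme_ctrl,
                                                  shutdown_work);

            nvme_wait_ready(ctrl, NVME_CSTS_SHST_MASK, NVME_CSTS_SHST_CMPLT,
                            ctrl->shutdown_timeout, "shutdown");
    }

    /* The NVME_DISABLE_SHUTDOWN_ASYNC case in nvme_disable_ctrl() becomes: */
            INIT_WORK(&ctrl->shutdown_work, nvme_wait_ready_work);
            queue_work(async_shutdown_wq, &ctrl->shutdown_work);
            return 0;

    /* device_shutdown() creates async_shutdown_wq up front, e.g. with
     * alloc_workqueue("async_shutdown", 0, 0), and ends with: */
            destroy_workqueue(async_shutdown_wq);   /* waits for queued items */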

Bjorn



* Re: [PATCH 1/4] driver core: Support two-pass driver shutdown
  2024-01-05 18:15         ` Bjorn Helgaas
@ 2024-01-08  8:28           ` Christoph Hellwig
  0 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2024-01-08  8:28 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Christoph Hellwig, Jeremy Allison, Jeremy Allison, tansuresh,
	gregkh, rafael, bhelgaas, linux-nvme

On Fri, Jan 05, 2024 at 12:15:17PM -0600, Bjorn Helgaas wrote:
> This is a bit of hand waving because I'm not a workqueue expert, but
> it seems like basically the same amount of overhead without having to
> add .shutdown_wait() to the bus_type and the pci_driver.

It spawns a new work item just for the shutdown busy-wait.  That's not a
very efficient programming model.


