linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/6] async device shutdown support
@ 2024-02-07 18:40 David Jeffery
  2024-02-07 18:40 ` [RFC PATCH 1/6] minimal async shutdown infrastructure David Jeffery
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: David Jeffery @ 2024-02-07 18:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-nvme, linux-scsi, Greg Kroah-Hartman, Rafael J . Wysocki,
	David Jeffery

This is another attempt at an acceptable implementation of async device
shutdown, inspired by a previous attempt by Tanjore Suresh. For systems
with many disks, async shutdown can greatly reduce shutdown times by
running slow operations in parallel. The older patches were rejected; this
new implementation attempts to address the flaws, as I understand them, in
those older patches.

Using similar interfaces and building off the ideas of the older patches,
this patchset creates an async shutdown implementation which follows the
basic ordering of the shutdown list, ensuring that the shutdown of any
child devices, whether synchronous or asynchronous, completes before
shutdown is performed on a parent device.

In addition to an implementation for asynchronous pci nvme shutdown, this
patchset also adds support for async shutdown of sd devices for cache flush.
As an example of the effects of the patches, one system with a large number
of disks went from over 30 seconds to shut down to less than 5.

The specific driver changes are the roughest part of this patchset. But the
acceptability of the core async functionality is critical. Any feedback on
flaws or improvements is appreciated.

David Jeffery (6):
  minimal async shutdown infrastructure
  Improve ability to perform async shutdown in parallel
  pci bus async shutdown support
  pci nvme async shutdown support
  scsi mid layer support for async command submit
  sd: async cache flush on shutdown

 drivers/base/base.h           |   1 +
 drivers/base/core.c           | 149 +++++++++++++++++++++++++++++++++-
 drivers/nvme/host/core.c      |  26 ++++--
 drivers/nvme/host/nvme.h      |   2 +
 drivers/nvme/host/pci.c       |  53 +++++++++++-
 drivers/pci/pci-driver.c      |  24 +++++-
 drivers/scsi/scsi_lib.c       | 138 ++++++++++++++++++++++++-------
 drivers/scsi/sd.c             |  66 +++++++++++++--
 drivers/scsi/sd.h             |   2 +
 include/linux/device/bus.h    |   8 +-
 include/linux/device/driver.h |   7 ++
 include/linux/pci.h           |   4 +
 include/scsi/scsi_device.h    |   8 ++
 13 files changed, 439 insertions(+), 49 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFC PATCH 1/6] minimal async shutdown infrastructure
  2024-02-07 18:40 [RFC PATCH 0/6] async device shutdown support David Jeffery
@ 2024-02-07 18:40 ` David Jeffery
  2024-02-14  3:42   ` Saravana Kannan
  2024-02-07 18:40 ` [RFC PATCH 2/6] Improve ability to perform async shutdown in parallel David Jeffery
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 9+ messages in thread
From: David Jeffery @ 2024-02-07 18:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-nvme, linux-scsi, Greg Kroah-Hartman, Rafael J . Wysocki,
	David Jeffery, Laurence Oberman

Adds the async_shutdown_start and async_shutdown_end calls to perform async
shutdown. Implements a very minimalist method of async shutdown support
within device_shutdown(). The device at the head of the shutdown list is
checked against a list of devices under async shutdown. If the head is a
parent of a device on the async list, all active async shutdown operations
are completed before the parent's shutdown call is performed.

The number of async operations in flight also has a maximum limit, to keep
the list that must be scanned for children from growing overly large.

Signed-off-by: David Jeffery <djeffery@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>

---
 drivers/base/core.c           | 116 +++++++++++++++++++++++++++++++++-
 include/linux/device/bus.h    |   8 ++-
 include/linux/device/driver.h |   7 ++
 3 files changed, 127 insertions(+), 4 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 14d46af40f9a..5bc2282c00cd 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -4719,12 +4719,92 @@ int device_change_owner(struct device *dev, kuid_t kuid, kgid_t kgid)
 }
 EXPORT_SYMBOL_GPL(device_change_owner);
 
+
+#define MAX_ASYNC_SHUTDOWNS 32
+static int async_shutdown_count;
+static LIST_HEAD(async_shutdown_list);
+
+/*
+ * If a device has a child busy with an async shutdown or there are too many
+ * async shutdowns active, the device may not be shut down at this time.
+ */
+static bool may_shutdown_device(struct device *dev)
+{
+	struct device *tmp;
+
+	if (async_shutdown_count >= MAX_ASYNC_SHUTDOWNS)
+		return false;
+
+	list_for_each_entry(tmp, &async_shutdown_list, kobj.entry) {
+		if (tmp->parent == dev)
+			return false;
+	}
+	return true;
+}
+
+/*
+ * Call and track each async shutdown call
+ */
+static void async_shutdown_start(struct device *dev, void (*callback) (struct device *))
+{
+	if (initcall_debug)
+		dev_info(dev, "async_shutdown_start\n");
+
+	(*callback)(dev);
+	list_add_tail(&dev->kobj.entry, &async_shutdown_list);
+	async_shutdown_count++;
+}
+
+/*
+ * Wait for all async shutdown operations currently active to complete
+ */
+static void wait_for_active_async_shutdown(void)
+{
+	struct device *dev, *parent;
+
+	while (!list_empty(&async_shutdown_list)) {
+		dev = list_entry(async_shutdown_list.next, struct device,
+				 kobj.entry);
+
+		parent = dev->parent;
+
+		/*
+		 * Make sure the device is off the list
+		 */
+		list_del_init(&dev->kobj.entry);
+		if (parent)
+			device_lock(parent);
+		device_lock(dev);
+		if (dev->bus && dev->bus->async_shutdown_end) {
+			if (initcall_debug)
+				dev_info(dev,
+					 "async_shutdown_end called\n");
+			dev->bus->async_shutdown_end(dev);
+		} else if (dev->driver && dev->driver->async_shutdown_end) {
+			if (initcall_debug)
+				dev_info(dev,
+					 "async_shutdown_end called\n");
+			dev->driver->async_shutdown_end(dev);
+		}
+		device_unlock(dev);
+		if (parent)
+			device_unlock(parent);
+
+		put_device(dev);
+		put_device(parent);
+	}
+	if (initcall_debug)
+		printk(KERN_INFO "device shutdown: waited for %d async shutdown callbacks\n", async_shutdown_count);
+	async_shutdown_count = 0;
+}
+
 /**
  * device_shutdown - call ->shutdown() on each device to shutdown.
  */
 void device_shutdown(void)
 {
 	struct device *dev, *parent;
+	bool async_busy;
 
 	wait_for_device_probe();
 	device_block_probing();
@@ -4741,6 +4821,8 @@ void device_shutdown(void)
 		dev = list_entry(devices_kset->list.prev, struct device,
 				kobj.entry);
 
+		async_busy = false;
+
 		/*
 		 * hold reference count of device's parent to
 		 * prevent it from being freed because parent's
@@ -4748,6 +4830,17 @@ void device_shutdown(void)
 		 */
 		parent = get_device(dev->parent);
 		get_device(dev);
+
+		if (!may_shutdown_device(dev)) {
+			put_device(dev);
+			put_device(parent);
+
+			spin_unlock(&devices_kset->list_lock);
+			wait_for_active_async_shutdown();
+			spin_lock(&devices_kset->list_lock);
+			continue;
+		}
+
 		/*
 		 * Make sure the device is off the kset list, in the
 		 * event that dev->*->shutdown() doesn't remove it.
@@ -4769,26 +4862,43 @@ void device_shutdown(void)
 				dev_info(dev, "shutdown_pre\n");
 			dev->class->shutdown_pre(dev);
 		}
-		if (dev->bus && dev->bus->shutdown) {
+		if (dev->bus && dev->bus->async_shutdown_start) {
+			async_shutdown_start(dev, dev->bus->async_shutdown_start);
+			async_busy = true;
+		} else if (dev->bus && dev->bus->shutdown) {
 			if (initcall_debug)
 				dev_info(dev, "shutdown\n");
 			dev->bus->shutdown(dev);
+		} else if (dev->driver && dev->driver->async_shutdown_start) {
+			async_shutdown_start(dev, dev->driver->async_shutdown_start);
+			async_busy = true;
 		} else if (dev->driver && dev->driver->shutdown) {
 			if (initcall_debug)
 				dev_info(dev, "shutdown\n");
 			dev->driver->shutdown(dev);
+		} else {
+			if (initcall_debug)
+				dev_info(dev, "no shutdown callback\n");
 		}
 
 		device_unlock(dev);
 		if (parent)
 			device_unlock(parent);
 
-		put_device(dev);
-		put_device(parent);
+		/* if device has an async shutdown, drop the ref when done */
+		if (!async_busy) {
+			put_device(dev);
+			put_device(parent);
+		}
 
 		spin_lock(&devices_kset->list_lock);
 	}
 	spin_unlock(&devices_kset->list_lock);
+	/*
+	 * Wait for any async shutdown still running.
+	 */
+	if (!list_empty(&async_shutdown_list))
+		wait_for_active_async_shutdown();
 }
 
 /*
diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h
index 5ef4ec1c36c3..7a4a2ff0bc23 100644
--- a/include/linux/device/bus.h
+++ b/include/linux/device/bus.h
@@ -48,7 +48,11 @@ struct fwnode_handle;
  *		will never get called until they do.
  * @remove:	Called when a device removed from this bus.
  * @shutdown:	Called at shut-down time to quiesce the device.
- *
+ * @async_shutdown_start:	Optional call to support and begin the shutdown
+ *				process on the device in an asynchronous manner.
+ * @async_shutdown_end:		Optional call to complete an asynchronous
+ *				shutdown of the device. Must be provided if an
+ *				async_shutdown_start call is provided.
  * @online:	Called to put the device back online (after offlining it).
  * @offline:	Called to put the device offline for hot-removal. May fail.
  *
@@ -87,6 +91,8 @@ struct bus_type {
 	void (*sync_state)(struct device *dev);
 	void (*remove)(struct device *dev);
 	void (*shutdown)(struct device *dev);
+	void (*async_shutdown_start)(struct device *dev);
+	void (*async_shutdown_end)(struct device *dev);
 
 	int (*online)(struct device *dev);
 	int (*offline)(struct device *dev);
diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
index 7738f458995f..af0ad2d3687a 100644
--- a/include/linux/device/driver.h
+++ b/include/linux/device/driver.h
@@ -71,6 +71,11 @@ enum probe_type {
  * @remove:	Called when the device is removed from the system to
  *		unbind a device from this driver.
  * @shutdown:	Called at shut-down time to quiesce the device.
+ * @async_shutdown_start:	Optional call to support and begin the shutdown
+ *				process on the device in an asynchronous manner.
+ * @async_shutdown_end:		Optional call to complete an asynchronous
+ *				shutdown of the device. Must be provided if an
+ *				async_shutdown_start call is provided.
  * @suspend:	Called to put the device to sleep mode. Usually to a
  *		low power state.
  * @resume:	Called to bring a device from sleep mode.
@@ -110,6 +115,8 @@ struct device_driver {
 	void (*sync_state)(struct device *dev);
 	int (*remove) (struct device *dev);
 	void (*shutdown) (struct device *dev);
+	void (*async_shutdown_start) (struct device *dev);
+	void (*async_shutdown_end) (struct device *dev);
 	int (*suspend) (struct device *dev, pm_message_t state);
 	int (*resume) (struct device *dev);
 	const struct attribute_group **groups;
-- 
2.43.0



* [RFC PATCH 2/6] Improve ability to perform async shutdown in parallel
  2024-02-07 18:40 [RFC PATCH 0/6] async device shutdown support David Jeffery
  2024-02-07 18:40 ` [RFC PATCH 1/6] minimal async shutdown infrastructure David Jeffery
@ 2024-02-07 18:40 ` David Jeffery
  2024-02-07 18:40 ` [RFC PATCH 3/6] pci bus async shutdown support David Jeffery
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: David Jeffery @ 2024-02-07 18:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-nvme, linux-scsi, Greg Kroah-Hartman, Rafael J . Wysocki,
	David Jeffery, Laurence Oberman

Expands the async shutdown implementation to allow more cases of parallel
async shutdown. A field is added so that a device under async shutdown can
mark its parent as busy. When a busy parent reaches the head of the
shutdown list, it is set aside and flags its own parent as busy in turn.

Once the async shutdown operations are completed, the stored parents are
returned to the shutdown list and shut down in an order maintaining their
parent-child ordering. Unlike the minimal implementation, this allows more
end nodes of the device tree to be under async shutdown in parallel.

A cap on the number of devices under async shutdown is still enforced,
though it is no longer required and could be removed if desired.

Signed-off-by: David Jeffery <djeffery@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>

---
 drivers/base/base.h |  1 +
 drivers/base/core.c | 71 ++++++++++++++++++++++++++++++++-------------
 2 files changed, 52 insertions(+), 20 deletions(-)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index eb4c0ace9242..954008bd39e5 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -116,6 +116,7 @@ struct device_private {
 	char *deferred_probe_reason;
 	struct device *device;
 	u8 dead:1;
+	u8 child_shutdown:1;
 };
 #define to_device_private_parent(obj)	\
 	container_of(obj, struct device_private, knode_parent)
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 5bc2282c00cd..e88d418bf0fd 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -4720,9 +4720,10 @@ int device_change_owner(struct device *dev, kuid_t kuid, kgid_t kgid)
 EXPORT_SYMBOL_GPL(device_change_owner);
 
 
-#define MAX_ASYNC_SHUTDOWNS 32
+#define MAX_ASYNC_SHUTDOWNS 256
 static int async_shutdown_count;
 static LIST_HEAD(async_shutdown_list);
+static LIST_HEAD(async_delayed_list);
 
 /*
  * If a device has a child busy with an async shutdown or there are too many
@@ -4730,15 +4731,15 @@ static LIST_HEAD(async_shutdown_list);
  */
 static bool may_shutdown_device(struct device *dev)
 {
-	struct device *tmp;
-
-	if (async_shutdown_count >= MAX_ASYNC_SHUTDOWNS)
+	if (dev->p->child_shutdown) {
+		if (list_empty(&async_shutdown_list)) {
+			dev_err(dev, "child_shutdown set but no children? Clearing\n");
+			dev->p->child_shutdown = 0;
+			return true;
+		}
 		return false;
-
-	list_for_each_entry(tmp, &async_shutdown_list, kobj.entry) {
-		if (tmp->parent == dev)
-			return false;
 	}
+
 	return true;
 }
 
@@ -4753,6 +4754,9 @@ static void async_shutdown_start(struct device *dev, void (*callback) (struct de
 	(*callback)(dev);
 	list_add_tail(&dev->kobj.entry, &async_shutdown_list);
 	async_shutdown_count++;
+
+	if (dev->parent)
+		dev->parent->p->child_shutdown = 1;
 }
 
 /*
@@ -4760,7 +4764,7 @@ static void async_shutdown_start(struct device *dev, void (*callback) (struct de
  */
 static void wait_for_active_async_shutdown(void)
 {
-	struct device *dev, *parent;
+	struct device *dev, *parent, *tmp;
 
 	while (!list_empty(&async_shutdown_list)) {
 		dev = list_entry(async_shutdown_list.next, struct device,
@@ -4787,15 +4791,29 @@ static void wait_for_active_async_shutdown(void)
 			dev->driver->async_shutdown_end(dev);
 		}
 		device_unlock(dev);
-		if (parent)
-			device_unlock(parent);
-
+		if (parent) {
+			tmp = parent;
+			do {
+				tmp->p->child_shutdown = 0;
+				device_unlock(tmp);
+
+				tmp = tmp->parent;
+				if (!tmp || !tmp->p->child_shutdown)
+					break;
+				device_lock(tmp);
+			} while (1);
+		}
 		put_device(dev);
 		put_device(parent);
 	}
 	if (initcall_debug)
 		printk(KERN_INFO "device shutdown: waited for %d async shutdown callbacks\n", async_shutdown_count);
+
 	async_shutdown_count = 0;
+	spin_lock(&devices_kset->list_lock);
+	list_splice_tail_init(&async_delayed_list, &devices_kset->list);
+	spin_unlock(&devices_kset->list_lock);
+
 }
 
 /**
@@ -4810,7 +4828,7 @@ void device_shutdown(void)
 	device_block_probing();
 
 	cpufreq_suspend();
-
+restart:
 	spin_lock(&devices_kset->list_lock);
 	/*
 	 * Walk the devices list backward, shutting down each in turn.
@@ -4832,12 +4850,15 @@ void device_shutdown(void)
 		get_device(dev);
 
 		if (!may_shutdown_device(dev)) {
+			list_move(&dev->kobj.entry, &async_delayed_list);
+			if (parent) {
+				device_lock(parent);
+				parent->p->child_shutdown = 1;
+				device_unlock(parent);
+			}
+
 			put_device(dev);
 			put_device(parent);
-
-			spin_unlock(&devices_kset->list_lock);
-			wait_for_active_async_shutdown();
-			spin_lock(&devices_kset->list_lock);
 			continue;
 		}
 
@@ -4863,14 +4884,16 @@ void device_shutdown(void)
 			dev->class->shutdown_pre(dev);
 		}
 		if (dev->bus && dev->bus->async_shutdown_start) {
-			async_shutdown_start(dev, dev->bus->async_shutdown_start);
+			async_shutdown_start(dev,
+					     dev->bus->async_shutdown_start);
 			async_busy = true;
 		} else if (dev->bus && dev->bus->shutdown) {
 			if (initcall_debug)
 				dev_info(dev, "shutdown\n");
 			dev->bus->shutdown(dev);
 		} else if (dev->driver && dev->driver->async_shutdown_start) {
-			async_shutdown_start(dev, dev->driver->async_shutdown_start);
+			async_shutdown_start(dev,
+					     dev->driver->async_shutdown_start);
 			async_busy = true;
 		} else if (dev->driver && dev->driver->shutdown) {
 			if (initcall_debug)
@@ -4891,14 +4914,22 @@ void device_shutdown(void)
 			put_device(parent);
 		}
 
+		if (async_shutdown_count == MAX_ASYNC_SHUTDOWNS)
+			wait_for_active_async_shutdown();
+
 		spin_lock(&devices_kset->list_lock);
 	}
 	spin_unlock(&devices_kset->list_lock);
 	/*
-	 * Wait for any async shutdown still running.
+	 * Wait for any async shutdown still running, then restart the loop
+	 * if the list is no longer empty from delayed entries returning to
+	 * the list.
 	 */
 	if (!list_empty(&async_shutdown_list))
 		wait_for_active_async_shutdown();
+
+	if (!list_empty(&devices_kset->list))
+		goto restart;
 }
 
 /*
-- 
2.43.0



* [RFC PATCH 3/6] pci bus async shutdown support
  2024-02-07 18:40 [RFC PATCH 0/6] async device shutdown support David Jeffery
  2024-02-07 18:40 ` [RFC PATCH 1/6] minimal async shutdown infrastructure David Jeffery
  2024-02-07 18:40 ` [RFC PATCH 2/6] Improve ability to perform async shutdown in parallel David Jeffery
@ 2024-02-07 18:40 ` David Jeffery
  2024-02-07 18:40 ` [RFC PATCH 4/6] pci nvme " David Jeffery
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: David Jeffery @ 2024-02-07 18:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-nvme, linux-scsi, Greg Kroah-Hartman, Rafael J . Wysocki,
	David Jeffery, Laurence Oberman

Add async shutdown fields and convert pci's shutdown logic into async
shutdown calls so that individual pci device drivers can implement async
shutdown.

Signed-off-by: David Jeffery <djeffery@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>

---
 drivers/pci/pci-driver.c | 24 ++++++++++++++++++++++--
 include/linux/pci.h      |  4 ++++
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 51ec9e7e784f..0ad418905115 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -502,16 +502,28 @@ static void pci_device_remove(struct device *dev)
 	pci_dev_put(pci_dev);
 }
 
-static void pci_device_shutdown(struct device *dev)
+static void pci_device_async_shutdown_start(struct device *dev)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	struct pci_driver *drv = pci_dev->driver;
 
 	pm_runtime_resume(dev);
 
-	if (drv && drv->shutdown)
+	if (drv && drv->async_shutdown_start)
+		drv->async_shutdown_start(pci_dev);
+	else if (drv && drv->shutdown)
 		drv->shutdown(pci_dev);
 
+}
+
+static void pci_device_async_shutdown_end(struct device *dev)
+{
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+	struct pci_driver *drv = pci_dev->driver;
+
+	if (drv && drv->async_shutdown_end)
+		drv->async_shutdown_end(pci_dev);
+
 	/*
 	 * If this is a kexec reboot, turn off Bus Master bit on the
 	 * device to tell it to not continue to do DMA. Don't touch
@@ -523,6 +535,12 @@ static void pci_device_shutdown(struct device *dev)
 		pci_clear_master(pci_dev);
 }
 
+static void pci_device_shutdown(struct device *dev)
+{
+	pci_device_async_shutdown_start(dev);
+	pci_device_async_shutdown_end(dev);
+}
+
 #ifdef CONFIG_PM_SLEEP
 
 /* Auxiliary functions used for system resume */
@@ -1682,6 +1700,8 @@ struct bus_type pci_bus_type = {
 	.probe		= pci_device_probe,
 	.remove		= pci_device_remove,
 	.shutdown	= pci_device_shutdown,
+	.async_shutdown_start	= pci_device_async_shutdown_start,
+	.async_shutdown_end	= pci_device_async_shutdown_end,
 	.dev_groups	= pci_dev_groups,
 	.bus_groups	= pci_bus_groups,
 	.drv_groups	= pci_drv_groups,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index add9368e6314..6f61325c956a 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -917,6 +917,8 @@ struct module;
  *		Useful for enabling wake-on-lan (NIC) or changing
  *		the power state of a device before reboot.
  *		e.g. drivers/net/e100.c.
+ * @async_shutdown_start:	This starts the asynchronous shutdown
+ * @async_shutdown_end:	This completes the started asynchronous shutdown
  * @sriov_configure: Optional driver callback to allow configuration of
  *		number of VFs to enable via sysfs "sriov_numvfs" file.
  * @sriov_set_msix_vec_count: PF Driver callback to change number of MSI-X
@@ -947,6 +949,8 @@ struct pci_driver {
 	int  (*suspend)(struct pci_dev *dev, pm_message_t state);	/* Device suspended */
 	int  (*resume)(struct pci_dev *dev);	/* Device woken up */
 	void (*shutdown)(struct pci_dev *dev);
+	void (*async_shutdown_start)(struct pci_dev *dev);
+	void (*async_shutdown_end)(struct pci_dev *dev);
 	int  (*sriov_configure)(struct pci_dev *dev, int num_vfs); /* On PF */
 	int  (*sriov_set_msix_vec_count)(struct pci_dev *vf, int msix_vec_count); /* On PF */
 	u32  (*sriov_get_vf_total_msix)(struct pci_dev *pf);
-- 
2.43.0



* [RFC PATCH 4/6] pci nvme async shutdown support
  2024-02-07 18:40 [RFC PATCH 0/6] async device shutdown support David Jeffery
                   ` (2 preceding siblings ...)
  2024-02-07 18:40 ` [RFC PATCH 3/6] pci bus async shutdown support David Jeffery
@ 2024-02-07 18:40 ` David Jeffery
  2024-02-07 18:40 ` [RFC PATCH 5/6] scsi mid layer support for async command submit David Jeffery
  2024-02-07 18:41 ` [RFC PATCH 6/6] sd: async cache flush on shutdown David Jeffery
  5 siblings, 0 replies; 9+ messages in thread
From: David Jeffery @ 2024-02-07 18:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-nvme, linux-scsi, Greg Kroah-Hartman, Rafael J . Wysocki,
	David Jeffery, Laurence Oberman

Alter nvme so that the pci nvme driver may shut down the hardware
asynchronously, providing a large benefit to shutdown time with large numbers
of nvme devices in some configurations.

The functionality of nvme_disable_ctrl is split into separate functions,
one to start disabling the hardware and one to wait for the hardware to
report success. A wrapper is created which provides the same interface for
the other nvme types so they remain unchanged.

A new field is added to nvme-pci's nvme_dev to track the nvme shutdown
state so that the nvme async_shutdown_end call knows what, if anything, it
still needs to do.

Signed-off-by: David Jeffery <djeffery@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>

---
 drivers/nvme/host/core.c | 26 +++++++++++++++-----
 drivers/nvme/host/nvme.h |  2 ++
 drivers/nvme/host/pci.c  | 53 ++++++++++++++++++++++++++++++++++++++--
 3 files changed, 73 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 85ab0fcf9e88..b24985a843a8 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2242,20 +2242,21 @@ static int nvme_wait_ready(struct nvme_ctrl *ctrl, u32 mask, u32 val,
 	return ret;
 }
 
-int nvme_disable_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
+int nvme_disable_ctrl_send(struct nvme_ctrl *ctrl, bool shutdown)
 {
-	int ret;
-
 	ctrl->ctrl_config &= ~NVME_CC_SHN_MASK;
 	if (shutdown)
 		ctrl->ctrl_config |= NVME_CC_SHN_NORMAL;
 	else
 		ctrl->ctrl_config &= ~NVME_CC_ENABLE;
 
-	ret = ctrl->ops->reg_write32(ctrl, NVME_REG_CC, ctrl->ctrl_config);
-	if (ret)
-		return ret;
+	return ctrl->ops->reg_write32(ctrl, NVME_REG_CC, ctrl->ctrl_config);
+}
+EXPORT_SYMBOL_GPL(nvme_disable_ctrl_send);
 
+
+int nvme_disable_ctrl_wait(struct nvme_ctrl *ctrl, bool shutdown)
+{
 	if (shutdown) {
 		return nvme_wait_ready(ctrl, NVME_CSTS_SHST_MASK,
 				       NVME_CSTS_SHST_CMPLT,
@@ -2266,6 +2267,19 @@ int nvme_disable_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
 	return nvme_wait_ready(ctrl, NVME_CSTS_RDY, 0,
 			       (NVME_CAP_TIMEOUT(ctrl->cap) + 1) / 2, "reset");
 }
+EXPORT_SYMBOL_GPL(nvme_disable_ctrl_wait);
+
+
+int nvme_disable_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
+{
+	int ret;
+
+	ret = nvme_disable_ctrl_send(ctrl, shutdown);
+	if (!ret)
+		ret = nvme_disable_ctrl_wait(ctrl, shutdown);
+
+	return ret;
+}
 EXPORT_SYMBOL_GPL(nvme_disable_ctrl);
 
 int nvme_enable_ctrl(struct nvme_ctrl *ctrl)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 030c80818240..5bdd862328d4 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -761,6 +761,8 @@ void nvme_cancel_admin_tagset(struct nvme_ctrl *ctrl);
 bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 		enum nvme_ctrl_state new_state);
 int nvme_disable_ctrl(struct nvme_ctrl *ctrl, bool shutdown);
+int nvme_disable_ctrl_send(struct nvme_ctrl *ctrl, bool shutdown);
+int nvme_disable_ctrl_wait(struct nvme_ctrl *ctrl, bool shutdown);
 int nvme_enable_ctrl(struct nvme_ctrl *ctrl);
 int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 		const struct nvme_ctrl_ops *ops, unsigned long quirks);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index c1d6357ec98a..c2a7b3d28a56 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -112,6 +112,12 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
 static void nvme_delete_io_queues(struct nvme_dev *dev);
 static void nvme_update_attrs(struct nvme_dev *dev);
 
+enum nvme_disable_state {
+	NVME_DISABLE_START = 0,
+	NVME_DISABLE_WAIT = 1,
+	NVME_DISABLE_DONE = 2,
+};
+
 /*
  * Represents an NVM Express device.  Each nvme_dev is a PCI function.
  */
@@ -159,6 +165,7 @@ struct nvme_dev {
 	unsigned int nr_allocated_queues;
 	unsigned int nr_write_queues;
 	unsigned int nr_poll_queues;
+	enum nvme_disable_state disable_state;
 };
 
 static int io_queue_depth_set(const char *val, const struct kernel_param *kp)
@@ -2574,12 +2581,14 @@ static bool nvme_pci_ctrl_is_dead(struct nvme_dev *dev)
 	return (csts & NVME_CSTS_CFS) || !(csts & NVME_CSTS_RDY);
 }
 
-static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
+static void nvme_dev_disable_start(struct nvme_dev *dev, bool shutdown)
 {
 	enum nvme_ctrl_state state = nvme_ctrl_state(&dev->ctrl);
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
 	bool dead;
 
+	dev->disable_state = NVME_DISABLE_START;
+
 	mutex_lock(&dev->shutdown_lock);
 	dead = nvme_pci_ctrl_is_dead(dev);
 	if (state == NVME_CTRL_LIVE || state == NVME_CTRL_RESETTING) {
@@ -2597,7 +2606,20 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 
 	if (!dead && dev->ctrl.queue_count > 0) {
 		nvme_delete_io_queues(dev);
-		nvme_disable_ctrl(&dev->ctrl, shutdown);
+		nvme_disable_ctrl_send(&dev->ctrl, shutdown);
+		dev->disable_state = NVME_DISABLE_WAIT;
+	}
+}
+
+static void nvme_dev_disable_end(struct nvme_dev *dev, bool shutdown)
+{
+	struct pci_dev *pdev = to_pci_dev(dev->dev);
+
+	if (dev->disable_state == NVME_DISABLE_DONE)
+		return;
+
+	if (dev->disable_state == NVME_DISABLE_WAIT) {
+		nvme_disable_ctrl_wait(&dev->ctrl, shutdown);
 		nvme_poll_irqdisable(&dev->queues[0]);
 	}
 	nvme_suspend_io_queues(dev);
@@ -2623,6 +2645,12 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 	mutex_unlock(&dev->shutdown_lock);
 }
 
+static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
+{
+	nvme_dev_disable_start(dev, shutdown);
+	nvme_dev_disable_end(dev, shutdown);
+}
+
 static int nvme_disable_prepare_reset(struct nvme_dev *dev, bool shutdown)
 {
 	if (!nvme_wait_reset(&dev->ctrl))
@@ -3120,6 +3148,25 @@ static void nvme_shutdown(struct pci_dev *pdev)
 	nvme_disable_prepare_reset(dev, true);
 }
 
+static void nvme_shutdown_start(struct pci_dev *pdev)
+{
+	struct nvme_dev *dev = pci_get_drvdata(pdev);
+
+	if (!nvme_wait_reset(&dev->ctrl)) {
+		dev->disable_state = NVME_DISABLE_DONE;
+		return;
+	}
+	nvme_dev_disable_start(dev, true);
+}
+
+static void nvme_shutdown_end(struct pci_dev *pdev)
+{
+	struct nvme_dev *dev = pci_get_drvdata(pdev);
+
+	nvme_dev_disable_end(dev, true);
+}
+
+
 /*
  * The driver's remove may be called on a device in a partially initialized
  * state. This function must not have any dependencies on the device state in
@@ -3511,6 +3558,8 @@ static struct pci_driver nvme_driver = {
 	.probe		= nvme_probe,
 	.remove		= nvme_remove,
 	.shutdown	= nvme_shutdown,
+	.async_shutdown_start	= nvme_shutdown_start,
+	.async_shutdown_end	= nvme_shutdown_end,
 	.driver		= {
 		.probe_type	= PROBE_PREFER_ASYNCHRONOUS,
 #ifdef CONFIG_PM_SLEEP
-- 
2.43.0



* [RFC PATCH 5/6] scsi mid layer support for async command submit
  2024-02-07 18:40 [RFC PATCH 0/6] async device shutdown support David Jeffery
                   ` (3 preceding siblings ...)
  2024-02-07 18:40 ` [RFC PATCH 4/6] pci nvme " David Jeffery
@ 2024-02-07 18:40 ` David Jeffery
  2024-02-07 18:41 ` [RFC PATCH 6/6] sd: async cache flush on shutdown David Jeffery
  5 siblings, 0 replies; 9+ messages in thread
From: David Jeffery @ 2024-02-07 18:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-nvme, linux-scsi, Greg Kroah-Hartman, Rafael J . Wysocki,
	David Jeffery, Laurence Oberman

Create scsi_execute_cmd_nowait to allow asynchronous scsi command submit.
Parts of the code originally in scsi_execute_cmd are shifted into helper
functions used by both scsi_execute_cmd and the new scsi_execute_cmd_nowait.

The scsi_exec_args struct is expanded to contain the fields needed for
the completion and callback for the async path.

Signed-off-by: David Jeffery <djeffery@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>

---
 drivers/scsi/scsi_lib.c    | 138 +++++++++++++++++++++++++++++--------
 include/scsi/scsi_device.h |   8 +++
 2 files changed, 118 insertions(+), 28 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 1fb80eae9a63..fe35bc2021e3 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -185,42 +185,37 @@ void scsi_queue_insert(struct scsi_cmnd *cmd, int reason)
 }
 
 /**
- * scsi_execute_cmd - insert request and wait for the result
+ * scsi_execute_init - helper for setting up a scsi_cmnd in a request
  * @sdev:	scsi_device
  * @cmd:	scsi command
  * @opf:	block layer request cmd_flags
- * @buffer:	data buffer
- * @bufflen:	len of buffer
  * @timeout:	request timeout in HZ
  * @retries:	number of times to retry request
- * @args:	Optional args. See struct definition for field descriptions
+ * @args:	scsi command args
  *
- * Returns the scsi_cmnd result field if a command was executed, or a negative
- * Linux error code if we didn't get that far.
+ * Returns a request if successful, or an error pointer if there was a failure.
  */
-int scsi_execute_cmd(struct scsi_device *sdev, const unsigned char *cmd,
-		     blk_opf_t opf, void *buffer, unsigned int bufflen,
-		     int timeout, int retries,
-		     const struct scsi_exec_args *args)
+static struct request *scsi_execute_init(struct scsi_device *sdev,
+					 const unsigned char *cmd,
+					 blk_opf_t opf,
+					 int timeout, int retries,
+					 struct scsi_exec_args *args)
 {
-	static const struct scsi_exec_args default_args;
 	struct request *req;
 	struct scsi_cmnd *scmd;
 	int ret;
 
-	if (!args)
-		args = &default_args;
-	else if (WARN_ON_ONCE(args->sense &&
-			      args->sense_len != SCSI_SENSE_BUFFERSIZE))
-		return -EINVAL;
+	if (WARN_ON_ONCE(args->sense &&
+			 args->sense_len != SCSI_SENSE_BUFFERSIZE))
+		return ERR_PTR(-EINVAL);
 
 	req = scsi_alloc_request(sdev->request_queue, opf, args->req_flags);
 	if (IS_ERR(req))
-		return PTR_ERR(req);
+		return req;
 
-	if (bufflen) {
-		ret = blk_rq_map_kern(sdev->request_queue, req,
-				      buffer, bufflen, GFP_NOIO);
+	if (args->bufflen) {
+		ret = blk_rq_map_kern(sdev->request_queue, req, args->buffer,
+				      args->bufflen, GFP_NOIO);
 		if (ret)
 			goto out;
 	}
@@ -232,19 +227,27 @@ int scsi_execute_cmd(struct scsi_device *sdev, const unsigned char *cmd,
 	req->timeout = timeout;
 	req->rq_flags |= RQF_QUIET;
 
-	/*
-	 * head injection *required* here otherwise quiesce won't work
-	 */
-	blk_execute_rq(req, true);
+	return req;
+out:
+	blk_mq_free_request(req);
+	return ERR_PTR(ret);
+}
+
+static int scsi_execute_uninit(struct request *req,
+				struct scsi_exec_args *args)
+{
+	struct scsi_cmnd *scmd;
 
+	scmd = blk_mq_rq_to_pdu(req);
 	/*
 	 * Some devices (USB mass-storage in particular) may transfer
 	 * garbage data together with a residue indicating that the data
 	 * is invalid.  Prevent the garbage from being misinterpreted
 	 * and prevent security leaks by zeroing out the excess data.
 	 */
-	if (unlikely(scmd->resid_len > 0 && scmd->resid_len <= bufflen))
-		memset(buffer + bufflen - scmd->resid_len, 0, scmd->resid_len);
+	if (unlikely(scmd->resid_len > 0 && scmd->resid_len <= args->bufflen))
+		memset(args->buffer + args->bufflen - scmd->resid_len, 0,
+		       scmd->resid_len);
 
 	if (args->resid)
 		*args->resid = scmd->resid_len;
@@ -254,14 +257,93 @@ int scsi_execute_cmd(struct scsi_device *sdev, const unsigned char *cmd,
 		scsi_normalize_sense(scmd->sense_buffer, scmd->sense_len,
 				     args->sshdr);
 
-	ret = scmd->result;
- out:
+	args->result = scmd->result;
+
+	if (args->callback)
+		args->callback(scmd, args);
+
+	return scmd->result;
+}
+
+/**
+ * scsi_execute_cmd - insert request and wait for the result
+ * @sdev:	scsi_device
+ * @cmd:	scsi command
+ * @opf:	block layer request cmd_flags
+ * @buffer:	data buffer
+ * @bufflen:	len of buffer
+ * @timeout:	request timeout in HZ
+ * @retries:	number of times to retry request
+ * @args:	Optional args. See struct definition for field descriptions
+ *
+ * Returns the scsi_cmnd result field if a command was executed, or a negative
+ * Linux error code if we didn't get that far.
+ */
+int scsi_execute_cmd(struct scsi_device *sdev, const unsigned char *cmd,
+		     blk_opf_t opf, void *buffer, unsigned int bufflen,
+		     int timeout, int retries,
+		     const struct scsi_exec_args *const_args)
+{
+	struct scsi_exec_args args;
+	int ret;
+	struct request *req;
+
+	if (!const_args)
+		memset(&args, 0, sizeof(struct scsi_exec_args));
+	else
+		args = *const_args;
+
+	args.buffer = buffer;
+	args.bufflen = bufflen;
+
+	req = scsi_execute_init(sdev, cmd, opf, timeout, retries, &args);
+
+	if (IS_ERR(req))
+		return PTR_ERR(req);
+
+	/*
+	 * head injection *required* here otherwise quiesce won't work
+	 */
+	blk_execute_rq(req, true);
+
+	ret = scsi_execute_uninit(req, &args);
+
 	blk_mq_free_request(req);
 
 	return ret;
 }
 EXPORT_SYMBOL(scsi_execute_cmd);
 
+
+static enum rq_end_io_ret scsi_execute_cmd_complete(struct request *req,
+						    blk_status_t ret)
+{
+	struct scsi_exec_args *args = req->end_io_data;
+
+	scsi_execute_uninit(req, args);
+	return RQ_END_IO_FREE;
+}
+
+int scsi_execute_cmd_nowait(struct scsi_device *sdev, const unsigned char *cmd,
+			    blk_opf_t opf, int timeout, int retries,
+			    struct scsi_exec_args *args)
+{
+	struct request *req;
+
+	req = scsi_execute_init(sdev, cmd, opf, timeout, retries, args);
+
+	if (IS_ERR(req))
+		return PTR_ERR(req);
+
+	req->end_io = scsi_execute_cmd_complete;
+	req->end_io_data = args;
+
+	blk_execute_rq_nowait(req, true);
+
+	return 0;
+}
+EXPORT_SYMBOL(scsi_execute_cmd_nowait);
+
 /*
  * Wake up the error handler if necessary. Avoid as follows that the error
  * handler is not woken up if host in-flight requests number ==
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 5ec1e71a09de..c80c98b48bc1 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -497,6 +497,10 @@ struct scsi_exec_args {
 	blk_mq_req_flags_t req_flags;	/* BLK_MQ_REQ flags */
 	int scmd_flags;			/* SCMD flags */
 	int *resid;			/* residual length */
+	void *buffer;			/* data buffer */
+	unsigned int bufflen;		/* buffer length */
+	int result;			/* scsi layer result */
+	void (*callback)(struct scsi_cmnd *scmd, struct scsi_exec_args *args);
 };
 
 int scsi_execute_cmd(struct scsi_device *sdev, const unsigned char *cmd,
@@ -504,6 +508,10 @@ int scsi_execute_cmd(struct scsi_device *sdev, const unsigned char *cmd,
 		     int timeout, int retries,
 		     const struct scsi_exec_args *args);
 
+int scsi_execute_cmd_nowait(struct scsi_device *sdev, const unsigned char *cmd,
+			    blk_opf_t opf, int timeout, int retries,
+			    struct scsi_exec_args *args);
+
 extern void sdev_disable_disk_events(struct scsi_device *sdev);
 extern void sdev_enable_disk_events(struct scsi_device *sdev);
 extern int scsi_vpd_lun_id(struct scsi_device *, char *, size_t);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH 6/6] sd: async cache flush on shutdown
  2024-02-07 18:40 [RFC PATCH 0/6] async device shutdown support David Jeffery
                   ` (4 preceding siblings ...)
  2024-02-07 18:40 ` [RFC PATCH 5/6] scsi mid layer support for async command submit David Jeffery
@ 2024-02-07 18:41 ` David Jeffery
  5 siblings, 0 replies; 9+ messages in thread
From: David Jeffery @ 2024-02-07 18:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-nvme, linux-scsi, Greg Kroah-Hartman, Rafael J . Wysocki,
	David Jeffery, Laurence Oberman

Add async shutdown support for the sd device's cache flush by sending the
SYNCHRONIZE CACHE command asynchronously. If there is any error, the code
falls back to the synchronous sd_sync_cache() to retry and resolve the
error.

Signed-off-by: David Jeffery <djeffery@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>

---
 drivers/scsi/sd.c | 66 ++++++++++++++++++++++++++++++++++++++++++-----
 drivers/scsi/sd.h |  2 ++
 2 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 0833b3e6aa6e..f972310df76a 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3838,23 +3838,64 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
 	return 0;
 }
 
-/*
- * Send a SYNCHRONIZE CACHE instruction down to the device through
- * the normal SCSI command structure.  Wait for the command to
- * complete.
- */
-static void sd_shutdown(struct device *dev)
+static void sd_sync_cache_callback(struct scsi_cmnd *scmd,
+				   struct scsi_exec_args *args) {
+	struct scsi_disk *sdkp;
+
+	sdkp = container_of(args, struct scsi_disk, shutdown_params);
+	complete(&sdkp->shutdown_done);
+}
+
+static void sd_async_shutdown_start(struct device *dev)
 {
 	struct scsi_disk *sdkp = dev_get_drvdata(dev);
+	const int timeout = sdkp->device->request_queue->rq_timeout
+			    * SD_FLUSH_TIMEOUT_MULTIPLIER;
+	int ret;
 
 	if (!sdkp)
 		return;         /* this can happen */
 
+	init_completion(&sdkp->shutdown_done);
+	sdkp->shutdown_params.callback = sd_sync_cache_callback;
+
 	if (pm_runtime_suspended(dev))
 		return;
 
 	if (sdkp->WCE && sdkp->media_present) {
+		unsigned char cmd[16] = { 0 };
+
 		sd_printk(KERN_NOTICE, sdkp, "Synchronizing SCSI cache\n");
+		if (sdkp->device->use_16_for_sync)
+			cmd[0] = SYNCHRONIZE_CACHE_16;
+		else
+			cmd[0] = SYNCHRONIZE_CACHE;
+
+		ret = scsi_execute_cmd_nowait(sdkp->device, cmd, REQ_OP_DRV_IN,
+					      timeout, sdkp->max_retries,
+					      &sdkp->shutdown_params);
+		if (!ret)
+			return;
+		sdkp->shutdown_params.result = ret;
+	}
+	/* no async I/O to do, go ahead and mark it complete */
+	complete(&sdkp->shutdown_done);
+}
+
+static void sd_async_shutdown_end(struct device *dev)
+{
+	struct scsi_disk *sdkp = dev_get_drvdata(dev);
+
+	if (!sdkp)
+		return;
+
+	if (pm_runtime_suspended(dev))
+		return;
+
+	wait_for_completion(&sdkp->shutdown_done);
+
+	if (sdkp->WCE && sdkp->media_present && sdkp->shutdown_params.result) {
+		/* for any error with the async flush, retry as sync */
 		sd_sync_cache(sdkp);
 	}
 
@@ -3867,6 +3908,17 @@ static void sd_shutdown(struct device *dev)
 	}
 }
 
+/*
+ * Send a SYNCHRONIZE CACHE instruction down to the device through
+ * the normal SCSI command structure.  Wait for the command to
+ * complete.
+ */
+static void sd_shutdown(struct device *dev)
+{
+	sd_async_shutdown_start(dev);
+	sd_async_shutdown_end(dev);
+}
+
 static inline bool sd_do_start_stop(struct scsi_device *sdev, bool runtime)
 {
 	return (sdev->manage_system_start_stop && !runtime) ||
@@ -4003,6 +4055,8 @@ static struct scsi_driver sd_template = {
 		.probe_type	= PROBE_PREFER_ASYNCHRONOUS,
 		.remove		= sd_remove,
 		.shutdown	= sd_shutdown,
+		.async_shutdown_start = sd_async_shutdown_start,
+		.async_shutdown_end   = sd_async_shutdown_end,
 		.pm		= &sd_pm_ops,
 	},
 	.rescan			= sd_rescan,
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 409dda5350d1..7b5098211cec 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -91,6 +91,8 @@ struct scsi_disk {
 	struct device	disk_dev;
 	struct gendisk	*disk;
 	struct opal_dev *opal_dev;
+	struct scsi_exec_args shutdown_params;
+	struct completion shutdown_done;
 #ifdef CONFIG_BLK_DEV_ZONED
 	/* Updated during revalidation before the gendisk capacity is known. */
 	struct zoned_disk_info	early_zone_info;
-- 
2.43.0



* Re: [RFC PATCH 1/6] minimal async shutdown infrastructure
  2024-02-07 18:40 ` [RFC PATCH 1/6] minimal async shutdown infrastructure David Jeffery
@ 2024-02-14  3:42   ` Saravana Kannan
  2024-02-14 20:55     ` David Jeffery
  0 siblings, 1 reply; 9+ messages in thread
From: Saravana Kannan @ 2024-02-14  3:42 UTC (permalink / raw)
  To: David Jeffery
  Cc: linux-kernel, linux-nvme, linux-scsi, Greg Kroah-Hartman,
	Rafael J . Wysocki, Laurence Oberman, Android Kernel Team

On Wed, Feb 7, 2024 at 10:40 AM David Jeffery <djeffery@redhat.com> wrote:
>
> Adds the async_shutdown_start and async_shutdown_end calls to perform async
> shutdown. Implements a very minimalist method of async shutdown support
> within device_shutdown(). The device at the head of the shutdown list is
> checked against a list of devices under async shutdown. If the head is a
> parent of a device on the async list, all active async shutdown operations
> are completed before the parent's shutdown call is performed.
>
> The number of async operations also has a max limit to prevent the list being
> checked for a child from getting overly large.
>
> Signed-off-by: David Jeffery <djeffery@redhat.com>
> Tested-by:     Laurence Oberman <loberman@redhat.com>
>
> ---
>  drivers/base/core.c           | 116 +++++++++++++++++++++++++++++++++-
>  include/linux/device/bus.h    |   8 ++-
>  include/linux/device/driver.h |   7 ++
>  3 files changed, 127 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 14d46af40f9a..5bc2282c00cd 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -4719,12 +4719,92 @@ int device_change_owner(struct device *dev, kuid_t kuid, kgid_t kgid)
>  }
>  EXPORT_SYMBOL_GPL(device_change_owner);
>
> +
> +#define MAX_ASYNC_SHUTDOWNS 32
> +static int async_shutdown_count;
> +static LIST_HEAD(async_shutdown_list);
> +
> +/**
> + * If a device has a child busy with an async shutdown or there are too many
> + * async shutdowns active, the device may not be shut down at this time.
> + */
> +static bool may_shutdown_device(struct device *dev)
> +{
> +       struct device *tmp;
> +
> +       if (async_shutdown_count >= MAX_ASYNC_SHUTDOWNS)
> +               return false;
> +
> +       list_for_each_entry(tmp, &async_shutdown_list, kobj.entry) {
> +               if (tmp->parent == dev)
> +                       return false;
> +       }
> +       return true;
> +}
> +
> +/**
> + * Call and track each async shutdown call
> + */
> +static void async_shutdown_start(struct device *dev, void (*callback) (struct device *))
> +{
> +       if (initcall_debug)
> +               dev_info(dev, "async_shutdown_start\n");
> +
> +       (*callback)(dev);
> +       list_add_tail(&dev->kobj.entry, &async_shutdown_list);
> +       async_shutdown_count++;
> +}
> +
> +/**
> + * Wait for all async shutdown operations currently active to complete
> + */
> +static void wait_for_active_async_shutdown(void)
> +{
> +       struct device *dev, *parent;
> +
> +        while (!list_empty(&async_shutdown_list)) {
> +                dev = list_entry(async_shutdown_list.next, struct device,
> +                                kobj.entry);
> +
> +                parent = dev->parent;

I didn't check the code thoroughly, so there might be other big
issues. But you definitely need to take device links into account.
Shut down all your consumers first, similar to how you shut down the
children devices first. Look at the async suspend/resume code for some
guidance.

> +
> +                /*
> +                 * Make sure the device is off the list
> +                 */
> +                list_del_init(&dev->kobj.entry);
> +                if (parent)
> +                        device_lock(parent);
> +                device_lock(dev);
> +                if (dev->bus && dev->bus->async_shutdown_end) {
> +                        if (initcall_debug)
> +                                dev_info(dev,
> +                                "async_shutdown_end called\n");
> +                        dev->bus->async_shutdown_end(dev);
> +                } else if (dev->driver && dev->driver->async_shutdown_end) {
> +                       if (initcall_debug)
> +                               dev_info(dev,
> +                               "async_shutdown_end called\n");
> +                       dev->driver->async_shutdown_end(dev);
> +               }
> +                device_unlock(dev);
> +                if (parent)
> +                        device_unlock(parent);
> +
> +                put_device(dev);
> +                put_device(parent);
> +        }
> +       if (initcall_debug)
> +               printk(KERN_INFO "device shutdown: waited for %d async shutdown callbacks\n", async_shutdown_count);
> +       async_shutdown_count = 0;
> +}
> +
>  /**
>   * device_shutdown - call ->shutdown() on each device to shutdown.
>   */
>  void device_shutdown(void)
>  {
>         struct device *dev, *parent;
> +       bool async_busy;
>
>         wait_for_device_probe();
>         device_block_probing();
> @@ -4741,6 +4821,8 @@ void device_shutdown(void)
>                 dev = list_entry(devices_kset->list.prev, struct device,
>                                 kobj.entry);
>
> +               async_busy = false;
> +
>                 /*
>                  * hold reference count of device's parent to
>                  * prevent it from being freed because parent's
> @@ -4748,6 +4830,17 @@ void device_shutdown(void)
>                  */
>                 parent = get_device(dev->parent);
>                 get_device(dev);
> +
> +                if (!may_shutdown_device(dev)) {
> +                       put_device(dev);
> +                       put_device(parent);
> +
> +                       spin_unlock(&devices_kset->list_lock);
> +                       wait_for_active_async_shutdown();
> +                       spin_lock(&devices_kset->list_lock);
> +                       continue;
> +               }
> +
>                 /*
>                  * Make sure the device is off the kset list, in the
>                  * event that dev->*->shutdown() doesn't remove it.
> @@ -4769,26 +4862,43 @@ void device_shutdown(void)
>                                 dev_info(dev, "shutdown_pre\n");
>                         dev->class->shutdown_pre(dev);
>                 }
> -               if (dev->bus && dev->bus->shutdown) {
> +               if (dev->bus && dev->bus->async_shutdown_start) {
> +                       async_shutdown_start(dev, dev->bus->async_shutdown_start);
> +                       async_busy = true;
> +               } else if (dev->bus && dev->bus->shutdown) {
>                         if (initcall_debug)
>                                 dev_info(dev, "shutdown\n");
>                         dev->bus->shutdown(dev);
> +               } else if (dev->driver && dev->driver->async_shutdown_start) {
> +                       async_shutdown_start(dev, dev->driver->async_shutdown_start);
> +                       async_busy = true;
>                 } else if (dev->driver && dev->driver->shutdown) {
>                         if (initcall_debug)
>                                 dev_info(dev, "shutdown\n");
>                         dev->driver->shutdown(dev);
> +               } else {
> +                       if (initcall_debug)
> +                               dev_info(dev, "no shutdown callback\n");
>                 }
>
>                 device_unlock(dev);
>                 if (parent)
>                         device_unlock(parent);
>
> -               put_device(dev);
> -               put_device(parent);
> +               /* if device has an async shutdown, drop the ref when done */
> +               if (!async_busy) {
> +                       put_device(dev);
> +                       put_device(parent);
> +               }
>
>                 spin_lock(&devices_kset->list_lock);
>         }
>         spin_unlock(&devices_kset->list_lock);
> +       /*
> +        * Wait for any async shutdown still running.
> +        */
> +       if (!list_empty(&async_shutdown_list))
> +               wait_for_active_async_shutdown();
>  }
>
>  /*
> diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h
> index 5ef4ec1c36c3..7a4a2ff0bc23 100644
> --- a/include/linux/device/bus.h
> +++ b/include/linux/device/bus.h
> @@ -48,7 +48,11 @@ struct fwnode_handle;
>   *             will never get called until they do.
>   * @remove:    Called when a device removed from this bus.
>   * @shutdown:  Called at shut-down time to quiesce the device.
> - *
> + * @async_shutdown_start:      Optional call to support and begin the shutdown
> + *                             process on the device in an asynchronous manner.
> + * @async_shutdown_end:                Optional call to complete an asynchronous
> + *                             shutdown of the device. Must be provided if a
> + *                             async_shutdown_start call is provided.
>   * @online:    Called to put the device back online (after offlining it).
>   * @offline:   Called to put the device offline for hot-removal. May fail.
>   *
> @@ -87,6 +91,8 @@ struct bus_type {
>         void (*sync_state)(struct device *dev);
>         void (*remove)(struct device *dev);
>         void (*shutdown)(struct device *dev);
> +       void (*async_shutdown_start)(struct device *dev);
> +       void (*async_shutdown_end)(struct device *dev);
>
>         int (*online)(struct device *dev);
>         int (*offline)(struct device *dev);
> diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
> index 7738f458995f..af0ad2d3687a 100644
> --- a/include/linux/device/driver.h
> +++ b/include/linux/device/driver.h
> @@ -71,6 +71,11 @@ enum probe_type {
>   * @remove:    Called when the device is removed from the system to
>   *             unbind a device from this driver.
>   * @shutdown:  Called at shut-down time to quiesce the device.
> + * @async_shutdown_start:      Optional call to support and begin the shutdown
> + *                             process on the device in an asynchronous manner.
> + * @async_shutdown_end:                Optional call to complete an asynchronous
> + *                             shutdown of the device. Must be provided if a
> + *                             async_shutdown_start call is provided.
>   * @suspend:   Called to put the device to sleep mode. Usually to a
>   *             low power state.
>   * @resume:    Called to bring a device from sleep mode.
> @@ -110,6 +115,8 @@ struct device_driver {
>         void (*sync_state)(struct device *dev);
>         int (*remove) (struct device *dev);
>         void (*shutdown) (struct device *dev);
> +       void (*async_shutdown_start) (struct device *dev);
> +       void (*async_shutdown_end) (struct device *dev);

Why not use the existing shutdown and call it from an async thread and
wait for it to finish? Similar to how async probes are handled. Also,
adding separate ops for this feels clunky and a very narrow fix. Just
use a flag to indicate the driver can support async shutdown using the
existing shutdown() op.

-Saravana


* Re: [RFC PATCH 1/6] minimal async shutdown infrastructure
  2024-02-14  3:42   ` Saravana Kannan
@ 2024-02-14 20:55     ` David Jeffery
  0 siblings, 0 replies; 9+ messages in thread
From: David Jeffery @ 2024-02-14 20:55 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: linux-kernel, linux-nvme, linux-scsi, Greg Kroah-Hartman,
	Rafael J . Wysocki, Laurence Oberman, Android Kernel Team

On Tue, Feb 13, 2024 at 10:43 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Wed, Feb 7, 2024 at 10:40 AM David Jeffery <djeffery@redhat.com> wrote:
> >
> > + * Wait for all async shutdown operations currently active to complete
> > + */
> > +static void wait_for_active_async_shutdown(void)
> > +{
> > +       struct device *dev, *parent;
> > +
> > +        while (!list_empty(&async_shutdown_list)) {
> > +                dev = list_entry(async_shutdown_list.next, struct device,
> > +                                kobj.entry);
> > +
> > +                parent = dev->parent;
>
> I didn't check the code thoroughly, but so there might be other big
> issues. But you definitely need to take device links into account.
> Shutdown all your consumers first similar to how you shutdown the
> children devices first. Look at the async suspend/resume code for some
> guidance.
>

Sure, I'll work on adding that into the order rules.

> > @@ -110,6 +115,8 @@ struct device_driver {
> >         void (*sync_state)(struct device *dev);
> >         int (*remove) (struct device *dev);
> >         void (*shutdown) (struct device *dev);
> > +       void (*async_shutdown_start) (struct device *dev);
> > +       void (*async_shutdown_end) (struct device *dev);
>
> Why not use the existing shutdown and call it from an async thread and
> wait for it to finish? Similar to how async probes are handled. Also,
> adding separate ops for this feels clunky and a very narrow fix. Just
> use a flag to indicate the driver can support async shutdown using the
> existing shutdown() op.
>
It is rather clunky. It was carried over from older patches, where I
mistakenly thought people wanted this separate interface, and adding
threads seemed like overkill. Others have been working on similar
patches on linux-nvme that I was unaware of; they add an optional
shutdown_wait call instead of this interface. I plan on adapting this
to work with their interface design.

David Jeffery



end of thread, other threads:[~2024-02-14 20:55 UTC | newest]

Thread overview: 9+ messages
2024-02-07 18:40 [RFC PATCH 0/6] async device shutdown support David Jeffery
2024-02-07 18:40 ` [RFC PATCH 1/6] minimal async shutdown infrastructure David Jeffery
2024-02-14  3:42   ` Saravana Kannan
2024-02-14 20:55     ` David Jeffery
2024-02-07 18:40 ` [RFC PATCH 2/6] Improve ability to perform async shutdown in parallel David Jeffery
2024-02-07 18:40 ` [RFC PATCH 3/6] pci bus async shutdown support David Jeffery
2024-02-07 18:40 ` [RFC PATCH 4/6] pci nvme " David Jeffery
2024-02-07 18:40 ` [RFC PATCH 5/6] scsi mid layer support for async command submit David Jeffery
2024-02-07 18:41 ` [RFC PATCH 6/6] sd: async cache flush on shutdown David Jeffery
