* [PATCH v3 0/3] Asynchronous shutdown interface and example implementation
From: Tanjore Suresh <tansuresh@google.com>
Date: 2022-05-17 22:08 UTC
To: Greg Kroah-Hartman, Rafael J. Wysocki, Christoph Hellwig, Sagi Grimberg, Bjorn Helgaas
Cc: linux-kernel, linux-nvme, linux-pci, Tanjore Suresh

Problem:

Some of our machines are configured with many NVMe devices and are
validated against strict shutdown time requirements. Each NVMe device
plugged into the system typically takes about 4.5 seconds to shut down.
A system with 16 such NVMe devices takes approximately 80 seconds to
shut down and go through a reboot.

The current shutdown API at the bus level is synchronous, so the more
devices there are in the system, the longer the shutdown takes. This
shutdown time contributes significantly to the machine's reboot time.

Solution:

This patch set proposes an asynchronous shutdown interface at the bus
level, modifies the core device shutdown routine to exploit the new
interface while maintaining backward compatibility with the existing
synchronous implementation (patch 1 of 3), and uses the new interface
to let all PCIe-based devices adopt asynchronous shutdown semantics if
necessary (patch 2 of 3). The implementation at the PCIe level is also
backward compatible, allowing existing device implementations to keep
working with the current synchronous semantics. Patch 3 of 3 showcases
an example implementation of this asynchronous shutdown interface for
the NVMe device.

Changelog:

v2:
 - Replaced the shutdown_pre & shutdown_post entry point names with the
   recommended names (async_shutdown_start and async_shutdown_end).
 - The earlier comment suggested different ordering requirements between
   bridge shutdown and leaf/endpoint shutdown for async_shutdown_start
   versus async_shutdown_end. This version now calls both the start and
   end entry points in the same order.

v3:
 - This note clarifies why the power management framework was not
   considered for implementing this shutdown optimization. There is no
   code change; the note clarifies the reasoning only.

   This patch set is only for shutdown of the system. The shutdown
   entry points traditionally have a different requirement: all devices
   are brought to a quiescent state and then system power may be
   removed (power-down scenarios), and the same entry points are used
   to shut down all devices before they are re-initialized and
   restarted (soft shutdown/reboot scenarios). By contrast, device
   power management (dpm) selectively brings idle devices down to the
   various low-power states they support, based on the transitions each
   device implementation allows. System-initiated power state
   transitions can already be achieved with the existing dpm
   interfaces. Therefore, the request to use the dpm interface to
   achieve this shutdown optimization is not the right approach: that
   interface is meant to solve an orthogonal requirement and has
   historically been kept separate from the shutdown entry points and
   their associated semantics.

Tanjore Suresh (3):
  driver core: Support asynchronous driver shutdown
  PCI: Support asynchronous shutdown
  nvme: Add async shutdown support

 drivers/base/core.c        | 38 +++++++++++++++++-
 drivers/nvme/host/core.c   | 28 +++++++++----
 drivers/nvme/host/nvme.h   |  8 ++++
 drivers/nvme/host/pci.c    | 80 ++++++++++++++++++++++++--------------
 drivers/pci/pci-driver.c   | 20 ++++++++--
 include/linux/device/bus.h | 12 ++++++
 include/linux/pci.h        |  4 ++
 7 files changed, 149 insertions(+), 41 deletions(-)

--
2.36.0.550.gb090851708-goog
* [PATCH v3 1/3] driver core: Support asynchronous driver shutdown
From: Tanjore Suresh <tansuresh@google.com>
Date: 2022-05-17 22:08 UTC
To: Greg Kroah-Hartman, Rafael J. Wysocki, Christoph Hellwig, Sagi Grimberg, Bjorn Helgaas
Cc: linux-kernel, linux-nvme, linux-pci, Tanjore Suresh

This changes the bus driver interface with additional entry points to
enable devices to implement asynchronous shutdown. The existing
synchronous shutdown interface is unmodified and retained for backward
compatibility.

This changes the common device shutdown code to enable devices to
participate in an asynchronous shutdown implementation.

Signed-off-by: Tanjore Suresh <tansuresh@google.com>
---
 drivers/base/core.c        | 38 +++++++++++++++++++++++++++++++++++++-
 include/linux/device/bus.h | 12 ++++++++++++
 2 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 3d6430eb0c6a..ba267ae70a22 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -4479,6 +4479,7 @@ EXPORT_SYMBOL_GPL(device_change_owner);
 void device_shutdown(void)
 {
 	struct device *dev, *parent;
+	LIST_HEAD(async_shutdown_list);

 	wait_for_device_probe();
 	device_block_probing();
@@ -4523,7 +4524,13 @@ void device_shutdown(void)
 			dev_info(dev, "shutdown_pre\n");
 			dev->class->shutdown_pre(dev);
 		}
-		if (dev->bus && dev->bus->shutdown) {
+		if (dev->bus && dev->bus->async_shutdown_start) {
+			if (initcall_debug)
+				dev_info(dev, "async_shutdown_start\n");
+			dev->bus->async_shutdown_start(dev);
+			list_add_tail(&dev->kobj.entry,
+				&async_shutdown_list);
+		} else if (dev->bus && dev->bus->shutdown) {
 			if (initcall_debug)
 				dev_info(dev, "shutdown\n");
 			dev->bus->shutdown(dev);
@@ -4543,6 +4550,35 @@ void device_shutdown(void)
 		spin_lock(&devices_kset->list_lock);
 	}
 	spin_unlock(&devices_kset->list_lock);
+
+	/*
+	 * Second pass spin only for devices that have configured
+	 * asynchronous shutdown.
+	 */
+	while (!list_empty(&async_shutdown_list)) {
+		dev = list_entry(async_shutdown_list.next, struct device,
+				kobj.entry);
+		parent = get_device(dev->parent);
+		get_device(dev);
+		/*
+		 * Make sure the device is off the list
+		 */
+		list_del_init(&dev->kobj.entry);
+		if (parent)
+			device_lock(parent);
+		device_lock(dev);
+		if (dev->bus && dev->bus->async_shutdown_end) {
+			if (initcall_debug)
+				dev_info(dev,
+					"async_shutdown_end called\n");
+			dev->bus->async_shutdown_end(dev);
+		}
+		device_unlock(dev);
+		if (parent)
+			device_unlock(parent);
+		put_device(dev);
+		put_device(parent);
+	}
 }

 /*
diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h
index a039ab809753..f582c9d21515 100644
--- a/include/linux/device/bus.h
+++ b/include/linux/device/bus.h
@@ -49,6 +49,16 @@ struct fwnode_handle;
  *		will never get called until they do.
  * @remove:	Called when a device removed from this bus.
  * @shutdown:	Called at shut-down time to quiesce the device.
+ * @async_shutdown_start: Called at shutdown time to start the shutdown
+ *		process on the device. This entry point will be called
+ *		only when the bus driver has indicated it would like to
+ *		participate in asynchronous shutdown completion.
+ * @async_shutdown_end:	Called at shutdown time to complete the shutdown
+ *		process of the device. This entry point will be called
+ *		only when the bus driver has indicated it would like to
+ *		participate in asynchronous shutdown completion.
  *
  * @online:	Called to put the device back online (after offlining it).
  * @offline:	Called to put the device offline for hot-removal. May fail.
@@ -93,6 +103,8 @@ struct bus_type {
 	void (*sync_state)(struct device *dev);
 	void (*remove)(struct device *dev);
 	void (*shutdown)(struct device *dev);
+	void (*async_shutdown_start)(struct device *dev);
+	void (*async_shutdown_end)(struct device *dev);

 	int (*online)(struct device *dev);
 	int (*offline)(struct device *dev);
--
2.36.0.550.gb090851708-goog
* [PATCH v3 2/3] PCI: Support asynchronous shutdown
From: Tanjore Suresh <tansuresh@google.com>
Date: 2022-05-17 22:08 UTC
To: Greg Kroah-Hartman, Rafael J. Wysocki, Christoph Hellwig, Sagi Grimberg, Bjorn Helgaas
Cc: linux-kernel, linux-nvme, linux-pci, Tanjore Suresh

Enhance the base PCI driver to add support for asynchronous shutdown.

Assume a device takes n seconds to shut down. If a machine is populated
with M such devices, the total time spent shutting down all the devices
is M * n seconds when the shutdown is done synchronously. For example,
if NVMe PCI controllers take 5 seconds each to shut down and there are
16 such NVMe controllers in a system, the system spends a total of 80
seconds shutting down all its NVMe devices. To speed up the shutdown
time, an asynchronous shutdown interface has been implemented, which
significantly reduces machine reboot time.

Signed-off-by: Tanjore Suresh <tansuresh@google.com>
---
 drivers/pci/pci-driver.c | 20 ++++++++++++++++----
 include/linux/pci.h      |  4 ++++
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 4ceeb75fc899..63f49a8dff8e 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -501,16 +501,28 @@ static void pci_device_remove(struct device *dev)
 	pci_dev_put(pci_dev);
 }

-static void pci_device_shutdown(struct device *dev)
+static void pci_device_async_shutdown_start(struct device *dev)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	struct pci_driver *drv = pci_dev->driver;

 	pm_runtime_resume(dev);

-	if (drv && drv->shutdown)
+	if (drv && drv->async_shutdown_start)
+		drv->async_shutdown_start(pci_dev);
+	else if (drv && drv->shutdown)
 		drv->shutdown(pci_dev);
+}
+
+static void pci_device_async_shutdown_end(struct device *dev)
+{
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+	struct pci_driver *drv = pci_dev->driver;
+
+	if (drv && drv->async_shutdown_end)
+		drv->async_shutdown_end(pci_dev);

 	/*
 	 * If this is a kexec reboot, turn off Bus Master bit on the
 	 * device to tell it to not continue to do DMA. Don't touch
@@ -521,7 +533,6 @@ static void pci_device_shutdown(struct device *dev)
 	if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot))
 		pci_clear_master(pci_dev);
 }
-
 #ifdef CONFIG_PM

 /* Auxiliary functions used for system resume and run-time resume. */
@@ -1625,7 +1636,8 @@ struct bus_type pci_bus_type = {
 	.uevent		= pci_uevent,
 	.probe		= pci_device_probe,
 	.remove		= pci_device_remove,
-	.shutdown	= pci_device_shutdown,
+	.async_shutdown_start = pci_device_async_shutdown_start,
+	.async_shutdown_end = pci_device_async_shutdown_end,
 	.dev_groups	= pci_dev_groups,
 	.bus_groups	= pci_bus_groups,
 	.drv_groups	= pci_drv_groups,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 60adf42460ab..ef5500c18fed 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -881,6 +881,8 @@ struct module;
  *		Useful for enabling wake-on-lan (NIC) or changing
  *		the power state of a device before reboot.
  *		e.g. drivers/net/e100.c.
+ * @async_shutdown_start: This starts the asynchronous shutdown
+ * @async_shutdown_end: This completes the started asynchronous shutdown
  * @sriov_configure: Optional driver callback to allow configuration of
  *		number of VFs to enable via sysfs "sriov_numvfs" file.
  * @sriov_set_msix_vec_count: PF Driver callback to change number of MSI-X
@@ -905,6 +907,8 @@ struct pci_driver {
 	int  (*suspend)(struct pci_dev *dev, pm_message_t state);	/* Device suspended */
 	int  (*resume)(struct pci_dev *dev);	/* Device woken up */
 	void (*shutdown)(struct pci_dev *dev);
+	void (*async_shutdown_start)(struct pci_dev *dev);
+	void (*async_shutdown_end)(struct pci_dev *dev);
 	int  (*sriov_configure)(struct pci_dev *dev, int num_vfs); /* On PF */
 	int  (*sriov_set_msix_vec_count)(struct pci_dev *vf, int msix_vec_count); /* On PF */
 	u32  (*sriov_get_vf_total_msix)(struct pci_dev *pf);
--
2.36.0.550.gb090851708-goog
* [PATCH v3 3/3] nvme: Add async shutdown support
From: Tanjore Suresh <tansuresh@google.com>
Date: 2022-05-17 22:08 UTC
To: Greg Kroah-Hartman, Rafael J. Wysocki, Christoph Hellwig, Sagi Grimberg, Bjorn Helgaas
Cc: linux-kernel, linux-nvme, linux-pci, Tanjore Suresh

This works with the asynchronous shutdown mechanism set up for PCI
drivers and provides both start and end shutdown routines at the
pci_driver structure level. The async_shutdown_start routine starts
the shutdown and does not wait for it to complete. The
async_shutdown_end routine waits for the shutdown to complete on the
individual controllers this driver instance controls. This mechanism
speeds up shutdown on a system that hosts many controllers.

Signed-off-by: Tanjore Suresh <tansuresh@google.com>
---
 drivers/nvme/host/core.c | 28 ++++++++++----
 drivers/nvme/host/nvme.h |  8 ++++
 drivers/nvme/host/pci.c  | 80 +++++++++++++++++++++++++---------------
 3 files changed, 80 insertions(+), 36 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index e1846d04817f..f2fc62e1176e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2188,16 +2188,30 @@ EXPORT_SYMBOL_GPL(nvme_enable_ctrl);

 int nvme_shutdown_ctrl(struct nvme_ctrl *ctrl)
 {
-	unsigned long timeout = jiffies + (ctrl->shutdown_timeout * HZ);
-	u32 csts;
 	int ret;

+	ret = nvme_shutdown_ctrl_start(ctrl);
+	if (ret)
+		return ret;
+	return nvme_wait_for_shutdown_cmpl(ctrl);
+}
+EXPORT_SYMBOL_GPL(nvme_shutdown_ctrl);
+
+int nvme_shutdown_ctrl_start(struct nvme_ctrl *ctrl)
+{
 	ctrl->ctrl_config &= ~NVME_CC_SHN_MASK;
 	ctrl->ctrl_config |= NVME_CC_SHN_NORMAL;

-	ret = ctrl->ops->reg_write32(ctrl, NVME_REG_CC, ctrl->ctrl_config);
-	if (ret)
-		return ret;
+	return ctrl->ops->reg_write32(ctrl, NVME_REG_CC, ctrl->ctrl_config);
+}
+EXPORT_SYMBOL_GPL(nvme_shutdown_ctrl_start);
+
+int nvme_wait_for_shutdown_cmpl(struct nvme_ctrl *ctrl)
+{
+	unsigned long deadline = jiffies + (ctrl->shutdown_timeout * HZ);
+	u32 csts;
+	int ret;

 	while ((ret = ctrl->ops->reg_read32(ctrl, NVME_REG_CSTS, &csts)) == 0) {
 		if ((csts & NVME_CSTS_SHST_MASK) == NVME_CSTS_SHST_CMPLT)
@@ -2206,7 +2220,7 @@ int nvme_shutdown_ctrl(struct nvme_ctrl *ctrl)
 		msleep(100);
 		if (fatal_signal_pending(current))
 			return -EINTR;
-		if (time_after(jiffies, timeout)) {
+		if (time_after(jiffies, deadline)) {
 			dev_err(ctrl->device,
 				"Device shutdown incomplete; abort shutdown\n");
 			return -ENODEV;
@@ -2215,7 +2229,7 @@ int nvme_shutdown_ctrl(struct nvme_ctrl *ctrl)

 	return ret;
 }
-EXPORT_SYMBOL_GPL(nvme_shutdown_ctrl);
+EXPORT_SYMBOL_GPL(nvme_wait_for_shutdown_cmpl);

 static int nvme_configure_timestamp(struct nvme_ctrl *ctrl)
 {
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a2b53ca63335..a8706a0f32f2 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -175,6 +175,12 @@ enum {
 	NVME_REQ_USERCMD		= (1 << 1),
 };

+enum shutdown_type {
+	DO_NOT_SHUTDOWN = 0,
+	SHUTDOWN_TYPE_SYNC = 1,
+	SHUTDOWN_TYPE_ASYNC = 2,
+};
+
 static inline struct nvme_request *nvme_req(struct request *req)
 {
 	return blk_mq_rq_to_pdu(req);
@@ -677,6 +683,8 @@ bool nvme_wait_reset(struct nvme_ctrl *ctrl);
 int nvme_disable_ctrl(struct nvme_ctrl *ctrl);
 int nvme_enable_ctrl(struct nvme_ctrl *ctrl);
 int nvme_shutdown_ctrl(struct nvme_ctrl *ctrl);
+int nvme_shutdown_ctrl_start(struct nvme_ctrl *ctrl);
+int nvme_wait_for_shutdown_cmpl(struct nvme_ctrl *ctrl);
 int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 		const struct nvme_ctrl_ops *ops, unsigned long quirks);
 void nvme_uninit_ctrl(struct nvme_ctrl *ctrl);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 3aacf1c0d5a5..a0ab2e777d44 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -107,7 +107,7 @@ MODULE_PARM_DESC(noacpi, "disable acpi bios quirks");
 struct nvme_dev;
 struct nvme_queue;

-static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
+static void nvme_dev_disable(struct nvme_dev *dev, int shutdown_type);
 static bool __nvme_disable_io_queues(struct nvme_dev *dev, u8 opcode);

 /*
@@ -1357,7 +1357,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 	 */
 	if (nvme_should_reset(dev, csts)) {
 		nvme_warn_reset(dev, csts);
-		nvme_dev_disable(dev, false);
+		nvme_dev_disable(dev, DO_NOT_SHUTDOWN);
 		nvme_reset_ctrl(&dev->ctrl);
 		return BLK_EH_DONE;
 	}
@@ -1392,7 +1392,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 			 "I/O %d QID %d timeout, disable controller\n",
 			 req->tag, nvmeq->qid);
 		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
-		nvme_dev_disable(dev, true);
+		nvme_dev_disable(dev, SHUTDOWN_TYPE_SYNC);
 		return BLK_EH_DONE;
 	case NVME_CTRL_RESETTING:
 		return BLK_EH_RESET_TIMER;
@@ -1410,7 +1410,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 		 "I/O %d QID %d timeout, reset controller\n",
 		 req->tag, nvmeq->qid);
 	nvme_req(req)->flags |= NVME_REQ_CANCELLED;
-	nvme_dev_disable(dev, false);
+	nvme_dev_disable(dev, DO_NOT_SHUTDOWN);
 	nvme_reset_ctrl(&dev->ctrl);

 	return BLK_EH_DONE;
@@ -1503,11 +1503,13 @@ static void nvme_suspend_io_queues(struct nvme_dev *dev)
 		nvme_suspend_queue(&dev->queues[i]);
 }

-static void nvme_disable_admin_queue(struct nvme_dev *dev, bool shutdown)
+static void nvme_disable_admin_queue(struct nvme_dev *dev, int shutdown_type)
 {
 	struct nvme_queue *nvmeq = &dev->queues[0];

-	if (shutdown)
+	if (shutdown_type == SHUTDOWN_TYPE_ASYNC)
+		nvme_shutdown_ctrl_start(&dev->ctrl);
+	else if (shutdown_type == SHUTDOWN_TYPE_SYNC)
 		nvme_shutdown_ctrl(&dev->ctrl);
 	else
 		nvme_disable_ctrl(&dev->ctrl);
@@ -2669,7 +2671,7 @@ static void nvme_pci_disable(struct nvme_dev *dev)
 	}
 }

-static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
+static void nvme_dev_disable(struct nvme_dev *dev, int shutdown_type)
 {
 	bool dead = true, freeze = false;
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
@@ -2691,14 +2693,14 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 	 * Give the controller a chance to complete all entered requests if
 	 * doing a safe shutdown.
 	 */
-	if (!dead && shutdown && freeze)
+	if (!dead && (shutdown_type != DO_NOT_SHUTDOWN) && freeze)
 		nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);

 	nvme_stop_queues(&dev->ctrl);

 	if (!dead && dev->ctrl.queue_count > 0) {
 		nvme_disable_io_queues(dev);
-		nvme_disable_admin_queue(dev, shutdown);
+		nvme_disable_admin_queue(dev, shutdown_type);
 	}
 	nvme_suspend_io_queues(dev);
 	nvme_suspend_queue(&dev->queues[0]);
@@ -2710,12 +2712,12 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 	blk_mq_tagset_wait_completed_request(&dev->tagset);
 	blk_mq_tagset_wait_completed_request(&dev->admin_tagset);

-	/*
-	 * The driver will not be starting up queues again if shutting down so
-	 * must flush all entered requests to their failed completion to avoid
-	 * deadlocking blk-mq hot-cpu notifier.
-	 */
-	if (shutdown) {
+	if (shutdown_type == SHUTDOWN_TYPE_SYNC) {
+		/*
+		 * The driver will not be starting up queues again if shutting
+		 * down so must flush all entered requests to their failed
+		 * completion to avoid deadlocking blk-mq hot-cpu notifier.
+		 */
 		nvme_start_queues(&dev->ctrl);
 		if (dev->ctrl.admin_q && !blk_queue_dying(dev->ctrl.admin_q))
 			nvme_start_admin_queue(&dev->ctrl);
@@ -2723,11 +2725,11 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 	mutex_unlock(&dev->shutdown_lock);
 }

-static int nvme_disable_prepare_reset(struct nvme_dev *dev, bool shutdown)
+static int nvme_disable_prepare_reset(struct nvme_dev *dev, int type)
 {
 	if (!nvme_wait_reset(&dev->ctrl))
 		return -EBUSY;
-	nvme_dev_disable(dev, shutdown);
+	nvme_dev_disable(dev, type);
 	return 0;
 }

@@ -2785,7 +2787,7 @@ static void nvme_remove_dead_ctrl(struct nvme_dev *dev)
 	 */
 	nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
 	nvme_get_ctrl(&dev->ctrl);
-	nvme_dev_disable(dev, false);
+	nvme_dev_disable(dev, DO_NOT_SHUTDOWN);
 	nvme_kill_queues(&dev->ctrl);
 	if (!queue_work(nvme_wq, &dev->remove_work))
 		nvme_put_ctrl(&dev->ctrl);
@@ -2810,7 +2812,7 @@ static void nvme_reset_work(struct work_struct *work)
 	 * moving on.
 	 */
 	if (dev->ctrl.ctrl_config & NVME_CC_ENABLE)
-		nvme_dev_disable(dev, false);
+		nvme_dev_disable(dev, DO_NOT_SHUTDOWN);
 	nvme_sync_queues(&dev->ctrl);

 	mutex_lock(&dev->shutdown_lock);
@@ -3151,7 +3153,7 @@ static void nvme_reset_prepare(struct pci_dev *pdev)
 	 * state as pci_dev device lock is held, making it impossible to race
 	 * with ->remove().
 	 */
-	nvme_disable_prepare_reset(dev, false);
+	nvme_disable_prepare_reset(dev, DO_NOT_SHUTDOWN);
 	nvme_sync_queues(&dev->ctrl);
 }

@@ -3163,13 +3165,32 @@ static void nvme_reset_done(struct pci_dev *pdev)
 	flush_work(&dev->ctrl.reset_work);
 }

-static void nvme_shutdown(struct pci_dev *pdev)
+static void nvme_async_shutdown_start(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);

-	nvme_disable_prepare_reset(dev, true);
+	nvme_disable_prepare_reset(dev, SHUTDOWN_TYPE_ASYNC);
 }

+static void nvme_async_shutdown_end(struct pci_dev *pdev)
+{
+	struct nvme_dev *dev = pci_get_drvdata(pdev);
+
+	mutex_lock(&dev->shutdown_lock);
+	nvme_wait_for_shutdown_cmpl(&dev->ctrl);
+
+	/*
+	 * The driver will not be starting up queues again if shutting down so
+	 * must flush all entered requests to their failed completion to avoid
+	 * deadlocking blk-mq hot-cpu notifier.
+	 */
+	nvme_start_queues(&dev->ctrl);
+	if (dev->ctrl.admin_q && !blk_queue_dying(dev->ctrl.admin_q))
+		nvme_start_admin_queue(&dev->ctrl);
+
+	mutex_unlock(&dev->shutdown_lock);
+}
+
 static void nvme_remove_attrs(struct nvme_dev *dev)
 {
 	if (dev->attrs_added)
@@ -3191,13 +3212,13 @@ static void nvme_remove(struct pci_dev *pdev)

 	if (!pci_device_is_present(pdev)) {
 		nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DEAD);
-		nvme_dev_disable(dev, true);
+		nvme_dev_disable(dev, SHUTDOWN_TYPE_SYNC);
 	}

 	flush_work(&dev->ctrl.reset_work);
 	nvme_stop_ctrl(&dev->ctrl);
 	nvme_remove_namespaces(&dev->ctrl);
-	nvme_dev_disable(dev, true);
+	nvme_dev_disable(dev, SHUTDOWN_TYPE_SYNC);
 	nvme_remove_attrs(dev);
 	nvme_free_host_mem(dev);
 	nvme_dev_remove_admin(dev);
@@ -3259,7 +3280,7 @@ static int nvme_suspend(struct device *dev)
 	if (pm_suspend_via_firmware() || !ctrl->npss ||
 	    !pcie_aspm_enabled(pdev) ||
 	    (ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND))
-		return nvme_disable_prepare_reset(ndev, true);
+		return nvme_disable_prepare_reset(ndev, SHUTDOWN_TYPE_SYNC);

 	nvme_start_freeze(ctrl);
 	nvme_wait_freeze(ctrl);
@@ -3302,7 +3323,7 @@ static int nvme_suspend(struct device *dev)
 		 * Clearing npss forces a controller reset on resume. The
 		 * correct value will be rediscovered then.
 		 */
-		ret = nvme_disable_prepare_reset(ndev, true);
+		ret = nvme_disable_prepare_reset(ndev, SHUTDOWN_TYPE_SYNC);
 		ctrl->npss = 0;
 	}
 unfreeze:
@@ -3314,7 +3335,7 @@ static int nvme_simple_suspend(struct device *dev)
 {
 	struct nvme_dev *ndev = pci_get_drvdata(to_pci_dev(dev));

-	return nvme_disable_prepare_reset(ndev, true);
+	return nvme_disable_prepare_reset(ndev, SHUTDOWN_TYPE_SYNC);
 }

 static int nvme_simple_resume(struct device *dev)
@@ -3351,7 +3372,7 @@ static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev,
 	case pci_channel_io_frozen:
 		dev_warn(dev->ctrl.device,
 			 "frozen state error detected, reset controller\n");
-		nvme_dev_disable(dev, false);
+		nvme_dev_disable(dev, DO_NOT_SHUTDOWN);
 		return PCI_ERS_RESULT_NEED_RESET;
 	case pci_channel_io_perm_failure:
 		dev_warn(dev->ctrl.device,
@@ -3488,7 +3509,8 @@ static struct pci_driver nvme_driver = {
 	.id_table	= nvme_id_table,
 	.probe		= nvme_probe,
 	.remove		= nvme_remove,
-	.shutdown	= nvme_shutdown,
+	.async_shutdown_start = nvme_async_shutdown_start,
+	.async_shutdown_end = nvme_async_shutdown_end,
 #ifdef CONFIG_PM_SLEEP
 	.driver	= {
 		.pm	= &nvme_dev_pm_ops,
--
2.36.0.550.gb090851708-goog
* Re: [PATCH v3 1/3] driver core: Support asynchronous driver shutdown
From: Rafael J. Wysocki
Date: 2022-05-18 11:38 UTC
To: Tanjore Suresh
Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Christoph Hellwig, Sagi Grimberg, Bjorn Helgaas, Linux Kernel Mailing List, linux-nvme, Linux PCI

On Wed, May 18, 2022 at 12:08 AM Tanjore Suresh <tansuresh@google.com> wrote:
>
> This changes the bus driver interface with additional entry points
> to enable devices to implement asynchronous shutdown. The existing
> synchronous interface to shutdown is unmodified and retained for
> backward compatibility.
>
> This changes the common device shutdown code to enable devices to
> participate in asynchronous shutdown implementation.
>
> Signed-off-by: Tanjore Suresh <tansuresh@google.com>

[full quoted patch trimmed]

> + * @async_shutdown_start: Called at the shutdown-time to start
> + *		the shutdown process on the device. This entry point
> + *		will be called only when the bus driver has indicated
> + *		it would like to participate in asynchronous shutdown
> + *		completion.
> + * @async_shutdown_end: Called at shutdown-time to complete the
> + *		shutdown process of the device. This entry point will
> + *		be called only when the bus driver has indicated it
> + *		would like to participate in the asynchronous shutdown
> + *		completion.

I'm going to repeat my point here, but only once.

I see no reason to do async shutdown this way, instead of adding a
flag for drivers to opt in for calling their existing shutdown
callbacks asynchronously, in analogy with the async suspend and
resume implementation.

Is there any reason why this is not viable?

> + void (*async_shutdown_start)(struct device *dev);
> + void (*async_shutdown_end)(struct device *dev);
* Re: [PATCH v3 1/3] driver core: Support asynchronous driver shutdown 2022-05-18 11:38 ` [PATCH v3 1/3] driver core: Support asynchronous driver shutdown Rafael J. Wysocki @ 2022-05-18 17:50 ` Bjorn Helgaas 2022-05-18 18:01 ` Rafael J. Wysocki 2022-08-02 21:35 ` Bjorn Helgaas 0 siblings, 2 replies; 10+ messages in thread From: Bjorn Helgaas @ 2022-05-18 17:50 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Tanjore Suresh, Greg Kroah-Hartman, Christoph Hellwig, Sagi Grimberg, Bjorn Helgaas, Linux Kernel Mailing List, linux-nvme, Linux PCI On Wed, May 18, 2022 at 01:38:49PM +0200, Rafael J. Wysocki wrote: > On Wed, May 18, 2022 at 12:08 AM Tanjore Suresh <tansuresh@google.com> wrote: > > > > This changes the bus driver interface with additional entry points > > to enable devices to implement asynchronous shutdown. The existing > > synchronous interface to shutdown is unmodified and retained for > > backward compatibility. > > > > This changes the common device shutdown code to enable devices to > > participate in asynchronous shutdown implementation. 
> > > > Signed-off-by: Tanjore Suresh <tansuresh@google.com> > > --- > > drivers/base/core.c | 38 +++++++++++++++++++++++++++++++++++++- > > include/linux/device/bus.h | 12 ++++++++++++ > > 2 files changed, 49 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/base/core.c b/drivers/base/core.c > > index 3d6430eb0c6a..ba267ae70a22 100644 > > --- a/drivers/base/core.c > > +++ b/drivers/base/core.c > > @@ -4479,6 +4479,7 @@ EXPORT_SYMBOL_GPL(device_change_owner); > > void device_shutdown(void) > > { > > struct device *dev, *parent; > > + LIST_HEAD(async_shutdown_list); > > > > wait_for_device_probe(); > > device_block_probing(); > > @@ -4523,7 +4524,13 @@ void device_shutdown(void) > > dev_info(dev, "shutdown_pre\n"); > > dev->class->shutdown_pre(dev); > > } > > - if (dev->bus && dev->bus->shutdown) { > > + if (dev->bus && dev->bus->async_shutdown_start) { > > + if (initcall_debug) > > + dev_info(dev, "async_shutdown_start\n"); > > + dev->bus->async_shutdown_start(dev); > > + list_add_tail(&dev->kobj.entry, > > + &async_shutdown_list); > > + } else if (dev->bus && dev->bus->shutdown) { > > if (initcall_debug) > > dev_info(dev, "shutdown\n"); > > dev->bus->shutdown(dev); > > @@ -4543,6 +4550,35 @@ void device_shutdown(void) > > spin_lock(&devices_kset->list_lock); > > } > > spin_unlock(&devices_kset->list_lock); > > + > > + /* > > + * Second pass spin for only devices, that have configured > > + * Asynchronous shutdown. 
> > + */ > > + while (!list_empty(&async_shutdown_list)) { > > + dev = list_entry(async_shutdown_list.next, struct device, > > + kobj.entry); > > + parent = get_device(dev->parent); > > + get_device(dev); > > + /* > > + * Make sure the device is off the list > > + */ > > + list_del_init(&dev->kobj.entry); > > + if (parent) > > + device_lock(parent); > > + device_lock(dev); > > + if (dev->bus && dev->bus->async_shutdown_end) { > > + if (initcall_debug) > > + dev_info(dev, > > + "async_shutdown_end called\n"); > > + dev->bus->async_shutdown_end(dev); > > + } > > + device_unlock(dev); > > + if (parent) > > + device_unlock(parent); > > + put_device(dev); > > + put_device(parent); > > + } > > } > > > > /* > > diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h > > index a039ab809753..f582c9d21515 100644 > > --- a/include/linux/device/bus.h > > +++ b/include/linux/device/bus.h > > @@ -49,6 +49,16 @@ struct fwnode_handle; > > * will never get called until they do. > > * @remove: Called when a device removed from this bus. > > * @shutdown: Called at shut-down time to quiesce the device. > > + * @async_shutdown_start: Called at the shutdown-time to start > > + * the shutdown process on the device. > > + * This entry point will be called only > > + * when the bus driver has indicated it would > > + * like to participate in asynchronous shutdown > > + * completion. > > + * @async_shutdown_end: Called at shutdown-time to complete the shutdown > > + * process of the device. This entry point will be called > > + * only when the bus drive has indicated it would like to > > + * participate in the asynchronous shutdown completion. > > I'm going to repeat my point here, but only once. > > I see no reason to do async shutdown this way, instead of adding a > flag for drivers to opt in for calling their existing shutdown > callbacks asynchronously, in analogy with the async suspend and resume > implementation. 
There's a lot of code here that mere mortals like myself don't understand very well, so here's my meager understanding of how async suspend works and what you're suggesting to make this a little more concrete. Devices have this async_suspend bit: struct device { struct dev_pm_info { unsigned int async_suspend:1; Drivers call device_enable_async_suspend() to set async_suspend if they want it. The system suspend path is something like this: suspend_enter dpm_suspend_noirq(PMSG_SUSPEND) dpm_noirq_suspend_devices(PMSG_SUSPEND) pm_transition = PMSG_SUSPEND while (!list_empty(&dpm_late_early_list)) device_suspend_noirq(dev) dpm_async_fn(dev, async_suspend_noirq) if (is_async(dev)) async_schedule_dev(async_suspend_noirq) # async path async_suspend_noirq # called asynchronously __device_suspend_noirq(dev, PMSG_SUSPEND, true) callback = pm_noirq_op(PMSG_SUSPEND) # .suspend_noirq() dpm_run_callback(callback) # async call __device_suspend_noirq(dev, pm_transition, false) # sync path callback = pm_noirq_op(PMSG_SUSPEND) # .suspend_noirq() dpm_run_callback(callback) # sync call async_synchronize_full # wait If a driver has called device_enable_async_suspend(), we'll use the async_schedule_dev() path to schedule the appropriate .suspend_noirq() method. After scheduling it via the async path or directly calling it via the sync path, the async_synchronize_full() waits for completion of all the async methods. 
I assume your suggestion is to do something like this: struct device { struct dev_pm_info { unsigned int async_suspend:1; + unsigned int async_shutdown:1; + void device_enable_async_shutdown(struct device *dev) + dev->power.async_shutdown = true; device_shutdown while (!list_empty(&devices_kset->list)) - dev->...->shutdown() + if (is_async_shutdown(dev)) + async_schedule_dev(async_shutdown) # async path + + async_shutdown # called asynchronously + dev->...->shutdown() + + else + dev->...->shutdown() # sync path + + async_synchronize_full # wait
* Re: [PATCH v3 1/3] driver core: Support asynchronous driver shutdown 2022-05-18 17:50 ` Bjorn Helgaas @ 2022-05-18 18:01 ` Rafael J. Wysocki 2022-05-18 20:45 ` Tanjore Suresh 2022-08-02 21:35 ` Bjorn Helgaas 1 sibling, 1 reply; 10+ messages in thread From: Rafael J. Wysocki @ 2022-05-18 18:01 UTC (permalink / raw) To: Bjorn Helgaas Cc: Rafael J. Wysocki, Tanjore Suresh, Greg Kroah-Hartman, Christoph Hellwig, Sagi Grimberg, Bjorn Helgaas, Linux Kernel Mailing List, linux-nvme, Linux PCI On Wed, May 18, 2022 at 7:50 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Wed, May 18, 2022 at 01:38:49PM +0200, Rafael J. Wysocki wrote: > > On Wed, May 18, 2022 at 12:08 AM Tanjore Suresh <tansuresh@google.com> wrote: > > > > > > This changes the bus driver interface with additional entry points > > > to enable devices to implement asynchronous shutdown. The existing > > > synchronous interface to shutdown is unmodified and retained for > > > backward compatibility. > > > > > > This changes the common device shutdown code to enable devices to > > > participate in asynchronous shutdown implementation. 
> > > > > > Signed-off-by: Tanjore Suresh <tansuresh@google.com> > > > --- > > > drivers/base/core.c | 38 +++++++++++++++++++++++++++++++++++++- > > > include/linux/device/bus.h | 12 ++++++++++++ > > > 2 files changed, 49 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/base/core.c b/drivers/base/core.c > > > index 3d6430eb0c6a..ba267ae70a22 100644 > > > --- a/drivers/base/core.c > > > +++ b/drivers/base/core.c > > > @@ -4479,6 +4479,7 @@ EXPORT_SYMBOL_GPL(device_change_owner); > > > void device_shutdown(void) > > > { > > > struct device *dev, *parent; > > > + LIST_HEAD(async_shutdown_list); > > > > > > wait_for_device_probe(); > > > device_block_probing(); > > > @@ -4523,7 +4524,13 @@ void device_shutdown(void) > > > dev_info(dev, "shutdown_pre\n"); > > > dev->class->shutdown_pre(dev); > > > } > > > - if (dev->bus && dev->bus->shutdown) { > > > + if (dev->bus && dev->bus->async_shutdown_start) { > > > + if (initcall_debug) > > > + dev_info(dev, "async_shutdown_start\n"); > > > + dev->bus->async_shutdown_start(dev); > > > + list_add_tail(&dev->kobj.entry, > > > + &async_shutdown_list); > > > + } else if (dev->bus && dev->bus->shutdown) { > > > if (initcall_debug) > > > dev_info(dev, "shutdown\n"); > > > dev->bus->shutdown(dev); > > > @@ -4543,6 +4550,35 @@ void device_shutdown(void) > > > spin_lock(&devices_kset->list_lock); > > > } > > > spin_unlock(&devices_kset->list_lock); > > > + > > > + /* > > > + * Second pass spin for only devices, that have configured > > > + * Asynchronous shutdown. 
> > > + */ > > > + while (!list_empty(&async_shutdown_list)) { > > > + dev = list_entry(async_shutdown_list.next, struct device, > > > + kobj.entry); > > > + parent = get_device(dev->parent); > > > + get_device(dev); > > > + /* > > > + * Make sure the device is off the list > > > + */ > > > + list_del_init(&dev->kobj.entry); > > > + if (parent) > > > + device_lock(parent); > > > + device_lock(dev); > > > + if (dev->bus && dev->bus->async_shutdown_end) { > > > + if (initcall_debug) > > > + dev_info(dev, > > > + "async_shutdown_end called\n"); > > > + dev->bus->async_shutdown_end(dev); > > > + } > > > + device_unlock(dev); > > > + if (parent) > > > + device_unlock(parent); > > > + put_device(dev); > > > + put_device(parent); > > > + } > > > } > > > > > > /* > > > diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h > > > index a039ab809753..f582c9d21515 100644 > > > --- a/include/linux/device/bus.h > > > +++ b/include/linux/device/bus.h > > > @@ -49,6 +49,16 @@ struct fwnode_handle; > > > * will never get called until they do. > > > * @remove: Called when a device removed from this bus. > > > * @shutdown: Called at shut-down time to quiesce the device. > > > + * @async_shutdown_start: Called at the shutdown-time to start > > > + * the shutdown process on the device. > > > + * This entry point will be called only > > > + * when the bus driver has indicated it would > > > + * like to participate in asynchronous shutdown > > > + * completion. > > > + * @async_shutdown_end: Called at shutdown-time to complete the shutdown > > > + * process of the device. This entry point will be called > > > + * only when the bus drive has indicated it would like to > > > + * participate in the asynchronous shutdown completion. > > > > I'm going to repeat my point here, but only once. 
> > > > I see no reason to do async shutdown this way, instead of adding a > > flag for drivers to opt in for calling their existing shutdown > > callbacks asynchronously, in analogy with the async suspend and resume > > implementation. > > There's a lot of code here that mere mortals like myself don't > understand very well, so here's my meager understanding of how > async suspend works and what you're suggesting to make this a > little more concrete. > > Devices have this async_suspend bit: > > struct device { > struct dev_pm_info { > unsigned int async_suspend:1; > > Drivers call device_enable_async_suspend() to set async_suspend if > they want it. The system suspend path is something like this: > > suspend_enter > dpm_suspend_noirq(PMSG_SUSPEND) > dpm_noirq_suspend_devices(PMSG_SUSPEND) > pm_transition = PMSG_SUSPEND > while (!list_empty(&dpm_late_early_list)) > device_suspend_noirq(dev) > dpm_async_fn(dev, async_suspend_noirq) > if (is_async(dev)) > async_schedule_dev(async_suspend_noirq) # async path > > async_suspend_noirq # called asynchronously > __device_suspend_noirq(dev, PMSG_SUSPEND, true) > callback = pm_noirq_op(PMSG_SUSPEND) # .suspend_noirq() > dpm_run_callback(callback) # async call > > __device_suspend_noirq(dev, pm_transition, false) # sync path > callback = pm_noirq_op(PMSG_SUSPEND) # .suspend_noirq() > dpm_run_callback(callback) # sync call > > async_synchronize_full # wait > > If a driver has called device_enable_async_suspend(), we'll use the > async_schedule_dev() path to schedule the appropriate .suspend_noirq() > method. After scheduling it via the async path or directly calling it > via the sync path, the async_synchronize_full() waits for completion > of all the async methods. 
> > I assume your suggestion is to do something like this: > > struct device { > > struct dev_pm_info { > > unsigned int async_suspend:1; > > + unsigned int async_shutdown:1; > > > > + void device_enable_async_shutdown(struct device *dev) > > + dev->power.async_shutdown = true; > > > > device_shutdown > > while (!list_empty(&devices_kset->list)) > > - dev->...->shutdown() > > + if (is_async_shutdown(dev)) > > + async_schedule_dev(async_shutdown) # async path > > + > > + async_shutdown # called asynchronously > > + dev->...->shutdown() > > + > > + else > > + dev->...->shutdown() # sync path > > + > > + async_synchronize_full # wait Yes, that's the idea IIUC.
* Re: [PATCH v3 1/3] driver core: Support asynchronous driver shutdown 2022-05-18 18:01 ` Rafael J. Wysocki @ 2022-05-18 20:45 ` Tanjore Suresh 0 siblings, 0 replies; 10+ messages in thread From: Tanjore Suresh @ 2022-05-18 20:45 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Bjorn Helgaas, Greg Kroah-Hartman, Christoph Hellwig, Sagi Grimberg, Bjorn Helgaas, Linux Kernel Mailing List, linux-nvme, Linux PCI Rafael, On Wed, May 18, 2022 at 11:01 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Wed, May 18, 2022 at 7:50 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > On Wed, May 18, 2022 at 01:38:49PM +0200, Rafael J. Wysocki wrote: > > > On Wed, May 18, 2022 at 12:08 AM Tanjore Suresh <tansuresh@google.com> wrote: > > > > > > > > This changes the bus driver interface with additional entry points > > > > to enable devices to implement asynchronous shutdown. The existing > > > > synchronous interface to shutdown is unmodified and retained for > > > > backward compatibility. > > > > > > > > This changes the common device shutdown code to enable devices to > > > > participate in asynchronous shutdown implementation. 
> > > > > > > > Signed-off-by: Tanjore Suresh <tansuresh@google.com> > > > > --- > > > > drivers/base/core.c | 38 +++++++++++++++++++++++++++++++++++++- > > > > include/linux/device/bus.h | 12 ++++++++++++ > > > > 2 files changed, 49 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/base/core.c b/drivers/base/core.c > > > > index 3d6430eb0c6a..ba267ae70a22 100644 > > > > --- a/drivers/base/core.c > > > > +++ b/drivers/base/core.c > > > > @@ -4479,6 +4479,7 @@ EXPORT_SYMBOL_GPL(device_change_owner); > > > > void device_shutdown(void) > > > > { > > > > struct device *dev, *parent; > > > > + LIST_HEAD(async_shutdown_list); > > > > > > > > wait_for_device_probe(); > > > > device_block_probing(); > > > > @@ -4523,7 +4524,13 @@ void device_shutdown(void) > > > > dev_info(dev, "shutdown_pre\n"); > > > > dev->class->shutdown_pre(dev); > > > > } > > > > - if (dev->bus && dev->bus->shutdown) { > > > > + if (dev->bus && dev->bus->async_shutdown_start) { > > > > + if (initcall_debug) > > > > + dev_info(dev, "async_shutdown_start\n"); > > > > + dev->bus->async_shutdown_start(dev); > > > > + list_add_tail(&dev->kobj.entry, > > > > + &async_shutdown_list); > > > > + } else if (dev->bus && dev->bus->shutdown) { > > > > if (initcall_debug) > > > > dev_info(dev, "shutdown\n"); > > > > dev->bus->shutdown(dev); > > > > @@ -4543,6 +4550,35 @@ void device_shutdown(void) > > > > spin_lock(&devices_kset->list_lock); > > > > } > > > > spin_unlock(&devices_kset->list_lock); > > > > + > > > > + /* > > > > + * Second pass spin for only devices, that have configured > > > > + * Asynchronous shutdown. 
> > > > + */ > > > > + while (!list_empty(&async_shutdown_list)) { > > > > + dev = list_entry(async_shutdown_list.next, struct device, > > > > + kobj.entry); > > > > + parent = get_device(dev->parent); > > > > + get_device(dev); > > > > + /* > > > > + * Make sure the device is off the list > > > > + */ > > > > + list_del_init(&dev->kobj.entry); > > > > + if (parent) > > > > + device_lock(parent); > > > > + device_lock(dev); > > > > + if (dev->bus && dev->bus->async_shutdown_end) { > > > > + if (initcall_debug) > > > > + dev_info(dev, > > > > + "async_shutdown_end called\n"); > > > > + dev->bus->async_shutdown_end(dev); > > > > + } > > > > + device_unlock(dev); > > > > + if (parent) > > > > + device_unlock(parent); > > > > + put_device(dev); > > > > + put_device(parent); > > > > + } > > > > } > > > > > > > > /* > > > > diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h > > > > index a039ab809753..f582c9d21515 100644 > > > > --- a/include/linux/device/bus.h > > > > +++ b/include/linux/device/bus.h > > > > @@ -49,6 +49,16 @@ struct fwnode_handle; > > > > * will never get called until they do. > > > > * @remove: Called when a device removed from this bus. > > > > * @shutdown: Called at shut-down time to quiesce the device. > > > > + * @async_shutdown_start: Called at the shutdown-time to start > > > > + * the shutdown process on the device. > > > > + * This entry point will be called only > > > > + * when the bus driver has indicated it would > > > > + * like to participate in asynchronous shutdown > > > > + * completion. > > > > + * @async_shutdown_end: Called at shutdown-time to complete the shutdown > > > > + * process of the device. This entry point will be called > > > > + * only when the bus drive has indicated it would like to > > > > + * participate in the asynchronous shutdown completion. > > > > > > I'm going to repeat my point here, but only once. 
> > > > > > I see no reason to do async shutdown this way, instead of adding a > > > flag for drivers to opt in for calling their existing shutdown > > > callbacks asynchronously, in analogy with the async suspend and resume > > > implementation. > > > > There's a lot of code here that mere mortals like myself don't > > understand very well, so here's my meager understanding of how > > async suspend works and what you're suggesting to make this a > > little more concrete. > > > > Devices have this async_suspend bit: > > > > struct device { > > struct dev_pm_info { > > unsigned int async_suspend:1; > > > > Drivers call device_enable_async_suspend() to set async_suspend if > > they want it. The system suspend path is something like this: > > > > suspend_enter > > dpm_suspend_noirq(PMSG_SUSPEND) > > dpm_noirq_suspend_devices(PMSG_SUSPEND) > > pm_transition = PMSG_SUSPEND > > while (!list_empty(&dpm_late_early_list)) > > device_suspend_noirq(dev) > > dpm_async_fn(dev, async_suspend_noirq) > > if (is_async(dev)) > > async_schedule_dev(async_suspend_noirq) # async path > > > > async_suspend_noirq # called asynchronously > > __device_suspend_noirq(dev, PMSG_SUSPEND, true) > > callback = pm_noirq_op(PMSG_SUSPEND) # .suspend_noirq() > > dpm_run_callback(callback) # async call > > > > __device_suspend_noirq(dev, pm_transition, false) # sync path > > callback = pm_noirq_op(PMSG_SUSPEND) # .suspend_noirq() > > dpm_run_callback(callback) # sync call > > > > async_synchronize_full # wait > > > > If a driver has called device_enable_async_suspend(), we'll use the > > async_schedule_dev() path to schedule the appropriate .suspend_noirq() > > method. After scheduling it via the async path or directly calling it > > via the sync path, the async_synchronize_full() waits for completion > > of all the async methods. 
> > I assume your suggestion is to do something like this: > > > > struct device { > > struct dev_pm_info { > > unsigned int async_suspend:1; > > + unsigned int async_shutdown:1; > > > > + void device_enable_async_shutdown(struct device *dev) > > + dev->power.async_shutdown = true; > > > > device_shutdown > > while (!list_empty(&devices_kset->list)) > > - dev->...->shutdown() > > + if (is_async_shutdown(dev)) > > + async_schedule_dev(async_shutdown) # async path > > + > > + async_shutdown # called asynchronously > > + dev->...->shutdown() > > + > > + else > > + dev->...->shutdown() # sync path > > + > > + async_synchronize_full # wait > > Yes, that's the idea IIUC. Thanks for the clarification; I misunderstood your earlier comment, and thanks for explaining it. Let me evaluate and get back to you as soon as possible. Thanks sureshtk
* Re: [PATCH v3 1/3] driver core: Support asynchronous driver shutdown 2022-05-18 17:50 ` Bjorn Helgaas 2022-05-18 18:01 ` Rafael J. Wysocki @ 2022-08-02 21:35 ` Bjorn Helgaas 2022-08-03 15:04 ` Rafael J. Wysocki 1 sibling, 1 reply; 10+ messages in thread From: Bjorn Helgaas @ 2022-08-02 21:35 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Tanjore Suresh, Greg Kroah-Hartman, Christoph Hellwig, Sagi Grimberg, Bjorn Helgaas, Linux Kernel Mailing List, linux-nvme, Linux PCI [Beginning of thread: https://lore.kernel.org/r/20220517220816.1635044-2-tansuresh@google.com] On Wed, May 18, 2022 at 12:50:02PM -0500, Bjorn Helgaas wrote: > Devices have this async_suspend bit: > > struct device { > struct dev_pm_info { > unsigned int async_suspend:1; > > Drivers call device_enable_async_suspend() to set async_suspend if > they want it. The system suspend path is something like this: > > suspend_enter > dpm_suspend_noirq(PMSG_SUSPEND) > dpm_noirq_suspend_devices(PMSG_SUSPEND) > pm_transition = PMSG_SUSPEND > while (!list_empty(&dpm_late_early_list)) > device_suspend_noirq(dev) > dpm_async_fn(dev, async_suspend_noirq) > if (is_async(dev)) > async_schedule_dev(async_suspend_noirq) # async path > > async_suspend_noirq # called asynchronously > __device_suspend_noirq(dev, PMSG_SUSPEND, true) > callback = pm_noirq_op(PMSG_SUSPEND) # .suspend_noirq() > dpm_run_callback(callback) # async call > > __device_suspend_noirq(dev, pm_transition, false) # sync path > callback = pm_noirq_op(PMSG_SUSPEND) # .suspend_noirq() > dpm_run_callback(callback) # sync call > > async_synchronize_full # wait > > If a driver has called device_enable_async_suspend(), we'll use the > async_schedule_dev() path to schedule the appropriate .suspend_noirq() > method. After scheduling it via the async path or directly calling it > via the sync path, the async_synchronize_full() waits for completion > of all the async methods. 
Correct me if I'm wrong: in the suspend scenario, there are several phases, and async_synchronize_full() ensures that one phase finishes before the next phase starts. For example: dpm_suspend_end dpm_suspend_late # phase 1 while (!list_empty(&dpm_suspended_list)) device_suspend_late async_synchronize_full # finish phase 1 dpm_suspend_noirq # phase 2 dpm_noirq_suspend_devices while (!list_empty(&dpm_late_early_list)) device_suspend_noirq async_synchronize_full The device .suspend_late() and .suspend_noirq() methods may all be started asynchronously. So far there's nothing to order them within the phase, but async_synchronize_full() ensures that all the .suspend_late() methods finish before the .suspend_noirq() methods start. Obviously we do want a child's method to complete before we run the parent's method. If I understand correctly, that parent/child synchronization is done by a different method: __device_suspend_late() and __device_suspend_noirq() call dpm_wait_for_subordinate(), which waits for &dev->power.completion for all children: __device_suspend_late dpm_wait_for_subordinate dpm_wait_for_children # wait for children .suspend_late() device_for_each_child(dev, &async, dpm_wait_fn) dpm_wait_fn dpm_wait wait_for_completion(&dev->power.completion) dpm_run_callback # run parent method, e.g., ops->suspend_late complete_all(&dev->power.completion) # note completion of parent > I assume your suggestion is to do something like this: > > struct device { > struct dev_pm_info { > unsigned int async_suspend:1; > + unsigned int async_shutdown:1; > > + void device_enable_async_shutdown(struct device *dev) > + dev->power.async_shutdown = true; > > device_shutdown > while (!list_empty(&devices_kset->list)) > - dev->...->shutdown() > + if (is_async_shutdown(dev)) > + async_schedule_dev(async_shutdown) # async path > + > + async_shutdown # called asynchronously > + dev->...->shutdown() > + > + else > + dev->...->shutdown() # sync path > + > + async_synchronize_full # wait In 
the shutdown case, I think we still probably need the async_synchronize_full() to ensure that all the .shutdown() methods complete before we turn the power off, reboot, or kexec. But I think we also need a mechanism like dev->power.completion to make sure all the child .shutdown() methods complete before we run a parent's .shutdown(). There's not much overlap between the suspend path and the shutdown path (probably none at all), so it's tempting to use the existing dev->power.completion for shutdown as well. But I don't think that's feasible because dev->power.completion is tied up with dev->power.async_suspend, which is set by device_enable_async_suspend(). That's a different concept than async shutdown, and drivers will want one without the other. Does this make sense? Bjorn
* Re: [PATCH v3 1/3] driver core: Support asynchronous driver shutdown 2022-08-02 21:35 ` Bjorn Helgaas @ 2022-08-03 15:04 ` Rafael J. Wysocki 0 siblings, 0 replies; 10+ messages in thread From: Rafael J. Wysocki @ 2022-08-03 15:04 UTC (permalink / raw) To: Bjorn Helgaas Cc: Rafael J. Wysocki, Tanjore Suresh, Greg Kroah-Hartman, Christoph Hellwig, Sagi Grimberg, Bjorn Helgaas, Linux Kernel Mailing List, linux-nvme, Linux PCI On Tue, Aug 2, 2022 at 11:35 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > [Beginning of thread: https://lore.kernel.org/r/20220517220816.1635044-2-tansuresh@google.com] > On Wed, May 18, 2022 at 12:50:02PM -0500, Bjorn Helgaas wrote: > > > Devices have this async_suspend bit: > > > > struct device { > > struct dev_pm_info { > > unsigned int async_suspend:1; > > > > Drivers call device_enable_async_suspend() to set async_suspend if > > they want it. The system suspend path is something like this: > > > > suspend_enter > > dpm_suspend_noirq(PMSG_SUSPEND) > > dpm_noirq_suspend_devices(PMSG_SUSPEND) > > pm_transition = PMSG_SUSPEND > > while (!list_empty(&dpm_late_early_list)) > > device_suspend_noirq(dev) > > dpm_async_fn(dev, async_suspend_noirq) > > if (is_async(dev)) > > async_schedule_dev(async_suspend_noirq) # async path > > > > async_suspend_noirq # called asynchronously > > __device_suspend_noirq(dev, PMSG_SUSPEND, true) > > callback = pm_noirq_op(PMSG_SUSPEND) # .suspend_noirq() > > dpm_run_callback(callback) # async call > > > > __device_suspend_noirq(dev, pm_transition, false) # sync path > > callback = pm_noirq_op(PMSG_SUSPEND) # .suspend_noirq() > > dpm_run_callback(callback) # sync call > > > > async_synchronize_full # wait > > > > If a driver has called device_enable_async_suspend(), we'll use the > > async_schedule_dev() path to schedule the appropriate .suspend_noirq() > > method. 
After scheduling it via the async path or directly calling it > > via the sync path, the async_synchronize_full() waits for completion > > of all the async methods. > > Correct me if I'm wrong: in the suspend scenario, there are several > phases, and async_synchronize_full() ensures that one phase finishes > before the next phase starts. For example: > > dpm_suspend_end > dpm_suspend_late # phase 1 > while (!list_empty(&dpm_suspended_list)) > device_suspend_late > async_synchronize_full # finish phase 1 > dpm_suspend_noirq # phase 2 > dpm_noirq_suspend_devices > while (!list_empty(&dpm_late_early_list)) > device_suspend_noirq > async_synchronize_full > > The device .suspend_late() and .suspend_noirq() methods may all be > started asynchronously. So far there's nothing to order them within > the phase, but async_synchronize_full() ensures that all the > .suspend_late() methods finish before the .suspend_noirq() methods > start. > > Obviously we do want a child's method to complete before we run the > parent's method. 
If I understand correctly, that parent/child > synchronization is done by a different method: __device_suspend_late() > and __device_suspend_noirq() call dpm_wait_for_subordinate(), which > waits for &dev->power.completion for all children: > > __device_suspend_late > dpm_wait_for_subordinate > dpm_wait_for_children # wait for children .suspend_late() > device_for_each_child(dev, &async, dpm_wait_fn) > dpm_wait_fn > dpm_wait > wait_for_completion(&dev->power.completion) > dpm_run_callback # run parent method, e.g., ops->suspend_late > complete_all(&dev->power.completion) # note completion of parent > > > I assume your suggestion is to do something like this: > > > > struct device { > > struct dev_pm_info { > > unsigned int async_suspend:1; > > + unsigned int async_shutdown:1; > > > > + void device_enable_async_shutdown(struct device *dev) > > + dev->power.async_shutdown = true; > > > > device_shutdown > > while (!list_empty(&devices_kset->list)) > > - dev->...->shutdown() > > + if (is_async_shutdown(dev)) > > + async_schedule_dev(async_shutdown) # async path > > + > > + async_shutdown # called asynchronously > > + dev->...->shutdown() > > + > > + else > > + dev->...->shutdown() # sync path > > + > > + async_synchronize_full # wait > > In the shutdown case, I think we still probably need the > async_synchronize_full() to ensure that all the .shutdown() methods > complete before we turn the power off, reboot, or kexec. > > But I think we also need a mechanism like dev->power.completion to > make sure all the child .shutdown() methods complete before we run a > parent's .shutdown(). > > There's not much overlap between the suspend path and the shutdown > path (probably none at all), so it's tempting to use the existing > dev->power.completion for shutdown as well. > > But I don't think that's feasible because dev->power.completion is > tied up with dev->power.async_suspend, which is set by > device_enable_async_suspend(). 
That's a different concept than async > shutdown, and drivers will want one without the other. > > Does this make sense? Well, why don't we change the code so that dev->power.completion is not tied to dev->power.async_suspend, but can also be used (analogously) if something like dev->async_shutdown is set? My point is that we know how to suspend devices asynchronously (the fact that there are multiple phases is not that important IMV), so why don't we design the async shutdown analogously?
end of thread, other threads:[~2022-08-03 15:04 UTC | newest] Thread overview: 10+ messages 2022-05-17 22:08 [PATCH v3 0/3] Asynchronous shutdown interface and example implementation Tanjore Suresh 2022-05-17 22:08 ` [PATCH v3 1/3] driver core: Support asynchronous driver shutdown Tanjore Suresh 2022-05-17 22:08 ` [PATCH v3 2/3] PCI: Support asynchronous shutdown Tanjore Suresh 2022-05-17 22:08 ` [PATCH v3 3/3] nvme: Add async shutdown support Tanjore Suresh 2022-05-18 11:38 ` [PATCH v3 1/3] driver core: Support asynchronous driver shutdown Rafael J. Wysocki 2022-05-18 17:50 ` Bjorn Helgaas 2022-05-18 18:01 ` Rafael J. Wysocki 2022-05-18 20:45 ` Tanjore Suresh 2022-08-02 21:35 ` Bjorn Helgaas 2022-08-03 15:04 ` Rafael J. Wysocki