* [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
@ 2017-10-16  1:12 Rafael J. Wysocki
  2017-10-16  1:29 ` [PATCH 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags Rafael J. Wysocki
                   ` (15 more replies)
  0 siblings, 16 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-16  1:12 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

Hi All,

Well, this took more time than expected, as I tried to cover everything I had
in mind regarding PM flags for drivers.

This work was triggered by attempts to fix and optimize PM in the
i2c-designware-platdrv driver that ended up with adding a couple of
flags to the driver's internal data structures for the tracking of
device state (https://marc.info/?l=linux-acpi&m=150629646805636&w=2).
That approach is sort of suboptimal, though, because other drivers will
probably want to do similar things and if all of them need to use internal
flags for that, quite a bit of code duplication may ensue.

That can be avoided in a couple of ways and one of them is to provide a means
for drivers to tell the core what to do and to make the core take care of it
if told to do so.  Hence, the idea to use driver flags for system-wide PM
that was briefly discussed during the LPC in LA last month.

One of the flags considered at that time was to possibly cause the core
to reuse the runtime PM callback path of a device for system suspend/resume.
Admittedly, that idea didn't look too bad to me until I had started to try to
implement it and I got to the PCI bus type's hibernation callbacks.  Then, I
moved the patch I was working on to /dev/null right away.  I mean it.

No, this is not going to happen.  No way.

Moreover, that experience made me realize that the whole *idea* of using the
runtime PM callback path for system-wide PM was actually totally bogus (sorry
Ulf).

The whole point of having different callback pointers for different types of
device transitions is that it may be necessary to do different things in
those callbacks in general.  Now, if you consider runtime PM and system
suspend/resume *only* and from a driver perspective, then yes, in some cases
the same pair of callback routines may be used for all suspend-like and
resume-like transitions of the device, but if you add hibernation to the mix,
then it is not so clear any more unless the callbacks don't actually do any
power management at all, but simply quiesce the device's activity and then
activate it again.  Namely, changing power states of devices during the
hibernation's "freeze" and "thaw" transitions rarely makes sense at all and
the "restore" transition needs to be able to cope with uninitialized devices
(in fact, it should be prepared to cope with devices in *any* state), so
runtime PM is hardly suitable for them.  Still, if a *driver* chooses to not
do any real PM in its PM callbacks and leaves that to a middle layer (quite
a few drivers do that), then it possibly can use one pair of callbacks in all
cases and be happy, but middle layers pretty much have to use different
callback routines for different transitions.

If you are a middle layer, your role is basically to do PM for a certain
group of devices.  Thus you cannot really do the same in ->suspend or
->suspend_early and in ->runtime_suspend (because the former generally need to
take device_may_wakeup() into account and the latter doesn't) and you shouldn't
really do the same in ->suspend and ->freeze (because the latter shouldn't
change the device's power state) and so on.  To put it bluntly, trying
to use the ->runtime_suspend callback of a middle layer for anything other
than runtime suspend is complete and utter nonsense.  At the same time, the
->runtime_resume callback of a middle layer may be reused to some extent,
but even that doesn't cover the "thaw" transitions during hibernation.

What can work (and this is the only strategy that can work AFAICS) is to
point different callback pointers *in* *a* *driver* to the same routine
if the driver wants to reuse that code.  That actually will work for PCI
and USB drivers today, at least most of the time, but unfortunately there
are problems with it for, say, platform devices.
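
As a rough illustration of that strategy, here is a user-space sketch (not
kernel code) of a driver pointing several callback pointers to one routine.
The struct demo_device and struct demo_pm_ops types and the demo_* functions
are hypothetical, reduced stand-ins for struct device, struct dev_pm_ops and a
driver's callbacks.  The sketch assumes the routines only quiesce and
reactivate the device and leave any real power management to a middle layer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device; not the kernel's definition. */
struct demo_device {
	bool quiesced;
	int suspend_calls;
	int resume_calls;
};

/* Reduced stand-in for struct dev_pm_ops with only a few callbacks. */
struct demo_pm_ops {
	int (*suspend)(struct demo_device *dev);
	int (*resume)(struct demo_device *dev);
	int (*freeze)(struct demo_device *dev);
	int (*thaw)(struct demo_device *dev);
	int (*runtime_suspend)(struct demo_device *dev);
	int (*runtime_resume)(struct demo_device *dev);
};

/* One suspend-like routine: it only stops activity, no power state change. */
static int demo_quiesce(struct demo_device *dev)
{
	assert(dev != NULL);
	dev->quiesced = true;
	dev->suspend_calls++;
	return 0;
}

/* One resume-like routine: it only restarts activity. */
static int demo_activate(struct demo_device *dev)
{
	assert(dev != NULL);
	dev->quiesced = false;
	dev->resume_calls++;
	return 0;
}

/* The driver, not the middle layer, decides to share the routines. */
static const struct demo_pm_ops demo_ops = {
	.suspend = demo_quiesce,
	.freeze = demo_quiesce,
	.runtime_suspend = demo_quiesce,
	.resume = demo_activate,
	.thaw = demo_activate,
	.runtime_resume = demo_activate,
};
```

The sharing is confined to the driver's own table here.  A middle layer would
still need distinct routines for each transition, for the reasons given above.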

The first problem is the requirement to track the status of the device
(suspended vs not suspended) in the callbacks, because the system-wide PM
code in the PM core doesn't do that.  The runtime PM framework does it, so
this means adding some extra code which isn't necessary for runtime PM to
the callback routines and that is not particularly nice.

The second problem is that, if the driver wants to do anything in its
->suspend callback, it generally has to prevent runtime suspend of the
device from taking place in parallel with that, which is quite cumbersome.
Usually, that is taken care of by resuming the device from runtime suspend
upfront, but generally doing that is wasteful (there may be no real need to
resume the device except for the fact that the code is designed this way).

On top of the above, there are optimizations to be made, like leaving certain
devices in suspend after system resume to avoid wasting time on waiting for
them to resume before user space can run again and similar.

This patch series focuses on addressing those problems so as to make it
easier to reuse callback routines by pointing different callback pointers
to them in device drivers.  The flags introduced here are to instruct the
PM core and middle layers (whatever they are) on how the driver wants the
device to be handled and then the driver has to provide callbacks to match
these instructions and the rest should be taken care of by the code above it.

The flags are introduced one by one to avoid making too many changes in
one go and to allow things to be explained better (hopefully).  They mostly
are mutually independent with some clearly documented exceptions.

The first three patches in the series are about an issue with the
direct-complete optimization introduced some time ago in which some middle
layers decide on whether or not to do the optimization without asking the
drivers.  And, as it turns out, in some cases the drivers actually know
better, so the new flags introduced by these patches are here for these
drivers (and the DPM_FLAG_NEVER_SKIP one is really to avoid having to define
->prepare callbacks always returning zero).

The really interesting things start to happen in patches [4-9/12] which make it
possible to avoid resuming devices from runtime suspend upfront during system
suspend at least in some cases (and when direct-complete is not applied to the
devices in question), but please refer to the changelogs for details.

The i2c-designware-platdrv driver is used as the primary example in the series
and the patches modifying it are based on some previous changes currently in
linux-next AFAICS (the same applies to the intel-lpss driver), but these
patches can wait until everything is properly merged.  They are included here
mostly as illustration.

Overall, the series is based on the linux-next branch of the linux-pm.git tree
with some extra patches on top of it and all of the names of new entities
introduced in it are negotiable.

Thanks,
Rafael


* [PATCH 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
  2017-10-16  1:12 [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume Rafael J. Wysocki
@ 2017-10-16  1:29 ` Rafael J. Wysocki
  2017-10-16  5:34   ` Lukas Wunner
                     ` (4 more replies)
  2017-10-16  1:29 ` [PATCH 02/12] PCI / PM: Use the NEVER_SKIP driver flag Rafael J. Wysocki
                   ` (14 subsequent siblings)
  15 siblings, 5 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-16  1:29 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The motivation for this change is to provide a way to work around
a problem with the direct-complete mechanism used for avoiding
system suspend/resume handling for devices in runtime suspend.

The problem is that some middle layer code (the PCI bus type and
the ACPI PM domain in particular) returns positive values from its
system suspend ->prepare callbacks regardless of whether the driver's
->prepare returns a positive value or 0, which effectively prevents
drivers from being able to control the direct-complete feature.
Some drivers need that control, however, and the PCI bus type has
grown its own flag to deal with this issue, but since it is not
limited to PCI, it is better to address it by adding driver flags at
the core level.

To that end, add a driver_flags field to struct dev_pm_info for flags
that can be set by device drivers at the probe time to inform the PM
core and/or bus types, PM domains and so on of the capabilities and/or
preferences of device drivers.  Also add two static inline helpers
for setting that field and testing it against a given set of flags
and make the driver core clear it automatically on driver remove
and probe failures.

Define and document two PM driver flags related to the direct-
complete feature: NEVER_SKIP and SMART_PREPARE that can be used,
respectively, to indicate to the PM core that the direct-complete
mechanism should never be used for the device and to inform the
middle layer code (bus types, PM domains etc) that it can only
request the PM core to use the direct-complete mechanism for
the device (by returning a positive value from its ->prepare
callback) if it also has been requested by the driver.

While at it, make the core check pm_runtime_suspended() when
setting power.direct_complete so that it doesn't need to be
checked by ->prepare callbacks.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 Documentation/driver-api/pm/devices.rst |   14 ++++++++++++++
 Documentation/power/pci.txt             |   19 +++++++++++++++++++
 drivers/acpi/device_pm.c                |    3 +++
 drivers/base/dd.c                       |    2 ++
 drivers/base/power/main.c               |    4 +++-
 drivers/pci/pci-driver.c                |    5 ++++-
 include/linux/device.h                  |   10 ++++++++++
 include/linux/pm.h                      |   20 ++++++++++++++++++++
 8 files changed, 75 insertions(+), 2 deletions(-)

Index: linux-pm/include/linux/device.h
===================================================================
--- linux-pm.orig/include/linux/device.h
+++ linux-pm/include/linux/device.h
@@ -1070,6 +1070,16 @@ static inline void dev_pm_syscore_device
 #endif
 }
 
+static inline void dev_pm_set_driver_flags(struct device *dev, unsigned int flags)
+{
+	dev->power.driver_flags = flags;
+}
+
+static inline bool dev_pm_test_driver_flags(struct device *dev, unsigned int flags)
+{
+	return !!(dev->power.driver_flags & flags);
+}
+
 static inline void device_lock(struct device *dev)
 {
 	mutex_lock(&dev->mutex);
Index: linux-pm/include/linux/pm.h
===================================================================
--- linux-pm.orig/include/linux/pm.h
+++ linux-pm/include/linux/pm.h
@@ -550,6 +550,25 @@ struct pm_subsys_data {
 #endif
 };
 
+/*
+ * Driver flags to control system suspend/resume behavior.
+ *
+ * These flags can be set by device drivers at the probe time.  They need not be
+ * cleared by the drivers as the driver core will take care of that.
+ *
+ * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
+ * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
+ *
+ * Setting SMART_PREPARE instructs bus types and PM domains which may want
+ * system suspend/resume callbacks to be skipped for the device to return 0 from
+ * their ->prepare callbacks if the driver's ->prepare callback returns 0 (in
+ * other words, the system suspend/resume callbacks can only be skipped for the
+ * device if its driver doesn't object against that).  This flag has no effect
+ * if NEVER_SKIP is set.
+ */
+#define DPM_FLAG_NEVER_SKIP	BIT(0)
+#define DPM_FLAG_SMART_PREPARE	BIT(1)
+
 struct dev_pm_info {
 	pm_message_t		power_state;
 	unsigned int		can_wakeup:1;
@@ -561,6 +580,7 @@ struct dev_pm_info {
 	bool			is_late_suspended:1;
 	bool			early_init:1;	/* Owned by the PM core */
 	bool			direct_complete:1;	/* Owned by the PM core */
+	unsigned int		driver_flags;
 	spinlock_t		lock;
 #ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
Index: linux-pm/drivers/base/dd.c
===================================================================
--- linux-pm.orig/drivers/base/dd.c
+++ linux-pm/drivers/base/dd.c
@@ -464,6 +464,7 @@ pinctrl_bind_failed:
 	if (dev->pm_domain && dev->pm_domain->dismiss)
 		dev->pm_domain->dismiss(dev);
 	pm_runtime_reinit(dev);
+	dev_pm_set_driver_flags(dev, 0);
 
 	switch (ret) {
 	case -EPROBE_DEFER:
@@ -869,6 +870,7 @@ static void __device_release_driver(stru
 		if (dev->pm_domain && dev->pm_domain->dismiss)
 			dev->pm_domain->dismiss(dev);
 		pm_runtime_reinit(dev);
+		dev_pm_set_driver_flags(dev, 0);
 
 		klist_remove(&dev->p->knode_driver);
 		device_pm_check_callbacks(dev);
Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -1700,7 +1700,9 @@ unlock:
 	 * applies to suspend transitions, however.
 	 */
 	spin_lock_irq(&dev->power.lock);
-	dev->power.direct_complete = ret > 0 && state.event == PM_EVENT_SUSPEND;
+	dev->power.direct_complete = state.event == PM_EVENT_SUSPEND &&
+		pm_runtime_suspended(dev) && ret > 0 &&
+		!dev_pm_test_driver_flags(dev, DPM_FLAG_NEVER_SKIP);
 	spin_unlock_irq(&dev->power.lock);
 	return 0;
 }
Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -682,8 +682,11 @@ static int pci_pm_prepare(struct device
 
 	if (drv && drv->pm && drv->pm->prepare) {
 		int error = drv->pm->prepare(dev);
-		if (error)
+		if (error < 0)
 			return error;
+
+		if (!error && dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_PREPARE))
+			return 0;
 	}
 	return pci_dev_keep_suspended(to_pci_dev(dev));
 }
Index: linux-pm/drivers/acpi/device_pm.c
===================================================================
--- linux-pm.orig/drivers/acpi/device_pm.c
+++ linux-pm/drivers/acpi/device_pm.c
@@ -965,6 +965,9 @@ int acpi_subsys_prepare(struct device *d
 	if (ret < 0)
 		return ret;
 
+	if (!ret && dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_PREPARE))
+		return 0;
+
 	if (!adev || !pm_runtime_suspended(dev))
 		return 0;
 
Index: linux-pm/Documentation/driver-api/pm/devices.rst
===================================================================
--- linux-pm.orig/Documentation/driver-api/pm/devices.rst
+++ linux-pm/Documentation/driver-api/pm/devices.rst
@@ -354,6 +354,20 @@ the phases are: ``prepare``, ``suspend``
 	is because all such devices are initially set to runtime-suspended with
 	runtime PM disabled.
 
+	This feature also can be controlled by device drivers by using the
+	``DPM_FLAG_NEVER_SKIP`` and ``DPM_FLAG_SMART_PREPARE`` driver power
+	management flags.  [Typically, they are set at the time the driver is
+	probed against the device in question by passing them to the
+	:c:func:`dev_pm_set_driver_flags` helper function.]  If the first of
+	these flags is set, the PM core will not apply the direct-complete
+	procedure described above to the given device and, consequently, to any
+	of its ancestors.  The second flag, when set, informs the middle layer
+	code (bus types, device types, PM domains, classes) that it should take
+	the return value of the ``->prepare`` callback provided by the driver
+	into account and it may only return a positive value from its own
+	``->prepare`` callback if the driver's one also has returned a positive
+	value.
+
     2.	The ``->suspend`` methods should quiesce the device to stop it from
 	performing I/O.  They also may save the device registers and put it into
 	the appropriate low-power state, depending on the bus type the device is
Index: linux-pm/Documentation/power/pci.txt
===================================================================
--- linux-pm.orig/Documentation/power/pci.txt
+++ linux-pm/Documentation/power/pci.txt
@@ -961,6 +961,25 @@ dev_pm_ops to indicate that one suspend
 .suspend(), .freeze(), and .poweroff() members and one resume routine is to
 be pointed to by the .resume(), .thaw(), and .restore() members.
 
+3.1.19. Driver Flags for Power Management
+
+The PM core allows device drivers to set flags that influence the handling of
+power management for the devices by the core itself and by middle layer code
+including the PCI bus type.  The flags should be set once at the driver probe
+time with the help of the dev_pm_set_driver_flags() function and they should not
+be updated directly afterwards.
+
+The DPM_FLAG_NEVER_SKIP flag prevents the PM core from using the direct-complete
+mechanism allowing device suspend/resume callbacks to be skipped if the device
+is in runtime suspend when the system suspend starts.  That also affects all of
+the ancestors of the device, so this flag should only be used if absolutely
+necessary.
+
+The DPM_FLAG_SMART_PREPARE flag instructs the PCI bus type to only return a
+positive value from pci_pm_prepare() if the ->prepare callback provided by the
+driver of the device returns a positive value.  That allows the driver to opt
+out from using the direct-complete mechanism dynamically.
+
 3.2. Device Runtime Power Management
 ------------------------------------
 In addition to providing device power management callbacks PCI device drivers

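For readers who want to see how the two flags interact, the following is a
user-space model (not kernel code) of the decision made by the patch above.
All demo_* names are hypothetical.  The middle-layer function follows the shape
of the patched pci_pm_prepare(), with the middle_layer_wants_skip field
standing in for pci_dev_keep_suspended(), and the last function follows the
patched power.direct_complete assignment in drivers/base/power/main.c.  The
model assumes the driver always provides a ->prepare callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_FLAG_NEVER_SKIP	(1u << 0)	/* mirrors DPM_FLAG_NEVER_SKIP */
#define DEMO_FLAG_SMART_PREPARE	(1u << 1)	/* mirrors DPM_FLAG_SMART_PREPARE */

/* Hypothetical stand-in for the device state consulted during ->prepare. */
struct demo_device {
	unsigned int driver_flags;
	bool runtime_suspended;		/* pm_runtime_suspended() stand-in */
	int driver_prepare_ret;		/* what the driver's ->prepare returns */
	bool middle_layer_wants_skip;	/* pci_dev_keep_suspended() stand-in */
};

static bool demo_test_driver_flags(const struct demo_device *dev,
				   unsigned int flags)
{
	assert(dev != NULL);
	return !!(dev->driver_flags & flags);
}

/* Middle-layer ->prepare: honor the driver's 0 only with SMART_PREPARE. */
static int demo_middle_layer_prepare(const struct demo_device *dev)
{
	int ret;

	assert(dev != NULL);
	ret = dev->driver_prepare_ret;
	if (ret < 0)
		return ret;

	if (!ret && demo_test_driver_flags(dev, DEMO_FLAG_SMART_PREPARE))
		return 0;

	return dev->middle_layer_wants_skip ? 1 : 0;
}

/* Core decision: all four conditions must hold for direct-complete. */
static bool demo_direct_complete(const struct demo_device *dev,
				 bool suspend_event)
{
	int ret = demo_middle_layer_prepare(dev);

	return suspend_event && dev->runtime_suspended && ret > 0 &&
		!demo_test_driver_flags(dev, DEMO_FLAG_NEVER_SKIP);
}
```

With no flags set, the middle layer alone decides, which is the behavior the
changelog describes as the problem.  With SMART_PREPARE set, a 0 from the
driver vetoes direct-complete, and NEVER_SKIP vetoes it unconditionally.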

* [PATCH 02/12] PCI / PM: Use the NEVER_SKIP driver flag
  2017-10-16  1:12 [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume Rafael J. Wysocki
  2017-10-16  1:29 ` [PATCH 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags Rafael J. Wysocki
@ 2017-10-16  1:29 ` Rafael J. Wysocki
  2017-10-23 16:40   ` Ulf Hansson
  2017-10-16  1:29 ` [PATCH 03/12] PM: i2c-designware-platdrv: Use DPM_FLAG_SMART_PREPARE Rafael J. Wysocki
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-16  1:29 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Replace the PCI-specific flag PCI_DEV_FLAGS_NEEDS_RESUME with the
PM core's DPM_FLAG_NEVER_SKIP one everywhere and drop it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c |    2 +-
 drivers/misc/mei/pci-me.c       |    2 +-
 drivers/misc/mei/pci-txe.c      |    2 +-
 drivers/pci/pci.c               |    3 +--
 include/linux/pci.h             |    7 +------
 5 files changed, 5 insertions(+), 11 deletions(-)

Index: linux-pm/include/linux/pci.h
===================================================================
--- linux-pm.orig/include/linux/pci.h
+++ linux-pm/include/linux/pci.h
@@ -205,13 +205,8 @@ enum pci_dev_flags {
 	PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT = (__force pci_dev_flags_t) (1 << 9),
 	/* Do not use FLR even if device advertises PCI_AF_CAP */
 	PCI_DEV_FLAGS_NO_FLR_RESET = (__force pci_dev_flags_t) (1 << 10),
-	/*
-	 * Resume before calling the driver's system suspend hooks, disabling
-	 * the direct_complete optimization.
-	 */
-	PCI_DEV_FLAGS_NEEDS_RESUME = (__force pci_dev_flags_t) (1 << 11),
 	/* Don't use Relaxed Ordering for TLPs directed at this device */
-	PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 12),
+	PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 11),
 };
 
 enum pci_irq_reroute_variant {
Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -2166,8 +2166,7 @@ bool pci_dev_keep_suspended(struct pci_d
 
 	if (!pm_runtime_suspended(dev)
 	    || pci_target_state(pci_dev, wakeup) != pci_dev->current_state
-	    || platform_pci_need_resume(pci_dev)
-	    || (pci_dev->dev_flags & PCI_DEV_FLAGS_NEEDS_RESUME))
+	    || platform_pci_need_resume(pci_dev))
 		return false;
 
 	/*
Index: linux-pm/drivers/gpu/drm/i915/i915_drv.c
===================================================================
--- linux-pm.orig/drivers/gpu/drm/i915/i915_drv.c
+++ linux-pm/drivers/gpu/drm/i915/i915_drv.c
@@ -1304,7 +1304,7 @@ int i915_driver_load(struct pci_dev *pde
 	 * becaue the HDA driver may require us to enable the audio power
 	 * domain during system suspend.
 	 */
-	pdev->dev_flags |= PCI_DEV_FLAGS_NEEDS_RESUME;
+	dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NEVER_SKIP);
 
 	ret = i915_driver_init_early(dev_priv, ent);
 	if (ret < 0)
Index: linux-pm/drivers/misc/mei/pci-txe.c
===================================================================
--- linux-pm.orig/drivers/misc/mei/pci-txe.c
+++ linux-pm/drivers/misc/mei/pci-txe.c
@@ -141,7 +141,7 @@ static int mei_txe_probe(struct pci_dev
 	 * MEI requires to resume from runtime suspend mode
 	 * in order to perform link reset flow upon system suspend.
 	 */
-	pdev->dev_flags |= PCI_DEV_FLAGS_NEEDS_RESUME;
+	dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NEVER_SKIP);
 
 	/*
 	* For not wake-able HW runtime pm framework
Index: linux-pm/drivers/misc/mei/pci-me.c
===================================================================
--- linux-pm.orig/drivers/misc/mei/pci-me.c
+++ linux-pm/drivers/misc/mei/pci-me.c
@@ -223,7 +223,7 @@ static int mei_me_probe(struct pci_dev *
 	 * MEI requires to resume from runtime suspend mode
 	 * in order to perform link reset flow upon system suspend.
 	 */
-	pdev->dev_flags |= PCI_DEV_FLAGS_NEEDS_RESUME;
+	dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NEVER_SKIP);
 
 	/*
 	* For not wake-able HW runtime pm framework


* [PATCH 03/12] PM: i2c-designware-platdrv: Use DPM_FLAG_SMART_PREPARE
  2017-10-16  1:12 [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume Rafael J. Wysocki
  2017-10-16  1:29 ` [PATCH 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags Rafael J. Wysocki
  2017-10-16  1:29 ` [PATCH 02/12] PCI / PM: Use the NEVER_SKIP driver flag Rafael J. Wysocki
@ 2017-10-16  1:29 ` Rafael J. Wysocki
  2017-10-23 16:57   ` Ulf Hansson
  2017-10-16  1:29 ` [PATCH 04/12] PM / core: Add SMART_SUSPEND driver flag Rafael J. Wysocki
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-16  1:29 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Modify i2c-designware-platdrv to set DPM_FLAG_SMART_PREPARE for its
devices and return 0 from the system suspend ->prepare callback
if the device has an ACPI companion object in order to tell the PM
core and middle layers to avoid skipping system suspend/resume
callbacks for the device in that case (which may be problematic,
because the device may be accessed during suspend and resume of
other devices via I2C operation regions then).

Also the pm_runtime_suspended() check in dw_i2c_plat_prepare()
is not necessary any more, because the core does it when setting
power.direct_complete for the device, so drop it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/i2c/busses/i2c-designware-platdrv.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Index: linux-pm/drivers/i2c/busses/i2c-designware-platdrv.c
===================================================================
--- linux-pm.orig/drivers/i2c/busses/i2c-designware-platdrv.c
+++ linux-pm/drivers/i2c/busses/i2c-designware-platdrv.c
@@ -370,6 +370,8 @@ static int dw_i2c_plat_probe(struct plat
 	ACPI_COMPANION_SET(&adap->dev, ACPI_COMPANION(&pdev->dev));
 	adap->dev.of_node = pdev->dev.of_node;
 
+	dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_SMART_PREPARE);
+
 	/* The code below assumes runtime PM to be disabled. */
 	WARN_ON(pm_runtime_enabled(&pdev->dev));
 
@@ -433,7 +435,13 @@ MODULE_DEVICE_TABLE(of, dw_i2c_of_match)
 #ifdef CONFIG_PM_SLEEP
 static int dw_i2c_plat_prepare(struct device *dev)
 {
-	return pm_runtime_suspended(dev);
+	/*
+	 * If the ACPI companion device object is present for this device, it
+	 * may be accessed during suspend and resume of other devices via I2C
+	 * operation regions, so tell the PM core and middle layers to avoid
+	 * skipping system suspend/resume callbacks for it in that case.
+	 */
+	return !has_acpi_companion(dev);
 }
 
 static void dw_i2c_plat_complete(struct device *dev)


* [PATCH 04/12] PM / core: Add SMART_SUSPEND driver flag
  2017-10-16  1:12 [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume Rafael J. Wysocki
                   ` (2 preceding siblings ...)
  2017-10-16  1:29 ` [PATCH 03/12] PM: i2c-designware-platdrv: Use DPM_FLAG_SMART_PREPARE Rafael J. Wysocki
@ 2017-10-16  1:29 ` Rafael J. Wysocki
  2017-10-23 19:01   ` Ulf Hansson
  2017-10-24  5:22   ` Ulf Hansson
  2017-10-16  1:29 ` [PATCH 05/12] PCI / PM: Drop unnecessary invocations of pcibios_pm_ops callbacks Rafael J. Wysocki
                   ` (11 subsequent siblings)
  15 siblings, 2 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-16  1:29 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Define and document a SMART_SUSPEND flag to instruct bus types and PM
domains that the system suspend callbacks provided by the driver can
cope with runtime-suspended devices, so from the driver's perspective
it should be safe to leave devices in runtime suspend during system
suspend.

Setting that flag also causes the PM core to skip the "late" and
"noirq" phases of device suspend for devices that remain in runtime
suspend at the beginning of the "late" phase (when runtime PM has
been disabled for them) under the assumption that their state cannot
(and should not) change after that point until the system suspend
transition is complete.  Moreover, the PM core prevents runtime PM
from acting on devices with DPM_FLAG_SMART_SUSPEND during system
resume by setting their runtime PM status to "active" at the end of
the "early" phase (right prior to enabling runtime PM for them).
That allows system resume callbacks to do whatever is necessary to
resume the device without worrying about runtime PM possibly
running in parallel with them.

However, that doesn't apply to transitions involving ->thaw_noirq,
->thaw_early and ->thaw callbacks during hibernation, as they
generally are not expected to change the power states of devices.
Consequently, if a device is in runtime suspend at the beginning
of such a transition, it must stay in runtime suspend until the
"complete" phase of it (since the callbacks may not change its
power state).
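
The skip conditions described above can be modeled in user space as follows.
This is only a sketch, not kernel code: the demo_* names and the event enum
are hypothetical, and the runtime_suspended field stands in for
pm_runtime_status_suspended().  The resume-side check also covers the
"recover" event, which the diff below treats the same way as "thaw".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_FLAG_SMART_SUSPEND	(1u << 2)	/* mirrors DPM_FLAG_SMART_SUSPEND */

/* Hypothetical subset of the PM events relevant to the "noirq" resume phase. */
enum demo_event { DEMO_EVENT_RESUME, DEMO_EVENT_THAW, DEMO_EVENT_RECOVER };

struct demo_device {
	unsigned int driver_flags;
	bool runtime_suspended;	/* pm_runtime_status_suspended() stand-in */
};

static bool demo_smart_suspend(const struct demo_device *dev)
{
	assert(dev != NULL);
	return !!(dev->driver_flags & DEMO_FLAG_SMART_SUSPEND);
}

/* "late"/"noirq" suspend: skip the callback if still runtime-suspended. */
static bool demo_skip_suspend_callback(const struct demo_device *dev)
{
	return demo_smart_suspend(dev) && dev->runtime_suspended;
}

/*
 * "noirq" resume: during thaw/recover the callbacks may not change the
 * power state, so a runtime-suspended device is left alone.
 */
static bool demo_skip_resume_noirq(const struct demo_device *dev,
				   enum demo_event event)
{
	return demo_smart_suspend(dev) && dev->runtime_suspended &&
		(event == DEMO_EVENT_THAW || event == DEMO_EVENT_RECOVER);
}
```

Note that an ordinary resume event never takes the skip path in this model,
which matches the statement that system resume callbacks are free to bring the
device back to full power.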

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 Documentation/driver-api/pm/devices.rst |   17 ++++++++
 drivers/base/power/main.c               |   63 ++++++++++++++++++++++++++++----
 include/linux/pm.h                      |    9 ++++
 3 files changed, 82 insertions(+), 7 deletions(-)

Index: linux-pm/Documentation/driver-api/pm/devices.rst
===================================================================
--- linux-pm.orig/Documentation/driver-api/pm/devices.rst
+++ linux-pm/Documentation/driver-api/pm/devices.rst
@@ -766,6 +766,23 @@ the state of devices (possibly except fo
 from their ``->prepare`` and ``->suspend`` callbacks (or equivalent) *before*
 invoking device drivers' ``->suspend`` callbacks (or equivalent).
 
+Some bus types and PM domains have a policy to resume all devices from runtime
+suspend upfront in their ``->suspend`` callbacks, but that may not be really
+necessary if the system suspend-resume callbacks provided by the device's
+driver can cope with runtime-suspended devices.  The driver can indicate that
+by setting ``DPM_FLAG_SMART_SUSPEND`` in :c:member:`power.driver_flags` at the
+probe time, by passing it to the :c:func:`dev_pm_set_driver_flags` helper.  That
+also causes the PM core to skip the ``suspend_late`` and ``suspend_noirq``
+phases of device suspend for the device if it remains in runtime suspend at the
+beginning of the ``suspend_late`` phase (when runtime PM has been disabled for
+it) under the assumption that its state cannot (and should not) change after
+that point until the system-wide transition is over.  Moreover, the PM core
+updates the runtime power management status of devices with
+``DPM_FLAG_SMART_SUSPEND`` set to "active" at the end of the ``resume_early``
+phase of device resume (right prior to enabling runtime PM for them) in order
+to prevent runtime PM from acting on them before the ``complete`` phase, which
+means that they should be put into the full-power state before that phase.
+
 During system-wide resume from a sleep state it's easiest to put devices into
 the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`.
 Refer to that document for more information regarding this particular issue as
Index: linux-pm/include/linux/pm.h
===================================================================
--- linux-pm.orig/include/linux/pm.h
+++ linux-pm/include/linux/pm.h
@@ -558,6 +558,7 @@ struct pm_subsys_data {
  *
  * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
  * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
+ * SMART_SUSPEND: No need to resume the device from runtime suspend.
  *
  * Setting SMART_PREPARE instructs bus types and PM domains which may want
  * system suspend/resume callbacks to be skipped for the device to return 0 from
@@ -565,9 +566,17 @@ struct pm_subsys_data {
  * other words, the system suspend/resume callbacks can only be skipped for the
  * device if its driver doesn't object against that).  This flag has no effect
  * if NEVER_SKIP is set.
+ *
+ * Setting SMART_SUSPEND instructs bus types and PM domains which may want to
+ * runtime resume the device upfront during system suspend that doing so is not
+ * necessary from the driver's perspective.  It also causes the PM core to skip
+ * the "late" and "noirq" phases of device suspend for the device if it remains
+ * in runtime suspend at the beginning of the "late" phase (when runtime PM has
+ * been disabled for it).
  */
 #define DPM_FLAG_NEVER_SKIP	BIT(0)
 #define DPM_FLAG_SMART_PREPARE	BIT(1)
+#define DPM_FLAG_SMART_SUSPEND	BIT(2)
 
 struct dev_pm_info {
 	pm_message_t		power_state;
Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -551,6 +551,18 @@ static int device_resume_noirq(struct de
 	if (!dev->power.is_noirq_suspended)
 		goto Out;
 
+	if (dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
+	    pm_runtime_status_suspended(dev) && (state.event == PM_EVENT_THAW ||
+	    state.event == PM_EVENT_RECOVER)) {
+		/*
+		 * The device has to stay in runtime suspend, because the
+		 * subsequent callbacks may not try to change its power state.
+		 */
+		dev->power.is_suspended = false;
+		dev->power.is_late_suspended = false;
+		goto Skip;
+	}
+
 	dpm_wait_for_superior(dev, async);
 
 	if (dev->pm_domain) {
@@ -573,9 +585,11 @@ static int device_resume_noirq(struct de
 	}
 
 	error = dpm_run_callback(callback, dev, state, info);
+
+Skip:
 	dev->power.is_noirq_suspended = false;
 
- Out:
+Out:
 	complete_all(&dev->power.completion);
 	TRACE_RESUME(error);
 	return error;
@@ -715,6 +729,14 @@ static int device_resume_early(struct de
 	error = dpm_run_callback(callback, dev, state, info);
 	dev->power.is_late_suspended = false;
 
+	/*
+	 * Devices with DPM_FLAG_SMART_SUSPEND may be left in runtime suspend
+	 * during system suspend, so update their runtime PM status to "active"
+	 * to prevent runtime PM from acting on them before device_complete().
+	 */
+	if (dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND))
+		pm_runtime_set_active(dev);
+
  Out:
 	TRACE_RESUME(error);
 
@@ -1107,6 +1129,15 @@ static int __device_suspend_noirq(struct
 	if (dev->power.syscore || dev->power.direct_complete)
 		goto Complete;
 
+	/*
+	 * The state of devices with DPM_FLAG_SMART_SUSPEND set that remain in
+	 * runtime suspend at this point cannot change going forward, so skip
+	 * the callback invocation for them.
+	 */
+	if (dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
+	    pm_runtime_status_suspended(dev))
+		goto Skip;
+
 	if (dev->pm_domain) {
 		info = "noirq power domain ";
 		callback = pm_noirq_op(&dev->pm_domain->ops, state);
@@ -1127,10 +1158,13 @@ static int __device_suspend_noirq(struct
 	}
 
 	error = dpm_run_callback(callback, dev, state, info);
-	if (!error)
-		dev->power.is_noirq_suspended = true;
-	else
+	if (error) {
 		async_error = error;
+		goto Complete;
+	}
+
+Skip:
+	dev->power.is_noirq_suspended = true;
 
 Complete:
 	complete_all(&dev->power.completion);
@@ -1268,6 +1302,15 @@ static int __device_suspend_late(struct
 	if (dev->power.syscore || dev->power.direct_complete)
 		goto Complete;
 
+	/*
+	 * The state of devices with DPM_FLAG_SMART_SUSPEND set that remain in
+	 * runtime suspend at this point cannot change going forward, so skip
+	 * the callback invocation for them.
+	 */
+	if (dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
+	    pm_runtime_status_suspended(dev))
+		goto Skip;
+
 	if (dev->pm_domain) {
 		info = "late power domain ";
 		callback = pm_late_early_op(&dev->pm_domain->ops, state);
@@ -1288,10 +1331,13 @@ static int __device_suspend_late(struct
 	}
 
 	error = dpm_run_callback(callback, dev, state, info);
-	if (!error)
-		dev->power.is_late_suspended = true;
-	else
+	if (error) {
 		async_error = error;
+		goto Complete;
+	}
+
+Skip:
+	dev->power.is_late_suspended = true;
 
 Complete:
 	TRACE_SUSPEND(error);
@@ -1652,6 +1698,9 @@ static int device_prepare(struct device
 	if (dev->power.syscore)
 		return 0;
 
+	WARN_ON(dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
+		!pm_runtime_enabled(dev));
+
 	/*
 	 * If a device's parent goes into runtime suspend at the wrong time,
 	 * it won't be possible to resume the device.  To prevent this we
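
The "late" phase change above can be pictured with a small user-space
model.  This is only a sketch: struct model_dev and the model_* helpers
are hypothetical stand-ins for struct device, dev_pm_test_driver_flags()
and __device_suspend_late(), and nothing here is kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BIT(n)			(1U << (n))
#define DPM_FLAG_NEVER_SKIP	BIT(0)
#define DPM_FLAG_SMART_PREPARE	BIT(1)
#define DPM_FLAG_SMART_SUSPEND	BIT(2)

/* Hypothetical stand-in for the few struct device fields used here. */
struct model_dev {
	unsigned int driver_flags;
	bool rpm_suspended;		/* models pm_runtime_status_suspended() */
	bool is_late_suspended;
	int late_callbacks_run;		/* counts callback invocations */
};

static bool model_test_flags(const struct model_dev *dev, unsigned int flags)
{
	return (dev->driver_flags & flags) != 0;
}

/*
 * Model of the "late" suspend phase.  callback_result is what the
 * (imaginary) ->suspend_late callback would return if it were invoked.
 */
static int model_suspend_late(struct model_dev *dev, int callback_result)
{
	bool skip;

	assert(dev != NULL);

	skip = model_test_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
	       dev->rpm_suspended;
	if (!skip) {
		dev->late_callbacks_run++;
		if (callback_result)
			return callback_result;	/* flag stays clear on error */
	}

	/* Reached both after a successful callback and on the skip path. */
	dev->is_late_suspended = true;
	return 0;
}
```

In this model a runtime-suspended device with SMART_SUSPEND set is marked
as late-suspended without its callback ever running, which mirrors the
"goto Skip" path in the hunk above.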


* [PATCH 05/12] PCI / PM: Drop unnecessary invocations of pcibios_pm_ops callbacks
  2017-10-16  1:12 [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume Rafael J. Wysocki
                   ` (3 preceding siblings ...)
  2017-10-16  1:29 ` [PATCH 04/12] PM / core: Add SMART_SUSPEND driver flag Rafael J. Wysocki
@ 2017-10-16  1:29 ` Rafael J. Wysocki
  2017-10-23 19:06   ` Ulf Hansson
  2017-10-16  1:29 ` [PATCH 06/12] PCI / PM: Take SMART_SUSPEND driver flag into account Rafael J. Wysocki
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-16  1:29 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The only user of non-empty pcibios_pm_ops is s390 and it only uses
"noirq" callbacks, so drop the invocations of the other pcibios_pm_ops
callbacks from the PCI PM code.

That will allow subsequent changes to be somewhat simpler.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/pci/pci-driver.c |   18 ------------------
 1 file changed, 18 deletions(-)

Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -918,9 +918,6 @@ static int pci_pm_freeze(struct device *
 			return error;
 	}
 
-	if (pcibios_pm_ops.freeze)
-		return pcibios_pm_ops.freeze(dev);
-
 	return 0;
 }
 
@@ -982,12 +979,6 @@ static int pci_pm_thaw(struct device *de
 	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 	int error = 0;
 
-	if (pcibios_pm_ops.thaw) {
-		error = pcibios_pm_ops.thaw(dev);
-		if (error)
-			return error;
-	}
-
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_resume(dev);
 
@@ -1032,9 +1023,6 @@ static int pci_pm_poweroff(struct device
  Fixup:
 	pci_fixup_device(pci_fixup_suspend, pci_dev);
 
-	if (pcibios_pm_ops.poweroff)
-		return pcibios_pm_ops.poweroff(dev);
-
 	return 0;
 }
 
@@ -1107,12 +1095,6 @@ static int pci_pm_restore(struct device
 	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 	int error = 0;
 
-	if (pcibios_pm_ops.restore) {
-		error = pcibios_pm_ops.restore(dev);
-		if (error)
-			return error;
-	}
-
 	/*
 	 * This is necessary for the hibernation error path in which restore is
 	 * called without restoring the standard config registers of the device.


* [PATCH 06/12] PCI / PM: Take SMART_SUSPEND driver flag into account
  2017-10-16  1:12 [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume Rafael J. Wysocki
                   ` (4 preceding siblings ...)
  2017-10-16  1:29 ` [PATCH 05/12] PCI / PM: Drop unnecessary invocations of pcibios_pm_ops callbacks Rafael J. Wysocki
@ 2017-10-16  1:29 ` Rafael J. Wysocki
  2017-10-16  1:29 ` [PATCH 07/12] ACPI / LPSS: Consolidate runtime PM and system sleep handling Rafael J. Wysocki
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-16  1:29 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Make the PCI bus type take DPM_FLAG_SMART_SUSPEND into account in its
system suspend callbacks and make sure that all code that should not
run in parallel with pci_pm_runtime_resume() is executed in the "late"
phases of system suspend, freeze and poweroff transitions.

[Note that the pm_runtime_suspended() check in pci_dev_keep_suspended()
is an optimization, because if it is not passed, all of the subsequent
checks may be skipped and some of them involve much more overhead in
general.]

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 Documentation/power/pci.txt |    6 ++++
 drivers/pci/pci-driver.c    |   56 ++++++++++++++++++++++++++++++--------------
 2 files changed, 45 insertions(+), 17 deletions(-)
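
As a rough illustration of the decision made by the callbacks changed
below, the following user-space sketch models when the PCI bus type
still calls pm_runtime_resume().  The enum and the function name are
made up for this example, and keep_suspended merely stands in for the
result of pci_dev_keep_suspended().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three transitions touched by the patch. */
enum model_transition { MODEL_SUSPEND, MODEL_FREEZE, MODEL_POWEROFF };

/*
 * Returns true if the device would be resumed from runtime suspend.
 *
 * smart_suspend:  the driver has set DPM_FLAG_SMART_SUSPEND
 * keep_suspended: stand-in for the pci_dev_keep_suspended() result
 */
static bool model_pci_should_resume(enum model_transition t,
				    bool smart_suspend, bool keep_suspended)
{
	assert(t == MODEL_SUSPEND || t == MODEL_FREEZE || t == MODEL_POWEROFF);

	/* Without the flag the old behavior is preserved. */
	if (!smart_suspend)
		return true;

	/* For freeze, the flag alone is sufficient to skip the resume. */
	if (t == MODEL_FREEZE)
		return false;

	/* For suspend and poweroff, PCI-specific reasons may still apply. */
	return !keep_suspended;
}
```

For instance, model_pci_should_resume(MODEL_SUSPEND, true, true) is
false, while passing keep_suspended == false for the same transition
makes it true again.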

Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -727,18 +727,25 @@ static int pci_pm_suspend(struct device
 
 	if (!pm) {
 		pci_pm_default_suspend(pci_dev);
-		goto Fixup;
+		return 0;
 	}
 
 	/*
-	 * PCI devices suspended at run time need to be resumed at this point,
-	 * because in general it is necessary to reconfigure them for system
-	 * suspend.  Namely, if the device is supposed to wake up the system
-	 * from the sleep state, we may need to reconfigure it for this purpose.
-	 * In turn, if the device is not supposed to wake up the system from the
-	 * sleep state, we'll have to prevent it from signaling wake-up.
+	 * PCI devices suspended at run time may need to be resumed at this
+	 * point, because in general it may be necessary to reconfigure them for
+	 * system suspend.  Namely, if the device is expected to wake up the
+	 * system from the sleep state, it may have to be reconfigured for this
+	 * purpose, or if the device is not expected to wake up the system from
+	 * the sleep state, it should be prevented from signaling wakeup events
+	 * going forward.
+	 *
+	 * Also if the driver of the device does not indicate that its system
+	 * suspend callbacks can cope with runtime-suspended devices, it is
+	 * better to resume the device from runtime suspend here.
 	 */
-	pm_runtime_resume(dev);
+	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) ||
+	    !pci_dev_keep_suspended(pci_dev))
+		pm_runtime_resume(dev);
 
 	pci_dev->state_saved = false;
 	if (pm->suspend) {
@@ -758,12 +765,16 @@ static int pci_pm_suspend(struct device
 		}
 	}
 
- Fixup:
-	pci_fixup_device(pci_fixup_suspend, pci_dev);
-
 	return 0;
 }
 
+static int pci_pm_suspend_late(struct device *dev)
+{
+	pci_fixup_device(pci_fixup_suspend, to_pci_dev(dev));
+
+	return pm_generic_suspend_late(dev);
+}
+
 static int pci_pm_suspend_noirq(struct device *dev)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
@@ -872,6 +883,7 @@ static int pci_pm_resume(struct device *
 #else /* !CONFIG_SUSPEND */
 
 #define pci_pm_suspend		NULL
+#define pci_pm_suspend_late	NULL
 #define pci_pm_suspend_noirq	NULL
 #define pci_pm_resume		NULL
 #define pci_pm_resume_noirq	NULL
@@ -906,7 +918,8 @@ static int pci_pm_freeze(struct device *
 	 * devices should not be touched during freeze/thaw transitions,
 	 * however.
 	 */
-	pm_runtime_resume(dev);
+	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND))
+		pm_runtime_resume(dev);
 
 	pci_dev->state_saved = false;
 	if (pm->freeze) {
@@ -1004,11 +1017,13 @@ static int pci_pm_poweroff(struct device
 
 	if (!pm) {
 		pci_pm_default_suspend(pci_dev);
-		goto Fixup;
+		return 0;
 	}
 
 	/* The reason to do that is the same as in pci_pm_suspend(). */
-	pm_runtime_resume(dev);
+	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) ||
+	    !pci_dev_keep_suspended(pci_dev))
+		pm_runtime_resume(dev);
 
 	pci_dev->state_saved = false;
 	if (pm->poweroff) {
@@ -1020,12 +1035,16 @@ static int pci_pm_poweroff(struct device
 			return error;
 	}
 
- Fixup:
-	pci_fixup_device(pci_fixup_suspend, pci_dev);
-
 	return 0;
 }
 
+static int pci_pm_poweroff_late(struct device *dev)
+{
+	pci_fixup_device(pci_fixup_suspend, to_pci_dev(dev));
+
+	return pm_generic_poweroff_late(dev);
+}
+
 static int pci_pm_poweroff_noirq(struct device *dev)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
@@ -1124,6 +1143,7 @@ static int pci_pm_restore(struct device
 #define pci_pm_thaw		NULL
 #define pci_pm_thaw_noirq	NULL
 #define pci_pm_poweroff		NULL
+#define pci_pm_poweroff_late	NULL
 #define pci_pm_poweroff_noirq	NULL
 #define pci_pm_restore		NULL
 #define pci_pm_restore_noirq	NULL
@@ -1239,10 +1259,12 @@ static const struct dev_pm_ops pci_dev_p
 	.prepare = pci_pm_prepare,
 	.complete = pci_pm_complete,
 	.suspend = pci_pm_suspend,
+	.suspend_late = pci_pm_suspend_late,
 	.resume = pci_pm_resume,
 	.freeze = pci_pm_freeze,
 	.thaw = pci_pm_thaw,
 	.poweroff = pci_pm_poweroff,
+	.poweroff_late = pci_pm_poweroff_late,
 	.restore = pci_pm_restore,
 	.suspend_noirq = pci_pm_suspend_noirq,
 	.resume_noirq = pci_pm_resume_noirq,
Index: linux-pm/Documentation/power/pci.txt
===================================================================
--- linux-pm.orig/Documentation/power/pci.txt
+++ linux-pm/Documentation/power/pci.txt
@@ -980,6 +980,12 @@ positive value from pci_pm_prepare() if
 driver of the device returns a positive value.  That allows the driver to opt
 out from using the direct-complete mechanism dynamically.
 
+The DPM_FLAG_SMART_SUSPEND flag tells the PCI bus type that from the driver's
+perspective the device can be safely left in runtime suspend during system
+suspend.  That causes pci_pm_suspend(), pci_pm_freeze() and pci_pm_poweroff()
+to skip resuming the device from runtime suspend unless there are PCI-specific
+reasons for doing that.
+
 3.2. Device Runtime Power Management
 ------------------------------------
 In addition to providing device power management callbacks PCI device drivers


* [PATCH 07/12] ACPI / LPSS: Consolidate runtime PM and system sleep handling
  2017-10-16  1:12 [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume Rafael J. Wysocki
                   ` (5 preceding siblings ...)
  2017-10-16  1:29 ` [PATCH 06/12] PCI / PM: Take SMART_SUSPEND driver flag into account Rafael J. Wysocki
@ 2017-10-16  1:29 ` Rafael J. Wysocki
  2017-10-23 19:09   ` Ulf Hansson
  2017-10-16  1:30 ` [PATCH 08/12] ACPI / PM: Take SMART_SUSPEND driver flag into account Rafael J. Wysocki
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-16  1:29 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Move the LPSS-specific code from acpi_lpss_runtime_suspend()
and acpi_lpss_runtime_resume() into separate functions,
acpi_lpss_suspend() and acpi_lpss_resume(), respectively, and
make acpi_lpss_suspend_late() and acpi_lpss_resume_early() use
them too in order to unify the runtime PM and system sleep
handling in the LPSS driver.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

This is based on an RFC I posted some time ago
(https://patchwork.kernel.org/patch/9998147/), which didn't
receive any comments, and it depends on a couple of ACPI device PM
patches posted recently (https://patchwork.kernel.org/patch/10006457/
in particular).

It's included in this series, because the next patch won't work without it.

---
 drivers/acpi/acpi_lpss.c |   75 ++++++++++++++++++++---------------------------
 1 file changed, 33 insertions(+), 42 deletions(-)
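
The shape of the consolidation can be sketched in plain C as follows.
All names here (struct model_dev and the model_* functions) are
hypothetical stand-ins chosen for illustration; the real helpers are the
acpi_lpss_*() and pm_generic_*() functions in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device state touched by the callbacks. */
struct model_dev {
	int generic_ret;	/* what the generic callback stand-in returns */
	bool may_wakeup;	/* models device_may_wakeup() */
	bool ctx_saved;		/* set by the shared helper */
	bool wakeup_armed;	/* wakeup argument seen by the shared helper */
};

/* Stand-in for pm_generic_suspend_late() / pm_generic_runtime_suspend(). */
static int model_generic_suspend(struct model_dev *dev)
{
	return dev->generic_ret;
}

/* Shared helper, playing the role of acpi_lpss_suspend(dev, wakeup). */
static int model_lpss_suspend(struct model_dev *dev, bool wakeup)
{
	assert(dev != NULL);

	dev->ctx_saved = true;
	dev->wakeup_armed = wakeup;
	return 0;
}

/* System sleep wrapper: wakeup follows the device's wakeup setting. */
static int model_suspend_late(struct model_dev *dev)
{
	int ret = model_generic_suspend(dev);

	return ret ? ret : model_lpss_suspend(dev, dev->may_wakeup);
}

/* Runtime PM wrapper: wakeup is always requested. */
static int model_runtime_suspend(struct model_dev *dev)
{
	int ret = model_generic_suspend(dev);

	return ret ? ret : model_lpss_suspend(dev, true);
}
```

Both wrappers differ only in the wakeup argument they pass, which is the
point of the refactoring: the device-specific steps live in one place.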

Index: linux-pm/drivers/acpi/acpi_lpss.c
===================================================================
--- linux-pm.orig/drivers/acpi/acpi_lpss.c
+++ linux-pm/drivers/acpi/acpi_lpss.c
@@ -716,40 +716,6 @@ static void acpi_lpss_dismiss(struct dev
 	acpi_dev_suspend(dev, false);
 }
 
-#ifdef CONFIG_PM_SLEEP
-static int acpi_lpss_suspend_late(struct device *dev)
-{
-	struct lpss_private_data *pdata = acpi_driver_data(ACPI_COMPANION(dev));
-	int ret;
-
-	ret = pm_generic_suspend_late(dev);
-	if (ret)
-		return ret;
-
-	if (pdata->dev_desc->flags & LPSS_SAVE_CTX)
-		acpi_lpss_save_ctx(dev, pdata);
-
-	return acpi_dev_suspend(dev, device_may_wakeup(dev));
-}
-
-static int acpi_lpss_resume_early(struct device *dev)
-{
-	struct lpss_private_data *pdata = acpi_driver_data(ACPI_COMPANION(dev));
-	int ret;
-
-	ret = acpi_dev_resume(dev);
-	if (ret)
-		return ret;
-
-	acpi_lpss_d3_to_d0_delay(pdata);
-
-	if (pdata->dev_desc->flags & LPSS_SAVE_CTX)
-		acpi_lpss_restore_ctx(dev, pdata);
-
-	return pm_generic_resume_early(dev);
-}
-#endif /* CONFIG_PM_SLEEP */
-
 /* IOSF SB for LPSS island */
 #define LPSS_IOSF_UNIT_LPIOEP		0xA0
 #define LPSS_IOSF_UNIT_LPIO1		0xAB
@@ -835,19 +801,15 @@ static void lpss_iosf_exit_d3_state(void
 	mutex_unlock(&lpss_iosf_mutex);
 }
 
-static int acpi_lpss_runtime_suspend(struct device *dev)
+static int acpi_lpss_suspend(struct device *dev, bool wakeup)
 {
 	struct lpss_private_data *pdata = acpi_driver_data(ACPI_COMPANION(dev));
 	int ret;
 
-	ret = pm_generic_runtime_suspend(dev);
-	if (ret)
-		return ret;
-
 	if (pdata->dev_desc->flags & LPSS_SAVE_CTX)
 		acpi_lpss_save_ctx(dev, pdata);
 
-	ret = acpi_dev_suspend(dev, true);
+	ret = acpi_dev_suspend(dev, wakeup);
 
 	/*
 	 * This call must be last in the sequence, otherwise PMC will return
@@ -860,7 +822,7 @@ static int acpi_lpss_runtime_suspend(str
 	return ret;
 }
 
-static int acpi_lpss_runtime_resume(struct device *dev)
+static int acpi_lpss_resume(struct device *dev)
 {
 	struct lpss_private_data *pdata = acpi_driver_data(ACPI_COMPANION(dev));
 	int ret;
@@ -881,7 +843,36 @@ static int acpi_lpss_runtime_resume(stru
 	if (pdata->dev_desc->flags & LPSS_SAVE_CTX)
 		acpi_lpss_restore_ctx(dev, pdata);
 
-	return pm_generic_runtime_resume(dev);
+	return 0;
+}
+#ifdef CONFIG_PM_SLEEP
+static int acpi_lpss_suspend_late(struct device *dev)
+{
+	int ret = pm_generic_suspend_late(dev);
+
+	return ret ? ret : acpi_lpss_suspend(dev, device_may_wakeup(dev));
+}
+
+static int acpi_lpss_resume_early(struct device *dev)
+{
+	int ret = acpi_lpss_resume(dev);
+
+	return ret ? ret : pm_generic_resume_early(dev);
+}
+#endif /* CONFIG_PM_SLEEP */
+
+static int acpi_lpss_runtime_suspend(struct device *dev)
+{
+	int ret = pm_generic_runtime_suspend(dev);
+
+	return ret ? ret : acpi_lpss_suspend(dev, true);
+}
+
+static int acpi_lpss_runtime_resume(struct device *dev)
+{
+	int ret = acpi_lpss_resume(dev);
+
+	return ret ? ret : pm_generic_runtime_resume(dev);
 }
 #endif /* CONFIG_PM */
 


* [PATCH 08/12] ACPI / PM: Take SMART_SUSPEND driver flag into account
  2017-10-16  1:12 [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume Rafael J. Wysocki
                   ` (6 preceding siblings ...)
  2017-10-16  1:29 ` [PATCH 07/12] ACPI / LPSS: Consolidate runtime PM and system sleep handling Rafael J. Wysocki
@ 2017-10-16  1:30 ` Rafael J. Wysocki
  2017-10-16  1:30 ` [PATCH 09/12] PM / mfd: intel-lpss: Use DPM_FLAG_SMART_SUSPEND Rafael J. Wysocki
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-16  1:30 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Make the ACPI PM domain take DPM_FLAG_SMART_SUSPEND into account in
its system suspend callbacks.

[Note that the pm_runtime_suspended() check in acpi_dev_needs_resume()
is an optimization, because if it is not passed, all of the subsequent
checks may be skipped and some of them involve much more overhead in
general.]

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/acpi/device_pm.c |   21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)
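
To show how the cheap checks end up in front of the expensive ones, here
is a simplified user-space model of the logic changed below.  The struct
and function names are invented for this sketch, and the
power_state_mismatch field is a hypothetical summary of the remaining
(more costly) checks in acpi_dev_needs_resume(), which are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the inputs of acpi_dev_needs_resume(). */
struct model_acpi_dev {
	bool rpm_suspended;		/* models pm_runtime_suspended() */
	bool has_companion;		/* models adev != NULL */
	bool may_wakeup;		/* models device_may_wakeup() */
	unsigned int wakeup_prepare_count;
	bool power_state_mismatch;	/* summary of the expensive checks */
};

static bool model_needs_resume(const struct model_acpi_dev *dev)
{
	assert(dev != NULL);

	/* Cheap checks first; any of them failing short-circuits the rest. */
	if (!dev->rpm_suspended || !dev->has_companion ||
	    dev->may_wakeup != !!dev->wakeup_prepare_count)
		return true;

	return dev->power_state_mismatch;
}

/* Model of the condition guarding pm_runtime_resume() in the suspend path. */
static bool model_subsys_suspend_resumes(const struct model_acpi_dev *dev,
					 bool smart_suspend)
{
	return !smart_suspend || model_needs_resume(dev);
}
```

With the flag set, the device is resumed only if the model reports an
ACPI-specific reason; without it, the resume is unconditional as before.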

Index: linux-pm/drivers/acpi/device_pm.c
===================================================================
--- linux-pm.orig/drivers/acpi/device_pm.c
+++ linux-pm/drivers/acpi/device_pm.c
@@ -936,7 +936,8 @@ static bool acpi_dev_needs_resume(struct
 	u32 sys_target = acpi_target_system_state();
 	int ret, state;
 
-	if (device_may_wakeup(dev) != !!adev->wakeup.prepare_count)
+	if (!pm_runtime_suspended(dev) || !adev ||
+	    device_may_wakeup(dev) != !!adev->wakeup.prepare_count)
 		return true;
 
 	if (sys_target == ACPI_STATE_S0)
@@ -968,9 +969,6 @@ int acpi_subsys_prepare(struct device *d
 	if (!ret && dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_PREPARE))
 		return 0;
 
-	if (!adev || !pm_runtime_suspended(dev))
-		return 0;
-
 	return !acpi_dev_needs_resume(dev, adev);
 }
 EXPORT_SYMBOL_GPL(acpi_subsys_prepare);
@@ -996,12 +994,17 @@ EXPORT_SYMBOL_GPL(acpi_subsys_complete);
  * acpi_subsys_suspend - Run the device driver's suspend callback.
  * @dev: Device to handle.
  *
- * Follow PCI and resume devices suspended at run time before running their
- * system suspend callbacks.
+ * Follow PCI and resume devices from runtime suspend before running their
+ * system suspend callbacks, unless the driver can cope with runtime-suspended
+ * devices during system suspend and there are no ACPI-specific reasons for
+ * resuming them.
  */
 int acpi_subsys_suspend(struct device *dev)
 {
-	pm_runtime_resume(dev);
+	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) ||
+	    acpi_dev_needs_resume(dev, ACPI_COMPANION(dev)))
+		pm_runtime_resume(dev);
+
 	return pm_generic_suspend(dev);
 }
 EXPORT_SYMBOL_GPL(acpi_subsys_suspend);
@@ -1047,7 +1050,9 @@ int acpi_subsys_freeze(struct device *de
 	 * runtime-suspended devices should not be touched during freeze/thaw
 	 * transitions.
 	 */
-	pm_runtime_resume(dev);
+	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND))
+		pm_runtime_resume(dev);
+
 	return pm_generic_freeze(dev);
 }
 EXPORT_SYMBOL_GPL(acpi_subsys_freeze);


* [PATCH 09/12] PM / mfd: intel-lpss: Use DPM_FLAG_SMART_SUSPEND
  2017-10-16  1:12 [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume Rafael J. Wysocki
                   ` (7 preceding siblings ...)
  2017-10-16  1:30 ` [PATCH 08/12] ACPI / PM: Take SMART_SUSPEND driver flag into account Rafael J. Wysocki
@ 2017-10-16  1:30 ` Rafael J. Wysocki
  2017-10-31 15:09   ` Lee Jones
  2017-10-16  1:30 ` [PATCH 10/12] PM / core: Add LEAVE_SUSPENDED driver flag Rafael J. Wysocki
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-16  1:30 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Make the intel-lpss driver set DPM_FLAG_SMART_SUSPEND for its
devices, which will allow them to stay in runtime suspend during
system suspend unless they need to be reconfigured for some reason.

Also make it avoid resuming its child devices if they have
DPM_FLAG_SMART_SUSPEND set to allow them to remain in runtime
suspend during system suspend.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/mfd/intel-lpss.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Index: linux-pm/drivers/mfd/intel-lpss.c
===================================================================
--- linux-pm.orig/drivers/mfd/intel-lpss.c
+++ linux-pm/drivers/mfd/intel-lpss.c
@@ -450,6 +450,8 @@ int intel_lpss_probe(struct device *dev,
 	if (ret)
 		goto err_remove_ltr;
 
+	dev_pm_set_driver_flags(dev, DPM_FLAG_SMART_SUSPEND);
+
 	return 0;
 
 err_remove_ltr:
@@ -478,7 +480,9 @@ EXPORT_SYMBOL_GPL(intel_lpss_remove);
 
 static int resume_lpss_device(struct device *dev, void *data)
 {
-	pm_runtime_resume(dev);
+	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND))
+		pm_runtime_resume(dev);
+
 	return 0;
 }
 


* [PATCH 10/12] PM / core: Add LEAVE_SUSPENDED driver flag
  2017-10-16  1:12 [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume Rafael J. Wysocki
                   ` (8 preceding siblings ...)
  2017-10-16  1:30 ` [PATCH 09/12] PM / mfd: intel-lpss: Use DPM_FLAG_SMART_SUSPEND Rafael J. Wysocki
@ 2017-10-16  1:30 ` Rafael J. Wysocki
  2017-10-23 19:38   ` Ulf Hansson
  2017-10-16  1:31 ` [PATCH 11/12] PM: i2c-designware-platdrv: Optimize power management Rafael J. Wysocki
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-16  1:30 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to
instruct the PM core that it is desirable to leave the device in
runtime suspend after system resume (for example, the device may be
slow to resume and it may be better to avoid resuming it right away
for this reason).

Setting that flag causes the PM core to skip the ->resume_noirq,
->resume_early and ->resume callbacks for the device (like in the
direct-complete optimization case) if (1) its wakeup settings are
compatible with runtime PM (that is, either the device is configured
to wake up the system from sleep or it cannot generate wakeup signals
at all), and (2) it will not be used for resuming any of its children
or consumers.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 Documentation/driver-api/pm/devices.rst |   20 +++++++
 drivers/base/power/main.c               |   81 ++++++++++++++++++++++++++++++--
 include/linux/pm.h                      |   12 +++-
 3 files changed, 104 insertions(+), 9 deletions(-)
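
The two conditions above, and the way must_resume propagates to the
parent, can be illustrated with a small user-space model.  It is a
sketch under simplifying assumptions: struct model_dev and
model_suspend_noirq() are hypothetical, device links (suppliers) are
left out, and only the tail of __device_suspend_noirq() for a device
whose callback has already run is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the struct device fields involved. */
struct model_dev {
	struct model_dev *parent;
	bool leave_suspended_flag;	/* DPM_FLAG_LEAVE_SUSPENDED is set */
	bool can_wakeup;		/* models device_can_wakeup() */
	bool may_wakeup;		/* models device_may_wakeup() */
	bool must_resume;		/* models dev->power.must_resume */
	bool left_suspended;		/* outcome recorded by the model */
};

/*
 * Devices are suspended children first, so by the time a parent is
 * handled, its must_resume bit already reflects its children's needs.
 */
static void model_suspend_noirq(struct model_dev *dev)
{
	assert(dev != NULL);

	if (dev->leave_suspended_flag &&
	    (dev->may_wakeup || !dev->can_wakeup) &&
	    !dev->must_resume) {
		dev->left_suspended = true;
		return;
	}

	/* This device will be resumed, so its parent has to be as well. */
	if (dev->parent)
		dev->parent->must_resume = true;
}
```

Running the model on a child without the flag followed by its parent
with the flag leaves the parent with must_resume set, so the parent is
not left suspended even though its driver asked for that.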

Index: linux-pm/include/linux/pm.h
===================================================================
--- linux-pm.orig/include/linux/pm.h
+++ linux-pm/include/linux/pm.h
@@ -559,6 +559,7 @@ struct pm_subsys_data {
  * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
  * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
  * SMART_SUSPEND: No need to resume the device from runtime suspend.
+ * LEAVE_SUSPENDED: Avoid resuming the device during system resume if possible.
  *
  * Setting SMART_PREPARE instructs bus types and PM domains which may want
  * system suspend/resume callbacks to be skipped for the device to return 0 from
@@ -573,10 +574,14 @@ struct pm_subsys_data {
  * the "late" and "noirq" phases of device suspend for the device if it remains
  * in runtime suspend at the beginning of the "late" phase (when runtime PM has
  * been disabled for it).
+ *
+ * Setting LEAVE_SUSPENDED informs the PM core and middle layer code that the
+ * driver prefers the device to be left in runtime suspend after system resume.
  */
-#define DPM_FLAG_NEVER_SKIP	BIT(0)
-#define DPM_FLAG_SMART_PREPARE	BIT(1)
-#define DPM_FLAG_SMART_SUSPEND	BIT(2)
+#define DPM_FLAG_NEVER_SKIP		BIT(0)
+#define DPM_FLAG_SMART_PREPARE		BIT(1)
+#define DPM_FLAG_SMART_SUSPEND		BIT(2)
+#define DPM_FLAG_LEAVE_SUSPENDED	BIT(3)
 
 struct dev_pm_info {
 	pm_message_t		power_state;
@@ -598,6 +603,7 @@ struct dev_pm_info {
 	bool			wakeup_path:1;
 	bool			syscore:1;
 	bool			no_pm_callbacks:1;	/* Owned by the PM core */
+	unsigned int		must_resume:1;	/* Owned by the PM core */
 #else
 	unsigned int		should_wakeup:1;
 #endif
Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -705,6 +705,12 @@ static int device_resume_early(struct de
 	if (!dev->power.is_late_suspended)
 		goto Out;
 
+	if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED) &&
+	    !dev->power.must_resume) {
+		pm_runtime_set_suspended(dev);
+		goto Out;
+	}
+
 	dpm_wait_for_superior(dev, async);
 
 	if (dev->pm_domain) {
@@ -1098,6 +1104,32 @@ static pm_message_t resume_event(pm_mess
 	return PMSG_ON;
 }
 
+static void dpm_suppliers_set_must_resume(struct device *dev)
+{
+	struct device_link *link;
+	int idx;
+
+	idx = device_links_read_lock();
+
+	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node)
+		link->supplier->power.must_resume = true;
+
+	device_links_read_unlock(idx);
+}
+
+static void dpm_leave_suspended(struct device *dev)
+{
+	pm_runtime_set_suspended(dev);
+	dev->power.is_suspended = false;
+	dev->power.is_late_suspended = false;
+	/*
+	 * This tells middle layer code to schedule runtime resume of the device
+	 * from its ->complete callback to update the device's power state in
+	 * case the platform firmware has been involved in resuming the system.
+	 */
+	dev->power.direct_complete = true;
+}
+
 /**
  * __device_suspend_noirq - Execute a "noirq suspend" callback for given device.
  * @dev: Device to handle.
@@ -1135,8 +1167,20 @@ static int __device_suspend_noirq(struct
 	 * the callback invocation for them.
 	 */
 	if (dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
-	    pm_runtime_status_suspended(dev))
-		goto Skip;
+	    pm_runtime_status_suspended(dev)) {
+		/*
+		 * The device may be left suspended during system resume if
+		 * that is preferred by its driver and it will not be used for
+		 * resuming any of its children or consumers.
+		 */
+		if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED) &&
+		    !dev->power.must_resume) {
+			dpm_leave_suspended(dev);
+			goto Complete;
+		} else {
+			goto Skip;
+		}
+	}
 
 	if (dev->pm_domain) {
 		info = "noirq power domain ";
@@ -1163,6 +1207,28 @@ static int __device_suspend_noirq(struct
 		goto Complete;
 	}
 
+	/*
+	 * The device may be left suspended during system resume if that is
+	 * preferred by its driver and its wakeup configuration is compatible
+	 * with runtime PM, and it will not be used for resuming any of its
+	 * children or consumers.
+	 */
+	if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED) &&
+	    (device_may_wakeup(dev) || !device_can_wakeup(dev)) &&
+	    !dev->power.must_resume) {
+		dpm_leave_suspended(dev);
+		goto Complete;
+	}
+
+	/*
+	 * The parent and suppliers will be necessary to resume the device
+	 * during system resume, so avoid leaving them in runtime suspend.
+	 */
+	if (dev->parent)
+		dev->parent->power.must_resume = true;
+
+	dpm_suppliers_set_must_resume(dev);
+
 Skip:
 	dev->power.is_noirq_suspended = true;
 
@@ -1698,8 +1764,9 @@ static int device_prepare(struct device
 	if (dev->power.syscore)
 		return 0;
 
-	WARN_ON(dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
-		!pm_runtime_enabled(dev));
+	WARN_ON(!pm_runtime_enabled(dev) &&
+		dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND |
+					      DPM_FLAG_LEAVE_SUSPENDED));
 
 	/*
 	 * If a device's parent goes into runtime suspend at the wrong time,
@@ -1712,6 +1779,12 @@ static int device_prepare(struct device
 	device_lock(dev);
 
 	dev->power.wakeup_path = device_may_wakeup(dev);
+	/*
+	 * Avoid leaving devices in suspend after transitions that don't really
+	 * suspend them in general.
+	 */
+	dev->power.must_resume = state.event == PM_EVENT_FREEZE ||
+				state.event == PM_EVENT_QUIESCE;
 
 	if (dev->power.no_pm_callbacks) {
 		ret = 1;	/* Let device go direct_complete */
Index: linux-pm/Documentation/driver-api/pm/devices.rst
===================================================================
--- linux-pm.orig/Documentation/driver-api/pm/devices.rst
+++ linux-pm/Documentation/driver-api/pm/devices.rst
@@ -785,6 +785,22 @@ means that they should be put into the f
 
 During system-wide resume from a sleep state it's easiest to put devices into
 the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`.
-Refer to that document for more information regarding this particular issue as
+[Refer to that document for more information regarding this particular issue as
 well as for information on the device runtime power management framework in
-general.
+general.]
+
+However, it may be desirable to leave some devices in runtime suspend after
+system resume and device drivers can use the ``DPM_FLAG_LEAVE_SUSPENDED`` flag
+to indicate to the PM core that this is the case.  If that flag is set for a
+device and the wakeup settings of it are compatible with runtime PM (that is,
+either the device is configured to wake up the system from sleep or it cannot
+generate wakeup signals at all), and it will not be used for resuming any of its
+children or consumers, the PM core will skip all of the system resume callbacks
+in the ``resume_noirq``, ``resume_early`` and ``resume`` phases for it and its
+runtime power management status will be set to "suspended".
+
+Still, if the platform firmware is involved in the handling of system resume, it
+may change the state of devices in unpredictable ways, so in that case the
+middle layer code (for example, a bus type or PM domain) the driver works with
+should update the device's power state and schedule runtime resume of it to
+align its power settings with the expectations of the runtime PM framework.
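
For illustration only, the conditions described in the documentation text
above can be modelled in userspace roughly as follows.  This is a sketch
under stated assumptions, not the code added by the patch: the structure
and the function name are hypothetical, and the single must_resume field
stands for everything that forces a resume (children or consumers needing
the device, or a freeze/quiesce transition).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct lsusp_dev {
	bool leave_suspended_flag;	/* driver set DPM_FLAG_LEAVE_SUSPENDED */
	bool can_wakeup;		/* device is able to signal wakeup */
	bool may_wakeup;		/* device is set up to wake up the system */
	bool must_resume;		/* something requires the device to resume */
};

/* Returns true if the system resume callbacks may be skipped. */
static bool lsusp_may_leave_suspended(const struct lsusp_dev *dev)
{
	/* Only devices able to signal wakeup can be set up to wake the system. */
	assert(dev->can_wakeup || !dev->may_wakeup);

	if (!dev->leave_suspended_flag || dev->must_resume)
		return false;

	/*
	 * Wakeup settings are compatible with runtime PM if the device is
	 * configured to wake up the system or cannot signal wakeup at all.
	 */
	return dev->may_wakeup || !dev->can_wakeup;
}
```

In this model a wakeup-capable device that is not configured to wake up
the system is always resumed, because its wakeup settings may differ from
the ones used by runtime PM.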


* [PATCH 11/12] PM: i2c-designware-platdrv: Optimize power management
  2017-10-16  1:12 [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume Rafael J. Wysocki
                   ` (9 preceding siblings ...)
  2017-10-16  1:30 ` [PATCH 10/12] PM / core: Add LEAVE_SUSPENDED driver flag Rafael J. Wysocki
@ 2017-10-16  1:31 ` Rafael J. Wysocki
  2017-10-26 20:41   ` Wolfram Sang
  2017-10-16  1:32 ` [PATCH 12/12] PM / core: Add AVOID_RPM driver flag Rafael J. Wysocki
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-16  1:31 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Optimize the power management in i2c-designware-platdrv by making it
set the DPM_FLAG_SMART_SUSPEND and DPM_FLAG_LEAVE_SUSPENDED flags,
which allows some code to be dropped from its PM callbacks.

First, setting DPM_FLAG_SMART_SUSPEND causes the intel-lpss driver
to avoid resuming i2c-designware-platdrv devices in its ->prepare
callback, so they can stay in runtime suspend after that point even
if the direct-complete feature is not used for them.

It also causes the PM core to avoid invoking "late" and "noirq"
suspend callbacks for these devices if they are in runtime suspend
at the beginning of the "late" phase of device suspend during
system suspend.  That guarantees that dw_i2c_plat_suspend() is
called for a device only if it is not in runtime suspend.
Moreover, it also causes the PM core to set the device's runtime
PM status to "active" after calling dw_i2c_plat_resume() for
it, so the driver doesn't need internal flags to avoid invoking
either dw_i2c_plat_suspend() or dw_i2c_plat_resume() twice in
a row.

Second, setting DPM_FLAG_LEAVE_SUSPENDED enables the optimization
allowing the device to stay suspended after system resume under
suitable conditions, so again the driver doesn't need to take
care of that by itself.

Accordingly, the internal "suspended" and "skip_resume" flags
used by the driver are not necessary any more, so drop them and
simplify the driver's PM callbacks.

Additionally, notice that dw_i2c_plat_complete() only needs
to schedule a runtime resume of the device if platform firmware
has been involved in resuming the system, so make it call
pm_resume_via_firmware() to check that.
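
For reference, the bookkeeping dropped from the driver can be modelled in
userspace as below.  This is only a sketch of the removed logic: the
structure is a hypothetical stand-in for struct dw_i2c_dev, and the two
counters replace the real ->disable() and ->init() hardware operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of struct dw_i2c_dev. */
struct model_i2c_dev {
	bool suspended;		/* internal flag removed by this patch */
	bool skip_resume;	/* internal flag removed by this patch */
	int hw_disable_calls;	/* stands for i_dev->disable() */
	int hw_init_calls;	/* stands for i_dev->init() */
};

/* Old suspend logic: a second call in a row only records skip_resume. */
static int model_old_suspend(struct model_i2c_dev *d)
{
	assert(d != NULL);

	if (d->suspended) {
		d->skip_resume = true;
		return 0;
	}

	d->hw_disable_calls++;
	d->suspended = true;
	return 0;
}

/* Old resume logic: the first resume after a doubled suspend is skipped. */
static int model_old_resume(struct model_i2c_dev *d)
{
	assert(d != NULL);

	if (!d->suspended)
		return 0;

	if (d->skip_resume) {
		d->skip_resume = false;
		return 0;
	}

	d->hw_init_calls++;
	d->suspended = false;
	return 0;
}
```

With the driver flags set, the PM core and the middle layer are expected
to make sure that the callbacks are not invoked twice in a row, so none of
this state tracking is needed in the driver.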

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/i2c/busses/i2c-designware-core.h    |    2 --
 drivers/i2c/busses/i2c-designware-platdrv.c |   25 ++++++-------------------
 2 files changed, 6 insertions(+), 21 deletions(-)

Index: linux-pm/drivers/i2c/busses/i2c-designware-core.h
===================================================================
--- linux-pm.orig/drivers/i2c/busses/i2c-designware-core.h
+++ linux-pm/drivers/i2c/busses/i2c-designware-core.h
@@ -280,8 +280,6 @@ struct dw_i2c_dev {
 	int			(*acquire_lock)(struct dw_i2c_dev *dev);
 	void			(*release_lock)(struct dw_i2c_dev *dev);
 	bool			pm_disabled;
-	bool			suspended;
-	bool			skip_resume;
 	void			(*disable)(struct dw_i2c_dev *dev);
 	void			(*disable_int)(struct dw_i2c_dev *dev);
 	int			(*init)(struct dw_i2c_dev *dev);
Index: linux-pm/drivers/i2c/busses/i2c-designware-platdrv.c
===================================================================
--- linux-pm.orig/drivers/i2c/busses/i2c-designware-platdrv.c
+++ linux-pm/drivers/i2c/busses/i2c-designware-platdrv.c
@@ -42,6 +42,7 @@
 #include <linux/reset.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
+#include <linux/suspend.h>
 
 #include "i2c-designware-core.h"
 
@@ -370,7 +371,10 @@ static int dw_i2c_plat_probe(struct plat
 	ACPI_COMPANION_SET(&adap->dev, ACPI_COMPANION(&pdev->dev));
 	adap->dev.of_node = pdev->dev.of_node;
 
-	dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_SMART_PREPARE);
+	dev_pm_set_driver_flags(&pdev->dev,
+				DPM_FLAG_SMART_PREPARE |
+				DPM_FLAG_SMART_SUSPEND |
+				DPM_FLAG_LEAVE_SUSPENDED);
 
 	/* The code below assumes runtime PM to be disabled. */
 	WARN_ON(pm_runtime_enabled(&pdev->dev));
@@ -446,7 +450,7 @@ static int dw_i2c_plat_prepare(struct de
 
 static void dw_i2c_plat_complete(struct device *dev)
 {
-	if (dev->power.direct_complete)
+	if (dev->power.direct_complete && pm_resume_via_firmware())
 		pm_request_resume(dev);
 }
 #else
@@ -459,16 +463,9 @@ static int dw_i2c_plat_suspend(struct de
 {
 	struct dw_i2c_dev *i_dev = dev_get_drvdata(dev);
 
-	if (i_dev->suspended) {
-		i_dev->skip_resume = true;
-		return 0;
-	}
-
 	i_dev->disable(i_dev);
 	i2c_dw_plat_prepare_clk(i_dev, false);
 
-	i_dev->suspended = true;
-
 	return 0;
 }
 
@@ -476,19 +473,9 @@ static int dw_i2c_plat_resume(struct dev
 {
 	struct dw_i2c_dev *i_dev = dev_get_drvdata(dev);
 
-	if (!i_dev->suspended)
-		return 0;
-
-	if (i_dev->skip_resume) {
-		i_dev->skip_resume = false;
-		return 0;
-	}
-
 	i2c_dw_plat_prepare_clk(i_dev, true);
 	i_dev->init(i_dev);
 
-	i_dev->suspended = false;
-
 	return 0;
 }
 


* [PATCH 12/12] PM / core: Add AVOID_RPM driver flag
  2017-10-16  1:12 [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume Rafael J. Wysocki
                   ` (10 preceding siblings ...)
  2017-10-16  1:31 ` [PATCH 11/12] PM: i2c-designware-platdrv: Optimize power management Rafael J. Wysocki
@ 2017-10-16  1:32 ` Rafael J. Wysocki
  2017-10-17 15:33   ` Andy Shevchenko
  2017-10-16  7:08 ` [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume Greg Kroah-Hartman
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-16  1:32 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Define and document a new driver flag, DPM_FLAG_AVOID_RPM, to inform
the PM core and middle layer code that the driver has something
significant to do in its ->suspend and/or ->resume callbacks and
runtime PM should be disabled for the device when these callbacks
run.

Setting DPM_FLAG_AVOID_RPM (in addition to DPM_FLAG_SMART_SUSPEND)
causes runtime PM to be disabled for the device before invoking the
driver's ->suspend callback for it and to be enabled again for it
only after the driver's ->resume callback has returned.  In addition
to that, if the device is in runtime suspend right after disabling
runtime PM for it (which means that there was no reason to resume it
from runtime suspend beforehand), the invocation of the ->suspend
callback will be skipped for it and it will be left in runtime
suspend until the "noirq" phase of the subsequent system resume.

If DPM_FLAG_SMART_SUSPEND is not set, DPM_FLAG_AVOID_RPM has no
effect.
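
As a rough illustration, the handling described above can be modelled in
userspace like this.  It is a simplified sketch, not the code from the
patch: all of the names are hypothetical, the wakeup-related runtime
resume done before disabling runtime PM is left out, and so is the middle
layer's handling of devices without DPM_FLAG_SMART_SUSPEND set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_FLAG_SMART_SUSPEND	(1u << 2)
#define MODEL_FLAG_AVOID_RPM		(1u << 4)

/* Hypothetical userspace model of the relevant device state. */
struct avoid_rpm_dev {
	unsigned int driver_flags;
	bool rpm_enabled;	/* runtime PM enabled for the device */
	bool rpm_suspended;	/* runtime PM status is "suspended" */
	bool rpm_reenable;	/* runtime PM must be enabled again later */
	int suspend_calls;	/* invocations of the driver's ->suspend */
};

/* Returns true if the driver's ->suspend callback has been invoked. */
static bool model_device_suspend(struct avoid_rpm_dev *dev)
{
	unsigned int both = MODEL_FLAG_SMART_SUSPEND | MODEL_FLAG_AVOID_RPM;

	assert(dev != NULL);

	if ((dev->driver_flags & both) == both) {
		/* Disable runtime PM before running the callback. */
		dev->rpm_enabled = false;
		dev->rpm_reenable = true;
		/* Already suspended: no need to suspend the device again. */
		if (dev->rpm_suspended)
			return false;
	}

	dev->suspend_calls++;
	return true;
}

/* Models the step carried out after the ->resume callback has returned. */
static void model_device_resume_done(struct avoid_rpm_dev *dev)
{
	assert(dev != NULL);

	if (dev->rpm_reenable) {
		dev->rpm_enabled = true;
		dev->rpm_reenable = false;
	}
}
```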

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 Documentation/driver-api/pm/devices.rst |   14 ++++++
 Documentation/power/pci.txt             |    9 +++-
 drivers/acpi/device_pm.c                |   24 ++++++++++-
 drivers/base/power/main.c               |   31 ++++++++++++++
 drivers/pci/pci-driver.c                |   69 ++++++++++++++++++++++----------
 include/linux/pm.h                      |   10 ++++
 6 files changed, 134 insertions(+), 23 deletions(-)

Index: linux-pm/include/linux/pm.h
===================================================================
--- linux-pm.orig/include/linux/pm.h
+++ linux-pm/include/linux/pm.h
@@ -560,6 +560,7 @@ struct pm_subsys_data {
  * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
  * SMART_SUSPEND: No need to resume the device from runtime suspend.
  * LEAVE_SUSPENDED: Avoid resuming the device during system resume if possible.
+ * AVOID_RPM: Disable runtime PM and check its status before ->suspend.
  *
  * Setting SMART_PREPARE instructs bus types and PM domains which may want
  * system suspend/resume callbacks to be skipped for the device to return 0 from
@@ -577,11 +578,17 @@ struct pm_subsys_data {
  *
  * Setting LEAVE_SUSPENDED informs the PM core and middle layer code that the
  * driver prefers the device to be left in runtime suspend after system resume.
+ *
+ * Setting AVOID_RPM informs the PM core and middle layer code that the driver
+ * has something significant to do in its ->suspend and/or ->resume callbacks
+ * and runtime PM should be disabled for the device when these callbacks run.
+ * If SMART_SUSPEND is not set, this flag has no effect.
  */
 #define DPM_FLAG_NEVER_SKIP		BIT(0)
 #define DPM_FLAG_SMART_PREPARE		BIT(1)
 #define DPM_FLAG_SMART_SUSPEND		BIT(2)
 #define DPM_FLAG_LEAVE_SUSPENDED	BIT(3)
+#define DPM_FLAG_AVOID_RPM		BIT(4)
 
 struct dev_pm_info {
 	pm_message_t		power_state;
@@ -604,6 +611,7 @@ struct dev_pm_info {
 	bool			syscore:1;
 	bool			no_pm_callbacks:1;	/* Owned by the PM core */
 	unsigned int		must_resume:1;	/* Owned by the PM core */
+	unsigned int		rpm_reenable:1;	/* Do not modify directly */
 #else
 	unsigned int		should_wakeup:1;
 #endif
@@ -741,6 +749,8 @@ extern int dpm_suspend_late(pm_message_t
 extern int dpm_suspend(pm_message_t state);
 extern int dpm_prepare(pm_message_t state);
 
+extern void dpm_disable_runtime_pm_early(struct device *dev);
+
 extern void __suspend_report_result(const char *function, void *fn, int ret);
 
 #define suspend_report_result(fn, ret)					\
Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -906,6 +906,10 @@ static int device_resume(struct device *
  Unlock:
 	device_unlock(dev);
 	dpm_watchdog_clear(&wd);
+	if (dev->power.rpm_reenable) {
+		pm_runtime_enable(dev);
+		dev->power.rpm_reenable = false;
+	}
 
  Complete:
 	complete_all(&dev->power.completion);
@@ -1534,6 +1538,12 @@ static int legacy_suspend(struct device
 	return error;
 }
 
+void dpm_disable_runtime_pm_early(struct device *dev)
+{
+	pm_runtime_disable(dev);
+	dev->power.rpm_reenable = true;
+}
+
 static void dpm_clear_suppliers_direct_complete(struct device *dev)
 {
 	struct device_link *link;
@@ -1636,6 +1646,27 @@ static int __device_suspend(struct devic
 	if (!callback && dev->driver && dev->driver->pm) {
 		info = "driver ";
 		callback = pm_op(dev->driver->pm, state);
+		if (callback &&
+		    dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
+		    dev_pm_test_driver_flags(dev, DPM_FLAG_AVOID_RPM)) {
+			/*
+			 * Device wakeup is enabled for runtime PM, so if the
+			 * device is not expected to wake up the system from
+			 * sleep, resume it now so that it can be reconfigured.
+			 */
+			if (device_can_wakeup(dev) && !device_may_wakeup(dev))
+				pm_runtime_resume(dev);
+
+			dpm_disable_runtime_pm_early(dev);
+			/*
+			 * If the device is already suspended now, it won't be
+			 * resumed until the subsequent system resume starts and
+			 * there is no need to suspend it again, so simply skip
+			 * the callback for it.
+			 */
+			if (pm_runtime_status_suspended(dev))
+				goto End;
+		}
 	}
 
 	error = dpm_run_callback(callback, dev, state, info);
Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -708,6 +708,39 @@ static void pci_pm_complete(struct devic
 	}
 }
 
+static bool pci_pm_check_suspend(struct device *dev)
+{
+	/*
+	 * PCI devices suspended at run time may need to be resumed at this
+	 * point, because in general it may be necessary to reconfigure them for
+	 * system suspend.  Namely, if the device is expected to wake up the
+	 * system from the sleep state, it may have to be reconfigured for this
+	 * purpose, or if the device is not expected to wake up the system from
+	 * the sleep state, it should be prevented from signaling wakeup events
+	 * going forward.
+	 *
+	 * Also if the driver of the device does not indicate that its system
+	 * suspend callbacks can cope with runtime-suspended devices, it is
+	 * better to resume the device from runtime suspend here.
+	 */
+	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) ||
+	    !pci_dev_keep_suspended(to_pci_dev(dev)))
+		pm_runtime_resume(dev);
+
+	if (dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
+	    dev_pm_test_driver_flags(dev, DPM_FLAG_AVOID_RPM)) {
+		dpm_disable_runtime_pm_early(dev);
+		/*
+		 * If the device is in runtime suspend now, it won't be resumed
+		 * until the subsequent system resume starts and there is no
+		 * need to suspend it again, so let the callers know about that.
+		 */
+		if (pm_runtime_status_suspended(dev))
+			return true;
+	}
+	return false;
+}
+
 #else /* !CONFIG_PM_SLEEP */
 
 #define pci_pm_prepare	NULL
@@ -730,22 +763,8 @@ static int pci_pm_suspend(struct device
 		return 0;
 	}
 
-	/*
-	 * PCI devices suspended at run time may need to be resumed at this
-	 * point, because in general it may be necessary to reconfigure them for
-	 * system suspend.  Namely, if the device is expected to wake up the
-	 * system from the sleep state, it may have to be reconfigured for this
-	 * purpose, or if the device is not expected to wake up the system from
-	 * the sleep state, it should be prevented from signaling wakeup events
-	 * going forward.
-	 *
-	 * Also if the driver of the device does not indicate that its system
-	 * suspend callbacks can cope with runtime-suspended devices, it is
-	 * better to resume the device from runtime suspend here.
-	 */
-	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) ||
-	    !pci_dev_keep_suspended(pci_dev))
-		pm_runtime_resume(dev);
+	if (pci_pm_check_suspend(dev))
+		return 0;
 
 	pci_dev->state_saved = false;
 	if (pm->suspend) {
@@ -918,8 +937,18 @@ static int pci_pm_freeze(struct device *
 	 * devices should not be touched during freeze/thaw transitions,
 	 * however.
 	 */
-	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND))
+	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND)) {
 		pm_runtime_resume(dev);
+	} else if (dev_pm_test_driver_flags(dev, DPM_FLAG_AVOID_RPM)) {
+		dpm_disable_runtime_pm_early(dev);
+		/*
+		 * If the device is in runtime suspend now, it won't be resumed
+		 * until the subsequent system resume starts and there is no
+		 * need to suspend it again, so simply skip the callback for it.
+		 */
+		if (pm_runtime_status_suspended(dev))
+			return 0;
+	}
 
 	pci_dev->state_saved = false;
 	if (pm->freeze) {
@@ -1020,10 +1049,8 @@ static int pci_pm_poweroff(struct device
 		return 0;
 	}
 
-	/* The reason to do that is the same as in pci_pm_suspend(). */
-	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) ||
-	    !pci_dev_keep_suspended(pci_dev))
-		pm_runtime_resume(dev);
+	if (pci_pm_check_suspend(dev))
+		return 0;
 
 	pci_dev->state_saved = false;
 	if (pm->poweroff) {
Index: linux-pm/drivers/acpi/device_pm.c
===================================================================
--- linux-pm.orig/drivers/acpi/device_pm.c
+++ linux-pm/drivers/acpi/device_pm.c
@@ -1005,6 +1005,18 @@ int acpi_subsys_suspend(struct device *d
 	    acpi_dev_needs_resume(dev, ACPI_COMPANION(dev)))
 		pm_runtime_resume(dev);
 
+	if (dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
+	    dev_pm_test_driver_flags(dev, DPM_FLAG_AVOID_RPM)) {
+		dpm_disable_runtime_pm_early(dev);
+		/*
+		 * If the device is in runtime suspend now, it won't be resumed
+		 * until the subsequent system resume starts and there is no
+		 * need to suspend it again, so let the callers know about that.
+		 */
+		if (pm_runtime_status_suspended(dev))
+			return 0;
+	}
+
 	return pm_generic_suspend(dev);
 }
 EXPORT_SYMBOL_GPL(acpi_subsys_suspend);
@@ -1050,8 +1062,18 @@ int acpi_subsys_freeze(struct device *de
 	 * runtime-suspended devices should not be touched during freeze/thaw
 	 * transitions.
 	 */
-	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND))
+	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND)) {
 		pm_runtime_resume(dev);
+	} else if (dev_pm_test_driver_flags(dev, DPM_FLAG_AVOID_RPM)) {
+		dpm_disable_runtime_pm_early(dev);
+		/*
+		 * If the device is in runtime suspend now, it won't be resumed
+		 * until the subsequent system resume starts and there is no
+		 * need to suspend it again, so let the callers know about that.
+		 */
+		if (pm_runtime_status_suspended(dev))
+			return 0;
+	}
 
 	return pm_generic_freeze(dev);
 }
Index: linux-pm/Documentation/driver-api/pm/devices.rst
===================================================================
--- linux-pm.orig/Documentation/driver-api/pm/devices.rst
+++ linux-pm/Documentation/driver-api/pm/devices.rst
@@ -783,6 +783,20 @@ phase of device resume (right prior to e
 to prevent runtime PM from acting on them before the ``complete`` phase, which
 means that they should be put into the full-power state before that phase.
 
+The handling of ``DPM_FLAG_SMART_SUSPEND`` can be extended by setting another
+power management driver flag, ``DPM_FLAG_AVOID_RPM`` (it has no effect without
+``DPM_FLAG_SMART_SUSPEND`` set).  Setting it informs the PM core and middle
+layer code that the driver's ``->suspend`` and/or ``->resume`` callbacks are
+not trivial and need to be run with runtime PM disabled.  Consequently,
+runtime PM is disabled before running the ``->suspend`` callback for devices
+with both ``DPM_FLAG_SMART_SUSPEND`` and ``DPM_FLAG_AVOID_RPM`` set and it is
+enabled again only after the driver's ``->resume`` callback has returned.  In
+addition to that, if the device is in runtime suspend right after disabling
+runtime PM for it (which means that there was no reason to resume it from
+runtime suspend beforehand), the invocation of the ``->suspend`` callback will
+be skipped for it and it will be left in runtime suspend until the ongoing
+system-wide power transition is over.
+
 During system-wide resume from a sleep state it's easiest to put devices into
 the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`.
 [Refer to that document for more information regarding this particular issue as
Index: linux-pm/Documentation/power/pci.txt
===================================================================
--- linux-pm.orig/Documentation/power/pci.txt
+++ linux-pm/Documentation/power/pci.txt
@@ -984,7 +984,14 @@ The DPM_FLAG_SMART_SUSPEND flag tells th
 perspective the device can be safely left in runtime suspend during system
 suspend.  That causes pci_pm_suspend(), pci_pm_freeze() and pci_pm_poweroff()
 to skip resuming the device from runtime suspend unless there are PCI-specific
-reasons for doing that.
+reasons for doing that.  In addition to that, drivers can use the
+DPM_FLAG_AVOID_RPM flag to inform the PCI bus type that its .suspend() and
+.resume() callbacks need to be run with runtime PM disabled (this flag has no
+effect without DPM_FLAG_SMART_SUSPEND set).  Then, if the device is in runtime
+suspend after runtime PM has been disabled for it, which means that there was
+no reason to resume it from runtime suspend beforehand, it won't be resumed
+until the ongoing system transition is over, so the execution of system suspend
+callbacks for it during that transition will be skipped.
 
 3.2. Device Runtime Power Management
 ------------------------------------


* Re: [PATCH 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
  2017-10-16  1:29 ` [PATCH 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags Rafael J. Wysocki
@ 2017-10-16  5:34   ` Lukas Wunner
  2017-10-16 22:03     ` Rafael J. Wysocki
  2017-10-16  6:28   ` Greg Kroah-Hartman
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 135+ messages in thread
From: Lukas Wunner @ 2017-10-16  5:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Ulf Hansson, Andy Shevchenko, Kevin Hilman, Wolfram Sang,
	linux-i2c, Lee Jones

On Mon, Oct 16, 2017 at 03:29:02AM +0200, Rafael J. Wysocki wrote:
> +	:c:func:`dev_pm_set_driver_flags` helper function.]  If the first of
> +	tese flags is set, the PM core will not apply the direct-complete
        ^
	these

> +	proceudre described above to the given device and, consequenty, to any
        ^
        procedure

Lukas


* Re: [PATCH 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
  2017-10-16  1:29 ` [PATCH 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags Rafael J. Wysocki
  2017-10-16  5:34   ` Lukas Wunner
@ 2017-10-16  6:28   ` Greg Kroah-Hartman
  2017-10-16 22:05     ` Rafael J. Wysocki
  2017-10-16  6:31   ` Greg Kroah-Hartman
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 135+ messages in thread
From: Greg Kroah-Hartman @ 2017-10-16  6:28 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On Mon, Oct 16, 2017 at 03:29:02AM +0200, Rafael J. Wysocki wrote:
>  struct dev_pm_info {
>  	pm_message_t		power_state;
>  	unsigned int		can_wakeup:1;
> @@ -561,6 +580,7 @@ struct dev_pm_info {
>  	bool			is_late_suspended:1;
>  	bool			early_init:1;	/* Owned by the PM core */
>  	bool			direct_complete:1;	/* Owned by the PM core */
> +	unsigned int		driver_flags;

Minor nit, u32 or u64?

thanks,

greg k-h


* Re: [PATCH 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
  2017-10-16  1:29 ` [PATCH 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags Rafael J. Wysocki
  2017-10-16  5:34   ` Lukas Wunner
  2017-10-16  6:28   ` Greg Kroah-Hartman
@ 2017-10-16  6:31   ` Greg Kroah-Hartman
  2017-10-16 22:07     ` Rafael J. Wysocki
  2017-10-16 20:16   ` Alan Stern
  2017-10-18 23:17   ` [Update][PATCH v2 " Rafael J. Wysocki
  4 siblings, 1 reply; 135+ messages in thread
From: Greg Kroah-Hartman @ 2017-10-16  6:31 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On Mon, Oct 16, 2017 at 03:29:02AM +0200, Rafael J. Wysocki wrote:
> +static inline void dev_pm_set_driver_flags(struct device *dev, unsigned int flags)
> +{
> +	dev->power.driver_flags = flags;
> +}

Should this function just set the specific bit?  Or is it going to be ok
to set the whole value, meaning you aren't going to care about turning
on and off specific flags over the lifetime of the driver/device, you
are just going to set them once and then just test them as needed?
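
For illustration, the difference can be sketched in userspace as follows.
Only the plain assignment mirrors the quoted helper; the OR-ing variant
and the test helper are hypothetical and shown purely for comparison.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_BIT(n)			(1u << (n))
#define MODEL_FLAG_NEVER_SKIP		MODEL_BIT(0)
#define MODEL_FLAG_SMART_PREPARE	MODEL_BIT(1)

/* Hypothetical stand-in for the driver_flags field of struct dev_pm_info. */
struct model_power {
	unsigned int driver_flags;
};

/* Same behavior as the quoted helper: the whole value is replaced. */
static void model_set_driver_flags(struct model_power *p, unsigned int flags)
{
	p->driver_flags = flags;
}

/* Hypothetical alternative: turn on specific bits, keeping the others. */
static void model_add_driver_flags(struct model_power *p, unsigned int flags)
{
	p->driver_flags |= flags;
}

/* Hypothetical test helper: true if any of the given flags is set. */
static bool model_test_driver_flags(const struct model_power *p,
				    unsigned int flags)
{
	assert(flags != 0);	/* testing for "no flags" is meaningless */
	return (p->driver_flags & flags) != 0;
}
```

With the assignment variant, a second call silently drops whatever was
set by the first one, which is fine only if the flags are set once.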

thanks,

greg k-h


* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-16  1:12 [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume Rafael J. Wysocki
                   ` (11 preceding siblings ...)
  2017-10-16  1:32 ` [PATCH 12/12] PM / core: Add AVOID_RPM driver flag Rafael J. Wysocki
@ 2017-10-16  7:08 ` Greg Kroah-Hartman
  2017-10-16 21:50   ` Rafael J. Wysocki
  2017-10-17  8:36 ` Ulf Hansson
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 135+ messages in thread
From: Greg Kroah-Hartman @ 2017-10-16  7:08 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On Mon, Oct 16, 2017 at 03:12:35AM +0200, Rafael J. Wysocki wrote:
> Hi All,
> 
> Well, this took more time than expected, as I tried to cover everything I had
> in mind regarding PM flags for drivers.
> 
> This work was triggered by attempts to fix and optimize PM in the
> i2c-designware-platdrv driver that ended up with adding a couple of
> flags to the driver's internal data structures for the tracking of
> device state (https://marc.info/?l=linux-acpi&m=150629646805636&w=2).
> That approach is sort of suboptimal, though, because other drivers will
> probably want to do similar things and if all of them need to use internal
> flags for that, quite a bit of code duplication may ensue at least.
> 
> That can be avoided in a couple of ways and one of them is to provide a means
> for drivers to tell the core what to do and to make the core take care of it
> if told to do so.  Hence, the idea to use driver flags for system-wide PM
> that was briefly discussed during the LPC in LA last month.
> 
> One of the flags considered at that time was to possibly cause the core
> to reuse the runtime PM callback path of a device for system suspend/resume.
> Admittedly, that idea didn't look too bad to me until I had started to try to
> implement it and I got to the PCI bus type's hibernation callbacks.  Then, I
> moved the patch I was working on to /dev/null right away.  I mean it.
> 
> No, this is not going to happen.  No way.
> 
> Moreover, that experience made me realize that the whole *idea* of using the
> runtime PM callback path for system-wide PM was actually totally bogus (sorry
> Ulf).
> 
> The whole point of having different callbacks pointers for different types of
> device transitions is because it may be necessary to do different things in
> those callbacks in general.  Now, if you consider runtime PM and system
> suspend/resume *only* and from a driver perspective, then yes, in some cases
> the same pair of callback routines may be used for all suspend-like and
> resume-like transitions of the device, but if you add hibernation to the mix,
> then it is not so clear any more unless the callbacks don't actually do any
> power management at all, but simply quiesce the device's activity and then
> activate it again.  Namely, changing power states of devices during the
> hibernation's "freeze" and "thaw" transitions rarely makes sense at all and
> the "restore" transition needs to be able to cope with uninitialized devices
> (in fact, it should be prepared to cope with devices in *any* state), so
> runtime PM is hardly suitable for them.  Still, if a *driver* chooses to not
> do any real PM in its PM callbacks and leaves that to a middle layer (quite
> a few drivers do that), then it possibly can use one pair of callbacks in all
> cases and be happy, but middle layers pretty much have to use different
> callback routines for different transitions.
> 
> If you are a middle layer, your role is basically to do PM for a certain
> group of devices.  Thus you cannot really do the same in ->suspend or
> ->suspend_early and in ->runtime_suspend (because the former generally need to
> take device_may_wakeup() into account and the latter doesn't) and you shouldn't
> really do the same in ->suspend and ->freeze (because the latter shouldn't
> change the device's power state) and so on.  To put it bluntly, trying
> to use the ->runtime_suspend callback of a middle layer for anything other
> than runtime suspend is complete and utter nonsense.  At the same time, the
> ->runtime_resume callback of a middle layer may be reused to some extent,
> but even that doesn't cover the "thaw" transitions during hibernation.
> 
> What can work (and this is the only strategy that can work AFAICS) is to
> point different callback pointers *in* *a* *driver* to the same routine
> if the driver wants to reuse that code.  That actually will work for PCI
> and USB drivers today, at least most of the time, but unfortunately there
> are problems with it for, say, platform devices.
> 
> The first problem is the requirement to track the status of the device
> (suspended vs not suspended) in the callbacks, because the system-wide PM
> code in the PM core doesn't do that.  The runtime PM framework does it, so
> this means adding some extra code which isn't necessary for runtime PM to
> the callback routines and that is not particularly nice.
> 
> The second problem is that, if the driver wants to do anything in its
> ->suspend callback, it generally has to prevent runtime suspend of the
> device from taking place in parallel with that, which is quite cumbersome.
> Usually, that is taken care of by resuming the device from runtime suspend
> upfront, but generally doing that is wasteful (there may be no real need to
> resume the device except for the fact that the code is designed this way).
> 
> On top of the above, there are optimizations to be made, like leaving certain
> devices in suspend after system resume to avoid wasting time on waiting for
> them to resume before user space can run again and similar.
> 
> This patch series focuses on addressing those problems so as to make it
> easier to reuse callback routines by pointing different callback pointers
> to them in device drivers.  The flags introduced here are to instruct the
> PM core and middle layers (whatever they are) on how the driver wants the
> device to be handled and then the driver has to provide callbacks to match
> these instructions and the rest should be taken care of by the code above it.
> 
> The flags are introduced one by one to avoid making too many changes in
> one go and to allow things to be explained better (hopefully).  They mostly
> are mutually independent with some clearly documented exceptions.
> 
> The first three patches in the series are about an issue with the
> direct-complete optimization introduced some time ago in which some middle
> layers decide on whether or not to do the optimization without asking the
> drivers.  And, as it turns out, in some cases the drivers actually know
> better, so the new flags introduced by these patches are here for these
> drivers (and the DPM_FLAG_NEVER_SKIP one is really to avoid having to define
> ->prepare callbacks always returning zero).
> 
> The really interesting things start to happen in patches [4-9/12] which make it
> possible to avoid resuming devices from runtime suspend upfront during system
> suspend at least in some cases (and when direct-complete is not applied to the
> devices in question), but please refer to the changelogs for details.
> 
> The i2c-designware-platdrv driver is used as the primary example in the series
> and the patches modifying it are based on some previous changes currently in
> linux-next AFAICS (the same applies to the intel-lpss driver), but these
> patches can wait until everything is properly merged.  They are included here
> mostly as illustration.
> 
> Overall, the series is based on the linux-next branch of the linux-pm.git tree
> with some extra patches on top of it and all of the names of new entities
> introduced in it are negotiable.

Thanks for the great explanation, I was wondering how your proposal
discussed at Plumbers was going to work out in the end :)

The patch series looks good to me (minor questions already sent on the
patches), but what does this mean for drivers?  Do they now have to do a
lot of work to take advantage of this, like you did for the
i2c-designware-platdev driver?  Or will things continue to work as-is
and it's only an opt-in type thing where the bus/driver wants to take
advantage of it?

thanks,

greg k-h

* Re: [PATCH 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
  2017-10-16  1:29 ` [PATCH 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags Rafael J. Wysocki
                     ` (2 preceding siblings ...)
  2017-10-16  6:31   ` Greg Kroah-Hartman
@ 2017-10-16 20:16   ` Alan Stern
  2017-10-16 22:11     ` Rafael J. Wysocki
  2017-10-18 23:17   ` [Update][PATCH v2 " Rafael J. Wysocki
  4 siblings, 1 reply; 135+ messages in thread
From: Alan Stern @ 2017-10-16 20:16 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On Mon, 16 Oct 2017, Rafael J. Wysocki wrote:

> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> The motivation for this change is to provide a way to work around
> a problem with the direct-complete mechanism used for avoiding
> system suspend/resume handling for devices in runtime suspend.
> 
> The problem is that some middle layer code (the PCI bus type and
> the ACPI PM domain in particular) returns positive values from its
> system suspend ->prepare callbacks regardless of whether the driver's
> ->prepare returns a positive value or 0, which effectively prevents
> drivers from being able to control the direct-complete feature.
> Some drivers need that control, however, and the PCI bus type has
> grown its own flag to deal with this issue, but since it is not
> limited to PCI, it is better to address it by adding driver flags at
> the core level.

I'm curious: Why does the PCI bus type (and others) do this?  Why 
doesn't it do what the driver says to do?

Alan Stern

* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-16  7:08 ` [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume Greg Kroah-Hartman
@ 2017-10-16 21:50   ` Rafael J. Wysocki
  0 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-16 21:50 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Rafael J. Wysocki, Linux PM, Bjorn Helgaas, Alan Stern, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Ulf Hansson, Andy Shevchenko, Kevin Hilman, Wolfram Sang,
	linux-i2c, Lee Jones

On Mon, Oct 16, 2017 at 9:08 AM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Mon, Oct 16, 2017 at 03:12:35AM +0200, Rafael J. Wysocki wrote:
>> Hi All,
>>
>> Well, this took more time than expected, as I tried to cover everything I had
>> in mind regarding PM flags for drivers.
>>
>> This work was triggered by attempts to fix and optimize PM in the
>> i2c-designware-platdev driver that ended up with adding a couple of
>> flags to the driver's internal data structures for the tracking of
>> device state (https://marc.info/?l=linux-acpi&m=150629646805636&w=2).
>> That approach is sort of suboptimal, though, because other drivers will
>> probably want to do similar things and if all of them need to use internal
>> flags for that, quite a bit of code duplication may ensue at least.
>>
>> That can be avoided in a couple of ways and one of them is to provide a means
>> for drivers to tell the core what to do and to make the core take care of it
>> if told to do so.  Hence, the idea to use driver flags for system-wide PM
>> that was briefly discussed during the LPC in LA last month.
>>
>> One of the flags considered at that time was to possibly cause the core
>> to reuse the runtime PM callback path of a device for system suspend/resume.
>> Admittedly, that idea didn't look too bad to me until I had started to try to
>> implement it and I got to the PCI bus type's hibernation callbacks.  Then, I
>> moved the patch I was working on to /dev/null right away.  I mean it.
>>
>> No, this is not going to happen.  No way.
>>
>> Moreover, that experience made me realize that the whole *idea* of using the
>> runtime PM callback path for system-wide PM was actually totally bogus (sorry
>> Ulf).
>>
>> The whole point of having different callback pointers for different types of
>> device transitions is that it may be necessary to do different things in
>> those callbacks in general.  Now, if you consider runtime PM and system
>> suspend/resume *only* and from a driver perspective, then yes, in some cases
>> the same pair of callback routines may be used for all suspend-like and
>> resume-like transitions of the device, but if you add hibernation to the mix,
>> then it is not so clear any more unless the callbacks don't actually do any
>> power management at all, but simply quiesce the device's activity and then
>> activate it again.  Namely, changing power states of devices during the
>> hibernation's "freeze" and "thaw" transitions rarely makes sense at all and
>> the "restore" transition needs to be able to cope with uninitialized devices
>> (in fact, it should be prepared to cope with devices in *any* state), so
>> runtime PM is hardly suitable for them.  Still, if a *driver* chooses not to
>> do any real PM in its PM callbacks and leaves that to a middle layer (quite
>> a few drivers do that), then it possibly can use one pair of callbacks in all
>> cases and be happy, but middle layers pretty much have to use different
>> callback routines for different transitions.
>>
>> If you are a middle layer, your role is basically to do PM for a certain
>> group of devices.  Thus you cannot really do the same in ->suspend or
>> ->suspend_early and in ->runtime_suspend (because the former generally need to
>> take device_may_wakeup() into account and the latter doesn't) and you shouldn't
>> really do the same in ->suspend and ->freeze (because the latter shouldn't
>> change the device's power state) and so on.  To put it bluntly, trying
>> to use the ->runtime_suspend callback of a middle layer for anything other
>> than runtime suspend is complete and utter nonsense.  At the same time, the
>> ->runtime_resume callback of a middle layer may be reused to some extent,
>> but even that doesn't cover the "thaw" transitions during hibernation.
>>
>> What can work (and this is the only strategy that can work AFAICS) is to
>> point different callback pointers *in* *a* *driver* to the same routine
>> if the driver wants to reuse that code.  That actually will work for PCI
>> and USB drivers today, at least most of the time, but unfortunately there
>> are problems with it for, say, platform devices.
>>
>> The first problem is the requirement to track the status of the device
>> (suspended vs not suspended) in the callbacks, because the system-wide PM
>> code in the PM core doesn't do that.  The runtime PM framework does it, so
>> this means adding some extra code which isn't necessary for runtime PM to
>> the callback routines and that is not particularly nice.
>>
>> The second problem is that, if the driver wants to do anything in its
>> ->suspend callback, it generally has to prevent runtime suspend of the
>> device from taking place in parallel with that, which is quite cumbersome.
>> Usually, that is taken care of by resuming the device from runtime suspend
>> upfront, but generally doing that is wasteful (there may be no real need to
>> resume the device except for the fact that the code is designed this way).
>>
>> On top of the above, there are optimizations to be made, like leaving certain
>> devices in suspend after system resume to avoid wasting time on waiting for
>> them to resume before user space can run again and similar.
>>
>> This patch series focuses on addressing those problems so as to make it
>> easier to reuse callback routines by pointing different callback pointers
>> to them in device drivers.  The flags introduced here are to instruct the
>> PM core and middle layers (whatever they are) on how the driver wants the
>> device to be handled and then the driver has to provide callbacks to match
>> these instructions and the rest should be taken care of by the code above it.
>>
>> The flags are introduced one by one to avoid making too many changes in
>> one go and to allow things to be explained better (hopefully).  They mostly
>> are mutually independent with some clearly documented exceptions.
>>
>> The first three patches in the series are about an issue with the
>> direct-complete optimization introduced some time ago in which some middle
>> layers decide on whether or not to do the optimization without asking the
>> drivers.  And, as it turns out, in some cases the drivers actually know
>> better, so the new flags introduced by these patches are here for these
>> drivers (and the DPM_FLAG_NEVER_SKIP one is really to avoid having to define
>> ->prepare callbacks always returning zero).
>>
>> The really interesting things start to happen in patches [4-9/12] which make it
>> possible to avoid resuming devices from runtime suspend upfront during system
>> suspend at least in some cases (and when direct-complete is not applied to the
>> devices in question), but please refer to the changelogs for details.
>>
>> The i2c-designware-platdev driver is used as the primary example in the series
>> and the patches modifying it are based on some previous changes currently in
>> linux-next AFAICS (the same applies to the intel-lpss driver), but these
>> patches can wait until everything is properly merged.  They are included here
>> mostly as illustration.
>>
>> Overall, the series is based on the linux-next branch of the linux-pm.git tree
>> with some extra patches on top of it and all of the names of new entities
>> introduced in it are negotiable.
>
> Thanks for the great explanation, I was wondering how your proposal
> discussed at Plumbers was going to work out in the end :)
>
> The patch series looks good to me (minor questions already sent on the
> patches),

Cool. :-)

> but what does this mean for drivers?  Do they now have to do a
> lot of work to take advantage of this, like you did for the
> i2c-designware-platdev driver?  Or will things continue to work as-is
> and it's only an opt-in type thing where the bus/driver wants to take
> advantage of it?

It's envisioned as an opt-in thing mostly, except for the flags
introduced by patch [01/12] that may be needed to address existing
issues.

It is not strictly necessary to set any of the other flags, but I
guess some use cases may benefit quite a bit from setting them. :-)

Thanks,
Rafael

* Re: [PATCH 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
  2017-10-16  5:34   ` Lukas Wunner
@ 2017-10-16 22:03     ` Rafael J. Wysocki
  0 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-16 22:03 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Ulf Hansson, Andy Shevchenko, Kevin Hilman, Wolfram Sang,
	linux-i2c, Lee Jones

On Monday, October 16, 2017 7:34:52 AM CEST Lukas Wunner wrote:
> On Mon, Oct 16, 2017 at 03:29:02AM +0200, Rafael J. Wysocki wrote:
> > +	:c:func:`dev_pm_set_driver_flags` helper function.]  If the first of
> > +	tese flags is set, the PM core will not apply the direct-complete
>         ^
> 	these
> 
> > +	proceudre described above to the given device and, consequenty, to any
>         ^
>         procedure
> 

Thanks!

* Re: [PATCH 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
  2017-10-16  6:28   ` Greg Kroah-Hartman
@ 2017-10-16 22:05     ` Rafael J. Wysocki
  2017-10-17  7:15       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-16 22:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On Monday, October 16, 2017 8:28:52 AM CEST Greg Kroah-Hartman wrote:
> On Mon, Oct 16, 2017 at 03:29:02AM +0200, Rafael J. Wysocki wrote:
> >  struct dev_pm_info {
> >  	pm_message_t		power_state;
> >  	unsigned int		can_wakeup:1;
> > @@ -561,6 +580,7 @@ struct dev_pm_info {
> >  	bool			is_late_suspended:1;
> >  	bool			early_init:1;	/* Owned by the PM core */
> >  	bool			direct_complete:1;	/* Owned by the PM core */
> > +	unsigned int		driver_flags;
> 
> Minor nit, u32 or u64?

u32 I think, will update.

BTW, there's a mess in this struct overall and I'd like all of the bit fields
to be the same type (and that shouldn't be bool IMO :-)).

Do you prefer u32 or unsigned int?

Thanks,
Rafael

* Re: [PATCH 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
  2017-10-16  6:31   ` Greg Kroah-Hartman
@ 2017-10-16 22:07     ` Rafael J. Wysocki
  2017-10-17 13:26       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-16 22:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On Monday, October 16, 2017 8:31:22 AM CEST Greg Kroah-Hartman wrote:
> On Mon, Oct 16, 2017 at 03:29:02AM +0200, Rafael J. Wysocki wrote:
> > +static inline void dev_pm_set_driver_flags(struct device *dev, unsigned int flags)
> > +{
> > +	dev->power.driver_flags = flags;
> > +}
> 
> Should this function just set the specific bit?  Or is it going to be ok
> to set the whole value, meaning you aren't going to care about turning
> on and off specific flags over the lifetime of the driver/device, you
> are just going to set them once and then just test them as needed?

The idea is to set them once and they should not be touched again until
the driver (or device) goes away, so that would be the whole value at once
(and one of the i2c-designware-platdrv patches actually sets multiple flags
in one go).
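
As a rough user-space illustration of the "set once, whole value" idea
(not kernel code): the structures below are minimal stand-ins, the bit
values of the two flags are made up for the example, and
model_has_driver_flags() is a hypothetical helper that is not part of
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit values; only the flag names come from this thread. */
#define DPM_FLAG_NEVER_SKIP	(1U << 0)
#define DPM_FLAG_SMART_PREPARE	(1U << 1)

/* Minimal user-space stand-ins for the kernel structures. */
struct dev_pm_info {
	uint32_t driver_flags;	/* u32, as discussed in the review */
};

struct device {
	struct dev_pm_info power;
};

/* Same shape as the helper in the patch: overwrite the whole value. */
static inline void dev_pm_set_driver_flags(struct device *dev, uint32_t flags)
{
	assert(dev != NULL);
	dev->power.driver_flags = flags;
}

/* Hypothetical helper: true only if every bit in @flags is set. */
static inline bool model_has_driver_flags(const struct device *dev,
					  uint32_t flags)
{
	assert(dev != NULL);
	return (dev->power.driver_flags & flags) == flags;
}
```

A driver would then make a single call at probe time, for example
dev_pm_set_driver_flags(dev, DPM_FLAG_NEVER_SKIP | DPM_FLAG_SMART_PREPARE),
and only test the flags afterwards.  Note that a second call replaces
the earlier value instead of OR-ing into it.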

Thanks,
Rafael

* Re: [PATCH 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
  2017-10-16 20:16   ` Alan Stern
@ 2017-10-16 22:11     ` Rafael J. Wysocki
  0 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-16 22:11 UTC (permalink / raw)
  To: Alan Stern
  Cc: Linux PM, Bjorn Helgaas, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On Monday, October 16, 2017 10:16:15 PM CEST Alan Stern wrote:
> On Mon, 16 Oct 2017, Rafael J. Wysocki wrote:
> 
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > 
> > The motivation for this change is to provide a way to work around
> > a problem with the direct-complete mechanism used for avoiding
> > system suspend/resume handling for devices in runtime suspend.
> > 
> > The problem is that some middle layer code (the PCI bus type and
> > the ACPI PM domain in particular) returns positive values from its
> > system suspend ->prepare callbacks regardless of whether the driver's
> > ->prepare returns a positive value or 0, which effectively prevents
> > drivers from being able to control the direct-complete feature.
> > Some drivers need that control, however, and the PCI bus type has
> > grown its own flag to deal with this issue, but since it is not
> > limited to PCI, it is better to address it by adding driver flags at
> > the core level.
> 
> I'm curious: Why does the PCI bus type (and others) do this?  Why 
> doesn't it do what the driver says to do?

Well, the idea was that it might work for the existing drivers without the
need to modify them (and they would have had to be modified had the driver's
->prepare return value been required to be taken into account).

It actually does work for them in general, although with some notable
exceptions.
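
To make the difference concrete, here is a simplified user-space model
of a middle-layer ->prepare with and without the new flags.  It is only
a sketch under assumptions: the structure, the function names and the
flag bit values are invented for the example, and the SMART_PREPARE
behavior shown (a driver's ->prepare returning 0 vetoes direct-complete)
is my reading of the intended semantics, not a copy of the PCI or ACPI
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit values; only the flag names come from this thread. */
#define DPM_FLAG_NEVER_SKIP	(1U << 0)
#define DPM_FLAG_SMART_PREPARE	(1U << 1)

/* Hypothetical stand-in for a device as seen by a middle layer. */
struct model_device {
	uint32_t driver_flags;
	bool runtime_suspended;
	int (*driver_prepare)(struct model_device *dev);
};

/*
 * Model of a middle-layer ->prepare.  Without SMART_PREPARE the
 * driver's return value only matters if it is an error; with it, a
 * return value of 0 from the driver is passed up unchanged.
 */
static int model_middle_layer_prepare(struct model_device *dev)
{
	assert(dev != NULL);

	if (dev->driver_prepare) {
		int ret = dev->driver_prepare(dev);

		if (ret < 0)
			return ret;
		if (ret == 0 && (dev->driver_flags & DPM_FLAG_SMART_PREPARE))
			return 0;
	}
	/* Legacy behavior: positive if the device is runtime-suspended. */
	return dev->runtime_suspended ? 1 : 0;
}

/* Model of the core's decision: NEVER_SKIP always wins. */
static bool model_direct_complete(struct model_device *dev)
{
	int ret = model_middle_layer_prepare(dev);

	return ret > 0 && !(dev->driver_flags & DPM_FLAG_NEVER_SKIP);
}

/* Example driver callbacks used to exercise the model. */
static int prepare_returns_zero(struct model_device *dev)
{
	(void)dev;
	return 0;
}

static int prepare_returns_one(struct model_device *dev)
{
	(void)dev;
	return 1;
}
```

In this model an unmodified driver keeps the old behavior, which is the
point made above: nothing changes unless the driver opts in by setting
a flag.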

Thanks,
Rafael

* Re: [PATCH 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
  2017-10-16 22:05     ` Rafael J. Wysocki
@ 2017-10-17  7:15       ` Greg Kroah-Hartman
  2017-10-17 15:26         ` Rafael J. Wysocki
  0 siblings, 1 reply; 135+ messages in thread
From: Greg Kroah-Hartman @ 2017-10-17  7:15 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On Tue, Oct 17, 2017 at 12:05:11AM +0200, Rafael J. Wysocki wrote:
> On Monday, October 16, 2017 8:28:52 AM CEST Greg Kroah-Hartman wrote:
> > On Mon, Oct 16, 2017 at 03:29:02AM +0200, Rafael J. Wysocki wrote:
> > >  struct dev_pm_info {
> > >  	pm_message_t		power_state;
> > >  	unsigned int		can_wakeup:1;
> > > @@ -561,6 +580,7 @@ struct dev_pm_info {
> > >  	bool			is_late_suspended:1;
> > >  	bool			early_init:1;	/* Owned by the PM core */
> > >  	bool			direct_complete:1;	/* Owned by the PM core */
> > > +	unsigned int		driver_flags;
> > 
> > Minor nit, u32 or u64?
> 
> u32 I think, will update.
> 
> BTW, there's a mess in this struct overall and I'd like all of the bit fields
> to be the same type (and that shouldn't be bool IMO :-)).
> 
> Do you prefer u32 or unsigned int?

I always prefer an explicit size for variables, unless it's a "generic
loop" type thing.  So I'll always say "u32" for this.

And cleaning up the structure would be great, it's grown over time in
odd ways as you point out.

thanks,

greg k-h

* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-16  1:12 [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume Rafael J. Wysocki
                   ` (12 preceding siblings ...)
  2017-10-16  7:08 ` [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume Greg Kroah-Hartman
@ 2017-10-17  8:36 ` Ulf Hansson
  2017-10-17 15:25   ` Rafael J. Wysocki
  2017-10-20 20:46 ` Bjorn Helgaas
  2017-10-27 22:11 ` [PATCH v2 0/6] PM / sleep: Driver flags for system suspend/resume (part 1) Rafael J. Wysocki
  15 siblings, 1 reply; 135+ messages in thread
From: Ulf Hansson @ 2017-10-17  8:36 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On 16 October 2017 at 03:12, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> Hi All,
>
> Well, this took more time than expected, as I tried to cover everything I had
> in mind regarding PM flags for drivers.
>
> This work was triggered by attempts to fix and optimize PM in the
> i2c-designware-platdev driver that ended up with adding a couple of
> flags to the driver's internal data structures for the tracking of
> device state (https://marc.info/?l=linux-acpi&m=150629646805636&w=2).
> That approach is sort of suboptimal, though, because other drivers will
> probably want to do similar things and if all of them need to use internal
> flags for that, quite a bit of code duplication may ensue at least.
>
> That can be avoided in a couple of ways and one of them is to provide a means
> for drivers to tell the core what to do and to make the core take care of it
> if told to do so.  Hence, the idea to use driver flags for system-wide PM
> that was briefly discussed during the LPC in LA last month.
>
> One of the flags considered at that time was to possibly cause the core
> to reuse the runtime PM callback path of a device for system suspend/resume.
> Admittedly, that idea didn't look too bad to me until I had started to try to
> implement it and I got to the PCI bus type's hibernation callbacks.  Then, I
> moved the patch I was working on to /dev/null right away.  I mean it.
>
> No, this is not going to happen.  No way.
>
> Moreover, that experience made me realize that the whole *idea* of using the
> runtime PM callback path for system-wide PM was actually totally bogus (sorry
> Ulf).
>
> The whole point of having different callback pointers for different types of
> device transitions is that it may be necessary to do different things in
> those callbacks in general.  Now, if you consider runtime PM and system
> suspend/resume *only* and from a driver perspective, then yes, in some cases
> the same pair of callback routines may be used for all suspend-like and
> resume-like transitions of the device, but if you add hibernation to the mix,
> then it is not so clear any more unless the callbacks don't actually do any
> power management at all, but simply quiesce the device's activity and then
> activate it again.  Namely, changing power states of devices during the
> hibernation's "freeze" and "thaw" transitions rarely makes sense at all and
> the "restore" transition needs to be able to cope with uninitialized devices
> (in fact, it should be prepared to cope with devices in *any* state), so
> runtime PM is hardly suitable for them.  Still, if a *driver* chooses not to
> do any real PM in its PM callbacks and leaves that to a middle layer (quite
> a few drivers do that), then it possibly can use one pair of callbacks in all
> cases and be happy, but middle layers pretty much have to use different
> callback routines for different transitions.
>
> If you are a middle layer, your role is basically to do PM for a certain
> group of devices.  Thus you cannot really do the same in ->suspend or
> ->suspend_early and in ->runtime_suspend (because the former generally need to
> take device_may_wakeup() into account and the latter doesn't) and you shouldn't
> really do the same in ->suspend and ->freeze (because the latter shouldn't
> change the device's power state) and so on.  To put it bluntly, trying
> to use the ->runtime_suspend callback of a middle layer for anything other
> than runtime suspend is complete and utter nonsense.  At the same time, the
> ->runtime_resume callback of a middle layer may be reused to some extent,
> but even that doesn't cover the "thaw" transitions during hibernation.
>
> What can work (and this is the only strategy that can work AFAICS) is to
> point different callback pointers *in* *a* *driver* to the same routine
> if the driver wants to reuse that code.  That actually will work for PCI
> and USB drivers today, at least most of the time, but unfortunately there
> are problems with it for, say, platform devices.
>
> The first problem is the requirement to track the status of the device
> (suspended vs not suspended) in the callbacks, because the system-wide PM
> code in the PM core doesn't do that.  The runtime PM framework does it, so
> this means adding some extra code which isn't necessary for runtime PM to
> the callback routines and that is not particularly nice.
>
> The second problem is that, if the driver wants to do anything in its
> ->suspend callback, it generally has to prevent runtime suspend of the
> device from taking place in parallel with that, which is quite cumbersome.
> Usually, that is taken care of by resuming the device from runtime suspend
> upfront, but generally doing that is wasteful (there may be no real need to
> resume the device except for the fact that the code is designed this way).
>
> On top of the above, there are optimizations to be made, like leaving certain
> devices in suspend after system resume to avoid wasting time on waiting for
> them to resume before user space can run again and similar.
>
> This patch series focuses on addressing those problems so as to make it
> easier to reuse callback routines by pointing different callback pointers
> to them in device drivers.  The flags introduced here are to instruct the
> PM core and middle layers (whatever they are) on how the driver wants the
> device to be handled and then the driver has to provide callbacks to match
> these instructions and the rest should be taken care of by the code above it.
>
> The flags are introduced one by one to avoid making too many changes in
> one go and to allow things to be explained better (hopefully).  They mostly
> are mutually independent with some clearly documented exceptions.
>
> The first three patches in the series are about an issue with the
> direct-complete optimization introduced some time ago in which some middle
> layers decide on whether or not to do the optimization without asking the
> drivers.  And, as it turns out, in some cases the drivers actually know
> better, so the new flags introduced by these patches are here for these
> drivers (and the DPM_FLAG_NEVER_SKIP one is really to avoid having to define
> ->prepare callbacks always returning zero).
>
> The really interesting things start to happen in patches [4-9/12] which make it
> possible to avoid resuming devices from runtime suspend upfront during system
> suspend at least in some cases (and when direct-complete is not applied to the
> devices in question), but please refer to the changelogs for details.
>
> The i2c-designware-platdev driver is used as the primary example in the series
> and the patches modifying it are based on some previous changes currently in
> linux-next AFAICS (the same applies to the intel-lpss driver), but these
> patches can wait until everything is properly merged.  They are included here
> mostly as illustration.
>
> Overall, the series is based on the linux-next branch of the linux-pm.git tree
> with some extra patches on top of it and all of the names of new entities
> introduced in it are negotiable.
>
> Thanks,
> Rafael
>

I am not sure I fully understand the goal you have with this series.
Can we please try to get that clear before I continue the review?

Now, re-using runtime PM callbacks for system sleep is already
happening. We have > 60 users (git grep "pm_runtime_force_suspend")
deploying this and, from a middle layer point of view, all the trivial
cases support this, like the spi bus, i2c bus, amba bus, platform
bus, genpd, etc. There are no changes needed to continue to support
this option, if you see what I mean.
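
For readers not familiar with that pattern, the reuse can be modelled
in user space roughly as below.  This is a simplified sketch, not the
kernel's pm_runtime_force_suspend|resume(): the model_* names and the
structure are invented for the example, and details such as disabling
and re-enabling runtime PM or usage counting are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device with runtime PM callbacks. */
struct model_device {
	bool runtime_suspended;
	int suspend_calls;	/* how often the suspend callback ran */
	int resume_calls;	/* how often the resume callback ran */
	int (*runtime_suspend)(struct model_device *dev);
	int (*runtime_resume)(struct model_device *dev);
};

/*
 * System suspend path: reuse the runtime suspend callback, but skip
 * it if the device is already runtime-suspended.
 */
static int model_force_suspend(struct model_device *dev)
{
	int ret;

	assert(dev != NULL);
	if (dev->runtime_suspended)
		return 0;

	ret = dev->runtime_suspend ? dev->runtime_suspend(dev) : 0;
	if (ret)
		return ret;

	dev->runtime_suspended = true;
	return 0;
}

/* System resume path: reuse the runtime resume callback. */
static int model_force_resume(struct model_device *dev)
{
	int ret;

	assert(dev != NULL);
	if (!dev->runtime_suspended)
		return 0;

	ret = dev->runtime_resume ? dev->runtime_resume(dev) : 0;
	if (ret)
		return ret;

	dev->runtime_suspended = false;
	return 0;
}

/* Example driver callbacks that only count their invocations. */
static int demo_runtime_suspend(struct model_device *dev)
{
	dev->suspend_calls++;
	return 0;
}

static int demo_runtime_resume(struct model_device *dev)
{
	dev->resume_calls++;
	return 0;
}
```

The point of the model is that one pair of driver callbacks serves both
runtime PM and system sleep, and that the callback is skipped when the
device is already in the target state.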

So, when you say that re-using runtime PM callbacks for system-wide PM
isn't going to happen, can you please elaborate on what you mean?

I assume you mean that the PM core won't be involved to support this,
but is that it?

Do you also mean that *all* users of pm_runtime_force_suspend|resume()
must convert to this new thing, using "driver PM flags", so in the end
you want to remove pm_runtime_force_suspend|resume()?
 - Then if so, you must of course consider all cases for how
pm_runtime_force_suspend|resume() are being deployed currently, else
existing users can't convert to the "driver PM flags" thing. Have you
done that in this series?

Kind regards
Uffe

* Re: [PATCH 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
  2017-10-16 22:07     ` Rafael J. Wysocki
@ 2017-10-17 13:26       ` Greg Kroah-Hartman
  0 siblings, 0 replies; 135+ messages in thread
From: Greg Kroah-Hartman @ 2017-10-17 13:26 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On Tue, Oct 17, 2017 at 12:07:37AM +0200, Rafael J. Wysocki wrote:
> On Monday, October 16, 2017 8:31:22 AM CEST Greg Kroah-Hartman wrote:
> > On Mon, Oct 16, 2017 at 03:29:02AM +0200, Rafael J. Wysocki wrote:
> > > +static inline void dev_pm_set_driver_flags(struct device *dev, unsigned int flags)
> > > +{
> > > +	dev->power.driver_flags = flags;
> > > +}
> > 
> > Should this function just set the specific bit?  Or is it going to be ok
> > to set the whole value, meaning you aren't going to care about turning
> > on and off specific flags over the lifetime of the driver/device, you
> > are just going to set them once and then just test them as needed?
> 
> The idea is to set them once and they should not be touched again until
> the driver (or device) goes away, so that would be the whole value at once
> (and one of the i2c-designware-platdrv patches actually sets multiple flags
> in one go).

Ok, thanks.

* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-17  8:36 ` Ulf Hansson
@ 2017-10-17 15:25   ` Rafael J. Wysocki
  2017-10-17 19:41     ` Ulf Hansson
  0 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-17 15:25 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On Tuesday, October 17, 2017 10:36:39 AM CEST Ulf Hansson wrote:
> On 16 October 2017 at 03:12, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > Hi All,
> >
> > Well, this took more time than expected, as I tried to cover everything I had
> > in mind regarding PM flags for drivers.
> >
> > This work was triggered by attempts to fix and optimize PM in the
> > i2c-designware-platdev driver that ended up with adding a couple of
> > flags to the driver's internal data structures for the tracking of
> > device state (https://marc.info/?l=linux-acpi&m=150629646805636&w=2).
> > That approach is sort of suboptimal, though, because other drivers will
> > probably want to do similar things and if all of them need to use internal
> > flags for that, quite a bit of code duplication may ensue at least.
> >
> > That can be avoided in a couple of ways and one of them is to provide a means
> > for drivers to tell the core what to do and to make the core take care of it
> > if told to do so.  Hence, the idea to use driver flags for system-wide PM
> > that was briefly discussed during the LPC in LA last month.
> >
> > One of the flags considered at that time was to possibly cause the core
> > to reuse the runtime PM callback path of a device for system suspend/resume.
> > Admittedly, that idea didn't look too bad to me until I had started to try to
> > implement it and I got to the PCI bus type's hibernation callbacks.  Then, I
> > moved the patch I was working on to /dev/null right away.  I mean it.
> >
> > No, this is not going to happen.  No way.
> >
> > Moreover, that experience made me realize that the whole *idea* of using the
> > runtime PM callback path for system-wide PM was actually totally bogus (sorry
> > Ulf).
> >
> > The whole point of having different callback pointers for different types of
> > device transitions is that it may be necessary to do different things in
> > those callbacks in general.  Now, if you consider runtime PM and system
> > suspend/resume *only* and from a driver perspective, then yes, in some cases
> > the same pair of callback routines may be used for all suspend-like and
> > resume-like transitions of the device, but if you add hibernation to the mix,
> > then it is not so clear any more unless the callbacks don't actually do any
> > power management at all, but simply quiesce the device's activity and then
> > activate it again.  Namely, changing power states of devices during the
> > hibernation's "freeze" and "thaw" transitions rarely makes sense at all and
> > the "restore" transition needs to be able to cope with uninitialized devices
> > (in fact, it should be prepared to cope with devices in *any* state), so
> > runtime PM is hardly suitable for them.  Still, if a *driver* chooses to not
> > do any real PM in its PM callbacks and leaves that to a middle layer (quite
> > a few drivers do that), then it possibly can use one pair of callbacks in all
> > cases and be happy, but middle layers pretty much have to use different
> > callback routines for different transitions.
> >
> > If you are a middle layer, your role is basically to do PM for a certain
> > group of devices.  Thus you cannot really do the same in ->suspend or
> > ->suspend_early and in ->runtime_suspend (because the former generally need to
> > take device_may_wakeup() into account and the latter doesn't) and you shouldn't
> > really do the same in ->suspend and ->freeze (because the latter shouldn't
> > change the device's power state) and so on.  To put it bluntly, trying
> > to use the ->runtime_suspend callback of a middle layer for anything other
> > than runtime suspend is complete and utter nonsense.  At the same time, the
> > ->runtime_resume callback of a middle layer may be reused to some extent,
> > but even that doesn't cover the "thaw" transitions during hibernation.
> >
> > What can work (and this is the only strategy that can work AFAICS) is to
> > point different callback pointers *in* *a* *driver* to the same routine
> > if the driver wants to reuse that code.  That actually will work for PCI
> > and USB drivers today, at least most of the time, but unfortunately there
> > are problems with it for, say, platform devices.
> >
> > The first problem is the requirement to track the status of the device
> > (suspended vs not suspended) in the callbacks, because the system-wide PM
> > code in the PM core doesn't do that.  The runtime PM framework does it, so
> > this means adding some extra code which isn't necessary for runtime PM to
> > the callback routines and that is not particularly nice.
> >
> > The second problem is that, if the driver wants to do anything in its
> > ->suspend callback, it generally has to prevent runtime suspend of the
> > device from taking place in parallel with that, which is quite cumbersome.
> > Usually, that is taken care of by resuming the device from runtime suspend
> > upfront, but generally doing that is wasteful (there may be no real need to
> > resume the device except for the fact that the code is designed this way).
> >
> > On top of the above, there are optimizations to be made, like leaving certain
> > devices in suspend after system resume to avoid wasting time on waiting for
> > them to resume before user space can run again and similar.
> >
> > This patch series focuses on addressing those problems so as to make it
> > easier to reuse callback routines by pointing different callback pointers
> > to them in device drivers.  The flags introduced here are to instruct the
> > PM core and middle layers (whatever they are) on how the driver wants the
> > device to be handled and then the driver has to provide callbacks to match
> > these instructions and the rest should be taken care of by the code above it.
> >
> > The flags are introduced one by one to avoid making too many changes in
> > one go and to allow things to be explained better (hopefully).  They mostly
> > are mutually independent with some clearly documented exceptions.
> >
> > The first three patches in the series are about an issue with the
> > direct-complete optimization introduced some time ago in which some middle
> > layers decide on whether or not to do the optimization without asking the
> > drivers.  And, as it turns out, in some cases the drivers actually know
> > better, so the new flags introduced by these patches are here for these
> > drivers (and the DPM_FLAG_NEVER_SKIP one is really to avoid having to define
> > ->prepare callbacks always returning zero).
> >
> > The really interesting things start to happen in patches [4-9/12] which make it
> > possible to avoid resuming devices from runtime suspend upfront during system
> > suspend at least in some cases (and when direct-complete is not applied to the
> > devices in question), but please refer to the changelogs for details.
> >
> > The i2c-designware-platdrv driver is used as the primary example in the series
> > and the patches modifying it are based on some previous changes currently in
> > linux-next AFAICS (the same applies to the intel-lpss driver), but these
> > patches can wait until everything is properly merged.  They are included here
> > mostly as illustration.
> >
> > Overall, the series is based on the linux-next branch of the linux-pm.git tree
> > with some extra patches on top of it and all of the names of new entities
> > introduced in it are negotiable.
> >
> > Thanks,
> > Rafael
> >
> 
> I am not sure I fully understand the goal you have with this series.
> Can we please try to get that clear before I continue the review.

Quoting from the above:

"This patch series focuses on addressing those problems so as to make it
easier to reuse callback routines by pointing different callback pointers
to them in device drivers.  The flags introduced here are to instruct the
PM core and middle layers (whatever they are) on how the driver wants the
device to be handled and then the driver has to provide callbacks to match
these instructions and the rest should be taken care of by the code above it."

I'm not sure what I can explain beyond that. :-)

And the i2c-designware-platdrv and intel-lpss patches show the direction
I would like to take with that going forward: use the flags to reduce code
duplication in drivers and between drivers.

> Now, re-using runtime PM callbacks for system sleep, is already
> happening. We have > 60 users (git grep "pm_runtime_force_suspend")

60 is a small number relative to the total number of device drivers in
the tree.  In particular, that scheme is totally unsuitable for PCI drivers,
and how many of those are there?  Surely more than 60.

> deploying this and from a middle layer point of view, all the trivial
> cases support this.

These functions are wrong, however, because they attempt to reuse the
whole callback *path* instead of just reusing driver callbacks.  The
*only* reason why it all "works" is because there are no middle layer
callbacks involved in that now.

If you changed them to reuse driver callbacks only today, nothing would break
AFAICS.

> Like the spi bus, i2c bus, amba bus, platform
> bus, genpd, etc. There are no changes needed to continue to support
> this option, if you see what I mean.

For the time being, nothing changes in that respect, but eventually I'd
prefer the pm_runtime_force_* things to go away, frankly.

> So, when you say that re-using runtime PM callbacks for system-wide PM
> isn't going to happen, can you please elaborate what you mean?

I didn't mean "reusing runtime PM callbacks for system-wide PM" overall, but
reusing *middle-layer* runtime PM callbacks for system-wide PM.  That is the
bogus part.

Quoting again:

"If you are a middle layer, your role is basically to do PM for a certain
group of devices.  Thus you cannot really do the same in ->suspend or
->suspend_early and in ->runtime_suspend (because the former generally need to
take device_may_wakeup() into account and the latter doesn't) and you shouldn't
really do the same in ->suspend and ->freeze (because the latter shouldn't
change the device's power state) and so on."

I have said multiple times that reusing *driver* callbacks actually makes
sense, and the series is meant to make that easier in general, among other things.
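
To illustrate the quoted point, here is a rough userspace sketch of why a
middle layer ends up with different routines for different transitions.
Everything in it is hypothetical (the demo_* names, the struct and the two
power states are invented for this example and are not kernel APIs); it only
models the three differences named above: system suspend consults the wakeup
setting, runtime suspend does not, and "freeze" leaves the power state alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code. */
enum demo_power_state { DEMO_FULL_ON, DEMO_LOW_POWER };

struct demo_device {
	bool may_wakeup;	/* stands in for device_may_wakeup() */
	bool wakeup_armed;
	bool quiesced;
	enum demo_power_state state;
};

/* The part a driver could share between all suspend-like transitions. */
static void demo_driver_quiesce(struct demo_device *dev)
{
	dev->quiesced = true;
}

/* System suspend: wakeup is armed only if the wakeup setting allows it. */
static int demo_bus_suspend(struct demo_device *dev)
{
	demo_driver_quiesce(dev);
	dev->wakeup_armed = dev->may_wakeup;
	dev->state = DEMO_LOW_POWER;
	return 0;
}

/*
 * Runtime suspend: the wakeup setting is not consulted.  Arming wakeup
 * unconditionally here is an assumption made for the sake of the example.
 */
static int demo_bus_runtime_suspend(struct demo_device *dev)
{
	demo_driver_quiesce(dev);
	dev->wakeup_armed = true;
	dev->state = DEMO_LOW_POWER;
	return 0;
}

/* Hibernation "freeze": quiesce only, the power state is left unchanged. */
static int demo_bus_freeze(struct demo_device *dev)
{
	demo_driver_quiesce(dev);
	return 0;
}

/* The same starting device ends up in three different states. */
static void demo_check(void)
{
	struct demo_device a = { .may_wakeup = false, .state = DEMO_FULL_ON };
	struct demo_device b = a, c = a;

	demo_bus_suspend(&a);
	demo_bus_runtime_suspend(&b);
	demo_bus_freeze(&c);

	assert(!a.wakeup_armed && a.state == DEMO_LOW_POWER);
	assert(b.wakeup_armed && b.state == DEMO_LOW_POWER);
	assert(c.quiesced && c.state == DEMO_FULL_ON);
}
```

Note that the driver-level piece (demo_driver_quiesce) is identical in all
three routines, whereas the middle-layer pieces around it are not.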

> I assume you mean that the PM core won't be involved to support this,
> but is that it?
> 
> Do you also mean that *all* users of pm_runtime_force_suspend|resume()
> must convert to this new thing, using "driver PM flags", so in the end
> you want to remove pm_runtime_force_suspend|resume()?
>  - Then if so, you must of course consider all cases for how
> pm_runtime_force_suspend|resume() are being deployed currently, else
> existing users can't convert to the "driver PM flags" thing. Have you
> done that in this series?

Let me turn this around.

The majority of cases in which pm_runtime_force_* are used *should* be
addressable using the flags introduced here.  Some cases in which
pm_runtime_force_* cannot be used should be addressable by these flags
as well.

There may be some cases in which pm_runtime_force_* are used that may
require something more, but I'm not going to worry about that right now.

I'll take care of that when I remove pm_runtime_force_*, which I'm
not doing here.

Thanks,
Rafael


* Re: [PATCH 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
  2017-10-17  7:15       ` Greg Kroah-Hartman
@ 2017-10-17 15:26         ` Rafael J. Wysocki
  2017-10-18  6:56           ` Greg Kroah-Hartman
  0 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-17 15:26 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On Tuesday, October 17, 2017 9:15:43 AM CEST Greg Kroah-Hartman wrote:
> On Tue, Oct 17, 2017 at 12:05:11AM +0200, Rafael J. Wysocki wrote:
> > On Monday, October 16, 2017 8:28:52 AM CEST Greg Kroah-Hartman wrote:
> > > On Mon, Oct 16, 2017 at 03:29:02AM +0200, Rafael J. Wysocki wrote:
> > > >  struct dev_pm_info {
> > > >  	pm_message_t		power_state;
> > > >  	unsigned int		can_wakeup:1;
> > > > @@ -561,6 +580,7 @@ struct dev_pm_info {
> > > >  	bool			is_late_suspended:1;
> > > >  	bool			early_init:1;	/* Owned by the PM core */
> > > >  	bool			direct_complete:1;	/* Owned by the PM core */
> > > > +	unsigned int		driver_flags;
> > > 
> > > Minor nit, u32 or u64?
> > 
> > u32 I think, will update.
> > 
> > BTW, there's a mess in this struct overall and I'd like all of the bit fields
> > to be the same type (and that shouldn't be bool IMO :-)).
> > 
> > Do you prefer u32 or unsigned int?
> 
> I always prefer an explicit size for variables, unless it's a "generic
> loop" type thing.  So I'll always say "u32" for this.
> 
> And cleaning up the structure would be great, it's grown over time in
> odd ways as you point out.

OK, but that will be separate from this work.

Thanks,
Rafael


* Re: [PATCH 12/12] PM / core: Add AVOID_RPM driver flag
  2017-10-16  1:32 ` [PATCH 12/12] PM / core: Add AVOID_RPM driver flag Rafael J. Wysocki
@ 2017-10-17 15:33   ` Andy Shevchenko
  2017-10-17 15:59     ` Rafael J. Wysocki
  0 siblings, 1 reply; 135+ messages in thread
From: Andy Shevchenko @ 2017-10-17 15:33 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Kevin Hilman, Wolfram Sang, linux-i2c, Lee Jones

On Mon, 2017-10-16 at 03:32 +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Define and document a new driver flag, DPM_FLAG_AVOID_RPM, to inform
> the PM core and middle layer code that the driver has something
> significant to do in its ->suspend and/or ->resume callbacks and
> runtime PM should be disabled for the device when these callbacks
> run.
> 
> Setting DPM_FLAG_AVOID_RPM (in addition to DPM_FLAG_SMART_SUSPEND)
> causes runtime PM to be disabled for the device before invoking the
> driver's ->suspend callback for it and to be enabled again for it
> only after the driver's ->resume callback has returned.  In addition
> to that, if the device is in runtime suspend right after disabling
> runtime PM for it (which means that there was no reason to resume it
> from runtime suspend beforehand), the invocation of the ->suspend
> callback will be skipped for it and it will be left in runtime
> suspend until the "noirq" phase of the subsequent system resume.
> 
> If DPM_FLAG_SMART_SUSPEND is not set, DPM_FLAG_AVOID_RPM has no
> effect.
> 

> +	if (dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
> +	    dev_pm_test_driver_flags(dev, DPM_FLAG_AVOID_RPM)) {

Wasn't the interface designed to allow something like:
	if (dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND | DPM_FLAG_AVOID_RPM)) {
instead?

Does it make sense to have a separate definition for
DPM_FLAG_SMART_SUSPEND | DPM_FLAG_AVOID_RPM ?

-- 
Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Intel Finland Oy


* Re: [PATCH 12/12] PM / core: Add AVOID_RPM driver flag
  2017-10-17 15:33   ` Andy Shevchenko
@ 2017-10-17 15:59     ` Rafael J. Wysocki
  2017-10-17 16:25       ` Andy Shevchenko
  0 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-17 15:59 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Ulf Hansson, Kevin Hilman, Wolfram Sang, linux-i2c, Lee Jones

On Tuesday, October 17, 2017 5:33:17 PM CEST Andy Shevchenko wrote:
> On Mon, 2017-10-16 at 03:32 +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > 
> > Define and document a new driver flag, DPM_FLAG_AVOID_RPM, to inform
> > the PM core and middle layer code that the driver has something
> > significant to do in its ->suspend and/or ->resume callbacks and
> > runtime PM should be disabled for the device when these callbacks
> > run.
> > 
> > Setting DPM_FLAG_AVOID_RPM (in addition to DPM_FLAG_SMART_SUSPEND)
> > causes runtime PM to be disabled for the device before invoking the
> > driver's ->suspend callback for it and to be enabled again for it
> > only after the driver's ->resume callback has returned.  In addition
> > to that, if the device is in runtime suspend right after disabling
> > runtime PM for it (which means that there was no reason to resume it
> > from runtime suspend beforehand), the invocation of the ->suspend
> > callback will be skipped for it and it will be left in runtime
> > suspend until the "noirq" phase of the subsequent system resume.
> > 
> > If DPM_FLAG_SMART_SUSPEND is not set, DPM_FLAG_AVOID_RPM has no
> > effect.
> > 
> 
> > +	if (dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
> > +	    dev_pm_test_driver_flags(dev, DPM_FLAG_AVOID_RPM)) {
> 
> Wasn't the interface designed to allow something like:
> 	if (dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND | DPM_FLAG_AVOID_RPM)) {
> instead?

That would return true if any of them was set and both are needed here.
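
A minimal standalone sketch of that difference, assuming the helper simply
masks the driver flags field and tests for a nonzero result (the demo_* names
and the bit values below are made up for illustration and are not taken from
the patch):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, chosen only for this illustration. */
#define DEMO_FLAG_SMART_SUSPEND	(1U << 2)
#define DEMO_FLAG_AVOID_RPM	(1U << 3)

/* Assumed shape of the helper: true if ANY of the requested bits is set. */
static bool demo_test_flags(unsigned int driver_flags, unsigned int flags)
{
	return !!(driver_flags & flags);
}

/* What the check in question needs: true only if ALL requested bits are set. */
static bool demo_test_all_flags(unsigned int driver_flags, unsigned int flags)
{
	return (driver_flags & flags) == flags;
}

static void demo_check(void)
{
	unsigned int mask = DEMO_FLAG_SMART_SUSPEND | DEMO_FLAG_AVOID_RPM;
	unsigned int only_smart = DEMO_FLAG_SMART_SUSPEND;
	unsigned int both = mask;

	/* With one flag set, the combined-mask test already succeeds... */
	assert(demo_test_flags(only_smart, mask));
	/* ...while the "both are needed" condition does not hold. */
	assert(!demo_test_all_flags(only_smart, mask));
	assert(demo_test_all_flags(both, mask));
}
```

Two separate single-flag tests joined with && give the same result as the
"all bits" variant, which is why the two forms are not interchangeable with
one call taking an OR-ed mask.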

> Does it make sense to have a separate definition for
> DPM_FLAG_SMART_SUSPEND | DPM_FLAG_AVOID_RPM ?

Yes, it does IMO, because if you don't provide ->suspend and ->resume
callbacks, it is sufficient if runtime PM is disabled for the device
in __device_suspend_late() which happens anyway.

DPM_FLAG_AVOID_RPM is about disabling it earlier.

Thanks,
Rafael


* Re: [PATCH 12/12] PM / core: Add AVOID_RPM driver flag
  2017-10-17 15:59     ` Rafael J. Wysocki
@ 2017-10-17 16:25       ` Andy Shevchenko
  0 siblings, 0 replies; 135+ messages in thread
From: Andy Shevchenko @ 2017-10-17 16:25 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Ulf Hansson, Kevin Hilman, Wolfram Sang, linux-i2c, Lee Jones

On Tue, 2017-10-17 at 17:59 +0200, Rafael J. Wysocki wrote:
> On Tuesday, October 17, 2017 5:33:17 PM CEST Andy Shevchenko wrote:
> > On Mon, 2017-10-16 at 03:32 +0200, Rafael J. Wysocki wrote:

> > > If DPM_FLAG_SMART_SUSPEND is not set, DPM_FLAG_AVOID_RPM has no
> > > effect.
> > > 
> > > +	if (dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND)
> > > &&
> > > +	    dev_pm_test_driver_flags(dev, DPM_FLAG_AVOID_RPM)) {
> > 
> > Wasn't the interface designed to allow something like:
> > 	if (dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND |
> > DPM_FLAG_AVOID_RPM)) {
> > instead?
> 
> That would return true if any of them was set and both are needed
> here.

Ah, indeed. It would not be equivalent. 

-- 
Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Intel Finland Oy


* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-17 15:25   ` Rafael J. Wysocki
@ 2017-10-17 19:41     ` Ulf Hansson
  2017-10-17 20:12       ` Alan Stern
  2017-10-18  0:39       ` Rafael J. Wysocki
  0 siblings, 2 replies; 135+ messages in thread
From: Ulf Hansson @ 2017-10-17 19:41 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

[...]

>>
>> I am not sure I fully understand the goal you have with this series.
>> Can we please try to get that clear before I continue the review.
>
> Quoting from the above:
>
> "This patch series focuses on addressing those problems so as to make it
> easier to reuse callback routines by pointing different callback pointers
> to them in device drivers.  The flags introduced here are to instruct the
> PM core and middle layers (whatever they are) on how the driver wants the
> device to be handled and then the driver has to provide callbacks to match
> these instructions and the rest should be taken care of by the code above it."
>
> I'm not sure what I can explain beyond that. :-)
>
> And the i2c-designware-platdrv and intel-lpss patches show the direction
> I would like to take with that going forward: use the flags to reduce code
> duplication in drivers and between drivers.
>
>> Now, re-using runtime PM callbacks for system sleep, is already
>> happening. We have > 60 users (git grep "pm_runtime_force_suspend")
>
> 60 is a small number relative to the total number of device drivers in
> the tree.  In particular, that scheme is totally unsuitable for PCI drivers,
> and how many of those are there?  Surely more than 60.

Sure, those 60 can be converted after some work. I just wanted to
understand your plan for these moving forward.

>
>> deploying this and from a middle layer point of view, all the trivial
>> cases support this.
>
> These functions are wrong, however, because they attempt to reuse the
> whole callback *path* instead of just reusing driver callbacks.  The
> *only* reason why it all "works" is because there are no middle layer
> callbacks involved in that now.
>
> If you changed them to reuse driver callbacks only today, nothing would break
> AFAICS.

Yes, it would.

First, for example, the amba bus is responsible for the amba bus
clock, but relies on drivers to gate/ungate it during system sleep. In
case the amba drivers don't use pm_runtime_force_suspend|resume(),
they will explicitly have to start managing the clock during system sleep
themselves, leading to open coding.

Second, it will introduce a regression in behavior for all users of
pm_runtime_force_suspend|resume(), especially during system resume as
the driver may then end up resuming the device even in case it isn't
needed. I believe I have explained why, also several times by now -
and that's also how far you could take the i2c designware driver at
this point.

That said, I assume the second part may be addressed in this series,
if these drivers convert to use the "driver PM flags", right?

However, what about the first case? Is some open coding needed or do you
think the amba driver can instruct the amba bus via the "driver PM
flags"?

>
>> Like the spi bus, i2c bus, amba bus, platform
>> bus, genpd, etc. There are no changes needed to continue to support
>> this option, if you see what I mean.
>
> For the time being, nothing changes in that respect, but eventually I'd
> prefer the pm_runtime_force_* things to go away, frankly.

Okay, thanks for that clear statement!

>
>> So, when you say that re-using runtime PM callbacks for system-wide PM
>> isn't going to happen, can you please elaborate what you mean?
>
> I didn't mean "reusing runtime PM callbacks for system-wide PM" overall, but
> reusing *middle-layer* runtime PM callbacks for system-wide PM.  That is the
> bogus part.

I think we have discussed this several times, but the arguments you
have put forward, explaining *why* haven't yet convinced me.

In principle what you have been saying is that it's a "layering
violation" to use pm_runtime_force_suspend|resume() from driver's
system sleep callbacks, but on the other hand you think using
pm_runtime_get*  and friends is okay!?

That makes little sense to me, because it's the same "layering
violation" that is done for both cases.

Moreover, you have been explaining that re-using runtime PM callbacks
for PCI doesn't work. Then my question is, why should a limitation of
the PCI subsystem put constraints on the behavior for all other
subsystems/middle-layers?

>
> Quoting again:
>
> "If you are a middle layer, your role is basically to do PM for a certain
> group of devices.  Thus you cannot really do the same in ->suspend or
> ->suspend_early and in ->runtime_suspend (because the former generally need to
> take device_may_wakeup() into account and the latter doesn't) and you shouldn't
> really do the same in ->suspend and ->freeze (because the latter shouldn't
> change the device's power state) and so on."
>
> I have said multiple times that reusing *driver* callbacks actually makes
> sense, and the series is meant to make that easier in general, among other things.
>
>> I assume you mean that the PM core won't be involved to support this,
>> but is that it?
>>
>> Do you also mean that *all* users of pm_runtime_force_suspend|resume()
>> must convert to this new thing, using "driver PM flags", so in the end
>> you want to remove pm_runtime_force_suspend|resume()?
>>  - Then if so, you must of course consider all cases for how
>> pm_runtime_force_suspend|resume() are being deployed currently, else
>> existing users can't convert to the "driver PM flags" thing. Have you
>> done that in this series?
>
> Let me turn this around.
>
> The majority of cases in which pm_runtime_force_* are used *should* be
> addressable using the flags introduced here.  Some cases in which
> pm_runtime_force_* cannot be used should be addressable by these flags
> as well.

That sounds really great!

>
> There may be some cases in which pm_runtime_force_* are used that may
> require something more, but I'm not going to worry about that right now.

This approach concerns me, because if we in the end realize that
pm_runtime_force_suspend|resume() will be too hard to get rid of, then
this series just adds yet another generic way of trying to optimize the
system sleep path for runtime PM enabled devices.

So then we would end up having to support the "direct_complete" path,
the "driver PM flags" and cases where
pm_runtime_force_suspend|resume() is used. No, that just isn't good
enough to me. That will just lead to similar scenarios as we had in
the i2c designware driver.

If we decide to go with these new "driver PM flags", I want to make
sure, as far as possible, that we can remove both the
"direct_complete" path support from the PM core as well as removing
the pm_runtime_force_suspend|resume() helpers.

>
> I'll take care of that when I remove pm_runtime_force_*, which I'm
> not doing here.

Of course I am fine with postponing the actual conversion
of drivers etc from this series, although as stated above, let's make sure
we *can* do it by using the "driver PM flags".

Kind regards
Uffe


* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-17 19:41     ` Ulf Hansson
@ 2017-10-17 20:12       ` Alan Stern
  2017-10-17 23:07         ` Rafael J. Wysocki
  2017-10-18  0:39       ` Rafael J. Wysocki
  1 sibling, 1 reply; 135+ messages in thread
From: Alan Stern @ 2017-10-17 20:12 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J. Wysocki, Linux PM, Bjorn Helgaas, Greg Kroah-Hartman,
	LKML, Linux ACPI, Linux PCI, Linux Documentation,
	Mika Westerberg, Andy Shevchenko, Kevin Hilman, Wolfram Sang,
	linux-i2c, Lee Jones

On Tue, 17 Oct 2017, Ulf Hansson wrote:

> > These functions are wrong, however, because they attempt to reuse the
> > whole callback *path* instead of just reusing driver callbacks.  The
> > *only* reason why it all "works" is because there are no middle layer
> > callbacks involved in that now.
> >
> > If you changed them to reuse driver callbacks only today, nothing would break
> > AFAICS.
> 
> Yes, it would.
> 
> First, for example, the amba bus is responsible for the amba bus
> clock, but relies on drivers to gate/ungate it during system sleep. In
> case the amba drivers don't use pm_runtime_force_suspend|resume(),
> they will explicitly have to start managing the clock during system sleep
> themselves, leading to open coding.

I think what Rafael has in mind is that the PM core will call the amba
bus's ->suspend callback, and that routine will then be able to call
the amba driver's runtime_suspend routine directly, if it wants to --
as opposed to going through pm_runtime_force_suspend.

However, it's not clear whether this fully answers your concerns.

> Second, it will introduce a regression in behavior for all users of
> pm_runtime_force_suspend|resume(), especially during system resume as
> the driver may then end up resuming the device even in case it isn't
> needed. I believe I have explained why, also several times by now -
> and that's also how far you could take the i2c designware driver at
> this point.
> 
> That said, I assume the second part may be addressed in this series,
> if these drivers convert to use the "driver PM flags", right?

Presumably.

The problem is how to handle things which need to be treated
differently for runtime PM vs. system suspend vs. hibernation.  If
everything filters through a runtime_suspend routine, that doesn't
leave any scope for handling the different kinds of PM transitions
differently.  Instead, we can make the middle layer (i.e., the bus-type
callbacks) take care of the varying tasks, and they can directly invoke
a driver's runtime-PM callbacks to handle all the common activities.  
If that's how the middle layer wants to do it.

> However, what about the first case? Is some open coding needed or do you
> think the amba driver can instruct the amba bus via the "driver PM
> flags"?

PM flags won't directly be able to cover things like disabling clocks.  
But they could be useful for indicating explicitly whether the code to
take care of those things needs to reside at the driver layer or at the
bus layer.

Alan Stern


* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-17 20:12       ` Alan Stern
@ 2017-10-17 23:07         ` Rafael J. Wysocki
  0 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-17 23:07 UTC (permalink / raw)
  To: Alan Stern
  Cc: Ulf Hansson, Linux PM, Bjorn Helgaas, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On Tuesday, October 17, 2017 10:12:19 PM CEST Alan Stern wrote:
> On Tue, 17 Oct 2017, Ulf Hansson wrote:
> 
> > > These functions are wrong, however, because they attempt to reuse the
> > > whole callback *path* instead of just reusing driver callbacks.  The
> > > *only* reason why it all "works" is because there are no middle layer
> > > callbacks involved in that now.
> > >
> > > If you changed them to reuse driver callbacks only today, nothing would break
> > > AFAICS.
> > 
> > Yes, it would.
> > 
> > First, for example, the amba bus is responsible for the amba bus
> > clock, but relies on drivers to gate/ungate it during system sleep. In
> > case the amba drivers don't use pm_runtime_force_suspend|resume(),
> > they will explicitly have to start managing the clock during system sleep
> > themselves, leading to open coding.
> 
> I think what Rafael has in mind is that the PM core will call the amba
> bus's ->suspend callback, and that routine will then be able to call
> the amba driver's runtime_suspend routine directly, if it wants to --
> as opposed to going through pm_runtime_force_suspend.

Right in general.

> However, it's not clear whether this fully answers your concerns.

Well, in the particular AMBA case fixing this should be quite straightforward.

> > Second, it will introduce a regression in behavior for all users of
> > pm_runtime_force_suspend|resume(), especially during system resume as
> > the driver may then end up resuming the device even in case it isn't
> > needed. I believe I have explained why, also several times by now -
> > and that's also how far you could take the i2c designware driver at
> > this point.
> > 
> > That said, I assume the second part may be addressed in this series,
> > if these drivers convert to use the "driver PM flags", right?
> 
> Presumably.
> 
> The problem is how to handle things which need to be treated
> differently for runtime PM vs. system suspend vs. hibernation.  If
> everything filters through a runtime_suspend routine, that doesn't
> leave any scope for handling the different kinds of PM transitions
> differently.  Instead, we can make the middle layer (i.e., the bus-type
> callbacks) take care of the varying tasks, and they can directly invoke
> a driver's runtime-PM callbacks to handle all the common activities.  
> If that's how the middle layer wants to do it.

Well, that's what happens today, except that driver runtime PM callbacks
are not directly invoked.  Actually, I tried to implement that, but it was
so ugly and fragile that I gave up.

It really is better if drivers point the different callback pointers to the
same routine if they want to reuse it.
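
As a rough illustration of what "pointing different callback pointers to the
same routine" looks like, here is a self-contained userspace sketch.  The
demo_* struct and function names are hypothetical stand-ins invented for this
example (the real structure has many more members and the callbacks take a
struct device pointer); only the pattern of sharing one routine between
several pointers is the point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a device and a PM operations table. */
struct demo_device {
	int active;
	int quiesce_calls;
};

struct demo_pm_ops {
	int (*suspend)(struct demo_device *dev);
	int (*resume)(struct demo_device *dev);
	int (*runtime_suspend)(struct demo_device *dev);
	int (*runtime_resume)(struct demo_device *dev);
};

/* One suspend-like routine: stop the device's activity. */
static int demo_quiesce(struct demo_device *dev)
{
	dev->active = 0;
	dev->quiesce_calls++;
	return 0;
}

/* One resume-like routine: start the device's activity again. */
static int demo_activate(struct demo_device *dev)
{
	dev->active = 1;
	return 0;
}

/*
 * The driver reuses its own code by pointing several callback pointers at
 * the same routine; no middle-layer callback path is reused in the process.
 */
static const struct demo_pm_ops demo_driver_pm_ops = {
	.suspend	 = demo_quiesce,
	.resume		 = demo_activate,
	.runtime_suspend = demo_quiesce,
	.runtime_resume	 = demo_activate,
};

static void demo_check(void)
{
	struct demo_device dev = { .active = 1, .quiesce_calls = 0 };

	assert(demo_driver_pm_ops.suspend == demo_driver_pm_ops.runtime_suspend);
	assert(demo_driver_pm_ops.suspend(&dev) == 0);
	assert(dev.active == 0);
	assert(demo_driver_pm_ops.runtime_resume(&dev) == 0);
	assert(dev.active == 1);
}
```

Whatever differs between the transitions then stays in the layers above the
driver, which call into this table through the pointer that matches the
transition in progress.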

> > However, what about the first case? Is some open coding needed or your
> > think the amba driver can instruct the amba bus via the "driver PM
> > flags"?
> 
> PM flags won't directly be able to cover things like disabling clocks.  
> But they could be useful for indicating explicitly whether the code to
> take care of those things needs to reside at the driver layer or at the
> bus layer.

Right.

Thanks,
Rafael


* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-17 19:41     ` Ulf Hansson
  2017-10-17 20:12       ` Alan Stern
@ 2017-10-18  0:39       ` Rafael J. Wysocki
  2017-10-18 10:24         ` Rafael J. Wysocki
  2017-10-18 11:57         ` Ulf Hansson
  1 sibling, 2 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-18  0:39 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On Tuesday, October 17, 2017 9:41:16 PM CEST Ulf Hansson wrote:

[cut]

> >
> >> deploying this and from a middle layer point of view, all the trivial
> >> cases support this.
> >
> > These functions are wrong, however, because they attempt to reuse the
> > whole callback *path* instead of just reusing driver callbacks.  The
> > *only* reason why it all "works" is because there are no middle layer
> > callbacks involved in that now.
> >
> > If you changed them to reuse driver callbacks only today, nothing would break
> > AFAICS.
> 
> Yes, it would.
> 
> First, for example, the amba bus is responsible for the amba bus
> clock, but relies on drivers to gate/ungate it during system sleep. In
> case the amba drivers don't use the pm_runtime_force_suspend|resume(),
> it will explicitly have to start manage the clock during system sleep
> themselves. Leading to open coding.

Well, I suspected that something like this would surface. ;-)

Are there any major reasons why the appended patch (obviously untested) won't
work, then?

> Second, it will introduce a regression in behavior for all users of
> pm_runtime_force_suspend|resume(), especially during system resume as
> the driver may then end up resuming the device even in case it isn't
> needed.

How so?

I'm talking about a change like in the appended patch, where
pm_runtime_force_* simply invoke driver callbacks directly.  What is
skipped there is middle-layer stuff which is empty anyway in all cases
except for AMBA (if that's all that is lurking below the surface), so
I don't quite see how the failure will happen.
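
To make the difference concrete, here is a userspace sketch (not kernel
code) of the two lookups under discussion.  It assumes the full callback
path can be reduced to "bus first, then driver", which leaves out the PM
domain, type and class levels; all mock_* names and the clk_on/hw_active
fields are hypothetical:

```c
/* Userspace mock, not kernel code: the structs only mimic kernel names. */
#include <assert.h>
#include <stddef.h>

struct device;
typedef int (*pm_cb_t)(struct device *dev);

struct dev_pm_ops { pm_cb_t runtime_suspend; };
struct bus_type { const struct dev_pm_ops *pm; };
struct device_driver { const struct dev_pm_ops *pm; };

struct device {
	const struct bus_type *bus;
	const struct device_driver *driver;
	int clk_on;	/* bus clock, owned by the middle layer */
	int hw_active;	/* device state, owned by the driver */
};

/* Driver callback only: what pm_runtime_force_* would call after the change. */
static pm_cb_t driver_only_callback(const struct device *dev)
{
	if (dev->driver && dev->driver->pm)
		return dev->driver->pm->runtime_suspend;
	return NULL;
}

/* Whole path, reduced here to "bus first, then driver". */
static pm_cb_t full_path_callback(const struct device *dev)
{
	if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_suspend)
		return dev->bus->pm->runtime_suspend;
	return driver_only_callback(dev);
}

static int mock_drv_runtime_suspend(struct device *dev)
{
	dev->hw_active = 0;
	return 0;
}

/* AMBA-like middle layer: run the driver callback, then gate the clock. */
static int mock_bus_runtime_suspend(struct device *dev)
{
	pm_cb_t cb = driver_only_callback(dev);
	int ret = cb ? cb(dev) : 0;

	if (!ret)
		dev->clk_on = 0;
	return ret;
}

static const struct dev_pm_ops mock_drv_pm = {
	.runtime_suspend = mock_drv_runtime_suspend,
};
static const struct dev_pm_ops mock_bus_pm = {
	.runtime_suspend = mock_bus_runtime_suspend,
};
static const struct bus_type mock_bus = { .pm = &mock_bus_pm };
static const struct device_driver mock_drv = { .pm = &mock_drv_pm };
```

With the driver-only lookup the clock gating in the bus callback is
skipped, which is harmless only when the middle layer has nothing of its
own to do or handles it in its system sleep callbacks instead.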

> I believe I have explained why, also several times by now -
> and that's also how far you could take the i2c designware driver at
> this point.
> 
> That said, I assume the second part may be addressed in this series,
> if these drivers convert to use the "driver PM flags", right?
> 
> However, what about the first case? Is some open coding needed or your
> think the amba driver can instruct the amba bus via the "driver PM
> flags"?

With the appended patch applied things should work for AMBA like for
any other bus type implementing PM, so I don't see why not.

> >
> >> Like the spi bus, i2c bus, amba bus, platform
> >> bus, genpd, etc. There are no changes needed to continue to support
> >> this option, if you see what I mean.
> >
> > For the time being, nothing changes in that respect, but eventually I'd
> > prefer the pm_runtime_force_* things to go away, frankly.
> 
> Okay, thanks for that clear statement!
> 
> >
> >> So, when you say that re-using runtime PM callbacks for system-wide PM
> >> isn't going to happen, can you please elaborate what you mean?
> >
> > I didn't mean "reusing runtime PM callbacks for system-wide PM" overall, but
> > reusing *middle-layer* runtime PM callbacks for system-wide PM.  That is the
> > bogus part.
> 
> I think we have discussed this several times, but the arguments you
> have put forward, explaining *why* haven't yet convinced me.

Well, sorry about that.  I would like to be able to explain my point to you so
that you understand my perspective, but if that's not working, that's not a
sufficient reason for me to give up.

I'm just refusing to maintain code that I don't agree with in the long run.

> In principle what you have been saying is that it's a "layering
> violation" to use pm_runtime_force_suspend|resume() from driver's
> system sleep callbacks, but on the other hand you think using
> pm_runtime_get*  and friends is okay!?

Not unconditionally, which would be fair to mention.

Only if it is called in ->prepare or as the first thing in a ->suspend
callback.  Later than that is broken too in principle.

> That makes little sense to me, because it's the same "layering
> violation" that is done for both cases.

The "layering violation" is all about things possibly occurring in a
wrong order.  For example, say a middle-layer ->runtime_suspend is
called via pm_runtime_force_suspend() which in turn is called from
middle-layer ->suspend_late as a driver callback.  If the ->runtime_suspend
does anything significant to the device, then executing the remaining part of
->suspend_late will almost certainly break things, more or less.

That is not a concern with a middle-layer ->runtime_resume running
*before* a middle-layer ->suspend (or any subsequent callbacks) does
anything significant to the device.
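
The ordering problem can be illustrated with a small userspace simulation
(not kernel code).  It assumes a hypothetical middle layer that saves one
register after the driver callback returns, and it counts a register read
with the clock gated as a fault; every name in it is made up for the
example:

```c
/* Userspace simulation of the ordering issue; all names are hypothetical. */
#include <assert.h>

struct mock_dev {
	int clk_on;		/* clock gated by the middle layer */
	int faults;		/* register accesses made with the clock off */
	int saved_state;	/* what the middle layer managed to save */
};

static int mock_reg_read(struct mock_dev *d)
{
	if (!d->clk_on) {
		d->faults++;
		return -1;
	}
	return 0x5a;
}

/* Middle-layer ->runtime_suspend: does something significant (clock off). */
static int mock_ml_runtime_suspend(struct mock_dev *d)
{
	d->clk_on = 0;
	return 0;
}

/* Driver ->suspend_late that re-enters the whole runtime PM path. */
static int mock_drv_suspend_late_full_path(struct mock_dev *d)
{
	return mock_ml_runtime_suspend(d);
}

/* Driver ->suspend_late that only handles what the driver itself owns. */
static int mock_drv_suspend_late_own(struct mock_dev *d)
{
	(void)d;
	return 0;
}

/* Middle-layer ->suspend_late: driver callback first, then its own work. */
static int mock_ml_suspend_late(struct mock_dev *d,
				int (*drv_cb)(struct mock_dev *))
{
	int ret = drv_cb(d);

	if (ret)
		return ret;

	d->saved_state = mock_reg_read(d);	/* the "remaining part" */
	d->clk_on = 0;
	return 0;
}
```

In the first variant the middle layer's own work runs after its
->runtime_suspend has already gated the clock, so the read faults; in the
second the remaining part of ->suspend_late still finds the device
accessible.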

Is there anything in the above which is not clear enough?

> Moreover, you have been explaining that re-using runtime PM callbacks
> for PCI doesn't work. Then my question is, why should a limitation of
> the PCI subsystem put constraints on the behavior for all other
> subsystems/middle-layers?

Because they aren't just PCI subsystem limitations.  The need to handle
wakeup setup differently for runtime PM and system sleep is not PCI-specific.
The need to handle suspend and hibernation differently isn't either.

Those things may be more obvious in PCI, but they are generic rather than
special.

Also, quite so often other middle layers interact with PCI directly or
indirectly (eg. a platform device may be a child or a consumer of a PCI
device) and some optimizations need to take that into account (eg. parents
generally need to be accessible when their children are resumed and so on).

Moreover, the majority of the "other subsystems/middle-layers" you've talked
about so far don't provide any PM callbacks to be invoked by pm_runtime_force_*,
so the question is how representative they really are.

> >
> > Quoting again:
> >
> > "If you are a middle layer, your role is basically to do PM for a certain
> > group of devices.  Thus you cannot really do the same in ->suspend or
> > ->suspend_early and in ->runtime_suspend (because the former generally need to
> > take device_may_wakeup() into account and the latter doesn't) and you shouldn't
> > really do the same in ->suspend and ->freeze (because the latter shouldn't
> > change the device's power state) and so on."
> >
> > I have said for multiple times that re-using *driver* callbacks actually makes
> > sense and the series is for doing that easier in general among other things.
> >
> >> I assume you mean that the PM core won't be involved to support this,
> >> but is that it?
> >>
> >> Do you also mean that *all* users of pm_runtime_force_suspend|resume()
> >> must convert to this new thing, using "driver PM flags", so in the end
> >> you want to remove pm_runtime_force_suspend|resume()?
> >>  - Then if so, you must of course consider all cases for how
> >> pm_runtime_force_suspend|resume() are being deployed currently, else
> >> existing users can't convert to the "driver PM flags" thing. Have you
> >> done that in this series?
> >
> > Let me turn this around.
> >
> > The majority of cases in which pm_runtime_force_* are used *should* be
> > addressable using the flags introduced here.  Some case in which
> > pm_runtime_force_* cannot be used should be addressable by these flags
> > as well.
> 
> That's sounds really great!
> 
> >
> > There may be some cases in which pm_runtime_force_* are used that may
> > require something more, but I'm not going to worry about that right now.
> 
> This approach concerns me, because if we in the end realizes that
> pm_runtime_force_suspend|resume() will be too hard to get rid of, then
> this series just add yet another generic way of trying to optimize the
> system sleep path for runtime PM enabled devices.

Which also works for PCI and the ACPI PM domain and that's sort of valuable
anyway, isn't it?

For the record, I don't think it will be too hard to get rid of
pm_runtime_force_suspend|resume(), although that may take quite some time.

> So then we would end up having to support the "direct_complete" path,
> the "driver PM flags" and cases where
> pm_runtime_force_suspend|resume() is used. No, that just isn't good
> enough to me. That will just lead to similar scenarios as we had in
> the i2c designware driver.

Frankly, this sounds like staging for indefinite blocking of changes in
this area on non-technical grounds.  I hope that it isn't the case ...

> If we decide to go with these new "driver PM flags", I want to make
> sure, as long as possible, that we can remove both the
> "direct_complete" path support from the PM core as well as removing
> the pm_runtime_force_suspend|resume() helpers.

We'll see.

> >
> > I'll take care of that when I'll be removing pm_runtime_force_*, which I'm
> > not doing here.
> 
> Of course I am fine with that we postpone doing the actual converting
> of drivers etc from this series, although as stated above, let's sure
> we *can* do it by using the "driver PM flags".

There clearly are use cases that benefit from this series and I don't see
any alternatives covering them, including both direct-complete and the
pm_runtime_force* approach, so I'm not buying this "let's make sure
it can cover all possible use cases that exist" argumentation.

Thanks,
Rafael


---
 drivers/amba/bus.c           |   79 ++++++++++++++++++++++++++++---------------
 drivers/base/power/runtime.c |   10 +++--
 2 files changed, 58 insertions(+), 31 deletions(-)

Index: linux-pm/drivers/amba/bus.c
===================================================================
--- linux-pm.orig/drivers/amba/bus.c
+++ linux-pm/drivers/amba/bus.c
@@ -132,52 +132,77 @@ static struct attribute *amba_dev_attrs[
 ATTRIBUTE_GROUPS(amba_dev);
 
 #ifdef CONFIG_PM
+static void amba_pm_suspend(struct device *dev)
+{
+	struct amba_device *pcdev = to_amba_device(dev);
+
+	if (!dev->driver)
+		return;
+
+	if (pm_runtime_is_irq_safe(dev))
+		clk_disable(pcdev->pclk);
+	else
+		clk_disable_unprepare(pcdev->pclk);
+}
+
+static int amba_pm_resume(struct device *dev)
+{
+	struct amba_device *pcdev = to_amba_device(dev);
+
+	if (!dev->driver)
+		return 0;
+
+	/* Failure is probably fatal to the system, but... */
+	if (pm_runtime_is_irq_safe(dev))
+		return clk_enable(pcdev->pclk);
+
+	return clk_prepare_enable(pcdev->pclk);
+}
+
 /*
  * Hooks to provide runtime PM of the pclk (bus clock).  It is safe to
  * enable/disable the bus clock at runtime PM suspend/resume as this
  * does not result in loss of context.
  */
+static int amba_pm_suspend_late(struct device *dev)
+{
+	int ret = pm_generic_suspend_late(dev);
+
+	if (ret)
+		return ret;
+
+	amba_pm_suspend(dev);
+	return 0;
+}
+
+static int amba_pm_resume_early(struct device *dev)
+{
+	int ret = amba_pm_resume(dev);
+
+	return ret ? ret : pm_generic_resume_early(dev);
+}
+
 static int amba_pm_runtime_suspend(struct device *dev)
 {
-	struct amba_device *pcdev = to_amba_device(dev);
 	int ret = pm_generic_runtime_suspend(dev);
 
-	if (ret == 0 && dev->driver) {
-		if (pm_runtime_is_irq_safe(dev))
-			clk_disable(pcdev->pclk);
-		else
-			clk_disable_unprepare(pcdev->pclk);
-	}
+	if (ret)
+		return ret;
 
-	return ret;
+	amba_pm_suspend(dev);
+	return 0;
 }
 
 static int amba_pm_runtime_resume(struct device *dev)
 {
-	struct amba_device *pcdev = to_amba_device(dev);
-	int ret;
-
-	if (dev->driver) {
-		if (pm_runtime_is_irq_safe(dev))
-			ret = clk_enable(pcdev->pclk);
-		else
-			ret = clk_prepare_enable(pcdev->pclk);
-		/* Failure is probably fatal to the system, but... */
-		if (ret)
-			return ret;
-	}
+	int ret = amba_pm_resume(dev);
 
-	return pm_generic_runtime_resume(dev);
+	return ret ? ret : pm_generic_runtime_resume(dev);
 }
 #endif /* CONFIG_PM */
 
 static const struct dev_pm_ops amba_pm = {
-	.suspend	= pm_generic_suspend,
-	.resume		= pm_generic_resume,
-	.freeze		= pm_generic_freeze,
-	.thaw		= pm_generic_thaw,
-	.poweroff	= pm_generic_poweroff,
-	.restore	= pm_generic_restore,
+	SET_LATE_SYSTEM_SLEEP_PM_OPS(amba_pm_suspend_late, amba_pm_resume_early)
 	SET_RUNTIME_PM_OPS(
 		amba_pm_runtime_suspend,
 		amba_pm_runtime_resume,
Index: linux-pm/drivers/base/power/runtime.c
===================================================================
--- linux-pm.orig/drivers/base/power/runtime.c
+++ linux-pm/drivers/base/power/runtime.c
@@ -1636,14 +1636,15 @@ void pm_runtime_drop_link(struct device
  */
 int pm_runtime_force_suspend(struct device *dev)
 {
-	int (*callback)(struct device *);
+	int (*callback)(struct device *) = NULL;
 	int ret = 0;
 
 	pm_runtime_disable(dev);
 	if (pm_runtime_status_suspended(dev))
 		return 0;
 
-	callback = RPM_GET_CALLBACK(dev, runtime_suspend);
+	if (dev->driver && dev->driver->pm)
+		callback = dev->driver->pm->runtime_suspend;
 
 	if (!callback) {
 		ret = -ENOSYS;
@@ -1690,10 +1691,11 @@ EXPORT_SYMBOL_GPL(pm_runtime_force_suspe
  */
 int pm_runtime_force_resume(struct device *dev)
 {
-	int (*callback)(struct device *);
+	int (*callback)(struct device *) = NULL;
 	int ret = 0;
 
-	callback = RPM_GET_CALLBACK(dev, runtime_resume);
+	if (dev->driver && dev->driver->pm)
+		callback = dev->driver->pm->runtime_resume;
 
 	if (!callback) {
 		ret = -ENOSYS;

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
  2017-10-17 15:26         ` Rafael J. Wysocki
@ 2017-10-18  6:56           ` Greg Kroah-Hartman
  0 siblings, 0 replies; 135+ messages in thread
From: Greg Kroah-Hartman @ 2017-10-18  6:56 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On Tue, Oct 17, 2017 at 05:26:20PM +0200, Rafael J. Wysocki wrote:
> On Tuesday, October 17, 2017 9:15:43 AM CEST Greg Kroah-Hartman wrote:
> > On Tue, Oct 17, 2017 at 12:05:11AM +0200, Rafael J. Wysocki wrote:
> > > On Monday, October 16, 2017 8:28:52 AM CEST Greg Kroah-Hartman wrote:
> > > > On Mon, Oct 16, 2017 at 03:29:02AM +0200, Rafael J. Wysocki wrote:
> > > > >  struct dev_pm_info {
> > > > >  	pm_message_t		power_state;
> > > > >  	unsigned int		can_wakeup:1;
> > > > > @@ -561,6 +580,7 @@ struct dev_pm_info {
> > > > >  	bool			is_late_suspended:1;
> > > > >  	bool			early_init:1;	/* Owned by the PM core */
> > > > >  	bool			direct_complete:1;	/* Owned by the PM core */
> > > > > +	unsigned int		driver_flags;
> > > > 
> > > > Minor nit, u32 or u64?
> > > 
> > > u32 I think, will update.
> > > 
> > > BTW, there's a mess in this struct overall and I'd like all of the bit fields
> > > to be the same type (and that shouldn't be bool IMO :-)).
> > > 
> > > Do you prefer u32 or unsigned int?
> > 
> > I always prefer an explicit size for variables, unless it's a "generic
> > loop" type thing.  So I'll always say "u32" for this.
> > 
> > And cleaning up the structure would be great, it's grown over time in
> > odd ways as you point out.
> 
> OK, but that will be separate from this work.

Of course :)

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-18  0:39       ` Rafael J. Wysocki
@ 2017-10-18 10:24         ` Rafael J. Wysocki
  2017-10-18 12:34           ` Ulf Hansson
  2017-10-18 11:57         ` Ulf Hansson
  1 sibling, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-18 10:24 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On Wednesday, October 18, 2017 2:39:24 AM CEST Rafael J. Wysocki wrote:
> On Tuesday, October 17, 2017 9:41:16 PM CEST Ulf Hansson wrote:
> 
> [cut]
> 
> > >
> > >> deploying this and from a middle layer point of view, all the trivial
> > >> cases supports this.
> > >
> > > These functions are wrong, however, because they attempt to reuse the
> > > whole callback *path* instead of just reusing driver callbacks.  The
> > > *only* reason why it all "works" is because there are no middle layer
> > > callbacks involved in that now.
> > >
> > > If you changed them to reuse driver callbacks only today, nothing would break
> > > AFAICS.
> > 
> > Yes, it would.
> > 
> > First, for example, the amba bus is responsible for the amba bus
> > clock, but relies on drivers to gate/ungate it during system sleep. In
> > case the amba drivers don't use the pm_runtime_force_suspend|resume(),
> > it will explicitly have to start manage the clock during system sleep
> > themselves. Leading to open coding.
> 
> Well, I suspected that something like this would surface. ;-)
> 
> Are there any major reasons why the appended patch (obviously untested) won't
> work, then?

OK, there is a reason, which is the optimizations bundled into
pm_runtime_force_*, because (a) the device may be left in runtime suspend
by them (in which case amba_pm_suspend_late() in my patch should not run)
and (b) pm_runtime_force_resume() may decide to leave it suspended (in which
case amba_pm_resume_early() in my patch should not run).

[BTW, the "leave the device suspended" optimization in pm_runtime_force_*
is potentially problematic too, because it requires the children to do
the right thing, which effectively means that their drivers need to use
pm_runtime_force_* too, but what if they don't want to reuse their
runtime PM callbacks for system-wide PM?]

Honestly, I don't like the way this is designed.  IMO, it would be better
to do the optimizations and all in the bus type middle-layer code instead
of expecting drivers to use pm_runtime_force_* as their system-wide PM
callbacks (and that expectation should at least be documented, which I'm
not sure is the case now).  But whatever.

It all should work the way it does now without pm_runtime_force_* if (a) the
bus type's PM callbacks are changed like in the last patch and the drivers
(b) point their system suspend callbacks to the runtime PM callback routines
and (c) set DPM_FLAG_SMART_SUSPEND and DPM_FLAG_LEAVE_SUSPENDED for the
devices (if they need to do the PM in ->suspend and ->resume, they may set
DPM_FLAG_AVOID_RPM too).
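
As a rough illustration of (c), here is a userspace sketch (not kernel
code) of how bus-level callbacks might consult such flags.  The two flag
names are taken from this series, but the MOCK_ bit values, the mock_*
helpers and the exact decision logic are assumptions made for the example
and may differ from what the series actually does:

```c
/* Userspace mock; flag values and helpers are hypothetical. */
#include <assert.h>
#include <stdbool.h>

#define MOCK_DPM_FLAG_SMART_SUSPEND	(1u << 0)	/* arbitrary bit */
#define MOCK_DPM_FLAG_LEAVE_SUSPENDED	(1u << 1)	/* arbitrary bit */

struct mock_dev {
	unsigned int driver_flags;
	bool runtime_suspended;
	bool clk_on;
};

/* The driver tells the upper layers what it can cope with. */
static void mock_set_driver_flags(struct mock_dev *d, unsigned int flags)
{
	d->driver_flags = flags;
}

/* Bus-level late suspend; returns true if the device was left alone. */
static bool mock_bus_suspend_late(struct mock_dev *d)
{
	if ((d->driver_flags & MOCK_DPM_FLAG_SMART_SUSPEND) &&
	    d->runtime_suspended)
		return true;	/* already suspended at run time: skip */

	d->clk_on = false;
	return false;
}

/* Bus-level early resume; may leave the device suspended. */
static void mock_bus_resume_early(struct mock_dev *d, bool was_skipped)
{
	if ((d->driver_flags & MOCK_DPM_FLAG_LEAVE_SUSPENDED) && was_skipped)
		return;		/* stays in runtime suspend */

	d->clk_on = true;
	d->runtime_suspended = false;
}
```

The point of the sketch is only that the "leave it suspended" decisions
live in the middle layer, driven by what the driver declared, instead of
being made inside helpers the driver calls.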

And if you see a reason why that won't work, please let me know.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-18  0:39       ` Rafael J. Wysocki
  2017-10-18 10:24         ` Rafael J. Wysocki
@ 2017-10-18 11:57         ` Ulf Hansson
  2017-10-18 13:00           ` Rafael J. Wysocki
  1 sibling, 1 reply; 135+ messages in thread
From: Ulf Hansson @ 2017-10-18 11:57 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On 18 October 2017 at 02:39, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Tuesday, October 17, 2017 9:41:16 PM CEST Ulf Hansson wrote:
>
> [cut]
>
>> >
>> >> deploying this and from a middle layer point of view, all the trivial
>> >> cases supports this.
>> >
>> > These functions are wrong, however, because they attempt to reuse the
>> > whole callback *path* instead of just reusing driver callbacks.  The
>> > *only* reason why it all "works" is because there are no middle layer
>> > callbacks involved in that now.
>> >
>> > If you changed them to reuse driver callbacks only today, nothing would break
>> > AFAICS.
>>
>> Yes, it would.
>>
>> First, for example, the amba bus is responsible for the amba bus
>> clock, but relies on drivers to gate/ungate it during system sleep. In
>> case the amba drivers don't use the pm_runtime_force_suspend|resume(),
>> it will explicitly have to start manage the clock during system sleep
>> themselves. Leading to open coding.
>
> Well, I suspected that something like this would surface. ;-)
>
> Are there any major reasons why the appended patch (obviously untested) won't
> work, then?

Let me comment on the code, instead of here...

...just realized your second reply, so let me reply to that instead
regarding the patch.

>
>> Second, it will introduce a regression in behavior for all users of
>> pm_runtime_force_suspend|resume(), especially during system resume as
>> the driver may then end up resuming the device even in case it isn't
>> needed.
>
> How so?
>
> I'm talking about a change like in the appended patch, where
> pm_runtime_force_* simply invoke driver callbacks directly.  What is
> skipped there is middle-layer stuff which is empty anyway in all cases
> except for AMBA (if that's all what is lurking below the surface), so
> I don't quite see how the failure will happen.

I am afraid changing pm_runtime_force* to only call driver callbacks
may become fragile. Let me elaborate.

pm_runtime_force_* needs to respect the hierarchy of the RPM callbacks
because otherwise it can't safely update the runtime PM status of the
device. And updating the runtime PM status of the device is required to
manage the optimized behavior during system resume (avoiding
unnecessarily resuming devices).

Besides the AMBA case, I also realized that we are dealing with PM
clocks in the genpd case. For this, genpd relies on the runtime PM
status of the device properly reflecting the state of the HW during
system-wide PM.

In other words, if the driver changed the runtime PM status of the
device without respecting the hierarchy of the runtime PM callbacks,
genpd would start making wrong decisions while managing the PM clocks
during system-wide PM. So in case you intend to change
pm_runtime_force_*, this needs to be addressed too.

>
>> I believe I have explained why, also several times by now -
>> and that's also how far you could take the i2c designware driver at
>> this point.
>>
>> That said, I assume the second part may be addressed in this series,
>> if these drivers convert to use the "driver PM flags", right?
>>
>> However, what about the first case? Is some open coding needed or your
>> think the amba driver can instruct the amba bus via the "driver PM
>> flags"?
>
> With the appended patch applied things should work for AMBA like for
> any other bus type implementing PM, so I don't see why not.
>
>> >
>> >> Like the spi bus, i2c bus, amba bus, platform
>> >> bus, genpd, etc. There are no changes needed to continue to support
>> >> this option, if you see what I mean.
>> >
>> > For the time being, nothing changes in that respect, but eventually I'd
>> > prefer the pm_runtime_force_* things to go away, frankly.
>>
>> Okay, thanks for that clear statement!
>>
>> >
>> >> So, when you say that re-using runtime PM callbacks for system-wide PM
>> >> isn't going to happen, can you please elaborate what you mean?
>> >
>> > I didn't mean "reusing runtime PM callbacks for system-wide PM" overall, but
>> > reusing *middle-layer* runtime PM callbacks for system-wide PM.  That is the
>> > bogus part.
>>
>> I think we have discussed this several times, but the arguments you
>> have put forward, explaining *why* haven't yet convinced me.
>
> Well, sorry about that.  I would like to be able to explain my point to you so
> that you understand my perspective, but if that's not working, that's not a
> sufficient reason for me to give up.
>
> I'm just refusing to maintain code that I don't agree with in the long run.
>
>> In principle what you have been saying is that it's a "layering
>> violation" to use pm_runtime_force_suspend|resume() from driver's
>> system sleep callbacks, but on the other hand you think using
>> pm_runtime_get*  and friends is okay!?
>
> Not unconditionally, which would be fair to mention.
>
> Only if it is called in ->prepare or as the first thing in a ->suspend
> callback.  Later than that is broken too in principle.
>
>> That makes little sense to me, because it's the same "layering
>> violation" that is done for both cases.
>
> The "layering violation" is all about things possibly occurring in a
> wrong order.  For example, say a middle-layer ->runtime_suspend is
> called via pm_runtime_force_suspend() which in turn is called from
> middle-layer ->suspend_late as a driver callback.  If the ->runtime_suspend
> does anything significant to the device, then executing the remaining part of
> ->suspend_late will almost certainly break things, more or less.
>
> That is not a concern with a middle-layer ->runtime_resume running
> *before* a middle-layer ->suspend (or any subsequent callbacks) does
> anything significant to the device.
>
> Is there anything in the above which is not clear enough?
>
>> Moreover, you have been explaining that re-using runtime PM callbacks
>> for PCI doesn't work. Then my question is, why should a limitation of
>> the PCI subsystem put constraints on the behavior for all other
>> subsystems/middle-layers?
>
> Because they aren't just PCI subsystem limitations only.  The need to handle
> wakeup setup differently for runtime PM and system sleep is not PCI-specific.
> The need to handle suspend and hibernation differently isn't either.
>
> Those things may be more obvious in PCI, but they are generic rather than
> special.

Absolutely agree about the different wake-up settings. However, these
issues can be addressed also when using pm_runtime_force_*, at least
in general, but then not for PCI.

Regarding hibernation, honestly that's not really my area of
expertise. Although, I assume the middle-layer and driver can treat
that as a separate case, so if it's not suitable to use
pm_runtime_force* for that case, then they shouldn't do it.

>
> Also, quite so often other middle layers interact with PCI directly or
> indirectly (eg. a platform device may be a child or a consumer of a PCI
> device) and some optimizations need to take that into account (eg. parents
> generally need to be accessible when their children are resumed and so on).

A device's parent is informed when the runtime PM status of the device
is changed via pm_runtime_force_suspend|resume(), as those call
pm_runtime_set_suspended|active(). In case that isn't sufficient,
what else is needed? Perhaps you can point me to an example so I can
understand better?

A PCI consumer device will of course have to play by the rules of PCI.

>
> Moreover, the majority of the "other subsystems/middle-layers" you've talked
> about so far don't provide any PM callbacks to be invoked by pm_runtime_force_*,
> so the question is how representative they really are.

That's the point. We know pm_runtime_force_* works nicely for the
trivial middle-layer cases. For the more complex cases, we need
something additional/different.

>
>> >
>> > Quoting again:
>> >
>> > "If you are a middle layer, your role is basically to do PM for a certain
>> > group of devices.  Thus you cannot really do the same in ->suspend or
>> > ->suspend_early and in ->runtime_suspend (because the former generally need to
>> > take device_may_wakeup() into account and the latter doesn't) and you shouldn't
>> > really do the same in ->suspend and ->freeze (because the latter shouldn't
>> > change the device's power state) and so on."
>> >
>> > I have said for multiple times that re-using *driver* callbacks actually makes
>> > sense and the series is for doing that easier in general among other things.
>> >
>> >> I assume you mean that the PM core won't be involved to support this,
>> >> but is that it?
>> >>
>> >> Do you also mean that *all* users of pm_runtime_force_suspend|resume()
>> >> must convert to this new thing, using "driver PM flags", so in the end
>> >> you want to remove pm_runtime_force_suspend|resume()?
>> >>  - Then if so, you must of course consider all cases for how
>> >> pm_runtime_force_suspend|resume() are being deployed currently, else
>> >> existing users can't convert to the "driver PM flags" thing. Have you
>> >> done that in this series?
>> >
>> > Let me turn this around.
>> >
>> > The majority of cases in which pm_runtime_force_* are used *should* be
>> > addressable using the flags introduced here.  Some case in which
>> > pm_runtime_force_* cannot be used should be addressable by these flags
>> > as well.
>>
>> That's sounds really great!
>>
>> >
>> > There may be some cases in which pm_runtime_force_* are used that may
>> > require something more, but I'm not going to worry about that right now.
>>
>> This approach concerns me, because if we in the end realizes that
>> pm_runtime_force_suspend|resume() will be too hard to get rid of, then
>> this series just add yet another generic way of trying to optimize the
>> system sleep path for runtime PM enabled devices.
>
> Which also works for PCI and the ACPI PM domain and that's sort of valuable
> anyway, isn't it?

Indeed it is! I am definitely open to improve the situation for ACPI and PCI.

Seems like I may have given the wrong impression about that.

>
> For the record, I don't think it will be too hard to get rid of
> pm_runtime_force_suspend|resume(), although that may take quite some time.
>
>> So then we would end up having to support the "direct_complete" path,
>> the "driver PM flags" and cases where
>> pm_runtime_force_suspend|resume() is used. No, that just isn't good
>> enough to me. That will just lead to similar scenarios as we had in
>> the i2c designware driver.
>
> Frankly, this sounds like staging for indefinite blocking of changes in
> this area on non-technical grounds.  I hope that it isn't the case ...
>
>> If we decide to go with these new "driver PM flags", I want to make
>> sure, as long as possible, that we can remove both the
>> "direct_complete" path support from the PM core as well as removing
>> the pm_runtime_force_suspend|resume() helpers.
>
> We'll see.
>
>> >
>> > I'll take care of that when I'll be removing pm_runtime_force_*, which I'm
>> > not doing here.
>>
>> Of course I am fine with that we postpone doing the actual converting
>> of drivers etc from this series, although as stated above, let's sure
>> we *can* do it by using the "driver PM flags".
>
> There clearly are use cases that benefit from this series and I don't see
> any alternatives covering them, including both direct-complete and the
> pm_runtime_force* approach, so I'm not buying this "let's make sure
> it can cover all possible use cases that exist" argumentation.

Alright, let me re-phrase my take on this.

Because you stated that you plan to remove pm_runtime_force_*
eventually, I think you need to put up some valid reasons why
(I consider that done), but more importantly, you need to offer an
alternative solution that can replace it. Otherwise such a statement
can easily be misinterpreted. My point is, the "driver PM flags" do
*not* offer a full alternative solution; they may in the future or
they may not.

So, to conclude from my side, I don't have any major objections to
going forward with the "driver PM flags", especially with the goal of
improving the situation for PCI and ACPI. Down the road, we can then
*try* to make it replace pm_runtime_force_* and the "direct_complete
path".

Hopefully that makes it more clear.

[...]

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-18 10:24         ` Rafael J. Wysocki
@ 2017-10-18 12:34           ` Ulf Hansson
  2017-10-18 21:54             ` Rafael J. Wysocki
  0 siblings, 1 reply; 135+ messages in thread
From: Ulf Hansson @ 2017-10-18 12:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

[...]

>> Are there any major reasons why the appended patch (obviously untested) won't
>> work, then?
>
> OK, there is a reason, which is the optimizations bundled into
> pm_runtime_force_*, because (a) the device may be left in runtime suspend
> by them (in which case amba_pm_suspend_late() in my patch should not run)
> and (b) pm_runtime_force_resume() may decide to leave it suspended (in which
> case amba_pm_resume_early() in my patch should not run).

Exactly.

>
> [BTW, the "leave the device suspended" optimization in pm_runtime_force_*
> is potentially problematic too, because it requires the children to do
> the right thing, which effectively means that their drivers need to use
> pm_runtime_force_* too, but what if they don't want to reuse their
> runtime PM callbacks for system-wide PM?]

Deployment of pm_runtime_force_suspend() should generally be done for
child devices first.

If for some reason that isn't the case, it's expected that the call to
pm_runtime_set_suspended(), invoked from pm_runtime_force_suspend()
for the parent, should fail and thus abort system suspend.
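
To illustrate the check described above, here is a small userspace toy
model. It is only a sketch and not the kernel's code: struct toy_dev,
toy_add_child() and toy_set_suspended() are names made up for this
example, and the only behaviour taken from the description is that
marking a parent as suspended fails with -EBUSY while it still has an
active child.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the runtime PM bits of a device. */
struct toy_dev {
	struct toy_dev *parent;
	bool suspended;        /* models the runtime PM status */
	bool ignore_children;  /* parent does not care about child status */
	int active_children;   /* children currently marked active */
};

/* Attach an active child to a parent and account for it. */
static void toy_add_child(struct toy_dev *parent, struct toy_dev *child)
{
	assert(parent != NULL && child != NULL);
	child->parent = parent;
	child->suspended = false;
	parent->active_children++;
}

/*
 * Mark a device as suspended.  Refuse with -EBUSY if it still has
 * active children, unless it has been told to ignore them.  On
 * success, the parent loses one active child.
 */
static int toy_set_suspended(struct toy_dev *dev)
{
	assert(dev != NULL);
	if (dev->suspended)
		return 0;
	if (!dev->ignore_children && dev->active_children > 0)
		return -EBUSY;
	dev->suspended = true;
	if (dev->parent && dev->parent->active_children > 0)
		dev->parent->active_children--;
	return 0;
}
```

In this model, suspending the parent before the child returns -EBUSY,
while suspending the child first lets the parent follow, which is the
ordering constraint mentioned above.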

>
> Honestly, I don't like the way this is designed.  IMO, it would be better
> to do the optimizations and all in the bus type middle-layer code instead
> of expecting drivers to use pm_runtime_force_* as their system-wide PM
> callbacks (and that expectation should at least be documented, which I'm
> not sure is the case now).  But whatever.
>
> It all should work the way it does now without pm_runtime_force_* if (a) the
> bus type's PM callbacks are changed like in the last patch and the drivers
> (b) point their system suspend callbacks to the runtime PM callback routines
> and (c) set DPM_FLAG_SMART_SUSPEND and DPM_FLAG_LEAVE_SUSPENDED for the
> devices (if they need to do the PM in ->suspend and ->resume, they may set
> DPM_FLAG_AVOID_RPM too).
>
> And if you see a reason why that won't work, please let me know.

I will have a look and try out the series using my local "runtime PM
test driver".

I'll get back to you with an update on this.

Kind regards
Uffe


* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-18 11:57         ` Ulf Hansson
@ 2017-10-18 13:00           ` Rafael J. Wysocki
  2017-10-18 14:11             ` Ulf Hansson
  0 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-18 13:00 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On Wednesday, October 18, 2017 1:57:52 PM CEST Ulf Hansson wrote:
> On 18 October 2017 at 02:39, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > On Tuesday, October 17, 2017 9:41:16 PM CEST Ulf Hansson wrote:
> >
> > [cut]
> >
> >> >
> >> >> deploying this and from a middle layer point of view, all the trivial
> >> >> cases supports this.
> >> >
> >> > These functions are wrong, however, because they attempt to reuse the
> >> > whole callback *path* instead of just reusing driver callbacks.  The
> >> > *only* reason why it all "works" is because there are no middle layer
> >> > callbacks involved in that now.
> >> >
> >> > If you changed them to reuse driver callbacks only today, nothing would break
> >> > AFAICS.
> >>
> >> Yes, it would.
> >>
> >> First, for example, the amba bus is responsible for the amba bus
> >> clock, but relies on drivers to gate/ungate it during system sleep. In
> >> case the amba drivers don't use the pm_runtime_force_suspend|resume(),
> >> it will explicitly have to start manage the clock during system sleep
> >> themselves. Leading to open coding.
> >
> > Well, I suspected that something like this would surface. ;-)
> >
> > Are there any major reasons why the appended patch (obviously untested) won't
> > work, then?
> 
> Let me comment on the code, instead of here...
> 
> ...just realized your second reply, so let me reply to that instead
> regarding the patch.
> 
> >
> >> Second, it will introduce a regression in behavior for all users of
> >> pm_runtime_force_suspend|resume(), especially during system resume as
> >> the driver may then end up resuming the device even in case it isn't
> >> needed.
> >
> > How so?
> >
> > I'm talking about a change like in the appended patch, where
> > pm_runtime_force_* simply invoke driver callbacks directly.  What is
> > skipped there is middle-layer stuff which is empty anyway in all cases
> > except for AMBA (if that's all what is lurking below the surface), so
> > I don't quite see how the failure will happen.
> 
> I am afraid changing pm_runtime_force* to only call driver callbacks
> may become fragile. Let me elaborate.
> 
> The reason why pm_runtime_force_* needs to respects the hierarchy of
> the RPM callbacks, is because otherwise it can't safely update the
> runtime PM status of the device.

I'm not sure I follow this requirement.  Why is that so?

> And updating the runtime PM status of
> the device is required to manage the optimized behavior during system
> resume (avoiding to unnecessary resume devices).

Well, OK.  The runtime PM status of the device after system resume should
better reflect its physical state.

[The physical state of the device may not be under the control of the
kernel in some cases, like in S3 resume on some systems that reset
devices in the firmware and so on, but let's set that aside.]

However, the runtime PM status of the device may still reflect its state
if, say, a ->resume_early of the middle layer is called during resume along
with a driver's ->runtime_resume.  That can still produce the right state
of the device; it all depends on the middle layer.

On the other hand, as I said before, using a middle-layer ->runtime_suspend
during a system sleep transition may be outright incorrect, say if device
wakeup settings need to be adjusted by the middle layer (which is the
case for some of them).
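
As a rough illustration of the wakeup point, consider the following
userspace toy model.  Everything in it (struct toy_wake_dev and the two
toy_ml_* callbacks) is hypothetical, and the assumption that runtime
suspend arms wakeup whenever the hardware supports it is a
simplification made for this sketch.  The only point is that the
system suspend path has to consult a device_may_wakeup()-like setting
while the runtime PM path doesn't.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state used only by this sketch. */
struct toy_wake_dev {
	bool wakeup_capable;   /* hardware can signal wakeup */
	bool may_wakeup;       /* system wakeup permitted for the device */
	bool wakeup_armed;     /* result of the last suspend callback */
	bool low_power;
};

/* Middle-layer runtime suspend: arm wakeup whenever it is supported. */
static int toy_ml_runtime_suspend(struct toy_wake_dev *dev)
{
	assert(dev != NULL);
	dev->wakeup_armed = dev->wakeup_capable;
	dev->low_power = true;
	return 0;
}

/* Middle-layer system suspend: arm wakeup only if it is also permitted. */
static int toy_ml_suspend_late(struct toy_wake_dev *dev)
{
	assert(dev != NULL);
	dev->wakeup_armed = dev->wakeup_capable && dev->may_wakeup;
	dev->low_power = true;
	return 0;
}
```

For a device that is wakeup-capable but not allowed to wake up the
system, the two callbacks leave it in different wakeup configurations,
so running the runtime one during a system sleep transition would give
the wrong result in this model.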

Of course, if the middle layer expects the driver to point its
system-wide PM callbacks to pm_runtime_force_*, then that's how it goes,
but the drivers working with this particular middle layer generally
won't work with other middle layers and may interact incorrectly
with parents and/or children using the other middle layers.

I guess the problem boils down to having a common set of expectations
on the driver side and on the middle layer side allowing different
combinations of these to work together.

> Besides the AMBA case, I also realized that we are dealing with PM
> clocks in the genpd case. For this, genpd relies on the that runtime
> PM status of the device properly reflects the state of the HW, during
> system-wide PM.
> 
> In other words, if the driver would change the runtime PM status of
> the device, without respecting the hierarchy of the runtime PM
> callbacks, it would lead to that genpd starts taking wrong decisions
> while managing the PM clocks during system-wide PM. So in case you
> intend to change pm_runtime_force_* this needs to be addressed too.

I've just looked at the genpd code and quite frankly I'm not sure how this
works, but I'll figure this out. :-)

> >
> >> I believe I have explained why, also several times by now -
> >> and that's also how far you could take the i2c designware driver at
> >> this point.
> >>
> >> That said, I assume the second part may be addressed in this series,
> >> if these drivers convert to use the "driver PM flags", right?
> >>
> >> However, what about the first case? Is some open coding needed or your
> >> think the amba driver can instruct the amba bus via the "driver PM
> >> flags"?
> >
> > With the appended patch applied things should work for AMBA like for
> > any other bus type implementing PM, so I don't see why not.
> >
> >> >
> >> >> Like the spi bus, i2c bus, amba bus, platform
> >> >> bus, genpd, etc. There are no changes needed to continue to support
> >> >> this option, if you see what I mean.
> >> >
> >> > For the time being, nothing changes in that respect, but eventually I'd
> >> > prefer the pm_runtime_force_* things to go away, frankly.
> >>
> >> Okay, thanks for that clear statement!
> >>
> >> >
> >> >> So, when you say that re-using runtime PM callbacks for system-wide PM
> >> >> isn't going to happen, can you please elaborate what you mean?
> >> >
> >> > I didn't mean "reusing runtime PM callbacks for system-wide PM" overall, but
> >> > reusing *middle-layer* runtime PM callbacks for system-wide PM.  That is the
> >> > bogus part.
> >>
> >> I think we have discussed this several times, but the arguments you
> >> have put forward, explaining *why* haven't yet convinced me.
> >
> > Well, sorry about that.  I would like to be able to explain my point to you so
> > that you understand my perspective, but if that's not working, that's not a
> > sufficient reason for me to give up.
> >
> > I'm just refusing to maintain code that I don't agree with in the long run.
> >
> >> In principle what you have been saying is that it's a "layering
> >> violation" to use pm_runtime_force_suspend|resume() from driver's
> >> system sleep callbacks, but on the other hand you think using
> >> pm_runtime_get*  and friends is okay!?
> >
> > Not unconditionally, which would be fair to mention.
> >
> > Only if it is called in ->prepare or as the first thing in a ->suspend
> > callback.  Later than that is broken too in principle.
> >
> >> That makes little sense to me, because it's the same "layering
> >> violation" that is done for both cases.
> >
> > The "layering violation" is all about things possibly occurring in a
> > wrong order.  For example, say a middle-layer ->runtime_suspend is
> > called via pm_runtime_force_suspend() which in turn is called from
> > middle-layer ->suspend_late as a driver callback.  If the ->runtime_suspend
> > > does anything significant to the device, then executing the remaining part of
> > > ->suspend_late will almost certainly break things, more or less.
> >
> > That is not a concern with a middle-layer ->runtime_resume running
> > *before* a middle-layer ->suspend (or any subsequent callbacks) does
> > anything significant to the device.
> >
> > Is there anything in the above which is not clear enough?
> >
> >> Moreover, you have been explaining that re-using runtime PM callbacks
> >> for PCI doesn't work. Then my question is, why should a limitation of
> >> the PCI subsystem put constraints on the behavior for all other
> >> subsystems/middle-layers?
> >
> > Because they aren't just PCI subsystem limitations only.  The need to handle
> > wakeup setup differently for runtime PM and system sleep is not PCI-specific.
> > The need to handle suspend and hibernation differently isn't too.
> >
> > Those things may be more obvious in PCI, but they are generic rather than
> > special.
> 
> Absolutely agree about the different wake-up settings. However, these
> issues can be addressed also when using pm_runtime_force_*, at least
> in general, but then not for PCI.

Well, not for the ACPI PM domain either.

In general, not if the wakeup settings are adjusted by the middle layer.

> Regarding hibernation, honestly that's not really my area of
> expertise. Although, I assume the middle-layer and driver can treat
> that as a separate case, so if it's not suitable to use
> pm_runtime_force* for that case, then they shouldn't do it.

Well, agreed.

In some simple cases, though, driver callbacks can be reused for hibernation
too, so it would be good to have a common way to do that too, IMO.

> >
> > Also, quite so often other middle layers interact with PCI directly or
> > indirectly (eg. a platform device may be a child or a consumer of a PCI
> > device) and some optimizations need to take that into account (eg. parents
> > > generally need to be accessible when their children are resumed and so on).
> 
> A device's parent becomes informed when changing the runtime PM status
> of the device via pm_runtime_force_suspend|resume(), as those calls
> pm_runtime_set_suspended|active().

This requires the parent driver or middle layer to look at the reference
counter and understand it the same way as pm_runtime_force_*.

> In case that isn't that sufficient, what else is needed? Perhaps you can
> point me to an example so I can understand better?

Say you want to leave the parent suspended after system resume, but the
child drivers use pm_runtime_force_suspend|resume().  The parent would then
need to use pm_runtime_force_suspend|resume() too, no?
 
> For a PCI consumer device those will of course have to play by the rules of PCI.
> 
> >
> > Moreover, the majority of the "other subsystems/middle-layers" you've talked
> > about so far don't provide any PM callbacks to be invoked by pm_runtime_force_*,
> > so question is how representative they really are.
> 
> That's the point. We know pm_runtime_force_* works nicely for the
> trivial middle-layer cases.

In which cases the middle-layer callbacks don't exist, so it's just like
reusing driver callbacks directly. :-)

> For the more complex cases, we need something additional/different.

Something different.

But overall, as I said, this is about common expectations.

Today, some middle layers expect drivers to point their callback pointers
to the same routine in order to reuse it (PCI, ACPI bus type), some of them
expect pm_runtime_force_suspend|resume() to be used (AMBA, maybe genpd),
and some of them have no expectations at all.

There needs to be a common ground in that area for drivers to be able to
work with different middle layers.

> >
> >> >
> >> > Quoting again:
> >> >
> >> > "If you are a middle layer, your role is basically to do PM for a certain
> >> > group of devices.  Thus you cannot really do the same in ->suspend or
> >> > ->suspend_early and in ->runtime_suspend (because the former generally need to
> >> > take device_may_wakeup() into account and the latter doesn't) and you shouldn't
> >> > really do the same in ->suspend and ->freeze (because the latter shouldn't
> >> > change the device's power state) and so on."
> >> >
> >> > I have said for multiple times that re-using *driver* callbacks actually makes
> >> > sense and the series is for doing that easier in general among other things.
> >> >
> >> >> I assume you mean that the PM core won't be involved to support this,
> >> >> but is that it?
> >> >>
> >> >> Do you also mean that *all* users of pm_runtime_force_suspend|resume()
> >> >> must convert to this new thing, using "driver PM flags", so in the end
> >> >> you want to remove pm_runtime_force_suspend|resume()?
> >> >>  - Then if so, you must of course consider all cases for how
> >> >> pm_runtime_force_suspend|resume() are being deployed currently, else
> >> >> existing users can't convert to the "driver PM flags" thing. Have you
> >> >> done that in this series?
> >> >
> >> > Let me turn this around.
> >> >
> >> > The majority of cases in which pm_runtime_force_* are used *should* be
> >> > addressable using the flags introduced here.  Some case in which
> >> > pm_runtime_force_* cannot be used should be addressable by these flags
> >> > as well.
> >>
> >> That's sounds really great!
> >>
> >> >
> >> > There may be some cases in which pm_runtime_force_* are used that may
> >> > require something more, but I'm not going to worry about that right now.
> >>
> >> This approach concerns me, because if we in the end realizes that
> >> pm_runtime_force_suspend|resume() will be too hard to get rid of, then
> >> this series just add yet another generic way of trying to optimize the
> >> system sleep path for runtime PM enabled devices.
> >
> > Which also works for PCI and the ACPI PM domain and that's sort of valuable
> > anyway, isn't it?
> 
> Indeed it is! I am definitely open to improve the situation for ACPI and PCI.
> 
> Seems like I may have given the wrong impression about that.
> 
> >
> > For the record, I don't think it will be too hard to get rid of
> > pm_runtime_force_suspend|resume(), although that may take quite some time.
> >
> >> So then we would end up having to support the "direct_complete" path,
> >> the "driver PM flags" and cases where
> >> pm_runtime_force_suspend|resume() is used. No, that just isn't good
> >> enough to me. That will just lead to similar scenarios as we had in
> >> the i2c designware driver.
> >
> > Frankly, this sounds like staging for indefinite blocking of changes in
> > this area on non-technical grounds.  I hope that it isn't the case ...
> >
> >> If we decide to go with these new "driver PM flags", I want to make
> >> sure, as long as possible, that we can remove both the
> >> "direct_complete" path support from the PM core as well as removing
> >> the pm_runtime_force_suspend|resume() helpers.
> >
> > We'll see.
> >
> >> >
> >> > I'll take care of that when I'll be removing pm_runtime_force_*, which I'm
> >> > not doing here.
> >>
> >> Of course I am fine with that we postpone doing the actual converting
> >> of drivers etc from this series, although as stated above, let's sure
> >> we *can* do it by using the "driver PM flags".
> >
> > There clearly are use cases that benefit from this series and I don't see
> > any alternatives covering them, including both direct-complete and the
> > pm_runtime_force* approach, so I'm not buying this "let's make sure
> > it can cover all possible use cases that exist" argumentation.
> 
> Alright, let me re-phrase my take on this.
> 
> Because you stated that you plan to remove pm_runtime_force_*
> eventually, then I think you need to put up some valid reasons of why
> (I consider that done), but more importantly, you need to offer an
> alternative solution that can replace it. Else such that statement can
> easily become wrong interpreted. My point is, the "driver PM flags" do
> *not* offers a full alternative solution, it may do in the future or
> it may not.
> 
> So, to conclude from my side, I don't have any major objections to
> going forward with the "driver PM flags", especially with the goal of
> improving the situation for PCI and ACPI. Down the road, we can then
> *try* to make it replace pm_runtime_force_* and the "direct_complete
> path".
> 
> Hopefully that makes it more clear.

Yes, it does, thank you!

Rafael


* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-18 13:00           ` Rafael J. Wysocki
@ 2017-10-18 14:11             ` Ulf Hansson
  2017-10-18 19:45               ` Grygorii Strashko
  2017-10-18 22:12               ` Rafael J. Wysocki
  0 siblings, 2 replies; 135+ messages in thread
From: Ulf Hansson @ 2017-10-18 14:11 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

[...]

>>
>> The reason why pm_runtime_force_* needs to respects the hierarchy of
>> the RPM callbacks, is because otherwise it can't safely update the
>> runtime PM status of the device.
>
> I'm not sure I follow this requirement.  Why is that so?

If the PM domain controls some resources for the device in its RPM
callbacks and the driver controls some other resources in its RPM
callbacks, then these resources need to be managed together.

This follows the behavior of a regular call to pm_runtime_get|put(),
which triggers the RPM callbacks to be invoked.

>
>> And updating the runtime PM status of
>> the device is required to manage the optimized behavior during system
>> resume (avoiding to unnecessary resume devices).
>
> Well, OK.  The runtime PM status of the device after system resume should
> better reflect its physical state.
>
> [The physical state of the device may not be under the control of the
> kernel in some cases, like in S3 resume on some systems that reset
> devices in the firmware and so on, but let's set that aside.]
>
> However, the runtime PM status of the device may still reflect its state
> if, say, a ->resume_early of the middle layer is called during resume along
> with a driver's ->runtime_resume.  That still can produce the right state
> of the device and all depends on the middle layer.
>
> On the other hand, as I said before, using a middle-layer ->runtime_suspend
> during a system sleep transition may be outright incorrect, say if device
> wakeup settings need to be adjusted by the middle layer (which is the
> case for some of them).
>
> Of course, if the middle layer expects the driver to point its
> system-wide PM callbacks to pm_runtime_force_*, then that's how it goes,
> but the drivers working with this particular middle layer generally
> won't work with other middle layers and may interact incorrectly
> with parents and/or children using the other middle layers.
>
> I guess the problem boils down to having a common set of expectations
> on the driver side and on the middle layer side allowing different
> combinations of these to work together.

Yes!

>
>> Besides the AMBA case, I also realized that we are dealing with PM
>> clocks in the genpd case. For this, genpd relies on the that runtime
>> PM status of the device properly reflects the state of the HW, during
>> system-wide PM.
>>
>> In other words, if the driver would change the runtime PM status of
>> the device, without respecting the hierarchy of the runtime PM
>> callbacks, it would lead to that genpd starts taking wrong decisions
>> while managing the PM clocks during system-wide PM. So in case you
>> intend to change pm_runtime_force_* this needs to be addressed too.
>
> I've just looked at the genpd code and quite frankly I'm not sure how this
> works, but I'll figure this out. :-)

You may think of it as genpd's RPM callback controlling some device
clocks, while the driver controls some other device resources (pinctrl
for example) from its RPM callback.

These resources need to be managed together, similar to what I described
above.
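
A userspace toy model may make this more concrete.  It is only a sketch
under stated assumptions and does not reflect genpd's actual code: the
struct, the field names and all toy_* functions are invented for the
example.  The domain-level callback owns a clock, the driver callback
owns the pins, and the runtime PM status is only accurate if both are
handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device with one domain-owned and one driver-owned resource. */
struct toy_rpm_dev {
	bool clock_on;         /* resource owned by the PM domain */
	bool pins_active;      /* resource owned by the driver */
	bool rpm_suspended;    /* runtime PM status */
	int (*drv_runtime_suspend)(struct toy_rpm_dev *dev);
};

/* Driver callback: only knows about the pins. */
static int toy_drv_runtime_suspend(struct toy_rpm_dev *dev)
{
	assert(dev != NULL);
	dev->pins_active = false;
	return 0;
}

/* Domain callback: run the driver's callback, then gate the clock. */
static int toy_domain_runtime_suspend(struct toy_rpm_dev *dev)
{
	int ret;

	assert(dev != NULL);
	ret = dev->drv_runtime_suspend ? dev->drv_runtime_suspend(dev) : 0;
	if (ret)
		return ret;
	dev->clock_on = false;
	return 0;
}

/* Going through the domain keeps both resources and the status in sync. */
static int toy_suspend_via_domain(struct toy_rpm_dev *dev)
{
	int ret = toy_domain_runtime_suspend(dev);

	if (!ret)
		dev->rpm_suspended = true;
	return ret;
}

/* Driver callback only: the status says "suspended" but the clock runs. */
static int toy_suspend_driver_only(struct toy_rpm_dev *dev)
{
	int ret;

	assert(dev != NULL);
	ret = dev->drv_runtime_suspend ? dev->drv_runtime_suspend(dev) : 0;
	if (!ret)
		dev->rpm_suspended = true;
	return ret;
}
```

In the second variant the status no longer matches the hardware, which
is the kind of mismatch that would mislead a domain making decisions
based on the runtime PM status.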

[...]

>> Absolutely agree about the different wake-up settings. However, these
>> issues can be addressed also when using pm_runtime_force_*, at least
>> in general, but then not for PCI.
>
> Well, not for the ACPI PM domain too.
>
> In general, not if the wakeup settings are adjusted by the middle layer.

Correct!

To use pm_runtime_force* for these cases, one would need some
additional information exchange between the driver and the
middle-layer.

>
>> Regarding hibernation, honestly that's not really my area of
>> expertise. Although, I assume the middle-layer and driver can treat
>> that as a separate case, so if it's not suitable to use
>> pm_runtime_force* for that case, then they shouldn't do it.
>
> Well, agreed.
>
> In some simple cases, though, driver callbacks can be reused for hibernation
> too, so it would be good to have a common way to do that too, IMO.

Okay, that makes sense!

>
>> >
>> > Also, quite so often other middle layers interact with PCI directly or
>> > indirectly (eg. a platform device may be a child or a consumer of a PCI
>> > device) and some optimizations need to take that into account (eg. parents
>> > generally need to be accessible when their children are resumed and so on).
>>
>> A device's parent becomes informed when changing the runtime PM status
>> of the device via pm_runtime_force_suspend|resume(), as those calls
>> pm_runtime_set_suspended|active().
>
> This requires the parent driver or middle layer to look at the reference
> counter and understand it the same way as pm_runtime_force_*.
>
>> In case that isn't that sufficient, what else is needed? Perhaps you can
>> point me to an example so I can understand better?
>
> Say you want to leave the parent suspended after system resume, but the
> child drivers use pm_runtime_force_suspend|resume().  The parent would then
> need to use pm_runtime_force_suspend|resume() too, no?

Actually no.

Currently the other options for "deferring resume" (not using
pm_runtime_force_*) are either using the "direct_complete" path or
something similar to the approach you took for the i2c designware driver.

Both cases should play nicely in combination with a child being managed
by pm_runtime_force_*. That's because resuming can be deferred only
when the parent device is kept runtime suspended during system
suspend.

That means, if the resume of the parent is deferred, so will the
resume of the child.

>
>> For a PCI consumer device those will of course have to play by the rules of PCI.
>>
>> >
>> > Moreover, the majority of the "other subsystems/middle-layers" you've talked
>> > about so far don't provide any PM callbacks to be invoked by pm_runtime_force_*,
>> > so question is how representative they really are.
>>
>> That's the point. We know pm_runtime_force_* works nicely for the
>> trivial middle-layer cases.
>
> In which cases the middle-layer callbacks don't exist, so it's just like
> reusing driver callbacks directly. :-)
>
>> For the more complex cases, we need something additional/different.
>
> Something different.
>
> But overall, as I said, this is about common expectations.
>
> Today, some middle layers expect drivers to point their callback pointers
> to the same routine in order to reuse it (PCI, ACPI bus type), some of them
> expect pm_runtime_force_suspend|resume() to be used (AMBA, maybe genpd),
> and some of them have no expectations at all.
>
> There needs to be a common ground in that area for drivers to be able to
> work with different middle layers.

Yes, reaching that point would be great, we should definitely aim for that!

[...]

Kind regards
Uffe


* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-18 14:11             ` Ulf Hansson
@ 2017-10-18 19:45               ` Grygorii Strashko
  2017-10-18 21:48                 ` Rafael J. Wysocki
  2017-10-18 22:12               ` Rafael J. Wysocki
  1 sibling, 1 reply; 135+ messages in thread
From: Grygorii Strashko @ 2017-10-18 19:45 UTC (permalink / raw)
  To: Ulf Hansson, Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones



On 10/18/2017 09:11 AM, Ulf Hansson wrote:
> [...]
> 
>>>
>>> The reason why pm_runtime_force_* needs to respects the hierarchy of
>>> the RPM callbacks, is because otherwise it can't safely update the
>>> runtime PM status of the device.
>>
>> I'm not sure I follow this requirement.  Why is that so?
> 
> If the PM domain controls some resources for the device in its RPM
> callbacks and the driver controls some other resources in its RPM
> callbacks - then these resources needs to be managed together.
> 
> This follows the behavior of when a regular call to
> pm_runtime_get|put(), triggers the RPM callbacks to be invoked.
> 
>>
>>> And updating the runtime PM status of
>>> the device is required to manage the optimized behavior during system
>>> resume (avoiding to unnecessary resume devices).
>>
>> Well, OK.  The runtime PM status of the device after system resume should
>> better reflect its physical state.
>>
>> [The physical state of the device may not be under the control of the
>> kernel in some cases, like in S3 resume on some systems that reset
>> devices in the firmware and so on, but let's set that aside.]
>>
>> However, the runtime PM status of the device may still reflect its state
>> if, say, a ->resume_early of the middle layer is called during resume along
>> with a driver's ->runtime_resume.  That still can produce the right state
>> of the device and all depends on the middle layer.
>>
>> On the other hand, as I said before, using a middle-layer ->runtime_suspend
>> during a system sleep transition may be outright incorrect, say if device
>> wakeup settings need to be adjusted by the middle layer (which is the
>> case for some of them).
>>
>> Of course, if the middle layer expects the driver to point its
>> system-wide PM callbacks to pm_runtime_force_*, then that's how it goes,
>> but the drivers working with this particular middle layer generally
>> won't work with other middle layers and may interact incorrectly
>> with parents and/or children using the other middle layers.
>>
>> I guess the problem boils down to having a common set of expectations
>> on the driver side and on the middle layer side allowing different
>> combinations of these to work together.
> 
> Yes!
> 
>>
>>> Besides the AMBA case, I also realized that we are dealing with PM
>>> clocks in the genpd case. For this, genpd relies on the that runtime
>>> PM status of the device properly reflects the state of the HW, during
>>> system-wide PM.
>>>
>>> In other words, if the driver would change the runtime PM status of
>>> the device, without respecting the hierarchy of the runtime PM
>>> callbacks, it would lead to that genpd starts taking wrong decisions
>>> while managing the PM clocks during system-wide PM. So in case you
>>> intend to change pm_runtime_force_* this needs to be addressed too.
>>
>> I've just looked at the genpd code and quite frankly I'm not sure how this
>> works, but I'll figure this out. :-)
> 
> You may think of it as genpd's RPM callback controls some device
> clocks, while the driver control some other device resources (pinctrl
> for example) from its RPM callback.
> 
> These resources needs to managed together, similar to as I described above.
> 
> [...]
> 
>>> Absolutely agree about the different wake-up settings. However, these
>>> issues can be addressed also when using pm_runtime_force_*, at least
>>> in general, but then not for PCI.
>>
>> Well, not for the ACPI PM domain too.
>>
>> In general, not if the wakeup settings are adjusted by the middle layer.
> 
> Correct!
> 
> To use pm_runtime_force* for these cases, one would need some
> additional information exchange between the driver and the
> middle-layer.
> 
>>
>>> Regarding hibernation, honestly that's not really my area of
>>> expertise. Although, I assume the middle-layer and driver can treat
>>> that as a separate case, so if it's not suitable to use
>>> pm_runtime_force* for that case, then they shouldn't do it.
>>
>> Well, agreed.
>>
>> In some simple cases, though, driver callbacks can be reused for hibernation
>> too, so it would be good to have a common way to do that too, IMO.
> 
> Okay, that makes sense!
> 
>>
>>>>
>>>> Also, quite so often other middle layers interact with PCI directly or
>>>> indirectly (eg. a platform device may be a child or a consumer of a PCI
>>>> device) and some optimizations need to take that into account (eg. parents
>>>> generally need to be accessible when their children are resumed and so on).
>>>
>>> A device's parent becomes informed when changing the runtime PM status
>>> of the device via pm_runtime_force_suspend|resume(), as those calls
>>> pm_runtime_set_suspended|active().
>>
>> This requires the parent driver or middle layer to look at the reference
>> counter and understand it the same way as pm_runtime_force_*.
>>
>>> In case that isn't that sufficient, what else is needed? Perhaps you can
>>> point me to an example so I can understand better?
>>
>> Say you want to leave the parent suspended after system resume, but the
>> child drivers use pm_runtime_force_suspend|resume().  The parent would then
>> need to use pm_runtime_force_suspend|resume() too, no?
> 
> Actually no.
> 
> Currently the other options of "deferring resume" (not using
> pm_runtime_force_*), is either using the "direct_complete" path or
> similar to the approach you took for the i2c designware driver.
> 
> Both cases should play nicely in combination of a child being managed
> by pm_runtime_force_*. That's because only when the parent device is
> kept runtime suspended during system suspend, resuming can be
> deferred.
> 
> That means, if the resume of the parent is deferred, so will the also
> the resume of the child.
> 
>>
>>> For a PCI consumer device those will of course have to play by the rules of PCI.
>>>
>>>>
>>>> Moreover, the majority of the "other subsystems/middle-layers" you've talked
>>>> about so far don't provide any PM callbacks to be invoked by pm_runtime_force_*,
>>>> so question is how representative they really are.
>>>
>>> That's the point. We know pm_runtime_force_* works nicely for the
>>> trivial middle-layer cases.
>>
>> In which cases the middle-layer callbacks don't exist, so it's just like
>> reusing driver callbacks directly. :-)

I'd like to ask you clarify one point here and provide some info which I hope can be useful - 
what's exactly means  "trivial middle-layer cases"?

Is it when systems use "drivers/base/power/clock_ops.c - Generic clock manipulation PM callbacks"
as dev_pm_domain (arm davinci/keystone), or OMAP device framework struct dev_pm_domain omap_device_pm_domain
 (arm/mach-omap2/omap_device.c) or static const struct dev_pm_ops tegra_aconnect_pm_ops?

If yes, all of the above have runtime PM callbacks.


-- 
regards,
-grygorii


* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-18 19:45               ` Grygorii Strashko
@ 2017-10-18 21:48                 ` Rafael J. Wysocki
  2017-10-19  8:33                   ` Ulf Hansson
  0 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-18 21:48 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Ulf Hansson, Linux PM, Bjorn Helgaas, Alan Stern,
	Greg Kroah-Hartman, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Andy Shevchenko,
	Kevin Hilman, Wolfram Sang, linux-i2c, Lee Jones

On Wednesday, October 18, 2017 9:45:11 PM CEST Grygorii Strashko wrote:
> 
> On 10/18/2017 09:11 AM, Ulf Hansson wrote:

[...]

> >>> That's the point. We know pm_runtime_force_* works nicely for the
> >>> trivial middle-layer cases.
> >>
> >> In which cases the middle-layer callbacks don't exist, so it's just like
> >> reusing driver callbacks directly. :-)
> 
> I'd like to ask you to clarify one point here and provide some info which I hope can be useful:
> what exactly does "trivial middle-layer cases" mean?
> 
> Is it when systems use "drivers/base/power/clock_ops.c - Generic clock
> manipulation PM callbacks" as dev_pm_domain (arm davinci/keystone), or OMAP
> device framework struct dev_pm_domain omap_device_pm_domain
> (arm/mach-omap2/omap_device.c) or static const struct dev_pm_ops
> tegra_aconnect_pm_ops?
> 
> If yes, all of the above have runtime PM callbacks.

Trivial ones don't actually do anything meaningful in their PM callbacks.

Things like the platform bus type, spi bus type, i2c bus type and similar.

If the middle-layer callbacks manipulate devices in a significant way, then
they aren't trivial.

Thanks,
Rafael
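
The distinction above can be sketched in plain user-space C.  This is
only an illustration under stated assumptions: struct demo_device and
the trivial_bus_suspend(), nontrivial_bus_suspend() and
demo_drv_suspend() functions are hypothetical names that do not exist
in the kernel, and "adjusting wakeup" stands in for any significant
device manipulation a middle layer might do on its own.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced device model (not a kernel structure). */
struct demo_device {
	bool powered;
	bool wakeup_armed;
	int (*drv_suspend)(struct demo_device *dev);	/* driver callback, may be NULL */
};

/* "Trivial" middle layer: forwards to the driver and touches nothing else. */
static int trivial_bus_suspend(struct demo_device *dev)
{
	return dev->drv_suspend ? dev->drv_suspend(dev) : 0;
}

/*
 * "Non-trivial" middle layer: besides calling the driver, it changes
 * device state on its own (a wakeup setting in this sketch).
 */
static int nontrivial_bus_suspend(struct demo_device *dev, bool may_wakeup)
{
	int ret = dev->drv_suspend ? dev->drv_suspend(dev) : 0;

	if (ret)
		return ret;

	dev->wakeup_armed = may_wakeup;
	return 0;
}

/* Hypothetical driver callback that just powers the device down. */
static int demo_drv_suspend(struct demo_device *dev)
{
	dev->powered = false;
	return 0;
}
```

With the trivial variant, calling the driver callback directly gives
the same result as going through the middle layer, which is why reusing
driver callbacks is unproblematic there.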


* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-18 12:34           ` Ulf Hansson
@ 2017-10-18 21:54             ` Rafael J. Wysocki
  0 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-18 21:54 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On Wednesday, October 18, 2017 2:34:10 PM CEST Ulf Hansson wrote:
> [...]
> 
> >> Are there any major reasons why the appended patch (obviously untested) won't
> >> work, then?
> >
> > OK, there is a reason, which is the optimizations bundled into
> > pm_runtime_force_*, because (a) the device may be left in runtime suspend
> > by them (in which case amba_pm_suspend_early() in my patch should not run)
> > and (b) pm_runtime_force_resume() may decide to leave it suspended (in which
> > case amba_pm_suspend_late() in my patch should not run).
> 
> Exactly.
> 
> >
> > [BTW, the "leave the device suspended" optimization in pm_runtime_force_*
> > is potentially problematic too, because it requires the children to do
> > the right thing, which effectively means that their drivers need to use
> > pm_runtime_force_* too, but what if they don't want to reuse their
> > runtime PM callbacks for system-wide PM?]
> 
> Deployment of pm_runtime_force_suspend() should generally be done for
> children devices first.
> 
> If for some reason that isn't the case, it's expected that the call to
> pm_runtime_set_suspended() invoked from pm_runtime_force_suspend(),
> for the parent, should fail and thus abort system suspend.

Well, generally what about drivers that need to do something significantly
different for system suspend and runtime PM?  The whole picture seems to be
falling apart if one of these is involved.

> >
> > Honestly, I don't like the way this is designed.  IMO, it would be better
> > to do the optimizations and all in the bus type middle-layer code instead
> > of expecting drivers to use pm_runtime_force_* as their system-wide PM
> > callbacks (and that expectation should at least be documented, which I'm
> > not sure is the case now).  But whatever.
> >
> > It all should work the way it does now without pm_runtime_force_* if (a) the
> > bus type's PM callbacks are changed like in the last patch and the drivers
> > (b) point their system suspend callbacks to the runtime PM callback routines
> > and (c) set DPM_FLAG_SMART_SUSPEND and DPM_FLAG_LEAVE_SUSPENDED for the
> > devices (if they need to do the PM in ->suspend and ->resume, they may set
> > DPM_FLAG_AVOID_RPM too).
> >
> > And if you see a reason why that won't work, please let me know.
> 
> I will have look and try out the series by using my local "runtime PM
> test driver".
> 
> I get back to you with an update on this.

OK, thanks!


* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-18 14:11             ` Ulf Hansson
  2017-10-18 19:45               ` Grygorii Strashko
@ 2017-10-18 22:12               ` Rafael J. Wysocki
  2017-10-19 12:21                 ` Ulf Hansson
  1 sibling, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-18 22:12 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On Wednesday, October 18, 2017 4:11:33 PM CEST Ulf Hansson wrote:
> [...]
> 
> >>
> >> The reason why pm_runtime_force_* needs to respect the hierarchy of
> >> the RPM callbacks, is because otherwise it can't safely update the
> >> runtime PM status of the device.
> >
> > I'm not sure I follow this requirement.  Why is that so?
> 
> If the PM domain controls some resources for the device in its RPM
> callbacks and the driver controls some other resources in its RPM
> callbacks - then these resources needs to be managed together.

Right, but that doesn't automatically make it necessary to use runtime PM
callbacks in the middle layer.  Its system-wide PM callbacks may be
suitable for that just fine.

That is, at least in some cases, you can combine ->runtime_suspend from a
driver and ->suspend_late from a middle layer with no problems, for example.

That's why some middle layers allow drivers to point ->suspend_late and
->runtime_suspend to the same routine if they want to reuse that code.
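
As a rough user-space sketch of that kind of reuse (not kernel code):
struct demo_device, struct demo_pm_ops, the foo_* routines and
demo_bus_suspend_late() are all hypothetical names, and the policy of
skipping the driver routine for a device that is already powered down
is just one assumption about what a middle layer could do.

```c
#include <stdbool.h>

/* Hypothetical device state, reduced to what the example needs. */
struct demo_device {
	bool powered;
	int power_down_calls;
};

/* Hypothetical subset of a PM callback table. */
struct demo_pm_ops {
	int (*suspend_late)(struct demo_device *dev);
	int (*resume_early)(struct demo_device *dev);
	int (*runtime_suspend)(struct demo_device *dev);
	int (*runtime_resume)(struct demo_device *dev);
};

static int foo_power_down(struct demo_device *dev)
{
	dev->powered = false;
	dev->power_down_calls++;
	return 0;
}

static int foo_power_up(struct demo_device *dev)
{
	dev->powered = true;
	return 0;
}

/* One routine serves both the system-wide and the runtime PM slot. */
static const struct demo_pm_ops foo_pm_ops = {
	.suspend_late	 = foo_power_down,
	.runtime_suspend = foo_power_down,
	.resume_early	 = foo_power_up,
	.runtime_resume	 = foo_power_up,
};

/*
 * Assumed middle-layer ->suspend_late: invoke the driver's routine
 * only if runtime PM has not powered the device down already.
 */
static int demo_bus_suspend_late(struct demo_device *dev,
				 const struct demo_pm_ops *ops)
{
	if (!dev->powered)
		return 0;

	return ops->suspend_late ? ops->suspend_late(dev) : 0;
}
```

The point of the sketch is that the driver code is shared while the
middle layer still decides when it runs during a system transition.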

> This follows the behavior of when a regular call to
> pm_runtime_get|put(), triggers the RPM callbacks to be invoked.

But (a) it doesn't have to follow it and (b) in some cases it should not
follow it.
 
> >
> >> And updating the runtime PM status of
> >> the device is required to manage the optimized behavior during system
> >> resume (avoiding unnecessary resuming of devices).
> >
> > Well, OK.  The runtime PM status of the device after system resume should
> > better reflect its physical state.
> >
> > [The physical state of the device may not be under the control of the
> > kernel in some cases, like in S3 resume on some systems that reset
> > devices in the firmware and so on, but let's set that aside.]
> >
> > However, the runtime PM status of the device may still reflect its state
> > if, say, a ->resume_early of the middle layer is called during resume along
> > with a driver's ->runtime_resume.  That still can produce the right state
> > of the device and all depends on the middle layer.
> >
> > On the other hand, as I said before, using a middle-layer ->runtime_suspend
> > during a system sleep transition may be outright incorrect, say if device
> > wakeup settings need to be adjusted by the middle layer (which is the
> > case for some of them).
> >
> > Of course, if the middle layer expects the driver to point its
> > system-wide PM callbacks to pm_runtime_force_*, then that's how it goes,
> > but the drivers working with this particular middle layer generally
> > won't work with other middle layers and may interact incorrectly
> > with parents and/or children using the other middle layers.
> >
> > I guess the problem boils down to having a common set of expectations
> > on the driver side and on the middle layer side allowing different
> > combinations of these to work together.
> 
> Yes!
> 
> >
> >> Besides the AMBA case, I also realized that we are dealing with PM
> >> clocks in the genpd case. For this, genpd relies on the fact that the runtime
> >> PM status of the device properly reflects the state of the HW, during
> >> system-wide PM.
> >>
> >> In other words, if the driver would change the runtime PM status of
> >> the device, without respecting the hierarchy of the runtime PM
> >> callbacks, it would lead to that genpd starts taking wrong decisions
> >> while managing the PM clocks during system-wide PM. So in case you
> >> intend to change pm_runtime_force_* this needs to be addressed too.
> >
> > I've just looked at the genpd code and quite frankly I'm not sure how this
> > works, but I'll figure this out. :-)
> 
> You may think of it as genpd's RPM callback controls some device
> clocks, while the driver control some other device resources (pinctrl
> for example) from its RPM callback.
> 
> These resources need to be managed together, similar to what I described above.

Which, again, doesn't mean that runtime PM callbacks from the middle layer
have to be used for that.

> [...]
> 
> >> Absolutely agree about the different wake-up settings. However, these
> >> issues can be addressed also when using pm_runtime_force_*, at least
> >> in general, but then not for PCI.
> >
> > Well, not for the ACPI PM domain too.
> >
> > In general, not if the wakeup settings are adjusted by the middle layer.
> 
> Correct!
> 
> To use pm_runtime_force* for these cases, one would need some
> additional information exchange between the driver and the
> middle-layer.

Which pretty much defeats the purpose of the wrappers, doesn't it?

> >
> >> Regarding hibernation, honestly that's not really my area of
> >> expertise. Although, I assume the middle-layer and driver can treat
> >> that as a separate case, so if it's not suitable to use
> >> pm_runtime_force* for that case, then they shouldn't do it.
> >
> > Well, agreed.
> >
> > In some simple cases, though, driver callbacks can be reused for hibernation
> > too, so it would be good to have a common way to do that too, IMO.
> 
> Okay, that makes sense!
> 
> >
> >> >
> >> > Also, quite so often other middle layers interact with PCI directly or
> >> > indirectly (eg. a platform device may be a child or a consumer of a PCI
> >> > device) and some optimizations need to take that into account (eg. parents
> >> > generally need to be accessible when their children are resumed and so on).
> >>
> >> A device's parent becomes informed when changing the runtime PM status
> >> of the device via pm_runtime_force_suspend|resume(), as those call
> >> pm_runtime_set_suspended|active().
> >
> > This requires the parent driver or middle layer to look at the reference
> > counter and understand it the same way as pm_runtime_force_*.
> >
> >> In case that isn't sufficient, what else is needed? Perhaps you can
> >> point me to an example so I can understand better?
> >
> > Say you want to leave the parent suspended after system resume, but the
> > child drivers use pm_runtime_force_suspend|resume().  The parent would then
> > need to use pm_runtime_force_suspend|resume() too, no?
> 
> Actually no.
> 
> Currently the other options of "deferring resume" (not using
> pm_runtime_force_*), is either using the "direct_complete" path or
> similar to the approach you took for the i2c designware driver.
>
> Both cases should play nicely in combination of a child being managed
> by pm_runtime_force_*. That's because only when the parent device is
> kept runtime suspended during system suspend, resuming can be
> deferred.

And because the parent remains in runtime suspend late enough in the
system suspend path, its children also are guaranteed to be suspended.

But then all of them need to be left in runtime suspend during system
resume too, which is somewhat restrictive, because some drivers may
want their devices to be resumed then.

[BTW, our current documentation recommends resuming devices during
system resume, actually, and gives a list of reasons why. :-)]

> That means, if the resume of the parent is deferred, so will the
> resume of the child.
> 
> >
> >> For a PCI consumer device those will of course have to play by the rules of PCI.
> >>
> >> >
> >> > Moreover, the majority of the "other subsystems/middle-layers" you've talked
> >> > about so far don't provide any PM callbacks to be invoked by pm_runtime_force_*,
> >> > so question is how representative they really are.
> >>
> >> That's the point. We know pm_runtime_force_* works nicely for the
> >> trivial middle-layer cases.
> >
> > In which cases the middle-layer callbacks don't exist, so it's just like
> > reusing driver callbacks directly. :-)
> >
> >> For the more complex cases, we need something additional/different.
> >
> > Something different.
> >
> > But overall, as I said, this is about common expectations.
> >
> > Today, some middle layers expect drivers to point their callback pointers
> > to the same routine in order to reuse it (PCI, ACPI bus type), some of them
> > expect pm_runtime_force_suspend|resume() to be used (AMBA, maybe genpd),
> > and some of them have no expectations at all.
> >
> > There needs to be a common ground in that area for drivers to be able to
> > work with different middle layers.
> 
> Yes, reaching that point would be great, we should definitely aim for that!

Indeed.

Thanks,
Rafael


* [Update][PATCH v2 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
  2017-10-16  1:29 ` [PATCH 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags Rafael J. Wysocki
                     ` (3 preceding siblings ...)
  2017-10-16 20:16   ` Alan Stern
@ 2017-10-18 23:17   ` Rafael J. Wysocki
  2017-10-19  7:33     ` Greg Kroah-Hartman
  2017-10-23 16:37     ` Ulf Hansson
  4 siblings, 2 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-18 23:17 UTC (permalink / raw)
  To: Linux PM, Greg Kroah-Hartman, Lukas Wunner
  Cc: Bjorn Helgaas, Alan Stern, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The motivation for this change is to provide a way to work around
a problem with the direct-complete mechanism used for avoiding
system suspend/resume handling for devices in runtime suspend.

The problem is that some middle layer code (the PCI bus type and
the ACPI PM domain in particular) returns positive values from its
system suspend ->prepare callbacks regardless of whether the driver's
->prepare returns a positive value or 0, which effectively prevents
drivers from being able to control the direct-complete feature.
Some drivers need that control, however, and the PCI bus type has
grown its own flag to deal with this issue, but since it is not
limited to PCI, it is better to address it by adding driver flags at
the core level.

To that end, add a driver_flags field to struct dev_pm_info for flags
that can be set by device drivers at the probe time to inform the PM
core and/or bus types, PM domains and so on about the capabilities and/or
preferences of device drivers.  Also add two static inline helpers
for setting that field and testing it against a given set of flags
and make the driver core clear it automatically on driver remove
and probe failures.

Define and document two PM driver flags related to the direct-
complete feature: NEVER_SKIP and SMART_PREPARE that can be used,
respectively, to indicate to the PM core that the direct-complete
mechanism should never be used for the device and to inform the
middle layer code (bus types, PM domains etc) that it can only
request the PM core to use the direct-complete mechanism for
the device (by returning a positive value from its ->prepare
callback) if it also has been requested by the driver.

While at it, make the core check pm_runtime_suspended() when
setting power.direct_complete so that it doesn't need to be
checked by ->prepare callbacks.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

-> v2: Change the data type for driver_flags to u32 as suggested by Greg
       and fix a couple of documentation typos pointed out by Lukas.

---
 Documentation/driver-api/pm/devices.rst |   14 ++++++++++++++
 Documentation/power/pci.txt             |   19 +++++++++++++++++++
 drivers/acpi/device_pm.c                |    3 +++
 drivers/base/dd.c                       |    2 ++
 drivers/base/power/main.c               |    4 +++-
 drivers/pci/pci-driver.c                |    5 ++++-
 include/linux/device.h                  |   10 ++++++++++
 include/linux/pm.h                      |   20 ++++++++++++++++++++
 8 files changed, 75 insertions(+), 2 deletions(-)

Index: linux-pm/include/linux/device.h
===================================================================
--- linux-pm.orig/include/linux/device.h
+++ linux-pm/include/linux/device.h
@@ -1070,6 +1070,16 @@ static inline void dev_pm_syscore_device
 #endif
 }
 
+static inline void dev_pm_set_driver_flags(struct device *dev, u32 flags)
+{
+	dev->power.driver_flags = flags;
+}
+
+static inline bool dev_pm_test_driver_flags(struct device *dev, u32 flags)
+{
+	return !!(dev->power.driver_flags & flags);
+}
+
 static inline void device_lock(struct device *dev)
 {
 	mutex_lock(&dev->mutex);
Index: linux-pm/include/linux/pm.h
===================================================================
--- linux-pm.orig/include/linux/pm.h
+++ linux-pm/include/linux/pm.h
@@ -550,6 +550,25 @@ struct pm_subsys_data {
 #endif
 };
 
+/*
+ * Driver flags to control system suspend/resume behavior.
+ *
+ * These flags can be set by device drivers at the probe time.  They need not be
+ * cleared by the drivers as the driver core will take care of that.
+ *
+ * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
+ * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
+ *
+ * Setting SMART_PREPARE instructs bus types and PM domains which may want
+ * system suspend/resume callbacks to be skipped for the device to return 0 from
+ * their ->prepare callbacks if the driver's ->prepare callback returns 0 (in
+ * other words, the system suspend/resume callbacks can only be skipped for the
+ * device if its driver doesn't object against that).  This flag has no effect
+ * if NEVER_SKIP is set.
+ */
+#define DPM_FLAG_NEVER_SKIP	BIT(0)
+#define DPM_FLAG_SMART_PREPARE	BIT(1)
+
 struct dev_pm_info {
 	pm_message_t		power_state;
 	unsigned int		can_wakeup:1;
@@ -561,6 +580,7 @@ struct dev_pm_info {
 	bool			is_late_suspended:1;
 	bool			early_init:1;	/* Owned by the PM core */
 	bool			direct_complete:1;	/* Owned by the PM core */
+	u32			driver_flags;
 	spinlock_t		lock;
 #ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
Index: linux-pm/drivers/base/dd.c
===================================================================
--- linux-pm.orig/drivers/base/dd.c
+++ linux-pm/drivers/base/dd.c
@@ -464,6 +464,7 @@ pinctrl_bind_failed:
 	if (dev->pm_domain && dev->pm_domain->dismiss)
 		dev->pm_domain->dismiss(dev);
 	pm_runtime_reinit(dev);
+	dev_pm_set_driver_flags(dev, 0);
 
 	switch (ret) {
 	case -EPROBE_DEFER:
@@ -869,6 +870,7 @@ static void __device_release_driver(stru
 		if (dev->pm_domain && dev->pm_domain->dismiss)
 			dev->pm_domain->dismiss(dev);
 		pm_runtime_reinit(dev);
+		dev_pm_set_driver_flags(dev, 0);
 
 		klist_remove(&dev->p->knode_driver);
 		device_pm_check_callbacks(dev);
Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -1700,7 +1700,9 @@ unlock:
 	 * applies to suspend transitions, however.
 	 */
 	spin_lock_irq(&dev->power.lock);
-	dev->power.direct_complete = ret > 0 && state.event == PM_EVENT_SUSPEND;
+	dev->power.direct_complete = state.event == PM_EVENT_SUSPEND &&
+		pm_runtime_suspended(dev) && ret > 0 &&
+		!dev_pm_test_driver_flags(dev, DPM_FLAG_NEVER_SKIP);
 	spin_unlock_irq(&dev->power.lock);
 	return 0;
 }
Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -682,8 +682,11 @@ static int pci_pm_prepare(struct device
 
 	if (drv && drv->pm && drv->pm->prepare) {
 		int error = drv->pm->prepare(dev);
-		if (error)
+		if (error < 0)
 			return error;
+
+		if (!error && dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_PREPARE))
+			return 0;
 	}
 	return pci_dev_keep_suspended(to_pci_dev(dev));
 }
Index: linux-pm/drivers/acpi/device_pm.c
===================================================================
--- linux-pm.orig/drivers/acpi/device_pm.c
+++ linux-pm/drivers/acpi/device_pm.c
@@ -965,6 +965,9 @@ int acpi_subsys_prepare(struct device *d
 	if (ret < 0)
 		return ret;
 
+	if (!ret && dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_PREPARE))
+		return 0;
+
 	if (!adev || !pm_runtime_suspended(dev))
 		return 0;
 
Index: linux-pm/Documentation/driver-api/pm/devices.rst
===================================================================
--- linux-pm.orig/Documentation/driver-api/pm/devices.rst
+++ linux-pm/Documentation/driver-api/pm/devices.rst
@@ -354,6 +354,20 @@ the phases are: ``prepare``, ``suspend``
 	is because all such devices are initially set to runtime-suspended with
 	runtime PM disabled.
 
+	This feature also can be controlled by device drivers by using the
+	``DPM_FLAG_NEVER_SKIP`` and ``DPM_FLAG_SMART_PREPARE`` driver power
+	management flags.  [Typically, they are set at the time the driver is
+	probed against the device in question by passing them to the
+	:c:func:`dev_pm_set_driver_flags` helper function.]  If the first of
+	these flags is set, the PM core will not apply the direct-complete
+	procedure described above to the given device and, consequently, to any
+	of its ancestors.  The second flag, when set, informs the middle layer
+	code (bus types, device types, PM domains, classes) that it should take
+	the return value of the ``->prepare`` callback provided by the driver
+	into account and it may only return a positive value from its own
+	``->prepare`` callback if the driver's one also has returned a positive
+	value.
+
     2.	The ``->suspend`` methods should quiesce the device to stop it from
 	performing I/O.  They also may save the device registers and put it into
 	the appropriate low-power state, depending on the bus type the device is
Index: linux-pm/Documentation/power/pci.txt
===================================================================
--- linux-pm.orig/Documentation/power/pci.txt
+++ linux-pm/Documentation/power/pci.txt
@@ -961,6 +961,25 @@ dev_pm_ops to indicate that one suspend
 .suspend(), .freeze(), and .poweroff() members and one resume routine is to
 be pointed to by the .resume(), .thaw(), and .restore() members.
 
+3.1.19. Driver Flags for Power Management
+
+The PM core allows device drivers to set flags that influence the handling of
+power management for the devices by the core itself and by middle layer code
+including the PCI bus type.  The flags should be set once at the driver probe
+time with the help of the dev_pm_set_driver_flags() function and they should not
+be updated directly afterwards.
+
+The DPM_FLAG_NEVER_SKIP flag prevents the PM core from using the direct-complete
+mechanism allowing device suspend/resume callbacks to be skipped if the device
+is in runtime suspend when the system suspend starts.  That also affects all of
+the ancestors of the device, so this flag should only be used if absolutely
+necessary.
+
+The DPM_FLAG_SMART_PREPARE flag instructs the PCI bus type to only return a
+positive value from pci_pm_prepare() if the ->prepare callback provided by the
+driver of the device returns a positive value.  That allows the driver to opt
+out from using the direct-complete mechanism dynamically.
+
 3.2. Device Runtime Power Management
 ------------------------------------
 In addition to providing device power management callbacks PCI device drivers
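
For readers following along, the decision logic added by the patch
above can be modeled in plain user-space C.  This is a sketch under
stated assumptions, not the kernel implementation: struct mock_device
and all mock_* and drv_prepare_* functions are hypothetical, the
runtime PM status and the bus-level "keep suspended" decision are
reduced to booleans, and the error path of the real prepare code is
simplified.  Only the flag values and the order of the checks follow
the diff.

```c
#include <stdbool.h>
#include <stdint.h>

#define DPM_FLAG_NEVER_SKIP	(1u << 0)	/* BIT(0) in the patch */
#define DPM_FLAG_SMART_PREPARE	(1u << 1)	/* BIT(1) in the patch */

/* Hypothetical stand-in for the parts of struct device used here. */
struct mock_device {
	uint32_t driver_flags;		/* mirrors dev->power.driver_flags */
	bool runtime_suspended;		/* stands in for pm_runtime_suspended() */
	bool bus_wants_skip;		/* stands in for the bus-level decision */
	int (*drv_prepare)(struct mock_device *dev);	/* driver ->prepare, may be NULL */
};

static void mock_set_driver_flags(struct mock_device *dev, uint32_t flags)
{
	dev->driver_flags = flags;
}

static bool mock_test_driver_flags(const struct mock_device *dev, uint32_t flags)
{
	return !!(dev->driver_flags & flags);
}

/* Modeled on the patched pci_pm_prepare(). */
static int mock_bus_prepare(struct mock_device *dev)
{
	if (dev->drv_prepare) {
		int error = dev->drv_prepare(dev);

		if (error < 0)
			return error;

		/* With SMART_PREPARE, a driver returning 0 vetoes the skip. */
		if (!error && mock_test_driver_flags(dev, DPM_FLAG_SMART_PREPARE))
			return 0;
	}
	return dev->bus_wants_skip ? 1 : 0;
}

/*
 * Modeled on the direct_complete assignment in the core.  A negative
 * return value from ->prepare simply yields false in this sketch.
 */
static bool mock_direct_complete(struct mock_device *dev, bool suspend_event)
{
	int ret = mock_bus_prepare(dev);

	return suspend_event && dev->runtime_suspended && ret > 0 &&
	       !mock_test_driver_flags(dev, DPM_FLAG_NEVER_SKIP);
}

/* Hypothetical driver ->prepare callbacks for trying things out. */
static int drv_prepare_zero(struct mock_device *dev) { (void)dev; return 0; }
static int drv_prepare_one(struct mock_device *dev) { (void)dev; return 1; }
```

In this model, a driver whose ->prepare returns 0 only prevents
direct-complete when it has set DPM_FLAG_SMART_PREPARE, while
DPM_FLAG_NEVER_SKIP prevents it regardless of any return value.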


* Re: [Update][PATCH v2 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
  2017-10-18 23:17   ` [Update][PATCH v2 " Rafael J. Wysocki
@ 2017-10-19  7:33     ` Greg Kroah-Hartman
  2017-10-20 11:11       ` Rafael J. Wysocki
  2017-10-23 16:37     ` Ulf Hansson
  1 sibling, 1 reply; 135+ messages in thread
From: Greg Kroah-Hartman @ 2017-10-19  7:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Lukas Wunner, Bjorn Helgaas, Alan Stern, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Ulf Hansson, Andy Shevchenko, Kevin Hilman, Wolfram Sang,
	linux-i2c, Lee Jones

On Thu, Oct 19, 2017 at 01:17:31AM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> The motivation for this change is to provide a way to work around
> a problem with the direct-complete mechanism used for avoiding
> system suspend/resume handling for devices in runtime suspend.
> 
> The problem is that some middle layer code (the PCI bus type and
> the ACPI PM domain in particular) returns positive values from its
> system suspend ->prepare callbacks regardless of whether the driver's
> ->prepare returns a positive value or 0, which effectively prevents
> drivers from being able to control the direct-complete feature.
> Some drivers need that control, however, and the PCI bus type has
> grown its own flag to deal with this issue, but since it is not
> limited to PCI, it is better to address it by adding driver flags at
> the core level.
> 
> To that end, add a driver_flags field to struct dev_pm_info for flags
> that can be set by device drivers at the probe time to inform the PM
> core and/or bus types, PM domains and so on on the capabilities and/or
> preferences of device drivers.  Also add two static inline helpers
> for setting that field and testing it against a given set of flags
> and make the driver core clear it automatically on driver remove
> and probe failures.
> 
> Define and document two PM driver flags related to the direct-
> complete feature: NEVER_SKIP and SMART_PREPARE that can be used,
> respectively, to indicate to the PM core that the direct-complete
> mechanism should never be used for the device and to inform the
> middle layer code (bus types, PM domains etc) that it can only
> request the PM core to use the direct-complete mechanism for
> the device (by returning a positive value from its ->prepare
> callback) if it also has been requested by the driver.
> 
> While at it, make the core check pm_runtime_suspended() when
> setting power.direct_complete so that it doesn't need to be
> checked by ->prepare callbacks.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-18 21:48                 ` Rafael J. Wysocki
@ 2017-10-19  8:33                   ` Ulf Hansson
  2017-10-19 17:21                     ` Grygorii Strashko
  0 siblings, 1 reply; 135+ messages in thread
From: Ulf Hansson @ 2017-10-19  8:33 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Linux PM, Rafael J. Wysocki, Bjorn Helgaas, Alan Stern,
	Greg Kroah-Hartman, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Andy Shevchenko,
	Kevin Hilman, Wolfram Sang, linux-i2c, Lee Jones

On 18 October 2017 at 23:48, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Wednesday, October 18, 2017 9:45:11 PM CEST Grygorii Strashko wrote:
>>
>> On 10/18/2017 09:11 AM, Ulf Hansson wrote:
>
> [...]
>
>> >>> That's the point. We know pm_runtime_force_* works nicely for the
>> >>> trivial middle-layer cases.
>> >>
>> >> In which cases the middle-layer callbacks don't exist, so it's just like
>> >> reusing driver callbacks directly. :-)
>>
>> I'd like to ask you to clarify one point here and provide some info which I hope can be useful:
>> what exactly does "trivial middle-layer cases" mean?
>>
>> Is it when systems use "drivers/base/power/clock_ops.c - Generic clock
>> manipulation PM callbacks" as dev_pm_domain (arm davinci/keystone), or OMAP
>> device framework struct dev_pm_domain omap_device_pm_domain
>> (arm/mach-omap2/omap_device.c) or static const struct dev_pm_ops
>> tegra_aconnect_pm_ops?
>>
>> If yes, all of the above have runtime PM callbacks.
>
> Trivial ones don't actually do anything meaningful in their PM callbacks.
>
> Things like the platform bus type, spi bus type, i2c bus type and similar.
>
> If the middle-layer callbacks manipulate devices in a significant way, then
> they aren't trivial.

I fully agree with Rafael's description above, but let me also clarify
one more thing.

We have also been discussing PM domains as being trivial and
non-trivial. In some statements I think the PM domain has even been
treated as part of the middle-layer terminology, which may have been a
bit confusing.

In this regard, as we consider genpd to be a trivial PM domain, the
examples you bring up above are to me also examples of trivial PM
domains. Especially because they don't deal with wakeups, as that is
taken care of by the drivers, right!?

Kind regards
Uffe


* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-18 22:12               ` Rafael J. Wysocki
@ 2017-10-19 12:21                 ` Ulf Hansson
  2017-10-19 18:01                   ` Ulf Hansson
  2017-10-20  1:19                   ` Rafael J. Wysocki
  0 siblings, 2 replies; 135+ messages in thread
From: Ulf Hansson @ 2017-10-19 12:21 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On 19 October 2017 at 00:12, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Wednesday, October 18, 2017 4:11:33 PM CEST Ulf Hansson wrote:
>> [...]
>>
>> >>
>> >> The reason why pm_runtime_force_* needs to respect the hierarchy of
>> >> the RPM callbacks, is because otherwise it can't safely update the
>> >> runtime PM status of the device.
>> >
>> > I'm not sure I follow this requirement.  Why is that so?
>>
>> If the PM domain controls some resources for the device in its RPM
>> callbacks and the driver controls some other resources in its RPM
>> callbacks - then these resources needs to be managed together.
>
> Right, but that doesn't automatically make it necessary to use runtime PM
> callbacks in the middle layer.  Its system-wide PM callbacks may be
> suitable for that just fine.
>
> That is, at least in some cases, you can combine ->runtime_suspend from a
> driver and ->suspend_late from a middle layer with no problems, for example.
>
> That's why some middle layers allow drivers to point ->suspend_late and
> ->runtime_suspend to the same routine if they want to reuse that code.
>
>> This follows the behavior of when a regular call to
>> pm_runtime_get|put(), triggers the RPM callbacks to be invoked.
>
> But (a) it doesn't have to follow it and (b) in some cases it should not
> follow it.

Of course you don't explicitly *have to* respect the hierarchy of the
RPM callbacks in pm_runtime_force_*.

However, changing that would require some additional information
exchange between the driver and the middle-layer/PM domain, so as to
tell the middle-layer/PM domain what to do during system-wide PM.
Especially in cases when the driver deals with wakeup, as in those
cases the instructions may change dynamically.

[...]

>> > In general, not if the wakeup settings are adjusted by the middle layer.
>>
>> Correct!
>>
>> To use pm_runtime_force* for these cases, one would need some
>> additional information exchange between the driver and the
>> middle-layer.
>
> Which pretty much defeats the purpose of the wrappers, doesn't it?

Well, no matter if the wrappers are used or not, we need some kind of
information exchange between the driver and the middle-layers/PM
domains.

Anyway, I personally think it's too early to conclude that using the
wrappers may not be useful going forward. At this point, they clearly
help trivial cases remain trivial.

>
>> >
>> >> Regarding hibernation, honestly that's not really my area of
>> >> expertise. Although, I assume the middle-layer and driver can treat
>> >> that as a separate case, so if it's not suitable to use
>> >> pm_runtime_force* for that case, then they shouldn't do it.
>> >
>> > Well, agreed.
>> >
>> > In some simple cases, though, driver callbacks can be reused for hibernation
>> > too, so it would be good to have a common way to do that too, IMO.
>>
>> Okay, that makes sense!
>>
>> >
>> >> >
>> >> > Also, quite so often other middle layers interact with PCI directly or
>> >> > indirectly (eg. a platform device may be a child or a consumer of a PCI
>> >> > device) and some optimizations need to take that into account (eg. parents
>> >> > generally need to be accessible when their childres are resumed and so on).
>> >>
>> >> A device's parent becomes informed when changing the runtime PM status
>> >> of the device via pm_runtime_force_suspend|resume(), as those calls
>> >> pm_runtime_set_suspended|active().
>> >
>> > This requires the parent driver or middle layer to look at the reference
>> > counter and understand it the same way as pm_runtime_force_*.
>> >
>> >> In case that isn't that sufficient, what else is needed? Perhaps you can
>> >> point me to an example so I can understand better?
>> >
>> > Say you want to leave the parent suspended after system resume, but the
>> > child drivers use pm_runtime_force_suspend|resume().  The parent would then
>> > need to use pm_runtime_force_suspend|resume() too, no?
>>
>> Actually no.
>>
>> Currently the other options of "deferring resume" (not using
>> pm_runtime_force_*), is either using the "direct_complete" path or
>> similar to the approach you took for the i2c designware driver.
>>
>> Both cases should play nicely in combination of a child being managed
>> by pm_runtime_force_*. That's because only when the parent device is
>> kept runtime suspended during system suspend, resuming can be
>> deferred.
>
> And because the parent remains in runtime suspend late enough in the
> system suspend path, its children also are guaranteed to be suspended.

Yes.

>
> But then all of them need to be left in runtime suspend during system
> resume too, which is somewhat restrictive, because some drivers may
> want their devices to be resumed then.

Actually, this scenario is also addressed when using pm_runtime_force_*.

The driver for the child would only need to bump the runtime PM usage
count (pm_runtime_get_noresume()) before calling
pm_runtime_force_suspend() at system suspend. That then also
propagates to the parent, so that both the parent and the
child will be resumed when pm_runtime_force_resume() is called for
them.

Of course, if the driver of the parent isn't using pm_runtime_force_*,
we would have to assume that it's always being resumed at system
resume.

As a matter of fact, doesn't this scenario actually indicate that we
do need to involve the runtime PM core (updating RPM status according
to the HW state even during system-wide PM) to really get this right?
It's not enough to only use "driver PM flags"!?

Seems like we need to create a list of all requirements, pitfalls,
good things vs bad things etc. :-)

>
> [BTW, our current documentation recommends resuming devices during
> system resume, actually, and gives a list of reasons why. :-)]

Yes, but that is too easy and, to me, not good enough. :-)

[...]

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-19  8:33                   ` Ulf Hansson
@ 2017-10-19 17:21                     ` Grygorii Strashko
  2017-10-19 18:04                       ` Ulf Hansson
  0 siblings, 1 reply; 135+ messages in thread
From: Grygorii Strashko @ 2017-10-19 17:21 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Linux PM, Rafael J. Wysocki, Bjorn Helgaas, Alan Stern,
	Greg Kroah-Hartman, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Andy Shevchenko,
	Kevin Hilman, Wolfram Sang, linux-i2c, Lee Jones



On 10/19/2017 03:33 AM, Ulf Hansson wrote:
> On 18 October 2017 at 23:48, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> On Wednesday, October 18, 2017 9:45:11 PM CEST Grygorii Strashko wrote:
>>>
>>> On 10/18/2017 09:11 AM, Ulf Hansson wrote:
>>
>> [...]
>>
>>>>>> That's the point. We know pm_runtime_force_* works nicely for the
>>>>>> trivial middle-layer cases.
>>>>>
>>>>> In which cases the middle-layer callbacks don't exist, so it's just like
>>>>> reusing driver callbacks directly. :-)
>>>
>>> I'd like to ask you clarify one point here and provide some info which I hope can be useful -
>>> what's exactly means  "trivial middle-layer cases"?
>>>
>>> Is it when systems use "drivers/base/power/clock_ops.c - Generic clock
>>> manipulation PM callbacks" as dev_pm_domain (arm davinci/keystone), or OMAP
>>> device framework struct dev_pm_domain omap_device_pm_domain
>>> (arm/mach-omap2/omap_device.c) or static const struct dev_pm_ops
>>> tegra_aconnect_pm_ops?
>>>
>>> if yes all above have PM runtime callbacks.
>>
>> Trivial ones don't actually do anything meaningful in their PM callbacks.
>>
>> Things like the platform bus type, spi bus type, i2c bus type and similar.
>>
>> If the middle-layer callbacks manipulate devices in a significant way, then
>> they aren't trivial.
> 
> I fully agree with Rafael's description above, but let me also clarify
> one more thing.
> 
> We have also been discussing PM domains as being trivial and
> non-trivial. In some statements I even think the PM domain has been a
> part the middle-layer terminology, which may have been a bit
> confusing.
> 
> In this regards as we consider genpd being a trivial PM domain, those
> examples your bring up above is too me also examples of trivial PM
> domains. Especially because they don't deal with wakeups, as that is
> taken care of by the drivers, right!?

Not directly. For example, the omap device framework implements a noirq callback
which forcibly disables all devices which are not PM runtime suspended.
While doing this it calls the driver's PM .runtime_suspend(), which may return
a non-zero value, and in this case the device will be left enabled (powered) at
suspend for wake up purposes (see _od_suspend_noirq()).
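
To make the flow above concrete, here is a minimal userspace sketch. It is a
toy model built only from the description in this message, not the code in
omap_device.c; every name in it (toy_device, toy_domain_suspend_noirq() and so
on) is hypothetical, and treating a missing driver callback as success is an
assumption of the sketch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a device; none of these names are kernel APIs. */
struct toy_device {
	bool rpm_suspended;                             /* modelled runtime PM status */
	bool powered;                                   /* power state as seen by the PM domain */
	int (*runtime_suspend)(struct toy_device *dev); /* driver callback, may be NULL */
};

/* Driver callback that lets the PM domain power the device off. */
int toy_rs_ok(struct toy_device *dev)
{
	(void)dev;
	return 0;
}

/* Driver callback that refuses, e.g. because the device must stay on for wakeup. */
int toy_rs_keep_on(struct toy_device *dev)
{
	(void)dev;
	return -EBUSY;
}

/*
 * Model of the noirq step: a device that is still runtime-active is
 * suspended through the driver's runtime_suspend callback. A non-zero
 * return value leaves the device powered, which is how the driver keeps
 * it available as a wakeup source.
 */
void toy_domain_suspend_noirq(struct toy_device *dev)
{
	if (dev->rpm_suspended)
		return; /* already runtime suspended, nothing to do */

	/* Assumption of this sketch: no callback counts as success. */
	if (dev->runtime_suspend && dev->runtime_suspend(dev) != 0)
		return; /* driver refused: leave the device enabled */

	dev->rpm_suspended = true;
	dev->powered = false;
}
```

With toy_rs_ok the device ends up suspended and powered off; with
toy_rs_keep_on it stays active and powered, so the return value of the
driver callback is the only "signal" the PM domain gets in this model.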


-- 
regards,
-grygorii

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-19 12:21                 ` Ulf Hansson
@ 2017-10-19 18:01                   ` Ulf Hansson
  2017-10-20  1:19                   ` Rafael J. Wysocki
  1 sibling, 0 replies; 135+ messages in thread
From: Ulf Hansson @ 2017-10-19 18:01 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

[...]

>>> > Say you want to leave the parent suspended after system resume, but the
>>> > child drivers use pm_runtime_force_suspend|resume().  The parent would then
>>> > need to use pm_runtime_force_suspend|resume() too, no?
>>>
>>> Actually no.
>>>
>>> Currently the other options of "deferring resume" (not using
>>> pm_runtime_force_*), is either using the "direct_complete" path or
>>> similar to the approach you took for the i2c designware driver.
>>>
>>> Both cases should play nicely in combination of a child being managed
>>> by pm_runtime_force_*. That's because only when the parent device is
>>> kept runtime suspended during system suspend, resuming can be
>>> deferred.
>>
>> And because the parent remains in runtime suspend late enough in the
>> system suspend path, its children also are guaranteed to be suspended.
>
> Yes.
>
>>
>> But then all of them need to be left in runtime suspend during system
>> resume too, which is somewhat restrictive, because some drivers may
>> want their devices to be resumed then.
>
> Actually, this scenario is also addressed when using the pm_runtime_force_*.
>
> The driver for the child would only need to bump the runtime PM usage
> count (pm_runtime_get_noresume()) before calling
> pm_runtime_force_suspend() at system suspend. That then also
> propagates to the parent, leading to that both the parent and the
> child will be resumed when pm_runtime_force_resume() is called for
> them.

I need to correct myself here. The above currently only works if the
child is runtime resumed while pm_runtime_force_suspend() is called.

The logic in pm_runtime_force_* needs to be improved to take care of
such scenarios. However I think that should be rather easy to fix, if
we want that.
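
To illustrate both the scheme and the limitation, here is a small userspace
toy model. It is only a sketch under stated assumptions, not the code in
drivers/base/power/runtime.c: all toy_* names are hypothetical, and the model
assumes that the PM core holds exactly one usage-count reference per device
during system suspend, so a count above 1 is read as "in use".

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct device; all names here are hypothetical. */
struct toy_dev {
	struct toy_dev *parent;
	int usage_count;    /* models the runtime PM usage counter */
	bool rpm_suspended; /* models the runtime PM status */
};

/* Models pm_runtime_get_noresume(): take a reference without resuming. */
void toy_get_noresume(struct toy_dev *dev)
{
	dev->usage_count++;
}

/* Model of the force-suspend step at system suspend. */
void toy_force_suspend(struct toy_dev *dev)
{
	if (dev->rpm_suspended)
		return; /* already suspended: nothing is propagated to the parent */

	/* Assumption: a count above the PM core's own reference means "in use". */
	if (dev->parent && dev->usage_count > 1)
		toy_get_noresume(dev->parent);

	dev->rpm_suspended = true;
}

/* Model of the force-resume step at system resume. */
void toy_force_resume(struct toy_dev *dev)
{
	if (!dev->rpm_suspended || dev->usage_count <= 1)
		return; /* not in use: leave the resume to runtime PM */

	/* Drop the parent reference assumed to be taken at suspend time. */
	if (dev->parent)
		dev->parent->usage_count--;

	dev->rpm_suspended = false;
}
```

If the child is active when toy_force_suspend() runs, the extra reference
reaches the parent and both devices are resumed at system resume. If the child
is already runtime suspended, the early return skips the propagation, so in
this model the parent stays suspended while the child gets resumed, which is
the gap described above.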

[...]

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-19 17:21                     ` Grygorii Strashko
@ 2017-10-19 18:04                       ` Ulf Hansson
  2017-10-19 18:11                         ` Ulf Hansson
  0 siblings, 1 reply; 135+ messages in thread
From: Ulf Hansson @ 2017-10-19 18:04 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Linux PM, Rafael J. Wysocki, Bjorn Helgaas, Alan Stern,
	Greg Kroah-Hartman, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Andy Shevchenko,
	Kevin Hilman, Wolfram Sang, linux-i2c, Lee Jones

On 19 October 2017 at 19:21, Grygorii Strashko <grygorii.strashko@ti.com> wrote:
>
>
> On 10/19/2017 03:33 AM, Ulf Hansson wrote:
>> On 18 October 2017 at 23:48, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>> On Wednesday, October 18, 2017 9:45:11 PM CEST Grygorii Strashko wrote:
>>>>
>>>> On 10/18/2017 09:11 AM, Ulf Hansson wrote:
>>>
>>> [...]
>>>
>>>>>>> That's the point. We know pm_runtime_force_* works nicely for the
>>>>>>> trivial middle-layer cases.
>>>>>>
>>>>>> In which cases the middle-layer callbacks don't exist, so it's just like
>>>>>> reusing driver callbacks directly. :-)
>>>>
>>>> I'd like to ask you clarify one point here and provide some info which I hope can be useful -
>>>> what's exactly means  "trivial middle-layer cases"?
>>>>
>>>> Is it when systems use "drivers/base/power/clock_ops.c - Generic clock
>>>> manipulation PM callbacks" as dev_pm_domain (arm davinci/keystone), or OMAP
>>>> device framework struct dev_pm_domain omap_device_pm_domain
>>>> (arm/mach-omap2/omap_device.c) or static const struct dev_pm_ops
>>>> tegra_aconnect_pm_ops?
>>>>
>>>> if yes all above have PM runtime callbacks.
>>>
>>> Trivial ones don't actually do anything meaningful in their PM callbacks.
>>>
>>> Things like the platform bus type, spi bus type, i2c bus type and similar.
>>>
>>> If the middle-layer callbacks manipulate devices in a significant way, then
>>> they aren't trivial.
>>
>> I fully agree with Rafael's description above, but let me also clarify
>> one more thing.
>>
>> We have also been discussing PM domains as being trivial and
>> non-trivial. In some statements I even think the PM domain has been a
>> part the middle-layer terminology, which may have been a bit
>> confusing.
>>
>> In this regards as we consider genpd being a trivial PM domain, those
>> examples your bring up above is too me also examples of trivial PM
>> domains. Especially because they don't deal with wakeups, as that is
>> taken care of by the drivers, right!?
>
> Not directly, for example, omap device framework has noirq callback implemented
> which forcibly disable all devices which are not PM runtime suspended.
> while doing this it calls drivers PM .runtime_suspend() which may return
> non 0 value and in this case device will be left enabled (powered) at suspend for
> wake up purposes (see _od_suspend_noirq()).
>

Yeah, I had that feeling that omap has some trickiness going on. :-)

I'm sure that can be fixed in the omap PM domain, although

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-19 18:04                       ` Ulf Hansson
@ 2017-10-19 18:11                         ` Ulf Hansson
  2017-10-19 21:31                           ` Grygorii Strashko
  0 siblings, 1 reply; 135+ messages in thread
From: Ulf Hansson @ 2017-10-19 18:11 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Linux PM, Rafael J. Wysocki, Bjorn Helgaas, Alan Stern,
	Greg Kroah-Hartman, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Andy Shevchenko,
	Kevin Hilman, Wolfram Sang, linux-i2c, Lee Jones

On 19 October 2017 at 20:04, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On 19 October 2017 at 19:21, Grygorii Strashko <grygorii.strashko@ti.com> wrote:
>>
>>
>> On 10/19/2017 03:33 AM, Ulf Hansson wrote:
>>> On 18 October 2017 at 23:48, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>>> On Wednesday, October 18, 2017 9:45:11 PM CEST Grygorii Strashko wrote:
>>>>>
>>>>> On 10/18/2017 09:11 AM, Ulf Hansson wrote:
>>>>
>>>> [...]
>>>>
>>>>>>>> That's the point. We know pm_runtime_force_* works nicely for the
>>>>>>>> trivial middle-layer cases.
>>>>>>>
>>>>>>> In which cases the middle-layer callbacks don't exist, so it's just like
>>>>>>> reusing driver callbacks directly. :-)
>>>>>
>>>>> I'd like to ask you clarify one point here and provide some info which I hope can be useful -
>>>>> what's exactly means  "trivial middle-layer cases"?
>>>>>
>>>>> Is it when systems use "drivers/base/power/clock_ops.c - Generic clock
>>>>> manipulation PM callbacks" as dev_pm_domain (arm davinci/keystone), or OMAP
>>>>> device framework struct dev_pm_domain omap_device_pm_domain
>>>>> (arm/mach-omap2/omap_device.c) or static const struct dev_pm_ops
>>>>> tegra_aconnect_pm_ops?
>>>>>
>>>>> if yes all above have PM runtime callbacks.
>>>>
>>>> Trivial ones don't actually do anything meaningful in their PM callbacks.
>>>>
>>>> Things like the platform bus type, spi bus type, i2c bus type and similar.
>>>>
>>>> If the middle-layer callbacks manipulate devices in a significant way, then
>>>> they aren't trivial.
>>>
>>> I fully agree with Rafael's description above, but let me also clarify
>>> one more thing.
>>>
>>> We have also been discussing PM domains as being trivial and
>>> non-trivial. In some statements I even think the PM domain has been a
>>> part the middle-layer terminology, which may have been a bit
>>> confusing.
>>>
>>> In this regards as we consider genpd being a trivial PM domain, those
>>> examples your bring up above is too me also examples of trivial PM
>>> domains. Especially because they don't deal with wakeups, as that is
>>> taken care of by the drivers, right!?
>>
>> Not directly, for example, omap device framework has noirq callback implemented
>> which forcibly disable all devices which are not PM runtime suspended.
>> while doing this it calls drivers PM .runtime_suspend() which may return
>> non 0 value and in this case device will be left enabled (powered) at suspend for
>> wake up purposes (see _od_suspend_noirq()).
>>
>
> Yeah, I had that feeling that omap has some trickyness going on. :-)
>
> I sure that can be fixed in the omap PM domain, although

...my fingers slipped; here is the rest of the reply...

...of course that requires us to use another way for drivers to signal
to the omap PM domain that it needs to stay powered so as to deal with
wakeup.

I can have a look at that more closely, to see if it makes sense to change.

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-19 18:11                         ` Ulf Hansson
@ 2017-10-19 21:31                           ` Grygorii Strashko
  2017-10-20  6:05                             ` Ulf Hansson
  0 siblings, 1 reply; 135+ messages in thread
From: Grygorii Strashko @ 2017-10-19 21:31 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Linux PM, Rafael J. Wysocki, Bjorn Helgaas, Alan Stern,
	Greg Kroah-Hartman, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Andy Shevchenko,
	Kevin Hilman, Wolfram Sang, linux-i2c, Lee Jones



On 10/19/2017 01:11 PM, Ulf Hansson wrote:
> On 19 October 2017 at 20:04, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> On 19 October 2017 at 19:21, Grygorii Strashko <grygorii.strashko@ti.com> wrote:
>>>
>>>
>>> On 10/19/2017 03:33 AM, Ulf Hansson wrote:
>>>> On 18 October 2017 at 23:48, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>>>> On Wednesday, October 18, 2017 9:45:11 PM CEST Grygorii Strashko wrote:
>>>>>>
>>>>>> On 10/18/2017 09:11 AM, Ulf Hansson wrote:
>>>>>
>>>>> [...]
>>>>>
>>>>>>>>> That's the point. We know pm_runtime_force_* works nicely for the
>>>>>>>>> trivial middle-layer cases.
>>>>>>>>
>>>>>>>> In which cases the middle-layer callbacks don't exist, so it's just like
>>>>>>>> reusing driver callbacks directly. :-)
>>>>>>
>>>>>> I'd like to ask you clarify one point here and provide some info which I hope can be useful -
>>>>>> what's exactly means  "trivial middle-layer cases"?
>>>>>>
>>>>>> Is it when systems use "drivers/base/power/clock_ops.c - Generic clock
>>>>>> manipulation PM callbacks" as dev_pm_domain (arm davinci/keystone), or OMAP
>>>>>> device framework struct dev_pm_domain omap_device_pm_domain
>>>>>> (arm/mach-omap2/omap_device.c) or static const struct dev_pm_ops
>>>>>> tegra_aconnect_pm_ops?
>>>>>>
>>>>>> if yes all above have PM runtime callbacks.
>>>>>
>>>>> Trivial ones don't actually do anything meaningful in their PM callbacks.
>>>>>
>>>>> Things like the platform bus type, spi bus type, i2c bus type and similar.
>>>>>
>>>>> If the middle-layer callbacks manipulate devices in a significant way, then
>>>>> they aren't trivial.
>>>>
>>>> I fully agree with Rafael's description above, but let me also clarify
>>>> one more thing.
>>>>
>>>> We have also been discussing PM domains as being trivial and
>>>> non-trivial. In some statements I even think the PM domain has been a
>>>> part the middle-layer terminology, which may have been a bit
>>>> confusing.
>>>>
>>>> In this regards as we consider genpd being a trivial PM domain, those
>>>> examples your bring up above is too me also examples of trivial PM
>>>> domains. Especially because they don't deal with wakeups, as that is
>>>> taken care of by the drivers, right!?
>>>
>>> Not directly, for example, omap device framework has noirq callback implemented
>>> which forcibly disable all devices which are not PM runtime suspended.
>>> while doing this it calls drivers PM .runtime_suspend() which may return
>>> non 0 value and in this case device will be left enabled (powered) at suspend for
>>> wake up purposes (see _od_suspend_noirq()).
>>>
>>
>> Yeah, I had that feeling that omap has some trickyness going on. :-)
>>
>> I sure that can be fixed in the omap PM domain, although
> 
> ...slipped with my fingers.. here is the rest of the reply...
> 
> ..of course that require us to use another way for drivers to signal
> to the omap PM domain that it needs to stay powered as to deal with
> wakeup.
> 
> I can have a look at that more closely, to see if it makes sense to change.
> 

Also, an additional note here: some IPs are reused between OMAP/Davinci/Keystone.
The OMAP PM domain has some code running at noirq time to deal with devices left
in the PM runtime enabled state (OMAP is PM runtime centric), while
Davinci/Keystone (clock_ops.c) have none, so the pm_runtime_force_* API is
actually a possibility now to make the same driver work on all these platforms.

-- 
regards,
-grygorii

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-19 12:21                 ` Ulf Hansson
  2017-10-19 18:01                   ` Ulf Hansson
@ 2017-10-20  1:19                   ` Rafael J. Wysocki
  2017-10-20  5:57                     ` Ulf Hansson
  1 sibling, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-20  1:19 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On Thursday, October 19, 2017 2:21:07 PM CEST Ulf Hansson wrote:
> On 19 October 2017 at 00:12, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > On Wednesday, October 18, 2017 4:11:33 PM CEST Ulf Hansson wrote:
> >> [...]
> >>
> >> >>
> >> >> The reason why pm_runtime_force_* needs to respects the hierarchy of
> >> >> the RPM callbacks, is because otherwise it can't safely update the
> >> >> runtime PM status of the device.
> >> >
> >> > I'm not sure I follow this requirement.  Why is that so?
> >>
> >> If the PM domain controls some resources for the device in its RPM
> >> callbacks and the driver controls some other resources in its RPM
> >> callbacks - then these resources needs to be managed together.
> >
> > Right, but that doesn't automatically make it necessary to use runtime PM
> > callbacks in the middle layer.  Its system-wide PM callbacks may be
> > suitable for that just fine.
> >
> > That is, at least in some cases, you can combine ->runtime_suspend from a
> > driver and ->suspend_late from a middle layer with no problems, for example.
> >
> > That's why some middle layers allow drivers to point ->suspend_late and
> > ->runtime_suspend to the same routine if they want to reuse that code.
> >
> >> This follows the behavior of when a regular call to
> >> pm_runtime_get|put(), triggers the RPM callbacks to be invoked.
> >
> > But (a) it doesn't have to follow it and (b) in some cases it should not
> > follow it.
> 
> Of course you don't explicitly *have to* respect the hierarchy of the
> RPM callbacks in pm_runtime_force_*.
> 
> However, changing that would require some additional information
> exchange between the driver and the middle-layer/PM domain, as to
> instruct the middle-layer/PM domain of what to do during system-wide
> PM. Especially in cases when the driver deals with wakeup, as in those
> cases the instructions may change dynamically.

Well, if wakeup matters, drivers can't simply point their PM callbacks
to pm_runtime_force_* anyway.

If the driver itself deals with wakeups, it clearly needs different callback
routines for system-wide PM and for runtime PM, so it can't reuse its runtime
PM callbacks at all then.

If the middle layer deals with wakeups, different callbacks are needed at
that level and so pm_runtime_force_* are unsuitable too.

Really, invoking runtime PM callbacks from the middle layer in
pm_runtime_force_* is not a good idea at all and there's no general requirement
for it whatsoever.

> [...]
> 
> >> > In general, not if the wakeup settings are adjusted by the middle layer.
> >>
> >> Correct!
> >>
> >> To use pm_runtime_force* for these cases, one would need some
> >> additional information exchange between the driver and the
> >> middle-layer.
> >
> > Which pretty much defeats the purpose of the wrappers, doesn't it?
> 
> Well, no matter if the wrappers are used or not, we need some kind of
> information exchange between the driver and the middle-layers/PM
> domains.

Right.

But if that information is exchanged, then why use wrappers *in* *addition*
to that?

> Anyway, me personally think it's too early to conclude that using the
> wrappers may not be useful going forward. At this point, they clearly
> helps trivial cases to remain being trivial.

I'm not sure about that really.  So far I've seen more complexity resulting
from using them than being avoided by using them, but I guess the beauty is
in the eye of the beholder. :-)

> >
> >> >
> >> >> Regarding hibernation, honestly that's not really my area of
> >> >> expertise. Although, I assume the middle-layer and driver can treat
> >> >> that as a separate case, so if it's not suitable to use
> >> >> pm_runtime_force* for that case, then they shouldn't do it.
> >> >
> >> > Well, agreed.
> >> >
> >> > In some simple cases, though, driver callbacks can be reused for hibernation
> >> > too, so it would be good to have a common way to do that too, IMO.
> >>
> >> Okay, that makes sense!
> >>
> >> >
> >> >> >
> >> >> > Also, quite so often other middle layers interact with PCI directly or
> >> >> > indirectly (eg. a platform device may be a child or a consumer of a PCI
> >> >> > device) and some optimizations need to take that into account (eg. parents
> >> >> > generally need to be accessible when their childres are resumed and so on).
> >> >>
> >> >> A device's parent becomes informed when changing the runtime PM status
> >> >> of the device via pm_runtime_force_suspend|resume(), as those calls
> >> >> pm_runtime_set_suspended|active().
> >> >
> >> > This requires the parent driver or middle layer to look at the reference
> >> > counter and understand it the same way as pm_runtime_force_*.
> >> >
> >> >> In case that isn't that sufficient, what else is needed? Perhaps you can
> >> >> point me to an example so I can understand better?
> >> >
> >> > Say you want to leave the parent suspended after system resume, but the
> >> > child drivers use pm_runtime_force_suspend|resume().  The parent would then
> >> > need to use pm_runtime_force_suspend|resume() too, no?
> >>
> >> Actually no.
> >>
> >> Currently the other options of "deferring resume" (not using
> >> pm_runtime_force_*), is either using the "direct_complete" path or
> >> similar to the approach you took for the i2c designware driver.
> >>
> >> Both cases should play nicely in combination of a child being managed
> >> by pm_runtime_force_*. That's because only when the parent device is
> >> kept runtime suspended during system suspend, resuming can be
> >> deferred.
> >
> > And because the parent remains in runtime suspend late enough in the
> > system suspend path, its children also are guaranteed to be suspended.
> 
> Yes.
> 
> >
> > But then all of them need to be left in runtime suspend during system
> > resume too, which is somewhat restrictive, because some drivers may
> > want their devices to be resumed then.
> 
> Actually, this scenario is also addressed when using the pm_runtime_force_*.
> 
> The driver for the child would only need to bump the runtime PM usage
> count (pm_runtime_get_noresume()) before calling
> pm_runtime_force_suspend() at system suspend. That then also
> propagates to the parent, leading to that both the parent and the
> child will be resumed when pm_runtime_force_resume() is called for
> them.
> 
> Of course, if the driver of the parent isn't using pm_runtime_force_,
> we would have to assume that it's always being resumed at system
> resume.

There may be other ways to avoid that, though.

BTW, I don't quite like using the RPM usage counter this way either, if
that hasn't been clear so far.

> As at matter of fact, doesn't this scenario actually indicates that we
> do need to involve the runtime PM core (updating RPM status according
> to the HW state even during system-wide PM) to really get this right.
> It's not enough to only use "driver PM flags"!?

I'm not sure what you are talking about.

For all devices with enabled runtime PM any state produced by system
suspend/resume has to be labeled either as RPM_SUSPENDED or as RPM_ACTIVE.
That has always been the case and hasn't involved any magic.

However, while runtime PM is disabled, the state of the device doesn't
need to be reflected by its RPM status and there's no need to track it then.
Moreover, in some cases it cannot even be tracked, because of the firmware
involvement (and we cannot track the firmware).

Besides, please really look at what happens in the patches I posted and
then we can talk.

> Seems like we need to create a list of all requirements, pitfalls,
> good things vs bad things etc. :-)

We surely need to know what general cases need to be addressed.

> >
> > [BTW, our current documentation recommends resuming devices during
> > system resume, actually, and gives a list of reasons why. :-)]
> 
> Yes, but that too easy and to me not good enough. :-)

But the list of reasons why is kind of valid still.  There may be better
reasons for not doing that, but it really is a tradeoff and drivers
should be able to decide which way they want to go.

IOW, the "leave the device in runtime suspend throughout system
suspend" optimization doesn't have to be bundled with the "leave the
device in suspend throughout and after system resume" one.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-20  1:19                   ` Rafael J. Wysocki
@ 2017-10-20  5:57                     ` Ulf Hansson
  0 siblings, 0 replies; 135+ messages in thread
From: Ulf Hansson @ 2017-10-20  5:57 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On 20 October 2017 at 03:19, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Thursday, October 19, 2017 2:21:07 PM CEST Ulf Hansson wrote:
>> On 19 October 2017 at 00:12, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> > On Wednesday, October 18, 2017 4:11:33 PM CEST Ulf Hansson wrote:
>> >> [...]
>> >>
>> >> >>
>> >> >> The reason why pm_runtime_force_* needs to respects the hierarchy of
>> >> >> the RPM callbacks, is because otherwise it can't safely update the
>> >> >> runtime PM status of the device.
>> >> >
>> >> > I'm not sure I follow this requirement.  Why is that so?
>> >>
>> >> If the PM domain controls some resources for the device in its RPM
>> >> callbacks and the driver controls some other resources in its RPM
>> >> callbacks - then these resources needs to be managed together.
>> >
>> > Right, but that doesn't automatically make it necessary to use runtime PM
>> > callbacks in the middle layer.  Its system-wide PM callbacks may be
>> > suitable for that just fine.
>> >
>> > That is, at least in some cases, you can combine ->runtime_suspend from a
>> > driver and ->suspend_late from a middle layer with no problems, for example.
>> >
>> > That's why some middle layers allow drivers to point ->suspend_late and
>> > ->runtime_suspend to the same routine if they want to reuse that code.
>> >
>> >> This follows the behavior of when a regular call to
>> >> pm_runtime_get|put(), triggers the RPM callbacks to be invoked.
>> >
>> > But (a) it doesn't have to follow it and (b) in some cases it should not
>> > follow it.
>>
>> Of course you don't explicitly *have to* respect the hierarchy of the
>> RPM callbacks in pm_runtime_force_*.
>>
>> However, changing that would require some additional information
>> exchange between the driver and the middle-layer/PM domain, as to
>> instruct the middle-layer/PM domain of what to do during system-wide
>> PM. Especially in cases when the driver deals with wakeup, as in those
>> cases the instructions may change dynamically.
>
> Well, if wakeup matters, drivers can't simply point their PM callbacks
> to pm_runtime_force_* anyway.
>
> If the driver itself deals with wakeups, it clearly needs different callback
> routines for system-wide PM and for runtime PM, so it can't reuse its runtime
> PM callbacks at all then.

It can still re-use its runtime PM callbacks, simply by calling
pm_runtime_force_* from its system sleep callbacks.

Drivers already do that today, not only to deal with wakeups, but
generally when they need to deal with some additional operations.

>
> If the middle layer deals with wakeups, different callbacks are needed at
> that level and so pm_runtime_force_* are unsuitable too.
>
> Really, invoking runtime PM callbacks from the middle layer in
> pm_runtime_force_* is not a good idea at all and there's no general requirement
> for it whatsoever.
>
>> [...]
>>
>> >> > In general, not if the wakeup settings are adjusted by the middle layer.
>> >>
>> >> Correct!
>> >>
>> >> To use pm_runtime_force* for these cases, one would need some
>> >> additional information exchange between the driver and the
>> >> middle-layer.
>> >
>> > Which pretty much defeats the purpose of the wrappers, doesn't it?
>>
>> Well, no matter if the wrappers are used or not, we need some kind of
>> information exchange between the driver and the middle-layers/PM
>> domains.
>
> Right.
>
> But if that information is exchanged, then why use wrappers *in* *addition*
> to that?

If we can find a different method that both avoids open coding
and offers the optimized system-wide PM path at resume, I am open to
that.

>
>> Anyway, I personally think it's too early to conclude that using the
>> wrappers may not be useful going forward. At this point, they clearly
>> help trivial cases remain trivial.
>
> I'm not sure about that really.  So far I've seen more complexity resulting
> from using them than being avoided by using them, but I guess the beauty is
> in the eye of the beholder. :-)

Hehe, yeah you may be right. :-)

>
>> >
>> >> >
>> >> >> Regarding hibernation, honestly that's not really my area of
>> >> >> expertise. Although, I assume the middle-layer and driver can treat
>> >> >> that as a separate case, so if it's not suitable to use
>> >> >> pm_runtime_force* for that case, then they shouldn't do it.
>> >> >
>> >> > Well, agreed.
>> >> >
>> >> > In some simple cases, though, driver callbacks can be reused for hibernation
>> >> > too, so it would be good to have a common way to do that too, IMO.
>> >>
>> >> Okay, that makes sense!
>> >>
>> >> >
>> >> >> >
>> >> >> > Also, quite so often other middle layers interact with PCI directly or
>> >> >> > indirectly (eg. a platform device may be a child or a consumer of a PCI
>> >> >> > device) and some optimizations need to take that into account (eg. parents
>> >> >> > generally need to be accessible when their children are resumed and so on).
>> >> >>
>> >> >> A device's parent becomes informed when changing the runtime PM status
>> >> >> of the device via pm_runtime_force_suspend|resume(), as those call
>> >> >> pm_runtime_set_suspended|active().
>> >> >
>> >> > This requires the parent driver or middle layer to look at the reference
>> >> > counter and understand it the same way as pm_runtime_force_*.
>> >> >
>> >> >> In case that isn't sufficient, what else is needed? Perhaps you can
>> >> >> point me to an example so I can understand better?
>> >> >
>> >> > Say you want to leave the parent suspended after system resume, but the
>> >> > child drivers use pm_runtime_force_suspend|resume().  The parent would then
>> >> > need to use pm_runtime_force_suspend|resume() too, no?
>> >>
>> >> Actually no.
>> >>
>> >> Currently the other options of "deferring resume" (not using
>> >> pm_runtime_force_*), is either using the "direct_complete" path or
>> >> similar to the approach you took for the i2c designware driver.
>> >>
>> >> Both cases should play nicely in combination with a child being managed
>> >> by pm_runtime_force_*. That's because only when the parent device is
>> >> kept runtime suspended during system suspend, resuming can be
>> >> deferred.
>> >
>> > And because the parent remains in runtime suspend late enough in the
>> > system suspend path, its children also are guaranteed to be suspended.
>>
>> Yes.
>>
>> >
>> > But then all of them need to be left in runtime suspend during system
>> > resume too, which is somewhat restrictive, because some drivers may
>> > want their devices to be resumed then.
>>
>> Actually, this scenario is also addressed when using the pm_runtime_force_*.
>>
>> The driver for the child would only need to bump the runtime PM usage
>> count (pm_runtime_get_noresume()) before calling
>> pm_runtime_force_suspend() at system suspend. That then also
>> propagates to the parent, so that both the parent and the
>> child will be resumed when pm_runtime_force_resume() is called for
>> them.
>>
>> Of course, if the driver of the parent isn't using pm_runtime_force_*,
>> we would have to assume that it's always being resumed at system
>> resume.
>
> There may be other ways to avoid that, though.
>
> BTW, I don't quite like using the RPM usage counter this way either, if
> that hasn't been clear so far.
>
>> As a matter of fact, doesn't this scenario actually indicate that we
>> do need to involve the runtime PM core (updating RPM status according
>> to the HW state even during system-wide PM) to really get this right?
>> It's not enough to only use "driver PM flags"!?
>
> I'm not sure what you are talking about.
>
> For all devices with enabled runtime PM any state produced by system
> suspend/resume has to be labeled either as RPM_SUSPENDED or as RPM_ACTIVE.
> That has always been the case and hasn't involved any magic.
>
> However, while runtime PM is disabled, the state of the device doesn't
> need to be reflected by its RPM status and there's no need to track it then.
> Moreover, in some cases it cannot be tracked even, because of the firmware
> involvement (and we cannot track the firmware).
>
> Besides, please really look at what happens in the patches I posted and
> then we can talk.

Yes, I will have a look.

>
>> Seems like we need to create a list of all requirements, pitfalls,
>> good things vs bad things etc. :-)
>
> We surely need to know what general cases need to be addressed.
>
>> >
>> > [BTW, our current documentation recommends resuming devices during
>> > system resume, actually, and gives a list of reasons why. :-)]
>>
>> Yes, but that is too easy and, to me, not good enough. :-)
>
> But the list of reasons why is kind of valid still.  There may be better
> reasons for not doing that, but it really is a tradeoff and drivers
> should be able to decide which way they want to go.

Agree.

>
> IOW, the "leave the device in runtime suspend throughout system
> suspend" optimization doesn't have to be bundled with the "leave the
> device in suspend throughout and after system resume" one.

Agree.

Kind regards
Uffe


* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-19 21:31                           ` Grygorii Strashko
@ 2017-10-20  6:05                             ` Ulf Hansson
  0 siblings, 0 replies; 135+ messages in thread
From: Ulf Hansson @ 2017-10-20  6:05 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Linux PM, Rafael J. Wysocki, Bjorn Helgaas, Alan Stern,
	Greg Kroah-Hartman, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Andy Shevchenko,
	Kevin Hilman, Wolfram Sang, linux-i2c, Lee Jones

[...]

>>>>> In this regard, as we consider genpd to be a trivial PM domain, the
>>>>> examples you bring up above are to me also examples of trivial PM
>>>>> domains. Especially because they don't deal with wakeups, as that is
>>>>> taken care of by the drivers, right!?
>>>>
>>>> Not directly. For example, the omap device framework implements a noirq callback
>>>> which forcibly disables all devices that are not PM runtime suspended.
>>>> While doing this, it calls the driver's PM .runtime_suspend(), which may return
>>>> a non-zero value, and in that case the device will be left enabled (powered) at suspend for
>>>> wakeup purposes (see _od_suspend_noirq()).
>>>>
>>>
>>> Yeah, I had that feeling that omap has some trickiness going on. :-)
>>>
>>> I am sure that can be fixed in the omap PM domain, although
>>
>> ...slipped with my fingers.. here is the rest of the reply...
>>
>> ..of course that requires us to use another way for drivers to signal
>> to the omap PM domain that it needs to stay powered so as to deal with
>> wakeup.
>>
>> I can have a look at that more closely, to see if it makes sense to change.
>>
>
> Also, an additional note here: some IPs are reused between OMAP/Davinci/Keystone.
> The OMAP PM domain has some code running at noirq time to deal with devices left
> in the PM runtime enabled state (OMAP PM runtime centric), while Davinci/Keystone don't (clock_ops.c),
> so the pm_runtime_force_* API is actually a possibility now to make the same driver work
>  on all these platforms.

That sounds great!

Also, in the end it would be nice to also convert the OMAP PM domain
to genpd. I think most of the needed infrastructure is already there
to do that.

Kind regards
Uffe


* Re: [Update][PATCH v2 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
  2017-10-19  7:33     ` Greg Kroah-Hartman
@ 2017-10-20 11:11       ` Rafael J. Wysocki
  2017-10-20 11:35         ` Greg Kroah-Hartman
  0 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-20 11:11 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linux PM, Lukas Wunner, Bjorn Helgaas, Alan Stern, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Ulf Hansson, Andy Shevchenko, Kevin Hilman, Wolfram Sang,
	linux-i2c, Lee Jones

On Thursday, October 19, 2017 9:33:15 AM CEST Greg Kroah-Hartman wrote:
> On Thu, Oct 19, 2017 at 01:17:31AM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > 
> > The motivation for this change is to provide a way to work around
> > a problem with the direct-complete mechanism used for avoiding
> > system suspend/resume handling for devices in runtime suspend.
> > 
> > The problem is that some middle layer code (the PCI bus type and
> > the ACPI PM domain in particular) returns positive values from its
> > system suspend ->prepare callbacks regardless of whether the driver's
> > ->prepare returns a positive value or 0, which effectively prevents
> > drivers from being able to control the direct-complete feature.
> > Some drivers need that control, however, and the PCI bus type has
> > grown its own flag to deal with this issue, but since it is not
> > limited to PCI, it is better to address it by adding driver flags at
> > the core level.
> > 
> > To that end, add a driver_flags field to struct dev_pm_info for flags
> > that can be set by device drivers at the probe time to inform the PM
> > core and/or bus types, PM domains and so on on the capabilities and/or
> > preferences of device drivers.  Also add two static inline helpers
> > for setting that field and testing it against a given set of flags
> > and make the driver core clear it automatically on driver remove
> > and probe failures.
> > 
> > Define and document two PM driver flags related to the direct-
> > complete feature: NEVER_SKIP and SMART_PREPARE that can be used,
> > respectively, to indicate to the PM core that the direct-complete
> > mechanism should never be used for the device and to inform the
> > middle layer code (bus types, PM domains etc) that it can only
> > request the PM core to use the direct-complete mechanism for
> > the device (by returning a positive value from its ->prepare
> > callback) if it also has been requested by the driver.
> > 
> > While at it, make the core check pm_runtime_suspended() when
> > setting power.direct_complete so that it doesn't need to be
> > checked by ->prepare callbacks.
> > 
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Thanks!

Does it also apply to the other patches in the series?

I'd like to queue up the core patches for 4.15 as they are specifically
designed to only affect the drivers that actually set the flags, so there
shouldn't be any regression resulting from them, and I'd like to be
able to start using the flags in drivers going forward.

Thanks,
Rafael


* Re: [Update][PATCH v2 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
  2017-10-20 11:35         ` Greg Kroah-Hartman
@ 2017-10-20 11:28           ` Rafael J. Wysocki
  0 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-20 11:28 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linux PM, Lukas Wunner, Bjorn Helgaas, Alan Stern, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Ulf Hansson, Andy Shevchenko, Kevin Hilman, Wolfram Sang,
	linux-i2c, Lee Jones

On Friday, October 20, 2017 1:35:27 PM CEST Greg Kroah-Hartman wrote:
> On Fri, Oct 20, 2017 at 01:11:22PM +0200, Rafael J. Wysocki wrote:
> > On Thursday, October 19, 2017 9:33:15 AM CEST Greg Kroah-Hartman wrote:
> > > On Thu, Oct 19, 2017 at 01:17:31AM +0200, Rafael J. Wysocki wrote:
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > 
> > > > The motivation for this change is to provide a way to work around
> > > > a problem with the direct-complete mechanism used for avoiding
> > > > system suspend/resume handling for devices in runtime suspend.
> > > > 
> > > > The problem is that some middle layer code (the PCI bus type and
> > > > the ACPI PM domain in particular) returns positive values from its
> > > > system suspend ->prepare callbacks regardless of whether the driver's
> > > > ->prepare returns a positive value or 0, which effectively prevents
> > > > drivers from being able to control the direct-complete feature.
> > > > Some drivers need that control, however, and the PCI bus type has
> > > > grown its own flag to deal with this issue, but since it is not
> > > > limited to PCI, it is better to address it by adding driver flags at
> > > > the core level.
> > > > 
> > > > To that end, add a driver_flags field to struct dev_pm_info for flags
> > > > that can be set by device drivers at the probe time to inform the PM
> > > > core and/or bus types, PM domains and so on on the capabilities and/or
> > > > preferences of device drivers.  Also add two static inline helpers
> > > > for setting that field and testing it against a given set of flags
> > > > and make the driver core clear it automatically on driver remove
> > > > and probe failures.
> > > > 
> > > > Define and document two PM driver flags related to the direct-
> > > > complete feature: NEVER_SKIP and SMART_PREPARE that can be used,
> > > > respectively, to indicate to the PM core that the direct-complete
> > > > mechanism should never be used for the device and to inform the
> > > > middle layer code (bus types, PM domains etc) that it can only
> > > > request the PM core to use the direct-complete mechanism for
> > > > the device (by returning a positive value from its ->prepare
> > > > callback) if it also has been requested by the driver.
> > > > 
> > > > While at it, make the core check pm_runtime_suspended() when
> > > > setting power.direct_complete so that it doesn't need to be
> > > > checked by ->prepare callbacks.
> > > > 
> > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > 
> > > Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > 
> > Thanks!
> > 
> > Does it also apply to the other patches in the series?
> > 
> > I'd like to queue up the core patches for 4.15 as they are specifically
> > designed to only affect the drivers that actually set the flags, so there
> > shouldn't be any regression resulting from them, and I'd like to be
> > able to start using the flags in drivers going forward.
> 
> Yes, sorry, I thought I acked them, but you are right, I didn't:
> 
> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> for all of them please.

Thanks!


* Re: [Update][PATCH v2 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
  2017-10-20 11:11       ` Rafael J. Wysocki
@ 2017-10-20 11:35         ` Greg Kroah-Hartman
  2017-10-20 11:28           ` Rafael J. Wysocki
  0 siblings, 1 reply; 135+ messages in thread
From: Greg Kroah-Hartman @ 2017-10-20 11:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Lukas Wunner, Bjorn Helgaas, Alan Stern, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Ulf Hansson, Andy Shevchenko, Kevin Hilman, Wolfram Sang,
	linux-i2c, Lee Jones

On Fri, Oct 20, 2017 at 01:11:22PM +0200, Rafael J. Wysocki wrote:
> On Thursday, October 19, 2017 9:33:15 AM CEST Greg Kroah-Hartman wrote:
> > On Thu, Oct 19, 2017 at 01:17:31AM +0200, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > 
> > > The motivation for this change is to provide a way to work around
> > > a problem with the direct-complete mechanism used for avoiding
> > > system suspend/resume handling for devices in runtime suspend.
> > > 
> > > The problem is that some middle layer code (the PCI bus type and
> > > the ACPI PM domain in particular) returns positive values from its
> > > system suspend ->prepare callbacks regardless of whether the driver's
> > > ->prepare returns a positive value or 0, which effectively prevents
> > > drivers from being able to control the direct-complete feature.
> > > Some drivers need that control, however, and the PCI bus type has
> > > grown its own flag to deal with this issue, but since it is not
> > > limited to PCI, it is better to address it by adding driver flags at
> > > the core level.
> > > 
> > > To that end, add a driver_flags field to struct dev_pm_info for flags
> > > that can be set by device drivers at the probe time to inform the PM
> > > core and/or bus types, PM domains and so on on the capabilities and/or
> > > preferences of device drivers.  Also add two static inline helpers
> > > for setting that field and testing it against a given set of flags
> > > and make the driver core clear it automatically on driver remove
> > > and probe failures.
> > > 
> > > Define and document two PM driver flags related to the direct-
> > > complete feature: NEVER_SKIP and SMART_PREPARE that can be used,
> > > respectively, to indicate to the PM core that the direct-complete
> > > mechanism should never be used for the device and to inform the
> > > middle layer code (bus types, PM domains etc) that it can only
> > > request the PM core to use the direct-complete mechanism for
> > > the device (by returning a positive value from its ->prepare
> > > callback) if it also has been requested by the driver.
> > > 
> > > While at it, make the core check pm_runtime_suspended() when
> > > setting power.direct_complete so that it doesn't need to be
> > > checked by ->prepare callbacks.
> > > 
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > 
> > Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> Thanks!
> 
> Does it also apply to the other patches in the series?
> 
> I'd like to queue up the core patches for 4.15 as they are specifically
> designed to only affect the drivers that actually set the flags, so there
> shouldn't be any regression resulting from them, and I'd like to be
> able to start using the flags in drivers going forward.

Yes, sorry, I thought I acked them, but you are right, I didn't:

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

for all of them please.

thanks,

greg k-h


* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-16  1:12 [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume Rafael J. Wysocki
                   ` (13 preceding siblings ...)
  2017-10-17  8:36 ` Ulf Hansson
@ 2017-10-20 20:46 ` Bjorn Helgaas
  2017-10-21  1:04   ` Rafael J. Wysocki
  2017-10-27 22:11 ` [PATCH v2 0/6] PM / sleep: Driver flags for system suspend/resume (part 1) Rafael J. Wysocki
  15 siblings, 1 reply; 135+ messages in thread
From: Bjorn Helgaas @ 2017-10-20 20:46 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Ulf Hansson, Andy Shevchenko, Kevin Hilman, Wolfram Sang,
	linux-i2c, Lee Jones

On Mon, Oct 16, 2017 at 03:12:35AM +0200, Rafael J. Wysocki wrote:
> Hi All,
> 
> Well, this took more time than expected, as I tried to cover everything I had
> in mind regarding PM flags for drivers.

For the parts that touch PCI,

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

I doubt there'll be conflicts with changes in my tree, but let me know if
you trip over any so I can watch for them when merging.

Bjorn


* Re: [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume
  2017-10-20 20:46 ` Bjorn Helgaas
@ 2017-10-21  1:04   ` Rafael J. Wysocki
  0 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-21  1:04 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Ulf Hansson, Andy Shevchenko, Kevin Hilman, Wolfram Sang,
	linux-i2c, Lee Jones

On Friday, October 20, 2017 10:46:07 PM CEST Bjorn Helgaas wrote:
> On Mon, Oct 16, 2017 at 03:12:35AM +0200, Rafael J. Wysocki wrote:
> > Hi All,
> > 
> > Well, this took more time than expected, as I tried to cover everything I had
> > in mind regarding PM flags for drivers.
> 
> For the parts that touch PCI,
> 
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>

Thank you!

> I doubt there'll be conflicts with changes in my tree, but let me know if
> you trip over any so I can watch for them when merging.

Well, if there are any conflicts, we'll see them in linux-next I guess. :-)

Thanks,
Rafael


* Re: [Update][PATCH v2 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
  2017-10-18 23:17   ` [Update][PATCH v2 " Rafael J. Wysocki
  2017-10-19  7:33     ` Greg Kroah-Hartman
@ 2017-10-23 16:37     ` Ulf Hansson
  2017-10-23 20:41       ` Rafael J. Wysocki
  1 sibling, 1 reply; 135+ messages in thread
From: Ulf Hansson @ 2017-10-23 16:37 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Greg Kroah-Hartman, Lukas Wunner, Bjorn Helgaas,
	Alan Stern, LKML, Linux ACPI, Linux PCI, Linux Documentation,
	Mika Westerberg, Andy Shevchenko, Kevin Hilman, Wolfram Sang,
	linux-i2c, Lee Jones

On 19 October 2017 at 01:17, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> The motivation for this change is to provide a way to work around
> a problem with the direct-complete mechanism used for avoiding
> system suspend/resume handling for devices in runtime suspend.
>
> The problem is that some middle layer code (the PCI bus type and
> the ACPI PM domain in particular) returns positive values from its
> system suspend ->prepare callbacks regardless of whether the driver's
> ->prepare returns a positive value or 0, which effectively prevents
> drivers from being able to control the direct-complete feature.
> Some drivers need that control, however, and the PCI bus type has
> grown its own flag to deal with this issue, but since it is not
> limited to PCI, it is better to address it by adding driver flags at
> the core level.
>
> To that end, add a driver_flags field to struct dev_pm_info for flags
> that can be set by device drivers at the probe time to inform the PM
> core and/or bus types, PM domains and so on on the capabilities and/or
> preferences of device drivers.  Also add two static inline helpers
> for setting that field and testing it against a given set of flags
> and make the driver core clear it automatically on driver remove
> and probe failures.
>
> Define and document two PM driver flags related to the direct-
> complete feature: NEVER_SKIP and SMART_PREPARE that can be used,
> respectively, to indicate to the PM core that the direct-complete
> mechanism should never be used for the device and to inform the
> middle layer code (bus types, PM domains etc) that it can only
> request the PM core to use the direct-complete mechanism for
> the device (by returning a positive value from its ->prepare
> callback) if it also has been requested by the driver.
>
> While at it, make the core check pm_runtime_suspended() when
> setting power.direct_complete so that it doesn't need to be
> checked by ->prepare callbacks.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>
> -> v2: Change the data type for driver_flags to u32 as suggested by Greg
>        and fix a couple of documentation typos pointed out by Lukas.
>
> ---
>  Documentation/driver-api/pm/devices.rst |   14 ++++++++++++++
>  Documentation/power/pci.txt             |   19 +++++++++++++++++++
>  drivers/acpi/device_pm.c                |    3 +++
>  drivers/base/dd.c                       |    2 ++
>  drivers/base/power/main.c               |    4 +++-
>  drivers/pci/pci-driver.c                |    5 ++++-
>  include/linux/device.h                  |   10 ++++++++++
>  include/linux/pm.h                      |   20 ++++++++++++++++++++
>  8 files changed, 75 insertions(+), 2 deletions(-)
>
> Index: linux-pm/include/linux/device.h
> ===================================================================
> --- linux-pm.orig/include/linux/device.h
> +++ linux-pm/include/linux/device.h
> @@ -1070,6 +1070,16 @@ static inline void dev_pm_syscore_device
>  #endif
>  }
>
> +static inline void dev_pm_set_driver_flags(struct device *dev, u32 flags)
> +{
> +       dev->power.driver_flags = flags;
> +}
> +
> +static inline bool dev_pm_test_driver_flags(struct device *dev, u32 flags)
> +{
> +       return !!(dev->power.driver_flags & flags);
> +}
> +
>  static inline void device_lock(struct device *dev)
>  {
>         mutex_lock(&dev->mutex);
> Index: linux-pm/include/linux/pm.h
> ===================================================================
> --- linux-pm.orig/include/linux/pm.h
> +++ linux-pm/include/linux/pm.h
> @@ -550,6 +550,25 @@ struct pm_subsys_data {
>  #endif
>  };
>
> +/*
> + * Driver flags to control system suspend/resume behavior.
> + *
> + * These flags can be set by device drivers at the probe time.  They need not be
> + * cleared by the drivers as the driver core will take care of that.
> + *
> + * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
> + * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
> + *
> + * Setting SMART_PREPARE instructs bus types and PM domains which may want
> + * system suspend/resume callbacks to be skipped for the device to return 0 from
> + * their ->prepare callbacks if the driver's ->prepare callback returns 0 (in
> + * other words, the system suspend/resume callbacks can only be skipped for the
> + * device if its driver doesn't object against that).  This flag has no effect
> + * if NEVER_SKIP is set.

In principle, the ACPI/PCI middle layers/PM domains could have started out by
respecting the return values of drivers' ->prepare() callbacks where
those exist, but they didn't, and that is why SMART_PREPARE is needed.
Right?

My point is, I don't think we should encourage other middle layers to
support the SMART_PREPARE flag, simply because they should be able to
cope without it. To make this more obvious, we could try to find a
different name for the flag that indicates this, or at least document
that we don't want it to be used by anything other than ACPI/PCI.
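
For reference, the way the two flags feed into the core's direct-complete
decision can be modeled in userspace. The sketch below is only an
illustration, not kernel code: struct model_dev and the model_* functions
are hypothetical stand-ins, and the boolean expression is copied from the
drivers/base/power/main.c hunk of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BIT(n)                 (UINT32_C(1) << (n))
#define DPM_FLAG_NEVER_SKIP    BIT(0)
#define DPM_FLAG_SMART_PREPARE BIT(1)

/* Hypothetical stand-in for the few struct device fields that matter here. */
struct model_dev {
	uint32_t driver_flags;   /* models dev->power.driver_flags */
	bool runtime_suspended;  /* models pm_runtime_suspended(dev) */
};

/* Same logic as dev_pm_test_driver_flags() in the patch. */
static bool model_test_driver_flags(const struct model_dev *dev, uint32_t flags)
{
	assert(dev != NULL);
	return !!(dev->driver_flags & flags);
}

/*
 * Models the value assigned to power.direct_complete by the core.
 * prepare_ret is whatever the middle layer's ->prepare returned;
 * suspend_event models state.event == PM_EVENT_SUSPEND.
 */
static bool model_direct_complete(const struct model_dev *dev,
				  int prepare_ret, bool suspend_event)
{
	return suspend_event && dev->runtime_suspended && prepare_ret > 0 &&
		!model_test_driver_flags(dev, DPM_FLAG_NEVER_SKIP);
}
```

Note that SMART_PREPARE does not show up in the core's expression at all. It
only influences what the middle layer's ->prepare returns, that is, the
prepare_ret value passed in above.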

> + */
> +#define DPM_FLAG_NEVER_SKIP    BIT(0)
> +#define DPM_FLAG_SMART_PREPARE BIT(1)
> +
>  struct dev_pm_info {
>         pm_message_t            power_state;
>         unsigned int            can_wakeup:1;
> @@ -561,6 +580,7 @@ struct dev_pm_info {
>         bool                    is_late_suspended:1;
>         bool                    early_init:1;   /* Owned by the PM core */
>         bool                    direct_complete:1;      /* Owned by the PM core */
> +       u32                     driver_flags;
>         spinlock_t              lock;
>  #ifdef CONFIG_PM_SLEEP
>         struct list_head        entry;
> Index: linux-pm/drivers/base/dd.c
> ===================================================================
> --- linux-pm.orig/drivers/base/dd.c
> +++ linux-pm/drivers/base/dd.c
> @@ -464,6 +464,7 @@ pinctrl_bind_failed:
>         if (dev->pm_domain && dev->pm_domain->dismiss)
>                 dev->pm_domain->dismiss(dev);
>         pm_runtime_reinit(dev);
> +       dev_pm_set_driver_flags(dev, 0);
>
>         switch (ret) {
>         case -EPROBE_DEFER:
> @@ -869,6 +870,7 @@ static void __device_release_driver(stru
>                 if (dev->pm_domain && dev->pm_domain->dismiss)
>                         dev->pm_domain->dismiss(dev);
>                 pm_runtime_reinit(dev);
> +               dev_pm_set_driver_flags(dev, 0);
>
>                 klist_remove(&dev->p->knode_driver);
>                 device_pm_check_callbacks(dev);
> Index: linux-pm/drivers/base/power/main.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/main.c
> +++ linux-pm/drivers/base/power/main.c
> @@ -1700,7 +1700,9 @@ unlock:
>          * applies to suspend transitions, however.
>          */
>         spin_lock_irq(&dev->power.lock);
> -       dev->power.direct_complete = ret > 0 && state.event == PM_EVENT_SUSPEND;
> +       dev->power.direct_complete = state.event == PM_EVENT_SUSPEND &&
> +               pm_runtime_suspended(dev) && ret > 0 &&
> +               !dev_pm_test_driver_flags(dev, DPM_FLAG_NEVER_SKIP);
>         spin_unlock_irq(&dev->power.lock);
>         return 0;
>  }
> Index: linux-pm/drivers/pci/pci-driver.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pci-driver.c
> +++ linux-pm/drivers/pci/pci-driver.c
> @@ -682,8 +682,11 @@ static int pci_pm_prepare(struct device
>
>         if (drv && drv->pm && drv->pm->prepare) {
>                 int error = drv->pm->prepare(dev);
> -               if (error)
> +               if (error < 0)
>                         return error;
> +
> +               if (!error && dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_PREPARE))
> +                       return 0;
>         }
>         return pci_dev_keep_suspended(to_pci_dev(dev));
>  }
> Index: linux-pm/drivers/acpi/device_pm.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/device_pm.c
> +++ linux-pm/drivers/acpi/device_pm.c
> @@ -965,6 +965,9 @@ int acpi_subsys_prepare(struct device *d
>         if (ret < 0)
>                 return ret;
>
> +       if (!ret && dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_PREPARE))
> +               return 0;

So if the driver doesn't implement the ->prepare() callback, you still
want to treat it as if it had one assigned that returns 0?

That doesn't seem entirely consistent with what you have documented about the flag.
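
The difference between the two hunks can be shown with a small userspace
model. This is a sketch under stated assumptions, not kernel code: the
model_* names are hypothetical, the middle_layer_result parameter stands in
for everything the middle layer decides on its own after the flag check, and
it is assumed (not visible in the quoted hunk) that ret in
acpi_subsys_prepare() comes from pm_generic_prepare(), which returns 0 when
the driver has no ->prepare callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIT(n)                 (UINT32_C(1) << (n))
#define DPM_FLAG_NEVER_SKIP    BIT(0)
#define DPM_FLAG_SMART_PREPARE BIT(1)

/* Hypothetical driver model: prepare is NULL if there is no callback. */
struct model_drv {
	int (*prepare)(void);
};

/* Sample driver callbacks used for illustration. */
static int prepare_returns_zero(void) { return 0; }
static int prepare_returns_one(void)  { return 1; }

/* Follows the pci_pm_prepare() hunk: the flag is checked only if a callback exists. */
static int model_pci_prepare(const struct model_drv *drv, uint32_t flags,
			     int middle_layer_result)
{
	assert(middle_layer_result >= 0);
	if (drv && drv->prepare) {
		int error = drv->prepare();

		if (error < 0)
			return error;
		if (!error && (flags & DPM_FLAG_SMART_PREPARE))
			return 0;
	}
	return middle_layer_result;
}

/* Assumed behavior of pm_generic_prepare(): 0 when there is no callback. */
static int model_generic_prepare(const struct model_drv *drv)
{
	return (drv && drv->prepare) ? drv->prepare() : 0;
}

/* Follows the acpi_subsys_prepare() hunk: the flag is checked regardless. */
static int model_acpi_prepare(const struct model_drv *drv, uint32_t flags,
			      int middle_layer_result)
{
	int ret = model_generic_prepare(drv);

	assert(middle_layer_result >= 0);
	if (ret < 0)
		return ret;
	if (!ret && (flags & DPM_FLAG_SMART_PREPARE))
		return 0;
	return middle_layer_result;
}
```

With SMART_PREPARE set and no driver ->prepare, the PCI model falls through
to the middle layer's own decision, while the ACPI model returns 0 as if the
driver had objected.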

> +
>         if (!adev || !pm_runtime_suspended(dev))
>                 return 0;
>
> Index: linux-pm/Documentation/driver-api/pm/devices.rst
> ===================================================================
> --- linux-pm.orig/Documentation/driver-api/pm/devices.rst
> +++ linux-pm/Documentation/driver-api/pm/devices.rst
> @@ -354,6 +354,20 @@ the phases are: ``prepare``, ``suspend``
>         is because all such devices are initially set to runtime-suspended with
>         runtime PM disabled.
>
> +       This feature also can be controlled by device drivers by using the
> +       ``DPM_FLAG_NEVER_SKIP`` and ``DPM_FLAG_SMART_PREPARE`` driver power
> +       management flags.  [Typically, they are set at the time the driver is
> +       probed against the device in question by passing them to the
> +       :c:func:`dev_pm_set_driver_flags` helper function.]  If the first of
> +       these flags is set, the PM core will not apply the direct-complete
> +       procedure described above to the given device and, consequently, to any
> +       of its ancestors.  The second flag, when set, informs the middle layer
> +       code (bus types, device types, PM domains, classes) that it should take
> +       the return value of the ``->prepare`` callback provided by the driver
> +       into account and it may only return a positive value from its own
> +       ``->prepare`` callback if the driver's one also has returned a positive
> +       value.
> +
>      2. The ``->suspend`` methods should quiesce the device to stop it from
>         performing I/O.  They also may save the device registers and put it into
>         the appropriate low-power state, depending on the bus type the device is
> Index: linux-pm/Documentation/power/pci.txt
> ===================================================================
> --- linux-pm.orig/Documentation/power/pci.txt
> +++ linux-pm/Documentation/power/pci.txt
> @@ -961,6 +961,25 @@ dev_pm_ops to indicate that one suspend
>  .suspend(), .freeze(), and .poweroff() members and one resume routine is to
>  be pointed to by the .resume(), .thaw(), and .restore() members.
>
> +3.1.19. Driver Flags for Power Management
> +
> +The PM core allows device drivers to set flags that influence the handling of
> +power management for the devices by the core itself and by middle layer code
> +including the PCI bus type.  The flags should be set once at the driver probe
> +time with the help of the dev_pm_set_driver_flags() function and they should not
> +be updated directly afterwards.

I am wondering if we really need a statement, generic to all "driver PM
flags", that these flags must be set at ->probe(). Maybe that is better
documented per flag rather than for all of them. The reason I bring it
up is that I would not be surprised if a new flag comes along that is
used a bit differently and does not require that.

Of course we can also update that later on, if needed.
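
For reference, a small userspace sketch of how I read the helpers.  It
assumes dev_pm_set_driver_flags() simply overwrites the whole mask and
dev_pm_test_driver_flags() tests bits in it; struct flag_dev and the
sketch_* names are hypothetical stand-ins, not the kernel definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define DPM_FLAG_NEVER_SKIP	(UINT32_C(1) << 0)
#define DPM_FLAG_SMART_PREPARE	(UINT32_C(1) << 1)

/* Hypothetical stand-in for the power.driver_flags field. */
struct flag_dev {
	uint32_t driver_flags;
};

/* Assumed semantics: the new mask replaces the old one entirely. */
static void sketch_set_driver_flags(struct flag_dev *dev, uint32_t flags)
{
	dev->driver_flags = flags;
}

/* True if any of the requested flags is set. */
static bool sketch_test_driver_flags(const struct flag_dev *dev, uint32_t flags)
{
	return (dev->driver_flags & flags) != 0;
}

/* A probe routine sets all of the flags it wants in a single call. */
static int sketch_probe(struct flag_dev *dev)
{
	sketch_set_driver_flags(dev, DPM_FLAG_NEVER_SKIP | DPM_FLAG_SMART_PREPARE);
	return 0;
}
```

If the helper really overwrites the mask, a later call made for a single
new flag would silently drop the flags set at probe time, which is one
reason a per-flag rule might read better than a generic one.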

> +
> +The DPM_FLAG_NEVER_SKIP flag prevents the PM core from using the direct-complete
> +mechanism allowing device suspend/resume callbacks to be skipped if the device
> +is in runtime suspend when the system suspend starts.  That also affects all of
> +the ancestors of the device, so this flag should only be used if absolutely
> +necessary.
> +
> +The DPM_FLAG_SMART_PREPARE flag instructs the PCI bus type to only return a
> +positive value from pci_pm_prepare() if the ->prepare callback provided by the
> +driver of the device returns a positive value.  That allows the driver to opt
> +out from using the direct-complete mechanism dynamically.
> +
>  3.2. Device Runtime Power Management
>  ------------------------------------
>  In addition to providing device power management callbacks PCI device drivers
>

Kind regards
Uffe


* Re: [PATCH 02/12] PCI / PM: Use the NEVER_SKIP driver flag
  2017-10-16  1:29 ` [PATCH 02/12] PCI / PM: Use the NEVER_SKIP driver flag Rafael J. Wysocki
@ 2017-10-23 16:40   ` Ulf Hansson
  0 siblings, 0 replies; 135+ messages in thread
From: Ulf Hansson @ 2017-10-23 16:40 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On 16 October 2017 at 03:29, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Replace the PCI-specific flag PCI_DEV_FLAGS_NEEDS_RESUME with the
> PM core's DPM_FLAG_NEVER_SKIP one everywhere and drop it.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

> ---
>  drivers/gpu/drm/i915/i915_drv.c |    2 +-
>  drivers/misc/mei/pci-me.c       |    2 +-
>  drivers/misc/mei/pci-txe.c      |    2 +-
>  drivers/pci/pci.c               |    3 +--
>  include/linux/pci.h             |    7 +------
>  5 files changed, 5 insertions(+), 11 deletions(-)
>
> Index: linux-pm/include/linux/pci.h
> ===================================================================
> --- linux-pm.orig/include/linux/pci.h
> +++ linux-pm/include/linux/pci.h
> @@ -205,13 +205,8 @@ enum pci_dev_flags {
>         PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT = (__force pci_dev_flags_t) (1 << 9),
>         /* Do not use FLR even if device advertises PCI_AF_CAP */
>         PCI_DEV_FLAGS_NO_FLR_RESET = (__force pci_dev_flags_t) (1 << 10),
> -       /*
> -        * Resume before calling the driver's system suspend hooks, disabling
> -        * the direct_complete optimization.
> -        */
> -       PCI_DEV_FLAGS_NEEDS_RESUME = (__force pci_dev_flags_t) (1 << 11),
>         /* Don't use Relaxed Ordering for TLPs directed at this device */
> -       PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 12),
> +       PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 11),
>  };
>
>  enum pci_irq_reroute_variant {
> Index: linux-pm/drivers/pci/pci.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pci.c
> +++ linux-pm/drivers/pci/pci.c
> @@ -2166,8 +2166,7 @@ bool pci_dev_keep_suspended(struct pci_d
>
>         if (!pm_runtime_suspended(dev)
>             || pci_target_state(pci_dev, wakeup) != pci_dev->current_state
> -           || platform_pci_need_resume(pci_dev)
> -           || (pci_dev->dev_flags & PCI_DEV_FLAGS_NEEDS_RESUME))
> +           || platform_pci_need_resume(pci_dev))
>                 return false;
>
>         /*
> Index: linux-pm/drivers/gpu/drm/i915/i915_drv.c
> ===================================================================
> --- linux-pm.orig/drivers/gpu/drm/i915/i915_drv.c
> +++ linux-pm/drivers/gpu/drm/i915/i915_drv.c
> @@ -1304,7 +1304,7 @@ int i915_driver_load(struct pci_dev *pde
>          * becaue the HDA driver may require us to enable the audio power
>          * domain during system suspend.
>          */
> -       pdev->dev_flags |= PCI_DEV_FLAGS_NEEDS_RESUME;
> +       dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NEVER_SKIP);
>
>         ret = i915_driver_init_early(dev_priv, ent);
>         if (ret < 0)
> Index: linux-pm/drivers/misc/mei/pci-txe.c
> ===================================================================
> --- linux-pm.orig/drivers/misc/mei/pci-txe.c
> +++ linux-pm/drivers/misc/mei/pci-txe.c
> @@ -141,7 +141,7 @@ static int mei_txe_probe(struct pci_dev
>          * MEI requires to resume from runtime suspend mode
>          * in order to perform link reset flow upon system suspend.
>          */
> -       pdev->dev_flags |= PCI_DEV_FLAGS_NEEDS_RESUME;
> +       dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NEVER_SKIP);
>
>         /*
>         * For not wake-able HW runtime pm framework
> Index: linux-pm/drivers/misc/mei/pci-me.c
> ===================================================================
> --- linux-pm.orig/drivers/misc/mei/pci-me.c
> +++ linux-pm/drivers/misc/mei/pci-me.c
> @@ -223,7 +223,7 @@ static int mei_me_probe(struct pci_dev *
>          * MEI requires to resume from runtime suspend mode
>          * in order to perform link reset flow upon system suspend.
>          */
> -       pdev->dev_flags |= PCI_DEV_FLAGS_NEEDS_RESUME;
> +       dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NEVER_SKIP);
>
>         /*
>         * For not wake-able HW runtime pm framework
>


* Re: [PATCH 03/12] PM: i2c-designware-platdrv: Use DPM_FLAG_SMART_PREPARE
  2017-10-16  1:29 ` [PATCH 03/12] PM: i2c-designware-platdrv: Use DPM_FLAG_SMART_PREPARE Rafael J. Wysocki
@ 2017-10-23 16:57   ` Ulf Hansson
  0 siblings, 0 replies; 135+ messages in thread
From: Ulf Hansson @ 2017-10-23 16:57 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On 16 October 2017 at 03:29, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Modify i2c-designware-platdrv to set DPM_FLAG_SMART_PREPARE for its
> devices and return 0 from the system suspend ->prepare callback
> if the device has an ACPI companion object in order to tell the PM
> core and middle layers to avoid skipping system suspend/resume
> callbacks for the device in that case (which may be problematic,
> because the device may be accessed during suspend and resume of
> other devices via I2C operation regions then).
>
> Also the pm_runtime_suspended() check in dw_i2c_plat_prepare()
> is not necessary any more, because the core does it when setting
> power.direct_complete for the device, so drop it.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/i2c/busses/i2c-designware-platdrv.c |   10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> Index: linux-pm/drivers/i2c/busses/i2c-designware-platdrv.c
> ===================================================================
> --- linux-pm.orig/drivers/i2c/busses/i2c-designware-platdrv.c
> +++ linux-pm/drivers/i2c/busses/i2c-designware-platdrv.c
> @@ -370,6 +370,8 @@ static int dw_i2c_plat_probe(struct plat
>         ACPI_COMPANION_SET(&adap->dev, ACPI_COMPANION(&pdev->dev));
>         adap->dev.of_node = pdev->dev.of_node;
>
> +       dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_SMART_PREPARE);
> +
>         /* The code below assumes runtime PM to be disabled. */
>         WARN_ON(pm_runtime_enabled(&pdev->dev));
>
> @@ -433,7 +435,13 @@ MODULE_DEVICE_TABLE(of, dw_i2c_of_match)
>  #ifdef CONFIG_PM_SLEEP
>  static int dw_i2c_plat_prepare(struct device *dev)
>  {
> -       return pm_runtime_suspended(dev);
> +       /*
> +        * If the ACPI companion device object is present for this device, it
> +        * may be accessed during suspend and resume of other devices via I2C
> +        * operation regions, so tell the PM core and middle layers to avoid
> +        * skipping system suspend/resume callbacks for it in that case.
> +        */

The above scenario can also happen in the non-ACPI case.
That makes this comment a bit confusing to me.

> +       return !has_acpi_companion(dev);

I understand it still works by always returning 1 in the non-ACPI
case, because the PM core checks pm_runtime_suspended() itself when
deciding on the direct_complete path. However, it looks rather odd,
especially given the comment above.

Perhaps returning pm_runtime_suspended() in the other case would make
this clearer? Or perhaps the comment could be clarified somehow? :-)
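
A rough userspace sketch of the two variants, to show that they should
lead to the same direct_complete decision.  It assumes the core's check
reduces to "runtime-suspended and positive ->prepare() result" and leaves
out the middle layers and DPM_FLAG_NEVER_SKIP; struct i2c_sketch_dev and
the function names are invented for the example.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two device properties that matter here. */
struct i2c_sketch_dev {
	bool has_acpi_companion;
	bool runtime_suspended;
};

/* Variant as posted: 0 with an ACPI companion, 1 otherwise. */
static int prepare_as_posted(const struct i2c_sketch_dev *dev)
{
	return !dev->has_acpi_companion;
}

/* Suggested variant: spell out the runtime PM condition explicitly. */
static int prepare_suggested(const struct i2c_sketch_dev *dev)
{
	if (dev->has_acpi_companion)
		return 0;

	return dev->runtime_suspended;
}

/* Simplified model of the core's decision for a system suspend. */
static bool sketch_direct_complete(const struct i2c_sketch_dev *dev,
				   int prepare_ret)
{
	return dev->runtime_suspended && prepare_ret > 0;
}
```

For all four combinations of the two booleans the decision is the same
for both variants, so the difference is only in how obvious the intent
is to someone reading dw_i2c_plat_prepare().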

>  }
>
>  static void dw_i2c_plat_complete(struct device *dev)
>
>

Kind regards
Uffe


* Re: [PATCH 04/12] PM / core: Add SMART_SUSPEND driver flag
  2017-10-16  1:29 ` [PATCH 04/12] PM / core: Add SMART_SUSPEND driver flag Rafael J. Wysocki
@ 2017-10-23 19:01   ` Ulf Hansson
  2017-10-24  5:22   ` Ulf Hansson
  1 sibling, 0 replies; 135+ messages in thread
From: Ulf Hansson @ 2017-10-23 19:01 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On 16 October 2017 at 03:29, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Define and document a SMART_SUSPEND flag to instruct bus types and PM
> domains that the system suspend callbacks provided by the driver can
> cope with runtime-suspended devices, so from the driver's perspective
> it should be safe to leave devices in runtime suspend during system
> suspend.



>
> Setting that flag also causes the PM core to skip the "late" and
> "noirq" phases of device suspend for devices that remain in runtime
> suspend at the beginning of the "late" phase (when runtime PM has
> been disabled for them) under the assumption that their state cannot
> (and should not) change after that point until the system suspend
> transition is complete.  Moreover, the PM core prevents runtime PM
> from acting on devices with DPM_FLAG_SMART_SUSPEND during system
> resume by setting their runtime PM status to "active" at the end of
> the "early" phase (right prior to enabling runtime PM for them).
> That allows system resume callbacks to do whatever is necessary to
> resume the device without worrying about runtime PM possibly
> running in parallel with them.

Could you explain in some detail why the second part makes sense?

To me it seems smarter to leave the decision to the driver: whether it
wants to resume the device during system resume, or would rather defer
that until later, via runtime PM.

>
> However, that doesn't apply to transitions involving ->thaw_noirq,
> ->thaw_early and ->thaw callbacks during hibernation, as they
> generally are not expected to change the power states of devices.
> Consequently, if a device is in runtime suspend at the beginning
> of such a transition, it must stay in runtime suspend until the
> "complete" phase of it (since the callbacks may not change its
> power state).

The above seems reasonable, but on the other hand it makes it more
difficult to understand how DPM_FLAG_SMART_SUSPEND is supposed to be
used.

Perhaps we should simply have a separate flag for the resume path?

>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  Documentation/driver-api/pm/devices.rst |   17 ++++++++
>  drivers/base/power/main.c               |   63 ++++++++++++++++++++++++++++----
>  include/linux/pm.h                      |    9 ++++
>  3 files changed, 82 insertions(+), 7 deletions(-)
>
> Index: linux-pm/Documentation/driver-api/pm/devices.rst
> ===================================================================
> --- linux-pm.orig/Documentation/driver-api/pm/devices.rst
> +++ linux-pm/Documentation/driver-api/pm/devices.rst
> @@ -766,6 +766,23 @@ the state of devices (possibly except fo
>  from their ``->prepare`` and ``->suspend`` callbacks (or equivalent) *before*
>  invoking device drivers' ``->suspend`` callbacks (or equivalent).
>
> +Some bus types and PM domains have a policy to resume all devices from runtime
> +suspend upfront in their ``->suspend`` callbacks, but that may not be really
> +necessary if the system suspend-resume callbacks provided by the device's
> +driver can cope with runtime-suspended devices.  The driver can indicate that
> +by setting ``DPM_FLAG_SMART_SUSPEND`` in :c:member:`power.driver_flags` at the
> +probe time, by passing it to the :c:func:`dev_pm_set_driver_flags` helper.  That
> +also causes the PM core to skip the ``suspend_late`` and ``suspend_noirq``
> +phases of device suspend for the device if it remains in runtime suspend at the
> +beginning of the ``suspend_late`` phase (when runtime PM has been disabled for
> +it) under the assumption that its state cannot (and should not) change after
> +that point until the system-wide transition is over.  Moreover, the PM core
> +updates the runtime power management status of devices with
> +``DPM_FLAG_SMART_SUSPEND`` set to "active" at the end of the ``resume_early``
> +phase of device resume (right prior to enabling runtime PM for them) in order
> +to prevent runtime PM from acting on them before the ``complete`` phase, which
> +means that they should be put into the full-power state before that phase.
> +
>  During system-wide resume from a sleep state it's easiest to put devices into
>  the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`.
>  Refer to that document for more information regarding this particular issue as
> Index: linux-pm/include/linux/pm.h
> ===================================================================
> --- linux-pm.orig/include/linux/pm.h
> +++ linux-pm/include/linux/pm.h
> @@ -558,6 +558,7 @@ struct pm_subsys_data {
>   *
>   * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
>   * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
> + * SMART_SUSPEND: No need to resume the device from runtime suspend.
>   *
>   * Setting SMART_PREPARE instructs bus types and PM domains which may want
>   * system suspend/resume callbacks to be skipped for the device to return 0 from
> @@ -565,9 +566,17 @@ struct pm_subsys_data {
>   * other words, the system suspend/resume callbacks can only be skipped for the
>   * device if its driver doesn't object against that).  This flag has no effect
>   * if NEVER_SKIP is set.
> + *
> + * Setting SMART_SUSPEND instructs bus types and PM domains which may want to
> + * runtime resume the device upfront during system suspend that doing so is not
> + * necessary from the driver's perspective.  It also causes the PM core to skip
> + * the "late" and "noirq" phases of device suspend for the device if it remains
> + * in runtime suspend at the beginning of the "late" phase (when runtime PM has
> + * been disabled for it).
>   */
>  #define DPM_FLAG_NEVER_SKIP    BIT(0)
>  #define DPM_FLAG_SMART_PREPARE BIT(1)
> +#define DPM_FLAG_SMART_SUSPEND BIT(2)
>
>  struct dev_pm_info {
>         pm_message_t            power_state;
> Index: linux-pm/drivers/base/power/main.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/main.c
> +++ linux-pm/drivers/base/power/main.c
> @@ -551,6 +551,18 @@ static int device_resume_noirq(struct de
>         if (!dev->power.is_noirq_suspended)
>                 goto Out;
>
> +       if (dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
> +           pm_runtime_status_suspended(dev) && (state.event == PM_EVENT_THAW ||
> +           state.event == PM_EVENT_RECOVER)) {
> +               /*
> +                * The device has to stay in runtime suspend, because the
> +                * subsequent callbacks may not try to change its power state.
> +                */
> +               dev->power.is_suspended = false;
> +               dev->power.is_late_suspended = false;
> +               goto Skip;
> +       }
> +
>         dpm_wait_for_superior(dev, async);
>
>         if (dev->pm_domain) {
> @@ -573,9 +585,11 @@ static int device_resume_noirq(struct de
>         }
>
>         error = dpm_run_callback(callback, dev, state, info);
> +
> +Skip:
>         dev->power.is_noirq_suspended = false;
>
> - Out:
> +Out:
>         complete_all(&dev->power.completion);
>         TRACE_RESUME(error);
>         return error;
> @@ -715,6 +729,14 @@ static int device_resume_early(struct de
>         error = dpm_run_callback(callback, dev, state, info);
>         dev->power.is_late_suspended = false;
>
> +       /*
> +        * Devices with DPM_FLAG_SMART_SUSPEND may be left in runtime suspend
> +        * during system suspend, so update their runtime PM status to "active"
> +        * to prevent runtime PM from acting on them before device_complete().
> +        */
> +       if (dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND))
> +               pm_runtime_set_active(dev);

Please check the return value from pm_runtime_set_active(), otherwise
we will not notice if something went wrong. For example, the parent may
not be active.

Moreover, as stated above, perhaps this should be controlled by a separate flag?
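
Something along these lines is what I have in mind, as a userspace
sketch only.  It assumes that pm_runtime_set_active() fails with -EBUSY
when the parent is not active (and does not ignore its children), and
that the error is only propagated if the callback itself succeeded; the
latter is my choice, not something taken from the patch.  struct
rpm_sketch_dev and the sketch_* functions are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the runtime PM state of a device. */
struct rpm_sketch_dev {
	struct rpm_sketch_dev *parent;
	bool active;
	bool ignore_children;
};

/* Loose model: activating a child under an inactive parent is refused. */
static int sketch_set_active(struct rpm_sketch_dev *dev)
{
	struct rpm_sketch_dev *parent = dev->parent;

	if (parent && !parent->ignore_children && !parent->active)
		return -EBUSY;

	dev->active = true;
	return 0;
}

/* Tail of a device_resume_early()-like function with the result checked. */
static int sketch_resume_early_tail(struct rpm_sketch_dev *dev,
				    bool smart_suspend, int error)
{
	/* Keep an earlier callback error; otherwise report a failed update. */
	if (!error && smart_suspend)
		error = sketch_set_active(dev);

	return error;
}
```

With an inactive parent the tail returns -EBUSY and the device status is
left untouched, instead of the failure being dropped on the floor.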

> +
>   Out:
>         TRACE_RESUME(error);
>
> @@ -1107,6 +1129,15 @@ static int __device_suspend_noirq(struct
>         if (dev->power.syscore || dev->power.direct_complete)
>                 goto Complete;
>
> +       /*
> +        * The state of devices with DPM_FLAG_SMART_SUSPEND set that remain in
> +        * runtime suspend at this point cannot change going forward, so skip
> +        * the callback invocation for them.
> +        */
> +       if (dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
> +           pm_runtime_status_suspended(dev))
> +               goto Skip;
> +
>         if (dev->pm_domain) {
>                 info = "noirq power domain ";
>                 callback = pm_noirq_op(&dev->pm_domain->ops, state);
> @@ -1127,10 +1158,13 @@ static int __device_suspend_noirq(struct
>         }
>
>         error = dpm_run_callback(callback, dev, state, info);
> -       if (!error)
> -               dev->power.is_noirq_suspended = true;
> -       else
> +       if (error) {
>                 async_error = error;
> +               goto Complete;
> +       }
> +
> +Skip:
> +       dev->power.is_noirq_suspended = true;
>
>  Complete:
>         complete_all(&dev->power.completion);
> @@ -1268,6 +1302,15 @@ static int __device_suspend_late(struct
>         if (dev->power.syscore || dev->power.direct_complete)
>                 goto Complete;
>
> +       /*
> +        * The state of devices with DPM_FLAG_SMART_SUSPEND set that remain in
> +        * runtime suspend at this point cannot change going forward, so skip
> +        * the callback invocation for them.
> +        */
> +       if (dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
> +           pm_runtime_status_suspended(dev))
> +               goto Skip;
> +
>         if (dev->pm_domain) {
>                 info = "late power domain ";
>                 callback = pm_late_early_op(&dev->pm_domain->ops, state);
> @@ -1288,10 +1331,13 @@ static int __device_suspend_late(struct
>         }
>
>         error = dpm_run_callback(callback, dev, state, info);
> -       if (!error)
> -               dev->power.is_late_suspended = true;
> -       else
> +       if (error) {
>                 async_error = error;
> +               goto Complete;
> +       }
> +
> +Skip:
> +       dev->power.is_late_suspended = true;
>
>  Complete:
>         TRACE_SUSPEND(error);
> @@ -1652,6 +1698,9 @@ static int device_prepare(struct device
>         if (dev->power.syscore)
>                 return 0;
>
> +       WARN_ON(dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
> +               !pm_runtime_enabled(dev));
> +
>         /*
>          * If a device's parent goes into runtime suspend at the wrong time,
>          * it won't be possible to resume the device.  To prevent this we
>
>

My overall comment/concern with this flag is that I would like a more
straightforward approach, otherwise people won't understand how to use
it.

Moreover, doesn't this flag overlap quite closely with what the
direct_complete path is already doing? Except for the resume path, of
course.
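
To spell out the overlap as I understand it from the changelog, here is
a userspace sketch of which callbacks the core would invoke in each
case.  It covers a plain suspend/resume cycle only (not the
thaw/recover special case) and says nothing about what a bus type does
in its own ->suspend; the enum and sketch_callbacks_run() are made up
for the illustration.

```c
#include <stdbool.h>

/* One bit per phase of a suspend/resume cycle. */
enum sketch_phase {
	PH_PREPARE	 = 1u << 0,
	PH_SUSPEND	 = 1u << 1,
	PH_SUSPEND_LATE	 = 1u << 2,
	PH_SUSPEND_NOIRQ = 1u << 3,
	PH_RESUME_NOIRQ	 = 1u << 4,
	PH_RESUME_EARLY	 = 1u << 5,
	PH_RESUME	 = 1u << 6,
	PH_COMPLETE	 = 1u << 7,
	PH_ALL		 = (1u << 8) - 1,
};

/* Returns the mask of phases whose callbacks the core would run. */
static unsigned int sketch_callbacks_run(bool direct_complete,
					 bool smart_suspend,
					 bool rt_suspended_at_late)
{
	unsigned int mask = PH_ALL;

	/* direct_complete: only ->prepare and ->complete are invoked. */
	if (direct_complete)
		return PH_PREPARE | PH_COMPLETE;

	/* SMART_SUSPEND: "late" and "noirq" suspend are skipped. */
	if (smart_suspend && rt_suspended_at_late)
		mask &= ~(PH_SUSPEND_LATE | PH_SUSPEND_NOIRQ);

	return mask;
}
```

In this model the two paths differ in ->suspend and in the three resume
phases, which is where I think the semantics need to be explained better.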

Kind regards
Uffe


* Re: [PATCH 05/12] PCI / PM: Drop unnecessary invocations of pcibios_pm_ops callbacks
  2017-10-16  1:29 ` [PATCH 05/12] PCI / PM: Drop unnecessary invocations of pcibios_pm_ops callbacks Rafael J. Wysocki
@ 2017-10-23 19:06   ` Ulf Hansson
  0 siblings, 0 replies; 135+ messages in thread
From: Ulf Hansson @ 2017-10-23 19:06 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On 16 October 2017 at 03:29, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> The only user of non-empty pcibios_pm_ops is s390 and it only uses
> "noirq" callbacks, so drop the invocations of the other pcibios_pm_ops
> callbacks from the PCI PM code.
>
> That will allow subsequent changes to be somewhat simpler.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

> ---
>  drivers/pci/pci-driver.c |   18 ------------------
>  1 file changed, 18 deletions(-)
>
> Index: linux-pm/drivers/pci/pci-driver.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pci-driver.c
> +++ linux-pm/drivers/pci/pci-driver.c
> @@ -918,9 +918,6 @@ static int pci_pm_freeze(struct device *
>                         return error;
>         }
>
> -       if (pcibios_pm_ops.freeze)
> -               return pcibios_pm_ops.freeze(dev);
> -
>         return 0;
>  }
>
> @@ -982,12 +979,6 @@ static int pci_pm_thaw(struct device *de
>         const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
>         int error = 0;
>
> -       if (pcibios_pm_ops.thaw) {
> -               error = pcibios_pm_ops.thaw(dev);
> -               if (error)
> -                       return error;
> -       }
> -
>         if (pci_has_legacy_pm_support(pci_dev))
>                 return pci_legacy_resume(dev);
>
> @@ -1032,9 +1023,6 @@ static int pci_pm_poweroff(struct device
>   Fixup:
>         pci_fixup_device(pci_fixup_suspend, pci_dev);
>
> -       if (pcibios_pm_ops.poweroff)
> -               return pcibios_pm_ops.poweroff(dev);
> -
>         return 0;
>  }
>
> @@ -1107,12 +1095,6 @@ static int pci_pm_restore(struct device
>         const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
>         int error = 0;
>
> -       if (pcibios_pm_ops.restore) {
> -               error = pcibios_pm_ops.restore(dev);
> -               if (error)
> -                       return error;
> -       }
> -
>         /*
>          * This is necessary for the hibernation error path in which restore is
>          * called without restoring the standard config registers of the device.
>
>


* Re: [PATCH 07/12] ACPI / LPSS: Consolidate runtime PM and system sleep handling
  2017-10-16  1:29 ` [PATCH 07/12] ACPI / LPSS: Consolidate runtime PM and system sleep handling Rafael J. Wysocki
@ 2017-10-23 19:09   ` Ulf Hansson
  0 siblings, 0 replies; 135+ messages in thread
From: Ulf Hansson @ 2017-10-23 19:09 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On 16 October 2017 at 03:29, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Move the LPSS-specific code from acpi_lpss_runtime_suspend()
> and acpi_lpss_runtime_resume() into separate functions,
> acpi_lpss_suspend() and acpi_lpss_resume(), respectively, and
> make acpi_lpss_suspend_late() and acpi_lpss_resume_early() use
> them too in order to unify the runtime PM and system sleep
> handling in the LPSS driver.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

> ---
>
> This is based on an RFC I posted some time ago
> (https://patchwork.kernel.org/patch/9998147/), which didn't
> receive any comments and it depends on a couple of ACPI device PM
> patches posted recently (https://patchwork.kernel.org/patch/10006457/
> in particular).
>
> It's included in this series, because the next patch won't work without it.
>
> ---
>  drivers/acpi/acpi_lpss.c |   75 ++++++++++++++++++++---------------------------
>  1 file changed, 33 insertions(+), 42 deletions(-)
>
> Index: linux-pm/drivers/acpi/acpi_lpss.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/acpi_lpss.c
> +++ linux-pm/drivers/acpi/acpi_lpss.c
> @@ -716,40 +716,6 @@ static void acpi_lpss_dismiss(struct dev
>         acpi_dev_suspend(dev, false);
>  }
>
> -#ifdef CONFIG_PM_SLEEP
> -static int acpi_lpss_suspend_late(struct device *dev)
> -{
> -       struct lpss_private_data *pdata = acpi_driver_data(ACPI_COMPANION(dev));
> -       int ret;
> -
> -       ret = pm_generic_suspend_late(dev);
> -       if (ret)
> -               return ret;
> -
> -       if (pdata->dev_desc->flags & LPSS_SAVE_CTX)
> -               acpi_lpss_save_ctx(dev, pdata);
> -
> -       return acpi_dev_suspend(dev, device_may_wakeup(dev));
> -}
> -
> -static int acpi_lpss_resume_early(struct device *dev)
> -{
> -       struct lpss_private_data *pdata = acpi_driver_data(ACPI_COMPANION(dev));
> -       int ret;
> -
> -       ret = acpi_dev_resume(dev);
> -       if (ret)
> -               return ret;
> -
> -       acpi_lpss_d3_to_d0_delay(pdata);
> -
> -       if (pdata->dev_desc->flags & LPSS_SAVE_CTX)
> -               acpi_lpss_restore_ctx(dev, pdata);
> -
> -       return pm_generic_resume_early(dev);
> -}
> -#endif /* CONFIG_PM_SLEEP */
> -
>  /* IOSF SB for LPSS island */
>  #define LPSS_IOSF_UNIT_LPIOEP          0xA0
>  #define LPSS_IOSF_UNIT_LPIO1           0xAB
> @@ -835,19 +801,15 @@ static void lpss_iosf_exit_d3_state(void
>         mutex_unlock(&lpss_iosf_mutex);
>  }
>
> -static int acpi_lpss_runtime_suspend(struct device *dev)
> +static int acpi_lpss_suspend(struct device *dev, bool wakeup)
>  {
>         struct lpss_private_data *pdata = acpi_driver_data(ACPI_COMPANION(dev));
>         int ret;
>
> -       ret = pm_generic_runtime_suspend(dev);
> -       if (ret)
> -               return ret;
> -
>         if (pdata->dev_desc->flags & LPSS_SAVE_CTX)
>                 acpi_lpss_save_ctx(dev, pdata);
>
> -       ret = acpi_dev_suspend(dev, true);
> +        ret = acpi_dev_suspend(dev, wakeup);
>
>         /*
>          * This call must be last in the sequence, otherwise PMC will return
> @@ -860,7 +822,7 @@ static int acpi_lpss_runtime_suspend(str
>         return ret;
>  }
>
> -static int acpi_lpss_runtime_resume(struct device *dev)
> +static int acpi_lpss_resume(struct device *dev)
>  {
>         struct lpss_private_data *pdata = acpi_driver_data(ACPI_COMPANION(dev));
>         int ret;
> @@ -881,7 +843,36 @@ static int acpi_lpss_runtime_resume(stru
>         if (pdata->dev_desc->flags & LPSS_SAVE_CTX)
>                 acpi_lpss_restore_ctx(dev, pdata);
>
> -       return pm_generic_runtime_resume(dev);
> +       return 0;
> +}
> +#ifdef CONFIG_PM_SLEEP
> +static int acpi_lpss_suspend_late(struct device *dev)
> +{
> +       int ret = pm_generic_suspend_late(dev);
> +
> +       return ret ? ret : acpi_lpss_suspend(dev, device_may_wakeup(dev));
> +}
> +
> +static int acpi_lpss_resume_early(struct device *dev)
> +{
> +       int ret = acpi_lpss_resume(dev);
> +
> +       return ret ? ret : pm_generic_resume_early(dev);
> +}
> +#endif /* CONFIG_PM_SLEEP */
> +
> +static int acpi_lpss_runtime_suspend(struct device *dev)
> +{
> +       int ret = pm_generic_runtime_suspend(dev);
> +
> +       return ret ? ret : acpi_lpss_suspend(dev, true);
> +}
> +
> +static int acpi_lpss_runtime_resume(struct device *dev)
> +{
> +       int ret = acpi_lpss_resume(dev);
> +
> +       return ret ? ret : pm_generic_runtime_resume(dev);
>  }
>  #endif /* CONFIG_PM */
>
>
>


* Re: [PATCH 10/12] PM / core: Add LEAVE_SUSPENDED driver flag
  2017-10-16  1:30 ` [PATCH 10/12] PM / core: Add LEAVE_SUSPENDED driver flag Rafael J. Wysocki
@ 2017-10-23 19:38   ` Ulf Hansson
  0 siblings, 0 replies; 135+ messages in thread
From: Ulf Hansson @ 2017-10-23 19:38 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On 16 October 2017 at 03:30, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to
> instruct the PM core that it is desirable to leave the device in
> runtime suspend after system resume (for example, the device may be
> slow to resume and it may be better to avoid resuming it right away
> for this reason).
>
> Setting that flag causes the PM core to skip the ->resume_noirq,
> ->resume_early and ->resume callbacks for the device (like in the
> direct-complete optimization case) if (1) the wakeup settings of it
> are compatible with runtime PM (that is, either the device is
> configured to wake up the system from sleep or it cannot generate
> wakeup signals at all), and it will not be used for resuming any of
> its children or consumers.

As you state above, this looks like the direct_complete path when
used in conjunction with the DPM_FLAG_SMART_SUSPEND flag.

Taking both of these flags into account, what it seems to boil down to
is that you need one flag instructing the PM core to sometimes resume
the devices when it runs the direct_complete path, which isn't the case
today.

Wouldn't that be an alternative solution, which may be a bit simpler?
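
For reference, the condition described in the changelog above can be
modelled in plain C roughly as follows.  This is only a userspace sketch
and not kernel code: struct model_dev and may_leave_suspended() are
hypothetical names, and the fields merely stand in for what
device_may_wakeup(), device_can_wakeup() and power.must_resume report in
the patch.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the state the patch looks at. */
struct model_dev {
	bool leave_suspended_flag;	/* driver set DPM_FLAG_LEAVE_SUSPENDED */
	bool can_wakeup;		/* device is able to generate wakeup signals */
	bool may_wakeup;		/* device is configured to wake up the system */
	bool must_resume;		/* needed to resume a child or consumer */
};

/*
 * Mirrors the check added to __device_suspend_noirq(): the wakeup settings
 * are compatible with runtime PM if the device is either armed for wakeup
 * or cannot signal wakeup at all.
 */
static bool may_leave_suspended(const struct model_dev *dev)
{
	bool wakeup_compatible = dev->may_wakeup || !dev->can_wakeup;

	return dev->leave_suspended_flag && wakeup_compatible &&
	       !dev->must_resume;
}
```

Written this way, a wakeup-capable device that has not been armed for
wakeup is the one case in which the flag alone is not sufficient.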

>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  Documentation/driver-api/pm/devices.rst |   20 +++++++
>  drivers/base/power/main.c               |   81 ++++++++++++++++++++++++++++++--
>  include/linux/pm.h                      |   12 +++-
>  3 files changed, 104 insertions(+), 9 deletions(-)
>
> Index: linux-pm/include/linux/pm.h
> ===================================================================
> --- linux-pm.orig/include/linux/pm.h
> +++ linux-pm/include/linux/pm.h
> @@ -559,6 +559,7 @@ struct pm_subsys_data {
>   * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
>   * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
>   * SMART_SUSPEND: No need to resume the device from runtime suspend.
> + * LEAVE_SUSPENDED: Avoid resuming the device during system resume if possible.
>   *
>   * Setting SMART_PREPARE instructs bus types and PM domains which may want
>   * system suspend/resume callbacks to be skipped for the device to return 0 from
> @@ -573,10 +574,14 @@ struct pm_subsys_data {
>   * the "late" and "noirq" phases of device suspend for the device if it remains
>   * in runtime suspend at the beginning of the "late" phase (when runtime PM has
>   * been disabled for it).
> + *
> + * Setting LEAVE_SUSPENDED informs the PM core and middle layer code that the
> + * driver prefers the device to be left in runtime suspend after system resume.
>   */
> -#define DPM_FLAG_NEVER_SKIP    BIT(0)
> -#define DPM_FLAG_SMART_PREPARE BIT(1)
> -#define DPM_FLAG_SMART_SUSPEND BIT(2)
> +#define DPM_FLAG_NEVER_SKIP            BIT(0)
> +#define DPM_FLAG_SMART_PREPARE         BIT(1)
> +#define DPM_FLAG_SMART_SUSPEND         BIT(2)
> +#define DPM_FLAG_LEAVE_SUSPENDED       BIT(3)

I would appreciate it if you could reformat the patches such that you
only have to add one line here.

It makes it easier when I later run "git blame" to understand which
commit introduced each flag. :-)

>
>  struct dev_pm_info {
>         pm_message_t            power_state;
> @@ -598,6 +603,7 @@ struct dev_pm_info {
>         bool                    wakeup_path:1;
>         bool                    syscore:1;
>         bool                    no_pm_callbacks:1;      /* Owned by the PM core */
> +       unsigned int            must_resume:1;  /* Owned by the PM core */
>  #else
>         unsigned int            should_wakeup:1;
>  #endif
> Index: linux-pm/drivers/base/power/main.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/main.c
> +++ linux-pm/drivers/base/power/main.c
> @@ -705,6 +705,12 @@ static int device_resume_early(struct de
>         if (!dev->power.is_late_suspended)
>                 goto Out;
>
> +       if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED) &&
> +           !dev->power.must_resume) {
> +               pm_runtime_set_suspended(dev);
> +               goto Out;
> +       }
> +
>         dpm_wait_for_superior(dev, async);
>
>         if (dev->pm_domain) {
> @@ -1098,6 +1104,32 @@ static pm_message_t resume_event(pm_mess
>         return PMSG_ON;
>  }
>
> +static void dpm_suppliers_set_must_resume(struct device *dev)
> +{
> +       struct device_link *link;
> +       int idx;
> +
> +       idx = device_links_read_lock();
> +
> +       list_for_each_entry_rcu(link, &dev->links.suppliers, c_node)
> +               link->supplier->power.must_resume = true;
> +
> +       device_links_read_unlock(idx);
> +}
> +
> +static void dpm_leave_suspended(struct device *dev)
> +{
> +       pm_runtime_set_suspended(dev);
> +       dev->power.is_suspended = false;
> +       dev->power.is_late_suspended = false;
> +       /*
> +        * This tells middle layer code to schedule runtime resume of the device
> +        * from its ->complete callback to update the device's power state in
> +        * case the platform firmware has been involved in resuming the system.
> +        */
> +       dev->power.direct_complete = true;
> +}
> +
>  /**
>   * __device_suspend_noirq - Execute a "noirq suspend" callback for given device.
>   * @dev: Device to handle.
> @@ -1135,8 +1167,20 @@ static int __device_suspend_noirq(struct
>          * the callback invocation for them.
>          */
>         if (dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
> -           pm_runtime_status_suspended(dev))
> -               goto Skip;
> +           pm_runtime_status_suspended(dev)) {
> +               /*
> +                * The device may be left suspended during system resume if
> +                * that is preferred by its driver and it will not be used for
> +                * resuming any of its children or consumers.
> +                */
> +               if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED) &&
> +                   !dev->power.must_resume) {
> +                       dpm_leave_suspended(dev);
> +                       goto Complete;
> +               } else {
> +                       goto Skip;
> +               }
> +       }
>
>         if (dev->pm_domain) {
>                 info = "noirq power domain ";
> @@ -1163,6 +1207,28 @@ static int __device_suspend_noirq(struct
>                 goto Complete;
>         }
>
> +       /*
> +        * The device may be left suspended during system resume if that is
> +        * preferred by its driver and its wakeup configuration is compatible
> +        * with runtime PM, and it will not be used for resuming any of its
> +        * children or consumers.
> +        */
> +       if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED) &&
> +           (device_may_wakeup(dev) || !device_can_wakeup(dev)) &&
> +           !dev->power.must_resume) {
> +               dpm_leave_suspended(dev);
> +               goto Complete;
> +       }
> +
> +       /*
> +        * The parent and suppliers will be necessary to resume the device
> +        * during system resume, so avoid leaving them in runtime suspend.
> +        */
> +       if (dev->parent)
> +               dev->parent->power.must_resume = true;
> +
> +       dpm_suppliers_set_must_resume(dev);
> +
>  Skip:
>         dev->power.is_noirq_suspended = true;
>
> @@ -1698,8 +1764,9 @@ static int device_prepare(struct device
>         if (dev->power.syscore)
>                 return 0;
>
> -       WARN_ON(dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
> -               !pm_runtime_enabled(dev));
> +       WARN_ON(!pm_runtime_enabled(dev) &&
> +               dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND |
> +                                             DPM_FLAG_LEAVE_SUSPENDED));
>
>         /*
>          * If a device's parent goes into runtime suspend at the wrong time,
> @@ -1712,6 +1779,12 @@ static int device_prepare(struct device
>         device_lock(dev);
>
>         dev->power.wakeup_path = device_may_wakeup(dev);
> +       /*
> +        * Avoid leaving devices in suspend after transitions that don't really
> +        * suspend them in general.
> +        */
> +       dev->power.must_resume = state.event == PM_EVENT_FREEZE ||
> +                               state.event == PM_EVENT_QUIESCE;
>
>         if (dev->power.no_pm_callbacks) {
>                 ret = 1;        /* Let device go direct_complete */
> Index: linux-pm/Documentation/driver-api/pm/devices.rst
> ===================================================================
> --- linux-pm.orig/Documentation/driver-api/pm/devices.rst
> +++ linux-pm/Documentation/driver-api/pm/devices.rst
> @@ -785,6 +785,22 @@ means that they should be put into the f
>
>  During system-wide resume from a sleep state it's easiest to put devices into
>  the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`.
> -Refer to that document for more information regarding this particular issue as
> +[Refer to that document for more information regarding this particular issue as
>  well as for information on the device runtime power management framework in
> -general.
> +general.]
> +
> +However, it may be desirable to leave some devices in runtime suspend after
> +system resume and device drivers can use the ``DPM_FLAG_LEAVE_SUSPENDED`` flag
> +to indicate to the PM core that this is the case.  If that flag is set for a
> +device and the wakeup settings of it are compatible with runtime PM (that is,
> +either the device is configured to wake up the system from sleep or it cannot
> +generate wakeup signals at all), and it will not be used for resuming any of its
> +children or consumers, the PM core will skip all of the system resume callbacks
> +in the ``resume_noirq``, ``resume_early`` and ``resume`` phases for it and its
> +runtime power management status will be set to "suspended".
> +
> +Still, if the platform firmware is involved in the handling of system resume, it
> +may change the state of devices in unpredictable ways, so in that case the
> +middle layer code (for example, a bus type or PM domain) the driver works with
> +should update the device's power state and schedule runtime resume of it to
> +align its power settings with the expectations of the runtime PM framework.
>
>

Regarding the DPM_FLAG_NEVER_SKIP flag, is that flag only meant to
prevent direct_complete, and thus ought not to be used with the other
driver PM flags that you add in this series?

Have you considered that DPM_FLAG_NEVER_SKIP also propagates to parents
etc.? Just to make sure you won't skip invoking some system sleep
callbacks even though they should be invoked because a child requires it?

Or maybe I am just tired and should continue to review with a more
fresh mind. :-)
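
For reference, the upward propagation that this particular patch does
implement goes through power.must_resume rather than through a driver
flag.  A small userspace model of it might look like the following.  The
names struct model_dev and model_suspend_noirq() are hypothetical,
suppliers are left out for brevity, and wakeup_compatible is assumed to
hold the result of the may_wakeup/can_wakeup check from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few struct device fields involved. */
struct model_dev {
	struct model_dev *parent;
	bool leave_suspended_flag;	/* DPM_FLAG_LEAVE_SUSPENDED set by driver */
	bool wakeup_compatible;		/* may_wakeup || !can_wakeup */
	bool must_resume;
	bool left_suspended;
};

/*
 * Model of the tail of __device_suspend_noirq() in the patch.  Children are
 * suspended before their parents, so by the time a parent gets here its
 * must_resume bit already reflects what its children need.
 */
static void model_suspend_noirq(struct model_dev *dev)
{
	if (dev->leave_suspended_flag && dev->wakeup_compatible &&
	    !dev->must_resume) {
		dev->left_suspended = true;
		return;
	}

	/* This device will be resumed, so its parent has to be resumed too. */
	if (dev->parent)
		dev->parent->must_resume = true;
}
```

In this model a parent that sets the flag is still resumed whenever one
of its children is going to be resumed.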

Kind regards
Uffe

* Re: [Update][PATCH v2 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
  2017-10-23 16:37     ` Ulf Hansson
@ 2017-10-23 20:41       ` Rafael J. Wysocki
  0 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-23 20:41 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Linux PM, Greg Kroah-Hartman, Lukas Wunner, Bjorn Helgaas,
	Alan Stern, LKML, Linux ACPI, Linux PCI, Linux Documentation,
	Mika Westerberg, Andy Shevchenko, Kevin Hilman, Wolfram Sang,
	linux-i2c, Lee Jones

On Monday, October 23, 2017 6:37:41 PM CEST Ulf Hansson wrote:
> On 19 October 2017 at 01:17, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > The motivation for this change is to provide a way to work around
> > a problem with the direct-complete mechanism used for avoiding
> > system suspend/resume handling for devices in runtime suspend.
> >
> > The problem is that some middle layer code (the PCI bus type and
> > the ACPI PM domain in particular) returns positive values from its
> > system suspend ->prepare callbacks regardless of whether the driver's
> > ->prepare returns a positive value or 0, which effectively prevents
> > drivers from being able to control the direct-complete feature.
> > Some drivers need that control, however, and the PCI bus type has
> > grown its own flag to deal with this issue, but since it is not
> > limited to PCI, it is better to address it by adding driver flags at
> > the core level.
> >
> > To that end, add a driver_flags field to struct dev_pm_info for flags
> > that can be set by device drivers at the probe time to inform the PM
> > core and/or bus types, PM domains and so on on the capabilities and/or
> > preferences of device drivers.  Also add two static inline helpers
> > for setting that field and testing it against a given set of flags
> > and make the driver core clear it automatically on driver remove
> > and probe failures.
> >
> > Define and document two PM driver flags related to the direct-
> > complete feature: NEVER_SKIP and SMART_PREPARE that can be used,
> > respectively, to indicate to the PM core that the direct-complete
> > mechanism should never be used for the device and to inform the
> > middle layer code (bus types, PM domains etc) that it can only
> > request the PM core to use the direct-complete mechanism for
> > the device (by returning a positive value from its ->prepare
> > callback) if it also has been requested by the driver.
> >
> > While at it, make the core check pm_runtime_suspended() when
> > setting power.direct_complete so that it doesn't need to be
> > checked by ->prepare callbacks.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >
> > -> v2: Change the data type for driver_flags to u32 as suggested by Greg
> >        and fix a couple of documentation typos pointed out by Lukas.
> >
> > ---
> >  Documentation/driver-api/pm/devices.rst |   14 ++++++++++++++
> >  Documentation/power/pci.txt             |   19 +++++++++++++++++++
> >  drivers/acpi/device_pm.c                |    3 +++
> >  drivers/base/dd.c                       |    2 ++
> >  drivers/base/power/main.c               |    4 +++-
> >  drivers/pci/pci-driver.c                |    5 ++++-
> >  include/linux/device.h                  |   10 ++++++++++
> >  include/linux/pm.h                      |   20 ++++++++++++++++++++
> >  8 files changed, 75 insertions(+), 2 deletions(-)
> >
> > Index: linux-pm/include/linux/device.h
> > ===================================================================
> > --- linux-pm.orig/include/linux/device.h
> > +++ linux-pm/include/linux/device.h
> > @@ -1070,6 +1070,16 @@ static inline void dev_pm_syscore_device
> >  #endif
> >  }
> >
> > +static inline void dev_pm_set_driver_flags(struct device *dev, u32 flags)
> > +{
> > +       dev->power.driver_flags = flags;
> > +}
> > +
> > +static inline bool dev_pm_test_driver_flags(struct device *dev, u32 flags)
> > +{
> > +       return !!(dev->power.driver_flags & flags);
> > +}
> > +
> >  static inline void device_lock(struct device *dev)
> >  {
> >         mutex_lock(&dev->mutex);
> > Index: linux-pm/include/linux/pm.h
> > ===================================================================
> > --- linux-pm.orig/include/linux/pm.h
> > +++ linux-pm/include/linux/pm.h
> > @@ -550,6 +550,25 @@ struct pm_subsys_data {
> >  #endif
> >  };
> >
> > +/*
> > + * Driver flags to control system suspend/resume behavior.
> > + *
> > + * These flags can be set by device drivers at the probe time.  They need not be
> > + * cleared by the drivers as the driver core will take care of that.
> > + *
> > + * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
> > + * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
> > + *
> > + * Setting SMART_PREPARE instructs bus types and PM domains which may want
> > + * system suspend/resume callbacks to be skipped for the device to return 0 from
> > + * their ->prepare callbacks if the driver's ->prepare callback returns 0 (in
> > + * other words, the system suspend/resume callbacks can only be skipped for the
> > + * device if its driver doesn't object against that).  This flag has no effect
> > + * if NEVER_SKIP is set.
> 
> In principle the ACPI/PCI middle layer/PM domain could have started out
> by respecting the return values from drivers' ->prepare() callbacks in
> case those existed, but they didn't, and that is the reason why
> SMART_PREPARE is needed. Right?
> 
> My point is, I don't think we should encourage other middle layers to
> support the SMART_PREPARE flag, simply because they should be able to
> cope without it. To make this more obvious we could try to find a
> different name for the flag indicating that, or at least make it clear
> that we don't want it to be used by others than ACPI/PCI by
> documenting that.

I want it to be generic, though, so setting it should not be treated as a
mistake in any case (for example, because some drivers interact with the
ACPI PM domain and with some other middle layers).

If SMART_PREPARE simply overlaps with your default behavior, there's no need
to check the flag, but then it can be set really safely. :-)
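
For reference, what a middle layer honoring the flag does can be
modelled in userspace roughly like this, following the pci_pm_prepare()
hunk below.  The function name and its parameters are hypothetical:
drv_ret stands for what the driver's ->prepare returned and
keep_suspended for the result of pci_dev_keep_suspended().

```c
#include <stdbool.h>

/*
 * Userspace model of the pci_pm_prepare() hunk.  A positive return value
 * asks the PM core to use direct-complete for the device.
 */
static int model_bus_prepare(bool drv_has_prepare, int drv_ret,
			     bool smart_prepare, bool keep_suspended)
{
	if (drv_has_prepare) {
		if (drv_ret < 0)
			return drv_ret;		/* propagate the error */

		if (drv_ret == 0 && smart_prepare)
			return 0;		/* driver objects to skipping */
	}

	return keep_suspended ? 1 : 0;
}
```

A middle layer whose default behavior already matches the
smart_prepare branch simply never needs to look at the flag.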

> > + */
> > +#define DPM_FLAG_NEVER_SKIP    BIT(0)
> > +#define DPM_FLAG_SMART_PREPARE BIT(1)
> > +
> >  struct dev_pm_info {
> >         pm_message_t            power_state;
> >         unsigned int            can_wakeup:1;
> > @@ -561,6 +580,7 @@ struct dev_pm_info {
> >         bool                    is_late_suspended:1;
> >         bool                    early_init:1;   /* Owned by the PM core */
> >         bool                    direct_complete:1;      /* Owned by the PM core */
> > +       u32                     driver_flags;
> >         spinlock_t              lock;
> >  #ifdef CONFIG_PM_SLEEP
> >         struct list_head        entry;
> > Index: linux-pm/drivers/base/dd.c
> > ===================================================================
> > --- linux-pm.orig/drivers/base/dd.c
> > +++ linux-pm/drivers/base/dd.c
> > @@ -464,6 +464,7 @@ pinctrl_bind_failed:
> >         if (dev->pm_domain && dev->pm_domain->dismiss)
> >                 dev->pm_domain->dismiss(dev);
> >         pm_runtime_reinit(dev);
> > +       dev_pm_set_driver_flags(dev, 0);
> >
> >         switch (ret) {
> >         case -EPROBE_DEFER:
> > @@ -869,6 +870,7 @@ static void __device_release_driver(stru
> >                 if (dev->pm_domain && dev->pm_domain->dismiss)
> >                         dev->pm_domain->dismiss(dev);
> >                 pm_runtime_reinit(dev);
> > +               dev_pm_set_driver_flags(dev, 0);
> >
> >                 klist_remove(&dev->p->knode_driver);
> >                 device_pm_check_callbacks(dev);
> > Index: linux-pm/drivers/base/power/main.c
> > ===================================================================
> > --- linux-pm.orig/drivers/base/power/main.c
> > +++ linux-pm/drivers/base/power/main.c
> > @@ -1700,7 +1700,9 @@ unlock:
> >          * applies to suspend transitions, however.
> >          */
> >         spin_lock_irq(&dev->power.lock);
> > -       dev->power.direct_complete = ret > 0 && state.event == PM_EVENT_SUSPEND;
> > +       dev->power.direct_complete = state.event == PM_EVENT_SUSPEND &&
> > +               pm_runtime_suspended(dev) && ret > 0 &&
> > +               !dev_pm_test_driver_flags(dev, DPM_FLAG_NEVER_SKIP);
> >         spin_unlock_irq(&dev->power.lock);
> >         return 0;
> >  }
> > Index: linux-pm/drivers/pci/pci-driver.c
> > ===================================================================
> > --- linux-pm.orig/drivers/pci/pci-driver.c
> > +++ linux-pm/drivers/pci/pci-driver.c
> > @@ -682,8 +682,11 @@ static int pci_pm_prepare(struct device
> >
> >         if (drv && drv->pm && drv->pm->prepare) {
> >                 int error = drv->pm->prepare(dev);
> > -               if (error)
> > +               if (error < 0)
> >                         return error;
> > +
> > +               if (!error && dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_PREPARE))
> > +                       return 0;
> >         }
> >         return pci_dev_keep_suspended(to_pci_dev(dev));
> >  }
> > Index: linux-pm/drivers/acpi/device_pm.c
> > ===================================================================
> > --- linux-pm.orig/drivers/acpi/device_pm.c
> > +++ linux-pm/drivers/acpi/device_pm.c
> > @@ -965,6 +965,9 @@ int acpi_subsys_prepare(struct device *d
> >         if (ret < 0)
> >                 return ret;
> >
> > +       if (!ret && dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_PREPARE))
> > +               return 0;
> 
> So if the driver doesn't implement the ->prepare() callback, you still
> want to treat this flag as if it had one assigned that returns 0?
> 
> That doesn't seem to match what you have documented about the flag.

You are right, I just should not use the _generic_prepare() thing there.

I'll send a fix patch for that.

> > +
> >         if (!adev || !pm_runtime_suspended(dev))
> >                 return 0;
> >
> > Index: linux-pm/Documentation/driver-api/pm/devices.rst
> > ===================================================================
> > --- linux-pm.orig/Documentation/driver-api/pm/devices.rst
> > +++ linux-pm/Documentation/driver-api/pm/devices.rst
> > @@ -354,6 +354,20 @@ the phases are: ``prepare``, ``suspend``
> >         is because all such devices are initially set to runtime-suspended with
> >         runtime PM disabled.
> >
> > +       This feature also can be controlled by device drivers by using the
> > +       ``DPM_FLAG_NEVER_SKIP`` and ``DPM_FLAG_SMART_PREPARE`` driver power
> > +       management flags.  [Typically, they are set at the time the driver is
> > +       probed against the device in question by passing them to the
> > +       :c:func:`dev_pm_set_driver_flags` helper function.]  If the first of
> > +       these flags is set, the PM core will not apply the direct-complete
> > +       procedure described above to the given device and, consequently, to any
> > +       of its ancestors.  The second flag, when set, informs the middle layer
> > +       code (bus types, device types, PM domains, classes) that it should take
> > +       the return value of the ``->prepare`` callback provided by the driver
> > +       into account and it may only return a positive value from its own
> > +       ``->prepare`` callback if the driver's one also has returned a positive
> > +       value.
> > +
> >      2. The ``->suspend`` methods should quiesce the device to stop it from
> >         performing I/O.  They also may save the device registers and put it into
> >         the appropriate low-power state, depending on the bus type the device is
> > Index: linux-pm/Documentation/power/pci.txt
> > ===================================================================
> > --- linux-pm.orig/Documentation/power/pci.txt
> > +++ linux-pm/Documentation/power/pci.txt
> > @@ -961,6 +961,25 @@ dev_pm_ops to indicate that one suspend
> >  .suspend(), .freeze(), and .poweroff() members and one resume routine is to
> >  be pointed to by the .resume(), .thaw(), and .restore() members.
> >
> > +3.1.19. Driver Flags for Power Management
> > +
> > +The PM core allows device drivers to set flags that influence the handling of
> > +power management for the devices by the core itself and by middle layer code
> > +including the PCI bus type.  The flags should be set once at the driver probe
> > +time with the help of the dev_pm_set_driver_flags() function and they should not
> > +be updated directly afterwards.
> 
> I am wondering if we really need to make a statement generic to all
> "driver PM flags" that these flags must be set at ->probe(). Maybe
> that is better documented per flag, rather than for all. The reason
> why I bring it up is that I would not be surprised if a new flag
> comes along which may be used a bit differently, not requiring
> that.
> 
> Of course we can also update that later on, if needed.

Right.
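
For reference, the life cycle of the flags as the patch has it today can
be sketched in userspace as below.  The model_ names and the
MODEL_FLAG_ values are hypothetical stand-ins for the helpers and the
DPM_FLAG_ definitions in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_FLAG_NEVER_SKIP		(1u << 0)
#define MODEL_FLAG_SMART_PREPARE	(1u << 1)

struct model_dev {
	uint32_t driver_flags;
};

/* Like dev_pm_set_driver_flags(): overwrites the field, does not OR into it. */
static void model_set_driver_flags(struct model_dev *dev, uint32_t flags)
{
	dev->driver_flags = flags;
}

/* Like dev_pm_test_driver_flags(): true if any bit of the mask is set. */
static bool model_test_driver_flags(const struct model_dev *dev, uint32_t flags)
{
	return !!(dev->driver_flags & flags);
}

/* What the driver core does on driver remove and on probe failure. */
static void model_release_driver(struct model_dev *dev)
{
	model_set_driver_flags(dev, 0);
}
```

Since the setter overwrites the whole field, a driver that wants
several flags has to pass all of them in a single call.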

Thanks,
Rafael

* Re: [PATCH 04/12] PM / core: Add SMART_SUSPEND driver flag
  2017-10-16  1:29 ` [PATCH 04/12] PM / core: Add SMART_SUSPEND driver flag Rafael J. Wysocki
  2017-10-23 19:01   ` Ulf Hansson
@ 2017-10-24  5:22   ` Ulf Hansson
  2017-10-24  8:55     ` Rafael J. Wysocki
  1 sibling, 1 reply; 135+ messages in thread
From: Ulf Hansson @ 2017-10-24  5:22 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On 16 October 2017 at 03:29, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Define and document a SMART_SUSPEND flag to instruct bus types and PM
> domains that the system suspend callbacks provided by the driver can
> cope with runtime-suspended devices, so from the driver's perspective
> it should be safe to leave devices in runtime suspend during system
> suspend.
>
> Setting that flag also causes the PM core to skip the "late" and
> "noirq" phases of device suspend for devices that remain in runtime
> suspend at the beginning of the "late" phase (when runtime PM has
> been disabled for them) under the assumption that their state cannot
> (and should not) change after that point until the system suspend
> transition is complete.  Moreover, the PM core prevents runtime PM
> from acting on devices with DPM_FLAG_SMART_SUSPEND during system
> resume by setting their runtime PM status to "active" at the end of
> the "early" phase (right prior to enabling runtime PM for them).
> That allows system resume callbacks to do whatever is necessary to
> resume the device without worrying about runtime PM possibly
> running in parallel with them.

After some sleep, I woke up and realized that the whole thing of making
the PM core skip invoking system sleep callbacks is not compatible
with devices being attached to genpd. Sorry.

The reason is that genpd may not power off the PM domain, even if
all devices attached to it are runtime suspended. For example, due to
a subdomain holding it powered or because a PM QoS constraint
prevents powering it off at runtime. Then, to understand whether it
shall power off/on the PM domain during system-wide PM, it requires
the system sleep callbacks to be invoked.

So, even if the driver can cope with the behavior from
DPM_FLAG_SMART_SUSPEND, what happens when the PM domain (genpd)
cannot?

Taking this into account, this feels like a solution entirely specific
to ACPI and PCI. That is fine by me, however then we still have those
cross-SoC drivers, like the i2c-designware driver, which may have its
device attached to an ACPI PM domain or perhaps a genpd.

>
> However, that doesn't apply to transitions involving ->thaw_noirq,
> ->thaw_early and ->thaw callbacks during hibernation, as they
> generally are not expected to change the power states of devices.
> Consequently, if a device is in runtime suspend at the beginning
> of such a transition, it must stay in runtime suspend until the
> "complete" phase of it (since the callbacks may not change its
> power state).
>

[...]

Kind regards
Uffe

* Re: [PATCH 04/12] PM / core: Add SMART_SUSPEND driver flag
  2017-10-24  5:22   ` Ulf Hansson
@ 2017-10-24  8:55     ` Rafael J. Wysocki
  0 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-24  8:55 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c,
	Lee Jones

On Tuesday, October 24, 2017 7:22:25 AM CEST Ulf Hansson wrote:
> On 16 October 2017 at 03:29, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Define and document a SMART_SUSPEND flag to instruct bus types and PM
> > domains that the system suspend callbacks provided by the driver can
> > cope with runtime-suspended devices, so from the driver's perspective
> > it should be safe to leave devices in runtime suspend during system
> > suspend.
> >
> > Setting that flag also causes the PM core to skip the "late" and
> > "noirq" phases of device suspend for devices that remain in runtime
> > suspend at the beginning of the "late" phase (when runtime PM has
> > been disabled for them) under the assumption that their state cannot
> > (and should not) change after that point until the system suspend
> > transition is complete.  Moreover, the PM core prevents runtime PM
> > from acting on devices with DPM_FLAG_SMART_SUSPEND during system
> > resume by setting their runtime PM status to "active" at the end of
> > the "early" phase (right prior to enabling runtime PM for them).
> > That allows system resume callbacks to do whatever is necessary to
> > resume the device without worrying about runtime PM possibly
> > running in parallel with them.
> 
> After some sleep, I woke up and realized that the whole thing of making
> the PM core skip invoking system sleep callbacks is not compatible
> with devices being attached to genpd. Sorry.

That's OK. :-)

It just means I need to move that logic up to the concerned middle layers.

I was going to do that to start with, but then I thought I would do it in
the core to avoid duplicated checks.  I overlooked the genpd case, however.

> The reason is that genpd may not power off the PM domain, even if
> all devices attached to it are runtime suspended. For example, due to
> a subdomain holding it powered or because a PM QoS constraint
> prevents powering it off at runtime. Then, to understand whether it
> shall power off/on the PM domain during system-wide PM, it requires
> the system sleep callbacks to be invoked.
> 
> So, even if the driver can cope with the behavior from
> DPM_FLAG_SMART_SUSPEND, what happens when the PM domain (genpd)
> cannot?
> 
> Taking this into account, this feels like a solution entirely specific
> to ACPI and PCI. That is fine by me, however then we still have those
> cross-SoC drivers, like the i2c-designware driver, which may have its
> device attached to an ACPI PM domain or perhaps a genpd.

Yes, that should be fine if the logic above goes to the PCI bus type
and ACPI PM domain.  Then, setting the flag will have no effect on
genpd at all, but that's OK.
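
For illustration only, moving the check into the middle layers could look
roughly like the userspace model below.  It is a sketch under the
assumption that each middle layer decides for itself whether to look at
the flag; all of the names in it are hypothetical and none of this is the
eventual kernel code.

```c
#include <stdbool.h>

struct model_dev {
	bool smart_suspend_flag;	/* DPM_FLAG_SMART_SUSPEND set by driver */
	bool runtime_suspended;		/* runtime PM status at the "late" phase */
	int late_calls;			/* times the driver callback has run */
};

static int model_driver_suspend_late(struct model_dev *dev)
{
	dev->late_calls++;
	return 0;
}

/* A middle layer that honors the flag (think PCI bus type or ACPI PM domain). */
static int model_flag_aware_suspend_late(struct model_dev *dev)
{
	if (dev->smart_suspend_flag && dev->runtime_suspended)
		return 0;	/* leave the device in runtime suspend */

	return model_driver_suspend_late(dev);
}

/* A middle layer that ignores the flag (think genpd). */
static int model_flag_unaware_suspend_late(struct model_dev *dev)
{
	return model_driver_suspend_late(dev);
}
```

With the check placed there, a driver can set the flag unconditionally
and still get its callbacks invoked under a middle layer that needs them.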

Thanks,
Rafael

* Re: [PATCH 11/12] PM: i2c-designware-platdrv: Optimize power management
  2017-10-16  1:31 ` [PATCH 11/12] PM: i2c-designware-platdrv: Optimize power management Rafael J. Wysocki
@ 2017-10-26 20:41   ` Wolfram Sang
  2017-10-26 21:14     ` Rafael J. Wysocki
  0 siblings, 1 reply; 135+ messages in thread
From: Wolfram Sang @ 2017-10-26 20:41 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Ulf Hansson, Andy Shevchenko, Kevin Hilman, linux-i2c, Lee Jones

On Mon, Oct 16, 2017 at 03:31:17AM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Optimize the power management in i2c-designware-platdrv by making it
> set the DPM_FLAG_SMART_SUSPEND and DPM_FLAG_LEAVE_SUSPENDED which
> allows some code to be dropped from its PM callbacks.
> 
> First, setting DPM_FLAG_SMART_SUSPEND causes the intel-lpss driver
> to avoid resuming i2c-designware-platdrv devices in its ->prepare
> callback, so they can stay in runtime suspend after that point even
> if the direct-complete feature is not used for them.
> 
> It also causes the PM core to avoid invoking "late" and "noirq"
> suspend callbacks for these devices if they are in runtime suspend
> at the beginning of the "late" phase of device suspend during
> system suspend.  That guarantees dw_i2c_plat_suspend() to be
> called for a device only if it is not in runtime suspend.
> Moreover, it also causes the PM core to set the device's runtime
> PM status to "active" after calling dw_i2c_plat_resume() for
> it, so the driver doesn't need internal flags to avoid invoking
> either dw_i2c_plat_suspend() or dw_i2c_plat_resume() twice in
> a row.
> 
> Second, setting DPM_FLAG_LEAVE_SUSPENDED enables the optimization
> allowing the device to stay suspended after system resume under
> suitable conditions, so again the driver doesn't need to take
> care of that by itself.
> 
> Accordingly, the internal "suspended" and "skip_resume" flags
> used by the driver are not necessary any more, so drop them and
> simplify the driver's PM callbacks.
> 
> Additionally, notice that dw_i2c_plat_complete() only needs
> to schedule runtime PM for the device if platform firmware
> has been involved in resuming the system, so make it call
> pm_resume_via_firmware() to check that.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

So, if the designware maintainers ack it, I will, too.




* Re: [PATCH 11/12] PM: i2c-designware-platdrv: Optimize power management
  2017-10-26 20:41   ` Wolfram Sang
@ 2017-10-26 21:14     ` Rafael J. Wysocki
  0 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-26 21:14 UTC (permalink / raw)
  To: Wolfram Sang
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Ulf Hansson, Andy Shevchenko, Kevin Hilman, linux-i2c, Lee Jones

On Thursday, October 26, 2017 10:41:40 PM CEST Wolfram Sang wrote:
> On Mon, Oct 16, 2017 at 03:31:17AM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > 
> > Optimize the power management in i2c-designware-platdrv by making it
> > set the DPM_FLAG_SMART_SUSPEND and DPM_FLAG_LEAVE_SUSPENDED which
> > allows some code to be dropped from its PM callbacks.
> > 
> > First, setting DPM_FLAG_SMART_SUSPEND causes the intel-lpss driver
> > to avoid resuming i2c-designware-platdrv devices in its ->prepare
> > callback, so they can stay in runtime suspend after that point even
> > if the direct-complete feature is not used for them.
> > 
> > It also causes the PM core to avoid invoking "late" and "noirq"
> > suspend callbacks for these devices if they are in runtime suspend
> > at the beginning of the "late" phase of device suspend during
> > system suspend.  That guarantees dw_i2c_plat_suspend() to be
> > called for a device only if it is not in runtime suspend.
> > Moreover, it also causes the PM core to set the device's runtime
> > PM status to "active" after calling dw_i2c_plat_resume() for
> > it, so the driver doesn't need internal flags to avoid invoking
> > either dw_i2c_plat_suspend() or dw_i2c_plat_resume() twice in
> > a row.
> > 
> > Second, setting DPM_FLAG_LEAVE_SUSPENDED enables the optimization
> > allowing the device to stay suspended after system resume under
> > suitable conditions, so again the driver doesn't need to take
> > care of that by itself.
> > 
> > Accordingly, the internal "suspended" and "skip_resume" flags
> > used by the driver are not necessary any more, so drop them and
> > simplify the driver's PM callbacks.
> > 
> > Additionally, notice that dw_i2c_plat_complete() only needs
> > to schedule runtime PM for the device if platform firmware
> > has been involved in resuming the system, so make it call
> > pm_resume_via_firmware() to check that.
> > 
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> So, if the designware maintainers ack it, I will, too.

Thanks!

I need to post a new revision of the core patches, so I'll send this one
again later.  Likely during the next cycle.

Thanks,
Rafael


* [PATCH v2 0/6] PM / sleep: Driver flags for system suspend/resume (part 1)
  2017-10-16  1:12 [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume Rafael J. Wysocki
                   ` (14 preceding siblings ...)
  2017-10-20 20:46 ` Bjorn Helgaas
@ 2017-10-27 22:11 ` Rafael J. Wysocki
  2017-10-27 22:17   ` [PATCH v2 1/6] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags Rafael J. Wysocki
                     ` (6 more replies)
  15 siblings, 7 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-27 22:11 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

Hi All,

The following part of the original cover letter still applies:

On Monday, October 16, 2017 3:12:35 AM CEST Rafael J. Wysocki wrote:
> 
> This work was triggered by attempts to fix and optimize PM in the
> i2c-designware-platdev driver that ended up with adding a couple of
> flags to the driver's internal data structures for the tracking of
> device state (https://marc.info/?l=linux-acpi&m=150629646805636&w=2).
> That approach is sort of suboptimal, though, because other drivers will
> probably want to do similar things and if all of them need to use internal
> flags for that, quite a bit of code duplication may ensue at least.
> 
> That can be avoided in a couple of ways and one of them is to provide a means
> for drivers to tell the core what to do and to make the core take care of it
> if told to do so.  Hence, the idea to use driver flags for system-wide PM
> that was briefly discussed during the LPC in LA last month.

[...]

> What can work (and this is the only strategy that can work AFAICS) is to
> point different callback pointers *in* *a* *driver* to the same routine
> if the driver wants to reuse that code.  That actually will work for PCI
> and USB drivers today, at least most of the time, but unfortunately there
> are problems with it for, say, platform devices.
> 
> The first problem is the requirement to track the status of the device
> (suspended vs not suspended) in the callbacks, because the system-wide PM
> code in the PM core doesn't do that.  The runtime PM framework does it, so
> this means adding some extra code which isn't necessary for runtime PM to
> the callback routines and that is not particularly nice.
> 
> The second problem is that, if the driver wants to do anything in its
> ->suspend callback, it generally has to prevent runtime suspend of the
> device from taking place in parallel with that, which is quite cumbersome.
> Usually, that is taken care of by resuming the device from runtime suspend
> upfront, but generally doing that is wasteful (there may be no real need to
> resume the device except for the fact that the code is designed this way).
> 
> On top of the above, there are optimizations to be made, like leaving certain
> devices in suspend after system resume to avoid wasting time on waiting for
> them to resume before user space can run again and similar.
> 
> This patch series focuses on addressing those problems so as to make it
> easier to reuse callback routines by pointing different callback pointers
> to them in device drivers.  The flags introduced here are to instruct the
> PM core and middle layers (whatever they are) on how the driver wants the
> device to be handled and then the driver has to provide callbacks to match
> these instructions and the rest should be taken care of by the code above it.
> 
> The flags are introduced one by one to avoid making too many changes in
> one go and to allow things to be explained better (hopefully).  They mostly
> are mutually independent with some clearly documented exceptions.

but I had to rework the core patches to address the problem with the
generic power domains (genpd) framework pointed out by Ulf.

Namely, genpd expects its "noirq" callbacks to be invoked for devices in
runtime suspend too and it has valid reasons for that, so its "noirq"
callbacks can never be skipped, even for devices with the SMART_SUSPEND
flag set.  For this reason, the logic related to DPM_FLAG_SMART_SUSPEND
had to be moved from the core to the PCI bus type and the ACPI PM domain
which are mostly affected by it anyway.  The code after the changes looks
more straightforward to me, but it generally is more code and some patterns
had to be repeated in a few places.
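
To make the resulting division of labor concrete, here is a rough userspace
sketch of the policy described above.  It is not the code from the PCI or
ACPI patches: model_layer and model_may_skip_noirq() are hypothetical names
made up for illustration, and the checks that the real middle layers apply
are reduced to a single "is the device in runtime suspend" boolean.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit as in the SMART_SUSPEND patch; the rest is a hypothetical model. */
#define DPM_FLAG_SMART_SUSPEND	(1u << 2)

/* Which middle layer handles the device (illustrative only). */
enum model_layer {
	MODEL_PCI_BUS,
	MODEL_ACPI_PM_DOMAIN,
	MODEL_GENPD,
};

/*
 * Decide whether the "late"/"noirq" suspend handling may be skipped for
 * a device.  genpd needs its "noirq" callbacks to run for devices in
 * runtime suspend too, so it never skips, whatever the driver flags say.
 */
bool model_may_skip_noirq(enum model_layer layer, uint32_t driver_flags,
			  bool runtime_suspended)
{
	if (layer == MODEL_GENPD)
		return false;

	return (driver_flags & DPM_FLAG_SMART_SUSPEND) && runtime_suspended;
}
```

With this model, a device under the PCI bus type with the flag set and in
runtime suspend is skipped, while the same device under genpd is not, which
is why the decision cannot live in the core.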

I also fixed a minor issue in the ACPI PM domain part of the first patch which
interpreted the lack of a driver ->prepare callback as an indication that the
driver refuses to participate in the direct-complete optimization if
DPM_FLAG_SMART_PREPARE is set for the given device (the flag should be ignored
in that case, but arguably setting it and failing to provide a ->prepare
callback would be sort of inconsistent anyway).

This series includes the core, PCI and ACPI PM domain part of the patches
introducing the NEVER_SKIP, SMART_PREPARE and SMART_SUSPEND flags plus one
extra PCI patch that hasn't changed from the previous iteration.  It is
based on the linux-next branch of the linux-pm.git tree that should be
included in linux-next.

I will send the core patches for the remaining two flags introduced by the
original series separately and the intel-lpss and i2c-designware ones will
be posted when the core patches have been reviewed and agreed on.

I have retained Greg's ACKs on everything (as discussed with Greg offline)
and Bjorn's ACKs on the majority of PCI changes (as they are essentially
the same as before) except for patch [5/6] which changed quite a bit from its
previous version (although it really implements the same behavior from the
PCI perspective).

Thanks,
Rafael


* [PATCH v2 1/6] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
  2017-10-27 22:11 ` [PATCH v2 0/6] PM / sleep: Driver flags for system suspend/resume (part 1) Rafael J. Wysocki
@ 2017-10-27 22:17   ` Rafael J. Wysocki
  2017-11-06  8:07     ` Ulf Hansson
  2017-10-27 22:19   ` [PATCH v2 2/6] PCI / PM: Use the NEVER_SKIP driver flag Rafael J. Wysocki
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-27 22:17 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The motivation for this change is to provide a way to work around
a problem with the direct-complete mechanism used for avoiding
system suspend/resume handling for devices in runtime suspend.

The problem is that some middle layer code (the PCI bus type and
the ACPI PM domain in particular) returns positive values from its
system suspend ->prepare callbacks regardless of whether the driver's
->prepare returns a positive value or 0, which effectively prevents
drivers from being able to control the direct-complete feature.
Some drivers need that control, however, and the PCI bus type has
grown its own flag to deal with this issue, but since it is not
limited to PCI, it is better to address it by adding driver flags at
the core level.

To that end, add a driver_flags field to struct dev_pm_info for flags
that can be set by device drivers at the probe time to inform the PM
core and/or bus types, PM domains and so on about the capabilities and/or
preferences of device drivers.  Also add two static inline helpers
for setting that field and testing it against a given set of flags
and make the driver core clear it automatically on driver remove
and probe failures.

Define and document two PM driver flags related to the direct-
complete feature: NEVER_SKIP and SMART_PREPARE that can be used,
respectively, to indicate to the PM core that the direct-complete
mechanism should never be used for the device and to inform the
middle layer code (bus types, PM domains etc) that it can only
request the PM core to use the direct-complete mechanism for
the device (by returning a positive value from its ->prepare
callback) if it also has been requested by the driver.

While at it, make the core check pm_runtime_suspended() when
setting power.direct_complete so that it doesn't need to be
checked by ->prepare callbacks.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---

-> v2: Do not use pm_generic_prepare() in acpi_subsys_prepare()
       as the latter has to distinguish between the lack of the
       driver's ->prepare callback and the situation in which that
       callback has returned 0 if DPM_FLAG_SMART_PREPARE is set.
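
       For reference, the intended decision flow can be modeled in plain
       userspace C roughly as below.  This is only a sketch: struct
       model_dev and the model_*() functions are hypothetical stand-ins,
       and the bus-specific checks done by pci_dev_keep_suspended() and
       acpi_subsys_prepare() are reduced to one boolean.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same bit values as in the patch; everything named model_* is made up. */
#define DPM_FLAG_NEVER_SKIP	(1u << 0)
#define DPM_FLAG_SMART_PREPARE	(1u << 1)

struct model_dev {
	uint32_t driver_flags;		/* models dev->power.driver_flags */
	bool runtime_suspended;		/* models pm_runtime_suspended(dev) */
	int (*drv_prepare)(struct model_dev *dev);	/* NULL if absent */
};

/* Two example driver ->prepare callbacks. */
int model_drv_prepare_zero(struct model_dev *dev)
{
	(void)dev;
	return 0;
}

int model_drv_prepare_positive(struct model_dev *dev)
{
	(void)dev;
	return 1;
}

/*
 * Middle-layer ->prepare.  A missing driver callback is not treated as
 * a refusal; only an explicit 0 from the driver with SMART_PREPARE set
 * makes the middle layer return 0.
 */
int model_subsys_prepare(struct model_dev *dev)
{
	if (dev->drv_prepare) {
		int ret = dev->drv_prepare(dev);

		if (ret < 0)
			return ret;

		if (!ret && (dev->driver_flags & DPM_FLAG_SMART_PREPARE))
			return 0;
	}

	/* Simplification: the real code applies more checks here. */
	return dev->runtime_suspended ? 1 : 0;
}

/* Core side: the condition assigned to power.direct_complete. */
bool model_direct_complete(const struct model_dev *dev, int prepare_ret,
			   bool suspend_event)
{
	return suspend_event && dev->runtime_suspended && prepare_ret > 0 &&
	       !(dev->driver_flags & DPM_FLAG_NEVER_SKIP);
}
```

       In this model, a device with SMART_PREPARE set and no driver
       ->prepare callback still gets a positive value from the middle
       layer, which is the case that v1 handled incorrectly.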

---
 Documentation/driver-api/pm/devices.rst |   14 ++++++++++++++
 Documentation/power/pci.txt             |   19 +++++++++++++++++++
 drivers/acpi/device_pm.c                |   13 +++++++++----
 drivers/base/dd.c                       |    2 ++
 drivers/base/power/main.c               |    4 +++-
 drivers/pci/pci-driver.c                |    5 ++++-
 include/linux/device.h                  |   10 ++++++++++
 include/linux/pm.h                      |   20 ++++++++++++++++++++
 8 files changed, 81 insertions(+), 6 deletions(-)

Index: linux-pm/include/linux/device.h
===================================================================
--- linux-pm.orig/include/linux/device.h
+++ linux-pm/include/linux/device.h
@@ -1070,6 +1070,16 @@ static inline void dev_pm_syscore_device
 #endif
 }
 
+static inline void dev_pm_set_driver_flags(struct device *dev, u32 flags)
+{
+	dev->power.driver_flags = flags;
+}
+
+static inline bool dev_pm_test_driver_flags(struct device *dev, u32 flags)
+{
+	return !!(dev->power.driver_flags & flags);
+}
+
 static inline void device_lock(struct device *dev)
 {
 	mutex_lock(&dev->mutex);
Index: linux-pm/include/linux/pm.h
===================================================================
--- linux-pm.orig/include/linux/pm.h
+++ linux-pm/include/linux/pm.h
@@ -550,6 +550,25 @@ struct pm_subsys_data {
 #endif
 };
 
+/*
+ * Driver flags to control system suspend/resume behavior.
+ *
+ * These flags can be set by device drivers at the probe time.  They need not be
+ * cleared by the drivers as the driver core will take care of that.
+ *
+ * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
+ * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
+ *
+ * Setting SMART_PREPARE instructs bus types and PM domains which may want
+ * system suspend/resume callbacks to be skipped for the device to return 0 from
+ * their ->prepare callbacks if the driver's ->prepare callback returns 0 (in
+ * other words, the system suspend/resume callbacks can only be skipped for the
+ * device if its driver doesn't object against that).  This flag has no effect
+ * if NEVER_SKIP is set.
+ */
+#define DPM_FLAG_NEVER_SKIP	BIT(0)
+#define DPM_FLAG_SMART_PREPARE	BIT(1)
+
 struct dev_pm_info {
 	pm_message_t		power_state;
 	unsigned int		can_wakeup:1;
@@ -561,6 +580,7 @@ struct dev_pm_info {
 	bool			is_late_suspended:1;
 	bool			early_init:1;	/* Owned by the PM core */
 	bool			direct_complete:1;	/* Owned by the PM core */
+	u32			driver_flags;
 	spinlock_t		lock;
 #ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
Index: linux-pm/drivers/base/dd.c
===================================================================
--- linux-pm.orig/drivers/base/dd.c
+++ linux-pm/drivers/base/dd.c
@@ -464,6 +464,7 @@ pinctrl_bind_failed:
 	if (dev->pm_domain && dev->pm_domain->dismiss)
 		dev->pm_domain->dismiss(dev);
 	pm_runtime_reinit(dev);
+	dev_pm_set_driver_flags(dev, 0);
 
 	switch (ret) {
 	case -EPROBE_DEFER:
@@ -869,6 +870,7 @@ static void __device_release_driver(stru
 		if (dev->pm_domain && dev->pm_domain->dismiss)
 			dev->pm_domain->dismiss(dev);
 		pm_runtime_reinit(dev);
+		dev_pm_set_driver_flags(dev, 0);
 
 		klist_remove(&dev->p->knode_driver);
 		device_pm_check_callbacks(dev);
Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -1700,7 +1700,9 @@ unlock:
 	 * applies to suspend transitions, however.
 	 */
 	spin_lock_irq(&dev->power.lock);
-	dev->power.direct_complete = ret > 0 && state.event == PM_EVENT_SUSPEND;
+	dev->power.direct_complete = state.event == PM_EVENT_SUSPEND &&
+		pm_runtime_suspended(dev) && ret > 0 &&
+		!dev_pm_test_driver_flags(dev, DPM_FLAG_NEVER_SKIP);
 	spin_unlock_irq(&dev->power.lock);
 	return 0;
 }
Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -682,8 +682,11 @@ static int pci_pm_prepare(struct device
 
 	if (drv && drv->pm && drv->pm->prepare) {
 		int error = drv->pm->prepare(dev);
-		if (error)
+		if (error < 0)
 			return error;
+
+		if (!error && dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_PREPARE))
+			return 0;
 	}
 	return pci_dev_keep_suspended(to_pci_dev(dev));
 }
Index: linux-pm/drivers/acpi/device_pm.c
===================================================================
--- linux-pm.orig/drivers/acpi/device_pm.c
+++ linux-pm/drivers/acpi/device_pm.c
@@ -959,11 +959,16 @@ static bool acpi_dev_needs_resume(struct
 int acpi_subsys_prepare(struct device *dev)
 {
 	struct acpi_device *adev = ACPI_COMPANION(dev);
-	int ret;
 
-	ret = pm_generic_prepare(dev);
-	if (ret < 0)
-		return ret;
+	if (dev->driver && dev->driver->pm && dev->driver->pm->prepare) {
+		int ret = dev->driver->pm->prepare(dev);
+
+		if (ret < 0)
+			return ret;
+
+		if (!ret && dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_PREPARE))
+			return 0;
+	}
 
 	if (!adev || !pm_runtime_suspended(dev))
 		return 0;
Index: linux-pm/Documentation/driver-api/pm/devices.rst
===================================================================
--- linux-pm.orig/Documentation/driver-api/pm/devices.rst
+++ linux-pm/Documentation/driver-api/pm/devices.rst
@@ -354,6 +354,20 @@ the phases are: ``prepare``, ``suspend``
 	is because all such devices are initially set to runtime-suspended with
 	runtime PM disabled.
 
+	This feature also can be controlled by device drivers by using the
+	``DPM_FLAG_NEVER_SKIP`` and ``DPM_FLAG_SMART_PREPARE`` driver power
+	management flags.  [Typically, they are set at the time the driver is
+	probed against the device in question by passing them to the
+	:c:func:`dev_pm_set_driver_flags` helper function.]  If the first of
+	these flags is set, the PM core will not apply the direct-complete
+	procedure described above to the given device and, consequently, to any
+	of its ancestors.  The second flag, when set, informs the middle layer
+	code (bus types, device types, PM domains, classes) that it should take
+	the return value of the ``->prepare`` callback provided by the driver
+	into account and it may only return a positive value from its own
+	``->prepare`` callback if the driver's one also has returned a positive
+	value.
+
     2.	The ``->suspend`` methods should quiesce the device to stop it from
 	performing I/O.  They also may save the device registers and put it into
 	the appropriate low-power state, depending on the bus type the device is
Index: linux-pm/Documentation/power/pci.txt
===================================================================
--- linux-pm.orig/Documentation/power/pci.txt
+++ linux-pm/Documentation/power/pci.txt
@@ -961,6 +961,25 @@ dev_pm_ops to indicate that one suspend
 .suspend(), .freeze(), and .poweroff() members and one resume routine is to
 be pointed to by the .resume(), .thaw(), and .restore() members.
 
+3.1.19. Driver Flags for Power Management
+
+The PM core allows device drivers to set flags that influence the handling of
+power management for the devices by the core itself and by middle layer code
+including the PCI bus type.  The flags should be set once at the driver probe
+time with the help of the dev_pm_set_driver_flags() function and they should not
+be updated directly afterwards.
+
+The DPM_FLAG_NEVER_SKIP flag prevents the PM core from using the direct-complete
+mechanism allowing device suspend/resume callbacks to be skipped if the device
+is in runtime suspend when the system suspend starts.  That also affects all of
+the ancestors of the device, so this flag should only be used if absolutely
+necessary.
+
+The DPM_FLAG_SMART_PREPARE flag instructs the PCI bus type to only return a
+positive value from pci_pm_prepare() if the ->prepare callback provided by the
+driver of the device returns a positive value.  That allows the driver to opt
+out from using the direct-complete mechanism dynamically.
+
 3.2. Device Runtime Power Management
 ------------------------------------
 In addition to providing device power management callbacks PCI device drivers


* [PATCH v2 2/6] PCI / PM: Use the NEVER_SKIP driver flag
  2017-10-27 22:11 ` [PATCH v2 0/6] PM / sleep: Driver flags for system suspend/resume (part 1) Rafael J. Wysocki
  2017-10-27 22:17   ` [PATCH v2 1/6] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags Rafael J. Wysocki
@ 2017-10-27 22:19   ` Rafael J. Wysocki
  2017-10-27 22:22   ` [PATCH v2 3/6] PM / core: Add SMART_SUSPEND " Rafael J. Wysocki
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-27 22:19 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Replace the PCI-specific flag PCI_DEV_FLAGS_NEEDS_RESUME with the
PM core's DPM_FLAG_NEVER_SKIP one everywhere and drop it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
---

-> v2: No changes.

---
 drivers/gpu/drm/i915/i915_drv.c |    2 +-
 drivers/misc/mei/pci-me.c       |    2 +-
 drivers/misc/mei/pci-txe.c      |    2 +-
 drivers/pci/pci.c               |    3 +--
 include/linux/pci.h             |    7 +------
 5 files changed, 5 insertions(+), 11 deletions(-)

Index: linux-pm/include/linux/pci.h
===================================================================
--- linux-pm.orig/include/linux/pci.h
+++ linux-pm/include/linux/pci.h
@@ -205,13 +205,8 @@ enum pci_dev_flags {
 	PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT = (__force pci_dev_flags_t) (1 << 9),
 	/* Do not use FLR even if device advertises PCI_AF_CAP */
 	PCI_DEV_FLAGS_NO_FLR_RESET = (__force pci_dev_flags_t) (1 << 10),
-	/*
-	 * Resume before calling the driver's system suspend hooks, disabling
-	 * the direct_complete optimization.
-	 */
-	PCI_DEV_FLAGS_NEEDS_RESUME = (__force pci_dev_flags_t) (1 << 11),
 	/* Don't use Relaxed Ordering for TLPs directed at this device */
-	PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 12),
+	PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 11),
 };
 
 enum pci_irq_reroute_variant {
Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -2166,8 +2166,7 @@ bool pci_dev_keep_suspended(struct pci_d
 
 	if (!pm_runtime_suspended(dev)
 	    || pci_target_state(pci_dev, wakeup) != pci_dev->current_state
-	    || platform_pci_need_resume(pci_dev)
-	    || (pci_dev->dev_flags & PCI_DEV_FLAGS_NEEDS_RESUME))
+	    || platform_pci_need_resume(pci_dev))
 		return false;
 
 	/*
Index: linux-pm/drivers/gpu/drm/i915/i915_drv.c
===================================================================
--- linux-pm.orig/drivers/gpu/drm/i915/i915_drv.c
+++ linux-pm/drivers/gpu/drm/i915/i915_drv.c
@@ -1304,7 +1304,7 @@ int i915_driver_load(struct pci_dev *pde
 	 * becaue the HDA driver may require us to enable the audio power
 	 * domain during system suspend.
 	 */
-	pdev->dev_flags |= PCI_DEV_FLAGS_NEEDS_RESUME;
+	dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NEVER_SKIP);
 
 	ret = i915_driver_init_early(dev_priv, ent);
 	if (ret < 0)
Index: linux-pm/drivers/misc/mei/pci-txe.c
===================================================================
--- linux-pm.orig/drivers/misc/mei/pci-txe.c
+++ linux-pm/drivers/misc/mei/pci-txe.c
@@ -141,7 +141,7 @@ static int mei_txe_probe(struct pci_dev
 	 * MEI requires to resume from runtime suspend mode
 	 * in order to perform link reset flow upon system suspend.
 	 */
-	pdev->dev_flags |= PCI_DEV_FLAGS_NEEDS_RESUME;
+	dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NEVER_SKIP);
 
 	/*
 	* For not wake-able HW runtime pm framework
Index: linux-pm/drivers/misc/mei/pci-me.c
===================================================================
--- linux-pm.orig/drivers/misc/mei/pci-me.c
+++ linux-pm/drivers/misc/mei/pci-me.c
@@ -223,7 +223,7 @@ static int mei_me_probe(struct pci_dev *
 	 * MEI requires to resume from runtime suspend mode
 	 * in order to perform link reset flow upon system suspend.
 	 */
-	pdev->dev_flags |= PCI_DEV_FLAGS_NEEDS_RESUME;
+	dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NEVER_SKIP);
 
 	/*
 	* For not wake-able HW runtime pm framework


* [PATCH v2 3/6] PM / core: Add SMART_SUSPEND driver flag
  2017-10-27 22:11 ` [PATCH v2 0/6] PM / sleep: Driver flags for system suspend/resume (part 1) Rafael J. Wysocki
  2017-10-27 22:17   ` [PATCH v2 1/6] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags Rafael J. Wysocki
  2017-10-27 22:19   ` [PATCH v2 2/6] PCI / PM: Use the NEVER_SKIP driver flag Rafael J. Wysocki
@ 2017-10-27 22:22   ` Rafael J. Wysocki
  2017-11-06  8:09     ` Ulf Hansson
  2017-10-27 22:23   ` [PATCH v2 4/6] PCI / PM: Drop unnecessary invocations of pcibios_pm_ops callbacks Rafael J. Wysocki
                     ` (3 subsequent siblings)
  6 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-27 22:22 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Define and document a SMART_SUSPEND flag to instruct bus types and PM
domains that the system suspend callbacks provided by the driver can
cope with runtime-suspended devices, so from the driver's perspective
it should be safe to leave devices in runtime suspend during system
suspend.

Setting that flag may also cause middle-layer code (bus types,
PM domains etc.) to skip invocations of the ->suspend_late and
->suspend_noirq callbacks provided by the driver if the device
is in runtime suspend at the beginning of the "late" phase of
the system-wide suspend transition, in which case the driver's
system-wide resume callbacks may be invoked back-to-back with
its ->runtime_suspend callback, so the driver has to be able to
cope with that too.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---

-> v2: Drop the changes in main.c, as the logic implemented by them
       previously is now going to be implemented in the PCI bus type
       and the ACPI PM domain directly.
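
       What is left in main.c is only the sanity check in
       device_prepare().  A minimal userspace model of that check and
       of the flag helpers from patch [1/6] might look like the sketch
       below; struct model_pm_info and the model_*() names are
       hypothetical, and the kernel's WARN_ON() is replaced by a
       function that just reports whether the warning would trigger.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit values as in pm.h after this patch. */
#define DPM_FLAG_NEVER_SKIP	(1u << 0)
#define DPM_FLAG_SMART_PREPARE	(1u << 1)
#define DPM_FLAG_SMART_SUSPEND	(1u << 2)

/* Hypothetical stand-in for the relevant part of struct dev_pm_info. */
struct model_pm_info {
	uint32_t driver_flags;
	bool runtime_pm_enabled;	/* models pm_runtime_enabled(dev) */
};

/* Like dev_pm_set_driver_flags(): assigns, so old flags are replaced. */
void model_set_driver_flags(struct model_pm_info *pm, uint32_t flags)
{
	pm->driver_flags = flags;
}

/* Like dev_pm_test_driver_flags(): true if any of the given bits is set. */
bool model_test_driver_flags(const struct model_pm_info *pm, uint32_t flags)
{
	return !!(pm->driver_flags & flags);
}

/*
 * Condition checked in device_prepare(): SMART_SUSPEND only makes sense
 * for devices with runtime PM enabled, so warn otherwise.
 */
bool model_prepare_would_warn(const struct model_pm_info *pm)
{
	return model_test_driver_flags(pm, DPM_FLAG_SMART_SUSPEND) &&
	       !pm->runtime_pm_enabled;
}
```

       Note that, in this model as in the helper itself, setting the
       flags a second time overwrites the earlier value, so a driver
       wanting several flags has to pass them ORed together in one call.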

---
 Documentation/driver-api/pm/devices.rst |   20 ++++++++++++++++++++
 drivers/base/power/main.c               |    3 +++
 include/linux/pm.h                      |    8 ++++++++
 3 files changed, 31 insertions(+)

Index: linux-pm/Documentation/driver-api/pm/devices.rst
===================================================================
--- linux-pm.orig/Documentation/driver-api/pm/devices.rst
+++ linux-pm/Documentation/driver-api/pm/devices.rst
@@ -766,6 +766,26 @@ the state of devices (possibly except fo
 from their ``->prepare`` and ``->suspend`` callbacks (or equivalent) *before*
 invoking device drivers' ``->suspend`` callbacks (or equivalent).
 
+Some bus types and PM domains have a policy to resume all devices from runtime
+suspend upfront in their ``->suspend`` callbacks, but that may not be really
+necessary if the driver of the device can cope with runtime-suspended devices.
+The driver can indicate that by setting ``DPM_FLAG_SMART_SUSPEND`` in
+:c:member:`power.driver_flags` at the probe time, by passing it to the
+:c:func:`dev_pm_set_driver_flags` helper.  That also may cause middle-layer code
+(bus types, PM domains etc.) to skip the ``->suspend_late`` and
+``->suspend_noirq`` callbacks provided by the driver if the device remains in
+runtime suspend at the beginning of the ``suspend_late`` phase of system-wide
+suspend (or in the ``poweroff_late`` phase of hibernation), when runtime PM
+has been disabled for it, under the assumption that its state should not change
+after that point until the system-wide transition is over.  If that happens, the
+driver's system-wide resume callbacks, if present, may still be invoked during
+the subsequent system-wide resume transition and the device's runtime power
+management status may be set to "active" before enabling runtime PM for it,
+so the driver must be prepared to cope with the invocation of its system-wide
+resume callbacks back-to-back with its ``->runtime_suspend`` one (without the
+intervening ``->runtime_resume`` and so on) and the final state of the device
+must reflect the "active" status for runtime PM in that case.
+
 During system-wide resume from a sleep state it's easiest to put devices into
 the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`.
 Refer to that document for more information regarding this particular issue as
Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -1652,6 +1652,9 @@ static int device_prepare(struct device
 	if (dev->power.syscore)
 		return 0;
 
+	WARN_ON(dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
+		!pm_runtime_enabled(dev));
+
 	/*
 	 * If a device's parent goes into runtime suspend at the wrong time,
 	 * it won't be possible to resume the device.  To prevent this we
Index: linux-pm/include/linux/pm.h
===================================================================
--- linux-pm.orig/include/linux/pm.h
+++ linux-pm/include/linux/pm.h
@@ -558,6 +558,7 @@ struct pm_subsys_data {
  *
  * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
  * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
+ * SMART_SUSPEND: No need to resume the device from runtime suspend.
  *
  * Setting SMART_PREPARE instructs bus types and PM domains which may want
  * system suspend/resume callbacks to be skipped for the device to return 0 from
@@ -565,9 +566,16 @@ struct pm_subsys_data {
  * other words, the system suspend/resume callbacks can only be skipped for the
  * device if its driver doesn't object against that).  This flag has no effect
  * if NEVER_SKIP is set.
+ *
+ * Setting SMART_SUSPEND instructs bus types and PM domains which may want to
+ * runtime resume the device upfront during system suspend that doing so is not
+ * necessary from the driver's perspective.  It also may cause them to skip
+ * invocations of the ->suspend_late and ->suspend_noirq callbacks provided by
+ * the driver if they decide to leave the device in runtime suspend.
  */
 #define DPM_FLAG_NEVER_SKIP	BIT(0)
 #define DPM_FLAG_SMART_PREPARE	BIT(1)
+#define DPM_FLAG_SMART_SUSPEND	BIT(2)
 
 struct dev_pm_info {
 	pm_message_t		power_state;


* [PATCH v2 4/6] PCI / PM: Drop unnecessary invocations of pcibios_pm_ops callbacks
  2017-10-27 22:11 ` [PATCH v2 0/6] PM / sleep: Driver flags for system suspend/resume (part 1) Rafael J. Wysocki
                     ` (2 preceding siblings ...)
  2017-10-27 22:22   ` [PATCH v2 3/6] PM / core: Add SMART_SUSPEND " Rafael J. Wysocki
@ 2017-10-27 22:23   ` Rafael J. Wysocki
  2017-10-27 22:27   ` [PATCH v2 5/6] PCI / PM: Take SMART_SUSPEND driver flag into account Rafael J. Wysocki
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-27 22:23 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The only user of non-empty pcibios_pm_ops is s390 and it only uses
"noirq" callbacks, so drop the invocations of the other pcibios_pm_ops
callbacks from the PCI PM code.

That will allow subsequent changes to be somewhat simpler.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
---

-> v2: No changes.
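
The argument in the changelog can be illustrated with a small userspace
sketch: a hook that is never populated makes its guarded call site a no-op,
so removing the call does not change behavior.  All names below (struct
model_pm_ops, model_pcibios_pm_ops, model_freeze_before() and so on) are
hypothetical and only mimic the shape of the kernel code.

```c
#include <assert.h>
#include <stddef.h>

struct model_dev { int id; };

/* Hypothetical, reduced stand-in for struct dev_pm_ops: two hooks only. */
struct model_pm_ops {
	int (*freeze)(struct model_dev *dev);
	int (*freeze_noirq)(struct model_dev *dev);
};

static int model_arch_freeze_noirq(struct model_dev *dev)
{
	(void)dev;
	return 0;
}

/* Only a "noirq" hook is set, as the changelog states for s390. */
struct model_pm_ops model_pcibios_pm_ops = {
	.freeze_noirq = model_arch_freeze_noirq,
};

/* Before: tail of a pci_pm_freeze()-like function with the optional call. */
int model_freeze_before(struct model_dev *dev)
{
	assert(dev != NULL);
	if (model_pcibios_pm_ops.freeze)
		return model_pcibios_pm_ops.freeze(dev);
	return 0;
}

/* After: the branch that was never taken is gone. */
int model_freeze_after(struct model_dev *dev)
{
	assert(dev != NULL);
	return 0;
}
```

Both variants return the same value as long as no platform sets the
non-"noirq" hook, which is the premise the patch relies on.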

---
 drivers/pci/pci-driver.c |   18 ------------------
 1 file changed, 18 deletions(-)

Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -922,9 +922,6 @@ static int pci_pm_freeze(struct device *
 			return error;
 	}
 
-	if (pcibios_pm_ops.freeze)
-		return pcibios_pm_ops.freeze(dev);
-
 	return 0;
 }
 
@@ -986,12 +983,6 @@ static int pci_pm_thaw(struct device *de
 	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 	int error = 0;
 
-	if (pcibios_pm_ops.thaw) {
-		error = pcibios_pm_ops.thaw(dev);
-		if (error)
-			return error;
-	}
-
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_resume(dev);
 
@@ -1036,9 +1027,6 @@ static int pci_pm_poweroff(struct device
  Fixup:
 	pci_fixup_device(pci_fixup_suspend, pci_dev);
 
-	if (pcibios_pm_ops.poweroff)
-		return pcibios_pm_ops.poweroff(dev);
-
 	return 0;
 }
 
@@ -1111,12 +1099,6 @@ static int pci_pm_restore(struct device
 	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 	int error = 0;
 
-	if (pcibios_pm_ops.restore) {
-		error = pcibios_pm_ops.restore(dev);
-		if (error)
-			return error;
-	}
-
 	/*
 	 * This is necessary for the hibernation error path in which restore is
 	 * called without restoring the standard config registers of the device.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* [PATCH v2 5/6] PCI / PM: Take SMART_SUSPEND driver flag into account
  2017-10-27 22:11 ` [PATCH v2 0/6] PM / sleep: Driver flags for system suspend/resume (part 1) Rafael J. Wysocki
                     ` (3 preceding siblings ...)
  2017-10-27 22:23   ` [PATCH v2 4/6] PCI / PM: Drop unnecessary invocations of pcibios_pm_ops callbacks Rafael J. Wysocki
@ 2017-10-27 22:27   ` Rafael J. Wysocki
  2017-10-31 22:48     ` Bjorn Helgaas
  2017-10-27 22:30   ` [PATCH v2 6/6] ACPI " Rafael J. Wysocki
  2017-11-08  0:41   ` [PATCH v2 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
  6 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-27 22:27 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Make the PCI bus type take DPM_FLAG_SMART_SUSPEND into account in its
system-wide PM callbacks and make sure that all code that should not
run in parallel with pci_pm_runtime_resume() is executed in the "late"
phases of system suspend, freeze and poweroff transitions.

[Note that the pm_runtime_suspended() check in pci_dev_keep_suspended()
is an optimization, because if it is not passed, all of the subsequent
checks may be skipped and some of them are considerably more expensive
in general.]

Also use the observation that if the device is in runtime suspend
at the beginning of the "late" phase of a system-wide suspend-like
transition, its state cannot change going forward (runtime PM is
disabled for it at that time) until the transition is over and the
subsequent system-wide PM callbacks should be skipped for it (as
they generally assume the device to not be suspended), so add checks
for that in pci_pm_suspend_late/noirq(), pci_pm_freeze_late/noirq()
and pci_pm_poweroff_late/noirq().

Moreover, if pci_pm_resume_noirq() or pci_pm_restore_noirq() is
called during the subsequent system-wide resume transition and if
the device was left in runtime suspend previously, its runtime PM
status needs to be changed to "active" as it is going to be put
into the full-power state, so add checks for that too to these
functions.

In turn, if pci_pm_thaw_noirq() runs after the device has been
left in runtime suspend, the subsequent "thaw" callbacks need
to be skipped for it (as they may not work correctly with a
suspended device), so set the power.direct_complete flag for the
device then to make the PM core skip those callbacks.

In addition to the above, add a core helper for checking if
DPM_FLAG_SMART_SUSPEND is set and the device's runtime PM status is
"suspended" at the same time, which is done quite often in the new
code (and will be done elsewhere going forward too).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---

-> v2: Implement the entire handling of DPM_FLAG_SMART_SUSPEND in
       the PCI bus type (instead of doing that in the core).
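
The decision logic described in the changelog can be exercised with the
userspace sketch below.  It is a simplified model under stated assumptions,
not the kernel code: struct model_dev and every model_*() function are
hypothetical, pci_dev_keep_suspended() is reduced to a boolean field, and
pm_runtime_resume() / pm_runtime_set_active() are modeled as plain status
assignments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DPM_FLAG_SMART_SUSPEND	(1UL << 2)	/* BIT(2), as in this series */

enum model_rpm_status { MODEL_RPM_ACTIVE, MODEL_RPM_SUSPENDED };

/* Hypothetical, reduced stand-in for struct device plus its PM state. */
struct model_dev {
	unsigned long driver_flags;
	enum model_rpm_status rpm_status;
	bool keep_suspended;	/* stands in for pci_dev_keep_suspended() */
	bool direct_complete;	/* stands in for dev->power.direct_complete */
	int driver_late_calls;	/* counts simulated driver "late" callbacks */
};

/* Same condition as dev_pm_smart_suspend_and_suspended() in the patch. */
bool model_smart_suspend_and_suspended(const struct model_dev *dev)
{
	assert(dev != NULL);
	return (dev->driver_flags & DPM_FLAG_SMART_SUSPEND) &&
	       dev->rpm_status == MODEL_RPM_SUSPENDED;
}

/* ->suspend(): resume unless the flag is set and the bus agrees. */
void model_pci_pm_suspend(struct model_dev *dev)
{
	if (!(dev->driver_flags & DPM_FLAG_SMART_SUSPEND) ||
	    !dev->keep_suspended)
		dev->rpm_status = MODEL_RPM_ACTIVE; /* pm_runtime_resume() */
}

/* ->suspend_late(): skipped for a device left in runtime suspend. */
int model_pci_pm_suspend_late(struct model_dev *dev)
{
	if (model_smart_suspend_and_suspended(dev))
		return 0;

	dev->driver_late_calls++;	/* pm_generic_suspend_late() */
	return 0;
}

/* ->resume_noirq(): the device is about to enter D0, so mark it active. */
void model_pci_pm_resume_noirq(struct model_dev *dev)
{
	if (model_smart_suspend_and_suspended(dev))
		dev->rpm_status = MODEL_RPM_ACTIVE; /* pm_runtime_set_active() */
}

/* ->thaw_noirq(): make the core skip the remaining "thaw" callbacks. */
void model_pci_pm_thaw_noirq(struct model_dev *dev)
{
	if (model_smart_suspend_and_suspended(dev))
		dev->direct_complete = true;
}
```

In this model a device with the flag set that starts out runtime-suspended
goes through ->suspend() and ->suspend_late() untouched and is marked active
only in ->resume_noirq(), while a device without the flag is resumed in
->suspend() and gets its driver's "late" callback as before.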

---
 Documentation/power/pci.txt |   14 +++++
 drivers/base/power/main.c   |    6 ++
 drivers/pci/pci-driver.c    |  103 ++++++++++++++++++++++++++++++++++++--------
 include/linux/pm.h          |    2 
 4 files changed, 108 insertions(+), 17 deletions(-)

Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -734,18 +734,25 @@ static int pci_pm_suspend(struct device
 
 	if (!pm) {
 		pci_pm_default_suspend(pci_dev);
-		goto Fixup;
+		return 0;
 	}
 
 	/*
-	 * PCI devices suspended at run time need to be resumed at this point,
-	 * because in general it is necessary to reconfigure them for system
-	 * suspend.  Namely, if the device is supposed to wake up the system
-	 * from the sleep state, we may need to reconfigure it for this purpose.
-	 * In turn, if the device is not supposed to wake up the system from the
-	 * sleep state, we'll have to prevent it from signaling wake-up.
+	 * PCI devices suspended at run time may need to be resumed at this
+	 * point, because in general it may be necessary to reconfigure them for
+	 * system suspend.  Namely, if the device is expected to wake up the
+	 * system from the sleep state, it may have to be reconfigured for this
+	 * purpose, or if the device is not expected to wake up the system from
+	 * the sleep state, it should be prevented from signaling wakeup events
+	 * going forward.
+	 *
+	 * Also if the driver of the device does not indicate that its system
+	 * suspend callbacks can cope with runtime-suspended devices, it is
+	 * better to resume the device from runtime suspend here.
 	 */
-	pm_runtime_resume(dev);
+	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) ||
+	    !pci_dev_keep_suspended(pci_dev))
+		pm_runtime_resume(dev);
 
 	pci_dev->state_saved = false;
 	if (pm->suspend) {
@@ -765,17 +772,27 @@ static int pci_pm_suspend(struct device
 		}
 	}
 
- Fixup:
-	pci_fixup_device(pci_fixup_suspend, pci_dev);
-
 	return 0;
 }
 
+static int pci_pm_suspend_late(struct device *dev)
+{
+	if (dev_pm_smart_suspend_and_suspended(dev))
+		return 0;
+
+	pci_fixup_device(pci_fixup_suspend, to_pci_dev(dev));
+
+	return pm_generic_suspend_late(dev);
+}
+
 static int pci_pm_suspend_noirq(struct device *dev)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 
+	if (dev_pm_smart_suspend_and_suspended(dev))
+		return 0;
+
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend_late(dev, PMSG_SUSPEND);
 
@@ -834,6 +851,14 @@ static int pci_pm_resume_noirq(struct de
 	struct device_driver *drv = dev->driver;
 	int error = 0;
 
+	/*
+	 * Devices with DPM_FLAG_SMART_SUSPEND may be left in runtime suspend
+	 * during system suspend, so update their runtime PM status to "active"
+	 * as they are going to be put into D0 shortly.
+	 */
+	if (dev_pm_smart_suspend_and_suspended(dev))
+		pm_runtime_set_active(dev);
+
 	pci_pm_default_resume_early(pci_dev);
 
 	if (pci_has_legacy_pm_support(pci_dev))
@@ -876,6 +901,7 @@ static int pci_pm_resume(struct device *
 #else /* !CONFIG_SUSPEND */
 
 #define pci_pm_suspend		NULL
+#define pci_pm_suspend_late	NULL
 #define pci_pm_suspend_noirq	NULL
 #define pci_pm_resume		NULL
 #define pci_pm_resume_noirq	NULL
@@ -910,7 +936,8 @@ static int pci_pm_freeze(struct device *
 	 * devices should not be touched during freeze/thaw transitions,
 	 * however.
 	 */
-	pm_runtime_resume(dev);
+	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND))
+		pm_runtime_resume(dev);
 
 	pci_dev->state_saved = false;
 	if (pm->freeze) {
@@ -925,11 +952,22 @@ static int pci_pm_freeze(struct device *
 	return 0;
 }
 
+static int pci_pm_freeze_late(struct device *dev)
+{
+	if (dev_pm_smart_suspend_and_suspended(dev))
+		return 0;
+
+	return pm_generic_freeze_late(dev);
+}
+
 static int pci_pm_freeze_noirq(struct device *dev)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	struct device_driver *drv = dev->driver;
 
+	if (dev_pm_smart_suspend_and_suspended(dev))
+		return 0;
+
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend_late(dev, PMSG_FREEZE);
 
@@ -959,6 +997,16 @@ static int pci_pm_thaw_noirq(struct devi
 	struct device_driver *drv = dev->driver;
 	int error = 0;
 
+	/*
+	 * If the device is in runtime suspend, the code below may not work
+	 * correctly with it, so skip that code and make the PM core skip all of
+	 * the subsequent "thaw" callbacks for the device.
+	 */
+	if (dev_pm_smart_suspend_and_suspended(dev)) {
+		dev->power.direct_complete = true;
+		return 0;
+	}
+
 	if (pcibios_pm_ops.thaw_noirq) {
 		error = pcibios_pm_ops.thaw_noirq(dev);
 		if (error)
@@ -1008,11 +1056,13 @@ static int pci_pm_poweroff(struct device
 
 	if (!pm) {
 		pci_pm_default_suspend(pci_dev);
-		goto Fixup;
+		return 0;
 	}
 
 	/* The reason to do that is the same as in pci_pm_suspend(). */
-	pm_runtime_resume(dev);
+	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) ||
+	    !pci_dev_keep_suspended(pci_dev))
+		pm_runtime_resume(dev);
 
 	pci_dev->state_saved = false;
 	if (pm->poweroff) {
@@ -1024,17 +1074,27 @@ static int pci_pm_poweroff(struct device
 			return error;
 	}
 
- Fixup:
-	pci_fixup_device(pci_fixup_suspend, pci_dev);
-
 	return 0;
 }
 
+static int pci_pm_poweroff_late(struct device *dev)
+{
+	if (dev_pm_smart_suspend_and_suspended(dev))
+		return 0;
+
+	pci_fixup_device(pci_fixup_suspend, to_pci_dev(dev));
+
+	return pm_generic_poweroff_late(dev);
+}
+
 static int pci_pm_poweroff_noirq(struct device *dev)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	struct device_driver *drv = dev->driver;
 
+	if (dev_pm_smart_suspend_and_suspended(dev))
+		return 0;
+
 	if (pci_has_legacy_pm_support(to_pci_dev(dev)))
 		return pci_legacy_suspend_late(dev, PMSG_HIBERNATE);
 
@@ -1076,6 +1136,10 @@ static int pci_pm_restore_noirq(struct d
 	struct device_driver *drv = dev->driver;
 	int error = 0;
 
+	/* This is analogous to the pci_pm_resume_noirq() case. */
+	if (dev_pm_smart_suspend_and_suspended(dev))
+		pm_runtime_set_active(dev);
+
 	if (pcibios_pm_ops.restore_noirq) {
 		error = pcibios_pm_ops.restore_noirq(dev);
 		if (error)
@@ -1124,10 +1188,12 @@ static int pci_pm_restore(struct device
 #else /* !CONFIG_HIBERNATE_CALLBACKS */
 
 #define pci_pm_freeze		NULL
+#define pci_pm_freeze_late	NULL
 #define pci_pm_freeze_noirq	NULL
 #define pci_pm_thaw		NULL
 #define pci_pm_thaw_noirq	NULL
 #define pci_pm_poweroff		NULL
+#define pci_pm_poweroff_late	NULL
 #define pci_pm_poweroff_noirq	NULL
 #define pci_pm_restore		NULL
 #define pci_pm_restore_noirq	NULL
@@ -1243,10 +1309,13 @@ static const struct dev_pm_ops pci_dev_p
 	.prepare = pci_pm_prepare,
 	.complete = pci_pm_complete,
 	.suspend = pci_pm_suspend,
+	.suspend_late = pci_pm_suspend_late,
 	.resume = pci_pm_resume,
 	.freeze = pci_pm_freeze,
+	.freeze_late = pci_pm_freeze_late,
 	.thaw = pci_pm_thaw,
 	.poweroff = pci_pm_poweroff,
+	.poweroff_late = pci_pm_poweroff_late,
 	.restore = pci_pm_restore,
 	.suspend_noirq = pci_pm_suspend_noirq,
 	.resume_noirq = pci_pm_resume_noirq,
Index: linux-pm/Documentation/power/pci.txt
===================================================================
--- linux-pm.orig/Documentation/power/pci.txt
+++ linux-pm/Documentation/power/pci.txt
@@ -980,6 +980,20 @@ positive value from pci_pm_prepare() if
 driver of the device returns a positive value.  That allows the driver to opt
 out from using the direct-complete mechanism dynamically.
 
+The DPM_FLAG_SMART_SUSPEND flag tells the PCI bus type that from the driver's
+perspective the device can be safely left in runtime suspend during system
+suspend.  That causes pci_pm_suspend(), pci_pm_freeze() and pci_pm_poweroff()
+to skip resuming the device from runtime suspend unless there are PCI-specific
+reasons for doing that.  Also, it causes pci_pm_suspend_late/noirq(),
+pci_pm_freeze_late/noirq() and pci_pm_poweroff_late/noirq() to return early
+if the device remains in runtime suspend at the beginning of the "late" phase
+of the system-wide transition under way.  Moreover, if the device is in
+runtime suspend in pci_pm_resume_noirq() or pci_pm_restore_noirq(), its runtime
+power management status will be changed to "active" (as it is going to be put
+into D0 going forward), but if it is in runtime suspend in pci_pm_thaw_noirq(),
+the function will set the power.direct_complete flag for it (to make the PM core
+skip the subsequent "thaw" callbacks for it) and return.
+
 3.2. Device Runtime Power Management
 ------------------------------------
 In addition to providing device power management callbacks PCI device drivers
Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -1861,3 +1861,9 @@ void device_pm_check_callbacks(struct de
 		 !dev->driver->suspend && !dev->driver->resume));
 	spin_unlock_irq(&dev->power.lock);
 }
+
+bool dev_pm_smart_suspend_and_suspended(struct device *dev)
+{
+	return dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
+		pm_runtime_status_suspended(dev);
+}
Index: linux-pm/include/linux/pm.h
===================================================================
--- linux-pm.orig/include/linux/pm.h
+++ linux-pm/include/linux/pm.h
@@ -765,6 +765,8 @@ extern int pm_generic_poweroff_late(stru
 extern int pm_generic_poweroff(struct device *dev);
 extern void pm_generic_complete(struct device *dev);
 
+extern bool dev_pm_smart_suspend_and_suspended(struct device *dev);
+
 #else /* !CONFIG_PM_SLEEP */
 
 #define device_pm_lock() do {} while (0)

^ permalink raw reply	[flat|nested] 135+ messages in thread

* [PATCH v2 6/6] ACPI / PM: Take SMART_SUSPEND driver flag into account
  2017-10-27 22:11 ` [PATCH v2 0/6] PM / sleep: Driver flags for system suspend/resume (part 1) Rafael J. Wysocki
                     ` (4 preceding siblings ...)
  2017-10-27 22:27   ` [PATCH v2 5/6] PCI / PM: Take SMART_SUSPEND driver flag into account Rafael J. Wysocki
@ 2017-10-27 22:30   ` Rafael J. Wysocki
  2017-11-08  0:41   ` [PATCH v2 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
  6 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-27 22:30 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Make the ACPI PM domain take DPM_FLAG_SMART_SUSPEND into account in
its system suspend callbacks.

[Note that the pm_runtime_suspended() check in acpi_dev_needs_resume()
is an optimization, because if it is not passed, all of the subsequent
checks may be skipped and some of them are considerably more expensive
in general.]

Also use the observation that if the device is in runtime suspend
at the beginning of the "late" phase of a system-wide suspend-like
transition, its state cannot change going forward (runtime PM is
disabled for it at that time) until the transition is over and the
subsequent system-wide PM callbacks should be skipped for it (as
they generally assume the device to not be suspended), so add
checks for that in acpi_subsys_suspend_late/noirq() and
acpi_subsys_freeze_late/noirq().

Moreover, if acpi_subsys_resume_noirq() is called during the
subsequent system-wide resume transition and if the device was left
in runtime suspend previously, its runtime PM status needs to be
changed to "active" as it is going to be put into the full-power
state going forward, so add a check for that too in there.

In turn, if acpi_subsys_thaw_noirq() runs after the device has been
left in runtime suspend, the subsequent "thaw" callbacks need
to be skipped for it (as they may not work correctly with a
suspended device), so set the power.direct_complete flag for the
device then to make the PM core skip those callbacks.

On top of the above, make the analogous changes in the acpi_lpss
driver that uses the ACPI PM domain callbacks.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---

-> v2: Implement the entire handling of DPM_FLAG_SMART_SUSPEND in the ACPI PM
       domain (instead of relying on the core to handle it).  Among other
       things this requires some more callbacks to be provided.
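
Moving the pm_runtime_suspended() and companion checks from
acpi_subsys_prepare() into acpi_dev_needs_resume() should not change what
->prepare() returns, and it lets acpi_subsys_suspend() reuse the same
helper.  The userspace sketch below models that under stated assumptions:
struct model_acpi_dev and the model_*() functions are hypothetical, and all
target-state checks that follow the wakeup comparison in the real helper are
collapsed into the single state_mismatch field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DPM_FLAG_SMART_SUSPEND	(1UL << 2)	/* BIT(2), as in this series */

/* Hypothetical, reduced stand-in for the device and its ACPI companion. */
struct model_acpi_dev {
	unsigned long driver_flags;
	bool runtime_suspended;	/* pm_runtime_suspended(dev) */
	bool has_companion;	/* adev != NULL */
	bool may_wakeup;	/* device_may_wakeup(dev) */
	bool wakeup_prepared;	/* !!adev->wakeup.prepare_count */
	bool state_mismatch;	/* collapses the remaining state checks */
};

/* After the patch: the cheap checks run first inside the helper. */
bool model_needs_resume(const struct model_acpi_dev *dev)
{
	assert(dev != NULL);
	if (!dev->runtime_suspended || !dev->has_companion ||
	    dev->may_wakeup != dev->wakeup_prepared)
		return true;

	return dev->state_mismatch;
}

/* Tail of acpi_subsys_prepare() after the patch. */
int model_prepare(const struct model_acpi_dev *dev)
{
	return !model_needs_resume(dev);
}

/* Tail of acpi_subsys_prepare() before the patch, for comparison. */
int model_prepare_old(const struct model_acpi_dev *dev)
{
	assert(dev != NULL);
	if (!dev->has_companion || !dev->runtime_suspended)
		return 0;

	if (dev->may_wakeup != dev->wakeup_prepared)
		return 0;

	return !dev->state_mismatch;
}

/* True if acpi_subsys_suspend() would call pm_runtime_resume(). */
bool model_suspend_resumes(const struct model_acpi_dev *dev)
{
	assert(dev != NULL);
	return !(dev->driver_flags & DPM_FLAG_SMART_SUSPEND) ||
	       model_needs_resume(dev);
}
```

Iterating over all combinations of the five boolean fields shows that
model_prepare() and model_prepare_old() agree in this model, and that
model_suspend_resumes() is false only when the flag is set and none of the
resume conditions hold.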

---
 drivers/acpi/acpi_lpss.c |   13 ++++-
 drivers/acpi/device_pm.c |  113 +++++++++++++++++++++++++++++++++++++++++++----
 include/linux/acpi.h     |   10 ++++
 3 files changed, 126 insertions(+), 10 deletions(-)

Index: linux-pm/drivers/acpi/device_pm.c
===================================================================
--- linux-pm.orig/drivers/acpi/device_pm.c
+++ linux-pm/drivers/acpi/device_pm.c
@@ -936,7 +936,8 @@ static bool acpi_dev_needs_resume(struct
 	u32 sys_target = acpi_target_system_state();
 	int ret, state;
 
-	if (device_may_wakeup(dev) != !!adev->wakeup.prepare_count)
+	if (!pm_runtime_suspended(dev) || !adev ||
+	    device_may_wakeup(dev) != !!adev->wakeup.prepare_count)
 		return true;
 
 	if (sys_target == ACPI_STATE_S0)
@@ -970,9 +971,6 @@ int acpi_subsys_prepare(struct device *d
 			return 0;
 	}
 
-	if (!adev || !pm_runtime_suspended(dev))
-		return 0;
-
 	return !acpi_dev_needs_resume(dev, adev);
 }
 EXPORT_SYMBOL_GPL(acpi_subsys_prepare);
@@ -998,12 +996,17 @@ EXPORT_SYMBOL_GPL(acpi_subsys_complete);
  * acpi_subsys_suspend - Run the device driver's suspend callback.
  * @dev: Device to handle.
  *
- * Follow PCI and resume devices suspended at run time before running their
- * system suspend callbacks.
+ * Follow PCI and resume devices from runtime suspend before running their
+ * system suspend callbacks, unless the driver can cope with runtime-suspended
+ * devices during system suspend and there are no ACPI-specific reasons for
+ * resuming them.
  */
 int acpi_subsys_suspend(struct device *dev)
 {
-	pm_runtime_resume(dev);
+	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) ||
+	    acpi_dev_needs_resume(dev, ACPI_COMPANION(dev)))
+		pm_runtime_resume(dev);
+
 	return pm_generic_suspend(dev);
 }
 EXPORT_SYMBOL_GPL(acpi_subsys_suspend);
@@ -1017,12 +1020,48 @@ EXPORT_SYMBOL_GPL(acpi_subsys_suspend);
  */
 int acpi_subsys_suspend_late(struct device *dev)
 {
-	int ret = pm_generic_suspend_late(dev);
+	int ret;
+
+	if (dev_pm_smart_suspend_and_suspended(dev))
+		return 0;
+
+	ret = pm_generic_suspend_late(dev);
 	return ret ? ret : acpi_dev_suspend(dev, device_may_wakeup(dev));
 }
 EXPORT_SYMBOL_GPL(acpi_subsys_suspend_late);
 
 /**
+ * acpi_subsys_suspend_noirq - Run the device driver's "noirq" suspend callback.
+ * @dev: Device to suspend.
+ */
+int acpi_subsys_suspend_noirq(struct device *dev)
+{
+	if (dev_pm_smart_suspend_and_suspended(dev))
+		return 0;
+
+	return pm_generic_suspend_noirq(dev);
+}
+EXPORT_SYMBOL_GPL(acpi_subsys_suspend_noirq);
+
+/**
+ * acpi_subsys_resume_noirq - Run the device driver's "noirq" resume callback.
+ * @dev: Device to handle.
+ */
+int acpi_subsys_resume_noirq(struct device *dev)
+{
+	/*
+	 * Devices with DPM_FLAG_SMART_SUSPEND may be left in runtime suspend
+	 * during system suspend, so update their runtime PM status to "active"
+	 * as they will be put into D0 going forward.
+	 */
+	if (dev_pm_smart_suspend_and_suspended(dev))
+		pm_runtime_set_active(dev);
+
+	return pm_generic_resume_noirq(dev);
+}
+EXPORT_SYMBOL_GPL(acpi_subsys_resume_noirq);
+
+/**
  * acpi_subsys_resume_early - Resume device using ACPI.
  * @dev: Device to Resume.
  *
@@ -1049,11 +1088,60 @@ int acpi_subsys_freeze(struct device *de
 	 * runtime-suspended devices should not be touched during freeze/thaw
 	 * transitions.
 	 */
-	pm_runtime_resume(dev);
+	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND))
+		pm_runtime_resume(dev);
+
 	return pm_generic_freeze(dev);
 }
 EXPORT_SYMBOL_GPL(acpi_subsys_freeze);
 
+/**
+ * acpi_subsys_freeze_late - Run the device driver's "late" freeze callback.
+ * @dev: Device to handle.
+ */
+int acpi_subsys_freeze_late(struct device *dev)
+{
+
+	if (dev_pm_smart_suspend_and_suspended(dev))
+		return 0;
+
+	return pm_generic_freeze_late(dev);
+}
+EXPORT_SYMBOL_GPL(acpi_subsys_freeze_late);
+
+/**
+ * acpi_subsys_freeze_noirq - Run the device driver's "noirq" freeze callback.
+ * @dev: Device to handle.
+ */
+int acpi_subsys_freeze_noirq(struct device *dev)
+{
+
+	if (dev_pm_smart_suspend_and_suspended(dev))
+		return 0;
+
+	return pm_generic_freeze_noirq(dev);
+}
+EXPORT_SYMBOL_GPL(acpi_subsys_freeze_noirq);
+
+/**
+ * acpi_subsys_thaw_noirq - Run the device driver's "noirq" thaw callback.
+ * @dev: Device to handle.
+ */
+int acpi_subsys_thaw_noirq(struct device *dev)
+{
+	/*
+	 * If the device is in runtime suspend, the "thaw" code may not work
+	 * correctly with it, so skip the driver callback and make the PM core
+	 * skip all of the subsequent "thaw" callbacks for the device.
+	 */
+	if (dev_pm_smart_suspend_and_suspended(dev)) {
+		dev->power.direct_complete = true;
+		return 0;
+	}
+
+	return pm_generic_thaw_noirq(dev);
+}
+EXPORT_SYMBOL_GPL(acpi_subsys_thaw_noirq);
 #endif /* CONFIG_PM_SLEEP */
 
 static struct dev_pm_domain acpi_general_pm_domain = {
@@ -1065,10 +1153,17 @@ static struct dev_pm_domain acpi_general
 		.complete = acpi_subsys_complete,
 		.suspend = acpi_subsys_suspend,
 		.suspend_late = acpi_subsys_suspend_late,
+		.suspend_noirq = acpi_subsys_suspend_noirq,
+		.resume_noirq = acpi_subsys_resume_noirq,
 		.resume_early = acpi_subsys_resume_early,
 		.freeze = acpi_subsys_freeze,
+		.freeze_late = acpi_subsys_freeze_late,
+		.freeze_noirq = acpi_subsys_freeze_noirq,
+		.thaw_noirq = acpi_subsys_thaw_noirq,
 		.poweroff = acpi_subsys_suspend,
 		.poweroff_late = acpi_subsys_suspend_late,
+		.poweroff_noirq = acpi_subsys_suspend_noirq,
+		.restore_noirq = acpi_subsys_resume_noirq,
 		.restore_early = acpi_subsys_resume_early,
 #endif
 	},
Index: linux-pm/drivers/acpi/acpi_lpss.c
===================================================================
--- linux-pm.orig/drivers/acpi/acpi_lpss.c
+++ linux-pm/drivers/acpi/acpi_lpss.c
@@ -849,8 +849,12 @@ static int acpi_lpss_resume(struct devic
 #ifdef CONFIG_PM_SLEEP
 static int acpi_lpss_suspend_late(struct device *dev)
 {
-	int ret = pm_generic_suspend_late(dev);
+	int ret;
 
+	if (dev_pm_smart_suspend_and_suspended(dev))
+		return 0;
+
+	ret = pm_generic_suspend_late(dev);
 	return ret ? ret : acpi_lpss_suspend(dev, device_may_wakeup(dev));
 }
 
@@ -889,10 +893,17 @@ static struct dev_pm_domain acpi_lpss_pm
 		.complete = acpi_subsys_complete,
 		.suspend = acpi_subsys_suspend,
 		.suspend_late = acpi_lpss_suspend_late,
+		.suspend_noirq = acpi_subsys_suspend_noirq,
+		.resume_noirq = acpi_subsys_resume_noirq,
 		.resume_early = acpi_lpss_resume_early,
 		.freeze = acpi_subsys_freeze,
+		.freeze_late = acpi_subsys_freeze_late,
+		.freeze_noirq = acpi_subsys_freeze_noirq,
+		.thaw_noirq = acpi_subsys_thaw_noirq,
 		.poweroff = acpi_subsys_suspend,
 		.poweroff_late = acpi_lpss_suspend_late,
+		.poweroff_noirq = acpi_subsys_suspend_noirq,
+		.restore_noirq = acpi_subsys_resume_noirq,
 		.restore_early = acpi_lpss_resume_early,
 #endif
 		.runtime_suspend = acpi_lpss_runtime_suspend,
Index: linux-pm/include/linux/acpi.h
===================================================================
--- linux-pm.orig/include/linux/acpi.h
+++ linux-pm/include/linux/acpi.h
@@ -885,17 +885,27 @@ int acpi_dev_suspend_late(struct device
 int acpi_subsys_prepare(struct device *dev);
 void acpi_subsys_complete(struct device *dev);
 int acpi_subsys_suspend_late(struct device *dev);
+int acpi_subsys_suspend_noirq(struct device *dev);
+int acpi_subsys_resume_noirq(struct device *dev);
 int acpi_subsys_resume_early(struct device *dev);
 int acpi_subsys_suspend(struct device *dev);
 int acpi_subsys_freeze(struct device *dev);
+int acpi_subsys_freeze_late(struct device *dev);
+int acpi_subsys_freeze_noirq(struct device *dev);
+int acpi_subsys_thaw_noirq(struct device *dev);
 #else
 static inline int acpi_dev_resume_early(struct device *dev) { return 0; }
 static inline int acpi_subsys_prepare(struct device *dev) { return 0; }
 static inline void acpi_subsys_complete(struct device *dev) {}
 static inline int acpi_subsys_suspend_late(struct device *dev) { return 0; }
+static inline int acpi_subsys_suspend_noirq(struct device *dev) { return 0; }
+static inline int acpi_subsys_resume_noirq(struct device *dev) { return 0; }
 static inline int acpi_subsys_resume_early(struct device *dev) { return 0; }
 static inline int acpi_subsys_suspend(struct device *dev) { return 0; }
 static inline int acpi_subsys_freeze(struct device *dev) { return 0; }
+static inline int acpi_subsys_freeze_late(struct device *dev) { return 0; }
+static inline int acpi_subsys_freeze_noirq(struct device *dev) { return 0; }
+static inline int acpi_subsys_thaw_noirq(struct device *dev) { return 0; }
 #endif
 
 #ifdef CONFIG_ACPI

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 09/12] PM / mfd: intel-lpss: Use DPM_FLAG_SMART_SUSPEND
  2017-10-16  1:30 ` [PATCH 09/12] PM / mfd: intel-lpss: Use DPM_FLAG_SMART_SUSPEND Rafael J. Wysocki
@ 2017-10-31 15:09   ` Lee Jones
  2017-10-31 16:28     ` Rafael J. Wysocki
  0 siblings, 1 reply; 135+ messages in thread
From: Lee Jones @ 2017-10-31 15:09 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Ulf Hansson, Andy Shevchenko, Kevin Hilman, Wolfram Sang,
	linux-i2c

On Mon, 16 Oct 2017, Rafael J. Wysocki wrote:

> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Make the intel-lpss driver set DPM_FLAG_SMART_SUSPEND for its
> devices which will allow them to stay in runtime suspend during
> system suspend unless they need to be reconfigured for some reason.
> 
> Also make it avoid resuming its child devices if they have
> DPM_FLAG_SMART_SUSPEND set to allow them to remain in runtime
> suspend during system suspend.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/mfd/intel-lpss.c |    6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)

Is this patch independent?

For my own reference:
  Acked-for-MFD-by: Lee Jones <lee.jones@linaro.org>

-- 
Lee Jones
Linaro STMicroelectronics Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 09/12] PM / mfd: intel-lpss: Use DPM_FLAG_SMART_SUSPEND
  2017-10-31 15:09   ` Lee Jones
@ 2017-10-31 16:28     ` Rafael J. Wysocki
  2017-11-01  9:28       ` Lee Jones
  0 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-10-31 16:28 UTC (permalink / raw)
  To: Lee Jones
  Cc: Rafael J. Wysocki, Linux PM, Bjorn Helgaas, Alan Stern,
	Greg Kroah-Hartman, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c

On Tue, Oct 31, 2017 at 4:09 PM, Lee Jones <lee.jones@linaro.org> wrote:
> On Mon, 16 Oct 2017, Rafael J. Wysocki wrote:
>
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> Make the intel-lpss driver set DPM_FLAG_SMART_SUSPEND for its
>> devices which will allow them to stay in runtime suspend during
>> system suspend unless they need to be reconfigured for some reason.
>>
>> Also make it avoid resuming its child devices if they have
>> DPM_FLAG_SMART_SUSPEND set to allow them to remain in runtime
>> suspend during system suspend.
>>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> ---
>>  drivers/mfd/intel-lpss.c |    6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> Is this patch independent?

It depends on the flag definition at least, but functionally it also
depends on the PCI support for the flag.

> For my own reference:
>   Acked-for-MFD-by: Lee Jones <lee.jones@linaro.org>

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 5/6] PCI / PM: Take SMART_SUSPEND driver flag into account
  2017-10-27 22:27   ` [PATCH v2 5/6] PCI / PM: Take SMART_SUSPEND driver flag into account Rafael J. Wysocki
@ 2017-10-31 22:48     ` Bjorn Helgaas
  0 siblings, 0 replies; 135+ messages in thread
From: Bjorn Helgaas @ 2017-10-31 22:48 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Ulf Hansson, Andy Shevchenko, Kevin Hilman

On Sat, Oct 28, 2017 at 12:27:45AM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Make the PCI bus type take DPM_FLAG_SMART_SUSPEND into account in its
> system-wide PM callbacks and make sure that all code that should not
> run in parallel with pci_pm_runtime_resume() is executed in the "late"
> phases of system suspend, freeze and poweroff transitions.
> 
> [Note that the pm_runtime_suspended() check in pci_dev_keep_suspended()
> is an optimization, because if it is not passed, all of the subsequent
> checks may be skipped and some of them are considerably more expensive
> in general.]
> 
> Also use the observation that if the device is in runtime suspend
> at the beginning of the "late" phase of a system-wide suspend-like
> transition, its state cannot change going forward (runtime PM is
> disabled for it at that time) until the transition is over and the
> subsequent system-wide PM callbacks should be skipped for it (as
> they generally assume the device to not be suspended), so add checks
> for that in pci_pm_suspend_late/noirq(), pci_pm_freeze_late/noirq()
> and pci_pm_poweroff_late/noirq().
> 
> Moreover, if pci_pm_resume_noirq() or pci_pm_restore_noirq() is
> called during the subsequent system-wide resume transition and if
> the device was left in runtime suspend previously, its runtime PM
> status needs to be changed to "active" as it is going to be put
> into the full-power state, so add checks for that too to these
> functions.
> 
> In turn, if pci_pm_thaw_noirq() runs after the device has been
> left in runtime suspend, the subsequent "thaw" callbacks need
> to be skipped for it (as they may not work correctly with a
> suspended device), so set the power.direct_complete flag for the
> device then to make the PM core skip those callbacks.
> 
> In addition to the above add a core helper for checking if
> DPM_FLAG_SMART_SUSPEND is set and the device runtime PM status is
> "suspended" at the same time, which is done quite often in the new
> code (and will be done elsewhere going forward too).
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

> ---
> 
> -> v2: Implement the entire handling of DPM_FLAG_SMART_SUSPEND in
>        the PCI bus type (instead of doing that in the core).
> 
> ---
>  Documentation/power/pci.txt |   14 +++++
>  drivers/base/power/main.c   |    6 ++
>  drivers/pci/pci-driver.c    |  103 ++++++++++++++++++++++++++++++++++++--------
>  include/linux/pm.h          |    2 
>  4 files changed, 108 insertions(+), 17 deletions(-)
> 
> Index: linux-pm/drivers/pci/pci-driver.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pci-driver.c
> +++ linux-pm/drivers/pci/pci-driver.c
> @@ -734,18 +734,25 @@ static int pci_pm_suspend(struct device
>  
>  	if (!pm) {
>  		pci_pm_default_suspend(pci_dev);
> -		goto Fixup;
> +		return 0;
>  	}
>  
>  	/*
> -	 * PCI devices suspended at run time need to be resumed at this point,
> -	 * because in general it is necessary to reconfigure them for system
> -	 * suspend.  Namely, if the device is supposed to wake up the system
> -	 * from the sleep state, we may need to reconfigure it for this purpose.
> -	 * In turn, if the device is not supposed to wake up the system from the
> -	 * sleep state, we'll have to prevent it from signaling wake-up.
> +	 * PCI devices suspended at run time may need to be resumed at this
> +	 * point, because in general it may be necessary to reconfigure them for
> +	 * system suspend.  Namely, if the device is expected to wake up the
> +	 * system from the sleep state, it may have to be reconfigured for this
> +	 * purpose, or if the device is not expected to wake up the system from
> +	 * the sleep state, it should be prevented from signaling wakeup events
> +	 * going forward.
> +	 *
> +	 * Also if the driver of the device does not indicate that its system
> +	 * suspend callbacks can cope with runtime-suspended devices, it is
> +	 * better to resume the device from runtime suspend here.
>  	 */
> -	pm_runtime_resume(dev);
> +	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) ||
> +	    !pci_dev_keep_suspended(pci_dev))
> +		pm_runtime_resume(dev);
>  
>  	pci_dev->state_saved = false;
>  	if (pm->suspend) {
> @@ -765,17 +772,27 @@ static int pci_pm_suspend(struct device
>  		}
>  	}
>  
> - Fixup:
> -	pci_fixup_device(pci_fixup_suspend, pci_dev);
> -
>  	return 0;
>  }
>  
> +static int pci_pm_suspend_late(struct device *dev)
> +{
> +	if (dev_pm_smart_suspend_and_suspended(dev))
> +		return 0;
> +
> +	pci_fixup_device(pci_fixup_suspend, to_pci_dev(dev));
> +
> +	return pm_generic_suspend_late(dev);
> +}
> +
>  static int pci_pm_suspend_noirq(struct device *dev)
>  {
>  	struct pci_dev *pci_dev = to_pci_dev(dev);
>  	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
>  
> +	if (dev_pm_smart_suspend_and_suspended(dev))
> +		return 0;
> +
>  	if (pci_has_legacy_pm_support(pci_dev))
>  		return pci_legacy_suspend_late(dev, PMSG_SUSPEND);
>  
> @@ -834,6 +851,14 @@ static int pci_pm_resume_noirq(struct de
>  	struct device_driver *drv = dev->driver;
>  	int error = 0;
>  
> +	/*
> +	 * Devices with DPM_FLAG_SMART_SUSPEND may be left in runtime suspend
> +	 * during system suspend, so update their runtime PM status to "active"
> +	 * as they are going to be put into D0 shortly.
> +	 */
> +	if (dev_pm_smart_suspend_and_suspended(dev))
> +		pm_runtime_set_active(dev);
> +
>  	pci_pm_default_resume_early(pci_dev);
>  
>  	if (pci_has_legacy_pm_support(pci_dev))
> @@ -876,6 +901,7 @@ static int pci_pm_resume(struct device *
>  #else /* !CONFIG_SUSPEND */
>  
>  #define pci_pm_suspend		NULL
> +#define pci_pm_suspend_late	NULL
>  #define pci_pm_suspend_noirq	NULL
>  #define pci_pm_resume		NULL
>  #define pci_pm_resume_noirq	NULL
> @@ -910,7 +936,8 @@ static int pci_pm_freeze(struct device *
>  	 * devices should not be touched during freeze/thaw transitions,
>  	 * however.
>  	 */
> -	pm_runtime_resume(dev);
> +	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND))
> +		pm_runtime_resume(dev);
>  
>  	pci_dev->state_saved = false;
>  	if (pm->freeze) {
> @@ -925,11 +952,22 @@ static int pci_pm_freeze(struct device *
>  	return 0;
>  }
>  
> +static int pci_pm_freeze_late(struct device *dev)
> +{
> +	if (dev_pm_smart_suspend_and_suspended(dev))
> +		return 0;
> +
> +	return pm_generic_freeze_late(dev);
> +}
> +
>  static int pci_pm_freeze_noirq(struct device *dev)
>  {
>  	struct pci_dev *pci_dev = to_pci_dev(dev);
>  	struct device_driver *drv = dev->driver;
>  
> +	if (dev_pm_smart_suspend_and_suspended(dev))
> +		return 0;
> +
>  	if (pci_has_legacy_pm_support(pci_dev))
>  		return pci_legacy_suspend_late(dev, PMSG_FREEZE);
>  
> @@ -959,6 +997,16 @@ static int pci_pm_thaw_noirq(struct devi
>  	struct device_driver *drv = dev->driver;
>  	int error = 0;
>  
> +	/*
> +	 * If the device is in runtime suspend, the code below may not work
> +	 * correctly with it, so skip that code and make the PM core skip all of
> +	 * the subsequent "thaw" callbacks for the device.
> +	 */
> +	if (dev_pm_smart_suspend_and_suspended(dev)) {
> +		dev->power.direct_complete = true;
> +		return 0;
> +	}
> +
>  	if (pcibios_pm_ops.thaw_noirq) {
>  		error = pcibios_pm_ops.thaw_noirq(dev);
>  		if (error)
> @@ -1008,11 +1056,13 @@ static int pci_pm_poweroff(struct device
>  
>  	if (!pm) {
>  		pci_pm_default_suspend(pci_dev);
> -		goto Fixup;
> +		return 0;
>  	}
>  
>  	/* The reason to do that is the same as in pci_pm_suspend(). */
> -	pm_runtime_resume(dev);
> +	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) ||
> +	    !pci_dev_keep_suspended(pci_dev))
> +		pm_runtime_resume(dev);
>  
>  	pci_dev->state_saved = false;
>  	if (pm->poweroff) {
> @@ -1024,17 +1074,27 @@ static int pci_pm_poweroff(struct device
>  			return error;
>  	}
>  
> - Fixup:
> -	pci_fixup_device(pci_fixup_suspend, pci_dev);
> -
>  	return 0;
>  }
>  
> +static int pci_pm_poweroff_late(struct device *dev)
> +{
> +	if (dev_pm_smart_suspend_and_suspended(dev))
> +		return 0;
> +
> +	pci_fixup_device(pci_fixup_suspend, to_pci_dev(dev));
> +
> +	return pm_generic_poweroff_late(dev);
> +}
> +
>  static int pci_pm_poweroff_noirq(struct device *dev)
>  {
>  	struct pci_dev *pci_dev = to_pci_dev(dev);
>  	struct device_driver *drv = dev->driver;
>  
> +	if (dev_pm_smart_suspend_and_suspended(dev))
> +		return 0;
> +
>  	if (pci_has_legacy_pm_support(to_pci_dev(dev)))
>  		return pci_legacy_suspend_late(dev, PMSG_HIBERNATE);
>  
> @@ -1076,6 +1136,10 @@ static int pci_pm_restore_noirq(struct d
>  	struct device_driver *drv = dev->driver;
>  	int error = 0;
>  
> +	/* This is analogous to the pci_pm_resume_noirq() case. */
> +	if (dev_pm_smart_suspend_and_suspended(dev))
> +		pm_runtime_set_active(dev);
> +
>  	if (pcibios_pm_ops.restore_noirq) {
>  		error = pcibios_pm_ops.restore_noirq(dev);
>  		if (error)
> @@ -1124,10 +1188,12 @@ static int pci_pm_restore(struct device
>  #else /* !CONFIG_HIBERNATE_CALLBACKS */
>  
>  #define pci_pm_freeze		NULL
> +#define pci_pm_freeze_late	NULL
>  #define pci_pm_freeze_noirq	NULL
>  #define pci_pm_thaw		NULL
>  #define pci_pm_thaw_noirq	NULL
>  #define pci_pm_poweroff		NULL
> +#define pci_pm_poweroff_late	NULL
>  #define pci_pm_poweroff_noirq	NULL
>  #define pci_pm_restore		NULL
>  #define pci_pm_restore_noirq	NULL
> @@ -1243,10 +1309,13 @@ static const struct dev_pm_ops pci_dev_p
>  	.prepare = pci_pm_prepare,
>  	.complete = pci_pm_complete,
>  	.suspend = pci_pm_suspend,
> +	.suspend_late = pci_pm_suspend_late,
>  	.resume = pci_pm_resume,
>  	.freeze = pci_pm_freeze,
> +	.freeze_late = pci_pm_freeze_late,
>  	.thaw = pci_pm_thaw,
>  	.poweroff = pci_pm_poweroff,
> +	.poweroff_late = pci_pm_poweroff_late,
>  	.restore = pci_pm_restore,
>  	.suspend_noirq = pci_pm_suspend_noirq,
>  	.resume_noirq = pci_pm_resume_noirq,
> Index: linux-pm/Documentation/power/pci.txt
> ===================================================================
> --- linux-pm.orig/Documentation/power/pci.txt
> +++ linux-pm/Documentation/power/pci.txt
> @@ -980,6 +980,20 @@ positive value from pci_pm_prepare() if
>  driver of the device returns a positive value.  That allows the driver to opt
>  out from using the direct-complete mechanism dynamically.
>  
> +The DPM_FLAG_SMART_SUSPEND flag tells the PCI bus type that from the driver's
> +perspective the device can be safely left in runtime suspend during system
> +suspend.  That causes pci_pm_suspend(), pci_pm_freeze() and pci_pm_poweroff()
> +to skip resuming the device from runtime suspend unless there are PCI-specific
> +reasons for doing that.  Also, it causes pci_pm_suspend_late/noirq(),
> +pci_pm_freeze_late/noirq() and pci_pm_poweroff_late/noirq() to return early
> +if the device remains in runtime suspend at the beginning of the "late" phase
> +of the system-wide transition under way.  Moreover, if the device is in
> +runtime suspend in pci_pm_resume_noirq() or pci_pm_restore_noirq(), its runtime
> +power management status will be changed to "active" (as it is going to be put
> +into D0 going forward), but if it is in runtime suspend in pci_pm_thaw_noirq(),
> +the function will set the power.direct_complete flag for it (to make the PM core
> +skip the subsequent "thaw" callbacks for it) and return.
> +
>  3.2. Device Runtime Power Management
>  ------------------------------------
>  In addition to providing device power management callbacks PCI device drivers
> Index: linux-pm/drivers/base/power/main.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/main.c
> +++ linux-pm/drivers/base/power/main.c
> @@ -1861,3 +1861,9 @@ void device_pm_check_callbacks(struct de
>  		 !dev->driver->suspend && !dev->driver->resume));
>  	spin_unlock_irq(&dev->power.lock);
>  }
> +
> +bool dev_pm_smart_suspend_and_suspended(struct device *dev)
> +{
> +	return dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
> +		pm_runtime_status_suspended(dev);
> +}
> Index: linux-pm/include/linux/pm.h
> ===================================================================
> --- linux-pm.orig/include/linux/pm.h
> +++ linux-pm/include/linux/pm.h
> @@ -765,6 +765,8 @@ extern int pm_generic_poweroff_late(stru
>  extern int pm_generic_poweroff(struct device *dev);
>  extern void pm_generic_complete(struct device *dev);
>  
> +extern bool dev_pm_smart_suspend_and_suspended(struct device *dev);
> +
>  #else /* !CONFIG_PM_SLEEP */
>  
>  #define device_pm_lock() do {} while (0)
> 
> 
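
The helper added to drivers/base/power/main.c and the condition guarding
pm_runtime_resume() in pci_pm_suspend() can be modeled in userspace as
below.  This is only a minimal sketch: struct mock_device and the mock_*
functions are hypothetical stand-ins invented for illustration (not kernel
API), and keep_suspended merely represents the result of
pci_dev_keep_suspended().

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit value as defined in patch 3/6 of this series. */
#define DPM_FLAG_SMART_SUSPEND (1u << 2)

/* Hypothetical userspace stand-in for struct device; not kernel API. */
struct mock_device {
	uint32_t driver_flags;   /* models dev->power.driver_flags */
	bool runtime_suspended;  /* models runtime PM status "suspended" */
};

/* Mirrors dev_pm_test_driver_flags(): true if any requested flag is set. */
static bool mock_test_driver_flags(const struct mock_device *dev,
				   uint32_t flags)
{
	return (dev->driver_flags & flags) != 0;
}

/* Mirrors dev_pm_smart_suspend_and_suspended() from the patch. */
static bool mock_smart_suspend_and_suspended(const struct mock_device *dev)
{
	return mock_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
		dev->runtime_suspended;
}

/*
 * Models the condition in pci_pm_suspend() and pci_pm_poweroff(): the
 * device is resumed unless the driver has set the flag AND the bus type
 * agrees that the device may stay suspended.
 */
static bool mock_must_resume_in_suspend(const struct mock_device *dev,
					bool keep_suspended)
{
	return !mock_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) ||
		!keep_suspended;
}
```

With the flag set and the device runtime-suspended, the helper returns
true, which is the case in which the "late" and "noirq" callbacks in the
patch return early.  Without the flag, the device is always resumed in the
"suspend" phase, as before the patch.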

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 09/12] PM / mfd: intel-lpss: Use DPM_FLAG_SMART_SUSPEND
  2017-10-31 16:28     ` Rafael J. Wysocki
@ 2017-11-01  9:28       ` Lee Jones
  2017-11-01 20:26         ` Rafael J. Wysocki
  0 siblings, 1 reply; 135+ messages in thread
From: Lee Jones @ 2017-11-01  9:28 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linux PM, Bjorn Helgaas, Alan Stern,
	Greg Kroah-Hartman, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c

On Tue, 31 Oct 2017, Rafael J. Wysocki wrote:

> On Tue, Oct 31, 2017 at 4:09 PM, Lee Jones <lee.jones@linaro.org> wrote:
> > On Mon, 16 Oct 2017, Rafael J. Wysocki wrote:
> >
> >> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >>
> >> Make the intel-lpss driver set DPM_FLAG_SMART_SUSPEND for its
> >> devices which will allow them to stay in runtime suspend during
> >> system suspend unless they need to be reconfigured for some reason.
> >>
> >> Also make it avoid resuming its child devices if they have
> >> DPM_FLAG_SMART_SUSPEND set to allow them to remain in runtime
> >> suspend during system suspend.
> >>
> >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >> ---
> >>  drivers/mfd/intel-lpss.c |    6 +++++-
> >>  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > Is this patch independent?
> 
> It depends on the flag definition at least, but functionally it also
> depends on the PCI support for the flag.

No problem.  Which tree do you propose this goes through?

> > For my own reference:
> >   Acked-for-MFD-by: Lee Jones <lee.jones@linaro.org>


-- 
Lee Jones
Linaro STMicroelectronics Landing Team Lead
Linaro.org │ Open source software for ARM SoCs

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 09/12] PM / mfd: intel-lpss: Use DPM_FLAG_SMART_SUSPEND
  2017-11-01  9:28       ` Lee Jones
@ 2017-11-01 20:26         ` Rafael J. Wysocki
  2017-11-08 11:08           ` Lee Jones
  0 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-01 20:26 UTC (permalink / raw)
  To: Lee Jones
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Linux PM, Bjorn Helgaas,
	Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c

On Wed, Nov 1, 2017 at 10:28 AM, Lee Jones <lee.jones@linaro.org> wrote:
> On Tue, 31 Oct 2017, Rafael J. Wysocki wrote:
>
>> On Tue, Oct 31, 2017 at 4:09 PM, Lee Jones <lee.jones@linaro.org> wrote:
>> > On Mon, 16 Oct 2017, Rafael J. Wysocki wrote:
>> >
>> >> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> >>
>> >> Make the intel-lpss driver set DPM_FLAG_SMART_SUSPEND for its
>> >> devices which will allow them to stay in runtime suspend during
>> >> system suspend unless they need to be reconfigured for some reason.
>> >>
>> >> Also make it avoid resuming its child devices if they have
>> >> DPM_FLAG_SMART_SUSPEND set to allow them to remain in runtime
>> >> suspend during system suspend.
>> >>
>> >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> >> ---
>> >>  drivers/mfd/intel-lpss.c |    6 +++++-
>> >>  1 file changed, 5 insertions(+), 1 deletion(-)
>> >
>> > Is this patch independent?
>>
>> It depends on the flag definition at least, but functionally it also
>> depends on the PCI support for the flag.
>
> No problem.  Which tree do you propose this goes through?

linux-pm.git if that's not a problem as the patches it depends on will
go through it too.

That said I'll resend it when the core patches it depends on are ready.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 1/6] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags
  2017-10-27 22:17   ` [PATCH v2 1/6] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags Rafael J. Wysocki
@ 2017-11-06  8:07     ` Ulf Hansson
  0 siblings, 0 replies; 135+ messages in thread
From: Ulf Hansson @ 2017-11-06  8:07 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman

On 28 October 2017 at 00:17, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> The motivation for this change is to provide a way to work around
> a problem with the direct-complete mechanism used for avoiding
> system suspend/resume handling for devices in runtime suspend.
>
> The problem is that some middle layer code (the PCI bus type and
> the ACPI PM domain in particular) returns positive values from its
> system suspend ->prepare callbacks regardless of whether the driver's
> ->prepare returns a positive value or 0, which effectively prevents
> drivers from being able to control the direct-complete feature.
> Some drivers need that control, however, and the PCI bus type has
> grown its own flag to deal with this issue, but since it is not
> limited to PCI, it is better to address it by adding driver flags at
> the core level.
>
> To that end, add a driver_flags field to struct dev_pm_info for flags
> that can be set by device drivers at the probe time to inform the PM
> core and/or bus types, PM domains and so on about the capabilities and/or
> preferences of device drivers.  Also add two static inline helpers
> for setting that field and testing it against a given set of flags
> and make the driver core clear it automatically on driver remove
> and probe failures.
>
> Define and document two PM driver flags related to the direct-
> complete feature: NEVER_SKIP and SMART_PREPARE that can be used,
> respectively, to indicate to the PM core that the direct-complete
> mechanism should never be used for the device and to inform the
> middle layer code (bus types, PM domains etc) that it can only
> request the PM core to use the direct-complete mechanism for
> the device (by returning a positive value from its ->prepare
> callback) if it also has been requested by the driver.
>
> While at it, make the core check pm_runtime_suspended() when
> setting power.direct_complete so that it doesn't need to be
> checked by ->prepare callbacks.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>

If not too late:

Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

> ---
>
> -> v2: Do not use pm_generic_prepare() in acpi_subsys_prepare()
>        as the latter has to distinguish between the lack of the
>        driver's ->prepare callback and the situation in which that
>        callback has returned 0 if DPM_FLAG_SMART_PREPARE is set.
>
> ---
>  Documentation/driver-api/pm/devices.rst |   14 ++++++++++++++
>  Documentation/power/pci.txt             |   19 +++++++++++++++++++
>  drivers/acpi/device_pm.c                |   13 +++++++++----
>  drivers/base/dd.c                       |    2 ++
>  drivers/base/power/main.c               |    4 +++-
>  drivers/pci/pci-driver.c                |    5 ++++-
>  include/linux/device.h                  |   10 ++++++++++
>  include/linux/pm.h                      |   20 ++++++++++++++++++++
>  8 files changed, 81 insertions(+), 6 deletions(-)
>
> Index: linux-pm/include/linux/device.h
> ===================================================================
> --- linux-pm.orig/include/linux/device.h
> +++ linux-pm/include/linux/device.h
> @@ -1070,6 +1070,16 @@ static inline void dev_pm_syscore_device
>  #endif
>  }
>
> +static inline void dev_pm_set_driver_flags(struct device *dev, u32 flags)
> +{
> +       dev->power.driver_flags = flags;
> +}
> +
> +static inline bool dev_pm_test_driver_flags(struct device *dev, u32 flags)
> +{
> +       return !!(dev->power.driver_flags & flags);
> +}
> +
>  static inline void device_lock(struct device *dev)
>  {
>         mutex_lock(&dev->mutex);
> Index: linux-pm/include/linux/pm.h
> ===================================================================
> --- linux-pm.orig/include/linux/pm.h
> +++ linux-pm/include/linux/pm.h
> @@ -550,6 +550,25 @@ struct pm_subsys_data {
>  #endif
>  };
>
> +/*
> + * Driver flags to control system suspend/resume behavior.
> + *
> + * These flags can be set by device drivers at the probe time.  They need not be
> + * cleared by the drivers as the driver core will take care of that.
> + *
> + * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
> + * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
> + *
> + * Setting SMART_PREPARE instructs bus types and PM domains which may want
> + * system suspend/resume callbacks to be skipped for the device to return 0 from
> + * their ->prepare callbacks if the driver's ->prepare callback returns 0 (in
> + * other words, the system suspend/resume callbacks can only be skipped for the
> + * device if its driver doesn't object against that).  This flag has no effect
> + * if NEVER_SKIP is set.
> + */
> +#define DPM_FLAG_NEVER_SKIP    BIT(0)
> +#define DPM_FLAG_SMART_PREPARE BIT(1)
> +
>  struct dev_pm_info {
>         pm_message_t            power_state;
>         unsigned int            can_wakeup:1;
> @@ -561,6 +580,7 @@ struct dev_pm_info {
>         bool                    is_late_suspended:1;
>         bool                    early_init:1;   /* Owned by the PM core */
>         bool                    direct_complete:1;      /* Owned by the PM core */
> +       u32                     driver_flags;
>         spinlock_t              lock;
>  #ifdef CONFIG_PM_SLEEP
>         struct list_head        entry;
> Index: linux-pm/drivers/base/dd.c
> ===================================================================
> --- linux-pm.orig/drivers/base/dd.c
> +++ linux-pm/drivers/base/dd.c
> @@ -464,6 +464,7 @@ pinctrl_bind_failed:
>         if (dev->pm_domain && dev->pm_domain->dismiss)
>                 dev->pm_domain->dismiss(dev);
>         pm_runtime_reinit(dev);
> +       dev_pm_set_driver_flags(dev, 0);
>
>         switch (ret) {
>         case -EPROBE_DEFER:
> @@ -869,6 +870,7 @@ static void __device_release_driver(stru
>                 if (dev->pm_domain && dev->pm_domain->dismiss)
>                         dev->pm_domain->dismiss(dev);
>                 pm_runtime_reinit(dev);
> +               dev_pm_set_driver_flags(dev, 0);
>
>                 klist_remove(&dev->p->knode_driver);
>                 device_pm_check_callbacks(dev);
> Index: linux-pm/drivers/base/power/main.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/main.c
> +++ linux-pm/drivers/base/power/main.c
> @@ -1700,7 +1700,9 @@ unlock:
>          * applies to suspend transitions, however.
>          */
>         spin_lock_irq(&dev->power.lock);
> -       dev->power.direct_complete = ret > 0 && state.event == PM_EVENT_SUSPEND;
> +       dev->power.direct_complete = state.event == PM_EVENT_SUSPEND &&
> +               pm_runtime_suspended(dev) && ret > 0 &&
> +               !dev_pm_test_driver_flags(dev, DPM_FLAG_NEVER_SKIP);
>         spin_unlock_irq(&dev->power.lock);
>         return 0;
>  }
> Index: linux-pm/drivers/pci/pci-driver.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pci-driver.c
> +++ linux-pm/drivers/pci/pci-driver.c
> @@ -682,8 +682,11 @@ static int pci_pm_prepare(struct device
>
>         if (drv && drv->pm && drv->pm->prepare) {
>                 int error = drv->pm->prepare(dev);
> -               if (error)
> +               if (error < 0)
>                         return error;
> +
> +               if (!error && dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_PREPARE))
> +                       return 0;
>         }
>         return pci_dev_keep_suspended(to_pci_dev(dev));
>  }
> Index: linux-pm/drivers/acpi/device_pm.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/device_pm.c
> +++ linux-pm/drivers/acpi/device_pm.c
> @@ -959,11 +959,16 @@ static bool acpi_dev_needs_resume(struct
>  int acpi_subsys_prepare(struct device *dev)
>  {
>         struct acpi_device *adev = ACPI_COMPANION(dev);
> -       int ret;
>
> -       ret = pm_generic_prepare(dev);
> -       if (ret < 0)
> -               return ret;
> +       if (dev->driver && dev->driver->pm && dev->driver->pm->prepare) {
> +               int ret = dev->driver->pm->prepare(dev);
> +
> +               if (ret < 0)
> +                       return ret;
> +
> +               if (!ret && dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_PREPARE))
> +                       return 0;
> +       }
>
>         if (!adev || !pm_runtime_suspended(dev))
>                 return 0;
> Index: linux-pm/Documentation/driver-api/pm/devices.rst
> ===================================================================
> --- linux-pm.orig/Documentation/driver-api/pm/devices.rst
> +++ linux-pm/Documentation/driver-api/pm/devices.rst
> @@ -354,6 +354,20 @@ the phases are: ``prepare``, ``suspend``
>         is because all such devices are initially set to runtime-suspended with
>         runtime PM disabled.
>
> +       This feature also can be controlled by device drivers by using the
> +       ``DPM_FLAG_NEVER_SKIP`` and ``DPM_FLAG_SMART_PREPARE`` driver power
> +       management flags.  [Typically, they are set at the time the driver is
> +       probed against the device in question by passing them to the
> +       :c:func:`dev_pm_set_driver_flags` helper function.]  If the first of
> +       these flags is set, the PM core will not apply the direct-complete
> +       procedure described above to the given device and, consequently, to any
> +       of its ancestors.  The second flag, when set, informs the middle layer
> +       code (bus types, device types, PM domains, classes) that it should take
> +       the return value of the ``->prepare`` callback provided by the driver
> +       into account and it may only return a positive value from its own
> +       ``->prepare`` callback if the driver's one also has returned a positive
> +       value.
> +
>      2. The ``->suspend`` methods should quiesce the device to stop it from
>         performing I/O.  They also may save the device registers and put it into
>         the appropriate low-power state, depending on the bus type the device is
> Index: linux-pm/Documentation/power/pci.txt
> ===================================================================
> --- linux-pm.orig/Documentation/power/pci.txt
> +++ linux-pm/Documentation/power/pci.txt
> @@ -961,6 +961,25 @@ dev_pm_ops to indicate that one suspend
>  .suspend(), .freeze(), and .poweroff() members and one resume routine is to
>  be pointed to by the .resume(), .thaw(), and .restore() members.
>
> +3.1.19. Driver Flags for Power Management
> +
> +The PM core allows device drivers to set flags that influence the handling of
> +power management for the devices by the core itself and by middle layer code
> +including the PCI bus type.  The flags should be set once at the driver probe
> +time with the help of the dev_pm_set_driver_flags() function and they should not
> +be updated directly afterwards.
> +
> +The DPM_FLAG_NEVER_SKIP flag prevents the PM core from using the direct-complete
> +mechanism allowing device suspend/resume callbacks to be skipped if the device
> +is in runtime suspend when the system suspend starts.  That also affects all of
> +the ancestors of the device, so this flag should only be used if absolutely
> +necessary.
> +
> +The DPM_FLAG_SMART_PREPARE flag instructs the PCI bus type to only return a
> +positive value from pci_pm_prepare() if the ->prepare callback provided by the
> +driver of the device returns a positive value.  That allows the driver to opt
> +out from using the direct-complete mechanism dynamically.
> +
>  3.2. Device Runtime Power Management
>  ------------------------------------
>  In addition to providing device power management callbacks PCI device drivers
>
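
The interplay between the middle-layer ->prepare return value and the
direct_complete assignment in the core can be sketched in userspace as
follows.  This is a simplified model, not the kernel code: the model_*
functions and their parameters are hypothetical names introduced here,
keep_suspended stands for the result of pci_dev_keep_suspended(), and
suspend_event stands for state.event == PM_EVENT_SUSPEND.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in include/linux/pm.h by this patch. */
#define DPM_FLAG_NEVER_SKIP    (1u << 0)
#define DPM_FLAG_SMART_PREPARE (1u << 1)

/*
 * Models the value returned by pci_pm_prepare() after the patch.
 *
 * has_prepare:    the driver provides a ->prepare callback
 * drv_ret:        value returned by it (ignored if !has_prepare)
 * keep_suspended: models the result of pci_dev_keep_suspended()
 */
static int model_bus_prepare(uint32_t flags, bool has_prepare, int drv_ret,
			     bool keep_suspended)
{
	if (has_prepare) {
		/* Errors from the driver are propagated unchanged. */
		if (drv_ret < 0)
			return drv_ret;

		/* With SMART_PREPARE, a driver returning 0 vetoes the skip. */
		if (drv_ret == 0 && (flags & DPM_FLAG_SMART_PREPARE))
			return 0;
	}
	return keep_suspended ? 1 : 0;
}

/* Models the assignment to power.direct_complete in device_prepare(). */
static bool model_direct_complete(uint32_t flags, bool suspend_event,
				  bool runtime_suspended, int prepare_ret)
{
	return suspend_event && runtime_suspended && prepare_ret > 0 &&
		!(flags & DPM_FLAG_NEVER_SKIP);
}
```

In this model a driver that sets SMART_PREPARE and returns 0 from
->prepare makes the bus type return 0 even if keep_suspended is true,
whereas without the flag the bus type's own check decides.  NEVER_SKIP
overrides everything at the core level.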

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 3/6] PM / core: Add SMART_SUSPEND driver flag
  2017-10-27 22:22   ` [PATCH v2 3/6] PM / core: Add SMART_SUSPEND " Rafael J. Wysocki
@ 2017-11-06  8:09     ` Ulf Hansson
  2017-11-06 11:23       ` Rafael J. Wysocki
  0 siblings, 1 reply; 135+ messages in thread
From: Ulf Hansson @ 2017-11-06  8:09 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman

On 28 October 2017 at 00:22, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Define and document a SMART_SUSPEND flag to instruct bus types and PM
> domains that the system suspend callbacks provided by the driver can
> cope with runtime-suspended devices, so from the driver's perspective
> it should be safe to leave devices in runtime suspend during system
> suspend.
>
> Setting that flag may also cause middle-layer code (bus types,
> PM domains etc.) to skip invocations of the ->suspend_late and
> ->suspend_noirq callbacks provided by the driver if the device
> is in runtime suspend at the beginning of the "late" phase of
> the system-wide suspend transition, in which case the driver's
> system-wide resume callbacks may be invoked back-to-back with
> its ->runtime_suspend callback, so the driver has to be able to
> cope with that too.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

If not too late:

Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

> ---
>
> -> v2: Drop the changes in main.c, as the logic implemented by them
>        previously is now going to be implemented in the PCI bus type
>        and the ACPI PM domain directly.
>
> ---
>  Documentation/driver-api/pm/devices.rst |   20 ++++++++++++++++++++
>  drivers/base/power/main.c               |    3 +++
>  include/linux/pm.h                      |    8 ++++++++
>  3 files changed, 31 insertions(+)
>
> Index: linux-pm/Documentation/driver-api/pm/devices.rst
> ===================================================================
> --- linux-pm.orig/Documentation/driver-api/pm/devices.rst
> +++ linux-pm/Documentation/driver-api/pm/devices.rst
> @@ -766,6 +766,26 @@ the state of devices (possibly except fo
>  from their ``->prepare`` and ``->suspend`` callbacks (or equivalent) *before*
>  invoking device drivers' ``->suspend`` callbacks (or equivalent).
>
> +Some bus types and PM domains have a policy to resume all devices from runtime
> +suspend upfront in their ``->suspend`` callbacks, but that may not be really
> +necessary if the driver of the device can cope with runtime-suspended devices.
> +The driver can indicate that by setting ``DPM_FLAG_SMART_SUSPEND`` in
> +:c:member:`power.driver_flags` at the probe time, by passing it to the
> +:c:func:`dev_pm_set_driver_flags` helper.  That also may cause middle-layer code
> +(bus types, PM domains etc.) to skip the ``->suspend_late`` and
> +``->suspend_noirq`` callbacks provided by the driver if the device remains in
> +runtime suspend at the beginning of the ``suspend_late`` phase of system-wide
> +suspend (or in the ``poweroff_late`` phase of hibernation), when runtime PM
> +has been disabled for it, under the assumption that its state should not change
> +after that point until the system-wide transition is over.  If that happens, the
> +driver's system-wide resume callbacks, if present, may still be invoked during
> +the subsequent system-wide resume transition and the device's runtime power
> +management status may be set to "active" before enabling runtime PM for it,
> +so the driver must be prepared to cope with the invocation of its system-wide
> +resume callbacks back-to-back with its ``->runtime_suspend`` one (without the
> +intervening ``->runtime_resume`` and so on) and the final state of the device
> +must reflect the "active" status for runtime PM in that case.
> +
>  During system-wide resume from a sleep state it's easiest to put devices into
>  the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`.
>  Refer to that document for more information regarding this particular issue as
> Index: linux-pm/drivers/base/power/main.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/main.c
> +++ linux-pm/drivers/base/power/main.c
> @@ -1652,6 +1652,9 @@ static int device_prepare(struct device
>         if (dev->power.syscore)
>                 return 0;
>
> +       WARN_ON(dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
> +               !pm_runtime_enabled(dev));
> +
>         /*
>          * If a device's parent goes into runtime suspend at the wrong time,
>          * it won't be possible to resume the device.  To prevent this we
> Index: linux-pm/include/linux/pm.h
> ===================================================================
> --- linux-pm.orig/include/linux/pm.h
> +++ linux-pm/include/linux/pm.h
> @@ -558,6 +558,7 @@ struct pm_subsys_data {
>   *
>   * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
>   * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
> + * SMART_SUSPEND: No need to resume the device from runtime suspend.
>   *
>   * Setting SMART_PREPARE instructs bus types and PM domains which may want
>   * system suspend/resume callbacks to be skipped for the device to return 0 from
> @@ -565,9 +566,16 @@ struct pm_subsys_data {
>   * other words, the system suspend/resume callbacks can only be skipped for the
>   * device if its driver doesn't object against that).  This flag has no effect
>   * if NEVER_SKIP is set.
> + *
> + * Setting SMART_SUSPEND instructs bus types and PM domains which may want to
> + * runtime resume the device upfront during system suspend that doing so is not
> + * necessary from the driver's perspective.  It also may cause them to skip
> + * invocations of the ->suspend_late and ->suspend_noirq callbacks provided by
> + * the driver if they decide to leave the device in runtime suspend.
>   */
>  #define DPM_FLAG_NEVER_SKIP    BIT(0)
>  #define DPM_FLAG_SMART_PREPARE BIT(1)
> +#define DPM_FLAG_SMART_SUSPEND BIT(2)
>
>  struct dev_pm_info {
>         pm_message_t            power_state;
>

* Re: [PATCH v2 3/6] PM / core: Add SMART_SUSPEND driver flag
  2017-11-06  8:09     ` Ulf Hansson
@ 2017-11-06 11:23       ` Rafael J. Wysocki
  0 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-06 11:23 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J. Wysocki, Linux PM, Bjorn Helgaas, Alan Stern,
	Greg Kroah-Hartman, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Andy Shevchenko,
	Kevin Hilman

On Mon, Nov 6, 2017 at 9:09 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On 28 October 2017 at 00:22, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> Define and document a SMART_SUSPEND flag to instruct bus types and PM
>> domains that the system suspend callbacks provided by the driver can
>> cope with runtime-suspended devices, so from the driver's perspective
>> it should be safe to leave devices in runtime suspend during system
>> suspend.
>>
>> Setting that flag may also cause middle-layer code (bus types,
>> PM domains etc.) to skip invocations of the ->suspend_late and
>> ->suspend_noirq callbacks provided by the driver if the device
>> is in runtime suspend at the beginning of the "late" phase of
>> the system-wide suspend transition, in which case the driver's
>> system-wide resume callbacks may be invoked back-to-back with
>> its ->runtime_suspend callback, so the driver has to be able to
>> cope with that too.
>>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>
> If not too late:
>
> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

Not too late, thanks!

I'll be sending the next batch of this shortly. :-)

* [PATCH v2 0/6] PM / sleep: Driver flags for system suspend/resume (part 2)
  2017-10-27 22:11 ` [PATCH v2 0/6] PM / sleep: Driver flags for system suspend/resume (part 1) Rafael J. Wysocki
                     ` (5 preceding siblings ...)
  2017-10-27 22:30   ` [PATCH v2 6/6] ACPI " Rafael J. Wysocki
@ 2017-11-08  0:41   ` Rafael J. Wysocki
  2017-11-08 13:25     ` [PATCH v2 1/6] PM / core: Add LEAVE_SUSPENDED driver flag Rafael J. Wysocki
                       ` (6 more replies)
  6 siblings, 7 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-08  0:41 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

Hi All,

This is a follow-up to the first part of the PM driver flags series,
sent some time ago with an intro as follows:

On Saturday, October 28, 2017 12:11:55 AM CET Rafael J. Wysocki wrote:
> The following part of the original cover letter still applies:
> 
> On Monday, October 16, 2017 3:12:35 AM CEST Rafael J. Wysocki wrote:
> > 
> > This work was triggered by attempts to fix and optimize PM in the
> > i2c-designware-platdev driver that ended up with adding a couple of
> > flags to the driver's internal data structures for the tracking of
> > device state (https://marc.info/?l=linux-acpi&m=150629646805636&w=2).
> > That approach is sort of suboptimal, though, because other drivers will
> > probably want to do similar things and if all of them need to use internal
> > flags for that, quite a bit of code duplication may ensue at least.
> > 
> > That can be avoided in a couple of ways and one of them is to provide a means
> > for drivers to tell the core what to do and to make the core take care of it
> > if told to do so.  Hence, the idea to use driver flags for system-wide PM
> > that was briefly discussed during the LPC in LA last month.
> 
> [...]
> 
> > What can work (and this is the only strategy that can work AFAICS) is to
> > point different callback pointers *in* *a* *driver* to the same routine
> > if the driver wants to reuse that code.  That actually will work for PCI
> > and USB drivers today, at least most of the time, but unfortunately there
> > are problems with it for, say, platform devices.
> > 
> > The first problem is the requirement to track the status of the device
> > (suspended vs not suspended) in the callbacks, because the system-wide PM
> > code in the PM core doesn't do that.  The runtime PM framework does it, so
> > this means adding some extra code which isn't necessary for runtime PM to
> > the callback routines and that is not particularly nice.
> > 
> > The second problem is that, if the driver wants to do anything in its
> > ->suspend callback, it generally has to prevent runtime suspend of the
> > device from taking place in parallel with that, which is quite cumbersome.
> > Usually, that is taken care of by resuming the device from runtime suspend
> > upfront, but generally doing that is wasteful (there may be no real need to
> > resume the device except for the fact that the code is designed this way).
> > 
> > On top of the above, there are optimizations to be made, like leaving certain
> > devices in suspend after system resume to avoid wasting time on waiting for
> > them to resume before user space can run again and similar.
> > 
> > This patch series focuses on addressing those problems so as to make it
> > easier to reuse callback routines by pointing different callback pointers
> > to them in device drivers.  The flags introduced here are to instruct the
> > PM core and middle layers (whatever they are) on how the driver wants the
> > device to be handled and then the driver has to provide callbacks to match
> > these instructions and the rest should be taken care of by the code above it.
> > 
> > The flags are introduced one by one to avoid making too many changes in
> > one go and to allow things to be explained better (hopefully).  They mostly
> > are mutually independent with some clearly documented exceptions.
> 
> but I had to rework the core patches to address the problem with the
> generic power domains (genpd) framework pointed out by Ulf.
> 
> Namely, genpd expects its "noirq" callbacks to be invoked for devices in
> runtime suspend too and it has valid reasons for that, so its "noirq"
> callbacks can never be skipped, even for devices with the SMART_SUSPEND
> flag set.  For this reason, the logic related to DPM_FLAG_SMART_SUSPEND
> had to be moved from the core to the PCI bus type and the ACPI PM domain
> which are mostly affected by it anyway.  The code after the changes looks
> more straightforward to me, but it generally is more code and some patterns
> had to be repeated in a few places.

I promised to send the rest of the series then:

> I will send the core patches for the remaining two flags introduced by the
> original series separately and the intel-lpss and i2c-designware ones will
> be posted when the core patches have been reviewed and agreed on.

and here it goes.

It actually only adds support for one additional flag, namely for
DPM_FLAG_LEAVE_SUSPENDED, to the PM core (basic bits), PCI bus type and the
ACPI PM domain.

That part of the series (patches [1-3/6]) is rather straightforward and, as far
as PCI and the ACPI PM domain are concerned, it should be functionally equivalent
to the previous version of the set, so I retained Greg's ACKs on these patches.

The other part (patches [4-6/6]) is sort of new, as it makes the PM core
carry out optimizations for devices with DPM_FLAG_LEAVE_SUSPENDED and/or
DPM_FLAG_SMART_SUSPEND set where the "noirq", "early" and "late" system-wide
PM callbacks provided by the drivers are invoked by the core directly.  That
part basically allows platform drivers, for instance, to reuse runtime PM
callbacks (by pointing ->suspend_late and ->resume_early to them) without
adding extra checks to them, as long as they are called directly by the core
(or the ACPI PM domain).
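
As an illustration of the callback reuse described above, here is a minimal
userspace sketch.  It is not kernel code: struct toy_device, struct toy_pm_ops
and every toy_* function are hypothetical stand-ins invented for this example,
and the skip condition in toy_core_suspend_late() only models the documented
SMART_SUSPEND behavior under that assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct device. */
struct toy_device {
	bool powered;
	bool runtime_suspended;
	bool smart_suspend;	/* models DPM_FLAG_SMART_SUSPEND */
	int power_down_calls;
};

typedef int (*toy_pm_callback_t)(struct toy_device *dev);

/* Hypothetical, cut-down stand-in for struct dev_pm_ops. */
struct toy_pm_ops {
	toy_pm_callback_t runtime_suspend;
	toy_pm_callback_t runtime_resume;
	toy_pm_callback_t suspend_late;
	toy_pm_callback_t resume_early;
};

/* One routine per direction, with no "am I already suspended?" check. */
int toy_power_down(struct toy_device *dev)
{
	dev->powered = false;
	dev->power_down_calls++;
	return 0;
}

int toy_power_up(struct toy_device *dev)
{
	dev->powered = true;
	return 0;
}

/* Different callback pointers share the same routines. */
const struct toy_pm_ops toy_ops = {
	.runtime_suspend = toy_power_down,
	.runtime_resume  = toy_power_up,
	.suspend_late    = toy_power_down,
	.resume_early    = toy_power_up,
};

/* Models runtime PM suspending the device and recording its status. */
int toy_runtime_suspend(struct toy_device *dev, const struct toy_pm_ops *ops)
{
	int ret = ops->runtime_suspend(dev);

	if (!ret)
		dev->runtime_suspended = true;
	return ret;
}

/* Models the layer above the driver in the "late" suspend phase. */
int toy_core_suspend_late(struct toy_device *dev, const struct toy_pm_ops *ops)
{
	/* Skip the driver callback for a device left in runtime suspend. */
	if (dev->smart_suspend && dev->runtime_suspended)
		return 0;

	return ops->suspend_late ? ops->suspend_late(dev) : 0;
}
```

In this model the power-down routine runs once whether the device was
runtime-suspended beforehand or not, which is the property that lets the
routine itself stay free of state checks.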

The series should apply on top of linux-next from today.

Thanks,
Rafael

* Re: [PATCH 09/12] PM / mfd: intel-lpss: Use DPM_FLAG_SMART_SUSPEND
  2017-11-01 20:26         ` Rafael J. Wysocki
@ 2017-11-08 11:08           ` Lee Jones
  0 siblings, 0 replies; 135+ messages in thread
From: Lee Jones @ 2017-11-08 11:08 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linux PM, Bjorn Helgaas, Alan Stern,
	Greg Kroah-Hartman, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman, Wolfram Sang, linux-i2c

On Wed, 01 Nov 2017, Rafael J. Wysocki wrote:

> On Wed, Nov 1, 2017 at 10:28 AM, Lee Jones <lee.jones@linaro.org> wrote:
> > On Tue, 31 Oct 2017, Rafael J. Wysocki wrote:
> >
> >> On Tue, Oct 31, 2017 at 4:09 PM, Lee Jones <lee.jones@linaro.org> wrote:
> >> > On Mon, 16 Oct 2017, Rafael J. Wysocki wrote:
> >> >
> >> >> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >> >>
> >> >> Make the intel-lpss driver set DPM_FLAG_SMART_SUSPEND for its
> >> >> devices which will allow them to stay in runtime suspend during
> >> >> system suspend unless they need to be reconfigured for some reason.
> >> >>
> >> >> Also make it avoid resuming its child devices if they have
> >> >> DPM_FLAG_SMART_SUSPEND set to allow them to remain in runtime
> >> >> suspend during system suspend.
> >> >>
> >> >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >> >> ---
> >> >>  drivers/mfd/intel-lpss.c |    6 +++++-
> >> >>  1 file changed, 5 insertions(+), 1 deletion(-)
> >> >
> >> > Is this patch independent?
> >>
> >> It depends on the flag definition at least, but functionally it also
> >> depends on the PCI support for the flag.
> >
> > No problem.  Which tree do you propose this goes through?
> 
> linux-pm.git if that's not a problem as the patches it depends on will
> go through it too.
> 
> That said I'll resend it when the core patches it depends on are ready.

It's fine by me.

Please check to see if there are any clashes with MFD.  If there are,
I'll need a (small) pull-request from you.

-- 
Lee Jones
Linaro STMicroelectronics Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog

* [PATCH v2 1/6] PM / core: Add LEAVE_SUSPENDED driver flag
  2017-11-08  0:41   ` [PATCH v2 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
@ 2017-11-08 13:25     ` Rafael J. Wysocki
  2017-11-10  9:09       ` Ulf Hansson
  2017-11-08 13:28     ` [PATCH v2 2/6] PCI / PM: Support for " Rafael J. Wysocki
                       ` (5 subsequent siblings)
  6 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-08 13:25 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to
instruct the PM core and middle-layer (bus type, PM domain, etc.)
code that it is desirable to leave the device in runtime suspend
after system-wide transitions to the working state (for example,
the device may be slow to resume and it may be better to avoid
resuming it right away).

Generally, the middle-layer code involved in the handling of the
device is expected to indicate to the PM core whether or not the
device may be left in suspend with the help of the device's
power.may_skip_resume status bit.  That has to happen in the "noirq"
phase of the preceding system suspend (or analogous) transition.
The middle layer is then responsible for handling the device as
appropriate in its "noirq" resume callback which is executed
regardless of whether or not the device may be left suspended, but
the other resume callbacks (except for ->complete) will be skipped
automatically by the core if the device really can be left in
suspend.

The additional power.must_resume status bit introduced for the
implementation of this mechanism is used internally by the PM core
to track the requirement to resume the device (which may depend on
its children etc).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/driver-api/pm/devices.rst |   24 ++++++++++-
 drivers/base/power/main.c               |   65 +++++++++++++++++++++++++++++---
 include/linux/pm.h                      |   14 +++++-
 3 files changed, 93 insertions(+), 10 deletions(-)
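
The interplay of the two status bits can be sketched as a small userspace
model.  This is an illustration only: struct toy_dev and the toy_* functions
are hypothetical names, the "restoring" argument stands in for the
PM_EVENT_RESTORE check, and device-link suppliers are left out so that only
the parent relationship is modeled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant bits of struct device. */
struct toy_dev {
	struct toy_dev *parent;
	bool leave_suspended;	/* models DPM_FLAG_LEAVE_SUSPENDED */
	bool may_skip_resume;	/* set by the middle layer in "noirq" suspend */
	bool must_resume;	/* owned by the (modeled) PM core */
};

/* Models the tail of __device_suspend_noirq() after a successful callback. */
void toy_suspend_noirq(struct toy_dev *dev)
{
	if (dev->leave_suspended)
		dev->must_resume = dev->must_resume || !dev->may_skip_resume;
	else
		dev->must_resume = true;

	/* A device that must be resumed forces its parent to resume too. */
	if (dev->must_resume && dev->parent)
		dev->parent->must_resume = true;
}

/* Models dev_pm_may_skip_resume(); "restoring" is an assumed parameter. */
bool toy_may_skip_resume(const struct toy_dev *dev, bool restoring)
{
	return !dev->must_resume && !restoring;
}
```

Because children go through the "noirq" suspend phase before their parents, a
child without the flag sets must_resume on its parent first, and the parent
then cannot be left suspended even if its own flag and may_skip_resume bit are
both set.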

Index: linux-pm/include/linux/pm.h
===================================================================
--- linux-pm.orig/include/linux/pm.h
+++ linux-pm/include/linux/pm.h
@@ -559,6 +559,7 @@ struct pm_subsys_data {
  * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
  * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
  * SMART_SUSPEND: No need to resume the device from runtime suspend.
+ * LEAVE_SUSPENDED: Avoid resuming the device during system resume if possible.
  *
  * Setting SMART_PREPARE instructs bus types and PM domains which may want
  * system suspend/resume callbacks to be skipped for the device to return 0 from
@@ -572,10 +573,14 @@ struct pm_subsys_data {
  * necessary from the driver's perspective.  It also may cause them to skip
  * invocations of the ->suspend_late and ->suspend_noirq callbacks provided by
  * the driver if they decide to leave the device in runtime suspend.
+ *
+ * Setting LEAVE_SUSPENDED informs the PM core and middle-layer code that the
+ * driver prefers the device to be left in runtime suspend after system resume.
  */
-#define DPM_FLAG_NEVER_SKIP	BIT(0)
-#define DPM_FLAG_SMART_PREPARE	BIT(1)
-#define DPM_FLAG_SMART_SUSPEND	BIT(2)
+#define DPM_FLAG_NEVER_SKIP		BIT(0)
+#define DPM_FLAG_SMART_PREPARE		BIT(1)
+#define DPM_FLAG_SMART_SUSPEND		BIT(2)
+#define DPM_FLAG_LEAVE_SUSPENDED	BIT(3)
 
 struct dev_pm_info {
 	pm_message_t		power_state;
@@ -597,6 +602,8 @@ struct dev_pm_info {
 	bool			wakeup_path:1;
 	bool			syscore:1;
 	bool			no_pm_callbacks:1;	/* Owned by the PM core */
+	unsigned int		must_resume:1;	/* Owned by the PM core */
+	unsigned int		may_skip_resume:1;	/* Set by subsystems */
 #else
 	unsigned int		should_wakeup:1;
 #endif
@@ -765,6 +772,7 @@ extern int pm_generic_poweroff_late(stru
 extern int pm_generic_poweroff(struct device *dev);
 extern void pm_generic_complete(struct device *dev);
 
+extern bool dev_pm_may_skip_resume(struct device *dev);
 extern bool dev_pm_smart_suspend_and_suspended(struct device *dev);
 
 #else /* !CONFIG_PM_SLEEP */
Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -528,6 +528,18 @@ static void dpm_watchdog_clear(struct dp
 /*------------------------- Resume routines -------------------------*/
 
 /**
+ * dev_pm_may_skip_resume - System-wide device resume optimization check.
+ * @dev: Target device.
+ *
+ * Checks whether or not the device may be left in suspend after a system-wide
+ * transition to the working state.
+ */
+bool dev_pm_may_skip_resume(struct device *dev)
+{
+	return !dev->power.must_resume && pm_transition.event != PM_EVENT_RESTORE;
+}
+
+/**
  * device_resume_noirq - Execute a "noirq resume" callback for given device.
  * @dev: Device to handle.
  * @state: PM transition of the system being carried out.
@@ -575,6 +587,12 @@ static int device_resume_noirq(struct de
 	error = dpm_run_callback(callback, dev, state, info);
 	dev->power.is_noirq_suspended = false;
 
+	if (dev_pm_may_skip_resume(dev)) {
+		pm_runtime_set_suspended(dev);
+		dev->power.is_late_suspended = false;
+		dev->power.is_suspended = false;
+	}
+
  Out:
 	complete_all(&dev->power.completion);
 	TRACE_RESUME(error);
@@ -1076,6 +1094,22 @@ static pm_message_t resume_event(pm_mess
 	return PMSG_ON;
 }
 
+static void dpm_superior_set_must_resume(struct device *dev)
+{
+	struct device_link *link;
+	int idx;
+
+	if (dev->parent)
+		dev->parent->power.must_resume = true;
+
+	idx = device_links_read_lock();
+
+	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node)
+		link->supplier->power.must_resume = true;
+
+	device_links_read_unlock(idx);
+}
+
 /**
  * __device_suspend_noirq - Execute a "noirq suspend" callback for given device.
  * @dev: Device to handle.
@@ -1127,10 +1161,27 @@ static int __device_suspend_noirq(struct
 	}
 
 	error = dpm_run_callback(callback, dev, state, info);
-	if (!error)
-		dev->power.is_noirq_suspended = true;
-	else
+	if (error) {
 		async_error = error;
+		goto Complete;
+	}
+
+	dev->power.is_noirq_suspended = true;
+
+	if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED)) {
+		/*
+		 * The only safe strategy here is to require that if the device
+		 * may not be left in suspend, resume callbacks must be invoked
+		 * for it.
+		 */
+		dev->power.must_resume = dev->power.must_resume ||
+					!dev->power.may_skip_resume;
+	} else {
+		dev->power.must_resume = true;
+	}
+
+	if (dev->power.must_resume)
+		dpm_superior_set_must_resume(dev);
 
 Complete:
 	complete_all(&dev->power.completion);
@@ -1487,6 +1538,9 @@ static int __device_suspend(struct devic
 		dev->power.direct_complete = false;
 	}
 
+	dev->power.may_skip_resume = false;
+	dev->power.must_resume = false;
+
 	dpm_watchdog_set(&wd, dev);
 	device_lock(dev);
 
@@ -1652,8 +1706,9 @@ static int device_prepare(struct device
 	if (dev->power.syscore)
 		return 0;
 
-	WARN_ON(dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
-		!pm_runtime_enabled(dev));
+	WARN_ON(!pm_runtime_enabled(dev) &&
+		dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND |
+					      DPM_FLAG_LEAVE_SUSPENDED));
 
 	/*
 	 * If a device's parent goes into runtime suspend at the wrong time,
Index: linux-pm/Documentation/driver-api/pm/devices.rst
===================================================================
--- linux-pm.orig/Documentation/driver-api/pm/devices.rst
+++ linux-pm/Documentation/driver-api/pm/devices.rst
@@ -788,6 +788,26 @@ must reflect the "active" status for run
 
 During system-wide resume from a sleep state it's easiest to put devices into
 the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`.
-Refer to that document for more information regarding this particular issue as
+[Refer to that document for more information regarding this particular issue as
 well as for information on the device runtime power management framework in
-general.
+general.]
+
+However, it may be desirable to leave some devices in runtime suspend after
+system transitions to the working state and device drivers can use the
+``DPM_FLAG_LEAVE_SUSPENDED`` flag to indicate to the PM core (and middle-layer
+code) that this is the case.  Whether or not the devices will actually be left
+in suspend may depend on their state before the given system suspend-resume
+cycle and on the type of the system transition under way.  In particular,
+devices are not left suspended if that transition is a restore from hibernation,
+as device states are not guaranteed to be reflected by the information stored in
+the hibernation image in that case.
+
+The middle-layer code involved in the handling of the device has to indicate to
+the PM core if the device may be left in suspend with the help of its
+:c:member:`power.may_skip_resume` status bit.  That has to happen in the "noirq"
+phase of the preceding system-wide suspend (or analogous) transition.  The
+middle layer is then responsible for handling the device as appropriate in its
+"noirq" resume callback, which is executed regardless of whether or not the
+device may be left suspended, but the other resume callbacks (except for
+``->complete``) will be skipped automatically by the PM core if the device
+really can be left in suspend.

* [PATCH v2 2/6] PCI / PM: Support for LEAVE_SUSPENDED driver flag
  2017-11-08  0:41   ` [PATCH v2 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
  2017-11-08 13:25     ` [PATCH v2 1/6] PM / core: Add LEAVE_SUSPENDED driver flag Rafael J. Wysocki
@ 2017-11-08 13:28     ` Rafael J. Wysocki
  2017-11-08 20:38       ` Bjorn Helgaas
  2017-11-08 13:34     ` [PATCH v2 3/6] ACPI / PM: Support for LEAVE_SUSPENDED driver flag in ACPI PM domain Rafael J. Wysocki
                       ` (4 subsequent siblings)
  6 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-08 13:28 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Add support for DPM_FLAG_LEAVE_SUSPENDED to the PCI bus type by
making it (a) set the power.may_skip_resume status bit for devices
that, from its perspective, may be left in suspend after system
wakeup from sleep and (b) return early from pci_pm_resume_noirq()
for devices whose remaining resume callbacks during the transition
under way are going to be skipped by the PM core.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/power/pci.txt |   11 +++++++++++
 drivers/pci/pci-driver.c    |   19 +++++++++++++++++--
 2 files changed, 28 insertions(+), 2 deletions(-)
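
The wakeup-based setting of power.may_skip_resume at the end of
pci_pm_suspend_noirq() can be read as a small truth table.  The sketch below
is a userspace model with hypothetical names (struct toy_wakeup and the toy_*
helpers); it assumes that device_may_wakeup() is true only for a
wakeup-capable device whose wakeup is enabled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the wakeup-related state of a device. */
struct toy_wakeup {
	bool can_wakeup;	/* device is capable of signaling wakeup */
	bool wakeup_enabled;	/* wakeup has been enabled for the device */
};

/* Models device_can_wakeup(). */
bool toy_device_can_wakeup(const struct toy_wakeup *w)
{
	return w->can_wakeup;
}

/* Models device_may_wakeup(): capable and enabled (assumption, see above). */
bool toy_device_may_wakeup(const struct toy_wakeup *w)
{
	return w->can_wakeup && w->wakeup_enabled;
}

/* Models the value assigned to power.may_skip_resume in the patch. */
bool toy_may_skip_resume_setting(const struct toy_wakeup *w)
{
	return toy_device_may_wakeup(w) || !toy_device_can_wakeup(w);
}
```

Under these assumptions the bit ends up clear in exactly one case: the device
is wakeup-capable, but wakeup has not been enabled for it.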

Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -699,7 +699,7 @@ static void pci_pm_complete(struct devic
 	pm_generic_complete(dev);
 
 	/* Resume device if platform firmware has put it in reset-power-on */
-	if (dev->power.direct_complete && pm_resume_via_firmware()) {
+	if (pm_runtime_suspended(dev) && pm_resume_via_firmware()) {
 		pci_power_t pre_sleep_state = pci_dev->current_state;
 
 		pci_update_current_state(pci_dev, pci_dev->current_state);
@@ -783,8 +783,10 @@ static int pci_pm_suspend_noirq(struct d
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 
-	if (dev_pm_smart_suspend_and_suspended(dev))
+	if (dev_pm_smart_suspend_and_suspended(dev)) {
+		dev->power.may_skip_resume = true;
 		return 0;
+	}
 
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend_late(dev, PMSG_SUSPEND);
@@ -838,6 +840,16 @@ static int pci_pm_suspend_noirq(struct d
 Fixup:
 	pci_fixup_device(pci_fixup_suspend_late, pci_dev);
 
+	/*
+	 * If the target system sleep state is suspend-to-idle, it is sufficient
+	 * to check whether or not the device's wakeup settings are good for
+	 * runtime PM.  Otherwise, the pm_resume_via_firmware() check will cause
+	 * pci_pm_complete() to take care of fixing up the device's state
+	 * anyway, if need be.
+	 */
+	dev->power.may_skip_resume = device_may_wakeup(dev) ||
+					!device_can_wakeup(dev);
+
 	return 0;
 }
 
@@ -847,6 +859,9 @@ static int pci_pm_resume_noirq(struct de
 	struct device_driver *drv = dev->driver;
 	int error = 0;
 
+	if (dev_pm_may_skip_resume(dev))
+		return 0;
+
 	/*
 	 * Devices with DPM_FLAG_SMART_SUSPEND may be left in runtime suspend
 	 * during system suspend, so update their runtime PM status to "active"
Index: linux-pm/Documentation/power/pci.txt
===================================================================
--- linux-pm.orig/Documentation/power/pci.txt
+++ linux-pm/Documentation/power/pci.txt
@@ -994,6 +994,17 @@ into D0 going forward), but if it is in
 the function will set the power.direct_complete flag for it (to make the PM core
 skip the subsequent "thaw" callbacks for it) and return.
 
+Setting the DPM_FLAG_LEAVE_SUSPENDED flag means that the driver prefers the
+device to be left in suspend after system-wide transitions to the working state.
+This flag is checked by the PM core, but the PCI bus type informs the PM core
+which devices may be left in suspend from its perspective (that happens during
+the "noirq" phase of system-wide suspend and analogous transitions) and next it
+uses the dev_pm_may_skip_resume() helper to decide whether or not to return from
+pci_pm_resume_noirq() early, as the PM core will skip the remaining resume
+callbacks for the device during the transition under way and will set its
+runtime PM status to "suspended" if dev_pm_may_skip_resume() returns "true" for
+it.
+
 3.2. Device Runtime Power Management
 ------------------------------------
 In addition to providing device power management callbacks PCI device drivers

* [PATCH v2 3/6] ACPI / PM: Support for LEAVE_SUSPENDED driver flag in ACPI PM domain
  2017-11-08  0:41   ` [PATCH v2 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
  2017-11-08 13:25     ` [PATCH v2 1/6] PM / core: Add LEAVE_SUSPENDED driver flag Rafael J. Wysocki
  2017-11-08 13:28     ` [PATCH v2 2/6] PCI / PM: Support for " Rafael J. Wysocki
@ 2017-11-08 13:34     ` Rafael J. Wysocki
  2017-11-08 13:37     ` [PATCH v2 4/6] PM / core: Add helpers for subsystem callback selection Rafael J. Wysocki
                       ` (3 subsequent siblings)
  6 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-08 13:34 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Add support for DPM_FLAG_LEAVE_SUSPENDED to the ACPI PM domain by
making it (a) set the power.may_skip_resume status bit for devices
that, from its perspective, may be left in suspend after system
wakeup from sleep and (b) return early from acpi_subsys_resume_noirq()
for devices whose remaining resume callbacks during the transition
under way are going to be skipped by the PM core.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/acpi/device_pm.c |   27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)
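
The three exit paths of the reworked acpi_subsys_suspend_noirq() can be
modeled in userspace as follows.  Everything here is a hypothetical stand-in:
struct toy_acpi_dev and toy_acpi_suspend_noirq() are made-up names, and the
generic_noirq_ret field simply plays the role of the value returned by
pm_generic_suspend_noirq().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state consulted by the ACPI PM domain. */
struct toy_acpi_dev {
	bool smart_suspend_and_suspended; /* models the helper's result */
	int generic_noirq_ret;	/* models pm_generic_suspend_noirq() */
	bool can_wakeup;
	bool wakeup_enabled;
	bool may_skip_resume;	/* assumed cleared earlier by the PM core */
};

int toy_acpi_suspend_noirq(struct toy_acpi_dev *dev)
{
	int ret;

	/* Path 1: device left in runtime suspend, resume may be skipped. */
	if (dev->smart_suspend_and_suspended) {
		dev->may_skip_resume = true;
		return 0;
	}

	/* Path 2: callback failure, may_skip_resume is left untouched. */
	ret = dev->generic_noirq_ret;
	if (ret)
		return ret;

	/* Path 3: decide based on the device's wakeup settings. */
	dev->may_skip_resume = (dev->can_wakeup && dev->wakeup_enabled) ||
				!dev->can_wakeup;
	return 0;
}
```

The point of the sketch is the error path: when the generic callback fails,
the bit keeps the "false" value that the core patch assigns in
__device_suspend().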

Index: linux-pm/drivers/acpi/device_pm.c
===================================================================
--- linux-pm.orig/drivers/acpi/device_pm.c
+++ linux-pm/drivers/acpi/device_pm.c
@@ -987,7 +987,7 @@ void acpi_subsys_complete(struct device
 	 * the sleep state it is going out of and it has never been resumed till
 	 * now, resume it in case the firmware powered it up.
 	 */
-	if (dev->power.direct_complete && pm_resume_via_firmware())
+	if (pm_runtime_suspended(dev) && pm_resume_via_firmware())
 		pm_request_resume(dev);
 }
 EXPORT_SYMBOL_GPL(acpi_subsys_complete);
@@ -1036,10 +1036,28 @@ EXPORT_SYMBOL_GPL(acpi_subsys_suspend_la
  */
 int acpi_subsys_suspend_noirq(struct device *dev)
 {
-	if (dev_pm_smart_suspend_and_suspended(dev))
+	int ret;
+
+	if (dev_pm_smart_suspend_and_suspended(dev)) {
+		dev->power.may_skip_resume = true;
 		return 0;
+	}
 
-	return pm_generic_suspend_noirq(dev);
+	ret = pm_generic_suspend_noirq(dev);
+	if (ret)
+		return ret;
+
+	/*
+	 * If the target system sleep state is suspend-to-idle, it is sufficient
+	 * to check whether or not the device's wakeup settings are good for
+	 * runtime PM.  Otherwise, the pm_resume_via_firmware() check will cause
+	 * acpi_subsys_complete() to take care of fixing up the device's state
+	 * anyway, if need be.
+	 */
+	dev->power.may_skip_resume = device_may_wakeup(dev) ||
+					!device_can_wakeup(dev);
+
+	return 0;
 }
 EXPORT_SYMBOL_GPL(acpi_subsys_suspend_noirq);
 
@@ -1049,6 +1067,9 @@ EXPORT_SYMBOL_GPL(acpi_subsys_suspend_no
  */
 int acpi_subsys_resume_noirq(struct device *dev)
 {
+	if (dev_pm_may_skip_resume(dev))
+		return 0;
+
 	/*
 	 * Devices with DPM_FLAG_SMART_SUSPEND may be left in runtime suspend
 	 * during system suspend, so update their runtime PM status to "active"

* [PATCH v2 4/6] PM / core: Add helpers for subsystem callback selection
  2017-11-08  0:41   ` [PATCH v2 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
                       ` (2 preceding siblings ...)
  2017-11-08 13:34     ` [PATCH v2 3/6] ACPI / PM: Support for LEAVE_SUSPENDED driver flag in ACPI PM domain Rafael J. Wysocki
@ 2017-11-08 13:37     ` Rafael J. Wysocki
  2017-11-08 13:38     ` [PATCH v2 5/6] PM / core: Direct handling of DPM_FLAG_LEAVE_SUSPENDED Rafael J. Wysocki
                       ` (2 subsequent siblings)
  6 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-08 13:37 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Add helper routines to find and return a suitable subsystem callback
during the "noirq" phases of system suspend/resume (or analogous)
transitions as well as during the "late" phase of system suspend and
the "early" phase of system resume (or analogous) transitions.

The helpers will be called from additional sites going forward.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/base/power/main.c |  196 +++++++++++++++++++++++++++++++---------------
 1 file changed, 136 insertions(+), 60 deletions(-)
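
The selection order implemented by the new helpers can be illustrated with a
userspace model.  The types and functions below (struct sel_ops, struct
sel_device, sel_subsys_resume_noirq_cb(), sel_pick_resume_noirq() and the
sample callbacks) are hypothetical and only mimic the shape of the kernel
code; a single resume_noirq pointer replaces the pm_noirq_op() lookup.

```c
#include <stddef.h>

typedef int (*sel_cb_t)(void);

/* Hypothetical stand-in for struct dev_pm_ops with a single callback. */
struct sel_ops {
	sel_cb_t resume_noirq;
};

/* Hypothetical stand-in for the PM-related pointers of struct device. */
struct sel_device {
	const struct sel_ops *pm_domain;
	const struct sel_ops *type;
	const struct sel_ops *class_pm;
	const struct sel_ops *bus;
	const struct sel_ops *driver;
};

/* Sample callbacks used only to tell the selections apart. */
int sel_type_resume(void) { return 1; }
int sel_bus_resume(void) { return 2; }
int sel_driver_resume(void) { return 3; }

/* Models dpm_subsys_resume_noirq_cb(): first subsystem present wins. */
sel_cb_t sel_subsys_resume_noirq_cb(const struct sel_device *dev,
				    const char **info_p)
{
	sel_cb_t callback;
	const char *info;

	if (dev->pm_domain) {
		info = "noirq power domain ";
		callback = dev->pm_domain->resume_noirq;
	} else if (dev->type) {
		info = "noirq type ";
		callback = dev->type->resume_noirq;
	} else if (dev->class_pm) {
		info = "noirq class ";
		callback = dev->class_pm->resume_noirq;
	} else if (dev->bus) {
		info = "noirq bus ";
		callback = dev->bus->resume_noirq;
	} else {
		return NULL;
	}

	if (info_p)
		*info_p = info;

	return callback;
}

/* Models the caller: fall back to the driver if no subsystem callback. */
sel_cb_t sel_pick_resume_noirq(const struct sel_device *dev,
			       const char **info_p)
{
	sel_cb_t callback = sel_subsys_resume_noirq_cb(dev, info_p);

	if (!callback && dev->driver) {
		if (info_p)
			*info_p = "noirq driver ";
		callback = dev->driver->resume_noirq;
	}

	return callback;
}
```

Note that a PM domain which is present but provides no callback does not fall
through to the type, class or bus in this model; the only fallback is the
driver, which matches the structure of the code being factored out.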

Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -525,6 +525,14 @@ static void dpm_watchdog_clear(struct dp
 #define dpm_watchdog_clear(x)
 #endif
 
+static pm_callback_t dpm_subsys_suspend_noirq_cb(struct device *dev,
+						 pm_message_t state,
+						 const char **info_p);
+
+static pm_callback_t dpm_subsys_suspend_late_cb(struct device *dev,
+						pm_message_t state,
+						const char **info_p);
+
 /*------------------------- Resume routines -------------------------*/
 
 /**
@@ -539,6 +547,35 @@ bool dev_pm_may_skip_resume(struct devic
 	return !dev->power.must_resume && pm_transition.event != PM_EVENT_RESTORE;
 }
 
+static pm_callback_t dpm_subsys_resume_noirq_cb(struct device *dev,
+						pm_message_t state,
+						const char **info_p)
+{
+	pm_callback_t callback;
+	const char *info;
+
+	if (dev->pm_domain) {
+		info = "noirq power domain ";
+		callback = pm_noirq_op(&dev->pm_domain->ops, state);
+	} else if (dev->type && dev->type->pm) {
+		info = "noirq type ";
+		callback = pm_noirq_op(dev->type->pm, state);
+	} else if (dev->class && dev->class->pm) {
+		info = "noirq class ";
+		callback = pm_noirq_op(dev->class->pm, state);
+	} else if (dev->bus && dev->bus->pm) {
+		info = "noirq bus ";
+		callback = pm_noirq_op(dev->bus->pm, state);
+	} else {
+		return NULL;
+	}
+
+	if (info_p)
+		*info_p = info;
+
+	return callback;
+}
+
 /**
  * device_resume_noirq - Execute a "noirq resume" callback for given device.
  * @dev: Device to handle.
@@ -550,8 +587,8 @@ bool dev_pm_may_skip_resume(struct devic
  */
 static int device_resume_noirq(struct device *dev, pm_message_t state, bool async)
 {
-	pm_callback_t callback = NULL;
-	const char *info = NULL;
+	pm_callback_t callback;
+	const char *info;
 	int error = 0;
 
 	TRACE_DEVICE(dev);
@@ -565,19 +602,7 @@ static int device_resume_noirq(struct de
 
 	dpm_wait_for_superior(dev, async);
 
-	if (dev->pm_domain) {
-		info = "noirq power domain ";
-		callback = pm_noirq_op(&dev->pm_domain->ops, state);
-	} else if (dev->type && dev->type->pm) {
-		info = "noirq type ";
-		callback = pm_noirq_op(dev->type->pm, state);
-	} else if (dev->class && dev->class->pm) {
-		info = "noirq class ";
-		callback = pm_noirq_op(dev->class->pm, state);
-	} else if (dev->bus && dev->bus->pm) {
-		info = "noirq bus ";
-		callback = pm_noirq_op(dev->bus->pm, state);
-	}
+	callback = dpm_subsys_resume_noirq_cb(dev, state, &info);
 
 	if (!callback && dev->driver && dev->driver->pm) {
 		info = "noirq driver ";
@@ -686,6 +711,35 @@ void dpm_resume_noirq(pm_message_t state
 	dpm_noirq_end();
 }
 
+static pm_callback_t dpm_subsys_resume_early_cb(struct device *dev,
+						pm_message_t state,
+						const char **info_p)
+{
+	pm_callback_t callback;
+	const char *info;
+
+	if (dev->pm_domain) {
+		info = "early power domain ";
+		callback = pm_late_early_op(&dev->pm_domain->ops, state);
+	} else if (dev->type && dev->type->pm) {
+		info = "early type ";
+		callback = pm_late_early_op(dev->type->pm, state);
+	} else if (dev->class && dev->class->pm) {
+		info = "early class ";
+		callback = pm_late_early_op(dev->class->pm, state);
+	} else if (dev->bus && dev->bus->pm) {
+		info = "early bus ";
+		callback = pm_late_early_op(dev->bus->pm, state);
+	} else {
+		return NULL;
+	}
+
+	if (info_p)
+		*info_p = info;
+
+	return callback;
+}
+
 /**
  * device_resume_early - Execute an "early resume" callback for given device.
  * @dev: Device to handle.
@@ -696,8 +750,8 @@ void dpm_resume_noirq(pm_message_t state
  */
 static int device_resume_early(struct device *dev, pm_message_t state, bool async)
 {
-	pm_callback_t callback = NULL;
-	const char *info = NULL;
+	pm_callback_t callback;
+	const char *info;
 	int error = 0;
 
 	TRACE_DEVICE(dev);
@@ -711,19 +765,7 @@ static int device_resume_early(struct de
 
 	dpm_wait_for_superior(dev, async);
 
-	if (dev->pm_domain) {
-		info = "early power domain ";
-		callback = pm_late_early_op(&dev->pm_domain->ops, state);
-	} else if (dev->type && dev->type->pm) {
-		info = "early type ";
-		callback = pm_late_early_op(dev->type->pm, state);
-	} else if (dev->class && dev->class->pm) {
-		info = "early class ";
-		callback = pm_late_early_op(dev->class->pm, state);
-	} else if (dev->bus && dev->bus->pm) {
-		info = "early bus ";
-		callback = pm_late_early_op(dev->bus->pm, state);
-	}
+	callback = dpm_subsys_resume_early_cb(dev, state, &info);
 
 	if (!callback && dev->driver && dev->driver->pm) {
 		info = "early driver ";
@@ -1110,6 +1152,35 @@ static void dpm_superior_set_must_resume
 	device_links_read_unlock(idx);
 }
 
+static pm_callback_t dpm_subsys_suspend_noirq_cb(struct device *dev,
+						 pm_message_t state,
+						 const char **info_p)
+{
+	pm_callback_t callback;
+	const char *info;
+
+	if (dev->pm_domain) {
+		info = "noirq power domain ";
+		callback = pm_noirq_op(&dev->pm_domain->ops, state);
+	} else if (dev->type && dev->type->pm) {
+		info = "noirq type ";
+		callback = pm_noirq_op(dev->type->pm, state);
+	} else if (dev->class && dev->class->pm) {
+		info = "noirq class ";
+		callback = pm_noirq_op(dev->class->pm, state);
+	} else if (dev->bus && dev->bus->pm) {
+		info = "noirq bus ";
+		callback = pm_noirq_op(dev->bus->pm, state);
+	} else {
+		return NULL;
+	}
+
+	if (info_p)
+		*info_p = info;
+
+	return callback;
+}
+
 /**
  * __device_suspend_noirq - Execute a "noirq suspend" callback for given device.
  * @dev: Device to handle.
@@ -1121,8 +1192,8 @@ static void dpm_superior_set_must_resume
  */
 static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool async)
 {
-	pm_callback_t callback = NULL;
-	const char *info = NULL;
+	pm_callback_t callback;
+	const char *info;
 	int error = 0;
 
 	TRACE_DEVICE(dev);
@@ -1141,19 +1212,7 @@ static int __device_suspend_noirq(struct
 	if (dev->power.syscore || dev->power.direct_complete)
 		goto Complete;
 
-	if (dev->pm_domain) {
-		info = "noirq power domain ";
-		callback = pm_noirq_op(&dev->pm_domain->ops, state);
-	} else if (dev->type && dev->type->pm) {
-		info = "noirq type ";
-		callback = pm_noirq_op(dev->type->pm, state);
-	} else if (dev->class && dev->class->pm) {
-		info = "noirq class ";
-		callback = pm_noirq_op(dev->class->pm, state);
-	} else if (dev->bus && dev->bus->pm) {
-		info = "noirq bus ";
-		callback = pm_noirq_op(dev->bus->pm, state);
-	}
+	callback = dpm_subsys_suspend_noirq_cb(dev, state, &info);
 
 	if (!callback && dev->driver && dev->driver->pm) {
 		info = "noirq driver ";
@@ -1287,6 +1346,35 @@ int dpm_suspend_noirq(pm_message_t state
 	return ret;
 }
 
+static pm_callback_t dpm_subsys_suspend_late_cb(struct device *dev,
+						pm_message_t state,
+						const char **info_p)
+{
+	pm_callback_t callback;
+	const char *info;
+
+	if (dev->pm_domain) {
+		info = "late power domain ";
+		callback = pm_late_early_op(&dev->pm_domain->ops, state);
+	} else if (dev->type && dev->type->pm) {
+		info = "late type ";
+		callback = pm_late_early_op(dev->type->pm, state);
+	} else if (dev->class && dev->class->pm) {
+		info = "late class ";
+		callback = pm_late_early_op(dev->class->pm, state);
+	} else if (dev->bus && dev->bus->pm) {
+		info = "late bus ";
+		callback = pm_late_early_op(dev->bus->pm, state);
+	} else {
+		return NULL;
+	}
+
+	if (info_p)
+		*info_p = info;
+
+	return callback;
+}
+
 /**
  * __device_suspend_late - Execute a "late suspend" callback for given device.
  * @dev: Device to handle.
@@ -1297,8 +1385,8 @@ int dpm_suspend_noirq(pm_message_t state
  */
 static int __device_suspend_late(struct device *dev, pm_message_t state, bool async)
 {
-	pm_callback_t callback = NULL;
-	const char *info = NULL;
+	pm_callback_t callback;
+	const char *info;
 	int error = 0;
 
 	TRACE_DEVICE(dev);
@@ -1319,19 +1407,7 @@ static int __device_suspend_late(struct
 	if (dev->power.syscore || dev->power.direct_complete)
 		goto Complete;
 
-	if (dev->pm_domain) {
-		info = "late power domain ";
-		callback = pm_late_early_op(&dev->pm_domain->ops, state);
-	} else if (dev->type && dev->type->pm) {
-		info = "late type ";
-		callback = pm_late_early_op(dev->type->pm, state);
-	} else if (dev->class && dev->class->pm) {
-		info = "late class ";
-		callback = pm_late_early_op(dev->class->pm, state);
-	} else if (dev->bus && dev->bus->pm) {
-		info = "late bus ";
-		callback = pm_late_early_op(dev->bus->pm, state);
-	}
+	callback = dpm_subsys_suspend_late_cb(dev, state, &info);
 
 	if (!callback && dev->driver && dev->driver->pm) {
 		info = "late driver ";
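
For readers following along, the selection order implemented by the four
helpers above can be modelled in plain userspace C.  This is only a sketch
under stated assumptions: struct mock_device, struct pm_ops,
subsys_suspend_noirq_cb() and demo_noirq() are hypothetical stand-ins, not
the kernel's definitions, and the per-state pm_noirq_op() lookup is reduced
to a single function pointer.  It shows that the first middle layer that is
present (PM domain, then type, then class, then bus) decides the result, even
if that layer supplies no callback for the phase in question.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
typedef int (*pm_callback_t)(void *dev);

struct pm_ops {
	pm_callback_t suspend_noirq;	/* models what pm_noirq_op() would pick */
};

struct mock_device {
	const struct pm_ops *pm_domain_ops;	/* models &dev->pm_domain->ops */
	const struct pm_ops *type_pm;		/* models dev->type->pm */
	const struct pm_ops *class_pm;		/* models dev->class->pm */
	const struct pm_ops *bus_pm;		/* models dev->bus->pm */
};

/* Example callback used only for illustration. */
static int demo_noirq(void *dev)
{
	(void)dev;
	return 0;
}

/*
 * Return the callback of the first middle layer that is present.  A layer
 * that is present but has no callback still wins: NULL is returned and the
 * lower-priority layers are not consulted.
 */
static pm_callback_t subsys_suspend_noirq_cb(const struct mock_device *dev,
					     const char **info_p)
{
	const struct pm_ops *ops;
	const char *info;

	if (dev->pm_domain_ops) {
		info = "noirq power domain ";
		ops = dev->pm_domain_ops;
	} else if (dev->type_pm) {
		info = "noirq type ";
		ops = dev->type_pm;
	} else if (dev->class_pm) {
		info = "noirq class ";
		ops = dev->class_pm;
	} else if (dev->bus_pm) {
		info = "noirq bus ";
		ops = dev->bus_pm;
	} else {
		return NULL;	/* no middle layer at all; *info_p is untouched */
	}

	if (info_p)
		*info_p = info;

	return ops->suspend_noirq;
}
```

Passing NULL for info_p, as in this model, is how a caller can ask only
whether a subsystem-level callback exists without needing the label.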

^ permalink raw reply	[flat|nested] 135+ messages in thread

* [PATCH v2 5/6] PM / core: Direct handling of DPM_FLAG_LEAVE_SUSPENDED
  2017-11-08  0:41   ` [PATCH v2 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
                       ` (3 preceding siblings ...)
  2017-11-08 13:37     ` [PATCH v2 4/6] PM / core: Add helpers for subsystem callback selection Rafael J. Wysocki
@ 2017-11-08 13:38     ` Rafael J. Wysocki
  2017-11-08 13:39     ` [PATCH v2 6/6] PM / core: DPM_FLAG_SMART_SUSPEND optimization Rafael J. Wysocki
  2017-11-12  0:34     ` [PATCH v3 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
  6 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-08 13:38 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Make the PM core handle DPM_FLAG_LEAVE_SUSPENDED directly for
devices whose "noirq", "late" and "early" driver callbacks are
invoked directly by it.

Namely, make it skip all of the system-wide resume callbacks for
such devices with DPM_FLAG_LEAVE_SUSPENDED set if either (a) they
are in runtime suspend during the "noirq" phase of system-wide
suspend (or analogous) transitions, or (b) the system transition
under way is a proper suspend (rather than anything related to
hibernation) and the device's wakeup settings are compatible with
runtime PM (that is, the device cannot generate wakeup signals at
all or it is allowed to wake up the system from sleep).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 Documentation/driver-api/pm/devices.rst |    9 ++++++
 drivers/base/power/main.c               |   47 ++++++++++++++++++++++++++++----
 2 files changed, 51 insertions(+), 5 deletions(-)

Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -589,6 +589,7 @@ static int device_resume_noirq(struct de
 {
 	pm_callback_t callback;
 	const char *info;
+	bool skip_resume;
 	int error = 0;
 
 	TRACE_DEVICE(dev);
@@ -602,23 +603,33 @@ static int device_resume_noirq(struct de
 
 	dpm_wait_for_superior(dev, async);
 
+	skip_resume = dev_pm_may_skip_resume(dev);
+
 	callback = dpm_subsys_resume_noirq_cb(dev, state, &info);
+	if (callback)
+		goto Run;
+
+	if (skip_resume)
+		goto Skip;
 
 	if (!callback && dev->driver && dev->driver->pm) {
 		info = "noirq driver ";
 		callback = pm_noirq_op(dev->driver->pm, state);
 	}
 
+Run:
 	error = dpm_run_callback(callback, dev, state, info);
+
+Skip:
 	dev->power.is_noirq_suspended = false;
 
-	if (dev_pm_may_skip_resume(dev)) {
+	if (skip_resume) {
 		pm_runtime_set_suspended(dev);
 		dev->power.is_late_suspended = false;
 		dev->power.is_suspended = false;
 	}
 
- Out:
+Out:
 	complete_all(&dev->power.completion);
 	TRACE_RESUME(error);
 	return error;
@@ -1194,6 +1205,7 @@ static int __device_suspend_noirq(struct
 {
 	pm_callback_t callback;
 	const char *info;
+	bool direct_cb = false;
 	int error = 0;
 
 	TRACE_DEVICE(dev);
@@ -1213,12 +1225,17 @@ static int __device_suspend_noirq(struct
 		goto Complete;
 
 	callback = dpm_subsys_suspend_noirq_cb(dev, state, &info);
+	if (callback)
+		goto Run;
 
-	if (!callback && dev->driver && dev->driver->pm) {
+	direct_cb = true;
+
+	if (dev->driver && dev->driver->pm) {
 		info = "noirq driver ";
 		callback = pm_noirq_op(dev->driver->pm, state);
 	}
 
+Run:
 	error = dpm_run_callback(callback, dev, state, info);
 	if (error) {
 		async_error = error;
@@ -1228,13 +1245,33 @@ static int __device_suspend_noirq(struct
 	dev->power.is_noirq_suspended = true;
 
 	if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED)) {
+		pm_message_t resume_msg = resume_event(state);
+		bool skip_resume;
+
+		if (direct_cb &&
+		    !dpm_subsys_suspend_late_cb(dev, state, NULL) &&
+		    !dpm_subsys_resume_early_cb(dev, resume_msg, NULL) &&
+		    !dpm_subsys_resume_noirq_cb(dev, resume_msg, NULL)) {
+			/*
+			 * If all of the device driver's "noirq", "late" and
+			 * "early" callbacks are invoked directly by the core,
+			 * the decision to allow the device to stay in suspend
+			 * can be based on its current runtime PM status and its
+			 * wakeup settings.
+			 */
+			skip_resume = pm_runtime_status_suspended(dev) ||
+				(resume_msg.event == PM_EVENT_RESUME &&
+				 (!device_can_wakeup(dev) ||
+				  device_may_wakeup(dev)));
+		} else {
+			skip_resume = dev->power.may_skip_resume;
+		}
 		/*
 		 * The only safe strategy here is to require that if the device
 		 * may not be left in suspend, resume callbacks must be invoked
 		 * for it.
 		 */
-		dev->power.must_resume = dev->power.must_resume ||
-					!dev->power.may_skip_resume;
+		dev->power.must_resume = dev->power.must_resume || !skip_resume;
 	} else {
 		dev->power.must_resume = true;
 	}
Index: linux-pm/Documentation/driver-api/pm/devices.rst
===================================================================
--- linux-pm.orig/Documentation/driver-api/pm/devices.rst
+++ linux-pm/Documentation/driver-api/pm/devices.rst
@@ -811,3 +811,12 @@ middle layer is then responsible for han
 device may be left suspended, but the other resume callbacks (except for
 ``->complete``) will be skipped automatically by the PM core if the device
 really can be left in suspend.
+
+For devices whose "noirq", "late" and "early" driver callbacks are invoked
+directly by the PM core, all of the system-wide resume callbacks are skipped if
+``DPM_FLAG_LEAVE_SUSPENDED`` is set and the device is in runtime suspend during
+the ``suspend_noirq`` (or analogous) phase or the transition under way is a
+proper system suspend (rather than anything related to hibernation) and the
+device's wakeup settings are suitable for runtime PM (that is, it cannot
+generate wakeup signals at all or it is allowed to wake up the system from
+sleep).
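
The condition added to __device_suspend_noirq() above can be read as a small
truth function.  The following userspace sketch restates it under stated
assumptions: struct mock_dev_state, enum resume_kind, direct_skip_resume()
and must_resume_after() are hypothetical names, and the three booleans model
the results of pm_runtime_status_suspended(), device_can_wakeup() and
device_may_wakeup() respectively.

```c
#include <stdbool.h>

/* Hypothetical model of the resume message paired with the suspend one. */
enum resume_kind {
	RESUME_KIND_RESUME,	/* models PM_EVENT_RESUME (proper suspend) */
	RESUME_KIND_THAW,	/* hibernation-related */
	RESUME_KIND_RESTORE	/* hibernation-related */
};

struct mock_dev_state {
	bool runtime_suspended;	/* models pm_runtime_status_suspended(dev) */
	bool can_wakeup;	/* models device_can_wakeup(dev) */
	bool may_wakeup;	/* models device_may_wakeup(dev) */
};

/*
 * The device may stay suspended if it is already runtime-suspended, or if
 * this is a proper suspend/resume cycle and its wakeup settings match what
 * runtime PM would use (cannot signal wakeup at all, or is allowed to).
 */
static bool direct_skip_resume(const struct mock_dev_state *s,
			       enum resume_kind kind)
{
	return s->runtime_suspended ||
	       (kind == RESUME_KIND_RESUME &&
		(!s->can_wakeup || s->may_wakeup));
}

/* must_resume is sticky: once set, a permissive skip_resume cannot clear it. */
static bool must_resume_after(bool must_resume, bool skip_resume)
{
	return must_resume || !skip_resume;
}
```

In this model a device that can signal wakeup but has not been allowed to do
so is the only proper-suspend case that forces a resume when the device is
not runtime-suspended.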

^ permalink raw reply	[flat|nested] 135+ messages in thread

* [PATCH v2 6/6] PM / core: DPM_FLAG_SMART_SUSPEND optimization
  2017-11-08  0:41   ` [PATCH v2 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
                       ` (4 preceding siblings ...)
  2017-11-08 13:38     ` [PATCH v2 5/6] PM / core: Direct handling of DPM_FLAG_LEAVE_SUSPENDED Rafael J. Wysocki
@ 2017-11-08 13:39     ` Rafael J. Wysocki
  2017-11-12  0:34     ` [PATCH v3 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
  6 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-08 13:39 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Make the PM core avoid invoking the "late" and "noirq" system-wide
suspend (or analogous) callbacks for devices that are in runtime
suspend during the corresponding phases of system-wide suspend
(or analogous) transitions.

The underlying observation is that runtime PM is disabled for
devices during those system-wide suspend phases, so their runtime
PM status should not change going forward and if it has not changed
so far, their state should be compatible with the target system
sleep state.

This change really makes it possible for, say, platform device
drivers to re-use runtime PM suspend and resume callbacks by
pointing ->suspend_late and ->resume_early, respectively (and
possibly the analogous hibernation-related callback pointers too),
to them without adding any extra "is the device already suspended?"
type of checks to the callback routines, as long as they will be
invoked directly by the core.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 Documentation/driver-api/pm/devices.rst |   18 +++++----
 drivers/base/power/main.c               |   62 ++++++++++++++++++++++++++++----
 2 files changed, 66 insertions(+), 14 deletions(-)

Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -536,6 +536,24 @@ static pm_callback_t dpm_subsys_suspend_
 /*------------------------- Resume routines -------------------------*/
 
 /**
+ * suspend_event - Return a "suspend" message for given "resume" one.
+ * @resume_msg: PM message representing a system-wide resume transition.
+ */
+static pm_message_t suspend_event(pm_message_t resume_msg)
+{
+	switch (resume_msg.event) {
+	case PM_EVENT_RESUME:
+		return PMSG_SUSPEND;
+	case PM_EVENT_THAW:
+	case PM_EVENT_RESTORE:
+		return PMSG_FREEZE;
+	case PM_EVENT_RECOVER:
+		return PMSG_HIBERNATE;
+	}
+	return PMSG_ON;
+}
+
+/**
  * dev_pm_may_skip_resume - System-wide device resume optimization check.
  * @dev: Target device.
  *
@@ -609,6 +627,25 @@ static int device_resume_noirq(struct de
 	if (callback)
 		goto Run;
 
+	if (dev_pm_smart_suspend_and_suspended(dev)) {
+		pm_message_t suspend_msg = suspend_event(state);
+
+		/*
+		 * If "freeze" callbacks have been skipped during a transition
+		 * related to hibernation, the subsequent "thaw" callbacks must
+		 * be skipped too or bad things may happen.  Otherwise, if the
+		 * device is to be resumed, its runtime PM status must be
+		 * changed to reflect the new configuration.
+		 */
+		if (!dpm_subsys_suspend_late_cb(dev, suspend_msg, NULL) &&
+		    !dpm_subsys_suspend_noirq_cb(dev, suspend_msg, NULL)) {
+			if (state.event == PM_EVENT_THAW)
+				skip_resume = true;
+			else if (!skip_resume)
+				pm_runtime_set_active(dev);
+		}
+	}
+
 	if (skip_resume)
 		goto Skip;
 
@@ -1228,7 +1265,10 @@ static int __device_suspend_noirq(struct
 	if (callback)
 		goto Run;
 
-	direct_cb = true;
+	direct_cb = !dpm_subsys_suspend_late_cb(dev, state, NULL);
+
+	if (dev_pm_smart_suspend_and_suspended(dev) && direct_cb)
+		goto Skip;
 
 	if (dev->driver && dev->driver->pm) {
 		info = "noirq driver ";
@@ -1242,6 +1282,7 @@ Run:
 		goto Complete;
 	}
 
+Skip:
 	dev->power.is_noirq_suspended = true;
 
 	if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED)) {
@@ -1249,7 +1290,6 @@ Run:
 		bool skip_resume;
 
 		if (direct_cb &&
-		    !dpm_subsys_suspend_late_cb(dev, state, NULL) &&
 		    !dpm_subsys_resume_early_cb(dev, resume_msg, NULL) &&
 		    !dpm_subsys_resume_noirq_cb(dev, resume_msg, NULL)) {
 			/*
@@ -1445,17 +1485,27 @@ static int __device_suspend_late(struct
 		goto Complete;
 
 	callback = dpm_subsys_suspend_late_cb(dev, state, &info);
+	if (callback)
+		goto Run;
 
-	if (!callback && dev->driver && dev->driver->pm) {
+	if (dev_pm_smart_suspend_and_suspended(dev) &&
+	    !dpm_subsys_suspend_noirq_cb(dev, state, NULL))
+		goto Skip;
+
+	if (dev->driver && dev->driver->pm) {
 		info = "late driver ";
 		callback = pm_late_early_op(dev->driver->pm, state);
 	}
 
+Run:
 	error = dpm_run_callback(callback, dev, state, info);
-	if (!error)
-		dev->power.is_late_suspended = true;
-	else
+	if (error) {
 		async_error = error;
+		goto Complete;
+	}
+
+Skip:
+	dev->power.is_late_suspended = true;
 
 Complete:
 	TRACE_SUSPEND(error);
Index: linux-pm/Documentation/driver-api/pm/devices.rst
===================================================================
--- linux-pm.orig/Documentation/driver-api/pm/devices.rst
+++ linux-pm/Documentation/driver-api/pm/devices.rst
@@ -777,14 +777,16 @@ The driver can indicate that by setting
 runtime suspend at the beginning of the ``suspend_late`` phase of system-wide
 suspend (or in the ``poweroff_late`` phase of hibernation), when runtime PM
 has been disabled for it, under the assumption that its state should not change
-after that point until the system-wide transition is over.  If that happens, the
-driver's system-wide resume callbacks, if present, may still be invoked during
-the subsequent system-wide resume transition and the device's runtime power
-management status may be set to "active" before enabling runtime PM for it,
-so the driver must be prepared to cope with the invocation of its system-wide
-resume callbacks back-to-back with its ``->runtime_suspend`` one (without the
-intervening ``->runtime_resume`` and so on) and the final state of the device
-must reflect the "active" status for runtime PM in that case.
+after that point until the system-wide transition is over (the PM core itself
+does that for devices whose "noirq", "late" and "early" system-wide PM callbacks
+are executed directly by it).  If that happens, the driver's system-wide resume
+callbacks, if present, may still be invoked during the subsequent system-wide
+resume transition and the device's runtime power management status may be set
+to "active" before enabling runtime PM for it, so the driver must be prepared to
+cope with the invocation of its system-wide resume callbacks back-to-back with
+its ``->runtime_suspend`` one (without the intervening ``->runtime_resume`` and
+so on) and the final state of the device must reflect the "active" status for
+runtime PM in that case.
 
 During system-wide resume from a sleep state it's easiest to put devices into
 the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`.
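
As an illustration of the resume-side logic in this patch, the sketch below
models the suspend_event() mapping and the "thaw" rule from
device_resume_noirq() in userspace C.  The names here (enum sleep_event,
suspend_event_for(), struct resume_noirq_plan, plan_resume_noirq()) are
hypothetical, and the driver_suspend_was_skipped parameter is assumed to
stand for "dev_pm_smart_suspend_and_suspended() is true and there are no
subsystem-level late or noirq suspend callbacks".

```c
#include <stdbool.h>

/* Hypothetical event identifiers modelling the PM_EVENT_* values used above. */
enum sleep_event {
	SE_ON,
	SE_FREEZE,
	SE_SUSPEND,
	SE_HIBERNATE,
	SE_RESUME,
	SE_THAW,
	SE_RESTORE,
	SE_RECOVER
};

/* Map a resume-side event to the suspend-side event that preceded it. */
static enum sleep_event suspend_event_for(enum sleep_event resume_event)
{
	switch (resume_event) {
	case SE_RESUME:
		return SE_SUSPEND;
	case SE_THAW:
	case SE_RESTORE:
		return SE_FREEZE;
	case SE_RECOVER:
		return SE_HIBERNATE;
	default:
		return SE_ON;
	}
}

struct resume_noirq_plan {
	bool skip_resume;		/* skip the driver's resume callbacks */
	bool set_runtime_active;	/* models pm_runtime_set_active(dev) */
};

/*
 * If the driver's suspend callbacks were skipped, "thaw" must be skipped as
 * well; otherwise, when the device is going to be resumed, its runtime PM
 * status has to be switched to "active" first.
 */
static struct resume_noirq_plan plan_resume_noirq(enum sleep_event resume_event,
						  bool skip_resume,
						  bool driver_suspend_was_skipped)
{
	struct resume_noirq_plan plan = { skip_resume, false };

	if (driver_suspend_was_skipped) {
		if (resume_event == SE_THAW)
			plan.skip_resume = true;
		else if (!skip_resume)
			plan.set_runtime_active = true;
	}

	return plan;
}
```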

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 2/6] PCI / PM: Support for LEAVE_SUSPENDED driver flag
  2017-11-08 13:28     ` [PATCH v2 2/6] PCI / PM: Support for " Rafael J. Wysocki
@ 2017-11-08 20:38       ` Bjorn Helgaas
  2017-11-08 21:09         ` Rafael J. Wysocki
  0 siblings, 1 reply; 135+ messages in thread
From: Bjorn Helgaas @ 2017-11-08 20:38 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Ulf Hansson, Andy Shevchenko, Kevin Hilman

On Wed, Nov 08, 2017 at 02:28:18PM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Add support for DPM_FLAG_LEAVE_SUSPENDED to the PCI bus type by
> making it (a) set the power.may_skip_resume status bit for devices
> that, from its perspective, may be left in suspend after system
> wakeup from sleep and (b) return early from pci_pm_resume_noirq()
> for devices whose remaining resume callbacks during the transition
> under way are going to be skipped by the PM core.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

> ---
>  Documentation/power/pci.txt |   11 +++++++++++
>  drivers/pci/pci-driver.c    |   19 +++++++++++++++++--
>  2 files changed, 28 insertions(+), 2 deletions(-)
> 
> Index: linux-pm/drivers/pci/pci-driver.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/pci-driver.c
> +++ linux-pm/drivers/pci/pci-driver.c
> @@ -699,7 +699,7 @@ static void pci_pm_complete(struct devic
>  	pm_generic_complete(dev);
>  
>  	/* Resume device if platform firmware has put it in reset-power-on */
> -	if (dev->power.direct_complete && pm_resume_via_firmware()) {
> +	if (pm_runtime_suspended(dev) && pm_resume_via_firmware()) {
>  		pci_power_t pre_sleep_state = pci_dev->current_state;
>  
>  		pci_update_current_state(pci_dev, pci_dev->current_state);
> @@ -783,8 +783,10 @@ static int pci_pm_suspend_noirq(struct d
>  	struct pci_dev *pci_dev = to_pci_dev(dev);
>  	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
>  
> -	if (dev_pm_smart_suspend_and_suspended(dev))
> +	if (dev_pm_smart_suspend_and_suspended(dev)) {
> +		dev->power.may_skip_resume = true;
>  		return 0;
> +	}
>  
>  	if (pci_has_legacy_pm_support(pci_dev))
>  		return pci_legacy_suspend_late(dev, PMSG_SUSPEND);
> @@ -838,6 +840,16 @@ static int pci_pm_suspend_noirq(struct d
>  Fixup:
>  	pci_fixup_device(pci_fixup_suspend_late, pci_dev);
>  
> +	/*
> +	 * If the target system sleep state is suspend-to-idle, it is sufficient
> +	 * to check whether or not the device's wakeup settings are good for
> +	 * runtime PM.  Otherwise, the pm_resume_via_firmware() check will cause
> +	 * pci_pm_complete() to take care of fixing up the device's state
> +	 * anyway, if need be.
> +	 */
> +	dev->power.may_skip_resume = device_may_wakeup(dev) ||
> +					!device_can_wakeup(dev);
> +
>  	return 0;
>  }
>  
> @@ -847,6 +859,9 @@ static int pci_pm_resume_noirq(struct de
>  	struct device_driver *drv = dev->driver;
>  	int error = 0;
>  
> +	if (dev_pm_may_skip_resume(dev))
> +		return 0;
> +
>  	/*
>  	 * Devices with DPM_FLAG_SMART_SUSPEND may be left in runtime suspend
>  	 * during system suspend, so update their runtime PM status to "active"
> Index: linux-pm/Documentation/power/pci.txt
> ===================================================================
> --- linux-pm.orig/Documentation/power/pci.txt
> +++ linux-pm/Documentation/power/pci.txt
> @@ -994,6 +994,17 @@ into D0 going forward), but if it is in
>  the function will set the power.direct_complete flag for it (to make the PM core
>  skip the subsequent "thaw" callbacks for it) and return.
>  
> +Setting the DPM_FLAG_LEAVE_SUSPENDED flag means that the driver prefers the
> +device to be left in suspend after system-wide transitions to the working state.
> +This flag is checked by the PM core, but the PCI bus type informs the PM core
> +which devices may be left in suspend from its perspective (that happens during
> +the "noirq" phase of system-wide suspend and analogous transitions) and next it
> +uses the dev_pm_may_skip_resume() helper to decide whether or not to return from
> +pci_pm_resume_noirq() early, as the PM core will skip the remaining resume
> +callbacks for the device during the transition under way and will set its
> +runtime PM status to "suspended" if dev_pm_may_skip_resume() returns "true" for
> +it.
> +
>  3.2. Device Runtime Power Management
>  ------------------------------------
>  In addition to providing device power management callbacks PCI device drivers
> 

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 2/6] PCI / PM: Support for LEAVE_SUSPENDED driver flag
  2017-11-08 20:38       ` Bjorn Helgaas
@ 2017-11-08 21:09         ` Rafael J. Wysocki
  0 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-08 21:09 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Linux PM, Bjorn Helgaas, Alan Stern,
	Greg Kroah-Hartman, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

On Wed, Nov 8, 2017 at 9:38 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Wed, Nov 08, 2017 at 02:28:18PM +0100, Rafael J. Wysocki wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> Add support for DPM_FLAG_LEAVE_SUSPENDED to the PCI bus type by
>> making it (a) set the power.may_skip_resume status bit for devices
>> that, from its perspective, may be left in suspend after system
>> wakeup from sleep and (b) return early from pci_pm_resume_noirq()
>> for devices whose remaining resume callbacks during the transition
>> under way are going to be skipped by the PM core.
>>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>

Thanks!

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v2 1/6] PM / core: Add LEAVE_SUSPENDED driver flag
  2017-11-08 13:25     ` [PATCH v2 1/6] PM / core: Add LEAVE_SUSPENDED driver flag Rafael J. Wysocki
@ 2017-11-10  9:09       ` Ulf Hansson
  2017-11-10 23:45         ` Rafael J. Wysocki
  0 siblings, 1 reply; 135+ messages in thread
From: Ulf Hansson @ 2017-11-10  9:09 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman

On 8 November 2017 at 14:25, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to
> instruct the PM core and middle-layer (bus type, PM domain, etc.)
> code that it is desirable to leave the device in runtime suspend
> after system-wide transitions to the working state (for example,
> the device may be slow to resume and it may be better to avoid
> resuming it right away).
>
> Generally, the middle-layer code involved in the handling of the
> device is expected to indicate to the PM core whether or not the
> device may be left in suspend with the help of the device's
> power.may_skip_resume status bit.  That has to happen in the "noirq"
> phase of the preceding system suspend (or analogous) transition.
> The middle layer is then responsible for handling the device as
> appropriate in its "noirq" resume callback which is executed
> regardless of whether or not the device may be left suspended, but
> the other resume callbacks (except for ->complete) will be skipped
> automatically by the core if the device really can be left in
> suspend.

I don't understand why you need to skip invoking resume callbacks to
achieve this behavior. Could you elaborate on that?

Couldn't the PM domain or the middle-layer instead decide what to do?
To me, skipping callbacks from the PM core sounds a bit error-prone,
and I wonder if the general driver author will be able to understand
how to use this flag properly.

That said, as the series doesn't include any changes for drivers making
use of the flag, could you please fold in such a change, as it would
provide a more complete picture?

>
> The additional power.must_resume status bit introduced for the
> implementation of this mechanism is used internally by the PM core
> to track the requirement to resume the device (which may depend on
> its children etc).

Yeah, clearly the PM core needs to be involved, because of the need to
deal with parent/child relations. However, as I kind of indicated
above, couldn't the PM core just set some flag/status bits, which
instruct the middle-layer and PM domain on what to do? That sounds
like an easier approach.

>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
>  Documentation/driver-api/pm/devices.rst |   24 ++++++++++-
>  drivers/base/power/main.c               |   65 +++++++++++++++++++++++++++++---
>  include/linux/pm.h                      |   14 +++++-
>  3 files changed, 93 insertions(+), 10 deletions(-)
>
> Index: linux-pm/include/linux/pm.h
> ===================================================================
> --- linux-pm.orig/include/linux/pm.h
> +++ linux-pm/include/linux/pm.h
> @@ -559,6 +559,7 @@ struct pm_subsys_data {
>   * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
>   * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
>   * SMART_SUSPEND: No need to resume the device from runtime suspend.
> + * LEAVE_SUSPENDED: Avoid resuming the device during system resume if possible.
>   *
>   * Setting SMART_PREPARE instructs bus types and PM domains which may want
>   * system suspend/resume callbacks to be skipped for the device to return 0 from
> @@ -572,10 +573,14 @@ struct pm_subsys_data {
>   * necessary from the driver's perspective.  It also may cause them to skip
>   * invocations of the ->suspend_late and ->suspend_noirq callbacks provided by
>   * the driver if they decide to leave the device in runtime suspend.
> + *
> + * Setting LEAVE_SUSPENDED informs the PM core and middle-layer code that the
> + * driver prefers the device to be left in runtime suspend after system resume.
>   */
> -#define DPM_FLAG_NEVER_SKIP    BIT(0)
> -#define DPM_FLAG_SMART_PREPARE BIT(1)
> -#define DPM_FLAG_SMART_SUSPEND BIT(2)
> +#define DPM_FLAG_NEVER_SKIP            BIT(0)
> +#define DPM_FLAG_SMART_PREPARE         BIT(1)
> +#define DPM_FLAG_SMART_SUSPEND         BIT(2)
> +#define DPM_FLAG_LEAVE_SUSPENDED       BIT(3)
>
>  struct dev_pm_info {
>         pm_message_t            power_state;
> @@ -597,6 +602,8 @@ struct dev_pm_info {
>         bool                    wakeup_path:1;
>         bool                    syscore:1;
>         bool                    no_pm_callbacks:1;      /* Owned by the PM core */
> +       unsigned int            must_resume:1;  /* Owned by the PM core */
> +       unsigned int            may_skip_resume:1;      /* Set by subsystems */
>  #else
>         unsigned int            should_wakeup:1;
>  #endif
> @@ -765,6 +772,7 @@ extern int pm_generic_poweroff_late(stru
>  extern int pm_generic_poweroff(struct device *dev);
>  extern void pm_generic_complete(struct device *dev);
>
> +extern bool dev_pm_may_skip_resume(struct device *dev);
>  extern bool dev_pm_smart_suspend_and_suspended(struct device *dev);
>
>  #else /* !CONFIG_PM_SLEEP */
> Index: linux-pm/drivers/base/power/main.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/main.c
> +++ linux-pm/drivers/base/power/main.c
> @@ -528,6 +528,18 @@ static void dpm_watchdog_clear(struct dp
>  /*------------------------- Resume routines -------------------------*/
>
>  /**
> + * dev_pm_may_skip_resume - System-wide device resume optimization check.
> + * @dev: Target device.
> + *
> + * Checks whether or not the device may be left in suspend after a system-wide
> + * transition to the working state.
> + */
> +bool dev_pm_may_skip_resume(struct device *dev)
> +{
> +       return !dev->power.must_resume && pm_transition.event != PM_EVENT_RESTORE;
> +}
> +
> +/**
>   * device_resume_noirq - Execute a "noirq resume" callback for given device.
>   * @dev: Device to handle.
>   * @state: PM transition of the system being carried out.
> @@ -575,6 +587,12 @@ static int device_resume_noirq(struct de
>         error = dpm_run_callback(callback, dev, state, info);
>         dev->power.is_noirq_suspended = false;
>
> +       if (dev_pm_may_skip_resume(dev)) {
> +               pm_runtime_set_suspended(dev);

According to the doc, the DPM_FLAG_LEAVE_SUSPENDED intends to leave
the device in runtime suspend state during system resume.
However, here you are actually trying to change its runtime PM state to that.

Moreover, you should check the return value from
pm_runtime_set_suspended(). Then I wonder, what should you do when it
fails here?

Perhaps a better idea is to do this in the noirq suspend phase,
because it allows you to bail out in case pm_runtime_set_suspended()
fails.

Another option is to leave this to the middle-layer and PM domain,
that would make it more flexible and probably also easier for them to
deal with the error path.

> +               dev->power.is_late_suspended = false;
> +               dev->power.is_suspended = false;
> +       }
> +
>   Out:
>         complete_all(&dev->power.completion);
>         TRACE_RESUME(error);
> @@ -1076,6 +1094,22 @@ static pm_message_t resume_event(pm_mess
>         return PMSG_ON;
>  }
>
> +static void dpm_superior_set_must_resume(struct device *dev)
> +{
> +       struct device_link *link;
> +       int idx;
> +
> +       if (dev->parent)
> +               dev->parent->power.must_resume = true;
> +
> +       idx = device_links_read_lock();
> +
> +       list_for_each_entry_rcu(link, &dev->links.suppliers, c_node)
> +               link->supplier->power.must_resume = true;
> +
> +       device_links_read_unlock(idx);
> +}
> +
>  /**
>   * __device_suspend_noirq - Execute a "noirq suspend" callback for given device.
>   * @dev: Device to handle.
> @@ -1127,10 +1161,27 @@ static int __device_suspend_noirq(struct
>         }
>
>         error = dpm_run_callback(callback, dev, state, info);
> -       if (!error)
> -               dev->power.is_noirq_suspended = true;
> -       else
> +       if (error) {
>                 async_error = error;
> +               goto Complete;
> +       }
> +
> +       dev->power.is_noirq_suspended = true;
> +
> +       if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED)) {
> +               /*
> +                * The only safe strategy here is to require that if the device
> +                * may not be left in suspend, resume callbacks must be invoked
> +                * for it.
> +                */
> +               dev->power.must_resume = dev->power.must_resume ||
> +                                       !dev->power.may_skip_resume;
> +       } else {
> +               dev->power.must_resume = true;
> +       }
> +
> +       if (dev->power.must_resume)
> +               dpm_superior_set_must_resume(dev);
>
>  Complete:
>         complete_all(&dev->power.completion);
> @@ -1487,6 +1538,9 @@ static int __device_suspend(struct devic
>                 dev->power.direct_complete = false;
>         }
>
> +       dev->power.may_skip_resume = false;
> +       dev->power.must_resume = false;
> +
>         dpm_watchdog_set(&wd, dev);
>         device_lock(dev);
>
> @@ -1652,8 +1706,9 @@ static int device_prepare(struct device
>         if (dev->power.syscore)
>                 return 0;
>
> -       WARN_ON(dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
> -               !pm_runtime_enabled(dev));
> +       WARN_ON(!pm_runtime_enabled(dev) &&
> +               dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND |
> +                                             DPM_FLAG_LEAVE_SUSPENDED));
>
>         /*
>          * If a device's parent goes into runtime suspend at the wrong time,
> Index: linux-pm/Documentation/driver-api/pm/devices.rst
> ===================================================================
> --- linux-pm.orig/Documentation/driver-api/pm/devices.rst
> +++ linux-pm/Documentation/driver-api/pm/devices.rst
> @@ -788,6 +788,26 @@ must reflect the "active" status for run
>
>  During system-wide resume from a sleep state it's easiest to put devices into
>  the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`.
> -Refer to that document for more information regarding this particular issue as
> +[Refer to that document for more information regarding this particular issue as
>  well as for information on the device runtime power management framework in
> -general.
> +general.]
> +
> +However, it may be desirable to leave some devices in runtime suspend after
> +system transitions to the working state and device drivers can use the
> +``DPM_FLAG_LEAVE_SUSPENDED`` flag to indicate to the PM core (and middle-layer
> +code) that this is the case.  Whether or not the devices will actually be left
> +in suspend may depend on their state before the given system suspend-resume
> +cycle and on the type of the system transition under way.  In particular,
> +devices are not left suspended if that transition is a restore from hibernation,
> +as device states are not guaranteed to be reflected by the information stored in
> +the hibernation image in that case.
> +
> +The middle-layer code involved in the handling of the device has to indicate to
> +the PM core if the device may be left in suspend with the help of its
> +:c:member:`power.may_skip_resume` status bit.  That has to happen in the "noirq"
> +phase of the preceding system-wide suspend (or analogous) transition.  The
> +middle layer is then responsible for handling the device as appropriate in its
> +"noirq" resume callback, which is executed regardless of whether or not the
> +device may be left suspended, but the other resume callbacks (except for
> +``->complete``) will be skipped automatically by the PM core if the device
> +really can be left in suspend.
>

Kind regards
Uffe

* Re: [PATCH v2 1/6] PM / core: Add LEAVE_SUSPENDED driver flag
  2017-11-10  9:09       ` Ulf Hansson
@ 2017-11-10 23:45         ` Rafael J. Wysocki
  2017-11-11  0:41           ` Rafael J. Wysocki
                             ` (2 more replies)
  0 siblings, 3 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-10 23:45 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J. Wysocki, Linux PM, Bjorn Helgaas, Alan Stern,
	Greg Kroah-Hartman, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Andy Shevchenko,
	Kevin Hilman

On Fri, Nov 10, 2017 at 10:09 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On 8 November 2017 at 14:25, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to
>> instruct the PM core and middle-layer (bus type, PM domain, etc.)
>> code that it is desirable to leave the device in runtime suspend
>> after system-wide transitions to the working state (for example,
>> the device may be slow to resume and it may be better to avoid
>> resuming it right away).
>>
>> Generally, the middle-layer code involved in the handling of the
>> device is expected to indicate to the PM core whether or not the
>> device may be left in suspend with the help of the device's
>> power.may_skip_resume status bit.  That has to happen in the "noirq"
>> phase of the preceding system suspend (or analogous) transition.
>> The middle layer is then responsible for handling the device as
>> appropriate in its "noirq" resume callback which is executed
>> regardless of whether or not the device may be left suspended, but
>> the other resume callbacks (except for ->complete) will be skipped
>> automatically by the core if the device really can be left in
>> suspend.
>
> I don't understand the reason to why you need to skip invoking resume
> callbacks to achieve this behavior, could you elaborate on that?

The reason why it is done this way is because that takes less code and
is easier (or at least less error-prone, because it avoids repeating
patterns in middle layers).

Note that the callbacks may only be skipped by the core if the middle
layer has set power.may_skip_resume for the device (or if the core is
handling it in patch [5/6], but that's one more step ahead still).
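
A minimal userspace sketch of that opt-in rule may help here.  It models
the power.must_resume update that the quoted patch adds to
__device_suspend_noirq().  The struct model_dev type, the MODEL_ prefix
and model_update_must_resume() are hypothetical names used only for
illustration; only the bit position of the flag and the boolean
expression are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for DPM_FLAG_LEAVE_SUSPENDED, BIT(3) in the patch. */
#define MODEL_FLAG_LEAVE_SUSPENDED	(1u << 3)

/* Hypothetical model of the few struct device fields involved. */
struct model_dev {
	unsigned int driver_flags;	/* set by the driver */
	bool may_skip_resume;		/* set by the middle layer in "noirq" suspend */
	bool must_resume;		/* owned by the (modelled) PM core */
};

/*
 * Without the driver flag the device must always be resumed.  With it,
 * the device must still be resumed unless the middle layer has opted in
 * by setting may_skip_resume (or if must_resume was set earlier, for
 * example on behalf of a child).
 */
static void model_update_must_resume(struct model_dev *dev)
{
	assert(dev != NULL);

	if (dev->driver_flags & MODEL_FLAG_LEAVE_SUSPENDED)
		dev->must_resume = dev->must_resume || !dev->may_skip_resume;
	else
		dev->must_resume = true;
}
```

In this model a middle layer that never sets may_skip_resume gets the
old behavior for all of its devices, whatever flags their drivers set.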

> Couldn't the PM domain or the middle-layer instead decide what to do?

They still can, the whole thing is a total opt-in.

But to be constructive, do you have any specific examples in mind?

> To me it sounds a bit prone to errors by skipping callbacks from the
> PM core, and I wonder if the general driver author will be able to
> understand how to use this flag properly.

This has nothing to do with general driver authors and I'm not sure
what you mean here and where you are going with this.

> That said, as the series doesn't include any changes for drivers making
> use of the flag, could you please fold in such a change, as it would
> provide a more complete picture?

I've already done so, see https://patchwork.kernel.org/patch/10007349/

IMHO it's not really useful to drag this stuff (which doesn't change
BTW) along with every iteration of the core patches.

>>
>> The additional power.must_resume status bit introduced for the
>> implementation of this mechanism is used internally by the PM core
>> to track the requirement to resume the device (which may depend on
>> its children etc).
>
> Yeah, clearly the PM core needs to be involved, because of the need to
> deal with parent/child relations. However, as I kind of indicated
> above, couldn't the PM core just set some flag/status bits, which
> instruct the middle-layer and PM domain on what to do? That sounds
> like an easier approach.

No, it is not easier.  And it is backwards.

>>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> ---
>>  Documentation/driver-api/pm/devices.rst |   24 ++++++++++-
>>  drivers/base/power/main.c               |   65 +++++++++++++++++++++++++++++---
>>  include/linux/pm.h                      |   14 +++++-
>>  3 files changed, 93 insertions(+), 10 deletions(-)
>>
>> Index: linux-pm/include/linux/pm.h
>> ===================================================================
>> --- linux-pm.orig/include/linux/pm.h
>> +++ linux-pm/include/linux/pm.h
>> @@ -559,6 +559,7 @@ struct pm_subsys_data {
>>   * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
>>   * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
>>   * SMART_SUSPEND: No need to resume the device from runtime suspend.
>> + * LEAVE_SUSPENDED: Avoid resuming the device during system resume if possible.
>>   *
>>   * Setting SMART_PREPARE instructs bus types and PM domains which may want
>>   * system suspend/resume callbacks to be skipped for the device to return 0 from
>> @@ -572,10 +573,14 @@ struct pm_subsys_data {
>>   * necessary from the driver's perspective.  It also may cause them to skip
>>   * invocations of the ->suspend_late and ->suspend_noirq callbacks provided by
>>   * the driver if they decide to leave the device in runtime suspend.
>> + *
>> + * Setting LEAVE_SUSPENDED informs the PM core and middle-layer code that the
>> + * driver prefers the device to be left in runtime suspend after system resume.
>>   */
>> -#define DPM_FLAG_NEVER_SKIP    BIT(0)
>> -#define DPM_FLAG_SMART_PREPARE BIT(1)
>> -#define DPM_FLAG_SMART_SUSPEND BIT(2)
>> +#define DPM_FLAG_NEVER_SKIP            BIT(0)
>> +#define DPM_FLAG_SMART_PREPARE         BIT(1)
>> +#define DPM_FLAG_SMART_SUSPEND         BIT(2)
>> +#define DPM_FLAG_LEAVE_SUSPENDED       BIT(3)
>>
>>  struct dev_pm_info {
>>         pm_message_t            power_state;
>> @@ -597,6 +602,8 @@ struct dev_pm_info {
>>         bool                    wakeup_path:1;
>>         bool                    syscore:1;
>>         bool                    no_pm_callbacks:1;      /* Owned by the PM core */
>> +       unsigned int            must_resume:1;  /* Owned by the PM core */
>> +       unsigned int            may_skip_resume:1;      /* Set by subsystems */
>>  #else
>>         unsigned int            should_wakeup:1;
>>  #endif
>> @@ -765,6 +772,7 @@ extern int pm_generic_poweroff_late(stru
>>  extern int pm_generic_poweroff(struct device *dev);
>>  extern void pm_generic_complete(struct device *dev);
>>
>> +extern bool dev_pm_may_skip_resume(struct device *dev);
>>  extern bool dev_pm_smart_suspend_and_suspended(struct device *dev);
>>
>>  #else /* !CONFIG_PM_SLEEP */
>> Index: linux-pm/drivers/base/power/main.c
>> ===================================================================
>> --- linux-pm.orig/drivers/base/power/main.c
>> +++ linux-pm/drivers/base/power/main.c
>> @@ -528,6 +528,18 @@ static void dpm_watchdog_clear(struct dp
>>  /*------------------------- Resume routines -------------------------*/
>>
>>  /**
>> + * dev_pm_may_skip_resume - System-wide device resume optimization check.
>> + * @dev: Target device.
>> + *
>> + * Checks whether or not the device may be left in suspend after a system-wide
>> + * transition to the working state.
>> + */
>> +bool dev_pm_may_skip_resume(struct device *dev)
>> +{
>> +       return !dev->power.must_resume && pm_transition.event != PM_EVENT_RESTORE;
>> +}
>> +
>> +/**
>>   * device_resume_noirq - Execute a "noirq resume" callback for given device.
>>   * @dev: Device to handle.
>>   * @state: PM transition of the system being carried out.
>> @@ -575,6 +587,12 @@ static int device_resume_noirq(struct de
>>         error = dpm_run_callback(callback, dev, state, info);
>>         dev->power.is_noirq_suspended = false;
>>
>> +       if (dev_pm_may_skip_resume(dev)) {
>> +               pm_runtime_set_suspended(dev);
>
> According to the doc, the DPM_FLAG_LEAVE_SUSPENDED intends to leave
> the device in runtime suspend state during system resume.
> However, here you are actually trying to change its runtime PM state to that.

So the doc needs to be fixed. :-)

But I'm guessing that this just is a misunderstanding and you mean the
phrase "it may be desirable to leave some devices in runtime suspend
after [...]".  Yes, it is talking about "runtime suspend", but
actually "runtime suspend" is the only kind of "suspend" you can leave
a device in after a system transition to the working state.  It never
says that the device must have been suspended before the preceding
system transition into a sleep state started.
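
To make that point concrete, here is a small userspace sketch of what
the quoted hunk does in the "noirq" resume phase.  The enum and struct
names are hypothetical; the predicate mirrors dev_pm_may_skip_resume()
from the patch, and setting rpm_status stands in for the
pm_runtime_set_suspended() call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PM transition type and runtime PM status. */
enum model_event { MODEL_EVENT_RESUME, MODEL_EVENT_RESTORE };
enum model_rpm_status { MODEL_RPM_ACTIVE, MODEL_RPM_SUSPENDED };

struct model_dev {
	bool must_resume;
	enum model_rpm_status rpm_status;
};

/* Same condition as dev_pm_may_skip_resume() in the quoted patch. */
static bool model_may_skip_resume(const struct model_dev *dev,
				  enum model_event event)
{
	assert(dev != NULL);

	return !dev->must_resume && event != MODEL_EVENT_RESTORE;
}

/*
 * If the resume may be skipped, the runtime PM status is changed to
 * "suspended" no matter what it was before the system went to sleep.
 * Otherwise the status is left alone in this model.
 */
static void model_resume_noirq(struct model_dev *dev, enum model_event event)
{
	if (model_may_skip_resume(dev, event))
		dev->rpm_status = MODEL_RPM_SUSPENDED;
}
```

A device that was runtime-active before the sleep transition therefore
ends up with the "suspended" status after a plain resume, but not after
a restore from hibernation.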

> Moreover, you should check the return value from
> pm_runtime_set_suspended().

This is in "noirq", so failures of that are meaningless here.

> Then I wonder, what should you do when it fails here?
>
> Perhaps a better idea is to do this in the noirq suspend phase,
> because it allows you to bail out in case pm_runtime_set_suspended()
> fails.

This doesn't make sense, sorry.

> Another option is to leave this to the middle-layer and PM domain,
> that would make it more flexible and probably also easier for them to
> deal with the error path.

So the middle layer doesn't have to set power.may_skip_resume.

Just don't set it if you don't like the default handling, but yes, you
will affect others this way.

Thanks,
Rafael

* Re: [PATCH v2 1/6] PM / core: Add LEAVE_SUSPENDED driver flag
  2017-11-10 23:45         ` Rafael J. Wysocki
@ 2017-11-11  0:41           ` Rafael J. Wysocki
  2017-11-11  1:36           ` Rafael J. Wysocki
  2017-11-14 16:07           ` Ulf Hansson
  2 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-11  0:41 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Ulf Hansson, Rafael J. Wysocki, Linux PM, Bjorn Helgaas,
	Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Andy Shevchenko,
	Kevin Hilman

On Sat, Nov 11, 2017 at 12:45 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Fri, Nov 10, 2017 at 10:09 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> On 8 November 2017 at 14:25, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>
>>> Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to
>>> instruct the PM core and middle-layer (bus type, PM domain, etc.)
>>> code that it is desirable to leave the device in runtime suspend
>>> after system-wide transitions to the working state (for example,
>>> the device may be slow to resume and it may be better to avoid
>>> resuming it right away).
>>>
>>> Generally, the middle-layer code involved in the handling of the
>>> device is expected to indicate to the PM core whether or not the
>>> device may be left in suspend with the help of the device's
>>> power.may_skip_resume status bit.  That has to happen in the "noirq"
>>> phase of the preceding system suspend (or analogous) transition.
>>> The middle layer is then responsible for handling the device as
>>> appropriate in its "noirq" resume callback which is executed
>>> regardless of whether or not the device may be left suspended, but
>>> the other resume callbacks (except for ->complete) will be skipped
>>> automatically by the core if the device really can be left in
>>> suspend.
>>
>> I don't understand the reason to why you need to skip invoking resume
>> callbacks to achieve this behavior, could you elaborate on that?
>
> The reason why it is done this way is because that takes less code and
> is easier (or at least less error-prone, because it avoids repeating
> patterns in middle layers).

Actually, it also is a matter of correctness, at least to some extent.

Namely, if the parent or any supplier of the device has
power.must_resume clear in dpm_noirq_resume_devices(), then the device
should not be touched during the whole system resume transition
(because the access may very well go through the suspended parent or
supplier) and the most straightforward way to make that happen is to
avoid running the code that may touch the device.  [Arguably, if
middle layers were made responsible for handling that, they would need
to do pretty much the same thing and so there is no reason for not
doing it in the core.]
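
The invariant behind that argument can be sketched in userspace as
follows.  The model assumes that devices go through "noirq" suspend
with children and consumers ahead of their parents and suppliers, and
that a device which must be resumed marks its parent and suppliers the
same way, as dpm_superior_set_must_resume() does in the patch.  All
names below are hypothetical and the fixed-size supplier array is a
simplification of device links.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_SUPPLIERS	4

/* Hypothetical model of a device with a parent and a few suppliers. */
struct model_dev {
	struct model_dev *parent;
	struct model_dev *suppliers[MODEL_MAX_SUPPLIERS];
	size_t nr_suppliers;
	bool must_resume;
};

/* Mark everything the device depends on as "must resume". */
static void model_superior_set_must_resume(struct model_dev *dev)
{
	size_t i;

	assert(dev != NULL);
	assert(dev->nr_suppliers <= MODEL_MAX_SUPPLIERS);

	if (dev->parent)
		dev->parent->must_resume = true;

	for (i = 0; i < dev->nr_suppliers; i++)
		dev->suppliers[i]->must_resume = true;
}

/*
 * To be called in "noirq" suspend order (dependents first).  The
 * wants_resume argument stands for the device's own requirement.
 */
static void model_suspend_noirq(struct model_dev *dev, bool wants_resume)
{
	assert(dev != NULL);

	dev->must_resume = dev->must_resume || wants_resume;
	if (dev->must_resume)
		model_superior_set_must_resume(dev);
}
```

With that ordering, a parent or supplier whose must_resume bit is still
clear at resume time cannot have any dependent that needs to be
resumed, which is why skipping the callbacks for the whole subtree is
safe in this model.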

Allowing the "noirq" callback from middle layers to run in that case
is a stretch already, but since genpd needs that, well, tough nuggets.

All of that said, if there is a middle layer wanting to set
power.may_skip_resume and needing to do something different for the resume
callbacks, then this piece can be moved from the core to the middle
layers at any time later.  So far there's none, though.  At least not
in this patch series.

Thanks,
Rafael

* Re: [PATCH v2 1/6] PM / core: Add LEAVE_SUSPENDED driver flag
  2017-11-10 23:45         ` Rafael J. Wysocki
  2017-11-11  0:41           ` Rafael J. Wysocki
@ 2017-11-11  1:36           ` Rafael J. Wysocki
  2017-11-14 16:07           ` Ulf Hansson
  2 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-11  1:36 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J. Wysocki, Linux PM, Bjorn Helgaas, Alan Stern,
	Greg Kroah-Hartman, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Andy Shevchenko,
	Kevin Hilman

On Sat, Nov 11, 2017 at 12:45 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Fri, Nov 10, 2017 at 10:09 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> On 8 November 2017 at 14:25, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:

[cut]

>> Moreover, you should check the return value from
>> pm_runtime_set_suspended().
>
> This is in "noirq", so failures of that are meaningless here.

They *should* be meaningless, but __pm_runtime_set_status() is sort of
buggy and checks child_count regardless of whether or not runtime PM
is enabled for the children (but when changing the status to "active"
it actually checks if runtime PM is enabled for the parent before
returning -EBUSY, so it is not even consistent internally).  Oh well.

>> Then I wonder, what should you do when it fails here?
>>
>> Perhaps a better idea is to do this in the noirq suspend phase,
>> because it allows you to bail out in case pm_runtime_set_suspended()
>> fails.
>
> This doesn't make sense, sorry.

Not for the above reason, but that would allow the bug in
__pm_runtime_set_status() to be sort of worked around by setting the
status to "suspended" for children before doing that for their
parents.

Moreover, stuff with nonzero usage_counts cannot be left in suspend regardless.

Thanks,
Rafael

* [PATCH v3 0/6] PM / sleep: Driver flags for system suspend/resume (part 2)
  2017-11-08  0:41   ` [PATCH v2 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
                       ` (5 preceding siblings ...)
  2017-11-08 13:39     ` [PATCH v2 6/6] PM / core: DPM_FLAG_SMART_SUSPEND optimization Rafael J. Wysocki
@ 2017-11-12  0:34     ` Rafael J. Wysocki
  2017-11-12  0:37       ` [PATCH v3 1/6] PM / core: Add LEAVE_SUSPENDED driver flag Rafael J. Wysocki
                         ` (6 more replies)
  6 siblings, 7 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-12  0:34 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

Hi All,

The following still applies:

On Wednesday, November 8, 2017 1:41:35 AM CET Rafael J. Wysocki wrote:
>
> This is a follow-up for the first part of the PM driver flags series
> sent previously some time ago with an intro as follows:
> 
> On Saturday, October 28, 2017 12:11:55 AM CET Rafael J. Wysocki wrote:
> > The following part of the original cover letter still applies:
> > 
> > On Monday, October 16, 2017 3:12:35 AM CEST Rafael J. Wysocki wrote:
> > > 
> > > This work was triggered by attempts to fix and optimize PM in the
> > > i2c-designware-platdev driver that ended up with adding a couple of
> > > flags to the driver's internal data structures for the tracking of
> > > device state (https://marc.info/?l=linux-acpi&m=150629646805636&w=2).
> > > That approach is sort of suboptimal, though, because other drivers will
> > > probably want to do similar things and if all of them need to use internal
> > > flags for that, quite a bit of code duplication may ensue at least.
> > > 
> > > That can be avoided in a couple of ways and one of them is to provide a means
> > > for drivers to tell the core what to do and to make the core take care of it
> > > if told to do so.  Hence, the idea to use driver flags for system-wide PM
> > > that was briefly discussed during the LPC in LA last month.
> > 
> > [...]
> > 
> > > What can work (and this is the only strategy that can work AFAICS) is to
> > > point different callback pointers *in* *a* *driver* to the same routine
> > > if the driver wants to reuse that code.  That actually will work for PCI
> > > and USB drivers today, at least most of the time, but unfortunately there
> > > are problems with it for, say, platform devices.
> > > 
> > > The first problem is the requirement to track the status of the device
> > > (suspended vs not suspended) in the callbacks, because the system-wide PM
> > > code in the PM core doesn't do that.  The runtime PM framework does it, so
> > > this means adding some extra code which isn't necessary for runtime PM to
> > > the callback routines and that is not particularly nice.
> > > 
> > > The second problem is that, if the driver wants to do anything in its
> > > ->suspend callback, it generally has to prevent runtime suspend of the
> > > device from taking place in parallel with that, which is quite cumbersome.
> > > Usually, that is taken care of by resuming the device from runtime suspend
> > > upfront, but generally doing that is wasteful (there may be no real need to
> > > resume the device except for the fact that the code is designed this way).
> > > 
> > > On top of the above, there are optimizations to be made, like leaving certain
> > > devices in suspend after system resume to avoid wasting time on waiting for
> > > them to resume before user space can run again and similar.
> > > 
> > > This patch series focuses on addressing those problems so as to make it
> > > easier to reuse callback routines by pointing different callback pointers
> > > to them in device drivers.  The flags introduced here are to instruct the
> > > PM core and middle layers (whatever they are) on how the driver wants the
> > > device to be handled and then the driver has to provide callbacks to match
> > > these instructions and the rest should be taken care of by the code above it.
> > > 
> > > The flags are introduced one by one to avoid making too many changes in
> > > one go and to allow things to be explained better (hopefully).  They mostly
> > > are mutually independent with some clearly documented exceptions.
> > 
> > but I had to rework the core patches to address the problem with the
> > generic power domains (genpd) framework pointed out by Ulf.
> > 
> > Namely, genpd expects its "noirq" callbacks to be invoked for devices in
> > runtime suspend too and it has valid reasons for that, so its "noirq"
> > callbacks can never be skipped, even for devices with the SMART_SUSPEND
> > flag set.  For this reason, the logic related to DPM_FLAG_SMART_SUSPEND
> > had to be moved from the core to the PCI bus type and the ACPI PM domain
> > which are mostly affected by it anyway.  The code after the changes looks
> > more straightforward to me, but it generally is more code and some patterns
> > had to be repeated in a few places.
> 
> I promised to send the rest of the series then:
> 
> > I will send the core patches for the remaining two flags introduced by the
> > original series separately and the intel-lpss and i2c-designware ones will
> > be posted when the core patches have been reviewed and agreed on.
> 
> and here it goes.
> 
> It actually only adds support for one additional flag, namely for
> DPM_FLAG_LEAVE_SUSPENDED, to the PM core (basic bits), PCI bus type and the
> ACPI PM domain.
> 
> That part of the series (patches [1-3/6]) is rather straightforward and, as PCI
> and the ACPI PM domain are concerned, it should be functionally equivalent to
> the previous version of the set, so I retained Greg's ACKs on these patches.
> 
> The other part (patches [4-6/6]) is sort of new, as it makes the PM core
> carry out optimizations for devices with DPM_FLAG_LEAVE_SUSPENDED and/or
> DPM_FLAG_SMART_SUSPEND set where the "noirq", "early" and "late" system-wide
> PM callbacks provided by the drivers are invoked by the core directly.  That
> part basically allows platform drivers, for instance, to reuse runtime PM
> callbacks (by pointing ->suspend_late and ->resume_early to them) without
> adding extra checks to them, as long as they are called directly by the core
> (or the ACPI PM domain).

And on top of that, while replying to Ulf's comments I realized that devices
with nonzero runtime PM usage_count reference counters cannot be left in suspend
during system resume, because that would confuse the runtime PM framework going
forward.  Patches [1/6] and [5/6] have to be updated to avoid that, so here
goes a new revision.
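
As a rough userspace sketch of that extra condition: the model below
adds a usage counter to the must_resume decision.  The names are
hypothetical, and treating any nonzero count as "in use" is an
assumption based on the wording above; the exact expression used in the
updated patches is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the fields that feed the decision. */
struct model_dev {
	bool leave_suspended;	/* driver set DPM_FLAG_LEAVE_SUSPENDED */
	bool may_skip_resume;	/* set by the middle layer */
	int usage_count;	/* runtime PM usage counter (modelled) */
	bool must_resume;	/* owned by the (modelled) PM core */
};

/*
 * A device that somebody still holds a runtime PM reference on cannot
 * be left in suspend, so a nonzero usage_count forces must_resume here
 * (assumption: "nonzero" is the threshold).
 */
static void model_update_must_resume_v3(struct model_dev *dev)
{
	assert(dev != NULL);

	if (!dev->leave_suspended) {
		dev->must_resume = true;
		return;
	}

	dev->must_resume = dev->must_resume || !dev->may_skip_resume ||
			   dev->usage_count > 0;
}
```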

It should apply on top of the current linux-next.

Thanks,
Rafael

* [PATCH v3 1/6] PM / core: Add LEAVE_SUSPENDED driver flag
  2017-11-12  0:34     ` [PATCH v3 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
@ 2017-11-12  0:37       ` Rafael J. Wysocki
  2017-11-16 15:10         ` Ulf Hansson
  2017-11-12  0:40       ` [PATCH v3 2/6] PCI / PM: Support for " Rafael J. Wysocki
                         ` (5 subsequent siblings)
  6 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-12  0:37 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to
instruct the PM core and middle-layer (bus type, PM domain, etc.)
code that it is desirable to leave the device in runtime suspend
after system-wide transitions to the working state (for example,
the device may be slow to resume and it may be better to avoid
resuming it right away).

Generally, the middle-layer code involved in the handling of the
device is expected to indicate to the PM core whether or not the
device may be left in suspend with the help of the device's
power.may_skip_resume status bit.  That has to happen in the "noirq"
phase of the preceding system suspend (or analogous) transition.
The middle layer is then responsible for handling the device as
appropriate in its "noirq" resume callback which is executed
regardless of whether or not the device may be left suspended, but
the other resume callbacks (except for ->complete) will be skipped
automatically by the core if the device really can be left in
suspend.

The additional power.must_resume status bit introduced for the
implementation of this mechanism is used internally by the PM core
to track the requirement to resume the device (which may depend on
its children etc).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---

v2 -> v3: Take dev->power.usage_count into account when updating
          power.must_resume in __device_suspend_noirq().

---
 Documentation/driver-api/pm/devices.rst |   24 ++++++++++-
 drivers/base/power/main.c               |   66 +++++++++++++++++++++++++++++---
 drivers/base/power/runtime.c            |    9 ++--
 include/linux/pm.h                      |   14 +++++-
 include/linux/pm_runtime.h              |    9 ++--
 5 files changed, 104 insertions(+), 18 deletions(-)

Index: linux-pm/include/linux/pm.h
===================================================================
--- linux-pm.orig/include/linux/pm.h
+++ linux-pm/include/linux/pm.h
@@ -559,6 +559,7 @@ struct pm_subsys_data {
  * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
  * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
  * SMART_SUSPEND: No need to resume the device from runtime suspend.
+ * LEAVE_SUSPENDED: Avoid resuming the device during system resume if possible.
  *
  * Setting SMART_PREPARE instructs bus types and PM domains which may want
  * system suspend/resume callbacks to be skipped for the device to return 0 from
@@ -572,10 +573,14 @@ struct pm_subsys_data {
  * necessary from the driver's perspective.  It also may cause them to skip
  * invocations of the ->suspend_late and ->suspend_noirq callbacks provided by
  * the driver if they decide to leave the device in runtime suspend.
+ *
+ * Setting LEAVE_SUSPENDED informs the PM core and middle-layer code that the
+ * driver prefers the device to be left in runtime suspend after system resume.
  */
-#define DPM_FLAG_NEVER_SKIP	BIT(0)
-#define DPM_FLAG_SMART_PREPARE	BIT(1)
-#define DPM_FLAG_SMART_SUSPEND	BIT(2)
+#define DPM_FLAG_NEVER_SKIP		BIT(0)
+#define DPM_FLAG_SMART_PREPARE		BIT(1)
+#define DPM_FLAG_SMART_SUSPEND		BIT(2)
+#define DPM_FLAG_LEAVE_SUSPENDED	BIT(3)
 
 struct dev_pm_info {
 	pm_message_t		power_state;
@@ -597,6 +602,8 @@ struct dev_pm_info {
 	bool			wakeup_path:1;
 	bool			syscore:1;
 	bool			no_pm_callbacks:1;	/* Owned by the PM core */
+	unsigned int		must_resume:1;	/* Owned by the PM core */
+	unsigned int		may_skip_resume:1;	/* Set by subsystems */
 #else
 	unsigned int		should_wakeup:1;
 #endif
@@ -765,6 +772,7 @@ extern int pm_generic_poweroff_late(stru
 extern int pm_generic_poweroff(struct device *dev);
 extern void pm_generic_complete(struct device *dev);
 
+extern bool dev_pm_may_skip_resume(struct device *dev);
 extern bool dev_pm_smart_suspend_and_suspended(struct device *dev);
 
 #else /* !CONFIG_PM_SLEEP */
Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -528,6 +528,18 @@ static void dpm_watchdog_clear(struct dp
 /*------------------------- Resume routines -------------------------*/
 
 /**
+ * dev_pm_may_skip_resume - System-wide device resume optimization check.
+ * @dev: Target device.
+ *
+ * Checks whether or not the device may be left in suspend after a system-wide
+ * transition to the working state.
+ */
+bool dev_pm_may_skip_resume(struct device *dev)
+{
+	return !dev->power.must_resume && pm_transition.event != PM_EVENT_RESTORE;
+}
+
+/**
  * device_resume_noirq - Execute a "noirq resume" callback for given device.
  * @dev: Device to handle.
  * @state: PM transition of the system being carried out.
@@ -575,6 +587,12 @@ static int device_resume_noirq(struct de
 	error = dpm_run_callback(callback, dev, state, info);
 	dev->power.is_noirq_suspended = false;
 
+	if (dev_pm_may_skip_resume(dev)) {
+		pm_runtime_set_suspended(dev);
+		dev->power.is_late_suspended = false;
+		dev->power.is_suspended = false;
+	}
+
  Out:
 	complete_all(&dev->power.completion);
 	TRACE_RESUME(error);
@@ -1076,6 +1094,22 @@ static pm_message_t resume_event(pm_mess
 	return PMSG_ON;
 }
 
+static void dpm_superior_set_must_resume(struct device *dev)
+{
+	struct device_link *link;
+	int idx;
+
+	if (dev->parent)
+		dev->parent->power.must_resume = true;
+
+	idx = device_links_read_lock();
+
+	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node)
+		link->supplier->power.must_resume = true;
+
+	device_links_read_unlock(idx);
+}
+
 /**
  * __device_suspend_noirq - Execute a "noirq suspend" callback for given device.
  * @dev: Device to handle.
@@ -1127,10 +1161,28 @@ static int __device_suspend_noirq(struct
 	}
 
 	error = dpm_run_callback(callback, dev, state, info);
-	if (!error)
-		dev->power.is_noirq_suspended = true;
-	else
+	if (error) {
 		async_error = error;
+		goto Complete;
+	}
+
+	dev->power.is_noirq_suspended = true;
+
+	if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED)) {
+		/*
+		 * The only safe strategy here is to require that if the device
+		 * may not be left in suspend, resume callbacks must be invoked
+		 * for it.
+		 */
+		dev->power.must_resume = dev->power.must_resume ||
+					!dev->power.may_skip_resume ||
+					atomic_read(&dev->power.usage_count);
+	} else {
+		dev->power.must_resume = true;
+	}
+
+	if (dev->power.must_resume)
+		dpm_superior_set_must_resume(dev);
 
 Complete:
 	complete_all(&dev->power.completion);
@@ -1487,6 +1539,9 @@ static int __device_suspend(struct devic
 		dev->power.direct_complete = false;
 	}
 
+	dev->power.may_skip_resume = false;
+	dev->power.must_resume = false;
+
 	dpm_watchdog_set(&wd, dev);
 	device_lock(dev);
 
@@ -1652,8 +1707,9 @@ static int device_prepare(struct device
 	if (dev->power.syscore)
 		return 0;
 
-	WARN_ON(dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
-		!pm_runtime_enabled(dev));
+	WARN_ON(!pm_runtime_enabled(dev) &&
+		dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND |
+					      DPM_FLAG_LEAVE_SUSPENDED));
 
 	/*
 	 * If a device's parent goes into runtime suspend at the wrong time,
Index: linux-pm/Documentation/driver-api/pm/devices.rst
===================================================================
--- linux-pm.orig/Documentation/driver-api/pm/devices.rst
+++ linux-pm/Documentation/driver-api/pm/devices.rst
@@ -788,6 +788,26 @@ must reflect the "active" status for run
 
 During system-wide resume from a sleep state it's easiest to put devices into
 the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`.
-Refer to that document for more information regarding this particular issue as
+[Refer to that document for more information regarding this particular issue as
 well as for information on the device runtime power management framework in
-general.
+general.]
+
+However, it may be desirable to leave some devices in runtime suspend after
+system transitions to the working state and device drivers can use the
+``DPM_FLAG_LEAVE_SUSPENDED`` flag to indicate to the PM core (and middle-layer
+code) that this is the case.  Whether or not the devices will actually be left
+in suspend may depend on their state before the given system suspend-resume
+cycle and on the type of the system transition under way.  In particular,
+devices are not left suspended if that transition is a restore from hibernation,
+as device states are not guaranteed to be reflected by the information stored in
+the hibernation image in that case.
+
+The middle-layer code involved in the handling of the device has to indicate to
+the PM core if the device may be left in suspend with the help of its
+:c:member:`power.may_skip_resume` status bit.  That has to happen in the "noirq"
+phase of the preceding system-wide suspend (or analogous) transition.  The
+middle layer is then responsible for handling the device as appropriate in its
+"noirq" resume callback, which is executed regardless of whether or not the
+device may be left suspended, but the other resume callbacks (except for
+``->complete``) will be skipped automatically by the PM core if the device
+really can be left in suspend.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* [PATCH v3 2/6] PCI / PM: Support for LEAVE_SUSPENDED driver flag
  2017-11-12  0:34     ` [PATCH v3 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
  2017-11-12  0:37       ` [PATCH v3 1/6] PM / core: Add LEAVE_SUSPENDED driver flag Rafael J. Wysocki
@ 2017-11-12  0:40       ` Rafael J. Wysocki
  2017-11-12  0:40       ` [PATCH v3 3/6] ACPI / PM: Support for LEAVE_SUSPENDED driver flag in ACPI PM domain Rafael J. Wysocki
                         ` (4 subsequent siblings)
  6 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-12  0:40 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Add support for DPM_FLAG_LEAVE_SUSPENDED to the PCI bus type by
making it (a) set the power.may_skip_resume status bit for devices
that, from its perspective, may be left in suspend after system
wakeup from sleep and (b) return early from pci_pm_resume_noirq()
for devices whose remaining resume callbacks during the transition
under way are going to be skipped by the PM core.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---

v2 -> v3: Add the Acked-by from Bjorn, no changes in the patch.
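
For illustration only (not part of the patch), the wakeup-settings check
assigned to power.may_skip_resume at the end of pci_pm_suspend_noirq()
can be modeled in userspace as below.  The function name is hypothetical;
can_wakeup and may_wakeup stand for the results of device_can_wakeup()
and device_may_wakeup(), respectively.

```c
#include <stdbool.h>

/*
 * Userspace model of the expression used for power.may_skip_resume.
 * The only combination rejected is a device that is capable of signaling
 * wakeup but is not allowed to wake up the system.
 */
static bool model_wakeup_ok_for_skip(bool can_wakeup, bool may_wakeup)
{
	return may_wakeup || !can_wakeup;
}
```

In other words, a wakeup-capable device with system wakeup disabled is
the one case in which the bus type asks for the device to be resumed.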

---
 Documentation/power/pci.txt |   11 +++++++++++
 drivers/pci/pci-driver.c    |   19 +++++++++++++++++--
 2 files changed, 28 insertions(+), 2 deletions(-)

Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -699,7 +699,7 @@ static void pci_pm_complete(struct devic
 	pm_generic_complete(dev);
 
 	/* Resume device if platform firmware has put it in reset-power-on */
-	if (dev->power.direct_complete && pm_resume_via_firmware()) {
+	if (pm_runtime_suspended(dev) && pm_resume_via_firmware()) {
 		pci_power_t pre_sleep_state = pci_dev->current_state;
 
 		pci_update_current_state(pci_dev, pci_dev->current_state);
@@ -783,8 +783,10 @@ static int pci_pm_suspend_noirq(struct d
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 
-	if (dev_pm_smart_suspend_and_suspended(dev))
+	if (dev_pm_smart_suspend_and_suspended(dev)) {
+		dev->power.may_skip_resume = true;
 		return 0;
+	}
 
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend_late(dev, PMSG_SUSPEND);
@@ -838,6 +840,16 @@ static int pci_pm_suspend_noirq(struct d
 Fixup:
 	pci_fixup_device(pci_fixup_suspend_late, pci_dev);
 
+	/*
+	 * If the target system sleep state is suspend-to-idle, it is sufficient
+	 * to check whether or not the device's wakeup settings are good for
+	 * runtime PM.  Otherwise, the pm_resume_via_firmware() check will cause
+	 * pci_pm_complete() to take care of fixing up the device's state
+	 * anyway, if need be.
+	 */
+	dev->power.may_skip_resume = device_may_wakeup(dev) ||
+					!device_can_wakeup(dev);
+
 	return 0;
 }
 
@@ -847,6 +859,9 @@ static int pci_pm_resume_noirq(struct de
 	struct device_driver *drv = dev->driver;
 	int error = 0;
 
+	if (dev_pm_may_skip_resume(dev))
+		return 0;
+
 	/*
 	 * Devices with DPM_FLAG_SMART_SUSPEND may be left in runtime suspend
 	 * during system suspend, so update their runtime PM status to "active"
Index: linux-pm/Documentation/power/pci.txt
===================================================================
--- linux-pm.orig/Documentation/power/pci.txt
+++ linux-pm/Documentation/power/pci.txt
@@ -994,6 +994,17 @@ into D0 going forward), but if it is in
 the function will set the power.direct_complete flag for it (to make the PM core
 skip the subsequent "thaw" callbacks for it) and return.
 
+Setting the DPM_FLAG_LEAVE_SUSPENDED flag means that the driver prefers the
+device to be left in suspend after system-wide transitions to the working state.
+This flag is checked by the PM core, but the PCI bus type informs the PM core
+which devices may be left in suspend from its perspective (that happens during
+the "noirq" phase of system-wide suspend and analogous transitions) and next it
+uses the dev_pm_may_skip_resume() helper to decide whether or not to return from
+pci_pm_resume_noirq() early, as the PM core will skip the remaining resume
+callbacks for the device during the transition under way and will set its
+runtime PM status to "suspended" if dev_pm_may_skip_resume() returns "true" for
+it.
+
 3.2. Device Runtime Power Management
 ------------------------------------
 In addition to providing device power management callbacks PCI device drivers

^ permalink raw reply	[flat|nested] 135+ messages in thread

* [PATCH v3 3/6] ACPI / PM: Support for LEAVE_SUSPENDED driver flag in ACPI PM domain
  2017-11-12  0:34     ` [PATCH v3 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
  2017-11-12  0:37       ` [PATCH v3 1/6] PM / core: Add LEAVE_SUSPENDED driver flag Rafael J. Wysocki
  2017-11-12  0:40       ` [PATCH v3 2/6] PCI / PM: Support for " Rafael J. Wysocki
@ 2017-11-12  0:40       ` Rafael J. Wysocki
  2017-11-12  0:42       ` [PATCH v3 4/6] PM / core: Add helpers for subsystem callback selection Rafael J. Wysocki
                         ` (3 subsequent siblings)
  6 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-12  0:40 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Add support for DPM_FLAG_LEAVE_SUSPENDED to the ACPI PM domain by
making it (a) set the power.may_skip_resume status bit for devices
that, from its perspective, may be left in suspend after system
wakeup from sleep and (b) return early from acpi_subsys_resume_noirq()
for devices whose remaining resume callbacks during the transition
under way are going to be skipped by the PM core.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---

v2 -> v3: No changes.

---
 drivers/acpi/device_pm.c |   27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

Index: linux-pm/drivers/acpi/device_pm.c
===================================================================
--- linux-pm.orig/drivers/acpi/device_pm.c
+++ linux-pm/drivers/acpi/device_pm.c
@@ -987,7 +987,7 @@ void acpi_subsys_complete(struct device
 	 * the sleep state it is going out of and it has never been resumed till
 	 * now, resume it in case the firmware powered it up.
 	 */
-	if (dev->power.direct_complete && pm_resume_via_firmware())
+	if (pm_runtime_suspended(dev) && pm_resume_via_firmware())
 		pm_request_resume(dev);
 }
 EXPORT_SYMBOL_GPL(acpi_subsys_complete);
@@ -1036,10 +1036,28 @@ EXPORT_SYMBOL_GPL(acpi_subsys_suspend_la
  */
 int acpi_subsys_suspend_noirq(struct device *dev)
 {
-	if (dev_pm_smart_suspend_and_suspended(dev))
+	int ret;
+
+	if (dev_pm_smart_suspend_and_suspended(dev)) {
+		dev->power.may_skip_resume = true;
 		return 0;
+	}
 
-	return pm_generic_suspend_noirq(dev);
+	ret = pm_generic_suspend_noirq(dev);
+	if (ret)
+		return ret;
+
+	/*
+	 * If the target system sleep state is suspend-to-idle, it is sufficient
+	 * to check whether or not the device's wakeup settings are good for
+	 * runtime PM.  Otherwise, the pm_resume_via_firmware() check will cause
+	 * acpi_subsys_complete() to take care of fixing up the device's state
+	 * anyway, if need be.
+	 */
+	dev->power.may_skip_resume = device_may_wakeup(dev) ||
+					!device_can_wakeup(dev);
+
+	return 0;
 }
 EXPORT_SYMBOL_GPL(acpi_subsys_suspend_noirq);
 
@@ -1049,6 +1067,9 @@ EXPORT_SYMBOL_GPL(acpi_subsys_suspend_no
  */
 int acpi_subsys_resume_noirq(struct device *dev)
 {
+	if (dev_pm_may_skip_resume(dev))
+		return 0;
+
 	/*
 	 * Devices with DPM_FLAG_SMART_SUSPEND may be left in runtime suspend
 	 * during system suspend, so update their runtime PM status to "active"

^ permalink raw reply	[flat|nested] 135+ messages in thread

* [PATCH v3 4/6] PM / core: Add helpers for subsystem callback selection
  2017-11-12  0:34     ` [PATCH v3 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
                         ` (2 preceding siblings ...)
  2017-11-12  0:40       ` [PATCH v3 3/6] ACPI / PM: Support for LEAVE_SUSPENDED driver flag in ACPI PM domain Rafael J. Wysocki
@ 2017-11-12  0:42       ` Rafael J. Wysocki
  2017-11-15  7:43         ` Ulf Hansson
  2017-11-12  0:43       ` [PATCH v3 5/6] PM / core: Direct handling of DPM_FLAG_LEAVE_SUSPENDED Rafael J. Wysocki
                         ` (2 subsequent siblings)
  6 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-12  0:42 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Add helper routines to find and return a suitable subsystem callback
during the "noirq" phases of system suspend/resume (or analogous)
transitions as well as during the "late" phase of system suspend and
the "early" phase of system resume (or analogous) transitions.

The helpers will be called from additional sites going forward.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v2 -> v3: No changes.
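
For illustration only (not part of the patch), the selection order shared
by the new helpers can be modeled in userspace as below.  All names are
hypothetical; the layers array is assumed to be passed in priority order
(PM domain, type, class, bus).  Note that the first layer that is present
wins even if it has no callback for the given phase, in which case the
caller falls back to the driver's callback.

```c
#include <stdbool.h>
#include <stddef.h>

typedef int (*model_cb_t)(void);

/* Hypothetical stand-in for one subsystem layer (PM domain, type, ...). */
struct model_layer {
	bool present;		/* e.g. dev->pm_domain or dev->bus->pm is set */
	model_cb_t cb;		/* callback for the phase, may be NULL */
	const char *label;	/* e.g. "noirq bus " */
};

/* Sample callback used only to have a non-NULL pointer to return. */
static int model_noop_cb(void)
{
	return 0;
}

/*
 * Return the callback of the first present layer and report its label
 * through info_p (if not NULL).  Return NULL without touching *info_p
 * when no layer is present at all.
 */
static model_cb_t model_pick_subsys_cb(const struct model_layer *layers,
				       size_t n, const char **info_p)
{
	for (size_t i = 0; i < n; i++) {
		if (!layers[i].present)
			continue;

		if (info_p)
			*info_p = layers[i].label;

		return layers[i].cb;
	}

	return NULL;
}
```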

---
 drivers/base/power/main.c |  196 +++++++++++++++++++++++++++++++---------------
 1 file changed, 136 insertions(+), 60 deletions(-)

Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -525,6 +525,14 @@ static void dpm_watchdog_clear(struct dp
 #define dpm_watchdog_clear(x)
 #endif
 
+static pm_callback_t dpm_subsys_suspend_noirq_cb(struct device *dev,
+						 pm_message_t state,
+						 const char **info_p);
+
+static pm_callback_t dpm_subsys_suspend_late_cb(struct device *dev,
+						pm_message_t state,
+						const char **info_p);
+
 /*------------------------- Resume routines -------------------------*/
 
 /**
@@ -539,6 +547,35 @@ bool dev_pm_may_skip_resume(struct devic
 	return !dev->power.must_resume && pm_transition.event != PM_EVENT_RESTORE;
 }
 
+static pm_callback_t dpm_subsys_resume_noirq_cb(struct device *dev,
+						pm_message_t state,
+						const char **info_p)
+{
+	pm_callback_t callback;
+	const char *info;
+
+	if (dev->pm_domain) {
+		info = "noirq power domain ";
+		callback = pm_noirq_op(&dev->pm_domain->ops, state);
+	} else if (dev->type && dev->type->pm) {
+		info = "noirq type ";
+		callback = pm_noirq_op(dev->type->pm, state);
+	} else if (dev->class && dev->class->pm) {
+		info = "noirq class ";
+		callback = pm_noirq_op(dev->class->pm, state);
+	} else if (dev->bus && dev->bus->pm) {
+		info = "noirq bus ";
+		callback = pm_noirq_op(dev->bus->pm, state);
+	} else {
+		return NULL;
+	}
+
+	if (info_p)
+		*info_p = info;
+
+	return callback;
+}
+
 /**
  * device_resume_noirq - Execute a "noirq resume" callback for given device.
  * @dev: Device to handle.
@@ -550,8 +587,8 @@ bool dev_pm_may_skip_resume(struct devic
  */
 static int device_resume_noirq(struct device *dev, pm_message_t state, bool async)
 {
-	pm_callback_t callback = NULL;
-	const char *info = NULL;
+	pm_callback_t callback;
+	const char *info;
 	int error = 0;
 
 	TRACE_DEVICE(dev);
@@ -565,19 +602,7 @@ static int device_resume_noirq(struct de
 
 	dpm_wait_for_superior(dev, async);
 
-	if (dev->pm_domain) {
-		info = "noirq power domain ";
-		callback = pm_noirq_op(&dev->pm_domain->ops, state);
-	} else if (dev->type && dev->type->pm) {
-		info = "noirq type ";
-		callback = pm_noirq_op(dev->type->pm, state);
-	} else if (dev->class && dev->class->pm) {
-		info = "noirq class ";
-		callback = pm_noirq_op(dev->class->pm, state);
-	} else if (dev->bus && dev->bus->pm) {
-		info = "noirq bus ";
-		callback = pm_noirq_op(dev->bus->pm, state);
-	}
+	callback = dpm_subsys_resume_noirq_cb(dev, state, &info);
 
 	if (!callback && dev->driver && dev->driver->pm) {
 		info = "noirq driver ";
@@ -686,6 +711,35 @@ void dpm_resume_noirq(pm_message_t state
 	dpm_noirq_end();
 }
 
+static pm_callback_t dpm_subsys_resume_early_cb(struct device *dev,
+						pm_message_t state,
+						const char **info_p)
+{
+	pm_callback_t callback;
+	const char *info;
+
+	if (dev->pm_domain) {
+		info = "early power domain ";
+		callback = pm_late_early_op(&dev->pm_domain->ops, state);
+	} else if (dev->type && dev->type->pm) {
+		info = "early type ";
+		callback = pm_late_early_op(dev->type->pm, state);
+	} else if (dev->class && dev->class->pm) {
+		info = "early class ";
+		callback = pm_late_early_op(dev->class->pm, state);
+	} else if (dev->bus && dev->bus->pm) {
+		info = "early bus ";
+		callback = pm_late_early_op(dev->bus->pm, state);
+	} else {
+		return NULL;
+	}
+
+	if (info_p)
+		*info_p = info;
+
+	return callback;
+}
+
 /**
  * device_resume_early - Execute an "early resume" callback for given device.
  * @dev: Device to handle.
@@ -696,8 +750,8 @@ void dpm_resume_noirq(pm_message_t state
  */
 static int device_resume_early(struct device *dev, pm_message_t state, bool async)
 {
-	pm_callback_t callback = NULL;
-	const char *info = NULL;
+	pm_callback_t callback;
+	const char *info;
 	int error = 0;
 
 	TRACE_DEVICE(dev);
@@ -711,19 +765,7 @@ static int device_resume_early(struct de
 
 	dpm_wait_for_superior(dev, async);
 
-	if (dev->pm_domain) {
-		info = "early power domain ";
-		callback = pm_late_early_op(&dev->pm_domain->ops, state);
-	} else if (dev->type && dev->type->pm) {
-		info = "early type ";
-		callback = pm_late_early_op(dev->type->pm, state);
-	} else if (dev->class && dev->class->pm) {
-		info = "early class ";
-		callback = pm_late_early_op(dev->class->pm, state);
-	} else if (dev->bus && dev->bus->pm) {
-		info = "early bus ";
-		callback = pm_late_early_op(dev->bus->pm, state);
-	}
+	callback = dpm_subsys_resume_early_cb(dev, state, &info);
 
 	if (!callback && dev->driver && dev->driver->pm) {
 		info = "early driver ";
@@ -1110,6 +1152,35 @@ static void dpm_superior_set_must_resume
 	device_links_read_unlock(idx);
 }
 
+static pm_callback_t dpm_subsys_suspend_noirq_cb(struct device *dev,
+						 pm_message_t state,
+						 const char **info_p)
+{
+	pm_callback_t callback;
+	const char *info;
+
+	if (dev->pm_domain) {
+		info = "noirq power domain ";
+		callback = pm_noirq_op(&dev->pm_domain->ops, state);
+	} else if (dev->type && dev->type->pm) {
+		info = "noirq type ";
+		callback = pm_noirq_op(dev->type->pm, state);
+	} else if (dev->class && dev->class->pm) {
+		info = "noirq class ";
+		callback = pm_noirq_op(dev->class->pm, state);
+	} else if (dev->bus && dev->bus->pm) {
+		info = "noirq bus ";
+		callback = pm_noirq_op(dev->bus->pm, state);
+	} else {
+		return NULL;
+	}
+
+	if (info_p)
+		*info_p = info;
+
+	return callback;
+}
+
 /**
  * __device_suspend_noirq - Execute a "noirq suspend" callback for given device.
  * @dev: Device to handle.
@@ -1121,8 +1192,8 @@ static void dpm_superior_set_must_resume
  */
 static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool async)
 {
-	pm_callback_t callback = NULL;
-	const char *info = NULL;
+	pm_callback_t callback;
+	const char *info;
 	int error = 0;
 
 	TRACE_DEVICE(dev);
@@ -1141,19 +1212,7 @@ static int __device_suspend_noirq(struct
 	if (dev->power.syscore || dev->power.direct_complete)
 		goto Complete;
 
-	if (dev->pm_domain) {
-		info = "noirq power domain ";
-		callback = pm_noirq_op(&dev->pm_domain->ops, state);
-	} else if (dev->type && dev->type->pm) {
-		info = "noirq type ";
-		callback = pm_noirq_op(dev->type->pm, state);
-	} else if (dev->class && dev->class->pm) {
-		info = "noirq class ";
-		callback = pm_noirq_op(dev->class->pm, state);
-	} else if (dev->bus && dev->bus->pm) {
-		info = "noirq bus ";
-		callback = pm_noirq_op(dev->bus->pm, state);
-	}
+	callback = dpm_subsys_suspend_noirq_cb(dev, state, &info);
 
 	if (!callback && dev->driver && dev->driver->pm) {
 		info = "noirq driver ";
@@ -1288,6 +1347,35 @@ int dpm_suspend_noirq(pm_message_t state
 	return ret;
 }
 
+static pm_callback_t dpm_subsys_suspend_late_cb(struct device *dev,
+						pm_message_t state,
+						const char **info_p)
+{
+	pm_callback_t callback;
+	const char *info;
+
+	if (dev->pm_domain) {
+		info = "late power domain ";
+		callback = pm_late_early_op(&dev->pm_domain->ops, state);
+	} else if (dev->type && dev->type->pm) {
+		info = "late type ";
+		callback = pm_late_early_op(dev->type->pm, state);
+	} else if (dev->class && dev->class->pm) {
+		info = "late class ";
+		callback = pm_late_early_op(dev->class->pm, state);
+	} else if (dev->bus && dev->bus->pm) {
+		info = "late bus ";
+		callback = pm_late_early_op(dev->bus->pm, state);
+	} else {
+		return NULL;
+	}
+
+	if (info_p)
+		*info_p = info;
+
+	return callback;
+}
+
 /**
  * __device_suspend_late - Execute a "late suspend" callback for given device.
  * @dev: Device to handle.
@@ -1298,8 +1386,8 @@ int dpm_suspend_noirq(pm_message_t state
  */
 static int __device_suspend_late(struct device *dev, pm_message_t state, bool async)
 {
-	pm_callback_t callback = NULL;
-	const char *info = NULL;
+	pm_callback_t callback;
+	const char *info;
 	int error = 0;
 
 	TRACE_DEVICE(dev);
@@ -1320,19 +1408,7 @@ static int __device_suspend_late(struct
 	if (dev->power.syscore || dev->power.direct_complete)
 		goto Complete;
 
-	if (dev->pm_domain) {
-		info = "late power domain ";
-		callback = pm_late_early_op(&dev->pm_domain->ops, state);
-	} else if (dev->type && dev->type->pm) {
-		info = "late type ";
-		callback = pm_late_early_op(dev->type->pm, state);
-	} else if (dev->class && dev->class->pm) {
-		info = "late class ";
-		callback = pm_late_early_op(dev->class->pm, state);
-	} else if (dev->bus && dev->bus->pm) {
-		info = "late bus ";
-		callback = pm_late_early_op(dev->bus->pm, state);
-	}
+	callback = dpm_subsys_suspend_late_cb(dev, state, &info);
 
 	if (!callback && dev->driver && dev->driver->pm) {
 		info = "late driver ";

^ permalink raw reply	[flat|nested] 135+ messages in thread

* [PATCH v3 5/6] PM / core: Direct handling of DPM_FLAG_LEAVE_SUSPENDED
  2017-11-12  0:34     ` [PATCH v3 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
                         ` (3 preceding siblings ...)
  2017-11-12  0:42       ` [PATCH v3 4/6] PM / core: Add helpers for subsystem callback selection Rafael J. Wysocki
@ 2017-11-12  0:43       ` Rafael J. Wysocki
  2017-11-12  0:44       ` [PATCH v3 6/6] PM / core: DPM_FLAG_SMART_SUSPEND optimization Rafael J. Wysocki
  2017-11-18 14:27       ` [PATCH v4 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
  6 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-12  0:43 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Make the PM core handle DPM_FLAG_LEAVE_SUSPENDED directly for
devices whose "noirq", "late" and "early" driver callbacks are
invoked directly by it.

Namely, make it skip all of the system-wide resume callbacks for
such devices with DPM_FLAG_LEAVE_SUSPENDED set if either (a) they
are in runtime suspend during the "noirq" phase of system-wide
suspend (or analogous) transitions, or (b) the system transition
under way is a proper suspend (rather than anything related to
hibernation) and the device's wakeup settings are compatible with
runtime PM (that is, the device cannot generate wakeup signals at
all or it is allowed to wake up the system from sleep).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v2 -> v3: Rebase on the v3 of patch [1/6].
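
For illustration only (not part of the patch), the skip_resume decision
made for devices whose callbacks are invoked directly by the core can be
modeled in userspace as below.  The enum and the function name are
hypothetical; rpm_suspended stands for pm_runtime_status_suspended(), and
can_wakeup and may_wakeup stand for device_can_wakeup() and
device_may_wakeup(), respectively.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the resume-side PM_EVENT_* values. */
enum model_event {
	MODEL_EVENT_RESUME,	/* proper system suspend/resume */
	MODEL_EVENT_THAW,	/* hibernation-related */
	MODEL_EVENT_RESTORE,	/* hibernation-related */
};

/*
 * The device may stay in suspend if it is runtime-suspended already, or
 * if this is a proper system suspend and the wakeup settings are suitable
 * for runtime PM.
 */
static bool model_direct_skip_resume(bool rpm_suspended,
				     enum model_event resume_event,
				     bool can_wakeup, bool may_wakeup)
{
	return rpm_suspended ||
	       (resume_event == MODEL_EVENT_RESUME &&
		(!can_wakeup || may_wakeup));
}
```

The result only feeds the power.must_resume computation, so the device is
still resumed if anything depending on it has to be resumed.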

---
 Documentation/driver-api/pm/devices.rst |    9 ++++++
 drivers/base/power/main.c               |   47 ++++++++++++++++++++++++++++----
 2 files changed, 51 insertions(+), 5 deletions(-)

Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -589,6 +589,7 @@ static int device_resume_noirq(struct de
 {
 	pm_callback_t callback;
 	const char *info;
+	bool skip_resume;
 	int error = 0;
 
 	TRACE_DEVICE(dev);
@@ -602,23 +603,33 @@ static int device_resume_noirq(struct de
 
 	dpm_wait_for_superior(dev, async);
 
+	skip_resume = dev_pm_may_skip_resume(dev);
+
 	callback = dpm_subsys_resume_noirq_cb(dev, state, &info);
+	if (callback)
+		goto Run;
+
+	if (skip_resume)
+		goto Skip;
 
 	if (!callback && dev->driver && dev->driver->pm) {
 		info = "noirq driver ";
 		callback = pm_noirq_op(dev->driver->pm, state);
 	}
 
+Run:
 	error = dpm_run_callback(callback, dev, state, info);
+
+Skip:
 	dev->power.is_noirq_suspended = false;
 
-	if (dev_pm_may_skip_resume(dev)) {
+	if (skip_resume) {
 		pm_runtime_set_suspended(dev);
 		dev->power.is_late_suspended = false;
 		dev->power.is_suspended = false;
 	}
 
- Out:
+Out:
 	complete_all(&dev->power.completion);
 	TRACE_RESUME(error);
 	return error;
@@ -1194,6 +1205,7 @@ static int __device_suspend_noirq(struct
 {
 	pm_callback_t callback;
 	const char *info;
+	bool direct_cb = false;
 	int error = 0;
 
 	TRACE_DEVICE(dev);
@@ -1213,12 +1225,17 @@ static int __device_suspend_noirq(struct
 		goto Complete;
 
 	callback = dpm_subsys_suspend_noirq_cb(dev, state, &info);
+	if (callback)
+		goto Run;
 
-	if (!callback && dev->driver && dev->driver->pm) {
+	direct_cb = true;
+
+	if (dev->driver && dev->driver->pm) {
 		info = "noirq driver ";
 		callback = pm_noirq_op(dev->driver->pm, state);
 	}
 
+Run:
 	error = dpm_run_callback(callback, dev, state, info);
 	if (error) {
 		async_error = error;
@@ -1228,13 +1245,33 @@ static int __device_suspend_noirq(struct
 	dev->power.is_noirq_suspended = true;
 
 	if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED)) {
+		pm_message_t resume_msg = resume_event(state);
+		bool skip_resume;
+
+		if (direct_cb &&
+		    !dpm_subsys_suspend_late_cb(dev, state, NULL) &&
+		    !dpm_subsys_resume_early_cb(dev, resume_msg, NULL) &&
+		    !dpm_subsys_resume_noirq_cb(dev, resume_msg, NULL)) {
+			/*
+			 * If all of the device driver's "noirq", "late" and
+			 * "early" callbacks are invoked directly by the core,
+			 * the decision to allow the device to stay in suspend
+			 * can be based on its current runtime PM status and its
+			 * wakeup settings.
+			 */
+			skip_resume = pm_runtime_status_suspended(dev) ||
+				(resume_msg.event == PM_EVENT_RESUME &&
+				 (!device_can_wakeup(dev) ||
+				  device_may_wakeup(dev)));
+		} else {
+			skip_resume = dev->power.may_skip_resume;
+		}
 		/*
 		 * The only safe strategy here is to require that if the device
 		 * may not be left in suspend, resume callbacks must be invoked
 		 * for it.
 		 */
-		dev->power.must_resume = dev->power.must_resume ||
-					!dev->power.may_skip_resume ||
+		dev->power.must_resume = dev->power.must_resume || !skip_resume ||
 					atomic_read(&dev->power.usage_count);
 	} else {
 		dev->power.must_resume = true;
Index: linux-pm/Documentation/driver-api/pm/devices.rst
===================================================================
--- linux-pm.orig/Documentation/driver-api/pm/devices.rst
+++ linux-pm/Documentation/driver-api/pm/devices.rst
@@ -811,3 +811,12 @@ middle layer is then responsible for han
 device may be left suspended, but the other resume callbacks (except for
 ``->complete``) will be skipped automatically by the PM core if the device
 really can be left in suspend.
+
+For devices whose "noirq", "late" and "early" driver callbacks are invoked
+directly by the PM core, all of the system-wide resume callbacks are skipped if
+``DPM_FLAG_LEAVE_SUSPENDED`` is set and the device is in runtime suspend during
+the ``suspend_noirq`` (or analogous) phase or the transition under way is a
+proper system suspend (rather than anything related to hibernation) and the
+device's wakeup settings are suitable for runtime PM (that is, it cannot
+generate wakeup signals at all or it is allowed to wake up the system from
+sleep).
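
For readers following along, the condition described in the documentation hunk above can be written as a small userspace model. This is only a sketch: the struct, the enum and the function name are made up for illustration and are not kernel APIs, and the three booleans stand in for pm_runtime_status_suspended(), device_can_wakeup() and device_may_wakeup().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PM_EVENT_* resume messages. */
enum model_event {
	MODEL_EVENT_RESUME,	/* resume from a proper system suspend */
	MODEL_EVENT_THAW,	/* hibernation related */
	MODEL_EVENT_RESTORE,	/* hibernation related */
};

/* Hypothetical per-device state, not the real struct device. */
struct model_wakeup_dev {
	bool runtime_suspended;	/* models pm_runtime_status_suspended() */
	bool can_wakeup;	/* models device_can_wakeup() */
	bool may_wakeup;	/* models device_may_wakeup() */
};

/*
 * Model of the skip_resume computation for a device whose "noirq", "late"
 * and "early" driver callbacks are invoked directly by the core.
 */
static bool model_skip_resume(const struct model_wakeup_dev *dev,
			      enum model_event resume_event)
{
	assert(dev != NULL);

	return dev->runtime_suspended ||
	       (resume_event == MODEL_EVENT_RESUME &&
		(!dev->can_wakeup || dev->may_wakeup));
}
```

In this model a device that was not runtime-suspended can only be left in suspend after a proper system suspend, and only if it either cannot signal wakeup at all or is allowed to wake up the system.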


* [PATCH v3 6/6] PM / core: DPM_FLAG_SMART_SUSPEND optimization
  2017-11-12  0:34     ` [PATCH v3 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
                         ` (4 preceding siblings ...)
  2017-11-12  0:43       ` [PATCH v3 5/6] PM / core: Direct handling of DPM_FLAG_LEAVE_SUSPENDED Rafael J. Wysocki
@ 2017-11-12  0:44       ` Rafael J. Wysocki
  2017-11-18 14:27       ` [PATCH v4 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
  6 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-12  0:44 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Make the PM core avoid invoking the "late" and "noirq" system-wide
suspend (or analogous) callbacks for devices that are in runtime
suspend during the corresponding phases of system-wide suspend
(or analogous) transitions.

The underlying observation is that runtime PM is disabled for
devices during those system-wide suspend phases, so their runtime
PM status should not change going forward and if it has not changed
so far, their state should be compatible with the target system
sleep state.

This change really makes it possible for, say, platform device
drivers to re-use runtime PM suspend and resume callbacks by
pointing ->suspend_late and ->resume_early, respectively (and
possibly the analogous hibernation-related callback pointers too),
to them without adding any extra "is the device already suspended?"
type of checks to the callback routines, as long as they will be
invoked directly by the core.
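
As a rough illustration of the reuse described above, here is a userspace model of a driver that points its "late" suspend and "early" resume callbacks at the same routines it uses for runtime PM. Everything named toy_* is hypothetical and only mimics the shape of struct dev_pm_ops; the model also assumes there is no middle-layer callback, so the core handles the driver directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_dev;
typedef int (*toy_cb_t)(struct toy_dev *dev);

/* Hypothetical cut-down analogue of struct dev_pm_ops. */
struct toy_pm_ops {
	toy_cb_t suspend_late;
	toy_cb_t resume_early;
	toy_cb_t runtime_suspend;
	toy_cb_t runtime_resume;
};

struct toy_dev {
	const struct toy_pm_ops *ops;
	bool smart_suspend;	/* models DPM_FLAG_SMART_SUSPEND being set */
	bool runtime_suspended;	/* models the runtime PM status */
	bool is_late_suspended;
	int power_off_calls;	/* how many times the hardware was powered down */
	int power_on_calls;	/* how many times the hardware was powered up */
};

/* Note: no "is the device already suspended?" check in the callbacks. */
static int toy_hw_off(struct toy_dev *dev)
{
	dev->power_off_calls++;
	return 0;
}

static int toy_hw_on(struct toy_dev *dev)
{
	dev->power_on_calls++;
	return 0;
}

/* The same two routines serve runtime PM and system-wide PM. */
static const struct toy_pm_ops toy_driver_pm_ops = {
	.suspend_late = toy_hw_off,
	.resume_early = toy_hw_on,
	.runtime_suspend = toy_hw_off,
	.runtime_resume = toy_hw_on,
};

/* Model of the core's "late" suspend step with the optimization applied. */
static int toy_device_suspend_late(struct toy_dev *dev)
{
	int error;

	assert(dev != NULL && dev->ops != NULL);

	/* Already in runtime suspend: skip the callback, keep the state. */
	if (dev->smart_suspend && dev->runtime_suspended)
		goto skip;

	if (dev->ops->suspend_late) {
		error = dev->ops->suspend_late(dev);
		if (error)
			return error;
	}

skip:
	dev->is_late_suspended = true;
	return 0;
}
```

The point of the model is that toy_hw_off() never runs twice back-to-back for a device that runtime-suspended before the system-wide transition, because the check lives in the core rather than in each driver.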

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v2 -> v3: No changes.

---
 Documentation/driver-api/pm/devices.rst |   18 +++++----
 drivers/base/power/main.c               |   62 ++++++++++++++++++++++++++++----
 2 files changed, 66 insertions(+), 14 deletions(-)

Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -536,6 +536,24 @@ static pm_callback_t dpm_subsys_suspend_
 /*------------------------- Resume routines -------------------------*/
 
 /**
+ * suspend_event - Return a "suspend" message for given "resume" one.
+ * @resume_msg: PM message representing a system-wide resume transition.
+ */
+static pm_message_t suspend_event(pm_message_t resume_msg)
+{
+	switch (resume_msg.event) {
+	case PM_EVENT_RESUME:
+		return PMSG_SUSPEND;
+	case PM_EVENT_THAW:
+	case PM_EVENT_RESTORE:
+		return PMSG_FREEZE;
+	case PM_EVENT_RECOVER:
+		return PMSG_HIBERNATE;
+	}
+	return PMSG_ON;
+}
+
+/**
  * dev_pm_may_skip_resume - System-wide device resume optimization check.
  * @dev: Target device.
  *
@@ -609,6 +627,25 @@ static int device_resume_noirq(struct de
 	if (callback)
 		goto Run;
 
+	if (dev_pm_smart_suspend_and_suspended(dev)) {
+		pm_message_t suspend_msg = suspend_event(state);
+
+		/*
+		 * If "freeze" callbacks have been skipped during a transition
+		 * related to hibernation, the subsequent "thaw" callbacks must
+		 * be skipped too or bad things may happen.  Otherwise, if the
+		 * device is to be resumed, its runtime PM status must be
+		 * changed to reflect the new configuration.
+		 */
+		if (!dpm_subsys_suspend_late_cb(dev, suspend_msg, NULL) &&
+		    !dpm_subsys_suspend_noirq_cb(dev, suspend_msg, NULL)) {
+			if (state.event == PM_EVENT_THAW)
+				skip_resume = true;
+			else if (!skip_resume)
+				pm_runtime_set_active(dev);
+		}
+	}
+
 	if (skip_resume)
 		goto Skip;
 
@@ -1228,7 +1265,10 @@ static int __device_suspend_noirq(struct
 	if (callback)
 		goto Run;
 
-	direct_cb = true;
+	direct_cb = !dpm_subsys_suspend_late_cb(dev, state, NULL);
+
+	if (dev_pm_smart_suspend_and_suspended(dev) && direct_cb)
+		goto Skip;
 
 	if (dev->driver && dev->driver->pm) {
 		info = "noirq driver ";
@@ -1242,6 +1282,7 @@ Run:
 		goto Complete;
 	}
 
+Skip:
 	dev->power.is_noirq_suspended = true;
 
 	if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED)) {
@@ -1249,7 +1290,6 @@ Run:
 		bool skip_resume;
 
 		if (direct_cb &&
-		    !dpm_subsys_suspend_late_cb(dev, state, NULL) &&
 		    !dpm_subsys_resume_early_cb(dev, resume_msg, NULL) &&
 		    !dpm_subsys_resume_noirq_cb(dev, resume_msg, NULL)) {
 			/*
@@ -1446,17 +1486,27 @@ static int __device_suspend_late(struct
 		goto Complete;
 
 	callback = dpm_subsys_suspend_late_cb(dev, state, &info);
+	if (callback)
+		goto Run;
 
-	if (!callback && dev->driver && dev->driver->pm) {
+	if (dev_pm_smart_suspend_and_suspended(dev) &&
+	    !dpm_subsys_suspend_noirq_cb(dev, state, NULL))
+		goto Skip;
+
+	if (dev->driver && dev->driver->pm) {
 		info = "late driver ";
 		callback = pm_late_early_op(dev->driver->pm, state);
 	}
 
+Run:
 	error = dpm_run_callback(callback, dev, state, info);
-	if (!error)
-		dev->power.is_late_suspended = true;
-	else
+	if (error) {
 		async_error = error;
+		goto Complete;
+	}
+
+Skip:
+	dev->power.is_late_suspended = true;
 
 Complete:
 	TRACE_SUSPEND(error);
Index: linux-pm/Documentation/driver-api/pm/devices.rst
===================================================================
--- linux-pm.orig/Documentation/driver-api/pm/devices.rst
+++ linux-pm/Documentation/driver-api/pm/devices.rst
@@ -777,14 +777,16 @@ The driver can indicate that by setting
 runtime suspend at the beginning of the ``suspend_late`` phase of system-wide
 suspend (or in the ``poweroff_late`` phase of hibernation), when runtime PM
 has been disabled for it, under the assumption that its state should not change
-after that point until the system-wide transition is over.  If that happens, the
-driver's system-wide resume callbacks, if present, may still be invoked during
-the subsequent system-wide resume transition and the device's runtime power
-management status may be set to "active" before enabling runtime PM for it,
-so the driver must be prepared to cope with the invocation of its system-wide
-resume callbacks back-to-back with its ``->runtime_suspend`` one (without the
-intervening ``->runtime_resume`` and so on) and the final state of the device
-must reflect the "active" status for runtime PM in that case.
+after that point until the system-wide transition is over (the PM core itself
+does that for devices whose "noirq", "late" and "early" system-wide PM callbacks
+are executed directly by it).  If that happens, the driver's system-wide resume
+callbacks, if present, may still be invoked during the subsequent system-wide
+resume transition and the device's runtime power management status may be set
+to "active" before enabling runtime PM for it, so the driver must be prepared to
+cope with the invocation of its system-wide resume callbacks back-to-back with
+its ``->runtime_suspend`` one (without the intervening ``->runtime_resume`` and
+so on) and the final state of the device must reflect the "active" status for
+runtime PM in that case.
 
 During system-wide resume from a sleep state it's easiest to put devices into
 the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`.


* Re: [PATCH v2 1/6] PM / core: Add LEAVE_SUSPENDED driver flag
  2017-11-10 23:45         ` Rafael J. Wysocki
  2017-11-11  0:41           ` Rafael J. Wysocki
  2017-11-11  1:36           ` Rafael J. Wysocki
@ 2017-11-14 16:07           ` Ulf Hansson
  2017-11-15  1:48             ` Rafael J. Wysocki
  2 siblings, 1 reply; 135+ messages in thread
From: Ulf Hansson @ 2017-11-14 16:07 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linux PM, Bjorn Helgaas, Alan Stern,
	Greg Kroah-Hartman, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Andy Shevchenko,
	Kevin Hilman

On 11 November 2017 at 00:45, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Fri, Nov 10, 2017 at 10:09 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> On 8 November 2017 at 14:25, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>
>>> Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to
>>> instruct the PM core and middle-layer (bus type, PM domain, etc.)
>>> code that it is desirable to leave the device in runtime suspend
>>> after system-wide transitions to the working state (for example,
>>> the device may be slow to resume and it may be better to avoid
>>> resuming it right away).
>>>
>>> Generally, the middle-layer code involved in the handling of the
>>> device is expected to indicate to the PM core whether or not the
>>> device may be left in suspend with the help of the device's
>>> power.may_skip_resume status bit.  That has to happen in the "noirq"
>>> phase of the preceding system suspend (or analogous) transition.
>>> The middle layer is then responsible for handling the device as
>>> appropriate in its "noirq" resume callback which is executed
>>> regardless of whether or not the device may be left suspended, but
>>> the other resume callbacks (except for ->complete) will be skipped
>>> automatically by the core if the device really can be left in
>>> suspend.
>>
>> I don't understand the reason why you need to skip invoking resume
>> callbacks to achieve this behavior; could you elaborate on that?
>
> The reason why it is done this way is because that takes less code and
> is easier (or at least less error-prone, because it avoids repeating
> patterns in middle layers).
>
> Note that the callbacks may only be skipped by the core if the middle
> layer has set power.may_skip_resume for the device (or if the core is
> handling it in patch [5/6], but that's one more step ahead still).
>
>> Couldn't the PM domain or the middle-layer instead decide what to do?
>
> They still can, the whole thing is a total opt-in.
>
> But to be constructive, do you have any specific examples in mind?

See more below.

>
>> To me it sounds a bit prone to errors by skipping callbacks from the
>> PM core, and I wonder if the general driver author will be able to
>> understand how to use this flag properly.
>
> This has nothing to do with general driver authors and I'm not sure
> what you mean here and where you are going with this.

Let me elaborate.

My general goal is that I want to make it easier (or as easy as
possible) for the general driver author to deploy runtime PM and
system-wide PM support - in an optimized manner. Therefore, I am
pondering over the solution you picked in this series, trying to
understand how it fits into those aspects.

In particular, I am a bit worried, from a complexity point of view, about
the part with skipping callbacks from the PM core. We have observed
some difficulties with the direct_complete path (i2c dw driver), which
is based on a similar approach to this one.

Additionally, in this case, for callbacks to be skipped, first, drivers
need to inform the middle layer; second, the middle layer acts on that
information and then informs the PM core; and third, the PM core can
decide what to do. It doesn't sound straightforward.

I guess I need to be convinced that this new approach is going to be
better than the direct_complete path, so it somehow can replace it
along the road. Otherwise, we may end up just having yet another way
of skipping callbacks in the PM core and I don't like that.

Of course, I also realize this whole thing is opt-in, so nothing will
break and we are all good. :-)

>
>> That said, as the series doesn't include any changes for drivers making
>> use of the flag, could you please fold in such a change, as it would
>> provide a more complete picture?
>
> I've already done so, see https://patchwork.kernel.org/patch/10007349/
>
> IMHO it's not really useful to drag this stuff (which doesn't change
> BTW) along with every iteration of the core patches.

Well, to me it's useful because it shows how these flags can/will be used.

Anyway, I thought you had scrapped that patch and were working on a new
version. I will have a look then.

[...]

>>>   * device_resume_noirq - Execute a "noirq resume" callback for given device.
>>>   * @dev: Device to handle.
>>>   * @state: PM transition of the system being carried out.
>>> @@ -575,6 +587,12 @@ static int device_resume_noirq(struct de
>>>         error = dpm_run_callback(callback, dev, state, info);
>>>         dev->power.is_noirq_suspended = false;
>>>
>>> +       if (dev_pm_may_skip_resume(dev)) {
>>> +               pm_runtime_set_suspended(dev);
>>
>> According to the doc, the DPM_FLAG_LEAVE_SUSPENDED intends to leave
>> the device in runtime suspend state during system resume.
>> However, here you are actually trying to change its runtime PM state to that.
>
> So the doc needs to be fixed. :-)

Yep.

>
> But I'm guessing that this just is a misunderstanding and you mean the
> phrase "it may be desirable to leave some devices in runtime suspend
> after [...]".  Yes, it is talking about "runtime suspend", but
> actually "runtime suspend" is the only kind of "suspend" you can leave
> a device in after a system transition to the working state.  It never
> says that the device must have been suspended before the preceding
> system transition into a sleep state started.

My point is, it isn't obvious why you need to make sure the device's
runtime PM status is set to "RPM_SUSPENDED" when leaving the resume
noirq phase. You did explain that somewhat above, thanks!

Perhaps you could fold in some of that information into the doc as well?

>
>> Moreover, you should check the return value from
>> pm_runtime_set_suspended().
>
> This is in "noirq", so failures of that are meaningless here.
>
>> Then I wonder, what should you do when it fails here?
>>
>> Perhaps a better idea is to do this in the noirq suspend phase,
>> because it allows you to bail out in case pm_runtime_set_suspended()
>> fails.
>
> This doesn't make sense, sorry.

What do you mean by "failures of that are meaningless here."?

I was suggesting, instead of calling pm_runtime_set_suspended() in the
noirq *resume* phase, why can't you do that in the noirq *suspend*
phase?

In the noirq *suspend* phase it's not too late to deal with errors!? Or is it?

[...]

Kind regards
Uffe


* Re: [PATCH v2 1/6] PM / core: Add LEAVE_SUSPENDED driver flag
  2017-11-14 16:07           ` Ulf Hansson
@ 2017-11-15  1:48             ` Rafael J. Wysocki
  2017-11-16 10:18               ` Ulf Hansson
  0 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-15  1:48 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J. Wysocki, Linux PM, Bjorn Helgaas, Alan Stern,
	Greg Kroah-Hartman, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Andy Shevchenko,
	Kevin Hilman

On Tuesday, November 14, 2017 5:07:59 PM CET Ulf Hansson wrote:
> On 11 November 2017 at 00:45, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > On Fri, Nov 10, 2017 at 10:09 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >> On 8 November 2017 at 14:25, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >>>
> >>> Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to
> >>> instruct the PM core and middle-layer (bus type, PM domain, etc.)
> >>> code that it is desirable to leave the device in runtime suspend
> >>> after system-wide transitions to the working state (for example,
> >>> the device may be slow to resume and it may be better to avoid
> >>> resuming it right away).
> >>>
> >>> Generally, the middle-layer code involved in the handling of the
> >>> device is expected to indicate to the PM core whether or not the
> >>> device may be left in suspend with the help of the device's
> >>> power.may_skip_resume status bit.  That has to happen in the "noirq"
> >>> phase of the preceding system suspend (or analogous) transition.
> >>> The middle layer is then responsible for handling the device as
> >>> appropriate in its "noirq" resume callback which is executed
> >>> regardless of whether or not the device may be left suspended, but
> >>> the other resume callbacks (except for ->complete) will be skipped
> >>> automatically by the core if the device really can be left in
> >>> suspend.
> >>
> >> I don't understand the reason why you need to skip invoking resume
> >> callbacks to achieve this behavior; could you elaborate on that?
> >
> > The reason why it is done this way is because that takes less code and
> > is easier (or at least less error-prone, because it avoids repeating
> > patterns in middle layers).
> >
> > Note that the callbacks may only be skipped by the core if the middle
> > layer has set power.may_skip_resume for the device (or if the core is
> > handling it in patch [5/6], but that's one more step ahead still).
> >
> >> Couldn't the PM domain or the middle-layer instead decide what to do?
> >
> > They still can, the whole thing is a total opt-in.
> >
> > But to be constructive, do you have any specific examples in mind?
> 
> See more below.
> 
> >
> >> To me it sounds a bit prone to errors by skipping callbacks from the
> >> PM core, and I wonder if the general driver author will be able to
> >> understand how to use this flag properly.
> >
> > This has nothing to do with general driver authors and I'm not sure
> > what you mean here and where you are going with this.
> 
> Let me elaborate.
> 
> My general goal is that I want to make it easier (or as easy as
> possible) for the general driver author to deploy runtime PM and
> system-wide PM support - in an optimized manner. Therefore, I am
> pondering over the solution you picked in this series, trying to
> understand how it fits into those aspects.
> 
> In particular, I am a bit worried, from a complexity point of view, about
> the part with skipping callbacks from the PM core. We have observed
> some difficulties with the direct_complete path (i2c dw driver), which
> is based on a similar approach to this one.

These are resume callbacks, not suspend callbacks.  Also not all of them
are skipped.  That is quite a bit different from skipping *all* callbacks.

Moreover, at the point the core decides to skip the callbacks, the device
*has* *to* be left suspended and there simply is no point in running them
no matter what.

That part of code can be trivially moved to middle layers, but then each
of them will have to do exactly the same thing.  I don't see any reason to
do that and I'm not finding one in your comments.  Sorry.

> Additionally, in this case, for callbacks to be skipped, first, drivers
> need to inform the middle layer; second, the middle layer acts on that
> information and then informs the PM core; and third, the PM core can
> decide what to do. It doesn't sound straightforward.

It really doesn't work like that.

First, the driver sets the LEAVE_SUSPENDED flag for the core to consume.
The middle layers don't have to look at it at all.

Second, each middle layer sets power.may_skip_resume for devices whose
state after system suspend should match the runtime suspend state.  The
middle layer must know that this is the case to set that bit.  [The core
effectively does that part for devices handled by it directly in patch
[5/6].]

The core then takes the LEAVE_SUSPENDED flags, power.may_skip_resume bits,
status of the children and consumers into account in order to produce the
power.must_resume bits and those are used (later) to decide whether or not
to resume the devices.  That decision is made by the core and so the core
acts on it and the middle layers must follow.
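
To make the three steps above concrete, here is a minimal userspace model of how the inputs combine into power.must_resume at the end of the "noirq" suspend phase. It is a simplified sketch, not the kernel code: struct sketch_dev and sketch_suspend_noirq() are made-up names, a single parent pointer stands in for parents and suppliers, and the propagation step is an assumption about how "status of the children and consumers" is taken into account.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the relevant per-device state. */
struct sketch_dev {
	struct sketch_dev *parent;	/* stands in for parents and suppliers */
	bool leave_suspended;		/* driver set DPM_FLAG_LEAVE_SUSPENDED */
	bool may_skip_resume;		/* set by the middle layer (or the core) */
	int usage_count;		/* runtime PM usage counter */
	bool must_resume;		/* result produced by the core */
};

/*
 * Model of the end of the "noirq" suspend step for one device.  Children
 * are expected to be handled before their parents, as in the real suspend
 * ordering.
 */
static void sketch_suspend_noirq(struct sketch_dev *dev)
{
	assert(dev != NULL);

	if (dev->leave_suspended)
		dev->must_resume = dev->must_resume ||
				   !dev->may_skip_resume ||
				   dev->usage_count != 0;
	else
		dev->must_resume = true;

	/* Assumption: a device that must resume needs its parent resumed too. */
	if (dev->must_resume && dev->parent)
		dev->parent->must_resume = true;
}
```

In this model the driver flag and the middle layer's bit are both only inputs; the decision itself is the must_resume value the core computes, which is why a parent that opted in may still have to be resumed because of a child that did not.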

> I guess I need to be convinced that this new approach is going to be
> better than the direct_complete path, so it somehow can replace it
> along the road. Otherwise, we may end up just having yet another way
> of skipping callbacks in the PM core and I don't like that.

Well, this works the other way around this time, I'm afraid.  At this point
you need to convince me that the approach has real issues. :-)

> Of course, I also realize this whole thing is opt-in, so nothing will
> break and we are all good. :-)
> 
> >
> >> That said, as the series doesn't include any changes for drivers making
> >> use of the flag, could you please fold in such a change, as it would
> >> provide a more complete picture?
> >
> > I've already done so, see https://patchwork.kernel.org/patch/10007349/
> >
> > IMHO it's not really useful to drag this stuff (which doesn't change
> > BTW) along with every iteration of the core patches.
> 
> Well, to me it's useful because it shows how these flags can/will be used.
> 
> Anyway, I thought you had scrapped that patch and were working on a new
> version. I will have a look then.
> 
> [...]
> 
> >>>   * device_resume_noirq - Execute a "noirq resume" callback for given device.
> >>>   * @dev: Device to handle.
> >>>   * @state: PM transition of the system being carried out.
> >>> @@ -575,6 +587,12 @@ static int device_resume_noirq(struct de
> >>>         error = dpm_run_callback(callback, dev, state, info);
> >>>         dev->power.is_noirq_suspended = false;
> >>>
> >>> +       if (dev_pm_may_skip_resume(dev)) {
> >>> +               pm_runtime_set_suspended(dev);
> >>
> >> According to the doc, the DPM_FLAG_LEAVE_SUSPENDED intends to leave
> >> the device in runtime suspend state during system resume.
> >> However, here you are actually trying to change its runtime PM state to that.
> >
> > So the doc needs to be fixed. :-)
> 
> Yep.
> 
> >
> > But I'm guessing that this just is a misunderstanding and you mean the
> > phrase "it may be desirable to leave some devices in runtime suspend
> > after [...]".  Yes, it is talking about "runtime suspend", but
> > actually "runtime suspend" is the only kind of "suspend" you can leave
> > a device in after a system transition to the working state.  It never
> > says that the device must have been suspended before the preceding
> > system transition into a sleep state started.
> 
> My point is, it isn't obvious why you need to make sure the device's
> runtime PM status is set to "RPM_SUSPENDED" when leaving the resume
> noirq phase. You did explain that somewhat above, thanks!
> 
> Perhaps you could fold in some of that information into the doc as well?

The doc doesn't describe the design of the code, though.

I guess I'll just add a comment at the point where the status changes.

> >
> >> Moreover, you should check the return value from
> >> pm_runtime_set_suspended().
> >
> > This is in "noirq", so failures of that are meaningless here.
> >
> >> Then I wonder, what should you do when it fails here?
> >>
> >> Perhaps a better idea is to do this in the noirq suspend phase,
> >> because it allows you to bail out in case pm_runtime_set_suspended()
> >> fails.
> >
> > This doesn't make sense, sorry.
> 
> What do you mean by "failures of that are meaningless here."?

If all devices have runtime PM disabled, pm_runtime_set_suspended() should
just do what it is asked for unless called with an invalid argument or
similar.

> I was suggesting, instead of calling pm_runtime_set_suspended() in the
> noirq *resume* phase, why can't you do that in the noirq *suspend*
> phase?
> 
> In the noirq *suspend* phase it's not too late to deal with errors!? Or is it?

At that point it has not been decided whether or not the devices will stay
suspended yet.  The status cannot be changed before making that decision,
which only happens in the noirq resume phase.

Thanks,
Rafael


* Re: [PATCH v3 4/6] PM / core: Add helpers for subsystem callback selection
  2017-11-12  0:42       ` [PATCH v3 4/6] PM / core: Add helpers for subsystem callback selection Rafael J. Wysocki
@ 2017-11-15  7:43         ` Ulf Hansson
  2017-11-15 17:55           ` Rafael J. Wysocki
  0 siblings, 1 reply; 135+ messages in thread
From: Ulf Hansson @ 2017-11-15  7:43 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman

On 12 November 2017 at 01:42, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Add helper routines to find and return a suitable subsystem callback
> during the "noirq" phases of system suspend/resume (or analogous)
> transitions as well as during the "late" phase of system suspend and
> the "early" phase of system resume (or analogous) transitions.
>
> The helpers will be called from additional sites going forward.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

With a minor nitpick, see below, feel free to add:

Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

> ---
>
> v2 -> v3: No changes.
>
> ---
>  drivers/base/power/main.c |  196 +++++++++++++++++++++++++++++++---------------
>  1 file changed, 136 insertions(+), 60 deletions(-)
>
> Index: linux-pm/drivers/base/power/main.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/main.c
> +++ linux-pm/drivers/base/power/main.c
> @@ -525,6 +525,14 @@ static void dpm_watchdog_clear(struct dp
>  #define dpm_watchdog_clear(x)
>  #endif
>
> +static pm_callback_t dpm_subsys_suspend_noirq_cb(struct device *dev,
> +                                                pm_message_t state,
> +                                                const char **info_p);
> +
> +static pm_callback_t dpm_subsys_suspend_late_cb(struct device *dev,
> +                                               pm_message_t state,
> +                                               const char **info_p);
> +

There is no need to declare these functions.

Perhaps a following patch in the series needs them, but then that
change should add these or, even better (in my opinion), just move the
implementations and avoid the declarations altogether.

[...]

Kind regards
Uffe


* Re: [PATCH v3 4/6] PM / core: Add helpers for subsystem callback selection
  2017-11-15  7:43         ` Ulf Hansson
@ 2017-11-15 17:55           ` Rafael J. Wysocki
  0 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-15 17:55 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J. Wysocki, Linux PM, Bjorn Helgaas, Alan Stern,
	Greg Kroah-Hartman, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Andy Shevchenko,
	Kevin Hilman

On Wed, Nov 15, 2017 at 8:43 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On 12 November 2017 at 01:42, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> Add helper routines to find and return a suitable subsystem callback
>> during the "noirq" phases of system suspend/resume (or analogous)
>> transitions as well as during the "late" phase of system suspend and
>> the "early" phase of system resume (or analogous) transitions.
>>
>> The helpers will be called from additional sites going forward.
>>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> With a minor nitpick, see below, feel free to add:
>
> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
>
>> ---
>>
>> v2 -> v3: No changes.
>>
>> ---
>>  drivers/base/power/main.c |  196 +++++++++++++++++++++++++++++++---------------
>>  1 file changed, 136 insertions(+), 60 deletions(-)
>>
>> Index: linux-pm/drivers/base/power/main.c
>> ===================================================================
>> --- linux-pm.orig/drivers/base/power/main.c
>> +++ linux-pm/drivers/base/power/main.c
>> @@ -525,6 +525,14 @@ static void dpm_watchdog_clear(struct dp
>>  #define dpm_watchdog_clear(x)
>>  #endif
>>
>> +static pm_callback_t dpm_subsys_suspend_noirq_cb(struct device *dev,
>> +                                                pm_message_t state,
>> +                                                const char **info_p);
>> +
>> +static pm_callback_t dpm_subsys_suspend_late_cb(struct device *dev,
>> +                                               pm_message_t state,
>> +                                               const char **info_p);
>> +
>
> There is no need to declare these functions.
>
> Perhaps a following patch in the series needs them, but then that
> change should add these or, even better (in my opinion), just move the
> implementations and avoid the declarations altogether.

Well, all of the changes in this patch are for the benefit of the
subsequent patches. :-)

I just wanted to move the additional code churn noise out of those patches.

Thanks,
Rafael


* Re: [PATCH v2 1/6] PM / core: Add LEAVE_SUSPENDED driver flag
  2017-11-15  1:48             ` Rafael J. Wysocki
@ 2017-11-16 10:18               ` Ulf Hansson
  0 siblings, 0 replies; 135+ messages in thread
From: Ulf Hansson @ 2017-11-16 10:18 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linux PM, Bjorn Helgaas, Alan Stern,
	Greg Kroah-Hartman, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Andy Shevchenko,
	Kevin Hilman

[...]

>>
>> My general goal is that I want to make it easier (or as easy as
>> possible) for the general driver author to deploy runtime PM and
>> system-wide PM support - in an optimized manner. Therefore, I am
>> pondering over the solution you picked in this series, trying to
>> understand how it fits into those aspects.
>>
>> In particular, I am a bit worried, from a complexity point of view, about
>> the part with skipping callbacks from the PM core. We have observed
>> some difficulties with the direct_complete path (i2c dw driver), which
>> is based on a similar approach to this one.
>
> These are resume callbacks, not suspend callbacks.  Also not all of them
> are skipped.  That is quite a bit different from skipping *all* callbacks.
>
> Moreover, at the point the core decides to skip the callbacks, the device
> *has* *to* be left suspended and there simply is no point in running them
> no matter what.
>
> That part of code can be trivially moved to middle layers, but then each
> of them will have to do exactly the same thing.  I don't see any reason to
> do that and I'm not finding one in your comments.  Sorry.

I think my concerns boil down to wondering how useful it
will be, in general, to enable the core to skip invoking resume
callbacks.

Although, if you are targeting some specific devices/drivers (ACPI,
PCI, etc.) and do not care that much about flexibility, then I am fine
with it. The approach seems to work.

Let me elaborate on that comment a bit.

1)
Skipping resume callbacks is not going to work for a device that may
be attached to the generic PM domain.

Well, in principle one could try to re-work genpd to cope with this
behavior, I guess, but that would also mean genpd becomes limited to
always using the noirq callbacks to power on/off the PM domain. That
isn't an acceptable limitation.

2)
Because of 1), this leads to those cross-SoC drivers, dealing with
devices which sometimes may have a genpd attached and sometimes an
ACPI PM domain attached. I guess those drivers would need to have a
different set of system-wide PM callbacks, depending on the PM domain
the device is attached to, so as to achieve a similar optimized behavior
during system resume. Or some other cleverness to deal with
system-wide PM.

Perhaps we can ignore both 1) and 2), because the number of cross-SoC
drivers having these issues should be rather limited!?

3)
There are certainly lots of drivers that can cope with their devices
remaining in runtime suspend during system resume.
Some of these drivers, though, may have additional operations to
carry out during resume that do not require resuming (activating)
the device. For example, the driver may need to resume a queue,
re-configure an out-of-band wakeup (GPIO IRQ), re-configure pinctrls,
etc.

These drivers can't use the method behind LEAVE_SUSPENDED, because
they need their resume callbacks to be invoked.
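
To make case 3) concrete, here is a rough user-space sketch of such a
driver. Everything in it (struct fake_dev, fake_driver_resume() and the
field names) is made up for illustration and is not taken from the patch
or from any real driver. It only shows a resume callback that has work to
do even though the hardware itself stays in runtime suspend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver state; none of these names come from the patch. */
struct fake_dev {
	bool runtime_suspended;		/* device was left in runtime suspend */
	bool queue_running;		/* software request queue */
	bool wakeup_reconfigured;	/* out-of-band wakeup (e.g. GPIO IRQ) */
	bool hw_active;			/* hardware actually powered up */
};

/* Made-up system resume callback of such a driver. */
static int fake_driver_resume(struct fake_dev *d)
{
	assert(d != NULL);

	/* Software-only work that is needed in either case. */
	d->wakeup_reconfigured = true;
	d->queue_running = true;

	/* The device may stay suspended; nothing else to do then. */
	if (d->runtime_suspended)
		return 0;

	/* Otherwise bring the hardware back as usual. */
	d->hw_active = true;
	return 0;
}
```

If the core skipped this callback, the queue and the wakeup setup would
never be restored, which is the limitation described above.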

[...]

>
> Well, this works the other way around this time, I'm afraid.  At this point
> you need to convince me that the approach has real issues. :-)

I think I have pointed out some issues above. Feel free to ignore
them, depending on what your target is.

[...]

>> >> Perhaps a better idea is to do this in the noirq suspend phase,
>> >> because it allows you to bail out in case pm_runtime_set_suspended()
>> >> fails.
>> >
>> > This doesn't make sense, sorry.
>>
>> What do you mean by "failures of that are meaningless here."?
>
> If all devices have runtime PM disabled, pm_runtime_set_suspended() should
> just do what it is asked for unless called with an invalid argument or
> similar.

Yes.

>
>> I was suggesting, instead of calling pm_runtime_set_suspended() in the
>> noirq *resume* phase, why can't you do that in the noirq *suspend*
>> phase?
>>
>> In the noirq *suspend* phase it's not too late to deal with errors!? Or is it?
>
> At that point it has not been decided whether or not the devices will stay
> suspended yet.  The status cannot be changed before making that decision,
> which only happens in the noirq resume phase.

Okay, then it's fine as is (because of your other patch to the runtime
PM core, which changes the behavior for pm_runtime_set_suspended()).

BTW, I have some additional minor comments on some other parts of the
code, but I will start over with a new thread providing you with those
comments.

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v3 1/6] PM / core: Add LEAVE_SUSPENDED driver flag
  2017-11-12  0:37       ` [PATCH v3 1/6] PM / core: Add LEAVE_SUSPENDED driver flag Rafael J. Wysocki
@ 2017-11-16 15:10         ` Ulf Hansson
  2017-11-16 23:07           ` Rafael J. Wysocki
  0 siblings, 1 reply; 135+ messages in thread
From: Ulf Hansson @ 2017-11-16 15:10 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman

On 12 November 2017 at 01:37, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to
> instruct the PM core and middle-layer (bus type, PM domain, etc.)
> code that it is desirable to leave the device in runtime suspend
> after system-wide transitions to the working state (for example,
> the device may be slow to resume and it may be better to avoid
> resuming it right away).
>
> Generally, the middle-layer code involved in the handling of the
> device is expected to indicate to the PM core whether or not the
> device may be left in suspend with the help of the device's
> power.may_skip_resume status bit.  That has to happen in the "noirq"
> phase of the preceding system suspend (or analogous) transition.
> The middle layer is then responsible for handling the device as
> appropriate in its "noirq" resume callback which is executed
> regardless of whether or not the device may be left suspended, but
> the other resume callbacks (except for ->complete) will be skipped
> automatically by the core if the device really can be left in
> suspend.
>
> The additional power.must_resume status bit introduced for the
> implementation of this mechanism is used internally by the PM core
> to track the requirement to resume the device (which may depend on
> its children etc).
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
>
> v2 -> v3: Take dev->power.usage_count when updating power.must_resume in
>           __device_suspend_noirq().
>
> ---
>  Documentation/driver-api/pm/devices.rst |   24 ++++++++++-
>  drivers/base/power/main.c               |   66 +++++++++++++++++++++++++++++---
>  drivers/base/power/runtime.c            |    9 ++--
>  include/linux/pm.h                      |   14 +++++-
>  include/linux/pm_runtime.h              |    9 ++--
>  5 files changed, 104 insertions(+), 18 deletions(-)
>
> Index: linux-pm/include/linux/pm.h
> ===================================================================
> --- linux-pm.orig/include/linux/pm.h
> +++ linux-pm/include/linux/pm.h
> @@ -559,6 +559,7 @@ struct pm_subsys_data {
>   * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
>   * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
>   * SMART_SUSPEND: No need to resume the device from runtime suspend.
> + * LEAVE_SUSPENDED: Avoid resuming the device during system resume if possible.
>   *
>   * Setting SMART_PREPARE instructs bus types and PM domains which may want
>   * system suspend/resume callbacks to be skipped for the device to return 0 from
> @@ -572,10 +573,14 @@ struct pm_subsys_data {
>   * necessary from the driver's perspective.  It also may cause them to skip
>   * invocations of the ->suspend_late and ->suspend_noirq callbacks provided by
>   * the driver if they decide to leave the device in runtime suspend.
> + *
> + * Setting LEAVE_SUSPENDED informs the PM core and middle-layer code that the
> + * driver prefers the device to be left in runtime suspend after system resume.
>   */

Question: Can LEAVE_SUSPENDED and NEVER_SKIP be a valid combination? I
guess not!? Should we validate for wrong combinations?

[...]

>  /**
>   * __device_suspend_noirq - Execute a "noirq suspend" callback for given device.
>   * @dev: Device to handle.
> @@ -1127,10 +1161,28 @@ static int __device_suspend_noirq(struct
>         }
>
>         error = dpm_run_callback(callback, dev, state, info);
> -       if (!error)
> -               dev->power.is_noirq_suspended = true;
> -       else
> +       if (error) {
>                 async_error = error;
> +               goto Complete;
> +       }
> +
> +       dev->power.is_noirq_suspended = true;
> +
> +       if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED)) {
> +               /*
> +                * The only safe strategy here is to require that if the device
> +                * may not be left in suspend, resume callbacks must be invoked
> +                * for it.
> +                */
> +               dev->power.must_resume = dev->power.must_resume ||
> +                                       !dev->power.may_skip_resume ||
> +                                       atomic_read(&dev->power.usage_count);

dev->power.usage_count is always > 0 at this point, meaning that
dev->power.must_resume always becomes true. :-)

You should rather use "atomic_read(&dev->power.usage_count) > 1".
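
A small user-space model of the point above, as a sketch only. The struct
and function names are hypothetical stand-ins for the relevant bits of
dev->power, and the model assumes, as stated in the comment above, that
the PM core already holds one runtime PM reference on the device when the
"noirq" suspend code runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant bits of dev->power. */
struct pm_model {
	bool must_resume;
	bool may_skip_resume;
	int usage_count;	/* includes the reference held by the PM core */
};

/* Logic as posted in v3: any nonzero usage_count forces must_resume. */
static bool must_resume_v3(const struct pm_model *p)
{
	return p->must_resume || !p->may_skip_resume || p->usage_count;
}

/* Suggested change: do not count the PM core's own reference. */
static bool must_resume_fixed(const struct pm_model *p)
{
	/* Assumption of this model: the core's reference is always there. */
	assert(p->usage_count >= 1);

	return p->must_resume || !p->may_skip_resume || p->usage_count > 1;
}
```

With may_skip_resume set and usage_count equal to 1, must_resume_v3()
still returns true, whereas must_resume_fixed() returns false and only
reports true once somebody else holds an additional reference.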

> +       } else {
> +               dev->power.must_resume = true;
> +       }
> +
> +       if (dev->power.must_resume)
> +               dpm_superior_set_must_resume(dev);
>
>  Complete:
>         complete_all(&dev->power.completion);
> @@ -1487,6 +1539,9 @@ static int __device_suspend(struct devic
>                 dev->power.direct_complete = false;
>         }
>
> +       dev->power.may_skip_resume = false;
> +       dev->power.must_resume = false;
> +

First, these assignments could be bypassed if the direct_complete path
is used. Perhaps it's more robust to reset these flags already in
device_prepare().

Second, have you considered setting the default value of
dev->power.may_skip_resume to true? That would mean the subsystem
instead needs to implement an opt-out method. I am thinking that it may
not be an issue, since at this point we don't have any drivers using
the LEAVE_SUSPENDED flag anyway.

[...]

> +However, it may be desirable to leave some devices in runtime suspend after
> +system transitions to the working state and device drivers can use the
> +``DPM_FLAG_LEAVE_SUSPENDED`` flag to indicate to the PM core (and middle-layer
> +code) that this is the case.  Whether or not the devices will actually be left
> +in suspend may depend on their state before the given system suspend-resume
> +cycle and on the type of the system transition under way.  In particular,
> +devices are not left suspended if that transition is a restore from hibernation,
> +as device states are not guaranteed to be reflected by the information stored in
> +the hibernation image in that case.
> +
> +The middle-layer code involved in the handling of the device has to indicate to
> +the PM core if the device may be left in suspend with the help of its
> +:c:member:`power.may_skip_resume` status bit.  That has to happen in the "noirq"
> +phase of the preceding system-wide suspend (or analogous) transition.  The

Does it have to be managed in the "noirq" phase? Wouldn't it be
perfectly okay to do this in the suspend and suspend_late phases as well?

> +middle layer is then responsible for handling the device as appropriate in its
> +"noirq" resume callback, which is executed regardless of whether or not the
> +device may be left suspended, but the other resume callbacks (except for
> +``->complete``) will be skipped automatically by the PM core if the device
> +really can be left in suspend.
>

Kind regards
Uffe


* Re: [PATCH v3 1/6] PM / core: Add LEAVE_SUSPENDED driver flag
  2017-11-16 15:10         ` Ulf Hansson
@ 2017-11-16 23:07           ` Rafael J. Wysocki
  2017-11-17  6:11             ` Ulf Hansson
  2017-11-17 12:45             ` Rafael J. Wysocki
  0 siblings, 2 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-16 23:07 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman

On Thursday, November 16, 2017 4:10:16 PM CET Ulf Hansson wrote:
> On 12 November 2017 at 01:37, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to
> > instruct the PM core and middle-layer (bus type, PM domain, etc.)
> > code that it is desirable to leave the device in runtime suspend
> > after system-wide transitions to the working state (for example,
> > the device may be slow to resume and it may be better to avoid
> > resuming it right away).
> >
> > Generally, the middle-layer code involved in the handling of the
> > device is expected to indicate to the PM core whether or not the
> > device may be left in suspend with the help of the device's
> > power.may_skip_resume status bit.  That has to happen in the "noirq"
> > phase of the preceding system suspend (or analogous) transition.
> > The middle layer is then responsible for handling the device as
> > appropriate in its "noirq" resume callback which is executed
> > regardless of whether or not the device may be left suspended, but
> > the other resume callbacks (except for ->complete) will be skipped
> > automatically by the core if the device really can be left in
> > suspend.
> >
> > The additional power.must_resume status bit introduced for the
> > implementation of this mechanism is used internally by the PM core
> > to track the requirement to resume the device (which may depend on
> > its children etc).
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > ---
> >
> > v2 -> v3: Take dev->power.usage_count when updating power.must_resume in
> >           __device_suspend_noirq().
> >
> > ---
> >  Documentation/driver-api/pm/devices.rst |   24 ++++++++++-
> >  drivers/base/power/main.c               |   66 +++++++++++++++++++++++++++++---
> >  drivers/base/power/runtime.c            |    9 ++--
> >  include/linux/pm.h                      |   14 +++++-
> >  include/linux/pm_runtime.h              |    9 ++--
> >  5 files changed, 104 insertions(+), 18 deletions(-)
> >
> > Index: linux-pm/include/linux/pm.h
> > ===================================================================
> > --- linux-pm.orig/include/linux/pm.h
> > +++ linux-pm/include/linux/pm.h
> > @@ -559,6 +559,7 @@ struct pm_subsys_data {
> >   * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
> >   * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
> >   * SMART_SUSPEND: No need to resume the device from runtime suspend.
> > + * LEAVE_SUSPENDED: Avoid resuming the device during system resume if possible.
> >   *
> >   * Setting SMART_PREPARE instructs bus types and PM domains which may want
> >   * system suspend/resume callbacks to be skipped for the device to return 0 from
> > @@ -572,10 +573,14 @@ struct pm_subsys_data {
> >   * necessary from the driver's perspective.  It also may cause them to skip
> >   * invocations of the ->suspend_late and ->suspend_noirq callbacks provided by
> >   * the driver if they decide to leave the device in runtime suspend.
> > + *
> > + * Setting LEAVE_SUSPENDED informs the PM core and middle-layer code that the
> > + * driver prefers the device to be left in runtime suspend after system resume.
> >   */
> 
> Question: Can LEAVE_SUSPENDED and NEVER_SKIP be valid combination? I
> guess not!? Should we validate for wrong combinations?

Why not?  There's no real overlap between them.

> 
> [...]
> 
> >  /**
> >   * __device_suspend_noirq - Execute a "noirq suspend" callback for given device.
> >   * @dev: Device to handle.
> > @@ -1127,10 +1161,28 @@ static int __device_suspend_noirq(struct
> >         }
> >
> >         error = dpm_run_callback(callback, dev, state, info);
> > -       if (!error)
> > -               dev->power.is_noirq_suspended = true;
> > -       else
> > +       if (error) {
> >                 async_error = error;
> > +               goto Complete;
> > +       }
> > +
> > +       dev->power.is_noirq_suspended = true;
> > +
> > +       if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED)) {
> > +               /*
> > +                * The only safe strategy here is to require that if the device
> > +                * may not be left in suspend, resume callbacks must be invoked
> > +                * for it.
> > +                */
> > +               dev->power.must_resume = dev->power.must_resume ||
> > +                                       !dev->power.may_skip_resume ||
> > +                                       atomic_read(&dev->power.usage_count);
> 
> dev->power.usage_count is always > 0 at this point, meaning that
> dev->power.must_resume always becomes true. :-)
> 
> You should rather use "atomic_read(&dev->power.usage_count) > 1".

Right, thanks.  I tend to forget about that.

> > +       } else {
> > +               dev->power.must_resume = true;
> > +       }
> > +
> > +       if (dev->power.must_resume)
> > +               dpm_superior_set_must_resume(dev);
> >
> >  Complete:
> >         complete_all(&dev->power.completion);
> > @@ -1487,6 +1539,9 @@ static int __device_suspend(struct devic
> >                 dev->power.direct_complete = false;
> >         }
> >
> > +       dev->power.may_skip_resume = false;
> > +       dev->power.must_resume = false;
> > +
> 
> First, these assignment could be bypassed if the direct_complete path
> is used. Perhaps it's more robust to reset these flags already in
> device_prepare().

In the direct-complete case may_skip_resume doesn't matter.

must_resume should be set to "false", however, so that parents of
direct-complete devices may be left in suspend (in case they don't
fall under direct-complete themselves), so good catch.

But it is sufficient to do that before the power.direct_complete check above. :-)

> Second, have you considered setting the default value of
> dev->power.may_skip_resume to true?

Yes.

> That would means the subsystem
> instead need to implement an opt-out method. I am thinking that it may
> not be an issue, since we anyway at this point, don't have drivers
> using the LEAVE_SUSPENDED flag.

Opt-out doesn't work because of the need to invoke the "noirq" callbacks.

> [...]
> 
> > +However, it may be desirable to leave some devices in runtime suspend after
> > +system transitions to the working state and device drivers can use the
> > +``DPM_FLAG_LEAVE_SUSPENDED`` flag to indicate to the PM core (and middle-layer
> > +code) that this is the case.  Whether or not the devices will actually be left
> > +in suspend may depend on their state before the given system suspend-resume
> > +cycle and on the type of the system transition under way.  In particular,
> > +devices are not left suspended if that transition is a restore from hibernation,
> > +as device states are not guaranteed to be reflected by the information stored in
> > +the hibernation image in that case.
> > +
> > +The middle-layer code involved in the handling of the device has to indicate to
> > +the PM core if the device may be left in suspend with the help of its
> > +:c:member:`power.may_skip_resume` status bit.  That has to happen in the "noirq"
> > +phase of the preceding system-wide suspend (or analogous) transition.  The
> 
> Does it have to be managed in the "noirq" phase? Wouldn't be perfectly
> okay do this in the suspend and suspend_late phases as well?

The wording is slightly misleading I think.

In fact technically may_skip_resume may be set earlier, but the core checks it
in the "noirq" phase only anyway.

Thanks,
Rafael


* Re: [PATCH v3 1/6] PM / core: Add LEAVE_SUSPENDED driver flag
  2017-11-16 23:07           ` Rafael J. Wysocki
@ 2017-11-17  6:11             ` Ulf Hansson
  2017-11-17 13:18               ` Rafael J. Wysocki
  2017-11-17 12:45             ` Rafael J. Wysocki
  1 sibling, 1 reply; 135+ messages in thread
From: Ulf Hansson @ 2017-11-17  6:11 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman

[...]

>> > +++ linux-pm/include/linux/pm.h
>> > @@ -559,6 +559,7 @@ struct pm_subsys_data {
>> >   * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
>> >   * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
>> >   * SMART_SUSPEND: No need to resume the device from runtime suspend.
>> > + * LEAVE_SUSPENDED: Avoid resuming the device during system resume if possible.
>> >   *
>> >   * Setting SMART_PREPARE instructs bus types and PM domains which may want
>> >   * system suspend/resume callbacks to be skipped for the device to return 0 from
>> > @@ -572,10 +573,14 @@ struct pm_subsys_data {
>> >   * necessary from the driver's perspective.  It also may cause them to skip
>> >   * invocations of the ->suspend_late and ->suspend_noirq callbacks provided by
>> >   * the driver if they decide to leave the device in runtime suspend.
>> > + *
>> > + * Setting LEAVE_SUSPENDED informs the PM core and middle-layer code that the
>> > + * driver prefers the device to be left in runtime suspend after system resume.
>> >   */
>>
>> Question: Can LEAVE_SUSPENDED and NEVER_SKIP be valid combination? I
>> guess not!? Should we validate for wrong combinations?
>
> Why not?  There's no real overlap between them.

Except that NEVER_SKIP, documentation-wise, tells you that your
suspend and resume callbacks will never be skipped. :-)

[...]

>> Second, have you considered setting the default value of
>> dev->power.may_skip_resume to true?
>
> Yes.
>
>> That would means the subsystem
>> instead need to implement an opt-out method. I am thinking that it may
>> not be an issue, since we anyway at this point, don't have drivers
>> using the LEAVE_SUSPENDED flag.
>
> Opt-out doesn't work because of the need to invoke the "noirq" callbacks.

I am not sure I follow that.

Whatever needs to be fixed on the subsystem level, that could be done
before the driver starts using the LEAVE_SUSPENDED flag. No?

>
>> [...]
>>
>> > +However, it may be desirable to leave some devices in runtime suspend after
>> > +system transitions to the working state and device drivers can use the
>> > +``DPM_FLAG_LEAVE_SUSPENDED`` flag to indicate to the PM core (and middle-layer
>> > +code) that this is the case.  Whether or not the devices will actually be left
>> > +in suspend may depend on their state before the given system suspend-resume
>> > +cycle and on the type of the system transition under way.  In particular,
>> > +devices are not left suspended if that transition is a restore from hibernation,
>> > +as device states are not guaranteed to be reflected by the information stored in
>> > +the hibernation image in that case.
>> > +
>> > +The middle-layer code involved in the handling of the device has to indicate to
>> > +the PM core if the device may be left in suspend with the help of its
>> > +:c:member:`power.may_skip_resume` status bit.  That has to happen in the "noirq"
>> > +phase of the preceding system-wide suspend (or analogous) transition.  The
>>
>> Does it have to be managed in the "noirq" phase? Wouldn't be perfectly
>> okay do this in the suspend and suspend_late phases as well?
>
> The wording is slightly misleading I think.
>
> In fact technically may_skip_resume may be set earlier, but the core checks it
> in the "noirq" phase only anyway.

Yeah, okay.

Kind regards
Uffe


* Re: [PATCH v3 1/6] PM / core: Add LEAVE_SUSPENDED driver flag
  2017-11-16 23:07           ` Rafael J. Wysocki
  2017-11-17  6:11             ` Ulf Hansson
@ 2017-11-17 12:45             ` Rafael J. Wysocki
  1 sibling, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-17 12:45 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman, Rafael J. Wysocki

On Fri, Nov 17, 2017 at 12:07 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Thursday, November 16, 2017 4:10:16 PM CET Ulf Hansson wrote:
>> On 12 November 2017 at 01:37, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> >
>> > Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to
>> > instruct the PM core and middle-layer (bus type, PM domain, etc.)
>> > code that it is desirable to leave the device in runtime suspend
>> > after system-wide transitions to the working state (for example,
>> > the device may be slow to resume and it may be better to avoid
>> > resuming it right away).
>> >
>> > Generally, the middle-layer code involved in the handling of the
>> > device is expected to indicate to the PM core whether or not the
>> > device may be left in suspend with the help of the device's
>> > power.may_skip_resume status bit.  That has to happen in the "noirq"
>> > phase of the preceding system suspend (or analogous) transition.
>> > The middle layer is then responsible for handling the device as
>> > appropriate in its "noirq" resume callback which is executed
>> > regardless of whether or not the device may be left suspended, but
>> > the other resume callbacks (except for ->complete) will be skipped
>> > automatically by the core if the device really can be left in
>> > suspend.
>> >
>> > The additional power.must_resume status bit introduced for the
>> > implementation of this mechanism is used internally by the PM core
>> > to track the requirement to resume the device (which may depend on
>> > its children etc).
>> >
>> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> > Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> > ---
>> >
>> > v2 -> v3: Take dev->power.usage_count when updating power.must_resume in
>> >           __device_suspend_noirq().
>> >
>> > ---

[...]

>> > +       } else {
>> > +               dev->power.must_resume = true;
>> > +       }
>> > +
>> > +       if (dev->power.must_resume)
>> > +               dpm_superior_set_must_resume(dev);
>> >
>> >  Complete:
>> >         complete_all(&dev->power.completion);
>> > @@ -1487,6 +1539,9 @@ static int __device_suspend(struct devic
>> >                 dev->power.direct_complete = false;
>> >         }
>> >
>> > +       dev->power.may_skip_resume = false;
>> > +       dev->power.must_resume = false;
>> > +
>>
>> First, these assignment could be bypassed if the direct_complete path
>> is used. Perhaps it's more robust to reset these flags already in
>> device_prepare().
>
> In the direct-complete case may_skip_resume doesn't matter.
>
> must_resume should be set to "false", however, so that parents of
> direct-complete devices may be left in suspend (in case they don't
> fall under direct-complete themselves), so good catch.

Actually, not really.

must_resume for parents/suppliers is not updated if the device has
direct_complete set and the device's own must_resume doesn't matter
then.

So this part is good as is AFAICS.
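
A toy model of the reasoning above, as a sketch under stated assumptions:
struct node and model_suspend_noirq() are hypothetical, suppliers are left
out for brevity, and the model assumes that direct-complete devices bail
out of the "noirq" suspend step before must_resume is looked at.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device node; not the kernel's struct device. */
struct node {
	struct node *parent;
	bool direct_complete;
	bool must_resume;
};

/*
 * Model of the "noirq" suspend step for one device.  own_must_resume is
 * what the device itself would require (flag not set, may_skip_resume
 * clear, extra usage_count references and so on).
 */
static void model_suspend_noirq(struct node *n, bool own_must_resume)
{
	assert(n != NULL);

	/* Direct-complete devices bail out before any of this runs. */
	if (n->direct_complete)
		return;

	n->must_resume = n->must_resume || own_must_resume;

	/* Only here is the requirement pushed up to the parent. */
	if (n->must_resume && n->parent)
		n->parent->must_resume = true;
}
```

In this model a stale must_resume value on a direct-complete child never
reaches the parent, so the place where the flag is reset does not matter
for that case.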

Thanks,
Rafael


* Re: [PATCH v3 1/6] PM / core: Add LEAVE_SUSPENDED driver flag
  2017-11-17  6:11             ` Ulf Hansson
@ 2017-11-17 13:18               ` Rafael J. Wysocki
  2017-11-17 13:49                 ` Ulf Hansson
  0 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-17 13:18 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J. Wysocki, Linux PM, Bjorn Helgaas, Alan Stern,
	Greg Kroah-Hartman, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Andy Shevchenko,
	Kevin Hilman

On Fri, Nov 17, 2017 at 7:11 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> [...]
>
>>> > +++ linux-pm/include/linux/pm.h
>>> > @@ -559,6 +559,7 @@ struct pm_subsys_data {
>>> >   * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
>>> >   * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
>>> >   * SMART_SUSPEND: No need to resume the device from runtime suspend.
>>> > + * LEAVE_SUSPENDED: Avoid resuming the device during system resume if possible.
>>> >   *
>>> >   * Setting SMART_PREPARE instructs bus types and PM domains which may want
>>> >   * system suspend/resume callbacks to be skipped for the device to return 0 from
>>> > @@ -572,10 +573,14 @@ struct pm_subsys_data {
>>> >   * necessary from the driver's perspective.  It also may cause them to skip
>>> >   * invocations of the ->suspend_late and ->suspend_noirq callbacks provided by
>>> >   * the driver if they decide to leave the device in runtime suspend.
>>> > + *
>>> > + * Setting LEAVE_SUSPENDED informs the PM core and middle-layer code that the
>>> > + * driver prefers the device to be left in runtime suspend after system resume.
>>> >   */
>>>
>>> Question: Can LEAVE_SUSPENDED and NEVER_SKIP be valid combination? I
>>> guess not!? Should we validate for wrong combinations?
>>
>> Why not?  There's no real overlap between them.
>
> Except that NEVER_SKIP, documentation wise, tells you that your
> suspend and resume callbacks will never be skipped. :-)

You mean the comment in pm.h I suppose?  Yes, it isn't precise enough.

The proper documentation in devices.rst is less ambiguous, though. :-)

> [...]
>
>>> Second, have you considered setting the default value of
>>> dev->power.may_skip_resume to true?
>>
>> Yes.
>>
>>> That would means the subsystem
>>> instead need to implement an opt-out method. I am thinking that it may
>>> not be an issue, since we anyway at this point, don't have drivers
>>> using the LEAVE_SUSPENDED flag.
>>
>> Opt-out doesn't work because of the need to invoke the "noirq" callbacks.
>
> I am not sure I follow that.
>
> Whatever needs to be fixed on the subsystem level, that could be done
> before the driver starts using the LEAVE_SUSPENDED flag. No?

That requires a bit of explanation, sorry for being overly concise.

The core calls ->resume_noirq from the middle layer regardless of
whether or not the device will be left suspended, so the
->resume_noirq cannot do arbitrary things to it.  Setting
may_skip_resume by the middle layer tells the core that the middle
layer is ready for that and is going to cooperate.  If may_skip_resume
had been set by default, that piece of information would have been
missing.

Thanks,
Rafael


* Re: [PATCH v3 1/6] PM / core: Add LEAVE_SUSPENDED driver flag
  2017-11-17 13:18               ` Rafael J. Wysocki
@ 2017-11-17 13:49                 ` Ulf Hansson
  2017-11-17 14:31                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 135+ messages in thread
From: Ulf Hansson @ 2017-11-17 13:49 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linux PM, Bjorn Helgaas, Alan Stern,
	Greg Kroah-Hartman, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Andy Shevchenko,
	Kevin Hilman

[...]

>>
>>>> Second, have you considered setting the default value of
>>>> dev->power.may_skip_resume to true?
>>>
>>> Yes.
>>>
>>>> That would means the subsystem
>>>> instead need to implement an opt-out method. I am thinking that it may
>>>> not be an issue, since we anyway at this point, don't have drivers
>>>> using the LEAVE_SUSPENDED flag.
>>>
>>> Opt-out doesn't work because of the need to invoke the "noirq" callbacks.
>>
>> I am not sure I follow that.
>>
>> Whatever needs to be fixed on the subsystem level, that could be done
>> before the driver starts using the LEAVE_SUSPENDED flag. No?
>
> That requires a bit of explanation, sorry for being overly concise.
>
> The core calls ->resume_noirq from the middle layer regardless of
> whether or not the device will be left suspended, so the
> ->resume_noirq cannot do arbitrary things to it.  Setting
> may_skip_resume by the middle layer tells the core that the middle
> layer is ready for that and is going to cooperate.  If may_skip_resume
> had been set by default, that piece of information would have been
> missing.

Huh, I still don't get that. Sorry.

If "may_skip_resume" is set to true by default by the PM core,
wouldn't that just mean that the middle layer needs to implement an
opt-out method, rather than opt-in? In principle, to opt out the
middle layer needs to set may_skip_resume to false in the suspend_noirq
phase, no?

Then we only need to make sure drivers don't start using
LEAVE_SUSPENDED before we make sure the middle layers are adapted. But
that should not be a problem.

The benefit would be that those middle layers that can cope with
LEAVE_SUSPENDED as of today don't need to change.

Kind regards
Uffe


* Re: [PATCH v3 1/6] PM / core: Add LEAVE_SUSPENDED driver flag
  2017-11-17 13:49                 ` Ulf Hansson
@ 2017-11-17 14:31                   ` Rafael J. Wysocki
  2017-11-17 15:57                     ` Ulf Hansson
  0 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-17 14:31 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Linux PM, Bjorn Helgaas,
	Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Andy Shevchenko,
	Kevin Hilman

On Fri, Nov 17, 2017 at 2:49 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> [...]
>
>>>
>>>>> Second, have you considered setting the default value of
>>>>> dev->power.may_skip_resume to true?
>>>>
>>>> Yes.
>>>>
>>>>> That would means the subsystem
>>>>> instead need to implement an opt-out method. I am thinking that it may
>>>>> not be an issue, since we anyway at this point, don't have drivers
>>>>> using the LEAVE_SUSPENDED flag.
>>>>
>>>> Opt-out doesn't work because of the need to invoke the "noirq" callbacks.
>>>
>>> I am not sure I follow that.
>>>
>>> Whatever needs to be fixed on the subsystem level, that could be done
>>> before the driver starts using the LEAVE_SUSPENDED flag. No?
>>
>> That requires a bit of explanation, sorry for being overly concise.
>>
>> The core calls ->resume_noirq from the middle layer regardless of
>> whether or not the device will be left suspended, so the
>> ->resume_noirq cannot do arbitrary things to it.  Setting
>> may_skip_resume by the middle layer tells the core that the middle
>> layer is ready for that and is going to cooperate.  If may_skip_resume
>> had been set by default, that piece of information would have been
>> missing.
>
> Huh, I still don't get that. Sorry.
>
> If the "may_skip_resume" is default set to true by the PM core,
> wouldn't that just mean that the middle-layer needs to implement an
> opt-out method, rather than opt-in. In principle to opt-out the
> middle-layer needs to set may_skip_resume to false in suspend_noirq
> phase, no?

Yes, but if the middle-layer doesn't clear it, that may mean two
things.  First, the middle layer is ready and so on.  Good.  Second,
the middle layer is not aware of the whole thing.  Not good.  The core
cannot tell.

In the opt-in case, however, all is clear. :-)

> Then we only need to make sure drivers don't starts use
> LEAVE_SUSPENDED, before we make sure the middle layers is adopted. But
> that should not be a problem.
>
> The benefit would be that those middle layers that can cope with
> LEAVE_SUSPENDED as of today don't need to change.

I'm not sure if that's the case.

The middle layer has to evaluate dev_pm_may_skip_resume() in
->resume_noirq() to check if the device can be left in suspend, as it
cannot determine that in ->suspend_noirq() yet.
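
To illustrate both points, here is a rough user-space model of the opt-in
handshake.  It is a sketch only, not kernel code: may_skip_resume,
must_resume and the LEAVE_SUSPENDED flag come from the patch, while every
model_* identifier is made up for illustration, and the usage_count and
parent/supplier handling is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant bits of struct dev_pm_info. */
struct model_pm_info {
	bool leave_suspended;	/* models DPM_FLAG_LEAVE_SUSPENDED set by the driver */
	bool may_skip_resume;	/* set by the middle layer (opt-in) */
	bool must_resume;	/* owned by the core */
};

/* Core: clear both bits before the middle layer's "noirq" suspend runs. */
void model_core_reset(struct model_pm_info *pm)
{
	pm->may_skip_resume = false;
	pm->must_resume = false;
}

/* A middle layer that knows about the mechanism opts in explicitly. */
void model_aware_suspend_noirq(struct model_pm_info *pm, bool ok_to_skip)
{
	pm->may_skip_resume = ok_to_skip;
}

/* A middle layer that is not aware of the mechanism never touches the bit. */
void model_unaware_suspend_noirq(struct model_pm_info *pm)
{
	(void)pm;
}

/* Core, after the "noirq" suspend callback: is a full resume required? */
void model_core_after_suspend_noirq(struct model_pm_info *pm)
{
	pm->must_resume = !pm->leave_suspended || !pm->may_skip_resume;
}
```

With the bit cleared by default, an unaware middle layer always ends up with
must_resume set.  Had the default been "true", the unaware case would have
been indistinguishable from the aware one.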

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v3 1/6] PM / core: Add LEAVE_SUSPENDED driver flag
  2017-11-17 14:31                   ` Rafael J. Wysocki
@ 2017-11-17 15:57                     ` Ulf Hansson
  0 siblings, 0 replies; 135+ messages in thread
From: Ulf Hansson @ 2017-11-17 15:57 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linux PM, Bjorn Helgaas, Alan Stern,
	Greg Kroah-Hartman, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Andy Shevchenko,
	Kevin Hilman

On 17 November 2017 at 15:31, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Fri, Nov 17, 2017 at 2:49 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> [...]
>>
>>>>
>>>>>> Second, have you considered setting the default value of
>>>>>> dev->power.may_skip_resume to true?
>>>>>
>>>>> Yes.
>>>>>
>>>>>> That would means the subsystem
>>>>>> instead need to implement an opt-out method. I am thinking that it may
>>>>>> not be an issue, since we anyway at this point, don't have drivers
>>>>>> using the LEAVE_SUSPENDED flag.
>>>>>
>>>>> Opt-out doesn't work because of the need to invoke the "noirq" callbacks.
>>>>
>>>> I am not sure I follow that.
>>>>
>>>> Whatever needs to be fixed on the subsystem level, that could be done
>>>> before the driver starts using the LEAVE_SUSPENDED flag. No?
>>>
>>> That requires a bit of explanation, sorry for being overly concise.
>>>
>>> The core calls ->resume_noirq from the middle layer regardless of
>>> whether or not the device will be left suspended, so the
>>> ->resume_noirq cannot do arbitrary things to it.  Setting
>>> may_skip_resume by the middle layer tells the core that the middle
>>> layer is ready for that and is going to cooperate.  If may_skip_resume
>>> had been set by default, that piece of information would have been
>>> missing.
>>
>> Huh, I still don't get that. Sorry.
>>
>> If the "may_skip_resume" is default set to true by the PM core,
>> wouldn't that just mean that the middle-layer needs to implement an
>> opt-out method, rather than opt-in. In principle to opt-out the
>> middle-layer needs to set may_skip_resume to false in suspend_noirq
>> phase, no?
>
> Yes, but if the middle-layer doesn't clear it, that may mean two
> things.  First, the middle layer is ready and so on.  Good.  Second,
> the middle layer is not aware of the whole thing.  Not good.  The core
> cannot tell.
>
> In the opt-in case, however, all is clear. :-)

Okay.

>
>> Then we only need to make sure drivers don't starts use
>> LEAVE_SUSPENDED, before we make sure the middle layers is adopted. But
>> that should not be a problem.
>>
>> The benefit would be that those middle layers that can cope with
>> LEAVE_SUSPENDED as of today don't need to change.
>
> I'm not sure if that's the case.
>
> The middle layer has to evaluate dev_pm_may_skip_resume() in
> ->resume_noirq() to check if the device can be left in suspend, as it
> cannot determine that in ->suspend_noirq() yet.

Right. Okay, let's stick with the chosen method.

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH v4 0/6] PM / sleep: Driver flags for system suspend/resume (part 2)
  2017-11-12  0:34     ` [PATCH v3 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
                         ` (5 preceding siblings ...)
  2017-11-12  0:44       ` [PATCH v3 6/6] PM / core: DPM_FLAG_SMART_SUSPEND optimization Rafael J. Wysocki
@ 2017-11-18 14:27       ` Rafael J. Wysocki
  2017-11-18 14:31         ` [PATCH v4 1/6] PM / core: Add LEAVE_SUSPENDED driver flag Rafael J. Wysocki
                           ` (5 more replies)
  6 siblings, 6 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-18 14:27 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

Hi All,

The following still applies:

> On Wednesday, November 8, 2017 1:41:35 AM CET Rafael J. Wysocki wrote:
> >
> > This is a follow-up for the first part of the PM driver flags series
> > sent previously some time ago with an intro as follows:
> > 
> > On Saturday, October 28, 2017 12:11:55 AM CET Rafael J. Wysocki wrote:
> > > The following part of the original cover letter still applies:
> > > 
> > > On Monday, October 16, 2017 3:12:35 AM CEST Rafael J. Wysocki wrote:
> > > > 
> > > > This work was triggered by attempts to fix and optimize PM in the
> > > > i2c-designware-platdev driver that ended up with adding a couple of
> > > > flags to the driver's internal data structures for the tracking of
> > > > device state (https://marc.info/?l=linux-acpi&m=150629646805636&w=2).
> > > > That approach is sort of suboptimal, though, because other drivers will
> > > > probably want to do similar things and if all of them need to use internal
> > > > flags for that, quite a bit of code duplication may ensue at least.
> > > > 
> > > > That can be avoided in a couple of ways and one of them is to provide a means
> > > > for drivers to tell the core what to do and to make the core take care of it
> > > > if told to do so.  Hence, the idea to use driver flags for system-wide PM
> > > > that was briefly discussed during the LPC in LA last month.
> > > 
> > > [...]
> > > 
> > > > What can work (and this is the only strategy that can work AFAICS) is to
> > > > point different callback pointers *in* *a* *driver* to the same routine
> > > > if the driver wants to reuse that code.  That actually will work for PCI
> > > > and USB drivers today, at least most of the time, but unfortunately there
> > > > are problems with it for, say, platform devices.
> > > > 
> > > > The first problem is the requirement to track the status of the device
> > > > (suspended vs not suspended) in the callbacks, because the system-wide PM
> > > > code in the PM core doesn't do that.  The runtime PM framework does it, so
> > > > this means adding some extra code which isn't necessary for runtime PM to
> > > > the callback routines and that is not particularly nice.
> > > > 
> > > > The second problem is that, if the driver wants to do anything in its
> > > > ->suspend callback, it generally has to prevent runtime suspend of the
> > > > device from taking place in parallel with that, which is quite cumbersome.
> > > > Usually, that is taken care of by resuming the device from runtime suspend
> > > > upfront, but generally doing that is wasteful (there may be no real need to
> > > > resume the device except for the fact that the code is designed this way).
> > > > 
> > > > On top of the above, there are optimizations to be made, like leaving certain
> > > > devices in suspend after system resume to avoid wasting time on waiting for
> > > > them to resume before user space can run again and similar.
> > > > 
> > > > This patch series focuses on addressing those problems so as to make it
> > > > easier to reuse callback routines by pointing different callback pointers
> > > > to them in device drivers.  The flags introduced here are to instruct the
> > > > PM core and middle layers (whatever they are) on how the driver wants the
> > > > device to be handled and then the driver has to provide callbacks to match
> > > > these instructions and the rest should be taken care of by the code above it.
> > > > 
> > > > The flags are introduced one by one to avoid making too many changes in
> > > > one go and to allow things to be explained better (hopefully).  They mostly
> > > > are mutually independent with some clearly documented exceptions.
> > > 
> > > but I had to rework the core patches to address the problem pointed with the
> > > generic power domains (genpd) framework pointed out by Ulf.
> > > 
> > > Namely, genpd expects its "noirq" callbacks to be invoked for devices in
> > > runtime suspend too and it has valid reasons for that, so its "noirq"
> > > callbacks can never be skipped, even for devices with the SMART_SUSPEND
> > > flag set.  For this reason, the logic related to DPM_FLAG_SMART_SUSPEND
> > > had to be moved from the core to the PCI bus type and the ACPI PM domain
> > > which are mostly affected by it anyway.  The code after the changes looks
> > > more straightforward to me, but it generally is more code and some patterns
> > > had to be repeated in a few places.
> > 
> > I promised to send the rest of the series then:
> > 
> > > I will send the core patches for the remaining two flags introduced by the
> > > original series separately and the intel-lpss and i2c-designware ones will
> > > be posted when the core patches have been reviewed and agreed on.
> > 
> > and here it goes.
> > 
> > It actually only adds support for one additional flag, namely for
> > DPM_FLAG_LEAVE_SUSPENDED, to the PM core (basic bits), PCI bus type and the
> > ACPI PM domain.
> > 
> > That part of the series (patches [1-3/6]) is rather straightforward and, as PCI
> > and the ACPI PM domain are concerned, it should be functionally equivalent to
> > the previous version of the set, so I retained the Greg's ACKs on these patches.
> > 
> > The other part (patches [4-6/6]) is sort of new, as it makes the PM core
> > carry out optimizations for devices with DPM_FLAG_LEAVE_SUSPENDED and/or
> > DPM_FLAG_SMART_SUSPEND set where the "noirq", "early" and "late" system-wide
> > PM callbacks provided by the drivers are invoked by the core directly.  That
> > part basically allows platform drivers, for instance, to reuse runtime PM
> > callbacks (by pointing ->suspend_late and ->resume_early to them) without
> > adding extra checks to them, as long as they are called directly by the core
> > (or the ACPI PM domain).
> 
> And on top of that, while replying to Ulf's comments I realized that devices
> with nonzero runtime PM usage_count reference counters cannot be left in suspend
> during system resume, because that would confuse the runtime PM framework going
> forward.  Patches [1/6] and [5/6] have to be updated to avoid that, so here
> goes a new revision.

The v4 here addresses a mistake, spotted by Ulf, that I made while adding the
usage_count reference counters check: it should check whether or not they are
greater than 1, not just nonzero, because the PM core itself increments them
in the "prepare" phase of system suspend transitions.  It also makes the
NEVER_SKIP flag description in pm.h more precise, clarifies the description
of the new LEAVE_SUSPENDED flag in devices.rst and adds a comment in
device_resume_noirq() explaining why the runtime PM status of the device
is changed to "suspended" in there.  All of that affects patch [1/6].
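
In other words, the corrected condition boils down to something like the
following.  This is a simplified user-space sketch: model_must_resume() is a
hypothetical name, while the real check lives in __device_suspend_noirq() in
patch [1/6] and reads the counter with atomic_read().

```c
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the condition for devices with the
 * LEAVE_SUSPENDED flag set.  The PM core itself holds one usage_count
 * reference taken in the "prepare" phase, so only a count greater than 1
 * indicates that somebody else is using the device.
 */
bool model_must_resume(bool must_resume, bool may_skip_resume, int usage_count)
{
	return must_resume || !may_skip_resume || usage_count > 1;
}
```

A count of exactly 1 therefore does not prevent the device from being left
in suspend, whereas a plain "nonzero" test would always have forced a resume.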

Patches [2-3/6] remain unmodified and some code is moved from patch [4/6]
to patches [5-6/6] (to address comments from Ulf).  Also these patches have
been rebased on top of the modified [1/6].

All should apply on top of the current linux-next.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 135+ messages in thread

* [PATCH v4 1/6] PM / core: Add LEAVE_SUSPENDED driver flag
  2017-11-18 14:27       ` [PATCH v4 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
@ 2017-11-18 14:31         ` Rafael J. Wysocki
  2017-11-20 12:25           ` Ulf Hansson
  2017-11-18 14:33         ` [PATCH v4 2/6] PCI / PM: Support for " Rafael J. Wysocki
                           ` (4 subsequent siblings)
  5 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-18 14:31 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to
instruct the PM core and middle-layer (bus type, PM domain, etc.)
code that it is desirable to leave the device in runtime suspend
after system-wide transitions to the working state (for example,
the device may be slow to resume and it may be better to avoid
resuming it right away).

Generally, the middle-layer code involved in the handling of the
device is expected to indicate to the PM core whether or not the
device may be left in suspend with the help of the device's
power.may_skip_resume status bit.  That has to happen in the "noirq"
phase of the preceding system suspend (or analogous) transition.
The middle layer is then responsible for handling the device as
appropriate in its "noirq" resume callback which is executed
regardless of whether or not the device may be left suspended, but
the other resume callbacks (except for ->complete) will be skipped
automatically by the core if the device really can be left in
suspend.

The additional power.must_resume status bit introduced for the
implementation of this mechanism is used internally by the PM core
to track the requirement to resume the device (which may depend on
its children etc).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---

v3 -> v4: Fix the dev->power.usage_count check added in v3, clarify
          documentation, add comment to explain why the runtime PM status
          is changed in device_resume_noirq() and make the description of
          the NEVER_SKIP flag more precise.

v2 -> v3: Take dev->power.usage_count into account when updating
          power.must_resume in __device_suspend_noirq().

---
 Documentation/driver-api/pm/devices.rst |   27 ++++++++++-
 drivers/base/power/main.c               |   73 +++++++++++++++++++++++++++++---
 include/linux/pm.h                      |   16 +++++--
 3 files changed, 105 insertions(+), 11 deletions(-)

Index: linux-pm/include/linux/pm.h
===================================================================
--- linux-pm.orig/include/linux/pm.h
+++ linux-pm/include/linux/pm.h
@@ -556,9 +556,10 @@ struct pm_subsys_data {
  * These flags can be set by device drivers at the probe time.  They need not be
  * cleared by the drivers as the driver core will take care of that.
  *
- * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
+ * NEVER_SKIP: Do not skip all system suspend/resume callbacks for the device.
  * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
  * SMART_SUSPEND: No need to resume the device from runtime suspend.
+ * LEAVE_SUSPENDED: Avoid resuming the device during system resume if possible.
  *
  * Setting SMART_PREPARE instructs bus types and PM domains which may want
  * system suspend/resume callbacks to be skipped for the device to return 0 from
@@ -572,10 +573,14 @@ struct pm_subsys_data {
  * necessary from the driver's perspective.  It also may cause them to skip
  * invocations of the ->suspend_late and ->suspend_noirq callbacks provided by
  * the driver if they decide to leave the device in runtime suspend.
+ *
+ * Setting LEAVE_SUSPENDED informs the PM core and middle-layer code that the
+ * driver prefers the device to be left in suspend after system resume.
  */
-#define DPM_FLAG_NEVER_SKIP	BIT(0)
-#define DPM_FLAG_SMART_PREPARE	BIT(1)
-#define DPM_FLAG_SMART_SUSPEND	BIT(2)
+#define DPM_FLAG_NEVER_SKIP		BIT(0)
+#define DPM_FLAG_SMART_PREPARE		BIT(1)
+#define DPM_FLAG_SMART_SUSPEND		BIT(2)
+#define DPM_FLAG_LEAVE_SUSPENDED	BIT(3)
 
 struct dev_pm_info {
 	pm_message_t		power_state;
@@ -597,6 +602,8 @@ struct dev_pm_info {
 	bool			wakeup_path:1;
 	bool			syscore:1;
 	bool			no_pm_callbacks:1;	/* Owned by the PM core */
+	unsigned int		must_resume:1;	/* Owned by the PM core */
+	unsigned int		may_skip_resume:1;	/* Set by subsystems */
 #else
 	unsigned int		should_wakeup:1;
 #endif
@@ -765,6 +772,7 @@ extern int pm_generic_poweroff_late(stru
 extern int pm_generic_poweroff(struct device *dev);
 extern void pm_generic_complete(struct device *dev);
 
+extern bool dev_pm_may_skip_resume(struct device *dev);
 extern bool dev_pm_smart_suspend_and_suspended(struct device *dev);
 
 #else /* !CONFIG_PM_SLEEP */
Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -528,6 +528,18 @@ static void dpm_watchdog_clear(struct dp
 /*------------------------- Resume routines -------------------------*/
 
 /**
+ * dev_pm_may_skip_resume - System-wide device resume optimization check.
+ * @dev: Target device.
+ *
+ * Checks whether or not the device may be left in suspend after a system-wide
+ * transition to the working state.
+ */
+bool dev_pm_may_skip_resume(struct device *dev)
+{
+	return !dev->power.must_resume && pm_transition.event != PM_EVENT_RESTORE;
+}
+
+/**
  * device_resume_noirq - Execute a "noirq resume" callback for given device.
  * @dev: Device to handle.
  * @state: PM transition of the system being carried out.
@@ -575,6 +587,19 @@ static int device_resume_noirq(struct de
 	error = dpm_run_callback(callback, dev, state, info);
 	dev->power.is_noirq_suspended = false;
 
+	if (dev_pm_may_skip_resume(dev)) {
+		/*
+		 * The device is going to be left in suspend, but it might not
+		 * have been in runtime suspend before the system suspended, so
+		 * its runtime PM status needs to be updated to avoid confusing
+		 * the runtime PM framework when runtime PM is enabled for the
+		 * device again.
+		 */
+		pm_runtime_set_suspended(dev);
+		dev->power.is_late_suspended = false;
+		dev->power.is_suspended = false;
+	}
+
  Out:
 	complete_all(&dev->power.completion);
 	TRACE_RESUME(error);
@@ -1076,6 +1101,22 @@ static pm_message_t resume_event(pm_mess
 	return PMSG_ON;
 }
 
+static void dpm_superior_set_must_resume(struct device *dev)
+{
+	struct device_link *link;
+	int idx;
+
+	if (dev->parent)
+		dev->parent->power.must_resume = true;
+
+	idx = device_links_read_lock();
+
+	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node)
+		link->supplier->power.must_resume = true;
+
+	device_links_read_unlock(idx);
+}
+
 /**
  * __device_suspend_noirq - Execute a "noirq suspend" callback for given device.
  * @dev: Device to handle.
@@ -1127,10 +1168,28 @@ static int __device_suspend_noirq(struct
 	}
 
 	error = dpm_run_callback(callback, dev, state, info);
-	if (!error)
-		dev->power.is_noirq_suspended = true;
-	else
+	if (error) {
 		async_error = error;
+		goto Complete;
+	}
+
+	dev->power.is_noirq_suspended = true;
+
+	if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED)) {
+		/*
+		 * The only safe strategy here is to require that if the device
+		 * may not be left in suspend, resume callbacks must be invoked
+		 * for it.
+		 */
+		dev->power.must_resume = dev->power.must_resume ||
+					!dev->power.may_skip_resume ||
+					atomic_read(&dev->power.usage_count) > 1;
+	} else {
+		dev->power.must_resume = true;
+	}
+
+	if (dev->power.must_resume)
+		dpm_superior_set_must_resume(dev);
 
 Complete:
 	complete_all(&dev->power.completion);
@@ -1487,6 +1546,9 @@ static int __device_suspend(struct devic
 		dev->power.direct_complete = false;
 	}
 
+	dev->power.may_skip_resume = false;
+	dev->power.must_resume = false;
+
 	dpm_watchdog_set(&wd, dev);
 	device_lock(dev);
 
@@ -1652,8 +1714,9 @@ static int device_prepare(struct device
 	if (dev->power.syscore)
 		return 0;
 
-	WARN_ON(dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
-		!pm_runtime_enabled(dev));
+	WARN_ON(!pm_runtime_enabled(dev) &&
+		dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND |
+					      DPM_FLAG_LEAVE_SUSPENDED));
 
 	/*
 	 * If a device's parent goes into runtime suspend at the wrong time,
Index: linux-pm/Documentation/driver-api/pm/devices.rst
===================================================================
--- linux-pm.orig/Documentation/driver-api/pm/devices.rst
+++ linux-pm/Documentation/driver-api/pm/devices.rst
@@ -788,6 +788,29 @@ must reflect the "active" status for run
 
 During system-wide resume from a sleep state it's easiest to put devices into
 the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`.
-Refer to that document for more information regarding this particular issue as
+[Refer to that document for more information regarding this particular issue as
 well as for information on the device runtime power management framework in
-general.
+general.]
+
+However, it often is desirable to leave devices in suspend after system
+transitions to the working state, especially if those devices had been in
+runtime suspend before the preceding system-wide suspend (or analogous)
+transition.  Device drivers can use the ``DPM_FLAG_LEAVE_SUSPENDED`` flag to
+indicate to the PM core (and middle-layer code) that they prefer the specific
+devices handled by them to be left suspended and they have no problems with
+skipping their system-wide resume callbacks for this reason.  Whether or not the
+devices will actually be left in suspend may depend on their state before the
+given system suspend-resume cycle and on the type of the system transition under
+way.  In particular, devices are not left suspended if that transition is a
+restore from hibernation, as device states are not guaranteed to be reflected
+by the information stored in the hibernation image in that case.
+
+The middle-layer code involved in the handling of the device is expected to
+indicate to the PM core if the device may be left in suspend by setting its
+:c:member:`power.may_skip_resume` status bit which is checked by the PM core
+during the "noirq" phase of the preceding system-wide suspend (or analogous)
+transition.  The middle layer is then responsible for handling the device as
+appropriate in its "noirq" resume callback, which is executed regardless of
+whether or not the device is left suspended, but the other resume callbacks
+(except for ``->complete``) will be skipped automatically by the PM core if the
+device really can be left in suspend.
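
For readers who want to experiment with the logic outside the kernel, the
must_resume propagation and the dev_pm_may_skip_resume() check added by this
patch can be approximated in user space as below.  This is a simplified
sketch: all model_* names are hypothetical, supplier links are omitted, and
the driver flag, may_skip_resume and usage_count tests are folded into a
single can_skip argument.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct device. */
struct model_device {
	struct model_device *parent;
	bool must_resume;
};

/* Models dev_pm_may_skip_resume(): never skip on restore from hibernation. */
bool model_may_skip_resume(const struct model_device *dev, bool restoring)
{
	return !dev->must_resume && !restoring;
}

/* Models the parent half of dpm_superior_set_must_resume(). */
void model_superior_set_must_resume(struct model_device *dev)
{
	if (dev->parent)
		dev->parent->must_resume = true;
}

/*
 * Models the tail of __device_suspend_noirq().  Children are handled before
 * their parents, so a must_resume bit set by a child is already visible
 * when the parent gets here and is passed further up from there.
 */
void model_suspend_noirq(struct model_device *dev, bool can_skip)
{
	dev->must_resume = dev->must_resume || !can_skip;

	if (dev->must_resume)
		model_superior_set_must_resume(dev);
}
```

Calling model_suspend_noirq() child-first on a chain of devices shows that one
device which has to be resumed forces all of its ancestors to be resumed too.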

^ permalink raw reply	[flat|nested] 135+ messages in thread

* [PATCH v4 2/6] PCI / PM: Support for LEAVE_SUSPENDED driver flag
  2017-11-18 14:27       ` [PATCH v4 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
  2017-11-18 14:31         ` [PATCH v4 1/6] PM / core: Add LEAVE_SUSPENDED driver flag Rafael J. Wysocki
@ 2017-11-18 14:33         ` Rafael J. Wysocki
  2017-11-18 14:35         ` [PATCH v4 3/6] ACPI / PM: Support for LEAVE_SUSPENDED driver flag in ACPI PM domain Rafael J. Wysocki
                           ` (3 subsequent siblings)
  5 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-18 14:33 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Add support for DPM_FLAG_LEAVE_SUSPENDED to the PCI bus type by
making it (a) set the power.may_skip_resume status bit for devices
that, from its perspective, may be left in suspend after system
wakeup from sleep and (b) return early from pci_pm_resume_noirq()
for devices whose remaining resume callbacks during the transition
under way are going to be skipped by the PM core.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---

v3 -> v4: No changes.

v2 -> v3: Add the Acked-by from Bjorn, no changes in the patch.

---
 Documentation/power/pci.txt |   11 +++++++++++
 drivers/pci/pci-driver.c    |   19 +++++++++++++++++--
 2 files changed, 28 insertions(+), 2 deletions(-)

Index: linux-pm/drivers/pci/pci-driver.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -699,7 +699,7 @@ static void pci_pm_complete(struct devic
 	pm_generic_complete(dev);
 
 	/* Resume device if platform firmware has put it in reset-power-on */
-	if (dev->power.direct_complete && pm_resume_via_firmware()) {
+	if (pm_runtime_suspended(dev) && pm_resume_via_firmware()) {
 		pci_power_t pre_sleep_state = pci_dev->current_state;
 
 		pci_update_current_state(pci_dev, pci_dev->current_state);
@@ -783,8 +783,10 @@ static int pci_pm_suspend_noirq(struct d
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 
-	if (dev_pm_smart_suspend_and_suspended(dev))
+	if (dev_pm_smart_suspend_and_suspended(dev)) {
+		dev->power.may_skip_resume = true;
 		return 0;
+	}
 
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend_late(dev, PMSG_SUSPEND);
@@ -838,6 +840,16 @@ static int pci_pm_suspend_noirq(struct d
 Fixup:
 	pci_fixup_device(pci_fixup_suspend_late, pci_dev);
 
+	/*
+	 * If the target system sleep state is suspend-to-idle, it is sufficient
+	 * to check whether or not the device's wakeup settings are good for
+	 * runtime PM.  Otherwise, the pm_resume_via_firmware() check will cause
+	 * pci_pm_complete() to take care of fixing up the device's state
+	 * anyway, if need be.
+	 */
+	dev->power.may_skip_resume = device_may_wakeup(dev) ||
+					!device_can_wakeup(dev);
+
 	return 0;
 }
 
@@ -847,6 +859,9 @@ static int pci_pm_resume_noirq(struct de
 	struct device_driver *drv = dev->driver;
 	int error = 0;
 
+	if (dev_pm_may_skip_resume(dev))
+		return 0;
+
 	/*
 	 * Devices with DPM_FLAG_SMART_SUSPEND may be left in runtime suspend
 	 * during system suspend, so update their runtime PM status to "active"
Index: linux-pm/Documentation/power/pci.txt
===================================================================
--- linux-pm.orig/Documentation/power/pci.txt
+++ linux-pm/Documentation/power/pci.txt
@@ -994,6 +994,17 @@ into D0 going forward), but if it is in
 the function will set the power.direct_complete flag for it (to make the PM core
 skip the subsequent "thaw" callbacks for it) and return.
 
+Setting the DPM_FLAG_LEAVE_SUSPENDED flag means that the driver prefers the
+device to be left in suspend after system-wide transitions to the working state.
+This flag is checked by the PM core, but the PCI bus type informs the PM core
+which devices may be left in suspend from its perspective (that happens during
+the "noirq" phase of system-wide suspend and analogous transitions) and next it
+uses the dev_pm_may_skip_resume() helper to decide whether or not to return from
+pci_pm_resume_noirq() early, as the PM core will skip the remaining resume
+callbacks for the device during the transition under way and will set its
+runtime PM status to "suspended" if dev_pm_may_skip_resume() returns "true" for
+it.
+
 3.2. Device Runtime Power Management
 ------------------------------------
 In addition to providing device power management callbacks PCI device drivers
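
The wakeup-settings rule used for power.may_skip_resume in
pci_pm_suspend_noirq() above can be spelled out as a small truth table.  The
sketch below is a user-space approximation with hypothetical model_* names.
It assumes that device_may_wakeup() amounts to "wakeup-capable and wakeup
enabled", which mirrors the kernel helper.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the wakeup-related device state. */
struct model_wakeup {
	bool can_wakeup;	/* models device_can_wakeup() */
	bool wakeup_enabled;	/* wakeup has been enabled for the device */
};

/* Models device_may_wakeup(): capable of wakeup and enabled to do it. */
bool model_device_may_wakeup(const struct model_wakeup *w)
{
	return w->can_wakeup && w->wakeup_enabled;
}

/* Models the may_skip_resume assignment at the end of the "noirq" suspend. */
bool model_pci_may_skip_resume(const struct model_wakeup *w)
{
	return model_device_may_wakeup(w) || !w->can_wakeup;
}
```

Only a device that is wakeup-capable but has wakeup disabled yields "false"
here.  The other combinations leave the decision to the PM core.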

^ permalink raw reply	[flat|nested] 135+ messages in thread

* [PATCH v4 3/6] ACPI / PM: Support for LEAVE_SUSPENDED driver flag in ACPI PM domain
  2017-11-18 14:27       ` [PATCH v4 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
  2017-11-18 14:31         ` [PATCH v4 1/6] PM / core: Add LEAVE_SUSPENDED driver flag Rafael J. Wysocki
  2017-11-18 14:33         ` [PATCH v4 2/6] PCI / PM: Support for " Rafael J. Wysocki
@ 2017-11-18 14:35         ` Rafael J. Wysocki
  2017-11-18 14:37         ` [PATCH v4 4/6] PM / core: Add helpers for subsystem callback selection Rafael J. Wysocki
                           ` (2 subsequent siblings)
  5 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-18 14:35 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Add support for DPM_FLAG_LEAVE_SUSPENDED to the ACPI PM domain by
making it (a) set the power.may_skip_resume status bit for devices
that, from its perspective, may be left in suspend after system
wakeup from sleep and (b) return early from acpi_subsys_resume_noirq()
for devices whose remaining resume callbacks during the transition
under way are going to be skipped by the PM core.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---

v2 -> v4: No changes.

---
 drivers/acpi/device_pm.c |   27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

Index: linux-pm/drivers/acpi/device_pm.c
===================================================================
--- linux-pm.orig/drivers/acpi/device_pm.c
+++ linux-pm/drivers/acpi/device_pm.c
@@ -990,7 +990,7 @@ void acpi_subsys_complete(struct device
 	 * the sleep state it is going out of and it has never been resumed till
 	 * now, resume it in case the firmware powered it up.
 	 */
-	if (dev->power.direct_complete && pm_resume_via_firmware())
+	if (pm_runtime_suspended(dev) && pm_resume_via_firmware())
 		pm_request_resume(dev);
 }
 EXPORT_SYMBOL_GPL(acpi_subsys_complete);
@@ -1039,10 +1039,28 @@ EXPORT_SYMBOL_GPL(acpi_subsys_suspend_la
  */
 int acpi_subsys_suspend_noirq(struct device *dev)
 {
-	if (dev_pm_smart_suspend_and_suspended(dev))
+	int ret;
+
+	if (dev_pm_smart_suspend_and_suspended(dev)) {
+		dev->power.may_skip_resume = true;
 		return 0;
+	}
 
-	return pm_generic_suspend_noirq(dev);
+	ret = pm_generic_suspend_noirq(dev);
+	if (ret)
+		return ret;
+
+	/*
+	 * If the target system sleep state is suspend-to-idle, it is sufficient
+	 * to check whether or not the device's wakeup settings are good for
+	 * runtime PM.  Otherwise, the pm_resume_via_firmware() check will cause
+	 * acpi_subsys_complete() to take care of fixing up the device's state
+	 * anyway, if need be.
+	 */
+	dev->power.may_skip_resume = device_may_wakeup(dev) ||
+					!device_can_wakeup(dev);
+
+	return 0;
 }
 EXPORT_SYMBOL_GPL(acpi_subsys_suspend_noirq);
 
@@ -1052,6 +1070,9 @@ EXPORT_SYMBOL_GPL(acpi_subsys_suspend_no
  */
 int acpi_subsys_resume_noirq(struct device *dev)
 {
+	if (dev_pm_may_skip_resume(dev))
+		return 0;
+
 	/*
 	 * Devices with DPM_FLAG_SMART_SUSPEND may be left in runtime suspend
 	 * during system suspend, so update their runtime PM status to "active"

* [PATCH v4 4/6] PM / core: Add helpers for subsystem callback selection
  2017-11-18 14:27       ` [PATCH v4 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
                           ` (2 preceding siblings ...)
  2017-11-18 14:35         ` [PATCH v4 3/6] ACPI / PM: Support for LEAVE_SUSPENDED driver flag in ACPI PM domain Rafael J. Wysocki
@ 2017-11-18 14:37         ` Rafael J. Wysocki
  2017-11-18 14:41         ` [PATCH v4 5/6] PM / core: Direct handling of DPM_FLAG_LEAVE_SUSPENDED Rafael J. Wysocki
  2017-11-18 14:44         ` [PATCH v4 6/6] PM / core: DPM_FLAG_SMART_SUSPEND optimization Rafael J. Wysocki
  5 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-18 14:37 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Add helper routines to find and return a suitable subsystem callback
during the "noirq" phases of system suspend/resume (or analogous)
transitions as well as during the "late" phase of system suspend and
the "early" phase of system resume (or analogous) transitions.

The helpers will be called from additional sites going forward.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
---

v3 -> v4: Move forward declarations of two functions to subsequent patches,
          add Reviewed-by from Ulf.

v2 -> v3: No changes.
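
As an illustration of the selection order used by the helpers in this
patch, here is a minimal userspace sketch.  The model_* names and the
simplified structures are assumptions made for the example, not kernel
definitions (in the kernel the type, class and bus branches also check
the ->pm pointer).  It shows that the first middle layer present wins
(PM domain, then type, class, bus) even if it provides no callback for
the phase, and that *info_p is left untouched when no layer is present.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types, for illustration only. */
typedef int (*model_callback_t)(void *dev);

struct model_pm_ops {
	model_callback_t suspend_noirq;
};

struct model_device {
	const struct model_pm_ops *pm_domain;	/* checked first */
	const struct model_pm_ops *type;
	const struct model_pm_ops *class_ops;
	const struct model_pm_ops *bus;		/* checked last */
};

/* Trivial callback used to populate ops tables in examples. */
int model_noop_cb(void *dev)
{
	(void)dev;
	return 0;
}

/*
 * Follows the structure of dpm_subsys_suspend_noirq_cb(): pick the first
 * middle layer that is present and return whatever callback it provides,
 * which may be NULL.
 */
model_callback_t model_subsys_suspend_noirq_cb(const struct model_device *dev,
					       const char **info_p)
{
	model_callback_t callback;
	const char *info;

	assert(dev != NULL);

	if (dev->pm_domain) {
		info = "noirq power domain ";
		callback = dev->pm_domain->suspend_noirq;
	} else if (dev->type) {
		info = "noirq type ";
		callback = dev->type->suspend_noirq;
	} else if (dev->class_ops) {
		info = "noirq class ";
		callback = dev->class_ops->suspend_noirq;
	} else if (dev->bus) {
		info = "noirq bus ";
		callback = dev->bus->suspend_noirq;
	} else {
		/* No middle layer at all: leave *info_p alone. */
		return NULL;
	}

	if (info_p)
		*info_p = info;

	return callback;
}
```

A caller that gets NULL back falls back to the driver's own callback,
which is what the call sites changed in the diff below do.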

---
 drivers/base/power/main.c |  188 +++++++++++++++++++++++++++++++---------------
 1 file changed, 128 insertions(+), 60 deletions(-)

Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -539,6 +539,35 @@ bool dev_pm_may_skip_resume(struct devic
 	return !dev->power.must_resume && pm_transition.event != PM_EVENT_RESTORE;
 }
 
+static pm_callback_t dpm_subsys_resume_noirq_cb(struct device *dev,
+						pm_message_t state,
+						const char **info_p)
+{
+	pm_callback_t callback;
+	const char *info;
+
+	if (dev->pm_domain) {
+		info = "noirq power domain ";
+		callback = pm_noirq_op(&dev->pm_domain->ops, state);
+	} else if (dev->type && dev->type->pm) {
+		info = "noirq type ";
+		callback = pm_noirq_op(dev->type->pm, state);
+	} else if (dev->class && dev->class->pm) {
+		info = "noirq class ";
+		callback = pm_noirq_op(dev->class->pm, state);
+	} else if (dev->bus && dev->bus->pm) {
+		info = "noirq bus ";
+		callback = pm_noirq_op(dev->bus->pm, state);
+	} else {
+		return NULL;
+	}
+
+	if (info_p)
+		*info_p = info;
+
+	return callback;
+}
+
 /**
  * device_resume_noirq - Execute a "noirq resume" callback for given device.
  * @dev: Device to handle.
@@ -550,8 +579,8 @@ bool dev_pm_may_skip_resume(struct devic
  */
 static int device_resume_noirq(struct device *dev, pm_message_t state, bool async)
 {
-	pm_callback_t callback = NULL;
-	const char *info = NULL;
+	pm_callback_t callback;
+	const char *info;
 	int error = 0;
 
 	TRACE_DEVICE(dev);
@@ -565,19 +594,7 @@ static int device_resume_noirq(struct de
 
 	dpm_wait_for_superior(dev, async);
 
-	if (dev->pm_domain) {
-		info = "noirq power domain ";
-		callback = pm_noirq_op(&dev->pm_domain->ops, state);
-	} else if (dev->type && dev->type->pm) {
-		info = "noirq type ";
-		callback = pm_noirq_op(dev->type->pm, state);
-	} else if (dev->class && dev->class->pm) {
-		info = "noirq class ";
-		callback = pm_noirq_op(dev->class->pm, state);
-	} else if (dev->bus && dev->bus->pm) {
-		info = "noirq bus ";
-		callback = pm_noirq_op(dev->bus->pm, state);
-	}
+	callback = dpm_subsys_resume_noirq_cb(dev, state, &info);
 
 	if (!callback && dev->driver && dev->driver->pm) {
 		info = "noirq driver ";
@@ -693,6 +710,35 @@ void dpm_resume_noirq(pm_message_t state
 	dpm_noirq_end();
 }
 
+static pm_callback_t dpm_subsys_resume_early_cb(struct device *dev,
+						pm_message_t state,
+						const char **info_p)
+{
+	pm_callback_t callback;
+	const char *info;
+
+	if (dev->pm_domain) {
+		info = "early power domain ";
+		callback = pm_late_early_op(&dev->pm_domain->ops, state);
+	} else if (dev->type && dev->type->pm) {
+		info = "early type ";
+		callback = pm_late_early_op(dev->type->pm, state);
+	} else if (dev->class && dev->class->pm) {
+		info = "early class ";
+		callback = pm_late_early_op(dev->class->pm, state);
+	} else if (dev->bus && dev->bus->pm) {
+		info = "early bus ";
+		callback = pm_late_early_op(dev->bus->pm, state);
+	} else {
+		return NULL;
+	}
+
+	if (info_p)
+		*info_p = info;
+
+	return callback;
+}
+
 /**
  * device_resume_early - Execute an "early resume" callback for given device.
  * @dev: Device to handle.
@@ -703,8 +749,8 @@ void dpm_resume_noirq(pm_message_t state
  */
 static int device_resume_early(struct device *dev, pm_message_t state, bool async)
 {
-	pm_callback_t callback = NULL;
-	const char *info = NULL;
+	pm_callback_t callback;
+	const char *info;
 	int error = 0;
 
 	TRACE_DEVICE(dev);
@@ -718,19 +764,7 @@ static int device_resume_early(struct de
 
 	dpm_wait_for_superior(dev, async);
 
-	if (dev->pm_domain) {
-		info = "early power domain ";
-		callback = pm_late_early_op(&dev->pm_domain->ops, state);
-	} else if (dev->type && dev->type->pm) {
-		info = "early type ";
-		callback = pm_late_early_op(dev->type->pm, state);
-	} else if (dev->class && dev->class->pm) {
-		info = "early class ";
-		callback = pm_late_early_op(dev->class->pm, state);
-	} else if (dev->bus && dev->bus->pm) {
-		info = "early bus ";
-		callback = pm_late_early_op(dev->bus->pm, state);
-	}
+	callback = dpm_subsys_resume_early_cb(dev, state, &info);
 
 	if (!callback && dev->driver && dev->driver->pm) {
 		info = "early driver ";
@@ -1117,6 +1151,35 @@ static void dpm_superior_set_must_resume
 	device_links_read_unlock(idx);
 }
 
+static pm_callback_t dpm_subsys_suspend_noirq_cb(struct device *dev,
+						 pm_message_t state,
+						 const char **info_p)
+{
+	pm_callback_t callback;
+	const char *info;
+
+	if (dev->pm_domain) {
+		info = "noirq power domain ";
+		callback = pm_noirq_op(&dev->pm_domain->ops, state);
+	} else if (dev->type && dev->type->pm) {
+		info = "noirq type ";
+		callback = pm_noirq_op(dev->type->pm, state);
+	} else if (dev->class && dev->class->pm) {
+		info = "noirq class ";
+		callback = pm_noirq_op(dev->class->pm, state);
+	} else if (dev->bus && dev->bus->pm) {
+		info = "noirq bus ";
+		callback = pm_noirq_op(dev->bus->pm, state);
+	} else {
+		return NULL;
+	}
+
+	if (info_p)
+		*info_p = info;
+
+	return callback;
+}
+
 /**
  * __device_suspend_noirq - Execute a "noirq suspend" callback for given device.
  * @dev: Device to handle.
@@ -1128,8 +1191,8 @@ static void dpm_superior_set_must_resume
  */
 static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool async)
 {
-	pm_callback_t callback = NULL;
-	const char *info = NULL;
+	pm_callback_t callback;
+	const char *info;
 	int error = 0;
 
 	TRACE_DEVICE(dev);
@@ -1148,19 +1211,7 @@ static int __device_suspend_noirq(struct
 	if (dev->power.syscore || dev->power.direct_complete)
 		goto Complete;
 
-	if (dev->pm_domain) {
-		info = "noirq power domain ";
-		callback = pm_noirq_op(&dev->pm_domain->ops, state);
-	} else if (dev->type && dev->type->pm) {
-		info = "noirq type ";
-		callback = pm_noirq_op(dev->type->pm, state);
-	} else if (dev->class && dev->class->pm) {
-		info = "noirq class ";
-		callback = pm_noirq_op(dev->class->pm, state);
-	} else if (dev->bus && dev->bus->pm) {
-		info = "noirq bus ";
-		callback = pm_noirq_op(dev->bus->pm, state);
-	}
+	callback = dpm_subsys_suspend_noirq_cb(dev, state, &info);
 
 	if (!callback && dev->driver && dev->driver->pm) {
 		info = "noirq driver ";
@@ -1295,6 +1346,35 @@ int dpm_suspend_noirq(pm_message_t state
 	return ret;
 }
 
+static pm_callback_t dpm_subsys_suspend_late_cb(struct device *dev,
+						pm_message_t state,
+						const char **info_p)
+{
+	pm_callback_t callback;
+	const char *info;
+
+	if (dev->pm_domain) {
+		info = "late power domain ";
+		callback = pm_late_early_op(&dev->pm_domain->ops, state);
+	} else if (dev->type && dev->type->pm) {
+		info = "late type ";
+		callback = pm_late_early_op(dev->type->pm, state);
+	} else if (dev->class && dev->class->pm) {
+		info = "late class ";
+		callback = pm_late_early_op(dev->class->pm, state);
+	} else if (dev->bus && dev->bus->pm) {
+		info = "late bus ";
+		callback = pm_late_early_op(dev->bus->pm, state);
+	} else {
+		return NULL;
+	}
+
+	if (info_p)
+		*info_p = info;
+
+	return callback;
+}
+
 /**
  * __device_suspend_late - Execute a "late suspend" callback for given device.
  * @dev: Device to handle.
@@ -1305,8 +1385,8 @@ int dpm_suspend_noirq(pm_message_t state
  */
 static int __device_suspend_late(struct device *dev, pm_message_t state, bool async)
 {
-	pm_callback_t callback = NULL;
-	const char *info = NULL;
+	pm_callback_t callback;
+	const char *info;
 	int error = 0;
 
 	TRACE_DEVICE(dev);
@@ -1327,19 +1407,7 @@ static int __device_suspend_late(struct
 	if (dev->power.syscore || dev->power.direct_complete)
 		goto Complete;
 
-	if (dev->pm_domain) {
-		info = "late power domain ";
-		callback = pm_late_early_op(&dev->pm_domain->ops, state);
-	} else if (dev->type && dev->type->pm) {
-		info = "late type ";
-		callback = pm_late_early_op(dev->type->pm, state);
-	} else if (dev->class && dev->class->pm) {
-		info = "late class ";
-		callback = pm_late_early_op(dev->class->pm, state);
-	} else if (dev->bus && dev->bus->pm) {
-		info = "late bus ";
-		callback = pm_late_early_op(dev->bus->pm, state);
-	}
+	callback = dpm_subsys_suspend_late_cb(dev, state, &info);
 
 	if (!callback && dev->driver && dev->driver->pm) {
 		info = "late driver ";

* [PATCH v4 5/6] PM / core: Direct handling of DPM_FLAG_LEAVE_SUSPENDED
  2017-11-18 14:27       ` [PATCH v4 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
                           ` (3 preceding siblings ...)
  2017-11-18 14:37         ` [PATCH v4 4/6] PM / core: Add helpers for subsystem callback selection Rafael J. Wysocki
@ 2017-11-18 14:41         ` Rafael J. Wysocki
  2017-11-20 13:42           ` Ulf Hansson
  2017-11-18 14:44         ` [PATCH v4 6/6] PM / core: DPM_FLAG_SMART_SUSPEND optimization Rafael J. Wysocki
  5 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-18 14:41 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Make the PM core handle DPM_FLAG_LEAVE_SUSPENDED directly for
devices whose "noirq", "late" and "early" driver callbacks are
invoked directly by it.

Namely, make it skip all of the system-wide resume callbacks for
such devices with DPM_FLAG_LEAVE_SUSPENDED set if either of the
following holds:

 * They are in runtime suspend during the "noirq" phase of
   system-wide suspend (or analogous) transitions.

 * The system transition under way is a proper suspend (rather than
   anything related to hibernation) and the device's wakeup settings
   are compatible with runtime PM (that is, the device cannot
   generate wakeup signals at all or it is allowed to wake up the
   system from sleep).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v3 -> v4: Rebase on the v4 of patch [1/6], add a forward declaration of
          dpm_subsys_suspend_late_cb() dropped from patch [4/6].

v2 -> v3: Rebase on the v3 of patch [1/6].
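
To make the decision logic in __device_suspend_noirq() easier to
follow, here is a minimal userspace sketch of how power.must_resume
ends up being computed with this patch applied.  The model_dev_state
structure and its field names are assumptions made for the example;
the expressions follow the diff below, but this is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the state the PM core looks at. */
struct model_dev_state {
	bool leave_suspended;	/* DPM_FLAG_LEAVE_SUSPENDED set by the driver */
	bool direct_cb;		/* "noirq", "late" and "early" driver callbacks
				 * are all invoked directly by the core */
	bool proper_suspend;	/* resume event is PM_EVENT_RESUME */
	bool runtime_suspended;	/* runtime PM status is "suspended" */
	bool can_wakeup;	/* device_can_wakeup() */
	bool may_wakeup;	/* device_may_wakeup() */
	bool may_skip_resume;	/* set by the middle layer, if there is one */
	bool must_resume;	/* already requested, e.g. by a child */
	int usage_count;	/* runtime PM usage counter */
};

/* Returns the new value of must_resume for the device. */
bool model_must_resume(const struct model_dev_state *s)
{
	bool skip_resume;

	assert(s != NULL);

	/* Without the flag, the device is always resumed. */
	if (!s->leave_suspended)
		return true;

	if (s->direct_cb)
		/* The core decides on its own. */
		skip_resume = s->runtime_suspended ||
			(s->proper_suspend &&
			 (!s->can_wakeup || s->may_wakeup));
	else
		/* The middle layer has to say so. */
		skip_resume = s->may_skip_resume;

	return s->must_resume || !skip_resume || s->usage_count > 1;
}
```

Note that an earlier must_resume request or a usage count above 1
overrides everything else, regardless of who made the skip decision.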

---
 Documentation/driver-api/pm/devices.rst |    9 +++++
 drivers/base/power/main.c               |   51 ++++++++++++++++++++++++++++----
 2 files changed, 55 insertions(+), 5 deletions(-)

Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -525,6 +525,10 @@ static void dpm_watchdog_clear(struct dp
 #define dpm_watchdog_clear(x)
 #endif
 
+static pm_callback_t dpm_subsys_suspend_late_cb(struct device *dev,
+						pm_message_t state,
+						const char **info_p);
+
 /*------------------------- Resume routines -------------------------*/
 
 /**
@@ -581,6 +585,7 @@ static int device_resume_noirq(struct de
 {
 	pm_callback_t callback;
 	const char *info;
+	bool skip_resume;
 	int error = 0;
 
 	TRACE_DEVICE(dev);
@@ -594,17 +599,27 @@ static int device_resume_noirq(struct de
 
 	dpm_wait_for_superior(dev, async);
 
+	skip_resume = dev_pm_may_skip_resume(dev);
+
 	callback = dpm_subsys_resume_noirq_cb(dev, state, &info);
+	if (callback)
+		goto Run;
+
+	if (skip_resume)
+		goto Skip;
 
 	if (!callback && dev->driver && dev->driver->pm) {
 		info = "noirq driver ";
 		callback = pm_noirq_op(dev->driver->pm, state);
 	}
 
+Run:
 	error = dpm_run_callback(callback, dev, state, info);
+
+Skip:
 	dev->power.is_noirq_suspended = false;
 
-	if (dev_pm_may_skip_resume(dev)) {
+	if (skip_resume) {
 		/*
 		 * The device is going to be left in suspend, but it might not
 		 * have been in runtime suspend before the system suspended, so
@@ -617,7 +632,7 @@ static int device_resume_noirq(struct de
 		dev->power.is_suspended = false;
 	}
 
- Out:
+Out:
 	complete_all(&dev->power.completion);
 	TRACE_RESUME(error);
 	return error;
@@ -1193,6 +1208,7 @@ static int __device_suspend_noirq(struct
 {
 	pm_callback_t callback;
 	const char *info;
+	bool direct_cb = false;
 	int error = 0;
 
 	TRACE_DEVICE(dev);
@@ -1212,12 +1228,17 @@ static int __device_suspend_noirq(struct
 		goto Complete;
 
 	callback = dpm_subsys_suspend_noirq_cb(dev, state, &info);
+	if (callback)
+		goto Run;
 
-	if (!callback && dev->driver && dev->driver->pm) {
+	direct_cb = true;
+
+	if (dev->driver && dev->driver->pm) {
 		info = "noirq driver ";
 		callback = pm_noirq_op(dev->driver->pm, state);
 	}
 
+Run:
 	error = dpm_run_callback(callback, dev, state, info);
 	if (error) {
 		async_error = error;
@@ -1227,13 +1248,33 @@ static int __device_suspend_noirq(struct
 	dev->power.is_noirq_suspended = true;
 
 	if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED)) {
+		pm_message_t resume_msg = resume_event(state);
+		bool skip_resume;
+
+		if (direct_cb &&
+		    !dpm_subsys_suspend_late_cb(dev, state, NULL) &&
+		    !dpm_subsys_resume_early_cb(dev, resume_msg, NULL) &&
+		    !dpm_subsys_resume_noirq_cb(dev, resume_msg, NULL)) {
+			/*
+			 * If all of the device driver's "noirq", "late" and
+			 * "early" callbacks are invoked directly by the core,
+			 * the decision to allow the device to stay in suspend
+			 * can be based on its current runtime PM status and its
+			 * wakeup settings.
+			 */
+			skip_resume = pm_runtime_status_suspended(dev) ||
+				(resume_msg.event == PM_EVENT_RESUME &&
+				 (!device_can_wakeup(dev) ||
+				  device_may_wakeup(dev)));
+		} else {
+			skip_resume = dev->power.may_skip_resume;
+		}
 		/*
 		 * The only safe strategy here is to require that if the device
 		 * may not be left in suspend, resume callbacks must be invoked
 		 * for it.
 		 */
-		dev->power.must_resume = dev->power.must_resume ||
-					!dev->power.may_skip_resume ||
+		dev->power.must_resume = dev->power.must_resume || !skip_resume ||
 					atomic_read(&dev->power.usage_count) > 1;
 	} else {
 		dev->power.must_resume = true;
Index: linux-pm/Documentation/driver-api/pm/devices.rst
===================================================================
--- linux-pm.orig/Documentation/driver-api/pm/devices.rst
+++ linux-pm/Documentation/driver-api/pm/devices.rst
@@ -814,3 +814,12 @@ middle layer is then responsible for han
 whether or not the device is left suspended, but the other resume callbacks
 (except for ``->complete``) will be skipped automatically by the PM core if the
 device really can be left in suspend.
+
+For devices whose "noirq", "late" and "early" driver callbacks are invoked
+directly by the PM core, all of the system-wide resume callbacks are skipped if
+``DPM_FLAG_LEAVE_SUSPENDED`` is set and the device is in runtime suspend during
+the ``suspend_noirq`` (or analogous) phase or the transition under way is a
+proper system suspend (rather than anything related to hibernation) and the
+device's wakeup settings are suitable for runtime PM (that is, it cannot
+generate wakeup signals at all or it is allowed to wake up the system from
+sleep).

* [PATCH v4 6/6] PM / core: DPM_FLAG_SMART_SUSPEND optimization
  2017-11-18 14:27       ` [PATCH v4 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
                           ` (4 preceding siblings ...)
  2017-11-18 14:41         ` [PATCH v4 5/6] PM / core: Direct handling of DPM_FLAG_LEAVE_SUSPENDED Rafael J. Wysocki
@ 2017-11-18 14:44         ` Rafael J. Wysocki
  5 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-18 14:44 UTC (permalink / raw)
  To: Linux PM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML, Linux ACPI,
	Linux PCI, Linux Documentation, Mika Westerberg, Ulf Hansson,
	Andy Shevchenko, Kevin Hilman

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Make the PM core avoid invoking the "late" and "noirq" system-wide
suspend (or analogous) callbacks for devices that are in runtime
suspend during the corresponding phases of system-wide suspend
(or analogous) transitions.

The underlying observation is that runtime PM is disabled for
devices during those system-wide suspend phases, so their runtime
PM status should not change going forward and if it has not changed
so far, their state should be compatible with the target system
sleep state.

This change really makes it possible for, say, platform device
drivers to re-use runtime PM suspend and resume callbacks by
pointing ->suspend_late and ->resume_early, respectively (and
possibly the analogous hibernation-related callback pointers too),
to them without adding any extra "is the device already suspended?"
type of checks to the callback routines, as long as they will be
invoked directly by the core.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v3 -> v4: Add a forward declaration of dpm_subsys_suspend_noirq_cb()
          dropped from patch [4/6].

v2 -> v3: No changes.
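
The effect on a driver that reuses its runtime PM suspend routine as
->suspend_late can be sketched in userspace as follows.  The model_dev
structure, its fields and both functions are assumptions made for the
example; only the skip condition follows the __device_suspend_late()
hunk below, and the case with a subsystem "late" callback present is
left out of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model, for illustration only. */
struct model_dev {
	bool smart_suspend;		/* DPM_FLAG_SMART_SUSPEND is set */
	bool runtime_suspended;		/* runtime PM status is "suspended" */
	bool has_subsys_noirq_cb;	/* a middle layer provides ->suspend_noirq */
	int (*suspend_late)(struct model_dev *dev);	/* driver callback */
	bool powered;
	bool is_late_suspended;
	int suspend_calls;		/* how many times the driver ran */
};

/* Driver routine shared with runtime PM; no "already suspended?" check. */
int model_driver_power_down(struct model_dev *dev)
{
	dev->suspend_calls++;
	dev->powered = false;
	return 0;
}

/* Models the driver-callback path of the "late" suspend phase. */
int model_device_suspend_late(struct model_dev *dev)
{
	int error;

	assert(dev != NULL);

	/* Already in runtime suspend and nothing in between: skip. */
	if (dev->smart_suspend && dev->runtime_suspended &&
	    !dev->has_subsys_noirq_cb)
		goto skip;

	if (dev->suspend_late) {
		error = dev->suspend_late(dev);
		if (error)
			return error;
	}

skip:
	dev->is_late_suspended = true;
	return 0;
}
```

With the flag set and the device already runtime-suspended, the driver
routine is not run a second time, so it does not need its own guard.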

---
 Documentation/driver-api/pm/devices.rst |   18 ++++----
 drivers/base/power/main.c               |   66 +++++++++++++++++++++++++++++---
 2 files changed, 70 insertions(+), 14 deletions(-)

Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -525,6 +525,10 @@ static void dpm_watchdog_clear(struct dp
 #define dpm_watchdog_clear(x)
 #endif
 
+static pm_callback_t dpm_subsys_suspend_noirq_cb(struct device *dev,
+						 pm_message_t state,
+						 const char **info_p);
+
 static pm_callback_t dpm_subsys_suspend_late_cb(struct device *dev,
 						pm_message_t state,
 						const char **info_p);
@@ -532,6 +536,24 @@ static pm_callback_t dpm_subsys_suspend_
 /*------------------------- Resume routines -------------------------*/
 
 /**
+ * suspend_event - Return a "suspend" message for given "resume" one.
+ * @resume_msg: PM message representing a system-wide resume transition.
+ */
+static pm_message_t suspend_event(pm_message_t resume_msg)
+{
+	switch (resume_msg.event) {
+	case PM_EVENT_RESUME:
+		return PMSG_SUSPEND;
+	case PM_EVENT_THAW:
+	case PM_EVENT_RESTORE:
+		return PMSG_FREEZE;
+	case PM_EVENT_RECOVER:
+		return PMSG_HIBERNATE;
+	}
+	return PMSG_ON;
+}
+
+/**
  * dev_pm_may_skip_resume - System-wide device resume optimization check.
  * @dev: Target device.
  *
@@ -605,6 +627,25 @@ static int device_resume_noirq(struct de
 	if (callback)
 		goto Run;
 
+	if (dev_pm_smart_suspend_and_suspended(dev)) {
+		pm_message_t suspend_msg = suspend_event(state);
+
+		/*
+		 * If "freeze" callbacks have been skipped during a transition
+		 * related to hibernation, the subsequent "thaw" callbacks must
+		 * be skipped too or bad things may happen.  Otherwise, if the
+		 * device is to be resumed, its runtime PM status must be
+		 * changed to reflect the new configuration.
+		 */
+		if (!dpm_subsys_suspend_late_cb(dev, suspend_msg, NULL) &&
+		    !dpm_subsys_suspend_noirq_cb(dev, suspend_msg, NULL)) {
+			if (state.event == PM_EVENT_THAW)
+				skip_resume = true;
+			else if (!skip_resume)
+				pm_runtime_set_active(dev);
+		}
+	}
+
 	if (skip_resume)
 		goto Skip;
 
@@ -1231,7 +1272,10 @@ static int __device_suspend_noirq(struct
 	if (callback)
 		goto Run;
 
-	direct_cb = true;
+	direct_cb = !dpm_subsys_suspend_late_cb(dev, state, NULL);
+
+	if (dev_pm_smart_suspend_and_suspended(dev) && direct_cb)
+		goto Skip;
 
 	if (dev->driver && dev->driver->pm) {
 		info = "noirq driver ";
@@ -1245,6 +1289,7 @@ Run:
 		goto Complete;
 	}
 
+Skip:
 	dev->power.is_noirq_suspended = true;
 
 	if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED)) {
@@ -1252,7 +1297,6 @@ Run:
 		bool skip_resume;
 
 		if (direct_cb &&
-		    !dpm_subsys_suspend_late_cb(dev, state, NULL) &&
 		    !dpm_subsys_resume_early_cb(dev, resume_msg, NULL) &&
 		    !dpm_subsys_resume_noirq_cb(dev, resume_msg, NULL)) {
 			/*
@@ -1449,17 +1493,27 @@ static int __device_suspend_late(struct
 		goto Complete;
 
 	callback = dpm_subsys_suspend_late_cb(dev, state, &info);
+	if (callback)
+		goto Run;
 
-	if (!callback && dev->driver && dev->driver->pm) {
+	if (dev_pm_smart_suspend_and_suspended(dev) &&
+	    !dpm_subsys_suspend_noirq_cb(dev, state, NULL))
+		goto Skip;
+
+	if (dev->driver && dev->driver->pm) {
 		info = "late driver ";
 		callback = pm_late_early_op(dev->driver->pm, state);
 	}
 
+Run:
 	error = dpm_run_callback(callback, dev, state, info);
-	if (!error)
-		dev->power.is_late_suspended = true;
-	else
+	if (error) {
 		async_error = error;
+		goto Complete;
+	}
+
+Skip:
+	dev->power.is_late_suspended = true;
 
 Complete:
 	TRACE_SUSPEND(error);
Index: linux-pm/Documentation/driver-api/pm/devices.rst
===================================================================
--- linux-pm.orig/Documentation/driver-api/pm/devices.rst
+++ linux-pm/Documentation/driver-api/pm/devices.rst
@@ -777,14 +777,16 @@ The driver can indicate that by setting
 runtime suspend at the beginning of the ``suspend_late`` phase of system-wide
 suspend (or in the ``poweroff_late`` phase of hibernation), when runtime PM
 has been disabled for it, under the assumption that its state should not change
-after that point until the system-wide transition is over.  If that happens, the
-driver's system-wide resume callbacks, if present, may still be invoked during
-the subsequent system-wide resume transition and the device's runtime power
-management status may be set to "active" before enabling runtime PM for it,
-so the driver must be prepared to cope with the invocation of its system-wide
-resume callbacks back-to-back with its ``->runtime_suspend`` one (without the
-intervening ``->runtime_resume`` and so on) and the final state of the device
-must reflect the "active" status for runtime PM in that case.
+after that point until the system-wide transition is over (the PM core itself
+does that for devices whose "noirq", "late" and "early" system-wide PM callbacks
+are executed directly by it).  If that happens, the driver's system-wide resume
+callbacks, if present, may still be invoked during the subsequent system-wide
+resume transition and the device's runtime power management status may be set
+to "active" before enabling runtime PM for it, so the driver must be prepared to
+cope with the invocation of its system-wide resume callbacks back-to-back with
+its ``->runtime_suspend`` one (without the intervening ``->runtime_resume`` and
+so on) and the final state of the device must reflect the "active" status for
+runtime PM in that case.
 
 During system-wide resume from a sleep state it's easiest to put devices into
 the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`.

* Re: [PATCH v4 1/6] PM / core: Add LEAVE_SUSPENDED driver flag
  2017-11-18 14:31         ` [PATCH v4 1/6] PM / core: Add LEAVE_SUSPENDED driver flag Rafael J. Wysocki
@ 2017-11-20 12:25           ` Ulf Hansson
  2017-11-21  0:16             ` Rafael J. Wysocki
  0 siblings, 1 reply; 135+ messages in thread
From: Ulf Hansson @ 2017-11-20 12:25 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman

On 18 November 2017 at 15:31, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to
> instruct the PM core and middle-layer (bus type, PM domain, etc.)
> code that it is desirable to leave the device in runtime suspend
> after system-wide transitions to the working state (for example,
> the device may be slow to resume and it may be better to avoid
> resuming it right away).
>
> Generally, the middle-layer code involved in the handling of the
> device is expected to indicate to the PM core whether or not the
> device may be left in suspend with the help of the device's
> power.may_skip_resume status bit.  That has to happen in the "noirq"
> phase of the preceding system suspend (or analogous) transition.
> The middle layer is then responsible for handling the device as
> appropriate in its "noirq" resume callback which is executed
> regardless of whether or not the device may be left suspended, but
> the other resume callbacks (except for ->complete) will be skipped
> automatically by the core if the device really can be left in
> suspend.
>
> The additional power.must_resume status bit introduced for the
> implementation of this mechanism is used internally by the PM core
> to track the requirement to resume the device (which may depend on
> its children etc).
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

Kind regards
Uffe

> ---
>
> v3 -> v4: Fix the dev->power.usage_count check added in v3, clarify
>           documentation, add comment to explain why the runtime PM status
>           is changed in device_resume_noirq() and make the description of
>           the NEVER_SKIP flag more precise.
>
> v2 -> v3: Take dev->power.usage_count into account when updating
>           power.must_resume in __device_suspend_noirq().
>
> ---
>  Documentation/driver-api/pm/devices.rst |   27 ++++++++++-
>  drivers/base/power/main.c               |   73 +++++++++++++++++++++++++++++---
>  include/linux/pm.h                      |   16 +++++--
>  3 files changed, 105 insertions(+), 11 deletions(-)
>
> Index: linux-pm/include/linux/pm.h
> ===================================================================
> --- linux-pm.orig/include/linux/pm.h
> +++ linux-pm/include/linux/pm.h
> @@ -556,9 +556,10 @@ struct pm_subsys_data {
>   * These flags can be set by device drivers at the probe time.  They need not be
>   * cleared by the drivers as the driver core will take care of that.
>   *
> - * NEVER_SKIP: Do not skip system suspend/resume callbacks for the device.
> + * NEVER_SKIP: Do not skip all system suspend/resume callbacks for the device.
>   * SMART_PREPARE: Check the return value of the driver's ->prepare callback.
>   * SMART_SUSPEND: No need to resume the device from runtime suspend.
> + * LEAVE_SUSPENDED: Avoid resuming the device during system resume if possible.
>   *
>   * Setting SMART_PREPARE instructs bus types and PM domains which may want
>   * system suspend/resume callbacks to be skipped for the device to return 0 from
> @@ -572,10 +573,14 @@ struct pm_subsys_data {
>   * necessary from the driver's perspective.  It also may cause them to skip
>   * invocations of the ->suspend_late and ->suspend_noirq callbacks provided by
>   * the driver if they decide to leave the device in runtime suspend.
> + *
> + * Setting LEAVE_SUSPENDED informs the PM core and middle-layer code that the
> + * driver prefers the device to be left in suspend after system resume.
>   */
> -#define DPM_FLAG_NEVER_SKIP    BIT(0)
> -#define DPM_FLAG_SMART_PREPARE BIT(1)
> -#define DPM_FLAG_SMART_SUSPEND BIT(2)
> +#define DPM_FLAG_NEVER_SKIP            BIT(0)
> +#define DPM_FLAG_SMART_PREPARE         BIT(1)
> +#define DPM_FLAG_SMART_SUSPEND         BIT(2)
> +#define DPM_FLAG_LEAVE_SUSPENDED       BIT(3)
>
>  struct dev_pm_info {
>         pm_message_t            power_state;
> @@ -597,6 +602,8 @@ struct dev_pm_info {
>         bool                    wakeup_path:1;
>         bool                    syscore:1;
>         bool                    no_pm_callbacks:1;      /* Owned by the PM core */
> +       unsigned int            must_resume:1;  /* Owned by the PM core */
> +       unsigned int            may_skip_resume:1;      /* Set by subsystems */
>  #else
>         unsigned int            should_wakeup:1;
>  #endif
> @@ -765,6 +772,7 @@ extern int pm_generic_poweroff_late(stru
>  extern int pm_generic_poweroff(struct device *dev);
>  extern void pm_generic_complete(struct device *dev);
>
> +extern bool dev_pm_may_skip_resume(struct device *dev);
>  extern bool dev_pm_smart_suspend_and_suspended(struct device *dev);
>
>  #else /* !CONFIG_PM_SLEEP */
> Index: linux-pm/drivers/base/power/main.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/main.c
> +++ linux-pm/drivers/base/power/main.c
> @@ -528,6 +528,18 @@ static void dpm_watchdog_clear(struct dp
>  /*------------------------- Resume routines -------------------------*/
>
>  /**
> + * dev_pm_may_skip_resume - System-wide device resume optimization check.
> + * @dev: Target device.
> + *
> + * Checks whether or not the device may be left in suspend after a system-wide
> + * transition to the working state.
> + */
> +bool dev_pm_may_skip_resume(struct device *dev)
> +{
> +       return !dev->power.must_resume && pm_transition.event != PM_EVENT_RESTORE;
> +}
> +
> +/**
>   * device_resume_noirq - Execute a "noirq resume" callback for given device.
>   * @dev: Device to handle.
>   * @state: PM transition of the system being carried out.
> @@ -575,6 +587,19 @@ static int device_resume_noirq(struct de
>         error = dpm_run_callback(callback, dev, state, info);
>         dev->power.is_noirq_suspended = false;
>
> +       if (dev_pm_may_skip_resume(dev)) {
> +               /*
> +                * The device is going to be left in suspend, but it might not
> +                * have been in runtime suspend before the system suspended, so
> +                * its runtime PM status needs to be updated to avoid confusing
> +                * the runtime PM framework when runtime PM is enabled for the
> +                * device again.
> +                */
> +               pm_runtime_set_suspended(dev);
> +               dev->power.is_late_suspended = false;
> +               dev->power.is_suspended = false;
> +       }
> +
>   Out:
>         complete_all(&dev->power.completion);
>         TRACE_RESUME(error);
> @@ -1076,6 +1101,22 @@ static pm_message_t resume_event(pm_mess
>         return PMSG_ON;
>  }
>
> +static void dpm_superior_set_must_resume(struct device *dev)
> +{
> +       struct device_link *link;
> +       int idx;
> +
> +       if (dev->parent)
> +               dev->parent->power.must_resume = true;
> +
> +       idx = device_links_read_lock();
> +
> +       list_for_each_entry_rcu(link, &dev->links.suppliers, c_node)
> +               link->supplier->power.must_resume = true;
> +
> +       device_links_read_unlock(idx);
> +}
> +
>  /**
>   * __device_suspend_noirq - Execute a "noirq suspend" callback for given device.
>   * @dev: Device to handle.
> @@ -1127,10 +1168,28 @@ static int __device_suspend_noirq(struct
>         }
>
>         error = dpm_run_callback(callback, dev, state, info);
> -       if (!error)
> -               dev->power.is_noirq_suspended = true;
> -       else
> +       if (error) {
>                 async_error = error;
> +               goto Complete;
> +       }
> +
> +       dev->power.is_noirq_suspended = true;
> +
> +       if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED)) {
> +               /*
> +                * The only safe strategy here is to require that if the device
> +                * may not be left in suspend, resume callbacks must be invoked
> +                * for it.
> +                */
> +               dev->power.must_resume = dev->power.must_resume ||
> +                                       !dev->power.may_skip_resume ||
> +                                       atomic_read(&dev->power.usage_count) > 1;
> +       } else {
> +               dev->power.must_resume = true;
> +       }
> +
> +       if (dev->power.must_resume)
> +               dpm_superior_set_must_resume(dev);
>
>  Complete:
>         complete_all(&dev->power.completion);
> @@ -1487,6 +1546,9 @@ static int __device_suspend(struct devic
>                 dev->power.direct_complete = false;
>         }
>
> +       dev->power.may_skip_resume = false;
> +       dev->power.must_resume = false;
> +
>         dpm_watchdog_set(&wd, dev);
>         device_lock(dev);
>
> @@ -1652,8 +1714,9 @@ static int device_prepare(struct device
>         if (dev->power.syscore)
>                 return 0;
>
> -       WARN_ON(dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
> -               !pm_runtime_enabled(dev));
> +       WARN_ON(!pm_runtime_enabled(dev) &&
> +               dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND |
> +                                             DPM_FLAG_LEAVE_SUSPENDED));
>
>         /*
>          * If a device's parent goes into runtime suspend at the wrong time,
> Index: linux-pm/Documentation/driver-api/pm/devices.rst
> ===================================================================
> --- linux-pm.orig/Documentation/driver-api/pm/devices.rst
> +++ linux-pm/Documentation/driver-api/pm/devices.rst
> @@ -788,6 +788,29 @@ must reflect the "active" status for run
>
>  During system-wide resume from a sleep state it's easiest to put devices into
>  the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`.
> -Refer to that document for more information regarding this particular issue as
> +[Refer to that document for more information regarding this particular issue as
>  well as for information on the device runtime power management framework in
> -general.
> +general.]
> +
> +However, it often is desirable to leave devices in suspend after system
> +transitions to the working state, especially if those devices had been in
> +runtime suspend before the preceding system-wide suspend (or analogous)
> +transition.  Device drivers can use the ``DPM_FLAG_LEAVE_SUSPENDED`` flag to
> +indicate to the PM core (and middle-layer code) that they prefer the specific
> +devices handled by them to be left suspended and they have no problems with
> +skipping their system-wide resume callbacks for this reason.  Whether or not the
> +devices will actually be left in suspend may depend on their state before the
> +given system suspend-resume cycle and on the type of the system transition under
> +way.  In particular, devices are not left suspended if that transition is a
> +restore from hibernation, as device states are not guaranteed to be reflected
> +by the information stored in the hibernation image in that case.
> +
> +The middle-layer code involved in the handling of the device is expected to
> +indicate to the PM core if the device may be left in suspend by setting its
> +:c:member:`power.may_skip_resume` status bit which is checked by the PM core
> +during the "noirq" phase of the preceding system-wide suspend (or analogous)
> +transition.  The middle layer is then responsible for handling the device as
> +appropriate in its "noirq" resume callback, which is executed regardless of
> +whether or not the device is left suspended, but the other resume callbacks
> +(except for ``->complete``) will be skipped automatically by the PM core if the
> +device really can be left in suspend.
>
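
As an illustration of how the two status bits in the patch quoted above interact, here is a minimal userspace sketch in C. It is not kernel code: `struct model_dev`, `enum model_event`, `model_suspend_noirq()` and `model_may_skip_resume()` are hypothetical stand-ins that only mirror the boolean logic of `__device_suspend_noirq()` and `dev_pm_may_skip_resume()` as they appear in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PM_EVENT_* values of the transition. */
enum model_event { MODEL_EVENT_RESUME, MODEL_EVENT_THAW, MODEL_EVENT_RESTORE };

/* Hypothetical, simplified view of the dev->power fields involved. */
struct model_dev {
	bool leave_suspended;	/* driver set DPM_FLAG_LEAVE_SUSPENDED */
	bool may_skip_resume;	/* set by the middle layer during suspend */
	bool must_resume;	/* tracked internally by the PM core */
	int usage_count;	/* runtime PM usage counter */
};

/* Mirrors the must_resume update at the end of __device_suspend_noirq(). */
void model_suspend_noirq(struct model_dev *dev)
{
	assert(dev != NULL);

	if (dev->leave_suspended)
		dev->must_resume = dev->must_resume ||
				   !dev->may_skip_resume ||
				   dev->usage_count > 1;
	else
		dev->must_resume = true;
}

/* Mirrors dev_pm_may_skip_resume(): never skip on restore from hibernation. */
bool model_may_skip_resume(const struct model_dev *dev, enum model_event event)
{
	assert(dev != NULL);

	return !dev->must_resume && event != MODEL_EVENT_RESTORE;
}
```

In this model a device is left in suspend only if the driver asked for it, the middle layer agreed, nothing else forced `must_resume`, the usage counter does not exceed 1, and the transition is not a restore from hibernation.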

* Re: [PATCH v4 5/6] PM / core: Direct handling of DPM_FLAG_LEAVE_SUSPENDED
  2017-11-18 14:41         ` [PATCH v4 5/6] PM / core: Direct handling of DPM_FLAG_LEAVE_SUSPENDED Rafael J. Wysocki
@ 2017-11-20 13:42           ` Ulf Hansson
  2017-11-22  1:10             ` Rafael J. Wysocki
  0 siblings, 1 reply; 135+ messages in thread
From: Ulf Hansson @ 2017-11-20 13:42 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman

On 18 November 2017 at 15:41, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Make the PM core handle DPM_FLAG_LEAVE_SUSPENDED directly for
> devices whose "noirq", "late" and "early" driver callbacks are
> invoked directly by it.

This indicates that your target for this particular change isn't
ACPI/PCI, but instead this aims to be a more generic solution to be
able to optimize the resume path for devices.

Assuming this is the case, I don't think this is good enough, as I
pointed out [1] earlier, simply because it isn't as flexible as is
required to really be able to cover generic cases.

>
> Namely, make it skip all of the system-wide resume callbacks for
> such devices with DPM_FLAG_LEAVE_SUSPENDED set if they are in
> runtime suspend during the "noirq" phase of system-wide suspend
> (or analogous) transitions or the system transition under way is
> a proper suspend (rather than anything related to hibernation) and
> the device's wakeup settings are compatible with runtime PM (that
> is, the device cannot generate wakeup signals at all or it is
> allowed to wake up the system from sleep).

As I pointed out by submitting another patch [2], device_may_wakeup()
doesn't really tell whether the wakeup is configured as "in-band" or
"out-of-band". Only the driver and the subsystem layer know that, and
for that reason I don't think the PM core should base generic decisions
like this on it.

No comments on the code, so far. :-)

Kind regards
Uffe

[1]
https://www.spinics.net/lists/linux-pci/msg66502.html
[2]
https://patchwork.kernel.org/patch/10056323/

>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>
> v3 -> v4: Rebase on the v4 of patch [1/6], add a forward declaration of
>           dpm_subsys_suspend_late_cb() dropped from the [4/6].
>
> v2 -> v3: Rebase on the v3 of patch [1/6].
>
> ---
>  Documentation/driver-api/pm/devices.rst |    9 +++++
>  drivers/base/power/main.c               |   51 ++++++++++++++++++++++++++++----
>  2 files changed, 55 insertions(+), 5 deletions(-)
>
> Index: linux-pm/drivers/base/power/main.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/main.c
> +++ linux-pm/drivers/base/power/main.c
> @@ -525,6 +525,10 @@ static void dpm_watchdog_clear(struct dp
>  #define dpm_watchdog_clear(x)
>  #endif
>
> +static pm_callback_t dpm_subsys_suspend_late_cb(struct device *dev,
> +                                               pm_message_t state,
> +                                               const char **info_p);
> +
>  /*------------------------- Resume routines -------------------------*/
>
>  /**
> @@ -581,6 +585,7 @@ static int device_resume_noirq(struct de
>  {
>         pm_callback_t callback;
>         const char *info;
> +       bool skip_resume;
>         int error = 0;
>
>         TRACE_DEVICE(dev);
> @@ -594,17 +599,27 @@ static int device_resume_noirq(struct de
>
>         dpm_wait_for_superior(dev, async);
>
> +       skip_resume = dev_pm_may_skip_resume(dev);
> +
>         callback = dpm_subsys_resume_noirq_cb(dev, state, &info);
> +       if (callback)
> +               goto Run;
> +
> +       if (skip_resume)
> +               goto Skip;
>
>         if (!callback && dev->driver && dev->driver->pm) {
>                 info = "noirq driver ";
>                 callback = pm_noirq_op(dev->driver->pm, state);
>         }
>
> +Run:
>         error = dpm_run_callback(callback, dev, state, info);
> +
> +Skip:
>         dev->power.is_noirq_suspended = false;
>
> -       if (dev_pm_may_skip_resume(dev)) {
> +       if (skip_resume) {
>                 /*
>                  * The device is going to be left in suspend, but it might not
>                  * have been in runtime suspend before the system suspended, so
> @@ -617,7 +632,7 @@ static int device_resume_noirq(struct de
>                 dev->power.is_suspended = false;
>         }
>
> - Out:
> +Out:
>         complete_all(&dev->power.completion);
>         TRACE_RESUME(error);
>         return error;
> @@ -1193,6 +1208,7 @@ static int __device_suspend_noirq(struct
>  {
>         pm_callback_t callback;
>         const char *info;
> +       bool direct_cb = false;
>         int error = 0;
>
>         TRACE_DEVICE(dev);
> @@ -1212,12 +1228,17 @@ static int __device_suspend_noirq(struct
>                 goto Complete;
>
>         callback = dpm_subsys_suspend_noirq_cb(dev, state, &info);
> +       if (callback)
> +               goto Run;
>
> -       if (!callback && dev->driver && dev->driver->pm) {
> +       direct_cb = true;
> +
> +       if (dev->driver && dev->driver->pm) {
>                 info = "noirq driver ";
>                 callback = pm_noirq_op(dev->driver->pm, state);
>         }
>
> +Run:
>         error = dpm_run_callback(callback, dev, state, info);
>         if (error) {
>                 async_error = error;
> @@ -1227,13 +1248,33 @@ static int __device_suspend_noirq(struct
>         dev->power.is_noirq_suspended = true;
>
>         if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED)) {
> +               pm_message_t resume_msg = resume_event(state);
> +               bool skip_resume;
> +
> +               if (direct_cb &&
> +                   !dpm_subsys_suspend_late_cb(dev, state, NULL) &&
> +                   !dpm_subsys_resume_early_cb(dev, resume_msg, NULL) &&
> +                   !dpm_subsys_resume_noirq_cb(dev, resume_msg, NULL)) {
> +                       /*
> +                        * If all of the device driver's "noirq", "late" and
> +                        * "early" callbacks are invoked directly by the core,
> +                        * the decision to allow the device to stay in suspend
> +                        * can be based on its current runtime PM status and its
> +                        * wakeup settings.
> +                        */
> +                       skip_resume = pm_runtime_status_suspended(dev) ||
> +                               (resume_msg.event == PM_EVENT_RESUME &&
> +                                (!device_can_wakeup(dev) ||
> +                                 device_may_wakeup(dev)));
> +               } else {
> +                       skip_resume = dev->power.may_skip_resume;
> +               }
>                 /*
>                  * The only safe strategy here is to require that if the device
>                  * may not be left in suspend, resume callbacks must be invoked
>                  * for it.
>                  */
> -               dev->power.must_resume = dev->power.must_resume ||
> -                                       !dev->power.may_skip_resume ||
> +               dev->power.must_resume = dev->power.must_resume || !skip_resume ||
>                                         atomic_read(&dev->power.usage_count) > 1;
>         } else {
>                 dev->power.must_resume = true;
> Index: linux-pm/Documentation/driver-api/pm/devices.rst
> ===================================================================
> --- linux-pm.orig/Documentation/driver-api/pm/devices.rst
> +++ linux-pm/Documentation/driver-api/pm/devices.rst
> @@ -814,3 +814,12 @@ middle layer is then responsible for han
>  whether or not the device is left suspended, but the other resume callbacks
>  (except for ``->complete``) will be skipped automatically by the PM core if the
>  device really can be left in suspend.
> +
> +For devices whose "noirq", "late" and "early" driver callbacks are invoked
> +directly by the PM core, all of the system-wide resume callbacks are skipped if
> +``DPM_FLAG_LEAVE_SUSPENDED`` is set and the device is in runtime suspend during
> +the ``suspend_noirq`` (or analogous) phase or the transition under way is a
> +proper system suspend (rather than anything related to hibernation) and the
> +device's wakeup settings are suitable for runtime PM (that is, it cannot
> +generate wakeup signals at all or it is allowed to wake up the system from
> +sleep).
>

* Re: [PATCH v4 1/6] PM / core: Add LEAVE_SUSPENDED driver flag
  2017-11-20 12:25           ` Ulf Hansson
@ 2017-11-21  0:16             ` Rafael J. Wysocki
  0 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-21  0:16 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J. Wysocki, Linux PM, Bjorn Helgaas, Alan Stern,
	Greg Kroah-Hartman, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Andy Shevchenko,
	Kevin Hilman

On Mon, Nov 20, 2017 at 1:25 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On 18 November 2017 at 15:31, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to
>> instruct the PM core and middle-layer (bus type, PM domain, etc.)
>> code that it is desirable to leave the device in runtime suspend
>> after system-wide transitions to the working state (for example,
>> the device may be slow to resume and it may be better to avoid
>> resuming it right away).
>>
>> Generally, the middle-layer code involved in the handling of the
>> device is expected to indicate to the PM core whether or not the
>> device may be left in suspend with the help of the device's
>> power.may_skip_resume status bit.  That has to happen in the "noirq"
>> phase of the preceding system suspend (or analogous) transition.
>> The middle layer is then responsible for handling the device as
>> appropriate in its "noirq" resume callback which is executed
>> regardless of whether or not the device may be left suspended, but
>> the other resume callbacks (except for ->complete) will be skipped
>> automatically by the core if the device really can be left in
>> suspend.
>>
>> The additional power.must_resume status bit introduced for the
>> implementation of this mechanism is used internally by the PM core
>> to track the requirement to resume the device (which may depend on
>> its children etc).
>>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>
> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

Thanks!

* Re: [PATCH v4 5/6] PM / core: Direct handling of DPM_FLAG_LEAVE_SUSPENDED
  2017-11-20 13:42           ` Ulf Hansson
@ 2017-11-22  1:10             ` Rafael J. Wysocki
  2017-11-22  1:28               ` Rafael J. Wysocki
  0 siblings, 1 reply; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-22  1:10 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Linux PM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, LKML,
	Linux ACPI, Linux PCI, Linux Documentation, Mika Westerberg,
	Andy Shevchenko, Kevin Hilman

On Monday, November 20, 2017 2:42:26 PM CET Ulf Hansson wrote:
> On 18 November 2017 at 15:41, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Make the PM core handle DPM_FLAG_LEAVE_SUSPENDED directly for
> > devices whose "noirq", "late" and "early" driver callbacks are
> > invoked directly by it.
> 
> This indicates that your target for this particular change isn't
> ACPI/PCI, but instead this aims to be a more generic solution to be
> able to optimize the resume path for devices.
> 
> Assuming, this is the case, I don't think this is good enough as I
> pointed out [1] earlier. Simply because it isn't as flexible as is
> required - to really be able to cover generic cases.

I'll go back to that message, but nothing so far has been flexible enough to
cover everything you can imagine.

The case this and the next patch cover really is to allow drivers that can be
used with or without a PM domain to avoid doing any "are we suspended?" type
of checks in their callbacks.  Actually, patch [6/6] is more important from that
standpoint, but this one may also play a role because of the dependencies
between devices involved in the handling of LEAVE_SUSPENDED (e.g. say a PCI
parent has child platform, I2C or similar devices without a PM domain
and what happens to the children affects the parent).
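
The dependency mentioned here can be sketched in userspace C. This is only a model under stated assumptions, not the kernel implementation: `struct model_node`, `model_superior_set_must_resume()` and `model_finish_suspend_noirq()` are hypothetical names that mirror the shape of `dpm_superior_set_must_resume()` from patch [1/6], with a plain array standing in for the device-links supplier list and no locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_SUPPLIERS 4

/* Hypothetical, simplified device node: a parent plus a few suppliers. */
struct model_node {
	struct model_node *parent;
	struct model_node *suppliers[MODEL_MAX_SUPPLIERS];
	size_t nr_suppliers;
	bool must_resume;
};

/* Mark the direct parent and all direct suppliers as "must resume". */
void model_superior_set_must_resume(struct model_node *dev)
{
	size_t i;

	assert(dev != NULL);
	assert(dev->nr_suppliers <= MODEL_MAX_SUPPLIERS);

	if (dev->parent)
		dev->parent->must_resume = true;

	for (i = 0; i < dev->nr_suppliers; i++)
		dev->suppliers[i]->must_resume = true;
}

/*
 * End of the "noirq" suspend step for one device: if it has to be
 * resumed, so do the devices it depends on.
 */
void model_finish_suspend_noirq(struct model_node *dev)
{
	assert(dev != NULL);

	if (dev->must_resume)
		model_superior_set_must_resume(dev);
}
```

Each call touches only one level of the hierarchy. The sketch assumes, as in system suspend, that children are handled before their parents, so the requirement reaches a grandparent when the parent's own "noirq" suspend step runs.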

> >
> > Namely, make it skip all of the system-wide resume callbacks for
> > such devices with DPM_FLAG_LEAVE_SUSPENDED set if they are in
> > runtime suspend during the "noirq" phase of system-wide suspend
> > (or analogous) transitions or the system transition under way is
> > a proper suspend (rather than anything related to hibernation) and
> > the device's wakeup settings are compatible with runtime PM (that
> > is, the device cannot generate wakeup signals at all or it is
> > allowed to wake up the system from sleep).
> 
> As I pointed out by submitting another patch [2], device_may_wakeup()
> doesn't really tell whether the wakeup is configured as "in-band" or
> "out-of-band". That knowledge is known by the driver and the subsystem
> layer - and for that reason I don't think the PM core shall base
> generic decisions like this on it.

The "or it is allowed to wake up the system from sleep" case may be overly
optimistic, but the remaining two (runtime-suspended devices and devices
that can't generate wakeup signals at all) are quite straightforward to me.
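
For reference, the three cases can be written out as a single predicate. The following is a minimal userspace sketch in C, not the kernel code itself: `struct sketch_dev`, `enum sketch_event` and `sketch_skip_resume()` are hypothetical names, and the three booleans stand in for the results of `pm_runtime_status_suspended()`, `device_can_wakeup()` and `device_may_wakeup()` used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the resume-side PM_EVENT_* values. */
enum sketch_event { SKETCH_EVENT_RESUME, SKETCH_EVENT_THAW, SKETCH_EVENT_RESTORE };

/* Hypothetical snapshot of the device state checked by the PM core. */
struct sketch_dev {
	bool rpm_suspended;	/* pm_runtime_status_suspended(dev) */
	bool can_wakeup;	/* device_can_wakeup(dev) */
	bool may_wakeup;	/* device_may_wakeup(dev) */
};

/*
 * Mirrors the skip_resume expression for devices whose driver callbacks
 * are invoked directly by the core: either the device is already in
 * runtime suspend, or this is a proper suspend/resume cycle and the
 * wakeup settings are compatible with runtime PM.
 */
bool sketch_skip_resume(const struct sketch_dev *dev, enum sketch_event resume_event)
{
	assert(dev != NULL);

	return dev->rpm_suspended ||
	       (resume_event == SKETCH_EVENT_RESUME &&
		(!dev->can_wakeup || dev->may_wakeup));
}
```

The last term, `may_wakeup`, is the one called possibly overly optimistic above. Note also that this value only feeds `power.must_resume`; a restore from hibernation is still excluded separately by `dev_pm_may_skip_resume()`.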

Thanks,
Rafael

* Re: [PATCH v4 5/6] PM / core: Direct handling of DPM_FLAG_LEAVE_SUSPENDED
  2017-11-22  1:10             ` Rafael J. Wysocki
@ 2017-11-22  1:28               ` Rafael J. Wysocki
  0 siblings, 0 replies; 135+ messages in thread
From: Rafael J. Wysocki @ 2017-11-22  1:28 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Ulf Hansson, Linux PM, Bjorn Helgaas, Alan Stern,
	Greg Kroah-Hartman, LKML, Linux ACPI, Linux PCI,
	Linux Documentation, Mika Westerberg, Andy Shevchenko,
	Kevin Hilman

On Wednesday, November 22, 2017 2:10:51 AM CET Rafael J. Wysocki wrote:
> On Monday, November 20, 2017 2:42:26 PM CET Ulf Hansson wrote:
> > On 18 November 2017 at 15:41, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > Make the PM core handle DPM_FLAG_LEAVE_SUSPENDED directly for
> > > devices whose "noirq", "late" and "early" driver callbacks are
> > > invoked directly by it.
> > 
> > This indicates that your target for this particular change isn't
> > ACPI/PCI, but instead this aims to be a more generic solution to be
> > able to optimize the resume path for devices.
> > 
> > Assuming, this is the case, I don't think this is good enough as I
> > pointed out [1] earlier. Simply because it isn't as flexible as is
> > > required - to really be able to cover generic cases.
> 
> I'll go back to that message, but nothing so far has been flexible enough to
> cover everything you can imagine.
> 
> The case this and the next patch cover really is to allow drivers that can be
> used with or without a PM domain to avoid doing any "are we suspended?" type
> of checks in their callbacks.  Actually, patch [6/6] is more important from that
> standpoint, but this one may also play a role because of the dependencies
> between devices involved in the handling of LEAVE_SUSPENDED (e.g. say a PCI
> parent has child platform, I2C or similar devices without a PM domain
> and what happens to the children affects the parent).
> 
> > >
> > > Namely, make it skip all of the system-wide resume callbacks for
> > > such devices with DPM_FLAG_LEAVE_SUSPENDED set if they are in
> > > runtime suspend during the "noirq" phase of system-wide suspend
> > > (or analogous) transitions or the system transition under way is
> > > a proper suspend (rather than anything related to hibernation) and
> > > the device's wakeup settings are compatible with runtime PM (that
> > > is, the device cannot generate wakeup signals at all or it is
> > > allowed to wake up the system from sleep).
> > 
> > As I pointed out by submitting another patch [2], device_may_wakeup()
> > doesn't really tell whether the wakeup is configured as "in-band" or
> > "out-of-band". That knowledge is known by the driver and the subsystem
> > layer - and for that reason I don't think the PM core shall base
> > generic decisions like this on it.
> 
> The "or it is allowed to wake up the system from sleep" case may be overly
> optimistic, but the remaining two (runtime-suspended devices and devices
> that can't generate wakeup signals at all) are quite straightforward to me.

BTW, I'm not sure if the device_may_wakeup() check is really insufficient
in this particular case.

Say the device was not in runtime suspend before, but device_may_wakeup()
returns "true" for it and the system is resuming from suspend.  The device's
state should be suitable to wake up the system in any case, so the "in-band"
vs "out-of-band" difference must already have been taken care of during system
suspend.

Thanks,
Rafael

Thread overview: 135+ messages
2017-10-16  1:12 [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume Rafael J. Wysocki
2017-10-16  1:29 ` [PATCH 01/12] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags Rafael J. Wysocki
2017-10-16  5:34   ` Lukas Wunner
2017-10-16 22:03     ` Rafael J. Wysocki
2017-10-16  6:28   ` Greg Kroah-Hartman
2017-10-16 22:05     ` Rafael J. Wysocki
2017-10-17  7:15       ` Greg Kroah-Hartman
2017-10-17 15:26         ` Rafael J. Wysocki
2017-10-18  6:56           ` Greg Kroah-Hartman
2017-10-16  6:31   ` Greg Kroah-Hartman
2017-10-16 22:07     ` Rafael J. Wysocki
2017-10-17 13:26       ` Greg Kroah-Hartman
2017-10-16 20:16   ` Alan Stern
2017-10-16 22:11     ` Rafael J. Wysocki
2017-10-18 23:17   ` [Update][PATCH v2 " Rafael J. Wysocki
2017-10-19  7:33     ` Greg Kroah-Hartman
2017-10-20 11:11       ` Rafael J. Wysocki
2017-10-20 11:35         ` Greg Kroah-Hartman
2017-10-20 11:28           ` Rafael J. Wysocki
2017-10-23 16:37     ` Ulf Hansson
2017-10-23 20:41       ` Rafael J. Wysocki
2017-10-16  1:29 ` [PATCH 02/12] PCI / PM: Use the NEVER_SKIP driver flag Rafael J. Wysocki
2017-10-23 16:40   ` Ulf Hansson
2017-10-16  1:29 ` [PATCH 03/12] PM: i2c-designware-platdrv: Use DPM_FLAG_SMART_PREPARE Rafael J. Wysocki
2017-10-23 16:57   ` Ulf Hansson
2017-10-16  1:29 ` [PATCH 04/12] PM / core: Add SMART_SUSPEND driver flag Rafael J. Wysocki
2017-10-23 19:01   ` Ulf Hansson
2017-10-24  5:22   ` Ulf Hansson
2017-10-24  8:55     ` Rafael J. Wysocki
2017-10-16  1:29 ` [PATCH 05/12] PCI / PM: Drop unnecessary invocations of pcibios_pm_ops callbacks Rafael J. Wysocki
2017-10-23 19:06   ` Ulf Hansson
2017-10-16  1:29 ` [PATCH 06/12] PCI / PM: Take SMART_SUSPEND driver flag into account Rafael J. Wysocki
2017-10-16  1:29 ` [PATCH 07/12] ACPI / LPSS: Consolidate runtime PM and system sleep handling Rafael J. Wysocki
2017-10-23 19:09   ` Ulf Hansson
2017-10-16  1:30 ` [PATCH 08/12] ACPI / PM: Take SMART_SUSPEND driver flag into account Rafael J. Wysocki
2017-10-16  1:30 ` [PATCH 09/12] PM / mfd: intel-lpss: Use DPM_FLAG_SMART_SUSPEND Rafael J. Wysocki
2017-10-31 15:09   ` Lee Jones
2017-10-31 16:28     ` Rafael J. Wysocki
2017-11-01  9:28       ` Lee Jones
2017-11-01 20:26         ` Rafael J. Wysocki
2017-11-08 11:08           ` Lee Jones
2017-10-16  1:30 ` [PATCH 10/12] PM / core: Add LEAVE_SUSPENDED driver flag Rafael J. Wysocki
2017-10-23 19:38   ` Ulf Hansson
2017-10-16  1:31 ` [PATCH 11/12] PM: i2c-designware-platdrv: Optimize power management Rafael J. Wysocki
2017-10-26 20:41   ` Wolfram Sang
2017-10-26 21:14     ` Rafael J. Wysocki
2017-10-16  1:32 ` [PATCH 12/12] PM / core: Add AVOID_RPM driver flag Rafael J. Wysocki
2017-10-17 15:33   ` Andy Shevchenko
2017-10-17 15:59     ` Rafael J. Wysocki
2017-10-17 16:25       ` Andy Shevchenko
2017-10-16  7:08 ` [PATCH 0/12] PM / sleep: Driver flags for system suspend/resume Greg Kroah-Hartman
2017-10-16 21:50   ` Rafael J. Wysocki
2017-10-17  8:36 ` Ulf Hansson
2017-10-17 15:25   ` Rafael J. Wysocki
2017-10-17 19:41     ` Ulf Hansson
2017-10-17 20:12       ` Alan Stern
2017-10-17 23:07         ` Rafael J. Wysocki
2017-10-18  0:39       ` Rafael J. Wysocki
2017-10-18 10:24         ` Rafael J. Wysocki
2017-10-18 12:34           ` Ulf Hansson
2017-10-18 21:54             ` Rafael J. Wysocki
2017-10-18 11:57         ` Ulf Hansson
2017-10-18 13:00           ` Rafael J. Wysocki
2017-10-18 14:11             ` Ulf Hansson
2017-10-18 19:45               ` Grygorii Strashko
2017-10-18 21:48                 ` Rafael J. Wysocki
2017-10-19  8:33                   ` Ulf Hansson
2017-10-19 17:21                     ` Grygorii Strashko
2017-10-19 18:04                       ` Ulf Hansson
2017-10-19 18:11                         ` Ulf Hansson
2017-10-19 21:31                           ` Grygorii Strashko
2017-10-20  6:05                             ` Ulf Hansson
2017-10-18 22:12               ` Rafael J. Wysocki
2017-10-19 12:21                 ` Ulf Hansson
2017-10-19 18:01                   ` Ulf Hansson
2017-10-20  1:19                   ` Rafael J. Wysocki
2017-10-20  5:57                     ` Ulf Hansson
2017-10-20 20:46 ` Bjorn Helgaas
2017-10-21  1:04   ` Rafael J. Wysocki
2017-10-27 22:11 ` [PATCH v2 0/6] PM / sleep: Driver flags for system suspend/resume (part 1) Rafael J. Wysocki
2017-10-27 22:17   ` [PATCH v2 1/6] PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags Rafael J. Wysocki
2017-11-06  8:07     ` Ulf Hansson
2017-10-27 22:19   ` [PATCH v2 2/6] PCI / PM: Use the NEVER_SKIP driver flag Rafael J. Wysocki
2017-10-27 22:22   ` [PATCH v2 3/6] PM / core: Add SMART_SUSPEND " Rafael J. Wysocki
2017-11-06  8:09     ` Ulf Hansson
2017-11-06 11:23       ` Rafael J. Wysocki
2017-10-27 22:23   ` [PATCH v2 4/6] PCI / PM: Drop unnecessary invocations of pcibios_pm_ops callbacks Rafael J. Wysocki
2017-10-27 22:27   ` [PATCH v2 5/6] PCI / PM: Take SMART_SUSPEND driver flag into account Rafael J. Wysocki
2017-10-31 22:48     ` Bjorn Helgaas
2017-10-27 22:30   ` [PATCH v2 6/6] ACPI " Rafael J. Wysocki
2017-11-08  0:41   ` [PATCH v2 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
2017-11-08 13:25     ` [PATCH v2 1/6] PM / core: Add LEAVE_SUSPENDED driver flag Rafael J. Wysocki
2017-11-10  9:09       ` Ulf Hansson
2017-11-10 23:45         ` Rafael J. Wysocki
2017-11-11  0:41           ` Rafael J. Wysocki
2017-11-11  1:36           ` Rafael J. Wysocki
2017-11-14 16:07           ` Ulf Hansson
2017-11-15  1:48             ` Rafael J. Wysocki
2017-11-16 10:18               ` Ulf Hansson
2017-11-08 13:28     ` [PATCH v2 2/6] PCI / PM: Support for " Rafael J. Wysocki
2017-11-08 20:38       ` Bjorn Helgaas
2017-11-08 21:09         ` Rafael J. Wysocki
2017-11-08 13:34     ` [PATCH v2 3/6] ACPI / PM: Support for LEAVE_SUSPENDED driver flag in ACPI PM domain Rafael J. Wysocki
2017-11-08 13:37     ` [PATCH v2 4/6] PM / core: Add helpers for subsystem callback selection Rafael J. Wysocki
2017-11-08 13:38     ` [PATCH v2 5/6] PM / core: Direct handling of DPM_FLAG_LEAVE_SUSPENDED Rafael J. Wysocki
2017-11-08 13:39     ` [PATCH v2 6/6] PM / core: DPM_FLAG_SMART_SUSPEND optimization Rafael J. Wysocki
2017-11-12  0:34     ` [PATCH v3 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
2017-11-12  0:37       ` [PATCH v3 1/6] PM / core: Add LEAVE_SUSPENDED driver flag Rafael J. Wysocki
2017-11-16 15:10         ` Ulf Hansson
2017-11-16 23:07           ` Rafael J. Wysocki
2017-11-17  6:11             ` Ulf Hansson
2017-11-17 13:18               ` Rafael J. Wysocki
2017-11-17 13:49                 ` Ulf Hansson
2017-11-17 14:31                   ` Rafael J. Wysocki
2017-11-17 15:57                     ` Ulf Hansson
2017-11-17 12:45             ` Rafael J. Wysocki
2017-11-12  0:40       ` [PATCH v3 2/6] PCI / PM: Support for " Rafael J. Wysocki
2017-11-12  0:40       ` [PATCH v3 3/6] ACPI / PM: Support for LEAVE_SUSPENDED driver flag in ACPI PM domain Rafael J. Wysocki
2017-11-12  0:42       ` [PATCH v3 4/6] PM / core: Add helpers for subsystem callback selection Rafael J. Wysocki
2017-11-15  7:43         ` Ulf Hansson
2017-11-15 17:55           ` Rafael J. Wysocki
2017-11-12  0:43       ` [PATCH v3 5/6] PM / core: Direct handling of DPM_FLAG_LEAVE_SUSPENDED Rafael J. Wysocki
2017-11-12  0:44       ` [PATCH v3 6/6] PM / core: DPM_FLAG_SMART_SUSPEND optimization Rafael J. Wysocki
2017-11-18 14:27       ` [PATCH v4 0/6] PM / sleep: Driver flags for system suspend/resume (part 2) Rafael J. Wysocki
2017-11-18 14:31         ` [PATCH v4 1/6] PM / core: Add LEAVE_SUSPENDED driver flag Rafael J. Wysocki
2017-11-20 12:25           ` Ulf Hansson
2017-11-21  0:16             ` Rafael J. Wysocki
2017-11-18 14:33         ` [PATCH v4 2/6] PCI / PM: Support for " Rafael J. Wysocki
2017-11-18 14:35         ` [PATCH v4 3/6] ACPI / PM: Support for LEAVE_SUSPENDED driver flag in ACPI PM domain Rafael J. Wysocki
2017-11-18 14:37         ` [PATCH v4 4/6] PM / core: Add helpers for subsystem callback selection Rafael J. Wysocki
2017-11-18 14:41         ` [PATCH v4 5/6] PM / core: Direct handling of DPM_FLAG_LEAVE_SUSPENDED Rafael J. Wysocki
2017-11-20 13:42           ` Ulf Hansson
2017-11-22  1:10             ` Rafael J. Wysocki
2017-11-22  1:28               ` Rafael J. Wysocki
2017-11-18 14:44         ` [PATCH v4 6/6] PM / core: DPM_FLAG_SMART_SUSPEND optimization Rafael J. Wysocki
